platform/kernel/linux-starfive.git
22 months agonfp: check if application firmware is indifferent to port speed
Yinjun Zhang [Thu, 25 Aug 2022 14:12:22 +0000 (16:12 +0200)]
nfp: check if application firmware is indifferent to port speed

A new tlv type is introduced to indicate if application firmware is
indifferent to port speed, and inform management firmware of the
result.

And the result is always true for flower application firmware since
it's indifferent to port speed from the start and will never change.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonfp: propagate port speed from management firmware
Yinjun Zhang [Thu, 25 Aug 2022 14:12:21 +0000 (16:12 +0200)]
nfp: propagate port speed from management firmware

In future releases the NIC application firmware may be indifferent to port
speeds - not built for specific port speeds - and consequently it will not
be able to report VF port speeds to the driver without first learning them.
With this change, the driver will pass the speed of physical ports from
management firmware to application firmware, and the latter will copy the
speed of port 0 to all the active VFs. So that the driver can get VF port
speed as before.

The port speed of a VF may be requested from userspace using:

  ethtool <vf-intf>

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agoMerge branch 'sparx5-mrouter'
David S. Miller [Mon, 29 Aug 2022 11:57:38 +0000 (12:57 +0100)]
Merge branch 'sparx5-mrouter'

Casper Andersson says:

====================
net: sparx5: add mrouter support

This series adds support for multicast router ports to SparX5. To manage
mrouter ports the driver must keep track of mdb entries. When adding an
mrouter port the driver has to iterate over all mdb entries and modify
them accordingly.

v2:
- add bailout in free_mdb
- re-arrange mdb struct to avoid holes
- change devm_kzalloc -> kzalloc
- change GFP_ATOMIC -> GFP_KERNEL
- fix spelling
====================

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: sparx5: add support for mrouter ports
Casper Andersson [Thu, 25 Aug 2022 09:28:37 +0000 (11:28 +0200)]
net: sparx5: add support for mrouter ports

All multicast should be forwarded to mrouter ports. Mrouter ports must
therefore be part of all active multicast groups, and override flooding
from being disabled.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: sparx5: add list for mdb entries in driver
Casper Andersson [Thu, 25 Aug 2022 09:28:36 +0000 (11:28 +0200)]
net: sparx5: add list for mdb entries in driver

Keep track of all mdb entries in software for easy access.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoethernet: Add helpers to recognize addresses mapped to IP multicast
Casper Andersson [Thu, 25 Aug 2022 09:28:35 +0000 (11:28 +0200)]
ethernet: Add helpers to recognize addresses mapped to IP multicast

IP multicast must sometimes be discriminated from non-IP multicast,
e.g. when determining the forwarding behavior of a given group in the
presence of multicast router ports on an offloaded bridge. Therefore,
provide helpers to identify these groups.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agogenetlink: start to validate reserved header bytes
Jakub Kicinski [Thu, 25 Aug 2022 00:18:30 +0000 (17:18 -0700)]
genetlink: start to validate reserved header bytes

We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.

One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.

To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet_sched: remove impossible conditions
Dan Carpenter [Thu, 25 Aug 2022 13:25:08 +0000 (16:25 +0300)]
net_sched: remove impossible conditions

We no longer allow "handle" to be zero, so there is no need to check
for that.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/Ywd4NIoS4aiilnMv@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: fman: memac: Uninitialized variable on error path
Dan Carpenter [Thu, 25 Aug 2022 13:17:19 +0000 (16:17 +0300)]
net: fman: memac: Uninitialized variable on error path

The "fixed_link" is only allocated sometimes but it's freed
unconditionally in the error handling.  Set it to NULL so we don't free
uninitialized data.

Fixes: 9ea4742a55ca ("net: fman: Configure fixed link in memac_initialization")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Link: https://lore.kernel.org/r/Ywd2X6gdKmTfYBxD@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'openvswitch-allow-specifying-ifindex-of-new-interfaces'
Jakub Kicinski [Sat, 27 Aug 2022 02:31:23 +0000 (19:31 -0700)]
Merge branch 'openvswitch-allow-specifying-ifindex-of-new-interfaces'

Andrey Zhadchenko says:

====================
openvswitch: allow specifying ifindex of new interfaces

CRIU currently do not support checkpoint/restore of OVS configurations, but
there was several requests for it. For example,
https://github.com/lxc/lxc/issues/2909

The main problem is ifindexes of newly created interfaces. We realy need to
preserve them after restore. Current openvswitch API does not allow to
specify ifindex. Most of the time we can just create an interface via
generic netlink requests and plug it into ovs but datapaths (generally any
OVS_VPORT_TYPE_INTERNAL) can only be created via openvswitch requests which
do not support selecting ifindex.

This patch allows to do so.
For new datapaths I decided to use dp_infindex in header as infindex
because it control ifindex for other requests too.
For internal vports I reused OVS_VPORT_ATTR_IFINDEX.

The only concern I have is that previously dp_ifindex was not used for
OVS_DP_VMD_NEW requests and some software may not set it to zero. However
we have been running this patch at Virtuozzo for 2 years and have not
encountered this problem. Not sure if it is worth to add new
ovs_datapath_attr instead.
====================

Link: https://lore.kernel.org/r/20220825020450.664147-1-andrey.zhadchenko@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoopenvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests
Andrey Zhadchenko [Thu, 25 Aug 2022 02:04:50 +0000 (05:04 +0300)]
openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests

CRIU needs OVS_DP_ATTR_PER_CPU_PIDS to checkpoint/restore newest
openvswitch versions.
Add pids to generic datapath reply. Limit exported pids amount to
nr_cpu_ids.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoopenvswitch: allow specifying ifindex of new interfaces
Andrey Zhadchenko [Thu, 25 Aug 2022 02:04:49 +0000 (05:04 +0300)]
openvswitch: allow specifying ifindex of new interfaces

CRIU is preserving ifindexes of net devices after restoration. However,
current Open vSwitch API does not allow to target ifindex, so we cannot
correctly restore OVS configuration.

Add new OVS_DP_ATTR_IFINDEX for OVS_DP_CMD_NEW and use it as desired
ifindex.
Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
ifindex.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: ftmac100: add an opportunity to get ethaddr from the platform
Sergei Antonov [Wed, 24 Aug 2022 15:17:24 +0000 (18:17 +0300)]
net: ftmac100: add an opportunity to get ethaddr from the platform

This driver always generated a random ethernet address. Leave it as a
fallback solution, but add a call to platform_get_ethdev_address().
Handle EPROBE_DEFER returned from platform_get_ethdev_address() to
retry when EEPROM is ready.

Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220824151724.2698107-1-saproj@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: dsa: mv88e6xxx: Allow external SMI if serial
Marcus Carlberg [Wed, 24 Aug 2022 09:37:06 +0000 (11:37 +0200)]
net: dsa: mv88e6xxx: Allow external SMI if serial

p0_mode set to one of the supported serial mode should not prevent
configuring the external SMI interface in
mv88e6xxx_g2_scratch_gpio_set_smi. The current masking of the p0_mode
only checks the first 2 bits. This results in switches supporting
serial mode cannot setup external SMI on certain serial modes
(Ex: 1000BASE-X and SGMII).

Extend the mask of the p0_mode to include the reduced modes and
serial modes as allowed modes for the external SMI interface.

Signed-off-by: Marcus Carlberg <marcus.carlberg@axis.com>
Link: https://lore.kernel.org/r/20220824093706.19049-1-marcus.carlberg@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agogenetlink: hold read cb_lock during iteration of genl_fam_idr in genl_bind()
Jiri Pirko [Thu, 25 Aug 2022 08:19:40 +0000 (10:19 +0200)]
genetlink: hold read cb_lock during iteration of genl_fam_idr in genl_bind()

In genl_bind(), currently genl_lock and write cb_lock are taken
for iteration of genl_fam_idr and processing of static values
stored in struct genl_family. Take just read cb_lock for this task
as it is sufficient to guard the idr and the struct against
concurrent genl_register/unregister_family() calls.

This will allow to run genl command processing in genl_rcv() and
mnl_socket_setsockopt(.., NETLINK_ADD_MEMBERSHIP, ..) in parallel.

Reported-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220825081940.1283335-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet/mlx4: Fix error check for dma_map_sg
Jack Wang [Thu, 25 Aug 2022 06:35:33 +0000 (08:35 +0200)]
net/mlx4: Fix error check for dma_map_sg

dma_map_sg return 0 on error.

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20220825063533.21015-1-jinpu.wang@ionos.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: devlink: add RNLT lock assertion to devlink_compat_switch_id_get()
Jiri Pirko [Thu, 25 Aug 2022 11:29:23 +0000 (13:29 +0200)]
net: devlink: add RNLT lock assertion to devlink_compat_switch_id_get()

Similar to devlink_compat_phys_port_name_get(), make sure that
devlink_compat_switch_id_get() is called with RTNL lock held. Comment
already says so, so put this in code as well.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220825112923.1359194-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlx4: Do type_clear() for devlink ports when type_set() was called previously
Jiri Pirko [Thu, 25 Aug 2022 11:40:31 +0000 (13:40 +0200)]
mlx4: Do type_clear() for devlink ports when type_set() was called previously

Whenever the type_set() is called on a devlink port, accompany it by
matching type_clear() during cleanup.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220825114031.1361478-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: dsa: mv88e6xxx: support RGMII cmode
Marcus Carlberg [Mon, 22 Aug 2022 14:41:36 +0000 (16:41 +0200)]
net: dsa: mv88e6xxx: support RGMII cmode

Since the probe defaults all interfaces to the highest speed possible
(10GBASE-X in mv88e6393x) before the phy mode configuration from the
devicetree is considered it is currently impossible to use port 0 in
RGMII mode.

This change will allow RGMII modes to be configurable for port 0
enabling port 0 to be configured as RGMII as well as serial depending
on configuration.

Signed-off-by: Marcus Carlberg <marcus.carlberg@axis.com>
Link: https://lore.kernel.org/r/20220822144136.16627-1-marcus.carlberg@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: sched: remove unnecessary init of qdisc skb head
Zhengchao Shao [Wed, 24 Aug 2022 09:10:03 +0000 (17:10 +0800)]
net: sched: remove unnecessary init of qdisc skb head

The memory allocated by using kzallloc_node and kcalloc has been cleared.
Therefore, the structure members of the new qdisc are 0. So there's no
need to explicitly assign a value of 0.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge tag 'wireless-next-2022-08-26-v2' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 26 Aug 2022 10:56:55 +0000 (11:56 +0100)]
Merge tag 'wireless-next-2022-08-26-v2' of git://git./linux/kernel/git/wireless/wireless-next

Johannes berg says:

====================
Various updates:
 * rtw88: operation, locking, warning, and code style fixes
 * rtw89: small updates
 * cfg80211/mac80211: more EHT/MLO (802.11be, WiFi 7) work
 * brcmfmac: a couple of fixes
 * misc cleanups etc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agor8152: add PID for the Lenovo OneLink+ Dock
Jean-Francois Le Fillatre [Wed, 24 Aug 2022 19:14:36 +0000 (21:14 +0200)]
r8152: add PID for the Lenovo OneLink+ Dock

The Lenovo OneLink+ Dock contains an RTL8153 controller that behaves as
a broken CDC device by default. Add the custom Lenovo PID to the r8152
driver to support it properly.

Also, systems compatible with this dock provide a BIOS option to enable
MAC address passthrough (as per Lenovo document "ThinkPad Docking
Solutions 2017"). Add the custom PID to the MAC passthrough list too.

Tested on a ThinkPad 13 1st gen with the expected results:

passthrough disabled: Invalid header when reading pass-thru MAC addr
passthrough enabled:  Using pass-thru MAC addr XX:XX:XX:XX:XX:XX

Signed-off-by: Jean-Francois Le Fillatre <jflf_kernel@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Fri, 26 Aug 2022 10:39:00 +0000 (11:39 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-24 (ice)

This series contains updates to ice driver only.

Marcin adds support for TC parsing on TTL and ToS fields.

Anatolli adds support for devlink port split command to allow
configuration of various port configurations.

Jake allows for passing and writing an additional NVM write activate
field by expanding current cmd_flag.

Ani makes PHY debug output more readable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agowifi: rtw88: fix uninitialized use of primary channel index
Ping-Ke Shih [Mon, 15 Aug 2022 06:20:04 +0000 (14:20 +0800)]
wifi: rtw88: fix uninitialized use of primary channel index

clang reports uninitialized use:

>> drivers/net/wireless/realtek/rtw88/main.c:731:2: warning: variable
   'primary_channel_idx' is used uninitialized whenever switch default is
   taken [-Wsometimes-uninitialized]
           default:
           ^~~~~~~
   drivers/net/wireless/realtek/rtw88/main.c:754:39: note: uninitialized
   use occurs here
           hal->current_primary_channel_index = primary_channel_idx;
                                                ^~~~~~~~~~~~~~~~~~~
   drivers/net/wireless/realtek/rtw88/main.c:687:24: note: initialize the
   variable 'primary_channel_idx' to silence this warning
           u8 primary_channel_idx;
                                 ^
                                  = '\0'

This situation could not happen, because possible channel bandwidth
20/40/80MHz are enumerated.

Fixes: 341dd1f7de4c ("wifi: rtw88: add the update channel flow to support setting by parameters")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220815062004.22920-1-pkshih@realtek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agonet: phylink: allow RGMII/RTBI in-band status
Qingfang DENG [Wed, 24 Aug 2022 06:10:34 +0000 (14:10 +0800)]
net: phylink: allow RGMII/RTBI in-band status

As per RGMII specification v2.0, section 3.4.1, RGMII/RTBI has an
optional in-band status feature where the PHY's link status, speed and
duplex mode can be passed to the MAC.
Allow RGMII/RTBI to use in-band status.

Signed-off-by: Qingfang DENG <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'prestera-matchall'
David S. Miller [Fri, 26 Aug 2022 09:04:55 +0000 (10:04 +0100)]
Merge branch 'prestera-matchall'

Maksym Glubokiy says:

====================
net: prestera: matchall features

This patch series extracts matchall rules management out of SPAN API
implementation and adds 2 features on top of that:
  - support for egress traffic (mirred egress action)
  - proper rule priorities management between matchall and flower
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: prestera: manage matchall and flower priorities
Maksym Glubokiy [Tue, 23 Aug 2022 11:39:58 +0000 (14:39 +0300)]
net: prestera: manage matchall and flower priorities

matchall rules can be added only to chain 0 and their priorities have
limitations:
 - new matchall ingress rule's priority must be higher (lower value)
   than any existing flower rule;
 - new matchall egress rule's priority must be lower (higher value)
   than any existing flower rule.

The opposite works for flower rule adding:
 - new flower ingress rule's priority must be lower (higher value)
   than any existing matchall rule;
 - new flower egress rule's priority must be higher (lower value)
   than any existing matchall rule.

This is a hardware limitation and thus must be properly handled in
driver by reporting errors to the user when newly added rule has such a
priority that cannot be installed into the hardware.

To achieve this, the driver must maintain both min/max matchall
priorities for every flower block when user adds/deletes a matchall
rule, as well as both min/max flower priorities for chain 0 for every
adding/deletion of flower rules for chain 0.

Cc: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: prestera: add support for egress traffic mirroring
Serhiy Boiko [Tue, 23 Aug 2022 11:39:57 +0000 (14:39 +0300)]
net: prestera: add support for egress traffic mirroring

This enables adding matchall rules for egress:

  tc filter add .. egress .. matchall skip_sw \
    action mirred egress mirror dev ..

Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: prestera: acl: extract matchall logic into a separate file
Serhiy Boiko [Tue, 23 Aug 2022 11:39:56 +0000 (14:39 +0300)]
net: prestera: acl: extract matchall logic into a separate file

This commit adds more clarity to handling of TC_CLSMATCHALL_REPLACE and
TC_CLSMATCHALL_DESTROY events by calling newly added *_mall_*() handlers
instead of directly calling SPAN API.

This also extracts matchall rules management out of SPAN API since SPAN
is a hardware module which is used to implement 'matchall egress mirred'
action only.

Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: asix: ax88772: add ethtool pause configuration
Oleksij Rempel [Mon, 22 Aug 2022 12:39:43 +0000 (14:39 +0200)]
net: asix: ax88772: add ethtool pause configuration

Add phylink based ethtool pause configuration

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: asix: ax88772: migrate to phylink
Oleksij Rempel [Mon, 22 Aug 2022 12:39:42 +0000 (14:39 +0200)]
net: asix: ax88772: migrate to phylink

There are some exotic ax88772 based devices which may require
functionality provide by the phylink framework. For example:
- US100A20SFP, USB 2.0 auf LWL Converter with SFP Cage
- AX88772B USB to 100Base-TX Ethernet (with RMII) demo board, where it
  is possible to switch between internal PHY and external RMII based
  connection.

So, convert this driver to phylink as soon as possible.

Tested with:
- AX88772A + internal PHY
- AX88772B + external DP83TD510E T1L PHY

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agowifi: mac80211: use full 'unsigned int' type
Xin Gao [Tue, 16 Aug 2022 18:10:40 +0000 (02:10 +0800)]
wifi: mac80211: use full 'unsigned int' type

The full 'unsigned int' is better than 'unsigned'.

Signed-off-by: Xin Gao <gaoxin@cdjrlc.com>
Link: https://lore.kernel.org/r/20220816181040.9044-1-gaoxin@cdjrlc.com
[fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: move from strlcpy with unused retval to strscpy
Wolfram Sang [Thu, 18 Aug 2022 21:02:23 +0000 (23:02 +0200)]
wifi: mac80211: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: read ethtool's sta_stats from sinfo
Ryder Lee [Thu, 25 Aug 2022 22:33:16 +0000 (06:33 +0800)]
wifi: mac80211: read ethtool's sta_stats from sinfo

Driver may update sinfo directly through .sta_statistics, so this
patch makes sure that ethool gets the correct statistics.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Link: https://lore.kernel.org/r/f9edff14dd7f5205acf1c21bae8e9d8f9802dd88.1661466499.git.ryder.lee@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: correct SMPS mode in HE 6 GHz capability
Johannes Berg [Thu, 25 Aug 2022 18:57:32 +0000 (20:57 +0200)]
wifi: mac80211: correct SMPS mode in HE 6 GHz capability

If we add 6 GHz capability in MLO, we cannot use the SMPS
mode from the deflink. Pass it separately instead since on
a second link we don't even have a link data struct yet.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agodt-bindings: net: Add missing (unevaluated|additional)Properties on child nodes
Rob Herring [Thu, 25 Aug 2022 19:26:07 +0000 (14:26 -0500)]
dt-bindings: net: Add missing (unevaluated|additional)Properties on child nodes

In order to ensure only documented properties are present, node schemas
must have unevaluatedProperties or additionalProperties set to false
(typically). Add missing properties/$refs as exposed by this addition.

Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220825192609.1538463-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 25 Aug 2022 23:07:42 +0000 (16:07 -0700)]
Merge git://git./linux/kernel/git/netdev/net

drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
  21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()")
  c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonetdev: Use try_cmpxchg in napi_if_scheduled_mark_missed
Uros Bizjak [Mon, 22 Aug 2022 14:32:43 +0000 (16:32 +0200)]
netdev: Use try_cmpxchg in napi_if_scheduled_mark_missed

Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
napi_if_scheduled_mark_missed. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg).

Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails, enabling further code simplifications.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20220822143243.2798-1-ubizjak@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 25 Aug 2022 21:03:58 +0000 (14:03 -0700)]
Merge tag 'net-6.0-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from ipsec and netfilter (with one broken Fixes tag).

  Current release - new code bugs:

   - dsa: don't dereference NULL extack in dsa_slave_changeupper()

   - dpaa: fix <1G ethernet on LS1046ARDB

   - neigh: don't call kfree_skb() under spin_lock_irqsave()

  Previous releases - regressions:

   - r8152: fix the RX FIFO settings when suspending

   - dsa: microchip: keep compatibility with device tree blobs with no
     phy-mode

   - Revert "net: macsec: update SCI upon MAC address change."

   - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367

  Previous releases - always broken:

   - netfilter: conntrack: work around exceeded TCP receive window

   - ipsec: fix a null pointer dereference of dst->dev on a metadata dst
     in xfrm_lookup_with_ifid

   - moxa: get rid of asymmetry in DMA mapping/unmapping

   - dsa: microchip: make learning configurable and keep it off while
     standalone

   - ice: xsk: prohibit usage of non-balanced queue id

   - rxrpc: fix locking in rxrpc's sendmsg

  Misc:

   - another chunk of sysctl data race silencing"

* tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  net: lantiq_xrx200: restore buffer if memory allocation failed
  net: lantiq_xrx200: fix lock under memory pressure
  net: lantiq_xrx200: confirm skb is allocated before using
  net: stmmac: work around sporadic tx issue on link-up
  ionic: VF initial random MAC address if no assigned mac
  ionic: fix up issues with handling EAGAIN on FW cmds
  ionic: clear broken state on generation change
  rxrpc: Fix locking in rxrpc's sendmsg
  net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
  MAINTAINERS: rectify file entry in BONDING DRIVER
  i40e: Fix incorrect address type for IPv6 flow rules
  ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
  net: Fix a data-race around sysctl_somaxconn.
  net: Fix a data-race around netdev_unregister_timeout_secs.
  net: Fix a data-race around gro_normal_batch.
  net: Fix data-races around sysctl_devconf_inherit_init_net.
  net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
  net: Fix a data-race around netdev_budget_usecs.
  net: Fix data-races around sysctl_max_skb_frags.
  net: Fix a data-race around netdev_budget.
  ...

22 months agoMerge branch 'mlxsw-remove-some-unused-code'
Jakub Kicinski [Thu, 25 Aug 2022 20:24:47 +0000 (13:24 -0700)]
Merge branch 'mlxsw-remove-some-unused-code'

Petr Machata says:

====================
mlxsw: Remove some unused code

This patchset removes code that is not used anymore after the following two
commits removed all users:

- commit b0d80c013b04 ("mlxsw: Remove Mellanox SwitchX-2 ASIC support")
- commit 9b43fbb8ce24 ("mlxsw: Remove Mellanox SwitchIB ASIC support")
====================

Link: https://lore.kernel.org/r/cover.1661350629.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlxsw: Remove unused mlxsw_core_port_type_get()
Jiri Pirko [Wed, 24 Aug 2022 15:18:51 +0000 (17:18 +0200)]
mlxsw: Remove unused mlxsw_core_port_type_get()

Function mlxsw_core_port_type_get() is no longer used. So remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlxsw: Remove unused port_type_set devlink op
Jiri Pirko [Wed, 24 Aug 2022 15:18:50 +0000 (17:18 +0200)]
mlxsw: Remove unused port_type_set devlink op

port_type_set devlink op is no longer used by any mlxsw driver,
so remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomlxsw: Remove unused IB stuff
Jiri Pirko [Wed, 24 Aug 2022 15:18:49 +0000 (17:18 +0200)]
mlxsw: Remove unused IB stuff

There are some IB leftovers that are no longer used in the code.
So remove them.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'net-devlink-sync-flash-and-dev-info-commands'
Jakub Kicinski [Thu, 25 Aug 2022 20:22:58 +0000 (13:22 -0700)]
Merge branch 'net-devlink-sync-flash-and-dev-info-commands'

Jiri Pirko says:

====================
net: devlink: sync flash and dev info commands

Purpose of this patchset is to introduce consistency between two devlink
commands:
  devlink dev info
    Shows versions of running default flash target and components.
  devlink dev flash
    Flashes default flash target or component name (if specified
    on cmdline).

Currently it is up to the driver what versions to expose and what flash
update component names to accept. This is inconsistent. Thankfully, only
netdevsim currently using components so it is still time
to sanitize this.

This patchset makes sure, that devlink.c calls into driver for
component flash update only in case the driver exposes the same version
name.

Example:
$ devlink dev info
netdevsim/netdevsim10:
  driver netdevsim
  versions:
      running:
        fw.mgmt 10.20.30
      stored:
        fw.mgmt 10.20.30
$ devlink dev flash netdevsim/netdevsim10 file somefile.bin
[fw.mgmt] Preparing to flash
[fw.mgmt] Flashing 100%
[fw.mgmt] Flash select
[fw.mgmt] Flashing done
$ devlink dev flash netdevsim/netdevsim10 file somefile.bin component fw.mgmt
[fw.mgmt] Preparing to flash
[fw.mgmt] Flashing 100%
[fw.mgmt] Flash select
[fw.mgmt] Flashing done

$ devlink dev flash netdevsim/netdevsim10 file somefile.bin component dummy
Error: selected component is not supported by this device.
====================

Link: https://lore.kernel.org/r/20220824122011.1204330-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: devlink: limit flash component name to match version returned by info_get()
Jiri Pirko [Wed, 24 Aug 2022 12:20:11 +0000 (14:20 +0200)]
net: devlink: limit flash component name to match version returned by info_get()

Limit the acceptance of component name passed to cmd_flash_update() to
match one of the versions returned by info_get(), marked by version type.
This makes things clearer and enforces 1:1 mapping between exposed
version and accepted flash component.

Check VERSION_TYPE_COMPONENT version type during cmd_flash_update()
execution by calling info_get() with different "req" context.
That causes info_get() to lookup the component name instead of
filling-up the netlink message.

Remove "UPDATE_COMPONENT" flag which becomes used.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonetdevsim: add version fw.mgmt info info_get() and mark as a component
Jiri Pirko [Wed, 24 Aug 2022 12:20:10 +0000 (14:20 +0200)]
netdevsim: add version fw.mgmt info info_get() and mark as a component

Fix the only component user which is netdevsim. It uses component named
"fw.mgmt" in selftests. So add this version to info_get() output with
version type component.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: devlink: extend info_get() version put to indicate a flash component
Jiri Pirko [Wed, 24 Aug 2022 12:20:09 +0000 (14:20 +0200)]
net: devlink: extend info_get() version put to indicate a flash component

Whenever the driver is called by his info_get() op, it may put multiple
version names and values to the netlink message. Extend by additional
helper devlink_info_version_running/stored_put_ext() that allows to
specify a version type that indicates when particular version name
represents a flash component.

This is going to be used in follow-up patch calling info_get() during
flash update command checking if version with this the version type
exists.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoselftests/net: fix reinitialization of TEST_PROGS in net self tests.
Adel Abouchaev [Wed, 24 Aug 2022 18:43:51 +0000 (11:43 -0700)]
selftests/net: fix reinitialization of TEST_PROGS in net self tests.

Assinging will drop all previous tests.

Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220824184351.3759862-1-adel.abushaev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'net-lantiq_xrx200-fix-errors-under-memory-pressure'
Jakub Kicinski [Thu, 25 Aug 2022 19:41:41 +0000 (12:41 -0700)]
Merge branch 'net-lantiq_xrx200-fix-errors-under-memory-pressure'

Aleksander Jan Bajkowski says:

====================
net: lantiq_xrx200: fix errors under memory pressure

This series fixes issues that can occur in the driver under memory pressure.
Situations when the system cannot allocate memory are rare, so the mentioned
bugs have been fixed recently. The patches have been tested on a BT Home
router with the Lantiq xRX200 chipset.

Changelog:
  v3: - removed netdev_err() log from the first patch
  v2:
   - the second patch has been changed, so that under memory pressure situation
     the driver will not receive packets indefinitely regardless of the NAPI budget,
   - the third patch has been added.
====================

Link: https://lore.kernel.org/r/20220824215408.4695-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: lantiq_xrx200: restore buffer if memory allocation failed
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:08 +0000 (23:54 +0200)]
net: lantiq_xrx200: restore buffer if memory allocation failed

In a situation where memory allocation fails, an invalid buffer address
is stored. When this descriptor is used again, the system panics in the
build_skb() function when accessing memory.

Fixes: 7ea6cd16f159 ("lantiq: net: fix duplicated skb in rx descriptor ring")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: lantiq_xrx200: fix lock under memory pressure
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:07 +0000 (23:54 +0200)]
net: lantiq_xrx200: fix lock under memory pressure

When the xrx200_hw_receive() function returns -ENOMEM, the NAPI poll
function immediately returns an error.
This is incorrect for two reasons:
* the function terminates without enabling interrupts or scheduling NAPI,
* the error code (-ENOMEM) is returned instead of the number of received
packets.

After the first memory allocation failure occurs, packet reception is
locked due to disabled interrupts from DMA..

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: lantiq_xrx200: confirm skb is allocated before using
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:06 +0000 (23:54 +0200)]
net: lantiq_xrx200: confirm skb is allocated before using

xrx200_hw_receive() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.

Add a check in case build_skb() failed to allocate and return NULL.

Fixes: e015593573b3 ("net: lantiq_xrx200: convert to build_skb")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: stmmac: work around sporadic tx issue on link-up
Heiner Kallweit [Wed, 24 Aug 2022 20:34:49 +0000 (22:34 +0200)]
net: stmmac: work around sporadic tx issue on link-up

This is a follow-up to the discussion in [0]. It seems to me that
at least the IP version used on Amlogic SoC's sometimes has a problem
if register MAC_CTRL_REG is written whilst the chip is still processing
a previous write. But that's just a guess.
Adding a delay between two writes to this register helps, but we can
also simply omit the offending second write. This patch uses the second
approach and is based on a suggestion from Qi Duan.
Benefit of this approach is that we can save few register writes, also
on not affected chip versions.

[0] https://www.spinics.net/lists/netdev/msg831526.html

Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/e99857ce-bd90-5093-ca8c-8cd480b5a0a2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Thu, 25 Aug 2022 19:40:29 +0000 (12:40 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-24 (ixgbe, i40e)

This series contains updates to ixgbe and i40e drivers.

Jake stops incorrect resetting of SYSTIME registers when starting
cyclecounter for ixgbe.

Sylwester corrects a check on source IP address when validating destination
for i40e.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  i40e: Fix incorrect address type for IPv6 flow rules
  ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
====================

Link: https://lore.kernel.org/r/20220824193748.874343-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'ionic-bug-fixes'
Jakub Kicinski [Thu, 25 Aug 2022 19:40:17 +0000 (12:40 -0700)]
Merge branch 'ionic-bug-fixes'

Shannon Nelson says:

====================
ionic: bug fixes

These are a couple of maintenance bug fixes for the Pensando ionic
networking driver.

Mohamed takes care of a "plays well with others" issue where the
VF spec is a bit vague on VF mac addresses, but certain customers
have come to expect behavior based on other vendor drivers.

Shannon addresses a couple of corner cases seen in internal
stress testing.
====================

Link: https://lore.kernel.org/r/20220824165051.6185-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoionic: VF initial random MAC address if no assigned mac
R Mohamed Shah [Wed, 24 Aug 2022 16:50:51 +0000 (09:50 -0700)]
ionic: VF initial random MAC address if no assigned mac

Assign a random mac address to the VF interface station
address if it boots with a zero mac address in order to match
similar behavior seen in other VF drivers.  Handle the errors
where the older firmware does not allow the VF to set its own
station address.

Newer firmware will allow the VF to set the station mac address
if it hasn't already been set administratively through the PF.
Setting it will also be allowed if the VF has trust.

Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: R Mohamed Shah <mohamed@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoionic: fix up issues with handling EAGAIN on FW cmds
Shannon Nelson [Wed, 24 Aug 2022 16:50:50 +0000 (09:50 -0700)]
ionic: fix up issues with handling EAGAIN on FW cmds

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finsh inside the FW.  The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoionic: clear broken state on generation change
Shannon Nelson [Wed, 24 Aug 2022 16:50:49 +0000 (09:50 -0700)]
ionic: clear broken state on generation change

There is a case found in heavy testing where a link flap happens just
before a firmware Recovery event and the driver gets stuck in the
BROKEN state.  This comes from the driver getting interrupted by a FW
generation change when coming back up from the link flap, and the call
to ionic_start_queues() in ionic_link_status_check() fails.  This can be
addressed by having the fw_up code clear the BROKEN bit if seen, rather
than waiting for a user to manually force the interface down and then
back up.

Fixes: 9e8eaf8427b6 ("ionic: stop watchdog when in broken state")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agorxrpc: Fix locking in rxrpc's sendmsg
David Howells [Wed, 24 Aug 2022 16:35:45 +0000 (17:35 +0100)]
rxrpc: Fix locking in rxrpc's sendmsg

Fix three bugs in the rxrpc's sendmsg implementation:

 (1) rxrpc_new_client_call() should release the socket lock when returning
     an error from rxrpc_get_call_slot().

 (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
     held in the event that we're interrupted by a signal whilst waiting
     for tx space on the socket or relocking the call mutex afterwards.

     Fix this by: (a) moving the unlock/lock of the call mutex up to
     rxrpc_send_data() such that the lock is not held around all of
     rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
     whether we're return with the lock dropped.  Note that this means
     recvmsg() will not block on this call whilst we're waiting.

 (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
     to go and recheck the state of the tx_pending buffer and the
     tx_total_len check in case we raced with another sendmsg() on the same
     call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

WARNING: bad unlock balance detected!
5.16.0-rc6-syzkaller #0 Not tainted
-------------------------------------
syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
but there are no more locks to release!

other info that might help us debug this:
no locks held by syz-executor011/3597.
...
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
 __lock_release kernel/locking/lockdep.c:5306 [inline]
 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Khalid Masum <khalid.masum.92@gmail.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 25 Aug 2022 17:52:16 +0000 (10:52 -0700)]
Merge tag 'cgroup-for-6.0-rc2-fixes-2' of git://git./linux/kernel/git/tj/cgroup

Pull another cgroup fix from Tejun Heo:
 "Commit 4f7e7236435c ("cgroup: Fix threadgroup_rwsem <->
  cpus_read_lock() deadlock") required the cgroup
  core to grab cpus_read_lock() before invoking ->attach().

  Unfortunately, it missed adding cpus_read_lock() in
  cgroup_attach_task_all(). Fix it"

* tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

22 months agocgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
Tetsuo Handa [Thu, 25 Aug 2022 08:38:38 +0000 (17:38 +0900)]
cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e
Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <tj@kernel.org>
22 months agonet: sched: delete duplicate cleanup of backlog and qlen
Zhengchao Shao [Wed, 24 Aug 2022 00:52:31 +0000 (08:52 +0800)]
net: sched: delete duplicate cleanup of backlog and qlen

qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog
_after_ calling qdisc->ops->reset. There is no need to clear them
again in the specific reset function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
Lorenzo Bianconi [Tue, 23 Aug 2022 12:24:07 +0000 (14:24 +0200)]
net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2

Properly report hw rx hash for mt7986 chipset accroding to the new dma
descriptor layout.

Fixes: 197c9e9b17b11 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agowifi: cfg80211: Add link_id to cfg80211_ch_switch_started_notify()
Veerendranath Jakkam [Fri, 22 Jul 2022 13:11:42 +0000 (18:41 +0530)]
wifi: cfg80211: Add link_id to cfg80211_ch_switch_started_notify()

Add link_id parameter to cfg80211_ch_switch_started_notify() to allow
driver to indicate on which link channel switch started on MLD.

Send the data to userspace so it knows as well.

Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/20220722131143.3438042-1-quic_vjakkam@quicinc.com
Link: https://lore.kernel.org/r/20220722131143.3438042-2-quic_vjakkam@quicinc.com
[squash two patches]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: nl80211: send MLO links channel info in GET_INTERFACE
Veerendranath Jakkam [Fri, 22 Jul 2022 13:10:00 +0000 (18:40 +0530)]
wifi: nl80211: send MLO links channel info in GET_INTERFACE

Currently, MLO link level channel information not sent to
userspace when NL80211_CMD_GET_INTERFACE requested on MLD.

Add support to send channel information for all valid links
for NL80211_CMD_GET_INTERFACE request.

Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/20220722131000.3437894-1-quic_vjakkam@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agonfp: flower: support case of match on ct_state(0/0x3f)
Wenjuan Geng [Tue, 23 Aug 2022 09:01:22 +0000 (11:01 +0200)]
nfp: flower: support case of match on ct_state(0/0x3f)

is_post_ct_flow() function will process only ct_state ESTABLISHED,
then offload_pre_check() function will check FLOW_DISSECTOR_KEY_CT flag.
When config tc filter match ct_state(0/0x3f), dissector->used_keys
with FLOW_DISSECTOR_KEY_CT bit, function offload_pre_check() will
return false, so not offload. This is a special case that can be handled
safely.

Therefore, modify to let initial packet which won't go through conntrack
can be offloaded, as long as the cared ct fields are all zero.

Signed-off-by: Wenjuan Geng <wenjuan.geng@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220823090122.403631-1-simon.horman@corigine.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agowifi: mac80211: allow bw change during channel switch in mesh
Hari Chandrakanthan [Wed, 27 Jul 2022 06:32:29 +0000 (12:02 +0530)]
wifi: mac80211: allow bw change during channel switch in mesh

From 'IEEE Std 802.11-2020 section 11.8.8.4.1':
  The mesh channel switch may be triggered by the need to avoid
  interference to a detected radar signal, or to reassign mesh STA
  channels to ensure the MBSS connectivity.

  A 20/40 MHz MBSS may be changed to a 20 MHz MBSS and a 20 MHz
  MBSS may be changed to a 20/40 MHz MBSS.

Since the standard allows the change of bandwidth during
the channel switch in mesh, remove the bandwidth check present in
ieee80211_set_csa_beacon.

Fixes: c6da674aff94 ("{nl,cfg,mac}80211: enable the triggering of CSA frame in mesh")
Signed-off-by: Hari Chandrakanthan <quic_haric@quicinc.com>
Link: https://lore.kernel.org/r/1658903549-21218-1-git-send-email-quic_haric@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: clean up a needless assignment in ieee80211_sta_activate_link()
Lukas Bulwahn [Fri, 12 Aug 2022 10:31:26 +0000 (12:31 +0200)]
wifi: mac80211: clean up a needless assignment in ieee80211_sta_activate_link()

Commit 177577dbd223 ("wifi: mac80211: sta_info: fix link_sta insertion")
makes ieee80211_sta_activate_link() return 0 in the 'hash' label case.
Hence, setting ret in the !test_sta_flag(...) branch to zero is not needed
anymore and can be dropped.

Remove a needless assignment.

No functional change. No change in object code.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220812103126.25308-1-lukas.bulwahn@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: allow link address A2 in TXQ dequeue
Johannes Berg [Wed, 24 Aug 2022 10:37:32 +0000 (12:37 +0200)]
wifi: mac80211: allow link address A2 in TXQ dequeue

In ieee80211_tx_dequeue() we currently allow a control port
frame to be transmitted on a non-authorized port only if the
A2 matches the local interface address, but if that's an MLD
and the peer is a legacy peer, we need to allow link address
here. Fix that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: fix control port frame addressing
Johannes Berg [Wed, 24 Aug 2022 10:30:16 +0000 (12:30 +0200)]
wifi: mac80211: fix control port frame addressing

For an AP interface, when userspace specifieds the link ID to
transmit the control port frame on (in particular for the
initial 4-way-HS), due to the logic in ieee80211_build_hdr()
for a frame transmitted from/to an MLD, we currently build a
header with

 A1 = DA = MLD address of the peer MLD
 A2 = local link address (!)
 A3 = SA = local MLD address

This clearly makes no sense, and leads to two problems:
 - if the frame were encrypted (not true for the initial
   4-way-HS) the AAD would be calculated incorrectly
 - if iTXQs are used, the frame is dropped by logic in
   ieee80211_tx_dequeue()

Fix the addressing, which fixes the first bullet, and the
second bullet for peer MLDs, I'll fix the second one for
non-MLD peers separately.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: set link ID in TX info for beacons
Johannes Berg [Tue, 23 Aug 2022 10:36:37 +0000 (12:36 +0200)]
wifi: mac80211: set link ID in TX info for beacons

This is simple here, and might save drivers some work if
they have common code for TX between beacons and other
frames.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211_hwsim: fix link change handling
Johannes Berg [Fri, 19 Aug 2022 12:58:42 +0000 (14:58 +0200)]
wifi: mac80211_hwsim: fix link change handling

The code for determining which links to update in wmediumd
or virtio was wrong, fix it to remove the deflink only if
there were no old links, and also add the deflink if there
are no other new links.

Fixes: c204d9df0202 ("wifi: mac80211_hwsim: handle links for wmediumd/virtio")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: maintain link_id in link_sta
Johannes Berg [Fri, 19 Aug 2022 11:12:38 +0000 (13:12 +0200)]
wifi: mac80211: maintain link_id in link_sta

To helper drivers if they e.g. have a lookup of the link_sta
pointer, add the link ID to the link_sta structure.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211/mac80211: check EHT capability size correctly
Johannes Berg [Tue, 16 Aug 2022 09:26:23 +0000 (11:26 +0200)]
wifi: cfg80211/mac80211: check EHT capability size correctly

For AP/non-AP the EHT MCS/NSS subfield size differs, the
4-octet subfield is only used for 20 MHz-only non-AP STA.
Pass an argument around everywhere to be able to parse it
properly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211_hwsim: split iftype data into AP/non-AP
Johannes Berg [Tue, 16 Aug 2022 09:22:46 +0000 (11:22 +0200)]
wifi: mac80211_hwsim: split iftype data into AP/non-AP

The next patch will require splitting the data here into
AP and non-AP because for EHT, the format of the MCS/NSS
support (struct ieee80211_eht_mcs_nss_supp) is different,
for AP the only_20mhz cannot be used.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: mlme: don't add empty EML capabilities
Mordechay Goodstein [Sat, 30 Jul 2022 00:51:08 +0000 (03:51 +0300)]
wifi: mac80211: mlme: don't add empty EML capabilities

Draft P802.11be_D2.1, section 35.3.17 states that the EML Capabilities
Field shouldn't be included in case the device doesn't have support for
EMLSR or EMLMR.

Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: use link ID for MLO in queued frames
Johannes Berg [Wed, 17 Aug 2022 19:57:19 +0000 (21:57 +0200)]
wifi: mac80211: use link ID for MLO in queued frames

When queuing frames to an interface store the link ID we
determined (which possibly came from the driver in the
RX status in the first place) in the RX status, and use
it in the MLME code to send probe responses, beacons and
CSA frames to the right link.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: use the corresponding link for stats update
Vasanthakumar Thiagarajan [Wed, 17 Aug 2022 10:42:13 +0000 (16:12 +0530)]
wifi: mac80211: use the corresponding link for stats update

With link_id reported in rx_status for MLO connection, do the
stats update on the appropriate link instead of always deflink.

Signed-off-by: Vasanthakumar Thiagarajan <quic_vthiagar@quicinc.com>
Link: https://lore.kernel.org/r/20220817104213.2531-3-quic_vthiagar@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: add link information in ieee80211_rx_status
Vasanthakumar Thiagarajan [Wed, 17 Aug 2022 10:42:12 +0000 (16:12 +0530)]
wifi: mac80211: add link information in ieee80211_rx_status

In MLO, when the address translation from link to MLD is done
in fw/hw, it is necessary to be able to have some information
on the link on which the frame has been received. Extend the
rx API to include link_id and a valid flag in ieee80211_rx_status.
Also make chanes to mac80211 rx APIs to make use of the reported
link_id after sanity checks.

Signed-off-by: Vasanthakumar Thiagarajan <quic_vthiagar@quicinc.com>
Link: https://lore.kernel.org/r/20220817104213.2531-2-quic_vthiagar@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: properly implement MLO key handling
Johannes Berg [Wed, 17 Aug 2022 09:17:01 +0000 (11:17 +0200)]
wifi: mac80211: properly implement MLO key handling

Implement key installation and lookup (on TX and RX)
for MLO, so we can use multiple GTKs/IGTKs/BIGTKs.

Co-authored-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211: Add link_id parameter to various key operations for MLO
Veerendranath Jakkam [Sat, 30 Jul 2022 05:26:43 +0000 (10:56 +0530)]
wifi: cfg80211: Add link_id parameter to various key operations for MLO

Add support for various key operations on MLD by adding new parameter
link_id. Pass the link_id received from userspace to driver for add_key,
get_key, del_key, set_default_key, set_default_mgmt_key and
set_default_beacon_key to support configuring keys specific to each MLO
link. Userspace must not specify link ID for MLO pairwise key since it
is common for all the MLO links.

Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/20220730052643.1959111-4-quic_vjakkam@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211: Prevent cfg80211_wext_siwencodeext() on MLD
Veerendranath Jakkam [Sat, 30 Jul 2022 05:26:42 +0000 (10:56 +0530)]
wifi: cfg80211: Prevent cfg80211_wext_siwencodeext() on MLD

Currently, MLO support is not added for WEXT code and WEXT handlers are
prevented on MLDs. Prevent WEXT handler cfg80211_wext_siwencodeext()
also on MLD which is missed in commit 7b0a0e3c3a88 ("wifi: cfg80211: do
some rework towards MLO link APIs")

Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/20220730052643.1959111-3-quic_vjakkam@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211: reject connect response with MLO params for WEP
Veerendranath Jakkam [Sat, 30 Jul 2022 05:26:41 +0000 (10:56 +0530)]
wifi: cfg80211: reject connect response with MLO params for WEP

MLO connections are not supposed to use WEP security. Reject connect
response of MLO connection if WEP security mode is used.

Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/20220730052643.1959111-2-quic_vjakkam@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: fix use-after-free
Johannes Berg [Wed, 17 Aug 2022 08:44:05 +0000 (10:44 +0200)]
wifi: mac80211: fix use-after-free

We've already freed the assoc_data at this point, so need
to use another copy of the AP (MLD) address instead.

Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: use link in TXQ parameter configuration
Shaul Triebitz [Tue, 2 Aug 2022 12:22:42 +0000 (15:22 +0300)]
wifi: mac80211: use link in TXQ parameter configuration

Configure the correct link per the passed parameters.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211: add link id to txq params
Shaul Triebitz [Tue, 2 Aug 2022 12:22:42 +0000 (15:22 +0300)]
wifi: cfg80211: add link id to txq params

The Tx queue parameters are per link, so add the link ID
from nl80211 parameters to the API.

While at it, lock the wdev when calling into the driver
so it (and we) can check the link ID appropriately.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: set link BSSID
Shaul Triebitz [Thu, 4 Aug 2022 13:50:18 +0000 (16:50 +0300)]
wifi: mac80211: set link BSSID

For an AP interface, set the link BSSID when the link
is initialized.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211: get correct AP link chandef
Shaul Triebitz [Mon, 1 Aug 2022 11:12:29 +0000 (14:12 +0300)]
wifi: cfg80211: get correct AP link chandef

When checking for channel regulatory validity, use the
AP link chandef (and not mesh's chandef).

Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs")
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211: Update RNR parsing to align with Draft P802.11be_D2.0
Ilan Peer [Wed, 3 Aug 2022 15:02:56 +0000 (18:02 +0300)]
wifi: cfg80211: Update RNR parsing to align with Draft P802.11be_D2.0

Based on changes in the specification the TBTT information in
the RNR can include MLD information, so update the parsing to
allow extracting the short SSID information in such a case.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: properly set old_links when removing a link
Shaul Triebitz [Sun, 24 Jul 2022 08:07:32 +0000 (11:07 +0300)]
wifi: mac80211: properly set old_links when removing a link

In ieee80211_sta_remove_link, valid_links is set to
the new_links before calling drv_change_sta_links, but
is used for the old_links.

Fixes: cb71f1d136a6 ("wifi: mac80211: add sta link addition/removal")
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: accept STA changes without link changes
Johannes Berg [Sat, 23 Jul 2022 20:08:49 +0000 (22:08 +0200)]
wifi: mac80211: accept STA changes without link changes

If there's no link ID, then check that there are no changes to
the link, and if so accept them, unless a new link is created.
While at it, reject creating a new link without an address.

This fixes authorizing an MLD (peer) that has no link 0.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agonet: gro: skb_gro_header helper function
Richard Gobert [Tue, 23 Aug 2022 07:10:49 +0000 (09:10 +0200)]
net: gro: skb_gro_header helper function

Introduce a simple helper function to replace a common pattern.
When accessing the GRO header, we fetch the pointer from frag0,
then test its validity and fetch it from the skb when necessary.

This leads to the pattern
skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow
recurring many times throughout GRO code.

This patch replaces these patterns with a single inlined function
call, improving code readability.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220823071034.GA56142@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agoDocumentation: devlink: fix the locking section
Jiri Pirko [Tue, 23 Aug 2022 07:02:13 +0000 (09:02 +0200)]
Documentation: devlink: fix the locking section

As all callbacks are converted now, fix the text reflecting that change.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220823070213.1008956-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'add-a-second-bind-table-hashed-by-port-and-address'
Jakub Kicinski [Thu, 25 Aug 2022 02:30:19 +0000 (19:30 -0700)]
Merge branch 'add-a-second-bind-table-hashed-by-port-and-address'

Joanne Koong says:

====================
Add a second bind table hashed by port and address

Currently, there is one bind hashtable (bhash) that hashes by port only.
This patchset adds a second bind table (bhash2) that hashes by port and
address.

The motivation for adding bhash2 is to expedite bind requests in situations
where the port has many sockets in its bhash table entry (eg a large number
of sockets bound to different addresses on the same port), which makes checking
bind conflicts costly especially given that we acquire the table entry spinlock
while doing so, which can cause softirq cpu lockups and can prevent new tcp
connections.

We ran into this problem at Meta where the traffic team binds a large number
of IPs to port 443 and the bind() call took a significant amount of time
which led to cpu softirq lockups, which caused packet drops and other failures
on the machine.

When experimentally testing this on a local server for ~24k sockets bound to
the port, the results seen were:

ipv4:
before - 0.002317 seconds
with bhash2 - 0.000020 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds

The additions to the initial bhash2 submission [0] are:
* Updating bhash2 in the cases where a socket's rcv saddr changes after it has
* been bound
* Adding locks for bhash2 hashbuckets

[0] https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/

v3: https://lore.kernel.org/netdev/20220722195406.1304948-2-joannelkoong@gmail.com/
v2: https://lore.kernel.org/netdev/20220712235310.1935121-1-joannelkoong@gmail.com/
v1: https://lore.kernel.org/netdev/20220623234242.2083895-2-joannelkoong@gmail.com/
====================

Link: https://lore.kernel.org/r/20220822181023.3979645-1-joannelkoong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoselftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr
Joanne Koong [Mon, 22 Aug 2022 18:10:23 +0000 (11:10 -0700)]
selftests/net: Add sk_bind_sendto_listen and sk_connect_zero_addr

This patch adds 2 new tests: sk_bind_sendto_listen and
sk_connect_zero_addr.

The sk_bind_sendto_listen test exercises the path where a socket's
rcv saddr changes after it has been added to the binding tables,
and then a listen() on the socket is invoked. The listen() should
succeed.

The sk_bind_sendto_listen test is copied over from one of syzbot's
tests: https://syzkaller.appspot.com/x/repro.c?x=1673a38df00000

The sk_connect_zero_addr test exercises the path where the socket was
never previously added to the binding tables and it gets assigned a
saddr upon a connect() to address 0.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoselftests/net: Add test for timing a bind request to a port with a populated bhash...
Joanne Koong [Mon, 22 Aug 2022 18:10:22 +0000 (11:10 -0700)]
selftests/net: Add test for timing a bind request to a port with a populated bhash entry

This test populates the bhash table for a given port with
MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long
a bind request on the port takes.

When populating the bhash table, we create the sockets and then bind
the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT
are set). When timing how long a bind on the port takes, we bind on a
different address without SO_REUSEPORT set. We do not set SO_REUSEPORT
because we are interested in the case where the bind request does not
go through the tb->fastreuseport path, which is fragile (eg
tb->fastreuseport path does not work if binding with a different uid).

To run the script:
    Usage: ./bind_bhash.sh [-6 | -4] [-p port] [-a address]
    6: use ipv6
    4: use ipv4
    port: Port number
    address: ip address

Without any arguments, ./bind_bhash.sh defaults to ipv6 using ip address
"2001:0db8:0:f101::1" on port 443.

On my local machine, I see:
ipv4:
before - 0.002317 seconds
with bhash2 - 0.000020 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: Add a bhash2 table hashed by port and address
Joanne Koong [Mon, 22 Aug 2022 18:10:21 +0000 (11:10 -0700)]
net: Add a bhash2 table hashed by port and address

The current bind hashtable (bhash) is hashed by port only.
In the socket bind path, we have to check for bind conflicts by
traversing the specified port's inet_bind_bucket while holding the
hashbucket's spinlock (see inet_csk_get_port() and
inet_csk_bind_conflict()). In instances where there are tons of
sockets hashed to the same port at different addresses, the bind
conflict check is time-intensive and can cause softirq cpu lockups,
as well as stops new tcp connections since __inet_inherit_port()
also contests for the spinlock.

This patch adds a second bind table, bhash2, that hashes by
port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6).
Searching the bhash2 table leads to significantly faster conflict
resolution and less time holding the hashbucket spinlock.

Please note a few things:
* There can be the case where the a socket's address changes after it
has been bound. There are two cases where this happens:

  1) The case where there is a bind() call on INADDR_ANY (ipv4) or
  IPV6_ADDR_ANY (ipv6) and then a connect() call. The kernel will
  assign the socket an address when it handles the connect()

  2) In inet_sk_reselect_saddr(), which is called when rebuilding the
  sk header and a few pre-conditions are met (eg rerouting fails).

In these two cases, we need to update the bhash2 table by removing the
entry for the old address, and add a new entry reflecting the updated
address.

* The bhash2 table must have its own lock, even though concurrent
accesses on the same port are protected by the bhash lock. Bhash2 must
have its own lock to protect against cases where sockets on different
ports hash to different bhash hashbuckets but to the same bhash2
hashbucket.

This brings up a few stipulations:
  1) When acquiring both the bhash and the bhash2 lock, the bhash2 lock
  will always be acquired after the bhash lock and released before the
  bhash lock is released.

  2) There are no nested bhash2 hashbucket locks. A bhash2 lock is always
  acquired+released before another bhash2 lock is acquired+released.

* The bhash table cannot be superseded by the bhash2 table because for
bind requests on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6), every socket
bound to that port must be checked for a potential conflict. The bhash
table is the only source of port->socket associations.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 25 Aug 2022 02:18:09 +0000 (19:18 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix crash with malformed ebtables blob which do not provide all
   entry points, from Florian Westphal.

2) Fix possible TCP connection clogging up with default 5-days
   timeout in conntrack, from Florian.

3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian.

4) Do not allow to update implicit chains.

5) Make table handle allocation per-netns to fix data race.

6) Do not truncated payload length and offset, and checksum offset.
   Instead report EINVAl.

7) Enable chain stats update via static key iff no error occurs.

8) Restrict osf expression to ip, ip6 and inet families.

9) Restrict tunnel expression to netdev family.

10) Fix crash when trying to bind again an already bound chain.

11) Flowtable garbage collector might leave behind pending work to
    delete entries. This patch comes with a previous preparation patch
    as dependency.

12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
  netfilter: flowtable: fix stuck flows on cleanup due to pending work
  netfilter: flowtable: add function to invoke garbage collection immediately
  netfilter: nf_tables: disallow binding to already bound chain
  netfilter: nft_tunnel: restrict it to netdev family
  netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
  netfilter: nf_tables: do not leave chain stats enabled on error
  netfilter: nft_payload: do not truncate csum_offset and csum_type
  netfilter: nft_payload: report ERANGE for too long offset and length
  netfilter: nf_tables: make table handle allocation per-netns friendly
  netfilter: nf_tables: disallow updates of implicit chain
  netfilter: nft_tproxy: restrict to prerouting hook
  netfilter: conntrack: work around exceeded receive window
  netfilter: ebtables: reject blobs that don't provide all entry points
====================

Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMAINTAINERS: rectify file entry in BONDING DRIVER
Lukas Bulwahn [Wed, 24 Aug 2022 07:29:45 +0000 (09:29 +0200)]
MAINTAINERS: rectify file entry in BONDING DRIVER

Commit c078290a2b76 ("selftests: include bonding tests into the kselftest
infra") adds the bonding tests in the directory:

  tools/testing/selftests/drivers/net/bonding/

The file entry in MAINTAINERS for the BONDING DRIVER however refers to:

  tools/testing/selftests/net/bonding/

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken file pattern.

Repair this file entry in BONDING DRIVER.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220824072945.28606-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonetlink: fix some kernel-doc comments
Zhengchao Shao [Wed, 24 Aug 2022 01:36:21 +0000 (09:36 +0800)]
netlink: fix some kernel-doc comments

Modify the comment of input parameter of nlmsg_ and nla_ function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220824013621.365103-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>