platform/kernel/linux-starfive.git
13 months agonet/mlx5e: Remove a useless function call
Christophe JAILLET [Mon, 29 May 2023 08:34:59 +0000 (10:34 +0200)]
net/mlx5e: Remove a useless function call

'handle' is known to be NULL here. There is no need to kfree() it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Light probe local SFs
Shay Drory [Wed, 3 May 2023 11:18:23 +0000 (14:18 +0300)]
net/mlx5: Light probe local SFs

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.

In order to save the time of the reload, local SFs will probe without
any auxiliary sub-device, so that the SFs can be configured prior to
its full probe.

The defaults of the enable_* devlink params of these SFs are set to
false.

Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Move esw multiport devlink param to eswitch code
Shay Drory [Wed, 17 May 2023 14:39:54 +0000 (17:39 +0300)]
net/mlx5: Move esw multiport devlink param to eswitch code

Move the param registration and handling code into the eswitch
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Split function_setup() to enable and open functions
Shay Drory [Wed, 3 May 2023 09:08:48 +0000 (12:08 +0300)]
net/mlx5: Split function_setup() to enable and open functions

mlx5_cmd_init_hca() is taking ~0.2 seconds. In case of a user who
desire to disable some of the SF aux devices, and with large scale-1K
SFs for example, this user will waste more than 3 minutes on
mlx5_cmd_init_hca() which isn't needed at this stage.

Downstream patch will change SFs which are probe over the E-switch,
local SFs, to be probed without any aux dev. In order to support this,
split function_setup() to avoid executing mlx5_cmd_init_hca().

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Set max number of embedded CPU VFs
Daniel Jurgens [Wed, 15 Mar 2023 15:29:13 +0000 (17:29 +0200)]
net/mlx5: Set max number of embedded CPU VFs

Set the maximum number of embedded cpu VF functions available.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Update SRIOV enable/disable to handle EC/VFs
Daniel Jurgens [Tue, 7 Mar 2023 16:52:29 +0000 (18:52 +0200)]
net/mlx5: Update SRIOV enable/disable to handle EC/VFs

Previously on the embedded CPU platform SRIOV was never enabled/disabled
via mlx5_core_sriov_configure. Host VF updates are provided by an event
handler. Now in the disable flow it must be known if this is a disable
due to driver unload or SRIOV detach, or if the user updated the number
of VFs. If due to change in the number of VFs only wait for the pages of
ECVFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Query correct caps for min msix vectors
Daniel Jurgens [Tue, 7 Mar 2023 17:13:43 +0000 (19:13 +0200)]
net/mlx5: Query correct caps for min msix vectors

The VFs on the host and the embedded CPU platform share function
numbers. Set the ec_vf_function field to query the caps for the correct
function.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Use correct vport when restoring GUIDs
Daniel Jurgens [Tue, 7 Mar 2023 17:06:58 +0000 (19:06 +0200)]
net/mlx5: Use correct vport when restoring GUIDs

Prior to enabling EC VF functionality the vport number and function ID
were always the same. That's not the case now. Use the correct vport
number to modify the HCA vport context.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Add new page type for EC VF pages
Daniel Jurgens [Mon, 6 Mar 2023 22:53:21 +0000 (00:53 +0200)]
net/mlx5: Add new page type for EC VF pages

When the embedded cpu supports SRIOV it can be enabled and disabled
independently from the host SRIOV. Track the pages separately so we can
properly wait for returned VF pages.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Add/remove peer miss rules for EC VFs
Daniel Jurgens [Tue, 7 Mar 2023 19:36:39 +0000 (21:36 +0200)]
net/mlx5: Add/remove peer miss rules for EC VFs

Add and remove the peer miss rules for EC VFs. It's possible that there
are different amounts of total VFs per function so only create rules for
the minimum number of max VFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Add management of EC VF vports
Daniel Jurgens [Tue, 7 Mar 2023 19:24:55 +0000 (21:24 +0200)]
net/mlx5: Add management of EC VF vports

Add init, load, unload, and cleanup of the EC VF vports. This includes
changes in how eswitch SRIOV is managed. Previous on an embedded CPU
platform the number of VFs provided when enabling the eswitch was always
0, host VFs vports are handled in the eswitch functions change event
handler. Now track the number of EC VFs as well, so they can be handled
properly in the enable/disable flows.

There are only 3 marks available for use in xarrays, all 3 were already
in use for this use case. EC VF vports are in a known range so we can
access them by index instead of marks.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Update vport caps query/set for EC VFs
Daniel Jurgens [Tue, 7 Mar 2023 17:51:22 +0000 (19:51 +0200)]
net/mlx5: Update vport caps query/set for EC VFs

These functions are for query/set by vport, there was an underlying
assumption that vport was equal to function ID. That's not the case for
EC VF functions. Set the ec_vf_function bit accordingly.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Enable devlink port for embedded cpu VF vports
Daniel Jurgens [Tue, 7 Mar 2023 17:36:14 +0000 (19:36 +0200)]
net/mlx5: Enable devlink port for embedded cpu VF vports

Enable creation of a devlink port for EC VF vports.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: mlx5_ifc updates for embedded CPU SRIOV
Daniel Jurgens [Mon, 6 Mar 2023 22:27:21 +0000 (00:27 +0200)]
net/mlx5: mlx5_ifc updates for embedded CPU SRIOV

Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport
of embedded CPU virtual functions that are connected to the eswitch.

Add ec_vf_function to query/set_hca_caps. If set this indicates
accessing a virtual function on the embedded CPU by function ID. This
should only be used with other_function set to 1.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Simplify unload all rep code
Daniel Jurgens [Tue, 7 Mar 2023 22:02:12 +0000 (00:02 +0200)]
net/mlx5: Simplify unload all rep code

Instead of using type specific iterators which are only used in one place
just traverse the xarray. It will provide suitable ordering based on the
vport numbers. This will also eliminate the need for changes here when
new types are added.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agoMerge branch 'tools-ynl-gen-code-gen-improvements-before-ethtool'
Jakub Kicinski [Fri, 9 Jun 2023 21:40:33 +0000 (14:40 -0700)]
Merge branch 'tools-ynl-gen-code-gen-improvements-before-ethtool'

Jakub Kicinski says:

====================
tools: ynl-gen: code gen improvements before ethtool

I was going to post ethtool but I couldn't stand the ugliness
of the if conditions which were previously generated.
So I cleaned that up and improved a number of other things
ethtool will benefit from.
====================

Link: https://lore.kernel.org/r/20230608211200.1247213-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: support / skip pads on the way to kernel
Jakub Kicinski [Thu, 8 Jun 2023 21:12:00 +0000 (14:12 -0700)]
tools: ynl-gen: support / skip pads on the way to kernel

Kernel does not have padding requirements for 64b attrs.
We can ignore pad attrs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: don't pass op_name to RenderInfo
Jakub Kicinski [Thu, 8 Jun 2023 21:11:59 +0000 (14:11 -0700)]
tools: ynl-gen: don't pass op_name to RenderInfo

The op_name argument is barely used and identical to op.name
in all cases.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: support code gen for events
Jakub Kicinski [Thu, 8 Jun 2023 21:11:58 +0000 (14:11 -0700)]
tools: ynl-gen: support code gen for events

Netlink specs support both events and notifications (former can
define their own message contents). Plug in missing code to
generate types, parsers and include events into notification
tables.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: sanitize notification tracking
Jakub Kicinski [Thu, 8 Jun 2023 21:11:57 +0000 (14:11 -0700)]
tools: ynl-gen: sanitize notification tracking

Don't modify the raw dicts (as loaded from YAML) to pretend
that the notify attributes also exist on the ops. This makes
the code easier to follow.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: regen: stop generating common notification handlers
Jakub Kicinski [Thu, 8 Jun 2023 21:11:56 +0000 (14:11 -0700)]
tools: ynl: regen: stop generating common notification handlers

Remove unused notification handlers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: stop generating common notification handlers
Jakub Kicinski [Thu, 8 Jun 2023 21:11:55 +0000 (14:11 -0700)]
tools: ynl-gen: stop generating common notification handlers

Common notification handler was supposed to be a way for the user
to parse the notifications from a socket synchronously.
I don't think we'll end up using it, ynl_ntf_check() works for
all known use cases.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: regen: regenerate the if ladders
Jakub Kicinski [Thu, 8 Jun 2023 21:11:54 +0000 (14:11 -0700)]
tools: ynl: regen: regenerate the if ladders

Renegate the code to combine } and else and use tmp variable
to store type.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: get attr type outside of if()
Jakub Kicinski [Thu, 8 Jun 2023 21:11:53 +0000 (14:11 -0700)]
tools: ynl-gen: get attr type outside of if()

Reading attr type with mnl_attr_get_type() for each condition
leads to most conditions being longer than 80 chars.
Avoid this by reading the type to a variable on the stack.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: combine else with closing bracket
Jakub Kicinski [Thu, 8 Jun 2023 21:11:52 +0000 (14:11 -0700)]
tools: ynl-gen: combine else with closing bracket

Code gen currently prints:

  }
  else if (...

This is really ugly. Fix it by delaying printing of closing
brackets in anticipation of else coming along.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: complete the C keyword list
Jakub Kicinski [Thu, 8 Jun 2023 21:11:51 +0000 (14:11 -0700)]
tools: ynl-gen: complete the C keyword list

C keywords need to be avoided when naming things.
Complete the list (ethtool has at least one thing called "auto").

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: regen: cleanup user space header includes
Jakub Kicinski [Thu, 8 Jun 2023 21:11:50 +0000 (14:11 -0700)]
tools: ynl: regen: cleanup user space header includes

Remove unnecessary includes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: cleanup user space header includes
Jakub Kicinski [Thu, 8 Jun 2023 21:11:49 +0000 (14:11 -0700)]
tools: ynl-gen: cleanup user space header includes

Bots started screaming that we're including stdlib.h twice.
While at it move string.h into a common spot and drop stdio.h
which we don't need.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5464
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5466
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5467
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoRevert "tools: ynl: Remove duplicated include in handshake-user.c"
Jakub Kicinski [Fri, 9 Jun 2023 18:00:59 +0000 (11:00 -0700)]
Revert "tools: ynl: Remove duplicated include in handshake-user.c"

This reverts commit e7c5433c5aaab52ddd5448967a9a5db94a3939cc.

Commit e7c5433c5aaa ("tools: ynl: Remove duplicated include
in handshake-user.c") was applied too hastily. It changes
an auto-generated file, and there's already a proper fix
on the list.

Link: https://lore.kernel.org/all/ZIMPLYi%2FxRih+DlC@nanopsycho/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: Remove duplicated include in handshake-user.c
Yang Li [Thu, 8 Jun 2023 08:31:48 +0000 (16:31 +0800)]
tools: ynl: Remove duplicated include in handshake-user.c

./tools/net/ynl/generated/handshake-user.c: stdlib.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5464
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'broadcom-phy-led-brightness'
David S. Miller [Fri, 9 Jun 2023 09:38:44 +0000 (10:38 +0100)]
Merge branch 'broadcom-phy-led-brightness'

Florian Fainelli says:

====================
LED brightness support for Broadcom PHYs

This patch series adds support for controlling the LED brightness on
Broadcom PHYs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: phy: broadcom: Add support for setting LED brightness
Florian Fainelli [Wed, 7 Jun 2023 18:34:53 +0000 (11:34 -0700)]
net: phy: broadcom: Add support for setting LED brightness

Broadcom PHYs have two LEDs selector registers which allow us to control
the LED assignment, including how to turn them on/off.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: phy: broadcom: Rename LED registers
Florian Fainelli [Wed, 7 Jun 2023 18:34:52 +0000 (11:34 -0700)]
net: phy: broadcom: Rename LED registers

These registers are common to most PHYs and are not specific to the
BCM5482, renamed the constants accordingly, no functional change.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'net-ncsi-refactoring-for-GMA-cmd'
David S. Miller [Fri, 9 Jun 2023 09:32:51 +0000 (10:32 +0100)]
Merge branch 'net-ncsi-refactoring-for-GMA-cmd'

Ivan Mikhaylov says:

====================
net/ncsi: refactoring for GMA command

Make one GMA function for all manufacturers, change ndo_set_mac_address
to dev_set_mac_address for notifiying net layer about MAC change which
ndo_set_mac_address doesn't do.

Changes from v1:
1. delete ftgmac100.txt changes about mac-address-increment
2. add convert to yaml from ftgmac100.txt
3. add mac-address-increment option for ethernet-controller.yaml

Changes from v2:
1. remove DT changes from series, will be done in another one
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet/ncsi: change from ndo_set_mac_address to dev_set_mac_address
Ivan Mikhaylov [Wed, 7 Jun 2023 15:17:42 +0000 (18:17 +0300)]
net/ncsi: change from ndo_set_mac_address to dev_set_mac_address

Change ndo_set_mac_address to dev_set_mac_address because
dev_set_mac_address provides a way to notify network layer about MAC
change. In other case, services may not aware about MAC change and keep
using old one which set from network adapter driver.

As example, DHCP client from systemd do not update MAC address without
notification from net subsystem which leads to the problem with acquiring
the right address from DHCP server.

Fixes: cb10c7c0dfd9e ("net/ncsi: Add NCSI Broadcom OEM command")
Cc: stable@vger.kernel.org # v6.0+ 2f38e84 net/ncsi: make one oem_gma function for all mfr id
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet/ncsi: make one oem_gma function for all mfr id
Ivan Mikhaylov [Wed, 7 Jun 2023 15:17:41 +0000 (18:17 +0300)]
net/ncsi: make one oem_gma function for all mfr id

Make the one Get Mac Address function for all manufacturers and change
this call in handlers accordingly.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agousbnet: ipheth: update Kconfig description
Foster Snowhill [Wed, 7 Jun 2023 13:57:02 +0000 (15:57 +0200)]
usbnet: ipheth: update Kconfig description

This module has for a long time not been limited to iPhone <= 3GS.
Update description to match the actual state of the driver.

Remove dead link from 2010, instead reference an existing userspace
iOS device pairing implementation as part of libimobiledevice.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agousbnet: ipheth: add CDC NCM support
Foster Snowhill [Wed, 7 Jun 2023 13:57:01 +0000 (15:57 +0200)]
usbnet: ipheth: add CDC NCM support

Recent iOS releases support CDC NCM encapsulation on RX. This mode is
the default on macOS and Windows. In this mode, an iOS device may include
one or more Ethernet frames inside a single URB.

Freshly booted iOS devices start in legacy mode, but are put into
NCM mode by the official Apple driver. When reconnecting such a device
from a macOS/Windows machine to a Linux host, the device stays in
NCM mode, making it unusable with the legacy ipheth driver code.

To correctly support such a device, the driver has to either support
the NCM mode too, or put the device back into legacy mode.

To match the behaviour of the macOS/Windows driver, and since there
is no documented control command to revert to legacy mode, implement
NCM support. The device is attempted to be put into NCM mode by default,
and falls back to legacy mode if the attempt fails.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Tested-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agousbnet: ipheth: transmit URBs without trailing padding
Foster Snowhill [Wed, 7 Jun 2023 13:57:00 +0000 (15:57 +0200)]
usbnet: ipheth: transmit URBs without trailing padding

The behaviour of the official iOS tethering driver on macOS is to not
transmit any trailing padding at the end of URBs. This is applicable
to both NCM and legacy modes, including older devices.

Adapt the driver to not include trailing padding in TX URBs, matching
the behaviour of the official macOS driver.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Tested-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agousbnet: ipheth: fix risk of NULL pointer deallocation
Georgi Valkov [Wed, 7 Jun 2023 13:56:59 +0000 (15:56 +0200)]
usbnet: ipheth: fix risk of NULL pointer deallocation

The cleanup precedure in ipheth_probe will attempt to free a
NULL pointer in dev->ctrl_buf if the memory allocation for
this buffer is not successful. While kfree ignores NULL pointers,
and the existing code is safe, it is a better design to rearrange
the goto labels and avoid this.

Signed-off-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: Foster Snowhill <forst@pen.gy>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'splice-net-rewrite-splice-to-socket-fix-splice_f_more-and-handle-msg_sp...
Jakub Kicinski [Fri, 9 Jun 2023 02:40:32 +0000 (19:40 -0700)]
Merge branch 'splice-net-rewrite-splice-to-socket-fix-splice_f_more-and-handle-msg_splice_pages-in-af_tls'

David Howells says:

====================
splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS

Here are patches to do the following:

 (1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
     userspace, whilst allowing splice_to_socket() to pass them in.

 (2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg().  Until
     support is added, it will be ignored and a splice-driven sendmsg()
     will be treated like a normal sendmsg().  TCP, UDP, AF_UNIX and
     Chelsio-TLS already handle the flag in net-next.

 (3) Replace a chain of functions to splice-to-sendpage with a single
     function to splice via sendmsg() with MSG_SPLICE_PAGES.  This allows a
     bunch of pages to be spliced from a pipe in a single call using a
     bio_vec[] and pushes the main processing loop down into the bowels of
     the protocol driver rather than repeatedly calling in with a page at a
     time.

 (4) Provide a ->splice_eof() op[2] that allows splice to signal to its
     output that the input observed a premature EOF and that the caller
     didn't flag SPLICE_F_MORE, thereby allowing a corked socket to be
     flushed.  This attempts to maintain the current behaviour.  It is also
     not called if we didn't manage to read any data and so didn't called
     the actor function.

     This needs routing though several layers to get it down to the network
     protocol.

     [!] Note that I chose not to pass in any flags - I'm not sure it's
       particularly useful to pass in the splice flags; I also elected
       not to return any error code - though we might actually want to do
       that.

 (5) Provide tls_{device,sw}_splice_eof() to flush a pending TLS record if
     there is one.

 (6) Provide splice_eof() for UDP, TCP, Chelsio-TLS and AF_KCM.  AF_UNIX
     doesn't seem to pay attention to the MSG_MORE or MSG_SENDPAGE_NOTLAST
     flags.

 (7) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
     signalling[1] such SPLICE_F_MORE is always signalled until we have
     read sufficient data to finish the request.  If we get a zero-length
     before we've managed to splice sufficient data, we now leave the
     socket expecting more data and leave it to userspace to deal with it.

 (8) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
     MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
     should splice the pages supplied if it can.  Its sendpage
     implementations are then turned into wrappers around that.

Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6
Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/
====================

Link: https://lore.kernel.org/r/20230607181920.2294972-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
David Howells [Wed, 7 Jun 2023 18:19:20 +0000 (19:19 +0100)]
tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES

Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself.  With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer
and the zc_page argument to tls_push_data() can also be removed.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotls/device: Support MSG_SPLICE_PAGES
David Howells [Wed, 7 Jun 2023 18:19:19 +0000 (19:19 +0100)]
tls/device: Support MSG_SPLICE_PAGES

Make TLS's device sendmsg() support MSG_SPLICE_PAGES.  This causes pages to
be spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
David Howells [Wed, 7 Jun 2023 18:19:18 +0000 (19:19 +0100)]
tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES

Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.

[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
    upstream.  I think the caller will only hold the socket lock, but it
    should hold tls_ctx->tx_lock too.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotls/sw: Support MSG_SPLICE_PAGES
David Howells [Wed, 7 Jun 2023 18:19:17 +0000 (19:19 +0100)]
tls/sw: Support MSG_SPLICE_PAGES

Make TLS's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agosplice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
David Howells [Wed, 7 Jun 2023 18:19:16 +0000 (19:19 +0100)]
splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()

splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and,
as a result, it incorrectly signals/fails to signal MSG_MORE when splicing
to a socket.  The problem I'm seeing happens when a short splice occurs
because we got a short read due to hitting the EOF on a file: as the length
read (read_len) is less than the remaining size to be spliced (len),
SPLICE_F_MORE (and thus MSG_MORE) is set.

The issue is that, for the moment, we have no way to know *why* the short
read occurred and so can't make a good decision on whether we *should* keep
MSG_MORE set.

MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set
incorrectly under some circumstances - for example if a short read fills a
single pipe_buffer, but the next read would return more (seqfile can do
this).

This was observed with the multi_chunk_sendfile tests in the tls kselftest
program.  Some of those tests would hang and time out when the last chunk
of file was less than the sendfile request size:

build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile

This has been observed before[2] and worked around in AF_TLS[3].

Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
we haven't yet hit the requested operation size.  SPLICE_F_MORE remains
signalled if the user passed it in to splice() but otherwise gets cleared
when we've read sufficient data to fulfill the request.

If, however, we get a premature EOF from ->splice_read(), have sent at
least one byte and SPLICE_F_MORE was not set by the caller, ->splice_eof()
will be invoked.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: linux-mm@kvack.org

Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agokcm: Use splice_eof() to flush
David Howells [Wed, 7 Jun 2023 18:19:15 +0000 (19:19 +0100)]
kcm: Use splice_eof() to flush

Allow splice to undo the effects of MSG_MORE after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agochelsio/chtls: Use splice_eof() to flush
David Howells [Wed, 7 Jun 2023 18:19:14 +0000 (19:19 +0100)]
chelsio/chtls: Use splice_eof() to flush

Allow splice to end a Chelsio TLS record after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ayush Sawal <ayush.sawal@chelsio.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoipv4, ipv6: Use splice_eof() to flush
David Howells [Wed, 7 Jun 2023 18:19:13 +0000 (19:19 +0100)]
ipv4, ipv6: Use splice_eof() to flush

Allow splice to undo the effects of MSG_MORE after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.

For UDP, a pending packet will not be emitted if the socket is closed
before it is flushed; with this change, it be flushed by ->splice_eof().

For TCP, it's not clear that MSG_MORE is actually effective.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotls/device: Use splice_eof() to flush
David Howells [Wed, 7 Jun 2023 18:19:12 +0000 (19:19 +0100)]
tls/device: Use splice_eof() to flush

Allow splice to end a TLS record after prematurely ending a splice/sendfile
due to getting an EOF condition (->splice_read() returned 0) after splice
had called TLS with a sendmsg() with MSG_MORE set when the user didn't set
MSG_MORE.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotls/sw: Use splice_eof() to flush
David Howells [Wed, 7 Jun 2023 18:19:11 +0000 (19:19 +0100)]
tls/sw: Use splice_eof() to flush

Allow splice to end a TLS record after prematurely ending a splice/sendfile
due to getting an EOF condition (->splice_read() returned 0) after splice
had called TLS with a sendmsg() with MSG_MORE set when the user didn't set
MSG_MORE.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agosplice, net: Add a splice_eof op to file-ops and socket-ops
David Howells [Wed, 7 Jun 2023 18:19:10 +0000 (19:19 +0100)]
splice, net: Add a splice_eof op to file-ops and socket-ops

Add an optional method, ->splice_eof(), to allow splice to indicate the
premature termination of a splice to struct file_operations and struct
proto_ops.

This is called if sendfile() or splice() encounters all of the following
conditions inside splice_direct_to_actor():

 (1) the user did not set SPLICE_F_MORE (splice only), and

 (2) an EOF condition occurred (->splice_read() returned 0), and

 (3) we haven't read enough to fulfill the request (ie. len > 0 still), and

 (4) we have already spliced at least one byte.

A further patch will modify the behaviour of SPLICE_F_MORE to always be
passed to the actor if either the user set it or we haven't yet read
sufficient data to fulfill the request.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: linux-mm@kvack.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agosplice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
David Howells [Wed, 7 Jun 2023 18:19:09 +0000 (19:19 +0100)]
splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()

Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage()
with a net-specific handler, splice_to_socket(), that calls sendmsg() with
MSG_SPLICE_PAGES set instead of calling ->sendpage().

MSG_MORE is used to indicate if the sendmsg() is expected to be followed
with more data.

This allows multiple pipe-buffer pages to be passed in a single call in a
BVEC iterator, allowing the processing to be pushed down to a loop in the
protocol driver.  This helps pave the way for passing multipage folios down
too.

Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should
just ignore it and do a normal sendmsg() for now - although that may be a
bit slower as it may copy everything.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
David Howells [Wed, 7 Jun 2023 18:19:08 +0000 (19:19 +0100)]
tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg

Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal
sendmsg for now.  This means the data will just be copied until
MSG_SPLICE_PAGES is handled.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
David Howells [Wed, 7 Jun 2023 18:19:07 +0000 (19:19 +0100)]
net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace

It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage().  Unblocking them
in the network protocol, however, allows these flags to be passed in by
userspace too[1].

Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
if they are passed to sendmsg() by userspace.  Network protocol ->sendmsg()
implementations can then allow them through.

Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed as a whole slew of pages will be passed in in one go by
splice through sendmsg, with MSG_MORE being set if it has more data waiting
in the pipe.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotcp: let tcp_mtu_probe() build headless packets
Eric Dumazet [Wed, 7 Jun 2023 21:41:13 +0000 (21:41 +0000)]
tcp: let tcp_mtu_probe() build headless packets

tcp_mtu_probe() is still copying payload from skbs in the write queue,
using skb_copy_bits(), ignoring potential errors.

Modern TCP stack wants to only deal with payload found in page frags,
as this is a prereq for TCPDirect (host stack might not have access
to the payload)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230607214113.1992947-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 9 Jun 2023 02:28:20 +0000 (19:28 -0700)]
Merge tag 'mlx5-updates-2023-06-06' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-06-06

1) Support 4 ports VF LAG, part 2/2
2) Few extra trivial cleanup patches

Shay Drory Says:
================

Support 4 ports VF LAG, part 2/2

This series continues the series[1] "Support 4 ports VF LAG, part1/2".
This series adds support for 4 ports VF LAG (single FDB E-Switch).

This series of patches refactoring LAG code that make assumptions
about VF LAG supporting only two ports and then enable 4 ports VF LAG.

Patch 1:
- Fix for ib rep code
Patches 2-5:
- Refactors LAG layer.
Patches 6-7:
- Block LAG types which doesn't support 4 ports.
Patch 8:
- Enable 4 ports VF LAG.

This series specifically allows HCAs with 4 ports to create a VF LAG
with only 4 ports. It is not possible to create a VF LAG with 2 or 3
ports using HCAs that have 4 ports.

Currently, the Merged E-Switch feature only supports HCAs with 2 ports.
However, upcoming patches will introduce support for HCAs with 4 ports.

In order to activate VF LAG a user can execute:

devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev
devlink dev eswitch set pci/0000:08:00.2 mode switchdev
devlink dev eswitch set pci/0000:08:00.3 mode switchdev
ip link add name bond0 type bond
ip link set dev bond0 type bond mode 802.3ad
ip link set dev eth2 master bond0
ip link set dev eth3 master bond0
ip link set dev eth4 master bond0
ip link set dev eth5 master bond0

Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0
pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively.

User can verify LAG state and type via debugfs:
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type

[1]
https://lore.kernel.org/netdev/20230601060118.154015-1-saeed@kernel.org/T/#mf1d2083780970ba277bfe721554d4925f03f36d1

================

* tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: simplify condition after napi budget handling change
  mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager
  net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure
  net/mlx5e: TC, refactor access to hash key
  net/mlx5e: Remove RX page cache leftovers
  net/mlx5e: Expose catastrophic steering error counters
  net/mlx5: Enable 4 ports VF LAG
  net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports
  net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports
  net/mlx5: LAG, change mlx5_shared_fdb_supported() to static
  net/mlx5: LAG, generalize handling of shared FDB
  net/mlx5: LAG, check if all eswitches are paired for shared FDB
  {net/RDMA}/mlx5: introduce lag_for_each_peer
  RDMA/mlx5: Free second uplink ib port
====================

Link: https://lore.kernel.org/r/20230607210410.88209-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoethtool: ioctl: improve error checking for set_wol
Justin Chen [Wed, 7 Jun 2023 23:14:11 +0000 (16:14 -0700)]
ethtool: ioctl: improve error checking for set_wol

The netlink version of set_wol checks for not supported wolopts and avoids
setting wol when the correct wolopt is already set. If we do the same with
the ioctl version then we can remove these checks from the driver layer.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/1686179653-29750-1-git-send-email-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'complete-lynx-mdio-device-handling'
Jakub Kicinski [Fri, 9 Jun 2023 02:19:52 +0000 (19:19 -0700)]
Merge branch 'complete-lynx-mdio-device-handling'

Russell King says:

====================
complete Lynx mdio device handling

This series completes the mdio device lifetime handling for Lynx PCS
users which do not create their own mdio device, but instead fetch
it using a firmware description - namely the DPAA2 and FMAN_MEMAC
drivers.

In a previous patch set, lynx_pcs_create() was modified to increase
the mdio device refcount, and lynx_pcs_destroy() to drop that
refcount.

The first two patches change these two drivers to put the reference
which they hold immediately after lynx_pcs_create(), effectively
handing the responsibility for maintaining the refcount to the Lynx
PCS driver.

A side effect of the first two patches is that lynx_get_mdio_device()
is no longer used, so patch 3 removes it.

Patch 4 adds a new helper - lynx_pcs_create_fwnode(), which creates
a Lynx PCS instance from the fwnode.

Patch 5 and 6 convert the two drivers to make use of this new helper,
which simply has to find the mdio device, and then create the Lynx
PCS from that.

With those conversions done, lynx_pcs_create() is no longer required
outside pcs-lynx.c, so remove it from public view.

Patch 8 we changes lynx_pcs_create() to return an error-pointer rather
than NULL to bring consistency to the return style, and means that we
can remove the NULL-to-error-pointer conversion from both
lynx_pcs_create_fwnode() and lynx_pcs_create_mdiodev().

Patch 9 adds a check for the fwnode being available, and returns an
-ENODEV error pointer if unavailable.

Patch 10 removes this check from DPAA2, detecting the error pointer
value to continue printing the helpful message.

Patch 11 removes this check from fman_memac, and in doing so fixes a
bug where if the node is unavailable, the reference count is not
dropped.
====================

Link: https://lore.kernel.org/r/ZIBwuw+IuGQo5yV8@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: fman_memac: use pcs-lynx's check for fwnode availability
Russell King (Oracle) [Wed, 7 Jun 2023 11:59:04 +0000 (12:59 +0100)]
net: fman_memac: use pcs-lynx's check for fwnode availability

Use pcs-lynx's check rather than our own when determining if the device
is available. This fixes a bug where the reference gained by
of_parse_phandle() is not dropped if the device is not available.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dpaa2: use pcs-lynx's check for fwnode availability
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:59 +0000 (12:58 +0100)]
net: dpaa2: use pcs-lynx's check for fwnode availability

Use pcs-lynx's check rather than our own when determining if the device
is available.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pcs: lynx: check that the fwnode is available prior to use
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:54 +0000 (12:58 +0100)]
net: pcs: lynx: check that the fwnode is available prior to use

Check that the fwnode is marked as available prior to trying to lookup
the PCS device, and return -ENODEV if unavailable. Document the return
codes from lynx_pcs_create_fwnode().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pcs: lynx: change lynx_pcs_create() to return error-pointers
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:49 +0000 (12:58 +0100)]
net: pcs: lynx: change lynx_pcs_create() to return error-pointers

Change lynx_pcs_create() to return an error-pointer on failure to
allocate memory, rather than returning NULL. This allows the removal
of the conversion in lynx_pcs_create_fwnode() and
lynx_pcs_create_mdiodev().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pcs: lynx: make lynx_pcs_create() static
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:44 +0000 (12:58 +0100)]
net: pcs: lynx: make lynx_pcs_create() static

We no longer need to export lynx_pcs_create() for drivers to use as we
now have all the functionality we need in the two new creation helpers.
Remove the export and prototype, and make it static.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: fman_memac: use lynx_pcs_create_fwnode()
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:39 +0000 (12:58 +0100)]
net: fman_memac: use lynx_pcs_create_fwnode()

Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dpaa2-mac: use lynx_pcs_create_fwnode()
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:34 +0000 (12:58 +0100)]
net: dpaa2-mac: use lynx_pcs_create_fwnode()

Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pcs: lynx: add lynx_pcs_create_fwnode()
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:29 +0000 (12:58 +0100)]
net: pcs: lynx: add lynx_pcs_create_fwnode()

Add a helper to create a lynx PCS from a fwnode handle.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pcs: lynx: remove lynx_get_mdio_device()
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:24 +0000 (12:58 +0100)]
net: pcs: lynx: remove lynx_get_mdio_device()

lynx_get_mdio_device() is no longer necessary, let's remove it so the
lynx PCS code is always managing the lifetime of the mdiodev.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: fman_memac: allow lynx PCS to handle mdiodev lifetime
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:18 +0000 (12:58 +0100)]
net: fman_memac: allow lynx PCS to handle mdiodev lifetime

Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver
can manage the lifetime of the mdiodev its using.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dpaa2-mac: allow lynx PCS to manage mdiodev lifetime
Russell King (Oracle) [Wed, 7 Jun 2023 11:58:13 +0000 (12:58 +0100)]
net: dpaa2-mac: allow lynx PCS to manage mdiodev lifetime

Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver
can manage the lifetime of the mdiodev its using.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pch_gbe: Allow build on MIPS_GENERIC kernel
Jiaxun Yang [Wed, 7 Jun 2023 05:59:53 +0000 (13:59 +0800)]
net: pch_gbe: Allow build on MIPS_GENERIC kernel

MIPS Boston board, which is using MIPS_GENERIC kernel is using
EG20T PCH and thus need this driver.

Dependency of PCH_GBE, PTP_1588_CLOCK_PCH is also fixed for
MIPS_GENERIC.

Note that CONFIG_PCH_GBE is selected in arch/mips/configs/generic/
board-boston.config for a while, some how it's never wired up
in Kconfig.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230607055953.34110-1-jiaxun.yang@flygoat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomlxsw: spectrum_nve_vxlan: Fix unsupported flag regression
Ido Schimmel [Wed, 7 Jun 2023 12:19:26 +0000 (14:19 +0200)]
mlxsw: spectrum_nve_vxlan: Fix unsupported flag regression

The recently added 'VXLAN_F_LOCALBYPASS' flag is set by default on VXLAN
devices and denotes a behavior that is irrelevant for the hardware data
path. Add it to the lists of IPv4 and IPv6 supported flags to avoid
rejecting offload of VXLAN devices which have this flag set.

Fixes: 69474a8a5837 ("net: vxlan: Add nolocalbypass option to vxlan.")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/5533e63643bf719bbe286fef60f749c9cad35005.1686139716.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'tools-ynl-generate-code-for-the-devlink-family'
Jakub Kicinski [Thu, 8 Jun 2023 21:01:12 +0000 (14:01 -0700)]
Merge branch 'tools-ynl-generate-code-for-the-devlink-family'

Jakub Kicinski says:

====================
tools: ynl: generate code for the devlink family

Another chunk of changes to support more capabilities in the YNL
code gen. Devlink brings in deep nesting and directional messages
(requests and responses have different IDs). We need a healthy
dose of codegen changes to support those (I wasn't planning to
support code gen for "directional" families initially, but
the importance of devlink and ethtool is undeniable).
====================

Link: https://lore.kernel.org/r/20230607202403.1089925-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: add sample for devlink
Jakub Kicinski [Wed, 7 Jun 2023 20:24:03 +0000 (13:24 -0700)]
tools: ynl: add sample for devlink

Add a sample to show off how to issue basic devlink requests.
For added testing issue get requests while walking a dump.

$ ./devlink
netdevsim/netdevsim1:
    driver: netdevsim
    running fw:
        fw.mgmt: 10.20.30
    ...
netdevsim/netdevsim2:
    driver: netdevsim
    running fw:
        fw.mgmt: 10.20.30
    ...

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: generate code for the devlink family
Jakub Kicinski [Wed, 7 Jun 2023 20:24:02 +0000 (13:24 -0700)]
tools: ynl: generate code for the devlink family

Admittedly the devlink.yaml spec is fairly limitted,
it only covers basic device get and info-get ops.
That's sufficient to be useful (monitoring FW versions
in the fleet). Plus it gives us a chance to exercise
deep nesting and directional messaging in YNL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: don't generate forward declarations for policies - regen
Jakub Kicinski [Wed, 7 Jun 2023 20:24:01 +0000 (13:24 -0700)]
tools: ynl-gen: don't generate forward declarations for policies - regen

Renegerate code after dropping forward declarations for policies.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: don't generate forward declarations for policies
Jakub Kicinski [Wed, 7 Jun 2023 20:24:00 +0000 (13:24 -0700)]
tools: ynl-gen: don't generate forward declarations for policies

Now that all nested types have structs and are sorted topologically
there should be no need to generate forward declarations for policies.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: walk nested types in depth
Jakub Kicinski [Wed, 7 Jun 2023 20:23:59 +0000 (13:23 -0700)]
tools: ynl-gen: walk nested types in depth

So far we had only created structures for nested types nested
directly in messages (second level of attrs so to speak).
Walk types in depth to support deeper nesting.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: inherit struct use info
Jakub Kicinski [Wed, 7 Jun 2023 20:23:58 +0000 (13:23 -0700)]
tools: ynl-gen: inherit struct use info

We only render parse and netlink generation helpers as needed,
to avoid generating dead code. Propagate the information from
first- and second-layer attribute sets onto all children.
Otherwise devlink won't work, it has a lot more levels of nesting.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: try to sort the types more intelligently
Jakub Kicinski [Wed, 7 Jun 2023 20:23:57 +0000 (13:23 -0700)]
tools: ynl-gen: try to sort the types more intelligently

We need to sort the structures to avoid the need for forward
declarations. While at it remove the sort of structs when
rendering, it doesn't do anything.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: enable code gen for directional specs
Jakub Kicinski [Wed, 7 Jun 2023 20:23:56 +0000 (13:23 -0700)]
tools: ynl-gen: enable code gen for directional specs

I think that user space code gen for directional specs
works after recent changes. Let them through.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: refactor strmap helper generation
Jakub Kicinski [Wed, 7 Jun 2023 20:23:55 +0000 (13:23 -0700)]
tools: ynl-gen: refactor strmap helper generation

Move generating strmap lookup function to a helper.
No functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl-gen: use enum names in op strmap more carefully
Jakub Kicinski [Wed, 7 Jun 2023 20:23:54 +0000 (13:23 -0700)]
tools: ynl-gen: use enum names in op strmap more carefully

In preparation for supporting families which use different msg
ids to and from the kernel - make sure the ids in op strmap
are correct. The map is expected to be used mostly for notifications,
don't generate a separate map for the "to kernel" direction.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonetlink: specs: devlink: fill in some details important for C
Jakub Kicinski [Wed, 7 Jun 2023 20:23:53 +0000 (13:23 -0700)]
netlink: specs: devlink: fill in some details important for C

Python YNL is much more forgiving than the C code gen in terms
of the spec completeness. Fill in a handful of devlink details
to make the spec usable in C.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 8 Jun 2023 18:34:28 +0000 (11:34 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/sched/sch_taprio.c
  d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
  dced11ef84fb ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()")

net/ipv4/sysctl_net_ipv4.c
  e209fee4118f ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294")
  ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear")
https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 8 Jun 2023 16:27:19 +0000 (09:27 -0700)]
Merge tag 'net-6.4-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, wifi, netfilter, bluetooth and ebpf.

  Current release - regressions:

   - bpf: sockmap: avoid potential NULL dereference in
     sk_psock_verdict_data_ready()

   - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()

   - phylink: actually fix ksettings_set() ethtool call

   - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3

  Current release - new code bugs:

   - wifi: mt76: fix possible NULL pointer dereference in
     mt7996_mac_write_txwi()

  Previous releases - regressions:

   - netfilter: fix NULL pointer dereference in nf_confirm_cthelper

   - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS

   - openvswitch: fix upcall counter access before allocation

   - bluetooth:
      - fix use-after-free in hci_remove_ltk/hci_remove_irk
      - fix l2cap_disconnect_req deadlock

   - nic: bnxt_en: prevent kernel panic when receiving unexpected
     PHC_UPDATE event

  Previous releases - always broken:

   - core: annotate rfs lockless accesses

   - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values

   - netfilter: add null check for nla_nest_start_noflag() in
     nft_dump_basechain_hook()

   - bpf: fix UAF in task local storage

   - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294

   - ipv6: rpl: fix route of death.

   - tcp: gso: really support BIG TCP

   - mptcp: fixes for user-space PM address advertisement

   - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT

   - can: avoid possible use-after-free when j1939_can_rx_register fails

   - batman-adv: fix UaF while rescheduling delayed work

   - eth: qede: fix scheduling while atomic

   - eth: ice: make writes to /dev/gnssX synchronous"

* tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
  bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
  bnxt_en: Skip firmware fatal error recovery if chip is not accessible
  bnxt_en: Query default VLAN before VNIC setup on a VF
  bnxt_en: Don't issue AP reset during ethtool's reset operation
  bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
  net: bcmgenet: Fix EEE implementation
  eth: ixgbe: fix the wake condition
  eth: bnxt: fix the wake condition
  lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
  bpf: Add extra path pointer check to d_path helper
  net: sched: fix possible refcount leak in tc_chain_tmplt_add()
  net: sched: act_police: fix sparse errors in tcf_police_dump()
  net: openvswitch: fix upcall counter access before allocation
  net: sched: move rtm_tca_policy declaration to include file
  ice: make writes to /dev/gnssX synchronous
  net: sched: add rcu annotations around qdisc->qdisc_sleeping
  rfs: annotate lockless accesses to RFS sock flow table
  rfs: annotate lockless accesses to sk->sk_rxhash
  virtio_net: use control_buf for coalesce params
  ...

13 months agoMerge tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 8 Jun 2023 15:46:58 +0000 (08:46 -0700)]
Merge tag 'xfs-6.4-rc5-fixes' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Dave Chinner:
 "These are a set of regression fixes discovered on recent kernels. I
  was hoping to send this to you a week and half ago, but events out of
  my control delayed finalising the changes until early this week.

  Whilst the diffstat looks large for this stage of the merge window, a
  large chunk of it comes from moving the guts of one function from one
  file to another i.e. it's the same code, it is just run in a different
  context where it is safe to hold a specific lock. Otherwise the
  individual changes are relatively small and straigtht forward.

  Summary:

   - Propagate unlinked inode list corruption back up to log recovery
     (regression fix)

   - improve corruption detection for AGFL entries, AGFL indexes and
     XEFI extents (syzkaller fuzzer oops report)

   - Avoid double perag reference release (regression fix)

   - Improve extent merging detection in scrub (regression fix)

   - Fix a new undefined high bit shift (regression fix)

   - Fix for AGF vs inode cluster buffer deadlock (regression fix)"

* tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: collect errors from inodegc for unlinked inode recovery
  xfs: validate block number being freed before adding to xefi
  xfs: validity check agbnos on the AGFL
  xfs: fix agf/agfl verification on v4 filesystems
  xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()
  xfs: fix broken logic when detecting mergeable bmap records
  xfs: Fix undefined behavior of shift into sign bit
  xfs: fix AGF vs inode cluster buffer deadlock
  xfs: defered work could create precommits
  xfs: restore allocation trylock iteration
  xfs: buffer pins need to hold a buffer reference

13 months agoMerge branch 'crypto-splice-net-make-af_alg-handle-sendmsg-msg_splice_pages'
Paolo Abeni [Thu, 8 Jun 2023 11:42:54 +0000 (13:42 +0200)]
Merge branch 'crypto-splice-net-make-af_alg-handle-sendmsg-msg_splice_pages'

David Howells says:

====================
crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)

Here are patches to make AF_ALG handle the MSG_SPLICE_PAGES internal
sendmsg flag.  MSG_SPLICE_PAGES is an internal hint that tells the protocol
that it should splice the pages supplied if it can.  The sendpage functions
are then turned into wrappers around that.

This set consists of the following parts:

 (1) Move netfs_extract_iter_to_sg() to somewhere more general and rename
     it to drop the "netfs" prefix.  We use this to extract directly from
     an iterator into a scatterlist.

 (2) Make AF_ALG use iov_iter_extract_pages().  This has the additional
     effect of pinning pages obtained from userspace rather than taking
     refs on them.  Pages from kernel-backed iterators would not be pinned,
     but AF_ALG isn't really meant for use by kernel services.

 (3) Change AF_ALG still further to use extract_iter_to_sg().

 (4) Make af_alg_sendmsg() support MSG_SPLICE_PAGES support and make
     af_alg_sendpage() just a wrapper around sendmsg().  This has to take
     refs on the pages pinned for the moment.

 (5) Make hash_sendmsg() support MSG_SPLICE_PAGES by simply ignoring it.
     hash_sendpage() is left untouched to be removed later, after the
     splice core has been changed to call sendmsg().
====================

Link: https://lore.kernel.org/r/20230606130856.1970660-1-dhowells@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agocrypto: af_alg/hash: Support MSG_SPLICE_PAGES
David Howells [Tue, 6 Jun 2023 13:08:56 +0000 (14:08 +0100)]
crypto: af_alg/hash: Support MSG_SPLICE_PAGES

Make AF_ALG sendmsg() support MSG_SPLICE_PAGES in the hashing code.  This
causes pages to be spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agocrypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES
David Howells [Tue, 6 Jun 2023 13:08:55 +0000 (14:08 +0100)]
crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES

Convert af_alg_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agocrypto: af_alg: Support MSG_SPLICE_PAGES
David Howells [Tue, 6 Jun 2023 13:08:54 +0000 (14:08 +0100)]
crypto: af_alg: Support MSG_SPLICE_PAGES

Make AF_ALG sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agocrypto: af_alg: Indent the loop in af_alg_sendmsg()
David Howells [Tue, 6 Jun 2023 13:08:53 +0000 (14:08 +0100)]
crypto: af_alg: Indent the loop in af_alg_sendmsg()

Put the loop in af_alg_sendmsg() into an if-statement to indent it to make
the next patch easier to review as that will add another branch to handle
MSG_SPLICE_PAGES to the if-statement.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agocrypto: af_alg: Use extract_iter_to_sg() to create scatterlists
David Howells [Tue, 6 Jun 2023 13:08:52 +0000 (14:08 +0100)]
crypto: af_alg: Use extract_iter_to_sg() to create scatterlists

Use extract_iter_to_sg() to decant the destination iterator into a
scatterlist in af_alg_get_rsgl().  af_alg_make_sg() can then be removed.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agocrypto: af_alg: Pin pages rather than ref'ing if appropriate
David Howells [Tue, 6 Jun 2023 13:08:51 +0000 (14:08 +0100)]
crypto: af_alg: Pin pages rather than ref'ing if appropriate

Convert AF_ALG to use iov_iter_extract_pages() instead of
iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoMove netfs_extract_iter_to_sg() to lib/scatterlist.c
David Howells [Tue, 6 Jun 2023 13:08:50 +0000 (14:08 +0100)]
Move netfs_extract_iter_to_sg() to lib/scatterlist.c

Move netfs_extract_iter_to_sg() to lib/scatterlist.c as it's going to be
used by more than just network filesystems (AF_ALG, for example).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoWrap lines at 80
David Howells [Tue, 6 Jun 2023 13:08:49 +0000 (14:08 +0100)]
Wrap lines at 80

Wrap a line at 80 to stop checkpatch complaining.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Simon Horman <simon.horman@corigine.com>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoFix a couple of spelling mistakes
David Howells [Tue, 6 Jun 2023 13:08:48 +0000 (14:08 +0100)]
Fix a couple of spelling mistakes

Fix a couple of spelling mistakes in a comment.

Suggested-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZHH2mSRqeL4Gs1ft@corigine.com/
Link: https://lore.kernel.org/r/ZHH1nqZWOGzxlidT@corigine.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoDrop the netfs_ prefix from netfs_extract_iter_to_sg()
David Howells [Tue, 6 Jun 2023 13:08:47 +0000 (14:08 +0100)]
Drop the netfs_ prefix from netfs_extract_iter_to_sg()

Rename netfs_extract_iter_to_sg() and its auxiliary functions to drop the
netfs_ prefix.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoMerge branch 'txgbe-phylink-support'
Paolo Abeni [Thu, 8 Jun 2023 11:25:14 +0000 (13:25 +0200)]
Merge branch 'txgbe-phylink-support'

Jiawen Wu says:

====================
TXGBE PHYLINK support

Implement I2C, SFP, GPIO and PHYLINK to setup TXGBE link.

Because our I2C and PCS are based on Synopsys Designware IP-core, extend
the i2c-designware and pcs-xpcs driver to realize our functions.
====================

Link: https://lore.kernel.org/r/20230606092107.764621-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: txgbe: Support phylink MAC layer
Jiawen Wu [Tue, 6 Jun 2023 09:21:07 +0000 (17:21 +0800)]
net: txgbe: Support phylink MAC layer

Add phylink support to Wangxun 10Gb Ethernet controller for the 10GBASE-R
interface.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>