platform/kernel/linux-starfive.git
13 months agodt-bindings: net: pse-pd: Allow -N suffix for ethernet-pse node names
Oleksij Rempel [Wed, 31 May 2023 10:21:12 +0000 (12:21 +0200)]
dt-bindings: net: pse-pd: Allow -N suffix for ethernet-pse node names

Extend the pattern matching for PSE-PD controller nodes to allow -N
suffixes. This enables the use of multiple "ethernet-pse" nodes without the
need for a "reg" property.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: bring port new reply back
Jiri Pirko [Wed, 31 May 2023 14:20:25 +0000 (16:20 +0200)]
devlink: bring port new reply back

In the offending fixes commit I mistakenly removed the reply message of
the port new command. I was under impression it is a new port
notification, partly due to the "notify" in the name of the helper
function. Bring the code sending reply with new port message back, this
time putting it directly to devlink_nl_cmd_port_new_doit()

Fixes: c496daeb8630 ("devlink: remove duplicate port notification")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531142025.2605001-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 1 Jun 2023 22:33:03 +0000 (15:33 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/sfc/tc.c
  622ab656344a ("sfc: fix error unwinds in TC offload")
  b6583d5e9e94 ("sfc: support TC decap rules matching on enc_src_port")

net/mptcp/protocol.c
  5b825727d087 ("mptcp: add annotations around msk->subflow accesses")
  e76c8ef5cc5b ("mptcp: refactor mptcp_stream_accept()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 1 Jun 2023 21:29:18 +0000 (17:29 -0400)]
Merge tag 'net-6.4-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Happy Wear a Dress Day.

  Fairly standard-sized batch of fixes, accounting for the lack of
  sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
  were complaining about that. No fires burning.

  Current release - regressions:

   - eth: mlx5e:
      - multiple fixes for dynamic IRQ allocation
      - prevent encap offload when neigh update is running

   - eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters

  Current release - new code bugs:

   - eth: mlx5e: DR, add missing mutex init/destroy in pattern manager

  Previous releases - always broken:

   - tcp: deny tcp_disconnect() when threads are waiting

   - sched: prevent ingress Qdiscs from getting installed in random
     locations in the hierarchy and moving around

   - sched: flower: fix possible OOB write in fl_set_geneve_opt()

   - netlink: fix NETLINK_LIST_MEMBERSHIPS length report

   - udp6: fix race condition in udp6_sendmsg & connect

   - tcp: fix mishandling when the sack compression is deferred

   - rtnetlink: validate link attributes set at creation time

   - mptcp: fix connect timeout handling

   - eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked

   - eth: amd-xgbe: fix the false linkup in xgbe_phy_status

   - eth: mlx5e:
      - fix corner cases in internal buffer configuration
      - drain health before unregistering devlink

   - usb: qmi_wwan: set DTR quirk for BroadMobi BM818

  Misc:

   - tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
     user_mss set"

* tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
  mptcp: fix active subflow finalization
  mptcp: add annotations around sk->sk_shutdown accesses
  mptcp: fix data race around msk->first access
  mptcp: consolidate passive msk socket initialization
  mptcp: add annotations around msk->subflow accesses
  mptcp: fix connect timeout handling
  rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
  rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
  rtnetlink: call validate_linkmsg in rtnl_create_link
  ice: recycle/free all of the fragments from multi-buffer frame
  net: phy: mxl-gpy: extend interrupt fix to all impacted variants
  net: renesas: rswitch: Fix return value in error path of xmit
  net: dsa: mv88e6xxx: Increase wait after reset deactivation
  net: ipa: Use correct value for IPA_STATUS_SIZE
  tcp: fix mishandling when the sack compression is deferred.
  net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
  sfc: fix error unwinds in TC offload
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  ...

13 months agofork, vhost: Use CLONE_THREAD to fix freezer/ps regression
Mike Christie [Thu, 1 Jun 2023 18:32:32 +0000 (13:32 -0500)]
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process.  2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.

To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c.  Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.  This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file.  To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec.  The
coredump code and de_thread have been modified to ignore vhost threads.

Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 months agoMerge tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 1 Jun 2023 17:15:43 +0000 (10:15 -0700)]
Merge tag 'mlx5-fixes-2023-05-31' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-05-31

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  net/mlx5: Fix setting of irq->map.index for static IRQ case
  net/mlx5: Remove rmap also in case dynamic MSIX not supported
====================

Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'mptcp-fixes-for-connect-timeout-access-annotations-and-subflow-init'
Jakub Kicinski [Thu, 1 Jun 2023 17:04:06 +0000 (10:04 -0700)]
Merge branch 'mptcp-fixes-for-connect-timeout-access-annotations-and-subflow-init'

Mat Martineau says:

====================
mptcp: Fixes for connect timeout, access annotations, and subflow init

Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect
timeout on MPTCP sockets.

Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.

Patch 6 correctly initializes some subflow fields on outgoing connections.
====================

Link: https://lore.kernel.org/r/20230531-send-net-20230531-v1-0-47750c420571@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: fix active subflow finalization
Paolo Abeni [Wed, 31 May 2023 19:37:08 +0000 (12:37 -0700)]
mptcp: fix active subflow finalization

Active subflow are inserted into the connection list at creation time.
When the MPJ handshake completes successfully, a new subflow creation
netlink event is generated correctly, but the current code wrongly
avoid initializing a couple of subflow data.

The above will cause misbehavior on a few exceptional events: unneeded
mptcp-level retransmission on msk-level sequence wrap-around and infinite
mapping fallback even when a MPJ socket is present.

Address the issue factoring out the needed initialization in a new helper
and invoking the latter from __mptcp_finish_join() time for passive
subflow and from mptcp_finish_join() for active ones.

Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: add annotations around sk->sk_shutdown accesses
Paolo Abeni [Wed, 31 May 2023 19:37:07 +0000 (12:37 -0700)]
mptcp: add annotations around sk->sk_shutdown accesses

Christoph reported the mptcp variant of a recently addressed plain
TCP issue. Similar to commit e14cadfd80d7 ("tcp: add annotations around
sk->sk_shutdown accesses") add READ/WRITE ONCE annotations to silence
KCSAN reports around lockless sk_shutdown access.

Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/401
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: fix data race around msk->first access
Paolo Abeni [Wed, 31 May 2023 19:37:06 +0000 (12:37 -0700)]
mptcp: fix data race around msk->first access

The first subflow socket is accessed outside the msk socket lock
by mptcp_subflow_fail(), we need to annotate each write access
with WRITE_ONCE, but a few spots still lacks it.

Fixes: 76a13b315709 ("mptcp: invoke MP_FAIL response when needed")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: consolidate passive msk socket initialization
Paolo Abeni [Wed, 31 May 2023 19:37:05 +0000 (12:37 -0700)]
mptcp: consolidate passive msk socket initialization

When the msk socket is cloned at MPC handshake time, a few
fields are initialized in a racy way outside mptcp_sk_clone()
and the msk socket lock.

The above is due historical reasons: before commit a88d0092b24b
("mptcp: simplify subflow_syn_recv_sock()") as the first subflow socket
carrying all the needed date was not available yet at msk creation
time

We can now refactor the code moving the missing initialization bit
under the socket lock, removing the init race and avoiding some
code duplication.

This will also simplify the next patch, as all msk->first write
access are now under the msk socket lock.

Fixes: 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: add annotations around msk->subflow accesses
Paolo Abeni [Wed, 31 May 2023 19:37:04 +0000 (12:37 -0700)]
mptcp: add annotations around msk->subflow accesses

The MPTCP can access the first subflow socket in a few spots
outside the socket lock scope. That is actually safe, as MPTCP
will delete the socket itself only after the msk sock close().

Still the such accesses causes a few KCSAN splats, as reported
by Christoph. Silence the harmless warning adding a few annotation
around the relevant accesses.

Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomptcp: fix connect timeout handling
Paolo Abeni [Wed, 31 May 2023 19:37:03 +0000 (12:37 -0700)]
mptcp: fix connect timeout handling

Ondrej reported a functional issue WRT timeout handling on connect
with a nice reproducer.

The problem is that the current mptcp connect waits for both the
MPTCP socket level timeout, and the first subflow socket timeout.
The latter is not influenced/touched by the exposed setsockopt().

Overall the above makes the SO_SNDTIMEO a no-op on connect.

Since mptcp_connect is invoked via inet_stream_connect and the
latter properly handle the MPTCP level timeout, we can address the
issue making the nested subflow level connect always unblocking.

This also allow simplifying a bit the code, dropping an ugly hack
to handle the fastopen and custom proto_ops connect.

The issues predates the blamed commit below, but the current resolution
requires the infrastructure introduced there.

Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'rtnetlink-a-couple-of-fixes-in-linkmsg-validation'
Jakub Kicinski [Thu, 1 Jun 2023 16:59:45 +0000 (09:59 -0700)]
Merge branch 'rtnetlink-a-couple-of-fixes-in-linkmsg-validation'

Xin Long says:

====================
rtnetlink: a couple of fixes in linkmsg validation

validate_linkmsg() was introduced to do linkmsg validation for existing
links. However, the new created links also need this linkmsg validation.

Add validate_linkmsg() check for link creating in Patch 1, and add more
tb checks into validate_linkmsg() in Patch 2 and 3.
====================

Link: https://lore.kernel.org/r/cover.1685548598.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agortnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
Xin Long [Wed, 31 May 2023 16:01:44 +0000 (12:01 -0400)]
rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg

This fixes the issue that dev gro_max_size and gso_ipv4_max_size
can be set to a huge value:

  # ip link add dummy1 type dummy
  # ip link set dummy1 gro_max_size 4294967295
  # ip -d link show dummy1
    dummy addrgenmode eui64 ... gro_max_size 4294967295

Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agortnetlink: move IFLA_GSO_ tb check to validate_linkmsg
Xin Long [Wed, 31 May 2023 16:01:43 +0000 (12:01 -0400)]
rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg

These IFLA_GSO_* tb check should also be done for the new created link,
otherwise, they can be set to a huge value when creating links:

  # ip link add dummy1 gso_max_size 4294967295 type dummy
  # ip -d link show dummy1
    dummy addrgenmode eui64 ... gso_max_size 4294967295

Fixes: 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agortnetlink: call validate_linkmsg in rtnl_create_link
Xin Long [Wed, 31 May 2023 16:01:42 +0000 (12:01 -0400)]
rtnetlink: call validate_linkmsg in rtnl_create_link

validate_linkmsg() was introduced by commit 1840bb13c22f5b ("[RTNL]:
Validate hardware and broadcast address attribute for RTM_NEWLINK")
to validate tb[IFLA_ADDRESS/BROADCAST] for existing links. The same
check should also be done for newly created links.

This patch adds validate_linkmsg() call in rtnl_create_link(), to
avoid the invalid address set when creating some devices like:

  # ip link add dummy0 type dummy
  # ip link add link dummy0 name mac0 address 01:02 type macsec

Fixes: 0e06877c6fdb ("[RTNETLINK]: rtnl_link: allow specifying initial device address")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoice: recycle/free all of the fragments from multi-buffer frame
Maciej Fijalkowski [Wed, 31 May 2023 15:44:57 +0000 (08:44 -0700)]
ice: recycle/free all of the fragments from multi-buffer frame

The ice driver caches next_to_clean value at the beginning of
ice_clean_rx_irq() in order to remember the first buffer that has to be
freed/recycled after main Rx processing loop. The end boundary is
indicated by first descriptor of frame that Rx processing loop has ended
its duties. Note that if mentioned loop ended in the middle of gathering
multi-buffer frame, next_to_clean would be pointing to the descriptor in
the middle of the frame BUT freeing/recycling stage will stop at the
first descriptor. This means that next iteration of ice_clean_rx_irq()
will miss the (first_desc, next_to_clean - 1) entries.

 When running various 9K MTU workloads, such splats were observed:

[  540.780716] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  540.787787] #PF: supervisor read access in kernel mode
[  540.793002] #PF: error_code(0x0000) - not-present page
[  540.798218] PGD 0 P4D 0
[  540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G        W          6.3.0-rc7+ #96
[  540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice]
[  540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98
[  540.848717] RSP: 0018:ffffc9000f42fc50 EFLAGS: 00010282
[  540.854029] RAX: 0000000000000004 RBX: 0000000000000002 RCX: 000000000000fffe
[  540.861272] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[  540.868519] RBP: ffff88984a05ac00 R08: 0000000000000000 R09: dead000000000100
[  540.875760] R10: ffff88983fffcd00 R11: 000000000010f2b8 R12: 0000000000000004
[  540.883008] R13: 0000000000000003 R14: 0000000000000800 R15: ffff889847a10040
[  540.890253] FS:  00007f6ddf7fe640(0000) GS:ffff88afdf800000(0000) knlGS:0000000000000000
[  540.898465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  540.904299] CR2: 0000000000000000 CR3: 000000010d3da001 CR4: 00000000007706e0
[  540.911542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  540.918789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  540.926032] PKRU: 55555554
[  540.928790] Call Trace:
[  540.931276]  <TASK>
[  540.933418]  ice_napi_poll+0x4ca/0x6d0 [ice]
[  540.937804]  ? __pfx_ice_napi_poll+0x10/0x10 [ice]
[  540.942716]  napi_busy_loop+0xd7/0x320
[  540.946537]  xsk_recvmsg+0x143/0x170
[  540.950178]  sock_recvmsg+0x99/0xa0
[  540.953729]  __sys_recvfrom+0xa8/0x120
[  540.957543]  ? do_futex+0xbd/0x1d0
[  540.961008]  ? __x64_sys_futex+0x73/0x1d0
[  540.965083]  __x64_sys_recvfrom+0x20/0x30
[  540.969155]  do_syscall_64+0x38/0x90
[  540.972796]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  540.977934] RIP: 0033:0x7f6de5f27934

To fix this, set cached_ntc to first_desc so that at the end, when
freeing/recycling buffers, descriptors from first to ntc are not missed.

Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230531154457.3216621-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: phy: mxl-gpy: extend interrupt fix to all impacted variants
Xu Liang [Wed, 31 May 2023 07:48:22 +0000 (15:48 +0800)]
net: phy: mxl-gpy: extend interrupt fix to all impacted variants

The interrupt fix in commit 97a89ed101bb should be applied on all variants
of GPY2xx PHY and GPY115C.

Fixes: 97a89ed101bb ("net: phy: mxl-gpy: disable interrupts on GPY215 by default")
Signed-off-by: Xu Liang <lxu@maxlinear.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531074822.39136-1-lxu@maxlinear.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: renesas: rswitch: Fix return value in error path of xmit
Yoshihiro Shimoda [Mon, 29 May 2023 07:38:17 +0000 (16:38 +0900)]
net: renesas: rswitch: Fix return value in error path of xmit

Fix return value in the error path of rswitch_start_xmit(). If TX
queues are full, this function should return NETDEV_TX_BUSY.

Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20230529073817.1145208-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 1 Jun 2023 15:18:20 +0000 (11:18 -0400)]
Merge tag 'firewire-fixes-6.4-rc4' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A single patch to use a flexible array rather than a zero-length one"

* tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: Replace zero-length array with flexible-array member

13 months agoMerge tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujit...
Linus Torvalds [Thu, 1 Jun 2023 15:13:10 +0000 (11:13 -0400)]
Merge tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujitsu/integration

Pull mailbox fix from Jassi Brar:
 "Fix missing mutex unlock in mailbox-test"

* tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()

13 months agonet: dsa: mv88e6xxx: Increase wait after reset deactivation
Andreas Svensson [Tue, 30 May 2023 14:52:23 +0000 (16:52 +0200)]
net: dsa: mv88e6xxx: Increase wait after reset deactivation

A switch held in reset by default needs to wait longer until we can
reliably detect it.

An issue was observed when testing on the Marvell 88E6393X (Link Street).
The driver failed to detect the switch on some upstarts. Increasing the
wait time after reset deactivation solves this issue.

The updated wait time is now also the same as the wait time in the
mv88e6xxx_hardware_reset function.

Fixes: 7b75e49de424 ("net: dsa: mv88e6xxx: wait after reset deactivation")
Signed-off-by: Andreas Svensson <andreas.svensson@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230530145223.1223993-1-andreas.svensson@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agofirewire: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Mon, 29 May 2023 18:52:07 +0000 (12:52 -0600)]
firewire: Replace zero-length array with flexible-array member

Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.

Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
sound/firewire/amdtp-stream.c: In function ‘build_it_pkt_header’:
sound/firewire/amdtp-stream.c:694:17: warning: ‘generate_cip_header’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
  694 |                 generate_cip_header(s, cip_header, data_block_counter, syt);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/firewire/amdtp-stream.c:694:17: note: referencing argument 2 of type ‘__be32[2]’ {aka ‘unsigned int[2]’}
sound/firewire/amdtp-stream.c:667:13: note: in a call to function ‘generate_cip_header’
  667 | static void generate_cip_header(struct amdtp_stream *s, __be32 cip_header[2],
      |             ^~~~~~~~~~~~~~~~~~~

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].

Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/303
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/ZHT0V3SpvHyxCv5W@work
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
13 months agoMerge tag 'for-linus-2023060101' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 1 Jun 2023 13:02:04 +0000 (09:02 -0400)]
Merge tag 'for-linus-2023060101' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - Regression fix for overlong long timeouts during initialization on
   some Logitech Unifying devices (Bastien Nocera)

 - error handling and overflow fixes for Wacom driver (Denis Arefev,
   Jason Gerecke, Nikita Zhandarovich)

* tag 'for-linus-2023060101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-hidpp: Handle timeout differently from busy
  HID: wacom: Add error check to wacom_parse_and_register()
  HID: google: add jewel USB id
  HID: wacom: avoid integer overflow in wacom_intuos_inout()
  HID: wacom: Check for string overflow from strscpy calls

13 months agoMerge tag 'ata-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Thu, 1 Jun 2023 12:41:33 +0000 (08:41 -0400)]
Merge tag 'ata-6.4-rc5' of git://git./linux/kernel/git/dlemoal/libata

Pull ata fix from Damien Le Moal:

 - Fix ata_find_dev() use of the device number to find a struct
   ata_device for a port. This addresses issues with some passthrough
   commands with libsas managed devices.

* tag 'ata-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-scsi: Use correct device no in ata_find_dev()

13 months agoMerge tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 1 Jun 2023 12:27:34 +0000 (08:27 -0400)]
Merge tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Eight server fixes (most also for stable):

   - Two fixes for uninitialized pointer reads (rename and link)

   - Fix potential UAF in oplock break

   - Two fixes for potential out of bound reads in negotiate

   - Fix crediting bug

   - Two fixes for xfstests (allocation size fix for test 694 and lookup
     issue shown by test 464)"

* tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: call putname after using the last component
  ksmbd: fix incorrect AllocationSize set in smb2_get_info
  ksmbd: fix UAF issue from opinfo->conn
  ksmbd: fix multiple out-of-bounds read during context decoding
  ksmbd: fix slab-out-of-bounds read in smb2_handle_negotiate
  ksmbd: fix credit count leakage
  ksmbd: fix uninitialized pointer read in smb2_create_link()
  ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()

13 months agoMerge branch 'splice-net-handle-msg_splice_pages-in-chelsio-tls'
Paolo Abeni [Thu, 1 Jun 2023 11:41:40 +0000 (13:41 +0200)]
Merge branch 'splice-net-handle-msg_splice_pages-in-chelsio-tls'

David Howells says:

====================
splice, net: Handle MSG_SPLICE_PAGES in Chelsio-TLS

Here are patches to make Chelsio-TLS handle the MSG_SPLICE_PAGES internal
sendmsg flag.  MSG_SPLICE_PAGES is an internal hint that tells the protocol
that it should splice the pages supplied if it can.  Its sendpage
implementation is then turned into a wrapper around that.
====================

Link: https://lore.kernel.org/r/20230531110008.642903-1-dhowells@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agochelsio: Convert chtls_sendpage() to use MSG_SPLICE_PAGES
David Howells [Wed, 31 May 2023 11:00:08 +0000 (12:00 +0100)]
chelsio: Convert chtls_sendpage() to use MSG_SPLICE_PAGES

Convert chtls_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ayush Sawal <ayush.sawal@chelsio.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agochelsio: Support MSG_SPLICE_PAGES
David Howells [Wed, 31 May 2023 11:00:07 +0000 (12:00 +0100)]
chelsio: Support MSG_SPLICE_PAGES

Make Chelsio's TLS offload sendmsg() support MSG_SPLICE_PAGES, splicing in
pages from the source iterator if possible and copying the data in
otherwise.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ayush Sawal <ayush.sawal@chelsio.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: ipa: Use correct value for IPA_STATUS_SIZE
Bert Karwatzki [Wed, 31 May 2023 10:36:19 +0000 (12:36 +0200)]
net: ipa: Use correct value for IPA_STATUS_SIZE

IPA_STATUS_SIZE was introduced in commit b8dc7d0eea5a as a replacement
for the size of the removed struct ipa_status which had size
sizeof(__le32[8]). Use this value as IPA_STATUS_SIZE.

Fixes: b8dc7d0eea5a ("net: ipa: stop using sizeof(status)")
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531103618.102608-1-spasswolf@web.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agotcp: fix mishandling when the sack compression is deferred.
fuyuanli [Wed, 31 May 2023 08:01:50 +0000 (16:01 +0800)]
tcp: fix mishandling when the sack compression is deferred.

In this patch, we mainly try to handle sending a compressed ack
correctly if it's deferred.

Here are more details in the old logic:
When sack compression is triggered in the tcp_compressed_ack_kick(),
if the sock is owned by user, it will set TCP_DELACK_TIMER_DEFERRED
and then defer to the release cb phrase. Later once user releases
the sock, tcp_delack_timer_handler() should send a ack as expected,
which, however, cannot happen due to lack of ICSK_ACK_TIMER flag.
Therefore, the receiver would not sent an ack until the sender's
retransmission timeout. It definitely increases unnecessary latency.

Fixes: 5d9f4262b7ea ("tcp: add SACK compression")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: fuyuanli <fuyuanli@didiglobal.com>
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/netdev/20230529113804.GA20300@didi-ThinkCentre-M920t-N000/
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230531080150.GA20424@didi-ThinkCentre-M920t-N000
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet/sched: flower: fix possible OOB write in fl_set_geneve_opt()
Hangyu Hua [Wed, 31 May 2023 10:28:04 +0000 (18:28 +0800)]
net/sched: flower: fix possible OOB write in fl_set_geneve_opt()

If we send two TCA_FLOWER_KEY_ENC_OPTS_GENEVE packets and their total
size is 252 bytes(key->enc_opts.len = 252) then
key->enc_opts.len = opt->length = data_len / 4 = 0 when the third
TCA_FLOWER_KEY_ENC_OPTS_GENEVE packet enters fl_set_geneve_opt. This
bypasses the next bounds check and results in an out-of-bounds.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Link: https://lore.kernel.org/r/20230531102805.27090-1-hbh25y@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoMerge branch 'wangxun-netdev-features-support'
Jakub Kicinski [Thu, 1 Jun 2023 06:02:28 +0000 (23:02 -0700)]
Merge branch 'wangxun-netdev-features-support'

Mengyuan Lou says:

====================
Wangxun netdev features support

Implement tx_csum and rx_csum to support hardware checksum offload.
Implement ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid.
Implement ndo_set_features.
Enable macros in netdev features which wangxun can support.
====================

Link: https://lore.kernel.org/r/20230530022632.17938-1-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: txgbe: Implement vlan add and remove ops
Mengyuan Lou [Tue, 30 May 2023 02:26:32 +0000 (10:26 +0800)]
net: txgbe: Implement vlan add and remove ops

txgbe add ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: txgbe: Add netdev features support
Mengyuan Lou [Tue, 30 May 2023 02:26:31 +0000 (10:26 +0800)]
net: txgbe: Add netdev features support

Add features and hw_features that ngbe can support.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: ngbe: Implement vlan add and remove ops
Mengyuan Lou [Tue, 30 May 2023 02:26:30 +0000 (10:26 +0800)]
net: ngbe: Implement vlan add and remove ops

ngbe add ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: ngbe: Add netdev features support
Mengyuan Lou [Tue, 30 May 2023 02:26:29 +0000 (10:26 +0800)]
net: ngbe: Add netdev features support

Add features and hw_features that ngbe can support.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: libwx: Implement xx_set_features ops
Mengyuan Lou [Tue, 30 May 2023 02:26:28 +0000 (10:26 +0800)]
net: libwx: Implement xx_set_features ops

Implement wx_set_features function which to support
ndo_set_features.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: wangxun: Implement vlan add and kill functions
Mengyuan Lou [Tue, 30 May 2023 02:26:27 +0000 (10:26 +0800)]
net: wangxun: Implement vlan add and kill functions

Implement vlan add/kill functions which add and remove
vlan id in hardware.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: wangxun: libwx add rx offload functions
Mengyuan Lou [Tue, 30 May 2023 02:26:26 +0000 (10:26 +0800)]
net: wangxun: libwx add rx offload functions

Add rx offload functions for wx_clean_rx_irq
which supports ngbe and txgbe to implement
rx offload function.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: wangxun: libwx add tx offload functions
Mengyuan Lou [Tue, 30 May 2023 02:26:25 +0000 (10:26 +0800)]
net: wangxun: libwx add tx offload functions

Add tx offload functions for wx_xmit_frame_ring which
includes wx_encode_tx_desc_ptype, wx_tso and wx_tx_csum.
which supports ngbe and txgbe to implement tx offload
function.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: make health report on unregistered instance warn just once
Jakub Kicinski [Wed, 31 May 2023 01:55:23 +0000 (18:55 -0700)]
devlink: make health report on unregistered instance warn just once

Devlink health is involved in error recovery. Machines in bad
state tend to be fairly unreliable, and occasionally get stuck
in error loops. Even with a reasonable grace period devlink health
may get a thousand reports in an hour.

In case of reporting on an unregistered devlink instance
the subsequent reports don't add much value. Switch to
WARN_ON_ONCE() to avoid flooding dmesg and fleet monitoring
dashboards.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230531015523.48961-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'add-support-for-vsc85xx-dt-rgmii-delays'
Jakub Kicinski [Thu, 1 Jun 2023 05:33:46 +0000 (22:33 -0700)]
Merge branch 'add-support-for-vsc85xx-dt-rgmii-delays'

Harini Katakam says:

====================
Add support for VSC85xx DT RGMII delays

Provide an option to change RGMII delay value via devicetree.
====================

Link: https://lore.kernel.org/r/20230529122017.10620-1-harini.katakam@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agophy: mscc: Add support for RGMII delay configuration
Harini Katakam [Mon, 29 May 2023 12:20:17 +0000 (17:50 +0530)]
phy: mscc: Add support for RGMII delay configuration

Add support for optional rx/tx-internal-delay-ps from devicetree.
- When rx/tx-internal-delay-ps is/are specified, these take priority
- When either is absent,
1) use 2ns for respective settings if rgmii-id/rxid/txid is/are present
2) use 0.2ns for respective settings if mode is rgmii

Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agophy: mscc: Use PHY_ID_MATCH_VENDOR to minimize PHY ID table
Harini Katakam [Mon, 29 May 2023 12:20:16 +0000 (17:50 +0530)]
phy: mscc: Use PHY_ID_MATCH_VENDOR to minimize PHY ID table

All the PHY devices variants specified have the same mask and
hence can be simplified to one vendor look up for 0x00070400.
Any individual config can be identified by PHY_ID_MATCH_EXACT
in the respective structure.

Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agosfc: fix error unwinds in TC offload
Edward Cree [Tue, 30 May 2023 20:25:27 +0000 (21:25 +0100)]
sfc: fix error unwinds in TC offload

Failure ladders weren't exactly unwinding what the function had done up
 to that point; most seriously, when we encountered an already offloaded
 rule, the failure path tried to remove the new rule from the hashtable,
 which would in fact remove the already-present 'old' rule (since it has
 the same key) from the table, and leak its resources.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202305200745.xmIlkqjH-lkp@intel.com/
Fixes: d902e1a737d4 ("sfc: bare bones TC offload on EF100")
Fixes: 17654d84b47c ("sfc: add offloading of 'foreign' TC (decap) rules")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230530202527.53115-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: don't set sw irq coalescing defaults in case of PREEMPT_RT
Heiner Kallweit [Sun, 28 May 2023 17:39:59 +0000 (19:39 +0200)]
net: don't set sw irq coalescing defaults in case of PREEMPT_RT

If PREEMPT_RT is set, then assume that the user focuses on minimum
latency. Therefore don't set sw irq coalescing defaults.
This affects the defaults only, users can override these settings
via sysfs.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/f9439c7f-c92c-4c2c-703e-110f96d841b7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet/mlx5: Read embedded cpu after init bit cleared
Moshe Shemesh [Fri, 28 Apr 2023 10:48:13 +0000 (13:48 +0300)]
net/mlx5: Read embedded cpu after init bit cleared

During driver load it reads embedded_cpu bit from initialization
segment, but the initialization segment is readable only after
initialization bit is cleared.

Move the call to mlx5_read_embedded_cpu() right after initialization bit
cleared.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 591905ba9679 ("net/mlx5: Introduce Mellanox SmartNIC and modify page management logic")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5e: Fix error handling in mlx5e_refresh_tirs
Saeed Mahameed [Sun, 28 May 2023 06:07:08 +0000 (23:07 -0700)]
net/mlx5e: Fix error handling in mlx5e_refresh_tirs

Allocation failure is outside the critical lock section and should
return immediately rather than jumping to the unlock section.

Also unlock as soon as required and remove the now redundant jump label.

Fixes: 80a2a9026b24 ("net/mlx5e: Add a lock on tir list")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Ensure af_desc.mask is properly initialized
Chuck Lever [Wed, 31 May 2023 19:48:25 +0000 (15:48 -0400)]
net/mlx5: Ensure af_desc.mask is properly initialized

[    9.837087] mlx5_core 0000:02:00.0: firmware version: 16.35.2000
[    9.843126] mlx5_core 0000:02:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
[   10.311515] mlx5_core 0000:02:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[   10.321948] mlx5_core 0000:02:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[   10.344324] mlx5_core 0000:02:00.0: mlx5_pcie_event:301:(pid 88): PCIe slot advertised sufficient power (27W).
[   10.354339] BUG: unable to handle page fault for address: ffffffff8ff0ade0
[   10.361206] #PF: supervisor read access in kernel mode
[   10.366335] #PF: error_code(0x0000) - not-present page
[   10.371467] PGD 81ec39067 P4D 81ec39067 PUD 81ec3a063 PMD 114b07063 PTE 800ffff7e10f5062
[   10.379544] Oops: 0000 [#1] PREEMPT SMP PTI
[   10.383721] CPU: 0 PID: 117 Comm: kworker/0:6 Not tainted 6.3.0-13028-g7222f123c983 #1
[   10.391625] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b 06/12/2017
[   10.398750] Workqueue: events work_for_cpu_fn
[   10.403108] RIP: 0010:__bitmap_or+0x10/0x26
[   10.407286] Code: 85 c0 0f 95 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 89 c9 31 c0 48 83 c1 3f 48 c1 e9 06 39 c>
[   10.426024] RSP: 0000:ffffb45a0078f7b0 EFLAGS: 00010097
[   10.431240] RAX: 0000000000000000 RBX: ffffffff8ff0adc0 RCX: 0000000000000004
[   10.438365] RDX: ffff9156801967d0 RSI: ffffffff8ff0ade0 RDI: ffff9156801967b0
[   10.445489] RBP: ffffb45a0078f7e8 R08: 0000000000000030 R09: 0000000000000000
[   10.452613] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000ec
[   10.459737] R13: ffffffff8ff0ade0 R14: 0000000000000001 R15: 0000000000000020
[   10.466862] FS:  0000000000000000(0000) GS:ffff9165bfc00000(0000) knlGS:0000000000000000
[   10.474936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.480674] CR2: ffffffff8ff0ade0 CR3: 00000001011ae003 CR4: 00000000003706f0
[   10.487800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.494922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.502046] Call Trace:
[   10.504493]  <TASK>
[   10.506589]  ? matrix_alloc_area.constprop.0+0x43/0x9a
[   10.511729]  ? prepare_namespace+0x84/0x174
[   10.515914]  irq_matrix_reserve_managed+0x56/0x10c
[   10.520699]  x86_vector_alloc_irqs+0x1d2/0x31e
[   10.525146]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
[   10.530284]  irq_domain_alloc_irqs_parent+0x1a/0x2a
[   10.535155]  intel_irq_remapping_alloc+0x59/0x5e9
[   10.539859]  ? kmem_cache_debug_flags+0x11/0x26
[   10.544383]  ? __radix_tree_lookup+0x39/0xb9
[   10.548649]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
[   10.553779]  irq_domain_alloc_irqs_parent+0x1a/0x2a
[   10.558650]  msi_domain_alloc+0x8c/0x120
[   10.567697]  irq_domain_alloc_irqs_locked+0x11d/0x286
[   10.572741]  __irq_domain_alloc_irqs+0x72/0x93
[   10.577179]  __msi_domain_alloc_irqs+0x193/0x3f1
[   10.581789]  ? __xa_alloc+0xcf/0xe2
[   10.585273]  msi_domain_alloc_irq_at+0xa8/0xfe
[   10.589711]  pci_msix_alloc_irq_at+0x47/0x5c

The crash is due to matrix_alloc_area() attempting to access per-CPU
memory for CPUs that are not present on the system. The CPU mask
passed into reserve_managed_vector() via it's @irqd parameter is
corrupted because it contains uninitialized stack data.

Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Fix setting of irq->map.index for static IRQ case
Niklas Schnelle [Wed, 31 May 2023 08:48:56 +0000 (10:48 +0200)]
net/mlx5: Fix setting of irq->map.index for static IRQ case

When dynamic IRQ allocation is not supported all IRQs are allocated up
front in mlx5_irq_table_create() instead of dynamically as part of
mlx5_irq_alloc(). In the latter dynamic case irq->map.index is set
via the mapping returned by pci_msix_alloc_irq_at(). In the static case
and prior to commit 1da438c0ae02 ("net/mlx5: Fix indexing of mlx5_irq")
irq->map.index was set in mlx5_irq_alloc() twice once initially to 0 and
then to the requested index before storing in the xarray. After this
commit it is only set to 0 which breaks all other IRQ mappings.

Fix this by setting irq->map.index to the requested index together with
irq->map.virq and improve the related comment to make it clearer which
cases it deals with.

Cc: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Fixes: 1da438c0ae02 ("net/mlx5: Fix indexing of mlx5_irq")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agonet/mlx5: Remove rmap also in case dynamic MSIX not supported
Shay Drory [Tue, 30 May 2023 08:59:34 +0000 (11:59 +0300)]
net/mlx5: Remove rmap also in case dynamic MSIX not supported

mlx5 add IRQs to rmap upon MSIX request, and mlx5 remove rmap from
MSIX only if msi_map.index is populated. However, msi_map.index is
populated only when dynamic MSIX is supported. This results in freeing
IRQs without removing them from rmap, which triggers the bellow
WARN_ON[1].

rmap is a feature which have no relation to dynamic MSIX.
Hence, remove the check of msi_map.index when removing IRQ from rmap.

[1]
[  200.307160 ] WARNING: CPU: 20 PID: 1702 at kernel/irq/manage.c:2034 free_irq+0x2ac/0x358
[  200.316990 ] CPU: 20 PID: 1702 Comm: modprobe Not tainted 6.4.0-rc3_for_upstream_min_debug_2023_05_24_14_02 #1
[  200.318939 ] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  200.321659 ] pc : free_irq+0x2ac/0x358
[  200.322400 ] lr : free_irq+0x20/0x358
[  200.337865 ] Call trace:
[  200.338360 ]  free_irq+0x2ac/0x358
[  200.339029 ]  irq_release+0x58/0xd0 [mlx5_core]
[  200.340093 ]  mlx5_irqs_release_vectors+0x80/0xb0 [mlx5_core]
[  200.341344 ]  destroy_comp_eqs+0x120/0x170 [mlx5_core]
[  200.342469 ]  mlx5_eq_table_destroy+0x1c/0x38 [mlx5_core]
[  200.343645 ]  mlx5_unload+0x8c/0xc8 [mlx5_core]
[  200.344652 ]  mlx5_uninit_one+0x78/0x118 [mlx5_core]
[  200.345745 ]  remove_one+0x80/0x108 [mlx5_core]
[  200.346752 ]  pci_device_remove+0x40/0xd8
[  200.347554 ]  device_remove+0x50/0x88
[  200.348272 ]  device_release_driver_internal+0x1c4/0x228
[  200.349312 ]  driver_detach+0x54/0xa0
[  200.350030 ]  bus_remove_driver+0x74/0x100
[  200.350833 ]  driver_unregister+0x34/0x68
[  200.351619 ]  pci_unregister_driver+0x28/0xa0
[  200.352476 ]  mlx5_cleanup+0x14/0x2210 [mlx5_core]
[  200.353536 ]  __arm64_sys_delete_module+0x190/0x2e8
[  200.354495 ]  el0_svc_common.constprop.0+0x6c/0x1d0
[  200.355455 ]  do_el0_svc+0x38/0x98
[  200.356122 ]  el0_svc+0x1c/0x80
[  200.356739 ]  el0t_64_sync_handler+0xb4/0x130
[  200.357604 ]  el0t_64_sync+0x174/0x178
[  200.358345 ] ---[ end trace 0000000000000000  ]---

Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
13 months agoMerge tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Wed, 31 May 2023 23:24:01 +0000 (19:24 -0400)]
Merge tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Four small smb3 client fixes:

   - two small fixes suggested by kernel test robot

   - small cleanup fix

   - update Paulo's email address in the maintainer file"

* tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: address unused variable warning
  smb: delete an unnecessary statement
  smb3: missing null check in SMB2_change_notify
  smb3: update a reviewer email in MAINTAINERS file

13 months agomailbox: mailbox-test: fix a locking issue in mbox_test_message_write()
Dan Carpenter [Fri, 5 May 2023 09:22:09 +0000 (12:22 +0300)]
mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()

There was a bug where this code forgot to unlock the tdev->mutex if the
kzalloc() failed.  Fix this issue, by moving the allocation outside the
lock.

Fixes: 2d1e952a2b8e ("mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
13 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Wed, 31 May 2023 18:13:09 +0000 (14:13 -0400)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

 - Fix 64K ARM page size support in bnxt_re and efa

 - bnxt_re fixes for a memory leak, incorrect error handling and a
   remove a bogus FW failure when running on a VF

 - Update MAINTAINERS for hns and efa

 - Fix two rxe regressions added this merge window in error unwind and
   incorrect spinlock primitives

 - hns gets a better algorithm for allocating page tables to avoid
   running out of resources, and a timeout adjustment

 - Fix a text case failure in hns

 - Use after free in irdma and fix incorrect construction of a WQE
   causing mis-execution

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/irdma: Fix Local Invalidate fencing
  RDMA/irdma: Prevent QP use after free
  MAINTAINERS: Update maintainer of Amazon EFA driver
  RDMA/bnxt_re: Do not enable congestion control on VFs
  RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
  RDMA/bnxt_re: Fix a possible memory leak
  RDMA/hns: Modify the value of long message loopback slice
  RDMA/hns: Fix base address table allocation
  RDMA/hns: Fix timeout attr in query qp for HIP08
  RDMA/efa: Fix unsupported page sizes in device
  RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to spin_{lock_irqsave,unlock_irqrestore}
  RDMA/rxe: Fix double unlock in rxe_qp.c
  MAINTAINERS: Update maintainers of HiSilicon RoCE
  RDMA/bnxt_re: Fix the page_size used during the MR creation

13 months agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 31 May 2023 18:06:01 +0000 (14:06 -0400)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix two regressions in ext4 and a number of issues reported by syzbot"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: enable the lazy init thread when remounting read/write
  ext4: fix fsync for non-directories
  ext4: add lockdep annotations for i_data_sem for ea_inode's
  ext4: disallow ea_inodes with extended attributes
  ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find()
  ext4: add EA_INODE checking to ext4_iget()

13 months agoHID: logitech-hidpp: Handle timeout differently from busy
Bastien Nocera [Wed, 31 May 2023 08:24:28 +0000 (10:24 +0200)]
HID: logitech-hidpp: Handle timeout differently from busy

If an attempt at contacting a receiver or a device fails because the
receiver or device never responds, don't restart the communication, only
restart it if the receiver or device answers that it's busy, as originally
intended.

This was the behaviour on communication timeout before commit 586e8fede795
("HID: logitech-hidpp: Retry commands when device is busy").

This fixes some overly long waits in a critical path on boot, when
checking whether the device is connected by getting its HID++ version.

Signed-off-by: Bastien Nocera <hadess@hadess.net>
Suggested-by: Mark Lord <mlord@pobox.com>
Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
13 months agoudp6: Fix race condition in udp6_sendmsg & connect
Vladislav Efanov [Tue, 30 May 2023 11:39:41 +0000 (14:39 +0300)]
udp6: Fix race condition in udp6_sendmsg & connect

Syzkaller got the following report:
BUG: KASAN: use-after-free in sk_setup_caps+0x621/0x690 net/core/sock.c:2018
Read of size 8 at addr ffff888027f82780 by task syz-executor276/3255

The function sk_setup_caps (called by ip6_sk_dst_store_flow->
ip6_dst_store) referenced already freed memory as this memory was
freed by parallel task in udpv6_sendmsg->ip6_sk_dst_lookup_flow->
sk_dst_check.

          task1 (connect)              task2 (udp6_sendmsg)
        sk_setup_caps->sk_dst_set |
                                  |  sk_dst_check->
                                  |      sk_dst_set
                                  |      dst_release
        sk_setup_caps references  |
        to already freed dst_entry|

The reason for this race condition is: sk_setup_caps() keeps using
the dst after transferring the ownership to the dst cache.

Found by Linux Verification Center (linuxtesting.org) with syzkaller.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'xstats-for-tc-taprio'
David S. Miller [Wed, 31 May 2023 09:00:30 +0000 (10:00 +0100)]
Merge branch 'xstats-for-tc-taprio'

Vladimir Oltean says:

====================
xstats for tc-taprio

As a result of this discussion:
https://lore.kernel.org/intel-wired-lan/20230411055543.24177-1-muhammad.husaini.zulkifli@intel.com/

it became apparent that tc-taprio should make an effort to standardize
statistics counters related to the 802.1Qbv scheduling as implemented
by the NIC. I'm presenting here one counter suggested by the standard,
and one counter defined by the NXP ENETC controller from LS1028A. Both
counters are reported globally and per traffic class - drivers get
different callbacks for reporting both of these, and get to choose what
to report in both cases.

The iproute2 counterpart is available here for testing:
https://github.com/vladimiroltean/iproute2/commits/taprio-xstats
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: enetc: report statistics counters for taprio
Vladimir Oltean [Tue, 30 May 2023 09:19:48 +0000 (12:19 +0300)]
net: enetc: report statistics counters for taprio

Report the "win_drop" counter from the unstructured ethtool -S as
TCA_TAPRIO_OFFLOAD_STATS_WINDOW_DROPS to the Qdisc layer. It is
available both as a global counter as well as a per-TC one.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: enetc: refactor enetc_setup_tc_taprio() to have a switch/case for cmd
Vladimir Oltean [Tue, 30 May 2023 09:19:47 +0000 (12:19 +0300)]
net: enetc: refactor enetc_setup_tc_taprio() to have a switch/case for cmd

Make enetc_setup_tc_taprio() more amenable to future extensions, like
reporting statistics.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet/sched: taprio: add netlink reporting for offload statistics counters
Vladimir Oltean [Tue, 30 May 2023 09:19:46 +0000 (12:19 +0300)]
net/sched: taprio: add netlink reporting for offload statistics counters

Offloading drivers may report some additional statistics counters, some
of them even suggested by 802.1Q, like TransmissionOverrun.

In my opinion we don't have to limit ourselves to reporting counters
only globally to the Qdisc/interface, especially if the device has more
detailed reporting (per traffic class), since the more detailed info is
valuable for debugging and can help identifying who is exceeding its
time slot.

But on the other hand, some devices may not be able to report both per
TC and global stats.

So we end up reporting both ways, and use the good old ethtool_put_stat()
strategy to determine which statistics are supported by this NIC.
Statistics which aren't set are simply not reported to netlink. For this
reason, we need something dynamic (a nlattr nest) to be reported through
TCA_STATS_APP, and not something daft like the fixed-size and
inextensible struct tc_codel_xstats. A good model for xstats which are a
nlattr nest rather than a fixed struct seems to be cake.

 # Global stats
 $ tc -s qdisc show dev eth0 root
 # Per-tc stats
 $ tc -s class show dev eth0

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet/sched: taprio: replace tc_taprio_qopt_offload :: enable with a "cmd" enum
Vladimir Oltean [Tue, 30 May 2023 09:19:45 +0000 (12:19 +0300)]
net/sched: taprio: replace tc_taprio_qopt_offload :: enable with a "cmd" enum

Inspired from struct flow_cls_offload :: cmd, in order for taprio to be
able to report statistics (which is future work), it seems that we need
to drill one step further with the ndo_setup_tc(TC_SETUP_QDISC_TAPRIO)
multiplexing, and pass the command as part of the common portion of the
muxed structure.

Since we already have an "enable" variable in tc_taprio_qopt_offload,
refactor all drivers to check for "cmd" instead of "enable", and reject
every other command except "replace" and "destroy" - to be future proof.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> # for lan966x
Acked-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()
Vladimir Oltean [Tue, 30 May 2023 09:19:44 +0000 (12:19 +0300)]
net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()

In taprio_dump_class_stats() we don't need a reference to the root Qdisc
once we get the reference to the child corresponding to this traffic
class, so it's okay to overwrite "sch". But in a future patch we will
need the root Qdisc too, so create a dedicated "child" pointer variable
to hold the child reference. This also makes the code adhere to a more
conventional coding style.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'dsa-marvell-mv88e6071-and-6020-support'
David S. Miller [Wed, 31 May 2023 08:56:08 +0000 (09:56 +0100)]
Merge branch 'dsa-marvell-mv88e6071-and-6020-support'

Lukasz Majewski says:

====================
dsa: marvell: Add support for mv88e6071 and 6020  switches

After the commit (SHA1: 7e9517375a14f44ee830ca1c3278076dd65fcc8f);
"net: dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290" the
error when mv88e6020 or mv88e6071 is used is not present anymore.

As a result patches for adding max frame size are not required to provide
working setup with aforementioned switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: mv88e6xxx: add support for MV88E6071 switch
Lukasz Majewski [Tue, 30 May 2023 08:39:16 +0000 (10:39 +0200)]
net: dsa: mv88e6xxx: add support for MV88E6071 switch

A mv88e6250 family switch with 5 internal PHYs, 2 RMIIs
and no PTP support.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: mv88e6xxx: add support for MV88E6020 switch
Matthias Schiffer [Tue, 30 May 2023 08:39:15 +0000 (10:39 +0200)]
net: dsa: mv88e6xxx: add support for MV88E6020 switch

A mv88e6250 family switch with 2 PHY and RMII ports and
no PTP support.

Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: Define .set_max_frame_size() callback for mv88e6250 SoC family
Lukasz Majewski [Tue, 30 May 2023 08:39:14 +0000 (10:39 +0200)]
net: dsa: Define .set_max_frame_size() callback for mv88e6250 SoC family

Switches from mv88e6250 family (including mv88e6020 and mv88e6071) need
the possibility to setup the maximal frame size, as they support frames
up to 2048 bytes.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: Switch i2c drivers back to use .probe()
Uwe Kleine-König [Tue, 30 May 2023 06:39:36 +0000 (08:39 +0200)]
net: dsa: Switch i2c drivers back to use .probe()

After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers being converted to .probe_new() and then
03c835f498b5 ("i2c: Switch .probe() to not take an id parameter") convert
back to (the new) .probe() to be able to eventually drop .probe_new() from
struct i2c_driver.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: Make gro complete function to return void
Parav Pandit [Mon, 29 May 2023 13:44:30 +0000 (16:44 +0300)]
net: Make gro complete function to return void

tcp_gro_complete() function only updates the skb fields related to GRO
and it always returns zero. All the 3 drivers which are using it
do not check for the return value either.

Change it to return void instead which simplifies its callers as
error handing becomes unnecessary.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoMerge branch 'net-led-hw-control-api'
David S. Miller [Wed, 31 May 2023 08:42:09 +0000 (09:42 +0100)]
Merge branch 'net-led-hw-control-api'

Christian Marangi says:

====================
leds: introduce new LED hw control APIs

Since this series is cross subsystem between LED and netdev,
a stable branch was created to facilitate merging process.

This is based on top of branch ib-leds-netdev-v6.5 present here [1]
and rebased on top of net-next since the LED stable branch got merged.

This is a continue of [2]. It was decided to take a more gradual
approach to implement LEDs support for switch and phy starting with
basic support and then implementing the hw control part when we have all
the prereq done.

This is the main part of the series, the one that actually implement the
hw control API.

Some history about this feature and why
=======================================

This proposal is highly requested by the entire net community but the API
is not strictly designed for net usage but for a more generic usage.

Initial version were very flexible and designed to try to support every
aspect of the LED driver with many complex function that served multiple
purpose. There was an idea to have sw only and hw only LEDs and sw only
and hw only LEDs.

With some heads up from Andrew from the net mailing list, it was suggested
to implement a more basic yet easy to implement system.

These API strictly work with a designated trigger to offload their
function.
This may be confused with hw blink offload but LED may have an even more
advanced configuration where the entire aspect of the trigger is
offloaded and completely handled by the hardware.

An example of this usage are PHY or switch port LEDs. Almost every of
these kind of device have multiple LED attached and provide info of the
current port state.

Currently we lack any support of them but these device always provide a
way to configure them, from basic feature like turning the LED off or no
(implemented in previous series related to this feature) or even entirely
driven by the hw and power on/off/blink based on some events, like tx/rx
traffic, ethernet cable attached, link speed of 10mbps, 100mbps, 1000mbps
or more. They can also support multiple logic like blink with traffic only
if a particular link speed is attached. (an example of this is when a LED
is designated to be turned on only with 100mbps link speed and configured
to blink on traffic and a secondary LED of a different color is present to
serve the same function but only when the link speed is 1000mbps)

These case are very common for a PHY or a switch but they were never
standardized so OEM support all kind of variant and configuration.

Again with Andrew we compared some feature and we reached a common set
of modes that are for sure present in every kind of devices.

And this concludes history and why.

What is present in this series
==============================

This patch contain the required API to support this feature, I decided on
the name of hw control to quickly describe this feature.

I documented each require API in the related Documentation for leds-class
so I think it might me redundant to expose them here. Feel free to tell me
how to improve it if anything is not clear.

On an abstract idea, this feature require this:

    - The trigger needs to make use of it, this is currently implemented
      for the netdev trigger but other trigger can be expanded if the
      device expose these function. An idea might be a anything that
      handle a storage disk and have the LED configurable to blink when
      there is any activity to the disk.

    - The LED driver needs to expose and implement these new API.

Currently a LED driver supports only a trigger. The trigger should use
the related helper to check if the LED can be driven hy hardware.

The different modes a trigger support are exposed in the kernel include
leds.h header and are used by the LED driver to understand what to do.

From a user standpoint, he should enable modes as usual from sysfs and if
anything is not supported warned.

Final words and missing piece from this series
==============================================

I honestly hope this feature can finally be implemented.

This series originally had also additional modes and logic to add to the
netdev trigger, but I decided to strip them and implement only the API
and support basic tx and rx. After this is merged, I will quickly propose
these additional modes.

Currently this is limited to tx and rx and this is what the current user
qca8k use. Marvell PHY support link and a generic blink with any kind of
traffic (both rx and tx). qca8k switch supports keeping the LED on based on
link speed.

The next series will add the concept of hw control only modes to the netdev
trigger and support for these additional modes:
- link_10
- link_100
- link_1000
- activity

The current implementation is voluntary basic and limited to put the ground
work and have something easy to implement and usable. 99% part of the logic
is done on the trigger side, leaving to the LED driver only the validating
and the apply part.

As shown for the PHY led binding, people are really intrested in this
feature as quickly after they were merged, people were already working on
adding support for it.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git/?h=ib-leds-netdev-6.5
[2] https://lore.kernel.org/lkml/20230216013230.22978-1-ansuelsmth@gmail.com/

Changes in v4:
- Added review tag from Andrew.
- Move default interval to a define to keep them synced.
- Apply suggested reword to improve Documentation rst.

Changes in v3:
- Rebased on top of net-next

Changes in v2:
- Drop helper as currently used only by one trigger
- Improve Documentation and document return error of some functions
- Squash some patch to reduce series size
- Drop trigger mode mask as currently not used
- Rework hw control validating function to a simple implementation

Changes from previous v8 series:
- Rewrite Documentation from scratch and move to separate commit
- Strip additional trigger modes (to propose in a different series)
- Strip from qca8k driver additional modes (to implement in the different
  series)
- Split the netdev chages to smaller piece to permit easier review

Changelog in the previous v8 series: (stripped of unrelated changes)
v8:
- Improve the documentation of the new feature
- Rename to a more symbolic name
- Fix some bug in netdev trigger (not using BIT())
- Add more define for qca8k-leds driver
- Drop interval support
- Fix many bugs in the validate option in the netdev trigger
v7:
- Fix qca8k leds documentation warning
- Remove RFC tag
v6:
- Back to RFC.
- Drop additional trigger
- Rework netdev trigger to support common modes used by switch and
  hardware only triggers
- Refresh qca8k leds logic and driver
v5:
- Move out of RFC. (no comments from Andrew this is the right path?)
- Fix more spelling mistake (thx Randy)
- Fix error reported by kernel test bot
- Drop the additional HW_CONTROL flag. It does simplify CONFIG
  handling and hw control should be available anyway to support
  triggers as module.
v4:
- Rework implementation and drop hw_configure logic.
  We now expand blink_set.
- Address even more spelling mistake. (thx a lot Randy)
- Drop blink option and use blink_set delay.
v3:
- Rework start/stop as Andrew asked.
- Use test_bit API to check flag passed to hw_control_configure.
- Added a new cmd to hw_control_configure to reset any active blink_mode.
- Refactor all the patches to follow this new implementation.
v2:
- Fix spelling mistake (sorry)
- Drop patch 02 "permit to declare supported offload triggers".
  Change the logic, now the LED driver declare support for them
  using the configure_offload with the cmd TRIGGER_SUPPORTED.
- Rework code to follow this new implementation.
- Update Documentation to better describe how this offload
  implementation work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: qca8k: add op to get ports netdev
Andrew Lunn [Mon, 29 May 2023 16:32:43 +0000 (18:32 +0200)]
net: dsa: qca8k: add op to get ports netdev

In order that the LED trigger can blink the switch MAC ports LED, it
needs to know the netdev associated to the port. Add the callback to
return the struct device of the netdev.

Add an helper function qca8k_phy_to_port() to convert the phy back to
dsa_port index, as we reference LED port based on the internal PHY
index and needs to be converted back.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet: dsa: qca8k: implement hw_control ops
Christian Marangi [Mon, 29 May 2023 16:32:42 +0000 (18:32 +0200)]
net: dsa: qca8k: implement hw_control ops

Implement hw_control ops to drive Switch LEDs based on hardware events.

Netdev trigger is the declared supported trigger for hw control
operation and supports the following mode:
- tx
- rx

When hw_control_set is called, LEDs are set to follow the requested
mode.
Each LEDs will blink at 4Hz by default.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: expose netdev trigger modes in linux include
Christian Marangi [Mon, 29 May 2023 16:32:41 +0000 (18:32 +0200)]
leds: trigger: netdev: expose netdev trigger modes in linux include

Expose netdev trigger modes to make them accessible by LED driver that
will support netdev trigger for hw control.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: init mode if hw control already active
Christian Marangi [Mon, 29 May 2023 16:32:40 +0000 (18:32 +0200)]
leds: trigger: netdev: init mode if hw control already active

On netdev trigger activation, hw control may be already active by
default. If this is the case and a device is actually provided by
hw_control_get_device(), init the already active mode and set the
bool to hw_control bool to true to reflect the already set mode in the
trigger_data.

Co-developed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: validate configured netdev
Andrew Lunn [Mon, 29 May 2023 16:32:39 +0000 (18:32 +0200)]
leds: trigger: netdev: validate configured netdev

The netdev which the LED should blink for is configurable in
/sys/class/led/foo/device_name. Ensure when offloading that the
configured netdev is the same as the netdev the LED is associated
with. If it is not, only perform software blinking.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: add support for LED hw control
Christian Marangi [Mon, 29 May 2023 16:32:38 +0000 (18:32 +0200)]
leds: trigger: netdev: add support for LED hw control

Add support for LED hw control for the netdev trigger.

The trigger on calling set_baseline_state to configure a new mode, will
do various check to verify if hw control can be used for the requested
mode in can_hw_control() function.

It will first check if the LED driver supports hw control for the netdev
trigger, then will use hw_control_is_supported() and finally will call
hw_control_set() to apply the requested mode.

To use such mode, interval MUST be set to the default value and net_dev
MUST be set. If one of these 2 value are not valid, hw control will
never be used and normal software fallback is used.

The default interval value is moved to a define to make sure they are
always synced.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: reject interval store for hw_control
Christian Marangi [Mon, 29 May 2023 16:32:37 +0000 (18:32 +0200)]
leds: trigger: netdev: reject interval store for hw_control

Reject interval store with hw_control enabled. It's are currently not
supported and MUST be set to the default value with hw control enabled.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: add basic check for hw control support
Christian Marangi [Mon, 29 May 2023 16:32:36 +0000 (18:32 +0200)]
leds: trigger: netdev: add basic check for hw control support

Add basic check for hw control support. Check if the required API are
defined and check if the defined trigger supported in hw control for the
LED driver match netdev.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: introduce check for possible hw control
Christian Marangi [Mon, 29 May 2023 16:32:35 +0000 (18:32 +0200)]
leds: trigger: netdev: introduce check for possible hw control

Introduce function to check if the requested mode can use hw control in
preparation for hw control support. Currently everything is handled in
software so can_hw_control will always return false.

Add knob with the new value hw_control in trigger_data struct to
set hw control possible. Useful for future implementation to implement
in set_baseline_state() the required function to set the requested mode
using LEDs hw control ops and in other function to reject set if hw
control is currently active.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: trigger: netdev: refactor code setting device name
Andrew Lunn [Mon, 29 May 2023 16:32:34 +0000 (18:32 +0200)]
leds: trigger: netdev: refactor code setting device name

Move the code into a helper, ready for it to be called at
other times. No intended behaviour change.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoDocumentation: leds: leds-class: Document new Hardware driven LEDs APIs
Christian Marangi [Mon, 29 May 2023 16:32:33 +0000 (18:32 +0200)]
Documentation: leds: leds-class: Document new Hardware driven LEDs APIs

Document new Hardware driven LEDs APIs.

Some LEDs can be programmed to be driven by hardware. This is not
limited to blink but also to turn off or on autonomously.
To support this feature, a LED needs to implement various additional
ops and needs to declare specific support for the supported triggers.

Add documentation for each required value and API to make hw control
possible and implementable by both LEDs and triggers.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: add API to get attached device for LED hw control
Andrew Lunn [Mon, 29 May 2023 16:32:32 +0000 (18:32 +0200)]
leds: add API to get attached device for LED hw control

Some specific LED triggers blink the LED based on events from a device
or subsystem.
For example, an LED could be blinked to indicate a network device is
receiving packets, or a disk is reading blocks. To correctly enable and
request the hw control of the LED, the trigger has to check if the
network interface or block device configured via a /sys/class/led file
match the one the LED driver provide for hw control for.

Provide an API call to get the device which the LED blinks for.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agoleds: add APIs for LEDs hw control
Christian Marangi [Mon, 29 May 2023 16:32:31 +0000 (18:32 +0200)]
leds: add APIs for LEDs hw control

Add an option to permit LED driver to declare support for a specific
trigger to use hw control and setup the LED to blink based on specific
provided modes.

Add APIs for LEDs hw control. These functions will be used to activate
hardware control where a LED will use the provided flags, from an
unique defined supported trigger, to setup the LED to be driven by
hardware.

Add hw_control_is_supported() to ask the LED driver if the requested
mode by the trigger are supported and the LED can be setup to follow
the requested modes.

Deactivate hardware blink control by setting brightness to LED_OFF via
the brightness_set() callback.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months agonet/netlink: fix NETLINK_LIST_MEMBERSHIPS length report
Pedro Tammela [Mon, 29 May 2023 15:33:35 +0000 (12:33 -0300)]
net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report

The current code for the length calculation wrongly truncates the reported
length of the groups array, causing an under report of the subscribed
groups. To fix this, use 'BITS_TO_BYTES()' which rounds up the
division by 8.

Fixes: b42be38b2778 ("netlink: add API to retrieve all group memberships")
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230529153335.389815-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotipc: delete tipc_mtu_bad from tipc_udp_enable
Xin Long [Mon, 29 May 2023 14:52:13 +0000 (10:52 -0400)]
tipc: delete tipc_mtu_bad from tipc_udp_enable

Since commit a4dfa72d0acd ("tipc: set default MTU for UDP media"), it's
been no longer using dev->mtu for b->mtu, and the issue described in
commit 3de81b758853 ("tipc: check minimum bearer MTU") doesn't exist
in UDP bearer any more.

Besides, dev->mtu can still be changed to a too small mtu after the UDP
bearer is created even with tipc_mtu_bad() check in tipc_udp_enable().
Note that NETDEV_CHANGEMTU event processing in tipc_l2_device_event()
doesn't really work for UDP bearer.

So this patch deletes the unnecessary tipc_mtu_bad from tipc_udp_enable.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/282f1f5cc40e6cad385aa1c60569e6c5b70e2fb3.1685371933.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'net-dsa-mv88e6xxx-add-88e6361-support'
Jakub Kicinski [Wed, 31 May 2023 06:54:35 +0000 (23:54 -0700)]
Merge branch 'net-dsa-mv88e6xxx-add-88e6361-support'

Alexis Lothoré says:

====================
net: dsa: mv88e6xxx: add 88E6361 support

This series brings initial support for Marvell 88E6361 switch.

MV88E6361 is a 8 ports switch with 5 integrated Gigabit PHYs and 3
2.5Gigabit SerDes interfaces. It is in fact a new variant in the
88E639X/88E6193X/88E6191X family with a subset of existing features:
- port 0: MII, RMII, RGMII, 1000BaseX, 2500BaseX
- port 3 to 7: triple speed internal phys
- port 9 and 10: 1000BaseX, 25000BaseX

Since said family is already well supported in mv88e6xxx driver, adding
initial support for this new switch mostly consists in finding the ID
exposed in its identification register, adding a proper description
in switch description tables in mv88e6xxx driver, and enforcing 88E6361
specificities in mv88e6393x_XXX methods.

- first 4 commits introduce an internal phy offset field for switches which
  have internal phys but not starting from port 0
- 5th commit is a fix on existing switches based on first commits
- 6th commit is a slight modification to prepare 886361 support
- last commit introduces 88E6361 support in 88E6393X family

This initial support has been tested with two samples of a custom board
with the following hardware configuration:
- a main CPU connected to MV88E6361 using port 0 as CPU port
- port 9 wired to a SFP cage
- port 10 wired to a G.Hn transceiver

The following setup was used:
PC <-ethernet-> (copper SFP) - Board 1 - (G.hn) <-phone line(RJ11)-> (G.hn) Board 2

The unit 1 has been configured to bridge SFP port and G.hn port together,
which allowed to successfully ping Board 2 from PC.
====================

Link: https://lore.kernel.org/r/20230529080246.82953-1-alexis.lothore@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: mv88e6xxx: enable support for 88E6361 switch
Alexis Lothoré [Mon, 29 May 2023 08:02:46 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: enable support for 88E6361 switch

Marvell 88E6361 is an 8-port switch derived from the
88E6393X/88E9193X/88E6191X switches family. It can benefit from the
existing mv88e6xxx driver by simply adding the proper switch description in
the driver. Main differences with other switches from this
family are:
- 8 ports exposed (instead of 11): ports 1, 2 and 8 not available
- No 5GBase-x nor SFI/USXGMII support

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: mv88e6xxx: pass mv88e6xxx_chip structure to port_max_speed_mode
Alexis Lothoré [Mon, 29 May 2023 08:02:45 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: pass mv88e6xxx_chip structure to port_max_speed_mode

Some switches families have minor differences on supported link speed for
ports. Instead of redefining a new port_max_speed_mode for each different
configuration, allow to pass mv88e6xxx_chip structure to allow
differentiating those chips by known chip id

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: mv88e6xxx: fix 88E6393X family internal phys layout
Alexis Lothoré [Mon, 29 May 2023 08:02:44 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: fix 88E6393X family internal phys layout

88E6393X/88E6193X/88E6191X switches have in fact 8 internal PHYs, but those
are not present starting at port 0: supported ports go from 1 to 8

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: mv88e6xxx: add field to specify internal phys layout
Alexis Lothoré [Mon, 29 May 2023 08:02:43 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: add field to specify internal phys layout

mv88e6xxx currently assumes that switch equipped with internal phys have
those phys mapped contiguously starting from port 0 (see
mv88e6xxx_phy_is_internal). However, some switches have internal PHYs but
NOT starting from port 0. For example 88e6393X, 88E6193X and 88E6191X have
integrated PHYs available on ports 1 to 8
To properly support this offset, add a new field to allow specifying an
internal PHYs layout. If field is not set, default layout is assumed (start
at port 0)

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: mv88e6xxx: use mv88e6xxx_phy_is_internal in mv88e6xxx_port_ppu_updates
Alexis Lothoré [Mon, 29 May 2023 08:02:42 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: use mv88e6xxx_phy_is_internal in mv88e6xxx_port_ppu_updates

Make sure to use existing helper to get internal PHYs count instead of
redoing it manually

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: mv88e6xxx: pass directly chip structure to mv88e6xxx_phy_is_internal
Alexis Lothoré [Mon, 29 May 2023 08:02:41 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: pass directly chip structure to mv88e6xxx_phy_is_internal

Since this function is a simple helper, we do not need to pass a full
dsa_switch structure, we can directly pass the mv88e6xxx_chip structure.
Doing so will allow to share this function with any other function
not manipulating dsa_switch structure but needing info about number of
internal phys

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodt-bindings: net: dsa: marvell: add MV88E6361 switch to compatibility list
Alexis Lothoré [Mon, 29 May 2023 08:02:40 +0000 (10:02 +0200)]
dt-bindings: net: dsa: marvell: add MV88E6361 switch to compatibility list

Marvell MV88E6361 is an 8-port switch derived from the
88E6393X/88E9193X/88E6191X switches family. Since its functional behavior
is very close to switches from this family, it can benefit from existing
drivers for this family, so add it to the list of compatible switches

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'add-layer-2-miss-indication-and-filtering'
Jakub Kicinski [Wed, 31 May 2023 06:37:02 +0000 (23:37 -0700)]
Merge branch 'add-layer-2-miss-indication-and-filtering'

Ido Schimmel says:

====================
Add layer 2 miss indication and filtering

tl;dr
=====

This patchset adds a single bit to the tc skb extension to indicate that
a packet encountered a layer 2 miss in the bridge and extends flower to
match on this metadata. This is required for non-DF (Designated
Forwarder) filtering in EVPN multi-homing which prevents decapsulated
BUM packets from being forwarded multiple times to the same multi-homed
host.

Background
==========

In a typical EVPN multi-homing setup each host is multi-homed using a
set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf
switches in a rack. These switches act as VTEPs and are not directly
connected (as opposed to MLAG), but can communicate with each other (as
well as with VTEPs in remote racks) via spine switches over L3.

When a host sends a BUM packet over ES1 to VTEP1, the VTEP will flood it
to other VTEPs in the network, including those connected to the host
over ES1. The receiving VTEPs must drop the packet and not forward it
back to the host. This is called "split-horizon filtering" (SPH) [1].

FRR configures SPH filtering using two tc filters. The first, an ingress
filter that matches on packets received from VTEP1 and marks them using
a fwmark (firewall mark). The second, an egress filter configured on the
LAG interface connected to the host that matches on the fwmark and drops
the packets. Example:

 # tc filter add dev vxlan0 ingress pref 1 proto all flower enc_src_ip $VTEP1_IP action skbedit mark 101
 # tc filter add dev bond0 egress pref 1 handle 101 fw action drop

Motivation
==========

For each ES, only one VTEP is elected by the control plane as the DF.
The DF is responsible for forwarding decapsulated BUM traffic to the
host over the ES. The non-DF VTEPs must drop such traffic as otherwise
the host will receive multiple copies of BUM traffic. This is called
"non-DF filtering" [2].

Filtering of multicast and broadcast traffic can be achieved using the
following flower filter:

 # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop

Unlike broadcast and multicast traffic, it is not currently possible to
filter unknown unicast traffic. The classification into unknown unicast
is performed by the bridge driver, but is not visible to other layers.

Implementation
==============

The proposed solution is to add a single bit to the tc skb extension
that is set by the bridge for packets that encountered an FDB or MDB
miss. The flower classifier is extended to be able to match on this new
metadata bit in a similar fashion to existing metadata options such as
'indev'.

A bit that is set for every flooded packet would also work, but it does
not allow us to differentiate between registered and unregistered
multicast traffic which might be useful in the future.

A relatively generic name is chosen for this bit - 'l2_miss' - to allow
its use to be extended to other layer 2 devices such as VXLAN, should a
use case arise.

With the above, the control plane can implement a non-DF filter using
the following tc filters:

 # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop
 # tc filter add dev bond0 egress pref 2 proto all flower indev vxlan0 l2_miss true action drop

The first drops broadcast and multicast traffic and the second drops
unknown unicast traffic.

Testing
=======

A test exercising the different permutations of the 'l2_miss' bit is
added in patch #8.

Patchset overview
=================

Patch #1 adds the new bit to the tc skb extension and sets it in the
bridge driver for packets that encountered a miss. The marking of the
packets and the use of this extension is protected by the
'tc_skb_ext_tc' static key in order to keep performance impact to a
minimum when the feature is not in use.

Patch #2 extends the flow dissector to dissect this information from the
tc skb extension into the 'FLOW_DISSECTOR_KEY_META' key.

Patch #3 extends the flower classifier to be able to match on the new
layer 2 miss metadata. The classifier enables the 'tc_skb_ext_tc' static
key upon the installation of the first filter that matches on 'l2_miss'
and disables the key upon the removal of the last filter that matches on
it.

Patch #4 rejects matching on the new metadata in drivers that already
support the 'FLOW_DISSECTOR_KEY_META' key.

Patches #5-#6 are small preparations in mlxsw.

Patch #7 extends mlxsw to be able to match on layer 2 miss.

Patch #8 adds a selftest.

iproute2 patches can be found here [3].

[1] https://datatracker.ietf.org/doc/html/rfc7432#section-8.3
[2] https://datatracker.ietf.org/doc/html/rfc7432#section-8.5
[3] https://github.com/idosch/iproute2/tree/submit/non_df_filter_v1
[4] https://lore.kernel.org/netdev/20230518113328.1952135-1-idosch@nvidia.com/
[5] https://lore.kernel.org/netdev/20230509070446.246088-1-idosch@nvidia.com/
====================

Link: https://lore.kernel.org/r/20230529114835.372140-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoselftests: forwarding: Add layer 2 miss test cases
Ido Schimmel [Mon, 29 May 2023 11:48:35 +0000 (14:48 +0300)]
selftests: forwarding: Add layer 2 miss test cases

Add test cases to verify that the bridge driver correctly marks layer 2
misses only when it should and that the flower classifier can match on
this metadata.

Example output:

 # ./tc_flower_l2_miss.sh
 TEST: L2 miss - Unicast                                             [ OK ]
 TEST: L2 miss - Multicast (IPv4)                                    [ OK ]
 TEST: L2 miss - Multicast (IPv6)                                    [ OK ]
 TEST: L2 miss - Link-local multicast (IPv4)                         [ OK ]
 TEST: L2 miss - Link-local multicast (IPv6)                         [ OK ]
 TEST: L2 miss - Broadcast                                           [ OK ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomlxsw: spectrum_flower: Add ability to match on layer 2 miss
Ido Schimmel [Mon, 29 May 2023 11:48:34 +0000 (14:48 +0300)]
mlxsw: spectrum_flower: Add ability to match on layer 2 miss

Add the 'fdb_miss' key element to supported key blocks and make use of
it to match on layer 2 miss.

The key is only supported on Spectrum-{2,3,4}. An error is returned for
Spectrum-1 since the key element is not present in any of its key
blocks.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomlxsw: spectrum_flower: Do not force matching on iif
Ido Schimmel [Mon, 29 May 2023 11:48:33 +0000 (14:48 +0300)]
mlxsw: spectrum_flower: Do not force matching on iif

Currently, mlxsw only supports the 'ingress_ifindex' field in the
'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add
support for the 'l2_miss' field as well. It is valid to only match on
'l2_miss' without 'ingress_ifindex', so do not force matching on it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomlxsw: spectrum_flower: Split iif parsing to a separate function
Ido Schimmel [Mon, 29 May 2023 11:48:32 +0000 (14:48 +0300)]
mlxsw: spectrum_flower: Split iif parsing to a separate function

Currently, mlxsw only supports the 'ingress_ifindex' field in the
'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add
support for the 'l2_miss' field as well. Split the parsing of the
'ingress_ifindex' field to a separate function to avoid nesting. No
functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>