platform/kernel/linux-starfive.git
Andrew Lunn [Mon, 29 May 2023 16:32:43 +0000 (18:32 +0200)]
net: dsa: qca8k: add op to get ports netdev

So that the LED trigger can blink the switch MAC port LEDs, it
needs to know the netdev associated with the port. Add the callback to
return the struct device of the netdev.

Add a helper function qca8k_phy_to_port() to convert the PHY index back to
the dsa_port index, since the LED port is referenced by the internal PHY
index and needs to be converted back.
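
The conversion can be illustrated with a minimal sketch, written in Python for brevity rather than as kernel C. It assumes internal PHY N is attached to switch port N + 1; that offset, the PHY count and the name `phy_to_port` are assumptions for illustration, not taken from the driver.

```python
# Hypothetical model of converting an internal PHY index to a DSA port index.
# Assumption: PHY 0 is wired to port 1, PHY 1 to port 2, and so on.
NUM_INTERNAL_PHYS = 5  # assumed count, for illustration only


def phy_to_port(phy: int) -> int:
    """Convert an internal PHY index back to the DSA port index."""
    if not 0 <= phy < NUM_INTERNAL_PHYS:
        raise ValueError(f"no internal PHY with index {phy}")
    return phy + 1
```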

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:42 +0000 (18:32 +0200)]
net: dsa: qca8k: implement hw_control ops

Implement hw_control ops to drive Switch LEDs based on hardware events.

The netdev trigger is declared as the supported trigger for hw control
operation, with the following modes supported:
- tx
- rx

When hw_control_set is called, LEDs are set to follow the requested
mode.
Each LED will blink at 4Hz by default.
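
As a rough model of how a driver could answer whether a requested mode can be offloaded, the sketch below checks the requested mode bits against a supported mask. The bit values, the constant names and the treatment of an empty request are assumptions for illustration; only the tx/rx restriction comes from the text above.

```python
# Hypothetical mode bits; the real values live in the kernel headers.
MODE_LINK = 1 << 0
MODE_TX = 1 << 1
MODE_RX = 1 << 2

# Per the description above, only tx and rx can be driven by hardware.
HW_SUPPORTED_MODES = MODE_TX | MODE_RX


def hw_control_is_supported(requested: int) -> bool:
    """Return True if every requested mode bit can be offloaded."""
    if requested == 0:
        # Assumption: nothing to offload when no mode is requested.
        return False
    return (requested & ~HW_SUPPORTED_MODES) == 0
```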

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:41 +0000 (18:32 +0200)]
leds: trigger: netdev: expose netdev trigger modes in linux include

Expose netdev trigger modes to make them accessible to LED drivers that
will support the netdev trigger for hw control.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:40 +0000 (18:32 +0200)]
leds: trigger: netdev: init mode if hw control already active

On netdev trigger activation, hw control may already be active by
default. If this is the case and a device is actually provided by
hw_control_get_device(), init the already active mode and set the
hw_control bool to true to reflect the already set mode in the
trigger_data.

Co-developed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 29 May 2023 16:32:39 +0000 (18:32 +0200)]
leds: trigger: netdev: validate configured netdev

The netdev which the LED should blink for is configurable in
/sys/class/leds/foo/device_name. Ensure when offloading that the
configured netdev is the same as the netdev the LED is associated
with. If it is not, only perform software blinking.
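
A minimal sketch of that comparison, assuming devices are identified by name; the function name and its string return values are hypothetical and only model the decision, not the trigger code.

```python
from typing import Optional


def blink_mode(configured_netdev: Optional[str],
               led_netdev: Optional[str]) -> str:
    """Pick hardware offload only when both devices are known and equal.

    configured_netdev: name written to the device_name sysfs file.
    led_netdev: name of the netdev the LED driver reports for this LED,
                or None when the driver reports no device.
    """
    if led_netdev is not None and configured_netdev == led_netdev:
        return "hw"
    return "sw"  # fall back to software blinking
```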

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:38 +0000 (18:32 +0200)]
leds: trigger: netdev: add support for LED hw control

Add support for LED hw control for the netdev trigger.

When set_baseline_state() is called to configure a new mode, the trigger
performs various checks in the can_hw_control() function to verify whether
hw control can be used for the requested mode.

It will first check if the LED driver supports hw control for the netdev
trigger, then will use hw_control_is_supported() and finally will call
hw_control_set() to apply the requested mode.

To use such a mode, interval MUST be set to the default value and net_dev
MUST be set. If either of these two values is not valid, hw control is
never used and the normal software fallback is used instead.

The default interval value is moved to a define to make sure they are
always synced.
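
The order of checks described above can be sketched as follows. This is a Python model under stated assumptions, not the trigger's C code: the class layout, the 50 ms default and the mode bit values are hypothetical, while the function names mirror those mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_INTERVAL_MS = 50  # assumed default, kept in one constant


@dataclass
class Led:
    hw_control_trigger: Optional[str]  # trigger the driver can offload
    supported_modes: int               # bitmask of offloadable modes

    def hw_control_is_supported(self, mode: int) -> bool:
        return (mode & ~self.supported_modes) == 0


@dataclass
class TriggerData:
    led: Led
    net_dev: Optional[str] = None
    interval_ms: int = DEFAULT_INTERVAL_MS
    mode: int = 0


def can_hw_control(td: TriggerData) -> bool:
    # 1. The LED driver must declare hw control for the netdev trigger.
    if td.led.hw_control_trigger != "netdev":
        return False
    # 2. interval must be the default and net_dev must be set.
    if td.interval_ms != DEFAULT_INTERVAL_MS or td.net_dev is None:
        return False
    # 3. The driver must support the requested mode.
    return td.led.hw_control_is_supported(td.mode)


def set_baseline_state(td: TriggerData) -> str:
    """Return which path would drive the LED for the current settings."""
    return "hw" if can_hw_control(td) else "sw"
```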

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:37 +0000 (18:32 +0200)]
leds: trigger: netdev: reject interval store for hw_control

Reject interval store with hw_control enabled. Custom intervals are
currently not supported, and the interval MUST stay at the default value
with hw control enabled.
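
A minimal sketch of such a store handler, assuming the trigger state is a plain dictionary and that the rejection is reported as a negative errno; both choices, and the specific error code, are assumptions for illustration.

```python
import errno


def interval_store(state: dict, value_ms: int) -> int:
    """Store a new blink interval unless hw control is active.

    Returns 0 on success or a negative errno on rejection.
    """
    if state["hw_control"]:
        # The hardware blinks at its own fixed rate, so refuse the write.
        return -errno.EINVAL  # assumed error code
    state["interval_ms"] = value_ms
    return 0
```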

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:36 +0000 (18:32 +0200)]
leds: trigger: netdev: add basic check for hw control support

Add basic check for hw control support. Check that the required APIs are
defined and that the trigger the LED driver declares as supported for hw
control matches netdev.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:35 +0000 (18:32 +0200)]
leds: trigger: netdev: introduce check for possible hw control

Introduce function to check if the requested mode can use hw control in
preparation for hw control support. Currently everything is handled in
software so can_hw_control will always return false.

Add a new hw_control field to the trigger_data struct to flag that hw
control is possible. It will be used by a future implementation in
set_baseline_state() to set the requested mode using the LED's hw control
ops, and in other functions to reject a set operation while hw control is
currently active.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 29 May 2023 16:32:34 +0000 (18:32 +0200)]
leds: trigger: netdev: refactor code setting device name

Move the code into a helper, ready for it to be called at
other times. No intended behaviour change.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:33 +0000 (18:32 +0200)]
Documentation: leds: leds-class: Document new Hardware driven LEDs APIs

Document new Hardware driven LEDs APIs.

Some LEDs can be programmed to be driven by hardware. This is not
limited to blink but also to turn off or on autonomously.
To support this feature, a LED needs to implement various additional
ops and needs to declare specific support for the supported triggers.

Add documentation for each required value and API to make hw control
possible and implementable by both LEDs and triggers.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 29 May 2023 16:32:32 +0000 (18:32 +0200)]
leds: add API to get attached device for LED hw control

Some specific LED triggers blink the LED based on events from a device
or subsystem.
For example, an LED could be blinked to indicate a network device is
receiving packets, or a disk is reading blocks. To correctly enable and
request hw control of the LED, the trigger has to check that the
network interface or block device configured via a /sys/class/leds file
matches the one the LED driver provides hw control for.

Provide an API call to get the device which the LED blinks for.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Marangi [Mon, 29 May 2023 16:32:31 +0000 (18:32 +0200)]
leds: add APIs for LEDs hw control

Add an option to permit an LED driver to declare support for a specific
trigger to use hw control and set up the LED to blink based on specific
provided modes.

Add APIs for LEDs hw control. These functions will be used to activate
hardware control, where an LED will use the provided flags, from a
single defined supported trigger, to set up the LED to be driven by
hardware.

Add hw_control_is_supported() to ask the LED driver if the modes
requested by the trigger are supported and the LED can be set up to follow
the requested modes.

Deactivate hardware blink control by setting brightness to LED_OFF via
the brightness_set() callback.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 29 May 2023 14:52:13 +0000 (10:52 -0400)]
tipc: delete tipc_mtu_bad from tipc_udp_enable

Since commit a4dfa72d0acd ("tipc: set default MTU for UDP media"),
dev->mtu is no longer used for b->mtu, and the issue described in
commit 3de81b758853 ("tipc: check minimum bearer MTU") no longer exists
in the UDP bearer.

Besides, dev->mtu can still be changed to a too small mtu after the UDP
bearer is created even with tipc_mtu_bad() check in tipc_udp_enable().
Note that NETDEV_CHANGEMTU event processing in tipc_l2_device_event()
doesn't really work for UDP bearer.

So this patch deletes the unnecessary tipc_mtu_bad from tipc_udp_enable.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/282f1f5cc40e6cad385aa1c60569e6c5b70e2fb3.1685371933.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 31 May 2023 06:54:35 +0000 (23:54 -0700)]
Merge branch 'net-dsa-mv88e6xxx-add-88e6361-support'

Alexis Lothoré says:

====================
net: dsa: mv88e6xxx: add 88E6361 support

This series brings initial support for Marvell 88E6361 switch.

MV88E6361 is an 8-port switch with 5 integrated Gigabit PHYs and 3
2.5Gigabit SerDes interfaces. It is in fact a new variant in the
88E6393X/88E6193X/88E6191X family with a subset of existing features:
- port 0: MII, RMII, RGMII, 1000BaseX, 2500BaseX
- port 3 to 7: triple speed internal phys
- port 9 and 10: 1000BaseX, 2500BaseX

Since said family is already well supported in mv88e6xxx driver, adding
initial support for this new switch mostly consists in finding the ID
exposed in its identification register, adding a proper description
in switch description tables in mv88e6xxx driver, and enforcing 88E6361
specificities in mv88e6393x_XXX methods.

- first 4 commits introduce an internal phy offset field for switches which
  have internal phys but not starting from port 0
- 5th commit is a fix on existing switches based on first commits
- 6th commit is a slight modification to prepare 88E6361 support
- last commit introduces 88E6361 support in 88E6393X family

This initial support has been tested with two samples of a custom board
with the following hardware configuration:
- a main CPU connected to MV88E6361 using port 0 as CPU port
- port 9 wired to a SFP cage
- port 10 wired to a G.Hn transceiver

The following setup was used:
PC <-ethernet-> (copper SFP) - Board 1 - (G.hn) <-phone line(RJ11)-> (G.hn) Board 2

Board 1 has been configured to bridge the SFP port and the G.hn port together,
which made it possible to successfully ping Board 2 from the PC.
====================

Link: https://lore.kernel.org/r/20230529080246.82953-1-alexis.lothore@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Mon, 29 May 2023 08:02:46 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: enable support for 88E6361 switch

Marvell 88E6361 is an 8-port switch derived from the
88E6393X/88E6193X/88E6191X switch family. It can benefit from the
existing mv88e6xxx driver by simply adding the proper switch description in
the driver. Main differences with other switches from this
family are:
- 8 ports exposed (instead of 11): ports 1, 2 and 8 not available
- No 5GBase-x nor SFI/USXGMII support

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Mon, 29 May 2023 08:02:45 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: pass mv88e6xxx_chip structure to port_max_speed_mode

Some switch families have minor differences in the supported link speed of
ports. Instead of defining a new port_max_speed_mode for each different
configuration, pass the mv88e6xxx_chip structure so that those chips can be
differentiated by their known chip id.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Mon, 29 May 2023 08:02:44 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: fix 88E6393X family internal phys layout

88E6393X/88E6193X/88E6191X switches have in fact 8 internal PHYs, but those
are not present starting at port 0: supported ports go from 1 to 8

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Mon, 29 May 2023 08:02:43 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: add field to specify internal phys layout

mv88e6xxx currently assumes that switches equipped with internal phys have
those phys mapped contiguously starting from port 0 (see
mv88e6xxx_phy_is_internal). However, some switches have internal PHYs but
NOT starting from port 0. For example, 88E6393X, 88E6193X and 88E6191X have
integrated PHYs available on ports 1 to 8.
To properly support this offset, add a new field to allow specifying an
internal PHYs layout. If the field is not set, the default layout is assumed
(start at port 0).
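
The effect of the new field can be sketched with a small Python model. The class and field names below are hypothetical stand-ins for the chip description, and the check is a plain range test rather than the driver's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipInfo:
    num_internal_phys: int
    # Unset (0) means the default layout: internal PHYs start at port 0.
    internal_phys_offset: int = 0


def phy_is_internal(info: ChipInfo, port: int) -> bool:
    """Return True if `port` is backed by an integrated PHY."""
    first = info.internal_phys_offset
    return first <= port < first + info.num_internal_phys


# Default layout: five internal PHYs on ports 0 to 4 (illustrative chip).
legacy = ChipInfo(num_internal_phys=5)
# Layout described above: eight internal PHYs on ports 1 to 8.
offset_family = ChipInfo(num_internal_phys=8, internal_phys_offset=1)
```

With the offset set to 1, port 0 is no longer reported as internal while port 8 is, which is the situation the text describes for the 88E6393X family.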

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Mon, 29 May 2023 08:02:42 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: use mv88e6xxx_phy_is_internal in mv88e6xxx_port_ppu_updates

Make sure to use existing helper to get internal PHYs count instead of
redoing it manually

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Mon, 29 May 2023 08:02:41 +0000 (10:02 +0200)]
net: dsa: mv88e6xxx: pass directly chip structure to mv88e6xxx_phy_is_internal

Since this function is a simple helper, we do not need to pass a full
dsa_switch structure; we can directly pass the mv88e6xxx_chip structure.
Doing so allows this function to be shared with any other function that
does not manipulate a dsa_switch structure but needs info about the number
of internal phys.

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexis Lothoré [Mon, 29 May 2023 08:02:40 +0000 (10:02 +0200)]
dt-bindings: net: dsa: marvell: add MV88E6361 switch to compatibility list

Marvell MV88E6361 is an 8-port switch derived from the
88E6393X/88E6193X/88E6191X switch family. Since its functional behavior
is very close to switches from this family, it can benefit from existing
drivers for this family, so add it to the list of compatible switches

Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 31 May 2023 06:37:02 +0000 (23:37 -0700)]
Merge branch 'add-layer-2-miss-indication-and-filtering'

Ido Schimmel says:

====================
Add layer 2 miss indication and filtering

tl;dr
=====

This patchset adds a single bit to the tc skb extension to indicate that
a packet encountered a layer 2 miss in the bridge and extends flower to
match on this metadata. This is required for non-DF (Designated
Forwarder) filtering in EVPN multi-homing which prevents decapsulated
BUM packets from being forwarded multiple times to the same multi-homed
host.

Background
==========

In a typical EVPN multi-homing setup each host is multi-homed using a
set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf
switches in a rack. These switches act as VTEPs and are not directly
connected (as opposed to MLAG), but can communicate with each other (as
well as with VTEPs in remote racks) via spine switches over L3.

When a host sends a BUM packet over ES1 to VTEP1, the VTEP will flood it
to other VTEPs in the network, including those connected to the host
over ES1. The receiving VTEPs must drop the packet and not forward it
back to the host. This is called "split-horizon filtering" (SPH) [1].

FRR configures SPH filtering using two tc filters. The first is an ingress
filter that matches on packets received from VTEP1 and marks them using
a fwmark (firewall mark). The second is an egress filter configured on the
LAG interface connected to the host that matches on the fwmark and drops
the packets. Example:

 # tc filter add dev vxlan0 ingress pref 1 proto all flower enc_src_ip $VTEP1_IP action skbedit mark 101
 # tc filter add dev bond0 egress pref 1 handle 101 fw action drop

Motivation
==========

For each ES, only one VTEP is elected by the control plane as the DF.
The DF is responsible for forwarding decapsulated BUM traffic to the
host over the ES. The non-DF VTEPs must drop such traffic as otherwise
the host will receive multiple copies of BUM traffic. This is called
"non-DF filtering" [2].

Filtering of multicast and broadcast traffic can be achieved using the
following flower filter:

 # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop
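
The value/mask pair in the filter above selects the group bit, the least significant bit of the first octet of the destination MAC, which is set for both multicast and broadcast addresses. The following Python sketch shows the masked comparison; the helper names are hypothetical.

```python
def mac_to_int(mac: str) -> int:
    """Parse 'aa:bb:cc:dd:ee:ff' into a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def masked_match(dst_mac: str, value: str, mask: str) -> bool:
    """Return True if dst_mac equals value on the bits selected by mask."""
    m = mac_to_int(mask)
    return (mac_to_int(dst_mac) & m) == (mac_to_int(value) & m)


# Value and mask taken from the flower filter above.
GROUP_VALUE = "01:00:00:00:00:00"
GROUP_MASK = "01:00:00:00:00:00"
```

Broadcast (`ff:ff:ff:ff:ff:ff`) and multicast addresses such as `01:00:5e:00:00:01` match, while a unicast address such as `00:11:22:33:44:55` does not, which is why unknown unicast needs a separate indication.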

Unlike broadcast and multicast traffic, it is not currently possible to
filter unknown unicast traffic. The classification into unknown unicast
is performed by the bridge driver, but is not visible to other layers.

Implementation
==============

The proposed solution is to add a single bit to the tc skb extension
that is set by the bridge for packets that encountered an FDB or MDB
miss. The flower classifier is extended to be able to match on this new
metadata bit in a similar fashion to existing metadata options such as
'indev'.

A bit that is set for every flooded packet would also work, but it does
not allow us to differentiate between registered and unregistered
multicast traffic which might be useful in the future.

A relatively generic name is chosen for this bit - 'l2_miss' - to allow
its use to be extended to other layer 2 devices such as VXLAN, should a
use case arise.

With the above, the control plane can implement a non-DF filter using
the following tc filters:

 # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop
 # tc filter add dev bond0 egress pref 2 proto all flower indev vxlan0 l2_miss true action drop

The first drops broadcast and multicast traffic and the second drops
unknown unicast traffic.
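
As a rough model of how the two filters combine on the egress of bond0, the sketch below evaluates them in preference order. It is a Python illustration with hypothetical names, not a description of how tc evaluates filters internally.

```python
def non_df_drop(indev: str, dst_mac: str, l2_miss: bool) -> bool:
    """Return True if a packet leaving bond0 would be dropped."""
    if indev != "vxlan0":
        return False  # both filters only match decapsulated traffic
    # pref 1: group bit set, i.e. broadcast or multicast destination.
    if int(dst_mac.split(":")[0], 16) & 0x01:
        return True
    # pref 2: unicast that missed the FDB, i.e. unknown unicast.
    return l2_miss
```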

Testing
=======

A test exercising the different permutations of the 'l2_miss' bit is
added in patch #8.

Patchset overview
=================

Patch #1 adds the new bit to the tc skb extension and sets it in the
bridge driver for packets that encountered a miss. The marking of the
packets and the use of this extension is protected by the
'tc_skb_ext_tc' static key in order to keep performance impact to a
minimum when the feature is not in use.

Patch #2 extends the flow dissector to dissect this information from the
tc skb extension into the 'FLOW_DISSECTOR_KEY_META' key.

Patch #3 extends the flower classifier to be able to match on the new
layer 2 miss metadata. The classifier enables the 'tc_skb_ext_tc' static
key upon the installation of the first filter that matches on 'l2_miss'
and disables the key upon the removal of the last filter that matches on
it.

Patch #4 rejects matching on the new metadata in drivers that already
support the 'FLOW_DISSECTOR_KEY_META' key.

Patches #5-#6 are small preparations in mlxsw.

Patch #7 extends mlxsw to be able to match on layer 2 miss.

Patch #8 adds a selftest.

iproute2 patches can be found here [3].

[1] https://datatracker.ietf.org/doc/html/rfc7432#section-8.3
[2] https://datatracker.ietf.org/doc/html/rfc7432#section-8.5
[3] https://github.com/idosch/iproute2/tree/submit/non_df_filter_v1
[4] https://lore.kernel.org/netdev/20230518113328.1952135-1-idosch@nvidia.com/
[5] https://lore.kernel.org/netdev/20230509070446.246088-1-idosch@nvidia.com/
====================

Link: https://lore.kernel.org/r/20230529114835.372140-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:35 +0000 (14:48 +0300)]
selftests: forwarding: Add layer 2 miss test cases

Add test cases to verify that the bridge driver correctly marks layer 2
misses only when it should and that the flower classifier can match on
this metadata.

Example output:

 # ./tc_flower_l2_miss.sh
 TEST: L2 miss - Unicast                                             [ OK ]
 TEST: L2 miss - Multicast (IPv4)                                    [ OK ]
 TEST: L2 miss - Multicast (IPv6)                                    [ OK ]
 TEST: L2 miss - Link-local multicast (IPv4)                         [ OK ]
 TEST: L2 miss - Link-local multicast (IPv6)                         [ OK ]
 TEST: L2 miss - Broadcast                                           [ OK ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:34 +0000 (14:48 +0300)]
mlxsw: spectrum_flower: Add ability to match on layer 2 miss

Add the 'fdb_miss' key element to supported key blocks and make use of
it to match on layer 2 miss.

The key is only supported on Spectrum-{2,3,4}. An error is returned for
Spectrum-1 since the key element is not present in any of its key
blocks.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:33 +0000 (14:48 +0300)]
mlxsw: spectrum_flower: Do not force matching on iif

Currently, mlxsw only supports the 'ingress_ifindex' field in the
'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add
support for the 'l2_miss' field as well. It is valid to only match on
'l2_miss' without 'ingress_ifindex', so do not force matching on it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:32 +0000 (14:48 +0300)]
mlxsw: spectrum_flower: Split iif parsing to a separate function

Currently, mlxsw only supports the 'ingress_ifindex' field in the
'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add
support for the 'l2_miss' field as well. Split the parsing of the
'ingress_ifindex' field to a separate function to avoid nesting. No
functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:31 +0000 (14:48 +0300)]
flow_offload: Reject matching on layer 2 miss

Adjust drivers that support the 'FLOW_DISSECTOR_KEY_META' key to reject
filters that try to match on the newly added layer 2 miss field. Add an
extack message to clearly communicate the failure reason to user space.

The following users were not patched:

1. mtk_flow_offload_replace(): Only checks that the key is present, but
   does not do anything with it.
2. mlx5_tc_ct_set_tuple_match(): Used as part of netfilter offload,
   which does not make use of the new field, unlike tc.
3. get_netdev_from_rule() in nfp: Likewise.

Example:

 # tc filter add dev swp1 egress pref 1 proto all flower skip_sw l2_miss true action drop
 Error: mlxsw_spectrum: Can't match on "l2_miss".
 We have an error talking to the kernel

Acked-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:30 +0000 (14:48 +0300)]
net/sched: flower: Allow matching on layer 2 miss

Add the 'TCA_FLOWER_L2_MISS' netlink attribute that allows user space to
match on packets that encountered a layer 2 miss. The miss indication is
set as metadata in the tc skb extension by the bridge driver upon FDB or
MDB lookup miss and dissected by the flow dissector to the
'FLOW_DISSECTOR_KEY_META' key.

The use of this skb extension is guarded by the 'tc_skb_ext_tc' static
key. As such, enable / disable this key when filters that match on layer
2 miss are added / deleted.

Tested:

 # cat tc_skb_ext_tc.py
 #!/usr/bin/env -S drgn -s vmlinux

 refcount = prog["tc_skb_ext_tc"].key.enabled.counter.value_()
 print(f"tc_skb_ext_tc reference count is {refcount}")

 # ./tc_skb_ext_tc.py
 tc_skb_ext_tc reference count is 0

 # tc filter add dev swp1 egress proto all handle 101 pref 1 flower src_mac 00:11:22:33:44:55 action drop
 # tc filter add dev swp1 egress proto all handle 102 pref 2 flower src_mac 00:11:22:33:44:55 l2_miss true action drop
 # tc filter add dev swp1 egress proto all handle 103 pref 3 flower src_mac 00:11:22:33:44:55 l2_miss false action drop

 # ./tc_skb_ext_tc.py
 tc_skb_ext_tc reference count is 2

 # tc filter replace dev swp1 egress proto all handle 102 pref 2 flower src_mac 00:01:02:03:04:05 l2_miss false action drop

 # ./tc_skb_ext_tc.py
 tc_skb_ext_tc reference count is 2

 # tc filter del dev swp1 egress proto all handle 103 pref 3 flower
 # tc filter del dev swp1 egress proto all handle 102 pref 2 flower
 # tc filter del dev swp1 egress proto all handle 101 pref 1 flower

 # ./tc_skb_ext_tc.py
 tc_skb_ext_tc reference count is 0
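
The counts in the log above are consistent with one reference per filter that matches on 'l2_miss', whether it matches 'true' or 'false'. A toy Python model of that bookkeeping follows; the class and attribute names are hypothetical, and only the net effect of a replace is modelled.

```python
class FlowerFilters:
    """Toy model of flower filters sharing one reference-counted key."""

    def __init__(self):
        self.uses_l2_miss = {}  # handle -> True if filter matches on l2_miss
        self.key_refcount = 0   # stands in for the 'tc_skb_ext_tc' count

    def add(self, handle, l2_miss=None):
        # l2_miss=None: the filter does not match on the bit at all.
        # l2_miss=True/False: the filter matches on it and needs the key.
        uses = l2_miss is not None
        self.uses_l2_miss[handle] = uses
        if uses:
            self.key_refcount += 1

    def delete(self, handle):
        if self.uses_l2_miss.pop(handle):
            self.key_refcount -= 1

    def replace(self, handle, l2_miss=None):
        # Net effect only: the old filter's reference is dropped and the
        # new filter takes its own.
        if handle in self.uses_l2_miss:
            self.delete(handle)
        self.add(handle, l2_miss)
```

Replaying the commands from the log (add 101 without the key, add 102 and 103 with it, replace 102, then delete all three) takes the model's count through the same values the drgn script reports.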

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:29 +0000 (14:48 +0300)]
flow_dissector: Dissect layer 2 miss from tc skb extension

Extend the 'FLOW_DISSECTOR_KEY_META' key with a new 'l2_miss' field and
populate it from a field with the same name in the tc skb extension.
This field is set by the bridge driver for packets that incur an FDB or
MDB miss.

The next patch will extend the flower classifier to be able to match on
layer 2 misses.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 29 May 2023 11:48:28 +0000 (14:48 +0300)]
skbuff: bridge: Add layer 2 miss indication

For EVPN non-DF (Designated Forwarder) filtering we need to be able to
prevent decapsulated traffic from being flooded to a multi-homed host.
Filtering of multicast and broadcast traffic can be achieved using the
following flower filter:

 # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop

Unlike broadcast and multicast traffic, it is not currently possible to
filter unknown unicast traffic. The classification into unknown unicast
is performed by the bridge driver, but is not visible to other layers
such as tc.

Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear
the bit whenever a packet enters the bridge (received from a bridge port
or transmitted via the bridge) and set it if the packet did not match an
FDB or MDB entry. If there is no skb extension and the bit needs to be
cleared, then do not allocate one as no extension is equivalent to the
bit being cleared. The bit is not set for broadcast packets as they
never perform a lookup and therefore never incur a miss.
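
The marking rules in the paragraph above can be sketched as a small Python model. Everything here is a simplification under stated assumptions: a single set stands in for both the FDB and the MDB, and the class and function names are hypothetical rather than the bridge driver's.

```python
from dataclasses import dataclass
from typing import Optional, Set

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class TcSkbExt:
    l2_miss: bool = False


@dataclass
class Skb:
    dst_mac: str
    ext: Optional[TcSkbExt] = None


def miss_set(skb: Skb, miss: bool, key_enabled: bool) -> None:
    if not key_enabled:
        return  # static key off: the skb is not touched at all
    if skb.ext is not None:
        skb.ext.l2_miss = miss
        return
    if not miss:
        return  # no extension already means "cleared"; do not allocate
    skb.ext = TcSkbExt(l2_miss=True)


def bridge_rx(skb: Skb, table: Set[str], key_enabled: bool = True) -> str:
    miss_set(skb, False, key_enabled)  # clear on entry to the bridge
    if skb.dst_mac == BROADCAST:
        return "flood"                 # no lookup, so never a miss
    if skb.dst_mac in table:
        return "forward"
    miss_set(skb, True, key_enabled)   # lookup failed: mark the miss
    return "flood"
```

A packet that arrives with a stale extension has its bit cleared on entry, while a packet without an extension only gets one allocated when a miss actually has to be recorded.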

A bit that is set for every flooded packet would also work for the
current use case, but it does not allow us to differentiate between
registered and unregistered multicast traffic, which might be useful in
the future.

To keep the performance impact to a minimum, the marking of packets is
guarded by the 'tc_skb_ext_tc' static key. When 'false', the skb is not
touched and an skb extension is not allocated. Instead, only a
5-byte nop is executed, as demonstrated below for the call site in
br_handle_frame().

Before the patch:

```
        memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
  c37b09:       49 c7 44 24 28 00 00    movq   $0x0,0x28(%r12)
  c37b10:       00 00

        p = br_port_get_rcu(skb->dev);
  c37b12:       49 8b 44 24 10          mov    0x10(%r12),%rax
        memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
  c37b17:       49 c7 44 24 30 00 00    movq   $0x0,0x30(%r12)
  c37b1e:       00 00
  c37b20:       49 c7 44 24 38 00 00    movq   $0x0,0x38(%r12)
  c37b27:       00 00
```

After the patch (when static key is disabled):

```
        memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
  c37c29:       49 c7 44 24 28 00 00    movq   $0x0,0x28(%r12)
  c37c30:       00 00
  c37c32:       49 8d 44 24 28          lea    0x28(%r12),%rax
  c37c37:       48 c7 40 08 00 00 00    movq   $0x0,0x8(%rax)
  c37c3e:       00
  c37c3f:       48 c7 40 10 00 00 00    movq   $0x0,0x10(%rax)
  c37c46:       00

#ifdef CONFIG_HAVE_JUMP_LABEL_HACK

static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
        asm_volatile_goto("1:"
  c37c47:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
        br_tc_skb_miss_set(skb, false);

        p = br_port_get_rcu(skb->dev);
  c37c4c:       49 8b 44 24 10          mov    0x10(%r12),%rax
```

Subsequent patches will extend the flower classifier to be able to match
on the new 'l2_miss' bit and enable / disable the static key when
filters that match on it are added / deleted.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 30 May 2023 17:32:22 +0000 (10:32 -0700)]
Merge branch 'devlink-move-port-ops-into-separate-structure'

Jiri Pirko says:

====================
devlink: move port ops into separate structure

In devlink, some objects have separate ops registered alongside the
object itself. Ports, however, have their ops in the devlink_ops structure.
For drivers that register multiple kinds of ports with different ops,
this is not convenient.

This patchset makes the following changes:
1) Introduces devlink_port_ops with functions that allow devlink port
   to be registered passing a pointer to driver port ops. (patch #1)
2) Converts drivers to define port_ops and register ports passing the
   ops pointer. (patches #2, #3, #4, #6, #8, and #9)
3) Moves ops from devlink_ops struct to devlink_port_ops.
   (patches #5, #7, #10-15)

No functional changes.
====================

Link: https://lore.kernel.org/r/20230526102841.2226553-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: save devlink_port_ops into a variable in devlink_port_function_validate()
Jiri Pirko [Fri, 26 May 2023 10:28:41 +0000 (12:28 +0200)]
devlink: save devlink_port_ops into a variable in devlink_port_function_validate()

Now when the original ops variable is removed, introduce it again
but this time for devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: move port_del() to devlink_port_ops
Jiri Pirko [Fri, 26 May 2023 10:28:40 +0000 (12:28 +0200)]
devlink: move port_del() to devlink_port_ops

Move port_del() from devlink_ops into newly introduced devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: move port_fn_state_get/set() to devlink_port_ops
Jiri Pirko [Fri, 26 May 2023 10:28:39 +0000 (12:28 +0200)]
devlink: move port_fn_state_get/set() to devlink_port_ops

Move port_fn_state_get/set() from devlink_ops into newly introduced
devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: move port_fn_migratable_get/set() to devlink_port_ops
Jiri Pirko [Fri, 26 May 2023 10:28:38 +0000 (12:28 +0200)]
devlink: move port_fn_migratable_get/set() to devlink_port_ops

Move port_fn_migratable_get/set() from devlink_ops into newly introduced
devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: move port_fn_roce_get/set() to devlink_port_ops
Jiri Pirko [Fri, 26 May 2023 10:28:37 +0000 (12:28 +0200)]
devlink: move port_fn_roce_get/set() to devlink_port_ops

Move port_fn_roce_get/set() from devlink_ops into newly introduced
devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: move port_fn_hw_addr_get/set() to devlink_port_ops
Jiri Pirko [Fri, 26 May 2023 10:28:36 +0000 (12:28 +0200)]
devlink: move port_fn_hw_addr_get/set() to devlink_port_ops

Move port_fn_hw_addr_get/set() from devlink_ops into newly introduced
devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomlx5: register devlink ports with ops
Jiri Pirko [Fri, 26 May 2023 10:28:35 +0000 (12:28 +0200)]
mlx5: register devlink ports with ops

Use the newly introduced devlink port registration function variant and
register the devlink port, passing ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agosfc: register devlink port with ops
Jiri Pirko [Fri, 26 May 2023 10:28:34 +0000 (12:28 +0200)]
sfc: register devlink port with ops

Use the newly introduced devlink port registration function variant and
register the devlink port, passing ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: move port_type_set() op into devlink_port_ops
Jiri Pirko [Fri, 26 May 2023 10:28:33 +0000 (12:28 +0200)]
devlink: move port_type_set() op into devlink_port_ops

Move port_type_set() from devlink_ops into newly introduced
devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomlx4: register devlink port with ops
Jiri Pirko [Fri, 26 May 2023 10:28:32 +0000 (12:28 +0200)]
mlx4: register devlink port with ops

Use the newly introduced devlink port registration function variant and
register the devlink port, passing ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: move port_split/unsplit() ops into devlink_port_ops
Jiri Pirko [Fri, 26 May 2023 10:28:31 +0000 (12:28 +0200)]
devlink: move port_split/unsplit() ops into devlink_port_ops

Move port_split/unsplit() from devlink_ops into newly introduced
devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonfp: devlink: register devlink port with ops
Jiri Pirko [Fri, 26 May 2023 10:28:30 +0000 (12:28 +0200)]
nfp: devlink: register devlink port with ops

Use the newly introduced devlink port registration function variant and
register the devlink port, passing ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agomlxsw_core: register devlink port with ops
Jiri Pirko [Fri, 26 May 2023 10:28:29 +0000 (12:28 +0200)]
mlxsw_core: register devlink port with ops

Use the newly introduced devlink port registration function variant and
register the devlink port, passing ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoice: register devlink port for PF with ops
Jiri Pirko [Fri, 26 May 2023 10:28:28 +0000 (12:28 +0200)]
ice: register devlink port for PF with ops

Use the newly introduced devlink port registration function variant and
register the devlink port, passing ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: introduce port ops placeholder
Jiri Pirko [Fri, 26 May 2023 10:28:27 +0000 (12:28 +0200)]
devlink: introduce port ops placeholder

In devlink, some objects have separate ops registered alongside the
object itself. Ports, however, have their ops in the devlink_ops structure.
For drivers that register multiple kinds of ports with different ops,
this is not convenient. Introduce devlink_port_ops and a set
of functions that allow drivers to pass an ops pointer during
port registration.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
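
The idea of passing an ops pointer at port registration can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
struct port, struct port_ops, port_register_with_ops() and the phys_/virt_
names are hypothetical and are not the devlink API.

```c
#include <stddef.h>

/* Hypothetical, simplified model; these are not the real devlink types. */
struct port;

struct port_ops {
	int (*split)(struct port *port, unsigned int count);
	int (*unsplit)(struct port *port);
};

struct port {
	unsigned int index;
	unsigned int lanes;
	const struct port_ops *ops;	/* per-port ops, set at registration */
};

/* Register a port and remember which ops it should use. */
static void port_register_with_ops(struct port *port, unsigned int index,
				   const struct port_ops *ops)
{
	port->index = index;
	port->lanes = 1;
	port->ops = ops;
}

/* Core-side dispatch: an op the port type does not provide is unsupported. */
static int port_split(struct port *port, unsigned int count)
{
	if (!port->ops || !port->ops->split)
		return -1;
	return port->ops->split(port, count);
}

/* One kind of port implements split ... */
static int phys_split(struct port *port, unsigned int count)
{
	port->lanes = count;
	return 0;
}

static const struct port_ops phys_port_ops = {
	.split = phys_split,
};

/* ... while another kind registers ops that leave it out. */
static const struct port_ops virt_port_ops = { 0 };
```

With per-port ops, a driver that has two kinds of ports no longer needs one
shared callback that checks the port kind before acting.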
13 months agonet: fec: remove last_bdp from fec_enet_txq_xmit_frame()
Wei Fang [Mon, 29 May 2023 02:26:15 +0000 (10:26 +0800)]
net: fec: remove last_bdp from fec_enet_txq_xmit_frame()

The last_bdp is initialized to bdp, and both last_bdp and bdp are
not changed. That is to say that last_bdp and bdp are always equal.
So bdp can be used directly.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230529022615.669589-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agor8169: check for PCI read error in probe
Heiner Kallweit [Sun, 28 May 2023 17:35:12 +0000 (19:35 +0200)]
r8169: check for PCI read error in probe

Check whether first PCI read returns 0xffffffff. Currently, if this is
the case, the user sees the following misleading message:
unknown chip XID fcf, contact r8169 maintainers (see MAINTAINERS file)

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/75b54d23-fefe-2bf4-7e80-c9d3bc91af11@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
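
A minimal sketch of such a check, assuming that an all-ones value signals a
failed read. pci_read_failed() and probe_check_first_read() are hypothetical
helpers, not r8169 functions, and -5 merely stands in for -EIO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: treat an all-ones value as a failed read. */
static bool pci_read_failed(uint32_t val)
{
	return val == UINT32_MAX;
}

/* Hypothetical probe step; reg_val stands in for the first register read. */
static int probe_check_first_read(uint32_t reg_val)
{
	if (pci_read_failed(reg_val))
		return -5;	/* stands in for -EIO */
	return 0;		/* continue with chip identification */
}
```

Failing early here lets the driver report a read error instead of trying to
decode a chip ID from an invalid value.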
13 months agodsa: lan9303: Remove stray gpiod_unexport() call
Andy Shevchenko [Sun, 28 May 2023 14:25:31 +0000 (17:25 +0300)]
dsa: lan9303: Remove stray gpiod_unexport() call

There is no gpiod_export() and gpiod_unexport() looks pretty much stray.
The gpiod_export() and gpiod_unexport() shouldn't be used in the code,
GPIO sysfs is deprecated. That said, simply drop the stray call.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230528142531.38602-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoliquidio: Use vzalloc()
Christophe JAILLET [Sat, 27 May 2023 19:40:08 +0000 (21:40 +0200)]
liquidio: Use vzalloc()

Use vzalloc() instead of hand writing it with vmalloc()+memset().
This is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/93b010824d9d92376e8d49b9eb396a0fa0c0ac80.1685216322.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
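
The same simplification can be shown with a userspace analogy, where
malloc()+memset() plays the role of vmalloc()+memset() and calloc() plays the
role of vzalloc(). The two function names below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Open-coded variant: allocate, then clear by hand. */
static void *alloc_zeroed_by_hand(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* Single-call variant: the allocator hands back zeroed memory. */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}
```

Both variants return a zero-filled buffer or NULL; the second leaves no room
to forget the clearing step.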
13 months agoMerge branch 'microchip_t1s-update-on-microchip-10base-t1s-phy-driver'
Paolo Abeni [Tue, 30 May 2023 09:50:07 +0000 (11:50 +0200)]
Merge branch 'microchip_t1s-update-on-microchip-10base-t1s-phy-driver'

Parthiban Veerasooran says:

====================
microchip_t1s: Update on Microchip 10BASE-T1S PHY driver

This patch series contains the following updates:
- Fixes on the Microchip LAN8670/1/2 10BASE-T1S PHYs support in the
  net/phy/microchip_t1s.c driver.
- Adds support for the Microchip LAN8650/1 Rev.B0 10BASE-T1S Internal
  PHYs in the net/phy/microchip_t1s.c driver.
====================

Link: https://lore.kernel.org/r/20230526152348.70781-1-Parthiban.Veerasooran@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: phy: microchip_t1s: add support for Microchip LAN865x Rev.B0 PHYs
Parthiban Veerasooran [Fri, 26 May 2023 15:23:48 +0000 (20:53 +0530)]
net: phy: microchip_t1s: add support for Microchip LAN865x Rev.B0 PHYs

Add support for the Microchip LAN865x Rev.B0 10BASE-T1S Internal PHYs
(LAN8650/1). The LAN865x combines a Media Access Controller (MAC) and an
internal 10BASE-T1S Ethernet PHY to access 10BASE‑T1S networks. As
LAN867X and LAN865X are using the same function for the read_status,
rename the function as lan86xx_read_status.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Reviewed-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: phy: microchip_t1s: remove unnecessary interrupts disabling code
Parthiban Veerasooran [Fri, 26 May 2023 15:23:47 +0000 (20:53 +0530)]
net: phy: microchip_t1s: remove unnecessary interrupts disabling code

By default, all interrupts except the Reset Complete interrupt in the
Interrupt Mask 2 Register are disabled/masked. As Reset Complete
status is already handled, it doesn't make sense to disable it.

Reviewed-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Tested-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: phy: microchip_t1s: fix reset complete status handling
Parthiban Veerasooran [Fri, 26 May 2023 15:23:46 +0000 (20:53 +0530)]
net: phy: microchip_t1s: fix reset complete status handling

As per the datasheet DS-LAN8670-1-2-60001573C.pdf, the Reset Complete
status bit in the STS2 register has to be checked before proceeding to
the initial configuration. Reading STS2 register will also clear the
Reset Complete interrupt which is non-maskable.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Reviewed-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Tested-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: phy: microchip_t1s: update LAN867x PHY supported revision number
Parthiban Veerasooran [Fri, 26 May 2023 15:23:45 +0000 (20:53 +0530)]
net: phy: microchip_t1s: update LAN867x PHY supported revision number

As per AN1699, the initial configuration in the driver applies to LAN867x
Rev.B1 hardware revision. 0x0007C160 (Rev.A0) and 0x0007C161 (Rev.B0)
were never released to production and hence don't need to be supported.

Reviewed-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: phy: microchip_t1s: replace read-modify-write code with phy_modify_mmd
Parthiban Veerasooran [Fri, 26 May 2023 15:23:44 +0000 (20:53 +0530)]
net: phy: microchip_t1s: replace read-modify-write code with phy_modify_mmd

Replace read-modify-write code in the lan867x_config_init function to
avoid handling data type mismatch and to simplify the code.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Reviewed-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
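
As a rough illustration of what a mask/set helper replaces, the sketch below
models a 16-bit register in userspace. fake_reg, rmw_open_coded() and
reg_modify() are hypothetical names; only the update rule
new = (old & ~mask) | set is meant to mirror the helper's behaviour.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit PHY register. */
static uint16_t fake_reg;

/* Open-coded read-modify-write with a temporary value. */
static void rmw_open_coded(uint16_t mask, uint16_t set)
{
	uint16_t val = fake_reg;	/* read */

	val &= (uint16_t)~mask;		/* clear the field */
	val |= set;			/* set the new bits */
	fake_reg = val;			/* write back */
}

/* Helper form: new = (old & ~mask) | set, no caller-side temporary. */
static void reg_modify(uint16_t *reg, uint16_t mask, uint16_t set)
{
	*reg = (uint16_t)((*reg & ~mask) | set);
}
```

Because the caller never holds the intermediate value, there is no
temporary whose type has to match the read and write helpers.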
13 months agonet: phy: microchip_t1s: modify driver description to be more generic
Parthiban Veerasooran [Fri, 26 May 2023 15:23:43 +0000 (20:53 +0530)]
net: phy: microchip_t1s: modify driver description to be more generic

Remove LAN867X from the driver description as this driver is common for
all the Microchip 10BASE-T1S PHYs.

Reviewed-by: Ramón Nordin Rodriguez <ramon.nordin.rodriguez@ferroamp.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoMerge branch 'microchip-dsa-driver-improvements'
Paolo Abeni [Tue, 30 May 2023 07:48:22 +0000 (09:48 +0200)]
Merge branch 'microchip-dsa-driver-improvements'

Oleksij Rempel says:

====================
Microchip DSA Driver Improvements

changes v2:
- set .max_register = U8_MAX, it should be more readable
- clarify in the RMW error handling patch, logging behavior
  expectation.

I'd like to share a set of patches for the Microchip DSA driver. These
patches were chosen from a bigger set because they are simpler and
should be easier to review. The goal is to make the code easier to read,
get rid of unused code, and handle errors better.
====================

Link: https://lore.kernel.org/r/20230526073445.668430-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: dsa: microchip: Add register access control for KSZ8873 chip
Oleksij Rempel [Fri, 26 May 2023 07:34:45 +0000 (09:34 +0200)]
net: dsa: microchip: Add register access control for KSZ8873 chip

This update introduces specific register access boundaries for the
KSZ8873 and KSZ8863 chips within the DSA Microchip driver. The outlined
ranges target global control registers, port registers, and advanced
control registers.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: dsa: microchip: ksz8: Prepare ksz8863_smi for regmap register access validation
Oleksij Rempel [Fri, 26 May 2023 07:34:44 +0000 (09:34 +0200)]
net: dsa: microchip: ksz8: Prepare ksz8863_smi for regmap register access validation

This patch prepares the ksz8863_smi part of ksz8 driver to utilize the
regmap register access validation feature.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agonet: dsa: microchip: remove ksz_port:on variable
Oleksij Rempel [Fri, 26 May 2023 07:34:43 +0000 (09:34 +0200)]
net: dsa: microchip: remove ksz_port:on variable

The only place where this variable would be set to false is the
ksz8_config_cpu_port() function. But it is done in a bogus way:

  for (i = 0; i < dev->phy_port_cnt; i++) {
          if (i == dev->phy_port_cnt) <--- will never be executed.
                  break;
          p->on = 1;

So, we never have a situation where p->on = 0. In this case, we can just
remove it.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
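
The dead branch can be reproduced in isolation. The function below is a
hypothetical reduction of the loop shape quoted in the commit message, not
driver code; it counts how often the inner condition is ever true.

```c
/* Count how often "i == cnt" holds inside "for (i = 0; i < cnt; i++)". */
static int count_dead_branch_hits(int cnt)
{
	int hits = 0;
	int i;

	for (i = 0; i < cnt; i++) {
		if (i == cnt) {		/* contradicts the loop condition */
			hits++;
			break;
		}
	}
	return hits;
}
```

Since the body only runs while i < cnt, the count stays at zero for every
cnt, which is why the flag guarded by that branch was never cleared.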
13 months agonet: dsa: microchip: add an enum for regmap widths
Vladimir Oltean [Fri, 26 May 2023 07:34:42 +0000 (09:34 +0200)]
net: dsa: microchip: add an enum for regmap widths

It is not immediately obvious that this driver allocates, via the
KSZ_REGMAP_TABLE() macro, 3 regmaps for register access: dev->regmap[0]
for 8-bit access, dev->regmap[1] for 16-bit and dev->regmap[2] for
32-bit access.

In future changes that add support for reg_fields, each field will have
to specify through which of the 3 regmaps it's going to go. Add an enum
now, to denote one of the 3 register access widths, and make the code go
through some wrapper functions for easier review and further
modification.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
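
A sketch of the enum-plus-wrapper pattern, under the assumption of three
regmaps indexed by access width. All identifiers here (enum reg_width,
struct fake_dev, dev_regmap_*()) are hypothetical; the driver's real names
may differ.

```c
/* Hypothetical names; the driver's real identifiers may differ. */
enum reg_width {
	REG_WIDTH_8,
	REG_WIDTH_16,
	REG_WIDTH_32,
	NUM_REG_WIDTHS,		/* array size, not a valid width */
};

struct fake_regmap {
	unsigned int val_bits;
};

struct fake_dev {
	struct fake_regmap regmap[NUM_REG_WIDTHS];
};

static void fake_dev_init(struct fake_dev *dev)
{
	dev->regmap[REG_WIDTH_8].val_bits = 8;
	dev->regmap[REG_WIDTH_16].val_bits = 16;
	dev->regmap[REG_WIDTH_32].val_bits = 32;
}

/* Wrappers keep the raw array index out of the callers. */
static struct fake_regmap *dev_regmap_8(struct fake_dev *dev)
{
	return &dev->regmap[REG_WIDTH_8];
}

static struct fake_regmap *dev_regmap_16(struct fake_dev *dev)
{
	return &dev->regmap[REG_WIDTH_16];
}

static struct fake_regmap *dev_regmap_32(struct fake_dev *dev)
{
	return &dev->regmap[REG_WIDTH_32];
}
```

Named indices make the width explicit at each call site, where a bare
regmap[0], regmap[1] or regmap[2] would leave the reader guessing.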
13 months agonet: dsa: microchip: improving error handling for 8-bit register RMW operations
Oleksij Rempel [Fri, 26 May 2023 07:34:41 +0000 (09:34 +0200)]
net: dsa: microchip: improving error handling for 8-bit register RMW operations

This patch refines the error handling mechanism for 8-bit register
read-modify-write operations. In case of a failure, it now logs an error
message detailing the problematic offset. This enhancement aids in
debugging by providing more precise information when these operations
encounter issues.

Furthermore, the ksz_prmw8() function has been updated to return error
values rather than void, enabling calling functions to appropriately
respond to errors.

Additionally, in case of an error that affects both the current and
future accesses, the PHY driver will log the errors consistently, akin
to the existing behavior in all ksz_read*/ksz_write* helpers.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 months agoMerge branch 'netlink-specs-add-ynl-spec-for-ovs_flow'
Jakub Kicinski [Tue, 30 May 2023 05:05:40 +0000 (22:05 -0700)]
Merge branch 'netlink-specs-add-ynl-spec-for-ovs_flow'

Donald Hunter says:

====================
netlink: specs: add ynl spec for ovs_flow

Add a ynl specification for ovs_flow. The spec is sufficient to dump ovs
flows but some attrs have been left as binary blobs because ynl doesn't
support C arrays in struct definitions yet.

Patches 1-3 add features for genetlink-legacy specs
Patch 4 is the ovs_flow netlink spec
====================

Link: https://lore.kernel.org/r/20230527133107.68161-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonetlink: specs: add ynl spec for ovs_flow
Donald Hunter [Sat, 27 May 2023 13:31:07 +0000 (14:31 +0100)]
netlink: specs: add ynl spec for ovs_flow

Add a ynl specification for ovs_flow. This spec is sufficient to dump ovs
flows. Some attrs are left as binary blobs because ynl doesn't support C
arrays in struct definitions yet.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: Support enums in struct members in genetlink-legacy
Donald Hunter [Sat, 27 May 2023 13:31:06 +0000 (14:31 +0100)]
tools: ynl: Support enums in struct members in genetlink-legacy

Support decoding scalars as enums in struct members for genetlink-legacy
specs.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agotools: ynl: Initialise fixed headers to 0 in genetlink-legacy
Donald Hunter [Sat, 27 May 2023 13:31:05 +0000 (14:31 +0100)]
tools: ynl: Initialise fixed headers to 0 in genetlink-legacy

This eliminates the need for e.g. --json '{"dp-ifindex":0}' which is not
too big a deal for ovs but will get tiresome for fixed header structs that
have many members.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodoc: ynl: Add doc attr to struct members in genetlink-legacy spec
Donald Hunter [Sat, 27 May 2023 13:31:04 +0000 (14:31 +0100)]
doc: ynl: Add doc attr to struct members in genetlink-legacy spec

Make it possible to document the meaning of struct member attributes in
genetlink-legacy specs.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agodevlink: Spelling corrections
Simon Horman [Fri, 26 May 2023 13:45:13 +0000 (15:45 +0200)]
devlink: Spelling corrections

Make some minor spelling corrections in comments.

Found by inspection.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230526-devlink-spelling-v1-1-9a3e36cdebc8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: fix signedness bug in skb_splice_from_iter()
Dan Carpenter [Fri, 26 May 2023 13:39:15 +0000 (16:39 +0300)]
net: fix signedness bug in skb_splice_from_iter()

The "len" variable needs to be signed for the error handling to work
correctly.

Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/366861a7-87c8-4bbf-9101-69dd41021d07@kili.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
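
This class of bug can be shown in a few lines. The sketch assumes a callee
that returns either a byte count or a negative error code; fake_extract()
and both checker functions are hypothetical, and -14 merely stands in for
-EFAULT.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/* Hypothetical callee: returns a byte count or a negative error code. */
static ssize_t fake_extract(int fail)
{
	return fail ? -14 : 4096;	/* -14 stands in for -EFAULT */
}

/* Buggy: stored as unsigned, a negative result becomes a huge length. */
static int error_seen_unsigned(int fail)
{
	size_t len = (size_t)fake_extract(fail);

	return len <= 0;	/* only true for 0, never for an error */
}

/* Fixed: a signed variable keeps the error visible to the check. */
static int error_seen_signed(int fail)
{
	ssize_t len = fake_extract(fail);

	return len <= 0;
}
```

With the unsigned variable the error path is skipped and the bogus length
would be used as if it were valid; the signed variant catches it.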
13 months agonet: dpaa2-mac: use correct interface to free mdiodev
Russell King (Oracle) [Fri, 26 May 2023 11:44:43 +0000 (12:44 +0100)]
net: dpaa2-mac: use correct interface to free mdiodev

Rather than using put_device(&mdiodev->dev), use the proper interface
provided to dispose of the mdiodev - that being mdio_device_free().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/E1q2VsB-008QlZ-El@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge branch 'net-pcs-add-helpers-to-xpcs-and-lynx-to-manage-mdiodev'
Jakub Kicinski [Tue, 30 May 2023 04:46:55 +0000 (21:46 -0700)]
Merge branch 'net-pcs-add-helpers-to-xpcs-and-lynx-to-manage-mdiodev'

Russell King says:

====================
net: pcs: add helpers to xpcs and lynx to manage mdiodev

This morning, we have had two instances where the destruction of the
MDIO device associated with XPCS and Lynx has been wrong. Rather than
allowing this pattern of errors to continue, let's make it easier for
driver authors to get this right by adding a helper.

The changes are essentially:

1. Add two new mdio device helpers to manage the underlying struct
   device reference count. Note that the existing mdio_device_free()
   doesn't actually free anything, it merely puts the reference count.

2. Make the existing _create() and _destroy() PCS driver methods
   increment and decrement this refcount using these helpers. This
   results in no overall change, although drivers may hang on to
   the mdio device for a few cycles longer.

3. Add _create_mdiodev() which creates the mdio device before calling
   the existing _create() method. Once the _create() method has
   returned, we put the reference count on the mdio device.

   If _create() was successful, then the reference count taken there
   will "hold" the mdio device for the lifetime of the PCS (in other
   words, until _destroy() is called.) However, if _create() failed,
   then dropping the refcount at this point will free the mdio device.

   This is the exact behaviour we desire.

4. Convert users that create a mdio device and then call the PCS's
   _create() method over to the new _create_mdiodev() method, and
   simplify the cleanup.

We also have DPAA2 and fman_memac that look up their PCS rather than
creating it. These could also drop their reference count on the MDIO
device immediately after calling lynx_pcs_create(), which would then
mean we wouldn't need lynx_get_mdio_device() and the associated
complexity to put the device in dpaa2_pcs_destroy() and pcs_put().
Note that DPAA2 bypasses the mdio device's abstractions by calling
put_device() directly.
====================

Link: https://lore.kernel.org/r/ZHCGZ8IgAAwr8bla@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
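
The reference counting described in points 1 to 3 of the cover letter can be
modelled in userspace as follows. This is a sketch under assumptions: the
fake_ types, mdiodev_new(), pcs_create() and the fail parameter are
hypothetical and only mimic the described lifetime rules.

```c
#include <stdlib.h>

/* Hypothetical userspace model; none of these are the kernel's types. */
struct fake_mdiodev {
	int refcount;
};

struct fake_pcs {
	struct fake_mdiodev *mdiodev;
};

static int live_devices;	/* devices allocated and not yet freed */

static struct fake_mdiodev *mdiodev_new(void)
{
	struct fake_mdiodev *dev = calloc(1, sizeof(*dev));

	if (dev) {
		dev->refcount = 1;	/* creator holds the first reference */
		live_devices++;
	}
	return dev;
}

static void mdiodev_get(struct fake_mdiodev *dev)
{
	dev->refcount++;
}

static void mdiodev_put(struct fake_mdiodev *dev)
{
	if (--dev->refcount == 0) {
		live_devices--;
		free(dev);
	}
}

/* _create(): takes its own reference, but only on success. */
static struct fake_pcs *pcs_create(struct fake_mdiodev *dev, int fail)
{
	struct fake_pcs *pcs = fail ? NULL : calloc(1, sizeof(*pcs));

	if (!pcs)
		return NULL;
	mdiodev_get(dev);
	pcs->mdiodev = dev;
	return pcs;
}

/* _destroy(): drops the reference taken by _create(). */
static void pcs_destroy(struct fake_pcs *pcs)
{
	mdiodev_put(pcs->mdiodev);
	free(pcs);
}

/* _create_mdiodev(): create the device, call _create(), drop our reference. */
static struct fake_pcs *pcs_create_mdiodev(int fail)
{
	struct fake_mdiodev *dev = mdiodev_new();
	struct fake_pcs *pcs;

	if (!dev)
		return NULL;
	pcs = pcs_create(dev, fail);
	mdiodev_put(dev);	/* frees dev here if pcs_create() failed */
	return pcs;
}
```

On success the device survives with exactly the reference held by the PCS;
on failure the single put releases it, so callers need no cleanup path of
their own.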
13 months agonet: enetc: use lynx_pcs_create_mdiodev()
Russell King (Oracle) [Fri, 26 May 2023 10:14:50 +0000 (11:14 +0100)]
net: enetc: use lynx_pcs_create_mdiodev()

Use the newly introduced lynx_pcs_create_mdiodev() which simplifies the
creation and destruction of the lynx PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: dsa: ocelot: use lynx_pcs_create_mdiodev()
Russell King (Oracle) [Fri, 26 May 2023 10:14:44 +0000 (11:14 +0100)]
net: dsa: ocelot: use lynx_pcs_create_mdiodev()

Use the newly introduced lynx_pcs_create_mdiodev() which simplifies the
creation and destruction of the lynx PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pcs: lynx: add lynx_pcs_create_mdiodev()
Russell King (Oracle) [Fri, 26 May 2023 10:14:39 +0000 (11:14 +0100)]
net: pcs: lynx: add lynx_pcs_create_mdiodev()

Add lynx_pcs_create_mdiodev() to simplify the creation of the mdio
device associated with lynx PCS. In order to allow lynx_pcs_destroy()
to clean this up, we need to arrange for lynx_pcs_create() to take a
refcount on the mdiodev, and lynx_pcs_destroy() to put it.

Adding the refcounting to lynx_pcs_create()..lynx_pcs_destroy() will
be transparent to existing users of these interfaces.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: stmmac: use xpcs_create_mdiodev()
Russell King (Oracle) [Fri, 26 May 2023 10:14:34 +0000 (11:14 +0100)]
net: stmmac: use xpcs_create_mdiodev()

Use the new xpcs_create_mdiodev() creator, which simplifies the
creation and destruction of the mdio device associated with xpcs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: pcs: xpcs: add xpcs_create_mdiodev()
Russell King (Oracle) [Fri, 26 May 2023 10:14:29 +0000 (11:14 +0100)]
net: pcs: xpcs: add xpcs_create_mdiodev()

Add xpcs_create_mdiodev() to simplify the creation of the mdio device
associated with the XPCS. In order to allow xpcs_destroy() to clean
this up, we need to arrange for xpcs_create() to take a refcount on
the mdiodev, and xpcs_destroy() to put it.

Adding the refcounting to xpcs_create()..xpcs_destroy() will be
transparent to existing users of these interfaces.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: mdio: add mdio_device_get() and mdio_device_put()
Russell King (Oracle) [Fri, 26 May 2023 10:14:24 +0000 (11:14 +0100)]
net: mdio: add mdio_device_get() and mdio_device_put()

Add two new operations for a mdio device to manage the refcount on the
underlying struct device. This will be used by mdio PCS drivers to
simplify the creation and destruction handling, making it easier for
users to get it correct.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Sat, 27 May 2023 00:26:00 +0000 (17:26 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-05-26

We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 76 files changed, 2729 insertions(+), 1003 deletions(-).

The main changes are:

1) Add the capability to destroy sockets in BPF through a new kfunc,
   from Aditi Ghag.

2) Support O_PATH fds in BPF_OBJ_PIN and BPF_OBJ_GET commands,
   from Andrii Nakryiko.

3) Add capability for libbpf to resize datasec maps when backed via mmap,
   from JP Kobryn.

4) Move all the test kfuncs for CI out of the kernel and into bpf_testmod,
   from Jiri Olsa.

5) Big batch of xsk selftest improvements to prep for multi-buffer testing,
   from Magnus Karlsson.

6) Show the target_{obj,btf}_id in tracing link's fdinfo and dump it
   via bpftool, from Yafang Shao.

7) Various misc BPF selftest improvements to work with upcoming LLVM 17,
   from Yonghong Song.

8) Extend bpftool to specify netdevice for resolving XDP hints,
   from Larysa Zaremba.

9) Document masking in shift operations for the insn set document,
   from Dave Thaler.

10) Extend BPF selftests to check xdp_feature support for bond driver,
    from Lorenzo Bianconi.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
  bpf: Fix bad unlock balance on freeze_mutex
  libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()
  libbpf: Ensure libbpf always opens files with O_CLOEXEC
  selftests/bpf: Check whether to run selftest
  libbpf: Change var type in datasec resize func
  bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
  libbpf: Selftests for resizing datasec maps
  libbpf: Add capability for resizing datasec maps
  selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests
  libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd
  bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
  libbpf: Start v1.3 development cycle
  bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM
  bpftool: Specify XDP Hints ifname when loading program
  selftests/bpf: Add xdp_feature selftest for bond device
  selftests/bpf: Test bpf_sock_destroy
  selftests/bpf: Add helper to get port using getsockname
  bpf: Add bpf_sock_destroy kfunc
  bpf: Add kfunc filter function to 'struct btf_kfunc_id_set'
  bpf: udp: Implement batching for sockets iterator
  ...
====================

Link: https://lore.kernel.org/r/20230526222747.17775-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months agonet: phy: broadcom: Register dummy IRQ handler
Florian Fainelli [Thu, 25 May 2023 17:59:15 +0000 (10:59 -0700)]
net: phy: broadcom: Register dummy IRQ handler

In order to have our interrupt descriptor fully set up, and in particular
the action, ensure that we register a full-fledged interrupt handler.
This also allows us to set the interrupt polarity and flow through the
same call.

This is specifically necessary for kernel/irq/pm.c::suspend_device_irq
to set the interrupt descriptor to the IRQD_WAKEUP_ARMED state and
enable the interrupt for wake-up since it was still in a disabled state.

Without an interrupt descriptor we would have run into cases where the
wake-up interrupt is not capable of waking up the system, specifically
if we resumed the system from ACPI S5 using the Ethernet PHY. In that case
the Ethernet PHY interrupt would be pending by the time the kernel
booted, which it would acknowledge but then we could never use it as
a wake-up source again.

Fixes: 8baddaa9d4ba ("net: phy: broadcom: Add support for Wake-on-LAN")
Suggested-by: Doug Berger <doug.berger@broadcom.com>
Debugged-by: Doug Berger <doug.berger@broadcom.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago tcp: remove unused TCP_SYNQ_INTERVAL definition
Neal Cardwell [Thu, 25 May 2023 14:57:36 +0000 (10:57 -0400)]
tcp: remove unused TCP_SYNQ_INTERVAL definition

Currently TCP_SYNQ_INTERVAL is defined but never used.

According to "git log -S TCP_SYNQ_INTERVAL net-next/main" it seems
the last references to TCP_SYNQ_INTERVAL were removed by 2015
commit fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago bpf: Fix bad unlock balance on freeze_mutex
Daniel Borkmann [Fri, 26 May 2023 10:13:56 +0000 (12:13 +0200)]
bpf: Fix bad unlock balance on freeze_mutex

Commit c4c84f6fb2c4 ("bpf: drop unnecessary bpf_capable() check in
BPF_MAP_FREEZE command") moved the permissions check outside of the
freeze_mutex in the map_freeze() handler. The error path still jumps
to the err_put which tries to unlock the freeze_mutex even though it
was not locked in the first place. Fix it.
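
The imbalance can be reproduced in miniature in userspace. The following
is a minimal sketch, assuming a pthread mutex in place of the kernel's
freeze_mutex; struct demo_map, demo_map_freeze() and the field names are
hypothetical stand-ins and not the kernel's map_freeze() code:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BPF map; not the kernel's struct bpf_map. */
struct demo_map {
	pthread_mutex_t freeze_mutex;
	bool writable;	/* models the permission check */
	bool frozen;
	int refcnt;	/* models the reference taken on entry */
};

/*
 * The permission check runs before the mutex is taken, so its failure
 * path must not unlock. Only the reference is dropped at err_put.
 */
int demo_map_freeze(struct demo_map *map)
{
	int err = 0;

	assert(map != NULL);
	map->refcnt++;			/* "get" the map */

	if (!map->writable) {
		err = -EPERM;
		goto err_put;		/* mutex not held: skip the unlock */
	}

	pthread_mutex_lock(&map->freeze_mutex);
	if (map->frozen)
		err = -EBUSY;
	else
		map->frozen = true;
	pthread_mutex_unlock(&map->freeze_mutex);

err_put:
	map->refcnt--;			/* "put" the map */
	return err;
}
```

The point of the sketch is that the lock and unlock calls sit together
after the early check, so no exit label ever unlocks a mutex that the
caller did not acquire.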

Fixes: c4c84f6fb2c4 ("bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command")
Reported-by: syzbot+8982e75c2878b9ffeac5@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
13 months ago libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()
Andrii Nakryiko [Thu, 25 May 2023 22:13:11 +0000 (15:13 -0700)]
libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()

Improve bpf_map__reuse_fd() logic and ensure that dup'ed map FD is
"good" (>= 3) and has O_CLOEXEC flags. Use fcntl(F_DUPFD_CLOEXEC) for
that, similarly to ensure_good_fd() helper we already use in low-level
APIs that work with bpf() syscall.
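
A minimal sketch of the fcntl(F_DUPFD_CLOEXEC) technique mentioned above.
dup_good_fd() is a hypothetical helper name used for illustration only; it
is not the libbpf implementation:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Duplicate fd onto the lowest free descriptor that is >= 3, with
 * FD_CLOEXEC set on the duplicate. The original fd stays open and
 * remains owned by the caller. Returns the new fd or -errno.
 */
int dup_good_fd(int fd)
{
	int new_fd;

	assert(fd >= 0);
	new_fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
	if (new_fd < 0)
		return -errno;
	return new_fd;
}
```

The third fcntl() argument is the lower bound for the new descriptor, so
passing 3 keeps the duplicate clear of stdin, stdout and stderr.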

Suggested-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230525221311.2136408-2-andrii@kernel.org
13 months ago libbpf: Ensure libbpf always opens files with O_CLOEXEC
Andrii Nakryiko [Thu, 25 May 2023 22:13:10 +0000 (15:13 -0700)]
libbpf: Ensure libbpf always opens files with O_CLOEXEC

Make sure that libbpf code always gets FD with O_CLOEXEC flag set,
regardless of whether the file is opened through open() or fopen(). For
the latter this means adding "e" to the mode string, which is supported since pretty
ancient glibc v2.7.
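
For illustration, the two ways of opening a file with close-on-exec set
look roughly like this. The helper names open_ro_cloexec() and
fopen_ro_cloexec() are hypothetical and do not appear in libbpf:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* open(): O_CLOEXEC sets FD_CLOEXEC atomically at open time. */
int open_ro_cloexec(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* fopen(): the glibc "e" mode flag requests O_CLOEXEC on the underlying FD. */
FILE *fopen_ro_cloexec(const char *path)
{
	assert(path != NULL);
	return fopen(path, "re");
}
```

Either way the descriptor is not inherited across execve(), which avoids
leaking it into child programs.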

Also drop the outdated TODO comment in usdt.c, which was already completed.

Suggested-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230525221311.2136408-1-andrii@kernel.org
13 months ago Merge branch 'mv88e6xxx-phylink-prepare'
David S. Miller [Fri, 26 May 2023 09:39:41 +0000 (10:39 +0100)]
Merge branch 'mv88e6xxx-phylink-prepare'

Russell King says:

====================
net: dsa: mv88e6xxx: prepare for phylink_pcs conversion

These two patches provide some preparation for converting the mv88e6xxx
DSA driver to use phylink_pcs rather than bolting the serdes bits into
the MAC calls.

In order to correctly drive mv88e6xxx hardware when the PCS code is
split, we need to force the link down while changing the configuration
of a port. This is provided for via the mac_prepare() and mac_finish()
methods, but DSA does not forward these on to DSA drivers.

Patch 1 adds support to the DSA core to forward these two methods to
DSA drivers, and patch 2 moves the code from mv88e6xxx_mac_config()
into the respective methods.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago net: dsa: mv88e6xxx: move link forcing to mac_prepare/mac_finish
Russell King (Oracle) [Thu, 25 May 2023 10:38:50 +0000 (11:38 +0100)]
net: dsa: mv88e6xxx: move link forcing to mac_prepare/mac_finish

Move the link forcing out of mac_config() and into the mac_prepare()
and mac_finish() methods. This results in no change to the order in
which these operations are performed, but does mean when we convert
mv88e6xxx to phylink_pcs support, we will continue to preserve this
ordering.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago net: dsa: add support for mac_prepare() and mac_finish() calls
Russell King (Oracle) [Thu, 25 May 2023 10:38:44 +0000 (11:38 +0100)]
net: dsa: add support for mac_prepare() and mac_finish() calls

Add DSA support for the phylink mac_prepare() and mac_finish() calls.
These were introduced as part of the PCS support to allow MACs to
perform preparatory steps prior to configuration, and finalisation
steps after the MAC and PCS have been configured.

Introducing phylink_pcs support to the mv88e6xxx DSA driver needs some
code moved out of its mac_config() stage into the mac_prepare() and
mac_finish() stages, and this commit facilitates such code in DSA
drivers.
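
The forwarding idea can be sketched in plain C as optional callbacks that
the core invokes around the configuration step. Everything below
(struct demo_switch_ops, demo_major_config(), the demo_* driver hooks) is
a simplified, hypothetical model and not the kernel's struct
dsa_switch_ops or phylink code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver ops; every hook is optional. */
struct demo_switch_ops {
	int  (*mac_prepare)(void *priv, int port);
	void (*mac_config)(void *priv, int port);
	int  (*mac_finish)(void *priv, int port);
};

/* Core side: forward each stage to the driver only if it provides a hook. */
int demo_major_config(const struct demo_switch_ops *ops, void *priv, int port)
{
	int err = 0;

	assert(ops != NULL);

	if (ops->mac_prepare)
		err = ops->mac_prepare(priv, port);	/* e.g. force link down */
	if (err)
		return err;

	if (ops->mac_config)
		ops->mac_config(priv, port);		/* reconfigure the port */

	if (ops->mac_finish)
		err = ops->mac_finish(priv, port);	/* e.g. undo the forcing */
	return err;
}

/* Example driver that records the order in which its hooks are called. */
struct demo_log {
	char calls[4];
	int n;
};

static void demo_record(void *priv, char c)
{
	struct demo_log *log = priv;

	if (log->n < 3)
		log->calls[log->n++] = c;
}

static int demo_prepare(void *priv, int port)
{
	(void)port;
	demo_record(priv, 'p');
	return 0;
}

static void demo_config(void *priv, int port)
{
	(void)port;
	demo_record(priv, 'c');
}

static int demo_finish(void *priv, int port)
{
	(void)port;
	demo_record(priv, 'f');
	return 0;
}

const struct demo_switch_ops demo_ops = {
	.mac_prepare = demo_prepare,
	.mac_config  = demo_config,
	.mac_finish  = demo_finish,
};
```

With demo_ops the hooks run in the order prepare, config, finish; a
driver that leaves the optional hooks NULL is simply skipped.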

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago net: ynl: prefix uAPI header include with uapi/
Jakub Kicinski [Wed, 24 May 2023 17:09:01 +0000 (10:09 -0700)]
net: ynl: prefix uAPI header include with uapi/

To keep things simple we used to include the uAPI header
in the kernel in the #include <linux/$family.h> format.
This works well enough, most of the genl families should
have headers in include/net/ so linux/$family.h ends up
referring to the uAPI header, anyway. And if it doesn't
no big deal, we'll just include more info than we need.

Unless, that is, there is a naming conflict. Someone recently
created include/linux/psp.h which will be a problem when
supporting the PSP protocol. (I'm talking about
work-in-progress patches, but it's just a proof that assuming
lack of name conflicts was overly optimistic.)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago sfc: handle VI shortage on ef100 by readjusting the channels
Pieter Jansen van Vuuren [Wed, 24 May 2023 09:36:38 +0000 (10:36 +0100)]
sfc: handle VI shortage on ef100 by readjusting the channels

When fewer VIs are allocated than what is allowed we can readjust
the channels by calling efx_mcdi_alloc_vis() again.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago net/core: Enable socket busy polling on -RT
Kurt Kanzenbach [Tue, 23 May 2023 11:15:18 +0000 (13:15 +0200)]
net/core: Enable socket busy polling on -RT

Busy polling is currently not allowed on PREEMPT_RT, because it disables
preemption while invoking the NAPI callback. It is not possible to acquire
sleeping locks with disabled preemption. For details see commit
20ab39d13e2e ("net/core: disable NET_RX_BUSY_POLL on PREEMPT_RT").

However, strict cyclic and/or low latency network applications may prefer busy
polling e.g., using AF_XDP instead of interrupt driven communication.

The preempt_disable() is used to prevent the poll_owner and NAPI owner from
being preempted while owning the resource, to ensure progress. Netpoll performs
busy polling in order to acquire the lock. NAPI is locked by setting the
NAPIF_STATE_SCHED flag. There is no busy polling if the flag is set and the
"owner" is preempted. Worst case is that the task owning NAPI gets preempted and
NAPI processing stalls. This can be prevented by properly prioritising the
tasks within the system.

Allow RX_BUSY_POLL on PREEMPT_RT if NETPOLL is disabled. Don't disable
preemption on PREEMPT_RT within the busy poll loop.

Tested on x86 hardware with v6.1-RT and v6.3-RT on Intel i225 (igc) with
AF_XDP/ZC sockets configured to run in busy polling mode.

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 months ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 26 May 2023 03:56:19 +0000 (20:56 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months ago Merge tag 'ib-leds-netdev-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Fri, 26 May 2023 03:37:28 +0000 (20:37 -0700)]
Merge tag 'ib-leds-netdev-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds

Lee Jones says:

====================
Immutable branch between LEDs and netdev due for the v6.5 merge window

Andrew Lunn says:

  Christian Marangi and I will be continuing the work of offloading LED
  blinking to Ethernet MAC and PHY LED controllers. The next set of
  patches is again cross subsystem, LEDs and netdev. It also requires
  some patches you have in for-leds-next:

  a286befc24e8 leds: trigger: netdev: Use mutex instead of spinlocks
  509412749002 leds: trigger: netdev: Convert device attr to macro
  0fd93ac85826 leds: trigger: netdev: Rename add namespace to netdev trigger enum modes
  eb31ca4531a0 leds: trigger: netdev: Drop NETDEV_LED_MODE_LINKUP from mode
  3fc498cf54b4 leds: trigger: netdev: Recheck NETDEV_LED_MODE_LINKUP on dev rename

  I'm assuming the new series will get merged via netdev, with your
  Acked-by. Could you create a stable branch with these patches which
  can be pulled into netdev?

* tag 'ib-leds-netdev-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds:
  leds: trigger: netdev: Use mutex instead of spinlocks
  leds: trigger: netdev: Convert device attr to macro
  leds: trigger: netdev: Rename add namespace to netdev trigger enum modes
  leds: trigger: netdev: Drop NETDEV_LED_MODE_LINKUP from mode
  leds: trigger: netdev: Recheck NETDEV_LED_MODE_LINKUP on dev rename
====================

Link: https://lore.kernel.org/r/20230525111521.GA411262@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 26 May 2023 02:56:10 +0000 (19:56 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/raw.c
  3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
  c85be08fc4fa ("raw: Stop using RTO_ONLINK.")
https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/freescale/fec_main.c
  9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values")
  144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 months ago selftests/bpf: Check whether to run selftest
Daniel Müller [Thu, 25 May 2023 23:22:48 +0000 (23:22 +0000)]
selftests/bpf: Check whether to run selftest

The sockopt test invokes test__start_subtest and then unconditionally
asserts the success. That means that even if deny-listed, any test will
still run and potentially fail.
Evaluate the return value of test__start_subtest() to achieve the
desired behavior, as other tests do.
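
The pattern of honoring the subtest filter can be sketched as follows.
demo_start_subtest(), the deny list and demo_run_all() are hypothetical
stand-ins for the selftest harness, written only to show why the return
value has to be checked before running a subtest:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deny list; the real harness builds its own from options. */
static const char *const demo_deny_list[] = { "slow_case", NULL };

/* Returns false when the named subtest is deny-listed and must be skipped. */
static bool demo_start_subtest(const char *name)
{
	for (size_t i = 0; demo_deny_list[i]; i++)
		if (strcmp(demo_deny_list[i], name) == 0)
			return false;
	return true;
}

/* Runs every subtest that the filter allows; returns how many ran. */
int demo_run_all(const char *const *names, size_t n)
{
	int ran = 0;

	assert(names != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!demo_start_subtest(names[i]))
			continue;	/* deny-listed: skip, do not run */
		ran++;			/* the subtest body would run here */
	}
	return ran;
}
```

Ignoring the return value would run the body for every name, including
the deny-listed one.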

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230525232248.640465-1-deso@posteo.net
13 months ago Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 25 May 2023 17:55:26 +0000 (10:55 -0700)]
Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth and bpf.

  Current release - regressions:

   - net: fix skb leak in __skb_tstamp_tx()

   - eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs

  Current release - new code bugs:

   - handshake:
      - fix sock->file allocation
      - fix handshake_dup() ref counting

   - bluetooth:
      - fix potential double free caused by hci_conn_unlink
      - fix UAF in hci_conn_hash_flush

  Previous releases - regressions:

   - core: fix stack overflow when LRO is disabled for virtual
     interfaces

   - tls: fix strparser rx issues

   - bpf:
      - fix many sockmap/TCP related issues
      - fix a memory leak in the LRU and LRU_PERCPU hash maps
      - init the offload table earlier

   - eth: mlx5e:
      - do as little as possible in napi poll when budget is 0
      - fix using eswitch mapping in nic mode
      - fix deadlock in tc route query code

  Previous releases - always broken:

   - udplite: fix NULL pointer dereference in __sk_mem_raise_allocated()

   - raw: fix output xfrm lookup wrt protocol

   - smc: reset connection when trying to use SMCRv2 fails

   - phy: mscc: enable VSC8501/2 RGMII RX clock

   - eth: octeontx2-pf: fix TSOv6 offload

   - eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize"

* tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
  udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated().
  net: phy: mscc: enable VSC8501/2 RGMII RX clock
  net: phy: mscc: remove unnecessary phydev locking
  net: phy: mscc: add support for VSC8501
  net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE
  net/handshake: Enable the SNI extension to work properly
  net/handshake: Unpin sock->file if a handshake is cancelled
  net/handshake: handshake_genl_notify() shouldn't ignore @flags
  net/handshake: Fix uninitialized local variable
  net/handshake: Fix handshake_dup() ref counting
  net/handshake: Remove unneeded check from handshake_dup()
  ipv6: Fix out-of-bounds access in ipv6_find_tlv()
  net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
  docs: netdev: document the existence of the mail bot
  net: fix skb leak in __skb_tstamp_tx()
  r8169: Use a raw_spinlock_t for the register locks.
  page_pool: fix inconsistency for page_pool_ring_[un]lock()
  bpf, sockmap: Test progs verifier error with latest clang
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
  bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
  ...

13 months ago libbpf: Change var type in datasec resize func
JP Kobryn [Thu, 25 May 2023 00:13:23 +0000 (17:13 -0700)]
libbpf: Change var type in datasec resize func

This changes a local variable type that stores a new array id to match
the return type of btf__add_array().

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230525001323.8554-1-inwardvessel@gmail.com
13 months ago Merge tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Thu, 25 May 2023 17:26:36 +0000 (10:26 -0700)]
Merge tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:

 - Fix power_supply_get_battery_info for devices without parent devices
   resulting in NULL pointer dereference

 - Fix desktop systems reporting to run on battery once a power-supply
   device with device scope appears (e.g. a HID keyboard with a battery)

 - Ratelimit debug print about driver not providing data

 - Fix race condition related to external_power_changed in multiple
   drivers (ab8500, axp288, bq25890, sc27xx, bq27xxx)

 - Fix LED trigger switching from blinking to solid-on when charging
   finishes

 - Fix multiple races in bq27xxx battery driver

 - mt6360: handle potential ENOMEM from devm_work_autocancel

 - sbs-charger: Fix SBS_CHARGER_STATUS_CHARGE_INHIBITED bit

 - rt9467: avoid passing 0 to dev_err_probe

* tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (21 commits)
  power: supply: Fix logic checking if system is running from battery
  power: supply: mt6360: add a check of devm_work_autocancel in mt6360_charger_probe
  power: supply: sbs-charger: Fix INHIBITED bit for Status reg
  power: supply: rt9467: Fix passing zero to 'dev_err_probe'
  power: supply: Ratelimit no data debug output
  power: supply: Fix power_supply_get_battery_info() if parent is NULL
  power: supply: bq24190: Call power_supply_changed() after updating input current
  power: supply: bq25890: Call power_supply_changed() after updating input current or voltage
  power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule()
  power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize
  power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes
  power: supply: bq27xxx: Move bq27xxx_battery_update() down
  power: supply: bq27xxx: Add cache parameter to bq27xxx_battery_current_and_status()
  power: supply: bq27xxx: Fix poll_interval handling and races on remove
  power: supply: bq27xxx: Fix I2C IRQ race on remove
  power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition
  power: supply: leds: Fix blink to LED on transition
  power: supply: sc27xx: Fix external_power_changed race
  power: supply: bq25890: Fix external_power_changed race
  power: supply: axp288_fuel_gauge: Fix external_power_changed race
  ...

13 months ago bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
Andrii Nakryiko [Wed, 24 May 2023 22:54:19 +0000 (15:54 -0700)]
bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command

It seems the extra bpf_capable() check in the BPF_MAP_FREEZE handler was
unintentionally left behind when we switched to a model in which all BPF
map operations are allowed regardless of CAP_BPF (or any other
capabilities), as long as the process obtained a BPF map FD somehow.

This patch replaces the bpf_capable() check in the BPF_MAP_FREEZE handler
with a writable-access check, given that freezing the map conceptually
modifies it: the map becomes unmodifiable for subsequent updates.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230524225421.1587859-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 months ago Merge tag 'sound-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 25 May 2023 16:48:23 +0000 (09:48 -0700)]
Merge tag 'sound-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes:

   - HD-audio runtime PM bug fix

   - A couple of HD-audio quirks

   - Fix series of ASoC Intel AVS drivers

   - ASoC DPCM fix for a bug found on new Intel systems

   - A few other ASoC device-specific small fixes"

* tag 'sound-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Enable headset on Lenovo M70/M90
  ASoC: dwc: move DMA init to snd_soc_dai_driver probe()
  ASoC: cs35l41: Fix default regmap values for some registers
  ALSA: hda: Fix unhandled register update during auto-suspend period
  ASoC: dt-bindings: tlv320aic32x4: Fix supply names
  ASoC: Intel: avs: Add missing checks on FE startup
  ASoC: Intel: avs: Fix avs_path_module::instance_id size
  ASoC: Intel: avs: Account for UID of ACPI device
  ASoC: Intel: avs: Fix declaration of enum avs_channel_config
  ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfg
  ASoC: Intel: avs: Access path components under lock
  ASoC: Intel: avs: Fix module lookup
  ALSA: hda/ca0132: add quirk for EVGA X299 DARK
  ASoC: soc-pcm: test if a BE can be prepared
  ASoC: rt5682: Disable jack detection interrupt during suspend
  ASoC: lpass: Fix for KASAN use_after_free out of bounds