Vladimir Oltean [Wed, 20 Oct 2021 17:49:55 +0000 (20:49 +0300)]
net: dsa: tag_8021q: make dsa_8021q_{rx,tx}_vid take dp as argument
Pass a single argument to dsa_8021q_rx_vid and dsa_8021q_tx_vid that
contains the necessary information from the two arguments that are
currently provided: the switch and the port number.
Also rename those functions so that they have a dsa_port_* prefix, since
they operate on a struct dsa_port *.
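A rough before/after of one call site (the new names below are
illustrative; per the description they carry a dsa_port_* prefix):

	/* before: the switch and port number passed separately */
	u16 rx_vid = dsa_8021q_rx_vid(dp->ds, dp->index);

	/* after (sketch): a single struct dsa_port * argument */
	u16 rx_vid = dsa_port_rx_vid(dp);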
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:49:54 +0000 (20:49 +0300)]
net: dsa: tag_sja1105: do not open-code dsa_switch_for_each_port
Find the remaining iterators over dst->ports that only filter for the
ports belonging to a certain switch, and replace those with the
dsa_switch_for_each_port helper that we have now.
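The conversion pattern, sketched (do_something() is a placeholder, not
an actual function):

	/* open-coded: walk the whole tree, filter by switch */
	list_for_each_entry(dp, &dst->ports, list)
		if (dp->ds == ds)
			do_something(dp);

	/* with the helper */
	dsa_switch_for_each_port(dp, ds)
		do_something(dp);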
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:49:53 +0000 (20:49 +0300)]
net: dsa: convert cross-chip notifiers to iterate using dp
The majority of cross-chip switch notifiers need to filter in some way
on the type of port: some install VLANs etc. on all cascade ports.
The difference is that the matching function, which filters by port
type, is separate from the function where the iteration happens. So this
patch needs to refactor the matching functions' prototypes as well, to
take the dp as argument.
In a future patch/series, I might convert dsa_towards_port to return a
struct dsa_port *dp too, but at the moment it is a bit entangled with
dsa_routing_port which is also used by mv88e6xxx and they both return an
int port. So keep dsa_towards_port the way it is and convert it into a
dp using dsa_to_port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:49:52 +0000 (20:49 +0300)]
net: dsa: remove gratuitous use of dsa_is_{user,dsa,cpu}_port
Find the occurrences of dsa_is_{user,dsa,cpu}_port where a struct
dsa_port *dp was already available in the function scope, and replace
them with the dsa_port_is_{user,dsa,cpu} equivalent function which uses
that dp directly and does not perform another hidden dsa_to_port().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:49:51 +0000 (20:49 +0300)]
net: dsa: do not open-code dsa_switch_for_each_port
Find the remaining iterators over dst->ports that only filter for the
ports belonging to a certain switch, and replace those with the
dsa_switch_for_each_port helper that we have now.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:49:50 +0000 (20:49 +0300)]
net: dsa: remove the "dsa_to_port in a loop" antipattern from the core
Ever since Vivien's conversion of the ds->ports array into a dst->ports
list, and the introduction of dsa_to_port, iterations through the ports
of a switch became quadratic whenever dsa_to_port was needed.
dsa_to_port can either be called directly, or indirectly through the
dsa_is_{user,cpu,dsa,unused}_port helpers.
Use the newly introduced dsa_switch_for_each_port() iteration macro
that works with the iterator variable being a struct dsa_port *dp
directly, and not an int i. It is expensive to go from i to dp, but
cheap to go from dp to i.
This macro iterates through the entire ds->dst->ports list and filters
by the ports belonging just to the switch provided as argument.
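For reference, a minimal sketch of such a macro (the in-tree definition
may differ in detail):

	#define dsa_switch_for_each_port(_dp, _ds) \
		list_for_each_entry((_dp), &(_ds)->dst->ports, list) \
			if ((_dp)->ds == (_ds))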
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:49:49 +0000 (20:49 +0300)]
net: dsa: introduce helpers for iterating through ports using dp
Since the DSA conversion from the ds->ports array into the dst->ports
list, the DSA API has encouraged driver writers, as well as the core
itself, to write inefficient code.
Currently, code that wants to filter by a specific type of port when
iterating, like {!unused, user, cpu, dsa}, uses the dsa_is_*_port helper.
Under the hood, this uses dsa_to_port which iterates again through
dst->ports. But the driver iterates through the port list already, so
the complexity is quadratic for the typical case of a single-switch
tree.
This patch introduces some iteration helpers where the iterator is
already a struct dsa_port *dp, so that the other variant of the
filtering functions, dsa_port_is_{unused,user,cpu,dsa}, can be used
directly on the iterator. This eliminates the second lookup.
These functions can be used both by the core and by drivers.
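Typical usage, sketched:

	struct dsa_port *dp;

	dsa_switch_for_each_port(dp, ds) {
		if (!dsa_port_is_user(dp))
			continue;
		/* operate on dp directly, no second dsa_to_port() lookup */
	}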
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:18:10 +0000 (12:18 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-10-20
Sudheer Mogilappagari says:
This series introduces initial support for Application Device Queues (ADQ)
in the ice driver. ADQ provides traffic isolation for application flows in
hardware and the ability to steer traffic to a given traffic class. This
helps in aligning NIC queues to application threads.
Traffic classes are configured using the mqprio framework of the tc command
and mapped to HW channels (VSIs) in the driver. The queue set of each
traffic class is managed by the corresponding VSI. Each traffic channel
can be configured with bandwidth rate limits and is offloaded
to the hardware through the mqprio framework by specifying the mode
option as 'channel' and the shaper option as 'bw_rlimit'.
Next, application flows can be steered into a given traffic class
using the "tc filter" command. The option "skip_sw hw_tc x" indicates
hw-offload of the filtering and steering of the filtered traffic into the
specified TC. Non-matching traffic flows through TC0.
When the channel configuration is removed, the queue configuration is
reset to default and filters configured on individual traffic classes
are deleted.
example:
$ ethtool -K eth0 hw-tc-offload on
Configure 3 traffic classes and map priorities 0, 1, 2 to TC0, TC1 and TC2
respectively. TC0 has 2 queues from offset 0, TC1 has 8 queues from
offset 2, and TC2 has 4 queues from offset 10. Enable hardware offload
of channels.
$ tc qdisc add dev eth0 root mqprio num_tc 3 map 0 1 2 queues \
2@0 8@2 4@10 hw 1 mode channel
$ tc qdisc show dev eth0
qdisc mqprio 8001: root tc 3 map 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0
queues:(0:1) (2:9) (10:13)
mode:channel
Configure two filters to match based on dst ipaddr, dst tcp port and
redirect to TC1 and TC2.
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 protocol ip ingress prio 1 flower\
dst_ip 192.168.1.1/32 ip_proto tcp dst_port 80\
skip_sw hw_tc 1
$ tc filter add dev eth0 protocol ip ingress prio 1 flower\
dst_ip 192.168.1.1/32 ip_proto tcp dst_port 5001\
skip_sw hw_tc 2
$ tc filter show dev eth0 ingress
Delete traffic classes configuration:
$ sudo tc qdisc del dev eth0 root
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 21 Oct 2021 11:14:30 +0000 (12:14 +0100)]
Merge branch 'mscc-ocelot-all-ports-vlan-untagged-egress'
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:58:52 +0000 (20:58 +0300)]
net: mscc: ocelot: track the port pvid using a pointer
Now that we have a list of struct ocelot_bridge_vlan entries, we can
rewrite the pvid logic to simply point to one of those structures,
instead of having a separate structure with a "bool valid".
The NULL pointer will represent the lack of a bridge pvid (not to be
confused with the lack of a hardware pvid on the port, that is present
at all times).
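Conceptually, the change amounts to something like this (a sketch; the
field name is an assumption based on the description):

	struct ocelot_port {
		...
		/* NULL: no bridge pvid; otherwise points into the
		 * list of struct ocelot_bridge_vlan entries */
		struct ocelot_bridge_vlan *pvid_vlan;
	};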
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:58:51 +0000 (20:58 +0300)]
net: mscc: ocelot: add the local station MAC addresses in VID 0
The ocelot switchdev driver does not include the CPU port in the list of
flooding destinations for unknown traffic; instead, that traffic is
supposed to match FDB entries to reach the CPU.
The addresses it installs are:
(a) the station MAC address, in ocelot_probe_port() and later during
runtime in ocelot_port_set_mac_address(). These are the VLAN-unaware
addresses. The VLAN-aware addresses are in ocelot_vlan_vid_add().
(b) multicast addresses added with dev_mc_add() (not bridge host MDB
entries) in ocelot_mc_sync().
(c) multicast destination MAC addresses for MRP in ocelot_mrp_save_mac(),
to make sure those are dropped (not forwarded) by the bridging
service, just trapped to the CPU.
So we can see that the logic has been slightly buggy ever since the
initial commit a556c76adc05 ("net: mscc: Add initial Ocelot switch
support").
This is because, when ocelot_probe_port() runs, the port pvid is 0.
Then, when we join a VLAN-aware bridge, the pvid becomes 1, we call
ocelot_port_set_mac_address(), and this learns the new MAC address in
VID 1 (it also fails to forget the old one, since it thinks it's in
VID 1, but that's not so important). Then, when we leave the VLAN-aware
bridge, the outside world is unable to ping our new MAC address because
it isn't learned in VID 0, the VLAN-unaware pvid.
[ note: this is strictly based on static analysis, I don't have hardware
to test. But there are also many more corner cases ]
The basic idea is that we should have a separation of concerns, and the
FDB entries used for standalone operation should be managed by the
driver, and the FDB entries used by the bridging service should be
managed by the bridge. So the standalone and VLAN-unaware bridge FDB
entries should not follow the bridge PVID, because that will only be
active when the bridge is VLAN-aware. And since the port pvid is
coincidentally zero at probe time, just make those entries
statically go to VID 0.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:58:50 +0000 (20:58 +0300)]
net: mscc: ocelot: allow a config where all bridge VLANs are egress-untagged
At present, the ocelot driver accepts a single egress-untagged bridge
VLAN, meaning that this sequence of operations:
ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
bridge vlan add dev swp0 vid 2 pvid untagged
fails because the bridge automatically installs VID 1 as a pvid & untagged
VLAN, and VID 2 would be the second untagged VLAN on this port. It is
necessary to delete VID 1 before proceeding to add VID 2.
This limitation comes from the fact that, when a port has an
egress-untagged VID, we operate its tag in OCELOT_PORT_TAG_NATIVE mode.
The ocelot switches do not have full flexibility and can either have one
single VID as egress-untagged, or all of them.
There are use cases for having all VLANs as egress-untagged as well, and
this patch adds support for that.
The change rewrites ocelot_port_set_native_vlan() into a more generic
ocelot_port_manage_port_tag() function. Because the software bridge's
state, transmitted to us via switchdev, can become very complex, we
don't attempt to track all possible state transitions, but instead take
a more declarative approach and just make ocelot_port_manage_port_tag()
figure out which mode to operate in:
- port is VLAN-unaware: the classified VLAN (internal, unrelated to the
802.1Q header) is not inserted into packets on egress
- port is VLAN-aware:
- port has tagged VLANs:
-> port has no untagged VLAN: set up as pure trunk
-> port has one untagged VLAN: set up as trunk port + native VLAN
-> port has more than one untagged VLAN: this is an invalid config
which is rejected by ocelot_vlan_prepare
- port has no tagged VLANs
-> set up as pure egress-untagged port
We don't keep counters for the tagged and untagged VLANs; we just count
the structures we keep.
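The decision can be sketched as follows (helper and enum names are
assumptions; the tag modes map onto the REW_TAG_CFG_TAG_CFG values
defined elsewhere in this series):

	if (!port_is_vlan_aware(port))
		mode = OCELOT_PORT_TAG_DISABLED;
	else if (!port_num_tagged_vlans(port))
		mode = OCELOT_PORT_TAG_UNTAGGED; /* pure egress-untagged */
	else if (!port_num_untagged_vlans(port))
		mode = OCELOT_PORT_TAG_TRUNK; /* pure trunk */
	else /* exactly one untagged VLAN, per ocelot_vlan_prepare */
		mode = OCELOT_PORT_TAG_NATIVE; /* trunk + native VLAN */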
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:58:49 +0000 (20:58 +0300)]
net: mscc: ocelot: convert the VLAN masks to a list
First and foremost, the driver currently allocates a constant-sized
array of 4K u32 entries (16 KB of memory) for the VLAN masks. However,
a typical application might not need so many VLANs, so if we dynamically
allocate the memory as needed, we might actually save some space.
Secondly, we'll need to keep more advanced bookkeeping of the VLANs we
have; notably, we'll have to check how many untagged and how many tagged
VLANs we have. This will have to stay in a structure, and allocating
another 16 KB array for that is again a bit too much.
So refactor the bridge VLANs into a linked list of structures.
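A sketch of the shape of one list entry, with assumed field names:

	struct ocelot_bridge_vlan {
		u16 vid;
		unsigned long portmask;	/* member ports */
		unsigned long untagged;	/* ports egressing untagged */
		struct list_head list;
	};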
The hook points inside the driver are ocelot_vlan_member_add() and
ocelot_vlan_member_del(), which previously used to operate on the
ocelot->vlan_mask[vid] array element.
ocelot_vlan_member_add() and ocelot_vlan_member_del() used to call
ocelot_vlan_member_set() to commit to the ocelot->vlan_mask.
Additionally, we had two calls to ocelot_vlan_member_set() from outside
those callers, and those were directly from ocelot_vlan_init().
Those calls do not set up bridging service VLANs; instead they:
- clear the VLAN table on reset
- set the port pvid to the value used by this driver for VLAN-unaware
standalone port operation (VID 0)
So now, when we have a structure which represents actual bridge VLANs,
VID 0 doesn't belong in that structure, since it is not part of the
bridging layer.
So delete the middle man, ocelot_vlan_member_set(), and let
ocelot_vlan_init() call ocelot_vlant_set_mask() directly; that function
forgoes any data structure and writes directly to hardware, which is all
that we need.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 20 Oct 2021 17:58:48 +0000 (20:58 +0300)]
net: mscc: ocelot: add a type definition for REW_TAG_CFG_TAG_CFG
This is a cosmetic patch which clarifies what the port tagging
options for Ocelot switches are.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kiran Patil [Fri, 15 Oct 2021 23:35:17 +0000 (16:35 -0700)]
ice: Add tc-flower filter support for channel
Add support to add/delete channel-specific filters using tc-flower.
For now, the only supported action is "skip_sw hw_tc <tc_num>".
Filter criteria are specific to a channel and can be a
combination of L3, L3+L4, or L2+L4.
Example:
MATCH criteria Action
---------------------------
src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>"
dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>"
dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>"
src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>"
src MAC + src L4 port -> Forward to "hw_tc <tc_num>"
Adding tc-flower filter for channel using "hw_tc"
-------------------------------------------------
tc qdisc add dev <ethX> clsact
The above step is only needed the first time a tc-flower filter
is added.
tc filter add dev <ethX> protocol ip ingress prio 1 flower \
dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \
skip_sw hw_tc 1
tc filter show dev <ethX> ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1
eth_type ipv4
ip_proto tcp
dst_ip 192.168.0.1
dst_port 5001
skip_sw
in_hw in_hw_count 1
Delete a specific filter:
-------------------------
tc filter del dev <ethX> ingress pref 1 handle 0x1 flower
Delete all filters:
------------------
tc filter del dev <ethX> ingress
Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Kiran Patil [Fri, 15 Oct 2021 23:35:16 +0000 (16:35 -0700)]
ice: enable ndo_setup_tc support for mqprio_qdisc
Add support in the driver for TC_QDISC_SETUP_MQPRIO. This support
enables instantiation of channels in HW using the existing MQPRIO
infrastructure, which is extended to be offloadable. This
provides a mechanism to configure a dedicated set of queues for
each TC.
Configuring channels using "tc mqprio":
--------------------------------------
tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \
queues 4@0 4@4 4@8 hw 1 mode channel
The above command configures 3 TCs having 4 queues each. "hw 1 mode
channel" implies offload of the channel configuration to HW. When the
driver processes the configuration received via "ndo_setup_tc:
QDISC_SETUP_MQPRIO", each TC maps to a HW VSI with the specified queues.
The user can optionally specify minimum and maximum bandwidth rate limits
per TC (see example below). If shaper parameters like min and/or max
bandwidth rate limits are specified, the driver configures a VSI-specific
rate limiter in HW.
Configuring channels and bandwidth shaper parameters using "tc mqprio":
----------------------------------------------------------------
tc qdisc add dev <ethX> root mqprio \
num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \
shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \
max_rate 4Gbit 5Gbit 6Gbit 7Gbit
Command to view configured TCs:
-----------------------------
tc qdisc show dev <ethX>
Deleting TCs:
------------
tc qdisc del dev <ethX> root mqprio
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Kiran Patil [Fri, 15 Oct 2021 23:35:15 +0000 (16:35 -0700)]
ice: Add infrastructure for mqprio support via ndo_setup_tc
Add infrastructure required for "ndo_setup_tc:qdisc_mqprio".
ice_vsi_setup is modified to configure traffic classes based
on mqprio data received from the stack. This includes low-level
functions to configure min, max rate-limit parameters in hardware
for traffic classes. Each traffic class gets mapped to a hardware
channel (VSI) which can be individually configured with different
bandwidth parameters.
Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Toke Høiland-Jørgensen [Tue, 19 Oct 2021 17:47:09 +0000 (19:47 +0200)]
fq_codel: generalise ce_threshold marking for subset of traffic
Commit e72aeb9ee0e3 ("fq_codel: implement L4S style ce_threshold_ect1
marking") expanded the ce_threshold feature of FQ-CoDel so it can
be applied to a subset of the traffic, using the ECT(1) bit of the ECN
field as the classifier. However, hard-coding ECT(1) as the only
classifier for this feature seems limiting, so let's expand it to be more
general.
To this end, change the parameter from a ce_threshold_ect1 boolean, to a
one-byte selector/mask pair (ce_threshold_{selector,mask}) which is applied
to the whole diffserv/ECN field in the IP header. This makes it possible to
classify packets by any value in either the ECN field or the diffserv
field. In particular, setting a selector of INET_ECN_ECT_1 and a mask of
INET_ECN_MASK corresponds to the functionality before this patch, and a
mask of ~INET_ECN_MASK allows using the selector as a straight-forward
match against a diffserv code point:
# apply ce_threshold to ECT(1) traffic
tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3
# apply ce_threshold to ECN-capable traffic marked as diffserv AF22
tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc
Regardless of the selector chosen, the normal rules for ECN-marking of
packets still apply, i.e., the flow must still declare itself ECN-capable
by setting one of the bits in the ECN field to get marked at all.
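In pseudo-C, the classification boils down to a masked compare on the
ToS byte (a sketch; the q->params field names are assumptions):

	/* tos is the full 8-bit diffserv/ECN byte from the IP header */
	if ((tos & q->params.ce_threshold_mask) ==
	    q->params.ce_threshold_selector)
		/* packet is subject to ce_threshold marking */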
v2:
- Add tc usage examples to patch description
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211019174709.69081-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Agner [Tue, 19 Oct 2021 19:16:47 +0000 (21:16 +0200)]
phy: micrel: ksz8041nl: do not use power down mode
Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
the power down mode bit (0.11). If the PHY is taken out of power down
mode in a certain temperature range, the PHY enters a weird state which
leads to continuously reporting RX errors. In that state, the MAC is not
able to receive or send any Ethernet frames and the activity LED is
constantly blinking. Since Linux is using the suspend callback when the
interface is taken down, ending up in that state can easily happen
during a normal startup.
Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
clock recovery when using power down mode. Even the latest revision (A4,
Revision ID 0x1513) seems to suffer that problem, and according to the
errata is not going to be fixed.
Remove the suspend/resume callback to avoid using the power down mode
completely.
[*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf
Fixes: 1a5465f5d6a2 ("phy/micrel: Add suspend/resume support to Micrel PHYs")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Gardner [Tue, 19 Oct 2021 18:19:50 +0000 (12:19 -0600)]
net: enetc: unmap DMA in enetc_send_cmd()
Coverity complains of a possible dereference of a null return value.
5. returned_null: kzalloc returns NULL.
6. var_assigned: Assigning: si_data = NULL return value from kzalloc.
488 si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
489 cbd.length = cpu_to_le16(data_size);
490
491 dma = dma_map_single(&priv->si->pdev->dev, si_data,
492 data_size, DMA_FROM_DEVICE);
While this kzalloc() is unlikely to fail, I did notice that the function
returned without unmapping si_data.
Fix this by refactoring the error paths and checking for kzalloc()
failure.
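The reworked flow is roughly this (a sketch with a hypothetical label,
not the exact patch):

	int err = 0;

	si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
	if (!si_data)
		return -ENOMEM;

	dma = dma_map_single(&priv->si->pdev->dev, si_data,
			     data_size, DMA_FROM_DEVICE);
	if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
		err = -ENOMEM;
		goto out_free;
	}

	/* ... issue the command, set err on failure ... */

	dma_unmap_single(&priv->si->pdev->dev, dma, data_size,
			 DMA_FROM_DEVICE);
out_free:
	kfree(si_data);
	return err;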
Fixes: 888ae5a3952ba ("net: enetc: add tc flower psfp offload driver")
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Tue, 19 Oct 2021 16:42:28 +0000 (09:42 -0700)]
net-core: use netdev_* calls for kernel messages
While loading a driver and changing the number of queues, I noticed this
message in the kernel log:
"[253489.070080] Number of in use tx queues changed invalidating tc
mappings. Priority traffic classification disabled!"
But I had no idea what interface was being talked about because this
message used pr_warn().
After investigating, it appears we can use the already defined netdev_*
helpers to create predictably formatted messages that already handle the
<unknown netdev> cases, in more of the messages in dev.c.
After this change, this message (and others) will look like this:
"[ 170.181093] ice 0000:3b:00.0 ens785f0: Number of in use tx queues
changed invalidating tc mappings. Priority traffic classification
disabled!"
One goal here was not to change the messages significantly from the
original format so as to not break users' expectations, so I just
changed messages that used pr_* and generally started with %s ==
dev->name.
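An illustrative conversion, using the message quoted above (not the
exact dev.c hunk):

	- pr_warn("Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!\n");
	+ netdev_warn(dev, "Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!\n");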
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 16:39:27 +0000 (09:39 -0700)]
batman-adv: use eth_hw_addr_set() instead of ether_addr_copy()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Convert batman from ether_addr_copy() to eth_hw_addr_set():
@@
expression dev, np;
@@
- ether_addr_copy(dev->dev_addr, np)
+ eth_hw_addr_set(dev, np)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 16:36:06 +0000 (09:36 -0700)]
mac802154: use dev_addr_set() - manual
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 16:36:05 +0000 (09:36 -0700)]
mac802154: use dev_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 16:30:07 +0000 (09:30 -0700)]
batman-adv: prepare for const netdev->dev_addr
netdev->dev_addr will be constant soon, make sure
the qualifier is propagated thru batman-adv.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Gardner [Tue, 19 Oct 2021 12:19:25 +0000 (06:19 -0600)]
soc: fsl: dpio: Unsigned compared against 0 in qbman_swp_set_irq_coalescing()
Coverity complains of unsigned compare against 0. There are 2 cases in
this function:
1821 itp = (irq_holdoff * 1000) / p->desc->qman_256_cycles_per_ns;
CID 121131 (#1 of 1): Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. itp < 0U.
1822 if (itp < 0 || itp > 4096) {
1823 max_holdoff = (p->desc->qman_256_cycles_per_ns * 4096) / 1000;
1824 pr_err("irq_holdoff must be between 0..%dus\n", max_holdoff);
1825 return -EINVAL;
1826 }
1827
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. irq_threshold < 0U.
1828 if (irq_threshold >= p->dqrr.dqrr_size || irq_threshold < 0) {
1829 pr_err("irq_threshold must be between 0..%d\n",
1830 p->dqrr.dqrr_size - 1);
1831 return -EINVAL;
1832 }
Fix this by removing the comparisons altogether, as they are incorrect:
zero is a possible value in either case. Also fix a minor comment typo
and update the two pr_err() calls to use %u formatting, as well as be
more precise regarding the exact error.
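The corrected checks, sketched (the values are unsigned, so only the
upper bounds need testing):

	if (itp > 4096) {
		...
		return -EINVAL;
	}

	if (irq_threshold >= p->dqrr.dqrr_size) {
		...
		return -EINVAL;
	}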
Fixes: ed1d2143fee5 ("soc: fsl: dpio: add support for irq coalescing per software portal")
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Roy Pledge <Roy.Pledge@nxp.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Tue, 19 Oct 2021 00:08:50 +0000 (02:08 +0200)]
net: dsa: qca8k: tidy for loop in setup and add cpu port check
Tidy and organize the qca8k setup function's multiple for loops.
Change the for loops in bridge leave/join to scan all ports and skip the
CPU port.
No functional change intended.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 20 Oct 2021 10:43:11 +0000 (11:43 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-10-19
This series contains updates to the ice driver only.
Brett implements support for ndo_set_vf_rate, allowing min_tx_rate
and max_tx_rate to be set for a VF.
Jesse updates DIM moderation to improve latency and resolves problems
with the reported rate limit and extra software-generated interrupts.
Wojciech moves a check for trusted VFs to the correct function,
disables lb_en for switchdev offloads, and refactors ethtool ops due
to differences between PF and port representor support.
Cai Huoqing utilizes the helper function devm_add_action_or_reset().
Gustavo A. R. Silva replaces allocations with devm_kcalloc() as
applicable.
Dan Carpenter propagates an error instead of returning success.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 20 Oct 2021 10:41:02 +0000 (11:41 +0100)]
Merge branch 'dev_addr-conversions-part-three'
Jakub Kicinski says:
====================
ethernet: manual netdev->dev_addr conversions (part 3)
Manual conversions of Ethernet drivers writing directly
to netdev->dev_addr (part 3 out of 3).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 15:00:11 +0000 (08:00 -0700)]
ethernet: via-velocity: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 15:00:10 +0000 (08:00 -0700)]
ethernet: via-rhine: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 15:00:09 +0000 (08:00 -0700)]
ethernet: tlan: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Read the address into an array on the stack, do the swapping, then
call eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 15:00:08 +0000 (08:00 -0700)]
ethernet: tehuti: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Break the address up into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 15:00:07 +0000 (08:00 -0700)]
ethernet: stmmac: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 15:00:06 +0000 (08:00 -0700)]
ethernet: netsec: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 20 Oct 2021 10:32:28 +0000 (11:32 +0100)]
Merge branch 'sja1105-next'
Vladimir Oltean says:
====================
New RGMII delay DT bindings for the SJA1105 DSA driver
During recent reviews I've been telling people that new MAC drivers
should adopt a certain DT binding format for RGMII delays in order to
avoid conflicting interpretations. Some suggestions were better received
than others, and it appears we are still far from a consensus.
Part of the problem seems to be that there are still drivers that apply
RGMII delays based on an incorrect interpretation of the device tree,
and these serve as a bad example for others.
I happen to maintain one of those drivers and I am able to test it, so I
figure that one of the ways in which I can make a change is to stop
providing a bad example.
Therefore, this series adds support for the "rx-internal-delay-ps" and
"tx-internal-delay-ps" properties inside sja1105 switch port DT nodes,
and if these are present, they will decide what RGMII delays the
driver applies.
The in-tree device trees are also updated to follow the new format, as
well as the schema validator.
I assume it's okay to get all changes merged in through the same tree
(net-next), although the DTS changes could be split if needed - the
driver works with or without them. There is one more DTS which should
be changed, which is in Shawn's tree but not in net-next:
https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git/tree/arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dts?h=for-next
For that, I'd have to send a separate patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 18 Oct 2021 19:29:52 +0000 (22:29 +0300)]
net: dsa: sja1105: parse {rx, tx}-internal-delay-ps properties for RGMII delays
This change does not fix any functional issue or address any real life
use case that wasn't possible before. It is just a small step in the
process of standardizing the way in which Ethernet MAC drivers may apply
RGMII delays (traditionally these have been applied by PHYs, with no
clear definition of what to do in the case of a fixed-link).
The sja1105 driver used to apply MAC-level RGMII delays on the RX data
lines when in fixed-link mode and using a phy-mode of "rgmii-rxid" or
"rgmii-id" and on the TX data lines when using "rgmii-txid" or "rgmii-id".
But the standard definitions don't say anything about behaving
differently when the port is in fixed-link vs when it isn't, and the new
device tree bindings are about having a way of applying the delays in a
way that is independent of the phy-mode and of the fixed-link property.
When the {rx,tx}-internal-delay-ps properties are present, use them;
otherwise fall back to the old behavior and warn.
One other thing to note is that the SJA1105 hardware applies a delay
value in degrees rather than in picoseconds (the delay in ps changes
depending on the frequency of the RGMII clock - 125 MHz at 1G, 25 MHz at
100M, 2.5 MHz at 10M). I assume that is fine: we calculate the phase
shift of the internal delay lines assuming that the device tree meant
gigabit, and we let the hardware scale those according to the link speed.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210723173108.459770-6-prasanna.vengateshan@microchip.com/
Link: https://patchwork.ozlabs.org/project/netdev/patch/20200616074955.GA9092@laureti-dev/#2461123
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 18 Oct 2021 19:29:51 +0000 (22:29 +0300)]
dt-bindings: net: dsa: sja1105: add {rx,tx}-internal-delay-ps
Add a schema validator to nxp,sja1105.yaml and to dsa.yaml for explicit
MAC-level RGMII delays. These properties must be per port and must be
present only for a phy-mode that represents RGMII.
We tell dsa.yaml that these port properties might be present, and we also
define their valid values for SJA1105. We create a common definition for
the RX and TX valid range, since it's quite a mouthful.
We also modify the example to include the explicit RGMII delay properties.
On the fixed-link ports (in the example, port 4), having these explicit
delays is actually mandatory, since with the new behavior the driver
otherwise shouts that it is inferring what delays to apply from the
phy-mode.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 18 Oct 2021 19:29:50 +0000 (22:29 +0300)]
dt-bindings: net: dsa: inherit the ethernet-controller DT schema
Since a switch is basically a bunch of Ethernet controllers, just
inherit the common schema for one to get stronger type validation of the
properties of a port.
For example, before this change it was valid to have a phy-mode = "xfi"
even though "xfi" is not part of ethernet-controller.yaml; now it is not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 18 Oct 2021 19:29:49 +0000 (22:29 +0300)]
dt-bindings: net: dsa: sja1105: fix example so all ports have a phy-handle or fixed-link
All ports require either a phy-handle or a fixed-link, and port 3 in the
example didn't have one. Add it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 19 Oct 2021 22:40:52 +0000 (15:40 -0700)]
Merge branch 'net-sched-fixes-after-recent-qdisc-running-changes'
Eric Dumazet says:
====================
net: sched: fixes after recent qdisc->running changes
First patch fixes a plain bug in qdisc_run_begin().
Second patch removes a pair of atomic operations, increasing performance.
====================
Link: https://lore.kernel.org/r/20211019003402.2110017-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 19 Oct 2021 00:34:02 +0000 (17:34 -0700)]
net: sched: remove one pair of atomic operations
__QDISC_STATE_RUNNING is only set/cleared from contexts owning the qdisc lock.
Thus we can use less expensive bit operations, as we were doing
before commit f9eb8aea2a1e ("net_sched: transform qdisc running bit into
a seqcount").
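Sketched (the exact field holding the bit is an assumption; the point
is the non-atomic bitops under the qdisc spinlock):

	/* qdisc lock held on both paths */
	__set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
	...
	__clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);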
Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 19 Oct 2021 00:34:01 +0000 (17:34 -0700)]
net: sched: fix logic error in qdisc_run_begin()
For a non-TCQ_F_NOLOCK qdisc, qdisc_run_begin() tries to set
__QDISC_STATE_RUNNING and should return true if the bit was not set.
test_and_set_bit() returns the old bit value, therefore we need to
invert the result.
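That is, the locked path should end up as something like (sketch):

	/* true only if we are the ones who set the bit */
	return !test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);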
Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Wed, 13 Oct 2021 08:00:12 +0000 (11:00 +0300)]
ice: fix an error code in ice_ena_vfs()
Return the error code if ice_eswitch_configure() fails. Don't return
success.
Fixes: 1c54c839935b ("ice: enable/disable switchdev when managing VFs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Gustavo A. R. Silva [Wed, 6 Oct 2021 18:09:08 +0000 (13:09 -0500)]
ice: use devm_kcalloc() instead of devm_kzalloc()
Use the 2-factor multiplication argument form, devm_kcalloc(), instead
of devm_kzalloc().
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Cai Huoqing [Wed, 22 Sep 2021 12:59:46 +0000 (20:59 +0800)]
ice: Make use of the helper function devm_add_action_or_reset()
The helper function devm_add_action_or_reset() will internally
call devm_add_action(), and if devm_add_action() fails then it will
execute the action mentioned and return the error code. So
use devm_add_action_or_reset() instead of devm_add_action()
to simplify the error handling and reduce the code.
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Wojciech Drewek [Fri, 8 Oct 2021 08:44:03 +0000 (10:44 +0200)]
ice: Refactor PR ethtool ops
This patch improves a few things:
- it fixes an issue where ethtool -i reports that the PR supports
priv-flags and tests when in fact it does not support them
- instead of using the same functions for both PF and PR ethtool ops,
this patch introduces separate ops for both cases and internal
functions with the core logic.
- it prevents accessing the VF VSI while the VF is not ready, by calling
ice_check_vf_ready_for_cfg
- all PR-specific functions in ethtool.c were moved to one place in the
file
- instead of overwriting n_priv_flags in ice_repr_get_drvinfo, the
priv-flags code was moved from __ice_get_drvinfo to ice_get_drvinfo
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Wojciech Drewek [Thu, 23 Sep 2021 12:43:48 +0000 (14:43 +0200)]
ice: Manage act flags for switchdev offloads
Currently it is not possible to set/unset lb_en and lan_en flags
for advanced rules during their creation. Both flags are enabled
by default. In case of switchdev offloads for egress traffic we
need lb_en to be disabled. Because of that, we work around it by
updating the rule immediately after its creation.
This change allows us to set/unset those flags right away and it
gets rid of the old workaround as well. Using the ice_adv_rule_flags_info
structure, we can pass info about the flags we want to be set for
a given advanced rule. Flags are stored in flags_info.act.
Values from act are used only if act_valid is set to true;
otherwise default values are used.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Wojciech Drewek [Fri, 15 Oct 2021 08:27:19 +0000 (10:27 +0200)]
ice: Forbid trusted VFs in switchdev mode
Merge issues caused the check for switchdev mode to be inserted
in the wrong place. It should be in ice_set_vf_trust, not in
ice_set_vf_mac.
Trusted VFs are forbidden in switchdev mode because they should
be configured only from the host side.
Fixes: 1c54c839935b ("ice: enable/disable switchdev when managing VFs")
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Mon, 20 Sep 2021 19:30:14 +0000 (12:30 -0700)]
ice: fix software generating extra interrupts
The driver tried to work around missing completion events that occurred
while interrupts were disabled, by triggering a software interrupt
whenever we exited polling (but we had to have polled at least once).
This was causing a *lot* of extra interrupts for some workloads like
NVMe over TCP, which resulted in regressions in performance. It was also
visible when polling didn't prevent interrupts while busy_poll was
enabled.
Fix the extra interrupts by utilizing our previously unused 3rd ITR
(interrupt throttle) index and set it to 20K interrupts per second, and
then trigger a software interrupt within that rate limit.
While here, slightly refactor the code to avoid an overwrite of a local
variable in the case of wb_en = true.
Fixes: b7306b42beaf ("ice: manage interrupts during poll exit")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Mon, 20 Sep 2021 19:30:13 +0000 (12:30 -0700)]
ice: fix rate limit update after coalesce change
If the adaptive settings are changed with
ethtool -C ethx adaptive-rx off adaptive-tx off
then the interrupt rate limit should be maintained as a user-set value,
but only if BOTH adaptive settings are off. Fix a bug where the rate
limit that was being used in adaptive mode was staying set in the
register but was not reported correctly by ethtool -c ethx. Due to long
lines, include a small refactor of the q_vector variable.
Fixes: b8b4772377dd ("ice: refactor interrupt moderation writes")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Mon, 20 Sep 2021 19:30:12 +0000 (12:30 -0700)]
ice: update dim usage and moderation
The driver was having trouble with unreliable latency when doing single
threaded ping-pong tests. This was root caused to the DIM algorithm
landing on a too slow interrupt value, which caused high latency, and it
was especially present when queues were being switched frequently by the
scheduler as happens on default setups today.
In attempting to improve this, we allow the upper rate limit for
interrupts to move to a rate limit of 4 microseconds as a max, which means
that no vector can generate more than 250,000 interrupts per second. The
old config was up to 100,000. The driver previously tried to program the
rate limit too frequently and if the receive and transmit side were both
active on the same vector, the INTRL would be set incorrectly, and this
change fixes that issue as a side effect of the redesign.
This driver will operate from now on with a slightly changed DIM table
with more emphasis towards latency sensitivity by having more table
entries with lower latency than with high latency (high being >= 64
microseconds).
The driver also resets the DIM algorithm state with a new stats set when
there is no work done and the data becomes stale (older than 1 second),
for the respective receive or transmit portion of the interrupt.
Add a new helper for setting rate limit, which will be used more
in a followup patch.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Brett Creeley [Mon, 13 Sep 2021 18:22:19 +0000 (11:22 -0700)]
ice: Add support for VF rate limiting
Implement ndo_set_vf_rate to support setting of min_tx_rate and
max_tx_rate; set the appropriate bandwidth in the scheduler for the
node representing the specified VF VSI.
Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Cai Huoqing [Mon, 18 Oct 2021 13:16:29 +0000 (21:16 +0800)]
net: ethernet: ixp4xx: Make use of dma_pool_zalloc() instead of dma_pool_alloc/memset()
Replace dma_pool_alloc()/memset() with dma_pool_zalloc()
to simplify the code.
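The pattern being replaced, sketched (pool/size names are placeholders):

	/* before */
	buf = dma_pool_alloc(pool, GFP_KERNEL, &phys);
	if (buf)
		memset(buf, 0, size);

	/* after */
	buf = dma_pool_zalloc(pool, GFP_KERNEL, &phys);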
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Thu, 14 Oct 2021 18:26:14 +0000 (20:26 +0200)]
ieee802154: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Tue, 19 Oct 2021 07:49:54 +0000 (10:49 +0300)]
devlink: Remove extra device_lock assert checks
The PCI core code in pci_call_probe() has a path that doesn't hold
the device_lock. This happens because the ->probe() is called through
the workqueue mechanism.
349 static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
350 const struct pci_device_id *id)
351 {
352
....
377 if (cpu < nr_cpu_ids)
378 error = work_on_cpu(cpu, local_pci_probe, &ddi);
Luckily enough, the core still ensures that only a single flow is
executed, so it is safe to remove the assert checks, which were anyway
added for annotation purposes.
Fixes: b88f7b1203bf ("devlink: Annotate devlink API calls")
Reported-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
luo penghao [Mon, 18 Oct 2021 08:55:13 +0000 (08:55 +0000)]
ethernet: Remove redundant statement
The variable will be assigned again later in the if condition, so
the assignment here has no effect.
drivers/net/ethernet/broadcom/tg3.c:5750:2 warning:
Value stored to 'current_link_up' is never read.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Tue, 19 Oct 2021 10:24:50 +0000 (11:24 +0100)]
net: phylink: Support disabling autonegotiation for PCS
The auto-negotiation state in the PCS as set by
phylink_mii_c22_pcs_config was previously always enabled when the
driver is configured for in-band autonegotiation, even if
autonegotiation was disabled on the interface with ethtool. Update the
code to set the BMCR_ANENABLE bit based on the interface's
autonegotiation enabled state.
Update phylink_mii_c22_pcs_get_state to not check
autonegotiation-related fields when autonegotiation is disabled.
Update phylink_mac_pcs_get_state to initialize the state based on the
interface's configured speed, duplex and pause parameters rather than
to unknown when autonegotiation is disabled, before calling the
driver's pcs_get_state functions, as they are not likely to provide
meaningful data for these fields when autonegotiation is disabled. In
this case the driver is really just filling in the link state field.
Note that in cases where there is a downstream PHY connected, such as
with SGMII and a copper PHY, the configuration set by ethtool is
handled by phy_ethtool_ksettings_set and not propagated to the PCS.
This is correct since SGMII or 1000Base-X autonegotiation with the PCS
should normally still be used even if the copper side has disabled it.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sebastian Andrzej Siewior [Tue, 19 Oct 2021 10:12:04 +0000 (12:12 +0200)]
net: sched: Allow statistics reads from softirq.
Eric reported that the rate estimator reads statistics from the softirq,
which in turn triggers a warning introduced in the statistics rework.
The warning is too cautious. The updates happen in the softirq context
so reads from softirq are fine since the writes can not be preempted.
The updates/writes happen during qdisc_run() which ensures one writer
and the softirq context.
The remaining bad context for reading statistics is hard-IRQ,
because it may preempt a writer.
Fixes: 29cbcd8582837 ("net: sched: Remove Qdisc::running sequence counter")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Tue, 19 Oct 2021 10:00:04 +0000 (11:00 +0100)]
net: phylink: rejig SFP interface selection in ksettings_set()
Commit ea269a6f7207 ("net: phylink: Update SFP selected interface on
advertising changes") added a better solution to selecting the
interface mode for SFPs using the advertisement mask. This method will
work for mvneta and mvpp2 when selecting between 2500base-X and
1000base-X without needing to use the basex helper, or indicate that
we support both 1000base-X and 2500base-X when in either of these two
interface modes.
Hence, we need to eliminate the validation prior to selecting the
interface, otherwise when we clean up mvneta's validation function, we
will end up locking to 2500base-X as we validate with an interface mode
of PHY_INTERFACE_MODE_2500BASEX.
The supported mask will already have been reduced down to the union of
support for the SFP and MAC already, so we can be confident that the
advertisement mask is already appropriately restricted. We only need to
select the appropriate interface, and then revalidate with the new
interface mode.
We get rid of the check for pl->sfp_port too, this is meaningless here
as it doesn't get cleared when a module is removed, so it doesn't
indicate if a module is present. Just rely on pl->sfp_bus.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
luo penghao [Mon, 18 Oct 2021 08:51:54 +0000 (08:51 +0000)]
e1000e: Remove redundant statement
This assignment statement is meaningless, because execution will
continue to the label "set_itr_now" regardless.
The clang_analyzer complains as follows:
drivers/net/ethernet/intel/e1000e/netdev.c:2552:3 warning:
Value stored to 'current_itr' is never read.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Oct 2021 11:46:25 +0000 (12:46 +0100)]
Merge branch 'eth_hw_addr_gen-for-switches'
Jakub Kicinski says:
====================
ethernet: add eth_hw_addr_gen() for switches
While doing the last polishing of the drivers/ethernet
changes I realized we have a handful of drivers offsetting
some base MAC addr by an id. So I decided to add a helper
for it. The helper takes care of wrapping, which is probably
not 100% necessary but seems like a good idea. And it saves
driver-side LoC (the diffstat is actually negative if we
compare against the changes I'd have to make if I were to
convert all these drivers to not operate directly on
netdev->dev_addr).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 21:10:07 +0000 (14:10 -0700)]
ethernet: sparx5: use eth_hw_addr_gen()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 21:10:06 +0000 (14:10 -0700)]
ethernet: mlxsw: use eth_hw_addr_gen()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 21:10:05 +0000 (14:10 -0700)]
ethernet: fec: use eth_hw_addr_gen()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced an rbtree for faster Ethernet address
lookup. To maintain netdev->dev_addr in this tree we need to make
all the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 21:10:04 +0000 (14:10 -0700)]
ethernet: prestera: use eth_hw_addr_gen()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Vadym and Taras report that the current behavior of the driver
is not what is expected, and that it's better to add the port id in,
like other drivers do.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 21:10:03 +0000 (14:10 -0700)]
ethernet: ocelot: use eth_hw_addr_gen()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 21:10:02 +0000 (14:10 -0700)]
ethernet: add a helper for assigning port addresses
We have 5 drivers which offset the base MAC addr by a port id.
Create a helper for them.
This helper takes care of overflows, which some drivers
did not do; please complain if that's going to break
anything!
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
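A sketch of the helper's shape, built on the kernel's existing
ether_addr_to_u64() / u64_to_ether_addr() conversions; treat it as
illustrative rather than the verbatim upstream body:
    /* Offset a base MAC address by an id; going through a u64 makes the
     * carry handling (and thus wrapping on overflow) trivial.
     */
    static inline void eth_hw_addr_gen(struct net_device *dev,
                                       const u8 *base_addr, unsigned int id)
    {
        u64 u = ether_addr_to_u64(base_addr);
        u8 addr[ETH_ALEN];

        u += id;
        u64_to_ether_addr(u, addr);
        eth_hw_addr_set(dev, addr);
    }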
David S. Miller [Tue, 19 Oct 2021 11:41:48 +0000 (12:41 +0100)]
Merge branch 'dev_addr-conversions-part-two'
Jakub Kicinski says:
====================
ethernet: manual netdev->dev_addr conversions (part 2)
Manual conversions of Ethernet drivers writing directly
to netdev->dev_addr (part 2 out of 3).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:32 +0000 (07:29 -0700)]
ethernet: smsc: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Break the address up into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
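The recurring pattern across this series, sketched with a hypothetical
register offset (MAC_ADDR_REG is illustrative, not a real smsc define):
    u8 addr[ETH_ALEN];
    int i;

    /* Read the MAC from device registers into a stack buffer, then let
     * the helper be the only writer of netdev->dev_addr.
     */
    for (i = 0; i < ETH_ALEN; i++)
        addr[i] = ioread8(ioaddr + MAC_ADDR_REG + i);
    eth_hw_addr_set(ndev, addr);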
Jakub Kicinski [Mon, 18 Oct 2021 14:29:31 +0000 (07:29 -0700)]
ethernet: smc91x: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:30 +0000 (07:29 -0700)]
ethernet: sis900: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:29 +0000 (07:29 -0700)]
ethernet: sis190: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:28 +0000 (07:29 -0700)]
ethernet: sxgbe: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:27 +0000 (07:29 -0700)]
ethernet: rocker: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:26 +0000 (07:29 -0700)]
ethernet: renesas: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Break the address up into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:25 +0000 (07:29 -0700)]
ethernet: r8169: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:24 +0000 (07:29 -0700)]
ethernet: netxen: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Invert the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
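The netxen case differs only in byte order; a sketch (mac_bytes stands
in for whatever buffer the firmware fills):
    u8 addr[ETH_ALEN];
    int i;

    /* The device hands the address back in reverse order, so invert it
     * into a stack array before calling the helper.
     */
    for (i = 0; i < ETH_ALEN; i++)
        addr[i] = mac_bytes[ETH_ALEN - 1 - i];
    eth_hw_addr_set(netdev, addr);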
Jakub Kicinski [Mon, 18 Oct 2021 14:29:23 +0000 (07:29 -0700)]
ethernet: lpc: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:22 +0000 (07:29 -0700)]
ethernet: sky2/skge: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 18 Oct 2021 14:29:21 +0000 (07:29 -0700)]
ethernet: mv643xx: use eth_hw_addr_set()
Commit
406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Oct 2021 11:24:52 +0000 (12:24 +0100)]
Merge branch 'mlxsw-multi-level-qdisc-offload'
Ido Schimmel says:
====================
mlxsw: Multi-level qdisc offload
Petr says:
Currently, mlxsw admits for offload a suitable root qdisc, and its
children. Thus up to two levels of hierarchy are offloaded. Often, this is
enough: one can configure TCs with RED and TCs with a shaper on, and can
even see counters for each TC by looking at a qdisc at a sufficiently
shallow position.
While simple, the system has obvious shortcomings. It is not possible to
configure both RED and shaping on one TC. It is not possible to place a
PRIO below root TBF, which would then be offloaded as port shaper. FIFOs
are only offloaded at root or directly below, which is confusing to users,
because RED and TBF of course have their own FIFO.
This patch set lifts assumptions that prevent offloading multi-level qdisc
trees.
In patch #1, offload of a graft operation is added to TBF. Grafts are
issued as another qdisc is linked to the qdisc in question, and give
drivers a chance to react to the linking. The absence of this event was not
a major issue so far, because TBF was not considered classful, which
changes with this patchset.
The codebase currently assumes that ETS and PRIO are the only classful
qdiscs. The following patches gradually lift this assumption.
In patch #2, calculation of traffic class and priomap of a qdisc is fixed.
Patch #3 fixes handling of future FIFOs. Child FIFO qdiscs may be created
and notified before their parent qdisc exists and therefore need special
handling.
Patches #4, #5 and #6 unify, respectively, child destruction, child
grafting, and cleanup of statistics.
Patch #7 adds a function that validates whether a given qdisc topology is
offloadable.
Finally in patch #8, TBF and RED become classful. At this point, FIFO
qdiscs grafted to an offloaded qdisc should always be offloaded.
Patch #9 adds a selftest to verify some offloadable and unoffloadable qdisc
trees.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 19 Oct 2021 08:07:12 +0000 (11:07 +0300)]
selftests: mlxsw: Add a test for un/offloadable qdisc trees
This checks that various qdisc configurations either are or are not
offloaded.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 19 Oct 2021 08:07:11 +0000 (11:07 +0300)]
mlxsw: spectrum_qdisc: Make RED, TBF offloads classful
Permit offloading qdiscs below RED and TBF. In order to avoid having to
implement trivial propagating callbacks for get_prio_bitmap and
get_tclass_num, extend mlxsw_sp_qdisc_get_prio_bitmap() and
..._get_tclass_num() to handle the lack of the callback as a cue to forward
the request to the parent.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
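A sketch of the "missing callback means ask the parent" convention
described above (structure and field names approximate mlxsw's, not
guaranteed verbatim):
    /* A qdisc level without its own get_tclass_num callback inherits
     * the answer from its parent, so RED and TBF need no trivial
     * forwarding callbacks of their own.
     */
    static int mlxsw_sp_qdisc_get_tclass_num(struct mlxsw_sp_port *port,
                                             struct mlxsw_sp_qdisc *qdisc)
    {
        if (!qdisc->ops->get_tclass_num)
            return mlxsw_sp_qdisc_get_tclass_num(port, qdisc->parent);
        return qdisc->ops->get_tclass_num(port, qdisc);
    }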
Petr Machata [Tue, 19 Oct 2021 08:07:10 +0000 (11:07 +0300)]
mlxsw: spectrum_qdisc: Validate qdisc topology
A following patch will enable offloading qdiscs that are deeper than
directly under root qdisc. Currently the topology validation consists of
demanding a root qdisc position for ETS and PRIO. Since RED and TBF are
considered classless, this is enough. In order to prevent some nonsensical
combinations when RED and TBF become classful, introduce a more general
topology validator.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 19 Oct 2021 08:07:09 +0000 (11:07 +0300)]
mlxsw: spectrum_qdisc: Clean stats recursively when priomap changes
On Spectrum, there are no per-TC TX counters. Instead, mlxsw uses per-prio
counters and aggregates them according to the priomap. Therefore when
priomap changes, the counter base values need to be reset to reflect the
change. Previously, this was only done for the sole child qdisc, but a
following patch makes RED and TBF classful. Thus apply the request to the
whole sub-tree.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 19 Oct 2021 08:07:08 +0000 (11:07 +0300)]
mlxsw: spectrum_qdisc: Unify graft validation
Qdisc graft operations have so far been reported at PRIO, ETS and RED, with
RED events ignored, because RED was not considered a classful qdisc. A
following patch will make mlxsw recognize RED and TBF as classful qdiscs,
and thus it is necessary to validate grafting at these qdiscs as well.
Rename the existing graft validator to make it clear that it is a generic
function, and invoke it for RED and TBF graft events as well. Drop the
unnecessary PRIO helper and invoke the graft validator directly for PRIO as
well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 19 Oct 2021 08:07:07 +0000 (11:07 +0300)]
mlxsw: spectrum_qdisc: Destroy children in mlxsw_sp_qdisc_destroy()
Currently ETS and PRIO are the only offloaded classful qdiscs. Since they
are both similar, their destroy handler is the same, and it handles
children destruction itself. But now it is possible to do it generically
for any classful qdisc. Therefore promote the recursive destruction from
the ETS handler to mlxsw_sp_qdisc_destroy(), so that RED and TBF pick it up
in follow-up patches.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
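Roughly, the promoted destruction becomes a recursive walk like the
sketch below (the real function also handles per-qdisc unoffload
details):
    /* Destroy all offloaded children before the qdisc itself, for any
     * classful qdisc rather than only ETS / PRIO.
     */
    static void mlxsw_sp_qdisc_destroy(struct mlxsw_sp_port *port,
                                       struct mlxsw_sp_qdisc *qdisc)
    {
        int i;

        for (i = 0; i < qdisc->num_classes; i++)
            if (qdisc->qdiscs[i].ops)
                mlxsw_sp_qdisc_destroy(port, &qdisc->qdiscs[i]);

        if (qdisc->ops && qdisc->ops->destroy)
            qdisc->ops->destroy(port, qdisc);
        qdisc->ops = NULL;
    }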
Petr Machata [Tue, 19 Oct 2021 08:07:06 +0000 (11:07 +0300)]
mlxsw: spectrum_qdisc: Extract two helpers for handling future FIFOs
Extract two helpers from __mlxsw_sp_qdisc_ets_replace(): one for handling a
single future FIFO, and one for reinitializing the array of future FIFOs.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 19 Oct 2021 08:07:05 +0000 (11:07 +0300)]
mlxsw: spectrum_qdisc: Query tclass / priomap instead of caching it
Currently when keeping track of qdiscs, mlxsw notes the TC and priomap
corresponding to each qdisc. That is fine currently, as there only ever is
one level of qdiscs to update: the direct children of ETS / PRIO. However
as deeper structures are made offloadable, ETS would need to update these
values for the complete subtree, and interim qdiscs would need to remember
to propagate the value.
Instead, reverse the responsibility: child qdiscs can ask their parent what
their TC and priomap are. ETS / PRIO know the answer right away, or there
are defaults for when the root qdisc does not assign them (e.g. when RED is
used as root qdisc). When RED and TBF become classful, they will simply
forward the request up to their parent.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 19 Oct 2021 08:07:04 +0000 (11:07 +0300)]
net: sch_tbf: Add a graft command
As another qdisc is linked to the TBF, the latter should issue an event to
give drivers a chance to react to the grafting. In other qdiscs, this event
is called GRAFT, so follow suit with TBF as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
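The event rides the existing TBF offload structure; the sketch below
reflects the offload API as best recalled (check include/net/pkt_cls.h
for the authoritative definition):
    enum tc_tbf_command {
        TC_TBF_REPLACE,
        TC_TBF_DESTROY,
        TC_TBF_STATS,
        TC_TBF_GRAFT,           /* new: a child qdisc was attached */
    };

    struct tc_tbf_qopt_offload {
        enum tc_tbf_command command;
        u32 handle;
        u32 parent;
        union {
            struct tc_tbf_qopt_offload_replace_params replace_params;
            struct tc_qopt_offload_stats stats;
            u32 child_handle;   /* valid for TC_TBF_GRAFT */
        };
    };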
David S. Miller [Tue, 19 Oct 2021 11:16:34 +0000 (12:16 +0100)]
Merge tag 'mlx5-updates-2021-10-18' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
mlx5-updates-2021-10-18
Maor Gottlieb says:
========================
Use hash to select the affinity port in VF LAG
The current VF LAG architecture is based on QP association with a port.
A QP must be created after LAG is enabled to allow association with a non-native port.
VM packets going on the slow path to the eSwitch manager (SW path or hairpin) will be transmitted
through a different QP than the VM's. This means that different packets of the same flow might
egress from different physical ports.
This patch-set solves this issue by moving the port selection to be based on the hash function
defined by the bond.
When the device is moved to VF LAG mode, the driver creates TTC (traffic type classifier) flow
tables in order to classify the packet and steer it to the relevant hash function, similar to
what is done in the mlx5 RSS implementation.
Each rule in the TTC table forwards the packet to a port selection flow table, which has one hash
split flow group containing two "catch all" flow table entries. Each entry points to the
respective uplink port, as shown below:
             -------------------
             | FT              |
TTC rule --> | ----------      |
             | | FG     |      |
             | | FTE ---|------|---> uplink of port #1
             | | FTE ---|------|---> uplink of port #2
             | ----------      |
             -------------------
A hash split flow group is a flow group created with type HASH_SPLIT and associated with a match
definer. The match definer defines the fields included in the hash calculation.
The driver creates the match definer according to the xmit hash policy of the bond driver.
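A heavily hedged sketch of creating such a group (the HASH_SPLIT group
type and match_definer_id field names are assumptions inferred from the
description above, not verified against mlx5_ifc.h):
    /* Create a hash split flow group bound to a match definer; the two
     * FTEs then receive traffic split by the definer-based hash.
     */
    static struct mlx5_flow_group *
    create_hash_split_fg(struct mlx5_flow_table *ft, u32 definer_id)
    {
        u32 *in = kvzalloc(MLX5_ST_SZ_BYTES(create_flow_group_in),
                           GFP_KERNEL);
        struct mlx5_flow_group *fg;

        if (!in)
            return ERR_PTR(-ENOMEM);

        MLX5_SET(create_flow_group_in, in, start_flow_index, 0);
        MLX5_SET(create_flow_group_in, in, end_flow_index, 1); /* two FTEs */
        MLX5_SET(create_flow_group_in, in, group_type,
                 MLX5_CREATE_FLOW_GROUP_IN_GROUP_TYPE_HASH_SPLIT);
        MLX5_SET(create_flow_group_in, in, match_definer_id, definer_id);

        fg = mlx5_create_flow_group(ft, in);
        kvfree(in);
        return fg;
    }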
Patches overview:
========================
Minor E-Switch updates:
- Patch #12, dynamic allocation of dest array
- Patch #13, increase number of forward destinations to 32
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 19 Oct 2021 11:12:21 +0000 (12:12 +0100)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue
Mateusz Palczewski says:
====================
40GbE Intel Wired LAN Driver Updates 2021-10-18
Use a single state machine for driver initialization
and for servicing the initialized driver. The init state
machine implemented in init_task() is merged
into the watchdog_task(). The init_task() function
is removed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Dickman [Sun, 5 Sep 2021 11:22:11 +0000 (14:22 +0300)]
net/mlx5: E-Switch, Increase supported number of forward destinations to 32
Increase supported number of forward destinations in the same rule, local
and remote, from 2 to 32.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Dickman [Mon, 20 Sep 2021 07:51:05 +0000 (10:51 +0300)]
net/mlx5: E-Switch, Use dynamic alloc for dest array
Use dynamic allocation for the dest array in preparation for
the next patch, which increases MLX5_MAX_FLOW_FWD_VPORTS and
would otherwise cause the stack allocation to be bigger than 1024 bytes.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
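The shape of the change, sketched (error and rule-building paths
abbreviated):
    struct mlx5_flow_destination *dest;

    /* Off-stack allocation: the array may now hold up to
     * MLX5_MAX_FLOW_FWD_VPORTS entries, too large for the stack.
     */
    dest = kcalloc(MLX5_MAX_FLOW_FWD_VPORTS + 1, sizeof(*dest), GFP_KERNEL);
    if (!dest)
        return ERR_PTR(-ENOMEM);

    /* ... build the forward rule using dest ... */

    kfree(dest);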
Maor Gottlieb [Wed, 18 Aug 2021 21:19:14 +0000 (00:19 +0300)]
net/mlx5: Lag, use steering to select the affinity port in LAG
Use the steering-based solution to select the affinity port
when the LAG mode is based on hash policy and the device supports
the port selection flow table.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Thu, 15 Jul 2021 07:13:52 +0000 (10:13 +0300)]
net/mlx5: Lag, add support to create/destroy/modify port selection
Add the create function, which builds the steering tables, TTC and
definers according to the LAG hash type.
The destroy function destroys all the steering components.
The modify function is used when the bond mapping changes; it
iterates over all the rules in the definers and modifies them to steer
the packet to the relevant active ports.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Thu, 15 Jul 2021 06:43:35 +0000 (09:43 +0300)]
net/mlx5: Lag, add support to create TTC tables for LAG port selection
Add support to create inner and outer TTC tables for LAG port
selection. These tables are used to classify the packets in
order to select the related definer.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Tue, 17 Aug 2021 07:24:05 +0000 (10:24 +0300)]
net/mlx5: Lag, add support to create definers for LAG
Every definer will consist of a flow table with a single hash group
with exactly two flow table entries, one for each device port.
The destination of these entries is the uplink vport according to the
port state and hash policy.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Tue, 13 Jul 2021 12:30:45 +0000 (15:30 +0300)]
net/mlx5: Lag, set match mask according to the traffic type bitmap
Set the related bits in the match definer mask according to the
TT mapping.
This mask will be used to create the match definers.
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>