review.tizen.org Git - platform/kernel/linux-starfive.git/log

phy: micrel: ksz8041nl: do not use power down mode

Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
the power down mode bit (0.11). If the PHY is taken out of power down
mode in a certain temperature range, the PHY enters a weird state which
leads to continuously reporting RX errors. In that state, the MAC is not
able to receive or send any Ethernet frames and the activity LED is
constantly blinking. Since Linux is using the suspend callback when the
interface is taken down, ending up in that state can easily happen
during a normal startup.

Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
clock recovery when using power down mode. Even the latest revision (A4,
Revision ID 0x1513) seems to suffer that problem, and according to the
errata is not going to be fixed.

Remove the suspend/resume callback to avoid using the power down mode
completely.

[*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf

Fixes: 1a5465f5d6a2 ("phy/micrel: Add suspend/resume support to Micrel PHYs")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: unmap DMA in enetc_send_cmd()

Coverity complains of a possible dereference of a null return value.

    5. returned_null: kzalloc returns NULL. [show details]
    6. var_assigned: Assigning: si_data = NULL return value from kzalloc.
488        si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
489        cbd.length = cpu_to_le16(data_size);
490
491        dma = dma_map_single(&priv->si->pdev->dev, si_data,
492                             data_size, DMA_FROM_DEVICE);

While this kzalloc() is unlikely to fail, I did notice that the function
returned without unmapping si_data.

Fix this by refactoring the error paths and checking for kzalloc()
failure.

Fixes: 888ae5a3952ba ("net: enetc: add tc flower psfp offload driver")
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-core: use netdev_* calls for kernel messages

While loading a driver and changing the number of queues, I noticed this
message in the kernel log:

"[253489.070080] Number of in use tx queues changed invalidating tc
mappings. Priority traffic classification disabled!"

But I had no idea what interface was being talked about because this
message used pr_warn().

After investigating, it appears we can use the netdev_* helpers already
defined to create predictably formatted messages, and that already handle
<unknown netdev> cases, in more of the messages in dev.c.

After this change, this message (and others) will look like this:
"[ 170.181093] ice 0000:3b:00.0 ens785f0: Number of in use tx queues
changed invalidating tc mappings. Priority traffic classification
disabled!"

One goal here was not to change the message significantly from the
original format so as to not break user's expectations, so I just
changed messages that used pr_* and generally started with %s ==
dev->name.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: use eth_hw_addr_set() instead of ether_addr_copy()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Convert batman from ether_addr_copy() to eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - ether_addr_copy(dev->dev_addr, np)
  + eth_hw_addr_set(dev, np)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: use dev_addr_set() - manual

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: use dev_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: prepare for const netdev->dev_addr

netdev->dev_addr will be constant soon, make sure
the qualifier is propagated thru batman-adv.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

soc: fsl: dpio: Unsigned compared against 0 in qbman_swp_set_irq_coalescing()

Coverity complains of unsigned compare against 0. There are 2 cases in
this function:

1821        itp = (irq_holdoff * 1000) / p->desc->qman_256_cycles_per_ns;

CID 121131 (#1 of 1): Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. itp < 0U.
1822        if (itp < 0 || itp > 4096) {
1823                max_holdoff = (p->desc->qman_256_cycles_per_ns * 4096) / 1000;
1824                pr_err("irq_holdoff must be between 0..%dus\n", max_holdoff);
1825                return -EINVAL;
1826        }
1827
     unsigned_compare: This less-than-zero comparison of an unsigned value is never true. irq_threshold < 0U.
1828        if (irq_threshold >= p->dqrr.dqrr_size || irq_threshold < 0) {
1829                pr_err("irq_threshold must be between 0..%d\n",
1830                       p->dqrr.dqrr_size - 1);
1831                return -EINVAL;
1832        }

Fix this by removing the comparisons altogether as they are incorrect. Zero is
a possible value in either case. Also fix a minor comment typo and update the
2 pr_err() calls to use %u formatting as well as be more precise regarding
the exact error.

Fixes: ed1d2143fee5 ("soc: fsl: dpio: add support for irq coalescing per software portal")
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Roy Pledge <Roy.Pledge@nxp.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: tidy for loop in setup and add cpu port check

Tidy and organize qca8k setup function from multiple for loop.
Change for loop in bridge leave/join to scan all port and skip cpu port.
No functional change intended.

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-10-19

This series contains updates to ice driver only.

Brett implements support for ndo_set_vf_rate allowing for min_tx_rate
and max_tx_rate to be set for a VF.

Jesse updates DIM moderation to improve latency and resolves problems
with reported rate limit and extra software generated interrupts.

Wojciech moves a check for trusted VFs to the correct function,
disables lb_en for switchdev offloads, and refactors ethtool ops due
to differences in support for PF and port representor support.

Cai Huoqing utilizes the helper function devm_add_action_or_reset().

Gustavo A. R. Silva replaces uses of allocation to devm_kcalloc() as
applicable.

Dan Carpenter propagates an error instead of returning success.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dev_addr-conversions-part-three'

Jakub Kicinski says:

====================
ethernet: manual netdev->dev_addr conversions (part 3)

Manual conversions of Ethernet drivers writing directly
to netdev->dev_addr (part 3 out of 3).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: via-velocity: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: via-rhine: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: tlan: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, do the swapping, then
call eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: tehuti: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Break the address up into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: stmmac: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: netsec: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sja1105-next'

Vladimir Oltean says:

====================
New RGMII delay DT bindings for the SJA1105 DSA driver

During recent reviews I've been telling people that new MAC drivers
should adopt a certain DT binding format for RGMII delays in order to
avoid conflicting interpretations. Some suggestions were better received
than others, and it appears we are still far from a consensus.

Part of the problem seems to be that there are still drivers that apply
RGMII delays based on an incorrect interpretation of the device tree,
and these serve as a bad example for others.
I happen to maintain one of those drivers and I am able to test it, so I
figure that one of the ways in which I can make a change is to stop
providing a bad example.

Therefore, this series adds support for the "rx-internal-delay-ps" and
"tx-internal-delay-ps" properties inside sja1105 switch port DT nodes,
and if these are present, they will decide what RGMII delays will the
driver apply.

The in-tree device trees are also updated to follow the new format, as
well as the schema validator.

I assume it's okay to get all changes merged in through the same tree
(net-next). Although the DTS changes could be split, if needed - the
driver works with or without them. There is one more DTS which should be
changed, which is in Shawn's tree but not in net-next:
https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git/tree/arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dts?h=for-next
For that, I'd have to send a separate patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: parse {rx, tx}-internal-delay-ps properties for RGMII delays

This change does not fix any functional issue or address any real life
use case that wasn't possible before. It is just a small step in the
process of standardizing the way in which Ethernet MAC drivers may apply
RGMII delays (traditionally these have been applied by PHYs, with no
clear definition of what to do in the case of a fixed-link).

The sja1105 driver used to apply MAC-level RGMII delays on the RX data
lines when in fixed-link mode and using a phy-mode of "rgmii-rxid" or
"rgmii-id" and on the TX data lines when using "rgmii-txid" or "rgmii-id".
But the standard definitions don't say anything about behaving
differently when the port is in fixed-link vs when it isn't, and the new
device tree bindings are about having a way of applying the delays in a
way that is independent of the phy-mode and of the fixed-link property.

When the {rx,tx}-internal-delay-ps properties are present, use them,
otherwise fall back to the old behavior and warn.

One other thing to note is that the SJA1105 hardware applies a delay
value in degrees rather than in picoseconds (the delay in ps changes
depending on the frequency of the RGMII clock - 125 MHz at 1G, 25 MHz at
100M, 2.5MHz at 10M). I assume that is fine, we calculate the phase
shift of the internal delay lines assuming that the device tree meant
gigabit, and we let the hardware scale those according to the link speed.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210723173108.459770-6-prasanna.vengateshan@microchip.com/
Link: https://patchwork.ozlabs.org/project/netdev/patch/20200616074955.GA9092@laureti-dev/#2461123
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: sja1105: add {rx,tx}-internal-delay-ps

Add a schema validator to nxp,sja1105.yaml and to dsa.yaml for explicit
MAC-level RGMII delays. These properties must be per port and must be
present only for a phy-mode that represents RGMII.

We tell dsa.yaml that these port properties might be present, we also
define their valid values for SJA1105. We create a common definition for
the RX and TX valid range, since it's quite a mouthful.

We also modify the example to include the explicit RGMII delay properties.
On the fixed-link ports (in the example, port 4), having these explicit
delays is actually mandatory, since with the new behavior, the driver
shouts that it is interpreting what delays to apply based on phy-mode.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: inherit the ethernet-controller DT schema

Since a switch is basically a bunch of Ethernet controllers, just
inherit the common schema for one to get stronger type validation of the
properties of a port.

For example, before this change it was valid to have a phy-mode = "xfi"
even if "xfi" is not part of ethernet-controller.yaml, now it is not.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: sja1105: fix example so all ports have a phy-handle of fixed-link

All ports require either a phy-handle or a fixed-link, and port 3 in the
example didn't have one. Add it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sched-fixes-after-recent-qdisc-running-changes'

Eric Dumazet says:

====================
net: sched: fixes after recent qdisc->running changes

First patch fixes a plain bug in qdisc_run_begin().
Second patch removes a pair of atomic operations, increasing performance.
====================

Link: https://lore.kernel.org/r/20211019003402.2110017-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sched: remove one pair of atomic operations

__QDISC_STATE_RUNNING is only set/cleared from contexts owning qdisc lock.

Thus we can use less expensive bit operations, as we were doing
before commit f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")

Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sched: fix logic error in qdisc_run_begin()

For non TCQ_F_NOLOCK qdisc, qdisc_run_begin() tries to set
__QDISC_STATE_RUNNING and should return true if the bit was not set.

test_and_set_bit() returns old bit value, therefore we need to invert.

Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: fix an error code in ice_ena_vfs()

Return the error code if ice_eswitch_configure() fails. Don't return
success.

Fixes: 1c54c839935b ("ice: enable/disable switchdev when managing VFs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: use devm_kcalloc() instead of devm_kzalloc()

Use 2-factor multiplication argument form devm_kcalloc() instead
of devm_kzalloc().

Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Make use of the helper function devm_add_action_or_reset()

The helper function devm_add_action_or_reset() will internally
call devm_add_action(), and if devm_add_action() fails then it will
execute the action mentioned and return the error code. So
use devm_add_action_or_reset() instead of devm_add_action()
to simplify the error handling, reduce the code.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Refactor PR ethtool ops

This patch improves a few things:

- it fixes issue where ethtool -i reports that PR supports
  priv-flags and tests when in fact it does not support them
- instead of using the same functions for both PF and PR ethtool ops,
  this patch introduces separate ops for both cases and internal
  functions with core logic.
- prevent accessing VF VSI while VF is not ready by calling
  ice_check_vf_ready_for_cfg
- all PR specific functions in ethtool.c were moved to one place in
  file
- instead overwriting n_priv_flags in ice_repr_get_drvinfo,
  priv-flags code was moved from __ice_get_drvinfo to ice_get_drvinfo

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Manage act flags for switchdev offloads

Currently it is not possible to set/unset lb_en and lan_en flags
for advanced rules during their creation. Both flags are enabled
by default. In case of switchdev offloads for egress traffic we
need lb_en to be disabled. Because of that, we work around it by
updating the rule immediately after its creation.

This change allows us to set/unset those flags right away and it
gets rid of old workaround as well. Using ice_adv_rule_flags_info
structure we can pass info about flags we want to be set for
a given advanced rule. Flags are stored in flags_info.act.
Values from act would be used only if act_valid was set to true,
otherwise default values would be used.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Forbid trusted VFs in switchdev mode

Merge issues caused the check for switchdev mode has been inserted
in wrong place. It should be in ice_set_vf_trust not in ice_set_vf_mac.

Trusted VFs are forbidden in switchdev mode because they should
be configured only from the host side.

Fixes: 1c54c839935b ("ice: enable/disable switchdev when managing VFs")
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix software generating extra interrupts

The driver tried to work around missing completion events that occurred
while interrupts are disabled, by triggering a software interrupt
whenever we exit polling (but we had to have polled at least once).

This was causing a *lot* of extra interrupts for some workloads like
NVMe over TCP, which resulted in regressions in performance. It was also
visible when polling didn't prevent interrupts when busy_poll was
enabled.

Fix the extra interrupts by utilizing our previously unused 3rd ITR
(interrupt throttle) index and set it to 20K interrupts per second, and
then trigger a software interrupt within that rate limit.

While here, slightly refactor the code to avoid an overwrite of a local
variable in the case of wb_en = true.

Fixes: b7306b42beaf ("ice: manage interrupts during poll exit")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix rate limit update after coalesce change

If the adaptive settings are changed with
ethtool -C ethx adaptive-rx off adaptive-tx off
then the interrupt rate limit should be maintained as a user set value,
but only if BOTH adaptive settings are off. Fix a bug where the rate
limit that was being used in adaptive mode was staying set in the
register but was not reported correctly by ethtool -c ethx. Due to long
lines include a small refactor of q_vector variable.

Fixes: b8b4772377dd ("ice: refactor interrupt moderation writes")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: update dim usage and moderation

The driver was having trouble with unreliable latency when doing single
threaded ping-pong tests. This was root caused to the DIM algorithm
landing on a too slow interrupt value, which caused high latency, and it
was especially present when queues were being switched frequently by the
scheduler as happens on default setups today.

In attempting to improve this, we allow the upper rate limit for
interrupts to move to rate limit of 4 microseconds as a max, which means
that no vector can generate more than 250,000 interrupts per second. The
old config was up to 100,000. The driver previously tried to program the
rate limit too frequently and if the receive and transmit side were both
active on the same vector, the INTRL would be set incorrectly, and this
change fixes that issue as a side effect of the redesign.

This driver will operate from now on with a slightly changed DIM table
with more emphasis towards latency sensitivity by having more table
entries with lower latency than with high latency (high being >= 64
microseconds).

The driver also resets the DIM algorithm state with a new stats set when
there is no work done and the data becomes stale (older than 1 second),
for the respective receive or transmit portion of the interrupt.

Add a new helper for setting rate limit, which will be used more
in a followup patch.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add support for VF rate limiting

Implement ndo_set_vf_rate to support setting of min_tx_rate and
max_tx_rate; set the appropriate bandwidth in the scheduler for the
node representing the specified VF VSI.

Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

net: ethernet: ixp4xx: Make use of dma_pool_zalloc() instead of dma_pool_alloc/memset()

Replacing dma_pool_alloc/memset() with dma_pool_zalloc()
to simplify the code.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: Remove redundant 'flush_workqueue()' calls

'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

@@
expression E;
@@
- flush_workqueue(E);
destroy_workqueue(E);

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Remove extra device_lock assert checks

PCI core code in the pci_call_probe() has a path that doesn't hold
device_lock. It happens because the ->probe() is called through the
workqueue mechanism.

   349 static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
   350                           const struct pci_device_id *id)
   351 {
   352
....
   377         if (cpu < nr_cpu_ids)
   378                 error = work_on_cpu(cpu, local_pci_probe, &ddi);

Luckily enough, the core still ensures that only single flow is executed,
so it safe to remove the assert checks that anyway were added for annotations
purposes.

Fixes: b88f7b1203bf ("devlink: Annotate devlink API calls")
Reported-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: Remove redundant statement

The variable will be assigned again later in the if condition,
there is no meaning there.

drivers/net/ethernet/broadcom/tg3.c:5750:2 warning:

Value stored to 'current_link_up' is never read.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phylink: Support disabling autonegotiation for PCS

The auto-negotiation state in the PCS as set by
phylink_mii_c22_pcs_config was previously always enabled when the
driver is configured for in-band autonegotiation, even if
autonegotiation was disabled on the interface with ethtool. Update the
code to set the BMCR_ANENABLE bit based on the interface's
autonegotiation enabled state.

Update phylink_mii_c22_pcs_get_state to not check
autonegotiation-related fields when autonegotiation is disabled.

Update phylink_mac_pcs_get_state to initialize the state based on the
interface's configured speed, duplex and pause parameters rather than
to unknown when autonegotiation is disabled, before calling the
driver's pcs_get_state functions, as they are not likely to provide
meaningful data for these fields when autonegotiation is disabled. In
this case the driver is really just filling in the link state field.

Note that in cases where there is a downstream PHY connected, such as
with SGMII and a copper PHY, the configuration set by ethtool is
handled by phy_ethtool_ksettings_set and not propagated to the PCS.
This is correct since SGMII or 1000Base-X autonegotiation with the PCS
should normally still be used even if the copper side has disabled it.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: Allow statistics reads from softirq.

Eric reported that the rate estimator reads statics from the softirq
which in turn triggers a warning introduced in the statistics rework.

The warning is too cautious. The updates happen in the softirq context
so reads from softirq are fine since the writes can not be preempted.
The updates/writes happen during qdisc_run() which ensures one writer
and the softirq context.
The remaining bad context for reading statistics remains in hard-IRQ
because it may preempt a writer.

Fixes: 29cbcd8582837 ("net: sched: Remove Qdisc::running sequence counter")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phylink: rejig SFP interface selection in ksettings_set()

Commit ea269a6f7207 ("net: phylink: Update SFP selected interface on
advertising changes") added a better solution to selecting the
interface mode for SFPs using the advertisement mask. This method will
work for mvneta and mvpp2 when selecting between 2500base-X and
1000base-X without needing to use the basex helper, or indicate that
we support both 1000base-X and 2500base-X when in either of these two
interface modes.

Hence, we need to eliminate the validation prior to selecting the
interface, otherwise when we clean up mvneta's validation function, we
will end up locking to 2500base-X as we validate with an interface mode
of PHY_INERFACE_MODE_2500BASEX.

The supported mask will already have been reduced down to the union of
support for the SFP and MAC already, so we can be confident that the
advertisement mask is already appropriately restricted. We only need to
select the appropriate interface, and then revalidate with the new
interface mode.

We get rid of the check for pl->sfp_port too, this is meaningless here
as it doesn't get cleared when a module is removed, so it doesn't
indicate if a module is present. Just rely on pl->sfp_bus.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: Remove redundant statement

This assignment statement is meaningless, because the statement
will execute to the tag "set_itr_now".

The clang_analyzer complains as follows:

drivers/net/ethernet/intel/e1000e/netdev.c:2552:3 warning:

Value stored to 'current_itr' is never read.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'eth_hw_addr_gen-for-switches'

Jakub Kicinski says:

====================
ethernet: add eth_hw_addr_gen() for switches

While doing the last polishing of the drivers/ethernet
changes I realized we have a handful of drivers offsetting
some base MAC addr by an id. So I decided to add a helper
for it. The helper takes care of wrapping which is probably
not 100% necessary but seems like a good idea. And it saves
driver side LoC (the diffstat is actually negative if we
compare against the changes I'd have to make if I was to
convert all these drivers to not operate directly on
netdev->dev_addr).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: sparx5: use eth_hw_addr_gen()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: mlxsw: use eth_hw_addr_gen()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: fec: use eth_hw_addr_gen()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: prestera: use eth_hw_addr_gen()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Vadym and Taras report that the current behavior of the driver
is not exactly expected and it's better to add the port id in
like other drivers do.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: ocelot: use eth_hw_addr_gen()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: add a helper for assigning port addresses

We have 5 drivers which offset base MAC addr by port id.
Create a helper for them.

This helper takes care of overflows, which some drivers
did not do, please complain if that's going to break
anything!

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dev_addr-conversions-part-two'

Jakub Kicinski says:

====================
ethernet: manual netdev->dev_addr conversions (part 2)

Manual conversions of Ethernet drivers writing directly
to netdev->dev_addr (part 2 out of 3).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: smsc: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Break the address up into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: smc91x: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: sis900: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: sis190: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: sxgbe: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: rocker: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: renesas: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Break the address up into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: r8169: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: netxen: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Invert the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: lpc: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: sky2/skge: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: mv643xx: use eth_hw_addr_set()

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Read the address into an array on the stack, then call
eth_hw_addr_set().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-multi-level-qdisc-offload'

Ido Schimmel says:

====================
mlxsw: Multi-level qdisc offload

Petr says:

Currently, mlxsw admits for offload a suitable root qdisc, and its
children. Thus up to two levels of hierarchy are offloaded. Often, this is
enough: one can configure TCs with RED and TCs with a shaper on, and can
even see counters for each TC by looking at a qdisc at a sufficiently
shallow position.

While simple, the system has obvious shortcomings. It is not possible to
configure both RED and shaping on one TC. It is not possible to place a
PRIO below root TBF, which would then be offloaded as port shaper. FIFOs
are only offloaded at root or directly below, which is confusing to users,
because RED and TBF of course have their own FIFO.

This patch set lifts assumptions that prevent offloading multi-level qdisc
trees.

In patch #1, offload of a graft operation is added to TBF. Grafts are
issued as another qdisc is linked to the qdisc in question, and give
drivers a chance to react to the linking. The absence of this event was not
a major issue so far, because TBF was not considered classful, which
changes with this patchset.

The codebase currently assumes that ETS and PRIO are the only classful
qdiscs. The following patches gradually lift this assumption.

In patch #2, calculation of traffic class and priomap of a qdisc is fixed.

Patch #3 fixes handling of future FIFOs. Child FIFO qdiscs may be created
and notified before their parent qdisc exists and therefore need special
handling.

Patches #4, #5 and #6 unify, respectively, child destruction, child
grafting, and cleanup of statistics.

Patch #7 adds a function that validates whether a given qdisc topology is
offloadable.

Finally in patch #8, TBF and RED become classful. At this point, FIFO
qdiscs grafted to an offloaded qdisc should always be offloaded.

Patch #9 adds a selftest to verify some offloadable and unoffloadable qdisc
trees.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add a test for un/offloadable qdisc trees

This checks that various qdisc configurations either are or are not
offloaded.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Make RED, TBF offloads classful

Permit offloading qdiscs below RED and TBF. In order to avoid having to
implement trivial propagating callbacks for get_prio_bitmap and
get_tclass_num, extend mlxsw_sp_qdisc_get_prio_bitmap() and
..._get_tclass_num() to handle the lack of the callback as a cue to forward
the request to the parent.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Validate qdisc topology

A following patch will enable offloading qdiscs that are deeper than
directly under root qdisc. Currently the topology validation consists of
demanding a root qdisc position for ETS and PRIO. Since RED and TBF are
considered classless, this is enough. In order to prevent some nonsensical
combinations when RED and TBF become classful, introduce a more general
topology validator.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Clean stats recursively when priomap changes

On Spectrum, there are no per-TC TX counters. Instead, mlxsw uses per-prio
counters and aggregates them according to the priomap. Therefore when
priomap changes, the counter base values need to be reset to reflect the
change. Previously, this was only done for the sole child qdisc, but a
following patch makes RED and TBF classful. Thus apply the request to the
whole sub-tree.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Unify graft validation

Qdisc graft operations have so far been reported at PRIO, ETS and RED, with
RED events ignored, because RED was not considered a classful qdisc. A
following patch will make mlxsw recognize RED and TBF as classful qdiscs,
and thus it is necessary to validate grafting at these qdiscs as well.
Rename the existing graft validator to make it clear that it is a generic
function, and invoke for RED and TBF graft events as well. Drop the
unnecessary PRIO helper and invoke the graft validator directly for PRIO as
well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Destroy children in mlxsw_sp_qdisc_destroy()

Currently ETS and PRIO are the only offloaded classful qdiscs. Since they
are both similar, their destroy handler is the same, and it handles
children destruction itself. But now it is possible to do it generically
for any classful qdisc. Therefore promote the recursive destruction from
the ETS handler to mlxsw_sp_qdisc_destroy(), so that RED and TBF pick it up
in follow-up patches.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Extract two helpers for handling future FIFOs

Extract from __mlxsw_sp_qdisc_ets_replace() two helpers for handling of one
future FIFO resp. reinitializing the array of future FIFOs.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Query tclass / priomap instead of caching it

Currently when keeping track of qdiscs, mlxsw notes the TC and priomap
corresponding to each qdisc. That is fine currently, as there only ever is
one level of qdiscs to update: the direct children of ETS / PRIO. However
as deeper structures are made offloadable, ETS would need to update these
values for the complete subtree, and interim qdiscs would need to remember
to propagate the value.

Instead, reverse the responsibility: child qdiscs can ask their parent what
their TC and priomap are. ETS / PRIO know the answer right away, or there
are defaults for when the root qdisc does not assign them (e.g. when RED is
used as root qdisc). When RED and TBF become classful, they will simply
forward the request up to their parent.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sch_tbf: Add a graft command

As another qdisc is linked to the TBF, the latter should issue an event to
give drivers a chance to react to the grafting. In other qdiscs, this event
is called GRAFT, so follow suit with TBF as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2021-10-18' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

mlx5-updates-2021-10-18

Maor Maor Gottlieb says:
========================
Use hash to select the affinity port in VF LAG

Current VF LAG architecture is based on QP association with a port.
QP must be created after LAG is enabled to allow association with non-native port.
VM Packets going on slow-path to eSwicth manager (SW path or hairpin) will be transmitted
through a different QP than the VM. This means that Different packets of the same flow might
egress from different physical ports.

This patch-set solves this issue by moving the port selection to be based on the hash function
defined by the bond.

When the device is moved to VF LAG mode, the driver creates TTC (traffic type classifier) flow
tables in order to classify the packet and steer it to the relevant hash function. Similar to what
is done in the mlx5 RSS implementation.

Each rule in the TTC table, forwards the packet to port selection flow table which has one hash
split flow group which contains two "catch all" flow table entries. Each entry point to the
relative uplink port. As shown below:

-------------------
| FT              |
TTC rule -> |     ----------- |
|   FG|   FTE --|-|-----> uplink of port #1
|     |   FTE --|-|-----> uplink of port #2
|     ----------- |
-------------------

Hash split flow group is flow group that created as type of HASH_SPLIT and associated with match definer.
The match definer define the fields which included in the hash calculation.

The driver creates the match definer according to the xmit hash policy of the bond driver.

Patches overview:
========================

Minor E-Switch updates:
- Patch #12, dynamic  allocation of dest array
- Patch #13, increase number of forward destinations to 32

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Mateusz Palczewski says:

====================
40GbE Intel Wired LAN Driver Updates 2021-10-18

Use single state machine for driver initialization
and for service initialized driver. The init state
machine implemented in init_task() is merged
into the watchdog_task(). The init_task() function
is removed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: E-Switch, Increase supported number of forward destinations to 32

Increase supported number of forward destinations in the same rule, local
and remote, from 2 to 32.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, Use dynamic alloc for dest array

Use dynamic allocation for the dest array in preparation for
the next patch which increase MLX5_MAX_FLOW_FWD_VPORTS and
will cause stack allocation to be bigger than 1024 bytes.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, use steering to select the affinity port in LAG

Use the steering based solution for select the affinity port
when the LAG mode is based on hash policy and the device support
in port selection flow table.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, add support to create/destroy/modify port selection

Add create function, build the steering tables, TTC and definers
according to the LAG hash type.

The destroy function, destroys all the steering components.

The modify functions is used when the bond mapping changes and it
iterates over all the rules in the definers and modifies them to steer
the packet to the relevant active ports.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, add support to create TTC tables for LAG port selection

Add support to create inner and outer TTC tables for LAG port
selection. These tables are used to classify the packets in
order to select the related definer.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, add support to create definers for LAG

Every definer will consist of a flow table with a single hash group
with exactly two flow table entries, one for each device port.
The destination of these entries is the uplink vport according to the
port state and hash policy.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, set match mask according to the traffic type bitmap

Set the related bits in the match definer mask according to the
TT mapping.
This mask will be used to create the match definers.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, set LAG traffic type mapping

Generate a traffic type bitmap that will define which
steering objects we need to create for the steering
based LAG.

Bits in this bitmap are set according to the LAG hash type.
In addition, have a field that indicate if the lag is in encap
mode or not.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, move lag files into directory

Downstream patches add another lag related file so it makes
sense to have all the lag files in a dedicated directory.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Introduce new uplink destination type

The uplink destination type should be used in rules to steer the
packet to the uplink when the device is in steering based LAG mode.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add support to create match definer

Introduce new APIs to create and destroy flow matcher
for given format id.

Flow match definer object is used for defining the fields and
mask used for the hash calculation. User should mask the desired
fields like done in the match criteria.

This object is assigned to flow group of type hash. In this flow
group type, packets lookup is done based on the hash result.

This patch also adds the required bits to create such flow group.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Introduce port selection namespace

Add new port selection flow steering namespace. Flow steering rules in
this namespaceare are used to determine the physical port for egress
packets.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Support partial TTC rules

Add bitmasks to ttc_params to indicate if rule is valid or not.
It will allow to create TTC table with support only in part of the
traffic types.
In later patches which introduce the steering based LAG port selection,
TTC will be created with only part of the rules according to the hash
type.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

qed: Change the TCP common variable - "iscsi_ooo"

Change the TCP common variable - "iscsi_ooo" to "ooo_opq".
This variable is common between all the TCP L5 protocols and not
specific to iSCSI.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Link: https://lore.kernel.org/r/20211015124118.29041-2-smalin@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: Optimize the ll2 ooo flow

Optimize the ll2 TCP out-of-order likely flows:
- Optimize the non-error flows of the ll2 ooo data path.
- Optimize "QED_OOO_RIGHT_BUF" over "QED_OOO_LEFT_BUF".

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Link: https://lore.kernel.org/r/20211015124118.29041-1-smalin@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: adjust file entry for of_net.c after movement

Commit e330fb14590c ("of: net: move of_net under net/") moves of_net.c
to ./net/core/, but misses to adjust the reference to this file in
MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:

warning: no file matches F: drivers/of/of_net.c

Adjust the file entry after this file movement.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20211016055815.14397-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

iavf: Combine init and watchdog state machines

Use single state machine for driver initialization and for service
initialized driver. The init state machine implemented in init_task()
is merged into the watchdog_task(). The init_task() function is
removed.

Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Add __IAVF_INIT_FAILED state

This commit adds a new state, __IAVF_INIT_FAILED to the state machine.
From now on initialization functions report errors not by returning an
error value, but by changing the state to indicate that something went
wrong.

Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Refactor iavf state machine tracking

Replace state changes of iavf state machine
with a method that also tracks the previous
state the machine was on.

This change is required for further work with
refactoring init and watchdog state machines.

Tracking of previous state would help us
recover iavf after failure has occurred.

Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

mlx5: prevent 64bit divide

mlx5_tout_ms() returns a u64, we can't directly divide it.
This is not a problem here, @timeout which is the value
that actually matters here is already a ulong, so this
implies storing return value of mlx5_tout_ms() on a ulong
should be fine.

This fixes:

ERROR: modpost: "__udivdi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!

Fixes: 32def4120e48 ("net/mlx5: Read timeout values from DTOR")
Link: https://lore.kernel.org/r/20211018172608.1069754-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: Fix reading non-legacy supported link modes

Everything except the first 32 bits was lost when the pause flags were
added. This makes the 50000baseCR2 mode flag (bit 34) not appear.

I have tested this with a 10G card (SFN5122F-R7) by modifying it to
return a non-legacy link mode (10000baseCR).

Signed-off-by: Erik Ekman <erik@kryo.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: fix delay applied to wrong cpu in parse_port_config

Fix delay settings applied to wrong cpu in parse_port_config. The delay
values is set to the wrong index as the cpu_port_index is incremented
too early. Start the cpu_port_index to -1 so the correct value is
applied to address also the case with invalid phy mode and not available
port.

Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS for net-next:

1) Add new run_estimation toggle to IPVS to stop the estimation_timer
   logic, from Dust Li.

2) Relax superfluous dynset check on NFT_SET_TIMEOUT.

3) Add egress hook, from Lukas Wunner.

4) Nowadays, almost all hook functions in x_table land just call the hook
   evaluation loop. Remove remaining hook wrappers from iptables and IPVS.
   From Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rtl8365mb-vc-support'

Alvin Šipraga says:

====================
net: dsa: add support for RTL8365MB-VC

This series adds support for Realtek's RTL8365MB-VC, a 4+1 port
10/100/1000M Ethernet switch. The driver - rtl8365mb - was developed by
Michael Ramussen and myself.

This version of the driver is relatively slim, implementing only the
standalone port functionality and no offload capabilities. It is based
on a previous RFC series [1] from August, and the main difference is the
removal of some spurious VLAN operations. Otherwise I have simply
addressed most of the feedback. Please see the respective patches for
more detail.

In parallel I am working on offloading the bridge layer capabilities,
but I would like to get the basic stuff upstreamed as soon as possible.

v3 -> v4:
  - get irq before setting virq parents (fixes kernel test robot
    warning)
  - remove pad-to-72-bytes logic in tagger xmit (fixes DENG Qingfang's
    suggestion); no longer needed as we set CPU minimum RX size to 64
    bytes
  - use mutex to protect MIB counter access instead of a spinlock (fixes
    Jakub's feedback on v3 statistics refactoring)

v2 -> v3:
  - move IRQ setup earlier in probe per Florian's suggestion
  - fix compilation error on some archs due to FIELD_PREP use in v1
  - follow Jakub's suggestion and use the standard ethtool stats API;
    NOTE: new patch in the series for relevant DSA plumbing
  - following the stats change, it became apparent that the rtl8366
    helper library is no longer that helpful; scrap it and implement
    the ethtool ops specifically for this chip

v1 -> v2:
  - drop DSA port type checks during MAC configuration
  - use OF properties to configure RGMII TX/RX delay
  - don't set default fwd_offload_mark if packet is trapped to CPU
  - remove port mapping macros
  - update device tree bindings documentation with an example
  - cosmetic changes to the tagging driver using FIELD_* macros

[1] https://lore.kernel.org/netdev/20210822193145.1312668-1-alvin@pqrs.dk/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: realtek: add support for RTL8365MB-VC internal PHYs

The RTL8365MB-VC ethernet switch controller has 4 internal PHYs for its
user-facing ports. All that is needed is to let the PHY driver core
pick up the IRQ made available by the switch driver.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>