review.tizen.org Git - platform/kernel/linux-starfive.git/log

rtnetlink: Add RTNH_F_TRAP flag

The flag indicates to user space that the nexthop is not programmed to
forward packets in hardware, but rather to trap them to the CPU. This is
needed, for example, when the MAC of the nexthop neighbour is not
resolved and packets should reach the CPU to trigger neighbour
resolution.

The flag will be used in subsequent patches by netdevsim to test nexthop
objects programming to device drivers and in the future by mlxsw as
well.

Changes since RFC:
* Reword commit message

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: vxlan: Convert to new notification info

Convert the sole listener of the nexthop notification chain (the VXLAN
driver) to the new notification info.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Prepare new notification info

Prepare the new notification information so that it could be passed to
listeners in the new patch.

Changes since RFC:
* Add a blank line in __nh_notifier_single_info_init()

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Pass extack to nexthop notifier

The next patch will add extack to the notification info. This allows
listeners to veto notifications and communicate the reason to user space.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Add nexthop notification data structures

Add data structures that will be used for nexthop replace and delete
notifications in the previously introduced nexthop notification chain.

New data structures are added instead of passing the existing nexthop
code structures directly for several reasons.

First, the existing structures encode a lot of bookkeeping information
which is irrelevant for listeners of the notification chain.

Second, the existing structures can be changed without worrying about
introducing regressions in listeners since they are not accessed
directly by them.

Third, listeners of the notification chain do not need to each parse the
relatively complex nexthop code structures. They are passing the
required information in a simplified way.

Note that a single 'has_encap' bit is added instead of the actual
encapsulation information since current listeners do not support such
nexthops.

Changes since RFC:
* s/is_encap/has_encap/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2020-11-03' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-11-03

This series includes updates to mlx5 software steering component.

1) Few improvements in the DR area, such as removing unneeded checks,
  renaming to better general names, refactor in some places, etc.

2) Software steering (DR) Memory management improvements

This patch series contains SW Steering memory management improvements:
using buddy allocator instead of an existing bucket allocator, and
several other optimizations.

The buddy system is a memory allocation and management algorithm
that manages memory in power of two increments.

The algorithm is well-known and well-described, such as here:
https://en.wikipedia.org/wiki/Buddy_memory_allocation

Linux uses this algorithm for managing and allocating physical pages,
as described here:
https://www.kernel.org/doc/gorman/html/understand/understand009.html

In our case, although the algorithm in principal is similar to the
Linux physical page allocator, the "building blocks" and the circumstances
are different: in SW steering, buddy allocator doesn't really allocates
a memory, but rather manages ICM (Interconnect Context Memory) that was
previously allocated and registered.

The ICM memory that is used in SW steering is always power
of 2 (order), so buddy system is a good fit for this.

Patches in this series:

[PATH 4] net/mlx5: DR, Add buddy allocator utilities
  This patch adds a modified implementation of a well-known buddy allocator,
  adjusted for SW steering needs: the algorithm in principal is similar to
  the Linux physical page allocator, but in our case buddy allocator doesn't
  really allocate a memory, but rather manages ICM memory that was previously
  allocated and registered.

[PATH 5] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management
  This patch changes ICM management of SW steering to use buddy-system mechanism
  Instead of the previous bucket management.

[PATH 6] net/mlx5: DR, Sync chunks only during free
  This patch makes syncing happen only when freeing memory chunks.

[PATH 7] net/mlx5: DR, ICM memory pools sync optimization
  This patch adds tracking of pool's "hot" memory and makes the
  check whether steering sync is required much shorter and faster.

[PATH 8] net/mlx5: DR, Free buddy ICM memory if it is unused
  This patch adds tracking buddy's used ICM memory,
  and frees the buddy if all its memory becomes unused.

3) Misc code cleanups

* tag 'mlx5-updates-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net: mlx5: Replace in_irq() usage
  net/mlx5: Cleanup kernel-doc warnings
  net/mlx4: Cleanup kernel-doc warnings
  net/mlx5e: Validate stop_room size upon user input
  net/mlx5: DR, Free unused buddy ICM memory
  net/mlx5: DR, ICM memory pools sync optimization
  net/mlx5: DR, Sync chunks only during free
  net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets
  net/mlx5: DR, Add buddy allocator utilities
  net/mlx5: DR, Rename matcher functions to be more HW agnostic
  net/mlx5: DR, Rename builders HW specific names
  net/mlx5: DR, Remove unused member of action struct
====================

Link: https://lore.kernel.org/r/20201105201242.21716-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/usb/r8153_ecm: support ECM mode for RTL8153

Support ECM mode based on cdc_ether with relative mii functions,
when CONFIG_USB_RTL8152 is not set, or the device is not supported
by r8152 driver.

Both r8152 and r8153_ecm would check the return value of
rtl8152_get_version() in porbe(). If rtl8152_get_version()
return none zero value, the r8152 is used for the device
with vendor mode. Otherwise, the r8153_ecm is used for the
device with ECM mode.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/1394712342-15778-392-Taiwan-albertk@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Add mhi-net driver

This patch adds a new network driver implementing MHI transport for
network packets. Packets can be in any format, though QMAP (rmnet)
is the usual protocol (flow control + PDN mux).

It support two MHI devices, IP_HW0 which is, the path to the IPA
(IP accelerator) on qcom modem, And IP_SW0 which is the software
driven IP path (to modem CPU).

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1604424234-24446-2-git-send-email-loic.poulain@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bus: mhi: Add mhi_queue_is_full function

This function can be used by client driver to determine whether it's
possible to queue new elements in a channel ring.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1604424234-24446-1-git-send-email-loic.poulain@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-add-support-for-shared-interrupts-part-1'

Ioana Ciornei says:

====================
net: phy: add support for shared interrupts (part 1)

This patch set aims to actually add support for shared interrupts in
phylib and not only for multi-PHY devices. While we are at it,
streamline the interrupt handling in phylib.

For a bit of context, at the moment, there are multiple phy_driver ops
that deal with this subject:

- .config_intr() - Enable/disable the interrupt line.

- .ack_interrupt() - Should quiesce any interrupts that may have been
  fired.  It's also used by phylib in conjunction with .config_intr() to
  clear any pending interrupts after the line was disabled, and before
  it is going to be enabled.

- .did_interrupt() - Intended for multi-PHY devices with a shared IRQ
  line and used by phylib to discern which PHY from the package was the
  one that actually fired the interrupt.

- .handle_interrupt() - Completely overrides the default interrupt
  handling logic from phylib. The PHY driver is responsible for checking
  if any interrupt was fired by the respective PHY and choose
  accordingly if it's the one that should trigger the link state machine.

From my point of view, the interrupt handling in phylib has become
somewhat confusing with all these callbacks that actually read the same
PHY register - the interrupt status.  A more streamlined approach would
be to just move the responsibility to write an interrupt handler to the
driver (as any other device driver does) and make .handle_interrupt()
the only way to deal with interrupts.

Another advantage with this approach would be that phylib would gain
support for shared IRQs between different PHY (not just multi-PHY
devices), something which at the moment would require extending every
PHY driver anyway in order to implement their .did_interrupt() callback
and duplicate the same logic as in .ack_interrupt(). The disadvantage
of making .did_interrupt() mandatory would be that we are slightly
changing the semantics of the phylib API and that would increase
confusion instead of reducing it.

What I am proposing is the following:

- As a first step, make the .ack_interrupt() callback optional so that
  we do not break any PHY driver amid the transition.

- Every PHY driver gains a .handle_interrupt() implementation that, for
  the most part, would look like below:

irq_status = phy_read(phydev, INTR_STATUS);
if (irq_status < 0) {
phy_error(phydev);
return IRQ_NONE;
}

if (!(irq_status & irq_mask))
return IRQ_NONE;

phy_trigger_machine(phydev);

return IRQ_HANDLED;

- Remove each PHY driver's implementation of the .ack_interrupt() by
  actually taking care of quiescing any pending interrupts before
  enabling/after disabling the interrupt line.

- Finally, after all drivers have been ported, remove the
  .ack_interrupt() and .did_interrupt() callbacks from phy_driver.

This patch set is part 1 and it addresses the changes needed in phylib
and 7 PHY drivers. The rest can be found on my Github branch here:
https://github.com/IoanaCiornei/linux/commits/phylib-shared-irq

I do not have access to most of these PHY's, therefore I Cc-ed the
latest contributors to the individual PHY drivers in order to have
access, hopefully, to more regression testing.

Changes in v2:
- Rework the .handle_interrupt() implementation for each driver so that
   only the enabled interrupts are taken into account when
   IRQ_NONE/IRQ_HANDLED it returned. The main idea is so that we avoid
   falsely blaming a device for triggering an interrupt when this is not
   the case.
   The only devices for which I was unable to make this adjustment were
   the BCM8706, BCM8727, BCMAC131 and BCM5241 since I do not have access
   to their datasheets.
- I also updated the pseudo-code added in the cover-letter so that it's
   more clear how a .handle_interrupt() callback should look like.
====================

Tested-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201101125114.1316879-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Willy Liu <willy.liu@realtek.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Willy Liu <willy.liu@realtek.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: add genphy_handle_interrupt_no_ack()

It seems there are cases where the interrupts are handled by another
entity (ie an IRQ controller embedded inside the PHY) and do not need
any other interraction from phylib. For this kind of PHYs, like the
RTL8366RB, add the genphy_handle_interrupt_no_ack() function which just
triggers the link state machine.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: davicom: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: davicom: implement generic .handle_interrupt() calback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: cicada: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: cicada: implement the generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: broadcom: remove use of ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Michael Walle <michael@walle.cc>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: broadcom: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mscc: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com> # VSC8514
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mscc: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Also, remove the .did_interrupt() callback since it's not anymore used.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com> # VSC8514
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mscc: use phy_trigger_machine() to notify link change

According to the comment describing the phy_mac_interrupt() function, it
it intended to be used by MAC drivers which have noticed a link change
thus its use in the mscc PHY driver is improper and, most probably, was
added just because phy_trigger_machine() was not exported.
Now that we have acces to trigger the link state machine, use directly
the phy_trigger_machine() function to notify a link change detected by
the PHY driver.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: at803x: remove the use of .ack_interrupt()

In preparation of removing the .ack_interrupt() callback, we must replace
its occurrences (aka phy_clear_interrupt), from the 2 places where it is
called from (phy_enable_interrupts and phy_disable_interrupts), with
equivalent functionality.

This means that clearing interrupts now becomes something that the PHY
driver is responsible of doing, before enabling interrupts and after
clearing them. Make this driver follow the new contract.

Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Michael Walle <michael@walle.cc>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: at803x: implement generic .handle_interrupt() callback

In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt() ) is confusing so let the PHY
driver implement directly an IRQ handler like any other device driver.
Make this driver follow the new convention.

Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Michael Walle <michael@walle.cc>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: make .ack_interrupt() optional

As a first step into making phylib and all PHY drivers to actually
have support for shared IRQs, make the .ack_interrupt() callback
optional.

After all drivers have been moved to implement the generic
interrupt handle, the phy_drv_supports_irq() check will be
changed again to only require the .handle_interrupts() callback.

Cc: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: Andre Edich <andre.edich@microchip.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Divya Koppera <Divya.Koppera@microchip.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Marco Felsch <m.felsch@pengutronix.de>
Cc: Marek Vasut <marex@denx.de>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Mathias Kresin <dev@kresin.me>
Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Michael Walle <michael@walle.cc>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Nisar Sayed <Nisar.Sayed@microchip.com>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Philippe Schenker <philippe.schenker@toradex.com>
Cc: Willy Liu <willy.liu@realtek.com>
Cc: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: add a shutdown procedure

In case of a board which uses a shared IRQ we can easily end up with an
IRQ storm after a forced reboot.

For example, a 'reboot -f' will trigger a call to the .shutdown()
callbacks of all devices. Because phylib does not implement that hook,
the PHY is not quiesced, thus it can very well leave its IRQ enabled.

At the next boot, if that IRQ line is found asserted by the first PHY
driver that uses it, but _before_ the driver that is _actually_ keeping
the shared IRQ asserted is probed, the IRQ is not going to be
acknowledged, thus it will keep being fired preventing the boot process
of the kernel to continue. This is even worse when the second PHY driver
is a module.

To fix this, implement the .shutdown() callback and disable the
interrupts if these are used.

Note that we are still susceptible to IRQ storms if the previous kernel
exited with a panic or if the bootloader left the shared IRQ active, but
there is absolutely nothing we can do about these cases.

Cc: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: Andre Edich <andre.edich@microchip.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Divya Koppera <Divya.Koppera@microchip.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Marco Felsch <m.felsch@pengutronix.de>
Cc: Marek Vasut <marex@denx.de>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Mathias Kresin <dev@kresin.me>
Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Michael Walle <michael@walle.cc>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Nisar Sayed <Nisar.Sayed@microchip.com>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Philippe Schenker <philippe.schenker@toradex.com>
Cc: Willy Liu <willy.liu@realtek.com>
Cc: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: export phy_error and phy_trigger_machine

These functions are currently used by phy_interrupt() to either signal
an error condition or to trigger the link state machine. In an attempt
to actually support shared PHY IRQs, export these two functions so that
the actual PHY drivers can use them.

Cc: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: Andre Edich <andre.edich@microchip.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Baruch Siach <baruch@tkos.co.il>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Divya Koppera <Divya.Koppera@microchip.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Kavya Sree Kotagiri <kavyasree.kotagiri@microchip.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Marco Felsch <m.felsch@pengutronix.de>
Cc: Marek Vasut <marex@denx.de>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Mathias Kresin <dev@kresin.me>
Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Michael Walle <michael@walle.cc>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Nisar Sayed <Nisar.Sayed@microchip.com>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Philippe Schenker <philippe.schenker@toradex.com>
Cc: Willy Liu <willy.liu@realtek.com>
Cc: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: bring inet(6)_skb_parm back to sctp_input_cb

inet(6)_skb_parm was removed from sctp_input_cb by Commit a1dd2cf2f1ae
("sctp: allow changing transport encap_port by peer packets"), as it
thought sctp_input_cb->header is not used any more in SCTP.

syzbot reported a crash:

  [ ] BUG: KASAN: use-after-free in decode_session6+0xe7c/0x1580
  [ ]
  [ ] Call Trace:
  [ ]  <IRQ>
  [ ]  dump_stack+0x107/0x163
  [ ]  kasan_report.cold+0x1f/0x37
  [ ]  decode_session6+0xe7c/0x1580
  [ ]  __xfrm_policy_check+0x2fa/0x2850
  [ ]  sctp_rcv+0x12b0/0x2e30
  [ ]  sctp6_rcv+0x22/0x40
  [ ]  ip6_protocol_deliver_rcu+0x2e8/0x1680
  [ ]  ip6_input_finish+0x7f/0x160
  [ ]  ip6_input+0x9c/0xd0
  [ ]  ipv6_rcv+0x28e/0x3c0

It was caused by sctp_input_cb->header/IP6CB(skb) still used in sctp rx
path decode_session6() but some members overwritten by sctp6_rcv().

This patch is to fix it by bring inet(6)_skb_parm back to sctp_input_cb
and not overwriting it in sctp4/6_rcv() and sctp_udp_rcv().

Reported-by: syzbot+5be8aebb1b7dfa90ef31@syzkaller.appspotmail.com
Fixes: a1dd2cf2f1ae ("sctp: allow changing transport encap_port by peer packets")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/136c1a7a419341487c504be6d1996928d9d16e02.1604472932.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'hirschmann-hellcreek-dsa-driver'

Kurt Kanzenbach says:

====================
Hirschmann Hellcreek DSA driver

this series adds a DSA driver for the Hirschmann Hellcreek TSN switch
IP. Characteristics of that IP:

* Full duplex Ethernet interface at 100/1000 Mbps on three ports
* IEEE 802.1Q-compliant Ethernet Switch
* IEEE 802.1Qbv Time-Aware scheduling support
* IEEE 1588 and IEEE 802.1AS support

That IP is used e.g. in

https://www.arrow.com/en/campaigns/arrow-kairos

Due to the hardware setup the switch driver is implemented using DSA. A special
tagging protocol is leveraged. Furthermore, this driver supports PTP and
hardware timestamping.

This work is part of the AccessTSN project: https://www.accesstsn.com/

The previous versions can be found here:

* https://lkml.kernel.org/netdev/20200618064029.32168-1-kurt@linutronix.de/
* https://lkml.kernel.org/netdev/20200710113611.3398-1-kurt@linutronix.de/
* https://lkml.kernel.org/netdev/20200723081714.16005-1-kurt@linutronix.de/
* https://lkml.kernel.org/netdev/20200820081118.10105-1-kurt@linutronix.de/
* https://lkml.kernel.org/netdev/20200901125014.17801-1-kurt@linutronix.de/
* https://lkml.kernel.org/netdev/20200904062739.3540-1-kurt@linutronix.de/
* https://lkml.kernel.org/netdev/20201004112911.25085-1-kurt@linutronix.de/
* https://lkml.kernel.org/netdev/20201028074221.29326-1-kurt@linutronix.de/

Changes since v7:

* Simplify tagging code (rebase to net-next)
* Pass info instead of ptr (Florian Fainelli)
* Fix yamllint warnings (Rob Herring)

Changes since v6:

* Add .tail_tag = true (Vladimir Oltean)
* Fix vlan_filtering=0 bridges (Vladimir Oltean)
* Enforce restrictions (Vladimir Oltean)
* Sort stuff alphabetically (Vladimir Oltean)
* Rename hellcreek.yaml to hirschmann,hellcreek.yaml
* Typo fixes

Changes since v5:

* Implement configure_vlan_while_not_filtering behavior (Vladimir Oltean)
* Minor cleanups

Changes since v4:

* Fix W=1 compiler warnings (kernel test robot)
* Add tags

Changes since v3:

* Drop TAPRIO support (David Miller)
   => Switch to mutexes due to the lack of hrtimers
* Use more specific compatible strings and add platform data (Andrew Lunn)
* Fix Kconfig ordering (Andrew Lunn)

Changes since v2:

* Make it compile by getting all requirements merged first (Jakub Kicinski, David Miller)
* Use "tsn" for TSN register set (Rob Herring)
* Fix DT binding issues (Rob Herring)

Changes since v1:

* Code simplifications (Florian Fainelli, Vladimir Oltean)
* Fix issues with hellcreek.yaml bindings (Florian Fainelli)
* Clear reserved field in ptp v2 event messages (Richard Cochran)
* Make use of generic ptp parsing function (Richard Cochran, Vladimir Oltean)
* Fix Kconfig (Florian Fainelli)
* Add tags (Florian Fainelli, Rob Herring, Richard Cochran)

Changes since RFC ordered by reviewers:

* Andrew Lunn
   * Use dev_dbg for debug messages
   * Get rid of __ function names where possible
   * Use reverse xmas tree variable ordering
   * Remove redundant/useless checks
   * Improve comments e.g. for PTP
   * Fix Kconfig ordering
   * Make LED handling more generic and provide info via DT
   * Setup advertisement of PHYs according to hardware
   * Drop debugfs patch
* Jakub Kicinski
   * Fix compiler warnings
* Florian Fainelli
   * Switch to YAML DT bindings
* Richard Cochran
   * Fix typo
   * Add missing NULL checks
====================

Link: https://lore.kernel.org/r/20201103071101.3222-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: Add documentation for Hellcreek switches

Add basic documentation and example.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: Add vendor prefix for Hirschmann

Hirschmann is building devices for automation and networking. Add them to the
vendor prefixes.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: hellcreek: Add PTP status LEDs

The switch has two controllable I/Os which are usually connected to LEDs. This
is useful to immediately visually see the PTP status.

These provide two signals:

* is_gm

   This LED can be activated if the current device is the grand master in that
   PTP domain.

* sync_good

   This LED can be activated if the current device is in sync with the network
   time.

Expose these via the LED framework to be controlled via user space
e.g. linuxptp.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: hellcreek: Add support for hardware timestamping

The switch has the ability to take hardware generated time stamps per port for
PTPv2 event messages in Rx and Tx direction. That is useful for achieving needed
time synchronization precision for TSN devices/switches. So add support for it.

There are two directions:

* RX

   The switch has a single register per port to capture a timestamp. That
   mechanism is not used due to correlation problems. If the software processing
   is too slow and a PTPv2 event message is received before the previous one has
   been processed, false timestamps will be captured. Therefore, the switch can
   do "inline" timestamping which means it can insert the nanoseconds part of
   the timestamp directly into the PTPv2 event message. The reserved field (4
   bytes) is leveraged for that. This might not be in accordance with (older)
   PTP standards, but is the only way to get reliable results.

* TX

   In Tx direction there is no correlation problem, because the software and the
   driver has to ensure that only one event message is "on the fly". However,
   the switch provides also a mechanism to check whether a timestamp is
   lost. That can only happen when a timestamp is read and at this point another
   message is timestamped. So, that lost bit is checked just in case to indicate
   to the user that the driver or the software is somewhat buggy.

Signed-off-by: Kamil Alkhouri <kamil.alkhouri@hs-offenburg.de>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: hellcreek: Add PTP clock support

The switch has internal PTP hardware clocks. Add support for it. There are three
clocks:

* Synchronized
* Syntonized
* Free running

Currently the synchronized clock is exported to user space which is a good
default for the beginning. The free running clock might be exported later
e.g. for implementing 802.1AS-2011/2020 Time Aware Bridges (TAB). The switch
also supports cross time stamping for that purpose.

The implementation adds support setting/getting the time as well as offset and
frequency adjustments. However, the clock only holds a partial timeofday
timestamp. This is why we track the seconds completely in software (see overflow
work and last_ts).

Furthermore, add the PTP multicast addresses into the FDB to forward that
packages only to the CPU port where they are processed by a PTP program.

Signed-off-by: Kamil Alkhouri <kamil.alkhouri@hs-offenburg.de>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: Add DSA driver for Hirschmann Hellcreek switches

Add a basic DSA driver for Hirschmann Hellcreek switches. Those switches are
implementing features needed for Time Sensitive Networking (TSN) such as support
for the Time Precision Protocol and various shapers like the Time Aware Shaper.

This driver includes basic support for networking:

* VLAN handling
* FDB handling
* Port statistics
* STP
* Phylink

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: Give drivers the chance to veto certain upper devices

Some switches rely on unique pvids to ensure port separation in
standalone mode, because they don't have a port forwarding matrix
configurable in hardware. So, setups like a group of 2 uppers with the
same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN
100 to be autonomously forwarded between these switch ports, in spite
of there being no bridge between swp0 and swp1.

These drivers need to prevent this from happening. They need to have
VLAN filtering enabled in standalone mode (so they'll drop frames tagged
with unknown VLANs) and they can only accept an 8021q upper on a port as
long as it isn't installed on any other port too. So give them the
chance to veto bad user requests.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
[Kurt: Pass info instead of ptr]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: Add tag handling for Hirschmann Hellcreek switches

The Hirschmann Hellcreek TSN switches have a special tagging protocol for frames
exchanged between the CPU port and the master interface. The format is a one
byte trailer indicating the destination or origin port.

It's quite similar to the Micrel KSZ tagging. That's why the implementation is
based on that code.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mlx5: Replace in_irq() usage

mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be
acquired and released with spin_[un]lock() or the irq saving/restoring
variants.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
seperated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

mlx5_eq_async_int() knows the context via the action argument already so
using it for the lock variant decision is a straight forward replacement
for in_irq().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Cleanup kernel-doc warnings

$ git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/ | \
xargs scripts/kernel-doc -none

drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_I2C' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_DONTCARE' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:118:
warning: Function parameter or member 'cb_arg' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Function parameter or member 'conn' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Excess function parameter 'fdev' description ...

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

net/mlx4: Cleanup kernel-doc warnings

$ git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/ | \
xargs scripts/kernel-doc -none

drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144:
warning: Function parameter or member 'in_param' not described ...
drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144:
warning: Excess function parameter 'out_param' description ...

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Validate stop_room size upon user input

Stop room is a space that may be taken by WQEs in the SQ during a packet
transmit. It is used to check if next packet has enough room in the SQ.
Stop room guarantees this packet can be served and if not, the queue is
stopped, so no more packets are passed to the driver until it's ready.

Currently, stop_room size is calculated and validated upon tx queues
allocation. This makes it impossible to know if user provided valid
input for certain parameters when interface is down.

Instead, store stop_room in mlx5e_sq_param and create
mlx5e_validate_params(), to validate its fields upon user input even
when the interface is down.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Free unused buddy ICM memory

Track buddy's used ICM memory, and free it if all
of the buddy's memory bacame unused.
Do this only for STEs.
MODIFY_ACTION buddies are much smaller, so in case there
is a large amount of modify_header actions, which result
in large amount of MODIFY_ACTION buddies, doing this
cleanup during sync will result in performance hit while
not freeing significant amount of memory.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, ICM memory pools sync optimization

Track the pool's hot ICM memory when freeing/allocating
chunk, so that when checking if the sync is required, just
check if the pool hot memory has reached the sync threshold.

Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Sync chunks only during free

When freeing chunks, we want to sync the steering
so that all the "hot" memory will be written to ICM
and all the chunks that are in the hot_list will be
actually destroyed.
When allocating from the pool, we don't have a need
to sync the steering, as we're not freeing anything,
and sync might just hurt the performance in terms of
flow-per-second offloaded.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets

Till now in order to manage the ICM memory we used bucket
mechanism, which kept a bucket per specified size (sizes were
between 1 block to 2^21 blocks).

Now changing that with buddy-system mechanism, which gives us much
more flexible way to manage the ICM memory.
Its biggest advantage over the bucket is by using the same ICM memory
area for all the sizes of blocks, which reduces the memory consumption.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add buddy allocator utilities

Add implementation of SW Steering variation of buddy allocator.

The buddy system for ICM memory uses 2 main data structures:
- Bitmap per order, that keeps the current state of allocated
blocks for this order
- Indicator for the number of available blocks per each order

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Rename matcher functions to be more HW agnostic

Remove flex parser from the matcher function names since
the matcher should not be aware of such HW specific details.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Rename builders HW specific names

We will support multiple STE versions.
The existing naming is not suitable for newer versions.
Removed the HW specific details and renamed with a more
general names.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Remove unused member of action struct

Struct mlx5dr_action doesn't use this member

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net: sched: implement action-specific terse dump

Allow user to request action terse dump with new flag value
TCA_FLAG_TERSE_DUMP. Only output essential action info in terse dump (kind,
stats, index and cookie, if set by the user when creating the action). This
is different from filter terse dump where index is excluded (filter can be
identified by its own handle).

Move tcf_action_dump_terse() function to the beginning of source file in
order to call it from tcf_dump_walker().

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/20201102201243.287486-1-vlad@buslov.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Move existing bridge packet reject infra to nf_reject_{ipv4,ipv6}.c
   from Jose M. Guisado.

2) Consolidate nft_reject_inet initialization and dump, also from Jose.

3) Add the netdev reject action, from Jose.

4) Allow to combine the exist flag and the destroy command in ipset,
   from Joszef Kadlecsik.

5) Expose bucket size parameter for hashtables, also from Jozsef.

6) Expose the init value for reproducible ipset listings, from Jozsef.

7) Use __printf attribute in nft_request_module, from Andrew Lunn.

8) Allow to use reject from the inet ingress chain.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nft_reject_inet: allow to use reject from inet ingress
  netfilter: nftables: Add __printf() attribute
  netfilter: ipset: Expose the initval hash parameter to userspace
  netfilter: ipset: Add bucketsize parameter to all hash types
  netfilter: ipset: Support the -exist flag with the destroy command
  netfilter: nft_reject: add reject verdict support for netdev
  netfilter: nft_reject: unify reject init and dump into nft_reject
  netfilter: nf_reject: add reject skbuff creation helpers
====================

Link: https://lore.kernel.org/r/20201104141149.30082-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-miscellaneous-mptcp-fixes'

Mat Martineau says:

====================
mptcp: Miscellaneous MPTCP fixes

This is a collection of small fixup and minor enhancement patches that
have accumulated in the MPTCP tree while net-next was closed. These are
prerequisites for larger changes we have queued up.

Patch 1 refines receive buffer autotuning.

Patches 2 and 4 are some minor locking and refactoring changes.

Patch 3 improves GRO and RX coalescing with MPTCP skbs.

Patches 5-7 add a sysctl for tuning ADD_ADDR retransmission timeout,
corresponding test code, and documentation.

v2: Add sysctl documentation and fix signoff tags.
====================

Link: https://lore.kernel.org/r/20201103190509.27416-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: add ADD_ADDR timeout test case

This patch added the test case for retransmitting ADD_ADDR when timeout
occurs. It set NS1's add_addr_timeout to 1 second, and drop NS2's ADD_ADDR
echo packets.

Here we need to slow down the transfer process of all data to let the
ADD_ADDR suboptions can be retransmitted three times. So we added a new
parameter "speed" for do_transfer, it can be set with fast or slow.

We also added three new optional parameters for run_tests, and dropped
run_remove_tests function.

Since we added the netfilter rules in this test case, we need to update
the "config" file.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: networking: mptcp: Add MPTCP sysctl entries

Describe the two MPTCP sysctls, what the values mean, and the default
settings.

Acked-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add a new sysctl add_addr_timeout

This patch added a new sysctl, named add_addr_timeout, to control the
timeout value (in seconds) of the ADD_ADDR retransmission.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: split mptcp_clean_una function

mptcp_clean_una() will wake writers in case memory could be reclaimed.
When called from mptcp_sendmsg the wakeup code isn't needed.

Move the wakeup to a new helper and then use that from the mptcp worker.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: propagate MPTCP skb extensions on xmit splits

When the TCP stack splits a packet on the write queue, the tail
half currently lose the associated skb extensions, and will not
carry the DSM on the wire.

The above does not cause functional problems and is allowed by
the RFC, but interact badly with GRO and RX coalescing, as possible
candidates for aggregation will carry different TCP options.

This change tries to improve the MPTCP behavior, propagating the
skb extensions on split.

Additionally, we must prevent the MPTCP stack from updating the
mapping after the split occur: that will both violate the RFC and
fool the reader.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: use _fast lock version in __mptcp_move_skbs

The function is short and won't sleep, so this can use the _fast version.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: adjust mptcp receive buffer limit if subflow has larger one

In addition to tcp autotuning during read, it may also increase the
receive buffer in tcp_clamp_window().

In this case, mptcp should adjust its receive buffer size as well so
it can move all pending skbs from the subflow socket to the mptcp socket.

At this time, TCP can have more skbs ready for processing than what the
mptcp receive buffer size allows.

In the mptcp case, the receive window announced is based on the free
space of the mptcp parent socket instead of the individual subflows.

Following the subflow allows mptcp to grow its receive buffer.

This is especially noticeable for loopback traffic where two skbs are
enough to fill the initial receive window.

In mptcp_data_ready() we do not hold the mptcp socket lock, so modifying
mptcp_sk->sk_rcvbuf is racy. Do it when moving skbs from subflow to
mptcp socket, both sockets are locked in this case.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enetc: Remove Tx checksumming offload code

Tx checksumming has been defeatured and completely removed
from the h/w reference manual. Made a little cleanup for the
TSE case as this is complementary code.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201103140213.3294-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa_eth: use false and true for bool variables

Fix coccicheck warnings:

./dpaa_eth.c:2549:2-22: WARNING: Assignment of 0/1 to bool variable
./dpaa_eth.c:2562:2-22: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://lore.kernel.org/r/1604405100-33255-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: adin: implement cable-test support

The ADIN1300/ADIN1200 support cable diagnostics using TDR.

The cable fault detection is automatically run on all four pairs looking at
all combinations of pair faults by first putting the PHY in standby (clear
the LINK_EN bit, PHY_CTRL_3 register, Address 0x0017) and then enabling the
diagnostic clock (set the DIAG_CLK_EN bit, PHY_CTRL_1 register, Address
0x0012).

Cable diagnostics can then be run (set the CDIAG_RUN bit in the
CDIAG_RUN register, Address 0xBA1B). The results are reported for each pair
in the cable diagnostics results registers, CDIAG_DTLD_RSLTS_0,
CDIAG_DTLD_RSLTS_1, CDIAG_DTLD_RSLTS_2, and CDIAG_DTLD_RSLTS_3, Address
0xBA1D to Address 0xBA20).

The distance to the first fault for each pair is reported in the cable
fault distance registers, CDIAG_FLT_DIST_0, CDIAG_FLT_DIST_1,
CDIAG_FLT_DIST_2, and CDIAG_FLT_DIST_3, Address 0xBA21 to Address 0xBA24).

This change implements support for this using phylib's cable-test support.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20201103074436.93790-2-alexandru.ardelean@analog.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: adin: disable diag clock & disable standby mode in config_aneg

When the PHY powers up, the diagnostics clock isn't enabled (bit 2 in
register PHY_CTRL_1 (0x0012)).
Also, the PHY is not in standby mode, so bit 13 in PHY_CTRL_3 (0x0017) is
always set at power up.

The standby mode and the diagnostics clock are both meant to be for the
cable diagnostics feature of the PHY (in phylib this would be equivalent to
the cable-test support), and for the frame-generator feature of the PHY.

In standby mode, the PHY doesn't negotiate links or manage links.

To use the cable diagnostics/test (or frame-generator), the PHY must be
first set in standby mode, so that the link operation doesn't interfere.
Then, the diagnostics clock must be enabled.

For the cable-test feature, when the operation finishes, the PHY goes into
PHY_UP state, and the config_aneg hook is called.

For the ADIN PHY, we need to make sure that during autonegotiation
configuration/setup the PHY is removed from standby mode and the
diagnostics clock is disabled, so that normal operation is resumed.

This change does that by moving the set of the ADIN1300_LINKING_EN bit (2)
in the config_aneg (to disable standby mode).
Previously, this was set in the downshift setup, because the downshift
retry value and the ADIN1300_LINKING_EN are in the same register.

And the ADIN1300_DIAG_CLK_EN bit (13) is cleared, to disable the
diagnostics clock.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20201103074436.93790-1-alexandru.ardelean@analog.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-net-bridge-add-tests-for-mldv2'

Nikolay Aleksandrov says:

====================
selftests: net: bridge: add tests for MLDv2

This is the second selftests patch-set for the new multicast functionality
which adds tests for the bridge's MLDv2 support. The tests use full
precooked packets which are sent via mausezahn and the resulting state
after each test is checked for proper X,Y sets, (*,G) source list, source
list entry timers, (S,G) existence and flags, packet forwarding and
blocking, exclude group expiration and (*,G) auto-add. The first 3 patches
factor out common functions which are used by IGMPv3 tests in lib.sh and
add support for IPv6 test UDP packet, then patch 4 adds the first test with
the initial MLDv2 setup.
The following new tests are added:
- base case: MLDv2 report ff02::cc is_include
- include -> allow report
- include -> is_include report
- include -> is_exclude report
- include -> to_exclude report
- exclude -> allow report
- exclude -> is_include report
- exclude -> is_exclude report
- exclude -> to_exclude report
- include -> block report
- exclude -> block report
- exclude timeout (move to include + entry deletion)
- S,G port entry automatic add to a *,G,exclude port

The variable names and set notation are the same as per RFC 3810,
for more information check RFC 3810 sections 2.3 and 7.
====================

Link: https://lore.kernel.org/r/20201103172412.1044840-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 *,g auto-add

When we have *,G ports in exclude mode and a new S,G,port is added
the kernel has to automatically create an S,G entry for each exclude
port to get proper forwarding.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 exclude timeout

Test that when a group in exclude mode expires it changes mode to
include and the blocked entries are deleted.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 exc -> block report

The test checks for the following case:
   Router State  Report Received  New Router State     Actions
   EXCLUDE (X,Y)   BLOCK (A)      EXCLUDE (X+(A-Y),Y)  (A-X-Y) =
                                                            Filter Timer
                                                       Send Q(MA,A-Y)

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 inc -> block report

The test checks for the following case:
Router State Report Received New Router State Actions
INCLUDE (A) BLOCK (B) INCLUDE (A) Send Q(MA,A*B)

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 exc -> to_exclude report

The test checks for the following case:
   Router State  Report Received  New Router State     Actions
   EXCLUDE (X,Y)   TO_EX (A)      EXCLUDE (A-Y,Y*A)    (A-X-Y) =
                                                            Filter Timer
                                                       Delete (X-A)
                                                       Delete (Y-A)
                                                       Send Q(MA,A-Y)
                                                       Filter Timer=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 exc -> is_exclude report

The test checks for the following case:
   Router State  Report Received  New Router State     Actions
   EXCLUDE (X,Y)     IS_EX (A)     EXCLUDE (A-Y, Y*A)  (A-X-Y)=MALI
                                                       Delete (X-A)
                                                       Delete (Y-A)
                                                       Filter Timer=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 exc -> is_include report

The test checks for the following case:
Router State Report Received New Router State Actions
EXCLUDE (X,Y) IS_IN (A) EXCLUDE (X+A, Y-A) (A)=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 exc -> allow report

The test checks for the following case:
Router State Report Received New Router State Actions
EXCLUDE (X,Y) ALLOW (A) EXCLUDE (X+A,Y-A) (A)=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 inc -> to_exclude report

The test checks for the following case:
   Router State  Report Received  New Router State     Actions
   INCLUDE (A)     TO_EX (B)      EXCLUDE (A*B,B-A)    (B-A)=0
                                                       Delete (A-B)
                                                       Send Q(MA,A*B)
                                                       Filter Timer=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 inc -> is_exclude report

The test checks for the following case:
   Router State  Report Received  New Router State     Actions
   INCLUDE (A)       IS_EX (B)     EXCLUDE (A*B, B-A)  (B-A)=0
                                                       Delete (A-B)
                                                       Filter Timer=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 inc -> is_include report

The test checks for the following case:
Router State Report Received New Router State Actions
INCLUDE (A) IS_IN (B) INCLUDE (A+B) (B)=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for mldv2 inc -> allow report

The test checks for the following case:
Router State Report Received New Router State Actions
INCLUDE (A) ALLOW (B) INCLUDE (A+B) (B)=MALI

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add initial MLDv2 include test

Add the initial setup for MLDv2 tests with the first test of a simple
is_include report. For MLDv2 we need to setup the bridge properly and we
also send the full precooked packets instead of relying on mausezahn to
fill in some parts. For verification we use the generic S,G state checking
functions from lib.sh.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: factor out and rename sg state functions

Factor out S,G entry state checking functions for existence, forwarding,
blocking and timer to lib.sh so they can be later used by MLDv2 tests.
Add brmcast_ suffix to their name to make the relation to the bridge
explicit.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: lib: add support for IPv6 mcast packet test

In order to test an IPv6 multicast packet we need to pass different tc
and mausezahn protocols only, so add a simple check for the destination
address which decides if we should generate an IPv4 or IPv6 mcast
packet.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: factor out mcast_packet_test

Factor out mcast_packet_test into lib.sh so it can be later extended and
reused by MLDv2 tests.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: support setting MTU

MT7530/7531 has a global RX packet length register, which can be used
to set MTU.

Supported packet length values are 1522 (1518 if untagged), 1536,
1552, and multiple of 1024 (from 2048 to 15360).

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Link: https://lore.kernel.org/r/20201103050618.11419-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ipa-tell-gsi-the-ipa-version'

Alex Elder says:

====================
net: ipa: tell GSI the IPA version

The GSI code that supports IPA avoids having knowledge about the
IPA layer it serves. One result of this is that Boolean flags are
used during GSI initialization to convey that certain hardware
version-dependent special behaviors should be used.

A given version of IPA hardware uses a fixed/well-defined version
of GSI, so the IPA version really implies the GSI version.

If given only the IPA version, the GSI code supporting IPA can
use it to implement certain special behaviors required for IPA
*or* GSI. This avoids the need to pass and maintain numerous
Boolean flags.
====================

Link: https://lore.kernel.org/r/20201102175400.6282-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: eliminate legacy arguments

We enable a channel doorbell engine only for IPA v3.5.1, and that is
now handled directly by gsi_channel_program().

When initially setting up a channel, we want that doorbell engine
enabled, and we can request that independent of the IPA version.

Doing that makes the "legacy" argument to gsi_channel_setup_one()
unnecessary. And with that gone we can get rid of the "legacy"
argument to gsi_channel_setup(), and gsi_setup() as well.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: use version in gsi_channel_program()

Use the IPA version in gsi_channel_program() to determine whether
we should enable the GSI doorbell engine when requested. This way,
callers only say whether or not it should be enabled if needed,
regardless of hardware version.

Rename the "legacy" argument to gsi_channel_reset(), and have
it indicate whether the doorbell engine should be enabled when
reprogramming following the reset.

Change all callers of gsi_channel_reset() to indicate whether to
enable the doorbell engine after reset, independent of hardware
version.

Rework a little logic in ipa_endpoint_reset() to get rid of the
"legacy" variable previously passed to gsi_channel_reset().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: use version in gsi_channel_reset()

A quirk of IPA v3.5.1 requires a channel reset on an RX channel to
be performed twice. Use the IPA version in gsi_channel_reset()
rather than the passed-in legacy flag to determine that.

This is actually a bug fix, because this double reset is supposed
to occur independent of whether we're enabling the doorbell engine.
Now they will be independent.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: use version in gsi_channel_init()

A quirk of IPA v4.2 requires the AP to allocate the GSI channels
that are owned by the modem.

Rather than pass a flag argument to gsi_channel_init(), use the
IPA version directly in that function to determine whether modem
channels need to be allocated.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: record IPA version in GSI structure

Record the IPA version passed to gsi_init() in the GSI structure.
This allows that value to be used directly where needed, rather than
passing and storing certain flag arguments through the code.

In particular, for all but one supported version of IPA, the command
channel is programmed to only use an "escape buffer". By storing
the IPA version, we can do a simple version check in one location,
and avoid storing a flag field in every channel (and passing a flag
along while initializing channels to set that field properly).

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: expose IPA version to the GSI layer

Although GSI is integral to IPA, it is a separate hardware component
and the IPA code supporting it has been structured to avoid explicit
dependence on IPA details.  An example of this is that gsi_init() is
passed a number of Boolean flags to indicate special behaviors,
whose values are dependent on the IPA hardware version.  Looking
ahead, newer hardware versions would require even more such special
behaviors.

For any given version of IPA hardware (like 3.5.1 or 4.2), the GSI
hardware version is fixed (in this case, 1.3 and 2.2, respectively).
So the IPA version *implies* the GSI version, and the IPA version
can be used as effectively the equivalent of the GSI hardware version.

Rather than proliferating new special behavior flags, just provide
the IPA version to the GSI layer when it is initialized.  The GSI
code can then use that directly to determine whether special
behaviors are required.  The IPA version enumerated type is already
isolated to its own header file, so the exposure of this IPA detail
is very limited.

For now, just change gsi_init() to pass the version rather than the
Boolean flags, and set the flag values internal to that function.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: restrict special reset to IPA v3.5.1

With IPA v3.5.1, if IPA aggregation is active at the time an
underlying GSI channel reset is performed, some special handling
is required.

There is logic in ipa_endpoint_reset() that arranges for that
special handling, but it's done for all hardware versions, not
just IPA v3.5.1.

Fix the logic to properly restrict the special behavior.

Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20201102173435.5987-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

chelsio/chtls: Utilizing multiple rxq/txq to process requests

patch adds a logic to utilize multiple queues to process requests.
The queue selection logic uses a round-robin distribution technique
using a counter.

Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201102162832.22344-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-pf: Fix sizeof() mismatch

An incorrect sizeof() is being used, sizeof(u64 *) is not correct,
it should be sizeof(*sq->sqb_ptrs).

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201102134601.698436-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dev_ioctl: remove redundant initialization of variable err

The variable err is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201102121615.695196-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

forcedeth: fix excluded_middle.cocci warnings

Condition !A || A && B is equivalent to !A || B.

Generated by: scripts/coccinelle/misc/excluded_middle.cocci

Fixes: b76f0ea01312 ("coccinelle: misc: add excluded_middle.cocci script")
CC: Denis Efremov <efremov@linux.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2011020936100.3077@hadrien
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: Add support for RTL8221B-CG series

Realtek single-port 2.5Gbps Ethernet PHYs are list as below:
RTL8226-CG: the 1st generation 2.5Gbps single port PHY
RTL8226B-CG/RTL8221B-CG: the 2nd generation 2.5Gbps single port PHY
RTL8221B-VB-CG: the 3rd generation 2.5Gbps single port PHY
RTL8221B-VM-CG: the 2.5Gbps single port PHY with MACsec feature

This patch adds the minimal drivers to manage these transceivers.

Signed-off-by: Willy Liu <willy.liu@realtek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1604281927-9874-1-git-send-email-willy.liu@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'fsl-qbman-in_interrupt-cleanup'

Sebastian Andrzej Siewior says:

====================
fsl/qbman: in_interrupt() cleanup.

This is the in_interrupt() clean for FSL DPAA framework and the two
users.

The `napi' parameter has been renamed to `sched_napi', the other parts
are same as in the previous post [0].

[0] https://lkml.kernel.org/r/20201027225454.3492351-1-bigeasy@linutronix.de
====================

Link: https://lore.kernel.org/r/20201101232257.3028508-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: caam: Replace in_irq() usage.

The driver uses in_irq() + in_serving_softirq() magic to decide if NAPI
scheduling is required or packet processing.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `sched_napi' argument passed by the callback. It is set true if
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aymen Sghaier <aymen.sghaier@nxp.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Tested-by: Camelia Groza <camelia.groza@nxp.com>

net: dpaa: Replace in_irq() usage.

The driver uses in_irq() + in_serving_softirq() magic to decide if NAPI
scheduling is required or packet processing.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `sched_napi' argument passed by the callback. It is set true if
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aymen Sghaier <aymen.sghaier@nxp.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Tested-by: Camelia Groza <camelia.groza@nxp.com>

soc/fsl/qbman: Add an argument to signal if NAPI processing is required.

dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
invoked from:

- Hard interrupt context
- Any context which is not serving soft interrupts

Any context which is not serving soft interrupts includes hard interrupts
so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
about this:

        /*
         * In case of threaded ISR, for RT kernels in_irq() does not return
         * appropriate value, so use in_serving_softirq to distinguish between
         * softirq and irq contexts.
         */
         if (in_irq() || !in_serving_softirq())

This has nothing to do with RT. Even on a non RT kernel force threaded
interrupts run obviously in thread context and therefore in_irq() returns
false when invoked from the handler.

The extension of the in_irq() check with !in_serving_softirq() was there
when the drivers were added, but in the out of tree FSL BSP the original
condition was in_irq() which got extended due to failures on RT.

The usage of in_xxx() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context. Right he is, the above construct is
clearly showing why.

The following callchains have been analyzed to end up in
dpaa_eth_napi_schedule():

qman_p_poll_dqrr()
  __poll_portal_fast()
    fq->cb.dqrr()
       dpaa_eth_napi_schedule()

portal_isr()
  __poll_portal_fast()
    fq->cb.dqrr()
       dpaa_eth_napi_schedule()

Both need to schedule NAPI.
The crypto part has another code path leading up to this:
  kill_fq()
     empty_retired_fq()
       qman_p_poll_dqrr()
         __poll_portal_fast()
            fq->cb.dqrr()
               dpaa_eth_napi_schedule()

kill_fq() is called from task context and ends up scheduling NAPI, but
that's pointless and an unintended side effect of the !in_serving_softirq()
check.

The code path:
  caam_qi_poll() -> qman_p_poll_dqrr()

is invoked from NAPI and I *assume* from crypto's NAPI device and not
from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
(because this is what happens now) but could be changed if it is wrong
due to `budget' handling.

Add an argument to __poll_portal_fast() which is true if NAPI needs to be
scheduled. This requires propagating the value to the caller including
`qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aymen Sghaier <aymen.sghaier@nxp.com>
Cc: Herbert XS <herbert@gondor.apana.org.au>
Cc: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Tested-by: Camelia Groza <camelia.groza@nxp.com>