review.tizen.org Git - platform/kernel/linux-rpi.git/log

net: ipa: support zeroing new cache tables

IPA v5.0+ separates the configuration of entries in the cached
(previously "hashed") routing and filtering tables into distinct
registers. Previously a single "filter and router" register updated
entries in both tables at once; now the routing and filter table
caches have separate registers that define their content.

This patch updates the code that zeroes entries in the cached filter
and router tables to support IPA versions including v5.0+.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: update table cache flushing

Update the code that causes filter and router table caches to be
flushed so that it supports IPA versions 5.0+. It adds a comment in
ipa_hardware_config_hashing() that explains that cacheing does not
need to be enabled, just as before, because it's enabled by default.
(For the record, the FILT_ROUT_CACHE_CFG register would have been
used if we wanted to explicitly enable these.)

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: define IPA v5.0+ registers

Define some new registers that appear starting with IPA v5.0, along
with enumerated types identifying their fields.  Code that uses
these will be added by upcoming patches.

Most of the new registers are related to filter and routing tables,
and in particular, their "hashed" variant.  These tables are better
described as "cached", where a hash value determines which entries
are cached.  From now on, naming related to this functionality will
use "cache" instead of "hash", and that is reflected in these new
register names.  Some registers for managing these caches and their
contents have changed as well.

A few other new field definitions for registers (unrelated to table
caches) are also defined.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: extend endpoints in packet init command

The IP_PACKET_INIT immediate command defines the destination
endpoint to which a packet should be sent. Prior to IPA v5.0, a
5 bit field in that command represents the endpoint, but starting
with IPA v5.0, the field is extended to 8 bits to support more than
32 endpoints.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: support more endpoints

Increase the number of endpoints supported by the driver to 36,
which IPA v5.0 supports.  This makes it impossible to check at build
time whether the supported number is too big to fit within the
(5-bit) PACKET_INIT destination endpoint field.  Instead, convert
the build time check to compare against what fits in 8 bits.

Add a check in ipa_endpoint_config() to also ensure the hardware
reports an endpoint count that's in the expected range.  Just
open-code 32 as the limit (the PACKET_INIT field mask is not
available where we'd want to use it).

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2023-01-30' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-01-30

Add fast update encryption key

Jianbo Liu Says:
================

Data encryption keys (DEKs) are the keys used for data encryption and
decryption operations. Starting from version 22.33.0783, firmware is
optimized to accelerate the update of user keys into DEK object in
hardware. The support for bulk allocation and destruction of DEK
objects is added, and the bulk allocated DEKs are uninitialized, as
the bulk creation requires no input key. When offload
encryption/decryption, user gets one object from a bulk, and updates
key by a new "modify DEK" command. This command is the same as create
DEK object, but requires no heavy context memory allocation in
firmware, which consumes most cpu cycles of the create DEK command.

DEKs are cached internally by the NIC, so invalidating internal NIC
caches is required before reusing DEKs. The SYNC_CRYPTO command is
added to support it. DEK object can be reused, the keys in it can be
updated after this command is executed.

This patchset enhances the key creation and destruction flow, to get
use of this new feature. Any user, for example, ktls, ipsec and
macsec, can use it to offload keys. But, only ktls uses it, as others
don't need many keys, and caching two many DEKs in pool is wasteful.

There are two new data struts added:
    a. DEK pool. One pool is created for each key type. The bulks by
the type, are placed in the pool's different bulk lists, according to
the number of available and in_used DEKs in the bulk.
    b. DEK bulk. All DEKs in one bulk allocation are store here. There
are two bitmaps to indicate the state of each DEK.

New APIs are then added. When user need a DEK object,
    a. Fetch one bulk with avail DEKs, from the partial_list or
avail_list, otherwise create new one.
    b. Pick one DEK, and set its need_sync and in_used bits to 1.
Move the bulk to full_list if no more available keys, or put it to
partial_list if the bulk is newly created.
    c. Update DEK object's key with user key, by the "modify DEK"
command.
    d. Return DEK struct to user, then it gets the object id and fills
it into the offload commands.
When user free a DEK,
    a. Set in_use bit to 0. If all need_sync bits are 1 and all in_use
bits of this bulk are 0, move it to sync_list.
    b. If the number of DEKs, which are freed by users, is over the
threshold (128), schedule a workqueue to do the sync process.

For the sync process, the SYNC_CRYPTO command is executed first. Then,
for each bulks in partial_list, full_list and sync_list, reset
need_sync bits of the freed DEK objects. If all need_sync bits in one
bulk are zero, move it to avail_list.

We already supported TIS pool to recycle the TISes. With this series
and TIS pool, TLS CPS performance is improved greatly.
And we tested https on the system:
    CPU: dual AMD EPYC 7763 64-Core processors
    RAM: 512G
    DEV: ConnectX-6 DX, with FW ver 22.33.0838 and TLS_OPTIMISE=true
TLS CPS performance numbers are:
    Before: 11k connections/sec
    After: 101 connections/sec

================

* tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: kTLS, Improve connection rate by using fast update encryption key
  net/mlx5: Keep only one bulk of full available DEKs
  net/mlx5: Add async garbage collector for DEK bulk
  net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command
  net/mlx5: Use bulk allocation for fast update encryption key
  net/mlx5: Add bulk allocation and modify_dek operation
  net/mlx5: Add support SYNC_CRYPTO command
  net/mlx5: Add new APIs for fast update encryption key
  net/mlx5: Refactor the encryption key creation
  net/mlx5: Add const to the key pointer of encryption key creation
  net/mlx5: Prepare for fast crypto key update if hardware supports it
  net/mlx5: Change key type to key purpose
  net/mlx5: Add IFC bits and enums for crypto key
  net/mlx5: Add IFC bits for general obj create param
  net/mlx5: Header file for crypto
====================

Link: https://lore.kernel.org/r/20230131031201.35336-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN: Remove redundant Device Control Error Reporting Enable

Bjorn Helgaas says:

Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"),
the PCI core sets the Device Control bits that enable error reporting for
PCIe devices.

This series removes redundant calls to pci_enable_pcie_error_reporting()
that do the same thing from several NIC drivers.

There are several more drivers where this should be removed; I started with
just the Intel drivers here.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbe: Remove redundant pci_enable_pcie_error_reporting()
  igc: Remove redundant pci_enable_pcie_error_reporting()
  igb: Remove redundant pci_enable_pcie_error_reporting()
  ice: Remove redundant pci_enable_pcie_error_reporting()
  iavf: Remove redundant pci_enable_pcie_error_reporting()
  i40e: Remove redundant pci_enable_pcie_error_reporting()
  fm10k: Remove redundant pci_enable_pcie_error_reporting()
  e1000e: Remove redundant pci_enable_pcie_error_reporting()
====================

Link: https://lore.kernel.org/r/20230130192519.686446-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-mlxsw-convert-to-iproute2-dcb'

Petr Machata says:

====================
selftests: mlxsw: Convert to iproute2 dcb

There is a dedicated tool for configuration of DCB in iproute2. Use it
in the selftests instead of lldpad.

Patches #1-#3 convert three tests. Patch #4 drops the now-unnecessary
lldpad helpers.
====================

Link: https://lore.kernel.org/r/cover.1675096231.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: forwarding: lib: Drop lldpad_app_wait_set(), _del()

The existing users of these helpers have been converted to iproute2 dcb.
Drop the helpers.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: qos_defprio: Convert from lldptool to dcb

Set up default port priority through the iproute2 dcb tool, which is easier
to understand and manage.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: qos_dscp_router: Convert from lldptool to dcb

Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: qos_dscp_bridge: Convert from lldptool to dcb

Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mdio-add-amlogic-gxl-mdio-mux-support'

Jerome Brunet says:

====================
net: mdio: add amlogic gxl mdio mux support

Add support for the MDIO multiplexer found in the Amlogic GXL SoC family.
This multiplexer allows to choose between the external (SoC pins) MDIO bus,
or the internal one leading to the integrated 10/100M PHY.

This multiplexer has been handled with the mdio-mux-mmioreg generic driver
so far. When it was added, it was thought the logic was handled by a
single register.

It turns out more than a single register need to be properly set.
As long as the device is using the Amlogic vendor bootloader, or upstream
u-boot with net support, it is working fine since the kernel is inheriting
the bootloader settings. Without net support in the bootloader, this glue
comes unset in the kernel and only the external path may operate properly.

With this driver (and the associated change in
arch/arm64/boot/dts/amlogic/meson-gxl.dtsi), the kernel no longer relies
on the bootloader to set things up, fixing the problem.
====================

Link: https://lore.kernel.org/r/20230130151616.375168-1-jbrunet@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: add amlogic gxl mdio mux support

Add support for the mdio mux and internal phy glue of the GXL SoC
family

Reported-by: Da Xue <da@lessconfused.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: add amlogic gxl mdio multiplexer

Add documentation for the MDIO bus multiplexer found on the Amlogic GXL
SoC family

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tools-ynl-more-docs-and-basic-ethtool-support'

Jakub Kicinski says:

====================
tools: ynl: more docs and basic ethtool support

I got discouraged from supporting ethtool in specs, because
generating the user space C code seems a little tricky.
The messages are ID'ed in a "directional" way (to and from
kernel are separate ID "spaces"). There is value, however,
in having the spec and being able to for example use it
in Python.

After paying off some technical debt - add a partial
ethtool spec. Partial because the header for ethtool is almost
a 1000 LoC, so converting in one sitting is tough. But adding
new commands should be trivial now.

Last but not least I add more docs, I realized that I've been
sending a similar "instructions" email to people working on
new families. It's now intro-specs.rst.
====================

Link: https://lore.kernel.org/r/20230131023354.1732677-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: net: use python3 explicitly

The scripts require Python 3 and some distros are dropping
Python 2 support.

Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: netlink: add a starting guide for working with specs

We have a bit of documentation about the internals of Netlink
and the specs, but really the goal is for most people to not
worry about those. Add a practical guide for beginners who
want to poke at the specs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: add partial specification for ethtool

Ethtool is one of the most actively developed families.
With the changes to the CLI it should be possible to use
the YNL based code for easy prototyping and development.
Add a partial family definition. I've tested the string
set and rings. I don't have any MAC Merge implementation
to test with, but I added the definition for it, anyway,
because it's last. New commands can simply be added at
the end without having to worry about manually providing
IDs / values.

Set (with notification support - None is the response,
the data is from the notification):

$ sudo ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --do rings-set \
    --json '{"header":{"dev-name":"enp0s31f6"}, "rx":129}' \
    --subscribe monitor
None
[{'msg': {'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
          'rx': 136,
          'rx-max': 4096,
          'tx': 256,
          'tx-max': 4096,
          'tx-push': 0},
  'name': 'rings-ntf'}]

Do / dump (yes, the kernel requires that even for dump and even
if empty - the "header" nest must be there):

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --do rings-get \
    --json '{"header":{"dev-index": 2}}'
{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0}

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --dump rings-get \
    --json '{"header":{}}'
[{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
  'rx': 136,
  'rx-max': 4096,
  'tx': 256,
  'tx-max': 4096,
  'tx-push': 0},
{'header': {'dev-index': 3, 'dev-name': 'wlp0s20f3'}, 'tx-push': 0},
{'header': {'dev-index': 19, 'dev-name': 'enp58s0u1u1'},
  'rx': 100,
  'rx-max': 4096,
  'tx-push': 0}]

And error reporting:

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ethtool.yaml \
    --dump rings-get \
    --json '{"header":{"flags":5}}'
Netlink error: Invalid argument
nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
                         'bad-attr-offs': 24,
'bad-attr': '.header.flags'}
None

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: finish up operation enum-models

I had a (bright?) idea of introducing the concept of enum-models
to account for all the weird ways families enumerate their messages.
I've never finished it because generating C code for each of them
is pretty daunting. But for languages which can use ID values directly
the support is simple enough, so clean this up a bit.

"unified" model is what I recommend going forward.
"directional" model is what ethtool uses.
"notify-split" is used by the proposed DPLL code, but we can just
make them use "unified", it hasn't been merged :)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: load jsonschema on demand

The CLI script tries to validate jsonschema by default.
It's seems better to validate too many times than too few.
However, when copying the scripts to random servers having
to install jsonschema is tedious. Load jsonschema via
importlib, and let the user opt out.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: use operation names from spec on the CLI

When I wrote the first version of the Python code I was quite
excited that we can generate class methods directly from the
spec. Unfortunately we need to use valid identifiers for method
names (specifically no dashes are allowed). Don't reuse those
names on the CLI, it's much more natural to use the operation
names exactly as listed in the spec.

Instead of:
./cli --do rings_get
use:
./cli --do rings-get

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support pretty printing bad attribute names

One of my favorite features of the Netlink specs is that they
make decoding structured extack a ton easier.
Implement pretty printing bad attribute names in YNL.

For example it will now say:

  'bad-attr': '.header.flags'

rather than the useless:

  'bad-attr-offs': 32

Proof:

  $ ./cli.py --spec ethtool.yaml --do rings_get \
     --json '{"header":{"dev-index":1, "flags":4}}'
  Netlink error: Invalid argument
  nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
'bad-attr': '.header.flags'}

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support multi-attr

Ethtool uses mutli-attr, add the support to YNL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support directional enum-model in CLI

Support families which use different IDs for messages
to and from the kernel.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: add support for types needed by ethtool

Ethtool needs support for handful of extra types.
It doesn't have the definitions section yet.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: use the common YAML loading and validation code

Adapt the common object hierarchy in code gen and CLI.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: add an object hierarchy to represent parsed spec

There's a lot of copy and pasting going on between the "cli"
and code gen when it comes to representing the parsed spec.
Create a library which both can use.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: move the cli and netlink code around

Move the CLI code out of samples/ and the library part
of it into tools/net/ynl/lib/. This way we can start
sharing some code with the code gen.

Initially I thought that code gen is too C-specific to
share anything but basic stuff like calculating values
for enums can easily be shared.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl-gen: prevent do / dump reordering

An earlier fix tried to address generated code jumping around
one code-gen run to another. Turns out dict()s are already
ordered since Python 3.7, the problem is that we iterate over
operation modes using a set(). Sets are unordered in Python.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: use dev PM wakeirq handling

Replace the enable_irq_wake() call with one to dev_pm_set_wake_irq()
instead. This will let the dev PM framework automatically manage the
the wakeup capability of the ipa IRQ and ensure that userspace requests
to enable/disable wakeup for the IPA via sysfs are respected.

Signed-off-by: Caleb Connolly <caleb.connolly@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230127202758.2913612-1-caleb.connolly@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: microchip: ptp: fix up PTP dependency

When NET_DSA_MICROCHIP_KSZ_COMMON is built-in but PTP is a loadable
module, the ksz_ptp support still causes a link failure:

ld.lld-16: error: undefined symbol: ptp_clock_index
>>> referenced by ksz_ptp.c
>>> drivers/net/dsa/microchip/ksz_ptp.o:(ksz_get_ts_info) in archive vmlinux.a

This can happen if NET_DSA_MICROCHIP_KSZ8863_SMI is enabled, or
even if none of the KSZ9477_I2C/KSZ_SPI/KSZ8863_SMI ones are active
but only the common module is.

The most straightforward way to address this is to move the
dependency to NET_DSA_MICROCHIP_KSZ_PTP itself, which can now
only be enabled if both PTP_1588_CLOCK support is reachable
from NET_DSA_MICROCHIP_KSZ_COMMON. Alternatively, one could make
NET_DSA_MICROCHIP_KSZ_COMMON a hidden Kconfig symbol and extend the
PTP_1588_CLOCK_OPTIONAL dependency to NET_DSA_MICROCHIP_KSZ8863_SMI as
well, but that is a little more fragile.

Fixes: eac1ea20261e ("net: dsa: microchip: ptp: add the posix clock support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230130131808.1084796-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Documentation: networking: correct spelling

Correct spelling problems for Documentation/networking/ as reported
by codespell.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230129231053.20863-5-rdunlap@infradead.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ibmvnic: Toggle between queue types in affinity mapping

Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all
the IRQs for transmit queues then assigning all the IRQs for receive
queues. With multi-threaded processors, in a heavy RX or TX environment,
physical cores would either be overloaded or underutilized (due to the
IRQ assignment algorithm). This approach is sub-optimal because IRQs for
the same subprocess (RX or TX) would be bound to adjacent CPU numbers,
meaning they were more likely to be contending for the same core.

For example, in a system with 64 CPU's and 32 queues, the IRQs would
be bound to CPU in the following pattern:

IRQ type | CPU number
-----------------------
TX0 | 0-1
TX1 | 2-3
<etc>
RX0 | 32-33
RX1 | 34-35
<etc>

Observe that in SMT-8, the first 4 tx queues would be sharing the
same core.

A more optimal algorithm would balance the number RX and TX IRQ's across
the physical cores. Therefore, to increase performance, distribute RX and
TX IRQs across cores by alternating between assigning IRQs for RX and TX
queues to CPUs.
With a system with 64 CPUs and 32 queues, this results in the following
pattern:

IRQ type | CPU number
-----------------------
TX0 | 0-1
RX0 | 2-3
TX1 | 4-5
RX1 | 6-7
<etc>

Observe that in SMT-8, there is equal distribution of RX and TX IRQs
per core. In the above case, each core handles 2 TX and 2 RX IRQ's.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Link: https://lore.kernel.org/r/20230127214358.318152-1-nnac123@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'add-support-for-the-the-vsc7512-internal-copper-phys'

Colin Foster says:

====================
add support for the the vsc7512 internal copper phys

This patch series is a continuation to add support for the VSC7512:
https://patchwork.kernel.org/project/netdevbpf/list/?series=674168&state=*

That series added the framework and initial functionality for the
VSC7512 chip. Several of these patches grew during the initial
development of the framework, which is why v1 will include changelogs.
It was during v9 of that original MFD patch set that these were dropped.

With that out of the way, the VSC7512 is mainly a subset of the VSC7514
chip. The 7512 lacks an internal MIPS processor, but otherwise many of
the register definitions are identical. That is why several of these
patches are simply to expose common resources from
drivers/net/ethernet/mscc/*.

This patch only adds support for the first four ports (swp0-swp3). The
remaining ports require more significant changes to the felix driver,
and will be handled in the future.
====================

Link: https://lore.kernel.org/r/20230127193559.1001051-1-colin.foster@in-advantage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mfd: ocelot: add external ocelot switch control

Utilize the existing ocelot MFD interface to add switch functionality to
the Microsemi VSC7512 chip.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Acked-for-MFD-by: Lee Jones <lee@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: ocelot: add external ocelot switch control

Add control of an external VSC7512 chip.

Currently the four copper phy ports are fully functional. Communication to
external phys is also functional, but the SGMII / QSGMII interfaces are
currently non-functional.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: mfd: ocelot: add ethernet-switch hardware support

The main purpose of the Ocelot chips are the Ethernet switching
functionalities. Document the support for these features.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: mscc,vsc7514-switch: add dsa binding for the vsc7512

The VSC7511, VSC7512, VSC7513 and VSC7514 all have the ability to be
controlled either internally by a memory-mapped CPU, or externally via
interfaces like SPI and PCIe. The internal CPU of the VSC7511 and 7512
don't have the resources to run Linux, so must be controlled via these
external interfaces in a DSA configuration.

Add mscc,vsc7512-switch compatible string to indicate that the chips are
being controlled externally in a DSA configuration.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mfd: ocelot: prepend resource size macros to be 32-bit

The *_RES_SIZE macros are initally <= 0x100. Future resource sizes will be
upwards of 0x200000 in size.

To keep things clean, fully align the RES_SIZE macros to 32-bit to do
nothing more than make the code more consistent.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Acked-for-MFD-by: Lee Jones <lee@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: add functionality when not all ports are supported

When the Felix driver would probe the ports and verify functionality, it
would fail if it hit single port mode that wasn't supported by the driver.

The initial case for the VSC7512 driver will have physical ports that
exist, but aren't supported by the driver implementation. Add the
OCELOT_PORT_MODE_NONE macro to handle this scenario, and allow the Felix
driver to continue with all the ports that are currently functional.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: add support for MFD configurations

The architecture around the VSC7512 differs from existing felix drivers. In
order to add support for all the chip's features (pinctrl, MDIO, gpio) the
device had to be laid out as a multi-function device (MFD).

One difference between an MFD and a standard platform device is that the
regmaps are allocated to the parent device before the child devices are
probed. As such, there is no need for felix to initialize new regmaps in
these configurations, they can simply be requested from the parent device.

Add support for MFD configurations by performing this request from the
parent device.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: add configurable device quirks

The define FELIX_MAC_QUIRKS was used directly in the felix.c shared driver.
Other devices (VSC7512 for example) don't require the same quirks, so they
need to be configured on a per-device basis.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose vsc7514_regmap definition

The VSC7514 target regmap is identical for ones shared with similar
hardware, specifically the VSC7512. Share this resource, and change the
name to match the pattern of other exported resources.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose ocelot_reset routine

Resetting the switch core is the same whether it is done internally or
externally. Move this routine to the ocelot library so it can be used by
other drivers.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose vcap_props structure

The vcap_props structure is common to other devices, specifically the
VSC7512 chip that can only be controlled externally. Export this structure
so it doesn't need to be recreated.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose regfield definition to be used by other drivers

The ocelot_regfields struct is common between several different chips, some
of which can only be controlled externally. Export this structure so it
doesn't have to be duplicated in these other drivers.

Rename the structure as well, to follow the conventions of other shared
resources.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: expose ocelot wm functions

Expose ocelot_wm functions so they can be shared with other drivers.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: change the phy id of yt8521 and yt8531s to lowercase

The phy id is usually defined in lower case.

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230128063558.5850-2-Frank.Sae@motor-comm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fix the spelling problem of Sentinel

CHECK: 'sentinal' may be misspelled - perhaps 'sentinel'?

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230128063558.5850-1-Frank.Sae@motor-comm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sh: checksum: add missing linux/uaccess.h include

SuperH does not include uaccess.h, even tho it calls access_ok().

Fixes: 68f4eae781dd ("net: checksum: drop the linux/uaccess.h include")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230128073108.1603095-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: b44: Remove the unused function __b44_cam_read()

The function __b44_cam_read() is defined in the b44.c file, but not called
elsewhere, so remove this unused function.

drivers/net/ethernet/broadcom/b44.c:199:20: warning: unused function '__b44_cam_read'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3858
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230128090413.79824-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: kTLS, Improve connection rate by using fast update encryption key

As the fast DEK update is fully implemented, use it for kTLS to get
better performance.
TIS pool was already supported to recycle the TISes. With this series
and TIS pool, TLS CPS is improved by 9x higher, from 11k/s to 101k/s.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Keep only one bulk of full available DEKs

One bulk with full available keys is left undestroyed, to service the
possible requests from users quickly.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add async garbage collector for DEK bulk

After invalidation, the idle bulk with all DEKs available for use, is
destroyed, to free keys and mem.

To get better performance, the firmware destruction operation is done
asynchronously. So idle bulks are enqueued in destroy_list first, then
destroyed in system workqueue. This will improve performance, as the
destruction doesn't need to hold pool's mutex.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command

To fast update encryption keys, those freed keys with need_sync bit 1
and in_use bit 0 in a bulk, can be recycled. The keys are cached
internally by the NIC, so invalidating internal NIC caches by
SYNC_CRYPTO command is required before reusing them. A threshold in
driver is added to avoid invalidating for every update. Only when the
number of DEKs, which need to be synced, is over this threshold, the
sync process will start. Besides, it is done in system workqueue.

After SYNC_CRYPTO command is executed successfully, the bitmaps of
each bulk must be reset accordingly, so that the freed DEKs can be
reused. From the analysis in previous patch, the number of reused DEKs
can be calculated by hweight_long(need_sync XOR in_use), and the
need_sync bits can be reset by simply copying from in_use bits.

Two more list (avail_list and sync_list) are added for each pool. The
avail_list is for a bulk when all bits in need_sync are reset after
sync. If there is no avail deks, and all are be freed by users, the
bulk is moved to sync_list, instead of being destroyed in previous
patch, and waiting for the invalidation. While syncing, they are
simply reset need_sync bits, and moved to avail_list.

Besides, add a wait_for_free list for the to-be-free DEKs. It is to
avoid this corner case: when thread A is done with SYNC_CRYPTO but just
before starting to reset the bitmaps, thread B is alloc dek, and free
it immediately. It's obvious that this DEK can't be reused this time,
so put it to waiting list, and do free after bulk bitmaps reset is
finished.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Use bulk allocation for fast update encryption key

We create a pool for each key type. For the pool, there is a struct
to store the info for all DEK objects of one bulk allocation. As we
use crypto->log_dek_obj_range, which is set to 12 in previous patch,
for the log_obj_range of bulk allocation, 4096 DEKs are allocated in
one time.

To trace the state of all the keys in a bulk, two bitmaps are created.
The need_sync bitmap is used to indicate the available state of the
corresponding key. If the bit is 0, it can be used (available) as it
either is newly created by FW, or SYNC_CRYPTO is executed and bit is
reset after it is freed by upper layer user (this is the case to be
handled in later patch). Otherwise, the key need to be synced. The
in_use bitmap is used to indicate the key is being used, and reset
when user free it.

When ktls, ipsec or macsec need a key from a bulk, it get one with
need_sync bit 0, then set both need_sync and in_used bit to 1. When
user free a key, only in_use bit is reset to 0. So, for the
combinations of (need_sync, in_use) of one DEK object,
   - (0,0) means the key is ready for use,
   - (1,1) means the key is currently being used by a user,
   - (1,0) means the key is freed, and waiting for being synced,
   - (0,1) is invalid state.

There are two lists in each pool, partial_list and full_list,
according to the number for available DEKs in a bulk. When user need a
key, it get a bulk, either from partial list, or create new one from
FW. Then the bulk is put in the different pool's lists according to
the num of avail deks it has. If there is no avail deks, and all of
them are be freed by users, for now, the bulk is destroyed.

To speed up the bitmap search, a variable (avail_start) is added to
indicate where to start to search need_sync bitmap for available key.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add bulk allocation and modify_dek operation

To support fast update of keys into hardware, we optimize firmware to
achieve the maximum rate. The approach is to create DEK objects in
bulk, and update each of them with modify command.
This patch supports bulk allocation and modify_dek commands for new
firmware. However, as log_obj_range is 0 for now, only one DEK obj is
allocated each time, and then updated with user key by modify_dek.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add support SYNC_CRYPTO command

Add support for SYNC_CRYPTO command. For now, it is executed only when
initializing DEK, but needed when reusing keys in later patch.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add new APIs for fast update encryption key

New APIs are added to support fast update DEKs. As a pool is created
for each key purpose (type), one pair of pool APIs to get/put pool.
Anotehr pair of DEKs APIs is to get DEK object from pool and update it
with user key, or free it back to the pool. As The bulk allocation
and destruction will be supported in later patches, old implementation
is used here.

To support these APIs, pool and dek structs are defined first. Only
small number of fields are stored in them. For example, key_purpose
and refcnt in pool struct, DEK object id in dek struct. More fields
will be added to these structs in later patches, for example, the
different bulk lists for pool struct, the bulk pointer dek struct
belongs to, and a list_entry for the list in a pool, which is used to
save keys waiting for being freed while other thread is doing sync.

Besides the creation and destruction interfaces, new one is also added
to get obj id.

Currently these APIs are planned to used by TLS only.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Refactor the encryption key creation

Move the common code to general functions which can be used by fast
update encryption key in later patches.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add const to the key pointer of encryption key creation

Change key pointer to const void *, as there is no need to change the
key content. This is also to avoid modifying the key by mistake.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Prepare for fast crypto key update if hardware supports it

Add CAP for crypto offload, do the simple initialization if hardware
supports it. Currently set log_dek_obj_range to 12, so 4k DEKs will be
created in one bulk allocation.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Change key type to key purpose

Change the naming of key type in DEK fields and macros, to be
consistent with the device spec.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add IFC bits and enums for crypto key

Add and extend structure layouts and defines for fast crypto key
update. This is a prerequisite to support bulk creation, key
modification and destruction, software wrapped DEK, and SYNC_CRYPTO
command.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add IFC bits for general obj create param

Before this patch, the log_obj_range was defined inside
general_obj_in_cmd_hdr to support bulk allocation. However, we need to
modify/query one of the object in the bulk in later patch, so change
those fields to param bits for parameters specific for cmd header, and
add general_obj_create_param according to what was updated in spec.
We will also add general_obj_query_param for modify/query later.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Header file for crypto

Take crypto API out of the generic mlx5.h header into a dedicated
header.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

ixgbe: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igb: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

fm10k: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e1000e: Remove redundant pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch 'devlink-next'

Jakub Kicinski says:

====================
devlink: fix reload notifications and remove features

First two patches adjust notifications during devlink reload.
The last patch removes no longer needed devlink features.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>

devlink: remove devlink features

Devlink features were introduced to disallow devlink reload calls of
userspace before the devlink was fully initialized. The reason for this
workaround was the fact that devlink reload was originally called
without devlink instance lock held.

However, with recent changes that converted devlink reload to be
performed under devlink instance lock, this is redundant so remove
devlink features entirely.

Note that mlx5 used this to enable devlink reload conditionally only
when device didn't act as multi port slave. Move the multi port check
into mlx5_devlink_reload_down() callback alongside with the other
checks preventing the device from reload in certain states.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: send objects notifications during devlink reload

Currently, the notifications are only sent for params. People who
introduced other objects forgot to add the reload notifications here.

To make sure all notifications happen according to existing comment,
benefit from existence of devlink_notify_register/unregister() helpers
and use them in reload code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: move devlink reload notifications back in between _down() and _up() calls

This effectively reverts commit 05a7f4a8dff1 ("devlink: Break parameter
notification sequence to be before/after unload/load driver").

Cited commit resolved a problem in mlx5 params implementation,
when param notification code accessed memory previously freed
during reload.

Now, when the params can be registered and unregistered when devlink
instance is registered, mlx5 code unregisters the problematic param
during devlink reload. The fix is therefore no longer needed.

Current behavior is a it problematic, as it sends DEL notifications even
in potential case when reload_down() call fails which might confuse
userspace notifications listener.

So move the reload notifications back where they were originally in
between reload_down() and reload_up() calls.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sparx5-ES2-VCAP-support'

Steen Hegelund says:

====================
Adding Sparx5 ES2 VCAP support

This provides the Egress Stage 2 (ES2) VCAP (Versatile Content-Aware
Processor) support for the Sparx5 platform.

The ES2 VCAP is an Egress Access Control VCAP that uses frame keyfields and
previously classified keyfields to apply e.g. policing, trapping or
mirroring to frames.

The ES2 VCAP has 2 lookups and they are accessible with a TC chain id:

- chain 20000000: ES2 Lookup 0
- chain 20100000: ES2 Lookup 1

As the other Sparx5 VCAPs the ES2 VCAP has its own lookup/port keyset
configuration that decides which keys will be used for matching on which
traffic type.

The ES2 VCAP has these traffic classifications:

- IPv4 frames
- IPv6 frames
- Other frames

The ES2 VCAP can match on an ISDX key (Ingress Service Index) as one of the
frame metadata keyfields. The IS0 VCAP can update this key using its
actions, and this allows a IS0 VCAP rule to be linked to an ES2 rule.

This is similar to how the IS0 VCAP and the IS2 VCAP use the PAG
(Policy Association Group) keyfield to link rules.

From user space this is exposed via "chain offsets", so an IS0 rule with a
"goto chain 20000015" action will use an ISDX key value of 15 to link to a
rule in the ES2 VCAP attached to the same chain id.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains

This enhances the KUNIT test of the VCAP API with tests of the chaining
functionality.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add TC support for the ES2 VCAP

This enables the TC command to use the Sparx5 ES2 VCAP, and provides a new
ES2 ethertype table and handling of rule links between IS0 and ES2.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ingress information to VCAP instance

This allows the check of the goto action to be specific to the ingress and
egress VCAP instances.

The debugfs support is also updated to show this information.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ES2 VCAP keyset configuration for Sparx5

This adds the ES2 VCAP port keyset configuration for Sparx5 and also
updates the debugFS support to show the keyset configuration and the egress
port mask.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ES2 VCAP model and updated KUNIT VCAP model

This provides the VCAP model for the Sparx5 ES2 (Egress Stage 2) VCAP.

This VCAP provides tagging and remarking functionality

This also renames a VCAP keyfield: VCAP_KF_MIRROR_ENA becomes
VCAP_KF_MIRROR_PROBE, as the first name was caused by a mistake in the
model transformation.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Improve error message when parsing CVLAN filter

This improves the error message when a TC filter with CVLAN tag is used and
the selected VCAP instance does not support this.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Improve the IP frame key match for IPv6 frames

This ensures that it will be possible for a VCAP rule to distinguish IPv6
frames from non-IP frames, as the IS0 keyset usually selected for the IPv6
traffic class in (7TUPLE) does not offer a key that specifies IPv6
directly: only non-IPv4.

The IP_SNAP key ensures that we select (at least) IP frames.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add support for getting keysets without a type id

When there is only one keyset available for a certain VCAP rule size, the
particular keyset does not need a type id when encoded in the VCAP
Hardware.

This provides support for getting a keyset from a rule, when this is the
case: only one keyset fits this rule size.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batadv-next-pullrequest-20230127' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- drop prandom.h includes, by Sven Eckelmann

- fix mailing list address, by Sven Eckelmann

- multicast feature preparation, by Linus Lüssing (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Add a check for oversized packets

Occasionnaly we may get oversized packets from the hardware which
exceed the nomimal 2KiB buffer size we allocate SKBs with. Add an early
check which drops the packet to avoid invoking skb_over_panic() and move
on to processing the next packet.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: convert to gpio descriptor

The driver can be trivially converted, as it only triggers the gpio
pin briefly to do a reset, and it already only supports DT.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: mux-meson-g12a: use __clk_is_enabled to simplify the code

By using __clk_is_enabled () we can avoid defining an own variable for
tracking whether enable counter is zero.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink: recommend policy range validation

For large ranges (outside of s16) the documentation currently
recommends open-coding the validation, but it's better to use
the NLA_POLICY_FULL_RANGE() or NLA_POLICY_FULL_RANGE_SIGNED()
policy validation instead; recommend that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230127084506.09f280619d64.I5dece85f06efa8ab0f474ca77df9e26d3553d4ab@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-28

We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).

The main changes are:

1) Implement XDP hints via kfuncs with initial support for RX hash and
   timestamp metadata kfuncs, from Stanislav Fomichev and
   Toke Høiland-Jørgensen.
   Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk

2) Extend libbpf's bpf_tracing.h support for tracing arguments of
   kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.

3) Significantly reduce the search time for module symbols by livepatch
   and BPF, from Jiri Olsa and Zhen Lei.

4) Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals, from David Vernet.

5) Fix several issues in the dynptr processing such as stack slot liveness
   propagation, missing checks for PTR_TO_STACK variable offset, etc,
   from Kumar Kartikeya Dwivedi.

6) Various performance improvements, fixes, and introduction of more
   than just one XDP program to XSK selftests, from Magnus Karlsson.

7) Big batch to BPF samples to reduce deprecated functionality,
   from Daniel T. Lee.

8) Enable struct_ops programs to be sleepable in verifier,
   from David Vernet.

9) Reduce pr_warn() noise on BTF mismatches when they are expected under
   the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.

10) Describe modulo and division by zero behavior of the BPF runtime
    in BPF's instruction specification document, from Dave Thaler.

11) Several improvements to libbpf API documentation in libbpf.h,
    from Grant Seltzer.

12) Improve resolve_btfids header dependencies related to subcmd and add
    proper support for HOSTCC, from Ian Rogers.

13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
    helper along with BPF selftests, from Ziyang Xuan.

14) Simplify the parsing logic of structure parameters for BPF trampoline
    in the x86-64 JIT compiler, from Pu Lehui.

15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
    Rust compilation units with pahole, from Martin Rodriguez Reboredo.

16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
    from Kui-Feng Lee.

17) Disable stack protection for BPF objects in bpftool given BPF backends
    don't support it, from Holger Hoffstätte.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
  selftest/bpf: Make crashes more debuggable in test_progs
  libbpf: Add documentation to map pinning API functions
  libbpf: Fix malformed documentation formatting
  selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
  selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
  bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
  bpf/selftests: Verify struct_ops prog sleepable behavior
  bpf: Pass const struct bpf_prog * to .check_member
  libbpf: Support sleepable struct_ops.s section
  bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
  selftests/bpf: Fix vmtest static compilation error
  tools/resolve_btfids: Alter how HOSTCC is forced
  tools/resolve_btfids: Install subcmd headers
  bpf/docs: Document the nocast aliasing behavior of ___init
  bpf/docs: Document how nested trusted fields may be defined
  bpf/docs: Document cpumask kfuncs in a new file
  selftests/bpf: Add selftest suite for cpumask kfuncs
  selftests/bpf: Add nested trust selftests suite
  bpf: Enable cpumasks to be queried and used as kptrs
  bpf: Disallow NULLable pointers for trusted kfuncs
  ...
====================

Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: Remove 4s sleep during carrier detection

This patch removes the msleep(4s) during netpoll_setup() if the carrier
appears instantly.

Here are some scenarios where this workaround is counter-productive in
modern ages:

Servers which have BMC communicating over NC-SI via the same NIC as gets
used for netconsole. BMC will keep the PHY up, hence the carrier
appearing instantly.

The link is fibre, SERDES getting sync could happen within 0.1Hz, and
the carrier also appears instantly.

Other than that, if a driver is reporting instant carrier and then
losing it, this is probably a driver bug.

Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230125185230.3574681-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e47 ("ice: move devlink port creation/deletion")
  643ef23bd9dd ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: fix tristate and help description

Fix description for tristate and help sections which include inaccurate
information.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230126190110.9124-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-xdp-execute-xdp_do_flush-before-napi_complete_done'

Magnus Karlsson says:

====================
net: xdp: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found in [1].

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in [2].

The drivers have only been compile-tested since I do not own any of
the HW below. So if you are a maintainer, it would be great if you
could take a quick look to make sure I did not mess something up.

Note that these were the drivers I found that violated the ordering by
running a simple script and manually checking the ones that came up as
potential offenders. But the script was not perfect in any way. There
might still be offenders out there, since the script can generate
false negatives.

[1] https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
[2] https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
====================

Link: https://lore.kernel.org/r/20230125074901.2737-1-magnus.karlsson@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-eth: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: d678be1dc1ec ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa_eth: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: a1e031ffb422 ("dpaa_eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>