review.tizen.org Git - platform/kernel/linux-rpi.git/log

ice: Add GTPU FDIR filter for AVF

Add new FDIR filter type to forward GTPU packets by matching TEID or QFI.
The filter is only enabled when COMMS DDP package is downloaded.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add non-IP Layer2 protocol FDIR filter for AVF

Add new filter type that allow forward non-IP Ethernet packets base on its
ethertype. The filter is only enabled when COMMS DDP package is loaded.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add new actions support for VF FDIR

Add two new actions support for VF FDIR:

A passthrough action does not specify the destination queue, but
just allow the packet go to next pipeline stage, a typical use
cases is combined with a software mark (FDID) action.

Allow specify a 2^n continuous queues as the destination of a FDIR rule.
Packet distribution is based on current RSS configure.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add FDIR pattern action parser for VF

Add basic FDIR flow list and pattern / action parse functions for VF.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Enable FDIR Configure for AVF

The virtual channel is going to be extended to support FDIR and
RSS configure from AVF. New data structures and OP codes will be
added, the patch enable the FDIR part.

To support above advanced AVF feature, we need to figure out
what kind of data structure should be passed from VF to PF to describe
an FDIR rule or RSS config rule. The common part of the requirement is
we need a data structure to represent the input set selection of a rule's
hash key.

An input set selection is a group of fields be selected from one or more
network protocol layers that could be identified as a specific flow.
For example, select dst IP address from an IPv4 header combined with
dst port from the TCP header as the input set for an IPv4/TCP flow.

The patch adds a new data structure virtchnl_proto_hdrs to abstract
a network protocol headers group which is composed of layers of network
protocol header(virtchnl_proto_hdr).

A protocol header contains a 32 bits mask (field_selector) to describe
which fields are selected as input sets, as well as a header type
(enum virtchnl_proto_hdr_type). Each bit is mapped to a field in
enum virtchnl_proto_hdr_field guided by its header type.

+------------+-----------+------------------------------+
|            | Proto Hdr | Header Type A                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|Proto Hdrs  | Proto Hdr | Header Type B                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|            | Proto Hdr | Header Type C                |
|            |           +------------------------------+
|            |           | BIT 31 | ... | BIT 1 | BIT 0 |
|            |-----------+------------------------------+
|            |    ....                                  |
+-------------------------------------------------------+

All fields in enum virtchnl_proto_hdr_fields are grouped with header type
and the value of the first field of a header type is always 32 aligned.

enum proto_hdr_type {
        header_type_A = 0;
        header_type_B = 1;
        ....
}

enum proto_hdr_field {
        /* header type A */
        header_A_field_0 = 0,
        header_A_field_1 = 1,
        header_A_field_2 = 2,
        header_A_field_3 = 3,

        /* header type B */
        header_B_field_0 = 32, // = header_type_B << 5
        header_B_field_0 = 33,
        header_B_field_0 = 34
        header_B_field_0 = 35,
        ....
};

So we have:
proto_hdr_type = proto_hdr_field / 32
bit offset = proto_hdr_field % 32

To simply the protocol header's operations, couple help macros are added.
For example, to select src IP and dst port as input set for an IPv4/UDP
flow.

we have:
struct virtchnl_proto_hdr hdr[2];

VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[0], IPV4)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[0], IPV4, SRC)

VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[1], UDP)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[1], UDP, DST)

The byte array is used to store the protocol header of a training package.
The byte array must be network order.

The patch added virtual channel support for iAVF FDIR add/validate/delete
filter. iAVF FDIR is Flow Director for Intel Adaptive Virtual Function
which can direct Ethernet packets to the queues of the Network Interface
Card. Add/delete command is adding or deleting one rule for each virtual
channel message, while validate command is just verifying if this rule
is valid without any other operations.

To add or delete one rule, driver needs to config TCAM and Profile,
build training packets which contains the input set value, and send
the training packets through FDIR Tx queue. In addition, driver needs to
manage the software context to avoid adding duplicated rules, deleting
non-existent rule, input set conflicts and other invalid cases.

NOTE:
Supported pattern/actions and their parse functions are not be included in
this patch, they will be added in a separate one.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Simei Su <simei.su@intel.com>
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add support for per VF ctrl VSI enabling

We are going to enable FDIR configure for AVF through virtual channel.
The first step is to add helper functions to support control VSI setup.
A control VSI will be allocated for a VF when AVF creates its
first FDIR rule through ice_vf_ctrl_vsi_setup().
The patch will also allocate FDIR rule space for VF's control VSI.
If a VF asks for flow director rules, then those should come entirely
from the best effort pool and not from the guaranteed pool. The patch
allow a VF VSI to have only space in the best effort rules.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Enhanced IPv4 and IPv6 flow filter

Separate IPv4 and IPv6 ptype bit mask table into 2 tables:
with or without L4 protocols.

When a flow filter without any l4 type is specified, the
ICE_FLOW_SEG_HDR_IPV_OTHER flag can be used to describe if user
want to create a IP rule target for all IP packet or just IP
packet without l4 header.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Support to separate GTP-U uplink and downlink

To apply different input set for GTP-U packet with or without extend
header as well as GTP-U uplink and downlink, we need to add TCAM mask
matching capability. This allows comprehending different PTYPE
attributes by examining flags from the parser. Using this method,
different profiles can be used by examining flag values from the parser.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add more advanced protocol support in flow filter

Add more protocol support in flow filter, these
include PPPoE, L2TPv3, GTP, PFCP, ESP and AH.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Support non word aligned input set field

To support FDIR input set with protocol field like DSCP, TTL,
PROT, etc. which is not word aligned, we need to enable field
vector masking.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add more basic protocol support for flow filter

Add more protocol and field support for flow filter include:
ETH, VLAN, ICMP, ARP and TCP flag.

Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Revert "net: dsa: sja1105: Clear VLAN filtering offload netdev feature"

This reverts commit e9bf96943b408e6c99dd13fb01cb907335787c61.

The topic of the reverted patch is the support for switches with global
VLAN filtering, added by commit 061f6a505ac3 ("net: dsa: Add
ndo_vlan_rx_{add, kill}_vid implementation"). Be there a switch with 4
ports swp0 -> swp3, and the following setup:

ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
ip link set swp1 master br0

What would happen with VLAN-tagged traffic received on standalone ports
swp2 and swp3? Well, it would get dropped, were it not for the
.ndo_vlan_rx_add_vid and .ndo_vlan_rx_kill_vid implementations (called
from vlan_vid_add and vlan_vid_del respectively). Basically, for DSA
switches where VLAN filtering is a global attribute, we enforce the
standalone ports to have 'rx-vlan-filter: off [fixed]' in their ethtool
features, which lets the user know that all VLAN-tagged packets that are
not explicitly added in the RX filtering list are dropped.

As for the sja1105 driver, at the time of the reverted patch, it was
operating in a pretty handicapped mode when it had ports under a bridge
with vlan_filtering=1. Specifically, it was unable to terminate traffic
through the CPU port (for further explanation see "Traffic support" in
Documentation/networking/dsa/sja1105.rst).

However, since then, the sja1105 driver has made considerable progress,
and that limitation is no longer as severe now. Specifically, since
commit 2cafa72e516f ("net: dsa: sja1105: add a new
best_effort_vlan_filtering devlink parameter"), the driver is able to
perform CPU termination even when some ports are under bridges with
vlan_filtering=1. Then, since commit 8841f6e63f2c ("net: dsa: sja1105:
make devlink property best_effort_vlan_filtering true by default"), this
even became the default operating mode.

So we can now take advantage of the logic in the DSA core.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add support for ethtool get_ringparam

Add support for the ethtool get_ringparam operation.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-cfg-data-updates'

Alex Elder says:

====================
net: ipa: more configuration data updates

This series starts with two patches that should have been included
in an earlier series.  With these in place, QSB settings are
programmed from information found in the data files rather than
being embedded in code.  Support is then added for reprenting
another QSB property (supported for IPA v4.0+).

The third patch updates the definition of the sequencer type used
for an endpoint.  Previously a set of 2-byte symbols with fairly
long names defined the sequencer type, but now those are broken into
1-byte halves whose names are a little more informative.

The fourth patch moves the sequencer type definition so it only
applies to TX endpoints (they aren't valid for RX endpoints).  And
the last makes some minor documentation updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: update some comments in "ipa_data.h"

Fix/expand some comments in "ipa_data.h".

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: sequencer type is for TX endpoints only

We only program the sequencer type for TX endpoints.  So move the
definition of the sequencer type fields into the TX-specific portion
of the endpoint configuration data.  There's no need to maintain
this in the IPA structure; we can extract it from the configuration
data it points to in the one spot it's needed.

We previously specified the sequencer type for RX endpoints with
INVALID values.  These are no longer needed, so get rid of them.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: split sequencer type in two

An IPA endpoint has a sequencer that must be configured based on how
the endpoint is to be used.  Currently the IPA code programs the
sequencer type by splitting a value into four 4-bit nibbles.  Doing
that doesn't really add much value, and regardless, a better way of
splitting the sequencer type is into two halves--the lower byte
describing how normal packet processing is handled, and the next
byte describing information about processing replicas.

So split the sequencer type into two sub-parts:  the sequencer type
and the replication sequencer type.  Define the values supported for
the "main" sequencer type, and define the values supported for the
replication part separately.

In addition, the sequencer type names are quite verbose, encoding
what the type includes, but also what it *excludes*.  Rename the
sequencer types in a way that mainly describes the number of passes
that a packet takes through the IPA processing pipeline, and how
many of those passes end by supplying the processed packet to the
microprocessor.

The result expands the supported types beyond what is required for
now, but simplifies the way these are defined.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: implement MAX_READS_BEATS QSB data

Starting with IPA v4.0, a limit is placed on the number of bytes
outstanding in a transaction, to reduce latency. The limit is
imposed only if this value is non-zero.

We don't use a non-zero value for SC7180, but newer versions of IPA
do. Prepare for that by allowing a programmed value to be specified
in the platform configuration data.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use configuration data for QSB settings

Use the QSB configuration data in ipa_hardware_config_qsb(), rather
than determining in code what values to use based on IPA version.
Pass configuration data to ipa_hardware_config() so it can be passed
to ipa_hardware_config_qsb().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: decnet: Fixed multiple coding style issues

Made changes to coding style as suggested by checkpatch.pl
changes are of the type:
open brace '{' following struct go on the same line
do not use assignment in if condition

Signed-off-by: Sai Kalyaan Palla <saikalyaan63@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-03-19

This series contains updates to igc and e1000e drivers.

Sasha removes unused defines in igc driver.

Jiapeng Zhong changes bool assignments from 0/1 to false/true for igc.

Wei Yongjun marks e1000e_pm_prepare() as __maybe_unused to resolve a
defined but not used warning under certain configurations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: Mark e1000e_pm_prepare() as __maybe_unused

The function e1000e_pm_prepare() may have no callers depending
on configuration, so it must be marked __maybe_unused to avoid
harmless warning:

drivers/net/ethernet/intel/e1000e/netdev.c:6926:12:
warning: 'e1000e_pm_prepare' defined but not used [-Wunused-function]
6926 | static int e1000e_pm_prepare(struct device *dev)
| ^~~~~~~~~~~~~~~~~

Fixes: ccf8b940e5fd ("e1000e: Leverage direct_complete to speed up s2ram")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Assign boolean values to a bool variable

Fix the following coccicheck warnings:

./drivers/net/ethernet/intel/igc/igc_main.c:4961:2-14: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4955:2-14: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4933:1-13: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4592:1-24: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4438:2-25: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4396:2-25: WARNING:
Assignment of 0/1 to bool variable.

./drivers/net/ethernet/intel/igc/igc_main.c:4018:2-25: WARNING:
Assignment of 0/1 to bool variable.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Remove unused MII_CR_LOOPBACK

MII_CR_LOOPBACK masks not in use in i225 device and can be removed.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Remove unused MII_CR_SPEED

Force PHY speed not supported for i225 devices.
MII_CR_SPEED masks not in use in i225 device and can be removed.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

net: add CONFIG_PCPU_DEV_REFCNT

I was working on a syzbot issue, claiming one device could not be
dismantled because its refcount was -1

unregister_netdevice: waiting for sit0 to become free. Usage count = -1

It would be nice if syzbot could trigger a warning at the time
this reference count became negative.

This patch adds CONFIG_PCPU_DEV_REFCNT options which defaults
to per cpu variables (as before this patch) on SMP builds.

v2: free_dev label in alloc_netdev_mqs() is moved to avoid
a compiler warning (-Wunused-label), as reported
by kernel test robot <lkp@intel.com>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-update-config-data'

Alex Elder says:

====================
net: ipa: update configuration data

Each IPA version has a "data" file defining how various things are
configured.  This series gathers a few updates to this information:
  - The first patch makes all configuration data constant
  - The second fixes an incorrect (but seemingly harmless) value
  - The third simplifies things a bit by using implicit zero
    initialization for memory regions that are empty
  - The fourth adds definitions for memory regions that exist but
    are not yet used
  - The fifth use configuration data rather than conditional code to
    set some bus parameters
====================

net: ipa: define QSB limits in configuration data

Define the maximum number of reads and writes to configure for the
QSB masters used for IPA in configuration data.

We don't use these values yet; the next commit takes care of that.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: define some new memory regions

There are several memory regions that are defined starting with IPA
v4.0, but which were not used for the SC7180 SoC (IPA v4.2). Even
though they're not used (yet), define them so they are ready to be
used for SoCs when they become supported.

There are two QUOTA statistics memory regions, one for the modem and
one for the AP. Define distinct names for these regions, and get
rid of the definition of IPA_MEM_STATS_QUOTA.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: don't define empty memory regions

The AP_HEADER memory region for both the SDM845 and SC7180 SoCs has
zero size, and has no canaries.  Defining an offset for such a
zero-length region is not meaningful, so it's better not to define
it at all.  The size of this region is used in the code, but its
value will still be zero because the memory regions are defined in
statically initialized memory.

For the SC7180, the STATS_DROP memory region has a zero size and no
canaries as well.

These regions are the only place where a zero-sized region is
defined despite having no canaries.  Remove them.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: fix canary count for SC7180 UC_INFO region

There should be no canary values written before the beginning of the
UC_INFO memory region. This was correct for SDM845, but somehow was
committed with the wrong value for SC7180.

This bug seems to cause no harm, so we'll just correct it without
back-porting.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: make all configuration data constant

All of the platform configuration data should be constant, but
that isn't the case for the memory regions, interconnects, and
clocks. Fix this.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

igc: Remove unused MII_CR_RESET

MII_CR_RESET mask not in use in i225 device and can be removed

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch 'mscc-VSC8584-fixes'

Bjarni Jonasson says:

====================
Fixes applied to VCS8584 family

Three different fixes applied to VSC8584 family:
1. LCPLL reset
2. Serdes calibration
3. Coma mode disabled

The same fixes has already been applied to VSC8514
and most of the functionality can be reused for the VSC8584.

v1 -> v2:
Preserved reversed christmas tree
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: coma mode disabled for VSC8584

This patch releases coma mode for VSC8584 as done for VSC8514 in
commit ca0d7fd0a58d ("net: phy: mscc: coma mode disabled for VSC8514")

Fixes: a5afc1678044a ("net: phy: mscc: add support for VSC8584 PHY.")
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: improved serdes calibration applied to VSC8584

Introduced 'FOJI' serdes calibration in commit 85e97f0b984e
("net: phy: mscc: improved serdes calibration applied to VSC8514")
Now including the VSC8584 family.

Fixes: a5afc1678044a ("net: phy: mscc: add support for VSC8584 PHY.")
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: Applying LCPLL reset to VSC8584

Introduced LCPLL reset in
commit d15e08d9fb82 ("net: phy: mscc: adding LCPLL reset to VSC8514").
Now applying this reset to the VSC8584 phy familiy.

Fixes: a5afc1678044a ("net: phy: mscc: add support for VSC8584 PHY.")
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: teardown CBDR during PF/VF unbind

Michael reports that after the blamed patch, unbinding a VF would cause
these transactions to remain pending, and trigger some warnings with the
DMA API debug:

$ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs
pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001
fsl_enetc_vf 0000:00:01.0: Adding to iommu group 19
fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002)
fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0

$ echo 0 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs
DMA-API: pci 0000:00:01.0: device driver has pending DMA allocations while released from device [count=1]
One of leaked entries details: [size=2048 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as coherent]
WARNING: CPU: 0 PID: 2547 at kernel/dma/debug.c:853 dma_debug_device_change+0x174/0x1c8
(...)
Call trace:
dma_debug_device_change+0x174/0x1c8
blocking_notifier_call_chain+0x74/0xa8
device_release_driver_internal+0x18c/0x1f0
device_release_driver+0x20/0x30
pci_stop_bus_device+0x8c/0xe8
pci_stop_and_remove_bus_device+0x20/0x38
pci_iov_remove_virtfn+0xb8/0x128
sriov_disable+0x3c/0x110
pci_disable_sriov+0x24/0x30
enetc_sriov_configure+0x4c/0x108
sriov_numvfs_store+0x11c/0x198
(...)
DMA-API: Mapped at:
dma_entry_alloc+0xa4/0x130
debug_dma_alloc_coherent+0xbc/0x138
dma_alloc_attrs+0xa4/0x108
enetc_setup_cbdr+0x4c/0x1d0
enetc_vf_probe+0x11c/0x250
pci 0000:00:01.0: Removing from iommu group 19

This happens because stupid me moved enetc_teardown_cbdr outside of
enetc_free_si_resources, but did not bother to keep calling
enetc_teardown_cbdr from all the places where enetc_free_si_resources
was called. In particular, now it is no longer called from the main
unbind function, just from the probe error path.

Fixes: 4b47c0b81ffd ("net: enetc: don't initialize unused ports from a separate code path")
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Fix spelling mistake "ratelimitter" -> "ratelimiter"

There is a spelling mistake in an error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Fix missing spin_lock_init() in otx2_tc_add_flow()

The driver allocates the spinlock but not initialize it.
Use spin_lock_init() on it to initialize it correctly.

Fixes: d8ce30e0cf76 ("octeontx2-pf: add tc flower stats handler for hw offloads")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use lower_32_bits/upper_32_bits macros

Use the lower_32_bits/upper_32_bits macros to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hinic: Remove unused variable.

drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c: In function ‘mgmt_recv_msg_handler’:
drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c:443:18: warning: unused variable ‘pdev’ [-Wunused-variable]
443 | struct pci_dev *pdev = pf_to_mgmt->hwif->pdev;
| ^~~~

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hinic-cleanups'

Daode Huang says:

====================
net: hinic; make some cleanup for hinic

This set try to remove the unnecessary output message, add a blank line,
remove the dupliate word and change the deprecated strlcp functions in
hinic driver, for details, please refer to each patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: convert strlcpy to strscpy

Usage of strlcpy in linux kernel has been recently
deprecated[1], so convert hinic driver to strscpy

[1] https://lore.kernel.org/lkml/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL
=V6A6G1oUZcprmknw@mail.gmail.com/

Signed-off-by: Daode Huang <huangdaode@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: remove the repeat word "the" in comment.

There is a duplicate "the" in the comment, so delete it.

Signed-off-by: Daode Huang <huangdaode@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: add a blank line after declarations

There should be a blank line after declarations, so just add it.

Signed-off-by: Daode Huang <huangdaode@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: Remove unnecessary 'out of memory' message

This patch removes unnecessary out of memory message in hinic driver,
fixes the following checkpatch.pl warning:
"WARNING: Possible unnecessary 'out of memory' message"

Signed-off-by: Daode Huang <huangdaode@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: use napi_alloc_skb

Using napi_alloc_skb in NAPI context avoids enable/disable IRQs, which
increases iperf3 result by a few Mbps. Since napi_alloc_skb() uses
NET_IP_ALIGN, convert other alloc methods to the same padding. Tested
on Intel Core2 and AMD K10 platforms.

Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: switch to napi_gro_receive

Changing to napi_gro_receive() improves efficiency significantly. Tested
on Intel Core2-based motherboards and iperf3.

Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: remove at803x_aneg_done()

Here is what Vladimir says about it:

  at803x_aneg_done() keeps the aneg reporting as "not done" even when
  the copper-side link was reported as up, but the in-band autoneg has
  not finished.

  That was the _intended_ behavior when that code was introduced, and
  Heiner have said about it [1]:

  | That's not nice from the PHY:
  | It signals "link up", and if the system asks the PHY for link details,
  | then it sheepishly says "well, link is *almost* up".

  If the specification of phy_aneg_done behavior does not include
  in-band autoneg (and it doesn't), then this piece of code does not
  belong here.

  The fact that we can no longer trigger this code from phylib is yet
  another reason why it fails at its intended (and wrong) purpose and
  should be removed.

Removing the SGMII link check, would just keep the call to
genphy_aneg_done(), which is also the fallback. Thus we can just remove
at803x_aneg_done() altogether.

[1] https://lore.kernel.org/netdev/fdf0074a-2572-5914-6f3e-77202cbf96de@gmail.com/

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

taprio: Handle short intervals and large packets

When using short intervals e.g. below one millisecond, large packets won't be
transmitted at all. The software implementations checks whether the packet can
be fit into the remaining interval. Therefore, it takes the packet length and
the transmission speed into account. That is correct.

However, for large packets it may be that the transmission time exceeds the
interval resulting in no packet transmission. The same situation works fine with
hardware offloading applied.

The problem has been observed with the following schedule and iperf3:

|tc qdisc replace dev lan1 parent root handle 100 taprio \
|   num_tc 8 \
|   map 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 \
|   queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
|   base-time $base \
|   sched-entry S 0x40 500000 \
|   sched-entry S 0xbf 500000 \
|   clockid CLOCK_TAI \
|   flags 0x00

[...]

|root@tsn:~# iperf3 -c 192.168.2.105
|Connecting to host 192.168.2.105, port 5201
|[  5] local 192.168.2.121 port 52610 connected to 192.168.2.105 port 5201
|[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
|[  5]   0.00-1.00   sec  45.2 KBytes   370 Kbits/sec    0   1.41 KBytes
|[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes

After debugging, it seems that the packet length stored in the SKB is about
7000-8000 bytes. Using a 100 Mbit/s link the transmission time is about 600us
which larger than the interval of 500us.

Therefore, segment the SKB into smaller chunks if the packet is too big. This
yields similar results than the hardware offload:

|root@tsn:~# iperf3 -c 192.168.2.105
|Connecting to host 192.168.2.105, port 5201
|- - - - - - - - - - - - - - - - - - - - - - - - -
|[ ID] Interval           Transfer     Bitrate         Retr
|[  5]   0.00-10.00  sec  48.9 MBytes  41.0 Mbits/sec    0             sender
|[  5]   0.00-10.02  sec  48.7 MBytes  40.7 Mbits/sec                  receiver

Furthermore, the segmentation can be skipped for the full offload case, as the
driver or the hardware is expected to handle this.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: net: forwarding: Fix a typo

s/verfied/verified/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gro-retpoline'

Alexander Lobakin says:

====================
net: avoid retpoline overhead on VLAN and TEB GRO

dev_gro_receive() uses indirect calls for IP GRO functions, but
it works only for the outermost headers and untagged frames.
Simple VLAN tag before an IP header restores the performance hit.
This simple series straightens the GRO calls for IP headers going
after VLAN tag or inner Ethernet header (GENEVE, NvGRE, VxLAN)
for retpolined kernels.
====================

ethernet: avoid retpoline overhead on TEB (GENEVE, NvGRE, VxLAN) GRO

The two most popular headers going after Ethernet are IPv4 and IPv6.
Retpoline overhead for them is addressed only in dev_gro_receive(),
when they lie right after the outermost Ethernet header.
Use the indirect call wrappers in TEB (Transparent Ethernet Bridging,
such as GENEVE, NvGRE, VxLAN etc.) GRO receive code to reduce the
penalty when processing the inner headers.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan/8021q: avoid retpoline overhead on GRO

The two most popular headers going after VLAN are IPv4 and IPv6.
Retpoline overhead for them is addressed only in dev_gro_receive(),
when they lie right after the outermost Ethernet header.
Use the indirect call wrappers in VLAN GRO receive code to reduce
the penalty on receiving tagged frames (when hardware stripping is
off or not available).

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: add combined call_gro_receive() + INDIRECT_CALL_INET() helper

call_gro_receive() is used to limit GRO recursion, but it works only
with callback pointers.
There's a combined version of call_gro_receive() + INDIRECT_CALL_2()
in <net/inet_common.h>, but it doesn't check for IPv6 modularity.
Add a similar new helper to cover both of these. It can and will be
used to avoid retpoline overhead when IP header lies behind another
offloaded proto.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: make net/gro.h self-contained

If some source file includes <net/gro.h>, but doesn't include
<linux/indirect_call_wrapper.h>:

In file included from net/8021q/vlan_core.c:7:
./include/net/gro.h:6:1: warning: data definition has no type or storage class
6 | INDIRECT_CALLABLE_DECLARE(struct sk_buff *ipv6_gro_receive(struct list_head *,
| ^~~~~~~~~~~~~~~~~~~~~~~~~
./include/net/gro.h:6:1: error: type defaults to ‘int’ in declaration of ‘INDIRECT_CALLABLE_DECLARE’ [-Werror=implicit-int]

[...]

Include <linux/indirect_call_wrapper.h> directly. It's small and
won't pull lots of dependencies.
Also add some incomplete struct declarations to be fully stacked.

Fixes: 04f00ab2275f ("net/core: move gro function declarations to separate header ")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>

Fix a typo

s/serisouly/seriously/

...and the sentence construction.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ionic-fixes'

Shannon Nelson says:

====================
ionic fixes

These are a few little fixes and cleanups found while working
on other features and more testing.
====================

ionic: protect adminq from early destroy

Don't destroy the adminq while there is an outstanding request.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: stop watchdog when in broken state

Up to now we've been ignoring any error return from the
queue starting in the link status check, so we fix that here.
If the driver had to reset and couldn't get things running
properly again, for example after a Tx Timeout and the FW is
not responding to commands, don't let the link watchdog try
to restart the queues. At this point the user can try to DOWN
and UP the device to clear the errors.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: block actions during fw reset

Block some actions while the FW is in a reset activity
and the queues are not configured.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: update ethtool support bits for BASET

Add support in get_link_ksettings for a couple of
new BASET connections.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: fix unchecked reference

We can get to the counter without going through the pointer
that the robot complained about.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: simplify the intr_index use in txq_init

The qcq->intr.index was set when the queue was allocated,
there is no need to reach around to find it.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: code cleanup details

Catch a couple of missing macro name uses, fix a couple
of misspellings, etc.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ocelot: support multiple bridges

The ocelot switches are a bit odd in that they do not have an STP state
to put the ports into. Instead, the forwarding configuration is delayed
from the typical port_bridge_join into stp_state_set, when the port enters
the BR_STATE_FORWARDING state.

I can only guess that the implementation of this quirk is the reason that
led to the simplification of the driver such that only one bridge could
be offloaded at a time.

We can simplify the data structures somewhat, and introduce a per-port
bridge device pointer and STP state, similar to how the LAG offload
works now (there we have a per-port bonding device pointer and TX
enabled state). This allows offloading multiple bridges with relative
ease, while still keeping in place the quirk to delay the programming of
the PGIDs.

We actually need this change now because we need to remove the bogus
restriction from ocelot_bridge_stp_state_set that ocelot->bridge_mask
needs to contain BIT(port), otherwise that function is a no-op.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ocelot: Fix deletetion of MRP entries from MAC table

When a MRP ring was deleted or disabled, the driver was iterating over
the ports to detect if any other MPR rings exists and in case it didn't
exist it would delete the MAC table entry. But the problem was that it
used the last iterated port to delete the MAC table entry and this could
be a NULL port.

The fix consists of using the port on which the function was called.

Fixes: 7c588c3e96e9733a ("net: ocelot: Extend MRP")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lapbether: Close the LAPB device before its underlying Ethernet device closes

When a virtual LAPB device's underlying Ethernet device closes, the LAPB
device is also closed.

However, currently the LAPB device is closed after the Ethernet device
closes. It would be better to close it before the Ethernet device closes.
This would allow the LAPB device to transmit a last frame to notify the
other side that it is disconnecting.

Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Remove redundant initialization of pointer pfvf

The pointer pfvf is being initialized with a value that is
never read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Fixes: 56bcef528bd8 ("octeontx2-af: Use npc_install_flow API for promisc and broadcast entries")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cdc_ncm: drop redundant driver-data assignment

The driver data for the data interface has already been set by
usb_driver_claim_interface() so drop the subsequent redundant
assignment.

Note that this also avoids setting the driver data three times in case
of a combined interface.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc/fdp: Simplify the return expression of fdp_nci_open()

Simplify the return expression.

Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

/net/core/: fix misspellings using codespell tool

A typo is found out by codespell tool in 1734th line of drop_monitor.c:

$ codespell ./net/core/
./net/core/drop_monitor.c:1734: guarnateed ==> guaranteed

Fix a typo found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

/net/hsr: fix misspellings using codespell tool

A typo is found out by codespell tool in 111th line of hsr_debugfs.c:

$ codespell ./net/hsr/

net/hsr/hsr_debugfs.c:111: Debufs ==> Debugfs

Fix typos found by codespell.

Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

of: of_net: Provide function name and param description

Fixes the following W=1 kernel build warning(s):

drivers/of/of_net.c:104: warning: Function parameter or member 'np' not described in 'of_get_mac_address'
drivers/of/of_net.c:104: warning: expecting prototype for mac(). Prototype was for of_get_mac_address() instead

Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: netdev@vger.kernel.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: intel: Add PSE and PCH PTP clock source selection

Intel mGbE variant implemented in EHL and TGL can be set to select
different clock frequency based on GPO bits in MAC_GPIO_STATUS register.

We introduce a new "void (*ptp_clk_freq_config)(void *priv)" in platform
data so that if a platform is required to configure the frequency of clock
source, in this case Intel mGBE does, the platform-specific configuration
of the PTP clock setting is done when stmmac_ptp_register() is called.

Signed-off-by: Wong, Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6xxx-offload-bridge-flags'

Tobias Waldekranz says:

====================
net: dsa: mv88e6xxx: Offload bridge port flags

Add support for offloading learning and broadcast flooding flags. With
this in place, mv88e6xx supports offloading of all bridge port flags
that are currently supported by the bridge.

Broadcast flooding is somewhat awkward to control as there is no
per-port bit for this like there is for unknown unicast and unknown
multicast. Instead we have to update the ATU entry for the broadcast
address for all currently used FIDs.

v2 -> v3:
  - Only return a netdev from dsa_port_to_bridge_port if the port is
    currently bridged (Vladimir & Florian)

v1 -> v2:
  - Ensure that mv88e6xxx_vtu_get handles VID 0 (Vladimir)
  - Fixed off-by-one in mv88e6xxx_port_set_assoc_vector (Vladimir)
  - Fast age all entries on port when disabling learning (Vladimir)
  - Correctly detect bridge flags on LAG ports (Vladimir)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Offload bridge broadcast flooding flag

These switches have two modes of classifying broadcast:

1. Broadcast is multicast.
2. Broadcast is its own unique thing that is always flooded
everywhere.

This driver uses the first option, making sure to load the broadcast
address into all active databases. Because of this, we can support
per-port broadcast flooding by (1) making sure to only set the subset
of ports that have it enabled whenever joining a new bridge or VLAN,
and (2) by updating all active databases whenever the setting is
changed on a port.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Offload bridge learning flag

Allow a user to control automatic learning per port.

Many chips have an explicit "LearningDisable"-bit that can be used for
this, but we opt for setting/clearing the PAV instead, as it works on
all devices at least as far back as 6083.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Flood all traffic classes on standalone ports

In accordance with the comment in dsa_port_bridge_leave, standalone
ports shall be configured to flood all types of traffic. This change
aligns the mv88e6xxx driver with that policy.

Previously a standalone port would initially not egress any unknown
traffic, but after joining and then leaving a bridge, it would.

This does not matter that much since we only ever send FROM_CPUs on
standalone ports, but it seems prudent to make sure that the initial
values match those that are applied after a bridging/unbridging cycle.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Use standard helper for broadcast address

Use the conventional declaration style of a MAC address in the
kernel (u8 addr[ETH_ALEN]) for the broadcast address, then set it
using the existing helper.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Remove some bureaucracy around querying the VTU

The hardware has a somewhat quirky protocol for reading out the VTU
entry for a particular VID. But there is no reason why we cannot
create a better API for ourselves in the driver.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Provide generic VTU iterator

Move the intricacies of correctly iterating over the VTU to a common
implementation.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Avoid useless attempts to fast-age LAGs

When a port is a part of a LAG, the ATU will create dynamic entries
belonging to the LAG ID when learning is enabled. So trying to
fast-age those out using the constituent port will have no
effect. Unfortunately the hardware does not support move operations on
LAGs so there is no obvious way to transform the request to target the
LAG instead.

Instead we document this known limitation and at least avoid wasting
any time on it.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Add helper to resolve bridge port from DSA port

In order for a driver to be able to query a bridge for information
about itself, e.g. reading out port flags, it has to use a netdev that
is known to the bridge. In the simple case, that is just the netdev
representing the port, e.g. swp0 or swp1 in this example:

   br0
   / \
swp0 swp1

But in the case of an offloaded lag, this will be the bond or team
interface, e.g. bond0 in this example:

     br0
     /
  bond0
   / \
swp0 swp1

Add a helper that hides some of this complexity from the
drivers. Then, redefine dsa_port_offloads_bridge_port using the helper
to avoid double accounting of the set of possible offloaded uppers.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-32bit'

Alex Elder says:

====================
net: ipa: support 32-bit targets

There is currently a configuration dependency that restricts IPA to
be supported only on 64-bit machines.  There are only a few things
that really require that, and those are fixed in this series.  The
last patch in the series removes the CONFIG_64BIT build dependency
for IPA.

Version 2 of this series uses upper_32_bits() rather than creating
a new function to extract bits out of a DMA address.  Version 3 of
uses lower_32_bits() as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: relax 64-bit build requirement

We currently assume the IPA driver is built only for a 64 bit kernel.

When this constraint was put in place it eliminated some do_div()
calls, replacing them with the "/" and "%" operators. We now only
use these operations on u32 and size_t objects. In a 32-bit kernel
build, size_t will be 32 bits wide, so there remains no reason to
use do_div() for divide and modulo.

A few recent commits also fix some code that assumes that DMA
addresses are 64 bits wide.

With that, we can get rid of the 64-bit build requirement.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: fix table alignment requirement

We currently have a build-time check to ensure that the minimum DMA
allocation alignment satisfies the constraint that IPA filter and
route tables must point to rules that are 128-byte aligned.

But what's really important is that the actual allocated DMA memory
has that alignment, even if the minimum is smaller than that.

Remove the BUILD_BUG_ON() call checking against minimim DMA alignment
and instead verify at rutime that the allocated memory is properly
aligned.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use upper_32_bits()

Use upper_32_bits() to extract the high-order 32 bits of a DMA
address. This avoids doing a 32-position shift on a DMA address
if it happens not to be 64 bits wide. Use lower_32_bits() to
extract the low-order 32 bits (because that's what it's for).

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: fix assumptions about DMA address size

Some build time checks in ipa_table_validate_build() assume that a
DMA address is 64 bits wide.  That is more restrictive than it has
to be.  A route or filter table is 64 bits wide no matter what the
size of a DMA address is on the AP.  The code actually uses a
pointer to __le64 to access table entries, and a fixed constant
IPA_TABLE_ENTRY_SIZE to describe the size of those entries.

Loosen up two checks so they still verify some requirements, but
such that they do not assume the size of a DMA address is 64 bits.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-qeth-next'

Julian Wiedmann says:

====================
s390/qeth: updates 2021-03-18

please apply the following patch series for qeth to netdev's net-next
tree.

This brings two small optimizations (replace a hard-coded GFP_ATOMIC,
pass through the NAPI budget to enable napi_consume_skb()), and removes
some redundant VLAN filter code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove RX VLAN filter stubs in L3 driver

The callbacks have been slimmed down to a level where they no longer do
any actual work. So stop pretending that we support the
NETIF_F_HW_VLAN_CTAG_FILTER feature.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: enable napi_consume_skb() for pending TX buffers

Pending TX buffers are completed from the same NAPI code as normal
TX buffers. Pass the NAPI budget to qeth_tx_complete_buf() so that
the freeing of the completed skbs can be deferred.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: allocate initial TX Buffer structs with GFP_KERNEL

qeth_init_qdio_out_buf() is typically called during initialization, and
the GFP_ATOMIC is only needed for a very specific & rare case during TX
completion.

Allow callers to specify a gfp mask, so that the initialization path can
select GFP_KERNEL. While at it also clarify the function name.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-xps-improve-the-xps-maps-handling'

Antoine Tenart says:

====================
net: xps: improve the xps maps handling

This series aims at fixing various issues with the xps code, including
out-of-bound accesses and use-after-free. While doing so we try to
improve the xps code maintainability and readability.

The main change is moving dev->num_tc and dev->nr_ids in the xps maps, to
avoid out-of-bound accesses as those two fields can be updated after the
maps have been allocated. This allows further reworks, to improve the
xps code readability and allow to stop taking the rtnl lock when
reading the maps in sysfs. The maps are moved to an array in net_device,
which simplifies the code a lot.

One future improvement may be to remove the use of xps_map_mutex from
net/core/dev.c, but that may require extra care.

Thanks!
Antoine

Since v3:
  - Removed the 3 patches about the rtnl lock and __netif_set_xps_queue
    as there are extra issues. Those patches were not tied to the
    others, and I'll see want can be done as a separate effort.
  - One small fix in patch 12.

Since v2:
  - Patches 13-16 are new to the series.
  - Fixed another issue I found while preparing v3 (use after free of
    old xps maps).
  - Kept the rtnl lock when calling netdev_get_tx_queue and
    netdev_txq_to_tc.
  - Use get_device/put_device when using the sb_dev.
  - Take the rtnl lock in mlx5 and virtio_net when calling
    netif_set_xps_queue.
  - Fixed a coding style issue.

Since v1:
  - Reordered the patches to improve readability and avoid introducing
    issues in between patches.
  - Use dev_maps->nr_ids to allocate the mask in xps_queue_show but
    still default to nr_cpu_ids/dev->num_rx_queues in xps_queue_show
    when dev_maps hasn't been allocated yet for backward
    compatibility.:w
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: NULL the old xps map entries when freeing them

In __netif_set_xps_queue, old map entries from the old dev_maps are
freed but their corresponding entry in the old dev_maps aren't NULLed.
Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix use after free in xps

When setting up an new dev_maps in __netif_set_xps_queue, we remove and
free maps from unused CPUs/rx-queues near the end of the function; by
calling remove_xps_queue. However it's possible those maps are also part
of the old not-freed-yet dev_maps, which might be used concurrently.
When that happens, a map can be freed while its corresponding entry in
the old dev_maps table isn't NULLed, leading to: "BUG: KASAN:
use-after-free" in different places.

This fixes the map freeing logic for unused CPUs/rx-queues, to also NULL
the map entries from the old dev_maps table.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-sysfs: move the xps cpus/rxqs retrieval in a common function

Most of the xps_cpus_show and xps_rxqs_show functions share the same
logic. Having it in two different functions does not help maintenance.
This patch moves their common logic into a new function, xps_queue_show,
to improve this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-sysfs: move the rtnl unlock up in the xps show helpers

Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU
protected, we do not have the need to protect the maps in the rtnl lock.
Move the rtnl unlock up so we reduce the rtnl locking section.

We also increase the reference count on the subordinate device if any,
as we don't want this device to be freed while we use it (now that the
rtnl lock isn't protecting it in the whole function).

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: improve queue removal readability in __netif_set_xps_queue

Improve the readability of the loop removing tx-queue from unused
CPUs/rx-queues in __netif_set_xps_queue. The change should only be
cosmetic.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>