review.tizen.org Git - platform/kernel/linux-starfive.git/log

can: ctucanfd: platform: add missing dependency to HAS_IOMEM

The kernel test robot noticed that the ctucanfd platform driver fails
during modpost on platforms that don't support IOMEM.

| ERROR: modpost: "devm_ioremap_resource" [drivers/net/can/ctucanfd/ctucanfd_platform.ko] undefined!

This patch adds the missing HAS_IOMEM dependency.

Link: https://lore.kernel.org/all/20220523123720.1656611-1-mkl@pengutronix.de
Fixes: e8f0c23a2415 ("can: ctucanfd: CTU CAN FD open-source IP core - platform/SoC support.")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: Ondrej Ille <ondrej.ille@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb: silence a GCC 12 -Warray-bounds warning

This driver does a lot of casting of smaller buffers to
struct kvaser_cmd_ext, GCC 12 does not like that:

| drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c:489:65: warning: array subscript ‘struct kvaser_cmd_ext[0]’ is partly outside array bounds of ‘unsigned char[32]’ [-Warray-bounds]
| drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c:489:23: note: in expansion of macro ‘le16_to_cpu’
| 489 | ret = le16_to_cpu(((struct kvaser_cmd_ext *)cmd)->len);
| | ^~~~~~~~~~~

Temporarily silence this warning (move it to W=1 builds).

Link: https://lore.kernel.org/all/20220520194659.2356903-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: peak_usb: fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Link: https://lore.kernel.org/all/20220521111145.81697-24-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: dsa: OF-ware slave_mii_bus

If found, register the DSA internally allocated slave_mii_bus with an OF
"mdio" child object. It can save some drivers from creating their
custom internal MDIO bus.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eth: de4x5: remove support for Generic DECchip & DIGITAL EtherWORKS PCI/EISA

Looks like almost all changes to this driver had been tree-wide
refactoring since git era begun. There is one commit from Al
15 years ago which could potentially be fixing a real bug.

The driver is using virt_to_bus() and is a real magnet for pointless
cleanups. It seems unlikely to have real users. Let's try to shed
this maintenance burden.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: fix error code in mtk_flow_offload_replace()

Preserve the error code from mtk_foe_entry_commit(). Do not return
success.

Fixes: c4f033d9e03e ("net: ethernet: mtk_eth_soc: rework hardware flow table management")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-multi-cpu-port-part-two'

Vladimir Oltean says:

====================
DSA changes for multiple CPU ports (part 2)

As explained in part 1:
https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/
I am trying to enable the second internal port pair from the NXP LS1028A
Felix switch for DSA-tagged traffic via "ocelot-8021q". This series
represents part 2 (of an unknown number) of that effort.

This series deals only with a minor bug fix (first patch) and with code
reorganization in the Felix DSA driver and in the Ocelot switch library.
Hopefully this will lay the ground for a clean introduction of new UAPI
for changing the DSA master of a user port in part 3.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: tag_8021q preparation for multiple CPU ports

Update the VCAP filters to support multiple tag_8021q CPU ports.

TX works using a filter for VLAN ID on the ingress of the CPU port, with
a redirect and a VLAN pop action. This can be updated trivially by
amending the ingress port mask of this rule to match on all tag_8021q
CPU ports.

RX works using a filter for ingress port on the egress of the CPU port,
with a VLAN push action. Here we need to replicate these filters for
each tag_8021q CPU port, and let them all have the same action.
This means that the OCELOT_VCAP_ES0_TAG_8021Q_RXVLAN() cookie needs to
encode a unique value for every {user port, CPU port} pair it's given.
Do this by encoding the CPU port in the upper 16 bits of the cookie, and
the user port in the lower 16 bits.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: switch from {,un}set to {,un}assign for tag_8021q CPU ports

There is a desire for the felix driver to gain support for multiple
tag_8021q CPU ports, but the current model prevents it.

This is because ocelot_apply_bridge_fwd_mask() only takes into
consideration whether a port is a tag_8021q CPU port, but not whose CPU
port it is.

We need a model where we can have a direct affinity between an ocelot
port and a tag_8021q CPU port. This serves as the basis for multiple CPU
ports.

Declare a "dsa_8021q_cpu" backpointer in struct ocelot_port which
encodes that affinity. Repurpose the "ocelot_set_dsa_8021q_cpu" API to
"ocelot_assign_dsa_8021q_cpu" to express the change of paradigm.

Note that this change makes the first practical use of the new
ocelot_port->index field in ocelot_port_unassign_dsa_8021q_cpu(), where
we need to remove the old tag_8021q CPU port from the reserved VLAN range.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: directly call ocelot_port_{set,unset}_dsa_8021q_cpu

Absorb the final details of calling ocelot_port_{,un}set_dsa_8021q_cpu(),
i.e. the need to lock &ocelot->fwd_domain_lock, into the callee, to
simplify the caller and permit easier code reuse later.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: update bridge fwd mask from ocelot lib when changing tag_8021q CPU

Add more logic to ocelot_port_{,un}set_dsa_8021q_cpu() from the ocelot
switch lib by encapsulating the ocelot_apply_bridge_fwd_mask() call that
felix used to have.

This is necessary because the CPU port change procedure will also need
to do this, and it's good to reduce code duplication by having an entry
point in the ocelot switch lib that does all that is needed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: move the updating of PGID_CPU to the ocelot lib

PGID_CPU must be updated every time a port is configured or unconfigured
as a tag_8021q CPU port. The ocelot switch lib already has a hook for
that operation, so move the updating of PGID_CPU to those hooks.

These bits are pretty specific to DSA, so normally I would keep them out
of the common switch lib, but when tag_8021q is in use, this has
implications upon the forwarding mask determined by
ocelot_apply_bridge_fwd_mask() and called extensively by the switch lib.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: fix missing adjustment of host broadcast flooding

PGID_BC is configured statically by ocelot_init() to flood towards the
CPU port module, and dynamically by ocelot_port_set_bcast_flood()
towards all user ports.

When the tagging protocol changes, the intention is to turn off flooding
towards the old pipe towards the host, and to turn it on towards the new
pipe.

Due to a recent change which removed the adjustment of PGID_BC from
felix_set_host_flood(), 3 things happen.

- when we change from NPI to tag_8021q mode: in this mode, the CPU port
  module is accessed via registers, and used to read PTP packets with
  timestamps. We fail to disable broadcast flooding towards the CPU port
  module, and to enable broadcast flooding towards the physical port
  that serves as a DSA tag_8021q CPU port.

- from tag_8021q to NPI mode: in this mode, the CPU port module is
  redirected to a physical port. We fail to disable broadcast flooding
  towards the physical tag_8021q CPU port, and to enable it towards the
  CPU port module at ocelot->num_phys_ports.

- when the ports are put in promiscuous mode, we also fail to update
  PGID_BC towards the host pipe of the current protocol.

First issue means that felix_check_xtr_pkt() has to do extra work,
because it will not see only PTP packets, but also broadcasts. It needs
to dequeue these packets just to drop them.

Third issue is inconsequential, since PGID_BC is allocated from the
nonreserved multicast PGID space, and these PGIDs are conveniently
initialized to 0x7f (i.e. flood towards all ports except the CPU port
module). Broadcasts reach the NPI port via ocelot_init(), and reach the
tag_8021q CPU port via the hardware defaults.

Second issue is also inconsequential, because we fail both at disabling
and at enabling broadcast flooding on a port, so the defaults mentioned
above are preserved, and they are fine except for the performance impact.

Fixes: 7a29d220f4c0 ("net: dsa: felix: reimplement tagging protocol change with function pointers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fix-silence-gcc-12-warnings-in-drivers-net-wireless'

Jakub Kicinski says:

====================
Fix/silence GCC 12 warnings in drivers/net/wireless/

as mentioned off list we'd like to get GCC 12 warnings quashed.
This set takes care of the warnings we have in drivers/net/wireless/
mostly by relegating them to W=1/W=2 builds.
====================

Link: https://lore.kernel.org/r/20220520194320.2356236-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: carl9170: silence a GCC 12 -Warray-bounds warning

carl9170 has a big union (struct carl9170_cmd) with all the command
types in it. But it allocates buffers only large enough for a given
command. This upsets GCC 12:

drivers/net/wireless/ath/carl9170/cmd.c:125:30: warning: array subscript ‘struct carl9170_cmd[0]’ is partly outside array bounds of ‘unsigned char[8]’ [-Warray-bounds]
125 | tmp->hdr.cmd = cmd;
| ~~~~~~~~~~~~~^~~~~

Punt the warning to W=1 for now. Hopefully GCC will learn to
recognize which fields are in-bounds.

Acked-by: Christian Lamparter <chunkeey@gmail.com>
Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: brcmfmac: work around a GCC 12 -Warray-bounds warning

GCC 12 really doesn't like partial struct allocations:

drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:2202:32: warning: array subscript ‘struct brcmf_ext_join_params_le[0]’ is partly outside array bounds of ‘void[70]’ [-Warray-bounds]
2202 | ext_join_params->scan_le.passive_time =
| ^~

brcmfmac is trying to save 2 bytes at the end by either allocating
or not allocating a channel member. Let's keep @join_params_size
the "right" size but kmalloc() the full structure.

Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: iwlwifi: use unsigned to silence a GCC 12 warning

GCC 12 says:

drivers/net/wireless/intel/iwlwifi/mvm/sta.c:1076:37: warning: array subscript -1 is below array bounds of ‘struct iwl_mvm_tid_data[9]’ [-Warray-bounds]
1076 | if (mvmsta->tid_data[tid].state != IWL_AGG_OFF)
| ~~~~~~~~~~~~~~~~^~~~~

Whatever, tid is a bit from for_each_set_bit(), it's clearly unsigned.

Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: ath6k: silence false positive -Wno-dangling-pointer warning on GCC 12

For some reason GCC 12 decided to complain about the common
pattern of queuing an object onto a list on the stack in ath6k:

    inlined from ‘ath6kl_htc_mbox_tx’ at drivers/net/wireless/ath/ath6kl/htc_mbox.c:1142:3:
include/linux/list.h:74:19: warning: storing the address of local variable ‘queue’ in ‘*&packet_15(D)->list.prev’ [-Wdangling-pointer=]
   74 |         new->prev = prev;
      |         ~~~~~~~~~~^~~~~~

Move the warning to W=1, hopefully it goes away with a compiler
update.

Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: rtlwifi: remove always-true condition pointed out by GCC 12

The .value is a two-dim array, not a pointer.

struct iqk_matrix_regs {
bool iqk_done;
long value[1][IQK_MATRIX_REG_NUM];
};

Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: ath9k: silence array-bounds warning on GCC 12

GCC 12 says:

drivers/net/wireless/ath/ath9k/mac.c: In function ‘ath9k_hw_resettxqueue’:
drivers/net/wireless/ath/ath9k/mac.c:373:22: warning: array subscript 32 is above array bounds of ‘struct ath9k_tx_queue_info[10]’ [-Warray-bounds]
373 | qi = &ah->txq[q];
| ~~~~~~~^~~

I don't know where it got the 32 from, relegate the warning to W=1+.

Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: plfxlc: remove redundant NULL-check for GCC 12

GCC is upset that we check the return value of plfxlc_usb_dev()
even tho it can't be NULL:

drivers/net/wireless/purelifi/plfxlc/usb.c: In function ‘resume’:
drivers/net/wireless/purelifi/plfxlc/usb.c:840:20: warning: the comparison will always evaluate as ‘true’ for the address of ‘dev’ will never be NULL [-Waddress]
840 | if (!pl || !plfxlc_usb_dev(pl))
| ^

plfxlc_usb_dev() returns an address of one of the members of pl,
so it's safe to drop these checks.

Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: toshiba,visconti-dwmac: Update the common clock properties

The clock for this driver switched to the common clock controller driver.
Therefore, update common clock properties for ethernet device in the binding
document.

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fddi: skfp: smt: Remove extra parameters to vararg macro

cppcheck reports
[drivers/net/fddi/skfp/smt.c:750]: (warning) printf format string requires 0 parameters but 2 are given.

DB_SBAN is a vararg macro, like DB_ESSN. Remove the extra args and the nl.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mt7986-support'

Jakub Kicinski says:

====================
introduce mt7986 ethernet support

Add support for mt7986-eth driver available on mt7986 soc.

Changes since v2:
- rely on GFP_KERNEL whenever possible
- define mtk_reg_map struct to introduce soc register map and avoid macros
- improve comments

Changes since v1:
- drop SRAM option
- convert ring->dma to void
- convert scratch_ring to void
- enable port4
- fix irq dts bindings
- drop gmac1 support from mt7986a-rfb dts for the moment
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset

Add support for mt7986-eth driver available on mt7986 soc.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: convert scratch_ring pointer to void

Simplify the code converting scratch_ring pointer to void

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: convert ring dma pointer to void

Simplify the code converting {tx,rx} ring dma pointer to void

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support

Introduce MTK_NETSYS_V2 support. MTK_NETSYS_V2 defines 32B TX/RX DMA
descriptors.
This is a preliminary patch to add mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: introduce device register map

Introduce reg_map structure to add the capability to support different
register definitions. Move register definitions in mtk_regmap structure.
This is a preliminary patch to introduce mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: rely on rxd_size field in mtk_rx_alloc/mtk_rx_clean

Remove mtk_rx_dma structure layout dependency in mtk_rx_alloc/mtk_rx_clean.
Initialize to 0 rxd3 and rxd4 in mtk_rx_alloc.
This is a preliminary patch to add mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: rely on txd_size field in mtk_poll_tx/mtk_poll_rx

This is a preliminary to ad mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: add rxd_size to mtk_soc_data

Similar to tx counterpart, introduce rxd_size in mtk_soc_data data
structure.
This is a preliminary patch to add mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: rely on txd_size in txd_to_idx

This is a preliminary patch to add mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: rely on txd_size in mtk_desc_to_tx_buf

This is a preliminary patch to add mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: rely on txd_size in mtk_tx_alloc/mtk_tx_clean

This is a preliminary patch to add mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: add txd_size to mtk_soc_data

In order to remove mtk_tx_dma size dependency, introduce txd_size in
mtk_soc_data data structure. Rely on txd_size in mtk_init_fq_dma() and
mtk_dma_free() routines.
This is a preliminary patch to add mt7986 ethernet support.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: move tx dma desc configuration in mtk_tx_set_dma_desc

Move tx dma descriptor configuration in mtk_tx_set_dma_desc routine.
This is a preliminary patch to introduce mt7986 ethernet support since
it relies on a different tx dma descriptor layout.

Tested-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mtk_eth_soc: rely on GFP_KERNEL for dma_alloc_coherent whenever possible

Rely on GFP_KERNEL for dma descriptors mappings in mtk_tx_alloc(),
mtk_rx_alloc() and mtk_init_fq_dma() since they are run in non-irq
context.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: mediatek,net: add mt7986-eth binding

Introduce dts bindings for mt7986 soc in mediatek,net.yaml.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: mediatek: mt7986: introduce ethernet nodes

Introduce ethernet nodes in mt7986 bindings in order to
enable mt7986a/mt7986b ethernet support.

Co-developed-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Sam Shih <sam.shih@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-gcc12-warnings'

Jakub Kicinski says:

====================
eth: silence the GCC 12 array-bounds warnings

Silence the array-bounds warnings in Ethernet drivers.
v2 uses -Wno-array-bounds directly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

eth: tg3: silence the GCC 12 array-bounds warning

GCC 12 currently generates a rather inconsistent warning:

drivers/net/ethernet/broadcom/tg3.c:17795:51: warning: array subscript 5 is above array bounds of ‘struct tg3_napi[5]’ [-Warray-bounds]
17795 | struct tg3_napi *tnapi = &tp->napi[i];
| ~~~~~~~~^~~

i is guaranteed < tp->irq_max which in turn is either 1 or 5.
There are more loops like this one in the driver, but strangely
GCC 12 dislikes only this single one.

Silence this silliness for now.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

eth: ice: silence the GCC 12 array-bounds warning

GCC 12 gets upset because driver allocates partial
struct ice_aqc_sw_rules_elem buffers. The writes are
within bounds.

Silence these warnings for now, our build bot runs GCC 12
so we won't allow any new instances.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

eth: mtk_eth_soc: silence the GCC 12 array-bounds warning

GCC 12 gets upset because in mtk_foe_entry_commit_subflow()
this driver allocates a partial structure. The writes are
within bounds.

Silence these warnings for now, our build bot runs GCC 12
so we won't allow any new instances.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: offload tc action "ok" using an empty action vector

The "ok" tc action is useful when placed in front of a more generic
filter to exclude some more specific rules from matching it.

The ocelot switches can offload this tc action by creating an empty
action vector (no _ENA fields set to 1). This makes sense for all of
VCAP IS1, IS2 and ES0 (but not for PSFP).

Add support for this action. Note that this makes the
gact_drop_and_ok_test() selftest pass, where "action ok" is used in
front of an "action drop" rule, both offloaded to VCAP IS2.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ocelot-selftests'

Vladimir Oltean says:

====================
Streamline Ocelot tc-chains selftest

This series changes the output and the argument format of the Ocelot
switch selftest so that it is more similar to what can be found in
tools/testing/selftests/net/forwarding/.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: ocelot: tc_flower_chains: reorder interfaces

Use the standard interface order h1, swp1, swp2, h2 that is used by the
forwarding selftest framework. The previous order was confusing even
with the ASCII drawing. That isn't needed anymore.

This also drops the fixed MAC addresses and uses STABLE_MAC_ADDRS, which
ensures the MAC addresses are unique.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: ocelot: tc_flower_chains: use conventional interface names

This is a robotic rename as follows:

eth0 -> swp1
eth1 -> swp2
eth2 -> h2
eth3 -> h1

This brings the selftest more in line with the other forwarding
selftests, where h1 is connected to swp1, and h2 to swp2.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: ocelot: tc_flower_chains: streamline test output

Bring this driver-specific selftest output in line with the other
selftests.

Before:

Testing VLAN pop..                      OK
Testing VLAN push..                     OK
Testing ingress VLAN modification..             OK
Testing egress VLAN modification..              OK
Testing frame prioritization..          OK

After:

TEST: VLAN pop                                                      [ OK ]
TEST: VLAN push                                                     [ OK ]
TEST: Ingress VLAN modification                                     [ OK ]
TEST: Egress VLAN modification                                      [ OK ]
TEST: Frame prioritization                                          [ OK ]

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wrap the wireless pointers in struct net_device in an ifdef

Most protocol-specific pointers in struct net_device are under
a respective ifdef. Wireless is the notable exception. Since
there's a sizable number of custom-built kernels for datacenter
workloads which don't build wireless it seems reasonable to
ifdefy those pointers as well.

While at it move IPv4 and IPv6 pointers up, those are special
for obvious reasons.

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> # ieee802154
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: Do proper error checking for enet_out clk

An error code returned by devm_clk_get() might have other meanings than
"This clock doesn't exist". So use devm_clk_get_optional() and handle
all remaining errors as fatal.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: DP83822: enable rgmii mode if phy_interface_is_rgmii

RGMII mode can be enable from dp83822 straps, and also writing bit 9
of register 0x17 - RMII and Status Register (RCSR).
When phy_interface_is_rgmii rgmii mode must be enabled, same for
contrary, this prevents malconfigurations of hw straps

References:
- https://www.ti.com/lit/gpn/dp83822i p66

Signed-off-by: Tommaso Merciai <tommaso.merciai@amarulasolutions.com>
Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com>
Suggested-by: Alberto Bianchi <alberto.bianchi@amarulasolutions.com>
Tested-by: Tommaso Merciai <tommaso.merciai@amarulasolutions.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: selftests: Add stress_reuseport_listen to .gitignore

Add newly added stress_reuseport_listen object to .gitignore file.

Fixes: ec8cb4f617a2 ("net: selftests: Stress reuseport listen")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rxrpc-misc'

David Howells says:

====================
rxrpc: Miscellaneous changes

Here are some miscellaneous changes for AF_RXRPC:

(1) Allow the list of local endpoints to be viewed through /proc.

(2) Switch to using refcount_t for refcounting.

(3) Fix a locking issue found by lockdep.

(4) Autogenerate tracing symbol enums from symbol->string maps to make it
     easier to keep them in sync.

(5) Return an error to sendmsg() if a call it tried to set up failed.
     Because it failed at this point, no notification will be generated for
     recvmsg to pick up - but userspace still needs to know about the
     failure.

(6) Fix the selection of abort codes generated by internal events.  In
     particular, rxrpc and kafs shouldn't be generating RX_USER_ABORT
     unless it's because userspace did something to cancel a call.

(7) Adjust the interpretation and handling of certain ACK types to try and
     detect NAT changes causing a call to seem to start mid-flow from a
     different peer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

afs: Adjust ACK interpretation to try and cope with NAT

If a client's address changes, say if it is NAT'd, this can disrupt an in
progress operation.  For most operations, this is not much of a problem,
but StoreData can be different as some servers modify the target file as
the data comes in, so if a store request is disrupted, the file can get
corrupted on the server.

The problem is that the server doesn't recognise packets that come after
the change of address as belonging to the original client and will bounce
them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if
the packet number falls within the initial sequence number window of a call
or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting
it.  In both cases, firstPacket will be 1 and previousPacket will be 0 in
the ACK information.

Fix this by the following means:

(1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1
     and previousPacket as 0, assume this indicates that the server saw the
     incoming packets from a different peer and thus as a different call.
     Fail the call with error -ENETRESET.

(2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the
     first packet has been hard-ACK'd.  If it hasn't been hard-ACK'd, the
     ACK packet will cause it to get retransmitted, so the call will just
     be repeated.

(3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of
     the operation.

(4) Prioritise the error code over things like -ECONNRESET as the server
     did actually respond.

(5) Make writeback treat -ENETRESET as a retryable error and make it
     redirty all the pages involved in a write so that the VM will retry.

Note that there is still a circumstance that I can't easily deal with: if
the operation is fully received and processed by the server, but the reply
is lost due to address change.  There's no way to know if the op happened.
We can examine the server, but a conflicting change could have been made by
a third party - and we can't tell the difference.  In such a case, a
message like:

    kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a)

will be logged to dmesg on the next op to touch the file and the client
will reset the inode state, including invalidating clean parts of the
pagecache.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc, afs: Fix selection of abort codes

The RX_USER_ABORT code should really only be used to indicate that the user
of the rxrpc service (ie. userspace) implicitly caused a call to be aborted
- for instance if the AF_RXRPC socket is closed whilst the call was in
progress.  (The user may also explicitly abort a call and specify the abort
code to use).

Change some of the points of generation to use other abort codes instead:

(1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see
     ENOMEM and EFAULT during received data delivery and abort with
     RX_CALL_DEAD in the default case.

(2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a
     reply.

(3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had
     heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't.

(4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not
     completed successfully or been aborted.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Return an error to sendmsg if call failed

If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that
was being given data was aborted remotely or otherwise failed, return an
error rather than returning the amount of data buffered for transmission.

The call (presumably) did not complete, so there's not much point
continuing with it.  AF_RXRPC considers it "complete" and so will be
unwilling to do anything else with it - and won't send a notification for
it, deeming the return from sendmsg sufficient.

Not returning an error causes afs to incorrectly handle a StoreData
operation that gets interrupted by a change of address due to NAT
reconfiguration.

This doesn't normally affect most operations since their request parameters
tend to fit into a single UDP packet and afs_make_call() returns before the
server responds; StoreData is different as it involves transmission of a
lot of data.

This can be triggered on a client by doing something like:

dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512

at one prompt, and then changing the network address at another prompt,
e.g.:

ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0

Tracing packets on an Auristor fileserver looks something like:

192.168.6.1 -> 192.168.6.3  RX 107 ACK Idle  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
192.168.6.3 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.3 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.1 -> 192.168.6.3  RX 107 ACK Idle  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
<ARP exchange for 192.168.6.2>
192.168.6.2 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.2 -> 192.168.6.1  AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.1 -> 192.168.6.2  RX 107 ACK Exceeds Window  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
192.168.6.1 -> 192.168.6.2  RX 74 ABORT  Seq: 0  Call: 4  Source Port: 7000  Destination Port: 7001
192.168.6.1 -> 192.168.6.2  RX 74 ABORT  Seq: 29321  Call: 4  Source Port: 7000  Destination Port: 7001

The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort
code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the
condition and generates an abort first and the unmarshal error is a
consequence of that at the application layer.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Automatically generate trace tag enums

Automatically generate trace tag enums from the symbol -> string mapping
tables rather than having the enums as well, thereby reducing duplicated
data.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Fix locking issue

There's a locking issue with the per-netns list of calls in rxrpc.  The
pieces of code that add and remove a call from the list use write_lock()
and the calls procfile uses read_lock() to access it.  However, the timer
callback function may trigger a removal by trying to queue a call for
processing and finding that it's already queued - at which point it has a
spare refcount that it has to do something with.  Unfortunately, if it puts
the call and this reduces the refcount to 0, the call will be removed from
the list.  Unfortunately, since the _bh variants of the locking functions
aren't used, this can deadlock.

================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ #10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
Possible unsafe locking scenario:

       CPU0
       ----
  lock(&rxnet->call_lock);
  <Interrupt>
    lock(&rxnet->call_lock);

*** DEADLOCK ***

1 lock held by ksoftirqd/2/25:
#0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d

Changes
=======
ver #2)
- Changed to using list_next_rcu() rather than rcu_dereference() directly.

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Use refcount_t rather than atomic_t

Move to using refcount_t rather than atomic_t for refcounts in rxrpc.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc

Allow the list of in-use local UDP endpoints in the current network
namespace to be viewed in /proc.

To aid with this, the endpoint list is converted to an hlist and RCU-safe
manipulation is used so that the list can be read with only the RCU
read lock held.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-next'

Alex Elder says:

====================
net: ipa: a few more small items

This series consists of three small sets of changes.  Version 2 adds
a patch that avoids a warning that occurs when handling a modem
crash (I unfortunately didn't notice it earlier).  All other patches
are the same--just rebased.

The first three patches allow a few endpoint features to be
specified.  At this time, currently-defined endpoints retain the
same configuration, but when the monitor functionality is added in
the next cycle these options will be required.

The fourth patch simply removes an unused function, explaining also
why it would likely never be used.

The fifth patch is new.  It counts the number of modem TX endpoints
and uses it to determine how many TREs a transaction needs when
when handling a modem crash.  It is needed to avoid exceeding the
limited number of commands imposed by the last four patches.

And the last four patches refactor code related to IPA immediate
commands, eliminating an unused field and then simplifying and
removing some unneeded code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use data space for command opcodes

The 64-bit data field in a transaction is not used for commands.
And the opcode array is *only* used for commands. They're
(currently) the same size; save a little space in the transaction
structure by enclosing the two fields in a union.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: remove command info pool

The ipa_cmd_info structure now contains only one field, and it's an
enumerated type whose values all fit in 8 bits.  Currently we'll
never use more than 8 TREs in a command transaction, and we can
represent that number of command opcodes in the same space as a 64
bit pointer to an ipa_cmd_info structure.

Define IPA_COMMAND_TRANS_TRE_MAX as the maximum number of TREs that
can be in a command transaction.  Replace the info pointer in a
transaction with a fixed-size array named cmd_opcode[] of that many
bytes.  Store the opcode in this array when adding a command TRE to
a transaction, as was done previously for the info array.  This
makes the ipa_cmd_info unused, so get rid of it.

When committing an immediate command transaction, use the channel's
Boolean command flag to determine whether to fill in the opcode,
which will be taken (as before) from the array in the transaction.

This makes the command info pool unnecessary, so get rid of it.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: remove command direction argument

We no longer use the direction argument for gsi_trans_cmd_add(), so
get rid of it in its definition, and in its seven callers.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: get rid of ipa_cmd_info->direction

The direction field of the ipa_cmd_info structure is set, but never
used. It seems it might have been used for the DMA_SHARED_MEM
immediate command, but the DIRECTION flag is set based on the value
of the passed-in direction flag there.

Anyway, remove this unused field from the ipa_cmd_info structure.
This is done as a separate patch to make it very obvious that it's
not required.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: count the number of modem TX endpoints

In ipa_endpoint_modem_exception_reset_all(), a high estimate was
made of the number of endpoints that need their status register
updated. We only used what was needed, so the high estimate didn't
matter much.

However the next few patches are going to limit the number of
commands in a single transaction, and the overestimate would exceed
that. So count the number of modem TX endpoints at initialization
time, and use it in ipa_endpoint_modem_exception_reset_all().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: kill gsi_trans_commit_wait_timeout()

Since the beginning gsi_trans_commit_wait_timeout() has existed to
provide a way to allow waiting a limited time for a transaction
to complete.  But that function has never been used.

In fact, there is no use for this function, because a transaction
committed to hardware should *always* complete.  The only reason it
might not complete is if there were a hardware failure, or perhaps a
system configuration error.

Furthermore, if a timeout ever did occur, the IPA hardware would be
in an indeterminate state, from which there is no recovery.  It
would require some sort of complete IPA reset, and would require the
participation of the modem, and at this time there is no such
sequence defined.

So get rid of the definition of gsi_trans_commit_wait_timeout(), and
update a few comments accordingly.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: specify RX aggregation time limit in config data

Don't assume that a 500 microsecond time limit should be used for
all receive endpoints that support aggregation. Instead, specify
the time limit to use in the configuration data.

Set a 500 microsecond limit for all existing RX endpoints, as before.

Checking for overflow for the time limit field is a bit complicated.
Rather than duplicate a lot of code in ipa_endpoint_data_valid_one(),
call WARN() if any value is found to be too large when encoding it.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: support hard aggregation limits

Add a new flag for AP receive endpoints that indicates whether
a "hard limit" is used as a criterion for closing aggregation.
Add comments explaining the difference between "hard" and "soft"
aggregation limits.

Pass a flag to ipa_aggr_size_kb() so it computes the proper
aggregation size value whether using hard or soft limits. Move
that function earlier in "ipa_endpoint.c" so it can be used
without a forward-reference.

Update ipa_endpoint_data_valid_one() so it validates endpoints whose
data indicate a hard aggregation limit is used, and so it reports
set aggregation flags for endpoints without aggregation enabled.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: make endpoint HOLB drop configurable

Add a new Boolean flag for RX endpoints defining whether HOLB drop
is initially enabled or disabled for the endpoint. All existing AP
endpoints should have HOLB drop disabled.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: fix typos in comments

Spelling mistakes (triple letters) in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: prestera: fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

cirrus: cs89x0: fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qed: fix typos in comments

Spelling mistakes (triple letters) in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sparx5: switchdev: fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'add-a-bhash2-table-hashed-by-port-address'

Joanne Koong says:

====================
Add a bhash2 table hashed by port + address

This patchset proposes adding a bhash2 table that hashes by port and address.
The motivation behind bhash2 is to expedite bind requests in situations where
the port has many sockets in its bhash table entry, which makes checking bind
conflicts costly especially given that we acquire the table entry spinlock
while doing so, which can cause softirq cpu lockups and can prevent new tcp
connections.

We ran into this problem at Meta where the traffic team binds a large number
of IPs to port 443 and the bind() call took a significant amount of time
which led to cpu softirq lockups, which caused packet drops and other failures
on the machine

The patches are as follows:
1/2 - Adds a second bhash table (bhash2) hashed by port and address
2/2 - Adds a test for timing how long an additional bind request takes when
the bhash entry is populated

When experimentally testing this on a local server for ~24k sockets bound to
the port, the results seen were:

ipv4:
before - 0.002317 seconds
with bhash2 - 0.000018 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds
====================

Link: https://lore.kernel.org/r/20220520001834.2247810-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: Add test for timing a bind request to a port with a populated bhash entry

This test populates the bhash table for a given port with
MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long
a bind request on the port takes.

When populating the bhash table, we create the sockets and then bind
the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT
are set). When timing how long a bind on the port takes, we bind on a
different address without SO_REUSEPORT set. We do not set SO_REUSEPORT
because we are interested in the case where the bind request does not
go through the tb->fastreuseport path, which is fragile (eg
tb->fastreuseport path does not work if binding with a different uid).

To run the test locally, I did:
* ulimit -n 65535000
* ip addr add 2001:0db8:0:f101::1 dev eth0
* ./bind_bhash_test 443

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Add a second bind table hashed by port and address

We currently have one tcp bind table (bhash) which hashes by port
number only. In the socket bind path, we check for bind conflicts by
traversing the specified port's inet_bind2_bucket while holding the
bucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()).

In instances where there are tons of sockets hashed to the same port
at different addresses, checking for a bind conflict is time-intensive
and can cause softirq cpu lockups, as well as stops new tcp connections
since __inet_inherit_port() also contests for the spinlock.

This patch proposes adding a second bind table, bhash2, that hashes by
port and ip address. Searching the bhash2 table leads to significantly
faster conflict resolution and less time holding the spinlock.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wwan: iosm: use a flexible array rather than allocate short objects

GCC array-bounds warns that ipc_coredump_get_list() under-allocates
the size of struct iosm_cd_table *cd_table.

This is avoidable - we just need a flexible array. Nothing calls
sizeof() on struct iosm_cd_list or anything that contains it.

Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Link: https://lore.kernel.org/r/20220520060013.2309497-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stcp: Use memset_after() to zero sctp_stream_out_ext

Use memset_after() helper to simplify the code, there is no functional
change in this patch.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220519062932.249926-1-xiujianfeng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: fix the alignment in ocelot_port_fdb_del()

align the extack argument of the ocelot_port_fdb_del()
function.

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520002040.4442-1-eng.alaamohamedsoliman.am@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: vxlan: Fix kernel coding style

The continuation line does not align with the opening bracket
and this patch fix it.

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520003614.6073-1-eng.alaamohamedsoliman.am@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: bnxt: make ulp_id unsigned to make GCC 12 happy

GCC array bounds checking complains that ulp_id is validated
only against upper bound. Make it unsigned.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220520061955.2312968-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_nexthops: Make ping timeout configurable

Commit 49bb39bddad2 ("selftests: fib_nexthops: Make the test more robust")
increased the timeout of ping commands to 5 seconds, to make the test
more robust. Make the timeout configurable using '-w' argument to allow
user to change it depending on the system that runs the test. Some systems
suffer from slow forwarding performance, so they may need to change the
timeout.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220519070921.3559701-1-amcohen@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wwan: t7xx: use GFP_ATOMIC under spin lock in t7xx_cldma_gpd_set_next_ptr()

Sometimes t7xx_cldma_gpd_set_next_ptr() is called under spin lock,
so add 'gfp_mask' parameter in t7xx_cldma_gpd_set_next_ptr() to pass
the flag.

Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/20220519032108.2996400-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tulip: fix build with CONFIG_GSC

Fix typo which breaks build for parisc.

Fixes: 3daebfbeb455 ("net: tulip: convert to devres")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/all/CA+G9fYuCzU5VZ_nc+6NEdBXJdVCH=J2SB1Na1G_NS_0BNdGYtg@mail.gmail.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Link: https://lore.kernel.org/r/4719560.GXAFRqVoOG@eto.sf-tec.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: avoid strange behavior with skb_defer_max == 1

When user sets skb_defer_max to 1 the kick threshold is 0
(half of 1). If we increment queue length before the check
the kick will never happen, and the skb may get stranded.
This is likely harmless but can be avoided by moving the
increment after the check. This way skb_defer_max == 1
will always kick. Still a silly config to have, but
somehow that feels more correct.

While at it drop a comment which seems to be outdated
or confusing, and wrap the defer_count write with
a WRITE_ONCE() since it's read on the fast path
that avoids taking the lock.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220518185522.2038683-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc/siena: Remove duplicate check on segments

Siena only supports software TSO. This means more code can be deleted,
as pointed out by the Smatch static checker warning:
drivers/net/ethernet/sfc/siena/tx.c:184 __efx_siena_enqueue_skb()
warn: duplicate check 'segments' (previous on line 158)

Fixes: 956f2d86cb37 ("sfc/siena: Remove build references to missing functionality")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/kernel-janitors/YoH5tJMnwuGTrn1Z@kili/
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/165294463549.23865.4557617334650441347.stgit@palantir17.mph.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp_ipv6: set the drop_reason in the right place

Looks like the IPv6 version of the patch under Fixes was
a copy/paste of the IPv4 but hit the wrong spot.
It is tcp_v6_rcv() which uses drop_reason as a boolean, and
needs to be protected against reason == 0 before calling free.
tcp_v6_do_rcv() has a pretty straightforward flow.

The resulting warning looks like this:
  WARNING: CPU: 1 PID: 0 at net/core/skbuff.c:775
  Call Trace:
    tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1767)
    ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438)
    ip6_input_finish (include/linux/rcupdate.h:726)
    ip6_input (include/linux/netfilter.h:307)

Fixes: f8319dfd1b3b ("net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv()")
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20220520021347.2270207-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ipa-next'

Alex Elder says:

====================
net: ipa: a mix of patches

This series includes a mix of things things that are generally
minor.  The first four are sort of unrelated fixes, and summarizing
them here wouldn't be that helpful.

The last three together make it so only the "configuration data" we
need after initialization is saved for later use.  Most such data is
used only during driver initialization.  But endpoint configuration
is needed later, so the last patch saves a copy of that.  Eventually
we'll want to support reconfiguring endpoints at runtime as well,
and this will facilitate that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: save a copy of endpoint default config

All elements of the default endpoint configuration are used in the
code when programming an endpoint for use. But none of the other
configuration data is ever needed once things are initialized.

So rather than saving a pointer to *all* of the configuration data,
save a copy of only the endpoint configuration portion.

This will eventually allow endpoint configuration to be modifiable
at runtime. But even before that it means we won't keep a pointer
to configuration data after when no longer needed.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: rename a few endpoint config data types

Rename the just-moved data structure types to drop the "_data"
suffix, to make it more obvious they are no longer meant to be used
just as read-only initialization data. Rename the fields and
variables of these types to use "config" instead of "data" in the
name. This is another small step meant to facilitate review.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: move endpoint configuration data definitions

Move the definitions of the structures defining endpoint-specific
configuration data out of "ipa_data.h" and into "ipa_endpoint.h".
This is a trivial movement of code without any other change, to
prepare for the next few patches.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: open-code ether_setup()

About half of the fields set by the call in ipa_modem_netdev_setup()
are overwritten after the call. Instead, just skip the call, and
open-code the (other) assignments it makes to the net_device
structure fields.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: ignore endianness if there is no header

If we program an RX endpoint to have no header (header length is 0),
header-related endpoint configuration values are meaningless and are
ignored.

The only case we support that defines a header is QMAP endpoints.
In ipa_endpoint_init_hdr_ext() we set the endianness mask value
unconditionally, but it should not be done if there is no header
(meaning it is not configured for QMAP).

Set the endianness conditionally, and rearrange the logic in that
function slightly to avoid testing the qmap flag twice.

Delete an incorrect comment in ipa_endpoint_init_aggr().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: rename a GSI error code

The CHANNEL_NOT_RUNNING error condition has been generalized, so
rename it to be INCORRECT_CHANNEL_STATE.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>