review.tizen.org Git - platform/kernel/linux-rpi.git/log

Merge branch 'net-sched-act_tunnel_key-add-support-for-tunnel_dont_fragment'

Davide Caratti says:

====================
net/sched: act_tunnel_key: add support for TUNNEL_DONT_FRAGMENT

- patch 1 extends TC tunnel_key action to add support for TUNNEL_DONT_FRAGMENT
- patch 2 extends tdc to skip tests when iproute2 support is missing
- patch 3 adds a tdc test case to verify functionality of the control plane
- patch 4 adds a net/forwarding test case to verify functionality of the data plane
====================

Link: https://lore.kernel.org/r/cover.1680082990.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: add tunnel_key "nofrag" test case

Add a selftest that configures metadata tunnel encapsulation using the TC
"tunnel_key" action: it includes a test case for setting "nofrag" flag.

Example output:

# selftests: net/forwarding: tc_tunnel_key.sh
# TEST: tunnel_key nofrag (skip_hw) [ OK ]
# INFO: Could not test offloaded functionality
ok 1 selftests: net/forwarding: tc_tunnel_key.sh

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc-testing: add tunnel_key "nofrag" test case

# ./tdc.py -e 6bda -l
6bda: (actions, tunnel_key) Add tunnel_key action with nofrag option

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc-testing: add "depends_on" property to skip tests

currently, users can skip individual test cases by means of writing

"skip": "yes"

in the scenario file. Extend this functionality, introducing 'dependsOn':
it's optional property like "skip", but the value contains a command (for
example, a probe on iproute2 to check if it supports a specific feature).
If such property is present, tdc executes that command and skips the test
when the return value is non-zero.

Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: act_tunnel_key: add support for "don't fragment"

extend "act_tunnel_key" to allow specifying TUNNEL_DONT_FRAGMENT.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rtnetlink: Fix do_test_address_proto()

This selftest was introduced recently in the commit cited below. It misses
several check_err() invocations to actually verify that the previous
command succeeded. When these are added, the first one fails, because
besides the addresses added by hand, there can be a link-local address
added by the kernel. Adjust the check to expect at least three addresses
instead of exactly three, and add the missing check_err's.

Furthermore, the explanatory comments assume that the address with no
protocol is $addr2, when in fact it is $addr3. Update the comments.

Fixes: 6a414fd77f61 ("selftests: rtnetlink: Add an address proto test")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/53a579bc883e1bf2fe490d58427cf22c2d1aa21f.1680102695.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: Fix format specifier in netcp_create_interface()

After commit 3948b05950fd ("net: introduce a config option to tweak
MAX_SKB_FRAGS"), clang warns:

  drivers/net/ethernet/ti/netcp_core.c:2085:4: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
                          MAX_SKB_FRAGS);
                          ^~~~~~~~~~~~~
  include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err'
          dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
                                                                 ~~~     ^~~~~~~~~~~
  include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap'
                  _p_func(dev, fmt, ##__VA_ARGS__);                       \
                               ~~~    ^~~~~~~~~~~
  include/linux/skbuff.h:352:23: note: expanded from macro 'MAX_SKB_FRAGS'
  #define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
                        ^~~~~~~~~~~~~~~~~~~~
  ./include/generated/autoconf.h:11789:30: note: expanded from macro 'CONFIG_MAX_SKB_FRAGS'
  #define CONFIG_MAX_SKB_FRAGS 17
                               ^~
  1 warning generated.

Follow the pattern of the rest of the tree by changing the specifier to
'%u' and casting MAX_SKB_FRAGS explicitly to 'unsigned int', which
eliminates the warning.

Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230329-net-ethernet-ti-wformat-v1-1-83d0f799b553@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: fix db type confusion in host fdb/mdb add/del

We have the following code paths:

Host FDB (unicast RX filtering):

dsa_port_standalone_host_fdb_add()   dsa_port_bridge_host_fdb_add()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_fdb_add()

dsa_port_standalone_host_fdb_del()   dsa_port_bridge_host_fdb_del()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_fdb_del()

Host MDB (multicast RX filtering):

dsa_port_standalone_host_mdb_add()   dsa_port_bridge_host_mdb_add()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_mdb_add()

dsa_port_standalone_host_mdb_del()   dsa_port_bridge_host_mdb_del()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_mdb_del()

The logic added by commit 5e8a1e03aa4d ("net: dsa: install secondary
unicast and multicast addresses as host FDB/MDB") zeroes out
db.bridge.num if the switch doesn't support ds->fdb_isolation
(the majority doesn't). This is done for a reason explained in commit
c26933639b54 ("net: dsa: request drivers to perform FDB isolation").

Taking a single code path as example - dsa_port_host_fdb_add() - the
others are similar - the problem is that this function handles:
- DSA_DB_PORT databases, when called from
  dsa_port_standalone_host_fdb_add()
- DSA_DB_BRIDGE databases, when called from
  dsa_port_bridge_host_fdb_add()

So, if dsa_port_host_fdb_add() were to make any change on the
"bridge.num" attribute of the database, this would only be correct for a
DSA_DB_BRIDGE, and a type confusion for a DSA_DB_PORT bridge.

However, this bug is without consequences, for 2 reasons:

- dsa_port_standalone_host_fdb_add() is only called from code which is
  (in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that
  function only returns true if ds->fdb_isolation is set. So, the code
  only executed for DSA_DB_BRIDGE databases.

- Even if the code was not dead for DSA_DB_PORT, we have the following
  memory layout:

struct dsa_bridge {
struct net_device *dev;
unsigned int num;
bool tx_fwd_offload;
refcount_t refcount;
};

struct dsa_db {
enum dsa_db_type type;

union {
const struct dsa_port *dp; // DSA_DB_PORT
struct dsa_lag lag;
struct dsa_bridge bridge; // DSA_DB_BRIDGE
};
};

So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of
type DSA_DB_PORT would access memory which is unused, because we only
use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address
with dsa_db :: dev for DSA_DB_BRIDGE, thanks to the union definition.

It is correct to fix up dsa_db :: bridge :: num only from code paths
that come from the bridge / switchdev, so move these there.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329133819.697642-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ksz884x: remove unused change variable

clang with W=1 reports
drivers/net/ethernet/micrel/ksz884x.c:3216:6: error: variable
  'change' set but not used [-Werror,-Wunused-but-set-variable]
        int change = 0;
            ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230329125929.1808420-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

Conflicts:

drivers/net/ethernet/mediatek/mtk_ppe.c
3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
924531326e2d ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.3-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and WPAN.

  Still quite a few bugs from this release. This pull is a bit smaller
  because major subtrees went into the previous one. Or maybe people
  took spring break off?

  Current release - regressions:

   - phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement

  Current release - new code bugs:

   - eth: wangxun: fix vector length of interrupt cause

   - vsock/loopback: consistently protect the packet queue with
     sk_buff_head.lock

   - virtio/vsock: fix header length on skb merging

   - wpan: ca8210: fix unsigned mac_len comparison with zero

  Previous releases - regressions:

   - eth: stmmac: don't reject VLANs when IFF_PROMISC is set

   - eth: smsc911x: avoid PHY being resumed when interface is not up

   - eth: mtk_eth_soc: fix tx throughput regression with direct 1G links

   - eth: bnx2x: use the right build_skb() helper after core rework

   - wwan: iosm: fix 7560 modem crash on use on unsupported channel

  Previous releases - always broken:

   - eth: sfc: don't overwrite offload features at NIC reset

   - eth: r8169: fix RTL8168H and RTL8107E rx crc error

   - can: j1939: prevent deadlock by moving j1939_sk_errqueue()

   - virt: vmxnet3: use GRO callback when UPT is enabled

   - virt: xen: don't do grant copy across page boundary

   - phy: dp83869: fix default value for tx-/rx-internal-delay

   - dsa: ksz8: fix multiple issues with ksz8_fdb_dump

   - eth: mvpp2: fix classification/RSS of VLAN and fragmented packets

   - eth: mtk_eth_soc: fix flow block refcounting logic

  Misc:

   - constify fwnode pointers in SFP handling"

* tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
  net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
  net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
  net: ethernet: mtk_eth_soc: fix flow block refcounting logic
  net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
  net: dsa: sync unicast and multicast addresses for VLAN filters too
  net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
  xen/netback: use same error messages for same errors
  test/vsock: new skbuff appending test
  virtio/vsock: WARN_ONCE() for invalid state of socket
  virtio/vsock: fix header length on skb merging
  bnxt_en: Add missing 200G link speed reporting
  bnxt_en: Fix typo in PCI id to device description string mapping
  bnxt_en: Fix reporting of test result in ethtool selftest
  i40e: fix registers dump after run ethtool adapter self test
  bnx2x: use the right build_skb() helper
  net: ipa: compute DMA pool size properly
  net: wwan: iosm: fixes 7560 modem crash
  net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
  ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
  ice: add profile conflict check for AVF FDIR
  ...

Merge tag 'for-6.3/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Fix two DM core bugs in the code that handles splitting "abnormal" IO
   (discards, write same and secure erase) and issuing that IO to the
   correct underlying devices (and offsets within those devices).

* tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix __send_duplicate_bios() to always allow for splitting IO
  dm: fix improper splitting for abnormal bios

Merge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
"Two regression fixes in here, otherwise just the usual stuff:

   - i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and
     more

   - amdgpu: dp mst and hibernate regression fix

   - etnaviv: revert fdinfo support (incl drm/sched revert), leak fix

   - misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit
     fixes"

* tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
  Revert "drm/scheduler: track GPU active time per entity"
  Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
  drm/etnaviv: fix reference leak when mmaping imported buffer
  drm/amdgpu: allow more APUs to do mode2 reset when go to S4
  drm/amd/display: Take FEC Overhead into Timeslot Calculation
  drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
  drm: test: Fix 32-bit issue in drm_buddy_test
  drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
  drm/nouveau/kms: Fix backlight registration
  drm/i915/perf: Drop wakeref on GuC RC error
  drm/i915/dpt: Treat the DPT BO as a framebuffer
  drm/i915/gem: Flush lmem contents after construction
  drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
  drm/i915: Disable DC states for all commits
  drm/i915: Workaround ICL CSC_MODE sticky arming
  drm/i915: Add a .color_post_update() hook
  drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
  drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
  drm/i915/pmu: Use functions common with sysfs to read actual freq
  accel/ivpu: Fix IPC buffer header status field value
  ...

dm: fix __send_duplicate_bios() to always allow for splitting IO

Commit 7dd76d1feec70 ("dm: improve bio splitting and associated IO
accounting") only called setup_split_accounting() from
__send_duplicate_bios() if a single bio were being issued. But the case
where duplicate bios are issued must call it too.

Otherwise the bio won't be split and resubmitted (via recursion through
block core back to DM) to submit the later portions of a bio (which may
map to an entirely different target).

For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048

Before (broken, discards the first striped target's devices twice):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528

After (works as expected):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528

Fixes: 7dd76d1feec70 ("dm: improve bio splitting and associated IO accounting")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

dm: fix improper splitting for abnormal bios

"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().

It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.

For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048

Before this fix:

device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920

device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920

After this fix;

device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872

Fixes: 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow

The cache needs to be flushed to ensure that the hardware stops offloading
the flow immediately.

Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload

Check for skb metadata in order to detect the case where the DSA header
is not present.

Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: fix flow block refcounting logic

Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
call flow_block_cb_incref for a newly allocated cb.
Also fix the accidentally inverted refcount check on unbind.

Fixes: 502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()

Reported on the Turris forum, mvneta provokes kernel warnings in the
architecture DMA mapping code when mvneta_setup_txqs() fails to
allocate memory. This happens because when mvneta_cleanup_txqs() is
called in the mvneta_stop() path, we leave pointers in the structure
that have been freed.

Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
will walk all the queues freeing any non-NULL pointers - which includes
pointers that were previously freed in mvneta_stop().

Fix this by setting these pointers to NULL to prevent double-freeing
of the same memory.

Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO")
Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: sync unicast and multicast addresses for VLAN filters too

If certain conditions are met, DSA can install all necessary MAC
addresses on the CPU ports as FDB entries and disable flooding towards
the CPU (we call this RX filtering).

There is one corner case where this does not work.

ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set swp0 up
ip link add link swp0 name swp0.100 type vlan id 100
ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100

Traffic through swp0.100 is broken, because the bridge turns on VLAN
filtering in the swp0 port (causing RX packets to be classified to the
FDB database corresponding to the VID from their 802.1Q header), and
although the 8021q module does call dev_uc_add() towards the real
device, that API is VLAN-unaware, so it only contains the MAC address,
not the VID; and DSA's current implementation of ndo_set_rx_mode() is
only for VID 0 (corresponding to FDB entries which are installed in an
FDB database which is only hit when the port is VLAN-unaware).

It's interesting to understand why the bridge does not turn on
IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
that this is a regression caused by the logic in commit 2796d0c648c9
("bridge: Automatically manage port promiscuous mode."). After all,
a bridge port needs to have IFF_PROMISC by its very nature - it needs to
receive and forward frames with a MAC DA different from the bridge
ports' MAC addresses.

While that may be true, when the bridge is VLAN-aware *and* it has a
single port, there is no real reason to enable promiscuity even if that
is an automatic port, with flooding and learning (there is nowhere for
packets to go except to the BR_FDB_LOCAL entries), and this is how the
corner case appears. Adding a second automatic interface to the bridge
would make swp0 promisc as well, and would mask the corner case.

Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
pass a VLAN ID), the only way to address that problem is to install host
FDB entries for the cartesian product of RX filtering MAC addresses and
VLAN RX filters.

Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only

Do not set the MV88E6XXX_PORT_CTL0_IGMP_MLD_SNOOP bit on CPU or DSA ports.

This allows the host CPU port to be a regular IGMP listener by sending out
IGMP Membership Reports, which would otherwise not be forwarded by the
mv88exxx chip, but directly looped back to the CPU port itself.

Fixes: 54d792f257c6 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
Signed-off-by: Steffen Bätz <steffen@innosonix.de>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329150140.701559-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes

- revert gpu time fdinfo support
- reference leak fix on imported buffers

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/de8e08c2599ec0e22456ae36e9757b9ff14c2124.camel@pengutronix.de

Merge tag 'amd-drm-fixes-6.3-2023-03-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.3-2023-03-30:

amdgpu:
- Hibernation regression fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330153859.18332-1-alexander.deucher@amd.com

Merge tag 'drm-misc-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

* various ivpu fixes
* fix nouveau backlight registration
* fix buddy allocator in 32-bit systems

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330141006.GA22908@linux-uq9g

Merge tag 'amd-drm-fixes-6.3-2023-03-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.3-2023-03-29:

amdgpu:
- Two DP MST fixes

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230329220059.7622-1-alexander.deucher@amd.com

Merge tag 'drm-intel-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v6.3-rc5:
- Fix PMU support by reusing functions with sysfs
- Fix a number of issues related to color, PSR and arm/noarm
- Fix state check related to ICL PHY ownership check in TC-cold state
- Flush lmem contents after construction
- Fix hibernate oops related to DPT BO
- Fix perf stream error path wakeref balance

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87355m4gtm.fsf@intel.com

Merge tag 'sound-6.3-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes:

   - A potential deadlock fix for USB-audio, involving some change in
     PCM core side

   - A regression fix for probes of USB-audio devices with the
     vendor-specific PCM format bits

   - Two regression fixes for the old YMFPCI driver

   - A few HD-audio quirks as usual"

* tag 'sound-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
  ALSA: ymfpci: Fix BUG_ON in probe function
  ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
  ALSA: usb-audio: Fix regression on detection of Roland VS-100
  ALSA: hda/realtek: Fix support for Dell Precision 3260
  ALSA: usb-audio: Fix recursive locking at XRUN during syncing
  ALSA: hda/conexant: Partial revert of a quirk for Lenovo
  ALSA: hda/realtek: Add quirks for some Clevo laptops

Merge tag 'zonefs-6.3-rc5' of git://git./linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

- Make sure to always invalidate the last page of an inode straddling
   inode->i_size to avoid data inconsistencies with appended data when
   the device zone write granularity does not match the page size.

- Do not propagate iomap -ENOBLK error to userspace and use -EBUSY
   instead.

* tag 'zonefs-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
  zonefs: Always invalidate last cached page on append write

Revert "drm/scheduler: track GPU active time per entity"

This reverts commit df622729ddbf as it introduces a use-after-free,
which isn't easy to fix without going back to the design drawing board.

Reported-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"

This reverts commit 97804a133c68, as it builds on top of df622729ddbf
("drm/scheduler: track GPU active time per entity") which needs to be
reverted, as it introduces a use-after-free.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

drm/etnaviv: fix reference leak when mmaping imported buffer

drm_gem_prime_mmap() takes a reference on the GEM object, but before that
drm_gem_mmap_obj() already takes a reference, which will be leaked as only
one reference is dropped when the mapping is closed. Drop the extra
reference when dma_buf_mmap() succeeds.

Cc: stable@vger.kernel.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

drm/amdgpu: allow more APUs to do mode2 reset when go to S4

Skip mode2 reset only for IMU enabled APUs when do S4.

This patch is to fix the regression issue
https://gitlab.freedesktop.org/drm/amd/-/issues/2483
It is generated by commit b589626674de ("drm/amdgpu: skip ASIC reset
for APUs when go to S4").

Fixes: b589626674de ("drm/amdgpu: skip ASIC reset for APUs when go to S4")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2483
Tested-by: Yuan Perry <Perry.Yuan@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

xen/netback: use same error messages for same errors

Issue the same error message in case an illegal page boundary crossing
has been detected in both cases where this is tested.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20230329080259.14823-1-jgross@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

smsc911x: remove superfluous variable init

phydev is assigned a value right away, no need to initialize it.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230329064414.25028-1-wsa+renesas@sang-engineering.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space

The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space trhough iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>

zonefs: Always invalidate last cached page on append write

When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode which might have been cached
already by buffered reads, page faults with mmap-read or non-direct
readahead. To ensure that the on-disk and cached data is consistant for
such last cached page, make sure to always invalidate it in
zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to
userspace to differentiate from IO errors.

This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).

Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>

Merge branch 'net-rps-rfs-improvements'

Eric Dumazet says:

====================
net: rps/rfs improvements

Jason Xing attempted to optimize napi_schedule_rps() by avoiding
unneeded NET_RX_SOFTIRQ raises: [1], [2]

This is quite complex to implement properly. I chose to implement
the idea, and added a similar optimization in ____napi_schedule()

Overall, in an intensive RPC workload, with 32 TX/RX queues with RFS
I was able to observe a ~10% reduction of NET_RX_SOFTIRQ
invocations.

While this had no impact on throughput or cpu costs on this synthetic
benchmark, we know that firing NET_RX_SOFTIRQ from softirq handler
can force __do_softirq() to wakeup ksoftirqd when need_resched() is true.
This can have a latency impact on stressed hosts.

[1] https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
[2] https://lore.kernel.org/netdev/20230328142112.12493-1-kerneljasonxing@gmail.com/
====================

Link: https://lore.kernel.org/r/20230328235021.1048163-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: optimize ____napi_schedule() to avoid extra NET_RX_SOFTIRQ

____napi_schedule() adds a napi into current cpu softnet_data poll_list,
then raises NET_RX_SOFTIRQ to make sure net_rx_action() will process it.

Idea of this patch is to not raise NET_RX_SOFTIRQ when being called indirectly
from net_rx_action(), because we can process poll_list from this point,
without going to full softirq loop.

This needs a change in net_rx_action() to make sure we restart
its main loop if sd->poll_list was updated without NET_RX_SOFTIRQ
being raised.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: optimize napi_schedule_rps()

Based on initial patch from Jason Xing.

Idea is to not raise NET_RX_SOFTIRQ from napi_schedule_rps()
when we queued a packet into another cpu backlog.

We can do this only in the context of us being called indirectly
from net_rx_action(), to have the guarantee our rps_ipi_list
will be processed before we exit from net_rx_action().

Link: https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: add softnet_data.in_net_rx_action

We want to make two optimizations in napi_schedule_rps() and
____napi_schedule() which require to know if these helpers are
called from net_rx_action(), instead of being called from
other contexts.

sd.in_net_rx_action is only read/written by the owning cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: napi_schedule_rps() cleanup

napi_schedule_rps() return value is ignored, remove it.

Change the comment to clarify the intent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'fix-header-length-on-skb-merging'

Arseniy Krasnov says:

====================
fix header length on skb merging

this patchset fixes appending newly arrived skbuff to the last skbuff of
the socket's queue during rx path. Problem fires when we are trying to
append data to skbuff which was already processed in dequeue callback
at least once. Dequeue callback calls function 'skb_pull()' which changes
'skb->len'. In current implementation 'skb->len' is used to update length
in header of last skbuff after new data was copied to it. This is bug,
because value in header is used to calculate 'rx_bytes'/'fwd_cnt' and
thus must be constant during skbuff lifetime. Here is example, we have
two skbuffs: skb0 with length 10 and skb1 with length 4.

1) skb0 arrives, hdr->len == skb->len == 10, rx_bytes == 10
2) Read 3 bytes from skb0, skb->len == 7, hdr->len == 10, rx_bytes == 10
3) skb1 arrives, hdr->len == skb->len == 4, rx_bytes == 14
4) Append skb1 to skb0, skb0 now has skb->len == 11, hdr->len == 11.
   But value of 11 in header is invalid.
5) Read whole skb0, update rx_bytes by 11 from skb0's header.
6) At this moment rx_bytes == 3, but socket's queue is empty.

This bug starts to fire since:

commit
077706165717 ("virtio/vsock: don't use skbuff state to account credit")

In fact, it presents before, but didn't triggered due to a little bit
buggy implementation of credit calculation logic. So i'll use Fixes tag
for it.

I really forgot about this branch in rx path when implemented patch
077706165717.

This patchset contains 3 patches:
1) Fix itself.
2) Patch with WARN_ONCE() to catch such problems in future.
3) Patch with test which triggers skb appending logic. It looks like
   simple test with several 'send()' and 'recv()', but it checks, that
   skbuff appending works ok.
====================

Link: https://lore.kernel.org/r/0683cc6e-5130-484c-1105-ef2eb792d355@sberdevices.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

test/vsock: new skbuff appending test

This adds test which checks case when data of newly received skbuff is
appended to the last skbuff in the socket's queue. It looks like simple
test with 'send()' and 'recv()', but internally it triggers logic which
appends one received skbuff to another. Test checks that this feature
works correctly.

This test is actual only for virtio transport.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio/vsock: WARN_ONCE() for invalid state of socket

This adds WARN_ONCE() and return from stream dequeue callback when
socket's queue is empty, but 'rx_bytes' still non-zero. This allows
the detection of potential bugs due to packet merging (see previous
patch).

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio/vsock: fix header length on skb merging

This fixes appending newly arrived skbuff to the last skbuff of the
socket's queue. Problem fires when we are trying to append data to skbuff
which was already processed in dequeue callback at least once. Dequeue
callback calls function 'skb_pull()' which changes 'skb->len'. In current
implementation 'skb->len' is used to update length in header of the last
skbuff after new data was copied to it. This is bug, because value in
header is used to calculate 'rx_bytes'/'fwd_cnt' and thus must be not
be changed during skbuff's lifetime.

Bug starts to fire since:

commit 077706165717
("virtio/vsock: don't use skbuff state to account credit")

It presents before, but didn't triggered due to a little bit buggy
implementation of credit calculation logic. So use Fixes tag for it.

Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'mlx5-updates-2023-03-28' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-03-28

Dragos Tatulea says:
====================

net/mlx5e: RX, Drop page_cache and fully use page_pool

For page allocation on the rx path, the mlx5e driver has been using an
internal page cache in tandem with the page pool. The internal page
cache uses a queue for page recycling which has the issue of head of
queue blocking.

This patch series drops the internal page_cache altogether and uses the
page_pool to implement everything that was done by the page_cache
before:
* Let the page_pool handle dma mapping and unmapping.
* Use fragmented pages with fragment counter instead of tracking via
  page ref.
* Enable skb recycling.

The patch series has the following effects on the rx path:

* Improved performance for the cases when there was low page recycling
  due to head of queue blocking in the internal page_cache. The test
  for this was running a single iperf TCP stream to a rx queue
  which is bound on the same cpu as the application.

  |-------------+--------+--------+------+---------|
  | rq type     | before | after  | unit |   diff  |
  |-------------+--------+--------+------+---------|
  | striding rq |  30.1  |  31.4  | Gbps |  4.14 % |
  | legacy rq   |  30.2  |  33.0  | Gbps |  8.48 % |
  |-------------+--------+--------+------+---------|

* Small XDP performance degradation. The test was is XDP drop
  program running on a single rx queue with small packets incoming
  it looks like this:

  |-------------+----------+----------+------+---------|
  | rq type     | before   | after    | unit |   diff  |
  |-------------+----------+----------+------+---------|
  | striding rq | 19725449 | 18544617 | pps  | -6.37 % |
  | legacy rq   | 19879931 | 18631841 | pps  | -6.70 % |
  |-------------+----------+----------+------+---------|

  This will be handled in a different patch series by adding support for
  multi-packet per page.

* For other cases the performance is roughly the same.

The above numbers were obtained on the following system:
  24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
  32 GB RAM
  ConnectX-7 single port

The breakdown on the patch series is the following:
* Preparations for introducing the mlx5e_frag_page struct.
* Delete the mlx5e_page_cache struct.
* Enable dma mapping from page_pool.
* Enable skb recycling and fragment counting.
* Do deferred release of pages (just before alloc) to ensure better
  page_pool cache utilization.

====================

* tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
  net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
  net/mlx5e: RX, Increase WQE bulk size for legacy rq
  net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
  net/mlx5e: RX, Defer page release in legacy rq for better recycling
  net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
  net/mlx5e: RX, Defer page release in striding rq for better recycling
  net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
  net/mlx5e: RX, Enable skb page recycling through the page_pool
  net/mlx5e: RX, Enable dma map and sync from page_pool allocator
  net/mlx5e: RX, Remove internal page_cache
  net/mlx5e: RX, Store SHAMPO header pages in array
  net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
  net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
  net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
====================

Link: https://lore.kernel.org/r/20230328205623.142075-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bnxt_en-3-bug-fixes'

Michael Chan says:

====================
bnxt_en: 3 Bug fixes

This series contains 3 small bug fixes covering ethtool self test, PCI
ID string typos, and some missing 200G link speed ethtool reporting logic.
====================

Link: https://lore.kernel.org/r/20230329013021.5205-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Add missing 200G link speed reporting

bnxt_fw_to_ethtool_speed() is missing the case statement for 200G
link speed reported by firmware. As a result, ethtool will report
unknown speed when the firmware reports 200G link speed.

Fixes: 532262ba3b84 ("bnxt_en: ethtool: support PAM4 link speeds up to 200G")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix typo in PCI id to device description string mapping

Fix 57502 and 57508 NPAR description string entries. The typos
caused these devices to not match up with lspci output.

Fixes: 49c98421e6ab ("bnxt_en: Add PCI IDs for 57500 series NPAR devices.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix reporting of test result in ethtool selftest

When the selftest command fails, driver is not reporting the failure
by updating the "test->flags" when bnxt_close_nic() fails.

Fixes: eb51365846bc ("bnxt_en: Add basic ethtool -t selftest support.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i40e: fix registers dump after run ethtool adapter self test

Fix invalid registers dump from ethtool -d ethX after adapter self test
by ethtool -t ethY. It causes invalid data display.

The problem was caused by overwriting i40e_reg_list[].elements
which is common for ethtool self test and dump.

Fixes: 22dd9ae8afcc ("i40e: Rework register diagnostic")
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230328172659.3906413-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-03-28 (ice)

This series contains updates to ice driver only.

Jesse fixes mismatched header documentation reported when building with
W=1.

Brett restricts setting of VSI context to only applicable fields for the
given ICE_AQ_VSI_PROP_Q_OPT_VALID bit.

Junfeng adds check when adding Flow Director filters that conflict with
existing filter rules.

Jakob Koschel adds interim variable for iterating to prevent possible
misuse after looping.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
  ice: add profile conflict check for AVF FDIR
  ice: Fix ice_cfg_rdma_fltr() to only update relevant fields
  ice: fix W=1 headers mismatch
====================

Link: https://lore.kernel.org/r/20230328172035.3904953-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'ieee802154-for-net-2023-03-29' of git://git./linux/kernel/git/wpan/wpan

Stefan Schmidt says:

====================
ieee802154 for net 2023-03-29

Two small fixes this time.

Dongliang Mu removed an unnecessary null pointer check.

Harshit Mogalapalli fixed an int comparison unsigned against signed from a
recent other fix in the ca8210 driver.

* tag 'ieee802154-for-net-2023-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
net: ieee802154: remove an unnecessary null pointer check
ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
====================

Link: https://lore.kernel.org/r/20230329064541.2147400-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: removed unused tx_bytes variable

clang 16.0.0 with W=1 reports:

drivers/net/ethernet/amazon/ena/ena_netdev.c:1901:6: error: variable 'tx_bytes' set but not used [-Werror,-Wunused-but-set-variable]
u32 tx_bytes = 0;

The variable is not used so remove it.

Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230328151958.410687-1-horms@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeon_ep: unlock the correct lock on error path

The h and the f letters are swapped so it unlocks the wrong lock.

Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/251aa2a2-913e-4868-aac9-0a90fc3eeeda@kili.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnx2x: use the right build_skb() helper

build_skb() no longer accepts slab buffers. Since slab use is fairly
uncommon we prefer the drivers to call a separate slab_build_skb()
function appropriately.

bnx2x uses the old semantics where size of 0 meant buffer from slab.
It sets the fp->rx_frag_size to 0 for MTUs which don't fit in a page.
It needs to call slab_build_skb().

This fixes the WARN_ONCE() of incorrect API use seen with bnx2x.

Reported-by: Thomas Voegtle <tv@lio96.de>
Link: https://lore.kernel.org/all/b8f295e4-ba57-8bfb-7d9c-9d62a498a727@lio96.de/
Fixes: ce098da1497c ("skbuff: Introduce slab_build_skb()")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230329000013.2734957-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: compute DMA pool size properly

In gsi_trans_pool_init_dma(), the total size of a pool of memory
used for DMA transactions is calculated.  However the calculation is
done incorrectly.

For 4KB pages, this total size is currently always more than one
page, and as a result, the calculation produces a positive (though
incorrect) total size.  The code still works in this case; we just
end up with fewer DMA pool entries than we intended.

Bjorn Andersson tested booting a kernel with 16KB pages, and hit a
null pointer derereference in sg_alloc_append_table_from_pages(),
descending from gsi_trans_pool_init_dma().  The cause of this was
that a 16KB total size was going to be allocated, and with 16KB
pages the order of that allocation is 0.  The total_size calculation
yielded 0, which eventually led to the crash.

Correcting the total_size calculation fixes the problem.

Reported-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Tested-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Fixes: 9dd441e4ed57 ("soc: qcom: ipa: GSI transactions")
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230328162751.2861791-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: add ToD device driver for Intel FPGA cards

Adding a DFL (Device Feature List) device driver of ToD device for
Intel FPGA cards.

The Intel FPGA Time of Day(ToD) IP within the FPGA DFL bus is exposed
as PTP Hardware clock(PHC) device to the Linux PTP stack to synchronize
the system clock to its ToD information using phc2sys utility of the
Linux PTP stack. The DFL is a hardware List within FPGA, which defines
a linked list of feature headers within the device MMIO space to provide
an extensible way of adding subdevice features.

Signed-off-by: Raghavendra Khadatare <raghavendrax.anand.khadatare@intel.com>
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230328142455.481146-1-tianfei.zhang@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drm/amd/display: Take FEC Overhead into Timeslot Calculation

8b/10b encoding needs to add 3% fec overhead into the pbn.
In the Synapcis Cascaded MST hub, the first stage MST branch device
needs the information to determine the timeslot count for the
second stage MST branch device. Missing this overhead will leads to
insufficient timeslot allocation.

Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub

Traditional synaptics hub has one MST branch device without virtual dpcd.
Synaptics cascaded hub has two chained MST branch devices. DSC decoding
is performed via root MST branch device, instead of the second MST branch
device.

Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

Merge tag 'xtensa-20230327' of https://github.com/jcmvbkbc/linux-xtensa

Pull xtensa fixes from Max Filippov:

- fix KASAN report in show_stack

- drop linux-xtensa mailing list from the MAINTAINERS file

* tag 'xtensa-20230327' of https://github.com/jcmvbkbc/linux-xtensa:
MAINTAINERS: xtensa: drop linux-xtensa@linux-xtensa.org mailing list
xtensa: fix KASAN report for show_stack

Merge tag 'f2fs-fix-6.3-rc5' of git://git./linux/kernel/git/jaegeuk/f2fs

Pull f2fs fix from Jaegeuk Kim:
"This fixes a tracepoint field size in f2fs in preparation for stricter
rules for tracing fields"

* tag 'f2fs-fix-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: Fix f2fs_truncate_partial_nodes ftrace event

drm: test: Fix 32-bit issue in drm_buddy_test

The drm_buddy_test KUnit tests verify that returned blocks have sizes
which are powers of two using is_power_of_2(). However, is_power_of_2()
operations on a 'long', but the block size is a u64. So on systems where
long is 32-bit, this can sometimes fail even on correctly sized blocks.

This only reproduces randomly, as the parameters passed to the buddy
allocator in this test are random. The seed 0xb2e06022 reproduced it
fine here.

For now, just hardcode an is_power_of_2() implementation using
x & (x - 1).

Signed-off-by: David Gow <davidgow@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230329065532.2122295-2-davidgow@google.com
Signed-off-by: Christian König <christian.koenig@amd.com>

drm: buddy_allocator: Fix buddy allocator init on 32-bit systems

The drm buddy allocator tests were broken on 32-bit systems, as
rounddown_pow_of_two() takes a long, and the buddy allocator handles
64-bit sizes even on 32-bit systems.

This can be reproduced with the drm_buddy_allocator KUnit tests on i386:
./tools/testing/kunit/kunit.py run --arch i386 \
--kunitconfig ./drivers/gpu/drm/tests drm_buddy

(It results in kernel BUG_ON() when too many blocks are created, due to
the block size being too small.)

This was independently uncovered (and fixed) by Luís Mendes, whose patch
added a new u64 variant of rounddown_pow_of_two(). This version instead
recalculates the size based on the order.

Reported-by: Luís Mendes <luis.p.mendes@gmail.com>
Link: https://lore.kernel.org/lkml/CAEzXK1oghXAB_KpKpm=-CviDQbNaH0qfgYTSSjZgvvyj4U78AA@mail.gmail.com/T/
Signed-off-by: David Gow <davidgow@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230329065532.2122295-1-davidgow@google.com
Signed-off-by: Christian König <christian.koenig@amd.com>

ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z

Fix headset microphone detection on Lenovo ZhaoYang CF4620Z.

[ adjusted to be applicable to the latest tree -- tiwai ]

Signed-off-by: huangwenhui <huangwenhuia@uniontech.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230328074644.30142-1-huangwenhuia@uniontech.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

drm/nouveau/kms: Fix backlight registration

The nouveau code used to call drm_fb_helper_initial_config() from
nouveau_fbcon_init() before calling drm_dev_register(). This would
probe all connectors so that drm_connector->status could be used during
backlight registration which runs from nouveau_connector_late_register().

After commit 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
the fbdev emulation code, which now is a drm-client, can only run after
drm_dev_register(). So during backlight registration the connectors are
not probed yet and the drm_connector->status == connected check in
nv50_backlight_init() would now always fail.

Replace the drm_connector->status == connected check with
a drm_helper_probe_detect() == connected check to fix nv_backlight
no longer getting registered because of this.

Fixes: 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/202
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181941
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230326205433.36485-1-hdegoede@redhat.com

net: wwan: iosm: fixes 7560 modem crash

ModemManger/Apps probing the wwan0xmmrpc0 port for 7560 Modem results in
modem crash.

7560 Modem FW uses the MBIM interface for control command communication
whereas 7360 uses Intel RPC interface so disable wwan0xmmrpc0 port for
7560.

Fixes: d08b0f8f46e4 ("net: wwan: iosm: add rpc interface for xmm modems")
Reported-and-tested-by: Martin <mwolf@adiumentum.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217200
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: Shane Parslow <shaneparslow808@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: support wake on lan configuration and query

The HNS3 driver supports Wake-on-LAN, which can wake up
the server from power off state to power on state by magic
packet or magic security packet.

ChangeLog:
v1->v2:
Deleted the debugfs function that overlaps with the ethtool function
from suggestion of Andrew Lunn.

v2->v3:
Return the wol configuration stored in driver,
suggested by Alexander H Duyck.

v3->v4:
Add a helper to go from netdev to the local struct,
suggested by Simon Horman and Jakub Kicinski.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-tc-decap-support'

Edward Cree says:

====================
sfc: support TC decap rules

This series adds support for offloading tunnel decapsulation TC rules to
ef100 NICs, allowing matching encapsulated packets to be decapsulated in
hardware and redirected to VFs.
For now an encap match must be on precisely the following fields:
ethertype (IPv4 or IPv6), source IP, destination IP, ipproto UDP,
UDP destination port. This simplifies checking for overlaps in the
driver; the hardware supports a wider range of match fields which
future driver work may expose.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add offloading of 'foreign' TC (decap) rules

A 'foreign' rule is one for which the net_dev is not the sfc netdevice
or any of its representors.  The driver registers indirect flow blocks
for tunnel netdevs so that it can offload decap rules.  For example:

    tc filter add dev vxlan0 parent ffff: protocol ipv4 flower \
        enc_src_ip 10.1.0.2 enc_dst_ip 10.1.0.1 \
        enc_key_id 1000 enc_dst_port 4789 \
        action tunnel_key unset \
        action mirred egress redirect dev $REPRESENTOR

When notified of a rule like this, register an encap match on the IP
and dport tuple (creating an Outer Rule table entry) and insert an MAE
action rule to perform the decapsulation and deliver to the representee.

Moved efx_tc_delete_rule() below efx_tc_flower_release_encap_match() to
avoid the need for a forward declaration.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add code to register and unregister encap matches

Add a hashtable to detect duplicate and conflicting matches. If match
is not a duplicate, call MAE functions to add/remove it from OR table.
Calling code not added yet, so mark the new functions as unused.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add functions to insert encap matches into the MAE

An encap match corresponds to an entry in the exact-match Outer Rule
table; the lookup response includes the encap type (protocol) allowing
the hardware to continue parsing into the inner headers.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: handle enc keys in efx_tc_flower_parse_match()

Translate the fields from flow dissector into struct efx_tc_match.
In efx_tc_flower_replace(), reject filters that match on them, because
only 'foreign' filters (i.e. those for which the ingress dev is not
the sfc netdev or any of its representors, e.g. a tunnel netdev) can
use them.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add notion of match on enc keys to MAE machinery

Extend the MAE caps check to validate that the hardware supports these
outer-header matches where used by the driver.
Extend efx_mae_populate_match_criteria() to fill in the outer rule ID
and VNI match fields.
Nothing yet populates these match fields, nor creates outer rules.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: document TC-to-EF100-MAE action translation concepts

Includes an explanation of the lifetime of the 'cursor' action-set `act`.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'macvlan-broadcast-queue-bypass'

Herbert Xu says:

====================
macvlan: Allow some packets to bypass broadcast queue

This patch series allows some packets to bypass the broadcast
queue on receive.  Currently all multicast packets are queued
on receive and then processed in a work queue.  This is to avoid
an unbounded amount of work occurring in the receive path, as
one broadcast packet could easily translate into 4,000 packets.

However, for multicast packets with just one receiver (possible
for IPv6 ND), this introduces unnecessary latency as the packet
will go to exactly one device.

This series allows such multicast packets to be processed inline.
It also adds a toggle which lets the admin control what threshold
to set between queueing and not queueing.  A follow-up patch for
iproute will be posted.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: Add netlink attribute for broadcast cutoff

Make the broadcast cutoff configurable through netlink.  Note
that macvlan is weird because there is no central device for
us to configure (the lowerdev could be anything).  So all the
options are duplicated over what could be thousands of child
devices.

IFLA_MACVLAN_BC_QUEUE_LEN took the approach of taking the maximum
of all child device settings.  This is unnecessary as we could
simply store the option in the port device and take the last
child device that gets updated as the value to use.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: Skip broadcast queue if multicast with single receiver

As it stands all broadcast and multicast packets are queued and
processed in a work queue.  This is so that we don't overwhelm
the receive softirq path by generating thousands of packets or
more (see commit 412ca1550cbe "macvlan: Move broadcasts into a
work queue").

As such all multicast packets will be delayed, even if they will
be received by a single macvlan device.  As using a workqueue
is not free in terms of latency, we should avoid this where possible.

This patch adds a new filter to determine which addresses should
be delayed and which ones won't.  This is done using a crude
counter of how many times an address has been added to the macvlan
port (ha->synced).  For now if an address has been added more than
once, then it will be considered to be broadcast.  This could be
tuned further by making this threshold configurable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mptcp-cleanups'

Matthieu Baerts says:

====================
mptcp: a couple of cleanups and improvements

Patch 1 removes an unneeded address copy in subflow_syn_recv_sock().

Patch 2 simplifies subflow_syn_recv_sock() to postpone some actions and
to avoid a bunch of conditionals.

Patch 3 stops reporting limits that are not taken into account when the
userspace PM is used.

Patch 4 adds a new test to validate that the 'subflows' field reported
by the kernel is correct. Such info can be retrieved via Netlink (e.g.
with ss) or getsockopt(SOL_MPTCP, MPTCP_INFO).

---
Changes in v2:
- Patch 3/4's commit message has been updated to use the correct SHA
- Rebased on latest net-next
- Link to v1: https://lore.kernel.org/r/20230324-upstream-net-next-20230324-misc-features-v1-0-5a29154592bd@tessares.net
====================

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp: add mptcp_info tests

This patch adds the mptcp_info fields tests in endpoint_tests(). Add a
new function chk_mptcp_info() to check the given number of the given
mptcp_info field.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/330
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: do not fill info not used by the PM in used

Only the in-kernel PM uses the number of address and subflow limits
allowed per connection.

It then makes more sense not to display such info when other PMs are
used not to confuse the userspace by showing limits not being used.

While at it, we can get rid of the "val" variable and add indentations
instead.

It would have been good to have done this modification directly in
commit 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs")
but as we change a bit the behaviour, it is fine not to backport it to
stable.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: simplify subflow_syn_recv_sock()

Postpone the msk cloning to the child process creation
so that we can avoid a bunch of conditionals.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/61
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: avoid unneeded address copy

In the syn_recv fallback path, the msk is unused. We can skip
setting the socket address.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'in6addr_any-cleanups'

Kuniyuki Iwashima says:

====================
ipv6: Random cleanup for in6addr_any.

The first patch removes in6addr_any alternatives and the second
removes redundant initialisation of a local variable.

Changes:
v2: Use ipv6_addr_any() in patch 1. (David Ahern)

v1: https://lore.kernel.org/netdev/20230322012204.33157-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: Remove redundant initialisation.

We'll call memset(&tmp, 0, sizeof(tmp)) later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove in6addr_any alternatives.

Some code defines the IPv6 wildcard address as a local variable and
use it with memcmp() or ipv6_addr_equal().

Let's use in6addr_any and ipv6_addr_any() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vsock-sockmap-support'

Bobby Eshleman says:

====================
Add support for sockmap to vsock.

We're testing usage of vsock as a way to redirect guest-local UDS
requests to the host and this patch series greatly improves the
performance of such a setup.

Compared to copying packets via userspace, this improves throughput by
121% in basic testing.

Tested as follows.

Setup: guest unix dgram sender -> guest vsock redirector -> host vsock
       server
Threads: 1
Payload: 64k
No sockmap:
- 76.3 MB/s
- The guest vsock redirector was
  "socat VSOCK-CONNECT:2:1234 UNIX-RECV:/path/to/sock"
Using sockmap (this patch):
- 168.8 MB/s (+121%)
- The guest redirector was a simple sockmap echo server,
  redirecting unix ingress to vsock 2:1234 egress.
- Same sender and server programs

*Note: these numbers are from RFC v1

Only the virtio transport has been tested. The loopback transport was
used in writing bpf/selftests, but not thoroughly tested otherwise.

This series requires the skb patch.

Changes in v4:
- af_vsock: fix parameter alignment in vsock_dgram_recvmsg()
- af_vsock: add TCP_ESTABLISHED comment in vsock_dgram_connect()
- vsock/bpf: change ret type to bool

Changes in v3:
- vsock/bpf: Refactor wait logic in vsock_bpf_recvmsg() to avoid
  backwards goto
- vsock/bpf: Check psock before acquiring slock
- vsock/bpf: Return bool instead of int of 0 or 1
- vsock/bpf: Wrap macro args __sk/__psock in parens
- vsock/bpf: Place comment trailer */ on separate line

Changes in v2:
- vsock/bpf: rename vsock_dgram_* -> vsock_*
- vsock/bpf: change sk_psock_{get,put} and {lock,release}_sock() order
  to minimize slock hold time
- vsock/bpf: use "new style" wait
- vsock/bpf: fix bug in wait log
- vsock/bpf: add check that recvmsg sk_type is one dgram, seqpacket, or
  stream.  Return error if not one of the three.
- virtio/vsock: comment __skb_recv_datagram() usage
- virtio/vsock: do not init copied in read_skb()
- vsock/bpf: add ifdef guard around struct proto in dgram_recvmsg()
- selftests/bpf: add vsock loopback config for aarch64
- selftests/bpf: add vsock loopback config for s390x
- selftests/bpf: remove vsock device from vmtest.sh qemu machine
- selftests/bpf: remove CONFIG_VIRTIO_VSOCKETS=y from config.x86_64
- vsock/bpf: move transport-related (e.g., if (!vsk->transport)) checks
  out of fast path
====================

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: add a test case for vsock sockmap

Add a test case testing the redirection from connectible AF_VSOCK
sockets to connectible AF_UNIX sockets.

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: add vsock to vmtest.sh

Add vsock loopback to the test kernel.

This allows sockmap for vsock to be tested.

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsock: support sockmap

This patch adds sockmap support for vsock sockets. It is intended to be
usable by all transports, but only the virtio and loopback transports
are implemented.

SOCK_STREAM, SOCK_DGRAM, and SOCK_SEQPACKET are all supported.

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

testing/vsock: add vsock_perf to gitignore

This adds the vsock_perf binary to the gitignore file.

Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility")
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Reviewed-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20230327-vsock-add-vsock-perf-to-ignore-v1-1-f28a84f3606b@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ynl-add-support-for-user-headers-and-struct-attrs'

Donald Hunter says:

====================
ynl: add support for user headers and struct attrs

Add support for user headers and struct attrs to YNL. This patchset adds
features to ynl and add a partial spec for openvswitch that demonstrates
use of the features.

Patch 1-4 add features to ynl
Patch 5 adds partial openvswitch specs that demonstrate the new features
Patch 6-7 add documentation for legacy structs and for sub-type
====================

Link: https://lore.kernel.org/r/20230327083138.96044-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: netlink: document the sub-type attribute property

Add a definition for sub-type to the protocol spec doc and a description of
its usage for C arrays in genetlink-legacy.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: netlink: document struct support for genetlink-legacy

Describe the genetlink-legacy support for using struct definitions
for fixed headers and for binary attributes.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: specs: add partial specification for openvswitch

The openvswitch family has a fixed header, uses struct attrs and has array
values. This partial spec demonstrates these features in the YNL CLI. These
specs are sufficient to create, delete and dump datapaths and to dump vports:

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ovs_datapath.yaml \
    --do dp-new --json '{ "dp-ifindex": 0, "name": "demo", "upcall-pid": 0}'
None

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ovs_datapath.yaml \
    --dump dp-get --json '{ "dp-ifindex": 0 }'
[{'dp-ifindex': 3,
  'masks-cache-size': 256,
  'megaflow-stats': {'cache-hits': 0,
                     'mask-hit': 0,
                     'masks': 0,
                     'pad1': 0,
                     'padding': 0},
  'name': 'test',
  'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
  'user-features': {'dispatch-upcall-per-cpu',
                    'tc-recirc-sharing',
                    'unaligned'}},
{'dp-ifindex': 48,
  'masks-cache-size': 256,
  'megaflow-stats': {'cache-hits': 0,
                     'mask-hit': 0,
                     'masks': 0,
                     'pad1': 0,
                     'padding': 0},
  'name': 'demo',
  'stats': {'flows': 0, 'hit': 0, 'lost': 0, 'missed': 0},
  'user-features': set()}]

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ovs_datapath.yaml \
    --do dp-del --json '{ "dp-ifindex": 0, "name": "demo"}'
None

$ ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/ovs_vport.yaml \
    --dump vport-get --json '{ "dp-ifindex": 3 }'
[{'dp-ifindex': 3,
  'ifindex': 3,
  'name': 'test',
  'port-no': 0,
  'stats': {'rx-bytes': 0,
            'rx-dropped': 0,
            'rx-errors': 0,
            'rx-packets': 0,
            'tx-bytes': 0,
            'tx-dropped': 0,
            'tx-errors': 0,
            'tx-packets': 0},
  'type': 'internal',
  'upcall-pid': [0],
  'upcall-stats': {'fail': 0, 'success': 0}}]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: Add fixed-header support to ynl

Add support for netlink families that add an optional fixed header structure
after the genetlink header and before any attributes. The fixed-header can be
specified on a per op basis, or once for all operations, which serves as a
default value that can be overridden.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: Add struct attr decoding to ynl

Add support for decoding attributes that contain C structs.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: Add C array attribute decoding to ynl

Add support for decoding C arrays from binay blobs in genetlink-legacy
messages.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: Add struct parsing to nlspec

Add python classes for struct definitions to nlspec

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2023-03-20' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-03-20

mlx5 dynamic msix

This patch series adds support for dynamic msix vectors allocation in mlx5.

Eli Cohen Says:

================

The following series of patches modifies mlx5_core to work with the
dynamic MSIX API. Currently, mlx5_core allocates all the interrupt
vectors it needs and distributes them amongst the consumers. With the
introduction of dynamic MSIX support, which allows for allocation of
interrupts more than once, we now allocate vectors as we need them.
This allows other drivers running on top of mlx5_core to allocate
interrupt vectors for their own use. An example for this is mlx5_vdpa,
which uses these vectors to propagate interrupts directly from the
hardware to the vCPU [1].

As a preparation for using this series, a use after free issue is fixed
in lib/cpu_rmap.c and the allocator for rmap entries has been modified.
A complementary API for irq_cpu_rmap_add() has also been introduced.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e

================

* tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Provide external API for allocating vectors
  net/mlx5: Use one completion vector if eth is disabled
  net/mlx5: Refactor calculation of required completion vectors
  net/mlx5: Move devlink registration before mlx5_load
  net/mlx5: Use dynamic msix vectors allocation
  net/mlx5: Refactor completion irq request/release code
  net/mlx5: Improve naming of pci function vectors
  net/mlx5: Use newer affinity descriptor
  net/mlx5: Modify struct mlx5_irq to use struct msi_map
  net/mlx5: Fix wrong comment
  net/mlx5e: Coding style fix, add empty line
  lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add
  lib: cpu_rmap: Use allocator for rmap entries
  lib: cpu_rmap: Avoid use after free on rmap->obj array entries
====================

Link: https://lore.kernel.org/r/20230324231341.29808-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>