review.tizen.org Git - platform/kernel/linux-starfive.git/log

Merge branch 'dsa_master_ioctl-notifier'

Vladimir Oltean says:

====================
net: Convert dsa_master_ioctl() to netdev notifier

This is preparatory work in order for Maxim Georgiev to be able to start
the API conversion process of hardware timestamping from ndo_eth_ioctl()
to ndo_hwtstamp_set():
https://lore.kernel.org/netdev/20230331045619.40256-1-glipus@gmail.com/

In turn, Maxim Georgiev's work is a preparation so that Köry Maincent is
able to make the active hardware timestamping layer selectable by user
space.
https://lore.kernel.org/netdev/20230308135936.761794-1-kory.maincent@bootlin.com/

So, quite some dependency chain.

Before this patch set, DSA prevented the conversion of any networking
driver from the ndo_eth_ioctl() API to the ndo_hwtstamp_set() API,
because it wanted to validate the hwtstamping settings on the DSA
master, and it was only coded up to do this using the old API.

After this patch set, a new netdev notifier exists, which does not
depend on anything that would constitute the "soon-to-be-legacy" API,
but rather, it uses a newly introduced struct kernel_hwtstamp_config,
and it doesn't issue any ioctl at all, being thus compatible both with
ndo_eth_ioctl(), and with the not-yet-introduced, but now possible,
ndo_hwtstamp_set().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: create a netdev notifier for DSA to reject PTP on DSA master

The fact that PTP 2-step TX timestamping is broken on DSA switches if
the master also timestamps the same packets is documented by commit
f685e609a301 ("net: dsa: Deny PTP on master if switch supports it").
We attempt to help the users avoid shooting themselves in the foot by
making DSA reject the timestamping ioctls on an interface that is a DSA
master, and the switch tree beneath it contains switches which are aware
of PTP.

The only problem is that there isn't an established way of intercepting
ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network
stack by creating a struct dsa_netdevice_ops with overlaid function
pointers that are manually checked from the relevant call sites. There
used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only
one left.

There is an ongoing effort to migrate driver-visible hardware timestamping
control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set()
model, but DSA actively prevents that migration, since dsa_master_ioctl()
is currently coded to manually call the master's legacy ndo_eth_ioctl(),
and so, whenever a network device driver would be converted to the new
API, DSA's restrictions would be circumvented, because any device could
be used as a DSA master.

The established way for unrelated modules to react on a net device event
is via netdevice notifiers. So we create a new notifier which gets
called whenever there is an attempt to change hardware timestamping
settings on a device.

Finally, there is another reason why a netdev notifier will be a good
idea, besides strictly DSA, and this has to do with PHY timestamping.

With ndo_eth_ioctl(), all MAC drivers must manually call
phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP,
otherwise they must pass this ioctl to the PHY driver via
phy_mii_ioctl().

With the new ndo_hwtstamp_set() API, it will be desirable to simply not
make any calls into the MAC device driver when timestamping should be
performed at the PHY level.

But there exist drivers, such as the lan966x switch, which need to
install packet traps for PTP regardless of whether they are the layer
that provides the hardware timestamps, or the PHY is. That would be
impossible to support with the new API.

The proposal there, too, is to introduce a netdev notifier which acts as
a better cue for switching drivers to add or remove PTP packet traps,
than ndo_hwtstamp_set(). The one introduced here "almost" works there as
well, except for the fact that packet traps should only be installed if
the PHY driver succeeded to enable hardware timestamping, whereas here,
we need to deny hardware timestamping on the DSA master before it
actually gets enabled. This is why this notifier is called "PRE_", and
the notifier that would get used for PHY timestamping and packet traps
would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for
example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing.

In expectation of future netlink UAPI, we also pass a non-NULL extack
pointer to the netdev notifier, and we make DSA populate it with an
informative reason for the rejection. To avoid making it go to waste, we
make the ioctl-based dev_set_hwtstamp() create a fake extack and print
the message to the kernel log.

Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/
Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make dsa_port_supports_hwtstamp() construct a fake ifreq

dsa_master_ioctl() is in the process of getting converted to a different
API, where we won't have access to a struct ifreq * anymore, but rather,
to a struct kernel_hwtstamp_config.

Since ds->ops->port_hwtstamp_get() still uses struct ifreq *, this
creates a difficult situation where we have to make up such a dummy
pointer.

The conversion is a bit messy, because it forces a "good" implementation
of ds->ops->port_hwtstamp_get() to return -EFAULT in copy_to_user()
because of the NULL ifr->ifr_data pointer. However, it works, and it is
only a transient step until ds->ops->port_hwtstamp_get() gets converted
to the new API which passes struct kernel_hwtstamp_config and does not
call copy_to_user().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add struct kernel_hwtstamp_config and make net_hwtstamp_validate() use it

Jakub Kicinski suggested that we may want to add new UAPI for
controlling hardware timestamping through netlink in the future, and in
that case, we will be limited to the struct hwtstamp_config that is
currently passed in fixed binary format through the SIOCGHWTSTAMP and
SIOCSHWTSTAMP ioctls. It would be good if new kernel code already
started operating on an extensible kernel variant of that structure,
similar in concept to struct kernel_ethtool_coalesce vs struct
ethtool_coalesce.

Since struct hwtstamp_config is in include/uapi/linux/net_tstamp.h, here
we introduce include/linux/net_tstamp.h which shadows that other header,
but also includes it, so that existing includers of this header work as
before. In addition to that, we add the definition for the kernel-only
structure, and a helper which translates all fields by manual copying.
I am doing a manual copy in order to not force the alignment (or type)
of the fields of struct kernel_hwtstamp_config to be the same as of
struct hwtstamp_config, even though now, they are the same.

Link: https://lore.kernel.org/netdev/20230330223519.36ce7d23@kernel.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move copy_from_user() out of net_hwtstamp_validate()

The kernel will want to start using the more meaningful struct
hwtstamp_config pointer in more places, so move the copy_from_user() at
the beginning of dev_set_hwtstamp() in order to get to that, and pass
this argument to net_hwtstamp_validate().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: promote SIOCSHWTSTAMP and SIOCGHWTSTAMP ioctls to dedicated handlers

DSA does not want to intercept all ioctls handled by dev_eth_ioctl(),
only SIOCSHWTSTAMP. This can be seen from commit f685e609a301 ("net:
dsa: Deny PTP on master if switch supports it"). However, the way in
which the dsa_ndo_eth_ioctl() is called would suggest otherwise.

Split the handling of SIOCSHWTSTAMP and SIOCGHWTSTAMP ioctls into
separate case statements of dev_ifsioc(), and make each one call its own
sub-function. This also removes the dsa_ndo_eth_ioctl() call from
dev_eth_ioctl(), which from now on exclusively handles PHY ioctls.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: simplify handling of dsa_ndo_eth_ioctl() return code

In the expression "x == 0 || x != -95", the term "x == 0" does not
change the expression's logical value, because 0 != -95, and so,
if x is 0, the expression would still be true by virtue of the second
term. If x is non-zero, the expression depends on the truth value of
the second term anyway. As such, the first term is redundant and can
be deleted.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: don't abuse "default" case for unknown ioctl in dev_ifsioc()

The "switch (cmd)" block from dev_ifsioc() gained a bit too much
unnecessary manual handling of "cmd" in the "default" case, starting
with the private ioctls.

Clean that up by using the "ellipsis" gcc extension, adding separate
cases for the rest of the ioctls, and letting the default case only
return -EINVAL.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: alteon: remove unused len variable

clang with W=1 reports
drivers/net/ethernet/alteon/acenic.c:2438:10: error: variable
  'len' set but not used [-Werror,-Wunused-but-set-variable]
                int i, len = 0;
                       ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-transceiver-trip-points'

Petr Machata says:

====================
mlxsw: Use static trip points for transceiver modules

Ido Schimmel writes:

See patch #1 for motivation and implementation details.

Patches #2-#3 are simple cleanups as a result of the changes in the
first patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core_thermal: Simplify transceiver module get_temp() callback

The get_temp() callback of a thermal zone associated with a transceiver
module no longer needs to read the temperature thresholds of the module.
Therefore, simplify the callback by only reading the temperature.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core_thermal: Make mlxsw_thermal_module_init() void

The function can no longer fail so make it void and remove the
associated error path.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core_thermal: Use static trip points for transceiver modules

The driver registers a thermal zone for each transceiver module and
tries to set the trip point temperatures according to the thresholds
read from the transceiver. If a threshold cannot be read or if a
transceiver is unplugged, the trip point temperature is set to zero,
which means that it is disabled as far as the thermal subsystem is
concerned.

A recent change in the thermal core made it so that such trip points are
no longer marked as disabled, which lead the thermal subsystem to
incorrectly set the associated cooling devices to the their maximum
state [1]. A fix to restore this behavior was merged in commit
f1b80a3878b2 ("thermal: core: Restore behavior regarding invalid trip
points"). However, the thermal maintainer suggested to not rely on this
behavior and instead always register a valid array of trip points [2].

Therefore, create a static array of trip points with sane defaults
(suggested by Vadim) and register it with the thermal zone of each
transceiver module. User space can choose to override these defaults
using the thermal zone sysfs interface since these files are writeable.

Before:

$ cat /sys/class/thermal/thermal_zone11/type
mlxsw-module11
$ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
65000
75000
80000

After:

$ cat /sys/class/thermal/thermal_zone11/type
mlxsw-module11
$ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
55000
65000
80000

Also tested by reverting commit f1b80a3878b2 ("thermal: core: Restore
behavior regarding invalid trip points") and making sure that the
associated cooling devices are not set to their maximum state.

[1] https://lore.kernel.org/linux-pm/ZA3CFNhU4AbtsP4G@shredder/
[2] https://lore.kernel.org/linux-pm/f78e6b70-a963-c0ca-a4b2-0d4c6aeef1fb@linaro.org/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: minor reshuffle of napi_struct

napi_id is read by GRO and drivers to mark skbs, and it currently
sits at the end of the structure, in a mostly unused cache line.
Move it up into a hole, and separate the clearly control path
fields from the important ones.

Before:

struct napi_struct {
struct list_head           poll_list;            /*     0    16 */
long unsigned int          state;                /*    16     8 */
int                        weight;               /*    24     4 */
int                        defer_hard_irqs_count; /*    28     4 */
long unsigned int          gro_bitmask;          /*    32     8 */
int                        (*poll)(struct napi_struct *, int); /*    40     8 */
int                        poll_owner;           /*    48     4 */

/* XXX 4 bytes hole, try to pack */

struct net_device *        dev;                  /*    56     8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct gro_list            gro_hash[8];          /*    64   192 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct sk_buff *           skb;                  /*   256     8 */
struct list_head           rx_list;              /*   264    16 */
int                        rx_count;             /*   280     4 */

/* XXX 4 bytes hole, try to pack */

struct hrtimer             timer;                /*   288    64 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
struct list_head           dev_list;             /*   352    16 */
struct hlist_node          napi_hash_node;       /*   368    16 */
/* --- cacheline 6 boundary (384 bytes) --- */
unsigned int               napi_id;              /*   384     4 */

/* XXX 4 bytes hole, try to pack */

struct task_struct *       thread;               /*   392     8 */

/* size: 400, cachelines: 7, members: 17 */
/* sum members: 388, holes: 3, sum holes: 12 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 16 bytes */
};

After:

struct napi_struct {
struct list_head           poll_list;            /*     0    16 */
long unsigned int          state;                /*    16     8 */
int                        weight;               /*    24     4 */
int                        defer_hard_irqs_count; /*    28     4 */
long unsigned int          gro_bitmask;          /*    32     8 */
int                        (*poll)(struct napi_struct *, int); /*    40     8 */
int                        poll_owner;           /*    48     4 */

/* XXX 4 bytes hole, try to pack */

struct net_device *        dev;                  /*    56     8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct gro_list            gro_hash[8];          /*    64   192 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct sk_buff *           skb;                  /*   256     8 */
struct list_head           rx_list;              /*   264    16 */
int                        rx_count;             /*   280     4 */
unsigned int               napi_id;              /*   284     4 */
struct hrtimer             timer;                /*   288    64 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
struct task_struct *       thread;               /*   352     8 */
struct list_head           dev_list;             /*   360    16 */
struct hlist_node          napi_hash_node;       /*   376    16 */

/* size: 392, cachelines: 7, members: 17 */
/* sum members: 388, holes: 1, sum holes: 4 */
/* paddings: 1, sum paddings: 4 */
/* forced alignments: 1 */
/* last cacheline: 8 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Add support for VF to specify its primary MAC address

Currently in the i40e driver there is no implementation of different
MAC address handling depending on whether it is a legacy or primary.
Introduce new checks for VF to be able to specify its primary MAC
address based on the VIRTCHNL_ETHER_ADDR_PRIMARY type.

Primary MAC address are treated differently compared to legacy
ones in a scenario where:
1. If a unicast MAC is being added and it's specified as
VIRTCHNL_ETHER_ADDR_PRIMARY, then replace the current
default_lan_addr.addr.
2. If a unicast MAC is being deleted and it's type
is specified as VIRTCHNL_ETHER_ADDR_PRIMARY, then zero the
hw_lan_addr.addr.

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-03-30 (documentation, ice)

This series contains updates to driver documentation and the ice driver.

Tony removes links and addresses related to the out-of-tree driver from the
Intel ethernet driver documentation.

Jake removes a comment that is no longer valid to the ice driver.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: remove comment about not supporting driver reinit
  Documentation/eth/intel: Remove references to SourceForge
  Documentation/eth/intel: Update address for driver support
====================

Link: https://lore.kernel.org/r/20230330165935.2503604-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-2023-03-30' of https://git./linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

1. No need to disable BH in nfnetlink proc handler, freeing happens
   via call_rcu.
2. Expose classid in nfetlink_queue, from Eric Sage.
3. Fix nfnetlink message description comments, from Matthieu De Beule.
4. Allow removal of offloaded connections via ctnetlink, from Paul Blakey.

* tag 'nf-next-2023-03-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: ctnetlink: Support offloaded conntrack entry deletion
  netfilter: Correct documentation errors in nf_tables.h
  netfilter: nfnetlink_queue: enable classid socket info retrieval
  netfilter: nfnetlink_log: remove rcu_bh usage
====================

Link: https://lore.kernel.org/r/20230331104809.2959-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: fec: add power-domains property

Add optional power domains property

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230328061518.1985981-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Refine SYN handling for PAWS.

Our Network Load Balancer (NLB) [0] has multiple nodes with different
IP addresses, and each node forwards TCP flows from clients to backend
targets.  NLB has an option to preserve the client's source IP address
and port when routing packets to backend targets. [1]

When a client connects to two different NLB nodes, they may select the
same backend target.  Then, if the client has used the same source IP
and port, the two flows at the backend side will have the same 4-tuple.

While testing around such cases, I saw these sequences on the backend
target.

IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2819965599, win 62727, options [mss 8365,sackOK,TS val 1029816180 ecr 0,nop,wscale 7], length 0
IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [S.], seq 3040695044, ack 2819965600, win 62643, options [mss 8961,sackOK,TS val 1224784076 ecr 1029816180,nop,wscale 7], length 0
IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [.], ack 1, win 491, options [nop,nop,TS val 1029816181 ecr 1224784076], length 0
IP 10.0.0.215.60000 > 10.0.3.249.10000: Flags [S], seq 2681819307, win 62727, options [mss 8365,sackOK,TS val 572088282 ecr 0,nop,wscale 7], length 0
IP 10.0.3.249.10000 > 10.0.0.215.60000: Flags [.], ack 1, win 490, options [nop,nop,TS val 1224794914 ecr 1029816181,nop,nop,sack 1 {4156821004:4156821005}], length 0

It seems to be working correctly, but the last ACK was generated by
tcp_send_dupack() and PAWSEstab was increased.  This is because the
second connection has a smaller timestamp than the first one.

In this case, we should send a dup ACK in tcp_send_challenge_ack()
to increase the correct counter and rate-limit it properly.

Let's check the SYN flag after the PAWS tests to avoid adding unnecessary
overhead for most packets.

Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html
Link: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#client-ip-preservation
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: Fix mc_filter calculation

On Wed, Mar 29, 2023 at 08:10:26AM +0000, patchwork-bot+netdevbpf@kernel.org wrote:
>
> Here is the summary with links:
>   - [1/2] macvlan: Skip broadcast queue if multicast with single receiver
>     https://git.kernel.org/netdev/net-next/c/d45276e75e90
>   - [2/2] macvlan: Add netlink attribute for broadcast cutoff
>     https://git.kernel.org/netdev/net-next/c/954d1fa1ac93

Sorry, I made an error and posted my patches from an earlier
revision so a follow-up fix was missing:

---8<---
The bc_cutoff patch broke the calculation of mc_filter causing
some multicast packets to not make it through to the targeted
device.

Fix this by checking whether vlan is set instead of cutoff >= 0.

Also move the cutoff < 0 logic into macvlan_recompute_bc_filter
so that it doesn't change the mc_filter at all.

Fixes: d45276e75e90 ("macvlan: Skip broadcast queue if multicast with single receiver")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-next-2023-03-30' of git://git./linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Major stack changes:

* TC offload support for drivers below mac80211
* reduced neighbor report (RNR) handling for AP mode
* mac80211 mesh fast-xmit and fast-rx support
* support for another mesh A-MSDU format
   (seems nobody got the spec right)

Major driver changes:

Kalle moved the drivers that were just plain C files
in drivers/net/wireless/ to legacy/ and virtual/ dirs.

hwsim
* multi-BSSID support
* some FTM support

ath11k
* MU-MIMO parameters support
* ack signal support for management packets

rtl8xxxu
* support for RTL8710BU aka RTL8188GU chips

rtw89
* support for various newer firmware APIs

ath10k
* enabled threaded NAPI on WCN3990

iwlwifi
* lots of work for multi-link/EHT (wifi7)
* hardware timestamping support for some devices/firwmares
* TX beacon protection on newer hardware

* tag 'wireless-next-2023-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (181 commits)
  wifi: clean up erroneously introduced file
  wifi: iwlwifi: mvm: correctly use link in iwl_mvm_sta_del()
  wifi: iwlwifi: separate AP link management queues
  wifi: iwlwifi: mvm: free probe_resp_data later
  wifi: iwlwifi: bump FW API to 75 for AX devices
  wifi: iwlwifi: mvm: move max_agg_bufsize into host TLC lq_sta
  wifi: iwlwifi: mvm: send full STA during HW restart
  wifi: iwlwifi: mvm: rework active links counting
  wifi: iwlwifi: mvm: update mac config when assigning chanctx
  wifi: iwlwifi: mvm: use the correct link queue
  wifi: iwlwifi: mvm: clean up mac_id vs. link_id in MLD sta
  wifi: iwlwifi: mvm: fix station link data leak
  wifi: iwlwifi: mvm: initialize max_rc_amsdu_len per-link
  wifi: iwlwifi: mvm: use appropriate link for rate selection
  wifi: iwlwifi: mvm: use the new lockdep-checking macros
  wifi: iwlwifi: mvm: remove chanctx WARN_ON
  wifi: iwlwifi: mvm: avoid sending MAC context for idle
  wifi: iwlwifi: mvm: remove only link-specific AP keys
  wifi: iwlwifi: mvm: skip inactive links
  wifi: iwlwifi: mvm: adjust iwl_mvm_scan_respect_p2p_go_iter() for MLO
  ...
====================

Link: https://lore.kernel.org/r/20230330205612.921134-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tools-ynl-fill-in-some-gaps-of-ethtool-spec'

Stanislav Fomichev says:

====================
tools: ynl: fill in some gaps of ethtool spec

I was trying to fill in the spec while exploring ethtool API for some
related work. I don't think I'll have the patience to fill in the rest,
so decided to share whatever I currently have.

Patches 1-2 add the be16 + spec.
Patches 3-4 implement an ethtool-like python tool to test the spec.

Patches 3-4 are there because it felt more fun do the tool instead
of writing the actual tests; feel free to drop it; sharing mostly
to show that the spec is not a complete nonsense.

The spec is not 100% complete, see patch 2 for what's missing.
I was hoping to finish the stats-get message, but I'm too dump
to implement bitmask marshaling (multi-attr).
====================

Link: https://lore.kernel.org/r/20230329221655.708489-1-sdf@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: ethtool testing tool

This is what I've been using to see whether the spec makes sense.
A small subset of getters (mostly the unprivileged ones) is implemented.
Some setters (channels) also work.
Setters for messages with bitmasks are not implemented.

Initially I was trying to make this tool look 1:1 like real ethtool,
but eventually gave up :-)

Sample output:

$ ./tools/net/ynl/ethtool enp0s31f6
Settings for enp0s31f6:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half
100baseT/Full 1000baseT/Full
Supported pause frame use: no
Supports auto-negotiation: yes
Supported FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 2
Transceiver: Internal
MDI-X: Unknown (auto)
Current message level: drv probe link
Link detected: no

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: replace print with NlError

Instead of dumping the error on the stdout, make the callee and
opportunity to decide what to do with it. This is mostly for the
ethtool testing.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: populate most of the ethtool spec

Things that are not implemented:
- cable tests
- bitmaks in the requests don't work (needs multi-attr support in ynl.py)
- stats-get seems to return nonsense (not passing a bitmask properly?)
- notifications are not tested

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support byte-order in cli

Used by ethtool spec.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: update type of prof fields in nix_aw_enq_req

Update type of prof and prof_mask fields in nix_as_enq_req
from u64 to struct nix_bandprof_s, which is 128 bits wide.

This is to address warnings with compiling with gcc-12 W=1
regarding string fortification.

Although the union of which these fields are a member is 128bits
wide, and thus writing a 128bit entity is safe, the compiler flags
a problem as the field being written is only 64 bits wide.

  CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.o
scripts/Makefile.build:252: ./drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_dcbnl.o is added to multiple modules: rvu_nicpf rvu_nicvf
  CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.o
  CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.o
  CC [M]  drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.o
  CC [M]  drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.o
In file included from ./include/linux/string.h:254,
                 from ./include/linux/bitmap.h:11,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/paravirt.h:17,
                 from ./arch/x86/include/asm/cpuid.h:62,
                 from ./arch/x86/include/asm/processor.h:19,
                 from ./arch/x86/include/asm/timex.h:5,
                 from ./include/linux/timex.h:67,
                 from ./include/linux/time32.h:13,
                 from ./include/linux/time.h:60,
                 from ./include/linux/stat.h:19,
                 from ./include/linux/module.h:13,
                 from drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:8:
In function 'fortify_memcpy_chk',
    inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:969:4:
./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  529 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function 'fortify_memcpy_chk',
    inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:984:4:
./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  529 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Compile tested only!

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230329112356.458072-1-horms@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-sched-act_tunnel_key-add-support-for-tunnel_dont_fragment'

Davide Caratti says:

====================
net/sched: act_tunnel_key: add support for TUNNEL_DONT_FRAGMENT

- patch 1 extends TC tunnel_key action to add support for TUNNEL_DONT_FRAGMENT
- patch 2 extends tdc to skip tests when iproute2 support is missing
- patch 3 adds a tdc test case to verify functionality of the control plane
- patch 4 adds a net/forwarding test case to verify functionality of the data plane
====================

Link: https://lore.kernel.org/r/cover.1680082990.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: add tunnel_key "nofrag" test case

Add a selftest that configures metadata tunnel encapsulation using the TC
"tunnel_key" action: it includes a test case for setting "nofrag" flag.

Example output:

# selftests: net/forwarding: tc_tunnel_key.sh
# TEST: tunnel_key nofrag (skip_hw) [ OK ]
# INFO: Could not test offloaded functionality
ok 1 selftests: net/forwarding: tc_tunnel_key.sh

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc-testing: add tunnel_key "nofrag" test case

# ./tdc.py -e 6bda -l
6bda: (actions, tunnel_key) Add tunnel_key action with nofrag option

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc-testing: add "depends_on" property to skip tests

currently, users can skip individual test cases by means of writing

"skip": "yes"

in the scenario file. Extend this functionality, introducing 'dependsOn':
it's optional property like "skip", but the value contains a command (for
example, a probe on iproute2 to check if it supports a specific feature).
If such property is present, tdc executes that command and skips the test
when the return value is non-zero.

Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: act_tunnel_key: add support for "don't fragment"

extend "act_tunnel_key" to allow specifying TUNNEL_DONT_FRAGMENT.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rtnetlink: Fix do_test_address_proto()

This selftest was introduced recently in the commit cited below. It misses
several check_err() invocations to actually verify that the previous
command succeeded. When these are added, the first one fails, because
besides the addresses added by hand, there can be a link-local address
added by the kernel. Adjust the check to expect at least three addresses
instead of exactly three, and add the missing check_err's.

Furthermore, the explanatory comments assume that the address with no
protocol is $addr2, when in fact it is $addr3. Update the comments.

Fixes: 6a414fd77f61 ("selftests: rtnetlink: Add an address proto test")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/53a579bc883e1bf2fe490d58427cf22c2d1aa21f.1680102695.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: Fix format specifier in netcp_create_interface()

After commit 3948b05950fd ("net: introduce a config option to tweak
MAX_SKB_FRAGS"), clang warns:

  drivers/net/ethernet/ti/netcp_core.c:2085:4: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
                          MAX_SKB_FRAGS);
                          ^~~~~~~~~~~~~
  include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err'
          dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
                                                                 ~~~     ^~~~~~~~~~~
  include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap'
                  _p_func(dev, fmt, ##__VA_ARGS__);                       \
                               ~~~    ^~~~~~~~~~~
  include/linux/skbuff.h:352:23: note: expanded from macro 'MAX_SKB_FRAGS'
  #define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
                        ^~~~~~~~~~~~~~~~~~~~
  ./include/generated/autoconf.h:11789:30: note: expanded from macro 'CONFIG_MAX_SKB_FRAGS'
  #define CONFIG_MAX_SKB_FRAGS 17
                               ^~
  1 warning generated.

Follow the pattern of the rest of the tree by changing the specifier to
'%u' and casting MAX_SKB_FRAGS explicitly to 'unsigned int', which
eliminates the warning.

Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230329-net-ethernet-ti-wformat-v1-1-83d0f799b553@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: fix db type confusion in host fdb/mdb add/del

We have the following code paths:

Host FDB (unicast RX filtering):

dsa_port_standalone_host_fdb_add()   dsa_port_bridge_host_fdb_add()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_fdb_add()

dsa_port_standalone_host_fdb_del()   dsa_port_bridge_host_fdb_del()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_fdb_del()

Host MDB (multicast RX filtering):

dsa_port_standalone_host_mdb_add()   dsa_port_bridge_host_mdb_add()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_mdb_add()

dsa_port_standalone_host_mdb_del()   dsa_port_bridge_host_mdb_del()
               |                                     |
               +--------------+         +------------+
                              |         |
                              v         v
                         dsa_port_host_mdb_del()

The logic added by commit 5e8a1e03aa4d ("net: dsa: install secondary
unicast and multicast addresses as host FDB/MDB") zeroes out
db.bridge.num if the switch doesn't support ds->fdb_isolation
(the majority doesn't). This is done for a reason explained in commit
c26933639b54 ("net: dsa: request drivers to perform FDB isolation").

Taking a single code path as example - dsa_port_host_fdb_add() - the
others are similar - the problem is that this function handles:
- DSA_DB_PORT databases, when called from
  dsa_port_standalone_host_fdb_add()
- DSA_DB_BRIDGE databases, when called from
  dsa_port_bridge_host_fdb_add()

So, if dsa_port_host_fdb_add() were to make any change on the
"bridge.num" attribute of the database, this would only be correct for a
DSA_DB_BRIDGE, and a type confusion for a DSA_DB_PORT bridge.

However, this bug is without consequences, for 2 reasons:

- dsa_port_standalone_host_fdb_add() is only called from code which is
  (in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that
  function only returns true if ds->fdb_isolation is set. So, the code
  only executed for DSA_DB_BRIDGE databases.

- Even if the code was not dead for DSA_DB_PORT, we have the following
  memory layout:

struct dsa_bridge {
struct net_device *dev;
unsigned int num;
bool tx_fwd_offload;
refcount_t refcount;
};

struct dsa_db {
enum dsa_db_type type;

union {
const struct dsa_port *dp; // DSA_DB_PORT
struct dsa_lag lag;
struct dsa_bridge bridge; // DSA_DB_BRIDGE
};
};

So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of
type DSA_DB_PORT would access memory which is unused, because we only
use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address
with dsa_db :: dev for DSA_DB_BRIDGE, thanks to the union definition.

It is correct to fix up dsa_db :: bridge :: num only from code paths
that come from the bridge / switchdev, so move these there.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329133819.697642-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ksz884x: remove unused change variable

clang with W=1 reports
drivers/net/ethernet/micrel/ksz884x.c:3216:6: error: variable
  'change' set but not used [-Werror,-Wunused-but-set-variable]
        int change = 0;
            ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230329125929.1808420-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

Conflicts:

drivers/net/ethernet/mediatek/mtk_ppe.c
3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
924531326e2d ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.3-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and WPAN.

  Still quite a few bugs from this release. This pull is a bit smaller
  because major subtrees went into the previous one. Or maybe people
  took spring break off?

  Current release - regressions:

   - phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement

  Current release - new code bugs:

   - eth: wangxun: fix vector length of interrupt cause

   - vsock/loopback: consistently protect the packet queue with
     sk_buff_head.lock

   - virtio/vsock: fix header length on skb merging

   - wpan: ca8210: fix unsigned mac_len comparison with zero

  Previous releases - regressions:

   - eth: stmmac: don't reject VLANs when IFF_PROMISC is set

   - eth: smsc911x: avoid PHY being resumed when interface is not up

   - eth: mtk_eth_soc: fix tx throughput regression with direct 1G links

   - eth: bnx2x: use the right build_skb() helper after core rework

   - wwan: iosm: fix 7560 modem crash on use on unsupported channel

  Previous releases - always broken:

   - eth: sfc: don't overwrite offload features at NIC reset

   - eth: r8169: fix RTL8168H and RTL8107E rx crc error

   - can: j1939: prevent deadlock by moving j1939_sk_errqueue()

   - virt: vmxnet3: use GRO callback when UPT is enabled

   - virt: xen: don't do grant copy across page boundary

   - phy: dp83869: fix default value for tx-/rx-internal-delay

   - dsa: ksz8: fix multiple issues with ksz8_fdb_dump

   - eth: mvpp2: fix classification/RSS of VLAN and fragmented packets

   - eth: mtk_eth_soc: fix flow block refcounting logic

  Misc:

   - constify fwnode pointers in SFP handling"

* tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
  net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
  net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
  net: ethernet: mtk_eth_soc: fix flow block refcounting logic
  net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
  net: dsa: sync unicast and multicast addresses for VLAN filters too
  net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
  xen/netback: use same error messages for same errors
  test/vsock: new skbuff appending test
  virtio/vsock: WARN_ONCE() for invalid state of socket
  virtio/vsock: fix header length on skb merging
  bnxt_en: Add missing 200G link speed reporting
  bnxt_en: Fix typo in PCI id to device description string mapping
  bnxt_en: Fix reporting of test result in ethtool selftest
  i40e: fix registers dump after run ethtool adapter self test
  bnx2x: use the right build_skb() helper
  net: ipa: compute DMA pool size properly
  net: wwan: iosm: fixes 7560 modem crash
  net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
  ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
  ice: add profile conflict check for AVF FDIR
  ...

Merge tag 'for-6.3/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Fix two DM core bugs in the code that handles splitting "abnormal" IO
   (discards, write same and secure erase) and issuing that IO to the
   correct underlying devices (and offsets within those devices).

* tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix __send_duplicate_bios() to always allow for splitting IO
  dm: fix improper splitting for abnormal bios

wifi: clean up erroneously introduced file

Evidently Gregory sent this file but I (apparently every else) missed
it entirely, remove that.

Fixes: cf85123a210f ("wifi: iwlwifi: mvm: support enabling and disabling HW timestamping")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
"Two regression fixes in here, otherwise just the usual stuff:

   - i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and
     more

   - amdgpu: dp mst and hibernate regression fix

   - etnaviv: revert fdinfo support (incl drm/sched revert), leak fix

   - misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit
     fixes"

* tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
  Revert "drm/scheduler: track GPU active time per entity"
  Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
  drm/etnaviv: fix reference leak when mmaping imported buffer
  drm/amdgpu: allow more APUs to do mode2 reset when go to S4
  drm/amd/display: Take FEC Overhead into Timeslot Calculation
  drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
  drm: test: Fix 32-bit issue in drm_buddy_test
  drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
  drm/nouveau/kms: Fix backlight registration
  drm/i915/perf: Drop wakeref on GuC RC error
  drm/i915/dpt: Treat the DPT BO as a framebuffer
  drm/i915/gem: Flush lmem contents after construction
  drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
  drm/i915: Disable DC states for all commits
  drm/i915: Workaround ICL CSC_MODE sticky arming
  drm/i915: Add a .color_post_update() hook
  drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
  drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
  drm/i915/pmu: Use functions common with sysfs to read actual freq
  accel/ivpu: Fix IPC buffer header status field value
  ...

netfilter: ctnetlink: Support offloaded conntrack entry deletion

Currently, offloaded conntrack entries (flows) can only be deleted
after they are removed from offload, which is either by timeout,
tcp state change or tc ct rule deletion. This can cause issues for
users wishing to manually delete or flush existing entries.

Support deletion of offloaded conntrack entries.

Example usage:
# Delete all offloaded (and non offloaded) conntrack entries
# whose source address is 1.2.3.4
$ conntrack -D -s 1.2.3.4
# Delete all entries
$ conntrack -F

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: Correct documentation errors in nf_tables.h

NFTA_RANGE_OP incorrectly says nft_cmp_ops instead of nft_range_ops.
NFTA_LOG_GROUP and NFTA_LOG_QTHRESHOLD claim NLA_U32 instead of NLA_U16
NFTA_EXTHDR_SREG isn't documented as a register

Signed-off-by: Matthieu De Beule <matthieu.debeule@proton.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: enable classid socket info retrieval

This enables associating a socket with a v1 net_cls cgroup. Useful for
applying a per-cgroup policy when processing packets in userspace.

Signed-off-by: Eric Sage <eric_sage@apple.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_log: remove rcu_bh usage

structure is free'd via call_rcu, so its safe to use rcu_read_lock only.

While at it, skip rcu_read_lock for lookup from packet path, its always
called with rcu held.

Signed-off-by: Florian Westphal <fw@strlen.de>

dm: fix __send_duplicate_bios() to always allow for splitting IO

Commit 7dd76d1feec70 ("dm: improve bio splitting and associated IO
accounting") only called setup_split_accounting() from
__send_duplicate_bios() if a single bio were being issued. But the case
where duplicate bios are issued must call it too.

Otherwise the bio won't be split and resubmitted (via recursion through
block core back to DM) to submit the later portions of a bio (which may
map to an entirely different target).

For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048

Before (broken, discards the first striped target's devices twice):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528

After (works as expected):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528

Fixes: 7dd76d1feec70 ("dm: improve bio splitting and associated IO accounting")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

dm: fix improper splitting for abnormal bios

"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().

It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.

For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048

Before this fix:

device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920

device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920

After this fix;

device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872

Fixes: 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow

The cache needs to be flushed to ensure that the hardware stops offloading
the flow immediately.

Fixes: 33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload

Check for skb metadata in order to detect the case where the DSA header
is not present.

Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: fix flow block refcounting logic

Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
call flow_block_cb_incref for a newly allocated cb.
Also fix the accidentally inverted refcount check on unbind.

Fixes: 502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()

Reported on the Turris forum, mvneta provokes kernel warnings in the
architecture DMA mapping code when mvneta_setup_txqs() fails to
allocate memory. This happens because when mvneta_cleanup_txqs() is
called in the mvneta_stop() path, we leave pointers in the structure
that have been freed.

Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
will walk all the queues freeing any non-NULL pointers - which includes
pointers that were previously freed in mvneta_stop().

Fix this by setting these pointers to NULL to prevent double-freeing
of the same memory.

Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO")
Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: sync unicast and multicast addresses for VLAN filters too

If certain conditions are met, DSA can install all necessary MAC
addresses on the CPU ports as FDB entries and disable flooding towards
the CPU (we call this RX filtering).

There is one corner case where this does not work.

ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set swp0 up
ip link add link swp0 name swp0.100 type vlan id 100
ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100

Traffic through swp0.100 is broken, because the bridge turns on VLAN
filtering in the swp0 port (causing RX packets to be classified to the
FDB database corresponding to the VID from their 802.1Q header), and
although the 8021q module does call dev_uc_add() towards the real
device, that API is VLAN-unaware, so it only contains the MAC address,
not the VID; and DSA's current implementation of ndo_set_rx_mode() is
only for VID 0 (corresponding to FDB entries which are installed in an
FDB database which is only hit when the port is VLAN-unaware).

It's interesting to understand why the bridge does not turn on
IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
that this is a regression caused by the logic in commit 2796d0c648c9
("bridge: Automatically manage port promiscuous mode."). After all,
a bridge port needs to have IFF_PROMISC by its very nature - it needs to
receive and forward frames with a MAC DA different from the bridge
ports' MAC addresses.

While that may be true, when the bridge is VLAN-aware *and* it has a
single port, there is no real reason to enable promiscuity even if that
is an automatic port, with flooding and learning (there is nowhere for
packets to go except to the BR_FDB_LOCAL entries), and this is how the
corner case appears. Adding a second automatic interface to the bridge
would make swp0 promisc as well, and would mask the corner case.

Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
pass a VLAN ID), the only way to address that problem is to install host
FDB entries for the cartesian product of RX filtering MAC addresses and
VLAN RX filters.

Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only

Do not set the MV88E6XXX_PORT_CTL0_IGMP_MLD_SNOOP bit on CPU or DSA ports.

This allows the host CPU port to be a regular IGMP listener by sending out
IGMP Membership Reports, which would otherwise not be forwarded by the
mv88exxx chip, but directly looped back to the CPU port itself.

Fixes: 54d792f257c6 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
Signed-off-by: Steffen Bätz <steffen@innosonix.de>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329150140.701559-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes

- revert gpu time fdinfo support
- reference leak fix on imported buffers

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/de8e08c2599ec0e22456ae36e9757b9ff14c2124.camel@pengutronix.de

Merge tag 'amd-drm-fixes-6.3-2023-03-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.3-2023-03-30:

amdgpu:
- Hibernation regression fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330153859.18332-1-alexander.deucher@amd.com

Merge tag 'drm-misc-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

* various ivpu fixes
* fix nouveau backlight registration
* fix buddy allocator in 32-bit systems

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330141006.GA22908@linux-uq9g

ice: remove comment about not supporting driver reinit

Since commit 31c8db2c4fa7 ("ice: implement devlink reinit action"), the ice
driver does support driver re-initialization via devlink reload. Remove the
stale comment indicating that the driver lacks this support from the
ice_devlink_ops structure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Documentation/eth/intel: Remove references to SourceForge

The out-of-tree driver is hosted on SourceForge, as this does not apply
to the kernel driver remove references to it. Also do some minor
formatting changes around this section.

Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

Documentation/eth/intel: Update address for driver support

Update the email address for support to use Intel Wired LAN, the mailing
list used for kernel development.

Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

Merge tag 'amd-drm-fixes-6.3-2023-03-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.3-2023-03-29:

amdgpu:
- Two DP MST fixes

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230329220059.7622-1-alexander.deucher@amd.com

Merge tag 'drm-intel-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v6.3-rc5:
- Fix PMU support by reusing functions with sysfs
- Fix a number of issues related to color, PSR and arm/noarm
- Fix state check related to ICL PHY ownership check in TC-cold state
- Flush lmem contents after construction
- Fix hibernate oops related to DPT BO
- Fix perf stream error path wakeref balance

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87355m4gtm.fsf@intel.com

Merge tag 'sound-6.3-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes:

   - A potential deadlock fix for USB-audio, involving some change in
     PCM core side

   - A regression fix for probes of USB-audio devices with the
     vendor-specific PCM format bits

   - Two regression fixes for the old YMFPCI driver

   - A few HD-audio quirks as usual"

* tag 'sound-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
  ALSA: ymfpci: Fix BUG_ON in probe function
  ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
  ALSA: usb-audio: Fix regression on detection of Roland VS-100
  ALSA: hda/realtek: Fix support for Dell Precision 3260
  ALSA: usb-audio: Fix recursive locking at XRUN during syncing
  ALSA: hda/conexant: Partial revert of a quirk for Lenovo
  ALSA: hda/realtek: Add quirks for some Clevo laptops

Merge tag 'zonefs-6.3-rc5' of git://git./linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

- Make sure to always invalidate the last page of an inode straddling
   inode->i_size to avoid data inconsistencies with appended data when
   the device zone write granularity does not match the page size.

- Do not propagate iomap -ENOBLK error to userspace and use -EBUSY
   instead.

* tag 'zonefs-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
  zonefs: Always invalidate last cached page on append write

Revert "drm/scheduler: track GPU active time per entity"

This reverts commit df622729ddbf as it introduces a use-after-free,
which isn't easy to fix without going back to the design drawing board.

Reported-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"

This reverts commit 97804a133c68, as it builds on top of df622729ddbf
("drm/scheduler: track GPU active time per entity") which needs to be
reverted, as it introduces a use-after-free.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

drm/etnaviv: fix reference leak when mmaping imported buffer

drm_gem_prime_mmap() takes a reference on the GEM object, but before that
drm_gem_mmap_obj() already takes a reference, which will be leaked as only
one reference is dropped when the mapping is closed. Drop the extra
reference when dma_buf_mmap() succeeds.

Cc: stable@vger.kernel.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

drm/amdgpu: allow more APUs to do mode2 reset when go to S4

Skip mode2 reset only for IMU enabled APUs when do S4.

This patch is to fix the regression issue
https://gitlab.freedesktop.org/drm/amd/-/issues/2483
It is generated by commit b589626674de ("drm/amdgpu: skip ASIC reset
for APUs when go to S4").

Fixes: b589626674de ("drm/amdgpu: skip ASIC reset for APUs when go to S4")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2483
Tested-by: Yuan Perry <Perry.Yuan@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

xen/netback: use same error messages for same errors

Issue the same error message in case an illegal page boundary crossing
has been detected in both cases where this is tested.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20230329080259.14823-1-jgross@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

smsc911x: remove superfluous variable init

phydev is assigned a value right away, no need to initialize it.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230329064414.25028-1-wsa+renesas@sang-engineering.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space

The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space trhough iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>

zonefs: Always invalidate last cached page on append write

When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode which might have been cached
already by buffered reads, page faults with mmap-read or non-direct
readahead. To ensure that the on-disk and cached data is consistant for
such last cached page, make sure to always invalidate it in
zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to
userspace to differentiate from IO errors.

This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).

Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>

Merge branch 'net-rps-rfs-improvements'

Eric Dumazet says:

====================
net: rps/rfs improvements

Jason Xing attempted to optimize napi_schedule_rps() by avoiding
unneeded NET_RX_SOFTIRQ raises: [1], [2]

This is quite complex to implement properly. I chose to implement
the idea, and added a similar optimization in ____napi_schedule()

Overall, in an intensive RPC workload, with 32 TX/RX queues with RFS
I was able to observe a ~10% reduction of NET_RX_SOFTIRQ
invocations.

While this had no impact on throughput or cpu costs on this synthetic
benchmark, we know that firing NET_RX_SOFTIRQ from softirq handler
can force __do_softirq() to wakeup ksoftirqd when need_resched() is true.
This can have a latency impact on stressed hosts.

[1] https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
[2] https://lore.kernel.org/netdev/20230328142112.12493-1-kerneljasonxing@gmail.com/
====================

Link: https://lore.kernel.org/r/20230328235021.1048163-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: optimize ____napi_schedule() to avoid extra NET_RX_SOFTIRQ

____napi_schedule() adds a napi into current cpu softnet_data poll_list,
then raises NET_RX_SOFTIRQ to make sure net_rx_action() will process it.

Idea of this patch is to not raise NET_RX_SOFTIRQ when being called indirectly
from net_rx_action(), because we can process poll_list from this point,
without going to full softirq loop.

This needs a change in net_rx_action() to make sure we restart
its main loop if sd->poll_list was updated without NET_RX_SOFTIRQ
being raised.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: optimize napi_schedule_rps()

Based on initial patch from Jason Xing.

Idea is to not raise NET_RX_SOFTIRQ from napi_schedule_rps()
when we queued a packet into another cpu backlog.

We can do this only in the context of us being called indirectly
from net_rx_action(), to have the guarantee our rps_ipi_list
will be processed before we exit from net_rx_action().

Link: https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: add softnet_data.in_net_rx_action

We want to make two optimizations in napi_schedule_rps() and
____napi_schedule() which require to know if these helpers are
called from net_rx_action(), instead of being called from
other contexts.

sd.in_net_rx_action is only read/written by the owning cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: napi_schedule_rps() cleanup

napi_schedule_rps() return value is ignored, remove it.

Change the comment to clarify the intent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wifi: iwlwifi: mvm: correctly use link in iwl_mvm_sta_del()

This function can be invoked for both MLO and non-MLO, so
it must deal with multi-link correctly. Notable, on auth
timeout, we'd otherwise get a warning due to the erroneous
deflink usage in MLO cases.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.b85f6052d51a.Iedfef4b4c4f3ca557aebc0093fdc3f5cfb49b507@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: separate AP link management queues

The link management queues associated with the broadcast stations
were forgotten and so the same queue was used with both broadcast
stations. This leads to lost frames and warnings on cleanup and
HW restart.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.0671fa976832.Id5aa9856fd5984e447f247e6d0c3979d9794a21a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: free probe_resp_data later

In the MLD code, we free probe_resp_data before we remove
the MAC from the firmware, so we might receive another one
from the device after freeing, and thus might leak it. Fix
that by moving the free later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.152b1715fc13.Ibd37fed1b24cd25012923ad9170d1fe33ab35c5c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: bump FW API to 75 for AX devices

Start supporting API version 75 for AX devices.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.f08a27944fc6.Iafe3a2db2b91072a559038b85eca7b6b322be3ff@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: move max_agg_bufsize into host TLC lq_sta

This field is used only for host TLC, so it can reside inside
the corresponding lq_sta struct. Also, TLC lq_sta is cleared
in iwl_mvm_rs_rate_init() upon association, but max_agg_bufsize
is set earlier in iwl_mvm_sta_init(). Thus, place this field
in the persistent part of lq_sta to retain its value.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.d55361064e39.Ib79d30f27d94607d097f0192af2aacd455a17958@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: send full STA during HW restart

By using the internal station add the station is installed in
firmware with zeroed MAC addresses, which is wrong. Use the
full installation function instead, to fill all data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.62d5371bb3c7.Ie25b62125a3a022f76a36bae5fed9796c18698aa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: rework active links counting

Remove fw_active_links_num counter since we now have a bitmap of
active links in vif. Also, update link activation status only when
LINK_CONTEXT_MODIFY_ACTIVE bit set in changes parameter.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.1ecfb27b6b84.I3a5e0bc32b3728e4caae8a231bc3f04ea1d89cad@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: update mac config when assigning chanctx

Some mac parameters, such as HE support, can change at this stage.
Update mac config before updating links.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.d7882a0d6e04.Ie38cd854a237c46cf85fd7143dc757326f30da6e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: use the correct link queue

For bcase/mcast tx frames use the link queue.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.ccd7218e4be2.I40f608a0441190cc26137b039f7cb7b065fd4e0c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: clean up mac_id vs. link_id in MLD sta

Here we always have a link ID, not MAC ID, so clean up the
naming.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100040.3def62de34b5.I10c9cf5dbfd1fc1e9c9c7d6d4cefcf0c08f1f2da@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: fix station link data leak

When we remove the station, we call iwl_mvm_sta_del() which
returns true if we cannot remove it from the firmware yet,
which happens if this is the station ID for the AP station
that's still used because the MAC is still associated.

However, we still must free the link data as the station is
only kept alive in the firmware, in mac80211 and driver the
data structures are destroyed.

To fix that, we need to make iwl_mvm_mld_free_sta_link()
track whether or not the station is still alive in FW, as
otherwise we might reuse the station ID in the meantime and
iwl_mvm_mld_rm_sta_from_fw() would reject the later delete
from the firmware. Add an argument to it for that. Then we
can use the return value of iwl_mvm_sta_del() for that to
fix the issue, and call iwl_mvm_mld_rm_sta_from_fw() only
if we need to not keep the station in FW.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.1d81d4c71f35.I8fc60ac28ffc1147e9b1250e5e6237b3cb5516ac@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: initialize max_rc_amsdu_len per-link

Initialize max_rc_amsdu_len per-link both on state change and when a new
link is added.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.5bf521fe58b8.I73fe585f0ff75d41b5afd32077e3d6e48c90db2a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: use appropriate link for rate selection

We were still using the deflink in most cases, update the code to use
the appropriate link.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.fa1025502fb4.Iaba0c64740fdcf04a521e2f213bd3f3e27862472@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: use the new lockdep-checking macros

Use the new macros from mac80211 that do lockdep checking
on the RCU dereferences, instead of hard-coding 1 as the
argument to rcu_dereference_protected().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.112df5c8dec2.I1a1008f5566e509953d988f254d15c9e58630418@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: remove chanctx WARN_ON

During link switching there might be a link that's marked
active but has no chanctx assigned (so it's not active from
our driver's POV), skip such links in power recalculation
without any warning.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.c629090bd5d2.If7a680d5e349d454f2122f936c21522b9528a55f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: avoid sending MAC context for idle

If we change only idle, avoid sending the MAC context
command since it doesn't depend on idle state. This
also fixes an issue with the firmware because without
this we send the command during link switching, as we
just deactivated the first link, and we cannot send
this command when there's no active link.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.f5218f8453ec.I1325ff14ec07a27dd7ea2c1c210a1721d969839f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: remove only link-specific AP keys

When we remove the AP station, we iterate over the links
and remove all the keys, however, the key iteration will
return all keys for all links, so skip the ones that we
don't need based on the link ID.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.e724878f502e.I66870d4629244b4b309be79e11cbbd384bdf93be@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: skip inactive links

When iterating station links, skip the links that are not
yet active by checking for mvmvif->link[] existence.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.bd9b4e64c478.Ie21422c3bf2589d22942c3c57d26e6330d2e3afc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: adjust iwl_mvm_scan_respect_p2p_go_iter() for MLO

When looking for a GO on for setting the scan parameters, iterate
over all the available links.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.4eb25d5655d0.Ie69f7313e4337f78c262a835aea3f707273a4209@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: rxmq: report link ID to mac80211

Add a fw_id_to_link_sta array in mvm to track the link
STA for each firmware station ID, and then use that to
report the link a frame was received on (since we know
the station ID from firmware).

Notably, this fixes beacon tracking for the correct link
since mac80211 now queues and processes those on the one
link identified by the link ID only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.c7dd3ec18077.I12ef9eb4a5b8b5c2b9d6bcaa1fda73b59eba39d8@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: use bcast/mcast link station id

For an MLD AP, use the correct link mcast or bcast station id.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.0cffa6c45242.I342e17e7bca87b7f05939eb2ebd36fc2aff0b49f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: translate management frame address

For management frames sent by an AP interface, translate
the MLD addresses to LINK addresses (for an MLD AP).
AP (non-bufferable) management frames are sent via
the broadcast station so the translation cannot be
done by the firmware.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.3cb4292f51e8.Ia662c00ff271c70eda927c12ed49b045b1eb8489@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: implement mac80211 callback change_sta_links

Add/removed from iwl driver and firmware station links.
Update the station queues accordingly (which station links
are connected to the station queues).

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.156d1aae5de1.I32973141be1190222169879f8caf7038c1a8f769@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: use the link sta address

Replace the deflink.addr with the proper link address
for setting the peer_link_address in the station command.
For a non-MLD station, it will be the deflink.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230329100039.03ab287da0ae.I88fb5ab4e3ea9c886a3fac7ce09c4791469c3c8e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>