review.tizen.org Git - platform/kernel/linux-starfive.git/log

rxrpc: Show consumed and freed packets as non-dropped in dropwatch

Set a reason when freeing a packet that has been consumed such that
dropwatch doesn't complain that it has been dropped.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: Remove local->defrag_sem

We no longer need local->defrag_sem as all DATA packet transmission is now
done from one thread, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: Don't lock call->tx_lock to access call->tx_buffer

call->tx_buffer is now only accessed within the I/O thread (->tx_sendmsg is
the way sendmsg passes packets to the I/O thread) so there's no need to
lock around it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: Simplify ACK handling

Now that general ACK transmission is done from the same thread as incoming
DATA packet wrangling, there's no possibility that the SACK table will be
being updated by the latter whilst the former is trying to copy it to an
ACK.

This means that we can safely rotate the SACK table whilst updating it
without having to take a lock, rather than keeping all the bits inside it
in fixed place and copying and then rotating it in the transmitter.

Therefore, simplify SACK handing by keeping track of starting point in the
ring and rotate slots down as we consume them.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: De-atomic call->ackr_window and call->ackr_nr_unacked

call->ackr_window doesn't need to be atomic as ACK generation and ACK
transmission are now done in the same thread, so drop the atomic64 handling
and split it into two separate members.

Similarly, call->ackr_nr_unacked doesn't need to be atomic now either.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: Generate extra pings for RTT during heavy-receive call

When doing a call that has a single transmitted data packet and a massive
amount of received data packets, we only ping for one RTT sample, which
means we don't get a good reading on it.

Fix this by converting occasional IDLE ACKs into PING ACKs to elicit a
response.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: Allow a delay to be injected into packet reception

If CONFIG_AF_RXRPC_DEBUG_RX_DELAY=y, then a delay is injected between
packets and errors being received and them being made available to the
processing code, thereby allowing the RTT to be artificially increased.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: Convert call->recvmsg_lock to a spinlock

Convert call->recvmsg_lock to a spinlock as it's only ever write-locked.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

rxrpc: Shrink the tabulation in the rxrpc trace header a bit

Shrink the tabulation in the rxrpc trace header a bit to allow for fields
with long type names that have been removed.

Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Remove whitespace before ')' in trace header

Work around checkpatch warnings in the rxrpc trace header by removing
whitespace before ')' on lines defining the trace record struct.

Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Fix trace string

Fix a trace string to indicate that it's discarding the local endpoint for
a preallocated peer, not a preallocated connection.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

Merge branch 'devlink-next'

Jakub Kicinski says:

====================
devlink: fix reload notifications and remove features

First two patches adjust notifications during devlink reload.
The last patch removes no longer needed devlink features.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>

devlink: remove devlink features

Devlink features were introduced to disallow devlink reload calls of
userspace before the devlink was fully initialized. The reason for this
workaround was the fact that devlink reload was originally called
without devlink instance lock held.

However, with recent changes that converted devlink reload to be
performed under devlink instance lock, this is redundant so remove
devlink features entirely.

Note that mlx5 used this to enable devlink reload conditionally only
when device didn't act as multi port slave. Move the multi port check
into mlx5_devlink_reload_down() callback alongside with the other
checks preventing the device from reload in certain states.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: send objects notifications during devlink reload

Currently, the notifications are only sent for params. People who
introduced other objects forgot to add the reload notifications here.

To make sure all notifications happen according to existing comment,
benefit from existence of devlink_notify_register/unregister() helpers
and use them in reload code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: move devlink reload notifications back in between _down() and _up() calls

This effectively reverts commit 05a7f4a8dff1 ("devlink: Break parameter
notification sequence to be before/after unload/load driver").

Cited commit resolved a problem in mlx5 params implementation,
when param notification code accessed memory previously freed
during reload.

Now, when the params can be registered and unregistered when devlink
instance is registered, mlx5 code unregisters the problematic param
during devlink reload. The fix is therefore no longer needed.

Current behavior is a it problematic, as it sends DEL notifications even
in potential case when reload_down() call fails which might confuse
userspace notifications listener.

So move the reload notifications back where they were originally in
between reload_down() and reload_up() calls.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sparx5-ES2-VCAP-support'

Steen Hegelund says:

====================
Adding Sparx5 ES2 VCAP support

This provides the Egress Stage 2 (ES2) VCAP (Versatile Content-Aware
Processor) support for the Sparx5 platform.

The ES2 VCAP is an Egress Access Control VCAP that uses frame keyfields and
previously classified keyfields to apply e.g. policing, trapping or
mirroring to frames.

The ES2 VCAP has 2 lookups and they are accessible with a TC chain id:

- chain 20000000: ES2 Lookup 0
- chain 20100000: ES2 Lookup 1

As the other Sparx5 VCAPs the ES2 VCAP has its own lookup/port keyset
configuration that decides which keys will be used for matching on which
traffic type.

The ES2 VCAP has these traffic classifications:

- IPv4 frames
- IPv6 frames
- Other frames

The ES2 VCAP can match on an ISDX key (Ingress Service Index) as one of the
frame metadata keyfields. The IS0 VCAP can update this key using its
actions, and this allows a IS0 VCAP rule to be linked to an ES2 rule.

This is similar to how the IS0 VCAP and the IS2 VCAP use the PAG
(Policy Association Group) keyfield to link rules.

From user space this is exposed via "chain offsets", so an IS0 rule with a
"goto chain 20000015" action will use an ISDX key value of 15 to link to a
rule in the ES2 VCAP attached to the same chain id.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains

This enhances the KUNIT test of the VCAP API with tests of the chaining
functionality.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add TC support for the ES2 VCAP

This enables the TC command to use the Sparx5 ES2 VCAP, and provides a new
ES2 ethertype table and handling of rule links between IS0 and ES2.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ingress information to VCAP instance

This allows the check of the goto action to be specific to the ingress and
egress VCAP instances.

The debugfs support is also updated to show this information.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ES2 VCAP keyset configuration for Sparx5

This adds the ES2 VCAP port keyset configuration for Sparx5 and also
updates the debugFS support to show the keyset configuration and the egress
port mask.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add ES2 VCAP model and updated KUNIT VCAP model

This provides the VCAP model for the Sparx5 ES2 (Egress Stage 2) VCAP.

This VCAP provides tagging and remarking functionality

This also renames a VCAP keyfield: VCAP_KF_MIRROR_ENA becomes
VCAP_KF_MIRROR_PROBE, as the first name was caused by a mistake in the
model transformation.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Improve error message when parsing CVLAN filter

This improves the error message when a TC filter with CVLAN tag is used and
the selected VCAP instance does not support this.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Improve the IP frame key match for IPv6 frames

This ensures that it will be possible for a VCAP rule to distinguish IPv6
frames from non-IP frames, as the IS0 keyset usually selected for the IPv6
traffic class in (7TUPLE) does not offer a key that specifies IPv6
directly: only non-IPv4.

The IP_SNAP key ensures that we select (at least) IP frames.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Add support for getting keysets without a type id

When there is only one keyset available for a certain VCAP rule size, the
particular keyset does not need a type id when encoded in the VCAP
Hardware.

This provides support for getting a keyset from a rule, when this is the
case: only one keyset fits this rule size.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batadv-next-pullrequest-20230127' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- drop prandom.h includes, by Sven Eckelmann

- fix mailing list address, by Sven Eckelmann

- multicast feature preparation, by Linus Lüssing (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Add a check for oversized packets

Occasionnaly we may get oversized packets from the hardware which
exceed the nomimal 2KiB buffer size we allocate SKBs with. Add an early
check which drops the packet to avoid invoking skb_over_panic() and move
on to processing the next packet.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: convert to gpio descriptor

The driver can be trivially converted, as it only triggers the gpio
pin briefly to do a reset, and it already only supports DT.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: mux-meson-g12a: use __clk_is_enabled to simplify the code

By using __clk_is_enabled () we can avoid defining an own variable for
tracking whether enable counter is zero.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink: recommend policy range validation

For large ranges (outside of s16) the documentation currently
recommends open-coding the validation, but it's better to use
the NLA_POLICY_FULL_RANGE() or NLA_POLICY_FULL_RANGE_SIGNED()
policy validation instead; recommend that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230127084506.09f280619d64.I5dece85f06efa8ab0f474ca77df9e26d3553d4ab@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-28

We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).

The main changes are:

1) Implement XDP hints via kfuncs with initial support for RX hash and
   timestamp metadata kfuncs, from Stanislav Fomichev and
   Toke Høiland-Jørgensen.
   Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk

2) Extend libbpf's bpf_tracing.h support for tracing arguments of
   kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.

3) Significantly reduce the search time for module symbols by livepatch
   and BPF, from Jiri Olsa and Zhen Lei.

4) Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals, from David Vernet.

5) Fix several issues in the dynptr processing such as stack slot liveness
   propagation, missing checks for PTR_TO_STACK variable offset, etc,
   from Kumar Kartikeya Dwivedi.

6) Various performance improvements, fixes, and introduction of more
   than just one XDP program to XSK selftests, from Magnus Karlsson.

7) Big batch to BPF samples to reduce deprecated functionality,
   from Daniel T. Lee.

8) Enable struct_ops programs to be sleepable in verifier,
   from David Vernet.

9) Reduce pr_warn() noise on BTF mismatches when they are expected under
   the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.

10) Describe modulo and division by zero behavior of the BPF runtime
    in BPF's instruction specification document, from Dave Thaler.

11) Several improvements to libbpf API documentation in libbpf.h,
    from Grant Seltzer.

12) Improve resolve_btfids header dependencies related to subcmd and add
    proper support for HOSTCC, from Ian Rogers.

13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
    helper along with BPF selftests, from Ziyang Xuan.

14) Simplify the parsing logic of structure parameters for BPF trampoline
    in the x86-64 JIT compiler, from Pu Lehui.

15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
    Rust compilation units with pahole, from Martin Rodriguez Reboredo.

16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
    from Kui-Feng Lee.

17) Disable stack protection for BPF objects in bpftool given BPF backends
    don't support it, from Holger Hoffstätte.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
  selftest/bpf: Make crashes more debuggable in test_progs
  libbpf: Add documentation to map pinning API functions
  libbpf: Fix malformed documentation formatting
  selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
  selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
  bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
  bpf/selftests: Verify struct_ops prog sleepable behavior
  bpf: Pass const struct bpf_prog * to .check_member
  libbpf: Support sleepable struct_ops.s section
  bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
  selftests/bpf: Fix vmtest static compilation error
  tools/resolve_btfids: Alter how HOSTCC is forced
  tools/resolve_btfids: Install subcmd headers
  bpf/docs: Document the nocast aliasing behavior of ___init
  bpf/docs: Document how nested trusted fields may be defined
  bpf/docs: Document cpumask kfuncs in a new file
  selftests/bpf: Add selftest suite for cpumask kfuncs
  selftests/bpf: Add nested trust selftests suite
  bpf: Enable cpumasks to be queried and used as kptrs
  bpf: Disallow NULLable pointers for trusted kfuncs
  ...
====================

Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: Remove 4s sleep during carrier detection

This patch removes the msleep(4s) during netpoll_setup() if the carrier
appears instantly.

Here are some scenarios where this workaround is counter-productive in
modern ages:

Servers which have BMC communicating over NC-SI via the same NIC as gets
used for netconsole. BMC will keep the PHY up, hence the carrier
appearing instantly.

The link is fibre, SERDES getting sync could happen within 0.1Hz, and
the carrier also appears instantly.

Other than that, if a driver is reporting instant carrier and then
losing it, this is probably a driver bug.

Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230125185230.3574681-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e47 ("ice: move devlink port creation/deletion")
  643ef23bd9dd ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: fix tristate and help description

Fix description for tristate and help sections which include inaccurate
information.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230126190110.9124-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-xdp-execute-xdp_do_flush-before-napi_complete_done'

Magnus Karlsson says:

====================
net: xdp: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found in [1].

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in [2].

The drivers have only been compile-tested since I do not own any of
the HW below. So if you are a maintainer, it would be great if you
could take a quick look to make sure I did not mess something up.

Note that these were the drivers I found that violated the ordering by
running a simple script and manually checking the ones that came up as
potential offenders. But the script was not perfect in any way. There
might still be offenders out there, since the script can generate
false negatives.

[1] https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
[2] https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
====================

Link: https://lore.kernel.org/r/20230125074901.2737-1-magnus.karlsson@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-eth: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: d678be1dc1ec ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa_eth: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: a1e031ffb422 ("dpaa_eth: add XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio-net: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

lan966x: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: a825b611c7c1 ("net: lan966x: Add support for XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qede: execute xdp_do_flush() before napi_complete_done()

Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.

The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.

Fixes: d1b25b79e162b ("qede: add .ndo_xdp_xmit() and XDP_REDIRECT support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest/bpf: Make crashes more debuggable in test_progs

Reset stdio before printing verbose log of the SIGSEGV'ed test.
Otherwise, it's hard to understand what's going on in the cases like [0].

With the following patch applied:

--- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -392,6 +392,11 @@ void test_xdp_metadata(void)
       "generate freplace packet"))
goto out;

+
+ ASSERT_EQ(1, 2, "oops");
+ int *x = 0;
+ *x = 1; /* die */
+
while (!retries--) {
if (bpf_obj2->bss->called)
break;

Before:

#281     xdp_metadata:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1f)[0x55c919d98bcf]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bf90)[0x7f36aea5df90]
./test_progs(test_xdp_metadata+0x1db0)[0x55c919d8c6d0]
./test_progs(+0x23b438)[0x55c919d9a438]
./test_progs(main+0x534)[0x55c919d99454]
/lib/x86_64-linux-gnu/libc.so.6(+0x2718a)[0x7f36aea4918a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f36aea49245]
./test_progs(_start+0x21)[0x55c919b82ef1]

After:

test_xdp_metadata:PASS:ip netns add xdp_metadata 0 nsec
open_netns:PASS:malloc token 0 nsec
open_netns:PASS:open /proc/self/ns/net 0 nsec
open_netns:PASS:open netns fd 0 nsec
open_netns:PASS:setns 0 nsec
..
test_xdp_metadata:FAIL:oops unexpected oops: actual 1 != expected 2
#281     xdp_metadata:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1f)[0x562714a76bcf]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bf90)[0x7fa663f9cf90]
./test_progs(test_xdp_metadata+0x1db0)[0x562714a6a6d0]
./test_progs(+0x23b438)[0x562714a78438]
./test_progs(main+0x534)[0x562714a77454]
/lib/x86_64-linux-gnu/libc.so.6(+0x2718a)[0x7fa663f8818a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7fa663f88245]
./test_progs(_start+0x21)[0x562714860ef1]

0: https://github.com/kernel-patches/bpf/actions/runs/4019879316/jobs/6907358876

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230127215705.1254316-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

libbpf: Add documentation to map pinning API functions

This adds documentation for the following API functions:

- bpf_map__set_pin_path()
- bpf_map__pin_path()
- bpf_map__is_pinned()
- bpf_map__pin()
- bpf_map__unpin()
- bpf_object__pin_maps()
- bpf_object__unpin_maps()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024225.520685-1-grantseltzer@gmail.com

libbpf: Fix malformed documentation formatting

This fixes the doxygen format documentation above the
user_ring_buffer__* APIs. There has to be a newline
before the @brief, otherwise doxygen won't render them
for libbpf.readthedocs.org.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024749.522278-1-grantseltzer@gmail.com

Merge branch 'devlink-parama-cleanup'

Jiri Pirko says:

====================
devlink: Cleanup params usage

This patchset takes care of small cleanup of devlink params usage.
Some of the patches (first 2/3) are cosmetic, but I would like to
point couple of interesting ones:

Patch 9 is the main one of this set and introduces devlink instance
locking for params, similar to other devlink objects. That allows params
to be registered/unregistered when devlink instance is registered.

Patches 10-12 change mlx5 code to register non-driverinit params in the
code they are related to, and thanks to patch 8 this might be when
devlink instance is registered - for example during devlink reload.

---
v1->v2:
- Just small fix in the last patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Move eswitch port metadata devlink param to flow eswitch code

Move the param registration and handling code into the eswitch offloads
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Move flow steering devlink param to flow steering code

Move the param registration and handling code into the flow steering
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Move fw reset devlink param to fw reset code

Move the param registration and handling code into the fw reset code
as they are related to each other. No point in having the devlink param
registration done in separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: protect devlink param list by instance lock

Commit 1d18bb1a4ddd ("devlink: allow registering parameters after
the instance") as the subject implies introduced possibility to register
devlink params even for already registered devlink instance. This is a
bit problematic, as the consistency or params list was originally
secured by the fact it is static during devlink lifetime. So in order to
protect the params list, take devlink instance lock during the params
operations. Introduce unlocked function variants and use them in drivers
in locked context. Put lock assertions to appropriate places.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: put couple of WARN_ONs in devlink_param_driverinit_value_get()

Put couple of WARN_ONs in devlink_param_driverinit_value_get() function
to clearly indicate, that it is a driver bug if used without reload
support or for non-driverinit param.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: make devlink_param_driverinit_value_set() return void

devlink_param_driverinit_value_set() currently returns int with possible
error, but no user is checking it anyway. The only reason for a fail is
a driver bug. So convert the function to return void and put WARN_ONs
on error paths.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: remove pointless call to devlink_param_driverinit_value_set()

devlink_param_driverinit_value_set() call makes sense only for "
driverinit" params. However here, the param is "runtime".
devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case
and does not do anything. So remove the pointless call to
devlink_param_driverinit_value_set() entirely.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: remove pointless calls to devlink_param_driverinit_value_set()

devlink_param_driverinit_value_set() call makes sense only for
"driverinit" params. However here, both params are "runtime".
devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case
and does not do anything. So remove the pointless calls to
devlink_param_driverinit_value_set() entirely.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: don't work with possible NULL pointer in devlink_param_unregister()

There is a WARN_ON checking the param_item for being NULL when the param
is not inserted in the list. That indicates a driver BUG. Instead of
continuing to work with NULL pointer with its consequences, return.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: make devlink_param_register/unregister static

There is no user outside the devlink code, so remove the export and make
the functions static. Move them before callers to avoid forward
declarations.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Covert devlink params registration to use devlink_params_register/unregister()

Since mlx5 is the only user of devlink API to register/unregister a
single param, convert it to use array registration function allowing to
simplify the devlink API by removing the single param registration
functions.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Change devlink param register/unregister function names

The functions are registering and unregistering devlink params, so
change the names accordingly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-netlink-next'

Jakub Kicinski says:

====================
ethtool: netlink: handle SET intro/outro in the common code

Factor out the boilerplate code from SET handlers to common code.

I volunteered to refactor the extack in GET in a conversation
with Vladimir but I gave up.

The handling of failures during dump in GET handlers is a bit
unclear to me. Some code uses presence of info as indication
of dump and tries to avoid reporting errors altogether
(including extack messages).

There's also the question of whether we should have a validation
callback (similar to .set_validate here) for GET. It looks like
.parse_request was expected to perform the validation. It takes
the extack and tb directly, not via info:

int (*parse_request)(struct ethnl_req_info *req_info,
     struct nlattr **tb,
     struct netlink_ext_ack *extack);

int (*prepare_data)(const struct ethnl_req_info *req_info,
    struct ethnl_reply_data *reply_data,
    struct genl_info *info);

so no crashes dereferencing info possible.

But .parse_request doesn't run under rtnl nor ethnl_ops_begin().
As a result some implementations defer validation until .prepare_data
where all the locks are held and they can call out to the driver.

All this makes me think that maybe we should refactor GET in the
same direction I'm refactoring SET. Split .prepare_data, take
more locks in the core, and add a validation helper which would
take extack directly:

    - ret = ops->prepare_data(req_info, reply_data, info);
    + ret = ops->prepare_data_validate(req_info, reply_data, attrs, extack);
    + if (ret < 1) // if 0 -> skip for dump; -EOPNOTSUPP in do
    +   goto err1;
    +
    + ret = ethnl_ops_begin(dev);
    + if (ret)
    +   goto err1;
    +
    + ret = ops->prepare_data(req_info, reply_data); // no extack
    + ethnl_ops_complete(dev);

I'll file that away as a TODO for posterity / older me.

v2:
- invert checks for coalescing to avoid error code changes
- rebase and convert MM as well

v1: https://lore.kernel.org/all/20230121054430.642280-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: netlink: convert commands to common SET

Convert all SET commands where new common code is applicable.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: netlink: handle SET intro/outro in the common code

Most ethtool SET callbacks follow the same general structure.

  ethnl_parse_header_dev_get()
  rtnl_lock()
  ethnl_ops_begin()

  ... do stuff ...

  ethtool_notify()
  ethnl_ops_complete()
  rtnl_unlock()
  ethnl_parse_header_dev_put()

This leads to a lot of copy / pasted code an bugs when people
mis-handle the error path.

Add a generic implementation of this pattern with a .set callback
in struct ethnl_request_ops called to "do stuff".

Also add an optional .set_validate which is called before
ethnl_ops_begin() -- a lot of implementations do basic request
capability / sanity checking at that point.

Because we want to avoid generating the notification when
no change happened - adopt a slightly hairy return values:
- 0 means nothing to do (no notification)
- 1 means done / continue
- negative error codes on error

Reuse .hdr_attr from struct ethnl_request_ops, GET and SET
use the same attr spaces in all cases.

Convert pause as an example (and to avoid unused function warnings).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: convert to regmap read/write API

Convert qca8k to regmap read/write bulk API. The mgmt eth can write up
to 32 bytes of data at times. Currently we use a custom function to do
it but regmap now supports declaration of read/write bulk even without a
bus.

Drop the custom function and rework the regmap function to this new
implementation.

Rework the qca8k_fdb_read/write function to use the new
regmap_bulk_read/write as the old qca8k_bulk_read/write are now dropped.

Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: add QCA8K_ATU_TABLE_SIZE define for fdb access

Add and use QCA8K_ATU_TABLE_SIZE instead of hardcoding the ATU size with
a pure number and using sizeof on the array.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-skbuff-includes'

Jakub Kicinski says:

====================
net: skbuff: clean up unnecessary includes

skbuff.h is included in a significant portion of the tree.
Clean up unused dependencies to speed up builds.

This set only takes care of the most obvious cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove unnecessary includes from net/flow.h

This file is included by a lot of other commonly included
headers, it doesn't need socket.h or flow_dissector.h.

This reduces the size of this file after pre-processing
from 28165 to 4663.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/hrtimer.h include

linux/hrtimer.h include was added because apparently it used
to contain ktime related code. This is no longer the case
and we include linux/time.h explicitly.

Sadly this change is currently a noop because linux/dma-mapping.h
and net/page_pool.h pull in half of the universe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/splice.h include

splice.h is included since commit a60e3cc7c929 ("net: make
skb_splice_bits more configureable") but really even then
all we needed is some forward declarations. Most of that
code is now gone, and remaining has fwd declarations.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add missing includes of linux/splice.h

Number of files depend on linux/splice.h getting included
by linux/skbuff.h which soon will no longer be the case.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/sched.h include

linux/sched.h was added for skb_mstamp_* (all the way back
before linux/sched.h got split and linux/sched/clock.h created).
We don't need it in skbuff.h any more.

Sadly this change is currently a noop because linux/dma-mapping.h
and net/page_pool.h pull in half of the universe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/sched/clock.h include

It used to be necessary for skb_mstamp_* static inlines,
but those are gone since we moved to usec timestamps in
TCP, in 2017.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add missing includes of linux/sched/clock.h

Number of files depend on linux/sched/clock.h getting included
by linux/skbuff.h which soon will no longer be the case.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/textsearch.h include

This include was added for skb_find_text() but all we need there
is a forward declaration of struct ts_config.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: checksum: drop the linux/uaccess.h include

net/checksum.h pulls in linux/uaccess.h which is large.

In the x86 header the include seems to not be needed at all.
ARM on the other hand does not include uaccess.h, even tho
it calls access_ok().

In the generic implementation guard the include of linux/uaccess.h
with the same condition as the code that needs it.

With this change pre-processed net/checksum.h shrinks on x86
from 30616 lines to just 1193.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: drop the linux/net.h include

It appears nothing needs it. The kernel builds fine with this
include removed, building an otherwise empty source file with:

#include <linux/skbuff.h>
#ifdef _LINUX_NET_H
#error linux/net.h is back
#endif

works too (meaning net.h is not just pulled in indirectly).

This gives us a slight 0.5% reduction in the pre-processed size
of skbuff.h.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add missing includes of linux/net.h

linux/net.h will soon not be included by linux/skbuff.h.
Fix the cases where source files were depending on the implicit
include.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-abstract-status'

Alex Elder says:

====================
net: ipa: abstract status parsing

Under some circumstances, IPA generates a "packet status" structure
that describes information about a packet.  This is used, for
example, when offload hardware detects an error in a packet, or
otherwise discovers a packet needs special handling.  In this case,
the status is delivered (along with the packet it describes) to a
"default" endpoint so that it can be handled by the AP.

Until now, the structure of this status information hasn't changed.
However, to support more than 32 endpoints, this structure required
some changes, such that some fields are rearranged in ways that are
tricky to represent using C code.

This series updates code related to the IPA status structure.  The
first patch uses a local variable to avoid recomputing a packet
length more than once.  The second stops using sizeof() to determine
the size of an IPA packet status structure.  Patches 3-5 extend the
definitions for values held in packet status fields.  Patch 6 does a
little general cleanup to make patch 7 simpler.  Patch 7 stops using
a C structure to represent packet status; instead, a new function
fetches values "by name" from a buffer containing such a structure.
The last patch updates this function so it also supports IPA v5.0+.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: add IPA v5.0 packet status support

Update ipa_status_extract() to support IPA v5.0 and beyond. Because
the format of the IPA packet status depends on the version, pass an
IPA pointer to the function.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: introduce generalized status decoder

Stop assuming the IPA packet status has a fixed format (defined by
a C structure). Instead, use a function to extract each field from
a block of data interpreted as an IPA packet status. Define an
enumerated type that identifies the fields that can be extracted.
The current function extracts fields based on the existing
ipa_status structure format (which is no longer used).

Define IPA_STATUS_RULE_MISS, to replace the calls to field_max() to
represent that condition; those depended on the knowing the width of
a filter or router rule in the IPA packet status structure.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: IPA status preparatory cleanups

The next patch reworks how the IPA packet status structure is
interpreted.  This patch does some preparatory work, to make it
easier to see the effect of that change:
  - Change a few functions that access fields in a IPA packet status
    structure to store field values in local variables with names
    related to the field.
  - Pass a void pointer rather than an (equivalent) status pointer
    to two functions called by ipa_endpoint_status_parse().
  - Use "rule" rather than "val" as the name of a variable that
    holds a routing rule ID.
  - Consistently use "IPA packet status" rather than "status
    element" when referring to this data structure.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: define remaining IPA status field values

Define the remaining values for opcode and exception fields in the
IPA packet status structure. Most of these values are powers-of-2,
suggesting they are meant to be used as bitmasks, but that is not
the case. Add comments to be clear about this, and express the
values in decimal format.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: rename the NAT enumerated type

Rename the ipa_nat_en enumerated type to be ipa_nat_type, and rename
its symbols accordingly. Add a comment indicating those values are
also used in the IPA status nat_type field.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: define all IPA status mask bits

There is a 16 bit status mask defined in the IPA packet status
structure, of which only one (TAG_VALID) is currently used.

Define all other IPA status mask values in an enumerated type whose
numeric values are bit mask values (in CPU byte order) in the status
mask. Use the TAG_VALID value from that type rather than defining a
separate field mask.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: stop using sizeof(status)

The IPA packet status structure changes in IPA v5.0 in ways that are
difficult to represent cleanly. As a small step toward redefining
it as a parsed block of data, use a constant to define its size,
rather than the size of the IPA status structure type.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: refactor status buffer parsing

The packet length encoded in an IPA packet status buffer is computed
more than once in ipa_endpoint_status_parse(). It is also checked
again in ipa_endpoint_status_skip(), which that function calls.

Compute the length once, and use that computed value later rather
than recomputing it. Check for it being zero in the parse function
rather than in ipa_endpoint_status_skip().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ocelot: build felix.c into a dedicated kernel module

The build system currently complains:

scripts/Makefile.build:252: drivers/net/dsa/ocelot/Makefile:
felix.o is added to multiple modules: mscc_felix mscc_seville

Since felix.c holds the DSA glue layer, create a mscc_felix_dsa_lib.ko.
This is similar to how mscc_ocelot_switch_lib.ko holds a library for
configuring the hardware.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Colin Foster <colin.foster@in-advantage.com>
Link: https://lore.kernel.org/r/20230125145716.271355-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
virtchnl: update and refactor

Jesse Brandeburg says:

The virtchnl.h file is used by i40e/ice physical function (PF) drivers
and irdma when talking to the iavf driver. This series cleans up the
header file by removing unused elements, adding/cleaning some comments,
fixing the data structures so they are explicitly defined, including
padding, and finally does a long overdue rename of the IWARP members in
the structures to RDMA, since the ice driver and it's associated Intel
Ethernet E800 series adapters support both RDMA and IWARP.

The whole series should result in no functional change, but hopefully
clearer code.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  virtchnl: i40e/iavf: rename iwarp to rdma
  virtchnl: do structure hardening
  virtchnl: update header and increase header clarity
  virtchnl: remove unused structure declaration
====================

Link: https://lore.kernel.org/r/20230125212441.4030014-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata

The existing timestamping_enable() is a no-op because it applies
to the socket-related path that we are not verifying here
anymore. (but still leaving the code around hoping we can
have xdp->skb path verified here as well)

  poll: 1 (0)
  xsk_ring_cons__peek: 1
  0xf64788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
  rx_hash: 3697961069
  rx_timestamp:  1674657672142214773 (sec:1674657672.1422)
  XDP RX-time:   1674657709561774876 (sec:1674657709.5618) delta sec:37.4196
  AF_XDP time:   1674657709561871034 (sec:1674657709.5619) delta
sec:0.0001 (96.158 usec)
  0xf64788: complete idx=8 addr=8000

Also, maybe something to archive here, see [0] for Jesper's note
about NIC vs host clock delta.

0: https://lore.kernel.org/bpf/f3a116dc-1b14-3432-ad20-a36179ef0608@redhat.com/

v2:
- Restore original value (Martin)

Fixes: 297a3f124155 ("selftests/bpf: Simple program to dump XDP RX metadata")
Reported-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230126225030.510629-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Merge branch 'tools-ynl-prevent-reorder-and-fix-flags'

Jakub Kicinski says:

====================
tools: ynl: prevent reorder and fix flags

Some codegen improvements for YAML specs.

First, Lorenzon discovered when switching the XDP feature family
to use flags instead of pure enum that the kdoc got garbled.
The support for enum and flags is therefore unified.

Second when regenerating all families we discussed so far I noticed
that some netlink policies jumped around. We need to ensure we don't
render code based on their ordering in a hash.
====================

Link: https://lore.kernel.org/r/20230126000235.1085551-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: store ops in ordered dict to avoid random ordering

When rendering code we should walk the ops in the order in which
they are declared in the spec. This is both more intuitive and
prevents code from jumping around when hashing in the dict changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: rename ops_list -> msg_list

ops_list contains all the operations, but the main iteration use
case is to walk only ops which define attrs. Rename ops_list to
msg_list, because now it looks like the contents are the same,
just the format is different. While at it convert from tuple
to just keys, none of the users care about the name of the op.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: support kdocs for flags in code generation

Lorenzo reports that after switching from enum to flags netdev
family lost ability to render kdoc (and the enum contents got
generally garbled).

Combine the flags and enum handling in uAPI handling.

Reported-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'convert-drivers-to-return-xfrm-configuration-errors-through-extack'

Leon Romanovsky says:

====================
Convert drivers to return XFRM configuration errors through extack

This series continues effort started by Sabrina to return XFRM configuration
errors through extack. It allows for user space software stack easily present
driver failure reasons to users.

As a note, Intel drivers have a path where extack is equal to NULL, and error
prints won't be available in current patchset. If it is needed, it can be
changed by adding special to Intel macro to print to dmesg in case of
extack == NULL.
====================

Link: https://lore.kernel.org/r/cover.1674560845.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cxgb4: fill IPsec state validation failure reason

Rely on extack to return failure reason.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: fill IPsec state validation failure reason

Rely on extack to return failure reason.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbe: fill IPsec state validation failure reason

Rely on extack to return failure reason.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbevf: fill IPsec state validation failure reason

Rely on extack to return failure reason.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: fill IPsec state validation failure reason

Rely on extack to return failure reason.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: Fill IPsec state validation failure reason

Rely on extack to return failure reason.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fill IPsec state validation failure reason

Rely on extack to return failure reason.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xfrm: extend add state callback to set failure reason

Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.

Let's use extack to return this information to users.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fill IPsec policy validation failure reason

Rely on extack to return failure reason.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xfrm: extend add policy callback to set failure reason

Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.

Let's use extack to return this information to users.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.2-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.

  Current release - regressions:

   - sched: sch_taprio: do not schedule in taprio_reset()

  Previous releases - regressions:

   - core: fix UaF in netns ops registration error path

   - ipv4: prevent potential spectre v1 gadgets

   - ipv6: fix reachability confirmation with proxy_ndp

   - netfilter: fix for the set rbtree

   - eth: fec: use page_pool_put_full_page when freeing rx buffers

   - eth: iavf: fix temporary deadlock and failure to set MAC address

  Previous releases - always broken:

   - netlink: prevent potential spectre v1 gadgets

   - netfilter: fixes for SCTP connection tracking

   - mctp: struct sock lifetime fixes

   - eth: ravb: fix possible hang if RIS2_QFF1 happen

   - eth: tg3: resolve deadlock in tg3_reset_task() during EEH

  Misc:

   - Mat stepped out as MPTCP co-maintainer"

* tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  net: mdio-mux-meson-g12a: force internal PHY off on mux switch
  docs: networking: Fix bridge documentation URL
  tsnep: Fix TX queue stop/wake for multiple queues
  net/tg3: resolve deadlock in tg3_reset_task() during EEH
  net: mctp: mark socks as dead on unhash, prevent re-add
  net: mctp: hold key reference when looking up a general key
  net: mctp: move expiry timer delete to unhash
  net: mctp: add an explicit reference from a mctp_sk_key to sock
  net: ravb: Fix possible hang if RIS2_QFF1 happen
  net: ravb: Fix lack of register setting after system resumed for Gen3
  net/x25: Fix to not accept on connected socket
  ice: move devlink port creation/deletion
  sctp: fail if no bound addresses can be used for a given scope
  net/sched: sch_taprio: do not schedule in taprio_reset()
  Revert "Merge branch 'ethtool-mac-merge'"
  netrom: Fix use-after-free of a listening socket.
  netfilter: conntrack: unify established states for SCTP paths
  Revert "netfilter: conntrack: add sctp DATA_SENT state"
  netfilter: conntrack: fix bug in for_each_sctp_chunk
  netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
  ...