platform/kernel/linux-starfive.git
19 months ago netfilter: conntrack: set icmpv6 redirects as RELATED
Florian Westphal [Tue, 22 Nov 2022 15:00:09 +0000 (16:00 +0100)]
netfilter: conntrack: set icmpv6 redirects as RELATED

icmp conntrack will set icmp redirects as RELATED, but icmpv6 will not
do this.

For icmpv6, only icmp errors (type < 128) are examined for RELATED state.
ICMPv6 redirects are part of the neighbour discovery mechanism. Neighbour
discovery is handled by marking a selected subset (e.g. neighbour
solicitations) as UNTRACKED, but not REDIRECT -- redirects will thus be
flagged as INVALID.

Add minimal support for REDIRECTs.  No parsing of neighbour options is
added for simplicity, so this will only check that we have the embedded
original header (ND_OPT_REDIRECT_HDR), and then attempt to do a flow
lookup for this tuple.

Also extend the existing test case to cover redirects.
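
The classification described above can be sketched as a small model. This is
not the kernel code: the function name, its parameters and the returned labels
are illustrative assumptions, and the UNTRACKED set lists only a few neighbour
discovery types as an example. The type numbers are the standard ICMPv6 values.

```python
# Simplified model of the behaviour described in the commit message.
# classify_icmpv6() and its arguments are hypothetical names, not kernel symbols.
ND_REDIRECT = 137
UNTRACKED_TYPES = {133, 134, 135, 136}  # RS, RA, NS, NA (example subset)


def classify_icmpv6(icmp_type, has_redirect_hdr=False, flow_known=False):
    """Return the conntrack state this model assigns to an ICMPv6 packet."""
    if icmp_type < 128:
        # Error message: RELATED only if the embedded header matches a flow.
        return "RELATED" if flow_known else "INVALID"
    if icmp_type in UNTRACKED_TYPES:
        return "UNTRACKED"
    if icmp_type == ND_REDIRECT:
        # Minimal redirect support: require the embedded original header
        # (ND_OPT_REDIRECT_HDR), then try a flow lookup for that tuple.
        if has_redirect_hdr and flow_known:
            return "RELATED"
        return "INVALID"
    return "OTHER"  # e.g. echo request/reply, tracked as ordinary flows
```

In this model a redirect without the embedded header stays INVALID, which
mirrors the "minimal support" wording above.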

Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Eric Garver <eric@garver.life>
Link: https://github.com/firewalld/firewalld/issues/1046
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
19 months ago netfilter: ipset: Add support for new bitmask parameter
Vishwanath Pai [Tue, 22 Nov 2022 19:30:57 +0000 (20:30 +0100)]
netfilter: ipset: Add support for new bitmask parameter

Add a new parameter to complement the existing 'netmask' option. The
main difference between netmask and bitmask is that bitmask accepts an
arbitrary IP address as input; it does not have to be a valid netmask.

The name of the new parameter is 'bitmask'. This lets us mask out
arbitrary bits in the ip address, for example:
ipset create set1 hash:ip bitmask 255.128.255.0
ipset create set2 hash:ip,port family inet6 bitmask ffff::ff80
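
As a rough illustration of what masking with an arbitrary bitmask means, the
sketch below ANDs an address with a mask using Python's ipaddress module. The
helper name apply_bitmask() and the sample addresses are assumptions made for
the example; the sketch does not show how ipset stores or matches entries.

```python
import ipaddress


def apply_bitmask(addr, bitmask):
    """AND an IP address with an arbitrary mask of the same family."""
    a = ipaddress.ip_address(addr)
    m = ipaddress.ip_address(bitmask)
    if a.version != m.version:
        raise ValueError("address and bitmask must be the same IP version")
    # type(a) is IPv4Address or IPv6Address; both accept an integer.
    return str(type(a)(int(a) & int(m)))


print(apply_bitmask("10.200.3.77", "255.128.255.0"))
print(apply_bitmask("2001:db8::1234", "ffff::ff80"))
```

With 255.128.255.0, only the top bit of the second octet and all of the first
and third octets survive, which a contiguous netmask cannot express.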

Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
19 months ago netfilter: conntrack: merge ipv4+ipv6 confirm functions
Florian Westphal [Wed, 9 Nov 2022 11:21:58 +0000 (12:21 +0100)]
netfilter: conntrack: merge ipv4+ipv6 confirm functions

No need to have distinct functions.  After merge, ipv6 can avoid
protooff computation if the connection neither needs sequence adjustment
nor helper invocation -- this is the normal case.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
19 months ago netfilter: conntrack: add sctp DATA_SENT state
Sriram Yagnaraman [Fri, 4 Nov 2022 17:18:35 +0000 (18:18 +0100)]
netfilter: conntrack: add sctp DATA_SENT state

SCTP conntrack currently assumes that the SCTP endpoints will
probe secondary paths using HEARTBEAT before sending traffic.

But, according to RFC 9260, SCTP endpoints can send any traffic
on any of the confirmed paths after the SCTP association is up.
The SCTP endpoint that sends INIT will confirm all peer addresses
that the upper layer configures, and the SCTP endpoint that receives
COOKIE_ECHO will only confirm the address it sent the INIT_ACK to.

So, we can have a situation where the INIT sender can start to
use secondary paths without the need to send HEARTBEAT. This patch
allows DATA/SACK packets to create a new connection tracking entry.

A new state has been added to indicate that a DATA/SACK chunk has
been seen in the original direction - SCTP_CONNTRACK_DATA_SENT.
State transitions mostly follow those of HEARTBEAT_SENT, except on
receiving HEARTBEAT/HEARTBEAT_ACK/DATA/SACK in the reply direction.

State transitions in original direction:
- DATA_SENT behaves similarly to HEARTBEAT_SENT for all chunks,
   except that it remains in DATA_SENT on receiving
   HEARTBEAT/HEARTBEAT_ACK/DATA/SACK chunks
State transitions in reply direction:
- DATA_SENT behaves similarly to HEARTBEAT_SENT for all chunks,
   except that it moves to HEARTBEAT_ACKED on receiving
   HEARTBEAT/HEARTBEAT_ACK/DATA/SACK chunks
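
The two exceptions listed above can be written down as a tiny lookup. This is
only a sketch of the documented DATA_SENT exceptions; the function name and
string labels are assumptions, and every other chunk type (which follows the
HEARTBEAT_SENT rules) is deliberately left unmodelled.

```python
DATA_SENT = "DATA_SENT"
HEARTBEAT_ACKED = "HEARTBEAT_ACKED"

# Chunks for which DATA_SENT deviates from HEARTBEAT_SENT.
EXCEPTION_CHUNKS = {"HEARTBEAT", "HEARTBEAT_ACK", "DATA", "SACK"}


def data_sent_next_state(direction, chunk):
    """Next state when the current state is DATA_SENT.

    Returns None for chunks that follow the HEARTBEAT_SENT rules,
    which this sketch does not reproduce.
    """
    if direction not in ("original", "reply"):
        raise ValueError("direction must be 'original' or 'reply'")
    if chunk not in EXCEPTION_CHUNKS:
        return None
    # Original direction: stay in DATA_SENT. Reply direction: the peer
    # answered on this path, so move to HEARTBEAT_ACKED.
    return DATA_SENT if direction == "original" else HEARTBEAT_ACKED
```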

Note: This patch still doesn't solve the problem when the SCTP
endpoint decides to use primary paths for association establishment
but uses a secondary path for association shutdown. We still have
to depend on timeout for connections to expire in such a case.

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
19 months ago netfilter: nft_inner: fix IS_ERR() vs NULL check
Dan Carpenter [Tue, 15 Nov 2022 13:26:07 +0000 (16:26 +0300)]
netfilter: nft_inner: fix IS_ERR() vs NULL check

The __nft_expr_type_get() function returns NULL on error.  It never
returns error pointers.

Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
19 months ago Merge branch 'cleanup-ocelot_stats-exposure'
Paolo Abeni [Tue, 22 Nov 2022 14:36:45 +0000 (15:36 +0100)]
Merge branch 'cleanup-ocelot_stats-exposure'

Colin Foster says:

====================
cleanup ocelot_stats exposure

The ocelot_stats structures became redundant across all users. Replace
this redundancy with a static const struct. After doing this, several
definitions inside include/soc/mscc/ocelot.h no longer needed to be
shared. Patch 2 removes them.

Checkpatch throws an error for a complicated macro not in parentheses. I
understand the reason for OCELOT_COMMON_STATS was to allow expansion, but
interestingly this patch set is essentially reverting the ability for
expansion. I'm keeping the macro in this set, but am open to removing it,
since it doesn't _actually_ provide any immediate benefits anymore.
====================

Link: https://lore.kernel.org/r/20221119231406.3167852-1-colin.foster@in-advantage.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months ago net: mscc: ocelot: issue a warning if stats are incorrectly ordered
Colin Foster [Sat, 19 Nov 2022 23:14:06 +0000 (15:14 -0800)]
net: mscc: ocelot: issue a warning if stats are incorrectly ordered

Ocelot uses regmap_bulk_read() operations to efficiently read stats
registers. Currently the implementation relies on the stats layout
being ordered for these reads to be most efficient.

Issue a warning if any future implementations happen to break this pattern.
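
To make the ordering requirement concrete, here is a minimal sketch that groups
consecutive register offsets into bulk-read regions and warns when an entry
goes backwards. It assumes the layout is a list of (name, register offset)
pairs; build_regions() and that representation are hypothetical and only model
the idea, not the driver's data structures.

```python
import warnings


def build_regions(layout):
    """Group consecutive registers into (start, count) regions.

    layout is a list of (name, reg_offset) pairs. An entry whose offset is
    lower than its predecessor triggers a warning, since it forces an extra
    region and makes the bulk reads less efficient.
    """
    regions = []
    last = None
    for name, reg in layout:
        if last is not None and reg < last:
            warnings.warn(f"stat {name!r} at {reg:#x} is out of order")
        if regions and reg == last + 1:
            regions[-1][1] += 1  # extend the current contiguous region
        else:
            regions.append([reg, 1])  # start a new region
        last = reg
    return [tuple(r) for r in regions]
```

A sorted layout with one gap yields two regions; swapping two entries yields
more regions plus a warning.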

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months ago net: mscc: ocelot: remove unnecessary exposure of stats structures
Colin Foster [Sat, 19 Nov 2022 23:14:05 +0000 (15:14 -0800)]
net: mscc: ocelot: remove unnecessary exposure of stats structures

Since commit 4d1d157fb6a4 ("net: mscc: ocelot: share the common stat
definitions between all drivers") there is no longer a need to share the
stats structures to the world. Relocate these definitions to inside
ocelot_stats.c instead of a global include header.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months ago net: mscc: ocelot: remove redundant stats_layout pointers
Colin Foster [Sat, 19 Nov 2022 23:14:04 +0000 (15:14 -0800)]
net: mscc: ocelot: remove redundant stats_layout pointers

Ever since commit 4d1d157fb6a4 ("net: mscc: ocelot: share the common stat
definitions between all drivers") the stats_layout entries in the ocelot
and felix drivers have been redundant. Remove the unnecessary code.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months ago selftests: net: Add cross-compilation support for BPF programs
Björn Töpel [Sat, 19 Nov 2022 17:18:41 +0000 (18:18 +0100)]
selftests: net: Add cross-compilation support for BPF programs

The selftests/net does not have proper cross-compilation support, and
does not properly state libbpf as a dependency. Mimic/copy the BPF
build from selftests/bpf, which has the nice side-effect that libbpf
is built as well.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20221119171841.2014936-1-bjorn@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months ago samples: pktgen: Use "grep -E" instead of "egrep"
Tiezhu Yang [Sat, 19 Nov 2022 02:55:04 +0000 (10:55 +0800)]
samples: pktgen: Use "grep -E" instead of "egrep"

The latest version of grep reports that egrep is now obsolete, so the build
now contains warnings that look like:

  egrep: warning: egrep is obsolescent; using grep -E

Fix this by changing the affected files to use "grep -E" instead.

  sed -i "s/egrep/grep -E/g" `grep egrep -rwl samples/pktgen`

Here are the steps to install the latest grep:

  wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
  tar xf grep-3.8.tar.gz
  cd grep-3.8 && ./configure && make
  sudo make install
  export PATH=/usr/local/bin:$PATH

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1668826504-32162-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months ago octeontx2-pf: Add additional checks while configuring ucast/bcast/mcast rules
Suman Ghosh [Fri, 18 Nov 2022 05:33:29 +0000 (11:03 +0530)]
octeontx2-pf: Add additional checks while configuring ucast/bcast/mcast rules

1. If a profile does not support DMAC extraction, avoid installing NPC
flow rules for unicast. Similarly, if LXMB (L2 and L3) extraction is not
supported by the profile, avoid installing broadcast and multicast
rules.
2. Allow MCAM entry insertion for promiscuous mode.
3. For profiles where DMAC is not extracted in the MKEX key, the default
unicast entry installed by AF is not valid. Hence do not use the action
from the AF-installed default unicast entry in such cases.
4. Adjacent header fields in a packet, such as the IP header source
and destination addresses or the UDP/TCP header source and destination
ports, can be extracted together in an MKEX profile. Therefore an MKEX
profile can be configured in two ways:
a. A total of 4 bytes from the start of the UDP header (source port
   + destination port)
or
b. Two bytes from the start and two bytes from offset 2

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20221118053329.2288486-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
19 months ago net: bcmgenet: Clear RGMII_LINK upon link down
Florian Fainelli [Fri, 18 Nov 2022 21:37:54 +0000 (13:37 -0800)]
net: bcmgenet: Clear RGMII_LINK upon link down

Clear the RGMII_LINK bit upon detecting link down to be consistent with
setting the bit upon link up. We also move the clearing of the
out-of-band disable to the runtime initialization rather than for each
link up/down transition.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221118213754.1383364-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: microchip: sparx5: fix uninitialized variables
Dan Carpenter [Fri, 18 Nov 2022 15:12:52 +0000 (18:12 +0300)]
net: microchip: sparx5: fix uninitialized variables

Smatch complains that "err" can be uninitialized on these paths.  Also
it's just nicer to "return 0;" instead of "return err;"

Fixes: 3a344f99bb55 ("net: microchip: sparx5: Add support for TC flower ARP dissector")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y3eg9Ml/LmLR3L3C@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: fix __sock_gen_cookie()
Eric Dumazet [Fri, 18 Nov 2022 04:38:43 +0000 (04:38 +0000)]
net: fix __sock_gen_cookie()

I was mistaken about how atomic64_try_cmpxchg(&sk_cookie, &res, new)
works.

I was assuming @res would contain the final sk_cookie value,
regardless of the success of our cmpxchg().

We could do something like:

if (atomic64_try_cmpxchg(&sk_cookie, &res, new))
    res = new;

But we can avoid a conditional and read sk_cookie again.

atomic64_cmpxchg(&sk_cookie, res, new);
res = atomic64_read(&sk_cookie);
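
The difference between the two primitives can be shown with a single-threaded
model. The Atomic64 class below is a hypothetical stand-in that only mimics the
return-value conventions (cmpxchg returns the previous value; try_cmpxchg
returns success and updates the expected value only on failure). It says
nothing about atomicity, and the two gen_cookie_* helpers are illustrative
reconstructions of the pattern described above, not the kernel function.

```python
class Atomic64:
    """Single-threaded stand-in for atomic64_t; models return values only."""

    def __init__(self, value=0):
        self.value = value

    def read(self):
        return self.value

    def cmpxchg(self, old, new):
        """Store new if the value equals old; always return the previous value."""
        prev = self.value
        if prev == old:
            self.value = new
        return prev

    def try_cmpxchg(self, expected, new):
        """expected is a one-element list standing in for a pointer.

        Returns True on success and leaves expected untouched. On failure,
        writes the current value into expected[0] and returns False.
        """
        if self.value == expected[0]:
            self.value = new
            return True
        expected[0] = self.value
        return False


def gen_cookie_mistaken(sk_cookie, new):
    res = [sk_cookie.read()]
    if not res[0]:
        # On success res[0] still holds the old value (0), not new.
        sk_cookie.try_cmpxchg(res, new)
    return res[0]


def gen_cookie_fixed(sk_cookie, new):
    res = sk_cookie.read()
    if not res:
        sk_cookie.cmpxchg(res, new)
        res = sk_cookie.read()  # re-read: correct whether or not we won
    return res
```

In the model, the mistaken variant returns 0 for a fresh socket even though the
cookie was stored, while the fixed variant returns the stored cookie.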

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1527347 ("Error handling issues")
Fixes: 4ebf802cf1c6 ("net: __sock_gen_cookie() cleanup")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221118043843.3703186-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago Merge branch 'mptcp-netlink'
David S. Miller [Mon, 21 Nov 2022 13:09:08 +0000 (13:09 +0000)]
Merge branch 'mptcp-netlink'

Mat Martineau says:

====================
mptcp: More specific netlink command errors

This series makes the error reporting for the MPTCP_PM_CMD_ADD_ADDR netlink
command more specific, since there are multiple reasons the command could
fail.

Note that patch 2 adds a GENL_SET_ERR_MSG_FMT() macro to genetlink.h,
which is outside the MPTCP subsystem.

Patch 1 refactors in-kernel listening socket and endpoint creation to
simplify the second patch.

Patch 2 updates the error values returned by the in-kernel path manager
when it fails to create a local endpoint.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago mptcp: more detailed error reporting on endpoint creation
Paolo Abeni [Fri, 18 Nov 2022 18:46:08 +0000 (10:46 -0800)]
mptcp: more detailed error reporting on endpoint creation

Endpoint creation can fail for a number of reasons; in case of failure
append the error number to the extended ack message, using a newly
introduced generic helper.

Additionally let mptcp_pm_nl_append_new_local_addr() report different
error reasons.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago mptcp: deduplicate error paths on endpoint creation
Paolo Abeni [Fri, 18 Nov 2022 18:46:07 +0000 (10:46 -0800)]
mptcp: deduplicate error paths on endpoint creation

When endpoint creation fails, we need to free the newly allocated
entry and, if present, destroy the paired mptcp listener socket.

Consolidate these actions in a single place and let all the error paths
reach it.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: Return errno in sk->sk_prot->get_port().
Kuniyuki Iwashima [Fri, 18 Nov 2022 18:25:06 +0000 (10:25 -0800)]
net: Return errno in sk->sk_prot->get_port().

We assume the correct errno is -EADDRINUSE when sk->sk_prot->get_port()
fails, so some ->get_port() functions return just 1 on failure and the
callers return -EADDRINUSE instead.

However, mptcp_get_port() can return -EINVAL.  Let's not ignore the error.

Note the only exception is inet_autobind(), all of whose callers return
-EAGAIN instead.
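
A minimal sketch of the caller-side change, assuming get_port is modelled as a
callable returning 0 on success: the old convention treats any non-zero result
as -EADDRINUSE, while the new one passes a negative errno through. The names
bind_before() and bind_after() are hypothetical and exist only for this
example.

```python
import errno


def bind_before(get_port):
    """Old caller pattern: any failure is reported as -EADDRINUSE."""
    if get_port():
        return -errno.EADDRINUSE
    return 0


def bind_after(get_port):
    """New caller pattern: get_port returns 0 or a negative errno,
    and the caller propagates it unchanged."""
    err = get_port()
    if err:
        return err
    return 0
```

With a get_port that fails with -EINVAL, bind_before() hides the real reason
and bind_after() reports it.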

Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: ethernet: renesas: rswitch: Fix MAC address info
Yoshihiro Shimoda [Fri, 18 Nov 2022 00:27:24 +0000 (09:27 +0900)]
net: ethernet: renesas: rswitch: Fix MAC address info

Smatch detected the following warning.

    drivers/net/ethernet/renesas/rswitch.c:1717 rswitch_init() warn:
    '%pM' cannot be followed by 'n'

The 'n' should be '\n'.

Reported-by: Dan Carpenter <error27@gmail.com>
Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago Merge branch 'sarx5-VCAP-debugfs'
David S. Miller [Mon, 21 Nov 2022 11:33:02 +0000 (11:33 +0000)]
Merge branch 'sarx5-VCAP-debugfs'

Steen Hegelund says:

====================
net: Add support for VCAP debugFS in Sparx5

This provides support for getting VCAP instance, VCAP rule and VCAP port
keyset configuration information via the debug file system.

It builds on top of the initial IS2 VCAP support found in these series:

https://lore.kernel.org/all/20221020130904.1215072-1-steen.hegelund@microchip.com/
https://lore.kernel.org/all/20221109114116.3612477-1-steen.hegelund@microchip.com/
https://lore.kernel.org/all/20221111130519.1459549-1-steen.hegelund@microchip.com/

Functionality:
==============

The VCAP API exposes a /sys/kernel/debug/sparx5/vcaps folder containing
the following entries:

- raw_<vcap>_<instance>
    This is a raw dump of the VCAP instance with a line for each available
    VCAP rule.  This information is limited to the VCAP rule address, the
    rule size and the rule keyset name as this requires very little
    information from the VCAP cache.

    This can be used to detect if a valid rule is stored at the correct
    address.

- <vcap>_<instance>
    This dumps the VCAP instance configuration: address ranges, chain id
    ranges, word size of keys and actions etc, and for each VCAP rule the
    details of keys (values and masks) and actions are shown.

    This is useful when discovering if the expected rule is present and in
    which order it will be matched.

- <interface>
    This shows the keyset configuration per lookup and traffic type and the
    set of sticky bits (common for all interfaces). This is cleared when
    shown, so it is possible to sample over a period of time.

    It also shows if this port/lookup is enabled for matching in the VCAP.

    This can be used to find out which keyset the traffic being sent to a
    port will be matched against, and if such traffic has been seen by one
    of the ports.
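
As a small usage sketch, the entries in that folder could be listed and split
into raw dumps and the remaining files (instance dumps and per-interface
files). The helper name list_vcap_entries() is an assumption; only the
directory path and the raw_ prefix come from the description above, and reading
debugfs normally requires root with debugfs mounted.

```python
import os


def list_vcap_entries(root="/sys/kernel/debug/sparx5/vcaps"):
    """Return (raw_dumps, other_entries), both sorted by name.

    Entries starting with "raw_" are the raw VCAP dumps; everything else
    is either a <vcap>_<instance> dump or an <interface> file, which this
    sketch does not try to tell apart.
    """
    raw, other = [], []
    for name in sorted(os.listdir(root)):
        (raw if name.startswith("raw_") else other).append(name)
    return raw, other
```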

Delivery:
=========

This is the current plan for delivering the full VCAP feature set of Sparx5:

- TC protocol all support for IS2 VCAP
- Sparx5 IS0 VCAP support
- TC policer and drop action support (depends on the Sparx5 QoS support
  upstreamed separately)
- Sparx5 ES0 VCAP support
- TC flower template support
- TC matchall filter support for mirroring and policing ports
- TC flower filter mirror action support
- Sparx5 ES2 VCAP support

Version History:
================
v2      Removed a 'support' folder (used for integration testing) that had
        been added in patch 6/8 by mistake.
        Wrapped long lines.

v1      Initial version
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Add VCAP debugfs KUNIT test
Steen Hegelund [Thu, 17 Nov 2022 21:31:14 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP debugfs KUNIT test

This tests the functionality of the debugFS support:

- finding valid keyset on an address
- raw VCAP output
- full rule VCAP output

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Add VCAP locking to protect rules
Steen Hegelund [Thu, 17 Nov 2022 21:31:13 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP locking to protect rules

This ensures that the VCAP cache and the lists maintained in the VCAP
instance are protected when accessed by different clients.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Add VCAP debugFS key/action support for the VCAP API
Steen Hegelund [Thu, 17 Nov 2022 21:31:12 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP debugFS key/action support for the VCAP API

This adds support for displaying the keys and actions in a rule.
The keys and action display format will be determined by the size and the
type of the key or action. The longer keys will typically be displayed as a
hexadecimal byte array.

The actionset is not decoded in full as the Sparx5 IS2 only has one
supported action, so this will be added later with other VCAP types.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API
Steen Hegelund [Thu, 17 Nov 2022 21:31:11 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API

This adds support for showing all rules in a VCAP instance. The information
shown is:

 - rule id
 - address range
 - size
 - chain id
 - keyset name, subword size, register span
 - actionset name, subword size, register span
 - counter value
 - sticky bit (one bit width counter)

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Add raw VCAP debugFS support for the VCAP API
Steen Hegelund [Thu, 17 Nov 2022 21:31:10 +0000 (22:31 +0100)]
net: microchip: sparx5: Add raw VCAP debugFS support for the VCAP API

This adds support for decoding VCAP rules with a minimum number of
attributes: address, rule size and keyset.

This allows for a quick inspection of a VCAP instance to determine if the
rules are present and in the correct order.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Add VCAP debugFS support
Steen Hegelund [Thu, 17 Nov 2022 21:31:09 +0000 (22:31 +0100)]
net: microchip: sparx5: Add VCAP debugFS support

Add a debugFS root folder for Sparx5 and add a vcap folder underneath it
containing the VCAP instances and the ports.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Ensure VCAP last_used_addr is set back to default
Steen Hegelund [Thu, 17 Nov 2022 21:31:08 +0000 (22:31 +0100)]
net: microchip: sparx5: Ensure VCAP last_used_addr is set back to default

This ensures that the last_used_addr in a VCAP instance is returned to the
default value when all rules have been deleted.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: Ensure L3 protocol has a default value
Steen Hegelund [Thu, 17 Nov 2022 21:31:07 +0000 (22:31 +0100)]
net: microchip: sparx5: Ensure L3 protocol has a default value

This ensures that the l3_proto always has a valid value and that any
dissector parsing errors cause the flower rule to be discarded.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago Merge branch 'gve-alternate-missed-completions'
David S. Miller [Mon, 21 Nov 2022 10:52:14 +0000 (10:52 +0000)]
Merge branch 'gve-alternate-missed-completions'

Jeroen de Borst says:

====================
gve: Handle alternate miss-completions

Some versions of the virtual NIC present miss-completions in
an alternative way. Let the driver handle these alternate completions
and announce this capability to the device.

The capability is announced using a new AdminQ command that sends
driver information to the device. The device can refuse a driver
if it is lacking support for a capability, or it can adapt its
behavior to work around OS specific issues.

Changed in v5:
- Removed comments in function calls
- Switched ENOTSUPP back to EOPNOTSUPP and made sure it gets passed
Changed in v4:
- Clarified new AdminQ command in cover letter
- Changed EOPNOTSUPP to ENOTSUPP to match device's response
Changed in v3:
- Rewording cover letter
- Added 'Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>'
Changes in v2:
- Changed the subject to include 'gve:'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago gve: Handle alternate miss completions
Jeroen de Borst [Thu, 17 Nov 2022 16:27:01 +0000 (08:27 -0800)]
gve: Handle alternate miss completions

The virtual NIC has 2 ways of indicating a miss-path
completion. This handles the alternate.

Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago gve: Adding a new AdminQ command to verify driver
Jeroen de Borst [Thu, 17 Nov 2022 16:27:00 +0000 (08:27 -0800)]
gve: Adding a new AdminQ command to verify driver

Check whether the driver is compatible with the device
presented.

Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago NFC: nci: Extend virtual NCI deinit test
Dmitry Vyukov [Thu, 17 Nov 2022 16:21:01 +0000 (17:21 +0100)]
NFC: nci: Extend virtual NCI deinit test

Extend the test to check the scenario where the NCI core tries to send data
to an already closed device, to ensure that nothing bad happens.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago Merge branch 'axiennet-mdio-bus-freq'
David S. Miller [Mon, 21 Nov 2022 10:36:04 +0000 (10:36 +0000)]
Merge branch 'axiennet-mdio-bus-freq'

Andy Chiu says:

====================
net: axienet: Use a DT property to configure frequency of the MDIO bus

Some FPGA platforms have to set the frequency of the MDIO bus lower than
2.5 MHz. Thus, we use a DT property, "clock-frequency", to configure it
at boot time. The default of 2.5 MHz is used if the property is not
present. Also, factor out the mdio enable/disable functions due to
the API change since 253761a0e61b7.

Changelog:
--- v5 ---
1. Make dt-binding patch prior to the implementation patch.
2. Disable mdio bus in error path.
3. Update description of some functions.
--- v4 ---
1. change MAX_MDIO_FREQ to DEFAULT_MDIO_FREQ as suggested by Andrew.
--- v3 RESEND ---
1. Repost the exact same patch again
--- v3 ---
1. Fix coding style, and make probing of the driver fail if MDC overflow
--- v2 ---
1. Use clock-frequency, as defined in mdio.yaml, to configure MDIO
   clock.
2. Only print out frequency if it is set to a non-standard value.
3. Reduce the scope of axienet_mdio_enable and remove
   axienet_mdio_disable because no one really uses it anymore.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: axienet: set mdio clock according to bus-frequency
Andy Chiu [Thu, 17 Nov 2022 15:40:14 +0000 (23:40 +0800)]
net: axienet: set mdio clock according to bus-frequency

Some FPGA platforms have an 80 kHz MDIO bus frequency constraint when
connecting Ethernet to their on-board external Marvell PHY. Thus, we may
have to set the MDIO clock according to the DT. Otherwise, use the default
2.5 MHz, as specified by 802.3, if the entry is not present.

Also, change MAX_MDIO_FREQ to DEFAULT_MDIO_FREQ because we may actually
set the MDIO bus frequency higher than 2.5 MHz if the underlying devices
support it. And properly disable the mdio bus clock in the error path.
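
A rough sketch of how a divider could be derived from the host clock and the
requested MDIO frequency. It rests on two assumptions that are not stated in
this message: that the bus clock is f_MDIO = f_HOST / (2 * (1 + clk_div)), and
that the divider field is 6 bits wide. The function and constant names are
illustrative, apart from DEFAULT_MDIO_FREQ, which the message mentions.

```python
DEFAULT_MDIO_FREQ = 2_500_000  # Hz, the 802.3 default mentioned above
CLOCK_DIVIDE_MAX = 0x3F        # assumption: 6-bit divider field


def mdio_clk_div(host_clock_hz, mdio_freq_hz=DEFAULT_MDIO_FREQ):
    """Smallest divider that keeps the MDIO clock at or below the target.

    Assumed relation: f_MDIO = f_HOST / (2 * (1 + clk_div)).
    Raises ValueError if the divider does not fit the assumed field.
    """
    # Ceiling division so the resulting frequency never exceeds the target.
    clk_div = -(-host_clock_hz // (2 * mdio_freq_hz)) - 1
    if clk_div < 0 or clk_div > CLOCK_DIVIDE_MAX:
        raise ValueError("MDIO clock divisor overflow")
    return clk_div


def mdio_freq(host_clock_hz, clk_div):
    """Resulting MDIO frequency for a given divider, under the same assumption."""
    return host_clock_hz / (2 * (1 + clk_div))
```

Under these assumptions a fast host clock combined with a very low requested
frequency cannot be represented, which is the kind of case where failing the
probe, as the v3 changelog describes, makes sense.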

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago dt-bindings: describe the support of "clock-frequency" in mdio
Andy Chiu [Thu, 17 Nov 2022 15:40:13 +0000 (23:40 +0800)]
dt-bindings: describe the support of "clock-frequency" in mdio

The MDIO bus frequency is now configurable at boot time through a property
in DT, so add a description for it.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: axienet: Unexport and remove unused mdio functions
Andy Chiu [Thu, 17 Nov 2022 15:40:12 +0000 (23:40 +0800)]
net: axienet: Unexport and remove unused mdio functions

Both axienet_mdio_{enable/disable} functions are no longer used in
xilinx_axienet_main.c due to 253761a0e61b7, and axienet_mdio_disable is
not even used in mdio.c. So unexport and remove them.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: prevent uninitialized variable
Dan Carpenter [Thu, 17 Nov 2022 15:29:05 +0000 (18:29 +0300)]
net: microchip: sparx5: prevent uninitialized variable

Smatch complains that:

    drivers/net/ethernet/microchip/sparx5/sparx5_dcb.c:112
    sparx5_dcb_apptrust_validate() error: uninitialized symbol 'match'.

This would only happen if the:

if (sparx5_dcb_apptrust_policies[i].nselectors != nselectors)

condition is always true (they are not equal).  The "nselectors"
variable comes from dcbnl_ieee_set() and it is a number between 0-256.
This seems like it is probably a real bug.
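
The pattern can be modelled as follows. This is a hypothetical reconstruction
for illustration: the Policy tuple and apptrust_validate() are assumed names,
and the point is only that "match" needs a defined starting value when every
policy is skipped by the nselectors check.

```python
from collections import namedtuple

# Hypothetical stand-in for one entry of the policy table.
Policy = namedtuple("Policy", "nselectors selectors")


def apptrust_validate(policies, selectors):
    """Return True if the selector list matches one of the known policies."""
    match = False  # defined even if every policy is skipped below
    for policy in policies:
        if policy.nselectors != len(selectors):
            continue  # if this always triggers, match is never assigned again
        match = list(policy.selectors) == list(selectors)
        if match:
            break
    return match
```

Without the initial assignment, a selector count that matches no policy would
leave the result undefined, which is what the warning points at.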

Fixes: 23f8382cd95d ("net: microchip: sparx5: add support for apptrust")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: ethernet: mtk_eth_soc: fix RSTCTRL_PPE{0,1} definitions
Lorenzo Bianconi [Thu, 17 Nov 2022 14:29:53 +0000 (15:29 +0100)]
net: ethernet: mtk_eth_soc: fix RSTCTRL_PPE{0,1} definitions

Fix RSTCTRL_PPE0 and RSTCTRL_PPE1 register mask definitions for
MTK_NETSYS_V2.
Remove duplicated definitions.

Fixes: 160d3a9b1929 ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago net: microchip: sparx5: kunit test: Fix compile warnings.
Horatiu Vultur [Thu, 17 Nov 2022 13:28:12 +0000 (14:28 +0100)]
net: microchip: sparx5: kunit test: Fix compile warnings.

When VCAP_KUNIT_TEST is enabled the following warnings are generated:

drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:257:34: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:258:41: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:342:23: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:359:23: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1327:34: warning: Using plain integer as NULL pointer
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1328:41: warning: Using plain integer as NULL pointer

Therefore fix this.

Fixes: dccc30cc4906 ("net: microchip: sparx5: Add KUNIT test of counters and sorted rules")
Fixes: c956b9b318d9 ("net: microchip: sparx5: Adding KUNIT tests of key/action values in VCAP API")
Fixes: 67d637516fa9 ("net: microchip: sparx5: Adding KUNIT test for the VCAP API")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months ago Merge branch 'nfp-ipsec-offload'
David S. Miller [Mon, 21 Nov 2022 08:51:36 +0000 (08:51 +0000)]
Merge branch 'nfp-ipsec-offload'

Simon Horman says:

====================
nfp: IPsec offload support

Huanhuan Wang says:

This series adds support for IPsec offload to the NFP driver.

It covers three enhancements:

1. Patch 1/3:
   - Extend the capability word and control word to support
     new features.

2. Patch 2/3:
   - Add framework to support IPsec offloading for NFP driver,
     but IPsec offload control plane interface xfrm callbacks which
     interact with upper layer are not implemented in this patch.

3. Patch 3/3:
   - IPsec control plane interface xfrm callbacks are implemented
     in this patch.

Changes since v3
* Remove structure fields that describe firmware but
  are not used for Kernel offload
* Add WARN_ON(!xa_empty()) before call to xa_destroy()
* Added helpers for hash methods

Changes since v2
* OFFLOAD_HANDLE_ERROR macro and the associated code removed
* Unnecessary logging removed
* Hook function xdo_dev_state_free in struct xfrmdev_ops removed
* Use Xarray to maintain SA entries

Changes since v1
* Explicitly return failure when XFRM_STATE_ESN is set
* Fix the issue that AEAD algorithm is not correctly offloaded
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonfp: implement xfrm callbacks and expose ipsec offload feature to upper layer
Huanhuan Wang [Thu, 17 Nov 2022 13:21:02 +0000 (14:21 +0100)]
nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer

Xfrm callbacks are implemented to offload SA info into firmware
by mailbox. It supports 16K SA info in total.

Expose the ipsec offload feature to the upper layer to signal that
the offload is available.

Based on initial work of Norm Bagley <norman.bagley@netronome.com>.

Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonfp: add framework to support ipsec offloading
Huanhuan Wang [Thu, 17 Nov 2022 13:21:01 +0000 (14:21 +0100)]
nfp: add framework to support ipsec offloading

A new metadata type and config structure are introduced to
interact with firmware to support ipsec offloading. This
feature relies on specific firmware that supports ipsec
encrypt/decrypt by advertising related capability bit.

The xfrm callbacks which interact with upper layer are
implemented in the following patch.

Based on initial work of Norm Bagley <norman.bagley@netronome.com>.

Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonfp: extend capability and control words
Yinjun Zhang [Thu, 17 Nov 2022 13:21:00 +0000 (14:21 +0100)]
nfp: extend capability and control words

The 32-bit capability word is almost exhausted, so allocate more
words to support new features and extend the control word
accordingly. Packet-type offloading is implemented in NIC
application firmware but is not used by the kernel driver, so
reserve its bit here in case it is redefined for other use.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agobna: Avoid clashing function prototypes
Gustavo A. R. Silva [Wed, 16 Nov 2022 16:59:44 +0000 (10:59 -0600)]
bna: Avoid clashing function prototypes

When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].

Fix a total of 227 warnings like these:

drivers/net/ethernet/brocade/bna/bna_enet.c:519:3: warning: cast from 'void (*)(struct bna_ethport *, enum bna_ethport_event)' to 'bfa_fsm_t' (aka 'void (*)(void *, int)') converts to incompatible function type [-Wcast-function-type-strict]
                bfa_fsm_set_state(ethport, bna_ethport_sm_down);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bna state machine code heavily overloads its state machine functions,
so these have been separated into their own sets of structs, enums,
typedefs, and helper functions. There are almost zero binary code changes,
all seem to be related to header file line numbers changing, or the
addition of the new stats helper.
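
The approach described above, one handler type per object instead of a
cast to a generic 'void (*)(void *, int)', can be sketched as follows.
All names here are hypothetical and do not come from the bna driver;
this is a minimal illustration, not the actual patch.

```c
#include <stddef.h>

/* Hypothetical object and event types, for illustration only. */
enum port_event { PORT_E_UP, PORT_E_DOWN };

struct port;
/* The handler type names the real argument types, so no cast is needed. */
typedef void (*port_fsm_t)(struct port *, enum port_event);

struct port {
	port_fsm_t fsm;		/* current state handler */
	int transitions;	/* number of state changes so far */
};

static void port_sm_down(struct port *p, enum port_event ev);
static void port_sm_up(struct port *p, enum port_event ev);

/* Typed replacement for a generic "set state" helper. */
static void port_set_state(struct port *p, port_fsm_t state)
{
	p->fsm = state;
}

static void port_sm_down(struct port *p, enum port_event ev)
{
	if (ev == PORT_E_UP) {
		p->transitions++;
		port_set_state(p, port_sm_up);
	}
}

static void port_sm_up(struct port *p, enum port_event ev)
{
	if (ev == PORT_E_DOWN) {
		p->transitions++;
		port_set_state(p, port_sm_down);
	}
}

/* Dispatch an event through the typed pointer. */
static void port_send_event(struct port *p, enum port_event ev)
{
	p->fsm(p, ev);
}
```

Because the pointer type matches the handler prototypes exactly, an
indirect call through it passes a CFI prototype check without any cast.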

Important to mention is that while I was manually implementing these changes
I was staring at this[2] patch from Kees Cook. Thanks, Kees. :)

Link: https://github.com/KSPP/linux/issues/240
[1] https://reviews.llvm.org/D134831
[2] https://lore.kernel.org/linux-hardening/20220929230334.2109344-1-keescook@chromium.org/
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: ethernet: mediatek: ppe: assign per-port queues for offloaded traffic
Felix Fietkau [Wed, 16 Nov 2022 08:07:34 +0000 (09:07 +0100)]
net: ethernet: mediatek: ppe: assign per-port queues for offloaded traffic

Keep traffic sent to the switch within link speed limits.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-7-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: dsa: tag_mtk: assign per-port queues
Felix Fietkau [Wed, 16 Nov 2022 08:07:33 +0000 (09:07 +0100)]
net: dsa: tag_mtk: assign per-port queues

Keep traffic sent to the switch within link speed limits.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221116080734.44013-6-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues
Felix Fietkau [Wed, 16 Nov 2022 08:07:32 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues

When sending traffic to multiple ports with different link speeds, queued
packets to one port can drown out tx to other ports.
In order to better handle transmission to multiple ports, use the hardware
shaper feature to implement weighted fair queueing between ports.
Weight and maximum rate are automatically adjusted based on the link speed
of the port.
The first 3 queues are unrestricted and reserved for non-DSA direct tx on
GMAC ports. The following queues are automatically assigned by the MTK DSA
tag driver based on the target port number.
The PPE offload code configures the queues for offloaded traffic in the same
way.
This feature is only supported on devices supporting QDMA. All queues still
share the same DMA ring and descriptor pool.
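
The queue layout described above can be sketched as a small selection
helper. The helper name, its parameters, and the choice of indexing the
reserved queues by GMAC id are assumptions made for illustration; only
the "first 3 queues reserved, following queues per port" split comes
from the text.

```c
/* From the description: the first 3 queues are reserved for direct tx. */
#define DEMO_NUM_DIRECT_QUEUES 3

/*
 * Hypothetical helper: pick a tx queue index.
 *  - direct (non-DSA) tx on a GMAC uses one of the reserved queues,
 *    indexed here by the GMAC id (an assumption);
 *  - DSA traffic uses the queue after the reserved ones that
 *    corresponds to the target switch port.
 */
static unsigned int demo_select_queue(int uses_dsa, unsigned int gmac_id,
				      unsigned int dsa_port)
{
	if (!uses_dsa)
		return gmac_id % DEMO_NUM_DIRECT_QUEUES;

	return DEMO_NUM_DIRECT_QUEUES + dsa_port;
}
```

With this layout, switch port 0 maps to queue 3, port 1 to queue 4, and
so on, so each port can get its own shaper settings.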

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-5-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: ethernet: mtk_eth_soc: avoid port_mg assignment on MT7622 and newer
Felix Fietkau [Wed, 16 Nov 2022 08:07:31 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: avoid port_mg assignment on MT7622 and newer

On newer chips, this field is unused and contains some bits related to queue
assignment. Initialize it to 0 in those cases.
Fix offload_version on MT7621 and MT7623, which still need the previous value.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-4-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: ethernet: mtk_eth_soc: drop packets to WDMA if the ring is full
Felix Fietkau [Wed, 16 Nov 2022 08:07:30 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: drop packets to WDMA if the ring is full

Improves handling of DMA ring overflow.
Clarify other WDMA drop related comment.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-3-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: ethernet: mtk_eth_soc: increase tx ring size for QDMA devices
Felix Fietkau [Wed, 16 Nov 2022 08:07:29 +0000 (09:07 +0100)]
net: ethernet: mtk_eth_soc: increase tx ring size for QDMA devices

In order to use the hardware traffic shaper feature, a larger tx ring is
needed, especially for the scratch ring, which the hardware shaper uses to
reorder packets.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221116080734.44013-2-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: fman: remove reference to non-existing config PCS
Lukas Bulwahn [Wed, 16 Nov 2022 10:24:50 +0000 (11:24 +0100)]
net: fman: remove reference to non-existing config PCS

Commit a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver") makes the
Freescale Data-Path Acceleration Architecture Frame Manager use lynx pcs
driver by selecting PCS_LYNX.

It also selects the non-existing config PCS, which has no effect.

Remove this select of a non-existing config.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20221116102450.13928-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonetlink: remove the flex array from struct nlmsghdr
Jakub Kicinski [Fri, 18 Nov 2022 03:39:03 +0000 (19:39 -0800)]
netlink: remove the flex array from struct nlmsghdr

I've added a flex array to struct nlmsghdr in
commit 738136a0e375 ("netlink: split up copies in the ack construction")
to allow accessing the data easily. It leads to warnings with clang,
if user space wraps this structure into another struct and the flex
array is not at the end of the container.
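
A minimal sketch of the problem and of the alternative: the header
carries no flexible array member, so user code can embed it anywhere in
a larger struct, and the payload is located by pointer arithmetic
instead. The struct and macro names are hypothetical and only loosely
modeled on a netlink-style header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message header. */
struct demo_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
	/* no `char data[];` here: that would force the header to be last */
};

/* User code may legitimately wrap the header in a larger request. */
struct demo_request {
	struct demo_hdr hdr;
	uint32_t payload;	/* fine: hdr has no trailing flex array */
};

#define DEMO_ALIGNTO ((size_t)4)
#define DEMO_ALIGN(n) (((n) + DEMO_ALIGNTO - 1) & ~(DEMO_ALIGNTO - 1))

/* The payload starts right after the aligned header. */
static void *demo_data(struct demo_hdr *h)
{
	return (char *)h + DEMO_ALIGN(sizeof(*h));
}
```

Had demo_hdr ended in a flexible array, struct demo_request would place
a member after it, which is the pattern clang warns about.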

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/all/20221114023927.GA685@u2004-local/
Link: https://lore.kernel.org/r/20221118033903.1651026-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agomrp: introduce active flags to prevent UAF when applicant uninit
Schspa Shi [Wed, 16 Nov 2022 11:45:11 +0000 (19:45 +0800)]
mrp: introduce active flags to prevent UAF when applicant uninit

The caller of del_timer_sync() must prevent the timer from being
restarted. Without this synchronization, there is a small probability
that the cancellation will not succeed.

syzbot reported the following crash:
==================================================================
BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:929 [inline]
BUG: KASAN: use-after-free in enqueue_timer+0x18/0xa4 kernel/time/timer.c:605
Write at addr f9ff000024df6058 by task syz-fuzzer/2256
Pointer tag: [f9], memory tag: [fe]

CPU: 1 PID: 2256 Comm: syz-fuzzer Not tainted 6.1.0-rc5-syzkaller-00008-ge01d50cbd6ee #0
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0xe0/0xf0 arch/arm64/kernel/stacktrace.c:156
 dump_backtrace arch/arm64/kernel/stacktrace.c:162 [inline]
 show_stack+0x18/0x40 arch/arm64/kernel/stacktrace.c:163
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x68/0x84 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:284 [inline]
 print_report+0x1a8/0x4a0 mm/kasan/report.c:395
 kasan_report+0x94/0xb4 mm/kasan/report.c:495
 __do_kernel_fault+0x164/0x1e0 arch/arm64/mm/fault.c:320
 do_bad_area arch/arm64/mm/fault.c:473 [inline]
 do_tag_check_fault+0x78/0x8c arch/arm64/mm/fault.c:749
 do_mem_abort+0x44/0x94 arch/arm64/mm/fault.c:825
 el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:367
 el1h_64_sync_handler+0xd8/0xe4 arch/arm64/kernel/entry-common.c:427
 el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:576
 hlist_add_head include/linux/list.h:929 [inline]
 enqueue_timer+0x18/0xa4 kernel/time/timer.c:605
 mod_timer+0x14/0x20 kernel/time/timer.c:1161
 mrp_periodic_timer_arm net/802/mrp.c:614 [inline]
 mrp_periodic_timer+0xa0/0xc0 net/802/mrp.c:627
 call_timer_fn.constprop.0+0x24/0x80 kernel/time/timer.c:1474
 expire_timers+0x98/0xc4 kernel/time/timer.c:1519

To fix it, introduce a new active flag to make sure the timer will
not restart.
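
The idea can be sketched with a single-threaded model. The types and
functions below are hypothetical stand-ins, not the kernel timer API or
the mrp code; real code would also have to protect the flag with the
appropriate lock.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a timer. */
struct demo_timer {
	bool pending;	/* true while the timer is armed */
};

struct demo_applicant {
	struct demo_timer timer;
	bool active;	/* cleared before teardown cancels the timer */
};

static void demo_timer_arm(struct demo_applicant *app)
{
	app->timer.pending = true;
}

/* Timer callback: re-arm only while the owner is still active. */
static void demo_timer_fn(struct demo_applicant *app)
{
	app->timer.pending = false;	/* the timer has fired */
	if (app->active)
		demo_timer_arm(app);
}

/* Teardown: clear the flag first, then cancel. */
static void demo_uninit(struct demo_applicant *app)
{
	app->active = false;
	app->timer.pending = false;	/* stands in for del_timer_sync() */
}
```

A callback that runs late, after demo_uninit(), sees active == false and
leaves the timer disarmed, so nothing touches the object after it is
freed.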

Reported-by: syzbot+6fd64001c20aa99e34a4@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge tag 'rxrpc-next-20221116' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 18 Nov 2022 12:09:20 +0000 (12:09 +0000)]
Merge tag 'rxrpc-next-20221116' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fix oops and missing config conditionals

The patches that were pulled into net-next previously[1] had some issues
that this patchset fixes:

 (1) Fix missing IPV6 config conditionals.

 (2) Fix an oops caused by calling udpv6_sendmsg() directly on an AF_INET
     socket.

 (3) Fix the validation of network addresses on entry to socket functions
     so that we don't allow an AF_INET6 address if we've selected an
     AF_INET transport socket.

Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: fix napi_disable() logic error
Eric Dumazet [Thu, 17 Nov 2022 09:26:41 +0000 (09:26 +0000)]
net: fix napi_disable() logic error

Dan reported a new warning after my recent patch:

New smatch warnings:
net/core/dev.c:6409 napi_disable() error: uninitialized symbol 'new'.

Indeed, we must first wait for STATE_SCHED and STATE_NPSVC to be cleared,
to make sure @new variable has been initialized properly.
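
The ordering requirement can be sketched in userspace with C11 atomics.
The flag names and the function are hypothetical, and
atomic_compare_exchange_weak() stands in for the kernel's try_cmpxchg();
new_val plays the role of @new.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; they mirror the commit text only loosely. */
#define DEMO_SCHED	(1UL << 0)
#define DEMO_NPSVC	(1UL << 1)
#define DEMO_THREADED	(1UL << 2)

/*
 * Set SCHED and NPSVC once both are clear, and drop THREADED.
 * new_val is computed only after the inner wait loop has observed a
 * value with both bits clear, so it is always initialized before the
 * compare-exchange uses it.
 */
static unsigned long demo_disable(_Atomic unsigned long *state)
{
	unsigned long val = atomic_load(state);
	unsigned long new_val;

	do {
		while (val & (DEMO_SCHED | DEMO_NPSVC))
			val = atomic_load(state);	/* real code would sleep here */

		new_val = val | DEMO_SCHED | DEMO_NPSVC;
		new_val &= ~DEMO_THREADED;
	} while (!atomic_compare_exchange_weak(state, &val, new_val));

	return new_val;
}
```

On a failed exchange, val is refreshed with the current state and the
wait loop runs again before new_val is recomputed.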

Fixes: 4ffa1d1c6842 ("net: adopt try_cmpxchg() in napi_{enable|disable}()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agorxrpc: uninitialized variable in rxrpc_send_ack_packet()
Dan Carpenter [Thu, 17 Nov 2022 07:44:02 +0000 (10:44 +0300)]
rxrpc: uninitialized variable in rxrpc_send_ack_packet()

The "pkt" was supposed to have been deleted in a previous patch.  It
leads to an uninitialized variable bug.

Fixes: 72f0c6fb0579 ("rxrpc: Allocate ACK records at proposal and queue for transmission")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agorxrpc: fix rxkad_verify_response()
Dan Carpenter [Thu, 17 Nov 2022 07:43:38 +0000 (10:43 +0300)]
rxrpc: fix rxkad_verify_response()

The error handling for when skb_copy_bits() fails was accidentally
deleted, so the rxkad_decrypt_ticket() function is not called.

Fixes: 5d7edbc9231e ("rxrpc: Get rid of the Rx ring")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: ethernet: mtk_eth_soc: remove cpu_relax in mtk_pending_work
Lorenzo Bianconi [Wed, 16 Nov 2022 23:58:46 +0000 (00:58 +0100)]
net: ethernet: mtk_eth_soc: remove cpu_relax in mtk_pending_work

Get rid of cpu_relax in the mtk_pending_work routine, since MTK_RESETTING
is set only in mtk_pending_work() and it runs holding the rtnl lock.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: ethernet: mtk_eth_soc: do not overwrite mtu configuration running reset routine
Lorenzo Bianconi [Wed, 16 Nov 2022 23:35:04 +0000 (00:35 +0100)]
net: ethernet: mtk_eth_soc: do not overwrite mtu configuration running reset routine

Restore the user-configured MTU when running mtk_hw_init() during the tx
timeout routine, since it will be overwritten after a hw reset.

Reported-by: Felix Fietkau <nbd@nbd.name>
Fixes: 9ea4d311509f ("net: ethernet: mediatek: add the whole ethernet reset into the reset process")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: ipa: avoid a null pointer dereference
Alex Elder [Wed, 16 Nov 2022 22:37:18 +0000 (16:37 -0600)]
net: ipa: avoid a null pointer dereference

Dan Carpenter reported that Smatch found an instance where a pointer
which had previously been assumed could be null (as indicated by a
null check) was later dereferenced without a similar check.

In practice this doesn't lead to a problem because currently the
pointers used are all non-null.  Nevertheless this patch addresses
the reported problem.

In addition, I spotted another bug that arose in the same commit.
When the command to initialize a routing table memory region was
added, the number of entries computed for the non-hashed table
was wrong (it ended up being a Boolean rather than the count
intended).  This bug is fixed here as well.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/kernel-janitors/Y3OOP9dXK6oEydkf@kili
Tested-by: Caleb Connolly <caleb.connolly@linaro.com>
Fixes: 5cb76899fb47 ("net: ipa: reduce arguments to ipa_table_init_add()")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge tag 'wireless-next-2022-11-18' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 18 Nov 2022 11:44:36 +0000 (11:44 +0000)]
Merge tag 'wireless-next-2022-11-18' of git://git./linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.2

Second set of patches for v6.2. Only driver patches this time, nothing
really special. Unused platform data support was removed from wl1251
and rtw89 got WoWLAN support.

Major changes:

ath11k

* support configuring channel dwell time during scan

rtw89

* new dynamic header firmware format support

* Wake-over-WLAN support

rtl8xxxu

* enable IEEE80211_HW_SUPPORT_FAST_XMIT
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge branch 'sctp-vrf'
David S. Miller [Fri, 18 Nov 2022 11:42:54 +0000 (11:42 +0000)]
Merge branch 'sctp-vrf'

Xin Long says:

====================
sctp: support vrf processing

This patchset adds VRF processing to SCTP. Similar to TCP/UDP,
it includes socket bind and socket/association lookup changes.

For the socket bind change, it allows sockets to bind to a VRF device
and allows multiple sockets with the same IP and PORT to bind to
different interfaces in patches 1-3.

For the socket/association lookup change, it adds dif and sdif checks
to both asoc and ep lookup in patches 4 and 5, and when binding to
nodev, users can decide whether to accept packets received from an
l3mdev by setting a sysctl option in patch 6.

Note with VRF support, in a netns, an association will be decided
by src ip + src port + dst ip + dst port + bound_dev_if, and it's
possible for ss to have:

  State       Local Address:Port      Peer Address:Port
   ESTAB     192.168.1.2%vrf-s1:1234
   `- ESTAB   192.168.1.2%veth1:1234   192.168.1.1:1234
   ESTAB     192.168.1.2%vrf-s2:1234
   `- ESTAB   192.168.1.2%veth2:1234   192.168.1.1:1234

See the selftest in patch 7 for more usage.

Also, thanks to Carlo for testing this patch series in their use case.

v1->v2:
  - In Patch 5, move sctp_sk_bound_dev_eq() definition to net/sctp/
    input.c to avoid a build error when IP_SCTP is disabled, as Paolo
    suggested.
  - In Patch 7, avoid one sleep by disabling the IPv6 dad, and remove
    another sleep by using ss to check if the server's ready, and also
    delete two unnecessary sleeps in sctp_hello.c, as Paolo suggested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoselftests: add a selftest for sctp vrf
Xin Long [Wed, 16 Nov 2022 20:01:22 +0000 (15:01 -0500)]
selftests: add a selftest for sctp vrf

This patch adds 12 small test cases: 01-04 test the sysctl
net.sctp.l3mdev_accept. 05-10 test that the connection can be
created only when binding to the right l3mdev device. 11-12 test
that when two sockets bind to different l3mdev devices at the same
time, each of them can process the packets from the corresponding
peer. The tests run for both IPv4 and IPv6 SCTP.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agosctp: add sysctl net.sctp.l3mdev_accept
Xin Long [Wed, 16 Nov 2022 20:01:21 +0000 (15:01 -0500)]
sctp: add sysctl net.sctp.l3mdev_accept

This patch is to add sysctl net.sctp.l3mdev_accept to allow
users to change the pernet global l3mdev_accept.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agosctp: add dif and sdif check in asoc and ep lookup
Xin Long [Wed, 16 Nov 2022 20:01:20 +0000 (15:01 -0500)]
sctp: add dif and sdif check in asoc and ep lookup

This patch first adds a pernet global l3mdev_accept to decide whether
to accept packets from an l3mdev when an SCTP socket isn't bound to
any interface. It's set to 1 to avoid any possible incompatibility,
and in the next patch, a sysctl will be introduced to allow changing it.

Then, similar to inet/udp_sk_bound_dev_eq(), sctp_sk_bound_dev_eq() is
added to check whether either dif or sdif is equal to sk_bound_dev_if,
and to check whether sdif is 0 or l3mdev_accept is 1 if sk_bound_dev_if
is not set. This function is used to match an association or an
endpoint, namely when called by sctp_addrs_lookup_transport() and
sctp_endpoint_is_match(). All functions that need updating are:

sctp_rcv():
  asoc:
  __sctp_rcv_lookup()
    __sctp_lookup_association() -> sctp_addrs_lookup_transport()
    __sctp_rcv_lookup_harder()
      __sctp_rcv_init_lookup()
         __sctp_lookup_association() -> sctp_addrs_lookup_transport()
      __sctp_rcv_walk_lookup()
         __sctp_rcv_asconf_lookup()
           __sctp_lookup_association() -> sctp_addrs_lookup_transport()

  ep:
  __sctp_rcv_lookup_endpoint() -> sctp_endpoint_is_match()

sctp_connect():
  sctp_endpoint_is_peeled_off()
    __sctp_lookup_association()
      sctp_has_association()
        sctp_lookup_association()
          __sctp_lookup_association() -> sctp_addrs_lookup_transport()

sctp_diag_dump_one():
  sctp_transport_lookup_process() -> sctp_addrs_lookup_transport()
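
The bound-device check described above can be modeled in userspace as
follows. This is a sketch of the stated rule only; the function name and
the plain-int parameters are hypothetical and do not reproduce the
kernel helper's exact signature.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check.
 *  bound_dev_if: interface index the socket is bound to, 0 if unbound
 *  dif:          ingress interface index seen by the lookup
 *  sdif:         index of the enslaved ingress device when the packet
 *                came through an l3mdev, otherwise 0
 */
static bool demo_bound_dev_eq(bool l3mdev_accept, int bound_dev_if,
			      int dif, int sdif)
{
	/* Unbound socket: accept non-l3mdev traffic, or anything if allowed. */
	if (!bound_dev_if)
		return !sdif || l3mdev_accept;

	/* Bound socket: either index must match the bound interface. */
	return bound_dev_if == dif || bound_dev_if == sdif;
}
```

For example, an unbound socket with l3mdev_accept disabled rejects a
packet that arrived through an l3mdev (sdif != 0), while a socket bound
to either of the two indexes matches it.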

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agosctp: add skb_sdif in struct sctp_af
Xin Long [Wed, 16 Nov 2022 20:01:19 +0000 (15:01 -0500)]
sctp: add skb_sdif in struct sctp_af

Add skb_sdif function in struct sctp_af to get the enslaved device
for both ipv4 and ipv6 when adding SCTP VRF support in sctp_rcv in
the next patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agosctp: check sk_bound_dev_if when matching ep in get_port
Xin Long [Wed, 16 Nov 2022 20:01:18 +0000 (15:01 -0500)]
sctp: check sk_bound_dev_if when matching ep in get_port

In sctp_get_port_local(), when binding to IP and PORT, it should
also check sk_bound_dev_if to match the listening sk if it's set by
SO_BINDTOIFINDEX, so that multiple sockets with the same IP and
PORT but different sk_bound_dev_if can listen at the same time.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agosctp: check ipv6 addr with sk_bound_dev if set
Xin Long [Wed, 16 Nov 2022 20:01:17 +0000 (15:01 -0500)]
sctp: check ipv6 addr with sk_bound_dev if set

When binding to an ipv6 address, it calls ipv6_chk_addr() to check if
this address is on any dev. If a socket binds to a l3mdev but no dev
is passed to do this check, all l3mdev and slaves will be skipped and
the check will fail.

This patch passes the bound_dev to make sure the devices under the
same l3mdev can be returned in ipv6_chk_addr(). When the bound_dev is
not a l3mdev or l3slave, l3mdev_master_dev_rcu() will return NULL in
__ipv6_chk_addr_and_flags(), so it stays compatible with the previous
behavior, when a NULL dev was passed.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agosctp: verify the bind address with the tb_id from l3mdev
Xin Long [Wed, 16 Nov 2022 20:01:16 +0000 (15:01 -0500)]
sctp: verify the bind address with the tb_id from l3mdev

After binding to a l3mdev, it should use the route table from the
corresponding VRF to verify the addr when binding to an address.

Note ipv6 doesn't need it, as binding to ipv6 address does not
verify the addr with route lookup.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: libwx: Fix dead code for duplicate check
Jiawen Wu [Wed, 16 Nov 2022 01:58:35 +0000 (09:58 +0800)]
net: libwx: Fix dead code for duplicate check

Fix duplicate check on polling timeout.

Fixes: 1efa9bfe58c5 ("net: libwx: Implement interaction with firmware")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agonet: phy: mscc: macsec: do not copy encryption keys
Antoine Tenart [Tue, 15 Nov 2022 15:44:51 +0000 (16:44 +0100)]
net: phy: mscc: macsec: do not copy encryption keys

Following 1b16b3fdf675 ("net: phy: mscc: macsec: clear encryption keys when freeing a flow"),
go one step further and instead of calling memzero_explicit on the key
when freeing a flow, simply do not copy the key in the first place as
it's only used when a new flow is set up.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 months agoMerge branch 'net-ipa-change-gsi-firmware-load-specification'
Jakub Kicinski [Fri, 18 Nov 2022 05:46:58 +0000 (21:46 -0800)]
Merge branch 'net-ipa-change-gsi-firmware-load-specification'

Alex Elder says:

====================
net: ipa: change GSI firmware load specification

Currently, GSI firmware must be loaded for IPA before it can be
used--either by the modem, or by the AP.  New hardware supports a
third option, with the bootloader taking responsibility for loading
GSI firmware.  In that case, neither the AP nor the modem needs to
do that.

The first patch in this series deprecates the "modem-init" Device
Tree property in the IPA binding, using a new "qcom,gsi-loader"
property instead.  The second and third implement logic in the code
to support either the "old" or the "new" way of specifying how GSI
firmware is loaded.

The last two patches implement a new value for the "qcom,gsi-loader"
property.  If the value is "skip", neither the AP nor modem needs to
load the GSI firmware.  The first of these patches implements the
change in the IPA binding; the second implements it in the code.
====================

Link: https://lore.kernel.org/r/20221116073257.34010-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: ipa: permit GSI firmware loading to be skipped
Alex Elder [Wed, 16 Nov 2022 07:32:56 +0000 (01:32 -0600)]
net: ipa: permit GSI firmware loading to be skipped

Define a new value "skip" for the "qcom,gsi-loader" Device Tree
property.  If used, it indicates that neither the AP nor the modem
need to load GSI firmware (because it has already been loaded--for
example by the boot loader).

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agodt-bindings: net: qcom,ipa: support skipping GSI firmware load
Alex Elder [Wed, 16 Nov 2022 07:32:55 +0000 (01:32 -0600)]
dt-bindings: net: qcom,ipa: support skipping GSI firmware load

Add a new enumerated value to those defined for the qcom,gsi-loader
property.  If the qcom,gsi-loader is "skip", the GSI firmware will
already be loaded, so neither the AP nor modem is required to load
GSI firmware.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: ipa: introduce "qcom,gsi-loader" property
Alex Elder [Wed, 16 Nov 2022 07:32:54 +0000 (01:32 -0600)]
net: ipa: introduce "qcom,gsi-loader" property

Introduce a new way of specifying how the GSI firmware gets loaded
for IPA.  Currently, this is indicated by the presence or absence of
the Boolean "modem-init" Device Tree property.  The new property
must have a value--either "self" or "modem"--which indicates whether
the AP or modem is the GSI firmware loader, respectively.

For legacy systems, the new property will not exist, and the
"modem-init" property will be used.  For newer systems, the
"qcom,gsi-loader" property *must* exist, and must have one of the
two prescribed values.  It is an error to have both properties
defined, and it is an error for the new property to have an
unrecognized value.
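
The rules above can be sketched as a small decision helper. The enum,
the function name, and the way the properties are passed in are
hypothetical; the sketch covers only the two values defined by this
patch and is not the driver's implementation.

```c
#include <stddef.h>
#include <string.h>

enum demo_loader {
	DEMO_LOADER_INVALID,	/* conflicting or unrecognized configuration */
	DEMO_LOADER_SELF,	/* the AP loads GSI firmware */
	DEMO_LOADER_MODEM,	/* the modem loads GSI firmware */
};

/*
 * Hypothetical decision helper.
 *  gsi_loader: value of "qcom,gsi-loader", or NULL if the property is absent
 *  modem_init: nonzero if the legacy Boolean "modem-init" property is present
 */
static enum demo_loader demo_firmware_loader(const char *gsi_loader,
					     int modem_init)
{
	if (gsi_loader) {
		if (modem_init)
			return DEMO_LOADER_INVALID;	/* both properties defined */
		if (!strcmp(gsi_loader, "self"))
			return DEMO_LOADER_SELF;
		if (!strcmp(gsi_loader, "modem"))
			return DEMO_LOADER_MODEM;
		return DEMO_LOADER_INVALID;	/* unrecognized value */
	}

	/* Legacy systems: fall back to the Boolean property. */
	return modem_init ? DEMO_LOADER_MODEM : DEMO_LOADER_SELF;
}
```

Keeping the decision in one function means a further loader value only
needs one more string comparison and one more enum entry.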

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agonet: ipa: encapsulate decision about firmware load
Alex Elder [Wed, 16 Nov 2022 07:32:53 +0000 (01:32 -0600)]
net: ipa: encapsulate decision about firmware load

The GSI layer used for IPA requires firmware to be loaded.

Currently either the AP or the modem loads the firmware,
distinguished by whether the "modem-init" Device Tree
property is defined.

Some newer systems implement a third option.  In preparation for
that, encapsulate the code that determines how the GSI firmware
gets loaded in a new function, ipa_firmware_loader().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agodt-bindings: net: qcom,ipa: deprecate modem-init
Alex Elder [Wed, 16 Nov 2022 07:32:52 +0000 (01:32 -0600)]
dt-bindings: net: qcom,ipa: deprecate modem-init

GSI firmware for IPA must be loaded during initialization, either by
the AP or by the modem.  The loader is currently specified based on
whether the Boolean modem-init property is present.

Instead, use a new property with an enumerated value to indicate
explicitly how GSI firmware gets loaded.  With this in place, a
third approach can be added in an upcoming patch.

The new qcom,gsi-loader property has two defined values:
  - self:   The AP loads GSI firmware
  - modem:  The modem loads GSI firmware
The modem-init property must still be supported, but is now marked
deprecated.

Update the example so it represents the SC7180 SoC, and provide
examples for the qcom,gsi-loader, memory-region, and firmware-name
properties.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agosctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.h
Xin Long [Tue, 15 Nov 2022 15:40:21 +0000 (10:40 -0500)]
sctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.h

Move these two macros from net/sctp/sctp.h to linux/sctp.h, so that
it will be enough to include only linux/sctp.h in nft_exthdr.c and
xt_sctp.c. A module should not include "net/sctp/sctp.h" if it does
not depend on the SCTP module.
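
For readers unfamiliar with the two macros, the sketch below shows the
rounding they perform. The DEMO_ names and the helper are local to this
example; the definitions are assumed to be equivalent to SCTP_PAD4 and
SCTP_TRUNC4, which round a length to a 4-byte boundary.

```c
/* Round a length up to the next multiple of 4. */
#define DEMO_PAD4(s)	(((s) + 3) & ~3)
/* Truncate a length down to the previous multiple of 4. */
#define DEMO_TRUNC4(s)	((s) & ~3)

/* Number of padding bytes needed after a chunk of `len` bytes. */
static int demo_padding_bytes(int len)
{
	return DEMO_PAD4(len) - len;
}
```

A 6-byte chunk, for instance, is padded to 8 bytes, so two padding bytes
follow it.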

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Link: https://lore.kernel.org/r/ef6468a687f36da06f575c2131cd4612f6b7be88.1668526821.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agosctp: change to include linux/sctp.h in net/sctp/checksum.h
Xin Long [Tue, 15 Nov 2022 15:39:53 +0000 (10:39 -0500)]
sctp: change to include linux/sctp.h in net/sctp/checksum.h

Currently "net/sctp/checksum.h", which includes "net/sctp/sctp.h", is
included in quite a few places in netfilter, openvswitch and
net/sched. It's not necessary to include "net/sctp/sctp.h" if
a module does not depend on SCTP; "linux/sctp.h" is
the right one to include.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Link: https://lore.kernel.org/r/ca7ea96d62a26732f0491153c3979dc1c0d8d34a.1668526793.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months agoMerge branch 'implement-devlink-rate-api-and-extend-it'
Jakub Kicinski [Fri, 18 Nov 2022 05:41:41 +0000 (21:41 -0800)]
Merge branch 'implement-devlink-rate-api-and-extend-it'

Michal Wilczynski says:

====================
Implement devlink-rate API and extend it

This patch series implements devlink-rate for the ice driver.
Unfortunately, the current API isn't flexible enough for our use case,
so there is a need to extend it. Some functions have been introduced to
enable the driver to export the current Tx scheduling configuration.

Pasting the justification for this series from the commit implementing
devlink-rate in the ice driver (which is part of this series):

There is a need to support modification of the Tx scheduler tree in the
ice driver. This will allow the user to control the Tx settings of each
node in the internal hierarchy of nodes. As a result, the user will be
able to use Hierarchy QoS implemented entirely in the hardware.

This patch implements the devlink-rate API. It also exports the initial
default hierarchy. This is mostly dictated by the fact that the tree
can't be removed entirely; all we can do is enable the user to modify
it. For example, the root node shouldn't ever be removed, and nodes that
have children are also off-limits.

Example initial tree with 2 VF's:

[root@fedora ~]# devlink port function rate show
pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27

Let me visualize part of the tree:

                        +---------+
                        |  node_0 |
                        +---------+
                             |
                        +----v----+
                        | node_26 |
                        +----+----+
                             |
                        +----v----+
                        | node_27 |
                        +----+----+
                             |
                    |-----------------|
               +----v----+       +----v----+
               |   VF 1  |       |   VF 2  |
               +----+----+       +----+----+

So at this point there are a couple of things that can be done.
For example, we could assign parameters only to the VFs.

[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
                 tx_max 5Gbps

This would cap the VF 1 BW to 5Gbps.

But let's say you would like to create a completely new branch.
This can be done like this:

[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
                 pci/0000:4b:00.0/1 parent node_custom_1

This creates a completely new branch and reassigns VF 1 to it.

Several parameters are supported for each node: tx_max, tx_share,
tx_priority and tx_weight.
====================

Link: https://lore.kernel.org/r/20221115104825.172668-1-michal.wilczynski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago Documentation: Add documentation for new devlink-rate attributes
Michal Wilczynski [Tue, 15 Nov 2022 10:48:25 +0000 (11:48 +0100)]
Documentation: Add documentation for new devlink-rate attributes

Provide documentation for newly introduced netlink attributes for
devlink-rate: tx_priority and tx_weight.

Mention the possibility to export tree from the driver.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago ice: Add documentation for devlink-rate implementation
Michal Wilczynski [Tue, 15 Nov 2022 10:48:24 +0000 (11:48 +0100)]
ice: Add documentation for devlink-rate implementation

Add documentation to a newly added devlink-rate feature. Provide some
examples on how to use the commands, which netlink attributes are
supported and descriptions of the attributes.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago ice: Prevent ADQ, DCB coexistence with Custom Tx scheduler
Michal Wilczynski [Tue, 15 Nov 2022 10:48:23 +0000 (11:48 +0100)]
ice: Prevent ADQ, DCB coexistence with Custom Tx scheduler

ADQ and DCB might interfere with custom Tx scheduler changes that the
user introduces through the devlink-rate API.

Check whether ADQ or DCB is active when the user tries to change any
setting in the exported Tx scheduler tree. If either of them is active,
block the user from doing so and log an appropriate message.

Remove the exported hierarchy if the user enables ADQ or DCB.
Prevent ADQ or DCB from getting configured if the user has already made
some changes using the devlink-rate API.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago ice: Implement devlink-rate API
Michal Wilczynski [Tue, 15 Nov 2022 10:48:22 +0000 (11:48 +0100)]
ice: Implement devlink-rate API

There is a need to support modification of the Tx scheduler tree in the
ice driver. This will allow the user to control the Tx settings of each
node in the internal hierarchy of nodes. As a result, the user will be
able to use Hierarchy QoS implemented entirely in the hardware.

This patch implements the devlink-rate API. It also exports the initial
default hierarchy. This is mostly dictated by the fact that the tree
can't be removed entirely; all we can do is enable the user to modify
it. For example, the root node shouldn't ever be removed, and nodes that
have children are also off-limits.

Example initial tree with 2 VF's:

[root@fedora ~]# devlink port function rate show

pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27

Let me visualize part of the tree:

                    +---------+
                    |  node_0 |
                    +---------+
                         |
                    +----v----+
                    | node_26 |
                    +----+----+
                         |
                    +----v----+
                    | node_27 |
                    +----+----+
                         |
                |-----------------|
           +----v----+       +----v----+
           |   VF 1  |       |   VF 2  |
           +----+----+       +----+----+

So at this point there are a couple of things that can be done.
For example, we could assign parameters only to the VFs.

[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
                 tx_max 5Gbps

This would cap the VF 1 BW to 5Gbps.

But let's say you would like to create a completely new branch.
This can be done like this:

[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
                 pci/0000:4b:00.0/1 parent node_custom_1

This creates a completely new branch and reassigns VF 1 to it.

Several parameters are supported for each node: tx_max, tx_share,
tx_priority and tx_weight.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago ice: Add an option to pre-allocate memory for ice_sched_node
Michal Wilczynski [Tue, 15 Nov 2022 10:48:21 +0000 (11:48 +0100)]
ice: Add an option to pre-allocate memory for ice_sched_node

The devlink-rate API requires a priv object to be allocated while the
node still doesn't have a parent. This is problematic, because
ice_sched_node currently can't be created without a parent.

Add an option to pre-allocate memory for the ice_sched_node struct. Add
new arguments to ice_sched_add() and ice_sched_add_elems() that allow
for pre-allocation of memory for the ice_sched_node struct.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago ice: Introduce new parameters in ice_sched_node
Michal Wilczynski [Tue, 15 Nov 2022 10:48:20 +0000 (11:48 +0100)]
ice: Introduce new parameters in ice_sched_node

To support the new devlink-rate API, the ice_sched_node struct needs to
store a number of additional parameters. These include tx_max,
tx_share, tx_weight, and tx_priority.

Add new fields to the ice_sched_node struct. Add new functions to
configure the hardware with the new parameters. Introduce a new xarray
to identify nodes uniquely.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago devlink: Allow to set up parent in devl_rate_leaf_create()
Michal Wilczynski [Tue, 15 Nov 2022 10:48:19 +0000 (11:48 +0100)]
devlink: Allow to set up parent in devl_rate_leaf_create()

Currently the driver is able to create leaf nodes for devlink-rate,
but is unable to set a parent for them. This wasn't an issue before it
became possible to export the hierarchy from the driver. With the export
feature added, the driver has to be able to supply a parent name to
devl_rate_leaf_create() in order to supply the correct hierarchy.

Introduce a new parameter 'parent_name' in devl_rate_leaf_create().

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago devlink: Allow for devlink-rate nodes parent reassignment
Michal Wilczynski [Tue, 15 Nov 2022 10:48:18 +0000 (11:48 +0100)]
devlink: Allow for devlink-rate nodes parent reassignment

Currently it's not possible to reassign the parent of a node using one
command. As the previous commit introduced a way to export the entire
hierarchy from the driver, being able to modify and reassign parents
becomes important. This way the user can easily change QoS settings
without interrupting traffic.

Example command:
devlink port function rate set pci/0000:4b:00.0/1 parent node_custom_1

This reassigns the leaf node's parent to node_custom_1.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago devlink: Enable creation of the devlink-rate nodes from the driver
Michal Wilczynski [Tue, 15 Nov 2022 10:48:17 +0000 (11:48 +0100)]
devlink: Enable creation of the devlink-rate nodes from the driver

The internal firmware hierarchy for Hierarchical QoS on Intel 100G cards
is very rigid and can't be easily removed. This requires the ability to
export the default hierarchy so that the user can modify it. Currently
the driver is only able to create 'leaf' nodes, which usually represent
the vport. This is not enough for HQoS implemented in Intel hardware.

Introduce new function devl_rate_node_create() that allows for creation
of the devlink-rate nodes from the driver.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago devlink: Introduce new attribute 'tx_weight' to devlink-rate
Michal Wilczynski [Tue, 15 Nov 2022 10:48:16 +0000 (11:48 +0100)]
devlink: Introduce new attribute 'tx_weight' to devlink-rate

To fully utilize the QoS offload capabilities of the Intel 100G card, a
new attribute 'tx_weight' needs to be introduced. This attribute allows
for usage of the Weighted Fair Queuing arbitration scheme among siblings.
This arbitration scheme can be used simultaneously with strict
priority.

Introduce a new attribute in devlink-rate that allows for configuration
of Weighted Fair Queuing. The new attribute is optional.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago devlink: Introduce new attribute 'tx_priority' to devlink-rate
Michal Wilczynski [Tue, 15 Nov 2022 10:48:15 +0000 (11:48 +0100)]
devlink: Introduce new attribute 'tx_priority' to devlink-rate

To fully utilize the QoS offload capabilities of the Intel 100G card, a
new attribute 'tx_priority' needs to be introduced. This attribute
allows for usage of a strict priority arbiter among siblings. This
arbitration scheme attempts to schedule nodes based on their priority as
long as the nodes remain within their bandwidth limit.

Introduce a new attribute in devlink-rate that allows for configuration
of strict priority. The new attribute is optional.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago Merge branch 'autoload-dsa-tagging-driver-when-dynamically-changing-protocol'
Jakub Kicinski [Fri, 18 Nov 2022 05:16:46 +0000 (21:16 -0800)]
Merge branch 'autoload-dsa-tagging-driver-when-dynamically-changing-protocol'

Vladimir Oltean says:

====================
Autoload DSA tagging driver when dynamically changing protocol

This patch set solves the issue reported by Michael and Heiko here:
https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
making full use of Michael's suggestion of having two modaliases: one
gets used for loading the tagging protocol when it's the default one
reported by the switch driver, the other gets loaded at user's request,
by name.

  # modinfo tag_ocelot
  filename:       /lib/modules/6.1.0-rc4+/kernel/net/dsa/tag_ocelot.ko
  license:        GPL v2
  alias:          dsa_tag:seville
  alias:          dsa_tag:id-21
  alias:          dsa_tag:ocelot
  alias:          dsa_tag:id-15
  depends:        dsa_core
  intree:         Y
  name:           tag_ocelot
  vermagic:       6.1.0-rc4+ SMP preempt mod_unload modversions aarch64

Tested on NXP LS1028A-RDB with the following device tree addition:

&mscc_felix_port4 {
	dsa-tag-protocol = "ocelot-8021q";
};

&mscc_felix_port5 {
	dsa-tag-protocol = "ocelot-8021q";
};

CONFIG_NET_DSA and everything that depends on it are built as modules.
Everything auto-loads, and "cat /sys/class/net/eno2/dsa/tagging" shows
"ocelot-8021q". Traffic works as well. Furthermore, echoing
"ocelot-8021q" into the aforementioned sysfs file now auto-loads the
driver for it.
====================

Link: https://lore.kernel.org/r/20221115011847.2843127-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: dsa: autoload tag driver module on tagging protocol change
Vladimir Oltean [Tue, 15 Nov 2022 01:18:47 +0000 (03:18 +0200)]
net: dsa: autoload tag driver module on tagging protocol change

Issue a request_module() call when an attempt to change the tagging
protocol is made, either by sysfs or by device tree. In the case of
ocelot (the only driver for which the default and the alternative
tagging protocol are compiled as different modules), the user is now no
longer required to insert tag_ocelot_8021q.ko manually.

In the particular case of ocelot, this solves a problem where
tag_ocelot_8021q.ko is built as a module, and this is present in the
device tree:

&mscc_felix_port4 {
	dsa-tag-protocol = "ocelot-8021q";
};

&mscc_felix_port5 {
	dsa-tag-protocol = "ocelot-8021q";
};

Because no one attempts to load the module into the kernel at boot time,
the switch driver will fail to probe (actually forever defer) until
someone manually inserts tag_ocelot_8021q.ko. This is now no longer
necessary and happens automatically.

Rename dsa_find_tagger_by_name() to denote the change in functionality:
there is now feature parity with dsa_tag_driver_get_by_id(); in other
words, we also load the module if it's missing.

Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()
Vladimir Oltean [Tue, 15 Nov 2022 01:18:46 +0000 (03:18 +0200)]
net: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()

A future patch will introduce one more way of getting a reference on a
tagging protocol driver (by name). Rename the current method to "by_id".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: dsa: strip sysfs "tagging" string of trailing newline
Vladimir Oltean [Tue, 15 Nov 2022 01:18:45 +0000 (03:18 +0200)]
net: dsa: strip sysfs "tagging" string of trailing newline

Currently, dsa_find_tagger_by_name() uses sysfs_streq() which works both
with strings that contain \n at the end (echo ocelot > .../dsa/tagging)
and with strings that don't (printf ocelot > .../dsa/tagging).

There will be a problem once we want to construct the modalias string
from which we auto-load the protocol kernel module. If the sysfs
buffer ends in a newline, we need to strip it first. This is a
preparatory patch specifically for that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: dsa: provide a second modalias to tag proto drivers based on their name
Vladimir Oltean [Tue, 15 Nov 2022 01:18:44 +0000 (03:18 +0200)]
net: dsa: provide a second modalias to tag proto drivers based on their name

Currently, tagging protocol drivers have a modalias of
"dsa_tag:id-<number>", where the number is one of DSA_TAG_PROTO_*_VALUE.

This modalias makes it possible for the request_module() call in
dsa_tag_driver_get() to work, given the input it has - an integer
returned by ds->ops->get_tag_protocol().

It is also possible to change tagging protocols at (pseudo-)runtime, via
sysfs or via device tree, and this works via the name string of the
tagging protocol rather than via its id (DSA_TAG_PROTO_*_VALUE).

In the latter case, there is no request_module() call, because there is
no association that the DSA core has between the string name and the ID,
to construct the modalias. The module is simply assumed to have been
inserted. This is actually slightly problematic when the tagging
protocol change should take place at probe time, since it's expected
that the dependency module should get autoloaded.

For this purpose, let's introduce a second modalias, so that the DSA
core can call request_module() by name. There is no reason to make the
modalias by name optional, so just modify the MODULE_ALIAS_DSA_TAG_DRIVER()
macro to take both the ID and the name as arguments, and generate two
modaliases behind the scenes.

Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: dsa: rename tagging protocol driver modalias
Vladimir Oltean [Tue, 15 Nov 2022 01:18:43 +0000 (03:18 +0200)]
net: dsa: rename tagging protocol driver modalias

It's autumn cleanup time, and today's targets are the modaliases.

Michael says that for users of modinfo, "dsa_tag-20" is not the most
suggestive name, and recommends a change to "dsa_tag-id-20".

Andrew points out that other modaliases have a prefix delimited by
colons, so he recommends "dsa_tag:20" instead of "dsa_tag-20".

To satisfy both proposals, Florian recommends "dsa_tag:id-20".

The modaliases are not stable ABI, and the essential information
(protocol ID) is still conveyed in the new string, which
request_module() must be adapted to form.

Link: 20221027210830.3577793-1-vladimir.oltean@nxp.com
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Michael Walle <michael@walle.cc>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago net: dsa: stop exposing tag proto module helpers to the world
Vladimir Oltean [Tue, 15 Nov 2022 01:18:42 +0000 (03:18 +0200)]
net: dsa: stop exposing tag proto module helpers to the world

The DSA tagging protocol driver macros are in the public include/net/dsa.h
probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are
(MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions).

But there is no reason to expose these helpers to <net/dsa.h>. That
header is shared between switch drivers (drivers/net/dsa/), tagging
protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c),
and the rest of the world (DSA master drivers, network stack, etc).
Too much exposure.

On the other hand, net/dsa/dsa_priv.h is included only by the DSA core
and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a
bit too much exposure - I've contemplated creating a new header which is
only included by tagging protocol drivers, but completely separating a
new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for
example dsa_slave_to_port() is used both from the fast path and from the
control path.

So for now, move these definitions to dsa_priv.h which at least hides
them from the world.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
19 months ago dt-bindings: net: ipq4019-mdio: document required clock-names
Robert Marko [Mon, 14 Nov 2022 19:47:33 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: document required clock-names

IPQ5018, IPQ6018 and IPQ8074 require clock-names to be set, as the driver
requests the clock by name and not by index, so document that and make
it required for the listed SoCs.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-4-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>