Geliang Tang [Sat, 13 Mar 2021 01:16:11 +0000 (17:16 -0800)]
mptcp: add rm_list in mptcp_out_options
This patch defined a new struct mptcp_rm_list, the ids field was an
array of the removing address ids, the nr field was the valid number of
removing address ids in the array. The array size was definced as a new
macro MPTCP_RM_IDS_MAX. Changed the member rm_id of struct
mptcp_out_options to rm_list.
In mptcp_established_options_rm_addr, invoked mptcp_pm_rm_addr_signal to
get the rm_list. According the number of addresses in it, calculated
the padded RM_ADDR suboption length. And saved the ids array in struct
mptcp_out_options's rm_list member.
In mptcp_write_options, iterated each address id from struct
mptcp_out_options's rm_list member, set the invalid ones as TCPOPT_NOP,
then filled them into the RM_ADDR suboption.
Changed TCPOLEN_MPTCP_RM_ADDR_BASE from 4 to 3.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 Mar 2021 01:44:10 +0000 (17:44 -0800)]
Merge branch 'resil-nhgroups-netdevsim-selftests'
Petr Machata says:
====================
net: Resilient NH groups: netdevsim, selftests
Support for resilient next-hop groups was added in a previous patch set.
Resilient next hop groups add a layer of indirection between the SKB hash
and the next hop. Thus the hash is used to reference a hash table bucket,
which is then used to reference a particular next hop. This allows the
system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.
This patch set introduces mock offloading of resilient next hop groups by
the netdevsim driver, and a suite of selftests.
- Patch #1 adds a netdevsim-specific lock to protect next-hop hashtable.
Previously, netdevsim relied on RTNL to maintain mutual exclusion.
Patch #2 extracts a helper to make the following patches clearer.
- Patch #3 implements the support for offloading of resilient next-hop
groups.
- Patch #4 introduces a new debugfs interface to set activity on a selected
next-hop bucket. This simulates how HW can periodically report bucket
activity, and buckets thus marked are expected to be exempt from
migration to new next hops when the group changes.
- Patches #5 and #6 clean up the fib_nexthop selftests.
- Patches #7, #8 and #9 add tests for resilient next hop groups. Patch #7
adds resilient-hashing counterparts to fib_nexthops.sh. Patch #8 adds a
new traffic test for resilient next-hop groups. Patch #9 adds a new
traffic test for tunneling.
- Patch #10 actually leverages the netdevsim offload to implement a suite
of algorithmic tests that verify how and when buckets are migrated under
various simulated workload scenarios.
The overall plan is to contribute approximately the following patchsets:
1) Nexthop policy refactoring (already pushed)
2) Preparations for resilient next hop groups (already pushed)
3) Implementation of resilient next hop group (already pushed)
4) Netdevsim offload plus a suite of selftests (this patchset)
5) Preparations for mlxsw offload of resilient next-hop groups
6) mlxsw offload including selftests
Interested parties can look at the complete code at [2].
[1] https://tools.ietf.org/html/rfc2992
[2] https://github.com/idosch/linux/commits/submit/res_integ_v1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:26 +0000 (17:50 +0100)]
selftests: netdevsim: Add test for resilient nexthop groups offload API
Test various aspects of the resilient nexthop group offload API on top
of the netdevsim implementation. Both good and bad flows are tested.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:25 +0000 (17:50 +0100)]
selftests: forwarding: Add resilient multipath tunneling nexthop test
Add a resilient nexthop objects version of gre_multipath_nh.sh. Test
that both IPv4 and IPv6 overlays work with resilient nexthop groups
where the nexthops are two GRE tunnels.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:24 +0000 (17:50 +0100)]
selftests: forwarding: Add resilient hashing test
Verify that IPv4 and IPv6 multipath forwarding works correctly with
resilient nexthop groups and with different weights.
Test that when the idle timer is not zero, the resilient groups are not
rebalanced - because the nexthop buckets are considered active - and the
initial weights (1:1) are used.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:23 +0000 (17:50 +0100)]
selftests: fib_nexthops: Test resilient nexthop groups
Add test cases for resilient nexthop groups. Exhaustive forwarding tests
are added separately under net/forwarding/.
Examples:
# ./fib_nexthops.sh -t basic_res
Basic resilient nexthop group functional tests
----------------------------------------------
TEST: Add a nexthop group with default parameters [ OK ]
TEST: Get a nexthop group with default parameters [ OK ]
TEST: Get a nexthop group with non-default parameters [ OK ]
TEST: Add a nexthop group with 0 buckets [ OK ]
TEST: Replace nexthop group parameters [ OK ]
TEST: Get a nexthop group after replacing parameters [ OK ]
TEST: Replace idle timer [ OK ]
TEST: Get a nexthop group after replacing idle timer [ OK ]
TEST: Replace unbalanced timer [ OK ]
TEST: Get a nexthop group after replacing unbalanced timer [ OK ]
TEST: Replace with no parameters [ OK ]
TEST: Get a nexthop group after replacing no parameters [ OK ]
TEST: Replace nexthop group type - implicit [ OK ]
TEST: Replace nexthop group type - explicit [ OK ]
TEST: Replace number of nexthop buckets [ OK ]
TEST: Get a nexthop group after replacing with invalid parameters [ OK ]
TEST: Dump all nexthop buckets [ OK ]
TEST: Dump all nexthop buckets in a group [ OK ]
TEST: Dump all nexthop buckets with a specific nexthop device [ OK ]
TEST: Dump all nexthop buckets with a specific nexthop identifier [ OK ]
TEST: Dump all nexthop buckets in a non-existent group [ OK ]
TEST: Dump all nexthop buckets in a non-resilient group [ OK ]
TEST: Dump all nexthop buckets using a non-existent device [ OK ]
TEST: Dump all nexthop buckets with invalid 'groups' keyword [ OK ]
TEST: Dump all nexthop buckets with invalid 'fdb' keyword [ OK ]
TEST: Get a valid nexthop bucket [ OK ]
TEST: Get a nexthop bucket with valid group, but invalid index [ OK ]
TEST: Get a nexthop bucket from a non-resilient group [ OK ]
TEST: Get a nexthop bucket from a non-existent group [ OK ]
Tests passed: 29
Tests failed: 0
# ./fib_nexthops.sh -t ipv4_large_res_grp
IPv4 large resilient group (128k buckets)
-----------------------------------------
TEST: Dump large (x131072) nexthop buckets [ OK ]
Tests passed: 1
Tests failed: 0
# ./fib_nexthops.sh -t ipv6_large_res_grp
IPv6 large resilient group (128k buckets)
-----------------------------------------
TEST: Dump large (x131072) nexthop buckets [ OK ]
Tests passed: 1
Tests failed: 0
# ./fib_nexthops.sh -t ipv4_res_torture
IPv4 runtime resilient nexthop group torture
--------------------------------------------
TEST: IPv4 resilient nexthop group torture test [ OK ]
Tests passed: 1
Tests failed: 0
# ./fib_nexthops.sh -t ipv6_res_torture
IPv6 runtime resilient nexthop group torture
--------------------------------------------
TEST: IPv6 resilient nexthop group torture test [ OK ]
Tests passed: 1
Tests failed: 0
# ./fib_nexthops.sh -t ipv4_res_grp_fcnal
IPv4 resilient groups functional
--------------------------------
TEST: Nexthop group updated when entry is deleted [ OK ]
TEST: Nexthop buckets updated when entry is deleted [ OK ]
TEST: Nexthop group updated after replace [ OK ]
TEST: Nexthop buckets updated after replace [ OK ]
TEST: Nexthop group updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop buckets updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop group updated after replace - nECMP [ OK ]
TEST: Nexthop buckets updated after replace - nECMP [ OK ]
Tests passed: 8
Tests failed: 0
# ./fib_nexthops.sh -t ipv6_res_grp_fcnal
IPv6 resilient groups functional
--------------------------------
TEST: Nexthop group updated when entry is deleted [ OK ]
TEST: Nexthop buckets updated when entry is deleted [ OK ]
TEST: Nexthop group updated after replace [ OK ]
TEST: Nexthop buckets updated after replace [ OK ]
TEST: Nexthop group updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop buckets updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop group updated after replace - nECMP [ OK ]
TEST: Nexthop buckets updated after replace - nECMP [ OK ]
Tests passed: 8
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:22 +0000 (17:50 +0100)]
selftests: fib_nexthops: List each test case in a different line
The lines with the IPv4 and IPv6 test cases are already very long and
more test cases will be added in subsequent patches.
List each test case in a different line to make it easier to extend the
test with more test cases.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:21 +0000 (17:50 +0100)]
selftests: fib_nexthops: Declutter test output
Before:
# ./fib_nexthops.sh -t ipv4_torture
IPv4 runtime torture
--------------------
TEST: IPv4 torture test [ OK ]
./fib_nexthops.sh: line 213: 19376 Killed ipv4_del_add_loop1
./fib_nexthops.sh: line 213: 19377 Killed ipv4_grp_replace_loop
./fib_nexthops.sh: line 213: 19378 Killed ip netns exec me ping -f 172.16.101.1 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 19380 Killed ip netns exec me ping -f 172.16.101.2 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 19381 Killed ip netns exec me mausezahn veth1 -B 172.16.101.2 -A 172.16.1.1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1
Tests passed: 1
Tests failed: 0
# ./fib_nexthops.sh -t ipv6_torture
IPv6 runtime torture
--------------------
TEST: IPv6 torture test [ OK ]
./fib_nexthops.sh: line 213: 24453 Killed ipv6_del_add_loop1
./fib_nexthops.sh: line 213: 24454 Killed ipv6_grp_replace_loop
./fib_nexthops.sh: line 213: 24456 Killed ip netns exec me ping -f 2001:db8:101::1 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 24457 Killed ip netns exec me ping -f 2001:db8:101::2 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 24458 Killed ip netns exec me mausezahn -6 veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1
Tests passed: 1
Tests failed: 0
After:
# ./fib_nexthops.sh -t ipv4_torture
IPv4 runtime torture
--------------------
TEST: IPv4 torture test [ OK ]
Tests passed: 1
Tests failed: 0
# ./fib_nexthops.sh -t ipv6_torture
IPv6 runtime torture
--------------------
TEST: IPv6 torture test [ OK ]
Tests passed: 1
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:20 +0000 (17:50 +0100)]
netdevsim: Allow reporting activity on nexthop buckets
A key component of the resilient hashing algorithm is the hash buckets'
activity. If a bucket is active, it will not be populated with a new
nexthop in order not to break existing flows. Therefore, in order to
easily and thoroughly test the algorithm, we need to be in full control
over the reported activity.
Add a debugfs interface that allows user space to have netdevsim report
a nexthop bucket within a resilient nexthop group as active. For
example:
# echo 10 23 > /sys/kernel/debug/netdevsim/netdevsim10/fib/nexthop_bucket_activity
Will mark bucket 23 in nexthop group 10 as active.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:19 +0000 (17:50 +0100)]
netdevsim: Add support for resilient nexthop groups
Allow resilient nexthop groups to be programmed and account their
occupancy according to their number of buckets. The nexthop group itself
as well as its buckets are marked with hardware flags (i.e.,
'RTNH_F_TRAP').
Replacement of a single nexthop bucket can fail using the following
debugfs knob:
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
N
# echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
Y
Replacement of a resilient nexthop group can fail using the following
debugfs knob:
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
N
# echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
Y
This enables testing of various error paths.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 12 Mar 2021 16:50:18 +0000 (17:50 +0100)]
netdevsim: Create a helper for setting nexthop hardware flags
Instead of calling nexthop_set_hw_flags(), call a helper. It will be
used to also set nexthop bucket flags in a subsequent patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 12 Mar 2021 16:50:17 +0000 (17:50 +0100)]
netdevsim: fib: Introduce a lock to guard nexthop hashtable
Currently netdevsim relies on RTNL to maintain exclusivity in accessing the
nexthop hash table. However, bucket notification may be called without RTNL
having been held. Instead, introduce a custom lock to guard the table.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 Mar 2021 01:09:34 +0000 (17:09 -0800)]
Merge branch 'ptp-warnings'
Lee Jones says:
====================
Rid W=1 warnings from PTP
This set is part of a larger effort attempting to clean-up W=1
kernel builds, which are currently overwhelmingly riddled with
niggly little warnings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lee Jones [Fri, 12 Mar 2021 11:09:10 +0000 (11:09 +0000)]
ptp: ptp_p: Demote non-conformant kernel-doc headers and supply a param description
Fixes the following W=1 kernel build warning(s):
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'control' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'event' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'addend' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'accum' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'test' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_compare' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rsystime_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rsystime_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'systime_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'systime_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'trgt_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'trgt_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'asms_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'asms_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'amms_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'amms_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ch_control' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ch_event' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'tx_snap_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'tx_snap_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rx_snap_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rx_snap_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'src_uuid_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'src_uuid_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_status' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_snap_lo' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_snap_hi' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_sel' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_st' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'reserve1' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'stl_max_set_en' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'stl_max_set' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'reserve2' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'srst' not described in 'pch_ts_regs'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'regs' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'ptp_clock' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'caps' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'exts0_enabled' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'exts1_enabled' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'mem_base' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'mem_size' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'irq' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'pdev' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'register_lock' not described in 'pch_dev'
drivers/ptp/ptp_pch.c:128: warning: Function parameter or member 'station' not described in 'pch_params'
drivers/ptp/ptp_pch.c:291: warning: Function parameter or member 'pdev' not described in 'pch_set_station_address'
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: LAPIS SEMICONDUCTOR <tshimizu818@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lee Jones [Fri, 12 Mar 2021 11:09:09 +0000 (11:09 +0000)]
ptp: ptp_clockmatrix: Demote non-kernel-doc header to standard comment
Fixes the following W=1 kernel build warning(s):
drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds
drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds
drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds
drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds
drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: IDT-support-1588@lm.renesas.com
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lee Jones [Fri, 12 Mar 2021 11:09:08 +0000 (11:09 +0000)]
ptp_pch: Move 'pch_*()' prototypes to shared header
Fixes the following W=1 kernel build warning(s):
drivers/ptp/ptp_pch.c:193:6: warning: no previous prototype for ‘pch_ch_control_write’ [-Wmissing-prototypes]
drivers/ptp/ptp_pch.c:201:5: warning: no previous prototype for ‘pch_ch_event_read’ [-Wmissing-prototypes]
drivers/ptp/ptp_pch.c:212:6: warning: no previous prototype for ‘pch_ch_event_write’ [-Wmissing-prototypes]
drivers/ptp/ptp_pch.c:220:5: warning: no previous prototype for ‘pch_src_uuid_lo_read’ [-Wmissing-prototypes]
drivers/ptp/ptp_pch.c:231:5: warning: no previous prototype for ‘pch_src_uuid_hi_read’ [-Wmissing-prototypes]
drivers/ptp/ptp_pch.c:242:5: warning: no previous prototype for ‘pch_rx_snap_read’ [-Wmissing-prototypes]
drivers/ptp/ptp_pch.c:259:5: warning: no previous prototype for ‘pch_tx_snap_read’ [-Wmissing-prototypes]
drivers/ptp/ptp_pch.c:300:5: warning: no previous prototype for ‘pch_set_station_address’ [-Wmissing-prototypes]
Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Flavio Suligoi <f.suligoi@asem.it>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lee Jones [Fri, 12 Mar 2021 11:09:07 +0000 (11:09 +0000)]
ptp_pch: Remove unused function 'pch_ch_control_read()'
Fixes the following W=1 kernel build warning(s):
drivers/ptp/ptp_pch.c:182:5: warning: no previous prototype for ‘pch_ch_control_read’ [-Wmissing-prototypes]
Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Flavio Suligoi <f.suligoi@asem.it>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Fri, 12 Mar 2021 10:41:08 +0000 (11:41 +0100)]
net: dsa: bcm_sf2: setup BCM4908 internal crossbar
On some SoCs (e.g. BCM4908, BCM631[345]8) SF2 has an integrated
crossbar. It allows connecting its selected external ports to internal
ports. It's used by vendors to handle custom Ethernet setups.
BCM4908 has following 3x2 crossbar. On Asus GT-AC5300 rgmii is used for
connecting external BCM53134S switch. GPHY4 is usually used for WAN
port. More fancy devices use SerDes for 2.5 Gbps Ethernet.
┌──────────┐
SerDes ─── 0 ─┤ │
│ 3x2 ├─ 0 ─── switch port 7
GPHY4 ─── 1 ─┤ │
│ crossbar ├─ 1 ─── runner (accelerator)
rgmii ─── 2 ─┤ │
└──────────┘
Use setup data based on DT info to configure BCM4908's switch port 7.
Right now only GPHY and rgmii variants are supported. Handling SerDes
can be implemented later.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Fri, 12 Mar 2021 10:41:07 +0000 (11:41 +0100)]
net: dsa: bcm_sf2: store PHY interface/mode in port structure
It's needed later for proper switch / crossbar setup.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shubhankar Kuranagatti [Fri, 12 Mar 2021 07:30:05 +0000 (13:00 +0530)]
net: ipv4: route.c: Fix indentation of multi line comment.
All comment lines inside the comment block have been aligned.
Every line of comment starts with a * (uniformity in code).
Signed-off-by: Shubhankar Kuranagatti <shubhankarvk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Thu, 11 Mar 2021 12:35:21 +0000 (13:35 +0100)]
net: broadcom: bcm4908_enet: support TX interrupt
It appears that each DMA channel has its own interrupt and both rings
can be configured (the same way) to handle interrupts.
1. Make ring interrupts code generic (make it operate on given ring)
2. Move napi to ring (so each has its own)
3. Make IRQ handler generic (match ring against received IRQ number)
4. Add (optional) support for TX interrupt
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Thu, 11 Mar 2021 12:35:20 +0000 (13:35 +0100)]
dt-bindings: net: bcm4908-enet: add optional TX interrupt
I discovered that hardware actually supports two interrupts, one per DMA
channel (RX and TX).
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 Mar 2021 00:44:46 +0000 (16:44 -0800)]
Merge branch 'macb-fixed-link-fixes'
Robert Hancock says:
====================
macb SGMII fixed-link fixes
Some fixes to the macb driver for use in SGMII mode with a fixed-link (such as
for chip-to-chip connectivity).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Thu, 11 Mar 2021 20:18:13 +0000 (14:18 -0600)]
net: macb: Disable PCS auto-negotiation for SGMII fixed-link mode
When using a fixed-link configuration in SGMII mode, it's not really
sensible to have auto-negotiation enabled since the link settings are
fixed by definition. In other configurations, such as an SGMII
connection to a PHY, it should generally be enabled.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Thu, 11 Mar 2021 20:18:12 +0000 (14:18 -0600)]
net: macb: poll for fixed link state in SGMII mode
When using a fixed-link configuration with GEM in SGMII mode, such as
for a chip-to-chip interconnect, the link state was always showing as
established regardless of the actual connectivity state. We can monitor
the pcs_link_state bit in the Network Status register to determine
whether the PCS link state is actually up.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 13 Mar 2021 00:36:07 +0000 (16:36 -0800)]
Merge tag 'mlx5-updates-2021-03-12' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-03-12
1) TC support for ICMP parameters
2) TC connection tracking with mirroring
3) A round of trivial fixups and cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Dickman [Sun, 24 Jan 2021 13:56:36 +0000 (15:56 +0200)]
net/mlx5e: Allow to match on ICMP parameters
Support matching on ICMPv4/6 type and code parameters using misc3
section of match parameters.
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Paul Blakey [Mon, 21 Sep 2020 08:49:26 +0000 (11:49 +0300)]
net/mlx5: CT: Add support for mirroring
Add support for mirroring before the CT action by spliting the pre ct rule.
Mirror outputs are done first on the tc chain,prio table rule (the fwd
rule), which will then forward to a per port fwd table.
On this fwd table, we insert the original pre ct rule that forwards to
ct/ct nat table.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Alaa Hleihel [Thu, 31 Dec 2020 14:24:51 +0000 (16:24 +0200)]
net/mlx5: Display the command index in command mailbox dump
Multiple commands can be printed at the same time which can
lead to wrong order of their lines in dmesg output.
As a result, it's hard to match data dumps to the correct command
or which command was fully dumped at some point.
Fix this by displaying the corresponding command index, and also
indicate when a command was fully dumped.
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Arnd Bergmann [Mon, 8 Mar 2021 15:32:57 +0000 (16:32 +0100)]
net/mlx5e: allocate 'indirection_rqt' buffer dynamically
Increasing the size of the indirection_rqt array from 128 to 256 bytes
pushed the stack usage of the mlx5e_hairpin_fill_rqt_rqns() function
over the warning limit when building with clang and CONFIG_KASAN:
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:970:1: error: stack frame size of 1180 bytes in function 'mlx5e_tc_add_nic_flow' [-Werror,-Wframe-larger-than=]
Using dynamic allocation here is safe because the caller does the
same, and it reduces the stack usage of the function to just a few
bytes.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tariq Toukan [Wed, 17 Feb 2021 09:43:28 +0000 (11:43 +0200)]
net/mlx5e: Dump ICOSQ WQE descriptor on CQE with error events
Dump the ICOSQ's WQE descriptor when a completion with error is received.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maxim Mikityanskiy [Fri, 29 Jan 2021 16:43:31 +0000 (18:43 +0200)]
net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
Commit
e20f0dbf204f ("net/mlx5e: RX, Add a prefetch command for small
L1_CACHE_BYTES") switched to using net_prefetchw at all places in mlx5e.
In the same time frame, commit
5af75c747e2a ("net/mlx5e: Enhanced TX
MPWQE for SKBs") added one more usage of prefetchw. When these two
changes were merged, this new occurrence of prefetchw wasn't replaced
with net_prefetchw.
This commit fixes this last occurrence of prefetchw in
mlx5e_tx_mpwqe_session_start, making the same change that was done in
mlx5e_xdp_mpwqe_session_start.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Tue, 9 Mar 2021 16:25:59 +0000 (18:25 +0200)]
net/mlx5e: Remove redundant newline in NL_SET_ERR_MSG_MOD
Fix the following coccicheck warnings:
drivers/net/ethernet/mellanox/mlx5/core/devlink.c:145:29-66: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD
drivers/net/ethernet/mellanox/mlx5/core/devlink.c:140:29-77: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mark Zhang [Tue, 2 Feb 2021 11:28:29 +0000 (13:28 +0200)]
net/mlx5: Read congestion counters from all ports when lag is active
Read congestion counters from all ports in any lag mode rather than
only in RoCE lag mode (e.g., VF lag).
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jiapeng Chong [Mon, 22 Feb 2021 09:56:59 +0000 (17:56 +0800)]
net/mlx5: remove unneeded semicolon
Fix the following coccicheck warnings:
./drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c:495:2-3: Unneeded
semicolon.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Junlin Yang [Wed, 3 Mar 2021 02:40:19 +0000 (10:40 +0800)]
net/mlx5: use kvfree() for memory allocated with kvzalloc()
It is allocated with kvzalloc(), the corresponding release function
should not be kfree(), use kvfree() instead.
Generated by: scripts/coccinelle/api/kfree_mismatch.cocci
Signed-off-by: Junlin Yang <yangjunlin@yulong.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Sat, 6 Feb 2021 20:41:41 +0000 (22:41 +0200)]
net/mlx5: DR, Add missing vhca_id consume from STEv1
The field source_eswitch_owner_vhca_id was not consumed
in the same way as in STEv0. Added the missing set.
Fixes:
10b694186410 ("net/mlx5: DR, Add HW STEv1 match logic")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Sat, 6 Feb 2021 13:39:25 +0000 (15:39 +0200)]
net/mlx5: DR, Remove unneeded rx_decap_l3 function for STEv1
Remove the dr_ste_v1_set_rx_decap_l3 function that was
replaced by another function - fixing a rebase error.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Sat, 6 Feb 2021 13:20:41 +0000 (15:20 +0200)]
net/mlx5: DR, Fixed typo in STE v0
"reforamt" -> "reformat"
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jonathan Neuschäfer [Thu, 11 Mar 2021 17:22:34 +0000 (18:22 +0100)]
docs: networking: phy: Improve placement of parenthesis
"either" is outside the parentheses, so the matching "or" should be too.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 Mar 2021 02:35:31 +0000 (18:35 -0800)]
Merge branch 'tcp-delayed-completions'
Eric Dumazet says:
====================
tcp: better deal with delayed TX completions
Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)
While problems have been there forever, second patch might
introduce some regressions so I prefer not backport
them to stable releases before things settle.
Many thanks to FB team for their help and tests.
Few packetdrill tests need to be changed to reflect
the improvements brought by this series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 11 Mar 2021 20:35:06 +0000 (12:35 -0800)]
tcp: remove obsolete check in __tcp_retransmit_skb()
TSQ provides a nice way to avoid bufferbloat on individual socket,
including retransmit packets. We can get rid of the old
heuristic:
/* Do not sent more than we queued. 1/4 is reserved for possible
* copying overhead: fragmentation, tunneling, mangling etc.
*/
if (refcount_read(&sk->sk_wmem_alloc) >
min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
sk->sk_sndbuf))
return -EAGAIN;
This heuristic was giving false positives according to Jakub,
whenever TX completions are delayed above RTT. (Ack packets
are processed by TCP stack before clones are orphaned/freed)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 11 Mar 2021 20:35:05 +0000 (12:35 -0800)]
tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()
Jakub reported Data included in a Fastopen SYN that had to be
retransmit would have to wait for an RTO if TX completions are slow,
even with prior fix.
This is because tcp_rcv_fastopen_synack() does not use standard
rtx logic, meaning TSQ handler exits early in tcp_tsq_write()
because tp->lost_out == tp->retrans_out
Lets make tcp_rcv_fastopen_synack() use standard rtx logic,
by using tcp_mark_skb_lost() on the skb thats needs to be
sent again.
Not this raised a warning in tcp_fastretrans_alert() during my tests
since we consider the data not being aknowledged
by the receiver does not mean packet was lost on the network.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 11 Mar 2021 20:35:04 +0000 (12:35 -0800)]
tcp: plug skb_still_in_host_queue() to TSQ
Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)
Main issue is that TCP stack has a logic preventing a packet
being retransmit if the prior clone has not yet been
orphaned or freed.
This logic came with commit
1f3279ae0c13 ("tcp: avoid
retransmits of TCP packets hanging in host queues")
Thankfully, in the case skb_still_in_host_queue() detects
the initial clone is still in flight, it can use TSQ logic
that will eventually retry later, at the moment the clone
is freed or orphaned.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Neil Spring <ntspring@fb.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tong Zhang [Thu, 11 Mar 2021 04:27:55 +0000 (23:27 -0500)]
isdn: remove extra spaces in the header file
fix some coding style issues in the isdn header
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hoang Huu Le [Thu, 11 Mar 2021 03:33:23 +0000 (10:33 +0700)]
tipc: clean up warnings detected by sparse
This patch fixes the following warning from sparse:
net/tipc/monitor.c:263:35: warning: incorrect type in assignment (different base types)
net/tipc/monitor.c:263:35: expected unsigned int
net/tipc/monitor.c:263:35: got restricted __be32 [usertype]
[...]
net/tipc/node.c:374:13: warning: context imbalance in 'tipc_node_read_lock' - wrong count at exit
net/tipc/node.c:379:13: warning: context imbalance in 'tipc_node_read_unlock' - unexpected unlock
net/tipc/node.c:384:13: warning: context imbalance in 'tipc_node_write_lock' - wrong count at exit
net/tipc/node.c:389:13: warning: context imbalance in 'tipc_node_write_unlock_fast' - unexpected unlock
net/tipc/node.c:404:17: warning: context imbalance in 'tipc_node_write_unlock' - unexpected unlock
[...]
net/tipc/crypto.c:1201:9: warning: incorrect type in initializer (different address spaces)
net/tipc/crypto.c:1201:9: expected struct tipc_aead [noderef] __rcu *__tmp
net/tipc/crypto.c:1201:9: got struct tipc_aead *
[...]
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hoang Le [Thu, 11 Mar 2021 03:33:22 +0000 (10:33 +0700)]
tipc: convert dest node's address to network order
(struct tipc_link_info)->dest is in network order (__be32), so we must
convert the value to network order before assigning. The problem detected
by sparse:
net/tipc/netlink_compat.c:699:24: warning: incorrect type in assignment (different base types)
net/tipc/netlink_compat.c:699:24: expected restricted __be32 [usertype] dest
net/tipc/netlink_compat.c:699:24: got int
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 Mar 2021 00:22:39 +0000 (16:22 -0800)]
Merge branch 'mlxsw-Implement-sampling-using-mirroring'
Ido Schimmel says:
====================
mlxsw: Implement sampling using mirroring
So far, sampling was implemented using a dedicated sampling mechanism
that is available on all Spectrum ASICs. Spectrum-2 and later ASICs
support sampling by mirroring packets to the CPU port with probability.
This method has a couple of advantages compared to the legacy method:
* Extra metadata per-packet: Egress port, egress traffic class, traffic
class occupancy and end-to-end latency
* Ability to sample packets on egress / per-flow as opposed to only
ingress
This series should not result in any user-visible changes and its aim is
to convert Spectrum-2 and later ASICs to perform sampling by mirroring
to the CPU port with probability. Future submissions will expose the
additional metadata and enable sampling using more triggers (e.g.,
egress).
Series overview:
Patches #1-#3 extend the SPAN (mirroring) module to accept new
parameters required for sampling. See individual commit messages for
detailed explanation.
Patch #4-#5 split sampling support between Spectrum-1 and later ASIC while
still using the legacy method for all ASIC generations.
Patch #6 converts Spectrum-2 and later ASICs to perform sampling by
mirroring to the CPU port with probability.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 12:24:16 +0000 (14:24 +0200)]
mlxsw: spectrum_matchall: Implement sampling using mirroring
Spectrum-2 and later ASICs support sampling of packets by mirroring to
the CPU with probability. There are several advantages compared to the
legacy dedicated sampling mechanism:
* Extra metadata per-packet: Egress port, egress traffic class, traffic
class occupancy and end-to-end latency
* Ability to sample packets on egress / per-flow
Convert Spectrum-2 and later ASICs to perform sampling by mirroring to
the CPU with probability.
Subsequent patches will add support for egress / per-flow sampling and
expose the extra metadata.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 12:24:15 +0000 (14:24 +0200)]
mlxsw: spectrum_trap: Split sampling traps between ASICs
Sampling of ingress packets is supported using a dedicated sampling
mechanism on all Spectrum ASICs. However, Spectrum-2 and later ASICs
support more sophisticated sampling by mirroring packets to the CPU.
As a preparation for more advanced sampling configurations, split the trap
configuration used for sampled packets between Spectrum-1 and later ASICs.
This is needed since packets that are mirrored to the CPU are trapped
via a different trap identifier compared to packets that are sampled
using the dedicated sampling mechanism.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 12:24:14 +0000 (14:24 +0200)]
mlxsw: spectrum_matchall: Split sampling support between ASICs
Sampling of ingress packets is supported using a dedicated sampling
mechanism on all Spectrum ASICs. However, Spectrum-2 and later ASICs
support more sophisticated sampling by mirroring packets to the CPU.
As a preparation for more advanced sampling configurations, split the
sampling operations between Spectrum-1 and later ASICs.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 12:24:13 +0000 (14:24 +0200)]
mlxsw: spectrum_span: Add SPAN probability rate support
Currently, every packet that matches a mirroring trigger (e.g., received
packets, buffer dropped packets) is mirrored. Spectrum-2 and later ASICs
support mirroring with probability, where every 1 in N matched packets
is mirrored.
Extend the API that creates the binding between the trigger and the SPAN
agent with a probability rate parameter, which is an attribute of the
trigger. Set it to '1' to maintain existing behavior.
Subsequent patches will use it to perform more sophisticated sampling,
by mirroring packets to the CPU with probability.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 12:24:12 +0000 (14:24 +0200)]
mlxsw: reg: Extend mirroring registers with probability rate field
The MPAR and MPAGR registers are used to configure the binding between
the mirroring trigger (e.g., received packet) and the SPAN agent. Add
probability rate field, which will allow us to support sampling by
mirroring to the CPU.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 12:24:11 +0000 (14:24 +0200)]
mlxsw: spectrum_span: Add SPAN session identifier support
When packets are mirrored to the CPU, the trap identifier with which the
packets are trapped is determined according to the session identifier of
the SPAN agent performing the mirroring. Packets that are trapped for
the same logical reason (e.g., buffer drops) should use the same session
identifier.
Currently, a single session is implicitly supported (identifier 0) and
is used for packets that are mirrored to the CPU due to buffer drops
(e.g., early drop).
Subsequent patches are going to mirror packets to the CPU due to
sampling, which will require a different session identifier.
Prepare for that by making the session identifier an attribute of the
SPAN agent.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 Mar 2021 00:17:43 +0000 (16:17 -0800)]
Merge tag 'mlx5-updates-2021-03-11' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
This series provides some cleanups to mlx5 driver
For more information please see tag log below.
Please pull and let me know if there is any problem.
mlx5-updates-2021-03-11
Cleanups for mlx5 driver
1) Fix build warnings form Arnd and Vlad
2) Leon improves locking for driver load/unload flows
3) From Roi, Lockdep false dependency warning
4) Other trivial cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 Mar 2021 00:13:00 +0000 (16:13 -0800)]
Merge branch 'nexthop-Resilient-next-hop-groups'
Petr Machata says:
====================
nexthop: Resilient next-hop groups
At this moment, there is only one type of next-hop group: an mpath group.
Mpath groups implement the hash-threshold algorithm, described in RFC
2992[1].
To select a next hop, hash-threshold algorithm first assigns a range of
hashes to each next hop in the group, and then selects the next hop by
comparing the SKB hash with the individual ranges. When a next hop is
removed from the group, the ranges are recomputed, which leads to
reassignment of parts of hash space from one next hop to another. RFC 2992
illustrates it thus:
+-------+-------+-------+-------+-------+
| 1 | 2 | 3 | 4 | 5 |
+-------+-+-----+---+---+-----+-+-------+
| 1 | 2 | 4 | 5 |
+---------+---------+---------+---------+
Before and after deletion of next hop 3
under the hash-threshold algorithm.
Note how next hop 2 gave up part of the hash space in favor of next hop 1,
and 4 in favor of 5. While there will usually be some overlap between the
previous and the new distribution, some traffic flows change the next hop
that they resolve to.
If a multipath group is used for load-balancing between multiple servers,
this hash space reassignment causes an issue that packets from a single
flow suddenly end up arriving at a server that does not expect them, which
may lead to TCP reset.
If a multipath group is used for load-balancing among available paths to
the same server, the issue is that different latencies and reordering along
the way causes the packets to arrive in the wrong order.
Resilient hashing is a technique to address the above problem. Resilient
next-hop group has another layer of indirection between the group itself
and its constituent next hops: a hash table. The selection algorithm uses a
straightforward modulo operation on the SKB hash to choose a hash table
bucket, then reads the next hop that this bucket contains, and forwards
traffic there.
This indirection brings an important feature. In the hash-threshold
algorithm, the range of hashes associated with a next hop must be
continuous. With a hash table, mapping between the hash table buckets and
the individual next hops is arbitrary. Therefore when a next hop is deleted
the buckets that held it are simply reassigned to other next hops:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|1|1|1|2|2|2|2|3|3|3|3|4|4|4|4|5|5|5|5|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
v v v v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|1|1|1|2|2|2|2|1|2|4|5|4|4|4|4|5|5|5|5|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Before and after deletion of next hop 3
under the resilient hashing algorithm.
When weights of next hops in a group are altered, it may be possible to
choose a subset of buckets that are currently not used for forwarding
traffic, and use those to satisfy the new next-hop distribution demands,
keeping the "busy" buckets intact. This way, established flows are ideally
kept being forwarded to the same endpoints through the same paths as before
the next-hop group change.
This patch set adds the implementation of resilient next-hop groups.
In a nutshell, the algorithm works as follows. Each next hop has a number
of buckets that it wants to have, according to its weight and the number of
buckets in the hash table. In case of an event that might cause bucket
allocation change, the numbers for individual next hops are updated,
similarly to how ranges are updated for mpath group next hops. Following
that, a new "upkeep" algorithm runs, and for idle buckets that belong to a
next hop that is currently occupying more buckets than it wants (it is
"overweight"), it migrates the buckets to one of the next hops that has
fewer buckets than it wants (it is "underweight"). If, after this, there
are still underweight next hops, another upkeep run is scheduled to a
future time.
Chances are there are not enough "idle" buckets to satisfy the new demands.
The algorithm has knobs to select both what it means for a bucket to be
idle, and for whether and when to forcefully migrate buckets if there keeps
being an insufficient number of idle ones.
To illustrate the usage, consider the following commands:
# ip nexthop add id 1 via 192.0.2.2 dev dummy1
# ip nexthop add id 2 via 192.0.2.3 dev dummy1
# ip nexthop add id 10 group 1/2 type resilient \
buckets 8 idle_timer 60 unbalanced_timer 300
The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.
If not present in netlink message, the idle timer defaults to 120 seconds,
and there is no unbalanced timer, meaning the group may remain unbalanced
indefinitely. The value of 120 is the default in Cumulus implementation of
resilient next-hop groups. To a degree the default is arbitrary, the only
value that certainly does not make sense is 0. Therefore going with an
existing deployed implementation is reasonable.
Unbalanced time, i.e. how long since the last time that all nexthops had as
many buckets as they should according to their weights, is reported when
the group is dumped:
# ip nexthop show id 10
id 10 group 1/2 type resilient buckets 8 idle_timer 60 unbalanced_timer 300 unbalanced_time 0
When replacing next hops or changing weights, if one does not specify some
parameters, their value is left as it was:
# ip nexthop replace id 10 group 1,2/2 type resilient
# ip nexthop show id 10
id 10 group 1,2/2 type resilient buckets 8 idle_timer 60 unbalanced_timer 300 unbalanced_time 0
It is also possible to do a dump of individual buckets (and now you know
why there were only 8 of them in the example above):
# ip nexthop bucket show id 10
id 10 index 0 idle_time 5.59 nhid 1
id 10 index 1 idle_time 5.59 nhid 1
id 10 index 2 idle_time 8.74 nhid 2
id 10 index 3 idle_time 8.74 nhid 2
id 10 index 4 idle_time 8.74 nhid 1
id 10 index 5 idle_time 8.74 nhid 1
id 10 index 6 idle_time 8.74 nhid 1
id 10 index 7 idle_time 8.74 nhid 1
Note the two buckets that have a shorter idle time. Those are the ones that
were migrated after the nexthop replace command to satisfy the new demand
that nexthop 1 be given 6 buckets instead of 4.
The patchset proceeds as follows:
- Patches #1 and #2 are small refactoring patches.
- Patch #3 adds a new flag to struct nh_group, is_multipath. This flag is
meant to be set for all nexthop groups that in general have several
nexthops from which they choose, and avoids a more expensive dispatch
based on reading several flags, one for each nexthop group type.
- Patch #4 contains defines of new UAPI attributes and the new next-hop
group type. At this point, the nexthop code is made to bounce the new
type. As the resilient hashing code is gradually added in the following
patch sets, it will remain dead. The last patch will make it accessible.
This patch also adds a suite of new messages related to next hop buckets.
This approach was taken instead of overloading the information on the
existing RTM_{NEW,DEL,GET}NEXTHOP messages for the following reasons.
First, a next-hop group can contain a large number of next-hop buckets
(4k is not unheard of). This imposes limits on the amount of information
that can be encoded for each next-hop bucket given a netlink message is
limited to 64k bytes.
Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at this
point, in the future it can be extended to provide user space with
control over next-hop buckets configuration.
- Patch #5 contains the meat of the resilient next-hop group support.
- Patches #6 and #7 implement support for notifications towards the
drivers.
- Patch #8 adds an interface for the drivers to report resilient hash
table bucket activity. Drivers will be able to report through this
interface whether traffic is hitting a given bucket.
- Patch #9 adds an interface for the drivers to report whether a given
hash table bucket is offloaded or trapping traffic.
- In patches #10, #11, #12 and #13, UAPI is implemented. This includes all
the code necessary for creation of resilient groups, bucket dumping and
getting, and bucket migration notifications.
- In patch #14 the next-hop groups are finally made available.
The overall plan is to contribute approximately the following patchsets:
1) Nexthop policy refactoring (already pushed)
2) Preparations for resilient next-hop groups (already pushed)
3) Implementation of resilient next-hop groups (this patchset)
4) Netdevsim offload plus a suite of selftests
5) Preparations for mlxsw offload of resilient next-hop groups
6) mlxsw offload including selftests
Interested parties can look at the current state of the code at [2] and
[3].
[1] https://tools.ietf.org/html/rfc2992
[2] https://github.com/idosch/linux/commits/submit/res_integ_v1
[3] https://github.com/idosch/iproute2/commits/submit/res_v1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:25 +0000 (19:03 +0100)]
nexthop: Enable resilient next-hop groups
Now that all the code is in place, stop rejecting requests to create
resilient next-hop groups.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:24 +0000 (19:03 +0100)]
nexthop: Notify userspace about bucket migrations
Nexthop replacements et.al. are notified through netlink, but if a delayed
work migrates buckets on the background, userspace will stay oblivious.
Notify these as RTM_NEWNEXTHOPBUCKET events.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:23 +0000 (19:03 +0100)]
nexthop: Add netlink handlers for bucket get
Allow getting (but not setting) individual buckets to inspect the next hop
mapped therein, idle time, and flags.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:22 +0000 (19:03 +0100)]
nexthop: Add netlink handlers for bucket dump
Add a dump handler for resilient next hop buckets. When next-hop group ID
is given, it walks buckets of that group, otherwise it walks buckets of all
groups. It then dumps the buckets whose next hops match the given filtering
criteria.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:21 +0000 (19:03 +0100)]
nexthop: Add netlink handlers for resilient nexthop groups
Implement the netlink messages that allow creation and dumping of resilient
nexthop groups.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 18:03:20 +0000 (19:03 +0100)]
nexthop: Allow reporting activity of nexthop buckets
The kernel periodically checks the idle time of nexthop buckets to
determine if they are idle and can be re-populated with a new nexthop.
When the resilient nexthop group is offloaded to hardware, the kernel
will not see activity on nexthop buckets unless it is reported from
hardware.
Add a function that can be periodically called by device drivers to
report activity on nexthop buckets after querying it from the underlying
device.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 18:03:19 +0000 (19:03 +0100)]
nexthop: Allow setting "offload" and "trap" indication of nexthop buckets
Add a function that can be called by device drivers to set "offload" or
"trap" indication on nexthop buckets following nexthop notifications and
other changes such as a neighbour becoming invalid.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:18 +0000 (19:03 +0100)]
nexthop: Implement notifiers for resilient nexthop groups
Implement the following notifications towards drivers:
- NEXTHOP_EVENT_REPLACE, when a resilient nexthop group is created.
- NEXTHOP_EVENT_BUCKET_REPLACE any time there is a change in assignment of
next hops to hash table buckets. That includes replacements, deletions,
and delayed upkeep cycles. Some bucket notifications can be vetoed by the
driver, to make it possible to propagate bucket busy-ness flags from the
HW back to the algorithm. Some are however forced, e.g. if a next hop is
deleted, all buckets that use this next hop simply must be migrated,
whether the HW wishes so or not.
- NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE, before a resilient nexthop group is
replaced. Usually the driver will get the bucket notifications as well,
and could veto those. But in some cases, a bucket may not be migrated
immediately, but during delayed upkeep, and that is too late to roll the
transaction back. This notification allows the driver to take a look and
veto the new proposed group up front, before anything is committed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 18:03:17 +0000 (19:03 +0100)]
nexthop: Add data structures for resilient group notifications
Add data structures that will be used for in-kernel notifications about
addition / deletion of a resilient nexthop group and about changes to a
hash bucket within a resilient group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:16 +0000 (19:03 +0100)]
nexthop: Add implementation of resilient next-hop groups
At this moment, there is only one type of next-hop group: an mpath group,
which implements the hash-threshold algorithm.
To select a next hop, hash-threshold algorithm first assigns a range of
hashes to each next hop in the group, and then selects the next hop by
comparing the SKB hash with the individual ranges. When a next hop is
removed from the group, the ranges are recomputed, which leads to
reassignment of parts of hash space from one next hop to another. While
there will usually be some overlap between the previous and the new
distribution, some traffic flows change the next hop that they resolve to.
That causes problems e.g. as established TCP connections are reset, because
the traffic is forwarded to a server that is not familiar with the
connection.
Resilient hashing is a technique to address the above problem. Resilient
next-hop group has another layer of indirection between the group itself
and its constituent next hops: a hash table. The selection algorithm uses a
straightforward modulo operation to choose a hash bucket, and then reads
the next hop that this bucket contains, and forwards traffic there.
This indirection brings an important feature. In the hash-threshold
algorithm, the range of hashes associated with a next hop must be
continuous. With a hash table, mapping between the hash table buckets and
the individual next hops is arbitrary. Therefore when a next hop is deleted
the buckets that held it are simply reassigned to other next hops. When
weights of next hops in a group are altered, it may be possible to choose a
subset of buckets that are currently not used for forwarding traffic, and
use those to satisfy the new next-hop distribution demands, keeping the
"busy" buckets intact. This way, established flows are ideally kept being
forwarded to the same endpoints through the same paths as before the
next-hop group change.
In a nutshell, the algorithm works as follows. Each next hop has a number
of buckets that it wants to have, according to its weight and the number of
buckets in the hash table. In case of an event that might cause bucket
allocation change, the numbers for individual next hops are updated,
similarly to how ranges are updated for mpath group next hops. Following
that, a new "upkeep" algorithm runs, and for idle buckets that belong to a
next hop that is currently occupying more buckets than it wants (it is
"overweight"), it migrates the buckets to one of the next hops that has
fewer buckets than it wants (it is "underweight"). If, after this, there
are still underweight next hops, another upkeep run is scheduled to a
future time.
Chances are there are not enough "idle" buckets to satisfy the new demands.
The algorithm has knobs to select both what it means for a bucket to be
idle, and for whether and when to forcefully migrate buckets if there keeps
being an insufficient number of idle buckets.
There are three users of the resilient data structures.
- The forwarding code accesses them under RCU, and does not modify them
except for updating the time a selected bucket was last used.
- Netlink code, running under RTNL, which may modify the data.
- The delayed upkeep code, which may modify the data. This runs unlocked,
and mutual exclusion between the RTNL code and the delayed upkeep is
maintained by canceling the delayed work synchronously before the RTNL
code touches anything. Later it restarts the delayed work if necessary.
The RTNL code has to implement next-hop group replacement, next hop
removal, etc. For removal, the mpath code uses a neat trick of having a
backup next hop group structure, doing the necessary changes offline, and
then RCU-swapping them in. However, the hash tables for resilient hashing
are about an order of magnitude larger than the groups themselves (the size
might be e.g. 4K entries), and it was felt that keeping two of them is an
overkill. Both the primary next-hop group and the spare therefore use the
same resilient table, and writers are careful to keep all references valid
for the forwarding code. The hash table references next-hop group entries
from the next-hop group that is currently in the primary role (i.e. not
spare). During the transition from primary to spare, the table references a
mix of both the primary group and the spare. When a next hop is deleted,
the corresponding buckets are not set to NULL, but instead marked as empty,
so that the pointer is valid and can be used by the forwarding code. The
buckets are then migrated to a new next-hop group entry during upkeep. The
only times that the hash table is invalid is the very beginning and very
end of its lifetime. Between those points, it is always kept valid.
This patch introduces the core support code itself. It does not handle
notifications towards drivers, which are kept as if the group were an mpath
one. It does not handle netlink either. The only bit currently exposed to
user space is the new next-hop group type, and that is currently bounced.
There is therefore no way to actually access this code.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 11 Mar 2021 18:03:15 +0000 (19:03 +0100)]
nexthop: Add netlink defines and enumerators for resilient NH groups
- RTM_NEWNEXTHOP et.al. that handle resilient groups will have a new nested
attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*.
- RTM_NEWNEXTHOPBUCKET et.al. is a suite of new messages that will
currently serve only for dumping of individual buckets of resilient next
hop groups. For nexthop group buckets, these messages will carry a nested
attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*.
There are several reasons why a new suite of messages is created for
nexthop buckets instead of overloading the information on the existing
RTM_{NEW,DEL,GET}NEXTHOP messages.
First, a nexthop group can contain a large number of nexthop buckets (4k
is not unheard of). This imposes limits on the amount of information that
can be encoded for each nexthop bucket given a netlink message is limited
to 64k bytes.
Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at
this point, in the future it can be extended to provide user space with
control over nexthop buckets configuration.
- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is
adjusted to bounce groups with that type for now.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:14 +0000 (19:03 +0100)]
nexthop: Add a dedicated flag for multipath next-hop groups
With the introduction of resilient nexthop groups, there will be two types
of multipath groups: the current hash-threshold "mpath" ones, and resilient
groups. Both are multipath, but to determine the fact, the system needs to
consider two flags. This might prove costly in the datapath. Therefore,
introduce a new flag, that should be set for next-hop groups that have more
than one nexthop, and should be considered multipath.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:13 +0000 (19:03 +0100)]
nexthop: __nh_notifier_single_info_init(): Make nh_info an argument
The cited function currently uses rtnl_dereference() to get nh_info from a
handed-in nexthop. However, under the resilient hashing scheme, this
function will not always be called under RTNL, sometimes the mutual
exclusion will be achieved differently. Therefore move the nh_info
extraction from the function to its callers to make it possible to use a
different synchronization guarantee.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Thu, 11 Mar 2021 18:03:12 +0000 (19:03 +0100)]
nexthop: Pass nh_config to replace_nexthop()
Currently, replace assumes that the new group that is given is a
fully-formed object. But mpath groups really only have one attribute, and
that is the constituent next hop configuration. This may not be universally
true. From the usability perspective, it is desirable to allow the replace
operation to adjust just the constituent next hop configuration and leave
the group attributes as such intact.
But the object that keeps track of whether an attribute was or was not
given is the nh_config object, not the next hop or next-hop group. To allow
(selective) attribute updates during NH group replacement, propagate `cfg'
to replace_nexthop() and further to replace_nexthop_grp().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 Mar 2021 00:09:21 +0000 (16:09 -0800)]
Merge branch 'seg6-next'
Julien Massonneau says:
====================
SRv6: SRH processing improvements
Add support for IPv4 decapsulation in ipv6_srh_rcv() and
ignore routing header with segments left equal to 0 for
seg6local actions that doesn't perfom decapsulation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Massonneau [Thu, 11 Mar 2021 15:53:19 +0000 (16:53 +0100)]
seg6: ignore routing header with segments left equal to 0
When there are 2 segments routing header, after an End.B6 action
for example, the second SRH will never be handled by an action, packet will
be dropped when the first SRH has segments left equal to 0.
For actions that doesn't perform decapsulation (currently: End, End.X,
End.T, End.B6, End.B6.Encaps), this patch adds the IP6_FH_F_SKIP_RH flag
in arguments for ipv6_find_hdr().
Signed-off-by: Julien Massonneau <julien.massonneau@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Massonneau [Thu, 11 Mar 2021 15:53:18 +0000 (16:53 +0100)]
seg6: add support for IPv4 decapsulation in ipv6_srh_rcv()
As specified in IETF RFC 8754, section 4.3.1.2, if the upper layer
header is IPv4 or IPv6, perform IPv6 decapsulation and resubmit the
decapsulated packet to the IPv4 or IPv6 module.
Only IPv6 decapsulation was implemented. This patch adds support for IPv4
decapsulation.
Link: https://tools.ietf.org/html/rfc8754#section-4.3.1.2
Signed-off-by: Julien Massonneau <julien.massonneau@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 12 Mar 2021 00:01:10 +0000 (16:01 -0800)]
Merge branch 'hns3-next'
Huazhong Tan says:
====================
net: hns3: two updates for -next
This series includes two updates for the HNS3 ethernet driver.
====================
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 11 Mar 2021 02:14:12 +0000 (10:14 +0800)]
net: hns3: use pause capability queried from firmware
For maintainability and compatibility, add support to use pause
capability queried from firmware, and add debugfs support to dump
this capability.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Thu, 11 Mar 2021 02:14:11 +0000 (10:14 +0800)]
net: hns3: use FEC capability queried from firmware
For maintainability and compatibility, add support to use FEC
capability queried from firmware.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Tue, 9 Mar 2021 16:16:52 +0000 (18:16 +0200)]
net/mlx5e: Alloc flow spec using kvzalloc instead of kzalloc
flow spec is not small and we do allocate it using kvzalloc
in most places of the driver. fix rest of the places
to use kvzalloc to avoid failure in allocation when
memory is too fragmented.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eli Cohen [Mon, 8 Feb 2021 09:07:53 +0000 (11:07 +0200)]
net/mlx5: Avoid unnecessary operation
fs_get_obj retrieves the container of fs_parent_node just to pass the
same value as &fs_ns->node. Just pass fs_parent_node to
init_root_tree_recursive() to get exactly the same effect.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed [Mon, 14 Dec 2020 22:07:19 +0000 (14:07 -0800)]
net/mlx5e: rep: Improve reg_cX conditions
There is no point of calculating reg_c1 or overriding reg_c0 if we are
going to abort the function.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Roi Dayan [Wed, 3 Mar 2021 08:36:02 +0000 (10:36 +0200)]
net/mlx5: SF, Fix return type
Fix the following coccicheck warnings:
drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h:50:8-9: WARNING:
return of 0/1 in function 'mlx5_sf_dev_allocated' with return type bool
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed [Tue, 16 Feb 2021 21:42:22 +0000 (13:42 -0800)]
net/mlx5e: mlx5_tc_ct_init does not fail
mlx5_tc_ct_init() either returns a valid pointer or a NULL, either way
the caller can continue, remove IS_ERR check from callers as it has no
effect.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vlad Buslov [Mon, 1 Mar 2021 09:43:58 +0000 (11:43 +0200)]
net/mlx5: Fix indir stable stubs
Some of the stubs for CONFIG_MLX5_CLS_ACT==disabled are missing "static
inline" in their definition which causes the following compilation
warnings:
In file included from drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:41:
>> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:34:1: warning: no previous prototype for function 'mlx5_esw_indir_table_init' [-Wmissing-prototypes]
mlx5_esw_indir_table_init(void)
^
drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:33:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct mlx5_esw_indir_table *
^
static
>> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:40:1: warning: no previous prototype for function 'mlx5_esw_indir_table_destroy' [-Wmissing-prototypes]
mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir)
^
drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:39:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void
^
static
>> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:61:1: warning: no previous prototype for function 'mlx5_esw_indir_table_needed' [-Wmissing-prototypes]
mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw,
^
drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:60:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
bool
^
static
3 warnings generated.
Add "static inline" prefix to signatures of stubs that were missing it.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vlad Buslov [Mon, 1 Mar 2021 09:28:15 +0000 (11:28 +0200)]
net/mlx5e: Add missing include
When CONFIG_IPV6 is disabled the header nexthop.h is not included by
fib_notifier.h which causes tc_tun_encap.c to fail to compile:
In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:5:
In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h:7:
In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h:7:
In file included from drivers/net/ethernet/mellanox/mlx5/core/en_tc.h:40:
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:78:5: warning: no previous prototype for function 'mlx5e_tc_tun_update_header_ipv6' [-Wmissing-prototypes]
int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv,
^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:78:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv,
^
static
>> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: error: implicit declaration of function 'fib_info_nh' [-Werror,-Wimplicit-function-declaration]
fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev;
^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: note: did you mean 'fib_info_put'?
include/net/ip_fib.h:528:20: note: 'fib_info_put' declared here
static inline void fib_info_put(struct fib_info *fi)
^
>> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:42: error: member reference type 'int' is not a pointer
fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
include/net/ip_fib.h:113:21: note: expanded from macro 'fib_nh_dev'
#define fib_nh_dev nh_common.nhc_dev
^
>> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:13: error: incomplete definition of type 'struct fib6_entry_notifier_info'
fen_info = container_of(info, struct fib6_entry_notifier_info, info);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/kernel.h:694:51: note: expanded from macro 'container_of'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
include/linux/compiler_types.h:256:74: note: expanded from macro '__same_type'
#define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
^
include/linux/build_bug.h:39:58: note: expanded from macro 'BUILD_BUG_ON_MSG'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
include/linux/compiler_types.h:320:22: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:308:23: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:300:9: note: expanded from macro '__compiletime_assert'
if (!(condition)) \
^~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info'
struct fib6_entry_notifier_info *fen_info;
^
>> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:13: error: offsetof of incomplete type 'struct fib6_entry_notifier_info'
fen_info = container_of(info, struct fib6_entry_notifier_info, info);
^ ~~~~~~
include/linux/kernel.h:697:21: note: expanded from macro 'container_of'
((type *)(__mptr - offsetof(type, member))); })
^ ~~~~
include/linux/stddef.h:17:32: note: expanded from macro 'offsetof'
#define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
^ ~~~~
include/linux/compiler_types.h:140:35: note: expanded from macro '__compiler_offsetof'
#define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
^ ~
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info'
struct fib6_entry_notifier_info *fen_info;
^
>> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:11: error: assigning to 'struct fib6_entry_notifier_info *' from incompatible type 'void'
fen_info = container_of(info, struct fib6_entry_notifier_info, info);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1553:12: error: implicit declaration of function 'fib6_info_nh_dev' [-Werror,-Wimplicit-function-declaration]
fib_dev = fib6_info_nh_dev(fen_info->rt);
^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1553:37: error: incomplete definition of type 'struct fib6_entry_notifier_info'
fib_dev = fib6_info_nh_dev(fen_info->rt);
~~~~~~~~^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info'
struct fib6_entry_notifier_info *fen_info;
^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1555:14: error: incomplete definition of type 'struct fib6_entry_notifier_info'
fen_info->rt->fib6_dst.plen != 128)
~~~~~~~~^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info'
struct fib6_entry_notifier_info *fen_info;
^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1562:39: error: incomplete definition of type 'struct fib6_entry_notifier_info'
memcpy(&key.endpoint_ip.v6, &fen_info->rt->fib6_dst.addr,
~~~~~~~~^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info'
struct fib6_entry_notifier_info *fen_info;
^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1563:24: error: incomplete definition of type 'struct fib6_entry_notifier_info'
sizeof(fen_info->rt->fib6_dst.addr));
~~~~~~~~^
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info'
struct fib6_entry_notifier_info *fen_info;
^
1 warning and 10 errors generated.
Manually include net/nexthop.h in tc_tun_encap.c.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Arnd Bergmann [Thu, 25 Feb 2021 12:54:54 +0000 (13:54 +0100)]
net/mlx5e: fix mlx5e_tc_tun_update_header_ipv6 dummy definition
The alternative implementation of this function in a header file
is declared as a global symbol, and gets added to every .c file
that includes it, which leads to a link error:
arm-linux-gnueabi-ld: drivers/net/ethernet/mellanox/mlx5/core/en_rx.o: in function `mlx5e_tc_tun_update_header_ipv6':
en_rx.c:(.text+0x0): multiple definition of `mlx5e_tc_tun_update_header_ipv6'; drivers/net/ethernet/mellanox/mlx5/core/en_main.o:en_main.c:(.text+0x0): first defined here
Mark it 'static inline' like the other functions here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 19 Nov 2020 12:20:44 +0000 (14:20 +0200)]
net/mlx5e: CT, Avoid false lock dependency warning
To avoid false lock dependency warning set the ct_entries_ht lock
class different than the lock class of the ht being used when deleting
last flow from a group and then deleting a group, we get into del_sw_flow_group()
which call rhashtable_destroy on fg->ftes_hash which will take ht->mutex but
it's different than the ht->mutex here.
======================================================
WARNING: possible circular locking dependency detected
5.10.0-rc2+ #8 Tainted: G O
------------------------------------------------------
revalidator23/24009 is trying to acquire lock:
ffff888128d83828 (&node->lock){++++}-{3:3}, at: mlx5_del_flow_rules+0x83/0x7a0 [mlx5_core]
but task is already holding lock:
ffff8881081ef518 (&ht->mutex){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x37/0x720
which lock already depends on the new lock.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky [Tue, 3 Nov 2020 16:46:31 +0000 (18:46 +0200)]
net/mlx5: Check returned value from health recover sequence
MLX5_INTERFACE_STATE_UP is far from being reliable check for success to
recover, because it can be changed any time and health logic doesn't
have any locks to protect from it.
The locks are not needed here because health recover is good to have,
but not must to success, so rely on the returned value from the
mlx5_recover_device() as a marker for success/failure.
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky [Tue, 3 Nov 2020 13:17:10 +0000 (15:17 +0200)]
net/mlx5: Don't rely on interface state bit
The check of MLX5_INTERFACE_STATE_UP is completely useless, because
the FW tracer cleanup is called on every change of the interface
and it ensures that notifier is disabled together with canceling
all the pending works.
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky [Tue, 3 Nov 2020 12:42:45 +0000 (14:42 +0200)]
net/mlx5: Remove second FW tracer check
The FW tracer check is called twice, so delete one of them.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky [Mon, 2 Nov 2020 14:54:43 +0000 (16:54 +0200)]
net/mlx5: Separate probe vs. reload flows
The mix between probe/unprobe and reload flows causes to have an extra
mutex lock intf_state_mutex that generates LOCKDEP warning between it
and devlink_mutex. As a preparation for the future removal, separate
those flows.
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Leon Romanovsky [Mon, 2 Nov 2020 14:02:14 +0000 (16:02 +0200)]
net/mlx5: Remove impossible checks of interface state
The interface state is constant at this stage and checked
before calling to the register/unregister reserved GIDs.
There is no need to double check it.
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed [Thu, 3 Dec 2020 09:55:16 +0000 (11:55 +0200)]
net/mlx5: Don't skip vport check
Users of mlx5_eswitch_get_vport() are required to check return value
prior to passing mlx5_vport further. Fix all the places to do not skip
that check.
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jiapeng Chong [Thu, 11 Mar 2021 07:11:01 +0000 (15:11 +0800)]
netdevsim: fib: Remove redundant code
Fix the following coccicheck warnings:
./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return
"0" on line 889.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 10 Mar 2021 22:12:43 +0000 (14:12 -0800)]
net: phy: Expose phydev::dev_flags through sysfs
phydev::dev_flags contains a bitmask of configuration bits requested by
the consumer of a PHY device (Ethernet MAC or switch) towards the PHY
driver. Since these flags are often used for requesting LED or other
type of configuration being able to quickly audit them without
instrumenting the kernel is useful.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 10 Mar 2021 18:52:26 +0000 (10:52 -0800)]
net: dsa: b53: Add debug prints in b53_vlan_enable()
Having dynamic debug prints in b53_vlan_enable() has been helpful to
uncover a recent but update the function to indicate the port being
configured (or -1 for initial setup) and include the global VLAN enabled
and VLAN filtering enable status.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhaskar Chowdhury [Wed, 10 Mar 2021 22:51:23 +0000 (04:21 +0530)]
net: fddi: skfp: Mundane typo fixes throughout the file smt.h
Few spelling fixes throughout the file.
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shubhankar Kuranagatti [Wed, 10 Mar 2021 21:13:43 +0000 (02:43 +0530)]
net: ipv4: route.c: fix space before tab
The extra space before tab space has been removed.
Signed-off-by: Shubhankar Kuranagatti <shubhankarvk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Mar 2021 23:34:28 +0000 (15:34 -0800)]
Merge branch 'ionic-next'
Shannon Nelson says:
====================
ionic Rx updates
The ionic driver's Rx path is due for an overhaul in order to
better use memory buffers and to clean up the data structures.
The first two patches convert the driver to using page sharing
between buffers so as to lessen the page alloc and free overhead.
The remaining patches clean up the structs and fastpath code for
better efficency.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 10 Mar 2021 19:26:31 +0000 (11:26 -0800)]
ionic: simplify use of completion types
Make better use of our struct types and type checking by passing
the actual Rx or Tx completion type rather than a generic void
pointer type.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 10 Mar 2021 19:26:30 +0000 (11:26 -0800)]
ionic: rebuild debugfs on qcq swap
With a reconfigure of each queue is needed a rebuild of
the matching debugfs information.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 10 Mar 2021 19:26:29 +0000 (11:26 -0800)]
ionic: simplify rx skb alloc
Remove an unnecessary layer over rx skb allocation.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>