Jakub Kicinski [Tue, 17 Nov 2020 17:16:13 +0000 (09:16 -0800)]
Merge branch 'net-dsa-tag_dsa-unify-regular-and-ethertype-dsa-taggers'
Tobias Waldekranz says:
====================
net: dsa: tag_dsa: Unify regular and ethertype DSA taggers
The first patch ports tag_edsa.c's handling of IGMP/MLD traps to
tag_dsa.c. That way, we start from two logically equivalent taggers
that are then merged. The second commit does the heavy lifting of
actually fusing tag_dsa.c and tag_edsa.c. The final one just follows
up with some clean up of existing comments.
v2 -> v3:
- Add the first patch described above as suggested by Andrew.
- Better documentation of TO_SNIFFER and FORWARD tags.
- Spelling.
v1 -> v2:
- Fixed some grammar and whitespace errors.
- Removed unnecessary default value in Kconfig.
- Removed unnecessary #ifdef.
- Split out comment fixes from functional changes.
- Fully document enum dsa_code.
====================
Link: https://lore.kernel.org/r/20201114234558.31203-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tobias Waldekranz [Sat, 14 Nov 2020 23:45:58 +0000 (00:45 +0100)]
net: dsa: tag_dsa: Use a consistent comment style
Use a consistent style of one-line/multi-line comments throughout the
file.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tobias Waldekranz [Sat, 14 Nov 2020 23:45:57 +0000 (00:45 +0100)]
net: dsa: tag_dsa: Unify regular and ethertype DSA taggers
Ethertype DSA encodes exactly the same information in the DSA tag as
the non-ethertype variety. So refactor out the common parts and reuse
them for both protocols.
This is ensures tag parsing and generation is always consistent across
all mv88e6xxx chips.
While we are at it, explicitly deal with all possible CPU codes on
receive, making sure to set offload_fwd_mark as appropriate.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tobias Waldekranz [Sat, 14 Nov 2020 23:45:56 +0000 (00:45 +0100)]
net: dsa: tag_dsa: Allow forwarding of redirected IGMP traffic
When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not
performed any forwarding of it. This means that we should not set the
offload_fwd_mark on the skb, in case a software bridge wants it
forwarded.
This is a port of:
1ed9ec9b08ad ("dsa: Allow forwarding of redirected IGMP traffic")
Which corrected the issue for chips using EDSA tags, but not for those
using regular DSA tags.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 16 Nov 2020 18:46:10 +0000 (10:46 -0800)]
Merge branch 'mptcp-improve-multiple-xmit-streams-support'
Paolo Abeni says:
====================
mptcp: improve multiple xmit streams support
This series improves MPTCP handling of multiple concurrent
xmit streams.
The to-be-transmitted data is enqueued to a subflow only when
the send window is open, keeping the subflows xmit queue shorter
and allowing for faster switch-over.
The above requires a more accurate msk socket state tracking
and some additional infrastructure to allow pushing the data
pending in the msk xmit queue as soon as the MPTCP's send window
opens (patches 6-10).
As a side effect, the MPTCP socket could enqueue data to subflows
after close() time - to completely spooling the data sitting in the
msk xmit queue. Dealing with the requires some infrastructure and
core TCP changes (patches 1-5)
Finally, patches 11-12 introduce a more accurate tracking of the other
end's receive window.
Overall this refactor the MPTCP xmit path, without introducing
new features - the new code is covered by the existing self-tests.
v2 -> v3:
- rebased,
- fixed checkpatch issue in patch 1/13
- fixed some state tracking issues in patch 8/13
v1 -> v2:
- this is just a repost, to cope with patchwork issues, no changes
at all
====================
Link: https://lore.kernel.org/r/cover.1605458224.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:14 +0000 (10:48 +0100)]
mptcp: send explicit ack on delayed ack_seq incr
When the worker moves some bytes from the OoO queue into
the receive queue, the msk->ask_seq is updated, the MPTCP-level
ack carrying that value needs to wait the next ingress packet,
possibly slowing down or hanging the peer
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Mon, 16 Nov 2020 09:48:13 +0000 (10:48 +0100)]
mptcp: keep track of advertised windows right edge
Before sending 'x' new bytes also check that the new snd_una would
be within the permitted receive window.
For every ACK that also contains a DSS ack, check whether its tcp-level
receive window would advance the current mptcp window right edge and
update it if so.
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Mon, 16 Nov 2020 09:48:12 +0000 (10:48 +0100)]
mptcp: rework poll+nospace handling
MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at
least one subflow and the mptcp socket itself are writeable.
mptcp_poll returns EPOLLOUT if the bit is set.
mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when last write
has used up all subflows or the mptcp socket wmem.
This reworks nospace handling as follows:
MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, i.e. inverted meaning.
This bit is set when the mptcp socket is not writeable.
The mptcp-level ack path schedule will then schedule the mptcp worker
to allow it to free already-acked data (and reduce wmem usage).
This will then wake userspace processes that wait for a POLLOUT event.
sendmsg will set MPTCP_NOSPACE only when it has to wait for more
wmem (blocking I/O case).
poll path will set MPTCP_NOSPACE in case the mptcp socket is
not writeable.
Normal tcp-level notification (SOCK_NOSPACE) is only enabled
in case the subflow socket has no available wmem.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:11 +0000 (10:48 +0100)]
mptcp: try to push pending data on snd una updates
After the previous patch we may end-up with unsent data
in the write buffer. If such buffer is full, the writer
will block for unlimited time.
We need to trigger the MPTCP xmit path even for the
subflow rx path, on MPTCP snd_una updates.
Keep things simple and just schedule the work queue if
needed.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:10 +0000 (10:48 +0100)]
mptcp: move page frag allocation in mptcp_sendmsg()
mptcp_sendmsg() is refactored so that first it copies
the data provided from user space into the send queue,
and then tries to spool the send queue via sendmsg_frag.
There a subtle change in the mptcp level collapsing on
consecutive data fragment: we now allow that only on unsent
data.
The latter don't need to deal with msghdr data anymore
and can be simplified in a relevant way.
snd_nxt and write_seq are now tracked independently.
Overall this allows some relevant cleanup and will
allow sending pending mptcp data on msk una update in
later patch.
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:09 +0000 (10:48 +0100)]
mptcp: refactor shutdown and close
We must not close the subflows before all the MPTCP level
data, comprising the DATA_FIN has been acked at the MPTCP
level, otherwise we could be unable to retransmit as needed.
__mptcp_wr_shutdown() shutdown is responsible to check for the
correct status and close all subflows. Is called by the output
path after spooling any data and at shutdown/close time.
In a similar way, __mptcp_destroy_sock() is responsible to clean-up
the MPTCP level status, and is called when the msk transition
to TCP_CLOSE.
The protocol level close() does not force anymore the TCP_CLOSE
status, but orphan the msk socket and all the subflows.
Orphaned msk sockets are forciby closed after a timeout or
when all MPTCP-level data is acked.
There is a caveat about keeping the orphaned subflows around:
the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
tcp_close(). To prevent accessing freed memory on later MPTCP
level operations, the msk acquires a reference to each subflow
socket and prevent subflow_ulp_release() from releasing the
subflow context before __mptcp_destroy_sock().
The additional subflow references are released by __mptcp_done()
and the async ULP release is detected checking ULP ops. If such
field has been already cleared by the ULP release path, the
dangling context is freed directly by __mptcp_done().
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:08 +0000 (10:48 +0100)]
mptcp: introduce MPTCP snd_nxt
Track the next MPTCP sequence number used on xmit,
currently always equal to write_next.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:07 +0000 (10:48 +0100)]
mptcp: add accounting for pending data
Preparation patch to track the data pending in the msk
write queue. No functional change introduced here
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:06 +0000 (10:48 +0100)]
mptcp: reduce the arguments of mptcp_sendmsg_frag
The current argument list is pretty long and quite unreadable,
move many of them into a specific struct. Later patches
will add more stuff to such struct.
Additionally drop the 'timeo' argument, now unused.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:05 +0000 (10:48 +0100)]
mptcp: introduce mptcp_schedule_work
remove some of code duplications an allow preventing
rescheduling on close.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:04 +0000 (10:48 +0100)]
tcp: factor out __tcp_close() helper
unlocked version of protocol level close, will be used by
MPTCP to allow decouple orphaning and subflow level close.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:03 +0000 (10:48 +0100)]
mptcp: use tcp_build_frag()
mptcp_push_pending() is called even on orphaned
msk (and orphaned subflows), if there is outstanding
data at close() time.
To cope with the above MPTCP needs to handle explicitly
the allocation failure on xmit. The newly introduced
do_tcp_sendfrag() allows that, just plug it.
We can additionally drop a couple of sanity checks,
duplicate in the TCP code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 16 Nov 2020 09:48:02 +0000 (10:48 +0100)]
tcp: factor out tcp_build_frag()
Will be needed by the next patch, as MPTCP needs to handle
directly the error/memory-allocation-needed path.
No functional changes intended.
Additionally let MPTCP code access the tcp_remove_empty_skb()
helper.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 16 Nov 2020 16:09:50 +0000 (08:09 -0800)]
Merge branch 'fix-inefficiences-and-rename-nla_strlcpy'
Francis Laniel says:
====================
Fix inefficiences and rename nla_strlcpy
This patch set answers to first three issues listed in:
https://github.com/KSPP/linux/issues/110
To sum up, the patch contributions are the following:
1. the first patch fixes an inefficiency where some bytes in dst were written
twice, one with 0 the other with src content.
2. The second one modifies nla_strlcpy to return the same value as strscpy,
i.e. number of bytes written or -E2BIG if src was truncated.
It also modifies code that calls nla_strlcpy and checks for its return value.
3. The third renames nla_strlcpy to nla_strscpy.
Unfortunately, I did not find how to create struct nlattr objects so I tested
my modifications on simple char* and with GDB using tc to get to
tcf_proto_check_kind.
====================
Link: https://lore.kernel.org/r/20201115170806.3578-1-laniel_francis@privacyrequired.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Francis Laniel [Sun, 15 Nov 2020 17:08:06 +0000 (18:08 +0100)]
treewide: rename nla_strlcpy to nla_strscpy.
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Francis Laniel [Sun, 15 Nov 2020 17:08:05 +0000 (18:08 +0100)]
Modify return value of nla_strlcpy to match that of strscpy.
nla_strlcpy now returns -E2BIG if src was truncated when written to dst.
It also returns this error value if dstsize is 0 or higher than INT_MAX.
For example, if src is "foo\0" and dst is 3 bytes long, the result will be:
1. "foG" after memcpy (G means garbage).
2. "fo\0" after memset.
3. -E2BIG is returned because src was not completely written into dst.
The callers of nla_strlcpy were modified to take into account this modification.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Francis Laniel [Sun, 15 Nov 2020 17:08:04 +0000 (18:08 +0100)]
Fix unefficient call to memset before memcpu in nla_strlcpy.
Before this commit, nla_strlcpy first memseted dst to 0 then wrote src into it.
This is inefficient because bytes whom number is less than src length are written
twice.
This patch solves this issue by first writing src into dst then fill dst with
0's.
Note that, in the case where src length is higher than dst, only 0 is written.
Otherwise there are as many 0's written to fill dst.
For example, if src is "foo\0" and dst is 5 bytes long, the result will be:
1. "fooGG" after memcpy (G means garbage).
2. "foo\0\0" after memset.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sat, 14 Nov 2020 20:49:53 +0000 (21:49 +0100)]
r8169: improve rtl8169_start_xmit
Improve the following in rtl8169_start_xmit:
- tp->cur_tx can be accessed in parallel by rtl_tx(), therefore
annotate the race by using WRITE_ONCE
- avoid checking stop_queue a second time by moving the doorbell check
- netif_stop_queue() uses atomic operation set_bit() that includes a
full memory barrier on some platforms, therefore use
smp_mb__after_atomic to avoid overhead
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/80085451-3eaf-507a-c7c0-08d607c46fbc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lev Stipakov [Fri, 13 Nov 2020 21:59:40 +0000 (23:59 +0200)]
net: xfrm: use core API for updating/providing stats
Commit
d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.
Use this function instead of own code.
While on it, remove xfrmi_get_stats64() and replace it with
dev_get_tstats64().
Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215939.147007-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lev Stipakov [Fri, 13 Nov 2020 21:53:36 +0000 (23:53 +0200)]
net: openvswitch: use core API to update/provide stats
Commit
d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.
Use this function instead of own code.
While on it, remove internal_get_stats() and replace it
with dev_get_tstats64().
Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215336.145998-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 15 Nov 2020 00:55:07 +0000 (16:55 -0800)]
Merge branch 'mlxsw-preparations-for-nexthop-objects-support-part-1-2'
Ido Schimmel says:
====================
mlxsw: Preparations for nexthop objects support - part 1/2
This patch set contains small and non-functional changes aimed at making
it easier to support nexthop objects in mlxsw. Follow up patches can be
found here [1].
Patches #1-#4 add a type field to the nexthop group struct instead of
the existing protocol field. This will be used later on to add a nexthop
object type, which can contain both IPv4 and IPv6 nexthops.
Patches #5-#7 move the IPv4 FIB info pointer (i.e., 'struct fib_info')
from the nexthop group struct to the route. The pointer will not be
available when the nexthop group is a nexthop object, but it needs to be
accessible to routes regardless.
Patch #8 is the biggest change, but it is an entirely cosmetic change
and should therefore be easy to review. The motivation and the change
itself are explained in detail in the commit message.
Patches #9-#12 perform small changes so that two functions that are
currently split between IPv4 and IPv6 could be consolidated in patches
Patch #15 removes an outdated comment.
[1] https://github.com/idosch/linux/tree/submit/nexthop_objects
====================
Link: https://lore.kernel.org/r/20201113160559.22148-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:59 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Remove outdated comment
Since commit
21151f64a458 ("mlxsw: Add new FIB entry type for reject
routes") this comment is no longer correct. Remove it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:58 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Consolidate mlxsw_sp_nexthop{4, 6}_type_fini()
The two functions are identical, so consolidate them to
mlxsw_sp_nexthop_type_fini().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:57 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Consolidate mlxsw_sp_nexthop{4, 6}_type_init()
The two functions are now identical, so consolidate them to
mlxsw_sp_nexthop_type_init().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:56 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Remove unused argument from mlxsw_sp_nexthop6_type_init()
Remove it as it is unused.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:55 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Pass nexthop netdev to mlxsw_sp_nexthop4_type_init()
Instead of passing the nexthop and resolving the nexthop netdev from it,
pass the nexthop netdev directly.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:54 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Pass nexthop netdev to mlxsw_sp_nexthop6_type_init()
Instead of passing the route and resolving the nexthop netdev from it,
pass the nexthop netdev directly.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:53 +0000 (18:05 +0200)]
mlxsw: spectrum_ipip: Remove overlay protocol from can_offload() callback
The overlay protocol (i.e., IPv4/IPv6) that is being encapsulated has
no impact on whether a certain IP tunnel can be offloaded or not. Only
the underlay protocol matters.
Therefore, remove the unused overlay protocol parameter from the
callback.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:52 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Split nexthop group configuration to a different struct
Currently, the individual nexthops member in the group and attributes of
the group (e.g., its type) are stored in the same struct (i.e., 'struct
mlxsw_sp_nexthop_group'). This is fine since the individual nexthops
cannot change during the lifetime of the group.
With nexthop objects this is no longer the case. An existing nexthop
group can be replaced to use a new set of nexthops. Creating a new
struct whenever a group is replaced entails replacing the group pointer
of all the routes (i.e., 'struct mlxsw_sp_fib_entry') using the group.
Avoid this inefficient step by splitting the nexthop group configuration
to a different struct (i.e., 'struct mlxsw_sp_nexthop_group_info').
When a nexthop group is replaced a new group info struct is created and
the individual rotues do not need to be touched.
Illustration after the change:
mlxsw_sp_fib_entry mlxsw_sp_nexthop_group mlxsw_sp_nexthop_group_info
+-------------------+ +----------------------+ +---------------------------+
| nh_group; +--> nhgi; +--> |
| | | | | |
+-------------------+ +----------------------+ +---------------------------+
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:51 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Move IPv4 FIB info into a union in nexthop group struct
Instead of storing the FIB info as 'priv' when the nexthop group
represents an IPv4 nexthop group, simply store it as a FIB info with a
proper comment.
When nexthop objects are supported, this field will become a union with
the nexthop object's identifier.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:50 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Remove unused field 'prio' from IPv4 FIB entry struct
Not used anywhere.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:49 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Store FIB info in route
When needed, IPv4 routes fetch the FIB info (i.e., 'struct fib_info')
from their associated nexthop group. This will not work when the nexthop
group represents a nexthop object (i.e., 'struct nexthop'), as it will
only have access to the nexthop's identifier.
Instead, store the FIB info in the route itself.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:48 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Associate neighbour table with nexthop instead of group
As explained in the previous patch, nexthop objects can have both IPv4
and IPv6 nexthops in the same group. Therefore, move the neighbour table
to be a property of the nexthop instead of the nexthop group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:47 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Use nexthop group type in hash table key
Both IPv4 and IPv6 nexthop groups are hashed in the same table. The
protocol field is used to indicate how the hash should be computed for
each group.
When nexthop group objects are supported, the hash will be computed for
them based on the nexthop identifier.
To differentiate between all the nexthop group types, encode the type of
the group in the key instead of the protocol.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:46 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Add nexthop group type field
Currently, the type (i.e., IPv4/IPv6) of the nexthop group is derived
from the neighbour table associated with the group.
This is problematic when nexthop objects are taken into account, as a
nexthop group object can contain both IPv4 and IPv6 nexthops.
Instead, add a new field that indicates the type of the group and
initialize it during the group's creation. Currently, the types are IPv4
('struct fib_info') and IPv6 ('struct fib6_info'). In the future another
type will be added for nexthop objects ('struct nexthop').
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Fri, 13 Nov 2020 16:05:45 +0000 (18:05 +0200)]
mlxsw: spectrum_router: Compare key with correct object type
When comparing a key with a nexthop group in rhastable's obj_cmpfn()
callback, make sure that the key and nexthop group are of the same type
(i.e., IPv4 / IPv6).
The bug is not currently visible because IPv6 nexthop groups do not
populate the FIB info pointer and IPv4 nexthop groups do not set the
ifindex for the individual nexthops.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Shi [Fri, 13 Nov 2020 03:51:57 +0000 (11:51 +0800)]
nfc: refined function nci_hci_resp_received
We don't use the parameter result actually, so better to remove it and
skip a gcc warning for unused variable.
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Link: https://lore.kernel.org/r/1605239517-49707-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 13 Nov 2020 11:35:53 +0000 (03:35 -0800)]
inet: unexport udp{4|6}_lib_lookup_skb()
These functions do not need to be exported.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201113113553.3411756-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 15 Nov 2020 00:09:59 +0000 (16:09 -0800)]
Merge branch 'tcp-avoid-indirect-call-in-__sk_stream_memory_free'
Eric Dumazet says:
====================
tcp: avoid indirect call in __sk_stream_memory_free()
Small improvement for CONFIG_RETPOLINE=y, when dealing with TCP sockets.
====================
Link: https://lore.kernel.org/r/20201113150809.3443527-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 13 Nov 2020 15:08:09 +0000 (07:08 -0800)]
tcp: avoid indirect call to tcp_stream_memory_free()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 13 Nov 2020 15:08:08 +0000 (07:08 -0800)]
tcp: uninline tcp_stream_memory_free()
Both IPv4 and IPv6 needs it via a function pointer.
Following patch will avoid the indirect call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oliver Herms [Fri, 13 Nov 2020 08:55:17 +0000 (09:55 +0100)]
IPv4: RTM_GETROUTE: Add RTA_ENCAP to result
This patch adds an IPv4 routes encapsulation attribute
to the result of netlink RTM_GETROUTE requests
(e.g. ip route get 192.0.2.1).
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201113085517.GA1307262@tws
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Nov 2020 21:23:01 +0000 (13:23 -0800)]
Merge branch 'ionic-updates'
Shannon Nelson says:
====================
ionic updates
These updates are a bit of code cleaning and a minor
bit of performance tweaking.
v3: convert ionic_lif_quiesce() to void
v2: added void cast on call to ionic_lif_quiesce()
lowered batching threshold
added patch to flatten calls to ionic_lif_rx_mode
added patch to change from_ndo to can_sleep
====================
Link: https://lore.kernel.org/r/20201112182208.46770-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:08 +0000 (10:22 -0800)]
ionic: useful names for booleans
With a few more uses of true and false in function calls, we
need to give them some useful names so we can tell from the
calling point what we're doing.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:07 +0000 (10:22 -0800)]
ionic: change set_rx_mode from_ndo to can_sleep
Instead of having two different ways of expressing the same
sleepability concept, using opposite logic, we can rework the
from_ndo to can_sleep for a more consistent usage.
Fixes:
1800eee16676 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:06 +0000 (10:22 -0800)]
ionic: flatten calls to ionic_lif_rx_mode
The _ionic_lif_rx_mode() is only used once and really doesn't
need to be broken out.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:05 +0000 (10:22 -0800)]
ionic: use mc sync for multicast filters
We should be using the multicast sync routines for the multicast
filters. Also, let's just flatten the logic a bit and pull
the small unicast routine back into ionic_set_rx_mode().
Fixes:
1800eee16676 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:04 +0000 (10:22 -0800)]
ionic: batch rx buffer refilling
We don't need to refill the rx descriptors on every napi
if only a few were handled. Waiting until we can batch up
a few together will save us a few Rx cycles.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:03 +0000 (10:22 -0800)]
ionic: add lif quiesce
After the queues are stopped, expressly quiesce the lif.
This assures that even if the queues were in an odd state,
the firmware will close up everything cleanly.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:02 +0000 (10:22 -0800)]
ionic: check for link after netdev registration
Request a link check as soon as the netdev is registered rather
than waiting for the watchdog to go off in order to get the
interface operational a little more quickly.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:01 +0000 (10:22 -0800)]
ionic: start queues before announcing link up
Change the order of operations in the link_up handling to be
sure that the queues are up and ready before we announce that
the link is up.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
YueHaibing [Thu, 12 Nov 2020 14:49:36 +0000 (22:49 +0800)]
net: macb: Fix passing zero to 'PTR_ERR'
Check PTR_ERR with IS_ERR to fix this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20201112144936.54776-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Fri, 13 Nov 2020 13:50:12 +0000 (14:50 +0100)]
ipv6: remove unused function ipv6_skb_idev()
Commit
bdb7cc643fc9 ("ipv6: Count interface receive statistics on the
ingress netdev") removed all callees for ipv6_skb_idev(). Hence, since
then, ipv6_skb_idev() is unused and make CC=clang W=1 warns:
net/ipv6/exthdrs.c:909:33:
warning: unused function 'ipv6_skb_idev' [-Wunused-function]
So, remove this unused function and a -Wunused-function warning.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20201113135012.32499-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Nov 2020 17:13:40 +0000 (09:13 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-11-14
1) Add BTF generation for kernel modules and extend BTF infra in kernel
e.g. support for split BTF loading and validation, from Andrii Nakryiko.
2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
on inlined branch conditions, from Alexei Starovoitov.
3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.
4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
infra, from Martin KaFai Lau.
5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
XDP_REDIRECT path, from Lorenzo Bianconi.
6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
context, from Song Liu.
7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
for different target archs on same source tree, from Jean-Philippe Brucker.
8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.
9) Move functionality from test_tcpbpf_user into the test_progs framework so it
can run in BPF CI, from Alexander Duyck.
10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.
Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.
[0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
https://lore.kernel.org/linux-sparse/
20201109221345.uklbp3lzgq6g42zb@ltop.local/T/
* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
net: mlx5: Add xdp tx return bulking support
net: mvpp2: Add xdp tx return bulking support
net: mvneta: Add xdp tx return bulking support
net: page_pool: Add bulk support for ptr_ring
net: xdp: Introduce bulking for xdp tx return path
bpf: Expose bpf_d_path helper to sleepable LSM hooks
bpf: Augment the set of sleepable LSM hooks
bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Rename some functions in bpf_sk_storage
bpf: Folding omem_charge() into sk_storage_charge()
selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
selftests/bpf: Add skb_pkt_end test
bpf: Support for pointers beyond pkt_end.
tools/bpf: Always run the *-clean recipes
tools/bpf: Add bootstrap/ to .gitignore
bpf: Fix NULL dereference in bpf_task_storage
tools/bpftool: Fix build slowdown
tools/runqslower: Build bpftool using HOSTCC
tools/runqslower: Enable out-of-tree build
...
====================
Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steen Hegelund [Thu, 12 Nov 2020 09:22:50 +0000 (10:22 +0100)]
net: phy: mscc: Add PTP support for 2 more VSC PHYs
Add VSC8572 and VSC8574 in the PTP configuration
as they also support PTP.
The relevant datasheets can be found here:
- VSC8572: https://www.microchip.com/wwwproducts/en/VSC8572
- VSC8574: https://www.microchip.com/wwwproducts/en/VSC8574
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Link: https://lore.kernel.org/r/20201112092250.914079-1-steen.hegelund@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Sat, 14 Nov 2020 01:29:00 +0000 (02:29 +0100)]
Merge branch 'xdp-redirect-bulk'
Lorenzo Bianconi says:
====================
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually
run inside the driver NAPI tx completion loop.
Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs.
More details on benchmarks run on mlx5 can be found here:
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/xdp_bulk_return01.org
Changes since v5:
- do not keep looping over ptr_ring if the cache is full but release leftover
pages running page_pool_return_page
Changes since v4:
- fix comments
- introduce xdp_frame_bulk_init utility routine
- compiler annotations for I-cache code layout
- move rcu_read_lock outside fast-path
- mlx5 xdp bulking code optimization
Changes since v3:
- align DEV_MAP_BULK_SIZE to XDP_BULK_QUEUE_SIZE
- refactor page_pool_put_page_bulk to avoid code duplication
Changes since v2:
- move mvneta changes in a dedicated patch
Changes since v1:
- improve comments
- rework xdp_return_frame_bulk routine logic
- move count and xa fields at the beginning of xdp_frame_bulk struct
- invert logic in page_pool_put_page_bulk for loop
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:32 +0000 (12:48 +0100)]
net: mlx5: Add xdp tx return bulking support
Convert mlx5 driver to xdp_return_frame_bulk APIs.
XDP_REDIRECT (upstream codepath): 8.9Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 10.2Mpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/250460319fd868b7b5668fc1deca74dd42813a90.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:31 +0000 (12:48 +0100)]
net: mvpp2: Add xdp tx return bulking support
Convert mvpp2 driver to xdp_return_frame_bulk APIs.
XDP_REDIRECT (upstream codepath): 1.79Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 1.93Mpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Matteo Croce <mcroce@microsoft.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/0b38c295e58e8ce251ef6b4e2187a2f457f9f7a3.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:30 +0000 (12:48 +0100)]
net: mvneta: Add xdp tx return bulking support
Convert mvneta driver to xdp_return_frame_bulk APIs.
XDP_REDIRECT (upstream codepath): 275Kpps
XDP_REDIRECT (upstream codepath + bulking APIs): 284Kpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/9af8014006d022fc0fec78cdaa71beb56999750d.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:29 +0000 (12:48 +0100)]
net: page_pool: Add bulk support for ptr_ring
Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:28 +0000 (12:48 +0100)]
net: xdp: Introduce bulking for xdp tx return path
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
Jisheng Zhang [Thu, 12 Nov 2020 01:27:37 +0000 (09:27 +0800)]
net: stmmac: platform: use optional clk/reset get APIs
Use the devm_reset_control_get_optional() and devm_clk_get_optional()
rather than open coding them.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Link: https://lore.kernel.org/r/20201112092606.5173aa6f@xhacker.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 11 Nov 2020 21:56:29 +0000 (22:56 +0100)]
r8169: improve rtl_tx
We can simplify the for() condition and eliminate variable tx_left.
The change also considers that tp->cur_tx may be incremented by a
racing rtl8169_start_xmit().
In addition replace the write to tp->dirty_tx and the following
smp_mb() with an equivalent call to smp_store_mb(). This implicitly
adds a WRITE_ONCE() to the write.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/c2e19e5e-3d3f-d663-af32-13c3374f5def@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 11 Nov 2020 21:14:27 +0000 (22:14 +0100)]
r8169: use READ_ONCE in rtl_tx_slots_avail
tp->dirty_tx and tp->cur_tx may be changed by a racing rtl_tx() or
rtl8169_start_xmit(). Use READ_ONCE() to annotate the races and ensure
that the compiler doesn't use cached values.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/5676fee3-f6b4-84f2-eba5-c64949a371ad@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 13 Nov 2020 23:37:10 +0000 (15:37 -0800)]
Merge branch 'net-ipa-two-fixes'
Alex Elder says:
====================
net: ipa: two fixes
This small series makes two fixes to the IPA code:
- While reviewing something else I found that one of the resource
limits on the SDM845 used the wrong value. The first patch
fixes this. The correct value allocates more resources of this
type for IPA to use, and otherwise does not change behavior.
- When the IPA-resident microcontroller starts up it generates an
event, which triggers an AP interrupt. The event merely
provides some information for logging, which we don't support.
We already ignore the event, and that's harmless. So this
patch explicitly ignores it rather than issuing a warning when
it occurs.
====================
Link: https://lore.kernel.org/r/20201112121157.19784-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 12 Nov 2020 12:20:00 +0000 (06:20 -0600)]
net: ipa: ignore the microcontroller log event
The IPA-resident microcontroller has the ability to log various
activity in an area of IPA shared memory. When the microcontroller
starts it generates an event to the AP to provide information about
the log.
We don't support reading this log, and we can safely ignore the
event. So do that rather than treating the log info event we
receive as "unsupported."
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 12 Nov 2020 12:11:56 +0000 (06:11 -0600)]
net: ipa: fix source packet contexts limit
I have discovered that the maximum number of source packet contexts
configured for SDM845 is incorrect. Fix this error.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 13 Nov 2020 23:33:37 +0000 (15:33 -0800)]
Merge branch 'sfc-further-ef100-encap-tso-features'
Edward Cree says:
====================
sfc: further EF100 encap TSO features
This series adds support for GRE and GRE_CSUM TSO on EF100 NICs, as
well as improving the handling of UDP tunnel TSO.
====================
Link: https://lore.kernel.org/r/eda2de73-edf2-8b92-edb9-099ebda09ebc@solarflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edward Cree [Thu, 12 Nov 2020 15:20:05 +0000 (15:20 +0000)]
sfc: support GRE TSO on EF100
We can treat SKB_GSO_GRE almost exactly the same as UDP tunnels, except
that we don't want to edit the outer UDP len (as there isn't one).
For SKB_GSO_GRE_CSUM, we have to use GSO_PARTIAL as the device doesn't
support offload of non-UDP outer L4 checksums.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Edward Cree [Thu, 12 Nov 2020 15:19:47 +0000 (15:19 +0000)]
sfc: correctly support non-partial GSO_UDP_TUNNEL_CSUM on EF100
By asking the HW for the correct edits, we can make UDP tunnel TSO
work without needing GSO_PARTIAL. So don't specify it in our
netdev->gso_partial_features.
However, retain GSO_PARTIAL support, as this will be used for other
protocols later.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Edward Cree [Thu, 12 Nov 2020 15:19:26 +0000 (15:19 +0000)]
sfc: extend bitfield macros to 19 fields
Our TSO descriptors got even more fussy.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Jakub Kicinski [Fri, 13 Nov 2020 23:13:45 +0000 (15:13 -0800)]
Merge branch 'net-ipa-gsi-register-consolidation'
Alex Elder says:
====================
net: ipa: GSI register consolidation
This series rearranges and consolidates some GSI register
definitions. Its general aim is to make things more
consistent, by:
- Using enumerated types to define the values held in GSI register
fields
- Defining field values in "gsi_reg.h", together with the
definition of the register (and field) that holds them
- Format enumerated type members consistently, with hexidecimal
numeric values, and assignments aligned on the same column
There is one checkpatch "CHECK" warning requesting a blank line; I
ignored that because my intention was to group certain definitions.
====================
Link: https://lore.kernel.org/r/20201110215922.23514-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:22 +0000 (15:59 -0600)]
net: ipa: use enumerated types for GSI field values
Replace constants defined with an "_FVAL" suffix with values defined
in enumerated types, to be consistent with other usage in the driver.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:21 +0000 (15:59 -0600)]
net: ipa: move GSI command opcode values into "gsi_reg.h"
The gsi_ch_cmd_opcode, gsi_evt_cmd_opcode, and gsi_generic_cmd_opcode
enumerated types are values that fields in the GSI command registers
can take on. Move their definitions out of "gsi.c" and into "gsi_reg.h",
alongside the definition of registers they are associated with.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:20 +0000 (15:59 -0600)]
net: ipa: move GSI error values into "gsi_reg.h"
The gsi_err_code and gsi_err_type enumerated types are values that
fields in the GSI ERROR_LOG register can take on. Move their
definitions out of "gsi.c" and into "gsi_reg.h", alongside the
definition of the ERROR_LOG register offset and field symbols.
Drop the "_ERR" suffix in the names of the gsi_err_code members.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:19 +0000 (15:59 -0600)]
net: ipa: move channel type values into "gsi_reg.h"
The gsi_channel_type enumerated type define values used for the
channel type/protocol for event rings and channels. Move its
definition out of "gsi.c" and into "gsi_reg.h", alongside the
definition of the CH_C_CNTXT_0 register offset and its fields.
Add a comment near the definition of the EV_CH_E_CNTXT_0 register
indicating this type is used for its EV_CHTYPE field.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:18 +0000 (15:59 -0600)]
net: ipa: use common value for channel type and protocol
The numeric values that represent the event ring channel type are
identical to the values that represent the matching protocol used
for a channel. Use a new gsi_channel_type enumerated type to
represent the values programmed for both cases, using "CHANNEL_TYPE"
in member names in place of "EVT_CHTYPE" and "CHANNEL_PROTOCOL".
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:17 +0000 (15:59 -0600)]
net: ipa: define GSI interrupt types with enums
Define the GSI global interrupt types with an enumerated type whose
values are the bit positions representing the global interrupt types.
Similarly, define the GSI general interrupt types with an enumerated
type whose values are the bit positions of general interrupt types.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wenlin Kang [Thu, 12 Nov 2020 09:34:42 +0000 (17:34 +0800)]
tipc: fix -Wstringop-truncation warnings
Replace strncpy() with strscpy(), fixes the following warning:
In function 'bearer_name_validate',
inlined from 'tipc_enable_bearer' at net/tipc/bearer.c:246:7:
net/tipc/bearer.c:141:2: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
strncpy(name_copy, name, TIPC_MAX_BEARER_NAME);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Link: https://lore.kernel.org/r/20201112093442.8132-1-wenlin.kang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 13 Nov 2020 20:03:21 +0000 (12:03 -0800)]
Merge tag 'mac80211-next-for-net-next-2020-11-13' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Some updates:
* injection/radiotap updates for new test capabilities
* remove WDS support - even years ago when we turned
it off by default it was already basically unusable
* support for HE (802.11ax) rates for beacons
* support for some vendor-specific HE rates
* many other small features/cleanups
* tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (21 commits)
nl80211: fix kernel-doc warning in the new SAE attribute
cfg80211: remove WDS code
mac80211: remove WDS-related code
rt2x00: remove WDS code
b43legacy: remove WDS code
b43: remove WDS code
carl9170: remove WDS code
ath9k: remove WDS code
wireless: remove CONFIG_WIRELESS_WDS
mac80211: assure that certain drivers adhere to DONT_REORDER flag
mac80211: don't overwrite QoS TID of injected frames
mac80211: adhere to Tx control flag that prevents frame reordering
mac80211: add radiotap flag to assure frames are not reordered
mac80211: save HE oper info in BSS config for mesh
cfg80211: add support to configure HE MCS for beacon rate
nl80211: fix beacon tx rate mask validation
nl80211/cfg80211: fix potential infinite loop
cfg80211: Add support to calculate and report 4096-QAM HE rates
cfg80211: Add support to configure SAE PWE value to drivers
ieee80211: Add definition for WFA DPP
...
====================
Link: https://lore.kernel.org/r/20201113101148.25268-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KP Singh [Fri, 13 Nov 2020 00:59:30 +0000 (00:59 +0000)]
bpf: Expose bpf_d_path helper to sleepable LSM hooks
Sleepable hooks are never called from an NMI/interrupt context, so it
is safe to use the bpf_d_path helper in LSM programs attaching to these
hooks.
The helper is not restricted to sleepable programs and merely uses the
list of sleepable hooks as the initial subset of LSM hooks where it can
be used.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201113005930.541956-3-kpsingh@chromium.org
KP Singh [Fri, 13 Nov 2020 00:59:29 +0000 (00:59 +0000)]
bpf: Augment the set of sleepable LSM hooks
Update the set of sleepable hooks with the ones that do not trigger
a warning with might_fault() when exercised with the correct kernel
config options enabled, i.e.
DEBUG_ATOMIC_SLEEP=y
LOCKDEP=y
PROVE_LOCKING=y
This means that a sleepable LSM eBPF program can be attached to these
LSM hooks. A new helper method bpf_lsm_is_sleepable_hook is added and
the set is maintained locally in bpf_lsm.c
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201113005930.541956-2-kpsingh@chromium.org
Alexei Starovoitov [Fri, 13 Nov 2020 02:39:28 +0000 (18:39 -0800)]
Merge branch 'bpf: Enable bpf_sk_storage for FENTRY/FEXIT/RAW_TP'
Martin KaFai says:
====================
This set is to allow the FENTRY/FEXIT/RAW_TP tracing program to use
bpf_sk_storage. The first two patches are a cleanup. The last patch is
tests. Patch 3 has the required kernel changes to
enable bpf_sk_storage for FENTRY/FEXIT/RAW_TP.
Please see individual patch for details.
v2:
- Rename some of the function prefix from sk_storage to bpf_sk_storage
- Use prefix check instead of substr check
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:20 +0000 (13:13 -0800)]
bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
This patch tests storing the task's related info into the
bpf_sk_storage by fentry/fexit tracing at listen, accept,
and connect. It also tests the raw_tp at inet_sock_set_state.
A negative test is done by tracing the bpf_sk_storage_free()
and using bpf_sk_storage_get() at the same time. It ensures
this bpf program cannot load.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201112211320.2587537-1-kafai@fb.com
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:13 +0000 (13:13 -0800)]
bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use
the bpf_sk_storage_(get|delete) helper, so those tracing programs
can access the sk's bpf_local_storage and the later selftest
will show some examples.
The bpf_sk_storage is currently used in bpf-tcp-cc, tc,
cg sockops...etc which is running either in softirq or
task context.
This patch adds bpf_sk_storage_get_tracing_proto and
bpf_sk_storage_delete_tracing_proto. They will check
in runtime that the helpers can only be called when serving
softirq or running in a task context. That should enable
most common tracing use cases on sk.
During the load time, the new tracing_allowed() function
will ensure the tracing prog using the bpf_sk_storage_(get|delete)
helper is not tracing any bpf_sk_storage*() function itself.
The sk is passed as "void *" when calling into bpf_local_storage.
This patch only allows tracing a kernel function.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201112211313.2587383-1-kafai@fb.com
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:07 +0000 (13:13 -0800)]
bpf: Rename some functions in bpf_sk_storage
Rename some of the functions currently prefixed with sk_storage
to bpf_sk_storage. That will make the next patch have fewer
prefix check and also bring the bpf_sk_storage.c to a more
consistent function naming.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211307.2587021-1-kafai@fb.com
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:01 +0000 (13:13 -0800)]
bpf: Folding omem_charge() into sk_storage_charge()
sk_storage_charge() is the only user of omem_charge().
This patch simplifies it by folding omem_charge() into
sk_storage_charge().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211301.2586255-1-kafai@fb.com
Jakub Kicinski [Fri, 13 Nov 2020 00:54:48 +0000 (16:54 -0800)]
Merge https://git./linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Fri, 13 Nov 2020 00:42:12 +0000 (01:42 +0100)]
Merge branch 'bpf-ptrs-beyond-pkt-end'
Alexei Starovoitov says:
====================
v1->v2:
- removed set-but-unused variable.
- added Jiri's Tested-by.
In some cases LLVM uses the knowledge that branch is taken to optimze the code
which causes the verifier to reject valid programs.
Teach the verifier to recognize that
r1 = skb->data;
r1 += 10;
r2 = skb->data_end;
if (r1 > r2) {
here r1 points beyond packet_end and subsequent
if (r1 > r2) // always evaluates to "true".
}
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Wed, 11 Nov 2020 03:12:13 +0000 (19:12 -0800)]
selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
Add few assembly tests for packet comparison.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-4-alexei.starovoitov@gmail.com
Alexei Starovoitov [Wed, 11 Nov 2020 03:12:12 +0000 (19:12 -0800)]
selftests/bpf: Add skb_pkt_end test
Add a test that currently makes LLVM generate assembly code:
$ llvm-objdump -S skb_pkt_end.o
0000000000000000 <main_prog>:
; if (skb_shorter(skb, ETH_IPV4_TCP_SIZE))
0: 61 12 50 00 00 00 00 00 r2 = *(u32 *)(r1 + 80)
1: 61 14 4c 00 00 00 00 00 r4 = *(u32 *)(r1 + 76)
2: bf 43 00 00 00 00 00 00 r3 = r4
3: 07 03 00 00 36 00 00 00 r3 += 54
4: b7 01 00 00 00 00 00 00 r1 = 0
5: 2d 23 02 00 00 00 00 00 if r3 > r2 goto +2 <LBB0_2>
6: 07 04 00 00 0e 00 00 00 r4 += 14
; if (skb_shorter(skb, ETH_IPV4_TCP_SIZE))
7: bf 41 00 00 00 00 00 00 r1 = r4
0000000000000040 <LBB0_2>:
8: b4 00 00 00 ff ff ff ff w0 = -1
; if (!(ip = get_iphdr(skb)))
9: 2d 23 05 00 00 00 00 00 if r3 > r2 goto +5 <LBB0_6>
; proto = ip->protocol;
10: 71 12 09 00 00 00 00 00 r2 = *(u8 *)(r1 + 9)
; if (proto != IPPROTO_TCP)
11: 56 02 03 00 06 00 00 00 if w2 != 6 goto +3 <LBB0_6>
; if (tcp->dest != 0)
12: 69 12 16 00 00 00 00 00 r2 = *(u16 *)(r1 + 22)
13: 56 02 01 00 00 00 00 00 if w2 != 0 goto +1 <LBB0_6>
; return tcp->urg_ptr;
14: 69 10 26 00 00 00 00 00 r0 = *(u16 *)(r1 + 38)
0000000000000078 <LBB0_6>:
; }
15: 95 00 00 00 00 00 00 00 exit
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-3-alexei.starovoitov@gmail.com
Alexei Starovoitov [Wed, 11 Nov 2020 03:12:11 +0000 (19:12 -0800)]
bpf: Support for pointers beyond pkt_end.
This patch adds the verifier support to recognize inlined branch conditions.
The LLVM knows that the branch evaluates to the same value, but the verifier
couldn't track it. Hence causing valid programs to be rejected.
The potential LLVM workaround: https://reviews.llvm.org/D87428
can have undesired side effects, since LLVM doesn't know that
skb->data/data_end are being compared. LLVM has to introduce extra boolean
variable and use inline_asm trick to force easier for the verifier assembly.
Instead teach the verifier to recognize that
r1 = skb->data;
r1 += 10;
r2 = skb->data_end;
if (r1 > r2) {
here r1 points beyond packet_end and
subsequent
if (r1 > r2) // always evaluates to "true".
}
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-2-alexei.starovoitov@gmail.com
Guillaume Nault [Wed, 11 Nov 2020 15:05:35 +0000 (16:05 +0100)]
selftests: set conf.all.rp_filter=0 in bareudp.sh
When working on the rp_filter problem, I didn't realise that disabling
it on the network devices didn't cover all cases: rp_filter could also
be enabled globally in the namespace, in which case it would drop
packets, even if the net device has rp_filter=0.
Fixes:
1ccd58331f6f ("selftests: disable rp_filter when testing bareudp")
Fixes:
bbbc7aa45eef ("selftests: add test script for bareudp tunnels")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/f2d459346471f163b239aa9d63ce3e2ba9c62895.1605107012.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Nov 2020 23:55:25 +0000 (15:55 -0800)]
Merge branch 'mlxsw-spectrum-prepare-for-xm-implementation-prefix-insertion-and-removal'
Ido Schimmel says:
====================
mlxsw: spectrum: Prepare for XM implementation - prefix insertion and removal
Jiri says:
This is a preparation patchset for follow-up support of boards with
extended mezzanine (XM), which is going to allow extended (scale-wise)
router offload.
XM requires a separate PRM register named XMDR to be used instead of
RALUE to insert/update/remove FIB entries. Therefore, this patchset
extends the previously introduces low-level ops to be able to have
XM-specific FIB entry config implementation.
Currently the existing original RALUE implementation is moved to "basic"
low-level ops.
Unlike legacy router, insertion/update/removal of FIB entries into XM
could be done in bulks up to 4 items in a single PRM register write.
That is why this patchset implements "an op context", that allows the
future XM ops implementation to squash multiple FIB events to single
register write. For that, the way in which the FIB events are processed
by the work queue has to be changed.
The conversion from 1:1 FIB event - work callback call to event queue is
implemented in patch #3.
Patch #4 introduces "an op context" that will allow in future to squash
multiple FIB events into one XMDR register write. Patch #12 converts it
from stack to be allocated per instance.
Existing RALUE manipulations are pushed to ops in patch #10.
Patch #13 is introducing a possibility for low-level implementation to
have per FIB entry private memory.
The rest of the patches are either cosmetics or smaller preparations.
====================
Link: https://lore.kernel.org/r/20201110094900.1920158-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:49:00 +0000 (11:49 +0200)]
mlxsw: spectrum_router: Introduce FIB entry update op
Follow-up patchset introducing XMDR implementation is going to need
to distinguish write and update ops. Therefore introduce "update op"
and call "write op" only when new FIB entry is inserted.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>