David S. Miller [Wed, 24 May 2023 07:46:54 +0000 (08:46 +0100)]
Merge branch 'tools-ynl-byteorder'
Donald Hunter says:
====================
tools: ynl: Add byte-order support for struct members
This patchset adds support to ynl for handling byte-order in struct
members. The first patch is a refactor to use predefined Struct() objects
instead of generating byte-order specific formats on the fly. The second
patch adds byte-order handling for struct members.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Donald Hunter [Tue, 23 May 2023 09:37:48 +0000 (10:37 +0100)]
tools: ynl: Handle byte-order in struct members
Add support for byte-order in struct members in the genetlink-legacy
spec.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Donald Hunter [Tue, 23 May 2023 09:37:47 +0000 (10:37 +0100)]
tools: ynl: Use dict of predefined Structs to decode scalar types
Use a dict of predefined Struct() objects to decode scalar types in native,
big or little endian format. This removes the repetitive code for the
scalar variants and ensures all the signed variants are supported.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Mon, 22 May 2023 15:58:08 +0000 (16:58 +0100)]
net: phy: avoid kernel warning dump when stopping an errored PHY
When taking a network interface down (or removing a SFP module) after
the PHY has encountered an error, phy_stop() complains incorrectly
that it was called from HALTED state.
The reason this is incorrect is that the network driver will have
called phy_start() when the interface was brought up, and the fact
that the PHY has a problem bears no relationship to the administrative
state of the interface. Taking the interface administratively down
(which calls phy_stop()) is always the right thing to do after a
successful phy_start() call, whether or not the PHY has encountered
an error.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 24 May 2023 07:22:06 +0000 (08:22 +0100)]
Merge branch 'RTO_ONLINK'
Guillaume Nault says:
====================
ipv4: Remove RTO_ONLINK from udp, ping and raw sockets.
udp_sendmsg(), ping_v4_sendmsg() and raw_sendmsg() use similar patterns
for restricting their route lookup to on-link hosts. Although they use
slightly different code, they all use RTO_ONLINK to override the least
significant bit of their tos value.
RTO_ONLINK is used to restrict the route scope even when the scope is
set to RT_SCOPE_UNIVERSE. Therefore it isn't necessary: we can properly
set the scope to RT_SCOPE_LINK instead.
Removing RTO_ONLINK will allow to convert .flowi4_tos to dscp_t in the
future, thus allowing to properly separate the DSCP from the ECN bits
in the networking stack.
This patch series defines a common helper to figure out what's the
scope of the route lookup. This unifies the way udp, ping and raw
sockets get their routing scope and removes their dependency on
RTO_ONLINK.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Mon, 22 May 2023 14:38:07 +0000 (16:38 +0200)]
udp: Stop using RTO_ONLINK.
Use ip_sendmsg_scope() to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag. The objective is to eventually remove RTO_ONLINK, which will
allow converting .flowi4_tos to dscp_t.
Now that the scope is determined by ip_sendmsg_scope(), we need to
check its result to set the 'connected' variable.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Mon, 22 May 2023 14:38:02 +0000 (16:38 +0200)]
raw: Stop using RTO_ONLINK.
Use ip_sendmsg_scope() to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag. The objective is to eventually remove RTO_ONLINK, which will
allow converting .flowi4_tos to dscp_t.
The MSG_DONTROUTE and SOCK_LOCALROUTE cases were already handled by
raw_sendmsg() (SOCK_LOCALROUTE was handled by the RT_CONN_FLAGS*()
macros called by get_rtconn_flags()). However, opt.is_strictroute
wasn't taken into account. Therefore, a side effect of this patch is to
now honour opt.is_strictroute, and thus align raw_sendmsg() with
ping_v4_sendmsg() and udp_sendmsg().
Since raw_sendmsg() was the only user of get_rtconn_flags(), we can now
remove this function.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Mon, 22 May 2023 14:37:57 +0000 (16:37 +0200)]
ping: Stop using RTO_ONLINK.
Define a new helper to figure out the correct route scope to use on TX,
depending on socket configuration, ancillary data and send flags.
Use this new helper to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag.
The objective is to eventually remove RTO_ONLINK, which will allow
converting .flowi4_tos to dscp_t.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coco Li [Mon, 22 May 2023 20:15:52 +0000 (13:15 -0700)]
gve: Support IPv6 Big TCP on DQ
Add support for using IPv6 Big TCP on DQ which can handle large TSO/GRO
packets. See https://lwn.net/Articles/895398/. This can improve the
throughput and CPU usage.
Perf test result:
ip -d link show $DEV
gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000
For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate.
In single streams when line rate is not saturated, we expect throughput improvements.
When the networking is performing at line rate, we expect cpu usage improvements.
Tcp_stream (unidirectional stream test, T=thread, F=flow):
skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68
skb=64kb, T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67
skb=180kb, T=1, F=1, yes zerocopy: throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52
skb=64kb, T=1, F=1, yes zerocopy: throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25
skb=180kb, T=20, F=100, no zerocopy: throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4
skb=64kb, T=20, F=100, no zerocopy: throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69
skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7
skb=64kb, T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 May 2023 03:48:53 +0000 (20:48 -0700)]
Merge branch 'splice-net-replace-sendpage-with-sendmsg-msg_splice_pages-part-1'
David Howells says:
====================
splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1
Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES
internal sendmsg flag that is intended to replace the ->sendpage() op with
calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol
that it should splice the pages supplied if it can and copy them if not.
This will allow splice to pass multiple pages in a single call and allow
certain parts of higher protocols (e.g. sunrpc, iwarp) to pass an entire
message in one go rather than having to send them piecemeal. This should
also make it easier to handle the splicing of multipage folios.
A helper, skb_splice_from_iter() is provided to do the work of splicing or
copying data from an iterator. If a page is determined to be unspliceable
(such as being in the slab), then the helper will give an error.
Note that this facility is not made available to userspace and does not
provide any sort of callback.
This set consists of the following parts:
(1) Define the MSG_SPLICE_PAGES flag and prevent sys_sendmsg() from being
able to set it.
(2) Add an extra argument to skb_append_pagefrags() so that something
other than MAX_SKB_FRAGS can be used (sysctl_max_skb_frags for
example).
(3) Add the skb_splice_from_iter() helper to handle splicing pages into
skbuffs for MSG_SPLICE_PAGES that can be shared by TCP, IP/UDP and
AF_UNIX.
(4) Implement MSG_SPLICE_PAGES support in TCP.
(5) Make do_tcp_sendpages() just wrap sendmsg() and then fold it in to its
various callers.
(6) Implement MSG_SPLICE_PAGES support in IP and make udp_sendpage() just
a wrapper around sendmsg().
(7) Implement MSG_SPLICE_PAGES support in IP6/UDP6.
(8) Implement MSG_SPLICE_PAGES support in AF_UNIX.
(9) Make AF_UNIX copy unspliceable pages.
Link: https://lore.kernel.org/r/20230316152618.711970-1-dhowells@redhat.com/
Link: https://lore.kernel.org/r/20230329141354.516864-1-dhowells@redhat.com/
Link: https://lore.kernel.org/r/20230331160914.1608208-1-dhowells@redhat.com/
Link: https://lore.kernel.org/r/20230405165339.3468808-1-dhowells@redhat.com/
Link: https://lore.kernel.org/r/20230406094245.3633290-1-dhowells@redhat.com/
Link: https://lore.kernel.org/r/20230411160902.4134381-1-dhowells@redhat.com/
Link: https://lore.kernel.org/r/20230515093345.396978-1-dhowells@redhat.com/
Link: https://lore.kernel.org/r/20230518113453.1350757-1-dhowells@redhat.com/
====================
Link: https://lore.kernel.org/r/20230522121125.2595254-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:25 +0000 (13:11 +0100)]
unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES
Convert unix_stream_sendpage() to use sendmsg() with MSG_SPLICE_PAGES
rather than directly splicing in the pages itself.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:24 +0000 (13:11 +0100)]
af_unix: Support MSG_SPLICE_PAGES
Make AF_UNIX sendmsg() support MSG_SPLICE_PAGES, splicing in pages from the
source iterator if possible and copying the data in otherwise.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:23 +0000 (13:11 +0100)]
ip: Remove ip_append_page()
ip_append_page() is no longer used with the removal of udp_sendpage(), so
remove it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:22 +0000 (13:11 +0100)]
udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
Convert udp_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:21 +0000 (13:11 +0100)]
ip6, udp6: Support MSG_SPLICE_PAGES
Make IP6/UDP6 sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible, copying the data if not.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:20 +0000 (13:11 +0100)]
ip, udp: Support MSG_SPLICE_PAGES
Make IP/UDP sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:19 +0000 (13:11 +0100)]
tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
Fold do_tcp_sendpages() into its last remaining caller,
tcp_sendpage_locked().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:18 +0000 (13:11 +0100)]
siw: Inline do_tcp_sendpages()
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed. This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:17 +0000 (13:11 +0100)]
tls: Inline do_tcp_sendpages()
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed. This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:16 +0000 (13:11 +0100)]
espintcp: Inline do_tcp_sendpages()
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed. This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:15 +0000 (13:11 +0100)]
tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it. This is part of replacing ->sendpage() with a call to
sendmsg() with MSG_SPLICE_PAGES set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:14 +0000 (13:11 +0100)]
tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. do_tcp_sendpages() can then be
inlined in subsequent patches into its callers.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:13 +0000 (13:11 +0100)]
tcp: Support MSG_SPLICE_PAGES
Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced or copied (if it cannot be spliced) from the source iterator.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:12 +0000 (13:11 +0100)]
net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES
Add a function to handle MSG_SPLICE_PAGES being passed internally to
sendmsg(). Pages are spliced into the given socket buffer if possible and
copied in if not (e.g. they're slab pages or have a zero refcount).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:11 +0000 (13:11 +0100)]
net: Pass max frags into skb_append_pagefrags()
Pass the maximum number of fragments into skb_append_pagefrags() rather
than using MAX_SKB_FRAGS so that it can be used from code that wants to
specify sysctl_max_skb_frags.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Mon, 22 May 2023 12:11:10 +0000 (13:11 +0100)]
net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
network protocol that it should splice pages from the source iterator
rather than copying the data if it can. This flag is added to a list that
is cleared by sendmsg syscalls on entry.
This is intended as a replacement for the ->sendpage() op, allowing a way
to splice in several multipage folios in one go.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jaco Coetzee [Mon, 22 May 2023 14:13:35 +0000 (16:13 +0200)]
nfp: add L4 RSS hashing on UDP traffic
Add layer 4 RSS hashing on UDP traffic to allow for the
utilization of multiple queues for multiple connections on
the same IP address.
Previously, since the introduction of the driver, RSS hashing
was only performed on the source and destination IP addresses
of UDP packets thereby limiting UDP traffic to a single queue
for multiple connections on the same IP address. The transport
layer is now included in RSS hashing for UDP traffic, which
was not previously the case. The reason behind the previous
limitation is unclear - either a historic limitation of the
NFP device, or an oversight.
Signed-off-by: Jaco Coetzee <jaco.coetzee@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20230522141335.22536-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Josua Mayer [Mon, 22 May 2023 14:52:42 +0000 (17:52 +0300)]
net: sfp: add support for HXSX-ATRI-1 copper SFP+ module
Walsun offers commercial ("C") and industrial ("I") variants of
multi-rate copper SFP+ modules.
Add quirk for HXSX-ATRI-1 using same parameters as the already supported
commercial variant HXSX-ATRC-1.
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230522145242.30192-2-josua@solid-run.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ratheesh Kannoth [Mon, 22 May 2023 02:04:04 +0000 (07:34 +0530)]
octeontx2-pf: Add support for page pool
Page pool for each rx queue enhance rx side performance
by reclaiming buffers back to each queue specific pool. DMA
mapping is done only for first allocation of buffers.
As subsequent buffers allocation avoid DMA mapping,
it results in performance improvement.
Image | Performance
------------ | ------------
Vannila | 3Mpps
|
with this | 42Mpps
change |
---------------------------
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://lore.kernel.org/r/20230522020404.152020-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 23 May 2023 02:12:31 +0000 (19:12 -0700)]
Merge tag 'mlx5-updates-2023-05-19' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-05-19
mlx5 misc changes and code clean up:
The following series contains general changes for improving
E-Switch driver behavior.
1) improving condition checking
2) Code clean up
3) Using metadata matching on send-to-vport rules.
4) Using RoCE v2 instead of v1 for loopback rules.
* tag 'mlx5-updates-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: E-Switch, Initialize E-Switch for eswitch manager
net/mlx5: devlink, Only show PF related devlink warning when needed
net/mlx5: E-Switch, Use metadata matching for RoCE loopback rule
net/mlx5: E-Switch, Use RoCE version 2 for loopback traffic
net/mlx5e: E-Switch, Add a check that log_max_l2_table is valid
net/mlx5e: E-Switch: move debug print of adding mac to correct place
net/mlx5e: E-Switch, Check device is PF when stopping esw offloads
net/mlx5: Remove redundant vport_group_manager cap check
net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules
net/mlx5e: E-Switch, Allow get vport api if esw exists
net/mlx5e: E-Switch, Update when to set other vport context
net/mlx5e: Remove redundant __func__ arg from fs_err() calls
net/mlx5e: E-Switch, Remove flow_source check for metadata matching
net/mlx5: E-Switch, Remove redundant check
net/mlx5: Remove redundant esw multiport validate function
====================
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230519175557.15683-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Sat, 20 May 2023 10:41:42 +0000 (11:41 +0100)]
net: phylink: require supported_interfaces to be filled
We have been requiring the supported_interfaces bitmap to be filled in
by MAC drivers that have a mac_select_pcs() method. Now that all MAC
drivers fill in the supported_interfaces bitmap, it is time to enforce
this. We have already required supported_interfaces to be set in order
for optical SFPs to be configured in commit
f81fa96d8a6c ("net: phylink:
use phy_interface_t bitmaps for optical modules").
Refuse phylink creation if supported_interfaces is empty, and remove
code to deal with cases where this mask is empty.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1q0K1u-006EIP-ET@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Sat, 20 May 2023 10:18:30 +0000 (11:18 +0100)]
net: sfp: add support for a couple of copper multi-rate modules
Add support for the Fiberstore SFP-10G-T and Walsun HXSX-ATRC-1
modules. Internally, the PCB silkscreen has what seems to be a part
number of WT_502. Fiberstore use v2.2 whereas Walsun use v2.6.
These modules contain a Marvell AQrate AQR113C PHY, accessible through
I2C 0x51 using the "rollball" protocol. In both cases, the PHY is
programmed to use 10GBASE-R with pause-mode rate adaption.
Unlike the other rollball modules, these only need a four second delay
before we can talk to the PHY.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1q0JfS-006Dqc-8t@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Mon, 22 May 2023 11:44:44 +0000 (12:44 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: allow matching on meta data
Michal Swiatkowski says:
This patchset is intended to improve the usability of the switchdev
slow path. Without matching on a meta data values slow path works
based on VF's MAC addresses. It causes a problem when the VF wants
to use more than one MAC address (e.g. when it is in trusted mode).
Parse all meta data in the same place where protocol type fields are
parsed. Add description for the currently implemented meta data. It is
important to note that depending on DDP not all described meta data can
be available. Using not available meta data leads to error returned by
function which is looking for correct words in profiles read from DDP.
There is also one small improvement, remove of rx field in rule info
structure (patch 2). It is redundant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Sat, 20 May 2023 17:21:04 +0000 (19:21 +0200)]
nfc: Switch i2c drivers back to use .probe()
After commit
b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers being converted to .probe_new() and then
03c835f498b5 ("i2c: Switch .probe() to not take an id parameter")
convert back to (the new) .probe() to be able to eventually drop
.probe_new() from struct i2c_driver.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Fri, 19 May 2023 13:30:36 +0000 (14:30 +0100)]
net/tcp: refactor tcp_inet6_sk()
Don't keep hand coded offset caluclations and replace it with
container_of(). It should be type safer and a bit less confusing.
It also makes it with a macro instead of inline function to preserve
constness, which was previously casted out like in case of
tcp_v6_send_synack().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 19 May 2023 13:03:59 +0000 (14:03 +0100)]
net: phy: add helpers for comparing phy IDs
There are several places which open code comparing PHY IDs. Provide a
couple of helpers to assist with this, using a slightly simpler test
than the original:
- phy_id_compare() compares two arbitary PHY IDs and a mask of the
significant bits in the ID.
- phydev_id_compare() compares the bound phydev with the specified
PHY ID, using the bound driver's mask.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Fri, 19 May 2023 10:33:04 +0000 (11:33 +0100)]
net: altera: tse: remove mac_an_restart() function
The mac_an_restart() method will only be called if the driver sets
legacy_pre_march2020, which the altera tse driver does not do.
Therefore, providing a stub is unnecessary.
Fixes:
fef2998203e1 ("net: altera: tse: convert to phylink")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 19 May 2023 09:32:38 +0000 (11:32 +0200)]
net: ipconfig: move ic_nameservers_fallback into #ifdef block
The new variable is only used when IPCONFIG_BOOTP is defined and otherwise
causes a warning:
net/ipv4/ipconfig.c:177:12: error: 'ic_nameservers_fallback' defined but not used [-Werror=unused-variable]
Move it next to the user.
Fixes:
81ac2722fa19 ("net: ipconfig: Allow DNS to be overwritten by DHCPACK")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Fang [Fri, 19 May 2023 02:01:13 +0000 (10:01 +0800)]
net: fec: remove useless fec_enet_reset_skb()
This patch is a cleanup for fec driver. The fec_enet_reset_skb()
is used to free skb buffers for tx queues and is only invoked in
fec_restart(). However, fec_enet_bd_init() also resets skb buffers
and is invoked in fec_restart() too. So fec_enet_reset_skb() is
redundant and useless.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Fang [Fri, 19 May 2023 01:48:25 +0000 (09:48 +0800)]
net: fec: turn on XDP features
The XDP features are supported since the commit
66c0e13ad236
("drivers: net: turn on XDP features"). Currently, the fec
driver supports NETDEV_XDP_ACT_BASIC, NETDEV_XDP_ACT_REDIRECT
and NETDEV_XDP_ACT_NDO_XMIT. So turn on these XDP features
for fec driver.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 20 May 2023 04:52:18 +0000 (21:52 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-05-18 (igc, igb, e1000e)
This series contains updates to igc, igb, and e1000e drivers.
Kurt Kanzenbach adds calls to txq_trans_cond_update() for XDP transmit
on igc.
Tom Rix makes definition of igb_pm_ops conditional on CONFIG_PM for igb.
Baozhu Ni adds a missing kdoc description on e1000e.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
e1000e: Add @adapter description to kdoc
igb: Define igb_pm_ops conditionally on CONFIG_PM
igc: Avoid transmit queue timeout for XDP
====================
Link: https://lore.kernel.org/r/20230518170942.418109-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Roi Dayan [Sun, 23 Apr 2023 10:39:57 +0000 (13:39 +0300)]
net/mlx5e: E-Switch, Initialize E-Switch for eswitch manager
Initialize eswitch instance for a function which is eswitch manager
but not a vport group manager.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Tue, 2 May 2023 09:44:47 +0000 (12:44 +0300)]
net/mlx5: devlink, Only show PF related devlink warning when needed
Limit the PF related warning to show if device is actually a PF.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Mon, 24 Apr 2023 09:15:54 +0000 (12:15 +0300)]
net/mlx5: E-Switch, Use metadata matching for RoCE loopback rule
Use metadata matching for RoCE loopback rule if device is configured
to use metadata for source port matching.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Mon, 24 Apr 2023 09:11:13 +0000 (12:11 +0300)]
net/mlx5: E-Switch, Use RoCE version 2 for loopback traffic
Could be port initializing eswitch doesn't support RoCE version 1
but all ports should support RoCE version 2.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 20 Apr 2023 09:19:25 +0000 (12:19 +0300)]
net/mlx5e: E-Switch, Add a check that log_max_l2_table is valid
If log_max_l2_table is 0 there is no really room for one L2 address.
and should be treated as not supported.
Do the check in MPFS init and for vport context events which
both used to update L2 address.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Sun, 23 Apr 2023 12:24:00 +0000 (15:24 +0300)]
net/mlx5e: E-Switch: move debug print of adding mac to correct place
Move the debug print inside the if clause that actually does the change.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Sun, 23 Apr 2023 07:53:18 +0000 (10:53 +0300)]
net/mlx5e: E-Switch, Check device is PF when stopping esw offloads
Checking sriov is done on the pci device so it can return true
on other devices like SF but nothing should be done in this case.
Add a check that the device is PF.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 20 Apr 2023 09:14:41 +0000 (12:14 +0300)]
net/mlx5: Remove redundant vport_group_manager cap check
It's enough to check for esw_manager cap for get the esw flow table
caps.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Sun, 2 Apr 2023 10:59:08 +0000 (13:59 +0300)]
net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules
Like other rules use metadata matching if supported instead of
source_port.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Sun, 2 Apr 2023 11:19:54 +0000 (14:19 +0300)]
net/mlx5e: E-Switch, Allow get vport api if esw exists
We could have an esw manager device which is not a vport group manager.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Sun, 2 Apr 2023 11:13:25 +0000 (14:13 +0300)]
net/mlx5e: E-Switch, Update when to set other vport context
Other vport context should be set if vport number is not 0.
In case of ECPF, vport 0 represents the host PF representor so also
need to set other vport context.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 30 Mar 2023 09:25:25 +0000 (12:25 +0300)]
net/mlx5e: Remove redundant __func__ arg from fs_err() calls
fs_err() already logs the function name. remote the arg so the function
name will not be logged twice.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Tue, 31 Jan 2023 10:06:45 +0000 (12:06 +0200)]
net/mlx5e: E-Switch, Remove flow_source check for metadata matching
There is no reason to check for flow_source cap to allow metadata
matching. When flow_source match is being used the flow_source cap
is being checked.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 23 Mar 2023 08:01:38 +0000 (10:01 +0200)]
net/mlx5: E-Switch, Remove redundant check
The call to mlx5_eswitch_enable() also does the same check
and if E-Switch not supported it returns 0 without any change.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Wed, 22 Mar 2023 11:27:43 +0000 (13:27 +0200)]
net/mlx5: Remove redundant esw multiport validate function
The function didn't validate the value and doesn't require value
validation as it will always be valid true or false values.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Michal Swiatkowski [Fri, 7 Apr 2023 16:52:19 +0000 (18:52 +0200)]
ice: use src VSI instead of src MAC in slow-path
The use of a source MAC to direct packets from the VF to the corresponding
port representor is only ok if there is only one MAC on a VF. To support
this functionality when the number of MACs on a VF is greater, it is
necessary to match a source VSI instead of a source MAC.
Let's use the new switch API that allows matching on metadata.
If MAC isn't used in match criteria there is no need to handle adding
rule after virtchnl command. Instead add new rule while port representor
is being configured.
Remove rule_added field, checking for sp_rule can be used instead.
Remove also checking for switchdev running in deleting rule as it can be
called from unroll context when running flag isn't set. Checking for
sp_rule covers both context (with and without running flag).
Rules are added in eswitch configuration flow, so there is no need to
have replay function.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Swiatkowski [Fri, 7 Apr 2023 16:52:18 +0000 (18:52 +0200)]
ice: allow matching on meta data
Add meta data matching criteria in the same place as protocol matching
criteria. There is no need to add meta data as special words after
parsing all lookups. Trade meta data in the same why as other lookups.
The one difference between meta data lookups and protocol lookups is
that meta data doesn't impact how the packets looks like. Because of that
ignore it when filling testing packet.
Match on tunnel type meta data always if tunnel type is different than
TNL_LAST.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Swiatkowski [Fri, 7 Apr 2023 16:52:17 +0000 (18:52 +0200)]
ice: specify field names in ice_prot_ext init
Anonymous initializers are now discouraged. Define ICE_PROTCOL_ENTRY
macro to rewrite anonymous initializers to named one. No functional
changes here.
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Swiatkowski [Fri, 7 Apr 2023 16:52:16 +0000 (18:52 +0200)]
ice: remove redundant Rx field from rule info
Information about the direction is currently stored in sw_act.flag.
There is no need to duplicate it in another field.
Setting direction flag doesn't mean that there is a match criteria for
direction in rule. It is only a information for HW from where switch id
should be collected (VSI or port). In current implementation of advance
rule handling, without matching for direction meta data, we can always
set one the same flag and everything will work the same.
Ability to match on direction meta data will be added in follow up
patches.
Recipe 0, 3 and 9 loaded from package has direction match
criteria, but they are handled in other function.
Move ice_adv_rule_info fields to avoid holes.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Michal Swiatkowski [Fri, 7 Apr 2023 16:52:15 +0000 (18:52 +0200)]
ice: define meta data to match in switch
Add description for each meta data. Redefine tunnel mask to match only
tunneled MAC and tunneled VLAN. It shouldn't try to match other flags
(previously it was 0xff, it is redundant).
VLAN mask was 0xd000, change it to 0xf000. 4 last bits are flags
depending on the same field in packets (VLAN tag). Because of that,
It isn't harmful to match also on ITAG.
Group all MDID and MDID offsets into enums to keep things organized.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Uwe Kleine-König [Thu, 18 May 2023 20:30:49 +0000 (22:30 +0200)]
net: arc: Make arc_emac_remove() return void
The function returns zero unconditionally. Change it to return void instead
which simplifies its callers as error handing becomes unnecessary.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 19 May 2023 03:06:34 +0000 (20:06 -0700)]
Merge branch 'mptcp-refactor-inet_accept-and-mib-updates'
Mat Martineau says:
====================
mptcp: Refactor inet_accept() and MIB updates
Patches 1 and 2 refactor inet_accept() to provide a new __inet_accept()
that's usable with locked sockets, and then make use of that helper to
simplify mptcp_stream_accept().
Patches 3 and 4 add some new MIBS related to MPTCP address advertisement
and update related selftest scripts.
Patch 5 modifies the selftests to ensure MIBS are only printed once when
a test case fails.
====================
Link: https://lore.kernel.org/r/20230516-send-net-next-20220516-v1-0-e91822b7b6e0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 17 May 2023 19:16:18 +0000 (12:16 -0700)]
selftests: mptcp: centralize stats dumping
If a test case fails, the mptcp_join.sh script can dump the
netns MIBs multiple times, leading to confusing output.
Let's dump such info only once per test-case, when needed.
This additionally allow removing some code duplication.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 17 May 2023 19:16:17 +0000 (12:16 -0700)]
selftests: mptcp: add explicit check for new mibs
Instead of duplicating the all existing TX check with
the TX side, add the new ones on selected test cases.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 17 May 2023 19:16:16 +0000 (12:16 -0700)]
mptcp: introduces more address related mibs
Currently we don't track explicitly a few events related to address
management suboption handling; this patch adds new mibs for ADD_ADDR
and RM_ADDR options tx and for missed tx events due to internal storage
exhaustion.
The self-tests must be updated to properly handle different mibs with
the same/shared prefix.
Additionally removes a couple of warning tracking the loss event.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/378
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 17 May 2023 19:16:15 +0000 (12:16 -0700)]
mptcp: refactor mptcp_stream_accept()
Rewrite the mptcp socket accept op, leveraging the new
__inet_accept() helper.
This way we can avoid acquiring the new socket lock twice
and we can avoid a couple of indirect calls.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 17 May 2023 19:16:14 +0000 (12:16 -0700)]
inet: factor out locked section of inet_accept() in a new helper
No functional changes intended. The new helper will be used
by the MPTCP protocol in the next patch to avoid duplicating
a few LoC.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 19 May 2023 03:04:59 +0000 (20:04 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-05-17 (ice, MAINTAINERS)
This series contains updates to ice driver and MAINTAINERS file.
Paul refactors PHY to link mode reporting and updates some PHY types to
report more accurate link modes for ice.
Dave removes mutual exclusion policy between LAG and SR-IOV in ice
driver.
Jesse updates link for Intel Wired LAN in the MAINTAINERS file.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
MAINTAINERS: update Intel Ethernet links
ice: Remove LAG+SRIOV mutual exclusion
ice: update PHY type to ethtool link mode mapping
ice: refactor PHY type to ethtool link mode
ice: update ICE_PHY_TYPE_HIGH_MAX_INDEX
====================
Link: https://lore.kernel.org/r/20230517165530.3179965-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 19 May 2023 02:52:35 +0000 (19:52 -0700)]
Merge branch 'net-sfp-add-support-for-control-of-rate-selection'
Russell King says:
====================
net: sfp: add support for control of rate selection
This series introduces control of the rate selection SFP pins (or
their soft state in the I2C diagnostics EEPROM). Several SNIA documents
(referenced in the commits) describe the various different modes for
these, and we implement them all for maximum compatibility, but as
we know, SFP modules tend to do their own thing, so that may not be
sufficient.
In order to implement this, we need to change the locking arrangement
in the SFP layer - we need to make st_mutex (state mutex) able to be
taken from within the rtnl lock and sm_mutex (state machine mutex).
Essentially, st_mutex protects the hard (gpio) and soft state signals.
So, patches 2 through 5 rejig the locking so that st_mutex is only
ever taken when we want to fiddle with the signal state variables,
read or write the GPIOs, or read or write the soft state.
Patch 1 adds a helper that makes the locking rejig a little easier
as it combines the update of sfp->state with setting the updated
control state to the module.
Patch 6 adds code to phylink to give the signalling rate for various
PHY interface modes that are relevant to SFPs - this is the baud rate
of the encoded signal, not the data rate, which is what matters for
SFPs. This rate is passed through the SFP bus layer into the SFP
socket driver, which initially has a stub sfp_set_signal_rate().
Patch 7 adds the code to the SFP socket driver to parse the rate
selection data in the EEPROM, configure which RS signals need to be
driven, and the signalling rate threshold. We fill in
sfp_set_signal_rate() to set the rate select pins as appropriate.
====================
Link: https://lore.kernel.org/r/ZGSuTY8GqjM+sqta@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 17 May 2023 10:38:17 +0000 (11:38 +0100)]
net: sfp: add support for rate selection
Add support for parsing the rate select thresholds and switching of the
RS0 and RS1 signals to the transceiver. This is complicated by various
revisions of SFF-8472 and interaction of SFF-8431, SFF-8079 and
INF-8074.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 17 May 2023 10:38:12 +0000 (11:38 +0100)]
net: sfp: add support for setting signalling rate
Add support to the SFP layer to allow phylink to set the signalling
rate for a SFP module. The rate given will be in units of kilo-baud
(1000 baud).
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 17 May 2023 10:38:07 +0000 (11:38 +0100)]
net: sfp: change st_mutex locking
Change st_mutex's use within SFP such that it only protects the various
state members, as it was originally supposed to, and isn't held while
making various calls outside the driver.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 17 May 2023 10:38:02 +0000 (11:38 +0100)]
net: sfp: move sm_mutex into sfp_check_state()
Provide an unlocked version of sfp_sm_event() which can be used by
sfp_check_state() to avoid having to keep re-taking the lock if
several signals have changed state.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 17 May 2023 10:37:57 +0000 (11:37 +0100)]
net: sfp: swap order of rtnl and st_mutex locks
Swap the order of the rtnl and st_mutex locks - st_mutex is now nested
beneath rtnl lock instead of rtnl being beneath st_mutex. This will
allow us to hold st_mutex only while manipulating the module's hardware
or software control state.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 17 May 2023 10:37:52 +0000 (11:37 +0100)]
net: sfp: move rtnl lock to cover reading state
As preparation to moving st_mutex inside rtnl_lock, we need to first
move the rtnl lock to cover reading the state.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 17 May 2023 10:37:47 +0000 (11:37 +0100)]
net: sfp: add helper to modify signal states
There are a couple of locations in the code where we modify
sfp->state, and then call sfp_set_state(, sfp->state) to update
the outputs/soft state to control the module. Provide a helper
which takes a mask and new state so that this is encapsulated in
one location.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 18 May 2023 21:39:34 +0000 (14:39 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Conflicts:
drivers/net/ethernet/freescale/fec_main.c
6ead9c98cafc ("net: fec: remove the xdp_return_frame when lack of tx BDs")
144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 18 May 2023 21:05:48 +0000 (14:05 -0700)]
Merge tag 'nf-next-2023-05-18' of https://git./linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
Netfilter updates for net-next
nftables updates:
1. Allow key existence checks with maps.
At the moment the kernel requires userspace to pass a destination
register for the associated value, make this optional so userspace
can query if the key exists, just like with normal sets.
2. nftables maintains a counter per set that holds the number of
elements. This counter gets decremented on element removal,
but its only incremented if the set has a upper maximum value.
Increment unconditionally, this will allow us to update the
maximum value later on.
3. At DCCP option maching, from Jeremy Sowden.
4. use struct_size macro, from Christophe JAILLET.
Conntrack:
5. Squash holes in struct nf_conntrack_expect, also Christophe JAILLET.
6. Allow clash resolution for GRE Protocol to avoid a packet drop,
from Faicker Mo.
Flowtable:
Simplify route logic and split large functions into smaller
chunks, from Pablo Neira Ayuso.
* tag 'nf-next-2023-05-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: flowtable: split IPv6 datapath in helper functions
netfilter: flowtable: split IPv4 datapath in helper functions
netfilter: flowtable: simplify route logic
netfilter: conntrack: allow insertion clash of gre protocol
netfilter: nft_set_pipapo: Use struct_size()
netfilter: Reorder fields in 'struct nf_conntrack_expect'
netfilter: nft_exthdr: add boolean DCCP option matching
netfilter: nf_tables: always increment set element count
netfilter: nf_tables: relax set/map validation checks
====================
Link: https://lore.kernel.org/r/20230518100759.84858-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Baozhu Ni [Wed, 17 May 2023 01:27:26 +0000 (09:27 +0800)]
e1000e: Add @adapter description to kdoc
Provide a description for the kernel doc of the @adapter
of e1000e_trigger_lsc()
Signed-off-by: Baozhu Ni <nibaozhu@yeah.net>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tom Rix [Fri, 28 Apr 2023 20:00:09 +0000 (16:00 -0400)]
igb: Define igb_pm_ops conditionally on CONFIG_PM
For s390, gcc with W=1 reports
drivers/net/ethernet/intel/igb/igb_main.c:186:32: error:
'igb_pm_ops' defined but not used [-Werror=unused-const-variable=]
186 | static const struct dev_pm_ops igb_pm_ops = {
| ^~~~~~~~~~
The only use of igb_pm_ops is conditional on CONFIG_PM.
The definition of igb_pm_ops should also be conditional on CONFIG_PM
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Kurt Kanzenbach [Wed, 12 Apr 2023 07:36:11 +0000 (09:36 +0200)]
igc: Avoid transmit queue timeout for XDP
High XDP load triggers the netdev watchdog:
|NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
The reason is the Tx queue transmission start (txq->trans_start) is not updated
in XDP code path. Therefore, add it for all XDP transmission functions.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Linus Torvalds [Thu, 18 May 2023 15:52:14 +0000 (08:52 -0700)]
Merge tag 'net-6.4-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm, bluetooth and netfilter.
Current release - regressions:
- ipv6: fix RCU splat in ipv6_route_seq_show()
- wifi: iwlwifi: disable RFI feature
Previous releases - regressions:
- tcp: fix possible sk_priority leak in tcp_v4_send_reset()
- tipc: do not update mtu if msg_max is too small in mtu negotiation
- netfilter: fix null deref on element insertion
- devlink: change per-devlink netdev notifier to static one
- phylink: fix ksettings_set() ethtool call
- wifi: mac80211: fortify the spinlock against deadlock by interrupt
- wifi: brcmfmac: check for probe() id argument being NULL
- eth: ice:
- fix undersized tx_flags variable
- fix ice VF reset during iavf initialization
- eth: hns3: fix sending pfc frames after reset issue
Previous releases - always broken:
- xfrm: release all offloaded policy memory
- nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
- vsock: avoid to close connected socket after the timeout
- dsa: rzn1-a5psw: enable management frames for CPU port
- eth: virtio_net: fix error unwinding of XDP initialization
- eth: tun: fix memory leak for detached NAPI queue.
Misc:
- MAINTAINERS: sctp: move Neil to CREDITS"
* tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
MAINTAINERS: skip CCing netdev for Bluetooth patches
mdio_bus: unhide mdio_bus_init prototype
bridge: always declare tunnel functions
atm: hide unused procfs functions
net: isa: include net/Space.h
Revert "ARM: dts: stm32: add CAN support on stm32f746"
netfilter: nft_set_rbtree: fix null deref on element insertion
netfilter: nf_tables: fix nft_trans type confusion
netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
net: wwan: t7xx: Ensure init is completed before system sleep
net: selftests: Fix optstring
net: pcs: xpcs: fix C73 AN not getting enabled
net: wwan: iosm: fix NULL pointer dereference when removing device
vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
mailmap: add entries for Nikolay Aleksandrov
igb: fix bit_shift to be in [1..8] range
net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset
cassini: Fix a memory leak in the error handling path of cas_init_one()
tun: Fix memory leak for detached NAPI queue.
can: kvaser_pciefd: Disable interrupts in probe error path
...
Linus Torvalds [Thu, 18 May 2023 15:42:23 +0000 (08:42 -0700)]
Merge tag 'media/v6.4-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Several fixes for the dvb core and drivers:
- fix UAF and null pointer de-reference in DVB core
- fix kernel runtime warning for blocking operation in wait_event*()
in dvb core
- fix write size bug in DVB conditional access core
- fix dvb demux continuity counter debug check logic
- randconfig build fixes in pvrusb2 and mn88443x
- fix memory leak in ttusb-dec
- fix netup_unidvb probe-time error check logic
- improve error handling in dw2102 if it can't retrieve DVB MAC
address"
* tag 'media/v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: dvb-core: Fix use-after-free due to race condition at dvb_ca_en50221
media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
media: dvb-core: Fix use-after-free due to race at dvb_register_device()
media: dvb-core: Fix use-after-free due on race condition at dvb_net
media: dvb-core: Fix use-after-free on race condition at dvb_frontend
media: mn88443x: fix !CONFIG_OF error by drop of_match_ptr from ID table
media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
media: dvb_ca_en50221: fix a size write bug
media: netup_unidvb: fix irq init by register it at the end of probe
media: dvb-usb: dw2102: fix uninit-value in su3000_read_mac_address
media: dvb-usb: digitv: fix null-ptr-deref in digitv_i2c_xfer()
media: dvb-usb-v2: rtl28xxu: fix null-ptr-deref in rtl28xxu_i2c_xfer
media: dvb-usb-v2: ce6230: fix null-ptr-deref in ce6230_i2c_master_xfer()
media: dvb-usb-v2: ec168: fix null-ptr-deref in ec168_i2c_xfer()
media: dvb-usb: az6027: fix three null-ptr-deref in az6027_i2c_xfer()
media: netup_unidvb: fix use-after-free at del_timer()
media: dvb_demux: fix a bug for the continuity counter
media: pvrusb2: fix DVB_CORE dependency
Paolo Abeni [Thu, 18 May 2023 13:32:12 +0000 (15:32 +0200)]
Merge branch 'net-lan966x-add-support-for-pcp-dei-dscp'
Horatiu Vultur says:
====================
net: lan966x: Add support for PCP, DEI, DSCP
This patch series extends lan966x to offload to the hardware the
following features:
- PCP: this configuration is per port both at ingress and egress.
- App trust: which allows to specify a trust order of app selectors.
This can be PCP or DSCP or DSCP/PCP.
- default priority
- DSCP: this configuration is shared between the ports both at ingress
and egress.
====================
Link: https://lore.kernel.org/r/20230516201408.3172428-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Tue, 16 May 2023 20:14:08 +0000 (22:14 +0200)]
net: lan966x: Add support for DSCP rewrite
Add support for DSCP rewrite in lan966x driver. On egress DSCP is
rewritten from either classified DSCP, or frame DSCP. Classified DSCP is
determined by the Analyzer Classifier on ingress, and is mapped from
classified QoS class and DP level. Classification of DSCP is by default
enabled for all ports.
It is required that DSCP is trusted for the egress port *and* rewrite
table is not empty, in order to rewrite DSCP based on classified DSCP,
otherwise DSCP is always rewritten from frame DSCP.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Tue, 16 May 2023 20:14:07 +0000 (22:14 +0200)]
net: lan966x: Add support for PCP rewrite
Add support for rewrite of PCP and DEI value, based on QoS and DP level.
The DCB rewrite table is queried for mappings between priority and
PCP/DEI. The classified DP level is then encoded in the DEI bit, if a
mapping for DEI exists.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Tue, 16 May 2023 20:14:06 +0000 (22:14 +0200)]
net: lan966x: Add support for offloading default prio
Add support for offloading default prio.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Tue, 16 May 2023 20:14:05 +0000 (22:14 +0200)]
net: lan966x: Add support for offloading dscp table
Add support for offloading dscp app entries. The dscp values are global
for all lan966x ports.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Tue, 16 May 2023 20:14:04 +0000 (22:14 +0200)]
net: lan966x: Add support for apptrust
Make use of set/getapptrust() to implement per-selector trust
and trust order.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Tue, 16 May 2023 20:14:03 +0000 (22:14 +0200)]
net: lan966x: Add support for offloading pcp table
Add support for offloading pcp app entries. Lan966x has 8 priority
queues per port and for each priority it also has a drop precedence.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Tue, 16 May 2023 20:14:02 +0000 (22:14 +0200)]
net: lan966x: Add registers to configure PCP, DEI, DSCP
Add the registers that are needed to configure the PCP, DEI and DSCP
of the switch both at ingress and also at egress.
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 18 May 2023 09:06:28 +0000 (11:06 +0200)]
Merge tag 'linux-can-fixes-for-6.4-
20230518' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2023-05-18
this is a pull request of 7 patches for net/master.
The first 6 patches are by Jimmy Assarsson and fix several bugs in the
kvaser_pciefd driver.
The latest patch is from me and reverts a change in stm32f746.dtsi
that causes build errors due to a missing dependent patch.
* tag 'linux-can-fixes-for-6.4-
20230518' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
Revert "ARM: dts: stm32: add CAN support on stm32f746"
can: kvaser_pciefd: Disable interrupts in probe error path
can: kvaser_pciefd: Do not send EFLUSH command on TFD interrupt
can: kvaser_pciefd: Empty SRB buffer in probe
can: kvaser_pciefd: Call request_irq() before enabling interrupts
can: kvaser_pciefd: Clear listen-only bit if not explicitly requested
can: kvaser_pciefd: Set CAN_STATE_STOPPED in kvaser_pciefd_stop()
====================
Link: https://lore.kernel.org/r/20230518073241.1110453-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pablo Neira Ayuso [Thu, 11 May 2023 05:35:35 +0000 (07:35 +0200)]
netfilter: flowtable: split IPv6 datapath in helper functions
Add context structure and helper functions to look up for a matching
IPv6 entry in the flowtable and to forward packets.
No functional changes are intended.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Pablo Neira Ayuso [Thu, 11 May 2023 05:35:34 +0000 (07:35 +0200)]
netfilter: flowtable: split IPv4 datapath in helper functions
Add context structure and helper functions to look up for a matching
IPv4 entry in the flowtable and to forward packets.
No functional changes are intended.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Pablo Neira Ayuso [Thu, 11 May 2023 05:35:33 +0000 (07:35 +0200)]
netfilter: flowtable: simplify route logic
Grab reference to dst from skbuff earlier to simplify route caching.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Faicker Mo [Sun, 23 Apr 2023 02:29:57 +0000 (10:29 +0800)]
netfilter: conntrack: allow insertion clash of gre protocol
NVGRE tunnel is used in the VM-to-VM communications. The VM packets
are encapsulated in NVGRE and sent from the host. For NVGRE
there are two tuples(outer sip and outer dip) in the host conntrack item.
Insertion clashes are more likely to happen if the concurrent connections
are sent from the VM.
Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
Christophe JAILLET [Fri, 5 May 2023 21:26:34 +0000 (23:26 +0200)]
netfilter: nft_set_pipapo: Use struct_size()
Use struct_size() instead of hand writing it.
This is less verbose and more informative.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Christophe JAILLET [Mon, 8 May 2023 16:53:14 +0000 (18:53 +0200)]
netfilter: Reorder fields in 'struct nf_conntrack_expect'
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nf_conntrack_expect' from 264
to 256 bytes.
This structure deserve a dedicated cache, so reducing its size looks nice.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Jeremy Sowden [Tue, 9 May 2023 21:19:45 +0000 (22:19 +0100)]
netfilter: nft_exthdr: add boolean DCCP option matching
The xt_dccp iptables module supports the matching of DCCP packets based
on the presence or absence of DCCP options. Extend nft_exthdr to add
this functionality to nftables.
Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>