review.tizen.org Git - platform/kernel/linux-starfive.git/log

net: enetc: increase RX ring default size

As explained in the XDP_TX patch, when receiving a burst of frames with
the XDP_TX verdict, there is a momentary dip in the number of available
RX buffers. The system will eventually recover as TX completions will
start kicking in and refilling our RX BD ring again. But until that
happens, we need to survive with as few out-of-buffer discards as
possible.

This increases the memory footprint of the driver in order to avoid
discards at 2.5Gbps line rate 64B packet sizes, the maximum speed
available for testing on 1 port on NXP LS1028A.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: add support for XDP_TX

For reflecting packets back into the interface they came from, we create
an array of TX software BDs derived from the RX software BDs. Therefore,
we need to extend the TX software BD structure to contain most of the
stuff that's already present in the RX software BD structure, for
reasons that will become evident in a moment.

For a frame with the XDP_TX verdict, we don't reuse any buffer right
away as we do for XDP_DROP (the same page half) or XDP_PASS (the other
page half, same as the skb code path).

Because the buffer transfers ownership from the RX ring to the TX ring,
reusing any page half right away is very dangerous. So what we can do is
we can recycle the same page half as soon as TX is complete.

The code path is:
enetc_poll
-> enetc_clean_rx_ring_xdp
   -> enetc_xdp_tx
   -> enetc_refill_rx_ring
(time passes, another MSI interrupt is raised)
enetc_poll
-> enetc_clean_tx_ring
   -> enetc_recycle_xdp_tx_buff

But that creates a problem, because there is a potentially large time
window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, period in
which we'll have less and less RX buffers.

Basically, when the ship starts sinking, the knee-jerk reaction is to
let enetc_refill_rx_ring do what it does for the standard skb code path
(refill every 16 consumed buffers), but that turns out to be very
inefficient. The problem is that we have no rx_swbd->page at our
disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would
have to call enetc_new_page for every buffer that we refill (if we
choose to refill at this early stage). Very inefficient, it only makes
the problem worse, because page allocation is an expensive process, and
CPU time is exactly what we're lacking.

Additionally, there is an even bigger problem: if we let
enetc_refill_rx_ring top up the ring's buffers again from the RX path,
remember that the buffers sent to transmission haven't disappeared
anywhere. They will be eventually sent, and processed in
enetc_clean_tx_ring, and an attempt will be made to recycle them.
But surprise, the RX ring is already full of new buffers, because we
were premature in deciding that we should refill. So not only we took
the expensive decision of allocating new pages, but now we must throw
away perfectly good and reusable buffers.

So what we do is we implement an elastic refill mechanism, which keeps
track of the number of in-flight XDP_TX buffer descriptors. We top up
the RX ring only up to the total ring capacity minus the number of BDs
that are in flight (because we know that those BDs will return to us
eventually).

The enetc driver manages 1 RX ring per CPU, and the default TX ring
management is the same. So we do XDP_TX towards the TX ring of the same
index, because it is affined to the same CPU. This will probably not
produce great results when we have a tc-taprio/tc-mqprio qdisc on the
interface, because in that case, the number of TX rings might be
greater, but I didn't add any checks for that yet (mostly because I
didn't know what checks to add).

It should also be noted that we need to change the DMA mapping direction
for RX buffers, since they may now be reflected into the TX ring of the
same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and
remapping as DMA_TO_DEVICE, because performance is better this way.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: add support for XDP_DROP and XDP_PASS

For the RX ring, enetc uses an allocation scheme based on pages split
into two buffers, which is already very efficient in terms of preventing
reallocations / maximizing reuse, so I see no reason why I would change
that.

+--------+--------+--------+--------+--------+--------+--------+
|        |        |        |        |        |        |        |
| half B | half B | half B | half B | half B | half B | half B |
|        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
|        |        |        |        |        |        |        |
| half A | half A | half A | half A | half A | half A | half A | RX ring
|        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
     ^                                                     ^
     |                                                     |
next_to_clean                                       next_to_alloc
                                                      next_to_use

                   +--------+--------+--------+--------+--------+
                   |        |        |        |        |        |
                   | half B | half B | half B | half B | half B |
                   |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
|        |        |        |        |        |        |        |
| half B | half B | half A | half A | half A | half A | half A | RX ring
|        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+
|        |        |   ^                                   ^
| half A | half A |   |                                   |
|        |        | next_to_clean                   next_to_use
+--------+--------+
              ^
              |
         next_to_alloc

then when enetc_refill_rx_ring is called, whose purpose is to advance
next_to_use, it sees that it can take buffers up to next_to_alloc, and
it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate
one!".

The only problem is that for default PAGE_SIZE values of 4096, buffer
sizes are 2048 bytes. While this is enough for normal skb allocations at
an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256
bytes, and including skb_shared_info and alignment, we end up being able
to make use of only 1472 bytes, which is insufficient for the default
MTU.

To solve that problem, we implement scatter/gather processing in the
driver, because we would really like to keep the existing allocation
scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and
another one of 28 bytes.

Because the headroom required by XDP is different (and much larger) than
the one required by the network stack, whenever a BPF program is added
or deleted on the port, we drain the existing RX buffers and seed new
ones with the required headroom. We also keep the required headroom in
rx_ring->buffer_offset.

The simplest way to implement XDP_PASS, where an skb must be created, is
to create an xdp_buff based on the next_to_clean RX BDs, but not clear
those BDs from the RX ring yet, just keep the original index at which
the BDs for this frame started. Then, if the verdict is XDP_PASS,
instead of converting the xdb_buff to an skb, we replay a call to
enetc_build_skb (just as in the normal enetc_clean_rx_ring case),
starting from the original BD index.

We would also like to be minimally invasive to the regular RX data path,
and not check whether there is a BPF program attached to the ring on
every packet. So we create a separate RX ring processing function for
XDP.

Because we only install/remove the BPF program while the interface is
down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there
shouldn't be any circumstance in which we are processing packets and
there is a potentially freed BPF program attached to the RX ring.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: move up enetc_reuse_page and enetc_page_reusable

For XDP_TX, we need to call enetc_reuse_page from enetc_clean_tx_ring,
so we need to avoid a forward declaration.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: clean the TX software BD on the TX confirmation path

With the future introduction of some new fields into enetc_tx_swbd such
as is_xdp_tx, is_xdp_redirect etc, we need not only to set these bits
to true from the XDP_TX/XDP_REDIRECT code path, but also to false from
the old code paths.

This is because TX software buffer descriptors are kept in a ring that
is shadow of the hardware TX ring, so these structures keep getting
reused, and there is always the possibility that when a software BD is
reused (after we ran a full circle through the TX ring), the old user of
the tx_swbd had set is_xdp_tx = true, and now we are sending a regular
skb, which would need to set is_xdp_tx = false.

To be minimally invasive to the old code paths, let's just scrub the
software TX BD in the TX confirmation path (enetc_clean_tx_ring), once
we know that nobody uses this software TX BD (tx_ring->next_to_clean
hasn't yet been updated, and the TX paths check enetc_bd_unused which
tells them if there's any more space in the TX ring for a new enqueue).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: add a dedicated is_eof bit in the TX software BD

In the transmit path, if we have a scatter/gather frame, it is put into
multiple software buffer descriptors, the last of which has the skb
pointer populated (which is necessary for rearming the TX MSI vector and
for collecting the two-step TX timestamp from the TX confirmation path).

At the moment, this is sufficient, but with XDP_TX, we'll need to
service TX software buffer descriptors that don't have an skb pointer,
however they might be final nonetheless. So add a dedicated bit for
final software BDs that we populate and check explicitly. Also, we keep
looking just for an skb when doing TX timestamping, because we don't
want/need that for XDP.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: move skb creation into enetc_build_skb

We need to build an skb from two code paths now: from the plain RX data
path and from the XDP data path when the verdict is XDP_PASS.

Create a new enetc_build_skb function which contains the essential steps
for building an skb based on the first and last positions of buffer
descriptors within the RX ring.

We also squash the enetc_process_skb function into enetc_build_skb,
because what that function did wasn't very meaningful on its own.

The "rx_frm_cnt++" instruction has been moved around napi_gro_receive
for cosmetic reasons, to be in the same spot as rx_byte_cnt++, which
itself must be before napi_gro_receive, because that's when we lose
ownership of the skb.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: consume the error RX buffer descriptors in a dedicated function

We can and should check the RX BD errors before starting to build the
skb. The only apparent reason why things are done in this backwards
order is to spare one call to enetc_rxbd_next.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove extra dev_hold() for fallback tunnels

My previous commits added a dev_hold() in tunnels ndo_init(),
but forgot to remove it from special functions setting up fallback tunnels.

Fallback tunnels do call their respective ndo_init()

This leads to various reports like :

unregister_netdevice: waiting for ip6gre0 to become free. Usage count = 2

Fixes: 48bb5697269a ("ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methods")
Fixes: 6289a98f0817 ("sit: proper dev_{hold|put} in ndo_[un]init methods")
Fixes: 40cb881b5aaa ("ip6_vti: proper dev_{hold|put} in ndo_[un]init methods")
Fixes: 7f700334be9a ("ip6_gre: proper dev_{hold|put} in ndo_[un]init methods")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tipc: fix missing destroy_workqueue() on error in tipc_crypto_start()

Add the missing destroy_workqueue() before return from
tipc_crypto_start() in the error handling case.

Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'inet-shrink-netns'

Eric Dumazet says:

====================
inet: shrink netns_ipv{4|6}

This patch series work on reducing footprint of netns_ipv4
and netns_ipv6. Some sysctls are converted to bytes,
and some fields are moves to reduce number of holes
and paddings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: move ip6_dst_ops first in netns_ipv6

ip6_dst_ops have cache line alignement.

Moving it at beginning of netns_ipv6
removes a 48 byte hole, and shrinks netns_ipv6
from 12 to 11 cache lines.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: convert elligible sysctls to u8

Convert most sysctls that can fit in a byte.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: convert tcp_comp_sack_nr sysctl to u8

tcp_comp_sack_nr max value was already 255.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: convert igmp_link_local_mcast_reports sysctl to u8

This sysctl is a bool, can use less storage.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: convert fib_multipath_{use_neigh|hash_policy} sysctls to u8

Make room for better packing of netns_ipv4

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: convert udp_l3mdev_accept sysctl to u8

Reduce footprint of sysctls.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: convert fib_notify_on_flag_change sysctl to u8

Reduce footprint of sysctls.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: shrink netns_ipv4 by another cache line

By shuffling around some fields to remove 8 bytes of hole,
we can save one cache line.

pahole result before/after the patch :

/* size: 768, cachelines: 12, members: 139 */
/* sum members: 673, holes: 11, sum holes: 39 */
/* padding: 56 */
/* paddings: 2, sum paddings: 7 */
/* forced alignments: 1 */

->

/* size: 704, cachelines: 11, members: 139 */
/* sum members: 673, holes: 10, sum holes: 31 */
/* paddings: 2, sum paddings: 7 */
/* forced alignments: 1 */

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: shrink inet_timewait_death_row by 48 bytes

struct inet_timewait_death_row uses two cache lines, because we want
tw_count to use a full cache line to avoid false sharing.

Rework its definition and placement in netns_ipv4 so that:

1) We add 60 bytes of padding after tw_count to avoid
  false sharing, knowing that tcp_death_row will
  have ____cacheline_aligned_in_smp attribute.

2) We do not risk padding before tcp_death_row, because
  we move it at the beginning of netns_ipv4, even if new
fields are added later.

3) We do not waste 48 bytes of padding after it.

Note that I have not changed dccp.

pahole result for struct netns_ipv4 before/after the patch :

/* size: 832, cachelines: 13, members: 139 */
/* sum members: 721, holes: 12, sum holes: 95 */
/* padding: 16 */
/* paddings: 2, sum paddings: 55 */

->

/* size: 768, cachelines: 12, members: 139 */
/* sum members: 673, holes: 11, sum holes: 39 */
/* padding: 56 */
/* paddings: 2, sum paddings: 7 */
/* forced alignments: 1 */

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-coding-style'

Weihang Li says:

====================
net: fix some coding style issues

Do some cleanups according to the coding style of kernel, including wrong
print type, redundant and missing spaces and so on.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: lpc_eth: fix format warnings of block comments

Fix the following format warning:
1. Block comments use * on subsequent lines
2. Block comments use a trailing */ on a separate line

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: toshiba: fix the trailing format of some block comments

Use a trailling */ on a separate line for block comments.

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ocelot: fix a trailling format issue with block comments

Use a tralling */ on a separate line for block comments.

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: amd: correct some format issues

There should be a blank line after declarations.

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: amd8111e: fix inappropriate spaces

Delete unncecessary spaces and add some reasonable spaces according to the
coding-style of kernel.

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: remove extra words from comments

Remove the redundant "for" from the commment.

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: fix inaccurate print type

Use "%u" to replace "hu%".

Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qrtr: Convert qrtr_ports from IDR to XArray

The XArray interface is easier for this driver to use. Also fixes a
bug reported by the improper use of GFP_ATOMIC.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: stmicro: Remove duplicate struct declaration

struct stmmac_safety_stats is declared twice. One has been
declared at 29th line. Remove the duplicate.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methods

Same reasons than for the previous commits :
6289a98f0817 ("sit: proper dev_{hold|put} in ndo_[un]init methods")
40cb881b5aaa ("ip6_vti: proper dev_{hold|put} in ndo_[un]init methods")
7f700334be9a ("ip6_gre: proper dev_{hold|put} in ndo_[un]init methods")

After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger
a warning [1]

Issue here is that:

- all dev_put() should be paired with a corresponding prior dev_hold().

- A driver doing a dev_put() in its ndo_uninit() MUST also
  do a dev_hold() in its ndo_init(), only when ndo_init()
  is returning 0.

Otherwise, register_netdevice() would call ndo_uninit()
in its error path and release a refcount too soon.

[1]
WARNING: CPU: 1 PID: 21059 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Modules linked in:
CPU: 1 PID: 21059 Comm: syz-executor.4 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58
RSP: 0018:ffffc900025aefe8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520004b5def
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888023488568
R13: ffff8880254e9000 R14: 00000000dfd82cfd R15: ffff88802ee2d7c0
FS:  00007f13bc590700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0943e74000 CR3: 0000000025273000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
dev_put include/linux/netdevice.h:4135 [inline]
ip6_tnl_dev_uninit+0x370/0x3d0 net/ipv6/ip6_tunnel.c:387
register_netdevice+0xadf/0x1500 net/core/dev.c:10308
ip6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_tunnel.c:263
ip6_tnl_newlink+0x312/0x580 net/ipv6/ip6_tunnel.c:2052
__rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:674
____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
___sys_sendmsg+0xf3/0x170 net/socket.c:2404
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-fec-netlink'

Jakub Kicinski says:

====================
ethtool: support FEC configuration over netlink

This series adds support for the equivalents of ETHTOOL_GFECPARAM
and ETHTOOL_SFECPARAM over netlink.

As a reminder - this is an API which allows user to query current
FEC mode, as well as set FEC manually if autoneg is disabled.
It does not configure anything if autoneg is enabled (that said
few/no drivers currently reject .set_fecparam calls while autoneg
is disabled, hopefully FW will just ignore the settings).

The existing functionality is mostly preserved in the new API.
The ioctl interface uses a set of flags, and link modes to tell
user which modes are supported. Here is how the flags translate
to the new interface (skipping descriptions for actual FEC modes):

  ioctl flag      |   description         |  new API
================================================================
ETHTOOL_FEC_OFF   | disabled (supported)  | \
ETHTOOL_FEC_RS    |                       |  ` link mode bitset
ETHTOOL_FEC_BASER |                       |  / .._A_FEC_MODES
ETHTOOL_FEC_LLRS  |                       | /
ETHTOOL_FEC_AUTO  | pick based on cable   | bool .._A_FEC_AUTO
ETHTOOL_FEC_NONE  | not supported         | no bit, no AUTO reported

Since link modes are already depended on (although somewhat implicitly)
for expressing supported modes - the new interface uses them for
the manual configuration, as well as uses link mode bit number
to communicate the active mode.

Use of link modes allows us to define any number of FEC modes we want,
and reuse the strset we already have defined.

Separating AUTO as its own attribute is the biggest changed compared
to the ioctl. It means drivers can no longer report AUTO as the
active FEC mode because there is no link mode for AUTO.
active_fec == AUTO makes little sense in the first place IMHO,
active_fec should be the actual mode, so hopefully this is fine.

The other minor departure is that None is no longer explicitly
expressed in the API. But drivers are reasonable in handling of
this somewhat pointless bit, so I'm not expecting any issues there.

One extension which could be considered would be moving active FEC
to ETHTOOL_MSG_LINKMODE_*, but then why not move all of FEC into
link modes? I don't know where to draw the line.

netdevsim support and a simple self test are included.

Next step is adding stats similar to the ones added for pause.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
,

selftests: ethtool: add a netdevsim FEC test

Test FEC settings, iterate over configs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: add FEC settings support

Add support for ethtool FEC and some ethtool error injection.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: support FEC settings over netlink

Add FEC API to netlink.

This is not a 1-to-1 conversion.

FEC settings already depend on link modes to tell user which
modes are supported. Take this further an use link modes for
manual configuration. Old struct ethtool_fecparam is still
used to talk to the drivers, so we need to translate back
and forth. We can revisit the internal API if number of FEC
encodings starts to grow.

Enforce only one active FEC bit (by using a bit position
rather than another mask).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: Fix typo of 'network' in comment

Signed-off-by: Eric Lin <dslin1010@gmail.com>
Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Only perform atomic nexthop bucket replacement when requested

When cleared, the 'force' parameter in nexthop bucket replacement
notifications indicates that a driver should try to perform an atomic
replacement. Meaning, only update the contents of the bucket if it is
inactive.

Since mlxsw only queries buckets' activity once every second, there is
no point in trying an atomic replacement if the idle timer interval is
smaller than 1 second.

Currently, mlxsw ignores the original value of 'force' and will always
try an atomic replacement if the idle timer is not smaller than 1
second.

Fix this by taking the original value of 'force' into account and never
promoting a non-atomic replacement to an atomic one.

Fixes: 617a77f044ed ("mlxsw: spectrum_router: Add nexthop bucket replacement support")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mptcp-subflow-disconnected'

Mat Martineau says:

====================
MPTCP: Allow initial subflow to be disconnected

An MPTCP connection is aggregated from multiple TCP subflows, and can
involve multiple IP addresses on either peer. The addresses used in the
initial subflow connection are assigned address id 0 on each side of the
link. More addresses can be added and shared with the peer using address
IDs of 1 or larger. MPTCP in Linux shares non-zero address IDs across
all MPTCP connections in a net namespace, which allows userspace to
manage subflow connections across a number of sockets. However, this
makes the address with id 0 a special case, since the IP address
associated with id 0 is potentially different for each socket.

This patch set allows the initial subflow to be disconnected when
userspace specifies an address to remove using both id 0 and an IP
address, or when the peer sends an RM_ADDR for id 0.

Patches 1 and 3 implement the change for requests from the peer and
userspace, respectively.

Patch 2 consolidates some code for disconnecting subflows.

Patches 4-6 update the self tests to cover removal of subflows using
address id 0.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp: remove id 0 address testcases

This patch added the testcases for removing the id 0 subflow and the id 0
address.

In do_transfer, use the removing addresses number '9' for deleting the id
0 address.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp: add addr argument for del_addr

For the id 0 address, different MPTCP connections could be using
different IP addresses for id 0.

This patch added an extra argument IP address for del_addr when
using id 0.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp: avoid calling pm_nl_ctl with bad IDs

IDs are supposed to be between 0 and 255.

In pm_nl_ctl, for both the 'add' and 'get' instruction, the ID is casted
in a u_int8_t. So if we give 256, we will delete ID 0. Obviously, the
goal is not to delete this ID by giving 256.

We could modify pm_nl_ctl and stop if the ID is negative or higher than
255 but probably better not to increase the number of lines for such
things in this tool which is only used in selftests. Instead, we use it
within the limits.

This modification also means that we will no longer add a new ID for the
2nd entry. That's why we removed an expected entry from the dump and
introduced with
commit dc8eb10e95a8 ("selftests: mptcp: add testcases for setting the address ID").

So now we delete ID 9 like before and we add entries for IDs 10 to 255
that are deleted just after.

Note that this could be seen as a fix but it was not really an issue so
far: we were simply playing with ID 0/1 once again. With the following
commit ("selftests: mptcp: add addr argument for del_addr"), it will be
different because ID 0 is going to required an address. We don't want
errors when trying to delete ID 0 without the address argument.

Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: remove id 0 address

This patch added a new function mptcp_nl_remove_id_zero_address to
remove the id 0 address.

In this function, traverse all the existing msk sockets to find the
msk matched the input IP address. Then fill the removing list with
id 0, and pass it to mptcp_pm_remove_addr and mptcp_pm_remove_subflow.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: unify RM_ADDR and RM_SUBFLOW receiving

There are some duplicate code in mptcp_pm_nl_rm_addr_received and
mptcp_pm_nl_rm_subflow_received. This patch unifies them into a new
function named mptcp_pm_nl_rm_addr_or_subflow. In it, use the input
parameter rm_type to identify it's now removing an address or a subflow.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: remove all subflows involving id 0 address

There's only one subflow involving the non-zero id address, but there
may be multi subflows involving the id 0 address.

Here's an example:

local_id=0, remote_id=0
local_id=1, remote_id=0
local_id=0, remote_id=1

If the removing address id is 0, all the subflows involving the id 0
address need to be removed.

In mptcp_pm_nl_rm_addr_received/mptcp_pm_nl_rm_subflow_received, the
"break" prevents the iteration to the next subflow, so this patch
dropped them.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix icmp_echo_enable_probe sysctl

sysctl_icmp_echo_enable_probe is an u8.

ipv4_net_table entry should use
.maxlen = sizeof(u8).
.proc_handler = proc_dou8vec_minmax,

Fixes: f1b8fa9fa586 ("net: add sysctl for enabling RFC 8335 PROBE messages")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ionic-cleanups'

Shannon Nelson says:

====================
ionic: code cleanup for heartbeat, dma error counts, sizeof, stats

These patches are a few more bits of code cleanup found in
testing and review: count all our dma error instances, make
better use of sizeof, fix a race in our device heartbeat check,
and clean up code formatting in the ethtool stats collection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: pull per-q stats work out of queue loops

Abstract out the per-queue data collection work into separate
functions from the per-queue loops in the stats reporting,
similar to what Alex did for the data label strings in
commit acebe5b6107c ("ionic: Update driver to use ethtool_sprintf")

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: avoid races in ionic_heartbeat_check

Rework the heartbeat checks to be sure that we're getting an
atomic operation. Through testing we found occasions where a
separate thread could clash with this check and cause erroneous
heartbeat check results.

Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: fix sizeof usage

Use the actual pointer that we care about as the subject of the
sizeof, rather than a struct name.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: count dma errors

Increment our dma-error counter in a couple of spots
that were missed before.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dpaa2-switch-STP'

Ioana Ciornei says:

====================
dpaa2-switch: add STP support

This patch set adds support for STP to the dpaa2-switch.

First of all, it fixes a bug which was determined by the improper usage
of bridge BR_STATE_* values directly in the MC ABI.
The next patches deal with creating an ACL table per port and trapping
the STP frames to the control interface by adding an entry into each
table.
The last patch configures proper learning state depending on the STP
state.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: setup learning state on STP state change

Depending on what STP state a port is in, the learning on that port
should be enabled or disabled.

When the STP state is DISABLED, BLOCKING or LISTENING no learning should
be happening irrespective of what the bridge previously requested. The
learning state is changed to be the one setup by the bridge when the STP
state is LEARNING or FORWARDING.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: trap STP frames to the CPU

Add an ACL entry in each port's ACL table to redirect any frame that
has the destination MAC address equal to the STP dmac to the control
interface.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: keep track of the current learning state per port

Keep track of the current learning state per port so that we can
reference it in the next patches when setting up a STP state.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: create and assign an ACL table per port

In order to trap frames to the CPU, the DPAA2 switch uses the ACL table.
At probe time, create an ACL table for each switch port so that in the
next patches we can use this to trap STP frames and redirect them to the
control interface.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: fix the translation between the bridge and dpsw STP states

The numerical values used for STP states are different between the
bridge and the MC ABI therefore, the direct usage of the
BR_STATE_* macros directly in the structures passed to the firmware is
incorrect.

Create a separate function that translates between the bridge STP states
and the enum that holds the STP state as seen by the Management Complex.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tc-testing: add simple action change test

Use act_simple to verify that action created with 'tc actions change'
command exists after command returns. The goal is to verify internal action
API reference counting to ensure that the case when netlink message has
NLM_F_REPLACE flag set but action with specified index doesn't exist is
handled correctly.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'udp-gro-L4'

Paolo Abeni says:

====================
udp: GRO L4 improvements

This series improves the UDP L4 - either 'forward' or 'frag_list' -
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic.

The first for patches are mostly bugfixes, addressing some GRO
edge-cases when both tunnels and L4 are present, enabled and in use.

The next 3 patches avoid unneeded segmentation when UDP GRO
traffic traverses in the receive path UDP tunnels.

Finally, some self-tests are included, covering the relevant
GRO scenarios.

Even if most patches are actually bugfixes, this series is
targeting net-next, as overall it makes available a new feature.

v2 -> v3:
- no code changes, more verbose commit messages and comment in
patch 1/8

v1 -> v2:
- restrict post segmentation csum fixup to the only the relevant pkts
- use individual 'accept_gso_type' fields instead of whole gso bitmask
(Willem)
- use only ipv6 addesses from test range in self-tests (Willem)
- hopefully clarified most individual patches commit messages
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: net: add UDP GRO forwarding self-tests

Create a bunch of virtual topologies and verify that
NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD-enabled
devices aggregate the ingress packets as expected.
Additionally check that the aggregate packets are
segmented correctly when landing on a socket

Also test SKB_GSO_FRAGLIST and SKB_GSO_UDP_L4 aggregation
on top of UDP tunnel (vxlan)

v1 -> v2:
- hopefully clarify the commit message
- moved the overlay network ipv6 range into the 'documentation'
reserved range (Willem)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bareudp: allow UDP L4 GRO passthrou

Similar to the previous commit, let even geneve
passthrou the L4 GRO packets

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: allow UDP L4 GRO passthrou

Similar to the previous commit, let even geneve
passthrou the L4 GRO packets

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: allow L4 GRO passthrough

When passing up an UDP GSO packet with L4 aggregation, there is
no need to segment it at the vxlan level. We can propagate the
packet untouched and let it be segmented later, if needed.

Introduce an helper to allow let the UDP socket to accept any
L4 aggregation and use it in the vxlan driver.

v1 -> v2:
- updated to use the newly introduced UDP socket 'accept*' fields

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: never accept GSO_FRAGLIST packets

Currently the UDP protocol delivers GSO_FRAGLIST packets to
the sockets without the expected segmentation.

This change addresses the issue introducing and maintaining
a couple of new fields to explicitly accept SKB_GSO_UDP_L4
or GSO_FRAGLIST packets. Additionally updates udp_unexpected_gso()
accordingly.

UDP sockets enabling UDP_GRO stil keep accept_udp_fraglist
zeroed.

v1 -> v2:
- use 2 bits instead of a whole GSO bitmask (Willem)

Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: properly complete L4 GRO over UDP tunnel packet

After the previous patch, the stack can do L4 UDP aggregation
on top of a UDP tunnel.

In such scenario, udp{4,6}_gro_complete will be called twice. This function
will enter its is_flist branch immediately, even though that is only
correct on the second call, as GSO_FRAGLIST is only relevant for the
inner packet.

Instead, we need to try first UDP tunnel-based aggregation, if the GRO
packet requires that.

This patch changes udp{4,6}_gro_complete to skip the frag list processing
when while encap_mark == 1, identifying processing of the outer tunnel
header.
Additionally, clears the field in udp_gro_complete() so that we can enter
the frag list path on the next round, for the inner header.

v1 -> v2:
- hopefully clarified the commit message

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: skip L4 aggregation for UDP tunnel packets

If NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD are enabled, and there
are UDP tunnels available in the system, udp_gro_receive() could end-up
doing L4 aggregation (either SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST) at
the outer UDP tunnel level for packets effectively carrying and UDP
tunnel header.

That could cause inner protocol corruption. If e.g. the relevant
packets carry a vxlan header, different vxlan ids will be ignored/
aggregated to the same GSO packet. Inner headers will be ignored, too,
so that e.g. TCP over vxlan push packets will be held in the GRO
engine till the next flush, etc.

Just skip the SKB_GSO_UDP_L4 and SKB_GSO_FRAGLIST code path if the
current packet could land in a UDP tunnel, and let udp_gro_receive()
do GRO via udp_sk(sk)->gro_receive.

The check implemented in this patch is broader than what is strictly
needed, as the existing UDP tunnel could be e.g. configured on top of
a different device: we could end-up skipping GRO at-all for some packets.

Anyhow, that is a very thin corner case and covering it will add quite
a bit of complexity.

v1 -> v2:
- hopefully clarify the commit message

Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: fixup csum for GSO receive slow path

When UDP packets generated locally by a socket with UDP_SEGMENT
traverse the following path:

UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
UDP tunnel (rx) -> UDP socket (no UDP_GRO)

ip_summed will be set to CHECKSUM_PARTIAL at creation time and
such checksum mode will be preserved in the above path up to the
UDP tunnel receive code where we have:

__iptunnel_pull_header() -> skb_pull_rcsum() ->
skb_postpull_rcsum() -> __skb_postpull_rcsum()

The latter will convert the skb to CHECKSUM_NONE.

The UDP GSO packet will be later segmented as part of the rx socket
receive operation, and will present a CHECKSUM_NONE after segmentation.

Additionally the segmented packets UDP CB still refers to the original
GSO packet len. Overall that causes unexpected/wrong csum validation
errors later in the UDP receive path.

We could possibly address the issue with some additional checks and
csum mangling in the UDP tunnel code. Since the issue affects only
this UDP receive slow path, let's set a suitable csum status there.

Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking an UDP
encapsulation present a valid checksum when landing to udp_queue_rcv_skb(),
as the UDP checksum has been validated by the GRO engine.

v2 -> v3:
- even more verbose commit message and comments

v1 -> v2:
- restrict the csum update to the packets strictly needing them
- hopefully clarify the commit message and code comments

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ppp: deflate: Remove useless call "zlib_inflateEnd"

Fix the following whitescan warning:

Calling "zlib_inflateEnd(&state->strm)" is only useful for its return
value, which is ignored.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-repeated-words'

Huazhong Tan says:

====================
net: remove repeated words

This patch-set removes some repeated words in comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: remove repeated words

Remove repeated words "that" and "the".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Acked-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: remove repeated word

Remove repeated word "to".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: remove repeated word

Remove repeated word "that".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: i40e: remove repeated words

Remove repeated words "to" and "try".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'obsdolete-todo'

Wang Qing says:

====================
Clean up obsolete TODO files

It is mentioned in the official documents of the Linux Foundation and WIKI
that you can participate in its development according to the TODO files of
each module.

But the TODO files here has not been updated for 15 years, and the function
development described in the file have been implemented or abandoned.

Its existence will mislead developers seeking to view outdated information.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/decnet: Delete obsolete TODO file

The TODO file here has not been updated from 2005, and the function
development described in the file have been implemented or abandoned.

Its existence will mislead developers seeking to view outdated information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ax25: Delete obsolete TODO file

The TODO file here has not been updated for 13 years, and the function
development described in the file have been implemented or abandoned.

Its existence will mislead developers seeking to view outdated information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fs/jffs2: Delete obsolete TODO file

The TODO file here has not been updated for 14 years, and the function
development described in the file have been implemented or abandoned.

Its existence will mislead developers seeking to view outdated information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fs/befs: Delete obsolete TODO file

The TODO file here has not been updated from 2005, and the function
development described in the file have been implemented or abandoned.

Its existence will mislead developers seeking to view outdated information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

scsi/aacraid: Delete obsolete TODO file

The TODO file here has not been updated from 2.6.12 for more than 15 years.
Its existence will mislead developers seeking to view outdated information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mips/sgi-ip27: Delete obsolete TODO file

The TODO file here has not been updated for 15 years, and the function
development described in the file have been implemented or abandoned.

Its existence will mislead developers seeking to view outdated information.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mhi: remove pointless conditional before kfree_skb()

It already has null pointer check in kfree_skb(),
remove pointless pointer check before kfree_skb().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: intel: add cross time-stamping freq difference adjustment

Cross time-stamping mechanism used in certain instance of Intel mGbE
may run at different clock frequency in comparison to the clock
frequency used by processor, so we introduce cross T/S frequency
adjustment to ensure TSC calculation is correct when processor got the
cross time-stamps.

Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Use LIST_HEAD() for list_head

There's no need to declare a list and then init it manually,
just use the LIST_HEAD() macro.

Signed-off-by: Shixin Liu <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Changelog:
From v1:
1. fix the mistake reported by kernel test robot.

Signed-off-by: Shixin Liu <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rfc8335-probe'

Andreas Roeseler says:

====================
add support for RFC 8335 PROBE

The popular utility ping has several severe limitations, such as the
inability to query specific interfaces on a node and requiring
bidirectional connectivity between the probing and probed interfaces.
RFC 8335 attempts to solve these limitations by creating the new utility
PROBE which is a specialized ICMP message that makes use of the ICMP
Extension Structure outlined in RFC 4884.

This patchset adds definitions for the ICMP Extended Echo Request and
Reply (PROBE) types for both IPV4 and IPV6, adds a sysctl to enable
responses to PROBE messages, expands the list of supported ICMP messages
to accommodate PROBE types, adds ipv6_dev_find into ipv6_stubs, and adds
functionality to respond to PROBE requests.

Changes:
v1 -> v2:
- Add AFI definitions
- Switch to functions such as dev_get_by_name and ip_dev_find to lookup
   net devices

v2 -> v3:
Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com>
- Add verification of incoming messages before looking up netdev
- Add prefix for PROBE specific defined variables
- Use proc_dointvec_minmax with zero and one for sysctl
- Create struct icmp_ext_echo_iio for parsing incoming packets
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
- Include net/addrconf.h library for ipv6_dev_find

v3 -> v4:
- Use in_addr instead of __be32 for storing IPV4 addresses
- Use IFNAMSIZ to statically allocate space for name in
   icmp_ext_echo_iio
Suggested by Willem de Bruijn <willemdebruijn.kernel@gmail.com>
- Use skb_header_pointer to verify fields in incoming message
- Add check to ensure that extobj_hdr.length is valid
- Check to ensure object payload is padded with ASCII NULL characters
   when probing by name, as specified by RFC 8335
- Statically allocate buff using IFNAMSIZ
- Add rcu blocking around ipv6_dev_find
- Use __in_dev_get_rcu to access IPV4 addresses of identified
   net_device
- Remove check for ICMPV6 PROBE types

v4 -> v5:
- Statically allocate buff to size IFNAMSIZ on declaration
- Remove goto probe in favor of single branch
- Remove strict check for incoming PROBE request padding to nearest
   32-bit boundary
Reported-by: kernel test robot <lkp@intel.com>
v5 -> v6:
- Add documentation for icmp_echo_enable_probe sysctl
- Remove RCU locking around ipv6_dev_find()
- Assign iio based on ctype
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

icmp: add response to RFC 8335 PROBE messages

Modify the icmp_rcv function to check PROBE messages and call icmp_echo
if a PROBE request is detected.

Modify the existing icmp_echo function to respond ot both ping and PROBE
requests.

This was tested using a custom modification to the iputils package and
wireshark. It supports IPV4 probing by name, ifindex, and probing by
both IPV4 and IPV6 addresses. It currently does not support responding
to probes off the proxy node (see RFC 8335 Section 2).

The modification to the iputils package is still in development and can
be found here: https://github.com/Juniper-Clinic-2020/iputils.git. It
supports full sending functionality of PROBE requests, but currently
does not parse the response messages, which is why Wireshark is required
to verify the sent and recieved PROBE messages. The modification adds
the ``-e'' flag to the command which allows the user to specify the
interface identifier to query the probed host. An example usage would be
<./ping -4 -e 1 [destination]> to send a PROBE request of ifindex 1 to the
destination node.

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add ipv6_dev_find to stubs

Add ipv6_dev_find to ipv6_stub to allow lookup of net_devices by IPV6
address in net/ipv4/icmp.c.

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add support for sending RFC 8335 PROBE messages

Modify the ping_supported function to support PROBE message types. This
allows tools such as the ping command in the iputils package to be
modified to send PROBE requests through the existing framework for
sending ping requests.

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add sysctl for enabling RFC 8335 PROBE messages

Section 8 of RFC 8335 specifies potential security concerns of
responding to PROBE requests, and states that nodes that support PROBE
functionality MUST be able to enable/disable responses and that
responses MUST be disabled by default

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ICMPV6: add support for RFC 8335 PROBE

Add definitions for the ICMPV6 type of Extended Echo Request and
Extended Echo Reply, as defined by sections 2 and 3 of RFC 8335.

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

icmp: add support for RFC 8335 PROBE

Add definitions for PROBE ICMP types and codes.

Add AFI definitions for IP and IPV6 as specified by IANA

Add a struct to represent the additional header when probing by IP
address (ctype == 3) for use in parsing incoming PROBE messages

Add a struct to represent the entire Interface Identification Object
(IIO) section of an incoming PROBE packet

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: lan87xx: fix access to wrong register of LAN87xx

The function lan87xx_config_aneg_ext was introduced to configure
LAN95xxA but as well writes to undocumented register of LAN87xx.
This fix prevents that access.

The function lan87xx_config_aneg_ext gets more suitable for the new
behavior name.

Reported-by: Måns Rullgård <mans@mansr.com>
Fixes: 05b35e7eb9a1 ("smsc95xx: add phylib support")
Signed-off-by: Andre Edich <andre.edich@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-5.13-20210330' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2021-03-30

this is a pull request of 39 patches for net-next/master.

The first two patches update the MAINTAINERS file. One is by me and
removes Dan Murphy from the from m_can and tcan4x5x. The other one is
by Pankaj Sharma and updates the maintainership of the m-can mmio
driver.

The next three patches are by me and update the CAN echo skb handling.

Vincent Mailhol provides 5 patches where Transmitter Delay
Compensation is added CAN bittiming calculation is cleaned up.

The next patch is by me and adds a missing HAS_IOMEM to the grcan
driver.

Michal Simek's patch for the xilinx driver add dev_err_probe()
support.

Arnd Bergmann's patch for the ucan driver fixes a compiler warning.

Stephane Grosjean provides 3 patches for the peak USB drivers, which
add ethtool set_phys_id and CAN one-shot mode.

Xulin Sun's patch removes a not needed return check in the m-can
driver. Torin Cooper-Bennun provides 3 patches for the m-can driver
that add rx-offload support to ensure that skbs are sent from softirq
context. Wan Jiabing's patch for the tcan4x5x driver removes a
duplicate include.

The next 6 patches are by me and target the mcp251xfd driver. They add
devcoredump support, simplify the UINC handling, and add HW timestamp
support.

The remaining 12 patches target the c_can driver. The first 6 are by
me and do generic checkpatch related cleanup work. Dario Binacchi's
patches bring some cleanups and increase the number of usable message
objects from 16 to 64.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2021-03-29' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-03-29

Coexistence of CQE compression and HW PTP time-stamp:

From Aya this series improves mlx5 netdev driver to allow
both mlx5 CQE compression (RX descriptor compression, that saves on PCI
transaction) and HW time-stamp PTP to co-exists.

Prior to this series both features were mutually exclusive due to the
nature of CQE compression which reduces the size of RX descriptor for
the price of trimming some data, such as the time-stamp.

In order to allow CQE compression when PTP time stamping is enabled,
We enable it on the regular performance critical RX queues which will
service all the data path traffic that is not PTP.

PTP traffic will be re-directed to dedicated RX queues on which we will
not enable CQE compression and thus keep the time-stamp intact.

Having both features is critical for systems with low PCI BW, e.g.
Multi-Host.

The series will be adding:
1) Infrastructure to create a dedicated RX queue to service the PTP traffic
2) Flow steering plumbing to capture PTP traffic both UDP packets with
destination port 319 and L2 packets with ethertype 0x88F7
3) Steer PTP traffic to the dedicated RX queue.
4) The feature will be enabled when PTP is being configured via the
already existing PTP IOCTL when CQE compression is active, otherwise
no change to the driver flow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

can: c_can: add support to 64 message objects

D_CAN controller supports 16, 32, 64 or 128 message objects, comparing
to 32 on C_CAN. AM335x/AM437x Sitara processors and DRA7 SOC all
instantiate a D_CAN controller with 64 message objects, as described
in the "DCAN features" subsection of the CAN chapter of their
technical reference manuals.

The driver policy has been kept unchanged, and as in the previous
version, the first half of the message objects is used for reception
and the second for transmission.

The I/O load is increased only in the case of 64 message objects,
keeping it unchanged in the case of 32. Two 32-bit read accesses are
in fact required, which however remained at 16-bit for configurations
with 32 message objects.

Link: https://lore.kernel.org/r/20210302215435.18286-7-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: prepare to up the message objects number

As pointed by commit c0a9f4d396c9 ("can: c_can: Reduce register
access") the "driver casts the 16 message objects in stone, which is
completely braindead as contemporary hardware has up to 128 message
objects".

The patch prepares the module to extend the number of message objects
beyond the 32 currently managed. This was achieved by transforming the
constants used to manage RX/TX messages into variables without
changing the driver policy.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20210302215435.18286-6-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: use 32-bit write to set arbitration register

The arbitration register is already set up with 32-bit writes in the
other parts of the code except for this point.

Link: https://lore.kernel.org/r/20210302215435.18286-5-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: add a comment about IF_RX interface's use

After reading the commit 640916db2bf7 ("can: c_can: Make it SMP safe")
it may sound strange to see the IF_RX interface used by the
can_inval_tx_object function. A comment was added to avoid any
misunderstanding.

Link: https://lore.kernel.org/r/20210302215435.18286-4-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: fix indentation

Commit 524369e2391f ("can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option")
left behind wrong indentation, fix it.

Link: https://lore.kernel.org/r/20210302215435.18286-3-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: remove unused code

Commit 9d23a9818cb1 ("can: c_can: Remove unused inline function") left
behind C_CAN_MSG_OBJ_TX_LAST constant.

Commit fa39b54ccf28 ("can: c_can: Get rid of pointless interrupts") left
behind C_CAN_MSG_RX_LOW_LAST and C_CAN_MSG_OBJ_RX_SPLIT constants.

The removed code also made a comment useless and misleading.

Link: https://lore.kernel.org/r/20210302215435.18286-2-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: fix remaining checkpatch warnings

This patch fixes the remaining checkpatch warnings in the driver.

Link: https://lore.kernel.org/r/20210304154240.2747987-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>