platform/kernel/linux-starfive.git
Sergey Ryazanov [Mon, 21 Jun 2021 22:50:54 +0000 (01:50 +0300)]
wwan: core: multiple netdevs deletion support

Use unregister_netdevice_queue() instead of plain
unregister_netdevice() if the WWAN netdev ops do not provide a dellink
callback. This helps to accelerate deletion of multiple netdevs.
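
A rough user-space model of why queuing can help, assuming the expensive part of unregistration is a synchronization step that a batch can pay once instead of once per device. Every `model_*` name below is hypothetical; none of them is a kernel symbol.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: these are not kernel types or functions. */
struct model_netdev {
	int registered;
	struct model_netdev *next;   /* link in a pending-removal list */
};

static int model_sync_count;     /* counts the expensive "wait for readers" step */

/* Tear down every queued device, paying the synchronization cost once. */
void model_unregister_many(struct model_netdev **head)
{
	struct model_netdev *dev = *head;

	if (!dev)
		return;
	model_sync_count++;          /* one barrier for the whole batch */
	while (dev) {
		struct model_netdev *next = dev->next;

		dev->registered = 0;
		dev->next = NULL;
		dev = next;
	}
	*head = NULL;
}

/* With a list head, defer the work; without one, unregister immediately. */
void model_unregister_queue(struct model_netdev *dev, struct model_netdev **head)
{
	assert(dev && dev->registered);
	if (head) {
		dev->next = *head;
		*head = dev;
	} else {
		struct model_netdev *single = dev;

		dev->next = NULL;
		model_unregister_many(&single);
	}
}
```

In this model, queuing N devices and flushing once bumps the counter by one, while N immediate removals bump it N times.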

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergey Ryazanov [Mon, 21 Jun 2021 22:50:53 +0000 (01:50 +0300)]
wwan: core: require WWAN netdev setup callback existence

The setup callback is passed unconditionally to alloc_netdev_mqs(),
where a NULL pointer dereference would cause a kernel panic. So refuse
to register WWAN netdev ops, and emit a warning, if the setup callback
is not provided.
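
The check described above can be sketched in user space as follows. The `model_*` types and functions are hypothetical stand-ins, not the WWAN core API, and the warning that the kernel change emits is only noted in a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_netdev;

/* Hypothetical stand-in for the WWAN netdev ops table. */
struct model_wwan_ops {
	void (*setup)(struct model_netdev *dev);     /* mandatory */
	void (*dellink)(struct model_netdev *dev);   /* optional */
};

static const struct model_wwan_ops *model_registered_ops;

/* Reject ops without a setup callback so no later caller can invoke NULL. */
int model_register_ops(const struct model_wwan_ops *ops)
{
	if (!ops || !ops->setup)
		return -EINVAL;      /* the kernel change also warns at this point */
	model_registered_ops = ops;
	return 0;
}

/* Creation path: setup cannot be NULL once registration has succeeded. */
void model_create_link(struct model_netdev *dev)
{
	assert(model_registered_ops && model_registered_ops->setup);
	model_registered_ops->setup(dev);
}

/* Trivial setup callback used to exercise the sketch. */
void model_noop_setup(struct model_netdev *dev)
{
	(void)dev;
}
```

Validating at registration time means the failure shows up once, at the caller that passed bad ops, rather than later on the link creation path.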

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergey Ryazanov [Mon, 21 Jun 2021 22:50:52 +0000 (01:50 +0300)]
wwan: core: relocate ops registering code

It is unlikely that RTNL callbacks will call WWAN ops (un-)register
functions, but it is highly likely that the ops (un-)register functions
will use RTNL link create/destroy handlers. So move the WWAN network
interface ops (un-)register functions below the RTNL callbacks to be
able to call them without forward declarations.

No functional changes, just code relocation.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergey Ryazanov [Mon, 21 Jun 2021 22:50:51 +0000 (01:50 +0300)]
wwan_hwsim: support network interface creation

Add support for networking interface creation via the WWAN core by
registering the WWAN netdev creation ops for each simulated WWAN device.
Implement minimal netdev support in which the xmit callback just
consumes all egress skbs.

This should help with testing WWAN network interface creation.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 22 Jun 2021 16:57:45 +0000 (09:57 -0700)]
Merge branch 'mptcp-optimizations'

Mat Martineau says:

====================
mptcp: A few optimizations

Here is a set of patches that we've accumulated and tested in the MPTCP
tree.

Patch 1 removes the MPTCP-level tx skb cache that added complexity but
did not provide a meaningful benefit.

Patch 2 uses the fast socket lock in more places.

Patch 3 improves handling of a data-ready flag.

Patch 4 deletes an unnecessary and racy connection state check.

Patch 5 adds a MIB counter for one type of invalid MPTCP header.

Patch 6 improves self test failure output.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts [Mon, 21 Jun 2021 22:54:38 +0000 (15:54 -0700)]
selftests: mptcp: display proper reason to abort tests

Without this modification, we were often displaying this error message:

  FAIL: Could not even run loopback test

But $ret could have been set to a non 0 value in many different cases:

- net.mptcp.enabled=0 is not working as expected
- setsockopt(..., TCP_ULP, "mptcp", ...) is allowed
- pings between the netns are failing
- tests between ns1 as a receiver and ns>1 are failing
- other tests not involving ns1 as a receiver are failing

So not only for the loopback test.

Now a clearer message, including the time it took to run all tests, is
displayed.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 21 Jun 2021 22:54:37 +0000 (15:54 -0700)]
mptcp: add MIB counter for invalid mapping

Account for these exceptional events for better introspection.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 21 Jun 2021 22:54:36 +0000 (15:54 -0700)]
mptcp: drop redundant test in move_skbs_to_msk()

Currently we check the msk state to avoid enqueuing new
skbs at msk shutdown time.

Such a test is racy, as we can't acquire the msk socket lock,
and useless, as the caller already checked the subflow
field 'disposable', which covers the same scenario in a race-free
manner: it is read and updated under the ssk socket lock.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 21 Jun 2021 22:54:35 +0000 (15:54 -0700)]
mptcp: don't clear MPTCP_DATA_READY in sk_wait_event()

If we don't flush the receive queue entirely, we need to set
this bit again later. We can simply avoid clearing it.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 21 Jun 2021 22:54:34 +0000 (15:54 -0700)]
mptcp: use fast lock for subflows when possible

There are a bunch of call sites where the ssk socket
lock is acquired using the full-blown version even though they are
eligible for the fast variant. Let's move to the latter.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 21 Jun 2021 22:54:33 +0000 (15:54 -0700)]
mptcp: drop tx skb cache

The mentioned cache was introduced to reduce the number of skb
allocations in atomic context, but the required complexity is
excessive.

This change removes the mentioned cache.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 22 Jun 2021 16:54:55 +0000 (09:54 -0700)]
Merge branch 'marvell-mdio-ACPI'

Marcin Wojtas says:

====================
ACPI MDIO support for Marvell controllers

The main change in the third version of the patchset is
dropping the clock handling optimisation patch
for the mvmdio driver. Other than that, it sets an
explicit dependency on FWNODE_MDIO for CONFIG_FSL_XGMAC_MDIO
and applies minor cosmetic improvements (please see the
'Changelog' below).

The firmware ACPI description is exposed in the public github branch:
https://github.com/semihalf-wojtas-marcin/edk2-platforms/commits/acpi-mdio-r20210613
There is also MacchiatoBin firmware binary available for testing:
https://drive.google.com/file/d/1eigP_aeM4wYQpEaLAlQzs3IN_w1-kQr0

I'm looking forward to the comments or remarks.

Best regards,
Marcin

Changelog:
v2->v3
* Rebase on top of net-next/master.
* Drop "net: mvmdio: simplify clock handling" patch.
* 1/6 - fix code block comments.
* 2/6 - unchanged
* 3/6 - add "depends on FWNODE_MDIO" for CONFIG_FSL_XGMAC_MDIO
* 4/6 - drop mention about the clocks from the commit message.
* 5/6 - unchanged
* 6/6 - add Andrew's RB.

v1->v2
* 1/7 - new patch
* 2/7 - new patch
* 3/7 - new patch
* 4/7 - new patch
* 5/7 - remove unnecessary `if (has_acpi_companion())` and rebase onto
        the new clock handling
* 6/7 - remove deprecated comment
* 7/7 - no changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 21 Jun 2021 17:30:28 +0000 (19:30 +0200)]
net: mvpp2: remove unused 'has_phy' field

The 'has_phy' field from struct mvpp2_port is no longer used.
Remove it.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 21 Jun 2021 17:30:27 +0000 (19:30 +0200)]
net: mvpp2: enable using phylink with ACPI

Now that MDIO and phylink are supported in the ACPI
world, enable their use in the mvpp2 driver. Ensure backward
compatibility with firmware whose ACPI description does
not contain the elements necessary for proper PHY handling,
and fall back to relying on the link interrupts instead.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 21 Jun 2021 17:30:26 +0000 (19:30 +0200)]
net: mvmdio: add ACPI support

This patch introduces ACPI support for the mvmdio driver by adding
an acpi_match_table with two entries:

* "MRVL0100" for the SMI operation
* "MRVL0101" for the XSMI mode

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 21 Jun 2021 17:30:25 +0000 (19:30 +0200)]
net/fsl: switch to fwnode_mdiobus_register

Utilize the newly added helper routine
for registering the MDIO bus via fwnode_
interface.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 21 Jun 2021 17:30:24 +0000 (19:30 +0200)]
net: mdiobus: Introduce fwnode_mdiobus_register()

This patch introduces a new helper function that
wraps acpi_/of_ mdiobus_register() and allows its
usage via common fwnode_ interface.

Fall back to raw mdiobus_register() in case CONFIG_FWNODE_MDIO
is not enabled, in order to satisfy compatibility
in all future user drivers.
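
A small user-space sketch of the dispatch idea: one entry point that picks the OF or ACPI backend, with a plain registration path when the fwnode layer is compiled out. The enum, the struct, the `fwnode_mdio_enabled` parameter and the `-EINVAL` result for an unrecognized node kind are all assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical firmware-node kinds; the kernel uses struct fwnode_handle. */
enum model_fwnode_kind { MODEL_FWNODE_NONE, MODEL_FWNODE_OF, MODEL_FWNODE_ACPI };

struct model_bus {
	const char *registered_via;   /* records which backend handled the bus */
};

int model_of_register(struct model_bus *bus)   { bus->registered_via = "of";   return 0; }
int model_acpi_register(struct model_bus *bus) { bus->registered_via = "acpi"; return 0; }
int model_raw_register(struct model_bus *bus)  { bus->registered_via = "raw";  return 0; }

/* Single entry point: choose the backend from the node kind. */
int model_fwnode_register(struct model_bus *bus, enum model_fwnode_kind kind,
			  int fwnode_mdio_enabled)
{
	assert(bus != NULL);
	if (!fwnode_mdio_enabled)
		return model_raw_register(bus);   /* fallback named in the commit */
	switch (kind) {
	case MODEL_FWNODE_OF:
		return model_of_register(bus);
	case MODEL_FWNODE_ACPI:
		return model_acpi_register(bus);
	default:
		return -EINVAL;   /* assumption: unknown node kinds are rejected */
	}
}
```

A driver written against the single entry point does not need its own OF versus ACPI branch.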

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Mon, 21 Jun 2021 17:30:23 +0000 (19:30 +0200)]
Documentation: ACPI: DSD: describe additional MAC configuration

Document additional MAC configuration modes which can be processed
by the existing fwnode_ phylink helpers:

* "managed" standard ACPI _DSD property [1]
* "fixed-link" data-only subnode linked in the _DSD package via
  generic mechanism of the hierarchical data extension [2]

[1] https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
[2] https://github.com/UEFI/DSD-Guide/blob/main/dsd-guide.pdf

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 21 Jun 2021 14:53:48 +0000 (07:53 -0700)]
virtio/vsock: avoid NULL deref in virtio_transport_seqpacket_allow()

Make sure the_virtio_vsock is not NULL before dereferencing it.
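
The guard can be illustrated with a user-space sketch. The `model_*` names are hypothetical, and the locking that real kernel code needs around such a shared pointer is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the transport's per-device state. */
struct model_vsock {
	bool seqpacket_allow;
};

/* NULL until a device is probed, and again after it is removed. */
static struct model_vsock *model_the_vsock;

void model_probe(struct model_vsock *vsock)
{
	assert(vsock != NULL);
	model_the_vsock = vsock;
}

void model_remove(void)
{
	model_the_vsock = NULL;
}

/* Read the pointer once and check it before touching any field. */
bool model_seqpacket_allow(void)
{
	const struct model_vsock *vsock = model_the_vsock;

	if (!vsock)
		return false;   /* no device: the feature cannot be offered */
	return vsock->seqpacket_allow;
}
```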

general protection fault, probably for non-canonical address 0xdffffc0000000071: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000388-0x000000000000038f]
CPU: 0 PID: 8452 Comm: syz-executor406 Not tainted 5.13.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:virtio_transport_seqpacket_allow+0xbf/0x210 net/vmw_vsock/virtio_transport.c:503
Code: e8 c6 d9 ab f8 84 db 0f 84 0f 01 00 00 e8 09 d3 ab f8 48 8d bd 88 03 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 2a 01 00 00 44 0f b6 a5 88 03 00 00
RSP: 0018:ffffc90003757c18 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000071 RSI: ffffffff88c908e7 RDI: 0000000000000388
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff88c90a06 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff88c90840 R14: 0000000000000000 R15: 0000000000000001
FS:  0000000001bee300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000082 CR3: 000000002847e000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 vsock_assign_transport+0x575/0x700 net/vmw_vsock/af_vsock.c:490
 vsock_connect+0x200/0xc00 net/vmw_vsock/af_vsock.c:1337
 __sys_connect_file+0x155/0x1a0 net/socket.c:1824
 __sys_connect+0x161/0x190 net/socket.c:1841
 __do_sys_connect net/socket.c:1851 [inline]
 __se_sys_connect net/socket.c:1848 [inline]
 __x64_sys_connect+0x6f/0xb0 net/socket.c:1848
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x43ee69
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd49e7c788 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043ee69
RDX: 0000000000000010 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 0000000000402e50 R08: 0000000000000000 R09: 0000000000400488
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402ee0
R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488
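
The faulting address in the report can be reproduced by hand. Assuming the usual x86-64 KASAN layout of one shadow byte per 8 bytes of memory, the shadow address of offset 0x388 from a NULL base (RDI in the trace) is RAX plus RDX, which is the non-canonical address named in the first line. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow offset as seen in RAX of the trace; one shadow byte covers 8 bytes. */
#define MODEL_KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL
#define MODEL_KASAN_SHADOW_SHIFT  3

uint64_t model_kasan_shadow_addr(uint64_t addr)
{
	return (addr >> MODEL_KASAN_SHADOW_SHIFT) + MODEL_KASAN_SHADOW_OFFSET;
}

/* Worked check against the register values printed in the report. */
void model_check_trace_values(void)
{
	assert((0x388ULL >> MODEL_KASAN_SHADOW_SHIFT) == 0x71);               /* RDX */
	assert(model_kasan_shadow_addr(0x388) == 0xdffffc0000000071ULL);      /* fault */
}
```

Both ends of the reported range, 0x388 and 0x38f, fall in the same 8-byte granule and therefore map to the same shadow byte.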

Fixes: 53efbba12cc7 ("virtio/vsock: enable SEQPACKET for transport")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Mon, 21 Jun 2021 21:35:09 +0000 (14:35 -0700)]
ibmvnic: Use strscpy() instead of strncpy()

Since these strings are expected to be NUL-terminated and the buffers
are exactly sized (in vnic_client_data_len()) with no padding, strncpy()
can be safely replaced with strscpy() here, as strncpy() on
NUL-terminated string is considered deprecated[1]. This has the
side-effect of silencing a -Warray-bounds warning due to the compiler
being confused about the vlcd incrementing:

In file included from ./include/linux/string.h:253,
                 from ./include/linux/bitmap.h:10,
                 from ./include/linux/cpumask.h:12,
                 from ./include/linux/mm_types_task.h:14,
                 from ./include/linux/mm_types.h:5,
                 from ./include/linux/buildid.h:5,
                 from ./include/linux/module.h:14,
                 from drivers/net/ethernet/ibm/ibmvnic.c:35:
In function '__fortify_strncpy',
    inlined from 'vnic_add_client_data' at drivers/net/ethernet/ibm/ibmvnic.c:3919:2:
./include/linux/fortify-string.h:39:30: warning: '__builtin_strncpy' offset 12 from the object at 'vlcd' is out of the bounds of referenced subobject 'name' with type 'char[]' at offset 12 [-Warray-bounds]
   39 | #define __underlying_strncpy __builtin_strncpy
      |                              ^
./include/linux/fortify-string.h:51:9: note: in expansion of macro '__underlying_strncpy'
   51 |  return __underlying_strncpy(p, q, size);
      |         ^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/ibm/ibmvnic.c: In function 'vnic_add_client_data':
drivers/net/ethernet/ibm/ibmvnic.c:3883:7: note: subobject 'name' declared here
 3883 |  char name[];
      |       ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
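
For readers unfamiliar with the difference, here is a user-space model of the strscpy() contract as commonly documented: the destination is always NUL-terminated and truncation is reported with -E2BIG, whereas strncpy() leaves the destination unterminated when the source does not fit. `model_strscpy` is a hypothetical re-implementation for illustration, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Copy at most size - 1 bytes, always NUL-terminate, and return the
 * number of bytes copied, or -E2BIG if the source had to be truncated.
 */
ssize_t model_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

With an exactly sized buffer, as in the ibmvnic case, the copy fits and the return value is simply the string length.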

Cc: Dany Madden <drt@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Thomas Falcon <tlfalcon@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Mon, 21 Jun 2021 20:08:49 +0000 (22:08 +0200)]
net: handle ARPHRD_IP6GRE in dev_is_mac_header_xmit()

Similar to commit 3b707c3008ca ("net: dev_is_mac_header_xmit() true for
ARPHRD_RAWIP"), add ARPHRD_IP6GRE to dev_is_mac_header_xmit(), to make
ip6gre compatible with act_mirred and __bpf_redirect().
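
The shape of such a helper can be sketched as a switch over the device type. The enum below is a hypothetical, abbreviated stand-in for the ARPHRD_* constants, listing only the types named in this message plus Ethernet for contrast.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative device types; not the kernel's ARPHRD_* values. */
enum model_arphrd {
	MODEL_ARPHRD_ETHER,
	MODEL_ARPHRD_RAWIP,
	MODEL_ARPHRD_IP6GRE,
};

/* Devices that carry bare L3 packets have no MAC header on transmit. */
bool model_dev_is_mac_header_xmit(enum model_arphrd type)
{
	switch (type) {
	case MODEL_ARPHRD_RAWIP:
	case MODEL_ARPHRD_IP6GRE:   /* the case this commit adds */
		return false;
	default:
		return true;
	}
}

/* Quick check of the classification. */
void model_check_types(void)
{
	assert(!model_dev_is_mac_header_xmit(MODEL_ARPHRD_IP6GRE));
	assert(model_dev_is_mac_header_xmit(MODEL_ARPHRD_ETHER));
}
```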

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Boris Sukholitko [Mon, 21 Jun 2021 09:24:29 +0000 (12:24 +0300)]
Revert "net/sched: cls_flower: Remove match on n_proto"

This reverts commit 0dca2c7404a938cb10c85d0515cee40ed5348788.

The commit in question breaks hardware offload of flower filters.

Quoting Vladimir Oltean <olteanv@gmail.com>:

 fl_hw_replace_filter() and fl_reoffload() create a struct
 flow_cls_offload with a rule->match.mask member derived from the mask
 of the software classifier: &f->mask->key - that same mask that is used
 for initializing the flow dissector keys, and the one from which Boris
 removed the basic.n_proto member because it was bothering him.

Reported-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Mon, 21 Jun 2021 08:20:08 +0000 (10:20 +0200)]
net: ll_temac: Remove left-over debug message

Fixes: f63963411942 ("net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yejune Deng [Mon, 21 Jun 2021 05:12:25 +0000 (13:12 +0800)]
net: add pf_family_names[] for protocol family

Change the pr_info output in sock_register() and sock_unregister() from
an int to a char * family name, which is more readable.
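
A name table of this kind is typically a sparse array indexed by family number. The sketch below is a user-space illustration: the table size, the three entries and the "PF_UNKNOWN" fallback are assumptions, not the contents of the kernel's pf_family_names[].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define MODEL_NPROTO 64   /* hypothetical size, large enough for the entries below */

/* Sparse name table indexed by family number; unlisted slots stay NULL. */
static const char *const model_pf_family_names[MODEL_NPROTO] = {
	[AF_UNIX]  = "PF_UNIX",
	[AF_INET]  = "PF_INET",
	[AF_INET6] = "PF_INET6",
};

/* Map a family number to a printable name, tolerating gaps and bad input. */
const char *model_pf_name(int family)
{
	if (family < 0 || family >= MODEL_NPROTO || !model_pf_family_names[family])
		return "PF_UNKNOWN";
	return model_pf_family_names[family];
}

/* Quick check of the lookup. */
void model_check_names(void)
{
	assert(strcmp(model_pf_name(AF_INET), "PF_INET") == 0);
	assert(strcmp(model_pf_name(-1), "PF_UNKNOWN") == 0);
}
```

A log line can then print `model_pf_name(family)` with `%s` instead of the raw number.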

Fixed build error in ARCH=sparc64.

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Jun 2021 21:38:48 +0000 (14:38 -0700)]
Merge branch 'ingenic-fixes'

Zhou Yanjie says:

====================
Fix for Ingenic MAC support.

1.Remove the unexpected "snps,dwmac" item in the example.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
周琰杰 (Zhou Yanjie) [Sun, 20 Jun 2021 12:38:49 +0000 (20:38 +0800)]
dt-bindings: dwmac: Remove unexpected item.

Remove the unexpected "snps,dwmac" item in the example.

Fixes: 3b8401066e5a ("dt-bindings: dwmac: Add bindings for new Ingenic SoCs.")
Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 20 Jun 2021 09:49:40 +0000 (11:49 +0200)]
net: hns3: Fix a memory leak in an error handling path in 'hclge_handle_error_info_log()'

If this 'kzalloc()' fails we must free some resources as in all the other
error handling paths of this function.

Fixes: 2e2deee7618b ("net: hns3: add the RAS compatibility adaptation solution")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Jun 2021 21:24:21 +0000 (14:24 -0700)]
Merge branch 'fec-tx'

Joakim Zhang says:

====================
net: fec: fix TX bandwidth fluctuations

This patch set intends to fix TX bandwidth fluctuations, any feedback would be appreciated.

---
ChangeLogs:
V1: remove RFC tag, RFC discussions please turn to below:
    https://lore.kernel.org/lkml/YK0Ce5YxR2WYbrAo@lunn.ch/T/
V2: change functions to be static in this patch set. And add the
t-b tag.
V3: fix sparse warning: ntohs()->htons()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Fugang Duan [Mon, 21 Jun 2021 06:27:37 +0000 (14:27 +0800)]
net: fec: add ndo_select_queue to fix TX bandwidth fluctuations

As we know, AVB is enabled by default, and the ENET IP design uses
queue 0 for best effort and queues 1&2 for AVB Class A&B. The bandwidth
of each of queues 1&2 set in the driver is 50%, so TX bandwidth fluctuates
when tx queues are selected randomly with the FEC_QUIRK_HAS_AVB quirk
available.

This patch adds an ndo_select_queue callback to select queues for
transmitting to fix this issue. It always returns queue 0 if this is
not a vlan packet, and returns queue 1 or 2 based on the priority of the
vlan packet.

You may complain that in fact we only use a single queue for transmitting
if we are not targeting VLAN. Yes, but it seems we have no choice: since
AVB is enabled when the driver is probed, we can't switch this feature
dynamically. Comparing multiple queues to a single queue, there is almost
no improvement in TX throughput.

One approach we could implement is to configure the driver for multiple
queues with a round-robin scheme by default, then add an ndo_setup_tc
callback to let users enable/disable the AVB feature. Unfortunately, the
ENET AVB IP does not seem to follow the standard 802.1Qav spec. We can
only program the DMAnCFG[IDLE_SLOPE] field to calculate the bandwidth
fraction, and the idle slope is restricted to certain values (a total of
19). That is far from the CBS QDisc implemented in the Linux TC framework.
If you strongly suggest doing this, I think we can only support a limited
number of bandwidth values and reject the others, but that is really ugly
and weird.

With this patch, VLAN tagged packets route to queue 0/1/2 based on vlan
priority; VLAN untagged packets route to queue 0.
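
The selection rule can be sketched as a lookup on the 3-bit priority field in the top bits of the VLAN TCI. The priority-to-queue table below is an assumption: the message only says that priority selects among queues 0, 1 and 2, so this particular split is illustrative, as are the `model_*` names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VLAN_PRIO_SHIFT 13   /* PCP occupies the top 3 bits of the 16-bit TCI */

/* Assumed mapping from VLAN priority (0..7) to TX queue (0..2). */
static const uint16_t model_vlan_pri_to_queue[8] = { 0, 0, 1, 1, 1, 2, 2, 2 };

/* Untagged traffic always uses queue 0; tagged traffic follows its priority. */
uint16_t model_select_queue(bool vlan_tagged, uint16_t vlan_tci)
{
	uint16_t prio;

	if (!vlan_tagged)
		return 0;   /* best-effort queue */
	prio = vlan_tci >> MODEL_VLAN_PRIO_SHIFT;
	assert(prio < 8);
	return model_vlan_pri_to_queue[prio];
}
```

Because the choice is a pure function of the frame, frames of one priority always land on the same queue, which is what removes the random spread across the credit-shaped queues.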

Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joakim Zhang [Mon, 21 Jun 2021 06:27:36 +0000 (14:27 +0800)]
net: fec: add FEC_QUIRK_HAS_MULTI_QUEUES represents i.MX6SX ENET IP

Frieder Schrempf reported a TX throughput issue [1]: it happens quite often
that the measured bandwidth in TX direction drops from its expected/nominal
value to something like ~50% (for 100M) or ~67% (for 1G) connections.

[1] https://lore.kernel.org/linux-arm-kernel/421cc86c-b66f-b372-32f7-21e59f9a98bc@kontron.de/

The issue becomes clear after digging into it. The net core selects
queues when transmitting packets. Since FEC has not implemented the
ndo_select_queue callback yet, it calls netdev_pick_tx to select
queues randomly.

For the i.MX6SX ENET IP with AVB support, the driver enables this
feature by default. According to the setting of the QOS/RCMRn/DMAnCFG
registers, AVB is configured to the credit-based scheme, with 50% bandwidth
for each of queues 1&2.

The tests below made me think further:
1) With the FEC_QUIRK_HAS_AVB quirk, the TX bandwidth fluctuation issue can be reproduced.
2) Without the FEC_QUIRK_HAS_AVB quirk, the TX bandwidth fluctuation issue cannot be reproduced.

The relevant difference with or w/o the FEC_QUIRK_HAS_AVB quirk is whether
we program the FTYPE field of the TxBD or not. As described above, the AVB
feature is enabled by default. With the FEC_QUIRK_HAS_AVB quirk, frames in
queue 0 are marked as non-AVB, and frames in queues 1&2 are marked as AVB
Class A&B. That is unreasonable if frames in queues 1&2 are not required to
be time-sensitive. So when the net core selects tx queues randomly, the
credit-based scheme takes effect and causes TX bandwidth to fluctuate. On
the other hand, w/o the FEC_QUIRK_HAS_AVB quirk, frames in queues 1&2 are
all marked as non-AVB, so the credit-based scheme does not take effect.

So how can we fix this TX throughput issue? One option is to remove the
FEC_QUIRK_HAS_AVB quirk if you suffer from it on non-time-sensitive
networking. However, this quirk is also used to identify i.MX6SX, and other
settings depend on it. So this patch adds a new quirk,
FEC_QUIRK_HAS_MULTI_QUEUES, to represent i.MX6SX, which makes it safe to
remove the FEC_QUIRK_HAS_AVB quirk.

The FEC_QUIRK_HAS_AVB quirk is set by default in the driver, and users may
not know much about driver details; they would waste effort finding the
root cause, which is not what we want. The following patch is an
implementation that fixes it so that users don't need to modify the driver.

Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 21 Jun 2021 19:50:20 +0000 (12:50 -0700)]
Merge branch 'dsa-cross-chip'

Vladimir Oltean says:

====================
Improvement for DSA cross-chip setups

This series improves some aspects in multi-switch DSA tree topologies:
- better device tree validation
- better handling of MTU changes
- better handling of multicast addresses
- removal of some unused code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 21 Jun 2021 16:42:19 +0000 (19:42 +0300)]
net: dsa: remove cross-chip support from the MRP notifiers

With MRP hardware assist being supported only by the ocelot switch
family, which by design does not support cross-chip bridging, the
current match functions are at best a guess and have not been confirmed
in any way to do anything relevant in a multi-switch topology.

Drop the code and make the notifiers match only on the targeted switch
port.

Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 21 Jun 2021 16:42:18 +0000 (19:42 +0300)]
net: dsa: targeted MTU notifiers should only match on one port

dsa_slave_change_mtu() calls dsa_port_mtu_change() twice:
- it sends a cross-chip notifier with the MTU of the CPU port which is
  used to update the DSA links.
- it sends one targeted MTU notifier which is supposed to only match the
  user port on which we are changing the MTU. The "propagate_upstream"
  variable is used here to bypass the cross-chip notifier system from
  switch.c

But due to a mistake, the second, targeted notifier matches not only on
the user port, but also on the DSA link which is a member of the same
switch, if that exists.

And because the DSA links of the entire dst were programmed in a
previous round to the largest_mtu via a "propagate_upstream == true"
notification, then the dsa_port_mtu_change(propagate_upstream == false)
call that is immediately upcoming will break the MTU on the one DSA link
which is chip-wise local to the dp whose MTU is changing right now.

Example given this daisy chain topology:

   sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
[  cpu  ] [  user ] [  user ] [  dsa  ] [  user ]
[   x   ] [       ] [       ] [   x   ] [       ]
                                  |
                                  +---------+
                                            |
   sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
[  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
[       ] [       ] [       ] [       ] [   x   ]

ip link set sw0p1 mtu 9000
ip link set sw1p1 mtu 9000 # at this stage, sw0p1 and sw1p1 can talk
                           # to one another using jumbo frames
ip link set sw0p2 mtu 1500 # this programs the sw0p3 DSA link first to
                           # the largest_mtu of 9000, then reprograms it to
                           # 1500 with the "propagate_upstream == false"
                           # notifier, breaking communication between
                           # sw0p1 and sw1p1

To escape from this situation, make the targeted match really match on a
single port - the user port, and rename the "propagate_upstream"
variable to "targeted_match" to clarify the intention and avoid future
issues.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 21 Jun 2021 16:42:17 +0000 (19:42 +0300)]
net: dsa: calculate the largest_mtu across all ports in the tree

If we have a cross-chip topology like this:

   sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
[  cpu  ] [  user ] [  user ] [  dsa  ] [  user ]
                                  |
                                  +---------+
                                            |
   sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
[  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]

and we issue the following commands:

1. ip link set sw0p1 mtu 1700
2. ip link set sw1p1 mtu 1600

we notice the following happening:

Command 1. emits a non-targeted MTU notifier for the CPU port (sw0p0)
with the largest_mtu calculated across switch 0, of 1700. This matches
sw0p0, sw0p3 and sw1p4 (all CPU ports and DSA links).
Then, it emits a targeted MTU notifier for the user port (sw0p1), again
with MTU 1700 (this doesn't matter).

Command 2. emits a non-targeted MTU notifier for the CPU port (sw0p0)
with the largest_mtu calculated across switch 1, of 1600. This matches
the same group of ports as above, and decreases the MTU for the CPU port
and the DSA links from 1700 to 1600.

As a result, the sw0p1 user port can no longer communicate with its CPU
port at MTU 1700.

To address this, we should calculate the largest_mtu across all switches
that may share a CPU port, and only emit MTU notifiers with that value.
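
The difference between the two scopes can be shown with a small user-space model of the ten ports in the diagram above. The structures and names are hypothetical, and ports not touched by the two commands are assumed to sit at an MTU of 1500.

```c
#include <assert.h>
#include <stddef.h>

enum model_port_type { MODEL_PORT_USER, MODEL_PORT_CPU, MODEL_PORT_DSA };

struct model_port {
	int switch_index;
	enum model_port_type type;
	int mtu;
};

#define MODEL_ANY_SWITCH (-1)

/* Largest user-port MTU, optionally restricted to a single switch. */
int model_largest_mtu(const struct model_port *ports, size_t n, int switch_index)
{
	int largest = 0;
	size_t i;

	assert(ports != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ports[i].type != MODEL_PORT_USER)
			continue;
		if (switch_index != MODEL_ANY_SWITCH &&
		    ports[i].switch_index != switch_index)
			continue;
		if (ports[i].mtu > largest)
			largest = ports[i].mtu;
	}
	return largest;
}
```

After the two commands, a scan limited to switch 1 yields 1600, which is the value that wrongly shrinks the shared CPU port, while a scan with `MODEL_ANY_SWITCH` yields 1700.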

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 21 Jun 2021 16:42:16 +0000 (19:42 +0300)]
net: dsa: execute dsa_switch_mdb_add only for routing port in cross-chip topologies

Currently, the notifier for adding a multicast MAC address matches on
the targeted port and on all DSA links in the system, be they upstream
or downstream links.

This leads to a considerable amount of useless traffic.

Consider this daisy chain topology, and a MDB add notifier emitted on
sw0p0. It matches on sw0p0, sw0p3, sw1p3 and sw2p4.

   sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
[  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
[   x   ] [       ] [       ] [   x   ] [       ]
                                  |
                                  +---------+
                                            |
   sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
[  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
[       ] [       ] [       ] [   x   ] [   x   ]
                                  |
                                  +---------+
                                            |
   sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
[  user ] [  user ] [  user ] [  user ] [  dsa  ]
[       ] [       ] [       ] [       ] [   x   ]

But switch 0 has no reason to send the multicast traffic for that MAC
address on sw0p3, which is how it reaches switches 1 and 2. Those
switches don't expect, according to the user configuration, to receive
this multicast address from switch 0, and they will drop it anyway,
because the only valid destination is the port they received it on.
They only need to configure themselves to deliver that multicast address
_towards_ switch 0, where the MDB entry is installed.

Similarly, switch 1 should not send this multicast traffic towards
sw1p3, because that is how it reaches switch 2.

With this change, the heat map for this MDB notifier changes as follows:

   sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
[  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
[   x   ] [       ] [       ] [       ] [       ]
                                  |
                                  +---------+
                                            |
   sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
[  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
[       ] [       ] [       ] [       ] [   x   ]
                                  |
                                  +---------+
                                            |
   sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
[  user ] [  user ] [  user ] [  user ] [  dsa  ]
[       ] [       ] [       ] [       ] [   x   ]

Now the mdb notifier behaves the same as the fdb notifier.
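
The per-switch port selection behind the second heat map can be
sketched like this. The routing table is written out by hand for the
three-switch daisy chain drawn above, and sketch_rtable and
sketch_towards_port() are illustrative names, not the kernel's.

```c
#define SKETCH_NUM_SWITCHES 3

/*
 * sketch_rtable[a][b]: port on switch a that leads towards switch b,
 * taken from the daisy chain above (-1 where a == b).
 */
static const int sketch_rtable[SKETCH_NUM_SWITCHES][SKETCH_NUM_SWITCHES] = {
	{ -1,  3,  3 },	/* switch 0 reaches switches 1 and 2 via sw0p3 */
	{  4, -1,  3 },	/* switch 1: sw1p4 towards 0, sw1p3 towards 2 */
	{  4,  4, -1 },	/* switch 2 reaches switches 0 and 1 via sw2p4 */
};

/*
 * The single port on switch sw that should match a notifier targeted
 * at port target_port of switch target_sw: the port itself on the
 * target switch, the routing port towards it everywhere else.
 */
static int sketch_towards_port(int sw, int target_sw, int target_port)
{
	if (sw == target_sw)
		return target_port;

	return sketch_rtable[sw][target_sw];
}
```

For a notifier on sw0p0 this selects port 0 on switch 0 and port 4 on
switches 1 and 2, which are the three ports marked in the second
diagram.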

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: export the dsa_port_is_{user,cpu,dsa} helpers
Vladimir Oltean [Mon, 21 Jun 2021 16:42:15 +0000 (19:42 +0300)]
net: dsa: export the dsa_port_is_{user,cpu,dsa} helpers

The difference between dsa_is_user_port and dsa_port_is_user is that the
former needs to look up the list of ports of the DSA switch tree in
order to find the struct dsa_port, while the latter directly receives it
as an argument.

dsa_is_user_port is already in widespread use and has its place, so
there isn't any chance of converting all callers to a single form.
But being able to do:
    dsa_port_is_user(dp)
instead of
    dsa_is_user_port(dp->ds, dp->index)

is much more efficient too, especially when the "dp" comes from an
iterator over the DSA switch tree - this reduces the complexity from
quadratic to linear.

Move these helpers from dsa2.c to include/net/dsa.h so that others can
use them too.
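
To illustrate the complexity argument, here is a reduced sketch with
hypothetical types (struct sketch_dsa_port and struct sketch_dsa_tree
are not the kernel's definitions). The index-based helper has to search
the port list, while the pointer-based helper is a single comparison.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_dsa_type { SKETCH_TYPE_USER, SKETCH_TYPE_CPU, SKETCH_TYPE_DSA };

struct sketch_dsa_port {
	int sw_index;
	int index;
	enum sketch_dsa_type type;
};

struct sketch_dsa_tree {
	const struct sketch_dsa_port *ports;
	size_t num_ports;
};

/* Index-based form: must first find the port in the tree, O(n). */
static bool sketch_is_user_port(const struct sketch_dsa_tree *dst,
				int sw_index, int index)
{
	size_t i;

	for (i = 0; i < dst->num_ports; i++) {
		const struct sketch_dsa_port *dp = &dst->ports[i];

		if (dp->sw_index == sw_index && dp->index == index)
			return dp->type == SKETCH_TYPE_USER;
	}

	return false;
}

/* Pointer-based form: the caller already holds the port, O(1). */
static bool sketch_port_is_user(const struct sketch_dsa_port *dp)
{
	return dp->type == SKETCH_TYPE_USER;
}

/* Iterating with the pointer-based helper keeps the whole walk linear. */
static size_t sketch_count_user_ports(const struct sketch_dsa_tree *dst)
{
	size_t i, count = 0;

	for (i = 0; i < dst->num_ports; i++)
		if (sketch_port_is_user(&dst->ports[i]))
			count++;

	return count;
}
```

Calling sketch_is_user_port() inside the same loop would repeat the
search for every port, which is the quadratic behaviour mentioned
above.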

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: assert uniqueness of dsa,member properties
Vladimir Oltean [Mon, 21 Jun 2021 16:42:14 +0000 (19:42 +0300)]
net: dsa: assert uniqueness of dsa,member properties

The cross-chip notifiers work by comparing each ds->index against the
info->sw_index value from the notifier. The ds->index is retrieved from
the device tree dsa,member property.

If a single tree cross-chip topology does not declare unique switch IDs,
this will result in hard-to-debug issues/voodoo effects such as the
cross-chip notifier for one switch port also matching the port with the
same number from another switch.

Check in dsa_switch_parse_member_of() whether the DSA switch tree
contains a DSA switch with the index we're preparing to add, before
actually adding it.
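
A minimal sketch of such a check, using a hypothetical fixed-size tree
(struct sketch_member_tree) instead of the kernel's lists. The choice
of -EEXIST as the error code is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_SWITCHES 8

/* Hypothetical stand-in for a DSA switch tree: just the known indices. */
struct sketch_member_tree {
	int indices[SKETCH_MAX_SWITCHES];
	size_t count;
};

/* Refuse to add a switch whose index is already present in the tree. */
static int sketch_add_switch(struct sketch_member_tree *dst, int index)
{
	size_t i;

	for (i = 0; i < dst->count; i++)
		if (dst->indices[i] == index)
			return -EEXIST;

	if (dst->count == SKETCH_MAX_SWITCHES)
		return -ENOSPC;

	dst->indices[dst->count++] = index;

	return 0;
}
```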

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: c101: remove redundant spaces
Peng Li [Sat, 19 Jun 2021 07:28:38 +0000 (15:28 +0800)]
net: c101: remove redundant spaces

According to checkpatch.pl, there should be no space before tabs.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: c101: replace comparison to NULL with "!card"
Peng Li [Sat, 19 Jun 2021 07:28:37 +0000 (15:28 +0800)]
net: c101: replace comparison to NULL with "!card"

According to checkpatch.pl, a comparison to NULL could
be written "!card".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: c101: add blank line after declarations
Peng Li [Sat, 19 Jun 2021 07:28:36 +0000 (15:28 +0800)]
net: c101: add blank line after declarations

This patch fixes the checkpatch error about missing a blank line
after declarations.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mlxsw-eeprom-page-by-page'
David S. Miller [Mon, 21 Jun 2021 19:33:05 +0000 (12:33 -0700)]
Merge branch 'mlxsw-eeprom-page-by-page'

Ido Schimmel says:

====================
mlxsw: Add support for module EEPROM read by page

Add support for ethtool_ops::get_module_eeprom_by_page() operation.

Patch #1 adds necessary field in device register.

Patch #2 documents possible MCIA status values so that more meaningful
error messages could be returned to user space via extack.

Patch #3 adds the actual implementation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: core: Add support for module EEPROM read by page
Ido Schimmel [Mon, 21 Jun 2021 07:50:41 +0000 (10:50 +0300)]
mlxsw: core: Add support for module EEPROM read by page

Add support for ethtool_ops::get_module_eeprom_by_page() which allows
user space to read transceiver module EEPROM based on passed parameters.

The I2C address is not validated in order to avoid module-specific code.
If the address is wrong, the device's firmware returns an error.

Tested by comparing output with legacy method (ioctl) output.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: reg: Document possible MCIA status values
Ido Schimmel [Mon, 21 Jun 2021 07:50:40 +0000 (10:50 +0300)]
mlxsw: reg: Document possible MCIA status values

Will be used to emit meaningful messages to user space via extack in a
subsequent patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: reg: Add bank number to MCIA register
Ido Schimmel [Mon, 21 Jun 2021 07:50:39 +0000 (10:50 +0300)]
mlxsw: reg: Add bank number to MCIA register

Add bank number to MCIA (Management Cable Info Access) register in order
to allow access to banked pages on EEPROMs using CMIS (Common Management
Interface Specification) memory map.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ipa-v3.1'
David S. Miller [Mon, 21 Jun 2021 19:31:00 +0000 (12:31 -0700)]
Merge branch 'ipa-v3.1'

Alex Elder says:

====================
net: ipa: add support for IPA v3.1

This series adds support for IPA v3.1, used by the Qualcomm
Snapdragon 835 (MSM8998).

The first patch adds "qcom,msm8998-ipa" to the DT binding.

The next four patches add code to ensure correct operation on
IPA v3.1:
  - Avoid touching unsupported inter-EE interrupt mask registers
  - Set the proper flags in the clock configuration register
  - Work around the lack of an IPA FLAVOR_0 register
  - Work around the lack of a GSI PARAM_2 register

The last patch defines configuration data for this version of IPA.

Many thanks are due to AngeloGioacchino Del Regno and Jami Kettunen,
both associated with SoMainline.  Angelo first posted code to
implement most of what was required for this, and Jami has been
helpful testing these changes on his hardware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: add IPA v3.1 configuration data
Alex Elder [Mon, 21 Jun 2021 17:56:27 +0000 (12:56 -0500)]
net: ipa: add IPA v3.1 configuration data

Add support for the MSM8998 SoC, which includes IPA version 3.1.

Originally proposed by AngeloGioacchino Del Regno.

Link: https://lore.kernel.org/netdev/20210211175015.200772-6-angelogioacchino.delregno@somainline.org
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: AngeloGioacchino Del Regno
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: introduce gsi_ring_setup()
Alex Elder [Mon, 21 Jun 2021 17:56:26 +0000 (12:56 -0500)]
net: ipa: introduce gsi_ring_setup()

Prior to IPA v3.5.1, there is no HW_PARAM_2 GSI register, which we
use to determine the number of channels and endpoints per execution
environment.  In that case, we will just assume the number supported
is the maximum supported by the driver.

Introduce gsi_ring_setup() to encapsulate the code that determines
the number of channels and endpoints.

Update GSI_EVT_RING_COUNT_MAX so it is big enough to handle any
available channel for all supported hardware (IPA v4.9 can have 23
channels and 24 event rings).

Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: AngeloGioacchino Del Regno
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: FLAVOR_0 register doesn't exist until IPA v3.5
Alex Elder [Mon, 21 Jun 2021 17:56:25 +0000 (12:56 -0500)]
net: ipa: FLAVOR_0 register doesn't exist until IPA v3.5

The FLAVOR_0 register first appears in IPA v3.5, so avoid attempting
to read it for versions prior to that.

This register contains a concise definition of the number and
direction of endpoints supported by the hardware, and without it
we can't verify endpoint configuration in ipa_endpoint_config().
In this case, just indicate that any endpoint number is available
for use.

Originally proposed by AngeloGioacchino Del Regno.

Link: https://lore.kernel.org/netdev/20210211175015.200772-3-angelogioacchino.delregno@somainline.org
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: AngeloGioacchino Del Regno
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: disable misc clock gating for IPA v3.1
Alex Elder [Mon, 21 Jun 2021 17:56:24 +0000 (12:56 -0500)]
net: ipa: disable misc clock gating for IPA v3.1

For IPA v3.1, a workaround is needed to disable gating on a MISC
clock.  I have no further explanation, but this is what the
downstream code (msm-4.4) does.

This was suggested in a patch from AngeloGioacchino Del Regno.

Link: https://lore.kernel.org/netdev/20210211175015.200772-2-angelogioacchino.delregno@somainline.org
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: AngeloGioacchino Del Regno
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipa: inter-EE interrupts aren't always available
Alex Elder [Mon, 21 Jun 2021 17:56:23 +0000 (12:56 -0500)]
net: ipa: inter-EE interrupts aren't always available

The GSI inter-EE interrupts are not supported prior to IPA v3.5.
Don't attempt to initialize them in gsi_irq_setup() for hardware
that does not support them.

Originally proposed by AngeloGioacchino Del Regno.

Link: https://lore.kernel.org/netdev/20210211175015.200772-4-angelogioacchino.delregno@somainline.org
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: AngeloGioacchino Del Regno
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: qcom,ipa: add support for MSM8998
Alex Elder [Mon, 21 Jun 2021 17:56:22 +0000 (12:56 -0500)]
dt-bindings: net: qcom,ipa: add support for MSM8998

Add support for "qcom,msm8998-ipa", which uses IPA v3.1.

Originally proposed by AngeloGioacchino Del Regno.

Link: https://lore.kernel.org/linux-arm-msm/20210211175015.200772-8-angelogioacchino.delregno@somainline.org
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years ago__unix_find_socket_byname(): don't pass hash and type separately
Al Viro [Sat, 19 Jun 2021 03:50:33 +0000 (03:50 +0000)]
__unix_find_socket_byname(): don't pass hash and type separately

We only care about exclusive or of those, so pass that directly.
Makes life simpler for callers as well...
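
Roughly, the lookup then takes the bucket index as a single argument.
The following is a simplified sketch, not the af_unix code: the table,
struct sketch_unix_sock and sketch_find_byname() are hypothetical, and
names are compared as plain C strings.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_HASH_SIZE 256

struct sketch_unix_sock {
	const char *name;
	struct sketch_unix_sock *next;
};

static struct sketch_unix_sock *sketch_socket_table[SKETCH_HASH_SIZE];

/*
 * The helper only needs the bucket, so callers pass "hash ^ type"
 * directly instead of handing over both values separately.
 */
static struct sketch_unix_sock *sketch_find_byname(const char *name,
						   unsigned int bucket)
{
	struct sketch_unix_sock *s;

	for (s = sketch_socket_table[bucket % SKETCH_HASH_SIZE]; s; s = s->next)
		if (strcmp(s->name, name) == 0)
			return s;

	return NULL;
}
```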

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agounix_bind_bsd(): unlink if we fail after successful mknod
Al Viro [Sat, 19 Jun 2021 03:50:32 +0000 (03:50 +0000)]
unix_bind_bsd(): unlink if we fail after successful mknod

We can do that more or less safely, since the parent is
held locked all along.  Yes, somebody might observe the
object via dcache, only to have it disappear afterwards,
but there's really no good way to prevent that.  It won't
race with other bind(2) or attempts to move the sucker
elsewhere, or put something else in its place - locked
parent prevents that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agounix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
Al Viro [Sat, 19 Jun 2021 03:50:31 +0000 (03:50 +0000)]
unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock

Final preparations for doing unlink on failure past the successful
mknod.  We can't hold ->bindlock over ->mknod() or ->unlink(), since
either might do sb_start_write() (e.g. on overlayfs).  However, we
can do it while holding filesystem and VFS locks - doing
    kern_path_create()
    vfs_mknod()
    grab ->bindlock
    if u->addr had been set
        drop ->bindlock
        done_path_create
        return -EINVAL
    else
        assign the address to socket
        drop ->bindlock
        done_path_create
        return 0
would be deadlock-free.  Here we massage unix_bind_bsd() to that
form.  We are still doing equivalent transformations.

Next commit will *not* be an equivalent transformation - it will
add a call of vfs_unlink() before done_path_create() in "already bound"
case.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agofold unix_mknod() into unix_bind_bsd()
Al Viro [Sat, 19 Jun 2021 03:50:30 +0000 (03:50 +0000)]
fold unix_mknod() into unix_bind_bsd()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agounix_bind(): take BSD and abstract address cases into new helpers
Al Viro [Sat, 19 Jun 2021 03:50:29 +0000 (03:50 +0000)]
unix_bind(): take BSD and abstract address cases into new helpers

unix_bind_bsd() and unix_bind_abstract() respectively.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agounix_bind(): separate BSD and abstract cases
Al Viro [Sat, 19 Jun 2021 03:50:28 +0000 (03:50 +0000)]
unix_bind(): separate BSD and abstract cases

We do get some duplication that way, but it's minor compared to
parts that are different.  What we get is an ability to change
locking in BSD case without making failure exits very hard to
follow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agounix_bind(): allocate addr earlier
Al Viro [Sat, 19 Jun 2021 03:50:27 +0000 (03:50 +0000)]
unix_bind(): allocate addr earlier

makes it easier to massage; we do pay for that by extra work
(kmalloc+memcpy+kfree) in some error cases, but those are not
on the hot paths anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoaf_unix: take address assignment/hash insertion into a new helper
Al Viro [Sat, 19 Jun 2021 03:50:26 +0000 (03:50 +0000)]
af_unix: take address assignment/hash insertion into a new helper

The logic duplicated in all bind variants (autobind, bind-to-path,
bind-to-abstract) is moved into a common helper.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonfp: flower-ct: check for error in nfp_fl_ct_offload_nft_flow()
Dan Carpenter [Sat, 19 Jun 2021 13:53:26 +0000 (16:53 +0300)]
nfp: flower-ct: check for error in nfp_fl_ct_offload_nft_flow()

The nfp_fl_ct_add_flow() function can fail so we need to check for
failure.

Fixes: 95255017e0a8 ("nfp: flower-ct: add nft flows to nft list")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: qualcomm: rmnet: fix two pointer math bugs
Dan Carpenter [Sat, 19 Jun 2021 13:52:22 +0000 (16:52 +0300)]
net: qualcomm: rmnet: fix two pointer math bugs

We recently changed these two pointers from void pointers to struct
pointers and it breaks the pointer math so now the "txphdr" points
beyond the end of the buffer.
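
The effect can be reproduced outside the kernel. In this sketch
struct sketch_hdr is a hypothetical 4-byte header; adding
sizeof(*hdr) to a typed pointer advances by that many elements, not
bytes, whereas "hdr + 1" advances by exactly one header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header type, used only for this illustration. */
struct sketch_hdr {
	uint8_t bytes[4];
};

/* Byte distance when sizeof() is added to a struct pointer (the bug). */
static size_t sketch_offset_buggy(const struct sketch_hdr *hdr)
{
	const struct sketch_hdr *next = hdr + sizeof(*hdr);

	return (size_t)((const uint8_t *)next - (const uint8_t *)hdr);
}

/* Byte distance when the pointer is advanced by one element. */
static size_t sketch_offset_fixed(const struct sketch_hdr *hdr)
{
	const struct sketch_hdr *next = hdr + 1;

	return (size_t)((const uint8_t *)next - (const uint8_t *)hdr);
}
```

With a void pointer (a GNU extension for arithmetic) the first form
would have advanced by sizeof(*hdr) bytes, which is why the cast
removal changed behaviour. The caller must pass a pointer into an
array of at least sizeof(struct sketch_hdr) elements for the buggy
variant to stay within bounds.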

Fixes: 56a967c4f7e5 ("net: qualcomm: rmnet: Remove some unneeded casts")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: iosm: remove an unnecessary NULL check
Dan Carpenter [Sat, 19 Jun 2021 13:51:26 +0000 (16:51 +0300)]
net: iosm: remove an unnecessary NULL check

The address of &ipc_mux->ul_adb can't be NULL because it points to the
middle of a non-NULL struct.

Fixes: 9413491e20e1 ("net: iosm: encode or decode datagram")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/smc: Fix ENODATA tests in smc_nl_get_fback_stats()
Dan Carpenter [Sat, 19 Jun 2021 13:50:21 +0000 (16:50 +0300)]
net/smc: Fix ENODATA tests in smc_nl_get_fback_stats()

These functions return negative ENODATA but the minus sign was left out
in the tests.
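
A reduced illustration of the sign mix-up, with hypothetical helper
names. The function reports "no data" as a negative errno, so only a
comparison against -ENODATA can ever match.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: returns 0 on success, -ENODATA if nothing to dump. */
static int sketch_fill_stats(int entries)
{
	return entries > 0 ? 0 : -ENODATA;
}

/* Buggy test: a negative return value never equals the positive constant. */
static bool sketch_no_data_buggy(int rc)
{
	return rc == ENODATA;
}

/* Fixed test: compare against the negated error code. */
static bool sketch_no_data_fixed(int rc)
{
	return rc == -ENODATA;
}
```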

Fixes: f0dd7bf5e330 ("net/smc: Add netlink support for SMC fallback statistics")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: fix a double shift bug
Dan Carpenter [Sat, 19 Jun 2021 13:49:18 +0000 (16:49 +0300)]
net: hns3: fix a double shift bug

These flags are used to set and test bits like this:

    if (!test_bit(HCLGE_PTP_FLAG_TX_EN, &ptp->flags) ||

The issue is that test_bit() takes a bit number like 1, but we are
passing BIT(1) instead and it's testing BIT(BIT(1)).  This does not
cause a problem because it is always done consistently and the bit
values are very small.
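
The double shift can be shown with non-atomic stand-ins for the bit
helpers. All SKETCH_* names below are hypothetical; defining the flag
as a plain bit number is one way to avoid the problem and is assumed
here for illustration.

```c
#include <stdbool.h>

#define SKETCH_BIT(nr)	(1UL << (nr))

/* Simplified, non-atomic equivalents of set_bit() and test_bit(). */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	*addr |= SKETCH_BIT(nr);
}

static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr & SKETCH_BIT(nr)) != 0;
}

/* Flag defined as a mask: passing it as a bit number shifts twice. */
#define SKETCH_FLAG_TX_EN_MASK	SKETCH_BIT(1)

/* Flag defined as a bit number: what the bit helpers expect. */
#define SKETCH_FLAG_TX_EN_NR	1
```

Setting SKETCH_FLAG_TX_EN_MASK through sketch_set_bit() touches bit 2
rather than bit 1. Because the test uses the same value, set and test
still agree, which matches the observation that the bug is harmless
while the values stay small.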

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: fix different snprintf() limit
Dan Carpenter [Sat, 19 Jun 2021 13:47:38 +0000 (16:47 +0300)]
net: hns3: fix different snprintf() limit

This patch doesn't affect runtime at all, it's just a correctness issue.

The ptp->info.name[] buffer has 16 characters but the snprintf() limit
was capped at 32 characters.  Fortunately, HCLGE_DRIVER_NAME is "hclge"
which isn't close to 16 characters so we're fine.
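
A common way to keep the limit and the buffer in sync is to derive the
limit from the destination with sizeof(). The sketch below uses a
hypothetical struct sketch_ptp_info with a 16-byte name field; it is
not the driver's code.

```c
#include <stdio.h>

/* Hypothetical structure with a fixed-size name buffer. */
struct sketch_ptp_info {
	char name[16];
};

/*
 * The limit is taken from the destination itself, so it cannot drift
 * away from the real buffer size. Returns what snprintf() returns:
 * the length the full string would have had.
 */
static int sketch_set_name(struct sketch_ptp_info *info, const char *drv_name)
{
	return snprintf(info->name, sizeof(info->name), "%s", drv_name);
}
```

A return value of sizeof(info->name) or more tells the caller that the
name was truncated.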

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: tls: fix chacha+bidir tests
Jakub Kicinski [Fri, 18 Jun 2021 20:25:04 +0000 (13:25 -0700)]
selftests: tls: fix chacha+bidir tests

ChaCha support did not adjust the bidirectional test.
We need to set up KTLS in reverse direction correctly,
otherwise these two cases will fail:

  tls.12_chacha.bidir
  tls.13_chacha.bidir

Fixes: 4f336e88a870 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: tls: clean up uninitialized warnings
Jakub Kicinski [Fri, 18 Jun 2021 20:25:03 +0000 (13:25 -0700)]
selftests: tls: clean up uninitialized warnings

A bunch of tests use uninitialized stack memory as random
data to send. This is harmless but generates compiler warnings.
Explicitly init the buffers with random data.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Sat, 19 Jun 2021 02:47:02 +0000 (19:47 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Sat, 19 Jun 2021 01:55:29 +0000 (18:55 -0700)]
Merge tag 'net-5.13-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
  bluetooth, netfilter and can.

  Current release - regressions:

   - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
     to fix modifying offloaded qdiscs

   - lantiq: net: fix duplicated skb in rx descriptor ring

   - rtnetlink: fix regression in bridge VLAN configuration, empty info
     is not an error, bot-generated "fix" was not needed

   - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
     creation

  Current release - new code bugs:

   - ethtool: fix NULL pointer dereference during module EEPROM dump via
     the new netlink API

   - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
     queue should not be visible to the stack

   - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs

   - mlx5e: verify dev is present in get devlink port ndo, avoid a panic

  Previous releases - regressions:

   - neighbour: allow NUD_NOARP entries to be force GCed

   - further fixes for fallout from reorg of WiFi locking (staging:
     rtl8723bs, mac80211, cfg80211)

   - skbuff: fix incorrect msg_zerocopy copy notifications

   - mac80211: fix NULL ptr deref for injected rate info

   - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs

  Previous releases - always broken:

   - bpf: more speculative execution fixes

   - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local

   - udp: fix race between close() and udp_abort() resulting in a panic

   - fix out of bounds when parsing TCP options before packets are
     validated (in netfilter: synproxy, tc: sch_cake and mptcp)

   - mptcp: improve operation under memory pressure, add missing
     wake-ups

   - mptcp: fix double-lock/soft lockup in subflow_error_report()

   - bridge: fix races (null pointer deref and UAF) in vlan tunnel
     egress

   - ena: fix DMA mapping function issues in XDP

   - rds: fix memory leak in rds_recvmsg

  Misc:

   - vrf: allow larger MTUs

   - icmp: don't send out ICMP messages with a source address of 0.0.0.0

   - cdc_ncm: switch to eth%d interface naming"

* tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
  net: ethernet: fix potential use-after-free in ec_bhf_remove
  selftests/net: Add icmp.sh for testing ICMP dummy address responses
  icmp: don't send out ICMP messages with a source address of 0.0.0.0
  net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
  net: ll_temac: Fix TX BD buffer overwrite
  net: ll_temac: Add memory-barriers for TX BD access
  net: ll_temac: Make sure to free skb when it is completely used
  MAINTAINERS: add Guvenc as SMC maintainer
  bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
  bnxt_en: Fix TQM fastpath ring backing store computation
  bnxt_en: Rediscover PHY capabilities after firmware reset
  cxgb4: fix wrong shift.
  mac80211: handle various extensible elements correctly
  mac80211: reset profile_periodicity/ema_ap
  cfg80211: avoid double free of PMSR request
  cfg80211: make certificate generation more robust
  mac80211: minstrel_ht: fix sample time check
  net: qed: Fix memcpy() overflow of qed_dcbx_params()
  net: cdc_eem: fix tx fixup skb leak
  net: hamradio: fix memory leak in mkiss_close
  ...

3 years agoMerge tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Fri, 18 Jun 2021 23:39:03 +0000 (16:39 -0700)]
Merge tag 'for-5.13-rc6-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix, for a space accounting bug in zoned mode. It happens
  when a block group is switched back rw->ro and unusable bytes (due to
  zoned constraints) are subtracted twice.

  It has user visible effects so I consider it important enough for late
  -rc inclusion and backport to stable"

* tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix negative space_info->bytes_readonly

3 years agoMerge tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Fri, 18 Jun 2021 20:54:11 +0000 (13:54 -0700)]
Merge tag 'pci-v5.13-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Clear 64-bit flag for host bridge windows below 4GB to fix a resource
   allocation regression added in -rc1 (Punit Agrawal)

 - Fix tegra194 MCFG quirk build regressions added in -rc1 (Jon Hunter)

 - Avoid secondary bus resets on TI KeyStone C667X devices (Antti
   Järvinen)

 - Avoid secondary bus resets on some NVIDIA GPUs (Shanker Donthineni)

 - Work around FLR erratum on Huawei Intelligent NIC VF (Chiqijun)

 - Avoid broken ATS on AMD Navi14 GPU (Evan Quan)

 - Trust Broadcom BCM57414 NIC to isolate functions even though it
   doesn't advertise ACS support (Sriharsha Basavapatna)

 - Work around AMD RS690 BIOSes that don't configure DMA above 4GB
   (Mikel Rychliski)

 - Fix panic during PIO transfer on Aardvark controller (Pali Rohár)

* tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: aardvark: Fix kernel panic during PIO transfer
  PCI: Add AMD RS690 quirk to enable 64-bit DMA
  PCI: Add ACS quirk for Broadcom BCM57414 NIC
  PCI: Mark AMD Navi14 GPU ATS as broken
  PCI: Work around Huawei Intelligent NIC VF FLR erratum
  PCI: Mark some NVIDIA GPUs to avoid bus reset
  PCI: Mark TI C667X to avoid bus reset
  PCI: tegra194: Fix MCFG quirk build regressions
  PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB

3 years agoafs: Re-enable freezing once a page fault is interrupted
Matthew Wilcox (Oracle) [Wed, 16 Jun 2021 21:22:28 +0000 (22:22 +0100)]
afs: Re-enable freezing once a page fault is interrupted

If a task is killed during a page fault, it does not currently call
sb_end_pagefault(), which means that the filesystem cannot be frozen
at any time thereafter.  This may be reported by lockdep like this:

====================================
WARNING: fsstress/10757 still has locks held!
5.13.0-rc4-build4+ #91 Not tainted
------------------------------------
1 lock held by fsstress/10757:
 #0: ffff888104eac530 (sb_pagefaults...

as filesystem freezing is modelled as a lock.

Fix this by removing all the direct returns from within the function,
and using 'ret' to indicate whether we were interrupted or successful.
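
The shape of the fix is the usual single-exit pattern. In this sketch
a counter stands in for the freeze protection, and every sketch_* name
and SKETCH_* value is hypothetical.

```c
/* Hypothetical counter standing in for the freeze-protection "lock". */
static int sketch_pagefault_depth;

static void sketch_start_pagefault(void)
{
	sketch_pagefault_depth++;
}

static void sketch_end_pagefault(void)
{
	sketch_pagefault_depth--;
}

enum { SKETCH_FAULT_LOCKED = 0, SKETCH_FAULT_RETRY = 1 };

/* Every path, including the interrupted one, leaves through "out". */
static int sketch_page_mkwrite(int interrupted)
{
	int ret = SKETCH_FAULT_RETRY;

	sketch_start_pagefault();

	if (interrupted)
		goto out;	/* a direct return here would skip the end call */

	ret = SKETCH_FAULT_LOCKED;
out:
	sketch_end_pagefault();
	return ret;
}
```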

Fixes: 1cf7a1518aef ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge branch 'RPMSG-WWAN-CTRL-driver'
David S. Miller [Fri, 18 Jun 2021 20:13:40 +0000 (13:13 -0700)]
Merge branch 'RPMSG-WWAN-CTRL-driver'

Stephan Gerhold says:

====================
net: wwan: Add RPMSG WWAN CTRL driver

This patch series adds a WWAN "control" driver for the remote processor
messaging (rpmsg) subsystem. This subsystem allows communicating with
an integrated modem DSP on many Qualcomm SoCs, e.g. MSM8916 or MSM8974.

The driver is a fairly simple glue layer between WWAN and RPMSG
and is mostly based on the existing mhi_wwan_ctrl.c and rpmsg_char.c.

For more information, see commit message in PATCH 2/3.

I already posted an RFC for this a while ago:
https://lore.kernel.org/linux-arm-msm/YLfL9Q+4860uqS8f@gerhold.net/
and now I'm looking for some feedback for the actual changes. :)

Changes in v3:
  - PATCH 2/3: Clarify commit message
  - PATCH 3/3: Fix build error for cdc-wdm.c, use extra tx_blocking() op instead
v2: https://lore.kernel.org/netdev/20210618075243.42046-1-stephan@gerhold.net/

Changes in v2: Only in PATCH 3/3
  - Fix EPOLLOUT being always set even if poll op is defined
  - Rename poll() op -> tx_poll() since it should be only used for TX
v1: https://lore.kernel.org/netdev/20210615133229.213064-1-stephan@gerhold.net/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wwan: Allow WWAN drivers to provide blocking tx and poll function
Stephan Gerhold [Fri, 18 Jun 2021 17:36:11 +0000 (19:36 +0200)]
net: wwan: Allow WWAN drivers to provide blocking tx and poll function

At the moment, the WWAN core provides wwan_port_txon/off() to implement
blocking writes. The tx() port operation should not block, instead
wwan_port_txon/off() should be called when the TX queue is full or has
free space again.

However, in some cases it is not straightforward to make use of that
functionality. For example, the RPMSG API used by rpmsg_wwan_ctrl.c
does not provide any way to be notified when the TX queue has space
again. Instead, it only provides the following operations:

  - rpmsg_send(): blocking write (wait until there is space)
  - rpmsg_trysend(): non-blocking write (return error if no space)
  - rpmsg_poll(): set poll flags depending on TX queue state

Generally that's totally sufficient for implementing a char device,
but it does not fit well to the currently provided WWAN port ops.

Most of the time, using the non-blocking rpmsg_trysend() in the
WWAN tx() port operation works just fine. However, with high-frequency
writes to the char device it is possible to trigger a situation
where this causes issues. Consider the following (somewhat
unrealistic) example:

 # dd if=/dev/zero bs=1000 of=/dev/wwan0qmi0
 dd: error writing '/dev/wwan0qmi0': Resource temporarily unavailable
 1+0 records out

This fails immediately after writing the first record. It's likely
only a matter of time until this triggers issues for some real application
(e.g. ModemManager sending a lot of large QMI packets).

The rpmsg_char device does not have this problem, because it uses
rpmsg_trysend() and rpmsg_poll() to support non-blocking operations.
Make it possible to use the same in the RPMSG WWAN driver by adding
two new optional wwan_port_ops:

  - tx_blocking(): send data blocking if allowed
  - tx_poll(): set additional TX poll flags

This integrates nicely with the RPMSG API and does not require
any change in existing WWAN drivers.

With these changes, the dd example above blocks instead of exiting
with an error.
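
How a core could choose between the two transmit operations can be
sketched as follows. The structures and the dispatch function are
hypothetical simplifications, not the WWAN core's definitions: the
optional op is used only when it exists and the caller may block.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_wwan_port;

/* Hypothetical ops table: tx is mandatory, tx_blocking is optional. */
struct sketch_wwan_ops {
	int (*tx)(struct sketch_wwan_port *port, const char *buf);
	int (*tx_blocking)(struct sketch_wwan_port *port, const char *buf);
};

enum { SKETCH_PATH_NONE, SKETCH_PATH_NONBLOCK, SKETCH_PATH_BLOCKING };

struct sketch_wwan_port {
	const struct sketch_wwan_ops *ops;
	int last_path;	/* records which op ran, for demonstration only */
};

/* Use the blocking op only if the driver has one and blocking is allowed. */
static int sketch_port_tx(struct sketch_wwan_port *port, const char *buf,
			  bool nonblock)
{
	if (!port->ops->tx_blocking || nonblock)
		return port->ops->tx(port, buf);

	return port->ops->tx_blocking(port, buf);
}

/* Example driver callbacks; a real driver would send the data here. */
static int sketch_drv_tx(struct sketch_wwan_port *port, const char *buf)
{
	(void)buf;
	port->last_path = SKETCH_PATH_NONBLOCK;
	return 0;
}

static int sketch_drv_tx_blocking(struct sketch_wwan_port *port,
				  const char *buf)
{
	(void)buf;
	port->last_path = SKETCH_PATH_BLOCKING;
	return 0;
}

/* A driver without the optional op, and one that provides it. */
static const struct sketch_wwan_ops sketch_ops_basic = {
	.tx = sketch_drv_tx,
};

static const struct sketch_wwan_ops sketch_ops_rpmsg = {
	.tx = sketch_drv_tx,
	.tx_blocking = sketch_drv_tx_blocking,
};
```

Drivers that only fill in tx keep their current behaviour, which is
why no existing driver needs to change.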

Cc: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wwan: Add RPMSG WWAN CTRL driver
Stephan Gerhold [Fri, 18 Jun 2021 17:36:10 +0000 (19:36 +0200)]
net: wwan: Add RPMSG WWAN CTRL driver

The remote processor messaging (rpmsg) subsystem provides an interface
to communicate with other remote processors. On many Qualcomm SoCs this
is used to communicate with an integrated modem DSP that implements most
of the modem functionality and provides high-level protocols like
QMI or AT to allow controlling the modem.

For QMI, most older Qualcomm SoCs (e.g. MSM8916/MSM8974) have
a standalone "DATA5_CNTL" channel that allows exchanging QMI messages.
Note that newer SoCs (e.g. SDM845) only allow exchanging QMI messages
via a shared QRTR channel that is available via a socket API on Linux.

For AT, the "DATA4" channel accepts at least a limited set of AT
commands, on many older and newer Qualcomm SoCs, although QMI is
typically the preferred control protocol.

Often there are additional QMI/AT channels (usually named DATA*_CNTL
for QMI and DATA* for AT), but it is not clear if those are really
functional on all devices. Also, at the moment there is no use case
for having multiple QMI/AT ports. If needed, more channels could be
added later after more testing.

Note that the data path (network interface) is entirely separate
from the control path and varies between Qualcomm SoCs, e.g. "IPA"
on newer Qualcomm SoCs or "BAM-DMUX" on some older ones.

The RPMSG WWAN CTRL driver exposes the QMI/AT control ports via the
WWAN subsystem, and therefore allows userspace like ModemManager to
set up the modem. Until now, ModemManager had to use the RPMSG-specific
rpmsg-char where the channels must be explicitly exposed as a char
device first and don't show up directly in sysfs.

The driver is a fairly simple glue layer between WWAN and RPMSG
and is mostly based on the existing mhi_wwan_ctrl.c and rpmsg_char.c.

Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agorpmsg: core: Add driver_data for rpmsg_device_id
Stephan Gerhold [Fri, 18 Jun 2021 17:36:09 +0000 (19:36 +0200)]
rpmsg: core: Add driver_data for rpmsg_device_id

Most device_id structs provide a driver_data field that can be used
by drivers to associate data more easily for a particular device ID.
Add the same for the rpmsg_device_id.
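
As an illustration of what a driver_data field is typically used for,
the following userspace sketch maps channel names to a per-ID value.
It is a mock, not the kernel definition: struct mock_device_id, the
port_type values and lookup_driver_data() are hypothetical names, and
the channel names are only borrowed from the RPMSG WWAN CTRL
description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-ID payload: which control protocol a channel speaks. */
enum port_type { PORT_NONE = 0, PORT_QMI, PORT_AT };

/* Userspace stand-in for a device ID entry carrying driver_data. */
struct mock_device_id {
	const char *name;
	unsigned long driver_data;
};

const struct mock_device_id mock_id_table[] = {
	{ "DATA5_CNTL", PORT_QMI },
	{ "DATA4",      PORT_AT  },
	{ NULL,         0        }, /* sentinel terminates the table */
};

/* Return the driver_data of the matching entry, or PORT_NONE if absent. */
unsigned long lookup_driver_data(const char *channel)
{
	const struct mock_device_id *id;

	for (id = mock_id_table; id->name; id++)
		if (strcmp(id->name, channel) == 0)
			return id->driver_data;
	return PORT_NONE;
}
```

Keeping the value in the ID table means the match itself selects the
behaviour, so no second lookup keyed on the channel name is needed.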

Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Fri, 18 Jun 2021 20:10:36 +0000 (13:10 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Jesse Brandeburg says:

====================
100GbE Intel Wired LAN Driver Updates 2021-06-18

Update three of the Intel Ethernet drivers with similar (but not the
same) improvements to simplify the packet type table init, while removing
an unused structure entry. For the ice driver, the table is extended
to 10 bits, which is the hardware limit, and for now is initialized
to zero.

The end result is slightly reduced memory usage, removal of a bunch
of code, and more specific initialization.
====================

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
3 years agoRevert "net: add pf_family_names[] for protocol family"
David S. Miller [Fri, 18 Jun 2021 20:02:45 +0000 (13:02 -0700)]
Revert "net: add pf_family_names[] for protocol family"

This reverts commit 1f3c98eaddec857e16a7a1c6cd83317b3dc89438.

Does not build...

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: add pf_family_names[] for protocol family
Yejune Deng [Fri, 18 Jun 2021 14:32:47 +0000 (22:32 +0800)]
net: add pf_family_names[] for protocol family

Change the pr_info content from int to char *, which is more readable.

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: fix potential use-after-free in ec_bhf_remove
Pavel Skripkin [Fri, 18 Jun 2021 13:49:02 +0000 (16:49 +0300)]
net: ethernet: fix potential use-after-free in ec_bhf_remove

static void ec_bhf_remove(struct pci_dev *dev)
{
...
struct ec_bhf_priv *priv = netdev_priv(net_dev);

unregister_netdev(net_dev);
free_netdev(net_dev);

pci_iounmap(dev, priv->dma_io);
pci_iounmap(dev, priv->io);
...
}

priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after pci_iounmap()
calls.
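
The corrected ordering can be sketched with a userspace mock. This is
not the driver code: all mock_* names are hypothetical, and the
private data is modelled as living inside the netdev allocation, which
is why it must not be touched after the free.

```c
#include <assert.h>
#include <stdlib.h>

/* Private data lives inside the netdev allocation, as with netdev_priv(). */
struct mock_priv { int io_mapped; int dma_io_mapped; };
struct mock_netdev { struct mock_priv priv; };

int netdev_freed;       /* set once the mock netdev has been released */
int unmaps_before_free; /* unmap calls made while priv was still valid */

struct mock_netdev *mock_alloc_netdev(void)
{
	struct mock_netdev *nd = calloc(1, sizeof(*nd));

	if (nd) {
		nd->priv.io_mapped = 1;
		nd->priv.dma_io_mapped = 1;
	}
	return nd;
}

void mock_iounmap(int *mapped)
{
	if (netdev_freed)
		return; /* would be a use-after-free; do not touch it */
	*mapped = 0;
	unmaps_before_free++;
}

void mock_free_netdev(struct mock_netdev *nd)
{
	free(nd);
	netdev_freed = 1;
}

void mock_remove(struct mock_netdev *net_dev)
{
	struct mock_priv *priv = &net_dev->priv;

	/* unregister_netdev() would be called here in the real driver */
	mock_iounmap(&priv->dma_io_mapped);
	mock_iounmap(&priv->io_mapped);
	mock_free_netdev(net_dev); /* last: priv is invalid after this */
}
```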

Fixes: 6af55ff52b02 ("Driver for Beckhoff CX5020 EtherCAT master module.")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'csock-seqpoacket-small-fixes'
David S. Miller [Fri, 18 Jun 2021 19:59:53 +0000 (12:59 -0700)]
Merge branch 'csock-seqpoacket-small-fixes'

Stefano Garzarella says:

====================
vsock: small fixes for seqpacket support

This series contains a few patches to clean up the seqpacket code
recently merged in the net-next tree.

No functionality changes.
====================

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
3 years agovsock/virtio: remove redundant `copy_failed` variable
Stefano Garzarella [Fri, 18 Jun 2021 13:35:26 +0000 (15:35 +0200)]
vsock/virtio: remove redundant `copy_failed` variable

When memcpy_to_msg() fails in virtio_transport_seqpacket_do_dequeue(),
we already set `dequeued_len` with the negative error value returned
by memcpy_to_msg().

So we can directly check `dequeued_len` value instead of using a
dedicated flag variable to skip the copy path for the rest of
fragments.
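
The idea can be sketched outside the kernel as follows. This is a
simplified mock, not the virtio transport code: mock_copy(),
dequeue_fragments() and the fail_above parameter are hypothetical and
only serve to show a negative length doubling as the "copy failed"
marker.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical copy helper: 0 on success, negative errno on failure. */
int mock_copy(size_t frag_len, size_t fail_above)
{
	return frag_len > fail_above ? -EFAULT : 0;
}

/*
 * Walk all fragments of one message. Once a copy fails, dequeued_len
 * holds the negative error, so the sign of dequeued_len alone tells us
 * to skip copying the remaining fragments; no separate flag is needed.
 * *copies counts how many copy attempts were made.
 */
long dequeue_fragments(const size_t *frags, size_t n, size_t fail_above,
		       size_t *copies)
{
	long dequeued_len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (dequeued_len >= 0) {
			int err = mock_copy(frags[i], fail_above);

			(*copies)++;
			if (err)
				dequeued_len = err;
			else
				dequeued_len += (long)frags[i];
		}
		/* fragments are still consumed even when copying is skipped */
	}
	return dequeued_len;
}
```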

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agovsock: rename vsock_wait_data()
Stefano Garzarella [Fri, 18 Jun 2021 13:35:25 +0000 (15:35 +0200)]
vsock: rename vsock_wait_data()

vsock_wait_data() is used only by STREAM and SEQPACKET sockets,
so let's rename it to vsock_connectible_wait_data(), using the same
nomenclature (connectible) used in other functions after the
introduction of SEQPACKET.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agovsock: rename vsock_has_data()
Stefano Garzarella [Fri, 18 Jun 2021 13:35:24 +0000 (15:35 +0200)]
vsock: rename vsock_has_data()

vsock_has_data() is used only by STREAM and SEQPACKET sockets,
so let's rename it to vsock_connectible_has_data(), using the same
nomenclature (connectible) used in other functions after the
introduction of SEQPACKET.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoNFC: nxp-nci: remove unnecessary label
wengjianfeng [Fri, 18 Jun 2021 08:52:26 +0000 (16:52 +0800)]
NFC: nxp-nci: remove unnecessary label

Remove unnecessary label chunk_exit and return directly.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: completely error out in sja1105_static_config_reload if something...
Vladimir Oltean [Fri, 18 Jun 2021 13:48:12 +0000 (16:48 +0300)]
net: dsa: sja1105: completely error out in sja1105_static_config_reload if something fails

If reloading the static config fails for whatever reason, for example if
sja1105_static_config_check_valid() fails, then we "goto out_unlock_ptp"
but we print anyway that "Reset switch and programmed static config.",
which is confusing because we didn't. We also do a bunch of other stuff
like reprogram the XPCS and reload the credit-based shapers, as if a
switch reset took place, which it didn't.

So just unlock the PTP lock and goto out, skipping all of that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: allow the TTEthernet configuration in the static config for SJA1110
Vladimir Oltean [Fri, 18 Jun 2021 13:44:00 +0000 (16:44 +0300)]
net: dsa: sja1105: allow the TTEthernet configuration in the static config for SJA1110

Currently sja1105_static_config_check_valid() is coded up to detect
whether TTEthernet is supported based on device ID, and this check was
not updated to cover SJA1110.

However, it is desirable to have as few checks for the device ID as
possible, so the driver core is more generic. So what we can do is look
at the static config table operations implemented by that specific
switch family (populated by sja1105_static_config_init) to see whether
the schedule table has a non-zero maximum entry count (meaning that it
is supported) or not.
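
A minimal sketch of this kind of capability check, assuming a
per-family table of operations with a maximum entry count. The struct
layout, the table indices and supports_schedule() are hypothetical and
do not mirror the driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table indices; only the schedule table matters here. */
enum { MOCK_TBL_SCHEDULE, MOCK_TBL_L2_LOOKUP, MOCK_TBL_MAX };

struct mock_table_ops {
	size_t max_entry_count; /* 0 means the family lacks this table */
};

struct mock_switch {
	struct mock_table_ops ops[MOCK_TBL_MAX];
};

/* Feature test based on table capacity instead of the device ID. */
int supports_schedule(const struct mock_switch *sw)
{
	return sw->ops[MOCK_TBL_SCHEDULE].max_entry_count > 0;
}
```

A new switch family then only needs to fill in its table sizes; no
device ID list has to be extended.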

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: fix reuse conflict of the rx page
Yunsheng Lin [Fri, 18 Jun 2021 12:09:45 +0000 (20:09 +0800)]
net: hns3: fix reuse conflict of the rx page

In the current rx page reuse handling process, the rx page buffer may
be subject to a reuse conflict between the driver and the stack in
high-pressure scenarios.

To fix this problem, we need to check whether the page is only owned
by the driver at the beginning and at the end of a page to make sure
there is no reuse conflict between the driver and the stack when
desc_cb->page_offset is rolled back to zero or increased.

Fixes: fa7711b888f2 ("net: hns3: optimize the rx page reuse handling process")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: properly power down the microcontroller clock for SJA1110
Vladimir Oltean [Fri, 18 Jun 2021 11:52:54 +0000 (14:52 +0300)]
net: dsa: sja1105: properly power down the microcontroller clock for SJA1110

It turns out that powering down the BASE_TIMER_CLK does not turn off the
microcontroller, just its timers, including the one for the watchdog.
So the embedded microcontroller is still running, and potentially still
doing things.

To prevent unwanted interference, we should power down the BASE_MCSS_CLK
as well (MCSS = microcontroller subsystem).

The trouble is that currently we turn off the BASE_TIMER_CLK for SJA1110
from the .clocking_setup() method, mostly because this is a Clock
Generation Unit (CGU) setting which was traditionally configured in that
method for SJA1105. But in SJA1105, the CGU was used for bringing up the
port clocks at the proper speeds, and in SJA1110 it's not (but rather
for initial configuration), so it's best that we rebrand the
sja1110_clocking_setup() method into what it really is - an implementation
of the .disable_microcontroller() method.

Since disabling the microcontroller only needs to be done once, at probe
time, we can choose the best place to do that as being in sja1105_setup(),
before we upload the static config to the device. This guarantees that
the static config being used by the switch afterwards is really ours.

Note that the procedure to upload a static config necessarily resets the
switch. This already did not reset the microcontroller, only the switch
core, so since the .disable_microcontroller() method is guaranteed to be
called by that point, if it's disabled, it remains disabled. Add a
comment to make that clear.

With the code movement for SJA1110 from .clocking_setup() to
.disable_microcontroller(), both methods are optional and are guarded by
"if" conditions.

Tested by enabling in the device tree the rev-mii switch port 0 that
goes towards the microcontroller, and flashing a firmware that would
have networking. Without this patch, the microcontroller can be pinged;
with this patch it cannot.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mac80211-for-net-2021-06-18' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 18 Jun 2021 19:22:55 +0000 (12:22 -0700)]
Merge tag 'mac80211-for-net-2021-06-18' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple of straggler fixes:
 * a minstrel HT sample check fix
 * peer measurement could double-free on races
 * certificate file generation at build time could
   sometimes hang
 * some parameters weren't reset between connections
   in mac80211
 * some extensible elements were treated as non-
   extensible, possibly causing bad connections
   (or failures) if the AP adds data
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests/net: Add icmp.sh for testing ICMP dummy address responses
Toke Høiland-Jørgensen [Fri, 18 Jun 2021 11:04:36 +0000 (13:04 +0200)]
selftests/net: Add icmp.sh for testing ICMP dummy address responses

This adds a new icmp.sh selftest for testing that the kernel will respond
correctly with an ICMP unreachable message with the dummy (192.0.0.8)
source address when there are no IPv4 addresses configured to use as source
addresses.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoicmp: don't send out ICMP messages with a source address of 0.0.0.0
Toke Høiland-Jørgensen [Fri, 18 Jun 2021 11:04:35 +0000 (13:04 +0200)]
icmp: don't send out ICMP messages with a source address of 0.0.0.0

When constructing ICMP response messages, the kernel will try to pick a
suitable source address for the outgoing packet. However, if no IPv4
addresses are configured on the system at all, this will fail and we end up
producing an ICMP message with a source address of 0.0.0.0. This can happen
on a box routing IPv4 traffic via v6 nexthops, for instance.

Since 0.0.0.0 is not generally routable on the internet, there's a good
chance that such ICMP messages will never make it back to the sender of the
original packet that the ICMP message was sent in response to. This, in
turn, can create connectivity and PMTUd problems for senders. Fortunately,
RFC7600 reserves a dummy address to be used as a source for ICMP
messages (192.0.0.8/32), so let's teach the kernel to substitute that
address as a last resort if the regular source address selection procedure
fails.

Below is a quick example reproducing this issue with network namespaces:

ip netns add ns0
ip l add type veth peer netns ns0
ip l set dev veth0 up
ip a add 10.0.0.1/24 dev veth0
ip a add fc00:dead:cafe:42::1/64 dev veth0
ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2
ip -n ns0 l set dev veth0 up
ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0
ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1
ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0
ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1
tcpdump -tpni veth0 -c 2 icmp &
ping -w 1 10.1.0.1 > /dev/null
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64
IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
2 packets captured
2 packets received by filter
0 packets dropped by kernel

With this patch the above capture changes to:
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64
IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
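
The last-resort substitution can be sketched in userspace C as below.
This is only an illustration of the fallback rule, not the kernel
change itself: pick_icmp_source() and DUMMY_SRC_HOST are hypothetical
names, and addresses are handled in network byte order as returned by
inet_pton().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* 192.0.0.8, the dummy source address, in host byte order. */
#define DUMMY_SRC_HOST 0xC0000008u

/*
 * Given the result of normal source address selection (network byte
 * order), substitute the dummy address only if selection produced
 * 0.0.0.0; any real address is passed through unchanged.
 */
uint32_t pick_icmp_source(uint32_t selected_be)
{
	if (selected_be == htonl(INADDR_ANY))
		return htonl(DUMMY_SRC_HOST);
	return selected_be;
}
```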

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Juliusz Chroboczek <jch@irif.fr>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
Esben Haabendal [Fri, 18 Jun 2021 10:52:38 +0000 (12:52 +0200)]
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY

As documented in Documentation/networking/driver.rst, the ndo_start_xmit
method must not return NETDEV_TX_BUSY under any normal circumstances, and
as recommended, we simply stop the tx queue in advance, when there is a
risk that the next xmit would cause a NETDEV_TX_BUSY return.
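
The "stop in advance" rule can be sketched with a mock descriptor
ring. Everything here is hypothetical (struct mock_ring,
MOCK_MAX_FRAGS, mock_start_xmit()); the point is only that the queue
is stopped as soon as a worst-case packet might no longer fit, so the
transmit function never has to report a busy condition.

```c
#include <assert.h>

/* Hypothetical worst-case number of fragments per packet. */
#define MOCK_MAX_FRAGS 17

enum { MOCK_TX_OK = 0 };

struct mock_ring {
	unsigned int free_bds; /* free buffer descriptors */
	int queue_stopped;     /* set when the stack must stop sending */
};

/*
 * Queue one packet that needs num_frag + 1 descriptors (head plus
 * fragments). The caller must not call this while the queue is stopped,
 * which is what guarantees that enough descriptors are free.
 */
int mock_start_xmit(struct mock_ring *r, unsigned int num_frag)
{
	unsigned int needed = num_frag + 1;

	assert(r->free_bds >= needed);
	r->free_bds -= needed;

	/* Stop now if the next worst-case packet might not fit. */
	if (r->free_bds < MOCK_MAX_FRAGS + 1)
		r->queue_stopped = 1;

	return MOCK_TX_OK;
}
```

The completion path would restart the queue once enough descriptors
have been reclaimed; that part is omitted here.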

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ll_temac: Fix TX BD buffer overwrite
Esben Haabendal [Fri, 18 Jun 2021 10:52:33 +0000 (12:52 +0200)]
net: ll_temac: Fix TX BD buffer overwrite

Just as in the initial check, we need to ensure that num_frag+1 buffers
are available, as that is the number of buffers we are going to use.

This fixes a buffer overflow, which might be seen during heavy network
load. Complete lockup of TEMAC was reproducible within about 10 minutes of
a particular load.

Fixes: 84823ff80f74 ("net: ll_temac: Fix race condition causing TX hang")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ll_temac: Add memory-barriers for TX BD access
Esben Haabendal [Fri, 18 Jun 2021 10:52:28 +0000 (12:52 +0200)]
net: ll_temac: Add memory-barriers for TX BD access

Add a couple of memory-barriers to ensure correct ordering of read/write
access to TX BDs.

In xmit_done, we should ensure that reading the additional BD fields is
only done after STS_CTRL_APP0_CMPLT bit is set.

When xmit_done marks the BD as free by setting APP0=0, we need to ensure
that the other BD fields are reset first, so we avoid racing with the xmit
path, which writes to the same fields.

Finally, making sure to read APP0 of next BD after the current BD, ensures
that we see all available buffers.
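
The ordering requirements can be illustrated with C11 atomics in
userspace. Note that this swaps in acquire/release operations for the
kernel's barrier primitives and uses a made-up descriptor layout:
struct mock_bd, MOCK_CMPLT, mock_complete() and mock_reap() are all
hypothetical, so treat it as a sketch of the ordering idea only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MOCK_CMPLT 0x1u /* hypothetical "completed" bit in app0 */

struct mock_bd {
	uint32_t len;          /* ordinary field, guarded by app0 */
	_Atomic uint32_t app0; /* status word used to publish the BD */
};

/* Completion side: write the fields first, then publish the status. */
void mock_complete(struct mock_bd *bd, uint32_t len)
{
	bd->len = len;
	atomic_store_explicit(&bd->app0, MOCK_CMPLT, memory_order_release);
}

/*
 * Reaping side: read the status with acquire semantics and only then
 * read the other fields. When freeing, reset the other fields first and
 * clear app0 last, so a concurrent producer never sees a free BD whose
 * fields are still being reset. Returns 1 if a BD was reaped, else 0.
 */
int mock_reap(struct mock_bd *bd, uint32_t *len_out)
{
	uint32_t st = atomic_load_explicit(&bd->app0, memory_order_acquire);

	if (!(st & MOCK_CMPLT))
		return 0;

	*len_out = bd->len;
	bd->len = 0;
	atomic_store_explicit(&bd->app0, 0, memory_order_release);
	return 1;
}
```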

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ll_temac: Make sure to free skb when it is completely used
Esben Haabendal [Fri, 18 Jun 2021 10:52:23 +0000 (12:52 +0200)]
net: ll_temac: Make sure to free skb when it is completely used

With the skb pointer piggy-backed on the TX BD, we have a simple and
efficient way to free the skb buffer when the frame has been transmitted.
But in order to avoid freeing the skb while there are still fragments from
the skb in use, we need to piggy-back on the last TX BD of the skb, not the
first.

Without this, we are doing use-after-free on the DMA side, when the first
BD of a multi TX BD packet is seen as completed in xmit_done, and the
remaining BDs are still being processed.
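
A small userspace mock shows why the buffer pointer belongs on the
last descriptor. The names (struct mock_bd, mock_queue_packet(),
mock_xmit_done()) are hypothetical, the ring is linear without
wrap-around for brevity, and a counter stands in for actually freeing
the buffer.

```c
#include <assert.h>
#include <stddef.h>

struct mock_bd {
	void *skb; /* non-NULL only on the last BD of a packet */
	int done;  /* set by the (mock) hardware on completion */
};

int skbs_freed; /* stands in for freeing the packet buffer */

/* Spread one packet over nbds descriptors; attach skb to the last one. */
void mock_queue_packet(struct mock_bd *ring, size_t first, size_t nbds,
		       void *skb)
{
	size_t i;

	assert(nbds > 0);
	for (i = 0; i < nbds; i++) {
		ring[first + i].skb = NULL;
		ring[first + i].done = 0;
	}
	ring[first + nbds - 1].skb = skb;
}

/*
 * Reap completed descriptors in order, stopping at the first one that
 * is not done. The buffer is released only when the BD carrying the
 * pointer, i.e. the last BD of the packet, has completed.
 */
void mock_xmit_done(struct mock_bd *ring, size_t ring_len, size_t *tail)
{
	while (*tail < ring_len && ring[*tail].done) {
		if (ring[*tail].skb) {
			skbs_freed++;
			ring[*tail].skb = NULL;
		}
		ring[*tail].done = 0;
		(*tail)++;
	}
}
```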

Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqlcnic: remove redundant continue statement
Colin Ian King [Fri, 18 Jun 2021 10:19:19 +0000 (11:19 +0100)]
qlcnic: remove redundant continue statement

The continue statement at the end of a for-loop has no effect;
it is redundant and can be removed.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bridge: remove redundant continue statement
Colin Ian King [Fri, 18 Jun 2021 10:01:55 +0000 (11:01 +0100)]
net: bridge: remove redundant continue statement

The continue statement at the end of a for-loop has no effect, so
invert the if expression and remove the continue.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: remove redundant continue statement
Colin Ian King [Fri, 18 Jun 2021 09:44:25 +0000 (10:44 +0100)]
net: stmmac: remove redundant continue statement

The continue statement in the for-loop has no effect, so remove it.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: pxa168_eth: Fix a potential data race in pxa168_eth_remove
Pavel Machek [Fri, 18 Jun 2021 09:35:26 +0000 (11:35 +0200)]
net: pxa168_eth: Fix a potential data race in pxa168_eth_remove

Commit 0571a753cb07 cancelled the delayed work too late, leaving a small
race window. Cancel the work sooner to close it completely.

Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Fixes: 0571a753cb07 ("net: pxa168_eth: Fix a potential data race in pxa168_eth_remove")
Signed-off-by: David S. Miller <davem@davemloft.net>