Linus Torvalds [Mon, 3 Oct 2022 16:56:25 +0000 (09:56 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Three fixes for ARM:
- unbreak the RiscPC build
- fix wrong pg_level in page table dumper
- make MT_MEMORY_RO really read-only with LPAE"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9247/1: mm: set readonly for MT_MEMORY_RO with ARM_LPAE
ARM: 9244/1: dump: Fix wrong pg_level in walk_pmd()
ARM: 9243/1: riscpc: Unbreak the build
Linus Torvalds [Mon, 3 Oct 2022 16:48:47 +0000 (09:48 -0700)]
Merge tag 'm68k-for-v6.1-tag1' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Fix forward secrecy of RNG seed boot record handling
- Make RNG seed boot record handling generic for all m68k platforms
using bootinfo
- defconfig updates
- Minor fixes and improvements
* tag 'm68k-for-v6.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Rework BI_VIRT_RNG_SEED as BI_RNG_SEED
m68k: Process bootinfo records before saving them
m68k: defconfig: Update defconfigs for v6.0-rc2
m68k: Allow kexec on M68KCLASSIC with MMU enabled only
m68k: Move from strlcpy with unused retval to strscpy
Linus Torvalds [Mon, 3 Oct 2022 16:31:55 +0000 (09:31 -0700)]
Merge tag 'mips_6.1' of git://git./linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- mainly cleanups
- fix enabling interrupts on second VPE for Lantiq platform
- switch to use gpiod API
- allow firmware passing RND seed
* tag 'mips_6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (29 commits)
MIPS: pci: lantiq: switch to using gpiod API
mips: allow firmware to pass RNG seed to kernel
MIPS: Simplify __bswapdi2() and __bswapsi2()
MIPS: Silence missing prototype warning
mips: update config files
MIPS: Lantiq: vmmc: fix compile break introduced by gpiod patch
MIPS: IRQ: remove orphan allocate_irqno() declaration
MIPS: remove orphan sb1250_time_init() declaration
MIPS: Lantiq: switch vmmc to use gpiod API
MIPS: lantiq: enable all hardware interrupts on second VPE
MIPS: BCM47XX: Cast memcmp() of function to (void *)
mips: ralink: convert to DEFINE_SHOW_ATTRIBUTE
mips: kernel: convert to DEFINE_SHOW_ATTRIBUTE
mips: cavium: convert to DEFINE_SHOW_ATTRIBUTE
MIPS: AR7: remove orphan declarations from arch/mips/include/asm/mach-ar7/ar7.h
MIPS: remove orphan sni_cpu_time_init() declaration
MIPS: IRQ: remove orphan declarations from arch/mips/include/asm/irq.h
MIPS: Octeon: remove orphan octeon_hal_setup_reserved32() declaration
MIPS: Octeon: remove orphan cvmx_fpa_setup_pool() declaration
MIPS: Octeon: remove orphan octeon_swiotlb declaration
...
Lorenzo Bianconi [Thu, 29 Sep 2022 22:38:43 +0000 (00:38 +0200)]
net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Remove circular dependency between nf_nat module and nf_conntrack one
moving bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Fixes:
0fabd2aa199f ("net: netfilter: add bpf_ct_set_nat_info kfunc helper")
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/51a65513d2cda3eeb0754842e8025ab3966068d8.1664490511.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Bagas Sanjaya [Sun, 2 Oct 2022 03:20:23 +0000 (10:20 +0700)]
Documentation: bpf: Add implementation notes documentations to table of contents
Sphinx reported warnings on missing implementation notes documentations in the
table of contents:
Documentation/bpf/clang-notes.rst: WARNING: document isn't included in any toctree
Documentation/bpf/linux-notes.rst: WARNING: document isn't included in any toctree
Add these documentations to the table of contents (index.rst) of BPF
documentation to fix the warnings.
Link: https://lore.kernel.org/linux-doc/202210020749.yfgDZbRL-lkp@intel.com/
Fixes:
6c7aaffb24efbd ("bpf, docs: Move Clang notes to a separate file")
Fixes:
6166da0a02cde2 ("bpf, docs: Move legacy packet instructions to a separate file")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20221002032022.24693-1-bagasdotme@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Eric Dumazet [Sat, 1 Oct 2022 20:51:02 +0000 (13:51 -0700)]
once: add DO_ONCE_SLOW() for sleepable contexts
Christophe Leroy reported a ~80ms latency spike
happening at first TCP connect() time.
This is because __inet_hash_connect() uses get_random_once()
to populate a perturbation table which became quite big
after commit
4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
get_random_once() uses DO_ONCE(), which block hard irqs for the duration
of the operation.
This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
for operations where we prefer to stay in process context.
Then __inet_hash_connect() can use get_random_slow_once()
to populate its perturbation table.
Fixes:
4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
Fixes:
190cc82489f4 ("tcp: change source port randomizarion at connect() time")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tetsuo Handa [Sat, 1 Oct 2022 16:43:44 +0000 (01:43 +0900)]
net/ieee802154: reject zero-sized raw_sendmsg()
syzbot is hitting skb_assert_len() warning at raw_sendmsg() for ieee802154
socket. What commit
dc633700f00f726e ("net/af_packet: check len when
min_header_len equals to 0") does also applies to ieee802154 socket.
Link: https://syzkaller.appspot.com/bug?extid=5ea725c25d06fb9114c4
Reported-by: syzbot <syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com>
Fixes:
fd1894224407c484 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxim Mikityanskiy [Sat, 1 Oct 2022 10:57:13 +0000 (13:57 +0300)]
net: wwan: iosm: Call mutex_init before locking it
wwan_register_ops calls wwan_create_default_link, which ends up in the
ipc_wwan_newlink callback that locks ipc_wwan->if_mutex. However, this
mutex is not yet initialized by that point. Fix it by moving mutex_init
above the wwan_register_ops call. This also makes the order of
operations in ipc_wwan_init symmetric to ipc_wwan_deinit.
Fixes:
83068395bbfc ("net: iosm: create default link via WWAN core")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Oct 2022 11:50:19 +0000 (12:50 +0100)]
Merge branch 'octeontx2-macsec-offload'
Subbaraya Sundeep says:
====================
net: Introduce macsec hardware offload for cn10k platform
CN10K-B and CNF10K-B variaints of CN10K silicon has macsec block(MCS)
to encrypt and decrypt packets at MAC/hardware level. This block is a
global resource with hardware resources like SecYs, SCs and SAs
and is in between NIX block and RPM LMAC. CN10K-B silicon has only
one MCS block which receives packets from all LMACS whereas
CNF10K-B has seven MCS blocks for seven LMACs. Both MCS blocks are
similar in operation except for few register offsets and some
configurations require writing to different registers. This patchset
introduces macsec hardware offloading support. AF driver manages hardware
resources and PF driver consumes them when macsec hardware offloading
is needed.
Patch 1 adds basic pci driver for both CN10K-B and CNF10K-B
silicons and initializes hardware block.
Patches 2 and 3 adds mailboxes to init, reset and manage
resources of the MCS block
Patch 4 adds a low priority rule in MCS TCAM so that the
traffic which do not need macsec processing can be sent/received
Patch 5 adds macsec stats collection support
Patch 6 adds interrupt handling support and any event in which
AF consumer is interested can be notified via mbox notification
Patch 7 adds debugfs support which helps in debugging packet
path
Patch 8 introduces macsec hardware offload feature for
PF netdev driver.
v3 changes:
Fixed clang and sparse warnings
v2 changes:
Fix build error by changing #ifdef CONFIG_MACSEC to
#if IS_ENABLED(CONFIG_MACSEC)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Subbaraya Sundeep [Sat, 1 Oct 2022 04:59:49 +0000 (10:29 +0530)]
octeontx2-pf: mcs: Introduce MACSEC hardware offloading
This patch introduces the macsec offload feature to cn10k
PF netdev driver. The macsec offload ops like adding, deleting
and updating SecYs, SCs, SAs and stats are supported. XPN support
will be added in later patches. Some stats use same counter in hardware
which means based on the SecY mode the same counter represents different
stat. Hence when SecY mode/policy is changed then snapshot of current
stats are captured. Also there is no provision to specify the unique
flow-id/SCI per packet to hardware hence different mac address needs to
be set for macsec interfaces.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Sat, 1 Oct 2022 04:59:48 +0000 (10:29 +0530)]
octeontx2-af: cn10k: mcs: Add debugfs support
This patch adds debugfs entry to dump MCS secy, sc,
sa, flowid and port stats. This helps in debugging
the packet path and to figure out where exactly packet
was dropped.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Sat, 1 Oct 2022 04:59:47 +0000 (10:29 +0530)]
octeontx2-af: cn10k: mcs: Handle MCS block interrupts
Hardware triggers an interrupt for events like PN wrap to zero,
PN crosses set threshold. This interrupt is received
by the MCS_AF. MCS AF then finds the PF/VF to which SA is mapped
and notifies them using mcs_intr_notify mbox message.
PF/VF using mcs_intr_cfg mbox can configure the list
of interrupts for which they want to receive the
notification from AF.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Sat, 1 Oct 2022 04:59:46 +0000 (10:29 +0530)]
octeontx2-af: cn10k: mcs: Support for stats collection
Add mailbox messages to return the resource stats to the
caller. Stats of SecY, SC and SAs as per the macsec standard,
TCAM flow id hits/miss, mailbox to clear the stats are
implemented.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Ankur Dwivedi <adwivedi@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Sat, 1 Oct 2022 04:59:45 +0000 (10:29 +0530)]
octeontx2-af: cn10k: mcs: Install a default TCAM for normal traffic
Out of all the TCAM entries, reserve last TX and RX TCAM flow
entry(low priority) so that normal traffic can be sent out and
received. The traffic which needs macsec processing hits the
high priority TCAM flows. Also install a FLR handler to free
the allocated resources for PF/VF.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Sat, 1 Oct 2022 04:59:44 +0000 (10:29 +0530)]
octeontx2-af: cn10k: mcs: Manage the MCS block hardware resources
To establish a macsec connection association netdev driver
needs hardware resources like SecY, TCAM flows, SCs and SAs.
This patch manages allocating, freeing and configuring those
resources. AF consumers can request resources and configure them
via these mailbox messages. AF can allocate until it runs out of
hardware resources.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Sat, 1 Oct 2022 04:59:43 +0000 (10:29 +0530)]
octeontx2-af: cn10k: mcs: Add mailboxes for port related operations
There are set of configurations to be done at MCS port level like
bringing port out of reset, making port as operational or bypass.
This patch adds all the port related mailbox message handlers
so that AF consumers can use them.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Sat, 1 Oct 2022 04:59:42 +0000 (10:29 +0530)]
octeontx2-af: cn10k: Introduce driver for macsec block.
CN10K-B and CNF10K-B has macsec block(MCS) to encrypt and
decrypt packets at MAC level. This block is a global resource
with hardware resources like SecYs, SCs and SAs and is in
between NIX block and RPM LMAC. CN10K-B silicon has only one MCS
block which receives packets from all LMACS whereas CNF10K-B has
seven MCS blocks for seven LMACs. Both MCS blocks are
similar in operation except for few register offsets and some
configurations require writing to different registers. Those
differences between IPs are handled using separate ops.
This patch adds basic driver and does the initial hardware
calibration and parser configuration.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Wang [Fri, 30 Sep 2022 17:57:25 +0000 (01:57 +0800)]
eth: sp7021: fix use after free bug in spl2sw_nvmem_get_mac_address
This frees "mac" and tries to display its address as part of the error
message on the next line. Swap the order.
Fixes:
fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Oct 2022 11:46:47 +0000 (12:46 +0100)]
Merge branch 'lan966x-police-mirroring'
Horatiu Vultur says:
====================
net: lan966x: Add police and mirror using tc-matchall
Add tc-matchall classifier offload support both for ingress and egress.
For this add support for the port police and port mirroring action support.
Port police can happen only on ingress while port mirroring is supported
both on ingress and egress
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Fri, 30 Sep 2022 08:35:40 +0000 (10:35 +0200)]
net: lan966x: Add port mirroring support using tc-matchall
Add support for port mirroring. It is possible to mirror only one port
at a time and it is possible to have both ingress and egress mirroring.
Frames injected by the CPU don't get egress mirrored because they are
bypassing the analyzer module.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Fri, 30 Sep 2022 08:35:39 +0000 (10:35 +0200)]
net: lan966x: Add port police support using tc-matchall
Add support for port police. It is possible to police only on the
ingress side. To be able to add police support also it was required to
add tc-matchall classifier offload support.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shenwei Wang [Fri, 30 Sep 2022 20:44:27 +0000 (15:44 -0500)]
net: fec: using page pool to manage RX buffers
This patch optimizes the RX buffer management by using the page
pool. The purpose for this change is to prepare for the following
XDP support. The current driver uses one frame per page for easy
management.
Added __maybe_unused attribute to the following functions to avoid
the compiling warning. Those functions will be removed by a separate
patch once this page pool solution is accepted.
- fec_enet_new_rxbdp
- fec_enet_copybreak
The following are the comparing result between page pool implementation
and the original implementation (non page pool).
--- small packet (64 bytes) testing are almost the same
--- no matter what the implementation is
--- on both i.MX8 and i.MX6SX platforms.
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 39728 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 37.0 MBytes 311 Mbits/sec
[ 1] 1.0000-2.0000 sec 36.6 MBytes 307 Mbits/sec
[ 1] 2.0000-3.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 3.0000-4.0000 sec 37.1 MBytes 312 Mbits/sec
[ 1] 4.0000-5.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 5.0000-6.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 6.0000-7.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 7.0000-8.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 0.0000-8.0943 sec 299 MBytes 310 Mbits/sec
--- Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 43204 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 1.0000-2.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 2.0000-3.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 3.0000-4.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 5.0000-6.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 6.0000-7.0000 sec 111 MBytes 931 Mbits/sec
[ 1] 7.0000-8.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 8.0000-9.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 9.0000-10.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 0.0000-10.0077 sec 1.09 GBytes 933 Mbits/sec
--- Non Page Pool implementation on i.MX8 ----
shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec
[ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec
[ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec
[ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec
[ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec
--- Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 57288 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 78.8 MBytes 661 Mbits/sec
[ 1] 1.0000-2.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 2.0000-3.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 3.0000-4.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 4.0000-5.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 5.0000-6.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 6.0000-7.0000 sec 82.5 MBytes 692 Mbits/sec
[ 1] 7.0000-8.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 8.0000-9.0000 sec 82.4 MBytes 691 Mbits/sec
[ 1] 9.0000-9.5506 sec 45.0 MBytes 686 Mbits/sec
[ 1] 0.0000-9.5506 sec 783 MBytes 688 Mbits/sec
--- Non Page Pool implementation on i.MX6SX ----
shenwei@5810:~/pktgen$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 36486 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 70.5 MBytes 591 Mbits/sec
[ 1] 1.0000-2.0000 sec 64.5 MBytes 541 Mbits/sec
[ 1] 2.0000-3.0000 sec 73.6 MBytes 618 Mbits/sec
[ 1] 3.0000-4.0000 sec 73.6 MBytes 618 Mbits/sec
[ 1] 4.0000-5.0000 sec 72.9 MBytes 611 Mbits/sec
[ 1] 5.0000-6.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 6.0000-7.0000 sec 73.5 MBytes 617 Mbits/sec
[ 1] 7.0000-8.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 8.0000-9.0000 sec 73.4 MBytes 616 Mbits/sec
[ 1] 9.0000-10.0000 sec 73.9 MBytes 620 Mbits/sec
[ 1] 0.0000-10.0174 sec 723 MBytes 605 Mbits/sec
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 30 Sep 2022 14:37:30 +0000 (16:37 +0200)]
net: Remove DECnet leftovers from flow.h.
DECnet was removed by commit
1202cdd66531 ("Remove DECnet support from
kernel"). Let's also revome its flow structure.
Compile-tested only (allmodconfig).
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianglei Nie [Fri, 30 Sep 2022 06:28:43 +0000 (14:28 +0800)]
bnx2x: fix potential memory leak in bnx2x_tpa_stop()
bnx2x_tpa_stop() allocates a memory chunk from new_data with
bnx2x_frag_alloc(). The new_data should be freed when gets some error.
But when "pad + len > fp->rx_buf_size" is true, bnx2x_tpa_stop() returns
without releasing the new_data, which will lead to a memory leak.
We should free the new_data with bnx2x_frag_free() when "pad + len >
fp->rx_buf_size" is true.
Fixes:
07b0f00964def8af9321cfd6c4a7e84f6362f728 ("bnx2x: fix possible panic under memory stress")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Fri, 30 Sep 2022 09:27:40 +0000 (14:57 +0530)]
eth: lan743x: reject extts for non-pci11x1x devices
Remove PTP_PF_EXTTS support for non-PCI11x1x devices since they do not support
the PTP-IO Input event triggered timestamping mechanisms added
Fixes:
60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coco Li [Fri, 30 Sep 2022 22:09:05 +0000 (15:09 -0700)]
gro: add support of (hw)gro packets to gro stack
Current GRO stack only supports incoming packets containing
one frame/MSS.
This patch changes GRO to accept packets that are already GRO.
HW-GRO (aka RSC for some vendors) is very often limited in presence
of interleaved packets. Linux SW GRO stack can complete the job
and provide larger GRO packets, thus reducing rate of ACK packets
and cpu overhead.
This also means BIG TCP can still be used, even if HW-GRO/RSC was
able to cook ~64 KB GRO packets.
v2: fix logic in tcp_gro_receive()
Only support TCP for the moment (Paolo)
Co-Developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiasheng Jiang [Fri, 30 Sep 2022 04:48:43 +0000 (12:48 +0800)]
net: prestera: acl: Add check for kmemdup
As the kemdup could return NULL, it should be better to check the return
value and return error if fails.
Moreover, the return value of prestera_acl_ruleset_keymask_set() should
be checked by cascade.
Fixes:
604ba230902d ("net: prestera: flower template support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Taras Chornyi<tchornyi@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Oct 2022 10:18:53 +0000 (11:18 +0100)]
Merge branch 'mptcp-fastclose'
Mat Martineau says:
====================
mptcp: Fastclose edge cases and error handling
MPTCP has existing code to use the MP_FASTCLOSE option header, which
works like a RST for the MPTCP-level connection (regular RSTs only
affect specific subflows in MPTCP). This series has some improvements
for fastclose.
Patch 1 aligns fastclose socket error handling with TCP RST behavior on
TCP sockets.
Patch 2 adds use of MP_FASTCLOSE in some more edge cases, like file
descriptor close, FIN_WAIT timeout, and when the socket has unread data.
Patch 3 updates the fastclose self tests.
Patch 4 does not change any code, just fixes some outdated comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 30 Sep 2022 15:59:34 +0000 (08:59 -0700)]
mptcp: update misleading comments.
The MPTCP data path is quite complex and hard to understend even
without some foggy comments referring to modified code and/or
completely misleading from the beginning.
Update a few of them to more accurately describing the current
status.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 30 Sep 2022 15:59:33 +0000 (08:59 -0700)]
selftests: mptcp: update and extend fastclose test-cases
After the previous patches, the MPTCP protocol can generate
fast-closes on both ends of the connection. Rework the relevant
test-case to carefully trigger the fast-close code-path on a
single end at the time, while ensuring than a predictable amount
of data is spooled on both ends.
Additionally add another test-cases for the passive socket
fast-close.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 30 Sep 2022 15:59:32 +0000 (08:59 -0700)]
mptcp: use fastclose on more edge scenarios
Daire reported a user-space application hang-up when the
peer is forcibly closed before the data transfer completion.
The relevant application expects the peer to either
do an application-level clean shutdown or a transport-level
connection reset.
We can accommodate a such user by extending the fastclose
usage: at fd close time, if the msk socket has some unread
data, and at FIN_WAIT timeout.
Note that at MPTCP close time we must ensure that the TCP
subflows will reset: set the linger socket option to a suitable
value.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 30 Sep 2022 15:59:31 +0000 (08:59 -0700)]
mptcp: propagate fastclose error
When an mptcp socket is closed due to an incoming FASTCLOSE
option, so specific sk_err is set and later syscall will
fail usually with EPIPE.
Align the current fastclose error handling with TCP reset,
properly setting the socket error according to the current
msk state and propagating such error.
Additionally sendmsg() is currently not handling properly
the sk_err, always returning EPIPE.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Oct 2022 10:08:33 +0000 (11:08 +0100)]
Merge branch 'RollBall-Hilink-Turris-10G-copper-SFP-support'
Marek Behún says:
====================
RollBall / Hilink / Turris 10G copper SFP support
I am resurrecting my attempt to add support for RollBall / Hilink /
Turris 10G copper SFPs modules.
The modules contain Marvell 88X3310 PHY, which can communicate with
the system via sgmii, 2500base-x, 5gbase-r, 10gbase-r or usxgmii mode.
Some of the patches I've taken from Russell King's net-queue [1]
(with some rebasing).
The important change from my previous attempts are:
- I am including the changes needed to phylink and marvell10g driver,
so that the 88X3310 PHY is configured to use PHY modes supported by
the host (the PHY defaults to use 10gbase-r only on host's side)
- I have changed the patch that informs phylib about the interfaces
supported by the host (patch 5 of this series): it now fills in the
phydev->host_interfaces member only when connecting a PHY that is
inside a SFP module. This may change in the future.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 30 Sep 2022 14:21:10 +0000 (16:21 +0200)]
net: sfp: add support for multigig RollBall transceivers
This adds support for multigig copper SFP modules from RollBall/Hilink.
These modules have a specific way to access clause 45 registers of the
internal PHY.
We also need to wait at least 22 seconds after deasserting TX disable
before accessing the PHY. The code waits for 25 seconds just to be sure.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 30 Sep 2022 14:21:09 +0000 (16:21 +0200)]
net: phy: mdio-i2c: support I2C MDIO protocol for RollBall SFP modules
Some multigig SFPs from RollBall and Hilink do not expose functional
MDIO access to the internal PHY of the SFP via I2C address 0x56
(although there seems to be read-only clause 22 access on this address).
Instead these SFPs PHY can be accessed via I2C via the SFP Enhanced
Digital Diagnostic Interface - I2C address 0x51. The SFP_PAGE has to be
selected to 3 and the password must be filled with 0xff bytes for this
PHY communication to work.
This extends the mdio-i2c driver to support this protocol by adding a
special parameter to mdio_i2c_alloc function via which this RollBall
protocol can be selected.
Signed-off-by: Marek Behún <kabel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 30 Sep 2022 14:21:08 +0000 (16:21 +0200)]
net: sfp: create/destroy I2C mdiobus before PHY probe/after PHY release
Instead of configuring the I2C mdiobus when SFP driver is probed,
create/destroy the mdiobus before the PHY is probed for/after it is
released.
This way we can tell the mdio-i2c code which protocol to use for each
SFP transceiver.
Move the code that determines MDIO I2C protocol from
sfp_sm_probe_for_phy() to sfp_sm_mod_probe(), where most of the SFP ID
parsing is done. Don't allocate I2C bus if no PHY is expected.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 30 Sep 2022 14:21:07 +0000 (16:21 +0200)]
net: sfp: Add and use macros for SFP quirks definitions
Add macros SFP_QUIRK(), SFP_QUIRK_M() and SFP_QUIRK_F() for defining SFP
quirk table entries. Use them to deduplicate the code a little bit.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 30 Sep 2022 14:21:06 +0000 (16:21 +0200)]
net: phylink: allow attaching phy for SFP modules on 802.3z mode
Some SFPs may contain an internal PHY which may in some cases want to
connect with the host interface in 1000base-x/2500base-x mode.
Do not fail if such PHY is being attached in one of these PHY interface
modes.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 30 Sep 2022 14:21:05 +0000 (16:21 +0200)]
net: phy: marvell10g: select host interface configuration
Select the host interface configuration according to the capabilities of
the host if the host provided them. This is currently provided only when
connecting PHY that is inside a SFP.
The PHY supports several configurations of host communication:
- always communicate with host in 10gbase-r, even if copper speed is
lower (rate matching mode),
- the same as above but use xaui/rxaui instead of 10gbase-r,
- switch host SerDes mode between 10gbase-r, 5gbase-r, 2500base-x and
sgmii according to copper speed,
- the same as above but use xaui/rxaui instead of 10gbase-r.
This mode of host communication, called MACTYPE, is by default selected
by strapping pins, but it can be changed in software.
This adds support for selecting this mode according to which modes are
supported by the host.
This allows the kernel to:
- support SFP modules with 88X33X0 or 88E21X0 inside them
Note: we use mv3310_select_mactype() for both 88X3310 and 88X3340,
although 88X3340 does not support XAUI. This is not a problem because
88X3340 does not declare XAUI in it's supported_interfaces, and so this
function will never choose that MACTYPE.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
[ rebase, updated, also added support for 88E21X0 ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 30 Sep 2022 14:21:04 +0000 (16:21 +0200)]
net: phy: marvell10g: Use tabs instead of spaces for indentation
Some register definitions were defined with spaces used for indentation.
Change them to tabs.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 30 Sep 2022 14:21:03 +0000 (16:21 +0200)]
net: phylink: pass supported host PHY interface modes to phylib for SFP's PHYs
Pass the supported PHY interface types to phylib if the PHY we are
connecting is inside a SFP, so that the PHY driver can select an
appropriate host configuration mode for their interface according to
the host capabilities.
For example the Marvell 88X3310 PHY inside RollBall SFP modules
defaults to 10gbase-r mode on host's side, and the marvell10g
driver currently does not change this setting. But a host may not
support 10gbase-r. For example Turris Omnia only supports sgmii,
1000base-x and 2500base-x modes. The PHY can be configured to use
those modes, but in order for the PHY driver to do that, it needs
to know which modes are supported.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Fri, 30 Sep 2022 14:21:02 +0000 (16:21 +0200)]
net: phylink: rename phylink_sfp_config()
phylink_sfp_config() now only deals with configuring the MAC for a
SFP containing a PHY. Rename it to be specific.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 30 Sep 2022 14:21:01 +0000 (16:21 +0200)]
net: phylink: use phy_interface_t bitmaps for optical modules
Where a MAC provides a phy_interface_t bitmap, use these bitmaps to
select the operating interface mode for optical SFP modules, rather
than using the linkmode bitmaps.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 30 Sep 2022 14:21:00 +0000 (16:21 +0200)]
net: sfp: augment SFP parsing with phy_interface_t bitmap
We currently parse the SFP EEPROM to a bitmap of ethtool link modes,
and then attempt to convert the link modes to a PHY interface mode.
While this works at present, there are cases where this is sub-optimal.
For example, where a module can operate with several different PHY
interface modes.
To start addressing this, arrange for the SFP EEPROM parsing to also
provide a bitmap of the possible PHY interface modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Fri, 30 Sep 2022 14:20:59 +0000 (16:20 +0200)]
net: phylink: add ability to validate a set of interface modes
Rather than having the ability to validate all supported interface
modes or a single interface mode, introduce the ability to validate
a subset of supported modes.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[ rebased on current net-next ]
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Huckleberry [Thu, 29 Sep 2022 18:19:47 +0000 (11:19 -0700)]
net: sparx5: Fix return type of sparx5_port_xmit_impl
The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).
The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.
The return type of sparx5_port_xmit_impl should be changed from int to
netdev_tx_t.
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Thu, 29 Sep 2022 15:52:04 +0000 (08:52 -0700)]
af_unix: Fix memory leaks of the whole sk due to OOB skb.
syzbot reported a sequence of memory leaks, and one of them indicated we
failed to free a whole sk:
unreferenced object 0xffff8880126e0000 (size 1088):
comm "syz-executor419", pid 326, jiffies
4294773607 (age 12.609s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 7d 00 00 00 00 00 00 00 ........}.......
01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............
backtrace:
[<
000000006fefe750>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:1970
[<
0000000074006db5>] sk_alloc+0x3b/0x800 net/core/sock.c:2029
[<
00000000728cd434>] unix_create1+0xaf/0x920 net/unix/af_unix.c:928
[<
00000000a279a139>] unix_create+0x113/0x1d0 net/unix/af_unix.c:997
[<
0000000068259812>] __sock_create+0x2ab/0x550 net/socket.c:1516
[<
00000000da1521e1>] sock_create net/socket.c:1566 [inline]
[<
00000000da1521e1>] __sys_socketpair+0x1a8/0x550 net/socket.c:1698
[<
000000007ab259e1>] __do_sys_socketpair net/socket.c:1751 [inline]
[<
000000007ab259e1>] __se_sys_socketpair net/socket.c:1748 [inline]
[<
000000007ab259e1>] __x64_sys_socketpair+0x97/0x100 net/socket.c:1748
[<
000000007dedddc1>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<
000000007dedddc1>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<
000000009456679f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
We can reproduce this issue by creating two AF_UNIX SOCK_STREAM sockets,
send()ing an OOB skb to each other, and close()ing them without consuming
the OOB skbs.
int skpair[2];
socketpair(AF_UNIX, SOCK_STREAM, 0, skpair);
send(skpair[0], "x", 1, MSG_OOB);
send(skpair[1], "x", 1, MSG_OOB);
close(skpair[0]);
close(skpair[1]);
Currently, we free an OOB skb in unix_sock_destructor() which is called via
__sk_free(), but it's too late because the receiver's unix_sk(sk)->oob_skb
is accounted against the sender's sk->sk_wmem_alloc and __sk_free() is
called only when sk->sk_wmem_alloc is 0.
In the repro sequences, we do not consume the OOB skb, so both two sk's
sock_put() never reach __sk_free() due to the positive sk->sk_wmem_alloc.
Then, no one can consume the OOB skb nor call __sk_free(), and we finally
leak the two whole sk.
Thus, we must free the unconsumed OOB skb earlier when close()ing the
socket.
Fixes:
314001f0bf92 ("af_unix: Add OOB support")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Oct 2022 06:59:06 +0000 (07:59 +0100)]
Merge branch 'ip_tunnel-netlink-parms'
Liu Jian says:
====================
Add helper functions to parse netlink msg of ip_tunnel
v1->v2: Move the implementation of the helper function to ip_tunnel_core.c
v2->v3: Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Liu Jian [Thu, 29 Sep 2022 13:52:03 +0000 (21:52 +0800)]
net: Add helper function to parse netlink msg of ip_tunnel_parm
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm.
Reduces duplicate code, no actual functional changes.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Liu Jian [Thu, 29 Sep 2022 13:52:02 +0000 (21:52 +0800)]
net: Add helper function to parse netlink msg of ip_tunnel_encap
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap.
Reduces duplicate code, no actual functional changes.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tetsuo Handa [Wed, 28 Sep 2022 15:25:37 +0000 (00:25 +0900)]
net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
syzbot is reporting lockdep warning at rds_tcp_reset_callbacks() [1], for
commit
ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in
rds_tcp_reset_callbacks()") added cancel_delayed_work_sync() into a section
protected by lock_sock() without realizing that rds_send_xmit() might call
lock_sock().
We don't need to protect cancel_delayed_work_sync() using lock_sock(), for
even if rds_{send,recv}_worker() re-queued this work while __flush_work()
from cancel_delayed_work_sync() was waiting for this work to complete,
retried rds_{send,recv}_worker() is no-op due to the absence of RDS_CONN_UP
bit.
Link: https://syzkaller.appspot.com/bug?extid=78c55c7bc6f66e53dce2
Reported-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com>
Fixes:
ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Oct 2022 06:46:08 +0000 (07:46 +0100)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
1) Refactor selftests to use an array of structs in xfrm_fill_key().
From Gautam Menghani.
2) Drop an unused argument from xfrm_policy_match.
From Hongbin Wang.
3) Support collect metadata mode for xfrm interfaces.
From Eyal Birger.
4) Add netlink extack support to xfrm.
From Sabrina Dubroca.
Please note, there is a merge conflict in:
include/net/dst_metadata.h
between commit:
0a28bfd4971f ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")
from the net-next tree and commit:
5182a5d48c3d ("net: allow storing xfrm interface metadata in metadata_dst")
from the ipsec-next tree.
Can be solved as done in linux-next.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 2 Oct 2022 21:09:07 +0000 (14:09 -0700)]
Linux 6.0
Linus Torvalds [Sun, 2 Oct 2022 17:00:45 +0000 (10:00 -0700)]
Merge tag 'i2c-for-6.0-rc8' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Add missing DT bindings for STM32 and a resource leak fix for DaVinci"
* tag 'i2c-for-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: davinci: fix PM disable depth imbalance in davinci_i2c_probe
dt-bindings: i2c: st,stm32-i2c: Document wakeup-source property
dt-bindings: i2c: st,stm32-i2c: Document interrupt-names property
Linus Torvalds [Sun, 2 Oct 2022 16:41:27 +0000 (09:41 -0700)]
Merge tag 'perf-urgent-2022-10-02' of git://git./linux/kernel/git/tip/tip
Pull misc perf fixes from Ingo Molnar:
- Fix a PMU enumeration/initialization bug on Intel Alder Lake CPUs
- Fix KVM guest PEBS register handling
- Fix race/reentry bug in perf_output_read_group() reading of PMU
counters
* tag 'perf-urgent-2022-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix reentry problem in perf_output_read_group()
perf/x86/core: Completely disable guest PEBS via guest's global_ctrl
perf/x86/intel: Fix unchecked MSR access error for Alder Lake N
Linus Torvalds [Sun, 2 Oct 2022 16:30:35 +0000 (09:30 -0700)]
Merge tag 'x86_urgent_for_v6.0' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add the respective UP last level cache mask accessors in order not to
cause segfaults when lscpu accesses their representation in sysfs
- Fix for a race in the alternatives batch patching machinery when
kprobes are set
* tag 'x86_urgent_for_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cacheinfo: Add a cpu_llc_shared_mask() UP variant
x86/alternative: Fix race in try_get_desc()
David S. Miller [Sun, 2 Oct 2022 15:07:17 +0000 (16:07 +0100)]
Merge branch 'tc-bind_class-hook'
Zhengchao Shao says:
====================
refactor duplicate codes in bind_class hook function
All the bind_class callback duplicate the same logic, so we can refactor
them. First, ensure n arg not empty before call bind_class hook function.
Then, add tc_cls_bind_class() helper. Last, use tc_cls_bind_class() in
filter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhengchao Shao [Tue, 27 Sep 2022 12:48:55 +0000 (20:48 +0800)]
net: sched: use tc_cls_bind_class() in filter
Use tc_cls_bind_class() in filter.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhengchao Shao [Tue, 27 Sep 2022 12:48:54 +0000 (20:48 +0800)]
net: sched: cls_api: introduce tc_cls_bind_class() helper
All the bind_class callback duplicate the same logic, this patch
introduces tc_cls_bind_class() helper for common usage.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhengchao Shao [Tue, 27 Sep 2022 12:48:53 +0000 (20:48 +0800)]
net: sched: ensure n arg not empty before call bind_class
All bind_class callbacks are directly returned when n arg is empty.
Therefore, bind_class is invoked only when n arg is not empty.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Qilong [Thu, 29 Sep 2022 14:30:38 +0000 (22:30 +0800)]
i2c: davinci: fix PM disable depth imbalance in davinci_i2c_probe
The pm_runtime_enable will increase power disable depth. Thus a
pairing decrement is needed on the error handling path to keep
it balanced according to context.
Fixes:
17f88151ff190 ("i2c: davinci: Add PM Runtime Support")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Marek Vasut [Mon, 26 Sep 2022 20:46:53 +0000 (22:46 +0200)]
dt-bindings: i2c: st,stm32-i2c: Document wakeup-source property
Document wakeup-source property. This fixes dtbs_check warnings
when building current Linux DTs:
"
arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dtb: i2c@
40015000: Unevaluated properties are not allowed ('wakeup-source' was unexpected)
"
Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Marek Vasut [Mon, 26 Sep 2022 20:46:31 +0000 (22:46 +0200)]
dt-bindings: i2c: st,stm32-i2c: Document interrupt-names property
Document interrupt-names property with "event" and "error" interrupt names.
This fixes dtbs_check warnings when building current Linux DTs:
"
arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dtb: i2c@
40015000: Unevaluated properties are not allowed ('interrupt-names' was unexpected)
"
Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Jakub Kicinski [Sat, 1 Oct 2022 20:30:25 +0000 (13:30 -0700)]
Merge branch 'mlx5-xsk-updates-part3-2022-09-30'
Saeed Mahameed says:
====================
mlx5 xsk updates part3 2022-09-30
The gist of this 4 part series is in this patchset's last patch
This series contains performance optimizations. XSK starts using the
batching allocator, and XSK data path gets separated from the regular
RX, allowing to drop some branches not relevant for non-XSK use cases.
Some minor optimizations for indirect calls and need_wakeup are also
included.
Other than that, this series adds a few features to the mlx5e
implementation of XSK:
1. XDP metadata support on XSK RQs.
2. RSS contexts support for XSK RQs.
3. Some other optimizations
4. Last but not least, change the queuing scheme, so that XSK RQs no longer
use higher indices, but replace the regular RQs.
Maxim Says:
==========
In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS work the
same for regular traffic, without need to reconfigure RSS to exclude XSK
queues.
However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with other vendors. Some tools don't properly support
using higher indices for XSK queues, some tools get confused with the
double amount of RQs exposed in sysfs. Some use cases are purely XSK,
and allocating the same amount of unused regular RQs is a waste of
resources.
This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory save by flushing the buffers when the regular RQ is unused.
As the result of this transition:
1. It's possible to use RSS contexts over XSK RQs.
2. It's possible to dedicate all queues to XSK.
3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
settings up the XDP program to return XDP_PASS for non-XSK traffic.
4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.
==========
Part 4 will include some final xsk optimizations and minor improvements
part 1: https://lore.kernel.org/netdev/
20220927203611.244301-1-saeed@kernel.org/
part 2: https://lore.kernel.org/netdev/
20220929072156.93299-1-saeed@kernel.org/
====================
Link: https://lore.kernel.org/r/20220930162903.62262-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:29:03 +0000 (09:29 -0700)]
net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues
In the initial implementation of XSK in mlx5e, XSK RQs coexisted with
regular RQs in the same channel. The main idea was to allow RSS work the
same for regular traffic, without need to reconfigure RSS to exclude XSK
queues.
However, this scheme didn't prove to be beneficial, mainly because of
incompatibility with other vendors. Some tools don't properly support
using higher indices for XSK queues, some tools get confused with the
double amount of RQs exposed in sysfs. Some use cases are purely XSK,
and allocating the same amount of unused regular RQs is a waste of
resources.
This commit changes the queuing scheme to the standard one, where XSK
RQs replace regular RQs on the channels where XSK sockets are open. Two
RQs still exist in the channel to allow failsafe disable of XSK, but
only one is exposed at a time. The next commit will achieve the desired
memory save by flushing the buffers when the regular RQ is unused.
As the result of this transition:
1. It's possible to use RSS contexts over XSK RQs.
2. It's possible to dedicate all queues to XSK.
3. When XSK RQs coexist with regular RQs, the admin should make sure no
unwanted traffic goes into XSK RQs by either excluding them from RSS or
settings up the XDP program to return XDP_PASS for non-XSK traffic.
4. When using a mixed fleet of mlx5e devices and other netdevs, the same
configuration can be applied. If the application supports the fallback
to copy mode on unsupported drivers, it will work too.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:29:02 +0000 (09:29 -0700)]
net/mlx5e: Introduce the mlx5e_flush_rq function
Add a function to flush an RQ: clean up descriptors, release pages and
reset the RQ. This procedure is used by the recovery flow, and it will
also be used in a following commit to free some memory when switching a
channel to the XSK mode.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:29:01 +0000 (09:29 -0700)]
net/mlx5e: xsk: Support XDP metadata on XSK RQs
Add support for XDP metadata on XSK RQs for cross-program
communication. The driver no longer calls xdp_set_data_meta_invalid and
copies the metadata to a newly allocated SKB on XDP_PASS.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:29:00 +0000 (09:29 -0700)]
net/mlx5e: Optimize RQ page deallocation
mlx5e_free_rx_mpwqe loops over all pages of a MPWQE, calling
mlx5e_page_release for ones that are not scheduled for XDP_TX or
XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a
regular one for each page/XSK frame. This check can be moved outside the
loop to reduce the number of branches.
mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release
for the ones that are last in a page; and mlx5e_page_release checks
whether it's an XSK RQ or a regular one for each fragment. Using the
fact that XSK doesn't support multiple fragments, it can be optimized
for both XSK and regular usages:
1. Make an early check for XSK and call its deallocator directly, saving
3 branches (loop condition, frag->last_in_page and selection of
deallocator).
2. Call the regular deallocator directly in the non-XSK case, saving a
branch per fragment, except the first one.
After the changes, mlx5e_page_release is removed, as there are no
callers left.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:59 +0000 (09:28 -0700)]
net/mlx5e: Call mlx5e_page_release_dynamic directly where possible
mlx5e_page_release calls the appropriate deallocator depending on
whether it's an XSK RQ or a regular one. Some flows that call this
function are not compatible with XSK, so they can call the non-XSK
deallocator directly to save a branch.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:58 +0000 (09:28 -0700)]
net/mlx5e: Use non-XSK page allocator in SHAMPO
The SHAMPO flow is not compatible with XSK, it can call the page pool
allocator directly to save a branch.
mlx5e_page_alloc is removed, as it's no longer used in any flow.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:57 +0000 (09:28 -0700)]
net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on striding RQ and
creates an optimized flow for XSK. A side effect is an opportunity to
optimize the regular RX flow by dropping branching for XSK cases.
Performance improvement is up to 6.4% in the aligned mode and up to 7.5%
in the unaligned mode.
Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps
Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps
Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps
Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps
Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:56 +0000 (09:28 -0700)]
net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ
XSK provides a function to allocate frames in batches for more efficient
processing. This commit starts using this function on legacy RQ, adding
a special case for XSK. The new branch introduced basically replaces the
branch that was removed from the same place a few commits before.
A check is made that DMA sync is not needed, because the batching
allocator falls back to returning one frame when DMA sync is needed, and
this is best handled by the loop in the standard case.
Performance improvement is up to 8% in the aligned mode and up to 9% in
the unaligned mode.
Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps
Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps
Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps
Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps
Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:55 +0000 (09:28 -0700)]
net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ
Allocation of XSK frames on legacy RQ may be made more efficient with a
specialized routine that relies on certain assumptions, such as there is
only one fragment, allocation units (XSK frames) are not shared among
multiple packets. It reduces the number of branches both in the XSK code
and in the regular RQ, because with this approach there is only a single
check whether it's an XSK or regular RQ.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:54 +0000 (09:28 -0700)]
net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs
Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As
partial batches are allowed, there is no point to have a loop in a loop,
so the outer loop is removed, and the batch size is increased up to the
total number of WQEs to allocate, still not smaller than 8.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:53 +0000 (09:28 -0700)]
net/mlx5e: xsk: Use partial batches in legacy RQ with XSK
The previous commit allowed allocating WQE batches in legacy RQ
partially, however, XSK still checks whether there are enough frames in
the fill ring. Remove this check to allow to allocate batches partially
also with XSK.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:52 +0000 (09:28 -0700)]
net/mlx5e: Use partial batches in legacy RQ
Legacy RQ allocates WQEs in batches. If the batch allocation fails, the
pages of the allocated part are released. This commit changes this
behavior to allow to use the pages that have been already allocated.
After this change, we need to be careful about indexing rq->wqe.frags[].
The WQ size is a power of two that divides by wqe_bulk (8), and the old
code used whole bulks, which allowed to use indices [8*K; 8*K+7] without
overflowing. Now that the bulks may be partial, the range can start at
any location (not only at 8*K), so we need to wrap them around to avoid
out-of-bounds array access.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:51 +0000 (09:28 -0700)]
net/mlx5e: Make the wqe_index_mask calculation more exact
The old calculation of wqe_index_mask may give false positives, i.e.
request bulking of pairs of WQEs when not strictly needed, for example,
when the first fragment size is equal to the PAGE_SIZE, bulking is not
needed, even if the number of fragments is odd.
Make the calculation more exact to cut false positives.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:50 +0000 (09:28 -0700)]
net/mlx5e: Introduce wqe_index_mask for legacy RQ
When fragments of different WQEs share the same page, mlx5e_post_rx_wqes
must wait until the old WQE stops using the page, only then the new WQE
can allocate the new page. Essentially, it means that if WQE index i is
still in use, the allocation must stop before `i % bulk`, where bulk is
the number of WQEs that may share the same page.
As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the
new wqe_index_mask field will be equal to `bulk - 1`.
At the same time, wqe_bulk remains for optimization purposes and stores
`max(bulk, 8)`, which allows to skip the allocation until we have at
least 8 WQEs free.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:49 +0000 (09:28 -0700)]
net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup
The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates
that XSK queues are open, but not necessarily activated. This check is
not very useful, because:
0. Both XSK setup and netdev state transitions take the same state_lock
mutex, so they can't happen at the same time.
1. If the netdev is up, xsk_is_bound can return true only when
MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel.
mlx5e_xsk_wakeup is only called when xsk_is_bound is true.
2. If the XSK socket is bound, and the netdev is going up or down,
mlx5e_xsk_wakeup can take one of two branches, depending on the return
value of napi_if_scheduled_mark_missed:
2.1. True means one of two things: either NAPI was enabled at this
point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was
disabled, and nothing really happened.
2.2. False means that NAPI was enabled by this point, which also implies
MLX5E_CHANNEL_STATE_XSK was set. Additionally, mlx5e_xsk_wakeup contains
a following check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this
flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels.
As checking this flag doesn't cut any flows, remove the check.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Fri, 30 Sep 2022 16:28:48 +0000 (09:28 -0700)]
net/mlx5e: xsk: Use mlx5e_trigger_napi_icosq for XSK wakeup
mlx5e_xsk_wakeup triggers an IRQ by posting a NOP to async_icosq, taking
a spinlock to protect from concurrent access. There is already a
function that does the same: mlx5e_trigger_napi_icosq. Use this function
in mlx5e_xsk_wakeup.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Sat, 1 Oct 2022 16:39:42 +0000 (09:39 -0700)]
Merge tag 'usb-6.0-final' of git://git./linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some tiny USB and Thunderbolt driver fixes and quirks.
Included in here are:
- three uas/usb-storage driver quirks to get the devices working
properly due to broken firmware images in them (they can not run at
high data rates, and are also throttled on other operating systems
because of this)
- thunderbolt bugfix for plug event delays
- typec runtime warning removal
- dwc3 st driver bugfix. Note, a follow-on fix for this will end up
coming in for 6.1-rc1 as the developers are still arguing over what
the final solution will be, but this should be sufficient for now
All of these have been in linux-next with no reported problems"
* tag 'usb-6.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
uas: ignore UAS for Thinkplus chips
usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
uas: add no-uas quirk for Hiksemi usb_disk
usb: dwc3: st: Fix node's child name
usb: typec: ucsi: Remove incorrect warning
thunderbolt: Explicitly reset plug events delay back to USB4 spec value
Linus Torvalds [Sat, 1 Oct 2022 16:27:18 +0000 (09:27 -0700)]
Merge tag 'media/v6.0-1' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- some fixes for the v4l2 ioctl handler logic
- a fix for an out of bound access in the DVB videobuf2 handler
- three driver fixes (rkvdec, mediatek/vcodek and uvcvideo)
* tag 'media/v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: rkvdec: Disable H.264 error detection
media: mediatek: vcodec: Drop platform_get_resource(IORESOURCE_IRQ)
media: dvb_vb2: fix possible out of bound access
media: v4l2-ioctl.c: fix incorrect error path
media: v4l2-compat-ioctl32.c: zero buffer passed to v4l2_compat_get_array_args()
media: uvcvideo: Fix InterfaceProtocol for Quanta camera
Linus Torvalds [Sat, 1 Oct 2022 16:13:29 +0000 (09:13 -0700)]
Merge tag 'mm-hotfixes-stable-2022-09-30' of git://git./linux/kernel/git/akpm/mm
Pull more hotfixes from Andrew Morton:
"One MAINTAINERS update, two MM fixes, both cc:stable"
The previous pull wasn't fated to be the last one..
* tag 'mm-hotfixes-stable-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
damon/sysfs: fix possible memleak on damon_sysfs_add_target
mm: fix BUG splat with kvmalloc + GFP_ATOMIC
MAINTAINERS: drop entry to removed file in ARM/RISCPC ARCHITECTURE
Dmitry Torokhov [Fri, 30 Sep 2022 15:57:18 +0000 (08:57 -0700)]
MIPS: pci: lantiq: switch to using gpiod API
This patch switches the driver from legacy gpio API to the newer
gpiod API.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Jason A. Donenfeld [Fri, 30 Sep 2022 14:01:38 +0000 (16:01 +0200)]
mips: allow firmware to pass RNG seed to kernel
Nearly all other firmware environments have some way of passing a RNG
seed to initialize the RNG: DTB's rng-seed, EFI's RNG protocol, m68k's
bootinfo block, x86's setup_data, and so forth. This adds something
similar for MIPS, which will allow various firmware environments,
bootloaders, and hypervisors to pass an RNG seed to initialize the
kernel's RNG.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Alexei Starovoitov [Sat, 1 Oct 2022 15:49:45 +0000 (08:49 -0700)]
bpf, docs: Delete misformatted table.
Delete misformatted table.
Fixes:
6166da0a02cd ("bpf, docs: Move legacy packet instructions to a separate file")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sami Tolvanen [Fri, 30 Sep 2022 20:33:10 +0000 (20:33 +0000)]
Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
We enable -Wcast-function-type globally in the kernel to warn about
mismatching types in function pointer casts. Compilers currently
warn only about ABI incompability with this flag, but Clang 16 will
enable a stricter version of the check by default that checks for an
exact type match. This will be very noisy in the kernel, so disable
-Wcast-function-type-strict without W=1 until the new warnings have
been addressed.
Cc: stable@vger.kernel.org
Link: https://reviews.llvm.org/D134831
Link: https://github.com/ClangBuiltLinux/linux/issues/1724
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220930203310.4010564-1-samitolvanen@google.com
Daniel Golle [Fri, 30 Sep 2022 00:56:53 +0000 (01:56 +0100)]
net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear
Setting ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear
routine as done by commit
0e80707d94e4c8 ("net: ethernet: mtk_eth_soc:
fix typo in __mtk_foe_entry_clear") breaks flow offloading, at least
on older MTK_NETSYS_V1 SoCs, OpenWrt users have confirmed the bug on
MT7622 and MT7621 systems.
Felix Fietkau suggested to use MTK_FOE_STATE_INVALID instead which
works well on both, MTK_NETSYS_V1 and MTK_NETSYS_V2.
Tested on MT7622 (Linksys E8450) and MT7986 (BananaPi BPI-R3).
Suggested-by: Felix Fietkau <nbd@nbd.name>
Fixes:
0e80707d94e4c8 ("net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear")
Fixes:
33fc42de33278b ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/YzY+1Yg0FBXcnrtc@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 1 Oct 2022 02:03:49 +0000 (19:03 -0700)]
Merge branch 'nfp-support-fec-mode-reporting-and-auto-neg'
Simon Horman says:
====================
nfp: support FEC mode reporting and auto-neg
this series adds support for the following features to the nfp driver:
* Patch 1/5: Support active FEC mode
* Patch 2/5: Don't halt driver on non-fatal error when interacting with fw
* Patch 3/5: Treat port independence as a firmware rather than port property
* Patch 4/5: Support link auto negotiation
* Patch 5/5: Support restart of link auto negotiation
====================
Link: https://lore.kernel.org/r/20220929085832.622510-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fei Qin [Thu, 29 Sep 2022 08:58:32 +0000 (10:58 +0200)]
nfp: add support restart of link auto-negotiation
Add support restart of link auto-negotiation.
This may be initiated using:
# ethtool -r <intf>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yinjun Zhang [Thu, 29 Sep 2022 08:58:31 +0000 (10:58 +0200)]
nfp: add support for link auto negotiation
Report the auto negotiation capability if it's supported
in management firmware, and advertise it if it's enabled.
Changing port speed is not allowed when autoneg is enabled.
The ethtool <intf> command displays the auto-neg capability:
# ethtool enp1s0np0
Settings for enp1s0np0:
Supported ports: [ FIBRE ]
Supported link modes: Not reported
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: Not reported
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: None RS BASER
Speed: 25000Mb/s
Duplex: Full
Auto-negotiation: on
Port: FIBRE
PHYAD: 0
Transceiver: internal
Link detected: yes
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yinjun Zhang [Thu, 29 Sep 2022 08:58:30 +0000 (10:58 +0200)]
nfp: refine the ABI of getting `sp_indiff` info
Considering that whether application firmware is indifferent to
port speed is a firmware property instead of port property, now use
a new rtsym to get the property instead of parsing per-port tlv caps.
With this change, relevant code is moved to `nfp_main` layer.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yinjun Zhang [Thu, 29 Sep 2022 08:58:29 +0000 (10:58 +0200)]
nfp: avoid halt of driver init process when non-fatal error happens
It's not a fatal error when setting `hwinfo` into management firmware
fails, no need to halt the whole driver initialization process.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yinjun Zhang [Thu, 29 Sep 2022 08:58:28 +0000 (10:58 +0200)]
nfp: add support for reporting active FEC mode
The latest management firmware can now report the active FEC
mode. Adapt driver accordingly so that user can get the active
FEC mode by running command:
# ethtool --show-fec <intf>
Also correct use of `fec` field.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Levi Yun [Mon, 26 Sep 2022 16:06:11 +0000 (16:06 +0000)]
damon/sysfs: fix possible memleak on damon_sysfs_add_target
When damon_sysfs_add_target couldn't find proper task, New allocated
damon_target structure isn't registered yet, So, it's impossible to free
new allocated one by damon_sysfs_destroy_targets.
By calling damon_add_target as soon as allocating new target, Fix this
possible memory leak.
Link: https://lkml.kernel.org/r/20220926160611.48536-1-sj@kernel.org
Fixes:
a61ea561c871 ("mm/damon/sysfs: link DAMON for virtual address spaces monitoring")
Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Florian Westphal [Mon, 26 Sep 2022 15:16:50 +0000 (17:16 +0200)]
mm: fix BUG splat with kvmalloc + GFP_ATOMIC
Martin Zaharinov reports BUG with 5.19.10 kernel:
kernel BUG at mm/vmalloc.c:2437!
invalid opcode: 0000 [#1] SMP
CPU: 28 PID: 0 Comm: swapper/28 Tainted: G W O 5.19.9 #1
[..]
RIP: 0010:__get_vm_area_node+0x120/0x130
__vmalloc_node_range+0x96/0x1e0
kvmalloc_node+0x92/0xb0
bucket_table_alloc.isra.0+0x47/0x140
rhashtable_try_insert+0x3a4/0x440
rhashtable_insert_slow+0x1b/0x30
[..]
bucket_table_alloc uses kvzalloc(GPF_ATOMIC). If kmalloc fails, this now
falls through to vmalloc and hits code paths that assume GFP_KERNEL.
Link: https://lkml.kernel.org/r/20220926151650.15293-1-fw@strlen.de
Fixes:
a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
Signed-off-by: Florian Westphal <fw@strlen.de>
Suggested-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/linux-mm/Yy3MS2uhSgjF47dy@pc636/T/#t
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: Martin Zaharinov <micron10@gmail.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lukas Bulwahn [Mon, 19 Sep 2022 07:52:55 +0000 (09:52 +0200)]
MAINTAINERS: drop entry to removed file in ARM/RISCPC ARCHITECTURE
Commit
c1fe8d054c0a ("ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER") removes
arch/arm/include/asm/hardware/entry-macro-iomd.S, but missed to adjust
MAINTAINERS.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.
Drop the file entry to the removed file in ARM/RISCPC ARCHITECTURE.
Link: https://lkml.kernel.org/r/20220919075255.386-1-lukas.bulwahn@gmail.com
Fixes:
c1fe8d054c0a ("ARM: riscpc: use GENERIC_IRQ_MULTI_HANDLER")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhengchao Shao [Thu, 29 Sep 2022 04:19:09 +0000 (12:19 +0800)]
selftests/tc-testing: update qdisc/cls/action features in config
Since three patchsets "add tc-testing test cases", "refactor duplicate
codes in the tc cls walk function", and "refactor duplicate codes in the
qdisc class walk function" are merged to net-next tree, the list of
supported features needs to be updated in config file.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220929041909.83913-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmitry Torokhov [Thu, 29 Sep 2022 20:22:55 +0000 (13:22 -0700)]
dt-bindings: nfc: marvell,nci: fix reset line polarity in examples
The reset line is supposed to be "active low" (it even says so in the
description), but examples incorrectly show it as "active high"
(likely because original examples use 0 which is technically "active
high" but in practice often "don't care" if the driver is using legacy
gpio API, as this one does).
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/YzX+nzJolxAKmt+z@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 1 Oct 2022 01:17:22 +0000 (18:17 -0700)]
Merge branch 'devlink-sanitize-per-port-region-creation-destruction'
Jiri Pirko says:
====================
devlink: sanitize per-port region creation/destruction
Currently the only user of per-port devlink regions is DSA. All drivers
that use regions register them before devlink registration. For DSA,
this was not possible as the internals of struct devlink_port needed for
region creation are initialized during port registration.
This introduced a mismatch in creation flow of devlink and devlink port
regions. As you can see, it causes the DSA driver to make the port
init/exit flow a bit cumbersome.
Fix this by introducing port_init/fini() which could be optionally
called by drivers like DSA, to prepare struct devlink_port to be used
for region creation purposes before devlink port register is called.
Tested by Vladimir on his setup.
====================
Link: https://lore.kernel.org/r/20220929072902.2986539-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>