Sean Anderson [Mon, 9 Sep 2024 16:10:16 +0000 (12:10 -0400)]
net: xilinx: axienet: Relax partial rx checksum checks
The partial rx checksum feature computes a checksum over the entire
packet, regardless of the L3 protocol. Remove the check for IPv4.
Additionally, testing with csum.py (from kselftests) shows no anomalies
with 64-byte packets, so we can remove that check as well.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909161016.1149119-5-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 9 Sep 2024 16:10:15 +0000 (12:10 -0400)]
net: xilinx: axienet: Set RXCSUM in features
When it is supported by hardware, we enable receive checksum offload
unconditionally. Update features to reflect this.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20240909161016.1149119-4-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 9 Sep 2024 16:10:14 +0000 (12:10 -0400)]
net: xilinx: axienet: Enable NETIF_F_HW_CSUM for partial tx checksumming
Partial tx chechsumming is completely generic and does not depend on the
L3/L4 protocol. Signal this to the net subsystem by enabling the
more-generic offload feature (instead of restricting ourselves to
TCP/UDP over IPv4 checksumming only like is necessary with full
checksumming).
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909161016.1149119-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 9 Sep 2024 16:10:13 +0000 (12:10 -0400)]
net: xilinx: axienet: Remove unused checksum variables
These variables are set but never used. Remove them.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20240909161016.1149119-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Mon, 9 Sep 2024 13:46:12 +0000 (14:46 +0100)]
rtase: Fix spelling mistake: "tx_underun" -> "tx_underrun"
There is a spelling mistake in the struct field tx_underun, rename
it to tx_underrun.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909134612.63912-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Mon, 9 Sep 2024 14:00:21 +0000 (15:00 +0100)]
r8169: Fix spelling mistake: "tx_underun" -> "tx_underrun"
There is a spelling mistake in the struct field tx_underun, rename
it to tx_underrun.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20240909140021.64884-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Taht [Mon, 9 Sep 2024 09:16:28 +0000 (11:16 +0200)]
sch_cake: constify inverse square root cache
sch_cake uses a cache of the first 16 values of the inverse square root
calculation for the Cobalt AQM to save some cycles on the fast path.
This cache is populated when the qdisc is first loaded, but there's
really no reason why it can't just be pre-populated. So change it to be
pre-populated with constants, which also makes it possible to constify
it.
This gives a modest space saving for the module (not counting debug data):
.text: -224 bytes
.rodata: +80 bytes
.bss: -64 bytes
Total: -192 bytes
Signed-off-by: Dave Taht <dave.taht@gmail.com>
[ fixed up comment, rewrote commit message ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20240909091630.22177-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pieter Van Trappen [Mon, 9 Sep 2024 13:42:59 +0000 (15:42 +0200)]
net: dsa: microchip: update tag_ksz masks for KSZ9477 family
Remove magic number 7 by introducing a GENMASK macro instead.
Remove magic number 0x80 by using the BIT macro instead.
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240909134301.75448-1-vtpieter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 23:55:24 +0000 (16:55 -0700)]
Merge branch 'net-timestamp-introduce-a-flag-to-filter-out-rx-software-and-hardware-report'
Jason Xing says:
====================
net-timestamp: introduce a flag to filter out rx software and hardware report
When one socket is set SOF_TIMESTAMPING_RX_SOFTWARE which means the
whole system turns on the netstamp_needed_key button, other sockets
that only have SOF_TIMESTAMPING_SOFTWARE will be affected and then
print the rx timestamp information even without setting
SOF_TIMESTAMPING_RX_SOFTWARE generation flag.
How to solve it without breaking users?
We introduce a new flag named SOF_TIMESTAMPING_OPT_RX_FILTER. Using
it together with SOF_TIMESTAMPING_SOFTWARE can stop reporting the
rx software timestamp.
Similarly, we also filter out the hardware case where one process
enables the rx hardware generation flag, then another process only
passing SOF_TIMESTAMPING_RAW_HARDWARE gets the timestamp. So we can set
both SOF_TIMESTAMPING_RAW_HARDWARE and SOF_TIMESTAMPING_OPT_RX_FILTER
to stop reporting rx hardware timestamp after this patch applied.
v6: https://lore.kernel.org/
20240906095640.77533-1-kerneljasonxing@gmail.com
v5: https://lore.kernel.org/
20240905071738.3725-1-kerneljasonxing@gmail.com
v4: https://lore.kernel.org/
20240830153751.86895-1-kerneljasonxing@gmail.com
v3: https://lore.kernel.org/
20240828160145.68805-1-kerneljasonxing@gmail.com
v2: https://lore.kernel.org/
20240825152440.93054-1-kerneljasonxing@gmail.com
====================
Link: https://patch.msgid.link/20240909015612.3856-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Mon, 9 Sep 2024 01:56:12 +0000 (09:56 +0800)]
net-timestamp: add selftests for SOF_TIMESTAMPING_OPT_RX_FILTER
Test a few possible cases where we use SOF_TIMESTAMPING_OPT_RX_FILTER
with software or hardware report/generation flag.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Mon, 9 Sep 2024 01:56:11 +0000 (09:56 +0800)]
net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive
path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp report, especially after a process turns on
netstamp_needed_key which can time stamp every incoming skb.
Previously, we found out if an application starts first which turns on
netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE
could also get rx timestamp. Now we handle this case by introducing this
new flag without breaking users.
Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag could also be used for hardware case where we
can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive
hardware receive timestamp.
Another thing about errqueue in this patch I have a few words to say:
In this case, we need to handle the egress path carefully, or else
reporting the tx timestamp will fail. Egress path and ingress path will
finally call sock_recv_timestamp(). We have to distinguish them.
Errqueue is a good indicator to reflect the flow direction.
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Sun, 8 Sep 2024 12:41:41 +0000 (20:41 +0800)]
net-timestamp: correct the use of SOF_TIMESTAMPING_RAW_HARDWARE
SOF_TIMESTAMPING_RAW_HARDWARE is a report flag which passes the
timestamps generated by either SOF_TIMESTAMPING_TX_HARDWARE or
SOF_TIMESTAMPING_RX_HARDWARE to the userspace all the time.
So let us revise the doc here.
Link: https://lore.kernel.org/all/66d8c21d3042a_163d93294cb@willemb.c.googlers.com.notmuch/
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20240908124141.39628-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 23:42:14 +0000 (16:42 -0700)]
Merge branch 'net-stmmac-fpe-via-ethtool-tc'
Furong Xu says:
====================
net: stmmac: FPE via ethtool + tc
Move the Frame Preemption(FPE) over to the new standard API which uses
ethtool-mm/tc-mqprio/tc-taprio.
====================
Link: https://patch.msgid.link/cover.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:12 +0000 (22:30 +0800)]
net: stmmac: silence FPE kernel logs
ethtool --show-mm can get real-time state of FPE.
fpe_irq_status logs should keep quiet.
tc-taprio can always query driver state, delete unbalanced logs.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/39943d7967f291674a97ef0572878aca273087e9.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:11 +0000 (22:30 +0800)]
net: stmmac: support fp parameter of tc-taprio
tc-taprio can select whether traffic classes are express or preemptible.
0) tc qdisc add dev eth1 parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 2 2 2 2 2 2 2 2 2 2 2 3 \
queues 1@0 1@1 1@2 1@3 \
base-time
1000000000 \
sched-entry S 03
10000000 \
sched-entry S 0e
10000000 \
flags 0x2 fp P E E E
1) After some traffic tests, MAC merge layer statistics are all good.
Local device:
[ {
"ifname": "eth1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 0,
"MACMergeFragCountRx": 0,
"MACMergeFragCountTx": 17837,
"MACMergeHoldCount": 18639
}
} ]
Remote device:
[ {
"ifname": "end1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 17189,
"MACMergeFragCountRx": 17837,
"MACMergeFragCountTx": 0,
"MACMergeHoldCount": 0
}
} ]
Tested on DWMAC CORE 5.10a
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/0d21ae356fb3cab77337527e87d46748a4852055.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:10 +0000 (22:30 +0800)]
net: stmmac: support fp parameter of tc-mqprio
tc-mqprio can select whether traffic classes are express or preemptible.
After some traffic tests, MAC merge layer statistics are all good.
Local device:
ethtool --include-statistics --json --show-mm eth1
[ {
"ifname": "eth1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 0,
"MACMergeFragCountRx": 0,
"MACMergeFragCountTx": 35105,
"MACMergeHoldCount": 0
}
} ]
Remote device:
ethtool --include-statistics --json --show-mm end1
[ {
"ifname": "end1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 35105,
"MACMergeFragCountRx": 35105,
"MACMergeFragCountTx": 0,
"MACMergeHoldCount": 0
}
} ]
Tested on DWMAC CORE 5.10a
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/592965ea93ed8240f0a1b8f6f8ebb8914f69419b.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:09 +0000 (22:30 +0800)]
net: stmmac: configure FPE via ethtool-mm
Implement ethtool --show-mm and --set-mm callbacks.
NIC up/down, link up/down, suspend/resume, kselftest-ethtool_mm,
all tested okay.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/06ed409314fe0ee37b78b800922f2c0cce762532.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:08 +0000 (22:30 +0800)]
net: stmmac: refactor FPE verification process
Drop driver defined stmmac_fpe_state, and switch to common
ethtool_mm_verify_status for local TX verification status.
Local side and remote side verification processes are completely
independent. There is no reason at all to keep a local state and
a remote state.
Add a spinlock to avoid races among ISR, timer, link update
and register configuration.
This patch is based on Vladimir Oltean's proposal.
Vladimir Oltean says:
====================
In the INITIAL state, the timer sends MPACKET_VERIFY. Eventually the
stmmac_fpe_event_status() IRQ fires and advances the state to VERIFYING,
then rearms the timer after verify_time ms. If a subsequent IRQ comes in
and modifies the state to SUCCEEDED after getting MPACKET_RESPONSE, the
timer sees this. It must enable the EFPE bit now. Otherwise, it
decrements the verify_limit counter and tries again. Eventually it
moves the status to FAILED, from which the IRQ cannot move it anywhere
else, except for another stmmac_fpe_apply() call.
====================
Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/151f86c8428eba967039718c6bf90a7d841e703b.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:07 +0000 (22:30 +0800)]
net: stmmac: drop stmmac_fpe_handshake
ethtool --set-mm can trigger FPE verification process by calling
stmmac_fpe_send_mpacket, stmmac_fpe_handshake should be gone.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/42018b1a15eb3ced567fd6a73798c7cd4e08799a.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:06 +0000 (22:30 +0800)]
net: stmmac: move stmmac_fpe_cfg to stmmac_priv data
By moving the fpe_cfg field to the stmmac_priv data, stmmac_fpe_cfg
becomes platform-data eventually, instead of a run-time config.
Suggested-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://patch.msgid.link/d9b3d7ecb308c5e39778a4c8ae9df288a2754379.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Dahl [Fri, 6 Sep 2024 06:22:56 +0000 (08:22 +0200)]
net: mdiobus: Debug print fwnode handle instead of raw pointer
Was slightly misleading before, because printed is pointer to fwnode,
not to phy device, as placement in message suggested. Include header
for dev_dbg() declaration while at it.
Output before:
[ +0.001247] mdio_bus
f802c000.ethernet-
ffffffff: registered phy
2612f00a fwnode at address 3
Output after:
[ +0.001229] mdio_bus
f802c000.ethernet-
ffffffff: registered phy fwnode /ahb/apb/ethernet@
f802c000/ethernet-phy@3 at address 3
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Dahl <ada@thorsis.com>
Link: https://patch.msgid.link/20240906062256.11289-1-ada@thorsis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
D. Wythe [Fri, 6 Sep 2024 02:35:35 +0000 (10:35 +0800)]
net/smc: add sysctl for smc_limit_hs
In commit
48b6190a0042 ("net/smc: Limit SMC visits when handshake workqueue congested"),
we introduce a mechanism to put constraint on SMC connections visit
according to the pressure of SMC handshake process.
At that time, we believed that controlling the feature through netlink
was sufficient. However, most people have realized now that netlink is
not convenient in container scenarios, and sysctl is a more suitable
approach.
In addition, since commit
462791bbfa35 ("net/smc: add sysctl interface for SMC")
had introcuded smc_sysctl_net_init(), it is reasonable for us to
initialize limit_smc_hs in it instead of initializing it in
smc_pnet_net_int().
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Link: https://patch.msgid.link/1725590135-5631-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lee Trager [Thu, 5 Sep 2024 23:37:51 +0000 (16:37 -0700)]
eth: fbnic: Add devlink firmware version info
This adds support to show firmware version information for both stored and
running firmware versions. The version and commit is displayed separately
to aid monitoring tools which only care about the version.
Example output:
# devlink dev info
pci/0000:01:00.0:
driver fbnic
serial_number 88-25-08-ff-ff-01-50-92
versions:
running:
fw 24.07.15-017
fw.commit h999784ae9df0
fw.bootloader 24.07.10-000
fw.bootloader.commit hfef3ac835ce7
stored:
fw 24.07.24-002
fw.commit hc9d14a68b3f2
fw.bootloader 24.07.22-000
fw.bootloader.commit h922f8493eb96
fw.undi 01.00.03-000
Signed-off-by: Lee Trager <lee@trager.us>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240905233820.1713043-1-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 10 Sep 2024 09:04:18 +0000 (11:04 +0200)]
Merge branch 'net-lan966x-use-the-newly-introduced-fdma-library'
Daniel Machon says:
====================
net: lan966x: use the newly introduced FDMA library
This patch series is the second of a 2-part series [1], that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a common
library with a common implementation. This also has the benefit of
removing a lot of open-coded bookkeeping and duplicate code for the two
drivers.
In this second series, the FDMA library will be taken into use by the
lan966x switch driver.
###################
# Example of use: #
###################
- Initialize the rx and tx fdma structs with values for: number of
DCB's, number of DB's, channel ID, DB size (data buffer size), and
total size of the requested memory. Also provide two callbacks:
nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.
- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().
- Initialize the DCB's with fdma_dcb_init().
- Add new DCB's with fdma_dcb_add().
- Free memory with fdma_free_phys() or fdma_free_coherent().
#####################
# Patch breakdown: #
#####################
Patch #1: select FDMA library for lan966x.
Patch #2: includes the fdma_api.h header and removes old symbols.
Patch #3: replaces old rx and tx variables with equivalent ones from the
fdma struct. Only the variables that can be changed without
breaking traffic is changed in this patch.
Patch #4: uses the library for allocation of rx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #5: uses the library for adding DCB's in the rx path.
Patch #6: uses the library for freeing rx buffers.
Patch #7: uses the library for allocation of tx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #8: uses the library for adding DCB's in the tx path.
Patch #9: uses the library helpers in the tx path.
Patch #10: ditch last_in_use variable and use library instead.
Patch #11: uses library helpers throughout.
Patch #12: refactor lan966x_fdma_reload() function.
[1] https://lore.kernel.org/netdev/
20240902-fdma-sparx5-v1-0-
1e7d5e5a9f34@microchip.com/
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================
Link: https://patch.msgid.link/20240905-fdma-lan966x-v1-0-e083f8620165@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:40 +0000 (10:06 +0200)]
net: lan966x: refactor buffer reload function
Now that we store everything in the fdma structs, refactor
lan966x_fdma_reload() to store and restore the entire struct.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:39 +0000 (10:06 +0200)]
net: lan966x: use a few FDMA helpers throughout
The library provides helpers for a number of DCB and DB operations. Use
these throughout the code and remove the old ones.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:38 +0000 (10:06 +0200)]
net: lan966x: ditch tx->last_in_use variable
This variable is used in the tx path to determine the last used DCB. The
library has the variable last_dcb for the exact same purpose. Ditch the
last_in_use variable throughout.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:37 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing tx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:36 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the tx path
Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:35 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of tx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: tx->dma, tx->dcbs and tx->curr_entry
with the equivalents from the FDMA struct.
- add lan966x_fdma_tx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:34 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing rx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:33 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the rx path
Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and the functions
for adding DCB's.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:32 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of rx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: rx->dma, rx->dcbs and rx->last_entry
with the equivalents from the FDMA struct.
- make use of fdma->db_size for rx buffer size.
- add lan966x_fdma_rx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:31 +0000 (10:06 +0200)]
net: lan966x: replace a few variables with new equivalent ones
Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:30 +0000 (10:06 +0200)]
net: lan966x: use FDMA library symbols
Include and use the new FDMA header, which now provides the required
masks and bit offsets for operating on the DCB's and DB's.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:29 +0000 (10:06 +0200)]
net: lan966x: select FDMA library
Select the newly introduced FDMA library.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 10 Sep 2024 02:20:42 +0000 (19:20 -0700)]
Merge branch 'ionic-convert-rx-queue-buffers-to-use-page_pool'
Brett Creeley says:
====================
ionic: convert Rx queue buffers to use page_pool
Our home-grown buffer management needs to go away and we need to play
nicely with the page_pool infrastructure. This patchset cleans up some
of our API use and converts the Rx traffic queues to use page_pool.
The first few patches are for tidying up things, then a small XDP
configuration refactor, adding page_pool support, and finally adding
support to hot swap an XDP program without having to reconfigure
anything.
The result is code that more closely follows current patterns, as well as
a either a performance boost or equivalent performance as seen with
iperf testing:
mss netio tx_pps rx_pps total_pps tx_bw rx_bw total_bw
---- ------- ---------- ---------- ----------- ------- ------- ----------
Before:
256 bidir 13,839,293 15,515,227 29,354,520 34 38 71
512 bidir 13,913,249 14,671,693 28,584,942 62 65 127
1024 bidir 13,006,189 13,695,413 26,701,602 109 115 224
1448 bidir 12,489,905 12,791,734 25,281,639 145 149 294
2048 bidir 9,195,622 9,247,649 18,443,271 148 149 297
4096 bidir 5,149,716 5,247,917 10,397,633 160 163 323
8192 bidir 3,029,993 3,008,882 6,038,875 179 179 358
9000 bidir 2,789,358 2,800,744 5,590,102 181 180 361
After:
256 bidir 21,540,037 21,344,644 42,884,681 52 52 104
512 bidir 23,170,014 19,207,260 42,377,274 103 85 188
1024 bidir 17,934,280 17,819,247 35,753,527 150 149 299
1448 bidir 15,242,515 14,907,030 30,149,545 167 174 341
2048 bidir 10,692,542 10,663,023 21,355,565 177 176 353
4096 bidir 6,024,977 6,083,580 12,108,557 187 180 367
8192 bidir 3,090,449 3,048,266 6,138,715 180 176 356
9000 bidir 2,859,146 2,864,226 5,723,372 178 180 358
v2: https://lore.kernel.org/
20240826184422.21895-1-brett.creeley@amd.com
v1: https://lore.kernel.org/
20240625165658.34598-1-shannon.nelson@amd.com
====================
Link: https://patch.msgid.link/20240906232623.39651-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Brett Creeley [Fri, 6 Sep 2024 23:26:23 +0000 (16:26 -0700)]
ionic: Allow XDP program to be hot swapped
Using examples of other driver(s), add the ability to hot-swap an XDP
program without having to reconfigure the queues. To prevent the
q->xdp_prog to be read/written more than once use READ_ONCE() and
WRITE_ONCE() on the q->xdp_prog.
The q->xdp_prog was being checked in multiple different for loops in the
hot path. The change to allow xdp_prog hot swapping created the
possibility for many READ_ONCE(q->xdp_prog) calls during a single napi
callback. Refactor the Rx napi handling to allow a previous
READ_ONCE(q->xdp_prog) (or NULL for hwstamp_rxq) to be passed into the
relevant functions.
Also, move other Rx related hotpath handling into the newly created
ionic_rx_cq_service() function to reduce the scope of the xdp_prog
local variable and put all Rx handling in one function similar to Tx.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-8-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:22 +0000 (16:26 -0700)]
ionic: convert Rx queue buffers to use page_pool
Our home-grown buffer management needs to go away and we need
to be playing nicely with the page_pool infrastructure. This
converts the Rx traffic queues to use page_pool.
Also, since ionic_rx_buf_size() was removed, redefine
IONIC_PAGE_SIZE to account for IONIC_MAX_BUF_LEN being the
largest allowed buffer to prevent overflowing u16 variables,
which could happen when PAGE_SIZE is defined as >= 64KB.
include/linux/minmax.h:93:37: warning: conversion from 'long unsigned int' to 'u16' {aka 'short unsigned int'} changes value from '65536' to '0' [-Woverflow]
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Brett Creeley [Fri, 6 Sep 2024 23:26:21 +0000 (16:26 -0700)]
ionic: Fully reconfigure queues when going to/from a NULL XDP program
Currently when going to/from a NULL XDP program the driver uses
ionic_stop_queues_reconfig() and then ionic_start_queues_reconfig() in
order to re-register the xdp_rxq_info and re-init the queues. This is
fine until page_pool(s) are used in an upcoming patch.
In preparation for adding page_pool support make sure to completely
rebuild the queues when going to/from a NULL XDP program. Without this
change the call to mem_allocator_disconnect() never happens when going
to a NULL XDP program, which eventually results in
xdp_rxq_info_reg_mem_model() failing with -ENOSPC due to the mem_id_pool
ida having no remaining space.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:20 +0000 (16:26 -0700)]
ionic: always use rxq_info
Instead of setting up and tearing down the rxq_info only when the XDP
program is loaded or unloaded, we will build the rxq_info whether or not
XDP is in use. This is the more common use pattern and better supports
future conversion to page_pool. Since the rxq_info wants the napi_id
we re-order things slightly to tie this into the queue init and deinit
functions where we do the add and delete of napi.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:19 +0000 (16:26 -0700)]
ionic: use per-queue xdp_prog
We originally were using a per-interface xdp_prog variable to track
a loaded XDP program since we knew there would never be support for a
per-queue XDP program. With that, we only built the per queue rxq_info
struct when an XDP program was loaded and removed it on XDP program unload,
and used the pointer as an indicator in the Rx hotpath to know to how build
the buffers. However, that's really not the model generally used, and
makes a conversion to page_pool Rx buffer cacheing a little problematic.
This patch converts the driver to use the more common approach of using
a per-queue xdp_prog pointer to work out buffer allocations and need
for bpf_prog_run_xdp(). We jostle a couple of fields in the queue struct
in order to keep the new xdp_prog pointer in a warm cacheline.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:18 +0000 (16:26 -0700)]
ionic: rename ionic_xdp_rx_put_bufs
We aren't "putting" buf, we're just unlinking them from our tracking in
order to let the XDP_TX and XDP_REDIRECT tx clean paths take care of the
pages when they are done with them. This rename clears up the intent.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:17 +0000 (16:26 -0700)]
ionic: debug line for Tx completion errors
Here's a little debugging aid in case the device starts throwing
Tx completion errors.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 00:44:52 +0000 (17:44 -0700)]
Merge branch 'rx-software-timestamp-for-all-round-3'
Gal Pressman says:
====================
RX software timestamp for all - round 3
Rounds 1 & 2 of drivers conversion were merged [1][2], this round will
complete the work.
[1] https://lore.kernel.org/netdev/
20240901112803.212753-1-gal@nvidia.com/
[2] https://lore.kernel.org/netdev/
20240904074922.256275-1-gal@nvidia.com/
====================
Link: https://patch.msgid.link/20240906144632.404651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:32 +0000 (17:46 +0300)]
ptp: ptp_ines: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-17-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:31 +0000 (17:46 +0300)]
ixp4xx_eth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20240906144632.404651-16-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:30 +0000 (17:46 +0300)]
net: stmmac: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-15-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:29 +0000 (17:46 +0300)]
sfc/siena: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-14-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:28 +0000 (17:46 +0300)]
sfc: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-13-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:27 +0000 (17:46 +0300)]
qede: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-12-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:26 +0000 (17:46 +0300)]
net: mscc: ocelot: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-11-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:25 +0000 (17:46 +0300)]
net/funeth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-10-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:24 +0000 (17:46 +0300)]
enic: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-9-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:23 +0000 (17:46 +0300)]
net: thunderx: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-8-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:22 +0000 (17:46 +0300)]
liquidio: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-7-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:21 +0000 (17:46 +0300)]
net: macb: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://patch.msgid.link/20240906144632.404651-6-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:20 +0000 (17:46 +0300)]
amd-xgbe: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://patch.msgid.link/20240906144632.404651-5-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:19 +0000 (17:46 +0300)]
bonding: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:18 +0000 (17:46 +0300)]
tg3: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:17 +0000 (17:46 +0300)]
bnxt_en: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MD Danish Anwar [Fri, 6 Sep 2024 09:36:49 +0000 (15:06 +0530)]
net: ti: icssg-prueth: Make pa_stats optional
pa_stats is optional in dt bindings, make it optional in driver as well.
Currently if pa_stats syscon regmap is not found driver returns -ENODEV.
Fix this by not returning an error in case pa_stats is not found and
continue generating ethtool stats without pa_stats.
Fixes: 550ee90ac61c ("net: ti: icssg-prueth: Add support for PA Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240906093649.870883-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Fri, 6 Sep 2024 07:36:09 +0000 (08:36 +0100)]
net: ibm: emac: Use __iomem annotation for emac_[xg]aht_base
dev->emacp contains an __iomem pointer and values derived
from it are used as __iomem pointers. So use this annotation
in the return type for helpers that derive pointers from dev->emacp.
Flagged by Sparse as:
.../core.c:444:36: warning: incorrect type in argument 1 (different address spaces)
.../core.c:444:36: expected unsigned int volatile [noderef] [usertype] __iomem *addr
.../core.c:444:36: got unsigned int [usertype] *
.../core.c: note: in included file:
.../core.h:416:25: warning: cast removes address space '__iomem' of expression
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240906-emac-iomem-v1-1-207cc4f3fed0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 00:38:03 +0000 (17:38 -0700)]
Merge branch 'selftests-net-add-packetdrill'
Willem de Bruijn says:
====================
selftests/net: add packetdrill
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.
1/2: add kselftest infra for TEST_PROGS that need an interpreter
2/2: add the specific packetdrill tests
====================
Link: https://patch.msgid.link/20240905231653.2427327-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Thu, 5 Sep 2024 23:15:52 +0000 (19:15 -0400)]
selftests/net: integrate packetdrill with ksft
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.
Florian recently added support for packetdrill tests in nf_conntrack,
in commit
a8a388c2aae49 ("selftests: netfilter: add packetdrill based
conntrack tests").
This patch takes a slightly different approach. It relies on
ksft_runner.sh to run every *.pkt file in the directory.
Any future imports of packetdrill tests should require no additional
coding. Just add the *.pkt files.
Initially import only two features/directories from github. One with a
single script, and one with two. This was the only reason to pick
tcp/inq and tcp/md5.
The path replaces the directory hierarchy in github with a flat space
of files: $(subst /,_,$(wildcard tcp/**/*.pkt)). This is the most
straightforward option to integrate with kselftests. The Linked thread
reviewed two ways to maintain the hierarchy: TEST_PROGS_RECURSE and
PRESERVE_TEST_DIRS. But both introduce significant changes to
kselftest infra and with that risk to existing tests.
Implementation notes:
- restore alphabetical order when adding the new directory to
tools/testing/selftests/Makefile
- imported *.pkt files and support verbatim from the github project,
except for
- update `source ./defaults.sh` path (to adjust for flat dir)
- add SPDX headers
- remove one author statement
- Acknowledgment: drop an e (checkpatch)
Tested:
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
run_tests
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
install INSTALL_PATH=$KSFT_INSTALL_PATH
# in virtme-ng
./run_kselftest.sh -c net/packetdrill
./run_kselftest.sh -t net/packetdrill:tcp_inq_client.pkt
Link: https://lore.kernel.org/netdev/20240827193417.2792223-1-willemdebruijn.kernel@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Thu, 5 Sep 2024 23:15:51 +0000 (19:15 -0400)]
selftests: support interpreted scripts with ksft_runner.sh
Support testcases that are themselves not executable, but need an
interpreter to run them.
If a test file is not executable, but an executable file
ksft_runner.sh exists in the TARGET dir, kselftest will run
./ksft_runner.sh ./$BASENAME_TEST
Packetdrill may add hundreds of packetdrill scripts for testing. These
scripts must be passed to the packetdrill process.
Have kselftest run each test directly, as it already solves common
runner requirements like parallel execution and isolation (netns).
A previous RFC added a wrapper in between, which would have to
reimplement such functionality.
Link: https://lore.kernel.org/netdev/66d4d97a4cac_3df182941a@willemb.c.googlers.com.notmuch/T/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 00:17:45 +0000 (17:17 -0700)]
Merge branch 'various-cleanups'
Rosen Penev says:
====================
various cleanups
Allow CI to build. Also a bugfix for dual GMAC devices.
====================
Link: https://patch.msgid.link/20240905194938.8453-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Eckelmann [Thu, 5 Sep 2024 19:49:38 +0000 (12:49 -0700)]
net: ag71xx: disable napi interrupts during probe
ag71xx_probe is registering ag71xx_interrupt as handler for gmac0/gmac1
interrupts. The handler is trying to use napi_schedule to handle the
processing of packets. But the netif_napi_add for this device is
called a lot later in ag71xx_probe.
It can therefore happen that a still running gmac0/gmac1 is triggering the
interrupt handler with a bit from AG71XX_INT_POLL set in
AG71XX_REG_INT_STATUS. The handler will then call napi_schedule and the
napi code will crash the system because the ag->napi is not yet
initialized.
The gmcc0/gmac1 must be brought in a state in which it doesn't signal a
AG71XX_INT_POLL related status bits as interrupt before registering the
interrupt handler. ag71xx_hw_start will take care of re-initializing the
AG71XX_REG_INT_ENABLE.
This will become relevant when dual GMAC devices get added here.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20240905194938.8453-8-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Thu, 5 Sep 2024 19:49:37 +0000 (12:49 -0700)]
net: ag71xx: remove always true branch
The opposite of this condition is checked above and if true, function
returns. Which means this can never be false.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-7-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Thu, 5 Sep 2024 19:49:36 +0000 (12:49 -0700)]
net: ag71xx: get reset control using devm api
Currently, the of variant is missing reset_control_put in error paths.
The devm variant does not require it.
Allows removing mdio_reset from the struct as it is not used outside the
function.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-6-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Thu, 5 Sep 2024 19:49:35 +0000 (12:49 -0700)]
net: ag71xx: use ethtool_puts
Allows simplifying get_strings and avoids manual pointer manipulation.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-5-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Thu, 5 Sep 2024 19:49:34 +0000 (12:49 -0700)]
net: ag71xx: update FIFO bits and descriptions
Taken from QCA SDK. No functional difference as same bits get applied.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20240905194938.8453-4-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Thu, 5 Sep 2024 19:49:33 +0000 (12:49 -0700)]
net: ag71xx: add MODULE_DESCRIPTION
Now that COMPILE_TEST is enabled, it gets flagged when building with
allmodconfig W=1 builds. Text taken from the beginning of the file.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240905194938.8453-3-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Thu, 5 Sep 2024 19:49:32 +0000 (12:49 -0700)]
net: ag71xx: add COMPILE_TEST to test compilation
While this driver is meant for MIPS only, it can be compiled on x86 just
fine. Remove pointless parentheses while at it.
Enables CI building of this driver.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240905194938.8453-2-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 00:14:28 +0000 (17:14 -0700)]
Merge branch 'af_unix-correct-manage_oob-when-oob-follows-a-consumed-oob'
Kuniyuki Iwashima says:
====================
af_unix: Correct manage_oob() when OOB follows a consumed OOB.
Recently syzkaller reported UAF of OOB skb.
The bug was introduced by commit
93c99f21db36 ("af_unix: Don't stop
recv(MSG_DONTWAIT) if consumed OOB skb is at the head.") but uncovered
by another recent commit
8594d9b85c07 ("af_unix: Don't call skb_get()
for OOB skb.").
[0]: https://lore.kernel.org/netdev/
00000000000083b05a06214c9ddc@google.com/
====================
Link: https://patch.msgid.link/20240905193240.17565-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:40 +0000 (12:32 -0700)]
af_unix: Don't return OOB skb in manage_oob().
syzbot reported use-after-free in unix_stream_recv_urg(). [0]
The scenario is
1. send(MSG_OOB)
2. recv(MSG_OOB)
-> The consumed OOB remains in recv queue
3. send(MSG_OOB)
4. recv()
-> manage_oob() returns the next skb of the consumed OOB
-> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
5. recv(MSG_OOB)
-> unix_sk(sk)->oob_skb is used but already freed
The recent commit
8594d9b85c07 ("af_unix: Don't call skb_get() for OOB
skb.") uncovered the issue.
If the OOB skb is consumed and the next skb is peeked in manage_oob(),
we still need to check if the skb is OOB.
Let's do so by falling back to the following checks in manage_oob()
and add the test case in selftest.
Note that we need to add a similar check for SIOCATMARK.
[0]:
BUG: KASAN: slab-use-after-free in unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959
Read of size 4 at addr
ffff8880326abcc4 by task syz-executor178/5235
CPU: 0 UID: 0 PID: 5235 Comm: syz-executor178 Not tainted
6.11.0-rc5-syzkaller-00742-gfbdaffe41adc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959
unix_stream_recv_urg+0x1df/0x320 net/unix/af_unix.c:2640
unix_stream_read_generic+0x2456/0x2520 net/unix/af_unix.c:2778
unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996
sock_recvmsg_nosec net/socket.c:1046 [inline]
sock_recvmsg+0x22f/0x280 net/socket.c:1068
____sys_recvmsg+0x1db/0x470 net/socket.c:2816
___sys_recvmsg net/socket.c:2858 [inline]
__sys_recvmsg+0x2f0/0x3e0 net/socket.c:2888
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5360d6b4e9
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007fff29b3a458 EFLAGS:
00000246 ORIG_RAX:
000000000000002f
RAX:
ffffffffffffffda RBX:
00007fff29b3a638 RCX:
00007f5360d6b4e9
RDX:
0000000000002001 RSI:
0000000020000640 RDI:
0000000000000003
RBP:
00007f5360dde610 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000001
R13:
00007fff29b3a628 R14:
0000000000000001 R15:
0000000000000001
</TASK>
Allocated by task 5235:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:312 [inline]
__kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3988 [inline]
slab_alloc_node mm/slub.c:4037 [inline]
kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4080
__alloc_skb+0x1c3/0x440 net/core/skbuff.c:667
alloc_skb include/linux/skbuff.h:1320 [inline]
alloc_skb_with_frags+0xc3/0x770 net/core/skbuff.c:6528
sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2815
sock_alloc_send_skb include/net/sock.h:1778 [inline]
queue_oob+0x108/0x680 net/unix/af_unix.c:2198
unix_stream_sendmsg+0xd24/0xf80 net/unix/af_unix.c:2351
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
___sys_sendmsg net/socket.c:2651 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5235:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
__kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2252 [inline]
slab_free mm/slub.c:4473 [inline]
kmem_cache_free+0x145/0x350 mm/slub.c:4548
unix_stream_read_generic+0x1ef6/0x2520 net/unix/af_unix.c:2917
unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996
sock_recvmsg_nosec net/socket.c:1046 [inline]
sock_recvmsg+0x22f/0x280 net/socket.c:1068
__sys_recvfrom+0x256/0x3e0 net/socket.c:2255
__do_sys_recvfrom net/socket.c:2273 [inline]
__se_sys_recvfrom net/socket.c:2269 [inline]
__x64_sys_recvfrom+0xde/0x100 net/socket.c:2269
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at
ffff8880326abc80
which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 68 bytes inside of
freed 240-byte region [
ffff8880326abc80,
ffff8880326abd70)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x326ab
ksm flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xfdffffff(slab)
raw:
00fff00000000000 ffff88801eaee780 ffffea0000b7dc80 dead000000000003
raw:
0000000000000000 00000000800c000c 00000001fdffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 4686, tgid 4686 (udevadm), ts
32357469485, free_ts
28829011109
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1493
prep_new_page mm/page_alloc.c:1501 [inline]
get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3439
__alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4695
__alloc_pages_node_noprof include/linux/gfp.h:269 [inline]
alloc_pages_node_noprof include/linux/gfp.h:296 [inline]
alloc_slab_page+0x5f/0x120 mm/slub.c:2321
allocate_slab+0x5a/0x2f0 mm/slub.c:2484
new_slab mm/slub.c:2537 [inline]
___slab_alloc+0xcd1/0x14b0 mm/slub.c:3723
__slab_alloc+0x58/0xa0 mm/slub.c:3813
__slab_alloc_node mm/slub.c:3866 [inline]
slab_alloc_node mm/slub.c:4025 [inline]
kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4080
__alloc_skb+0x1c3/0x440 net/core/skbuff.c:667
alloc_skb include/linux/skbuff.h:1320 [inline]
alloc_uevent_skb+0x74/0x230 lib/kobject_uevent.c:289
uevent_net_broadcast_untagged lib/kobject_uevent.c:326 [inline]
kobject_uevent_net_broadcast+0x2fd/0x580 lib/kobject_uevent.c:410
kobject_uevent_env+0x57d/0x8e0 lib/kobject_uevent.c:608
kobject_synth_uevent+0x4ef/0xae0 lib/kobject_uevent.c:207
uevent_store+0x4b/0x70 drivers/base/bus.c:633
kernfs_fop_write_iter+0x3a1/0x500 fs/kernfs/file.c:334
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa72/0xc90 fs/read_write.c:590
page last free pid 1 tgid 1 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1094 [inline]
free_unref_page+0xd22/0xea0 mm/page_alloc.c:2612
kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:408
apply_to_pte_range mm/memory.c:2797 [inline]
apply_to_pmd_range mm/memory.c:2841 [inline]
apply_to_pud_range mm/memory.c:2877 [inline]
apply_to_p4d_range mm/memory.c:2913 [inline]
__apply_to_page_range+0x8a8/0xe50 mm/memory.c:2947
kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:525
purge_vmap_node+0x3e3/0x770 mm/vmalloc.c:2208
__purge_vmap_area_lazy+0x708/0xae0 mm/vmalloc.c:2290
_vm_unmap_aliases+0x79d/0x840 mm/vmalloc.c:2885
change_page_attr_set_clr+0x2fe/0xdb0 arch/x86/mm/pat/set_memory.c:1881
change_page_attr_set arch/x86/mm/pat/set_memory.c:1922 [inline]
set_memory_nx+0xf2/0x130 arch/x86/mm/pat/set_memory.c:2110
free_init_pages arch/x86/mm/init.c:924 [inline]
free_kernel_image_pages arch/x86/mm/init.c:943 [inline]
free_initmem+0x79/0x110 arch/x86/mm/init.c:970
kernel_init+0x31/0x2b0 init/main.c:1476
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Memory state around the buggy address:
ffff8880326abb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880326abc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>
ffff8880326abc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880326abd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
ffff8880326abd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: 93c99f21db36 ("af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.")
Reported-by: syzbot+8811381d455e3e9ec788@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8811381d455e3e9ec788
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:39 +0000 (12:32 -0700)]
af_unix: Move spin_lock() in manage_oob().
When OOB skb has been already consumed, manage_oob() returns the next
skb if exists. In such a case, we need to fall back to the else branch
below.
Then, we want to keep holding spin_lock(&sk->sk_receive_queue.lock).
Let's move it out of if-else branch and add lightweight check before
spin_lock() for major use cases without OOB skb.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:38 +0000 (12:32 -0700)]
af_unix: Rename unlinked_skb in manage_oob().
When OOB skb has been already consumed, manage_oob() returns the next
skb if exists. In such a case, we need to fall back to the else branch
below.
Then, we need to keep two skbs and free them later with consume_skb()
and kfree_skb().
Let's rename unlinked_skb accordingly.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Thu, 5 Sep 2024 19:32:37 +0000 (12:32 -0700)]
af_unix: Remove single nest in manage_oob().
This is a prep for the later fix.
No functional change intended.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240905193240.17565-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 00:11:05 +0000 (17:11 -0700)]
Merge tag 'linux-can-next-for-6.12-
20240909' of git://git./linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2024-09-09
The first patch is by Rob Herring and simplifies the DT parsing in the
cc770 driver.
The next 2 patches both target the rockchip_canfd driver added in the
last pull request to net-next. The first one is by Nathan Chancellor
and fixes the return type of rkcanfd_start_xmit(), the second one is
by me and fixes a 64 bit division on 32 bit platforms in
rkcanfd_timestamp_init().
* tag 'linux-can-next-for-6.12-
20240909' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rockchip_canfd: rkcanfd_timestamp_init(): fix 64 bit division on 32 bit platforms
can: rockchip_canfd: fix return type of rkcanfd_start_xmit()
net: can: cc770: Simplify parsing DT properties
====================
Link: https://patch.msgid.link/20240909063657.2287493-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 6 Sep 2024 16:10:59 +0000 (09:10 -0700)]
net: remove dev_pick_tx_cpu_id()
dev_pick_tx_cpu_id() has been introduced with two users by
commit
a4ea8a3dacc3 ("net: Add generic ndo_select_queue functions").
The use in AF_PACKET has been removed in 2019 by
commit
b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection")
The other user was a Netlogic XLP driver, removed in 2021 by
commit
47ac6f567c28 ("staging: Remove Netlogic XLP network driver").
It's relatively unlikely that any modern driver will need an
.ndo_select_queue implementation which picks purely based on CPU ID
and skips XPS, delete dev_pick_tx_cpu_id()
Found by code inspection.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240906161059.715546-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 9 Sep 2024 23:52:06 +0000 (16:52 -0700)]
Merge branch 'selftests-mptcp-add-time-per-subtests-in-tap-output'
Matthieu Baerts says:
====================
selftests: mptcp: add time per subtests in TAP output
Patches here add 'time=<N>ms' in the diagnostic data of the TAP output,
e.g.
ok 1 - pm_netlink: defaults addr list # time=9ms
This addition is useful to quickly identify which subtests are taking a
longer time than the others, or more than expected.
Note that there are no specific formats to follow to show this time
according to the TAP 13, TAP 14 and KTAP specifications, but we follow
the format being parsed by NIPA [1].
Patch 1 modifies mptcp_lib.sh to add this support to all MPTCP
selftests.
Patch 2 removes the now duplicated info in mptcp_connect.sh
Patch 3 slightly improves the precision of the first subtests in all
MPTCP subtests.
Patches 4 and 5 remove duplicated spaces in TAP output, for the TAP
parsers that cannot handle them properly.
v1: https://lore.kernel.org/
20240902-net-next-mptcp-ksft-subtest-time-v1-0-
f1ed499a11b1@kernel.org
Link: https://github.com/linux-netdev/nipa/pull/36
====================
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-0-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:11 +0000 (20:46 +0200)]
selftests: mptcp: connect: remove duplicated spaces in TAP output
It is nice to have a visual alignment in the test output to present the
different results, but it makes less sense in the TAP output that is
there for computers.
It sounds then better to remove the duplicated whitespaces in the TAP
output, also because it can cause some issues with TAP parsers expecting
only one space around the directive delimiter (#).
While at it, change the variable name (result_msg) to something more
explicit.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-5-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:10 +0000 (20:46 +0200)]
selftests: mptcp: diag: remove trailing whitespace
It doesn't need to be there, and it can cause some issues with TAP
parsers expecting only one space around the directive delimiter (#).
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-4-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:09 +0000 (20:46 +0200)]
selftests: mptcp: reset the last TS before the first test
Just to slightly improve the precision of the duration of the first
test.
In mptcp_join.sh, the last append_prev_results is now done as soon as
the last test is over: this will add the last result in the list, and
get a more precise time for this last test.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-3-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:08 +0000 (20:46 +0200)]
selftests: mptcp: connect: remote time in TAP output
It is now added by the MPTCP lib automatically, see the parent commit.
The time in the TAP output might be slightly different from the one
displayed before, but that's OK.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-2-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 6 Sep 2024 18:46:07 +0000 (20:46 +0200)]
selftests: mptcp: lib: add time per subtests in TAP output
It adds 'time=<N>ms' in the diagnostic data of the TAP output, e.g.
ok 1 - pm_netlink: defaults addr list # time=9ms
This addition is useful to quickly identify which subtests are taking a
longer time than the others, or more than expected.
Note that there are no specific formats to follow to show this time
according to the TAP 13 [1], TAP 14 [2] and KTAP [3] specifications.
Let's then define this one here.
Link: https://testanything.org/tap-version-13-specification.html
Link: https://testanything.org/tap-version-14-specification.html
Link: https://docs.kernel.org/dev-tools/ktap.html
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-1-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Thu, 5 Sep 2024 16:00:35 +0000 (00:00 +0800)]
selftests: return failure when timestamps can't be reported
When I was trying to modify the tx timestamping feature, I found that
running "./txtimestamp -4 -C -L 127.0.0.1" didn't reflect the error:
I succeeded to generate timestamp stored in the skb but later failed
to report it to the userspace (which means failed to put css into cmsg).
It can happen when someone writes buggy codes in __sock_recv_timestamp(),
for example.
After adding the check so that running ./txtimestamp will reflect the
result correctly like this if there is a bug in the reporting phase:
protocol: TCP
payload: 10
server port: 9000
family: INET
test SND
USR:
1725458477 s 667997 us (seq=0, len=0)
Failed to report timestamps
USR:
1725458477 s 718128 us (seq=0, len=0)
Failed to report timestamps
USR:
1725458477 s 768273 us (seq=0, len=0)
Failed to report timestamps
USR:
1725458477 s 818416 us (seq=0, len=0)
Failed to report timestamps
...
In the future, it will help us detect whether the new coming patch has
bugs or not.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905160035.62407-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Mon, 9 Sep 2024 13:14:54 +0000 (14:14 +0100)]
Merge branch 'unmask-dscp-part-four'
Ido Schimmel says:
====================
Unmask upper DSCP bits - part 4 (last)
tl;dr - This patchset finishes to unmask the upper DSCP bits in the IPv4
flow key in preparation for allowing IPv4 FIB rules to match on DSCP. No
functional changes are expected.
The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.
It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.
In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset finishes to unmask the upper DSCP bits by adjusting all
the callers of ip_route_output_key() to properly initialize the full
DSCP value in the IPv4 flow key.
No functional changes are expected as commit
1fa3314c14c6 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:40 +0000 (19:51 +0300)]
sctp: Unmask upper DSCP bits in sctp_v4_get_dst()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable holds the full DS field.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:39 +0000 (19:51 +0300)]
ipv4: udp_tunnel: Unmask upper DSCP bits in udp_tunnel_dst_lookup()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Note that callers of udp_tunnel_dst_lookup() pass the entire DS field in
the 'tos' argument.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:38 +0000 (19:51 +0300)]
netfilter: nf_dup4: Unmask upper DSCP bits in nf_dup_ipv4_route()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:37 +0000 (19:51 +0300)]
netfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()
Unmask the upper DSCP bits when calling nf_route() which eventually
calls ip_route_output_key() so that in the future it could perform the
FIB lookup according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:36 +0000 (19:51 +0300)]
ipv4: netfilter: Unmask upper DSCP bits in ip_route_me_harder()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:35 +0000 (19:51 +0300)]
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable includes the full DS field. Either the one
specified as part of the tunnel parameters or the one inherited from the
inner packet.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:34 +0000 (19:51 +0300)]
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Note that the 'tos' variable includes the full DS field. Either the one
specified via the tunnel key or the one inherited from the inner packet.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:33 +0000 (19:51 +0300)]
ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()
Unmask the upper DSCP bits when initializing an IPv4 flow key via
ip_tunnel_init_flow() before passing it to ip_route_output_key() so that
in the future we could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:32 +0000 (19:51 +0300)]
ipv4: icmp: Unmask upper DSCP bits in icmp_reply()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:31 +0000 (19:51 +0300)]
bpf: lwtunnel: Unmask upper DSCP bits in bpf_lwt_xmit_reroute()
Unmask the upper DSCP bits when calling ip_route_output_key() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 5 Sep 2024 16:51:30 +0000 (19:51 +0300)]
ipv4: ip_gre: Unmask upper DSCP bits in ipgre_open()
Unmask the upper DSCP bits when calling ip_route_output_gre() so that in
the future it could perform the FIB lookup according to the full DSCP
value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>