Double Lo [Thu, 29 Sep 2022 01:25:26 +0000 (20:25 -0500)]
brcmfmac: fix CERT-P2P:5.1.10 failure
This patch fix CERT-P2P:5.1.10 failure at step 18 Group formation failed
due to chip is under dump survey. Decrease the dump survery duration to
pass this certification case.
Signed-off-by: Double Lo <double.lo@cypress.com>
Signed-off-by: Chi-hsien Lin <chi-hsien.lin@infineon.com>
Signed-off-by: Ian Lin <ian.lin@infineon.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220929012527.4152-4-ian.lin@infineon.com
Wright Feng [Thu, 29 Sep 2022 01:25:25 +0000 (20:25 -0500)]
brcmfmac: fix firmware trap while dumping obss stats
When doing dump_survey, host will call "dump_obss" iovar to firmware
side. Host need to make sure the HW clock in dongle is on, or there is
high probability that firmware gets trap because register or shared
memory access failed. To fix this, we disable mpc when doing dump obss
and set it back after that.
[28350.512799] brcmfmac: brcmf_dump_obss: dump_obss error (-52)
[28743.402314] ieee80211 phy0: brcmf_fw_crashed: Firmware has halted or
crashed
[28745.869430] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[28745.877546] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Signed-off-by: Wright Feng <wright.feng@cypress.com>
Signed-off-by: Chi-hsien Lin <chi-hsien.lin@infineon.com>
Signed-off-by: Ian Lin <ian.lin@infineon.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220929012527.4152-3-ian.lin@infineon.com
Wright Feng [Thu, 29 Sep 2022 01:25:24 +0000 (20:25 -0500)]
brcmfmac: Add dump_survey cfg80211 ops for HostApd AutoChannelSelection
To enable ACS feature in Hostap daemon, dump_survey cfg80211 ops and dump
obss survey command in firmware side are needed. This patch is for adding
dump_survey feature and adding DUMP_OBSS feature flag to check if
firmware supports dump_obss iovar.
Signed-off-by: Wright Feng <wright.feng@cypress.com>
Signed-off-by: Chi-hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Ian Lin <ian.lin@infineon.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220929012527.4152-2-ian.lin@infineon.com
Bitterblue Smith [Wed, 28 Sep 2022 20:36:51 +0000 (23:36 +0300)]
wifi: rtl8xxxu: gen2: Turn on the rate control
Re-enable the function rtl8xxxu_gen2_report_connect.
It informs the firmware when connecting to a network. This makes the
firmware enable the rate control, which makes the upload faster.
It also informs the firmware when disconnecting from a network. In the
past this made reconnecting impossible because it was sending the
auth on queue 0x7 (TXDESC_QUEUE_VO) instead of queue 0x12
(TXDESC_QUEUE_MGNT):
wlp0s20f0u3: send auth to 90:55:de:__:__:__ (try 1/3)
wlp0s20f0u3: send auth to 90:55:de:__:__:__ (try 2/3)
wlp0s20f0u3: send auth to 90:55:de:__:__:__ (try 3/3)
wlp0s20f0u3: authentication with 90:55:de:__:__:__ timed out
Probably the firmware disables the unnecessary TX queues when it
knows it's disconnected.
However, this was fixed in commit
edd5747aa12e ("wifi: rtl8xxxu: Fix
skb misuse in TX queue selection").
Fixes:
c59f13bbead4 ("rtl8xxxu: Work around issue with 8192eu and 8723bu devices not reconnecting")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/43200afc-0c65-ee72-48f8-231edd1df493@gmail.com
Bitterblue Smith [Wed, 28 Sep 2022 21:16:46 +0000 (00:16 +0300)]
wifi: rtl8xxxu: Support new chip RTL8188FU
This chip is found in the cheapest USB adapters, e.g. 1.17 USD with
VAT and shipping from China included.
It's a gen 2 chip, similar to the RTL8723BU, but without Bluetooth.
Features: 2.4 GHz, b/g/n mode, 1T1R, 150 Mbps.
The vendor driver rtl8188fu version 4.3.23.6_20964.
20170110 [0]
was used as reference. The CD shipped with the device includes a
newer driver, version 5.11.5-1-g12f7cde4b.
20201102, but that one
couldn't complete the WPA2 key exchange thing for whatever reason.
[0] https://github.com/kelebek333/rtl8188fu
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/b14f299d-3248-98fe-eee1-ba50d2e76c74@gmail.com
Ping-Ke Shih [Wed, 28 Sep 2022 08:43:36 +0000 (16:43 +0800)]
wifi: rtw89: 8852be: add 8852BE PCI entry
8852BE has two variants with different ID. One is 10ec:b852 that is a main
model with 2x2 antenna, and the other is 10ec:b85b that is a 1x1 model.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-10-pkshih@realtek.com
Ping-Ke Shih [Wed, 28 Sep 2022 08:43:35 +0000 (16:43 +0800)]
wifi: rtw89: 8852b: add chip_ops to read phy cap
This efuse region is to store PHY calibration, and it is a separated region
from the region that stores MAC address. Then, use these data to configure
via chip_ops::power_trim that is a calibration mechanism of TX power.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-9-pkshih@realtek.com
Ping-Ke Shih [Wed, 28 Sep 2022 08:43:34 +0000 (16:43 +0800)]
wifi: rtw89: 8852b: add chip_ops to read efuse
efuse stores individual data about a chip itself, such as MAC address,
country code, RF and crystal calibration data, and so on. Define a struct
to help access efuse content, and copy them into a common struct.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-8-pkshih@realtek.com
Ping-Ke Shih [Wed, 28 Sep 2022 08:43:33 +0000 (16:43 +0800)]
wifi: rtw89: 8852b: add chip_ops::set_txpwr
This chip_ops is to set TX power according to country, channel, rate and
so on. Since shared code is used to configure TX power, we only implement
specific part in this patch.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-7-pkshih@realtek.com
Zong-Zhe Yang [Wed, 28 Sep 2022 08:43:32 +0000 (16:43 +0800)]
wifi: rtw89: debug: txpwr_table considers sign
Previously, value of each field is just shown as unsigned.
Now, we start to show them with sign to make things more intuitive
during debugging.
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-6-pkshih@realtek.com
Zong-Zhe Yang [Wed, 28 Sep 2022 08:43:31 +0000 (16:43 +0800)]
wifi: rtw89: phy: make generic txpwr setting functions
Previously, we thought control registers or setting things for TX power
series may change according to chip. So, setting functions are implemented
chip by chip. However, until now, the functions keep the same among chips,
at least 8852A, 8852C, and 8852B. There is a sufficient number of chips to
share generic setting functions. So, we now remake them including TX power
by rate, TX power offset, TX power limit, and TX power limit RU as generic
ones in phy.c.
Besides, there are some code refinements in the generic ones, but almost
all of the logic doesn't change.
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-5-pkshih@realtek.com
Ping-Ke Shih [Wed, 28 Sep 2022 08:43:30 +0000 (16:43 +0800)]
wifi: rtw89: 8852b: add tables for RFK
These tables are used by RFK to assist to configure PHY and RF registers.
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-4-pkshih@realtek.com
Ping-Ke Shih [Wed, 28 Sep 2022 08:43:29 +0000 (16:43 +0800)]
wifi: rtw89: 8852b: add BB and RF tables (2 of 2)
These tables contain BB and RF parameters that driver will load them into
registers. It also contains TX power according to country, band, rate and
so on. Increasing thermal can cause TX power degraded, so power tracking
tables are defined to compensate TX power.
Internal version of these tables:
- HALRF_029_00_014 (R32)
- HALBB_027_046_05
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-3-pkshih@realtek.com
Ping-Ke Shih [Wed, 28 Sep 2022 08:43:28 +0000 (16:43 +0800)]
wifi: rtw89: 8852b: add BB and RF tables (1 of 2)
These tables contain BB and RF parameters that driver will load them into
registers. It also contains TX power according to country, band, rate and
so on. Increasing thermal can cause TX power degraded, so power tracking
tables are defined to compensate TX power.
Internal version of these tables:
- HALRF_029_00_014 (R32)
- HALBB_027_046_05
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220928084336.34981-2-pkshih@realtek.com
Jakub Kicinski [Fri, 30 Sep 2022 17:07:30 +0000 (10:07 -0700)]
Merge tag 'wireless-next-2022-09-30' of git://git./linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.1
Few stack changes and lots of driver changes in this round. brcmfmac
has more activity as usual and it gets new hardware support. ath11k
improves WCN6750 support and also other smaller features. And of
course changes all over.
Note: in early September wireless tree was merged to wireless-next to
avoid some conflicts with mac80211 patches, this shouldn't cause any
problems but wanted to mention anyway.
Major changes:
mac80211
- refactoring and preparation for Wi-Fi 7 Multi-Link Operation (MLO)
feature continues
brcmfmac
- support CYW43439 SDIO chipset
- support BCM4378 on Apple platforms
- support CYW89459 PCIe chipset
rtw89
- more work to get rtw8852c supported
- P2P support
- support for enabling and disabling MSDU aggregation via nl80211
mt76
- tx status reporting improvements
ath11k
- cold boot calibration support on WCN6750
- Target Wake Time (TWT) debugfs support for STA interface
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- implement SRAM dump debugfs interface
- enable threaded NAPI on all hardware
- WoW support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
wcn36xx
- add SNR from a received frame as a source of system entropy
* tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (231 commits)
wifi: rtl8xxxu: Improve rtl8xxxu_queue_select
wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
wifi: rtl8xxxu: gen2: Enable 40 MHz channel width
wifi: rtw89: 8852b: configure DLE mem
wifi: rtw89: check DLE FIFO size with reserved size
wifi: rtw89: mac: correct register of report IMR
wifi: rtw89: pci: set power cut closed for 8852be
wifi: rtw89: pci: add to do PCI auto calibration
wifi: rtw89: 8852b: implement chip_ops::{enable,disable}_bb_rf
wifi: rtw89: add DMA busy checking bits to chip info
wifi: rtw89: mac: define DMA channel mask to avoid unsupported channels
wifi: rtw89: pci: mask out unsupported TX channels
iwlegacy: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
ipw2x00: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
wifi: iwlwifi: Track scan_cmd allocation size explicitly
brcmfmac: Remove the call to "dtim_assoc" IOVAR
brcmfmac: increase dcmd maximum buffer size
brcmfmac: Support 89459 pcie
brcmfmac: increase default max WOWL patterns to 16
cw1200: fix incorrect check to determine if no element is found in list
...
====================
Link: https://lore.kernel.org/r/20220930150413.A7984C433D6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 30 Sep 2022 14:55:55 +0000 (07:55 -0700)]
Merge branch 'mlx5-xsk-updates-part2-2022-09-28'
Saeed Mahameed says:
====================
mlx5 xsk updates part2 2022-09-28
XSK buffer improvements, This is part #2 of 4 parts series.
1) Expose xsk min chunk size to drivers, to allow the driver to adjust to a
better buffer stride size
2) Adjust MTT page size to the XSK frame size, to avoid umem overrun in
certain situations.
3) Use xsk frame size as the striding RQ page size for XSK RQs
4) KSM for unaligned XSK, KSM allows arbitrary buffer chunk lengths
registration in HW, which makes more sense for unaligned XSK.
4) More cleanups and optimizations in preparation for next improvements
in part3
part 1: https://lore.kernel.org/netdev/
20220927203611.244301-1-saeed@kernel.org/
====================
Link: https://lore.kernel.org/r/20220929072156.93299-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:56 +0000 (00:21 -0700)]
net/mlx5e: Clean up and fix error flows in mlx5e_alloc_rq
Although mlx5e_rq_free_shampo can be called unconditionally, it belongs
to case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ. Move it there to allow to
add more init/cleanup actions to the striding RQ case.
If xdp_rxq_info_reg_mem_model fails, don't forget to destroy the page
pool.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:55 +0000 (00:21 -0700)]
net/mlx5e: Move repeating clear_bit in mlx5e_rx_reporter_err_rq_cqe_recover
The same clear_bit is called in both error and success flows. Move the
call to do it only once and remove the out label.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:54 +0000 (00:21 -0700)]
net/mlx5e: Split out channel (de)activation in rx_res
To decrease the nesting level and reduce duplication of code, create
functions to redirect direct RQTs to the actual RQs or drop_rq, which
are used in the activation and deactivation flows of channels.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:53 +0000 (00:21 -0700)]
net/mlx5e: xsk: Remove mlx5e_xsk_page_alloc_pool
mlx5e_xsk_page_alloc_pool became a thin wrapper around xsk_buff_alloc.
Drop it and call xsk_buff_alloc directly.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:52 +0000 (00:21 -0700)]
net/mlx5e: Convert struct mlx5e_alloc_unit to a union
struct mlx5e_alloc_unit consists of a single union. Convert it to a
union itself to simplify casting it to struct xdp_buff *, which will be
used to implement XSK batching on striding RQ.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:51 +0000 (00:21 -0700)]
net/mlx5e: Remove DMA address from mlx5e_alloc_unit
mlx5e_alloc_unit stores the DMA address and a pointer to either struct
page (regular RQ) or struct xdp_buff (XSK RQ). This DMA address is
redundant, because when a page or an XSK frame is allocated, the same
address is also stored there. Some flows take the address from struct
mlx5e_alloc_unit, and some take it from struct page or xdp_buff.
This commit removes the address from struct mlx5e_alloc_unit, which
makes it twice as small and improves locality (this struct is used in an
array), also saving on unnecessary stores to the addr field. Almost all
flows know unambiguously whether the DMA address should be taken from
page or from xdp_buff. The exception is the allocation flows, where a
new branch appeared, which will be optimized out in the next commits.
struct mlx5e_alloc_unit used to be called mlx5e_dma_info.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:50 +0000 (00:21 -0700)]
net/mlx5e: Rename mlx5e_dma_info to prepare for removal of DMA address
The next commit will remove the DMA address from the struct currently
called mlx5e_dma_info, because the same value can be retrieved with
page_pool_get_dma_addr(page) in almost all cases, with the notable
exception of SHAMPO (HW GRO implementation) that modifies this address
on the fly, after the initial allocation.
To keep the SHAMPO logic intact, struct mlx5e_dma_info remains in the
SHAMPO code, consisting of addr and page (XSK is not compatible with
SHAMPO). The struct used in all other places is renamed to
mlx5e_alloc_unit, allowing the next commit to remove the addr field
without affecting SHAMPO.
The new name means "allocation unit", and it's more appropriate after
the field with the DMA address gets removed.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:49 +0000 (00:21 -0700)]
net/mlx5e: Optimize the page cache reducing its size 2x
RX page cache stores dma_info structs, that consist of a pointer to
struct page and a DMA address. In fact, the DMA address is extracted
from struct page using page_pool_get_dma_addr when a page is pushed to
the cache. By moving this call to the point when a page is popped from
the cache, we can avoid storing the DMA address in the cache,
effectively reducing its size by two times without losing any
functionality.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:48 +0000 (00:21 -0700)]
net/mlx5e: Fix calculations for ICOSQ size
WQEs must not cross page boundaries, they are padded with NOPs if they
don't fit the page. mlx5e_mpwrq_total_umr_wqebbs doesn't take into
account this padding, risking reserving not enough space.
The padding is not straightforward to add to this calculation, because
WQEs of different sizes may be mixed together in the queue. If each page
ends with a big WQE that doesn't fit and requires at most its size minus
1 WQEBB of padding, the total space can be much bigger than in case when
smaller WQEs take advantage of this padding.
Replace the wrong exact calculation by the following estimation. Each
padding can be at most the size of the maximum WQE used in the queue
minus one WQEBB. Let's call the rest of the page "useful space". If we
divide the total size of all needed WQEs by this useful space, rounding
up, we'll get the number of pages, which is enough to contain all these
WQEs. It's correct, because every WQE that appeared on the boundary
between two blocks of useful space would start in the useful space of
one page and end in the padding of the same page, while our estimation
reserved space for its tail in the next space, making the estimation not
smaller than the real space occupied in the queue.
The code actually uses a looser estimation: instead of taking the
maximum size of all used WQE types minus 1 WQEBB, it takes the maximum
hardware size of a WQE. It's made for simplicity and extensibility.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:47 +0000 (00:21 -0700)]
xsk: Remove unused xsk_buff_discard
The previous commit removed the last usage of xsk_buff_discard in mlx5e,
so the function that is no longer used can be removed.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
CC: "Björn Töpel" <bjorn@kernel.org>
CC: Magnus Karlsson <magnus.karlsson@intel.com>
CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:46 +0000 (00:21 -0700)]
net/mlx5e: xsk: Use KSM for unaligned XSK
UMR MTTs used in striding RQ have certain alignment requirements. While
it's guaranteed to work when UMR pages are aligned to the UMR page size,
in practice it works then UMR pages are aligned to 8 bytes. However,
it's still not enough flexibility for the unaligned mode of XSK. This
patch leverages KSM to map UMR pages without alignment requirements,
when unaligned XSK is active. The downside is that KSM entries are twice
as big as MTTs, which limits the maximum WQE size, so regular RQs and
aligned XSK continue using MTTs.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:45 +0000 (00:21 -0700)]
net/mlx5: Add MLX5_FLEXIBLE_INLEN to safely calculate cmd inlen
Some commands use a flexible array after a common header. Add a macro to
safely calculate the total input length of the command, detecting
overflows and printing errors with specific values when such overflows
happen.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:44 +0000 (00:21 -0700)]
net/mlx5e: Keep a separate MKey for striding RQ
Currently, rq->mkey_be keeps a big-endian value of either the PA MKey
(for legacy RQ, no address translation) or MTT MKey (for striding RQ,
direct address translation). Striding RQ stores the same value in
rq->umr_mkey in the native endianness.
The next commit will make striding RQ use KSM MKey (indirect address
translation) for the unaligned mode of XSK, which will require storing
both KSM MKey and PA MKey in the RQ struct. This commit optimizes fields
of mlx5e_rq: umr_mkey is removed (it's redundant), mkey_be always points
to the PA MKey, and mpwqe.umr_mkey_be points to the MTT MKey (or to the
KSM MKey, starting from the next commit).
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:43 +0000 (00:21 -0700)]
net/mlx5e: xsk: Use XSK frame size as striding RQ page size
XSK RQs support striding RQ linear mode, but the stride size is always
set to PAGE_SIZE. It may be larger than the XSK frame size,
unnecessarily reducing the useful space in a WQE, but more importantly
causing UMEM data corruption in certain cases.
Normally, stride size bigger than XSK frame size is not a problem if the
hardware enforces the MTU. However, traffic between vports skips the
hardware MTU check, and oversized packets may be received.
If an oversized packet is bigger than the XSK frame but not bigger than
the stride, it will cause overwriting of the adjacent UMEM region. If
the packet takes more than one stride, they can be recycled for reuse
so it's not a problem when the XSK frame size matches the stride size.
To reduce the impact of the above issue, attempt to use the MTT page
size for striding RQ that matches the XSK frame size, allowing to safely
use 2048-byte frames on an up-to-date firmware.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:42 +0000 (00:21 -0700)]
net/mlx5e: Use runtime page_shift for striding RQ
This commit allows striding RQ to determine MTT page size at runtime,
instead of sticking to the compile-time PAGE_SIZE. This functionality
will be used by a following commit that adjusts the MTT page size to the
XSK frame size.
Stick with PAGE_SIZE for XSK on legacy RQ, as frag_stride is not used in
data path, it only helps calculate how pages are partitioned into
fragments, and PAGE_SIZE will ensure each fragment starts at the
beginning of a new allocation unit (XSK frame).
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Thu, 29 Sep 2022 07:21:41 +0000 (00:21 -0700)]
xsk: Expose min chunk size to drivers
Drivers should be aware of the range of valid UMEM chunk sizes to be
able to allocate their internal structures of an appropriate size. It
will be used by mlx5e in the following patches.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
CC: "Björn Töpel" <bjorn@kernel.org>
CC: Magnus Karlsson <magnus.karlsson@intel.com>
CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hongbin Wang [Thu, 29 Sep 2022 06:12:05 +0000 (02:12 -0400)]
ip6_vti:Remove the space before the comma
There should be no space before the comma
Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 30 Sep 2022 12:04:23 +0000 (13:04 +0100)]
Merge branch 'Mediatek-mt8188'
Jianguo Zhang says:
====================
Mediatek ethernet patches for mt8188
Changes in v7:
v7:
1) Add 'Reviewed-by: AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com>' info in commit message of
patch 'dt-bindings: net: snps,dwmac: add new property snps,clk-csr',
'arm64: dts: mediatek: mt2712e: Update the name of property 'clk_csr''
and 'net: stmmac: add a parse for new property 'snps,clk-csr''.
v6:
1) Update commit message of patch 'dt-bindings: net: snps,dwmac: add new property snps,clk-csr'
2) Add a parse for new property 'snps,clk-csr' in patch
'net: stmmac: add a parse for new property 'snps,clk-csr''
v5:
1) Rename the property 'clk_csr' as 'snps,clk-csr' in binding
file as Krzysztof Kozlowski'comment.
2) Add DTS patch 'arm64: dts: mediatek: mt2712e: Update the name of property 'clk_csr''
as Krzysztof Kozlowski'comment.
3) Add driver patch 'net: stmmac: Update the name of property 'clk_csr''
as Krzysztof Kozlowski'comment.
v4:
1) Update the commit message of patch 'dt-bindings: net: snps,dwmac: add clk_csr property'
as Krzysztof Kozlowski'comment.
v3:
1) List the names of SoCs mt8188 and mt8195 in correct order as
AngeloGioacchino Del Regno's comment.
2) Add patch version info as Krzysztof Kozlowski'comment.
v2:
1) Delete patch 'stmmac: dwmac-mediatek: add support for mt8188' as
Krzysztof Kozlowski's comment.
2) Update patch 'dt-bindings: net: mediatek-dwmac: add support for
mt8188' as Krzysztof Kozlowski's comment.
3) Add clk_csr property to fix warning ('clk_csr' was unexpected) when
runnig 'make dtbs_check'.
v1:
1) Add ethernet driver entry for mt8188.
2) Add binding document for ethernet on mt8188.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianguo Zhang [Thu, 29 Sep 2022 01:47:58 +0000 (09:47 +0800)]
net: stmmac: add a parse for new property 'snps,clk-csr'
Parse new property 'snps,clk-csr' firstly because the new property
is documented in binding file, if failed, fall back to old property
'clk_csr' for legacy case
Signed-off-by: Jianguo Zhang <jianguo.zhang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianguo Zhang [Thu, 29 Sep 2022 01:47:57 +0000 (09:47 +0800)]
arm64: dts: mediatek: mt2712e: Update the name of property 'clk_csr'
Update the name of property 'clk_csr' as 'snps,clk-csr' to align with
the property name in the binding file.
Signed-off-by: Jianguo Zhang <jianguo.zhang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianguo Zhang [Thu, 29 Sep 2022 01:47:56 +0000 (09:47 +0800)]
dt-bindings: net: snps,dwmac: add new property snps,clk-csr
Add description for new property snps,clk-csr in binding file
Signed-off-by: Jianguo Zhang <jianguo.zhang@mediatek.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianguo Zhang [Thu, 29 Sep 2022 01:47:55 +0000 (09:47 +0800)]
dt-bindings: net: mediatek-dwmac: add support for mt8188
Add binding document for the ethernet on mt8188
Signed-off-by: Jianguo Zhang <jianguo.zhang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 28 Sep 2022 22:07:55 +0000 (23:07 +0100)]
net/mlx5: Fix spelling mistake "syndrom" -> "syndrome"
There is a spelling mistake in a devlink_health_report message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 28 Sep 2022 21:54:39 +0000 (22:54 +0100)]
net: bna: Fix spelling mistake "muliple" -> "multiple"
There is a spelling mistake in a literal string in the array
bnad_net_stats_strings. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Child [Wed, 28 Sep 2022 21:43:50 +0000 (16:43 -0500)]
ibmveth: Ethtool set queue support
Implement channel management functions to allow dynamic addition and
removal of transmit queues. The `ethtool --show-channels` and
`ethtool --set-channels` commands can be used to get and set the
number of queues, respectively. Allow the ability to add as many
transmit queues as available processors but never allow more than the
hard maximum of 16. The number of receive queues is one and cannot be
modified.
Depending on whether the requested number of queues is larger or
smaller than the current value, either allocate or free long term
buffers. Since long term buffer construction and destruction can
occur in two different areas, from either channel set requests or
device open/close, define functions for performing this work. If
allocation of a new buffer fails, then attempt to revert back to the
previous number of queues.
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Child [Wed, 28 Sep 2022 21:43:49 +0000 (16:43 -0500)]
ibmveth: Implement multi queue on xmit
The `ndo_start_xmit` function is protected by a spinlock on the tx queue
being used to transmit the skb. Allow concurrent calls to
`ndo_start_xmit` by using more than one tx queue. This allows for
greater throughput when several jobs are trying to transmit data.
Introduce 16 tx queues (leave single rx queue as is) which each
correspond to one DMA mapped long term buffer.
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Child [Wed, 28 Sep 2022 21:43:48 +0000 (16:43 -0500)]
ibmveth: Copy tx skbs into a premapped buffer
Rather than DMA mapping and unmapping every outgoing skb, copy the skb
into a buffer that was mapped during the drivers open function. Copying
the skb and its frags have proven to be more time efficient than
mapping and unmapping. As an effect, performance increases by 3-5
Gbits/s.
Allocate and DMA map one continuous 64KB buffer at `ndo_open`. This
buffer is maintained until `ibmveth_close` is called. This buffer is
large enough to hold the largest possible linnear skb. During
`ndo_start_xmit`, copy the skb and all of it's frags into the continuous
buffer. By manually linnearizing all the socket buffers, time is saved
during memcpy as well as more efficient handling in FW.
As a result, we no longer need to worry about the firmware limitation
of handling a max of 6 frags. So, we only need to maintain 1 descriptor
instead of 6 and can hardcode 0 for the other 5 descriptors during
h_send_logical_lan.
Since, DMA allocation/mapping issues can no longer arise in xmit
functions, we can further reduce code size by removing the need for a
bounce buffer on DMA errors.
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 28 Sep 2022 21:37:53 +0000 (22:37 +0100)]
bnx2: Fix spelling mistake "bufferred" -> "buffered"
There are spelling mistakes in two literal strings. Fix these.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Hsieh [Wed, 28 Sep 2022 17:57:58 +0000 (10:57 -0700)]
net: bridge: assign path_cost for 2.5G and 5G link speed
As 2.5G, 5G ethernet ports are more common and affordable,
these ports are being used in LAN bridge devices.
STP port_cost() is missing path_cost assignment for these link speeds,
causes highest cost 100 being used.
This result in lower speed port being picked
when there is loop between 5G and 1G ports.
Original path_cost: 10G=2, 1G=4, 100m=19, 10m=100
Adjusted path_cost: 10G=2, 5G=3, 2.5G=4, 1G=5, 100m=19, 10m=100
speed greater than 10G = 1
Signed-off-by: Steven Hsieh <steven.hsieh@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 28 Sep 2022 14:36:18 +0000 (15:36 +0100)]
net: lan966x: Fix spelling mistake "tarffic" -> "traffic"
There is a spelling mistake in a netdev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Gobert [Wed, 28 Sep 2022 12:55:31 +0000 (14:55 +0200)]
net-next: skbuff: refactor pskb_pull
pskb_may_pull already contains all of the checks performed by
pskb_pull.
Use pskb_may_pull for validation in pskb_pull, eliminating the
duplication and making __pskb_pull obsolete.
Replace __pskb_pull with pskb_pull where applicable.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Yufen [Wed, 28 Sep 2022 12:30:14 +0000 (20:30 +0800)]
net: bonding: Convert to use sysfs_emit()/sysfs_emit_at() APIs
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Yufen [Wed, 28 Sep 2022 11:49:44 +0000 (19:49 +0800)]
net-sysfs: Convert to use sysfs_emit() APIs
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Yufen [Wed, 28 Sep 2022 11:49:43 +0000 (19:49 +0800)]
net: tun: Convert to use sysfs_emit() APIs
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 30 Sep 2022 10:32:54 +0000 (11:32 +0100)]
Merge branch 'net-tsnep-multiqueue'
Gerhard Engleder says:
====================
tsnep: multi queue support and some other improvements
Add support for additional TX/RX queues along with RX flow classification
support.
Binding is extended to allow additional interrupts for additional TX/RX
queues. Also dma-coherent is allowed as minor improvement.
RX path optimisation is done by using page pool as preparations for future
XDP support.
v4:
- rework dma-coherent commit message (Krzysztof Kozlowski)
- fixed order of interrupt-names in binding (Krzysztof Kozlowski)
- add line break between examples in binding (Krzysztof Kozlowski)
- add RX_CLS_LOC_ANY support to RX flow classification
v3:
- now with changes in cover letter
v2:
- use netdev_name() (Jakub Kicinski)
- use ENOENT if RX flow rule is not found (Jakub Kicinski)
- eliminate return code of tsnep_add_rule() (Jakub Kicinski)
- remove commit with lazy refill due to depletion problem (Jakub Kicinski)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerhard Engleder [Tue, 27 Sep 2022 19:58:42 +0000 (21:58 +0200)]
tsnep: Use page pool for RX
Use page pool for RX buffer handling. Makes RX path more efficient and
is required prework for future XDP support.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerhard Engleder [Tue, 27 Sep 2022 19:58:41 +0000 (21:58 +0200)]
tsnep: Add EtherType RX flow classification support
Received Ethernet frames are assigned to first RX queue per default.
Based on EtherType Ethernet frames can be assigned to other RX queues.
This enables processing of real-time Ethernet protocols on dedicated
RX queues.
Add RX flow classification interface for EtherType based RX queue
assignment.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerhard Engleder [Tue, 27 Sep 2022 19:58:40 +0000 (21:58 +0200)]
tsnep: Support multiple TX/RX queue pairs
Support additional TX/RX queue pairs if dedicated interrupt is
available. Interrupts are detected by name in device tree.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerhard Engleder [Tue, 27 Sep 2022 19:58:39 +0000 (21:58 +0200)]
tsnep: Move interrupt from device to queue
For multiple queues multiple interrupts shall be used. Therefore, rework
global interrupt to per queue interrupt.
Every interrupt name shall contain interface name and queue information.
To get a valid interface name, the interrupt request needs to by done
during open like in other drivers. Additionally, this allows the removal
of some initialisation checks in the interrupt handler.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerhard Engleder [Tue, 27 Sep 2022 19:58:38 +0000 (21:58 +0200)]
dt-bindings: net: tsnep: Allow additional interrupts
Additional TX/RX queue pairs require dedicated interrupts. Extend
binding with additional interrupts.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gerhard Engleder [Tue, 27 Sep 2022 19:58:37 +0000 (21:58 +0200)]
dt-bindings: net: tsnep: Allow dma-coherent
Within SoCs like ZynqMP, FPGA logic can be connected to different kinds
of AXI master ports. Also cache coherent AXI master ports are available.
The property "dma-coherent" is used to signal that DMA is cache
coherent.
Add "dma-coherent" property to allow the configuration of cache coherent
DMA.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 30 Sep 2022 02:29:02 +0000 (19:29 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-09-28 (ice)
Arkadiusz implements a single pin initialization function, checking feature
bits, instead of having separate device functions and updates sub-device
IDs for recognizing E810T devices.
Martyna adds support for switchdev filters on VLAN priority field.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Add support for VLAN priority filters in switchdev
ice: support features on new E810T variants
ice: Merge pin initialization of E810 and E810T adapters
====================
Link: https://lore.kernel.org/r/20220928203217.411078-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wang Yufen [Wed, 28 Sep 2022 11:34:20 +0000 (19:34 +0800)]
net: phy: Convert to use sysfs_emit() APIs
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1664364860-29153-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 30 Sep 2022 01:52:06 +0000 (18:52 -0700)]
Merge branch 'add-tc-taprio-support-for-queuemaxsdu'
Vladimir Oltean says:
====================
Add tc-taprio support for queueMaxSDU
The tc-taprio offload mode supported by the Felix DSA driver has
limitations surrounding its guard bands.
The initial discussion was at:
https://lore.kernel.org/netdev/
c7618025da6723418c56a54fe4683bd7@walle.cc/
with the latest status being that we now have a vsc9959_tas_guard_bands_update()
method which makes a best-guess attempt at how much useful space to
reserve for packet scheduling in a taprio interval, and how much to
reserve for guard bands.
IEEE 802.1Q actually does offer a tunable variable (queueMaxSDU) which
can determine the max MTU supported per traffic class. In turn we can
determine the size we need for the guard bands, depending on the
queueMaxSDU. This way we can make the guard band of small taprio
intervals smaller than one full MTU worth of transmission time, if we
know that said traffic class will transport only smaller packets.
As discussed with Gerhard Engleder, the queueMaxSDU may also be useful
in limiting the latency on an endpoint, if some of the TX queues are
outside of the control of the Linux driver.
https://patchwork.kernel.org/project/netdevbpf/patch/
20220914153303.1792444-11-vladimir.oltean@nxp.com/
Allow input of queueMaxSDU through netlink into tc-taprio, offload it to
the hardware I have access to (LS1028A), and (implicitly) deny
non-default values to everyone else. Kurt Kanzenbach has also kindly
tested and shared a patch to offload this to hellcreek.
v3 at:
https://patchwork.kernel.org/project/netdevbpf/cover/
20220927234746.1823648-1-vladimir.oltean@nxp.com/
v2 at:
https://patchwork.kernel.org/project/netdevbpf/list/?series=679954&state=*
v1 at:
https://patchwork.kernel.org/project/netdevbpf/cover/
20220914153303.1792444-1-vladimir.oltean@nxp.com/
====================
Link: https://lore.kernel.org/r/20220928095204.2093716-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 28 Sep 2022 09:52:04 +0000 (12:52 +0300)]
net: enetc: offload per-tc max SDU from tc-taprio
The driver currently sets the PTCMSDUR register statically to the max
MTU supported by the interface. Keep this logic if tc-taprio is absent
or if the max_sdu for a traffic class is 0, and follow the requested max
SDU size otherwise.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 28 Sep 2022 09:52:03 +0000 (12:52 +0300)]
net: enetc: use common naming scheme for PTGCR and PTGCAPR registers
The Port Time Gating Control Register (PTGCR) and Port Time Gating
Capability Register (PTGCAPR) have definitions in the driver which
aren't in line with the other registers. Rename these.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 28 Sep 2022 09:52:02 +0000 (12:52 +0300)]
net: enetc: cache accesses to &priv->si->hw
The &priv->si->hw construct dereferences 2 pointers and makes lines
longer than they need to be, in turn making the code harder to read.
Replace &priv->si->hw accesses with a "hw" variable when there are 2 or
more accesses within a function that dereference this. This includes
loops, since &priv->si->hw is a loop invariant.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kurt Kanzenbach [Wed, 28 Sep 2022 09:52:01 +0000 (12:52 +0300)]
net: dsa: hellcreek: Offload per-tc max SDU from tc-taprio
Add support for configuring the max SDU per priority and per port. If not
specified, keep the default.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 28 Sep 2022 09:52:00 +0000 (12:52 +0300)]
net: dsa: hellcreek: refactor hellcreek_port_setup_tc() to use switch/case
The following patch will need to make this function also respond to
TC_QUERY_BASE, so make the processing more structured around the
tc_setup_type.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 28 Sep 2022 09:51:59 +0000 (12:51 +0300)]
net: dsa: felix: offload per-tc max SDU from tc-taprio
Our current vsc9959_tas_guard_bands_update() algorithm has a limitation
imposed by the hardware design. To avoid packet overruns between one
gate interval and the next (which would add jitter for scheduled traffic
in the next gate), we configure the switch to use guard bands. These are
as large as the largest packet which is possible to be transmitted.
The problem is that at tc-taprio intervals of sizes comparable to a
guard band, there isn't an obvious place in which to split the interval
between the useful portion (for scheduling) and the guard band portion
(where scheduling is blocked).
For example, a 10 us interval at 1Gbps allows 1225 octets to be
transmitted. We currently split the interval between the bare minimum of
33 ns useful time (required to schedule a single packet) and the rest as
guard band.
But 33 ns of useful scheduling time will only allow a single packet to
be sent, be that packet 1200 octets in size, or 60 octets in size. It is
impossible to send 2 60 octets frames in the 10 us window. Except that
if we reduced the guard band (and therefore the maximum allowable SDU
size) to 5 us, the useful time for scheduling is now also 5 us, so more
packets could be scheduled.
The hardware inflexibility of not scheduling according to individual
packet lengths must unfortunately propagate to the user, who needs to
tune the queueMaxSDU values if he wants to fit more small packets into a
10 us interval, rather than one large packet.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 28 Sep 2022 09:51:58 +0000 (12:51 +0300)]
net/sched: taprio: allow user input of per-tc max SDU
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data
types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the
existence of a per traffic class limitation of maximum frame sizes, with
a fallback on the port-based MTU.
As far as I am able to understand, the 802.1Q Service Data Unit (SDU)
represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding
any number of prepended VLAN headers which may be otherwise present in
the MSDU. Therefore, the queueMaxSDU is directly comparable to the
device MTU (1500 means L2 payload sizes are accepted, or frame sizes of
1518 octets, or 1522 plus one VLAN header). Drivers which offload this
are directly responsible of translating into other units of measurement.
To keep the fast path checks optimized, we keep 2 arrays in the qdisc,
one for max_sdu translated into frame length (so that it's comparable to
skb->len), and another for offloading and for dumping back to the user.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 28 Sep 2022 09:51:57 +0000 (12:51 +0300)]
net/sched: query offload capabilities through ndo_setup_tc()
When adding optional new features to Qdisc offloads, existing drivers
must reject the new configuration until they are coded up to act on it.
Since modifying all drivers in lockstep with the changes in the Qdisc
can create problems of its own, it would be nice if there existed an
automatic opt-in mechanism for offloading optional features.
Jakub proposes that we multiplex one more kind of call through
ndo_setup_tc(): one where the driver populates a Qdisc-specific
capability structure.
First user will be taprio in further changes. Here we are introducing
the definitions for the base functionality.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220923163310.3192733-3-vladimir.oltean@nxp.com/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yuan Can [Wed, 28 Sep 2022 08:56:36 +0000 (08:56 +0000)]
net/tipc: Remove unused struct distr_queue_item
After commit
09b5678c778f("tipc: remove dead code in tipc_net and relatives"),
struct distr_queue_item is not used any more and can be removed as well.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Link: https://lore.kernel.org/r/20220928085636.71749-1-yuancan@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 28 Sep 2022 08:43:09 +0000 (10:43 +0200)]
net: skb: introduce and use a single page frag cache
After commit
3226b158e67c ("net: avoid 32 x truesize under-estimation
for tiny skbs") we are observing 10-20% regressions in performance
tests with small packets. The perf trace points to high pressure on
the slab allocator.
This change tries to improve the allocation schema for small packets
using an idea originally suggested by Eric: a new per CPU page frag is
introduced and used in __napi_alloc_skb to cope with small allocation
requests.
To ensure that the above does not lead to excessive truesize
underestimation, the frag size for small allocation is inflated to 1K
and all the above is restricted to build with 4K page size.
Note that we need to update accordingly the run-time check introduced
with commit
fd9ea57f4e95 ("net: add napi_get_frags_check() helper").
Alex suggested a smart page refcount schema to reduce the number
of atomic operations and deal properly with pfmemalloc pages.
Under small packet UDP flood, I measure a 15% peak tput increases.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Alexander H Duyck <alexanderduyck@fb.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Tue, 27 Sep 2022 15:37:01 +0000 (08:37 -0700)]
net: sched: cls_u32: Avoid memcpy() false-positive warning
To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), use unsafe_memcpy(), as the sizing,
bounds-checking, and allocation are all very tightly coupled here.
This silences the false-positive reported by syzbot:
memcpy: detected field-spanning write (size 80) of single field "&n->sel" at net/sched/cls_u32.c:1043 (size 16)
[1] https://lore.kernel.org/linux-hardening/
20220901065914.1417829-2-keescook@chromium.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reported-by: syzbot+a2c4601efc75848ba321@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/000000000000a96c0b05e97f0444@google.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220927153700.3071688-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Vasut [Tue, 27 Sep 2022 01:24:49 +0000 (03:24 +0200)]
dt-bindings: net: snps,dwmac: Document stmmac-axi-config subnode
The stmmac-axi-config subnode is present in multiple dwmac instance DTs,
document its content per snps,axi-config property description which is
a phandle to this subnode.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220927012449.698915-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 27 Sep 2022 21:23:06 +0000 (14:23 -0700)]
docs: netlink: clarify the historical baggage of Netlink flags
nlmsg_flags are full of historical baggage, inconsistencies and
strangeness. Try to document it more thoroughly. Explain the meaning
of the ECHO flag (and while at it clarify the comment in the uAPI).
Handwave a little about the NEW request flags and how they make
sense on the surface but cater to really old paradigm before commands
were a thing.
I will add more notes on how to make use of ECHO and discouragement
for reuse of flags to the kernel-side documentation.
Link: https://lore.kernel.org/r/20220927212306.823862-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 29 Sep 2022 21:30:51 +0000 (14:30 -0700)]
Merge git://git./linux/kernel/git/netdev/net
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 29 Sep 2022 15:32:53 +0000 (08:32 -0700)]
Merge tag 'net-6.0-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wifi and can.
Current release - regressions:
- phy: don't WARN for PHY_UP state in mdio_bus_phy_resume()
- wifi: fix locking in mac80211 mlme
- eth:
- revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
- mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
Previous releases - regressions:
- wifi: fix regression with non-QoS drivers
Previous releases - always broken:
- mptcp: fix unreleased socket in accept queue
- wifi:
- don't start TX with fq->lock to fix deadlock
- fix memory corruption in minstrel_ht_update_rates()
- eth:
- macb: fix ZynqMP SGMII non-wakeup source resume failure
- mt7531: only do PLL once after the reset
- usbnet: fix memory leak in usbnet_disconnect()
Misc:
- usb: qmi_wwan: add new usb-id for Dell branded EM7455"
* tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
mptcp: fix unreleased socket in accept queue
mptcp: factor out __mptcp_close() without socket lock
net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2}
net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge
can: c_can: don't cache TX messages for C_CAN cores
ice: xsk: drop power of 2 ring size restriction for AF_XDP
ice: xsk: change batched Tx descriptor cleaning
net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
selftests: Fix the if conditions of in test_extra_filter()
net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
net: stmmac: power up/down serdes in stmmac_open/release
wifi: mac80211: mlme: Fix double unlock on assoc success handling
wifi: mac80211: mlme: Fix missing unlock on beacon RX
wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
wifi: mac80211: fix regression with non-QoS drivers
wifi: mac80211: ensure vif queues are operational after start
wifi: mac80211: don't start TX with fq->lock to fix deadlock
wifi: cfg80211: fix MCS divisor value
net: hippi: Add missing pci_disable_device() in rr_init_one()
net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
...
Linus Torvalds [Thu, 29 Sep 2022 15:22:53 +0000 (08:22 -0700)]
Merge tag 'input-for-v6.0-rc7' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- small fixes for iqs62x-keys and melfas_mip4 drivers
- corrected register address in snvs_pwrkey driver
- Synaptic driver will stop trying to use intertouch (native) mode on
some Lenovo AMD devices
* tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
Input: synaptics - disable Intertouch for Lenovo T14 and P14s AMD G1
Input: iqs62x-keys - drop unused device node references
Input: melfas_mip4 - fix return value check in mip4_probe()
Linus Torvalds [Thu, 29 Sep 2022 12:40:59 +0000 (05:40 -0700)]
Merge tag 'ata-6.0-rc7' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
"Three late patches to fix problems discovered recently:
- Add a horkage to disable link power management by default for the
Pioneer BDR-207M and BDR-205 DVD drives (from Niklas)
- Two patches to fix setting the maximum queue depth of libsas owned
ATA devices (from me)"
* tag 'ata-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-sata: Fix device queue depth control
ata: libata-scsi: Fix initialization of device queue depth
libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
Linus Torvalds [Thu, 29 Sep 2022 12:35:32 +0000 (05:35 -0700)]
Merge tag 'loongarch-fixes-6.0-3' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Some trivial fixes and cleanup"
* tag 'loongarch-fixes-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Clean up loongson3_smp_ops declaration
LoongArch: Fix and cleanup csr_era handling in do_ri()
LoongArch: Align the address of kernel_entry to 4KB
ruanjinjie [Wed, 28 Sep 2022 03:17:08 +0000 (11:17 +0800)]
net: cpmac: Add __init/__exit annotations to module init/exit funcs
Add __init/__exit annotations to module init/exit funcs
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220928031708.89120-1-ruanjinjie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yang Yingliang [Tue, 27 Sep 2022 15:14:06 +0000 (23:14 +0800)]
ethernet: 8390: remove unnecessary check of mem
The 'mem' returned by platform_get_resource() has been checked in probe
function, so it is no need do this check in remove function.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220927151406.797800-1-yangyingliang@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shang XiaoJing [Tue, 27 Sep 2022 14:18:35 +0000 (22:18 +0800)]
nfp: Use skb_put_data() instead of skb_put/memcpy pair
Use skb_put_data() instead of skb_put() and memcpy(), which is clear.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Link: https://lore.kernel.org/r/20220927141835.19221-1-shangxiaojing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yuan Can [Tue, 27 Sep 2022 13:39:40 +0000 (13:39 +0000)]
net: liquidio: Remove unused struct lio_trusted_vf_ctx
After commit
6870957ed5bc("liquidio: make soft command calls synchronous"), no
one use struct lio_trusted_vf_ctx, so remove it.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20220927133940.104181-1-yuancan@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Liu Shixin [Tue, 27 Sep 2022 11:19:25 +0000 (19:19 +0800)]
net: ethernet: mtk_eth_soc: use DEFINE_SHOW_ATTRIBUTE to simplify code
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
No functional change.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Link: https://lore.kernel.org/r/20220927111925.2424100-1-liushixin2@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shang XiaoJing [Tue, 27 Sep 2022 02:45:11 +0000 (10:45 +0800)]
wwan_hwsim: Use skb_put_data() instead of skb_put/memcpy pair
Use skb_put_data() instead of skb_put() and memcpy(), which is clear.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20220927024511.14665-1-shangxiaojing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shang XiaoJing [Tue, 27 Sep 2022 02:30:43 +0000 (10:30 +0800)]
net: ax88796c: Use skb_put_data() instead of skb_put/memcpy pair
Use skb_put_data() instead of skb_put() and memcpy(), which is clear.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20220927023043.17769-1-shangxiaojing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shang XiaoJing [Tue, 27 Sep 2022 02:28:02 +0000 (10:28 +0800)]
ethernet: s2io: Use skb_put_data() instead of skb_put/memcpy pair
Use skb_put_data() instead of skb_put() and memcpy(), which is shorter
and clear. Drop the tmp variable that is not needed any more.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20220927022802.16050-1-shangxiaojing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bitterblue Smith [Sun, 18 Sep 2022 12:47:05 +0000 (15:47 +0300)]
wifi: rtl8xxxu: Improve rtl8xxxu_queue_select
Remove the unused ieee80211_hw* parameter, and pass ieee80211_hdr*
instead of relying on skb->data having the right value at the time
the function is called.
This doesn't change the functionality at all.
Fixes:
26f1fad29ad9 ("New driver: rtl8xxxu (mac80211)")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/2af44c28-1c12-46b9-85b9-011560bf7f7e@gmail.com
Bitterblue Smith [Sun, 18 Sep 2022 12:42:25 +0000 (15:42 +0300)]
wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
ieee80211_tx_queue_params.aifs is not supposed to be written directly
to the REG_EDCA_*_PARAM registers. Instead process it like the vendor
drivers do. It's kinda hacky but it works.
This change boosts the download speed and makes it more stable.
Tested with RTL8188FU but all the other supported chips should also
benefit.
Fixes:
26f1fad29ad9 ("New driver: rtl8xxxu (mac80211)")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/038cc03f-3567-77ba-a7bd-c4930e3b2fad@gmail.com
Bitterblue Smith [Sun, 18 Sep 2022 12:40:56 +0000 (15:40 +0300)]
wifi: rtl8xxxu: gen2: Enable 40 MHz channel width
The module parameter ht40_2g was supposed to enable 40 MHz operation,
but it didn't.
Tell the firmware about the channel width when updating the rate mask.
This makes it work with my gen 2 chip RTL8188FU.
I'm not sure if anything needs to be done for the gen 1 chips, if 40
MHz channel width already works or not. They update the rate mask with
a different structure which doesn't have a field for the channel width.
Also set the channel width correctly for sta_statistics.
Fixes:
f653e69009c6 ("rtl8xxxu: Implement basic 8723b specific update_rate_mask() function")
Fixes:
bd917b3d28c9 ("rtl8xxxu: fill up txrate info for gen1 chips")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/3a950997-7580-8a6b-97a0-e0a81a135456@gmail.com
Jakub Kicinski [Thu, 29 Sep 2022 02:30:21 +0000 (19:30 -0700)]
Merge tag 'mlx5-updates-2022-09-27' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-09-27
This is Part #1 of 4 parts series to align mlx5's implementation of
XSK (AF_XDP) RX-Qs indexing and management with other vendors:
Maxim Says:
===========
xsk: Bug fixes for frame mapping on striding RQ
Striding RQ relies on the driver mapping RX buffers into the NIC's
virtual memory space. Currently, regadless of the XSK frame size, mlx5e
maps them using MTT, and each mapping's length is PAGE_SIZE. As the
result, the stride size used by striding RQ is also equal to PAGE_SIZE.
This decision has the following issues:
1. In the XSK aligned mode with frame size smaller than PAGE_SIZE, it's
suboptimal. Using 2K strides and 2K pages allows to post twice as fewer
WQEs.
2. MTT is not suitable for unaligned frames, as it requires natural
alignment theoretically, in practice at least 8-byte alignment.
3. Using mapping and stride bigger than the frame has risk of writing
over the bounds of the XSK frame upon receiving packets bigger than MTU,
which is possible in some specific configurations.
This series addresses issues 1 and 2 and alleviates issue 3. Where
possible, page and stride size will match the XSK frame size (firmware
upgrade may be needed to have effect for 2K frames). Unaligned mode will
use KSM instead of MTT, which allows to drop the partial workaround [1].
[1]: https://lore.kernel.org/netdev/YufYFQ6JN91lQbso@boxer/T/
====================
Link: https://lore.kernel.org/r/20220927203611.244301-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:11 +0000 (13:36 -0700)]
net/mlx5e: Use runtime values of striding RQ parameters in datapath
Some of the parameters of striding RQ are compile-time constants, but
they are going to become dynamically calculated at runtime in a
following commit. This commit prepares the datapath to take cached
runtime parameters, prefilled at queue creation.
New fields added to struct mlx5e_rq fit into an existing 7-byte hole.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:10 +0000 (13:36 -0700)]
net/mlx5e: Make dma_info array dynamic in struct mlx5e_mpw_info
This commit moves the dma_info array to the end of struct mlx5e_mpw_info
to make it a flexible array. It also removes the intermediate struct
mlx5e_umr_dma_info, which used to contain only this array. The
flexibility of dma_info will allow to choose its size dynamically in a
following commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:09 +0000 (13:36 -0700)]
net/mlx5e: Improve the MTU change shortcut
Normally, the MTU change requires reopening the channels, but it can be
skipped if the new MTU doesn't change any of the queue parameters and if
MTU is not used in the data path.
The shortcut is applicable to the non-linear mode of striding RQ,
because the only thing affected by MTU is the queue length. As ethtool
sets the queue length in packets, but striding RQ length is defined in
strides or bytes, we estimate the RQ length to be at least as big as the
requested number of MTU-sized packets, that's why it depends on MTU.
Improve the shortcut by actually checking whether the RQ length stayed
the same, instead of an intermediate step in the calculation.
As MTU also affects the SHAMPO parameters, skip the shortcut if SHAMPO
is in use.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:08 +0000 (13:36 -0700)]
net/mlx5e: xsk: Fix SKB headroom calculation in validation
In a typical scenario, if an XSK socket is opened first, then an XDP
program is attached, mlx5e_validate_xsk_param will be called twice:
first on XSK bind, second on channel restart caused by enabling XDP. The
validation includes a call to mlx5e_rx_is_linear_skb, which checks the
presence of the XDP program.
The above means that mlx5e_rx_is_linear_skb might return true the first
time, but false the second time, as mlx5e_rx_get_linear_sz_skb's return
value will increase, because of a different headroom used with XDP.
As XSK RQs never exist without XDP, it would make sense to trick
mlx5e_rx_get_linear_sz_skb into thinking XDP is enabled at the first
check as well. This way, if MTU is too big, it would be detected on XSK
bind, without giving false hope to the userspace application.
However, it turns out that this check is too restrictive in the first
place. SKBs created on XDP_PASS on XSK RQs don't have any headroom. That
means that big MTUs filtered out on the first and the second checks
might actually work.
So, address this issue in the proper way, but taking into account the
absence of the SKB headroom on XSK RQs, when calculating the buffer
size.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:07 +0000 (13:36 -0700)]
net/mlx5e: xsk: Remove dead code in validation
One of the checks in mlx5e_rx_is_linear_skb verifies that the RX buffer
fits into the XSK frame size. Remove the duplicating check from
mlx5e_validate_xsk_param. It allows to make mlx5e_rx_get_min_frag_sz
static.
Remove mlx5e_rx_is_xdp altogether, as its only usage is located in a
branch where xsk == NULL.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:06 +0000 (13:36 -0700)]
net/mlx5e: Simplify stride size calculation for linear RQ
Linear RX buffers must be big enough to fit the MTU-sized packet along
with the headroom. On the other hand, they must be small enough to fit
into a page (or into an XSK frame). A straightforward way to check
whether the linear mode is possible would be comparing the required
buffer size to PAGE_SIZE or XSK frame size.
Stride size in the linear mode is defined by the following constraints:
1. A stride is at least as big as the buffer size, and it's a power of
two.
2. If non-XSK XDP is enabled, the stride size is PAGE_SIZE, because
mlx5e requires each packet to be in its own page when XDP is in use. The
previous constraint is automatically fulfilled, because buffer size
can't be bigger than PAGE_SIZE.
3. XSK uses stride size equal to PAGE_SIZE, but the following commits
will allow it to use roundup_pow_of_two(XSK frame size), by allowing the
NIC's MMU to use page sizes not equal to the CPU page size.
This commit puts the above requirements and constraints straight to the
code in an attempt to simplify it and to prepare it for changes made in
the next patches.
For the reference, the old code uses an equivalent, but trickier
calculation (high-level simplified pseudocode):
if XDP or XSK:
mlx5e_rx_get_linear_frag_sz := max(buffer size, PAGE_SIZE)
else:
mlx5e_rx_get_linear_frag_sz := buffer size
mlx5e_rx_is_linear_skb := mlx5e_rx_get_linear_frag_sz <= PAGE_SIZE
stride size := roundup_pow_of_two(mlx5e_rx_get_linear_frag_sz)
The new code effectively removes mlx5e_rx_get_linear_frag_sz that used
to return either buffer size or stride size, depending on the situation,
making it hard to work with and to make changes:
if XDP or XSK:
mlx5e_rx_get_linear_stride_sz := PAGE_SIZE
else
mlx5e_rx_get_linear_stride_sz := roundup_pow_of_two(buffer size)
mlx5e_rx_is_linear_skb := buffer size <= (PAGE_SIZE or XSK frame sz)
stride size := mlx5e_rx_get_linear_stride_sz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:05 +0000 (13:36 -0700)]
net/mlx5e: kTLS, Check ICOSQ WQE size in advance
Instead of WARNing in runtime when TLS offload WQEs posted to ICOSQ are
over the hardware limit, check their size before enabling TLS RX
offload, and block the offload if the condition fails. It also allows to
drop a u16 field from struct mlx5e_icosq.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:04 +0000 (13:36 -0700)]
net/mlx5e: Use the aligned max TX MPWQE size
TX MPWQE size is limited to the cacheline-aligned maximum. Use the same
value for the stop room and the capability check.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:03 +0000 (13:36 -0700)]
net/mlx5e: Fix a typo in mlx5e_xdp_mpwqe_is_full
Fix a typo in the function name: mpqwe -> mpwqe (stands for multi-packet
work queue element).
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 27 Sep 2022 20:36:02 +0000 (13:36 -0700)]
net/mlx5e: Use mlx5e_stop_room_for_max_wqe where appropriate
mlx5e_alloc_xdpsq calculates sq->stop_room internally, but there is
already a function for that: mlx5e_stop_room_for_max_wqe. This commit
makes use of this function.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>