Sriram R [Thu, 25 Nov 2021 09:30:14 +0000 (15:00 +0530)]
ath11k: Avoid NULL ptr access during mgmt tx cleanup
Currently 'ar' reference is not added in skb_cb during
WMI mgmt tx. Though this is generally not used during tx completion
callbacks, on interface removal the remaining idr cleanup callback
uses the ar ptr from skb_cb from mgmt txmgmt_idr. Hence
fill them during tx call for proper usage.
Also free the skb which is missing currently in these
callbacks.
Crash_info:
[19282.489476] Unable to handle kernel NULL pointer dereference at virtual address
00000000
[19282.489515] pgd =
91eb8000
[19282.496702] [
00000000] *pgd=
00000000
[19282.502524] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[19282.783728] PC is at ath11k_mac_vif_txmgmt_idr_remove+0x28/0xd8 [ath11k]
[19282.789170] LR is at idr_for_each+0xa0/0xc8
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.5.0.1-00729-QCAHKSWPL_SILICONZ-3 v2
Signed-off-by: Sriram R <quic_srirrama@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1637832614-13831-1-git-send-email-quic_srirrama@quicinc.com
Loic Poulain [Mon, 22 Nov 2021 18:04:11 +0000 (19:04 +0100)]
wcn36xx: Use correct SSN for ADD BA request
Since firmware uses its own sequence number counters, we need to
use firmware number as well when mac80211 generates the ADD_BA
request packet. Indeed the firmware sequence counters tend to
slightly drift from the mac80211 ones because of firmware offload
features like ARP responses. This causes the starting sequence
number field of the ADD_BA request to be unaligned, and can possibly
cause issues with strict/picky APs.
To fix this, we retrieve the current firmware sequence number for
a given TID through the smd_trigger_ba API, and use that number as
replacement of the mac80211 starting sequence number.
This change also ensures that any issue in the smd *ba procedures
will cause the ba action to properly fail, and remove useless call
to smd_trigger_ba() from IEEE80211_AMPDU_RX_START.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1637604251-11763-1-git-send-email-loic.poulain@linaro.org
Anilkumar Kolli [Wed, 24 Nov 2021 17:11:31 +0000 (19:11 +0200)]
ath11k: Use host CE parameters for CE interrupts configuration
CE interrupt configuration uses host ce parameters to assign/free
interrupts. Use host ce parameters to enable/disable interrupts.
This patch fixes below BUG,
BUG: KASAN: global-out-of-bounds in 0xffffffbffdfb035c at addr
ffffffbffde6eeac
Read of size 4 by task kworker/u8:2/132
Address belongs to variable ath11k_core_qmi_firmware_ready+0x1b0/0x5bc [ath11k]
OOB is due to ath11k_ahb_ce_irqs_enable() iterates ce_count(which is 12)
times and accessing 12th element in target_ce_config
(which has only 11 elements) from ath11k_ahb_ce_irq_enable().
With this change host ce configs are used to enable/disable interrupts.
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.5.0.1-00471-QCAHKSWPL_SILICONZ-1
Fixes: 967c1d1131fa ("ath11k: move target ce configs to hw_params")
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1637249558-12793-1-git-send-email-akolli@codeaurora.org
Kees Cook [Thu, 18 Nov 2021 20:24:16 +0000 (12:24 -0800)]
ath11k: Use memset_startat() for clearing queue descriptors
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.
Use memset_startat() so memset() doesn't get confused about writing
beyond the destination member that is intended to be the starting point
of zeroing through the end of the struct. Additionally split up a later
field-spanning memset() so that memset() can reason about the size.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118202416.1286046-1-keescook@chromium.org
Colin Ian King [Tue, 23 Nov 2021 09:04:31 +0000 (09:04 +0000)]
ath11k: Fix spelling mistake "detetction" -> "detection"
There is a spelling mistake in an ath11k_warn message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211123090431.165103-1-colin.i.king@gmail.com
Kalle Valo [Wed, 24 Nov 2021 09:43:16 +0000 (11:43 +0200)]
Revert "ath11k: add read variant from SMBIOS for download board data"
This reverts commit
46e46db313a2bf3c48cac4eb8bdb613b762f301b. Mark reported
that it breaks QCA6390 hw2.0 on Dell XPS 13 9310:
[ 5.537034] ath11k_pci 0000:72:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
[ 5.537038] ath11k_pci 0000:72:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
[ 5.537236] ath11k_pci 0000:72:00.0: failed to fetch board data for bus=pci,qmi-chip-id=0,qmi-board-id=255,variant=DE_1901 from ath11k/QCA6390/hw2.0/board-2.bin
[ 5.537255] ath11k_pci 0000:72:00.0: failed to fetch board-2.bin or board.bin from QCA6390/hw2.0
[ 5.537257] ath11k_pci 0000:72:00.0: qmi failed to fetch board file: -2
[ 5.537258] ath11k_pci 0000:72:00.0: failed to load board data file: -2
So we need to back to the drawing board and implement it so that backwards
compatiblity is not broken.
Reported-by: Mark Herbert <mark.herbert42@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211124094316.9096-1-kvalo@codeaurora.org
Anilkumar Kolli [Mon, 22 Nov 2021 11:13:58 +0000 (13:13 +0200)]
ath11k: Fix mon status ring rx tlv processing
In HE monitor capture, HAL_TLV_STATUS_PPDU_DONE is received
on processing multiple skb. Do not clear the ppdu_info
till the HAL_TLV_STATUS_PPDU_DONE is received.
This fixes below warning and packet drops in monitor mode.
"Rate marked as an HE rate but data is invalid: MCS: 6, NSS: 0"
WARNING: at
PC is at ieee80211_rx_napi+0x624/0x840 [mac80211]
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01693-QCAHKSWPL_SILICONZ-1
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1637249433-10316-1-git-send-email-akolli@codeaurora.org
Wen Gong [Mon, 22 Nov 2021 11:13:58 +0000 (13:13 +0200)]
ath11k: add read variant from SMBIOS for download board data
This is to read variant from SMBIOS such as read from DT, the variant
string will be used to one part of string which used to search board
data from board-2.bin.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118100033.8384-1-quic_wgong@quicinc.com
Wen Gong [Mon, 22 Nov 2021 11:13:58 +0000 (13:13 +0200)]
ath11k: skip sending vdev down for channel switch
The ath11k driver currently sends vdev down to the firmware before
updating the channel context, which is followed by a vdev restart
command.
Sending vdev down is not required before sending a vdev restart,
because the firmware internally does vdev down when ath11k sends
a vdev restart command.
Firmware will happen crash while channel switch without this change.
Hence skip the vdev down command sending when updating the channel
context and then fix the firmware crash issue.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118095901.8271-1-quic_wgong@quicinc.com
Wen Gong [Mon, 22 Nov 2021 11:13:58 +0000 (13:13 +0200)]
ath11k: fix read fail for htt_stats and htt_peer_stats for single pdev
The pdev id is set to 0 for single pdev configured hardware, the real
pdev id is not 0 in firmware, for example, its pdev id is 1 for 5G/6G
phy and 2 for 2G band phy. For HTT_H2T_MSG_TYPE_EXT_STATS_CFG message,
firmware parse the pdev_mask to its pdev id, ath11k set it to 0 for
single pdev, it is not correct, need set it with the real pdev id of
firmware.
Save the real pdev id report by firmware and set it correctly.
Below commands run success with this patch:
cat /sys/kernel/debug/ieee80211/phy0/ath11k/htt_stats
cat /sys/kernel/debug/ieee80211/phy0/netdev\:wls1/stations/00\:03\:7f\:75\:59\:85/htt_peer_stats
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118095700.8149-1-quic_wgong@quicinc.com
Wen Gong [Mon, 22 Nov 2021 11:13:57 +0000 (13:13 +0200)]
ath11k: calculate the correct NSS of peer for HE capabilities
When connected to 6G mode AP, it does not have VHT/HT capabilities,
so the NSS is not set, then it is 1 by default.
This patch is to calculate the NSS with supported HE-MCS and NSS set
of HE capabilities.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01280-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118095453.8030-1-quic_wgong@quicinc.com
Wen Gong [Mon, 22 Nov 2021 11:13:57 +0000 (13:13 +0200)]
ath11k: change to treat alpha code na as world wide regdomain
Some firmware versions for WCN6855 report the default regdomain with
alpha code "na" by default when load as a world wide regdomain, ath11k
should treat it as a world wide alpha code.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118094848.7776-1-quic_wgong@quicinc.com
Baochen Qiang [Fri, 19 Nov 2021 13:36:26 +0000 (15:36 +0200)]
ath11k: Set IRQ affinity to CPU0 in case of one MSI vector
With VT-d disabled on Intel platform, ath11k gets only one MSI
vector. In that case, ath11k does not free IRQ when doing suspend,
hence the kernel has to migrate it to CPU0 (if it was affine to
other CPUs) and allocates a new MSI vector. However, ath11k has
no chance to reconfig it to HW srngs during this phase, thus
ath11k fails to resume.
This issue can be fixed by setting IRQ affinity to CPU0 before
request_irq is called. With such affinity, migration will not
happen and thus the vector keeps unchanged during suspend/resume.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Baochen Qiang <bqiang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211026041732.5323-1-bqiang@codeaurora.org
Carl Huang [Fri, 19 Nov 2021 13:36:26 +0000 (15:36 +0200)]
ath11k: do not restore ASPM in case of single MSI vector
Current code enables ASPM by default, it allows MHI to enter M2 state.
In case of one MSI vector, system hang is observed if ath11k does MHI
register reading in this state. The issue was reported on Dell XPS 13
9310 but is seen also on XPS 15 and XPS 17 laptops.
The workaround here is to prevent MHI from entering M2 state, this can
be done by disabling ASPM if only one MSI vector is used. When using 32
vectors ASPM is enabled as before.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Baochen Qiang <bqiang@codeaurora.org>
Link: https://lore.kernel.org/r/20211026041722.5271-1-bqiang@codeaurora.org
Carl Huang [Fri, 19 Nov 2021 13:36:26 +0000 (15:36 +0200)]
ath11k: add support one MSI vector
On some platforms it's not possible to allocate 32 MSI vectors for various
reasons, be it kernel configuration, VT-d disabled, buggy BIOS etc. So
ath11k was not able to use QCA6390 PCI devices on those platforms. Add
support for one MSI vector to solve that.
In case of one MSI vector, interrupt migration needs to be disabled. This
is because when interrupt migration happens, the msi_data may change.
However, msi_data is already programmed to rings during initial phase and
ath11k has no way to know that msi_data is changed during run time and
reprogram again.
In case of one MSI vector, MHI subsystem should not use IRQF_NO_SUSPEND
as QCA6390 doesn't set this flag too. Ath11k doesn't need to leave
IRQ enabled in suspend state.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Baochen Qiang <bqiang@codeaurora.org>
Link: https://lore.kernel.org/r/20211026041714.5219-1-bqiang@codeaurora.org
Carl Huang [Fri, 19 Nov 2021 13:36:26 +0000 (15:36 +0200)]
ath11k: refactor multiple MSI vector implementation
This is to prepare for one MSI vector support. IRQ enable and disable
of CE and DP are done only in case of multiple MSI vectors.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Baochen Qiang <bqiang@codeaurora.org>
Link: https://lore.kernel.org/r/20211026041705.5167-1-bqiang@codeaurora.org
Carl Huang [Fri, 19 Nov 2021 13:36:26 +0000 (15:36 +0200)]
ath11k: use ATH11K_PCI_IRQ_DP_OFFSET for DP IRQ
Like ATH11K_PCI_IRQ_CE0_OFFSET, define ATH11K_PCI_IRQ_DP_OFFSET for
DP to save the IRQ instead of base_vector from MSI config.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Baochen Qiang <bqiang@codeaurora.org>
Link: https://lore.kernel.org/r/20211026041655.5112-1-bqiang@codeaurora.org
Carl Huang [Fri, 19 Nov 2021 13:36:26 +0000 (15:36 +0200)]
ath11k: add CE and ext IRQ flag to indicate irq_handler
This change adds two flags to indicate whether IRQ handler for CE
and DP can be called. This is because in one MSI vector case,
interrupt is not disabled in hif_stop and hif_irq_disable. Otherwise,
MHI interrupt is disabled too.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Baochen Qiang <bqiang@codeaurora.org>
Link: https://lore.kernel.org/r/20211026041646.5060-1-bqiang@codeaurora.org
Carl Huang [Fri, 19 Nov 2021 13:36:26 +0000 (15:36 +0200)]
ath11k: get msi_data again after request_irq is called
The reservation mode of interrupts in kernel assigns a dummy vector
when the interrupt is allocated and assigns a real vector when the
request_irq is called. The reservation mode helps to ease vector
pressure when devices with a large amount of queues/interrupts
are initialized, but only a minimal subset of those queues/interrupts
is actually used.
So on reservation mode, the msi_data may change after request_irq
is called, so ath11k reads msi_data again after request_irq is called,
and then the correct msi_data is programmed into QCA6390 hardware
components. Without this change, spurious interrupt occurs in case of
one MSI vector. When VT-d in BIOS is enabled and ath11k can get 32 MSI
vectors, ath11k always get the same msi_data before and after request_irq,
that's why this change is only required when one MSI vector is to be
supported.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211026041636.5008-1-bqiang@codeaurora.org
Kalle Valo [Fri, 19 Nov 2021 13:32:01 +0000 (15:32 +0200)]
Merge ath-next from git://git./linux/kernel/git/kvalo/ath.git
ath.git patches for v5.17. Major changes:
ath10k
* fetch (pre-)calibration data via nvmem subsystem
ath11k
* enable 802.11 power save mode in station mode for qca6390 and wcn6855
* trace log support
* proper board file detection for WCN6855 based on PCI ids
* BSS color change support
Peter Seiderer [Tue, 16 Nov 2021 22:07:20 +0000 (23:07 +0100)]
ath9k: fix intr_txqs setting
The struct ath_hw member intr_txqs is never reset/assigned outside
of ath9k_hw_init_queues() and with the used bitwise-or in the interrupt
handling ar9002_hw_get_isr() accumulates all ever set interrupt flags.
Fix this by using a pure assign instead of bitwise-or for the
first line (note: intr_txqs is only evaluated in case ATH9K_INT_TX bit
is set).
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211116220720.30145-1-ps.report@gmx.net
Seevalamuthu Mariappan [Wed, 17 Nov 2021 07:39:41 +0000 (09:39 +0200)]
ath11k: add hw_param for wakeup_mhi
Wakeup mhi is needed before pci_read/write only for QCA6390 and WCN6855. Since
wakeup & release mhi is enabled for all hardwares, below mhi assert is seen in
QCN9074 when doing 'rmmod ath11k_pci':
Kernel panic - not syncing: dev_wake != 0
CPU: 2 PID: 13535 Comm: procd Not tainted 4.4.60 #1
Hardware name: Generic DT based system
[<
80316dac>] (unwind_backtrace) from [<
80313700>] (show_stack+0x10/0x14)
[<
80313700>] (show_stack) from [<
805135dc>] (dump_stack+0x7c/0x9c)
[<
805135dc>] (dump_stack) from [<
8032136c>] (panic+0x84/0x1f8)
[<
8032136c>] (panic) from [<
80549b24>] (mhi_pm_disable_transition+0x3b8/0x5b8)
[<
80549b24>] (mhi_pm_disable_transition) from [<
80549ddc>] (mhi_power_down+0xb8/0x100)
[<
80549ddc>] (mhi_power_down) from [<
7f5242b0>] (ath11k_mhi_op_status_cb+0x284/0x3ac [ath11k_pci])
[E][__mhi_device_get_sync] Did not enter M0 state, cur_state:RESET pm_state:SHUTDOWN Process
[E][__mhi_device_get_sync] Did not enter M0 state, cur_state:RESET pm_state:SHUTDOWN Process
[E][__mhi_device_get_sync] Did not enter M0 state, cur_state:RESET pm_state:SHUTDOWN Process
[<
7f5242b0>] (ath11k_mhi_op_status_cb [ath11k_pci]) from [<
7f524878>] (ath11k_mhi_stop+0x10/0x20 [ath11k_pci])
[<
7f524878>] (ath11k_mhi_stop [ath11k_pci]) from [<
7f525b94>] (ath11k_pci_power_down+0x54/0x90 [ath11k_pci])
[<
7f525b94>] (ath11k_pci_power_down [ath11k_pci]) from [<
8056b2a8>] (pci_device_shutdown+0x30/0x44)
[<
8056b2a8>] (pci_device_shutdown) from [<
805cfa0c>] (device_shutdown+0x124/0x174)
[<
805cfa0c>] (device_shutdown) from [<
8033aaa4>] (kernel_restart+0xc/0x50)
[<
8033aaa4>] (kernel_restart) from [<
8033ada8>] (SyS_reboot+0x178/0x1ec)
[<
8033ada8>] (SyS_reboot) from [<
80301b80>] (ret_fast_syscall+0x0/0x34)
Hence, disable wakeup/release mhi using hw_param for other hardwares.
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.5.0.1-01060-QCAHKSWPL_SILICONZ-1
Fixes: a05bd8513335 ("ath11k: read and write registers below unwindowed address")
Signed-off-by: Seevalamuthu Mariappan <quic_seevalam@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1636702019-26142-1-git-send-email-quic_seevalam@quicinc.com
Jakub Kicinski [Fri, 19 Nov 2021 01:50:18 +0000 (17:50 -0800)]
Merge tag 'regmap-no-bus-update-bits' of git://git./linux/kernel/git/broonie/regmap
Mark Brown says:
===================
regmap: Allow regmap_update_bits() to be offloaded with no bus
Some hardware can do this so let's use that capability.
===================
Link: https://lore.kernel.org/all/YZWDOidBOssP10yS@sirena.org.uk/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 18 Nov 2021 21:13:16 +0000 (13:13 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 18 Nov 2021 20:54:24 +0000 (12:54 -0800)]
Merge tag 'net-5.16-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, mac80211.
Current release - regressions:
- devlink: don't throw an error if flash notification sent before
devlink visible
- page_pool: Revert "page_pool: disable dma mapping support...",
turns out there are active arches who need it
Current release - new code bugs:
- amt: cancel delayed_work synchronously in amt_fini()
Previous releases - regressions:
- xsk: fix crash on double free in buffer pool
- bpf: fix inner map state pruning regression causing program
rejections
- mac80211: drop check for DONT_REORDER in __ieee80211_select_queue,
preventing mis-selecting the best effort queue
- mac80211: do not access the IV when it was stripped
- mac80211: fix radiotap header generation, off-by-one
- nl80211: fix getting radio statistics in survey dump
- e100: fix device suspend/resume
Previous releases - always broken:
- tcp: fix uninitialized access in skb frags array for Rx 0cp
- bpf: fix toctou on read-only map's constant scalar tracking
- bpf: forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing
progs
- tipc: only accept encrypted MSG_CRYPTO msgs
- smc: transfer remaining wait queue entries during fallback, fix
missing wake ups
- udp: validate checksum in udp_read_sock() (when sockmap is used)
- sched: act_mirred: drop dst for the direction from egress to
ingress
- virtio_net_hdr_to_skb: count transport header in UFO, prevent
allowing bad skbs into the stack
- nfc: reorder the logic in nfc_{un,}register_device, fix unregister
- ipsec: check return value of ipv6_skip_exthdr
- usb: r8152: add MAC passthrough support for more Lenovo Docks"
* tag 'net-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
ipv6: check return value of ipv6_skip_exthdr
e100: fix device suspend/resume
devlink: Don't throw an error if flash notification sent before devlink visible
page_pool: Revert "page_pool: disable dma mapping support..."
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
octeontx2-af: debugfs: don't corrupt user memory
NFC: add NCI_UNREG flag to eliminate the race
NFC: reorder the logic in nfc_{un,}register_device
NFC: reorganize the functions in nci_request
tipc: check for null after calling kmemdup
i40e: Fix display error code in dmesg
i40e: Fix creation of first queue by omitting it if is not power of two
i40e: Fix warning message and call stack during rmmod i40e driver
i40e: Fix ping is lost after configuring ADq on VF
i40e: Fix changing previously set num_queue_pairs for PFs
i40e: Fix NULL ptr dereference on VSI filter sync
i40e: Fix correct max_pkt_size on VF RX queue
...
Linus Torvalds [Thu, 18 Nov 2021 20:41:14 +0000 (12:41 -0800)]
Merge tag 'for-5.16-rc1-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Several xes and one old ioctl deprecation. Namely there's fix for
crashes/warnings with lzo compression that was suspected to be caused
by first pull merge resolution, but it was a different bug.
Summary:
- regression fix for a crash in lzo due to missing boundary checks of
the page array
- fix crashes on ARM64 due to missing barriers when synchronizing
status bits between work queues
- silence lockdep when reading chunk tree during mount
- fix false positive warning in integrity checker on devices with
disabled write caching
- fix signedness of bitfields in scrub
- start deprecation of balance v1 ioctl"
* tag 'for-5.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: deprecate BTRFS_IOC_BALANCE ioctl
btrfs: make 1-bit bit-fields of scrub_page unsigned int
btrfs: check-integrity: fix a warning on write caching disabled disk
btrfs: silence lockdep when reading chunk tree during mount
btrfs: fix memory ordering between normal and ordered work functions
btrfs: fix a out-of-bound access in copy_compressed_data_to_page()
Linus Torvalds [Thu, 18 Nov 2021 20:31:29 +0000 (12:31 -0800)]
Merge tag 'fs_for_v5.16-rc2' of git://git./linux/kernel/git/jack/linux-fs
Pull UDF fix from Jan Kara:
"A fix for a long-standing UDF bug where we were not properly
validating directory position inside readdir"
* tag 'fs_for_v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix crash after seekdir
Linus Torvalds [Thu, 18 Nov 2021 20:17:33 +0000 (12:17 -0800)]
Merge tag 'fs.idmapped.v5.16-rc2' of git://git./linux/kernel/git/brauner/linux
Pull setattr idmapping fix from Christian Brauner:
"This contains a simple fix for setattr. When determining the validity
of the attributes the ia_{g,u}id fields contain the value that will be
written to inode->i_{g,u}id. When the {g,u}id attribute of the file
isn't altered and the caller's fs{g,u}id matches the current {g,u}id
attribute the attribute change is allowed.
The value in ia_{g,u}id does already account for idmapped mounts and
will have taken the relevant idmapping into account. So in order to
verify that the {g,u}id attribute isn't changed we simple need to
compare the ia_{g,u}id value against the inode's i_{g,u}id value.
This only has any meaning for idmapped mounts as idmapping helpers are
idempotent without them. And for idmapped mounts this really only has
a meaning when circular idmappings are used, i.e. mappings where e.g.
id 1000 is mapped to id 1001 and id 1001 is mapped to id 1000. Such
ciruclar mappings can e.g. be useful when sharing the same home
directory between multiple users at the same time.
Before this patch we could end up denying legitimate attribute changes
and allowing invalid attribute changes when circular mappings are
used. To even get into this situation the caller must've been
privileged both to create that mapping and to create that idmapped
mount.
This hasn't been seen in the wild anywhere but came up when expanding
the fstest suite during work on a series of hardening patches. All
idmapped fstests pass without any regressions and we're adding new
tests to verify the behavior of circular mappings.
The new tests can be found at [1]"
Link: https://lore.kernel.org/linux-fsdevel/20211109145713.1868404-2-brauner@kernel.org
* tag 'fs.idmapped.v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: handle circular mappings correctly
Linus Torvalds [Thu, 18 Nov 2021 20:13:24 +0000 (12:13 -0800)]
Merge tag 'for-5.16/parisc-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"parisc bug and warning fixes and wire up futex_waitv.
Fix some warnings which showed up with allmodconfig builds, a revert
of a change to the sigreturn trampoline which broke signal handling,
wire up futex_waitv and add CONFIG_PRINTK_TIME=y to 32bit defconfig"
* tag 'for-5.16/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig
Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
parisc: Wrap assembler related defines inside __ASSEMBLY__
parisc: Wire up futex_waitv
parisc: Include stringify.h to avoid build error in crypto/api.c
parisc/sticon: fix reverse colors
Linus Torvalds [Thu, 18 Nov 2021 20:05:22 +0000 (12:05 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Selftest changes:
- Cleanups for the perf test infrastructure and mapping hugepages
- Avoid contention on mmap_sem when the guests start to run
- Add event channel upcall support to xen_shinfo_test
x86 changes:
- Fixes for Xen emulation
- Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
- Fixes for migration of 32-bit nested guests on 64-bit hypervisor
- Compilation fixes
- More SEV cleanups
Generic:
- Cap the return value of KVM_CAP_NR_VCPUS to both KVM_CAP_MAX_VCPUS
and num_online_cpus(). Most architectures were only using one of
the two"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
KVM: x86: Assume a 64-bit hypercall for guests with protected state
selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
riscv: kvm: fix non-kernel-doc comment block
KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
KVM: SEV: Drop a redundant setting of sev->asid during initialization
KVM: SEV: WARN if SEV-ES is marked active but SEV is not
KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
KVM: x86/xen: Use sizeof_field() instead of open-coding it
KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
...
Linus Torvalds [Thu, 18 Nov 2021 19:01:06 +0000 (11:01 -0800)]
Merge tag 'docs-5.16-2' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A handful of documentation fixes for 5.16"
* tag 'docs-5.16-2' of git://git.lwn.net/linux:
Documentation/process: fix a cross reference
Documentation: update vcpu-requests.rst reference
docs: accounting: update delay-accounting.rst reference
libbpf: update index.rst reference
docs: filesystems: Fix grammatical error "with" to "which"
doc/zh_CN: fix a translation error in management-style
docs: ftrace: fix the wrong path of tracefs
Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document
Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375
Documentation: arm: marvell: Add some links to homepage / product infos
docs: Update Sphinx requirements
Linus Torvalds [Thu, 18 Nov 2021 18:50:45 +0000 (10:50 -0800)]
Merge tag 'printk-for-5.16-fixup' of git://git./linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
- Try to flush backtraces from other CPUs also on the local one. This
was a regression caused by printk_safe buffers removal.
- Remove header dependency warning.
* tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Remove printk.h inclusion in percpu.h
printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
Dan Carpenter [Thu, 18 Nov 2021 11:22:11 +0000 (14:22 +0300)]
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
The ptp_ocp_get_mem() function does not return NULL, it returns error
pointers.
Fixes: 773bda964921 ("ptp: ocp: Expose various resources on the timecard.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Nov 2021 12:11:51 +0000 (12:11 +0000)]
Merge branch 'lan78xx-napi'
John Efstathiades says:
===================
lan78xx NAPI Performance Improvements
This patch set introduces a set of changes to the lan78xx driver
that were originally developed as part of an investigation into
the performance of TCP and UDP transfers on an Android system.
The changes increase the throughput of both UDP and TCP transfers
and reduce the overall CPU load.
These improvements are also seen on a standard Linux kernel. Typical
results are included at the end of this document.
The changes to the driver evolved over time. The patches presented
here attempt to organise the changes in to coherent blocks that
affect logically connected parts of the driver. The patches do not
reflect the way in which the code evolved during the performance
investigation.
Each patch produces a working driver that has an incremental
improvement but patches 2, 3 and 6 should be considered a single
update.
The changes affect the following parts of the driver:
1. Deferred URB processing
The deferred URB processing that was originally done by a tasklet
is now done by a NAPI polling routine. The NAPI cycle has a fixed
work budget that controls how many received frames are passed to
the network stack.
Patch 6 introduces the NAPI polling but depends on preceding patches.
The new NAPI polling routine is also responsible for submitting
Rx and Tx URBs to the USB host controller.
Moving the URB processing to a NAPI-based system "smoothed"
incoming and outgoing data flows on the Android system under
investigation. However, taken in isolation, moving from a tasklet
approach to a NAPI approach made little or no difference to the
overall performance.
2. URB buffer management
The driver creates a pool of Tx and a pool of Rx URB buffers. Each
buffer is large enough to accommodate a packet with the maximum MTU
data. URBs are allocated from these pools as required.
Patch 2 introduces the new Tx buffer pool.
Patch 3 introduces the new Rx buffer pool.
3. Tx pending data
SKBs containing data to be transmitted are added to a queue. The
driver tracks free Tx URBs and the corresponding free Tx URB space.
When new Tx URBs are submitted, pending data is copied into the
URB buffer until the URB buffer is filled or there is no more
pending data. This maximises utilisation the LAN78xx internal
USB and network frame buffers.
New Tx URBs are submitted to the USB host controller as part of the
NAPI polling cycle.
Patch 2 introduces these changes.
4. Rx URB completion
A new URB is no longer submitted as part of the URB completion
callback.
New URBs are submitted during the NAPI polling cycle.
Patch 3 introduces these changes.
5. Rx URB processing
Completed URBs are put on to queue for processing (as is done in the
current driver). Network packets in completed URBs are copied from
the URB buffer in to dynamically allocated SKBs and passed to
the network stack.
The emptied URBs are resubmitted to the USB host controller.
Patch 3 introduces this change. Patch 6 updates the change to use
NAPI SKBs.
Each packet passed to the network stack is a single NAPI work item.
If the NAPI work budget is exhausted the remaining packets in the
URB are put onto an overflow queue that is processed at the start
of the next NAPI cycle.
Patch 6 introduces this change.
6. Driver-specific hard_header_len
The driver-specific hard_header_len adjustment was removed as it
broke generic receive offload (GRO) processing. Moreover, it was no
longer required due the change in Tx pending data management (see
point 3. above).
Patch 5 introduces this change.
The modification has been tested on four different target machines:
Target | CPU | ARCH | cores | kernel | RAM |
-----------------+------------+---------+-------+--------+-------|
Raspberry Pi 4B | Cortex-A72 | aarch64 | 4 | 64-bit | 2 GB |
Nitrogen8M SBC | Cortex-A53 | aarch64 | 4 | 64-bit | 2 GB |
Compaq Pressario | Pentium D | i686 | 2 | 32-bit | 4 GB |
Dell T3620 | Core i3 | x86_64 | 2+2 | 64-bit | 16 GB |
The targets, apart from the Compaq, each have an on-chip USB3 host
controller. A PCIe-based USB3 host controller card was added to the
Compaq to provide the necessary USB3 host interface.
The network throughput was measured using iperf3. The peer device was
a second Dell T3620 fitted with an Intel i210 network interface. The
target machine and the peer device were connected via a Netgear GS105
gigabit switch.
The CPU load was measured using mpstat running on the target machine.
The tables below summarise the throughput and CPU load improvements
achieved by the updated driver.
The bandwidth is the average bandwidth reported by iperf3 at the end
of a 60-second test.
The percentage idle figure is the average idle reported across all
CPU cores on the target machine for the duration of the test.
TCP Rx (target receiving, peer transmitting)
| Standard Driver | NAPI Driver |
Target | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B | 941 | 74.9 | 941 | 91.5 |
Nitrogen8M | 941 | 76.2 | 941 | 92.7 |
Compaq Pressario | 941 | 44.5 | 941 | 82.1 |
Dell T3620 | 941 | 88.9 | 941 | 98.3 |
TCP Tx (target transmitting, peer receiving)
| Standard Driver | NAPI Driver |
Target | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B | 683 | 80.1 | 942 | 97.6 |
Nitrogen8M | 942 | 97.8 | 942 | 97.3 |
Compaq Pressario | 939 | 80.0 | 942 | 91.2 |
Dell T3620 | 942 | 95.3 | 942 | 97.6 |
UDP Rx (target receiving, peer transmitting)
| Standard Driver | NAPI Driver |
Target | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B | - | - | 958 (0%) | 76.2 |
Nitrogen8M | 690 (25%) | 57.7 | 937 (0%) | 68.5 |
Compaq Pressario | 958 (0%) | 50.2 | 958 (0%) | 61.6 |
Dell T3620 | 958 (0%) | 89.6 | 958 (0%) | 85.3 |
The figure in brackets is the percentage packet loss.
UDP Tx (target transmitting, peer receiving)
| Standard Driver | NAPI Driver |
Target | Bandwidth | % Idle | Bandwidth | % Idle |
-----------------+-----------+--------+--------------------|
RPi4 Model B | 370 | 75.0 | 886 | 78.9 |
Nitrogen8M | 710 | 75.0 | 958 | 85.3 |
Compaq Pressario | 958 | 65.5 | 958 | 76.6 |
Dell T3620 | 958 | 97.0 | 958 | 97.3 |
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Efstathiades [Thu, 18 Nov 2021 11:01:39 +0000 (11:01 +0000)]
lan78xx: Introduce NAPI polling support
This patch introduces a NAPI-style approach for processing completed
Rx URBs that contributes to improving driver throughput and reducing
CPU load.
Packets in completed URBs are copied to NAPI SKBs and passed to the
network stack for processing. Each frame passed to the stack is one
work item in the NAPI budget.
If the NAPI budget is consumed and frames remain, they are added to
an overflow queue that is processed at the start of the next NAPI
polling cycle.
The NAPI handler is also responsible for copying pending Tx data to
Tx URBs and submitting them to the USB host controller for
transmission.
Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Efstathiades [Thu, 18 Nov 2021 11:01:38 +0000 (11:01 +0000)]
lan78xx: Remove hardware-specific header update
Remove hardware-specific header length adjustment as it is no longer
required. It also breaks generic receive offload (GRO) processing of
received TCP frames that results in a TCP ACK being sent for each
received frame.
Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Efstathiades [Thu, 18 Nov 2021 11:01:37 +0000 (11:01 +0000)]
lan78xx: Re-order rx_submit() to remove forward declaration
Move position of rx_submit() to remove forward declaration of
rx_complete() which is now no longer required.
Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Efstathiades [Thu, 18 Nov 2021 11:01:36 +0000 (11:01 +0000)]
lan78xx: Introduce Rx URB processing improvements
This patch introduces a new approach to allocating and managing
Rx URBs that contributes to improving driver throughput and reducing
CPU load.
A pool of Rx URBs is created during driver instantiation. All the
URBs are initially submitted to the USB host controller for
processing.
The default URB buffer size is different for each USB bus speed.
The chosen sizes provide good USB utilisation with little impact on
overall packet latency.
Completed URBs are processed in the driver bottom half. The URB
buffer contents are copied to a dynamically allocated SKB, which is
then passed to the network stack. The URB is then re-submitted to
the USB host controller.
NOTE: the call to skb_copy() in rx_process() that copies the URB
contents to a new SKB is a temporary change to make this patch work
in its own right. This call will be removed when the NAPI processing
is introduced by patch 6 in this patch set.
Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Efstathiades [Thu, 18 Nov 2021 11:01:35 +0000 (11:01 +0000)]
lan78xx: Introduce Tx URB processing improvements
This patch introduces a new approach to allocating and managing
Tx URBs that contributes to improving driver throughput and reducing
CPU load.
A pool of Tx URBs is created during driver instantiation. A URB is
allocated from the pool when there is data to transmit. The URB is
released back to the pool when the data has been transmitted by the
device.
The default URB buffer size is different for each USB bus speed.
The chosen sizes provide good USB utilisation with little impact on
overall packet latency.
SKBs to be transmitted are added to a pending queue for processing.
The driver tracks the available Tx URB buffer space and copies as
much pending data as possible into each free URB. Each full URB
is then submitted to the USB host controller for transmission.
Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Efstathiades [Thu, 18 Nov 2021 11:01:34 +0000 (11:01 +0000)]
lan78xx: Fix memory allocation bug
Fix memory allocation that fails to check for NULL return.
Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Nov 2021 12:07:24 +0000 (12:07 +0000)]
Merge branch 'dsa-felix-psfp'
Xiaoliang Yang says:
====================
net: dsa: felix: psfp support on vsc9959
VSC9959 hardware supports Per-Stream Filtering and Policing(PSFP).
This patch series add PSFP support on tc flower offload of ocelot
driver. Use chain 30000 to distinguish PSFP from VCAP blocks. Add gate
and police set to support PSFP in VSC9959 driver.
v6-v7 changes:
- Add a patch to restrict psfp rules on ingress port.
- Using stats.drops to show the packet count discarded by the rule.
v5->v6 changes:
- Modify ocelot_mact_lookup() parameters.
- Use parameters ssid and sfid instead of streamdata in
ocelot_mact_learn_streamdata() function.
- Serialize STREAMDATA and MAC table write.
v4->v5 changes:
- Add MAC table lock patch, and move stream data write in
ocelot_mact_learn_streamdata().
- Add two sections of VCAP policers to Seville platform.
v3->v4 changes:
- Introduce vsc9959_psfp_sfi_table_get() function in patch where it is
used to fix compile warning.
v2->v3 changes:
- Reorder first two patches. Export struct ocelot_mact_entry, then add
ocelot_mact_lookup() and ocelot_mact_write() functions.
- Add PSFP list to struct ocelot, and init it by using
ocelot->ops->psfp_init().
v1->v2 changes:
- Use tc flower offload of ocelot driver to support PSFP add and delete.
- Add PSFP tables add/del functions in felix_vsc9959.c.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:04 +0000 (18:12 +0800)]
net: dsa: felix: restrict psfp rules on ingress port
PSFP rules take effect on the streams from any port of VSC9959 switch.
This patch use ingress port to limit the rule only active on this port.
Each stream can only match two ingress source ports in VSC9959. Streams
from lowest port gets the configuration of SFID pointed by MAC Table
lookup and streams from highest port gets the configuration of (SFID+1)
pointed by MAC Table lookup. This patch defines the PSFP rule on highest
port as dummy rule, which means that it does not modify the MAC table.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:03 +0000 (18:12 +0800)]
net: dsa: felix: use vcap policer to set flow meter for psfp
This patch add police action to set flow meter table which is defined
in IEEE802.1Qci. Flow metering is two rates two buckets and three color
marker to policing the frames, we only enable one rate one bucket in
this patch.
Flow metering shares a same policer pool with VCAP policers, so the PSFP
policer calls ocelot_vcap_policer_add() and ocelot_vcap_policer_del() to
set flow meter police.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:02 +0000 (18:12 +0800)]
net: mscc: ocelot: use index to set vcap policer
Policer was previously automatically assigned from the highest index to
the lowest index from policer pool. But police action of tc flower now
uses index to set an police entry. This patch uses the police index to
set vcap policers, so that one policer can be shared by multiple rules.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:01 +0000 (18:12 +0800)]
net: dsa: felix: add stream gate settings for psfp
This patch adds stream gate settings for PSFP. Use SGI table to store
stream gate entries. Disable the gate entry when it is not used by any
stream.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:12:00 +0000 (18:12 +0800)]
net: dsa: felix: support psfp filter on vsc9959
VSC9959 supports Per-Stream Filtering and Policing(PSFP) that complies
with the IEEE 802.1Qci standard. The stream is identified by Null stream
identification(DMAC and VLAN ID) defined in IEEE802.1CB.
For PSFP, four tables need to be set up: stream table, stream filter
table, stream gate table, and flow meter table. Identify the stream by
parsing the tc flower keys and add it to the stream table. The stream
filter table is automatically maintained, and its index is determined by
SGID(flow gate index) and FMID(flow meter index).
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:11:59 +0000 (18:11 +0800)]
net: mscc: ocelot: add gate and police action offload to PSFP
PSFP support gate and police action. This patch add the gate and police
action to flower parse action, check chain ID to determine which block
to offload. Adding psfp callback functions to add, delete and update gate
and police in PSFP table if hardware supports it.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:11:58 +0000 (18:11 +0800)]
net: mscc: ocelot: set vcap IS2 chain to goto PSFP chain
Some chips in the ocelot series such as VSC9959 support Per-Stream
Filtering and Policing(PSFP), which is processing after VCAP blocks.
We set this block on chain 30000 and set vcap IS2 chain to goto PSFP
chain if hardware support.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Thu, 18 Nov 2021 10:11:57 +0000 (18:11 +0800)]
net: mscc: ocelot: add MAC table stream learn and lookup operations
ocelot_mact_learn_streamdata() can be used in VSC9959 to overwrite an
FDB entry with stream data. The stream data includes SFID and SSID which
can be used for PSFP and FRER set.
ocelot_mact_lookup() can be used to check if the given {DMAC, VID} FDB
entry is exist, and also can retrieve the DEST_IDX and entry type for
the FDB entry.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Teng Qi [Thu, 18 Nov 2021 07:01:18 +0000 (15:01 +0800)]
net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
The definition of macro MOTO_SROM_BUG is:
#define MOTO_SROM_BUG (lp->active == 8 && (get_unaligned_le32(
dev->dev_addr) & 0x00ffffff) == 0x3e0008)
and the if statement
if (MOTO_SROM_BUG) lp->active = 0;
using this macro indicates lp->active could be 8. If lp->active is 8 and
the second comparison of this macro is false. lp->active will remain 8 in:
lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].mc = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ana = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].fdx = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ttm = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].mci = *p;
However, the length of array lp->phy is 8, so array overflows can occur.
To fix these possible array overflows, we first check lp->active and then
return -EINVAL if it is greater or equal to ARRAY_SIZE(lp->phy) (i.e. 8).
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Teng Qi <starmiku1207184332@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Kerr [Thu, 18 Nov 2021 06:57:23 +0000 (14:57 +0800)]
mctp/test: Update refcount checking in route fragment tests
In
99ce45d5e, we moved a route refcount decrement from
mctp_do_fragment_route into the caller. This invalidates the assumption
that the route test makes about refcount behaviour, so the route tests
fail.
This change fixes the test case to suit the new refcount behaviour.
Fixes: 99ce45d5e7db ("mctp: Implement extended addressing")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yao Jing [Thu, 18 Nov 2021 06:10:18 +0000 (06:10 +0000)]
ipv6: ah6: use swap() to make code cleaner
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yao Jing <yao.jing2@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhangyue [Thu, 18 Nov 2021 05:46:32 +0000 (13:46 +0800)]
net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
In line 5001, if all id in the array 'lp->phy[8]' is not 0, when the
'for' end, the 'k' is 8.
At this time, the array 'lp->phy[8]' may be out of bound.
Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 18 Nov 2021 01:57:29 +0000 (17:57 -0800)]
tcp: add missing htmldocs for skb->ll_node and sk->defer_list
Add missing entries to fix these "make htmldocs" warnings.
./include/linux/skbuff.h:953: warning: Function parameter or member 'll_node' not described in 'sk_buff'
./include/net/sock.h:540: warning: Function parameter or member 'defer_list' not described in 'sock'
Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Nov 2021 11:49:52 +0000 (11:49 +0000)]
Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
10GbE Intel Wired LAN Driver Updates 2021-11-17
Radoslaw Tyl says:
The change is a consequence of errors reported by the ixgbevf driver
while starting several virtual guests at the same time on ESX host.
During this, VF was not able to communicate correctly with the PF,
as a result reported "PF still in reset state. Is the PF interface up?"
and then goes to locked state. The only thing left was to reload
the VF driver on the guest OS.
The background of the problem is that the current PFU and VFU
semaphore locking mechanism between sender and receiver may cause
overriding Mailbox memory (VFMBMEM), in such scenario receiver of
the original message will read the invalid, corrupted or one (or more)
message may be lost.
This change is actually as a support for communication with PF ESX
driver and does not contains changes and support for ixgbe driver.
For maintain backward compatibility, previous communication method
has been preserved in the form of LEGACY functions.
In the future there is a plan to add a support for a 1.5 mailbox API
communication also to ixgbe driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Nov 2021 11:48:33 +0000 (11:48 +0000)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-
queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-11-17
This series contains updates to i40e driver only.
Eryk adds accounting for VLAN header in packet size when VF port VLAN is
configured. He also fixes TC queue distribution when the user has changed
queue counts as well as for configuration of VF ADQ which caused dropped
packets.
Michal adds tracking for when a VSI is being released to prevent null
pointer dereference when managing filters.
Karen ensures PF successfully initiates VF requested reset which could
cause a call trace otherwise.
Jedrzej moves validation of channel queue value earlier to prevent
partial configuration when the value is invalid.
Grzegorz corrects the reported error when adding filter fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 17 Nov 2021 17:36:29 +0000 (09:36 -0800)]
net: mdio: Replaced BUG_ON() with WARN()
Killing the kernel because a certain MDIO bus object is not in the
desired state at various points in the registration or unregistration
paths is excessive and is not helping in troubleshooting or fixing
issues. Replace the BUG_ON() with WARN() and print out the MDIO bus name
to facilitate debugging.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jordy Zomer [Wed, 17 Nov 2021 19:06:48 +0000 (20:06 +0100)]
ipv6: check return value of ipv6_skip_exthdr
The offset value is used in pointer math on skb->data.
Since ipv6_skip_exthdr may return -1 the pointer to uh and th
may not point to the actual udp and tcp headers and potentially
overwrite other stuff. This is why I think this should be checked.
EDIT: added {}'s, thanks Kees
Signed-off-by: Jordy Zomer <jordy@pwning.systems>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Wed, 17 Nov 2021 20:59:52 +0000 (12:59 -0800)]
e100: fix device suspend/resume
As reported in [1], e100 was no longer working for suspend/resume
cycles. The previous commit mentioned in the fixes appears to have
broken things and this attempts to practice best known methods for
device power management and keep wake-up working while allowing
suspend/resume to work. To do this, I reorder a little bit of code
and fix the resume path to make sure the device is enabled.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214933
Fixes: 69a74aef8a18 ("e100: use generic power management")
Cc: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Reported-by: Alexey Kuznetsov <axet@me.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Alexey Kuznetsov <axet@me.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Nov 2021 11:38:45 +0000 (11:38 +0000)]
Merge branch 'dpaa2-phylink'
Russell King says:
====================
net: dpaa2: phylink validate implementation updates
This series converts dpaa2 to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate(). Previous behaviour should be
preserved.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Wed, 17 Nov 2021 17:24:13 +0000 (17:24 +0000)]
net: dpaa2-mac: use phylink_generic_validate()
DPAA2 has no special behaviour in its validation implementation, so can
be switched to phylink_generic_validate().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Wed, 17 Nov 2021 17:24:07 +0000 (17:24 +0000)]
net: dpaa2-mac: remove interface checks in dpaa2_mac_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Wed, 17 Nov 2021 17:24:02 +0000 (17:24 +0000)]
net: dpaa2-mac: populate supported_interfaces member
Populate the phy interface mode bitmap for the Freescale DPAA2 driver
with interfaces modes supported by the MAC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Nov 2021 11:36:48 +0000 (11:36 +0000)]
Merge branch 'ag71xx-phylink'
Russell King says:
====================
net: ag71xx: phylink validate implementation updates
This series converts ag71xx to fill in the supported_interfaces member
of phylink_config, cleans up the validate() implementation, and then
converts to phylink_generic_validate().
The question over the port linkmode restriction has been answered by
Oleksij - there is no reason for this restriction, so we can go the
whole hog with this conversion. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Wed, 17 Nov 2021 16:46:31 +0000 (16:46 +0000)]
net: ag71xx: use phylink_generic_validate()
ag71xx apparently only supports MII port type, which makes it different
from other implementations. However, Oleksij says there is no special
reason for this.
Convert the driver to use phylink_generic_validate(), which will allow
all ethtool port linkmodes instead of only MII, giving the driver
consistent behaviour with other drivers.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King (Oracle) [Wed, 17 Nov 2021 16:46:25 +0000 (16:46 +0000)]
net: ag71xx: remove interface checks in ag71xx_mac_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode, nor handle
PHY_INTERFACE_MODE_NA in the validation function. Remove these to
simplify the implementation.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Wed, 17 Nov 2021 16:46:20 +0000 (16:46 +0000)]
net: ag71xx: populate supported_interfaces member
Populate the phy_interface_t bitmap for the Atheros ag71xx driver with
interfaces modes supported by the MAC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Wed, 17 Nov 2021 14:49:09 +0000 (16:49 +0200)]
devlink: Don't throw an error if flash notification sent before devlink visible
The mlxsw driver calls to various devlink flash routines even before
users can get any access to the devlink instance itself. For example,
mlxsw_core_fw_rev_validate() one of such functions.
__mlxsw_core_bus_device_register
-> mlxsw_core_fw_rev_validate
-> mlxsw_core_fw_flash
-> mlxfw_firmware_flash
-> mlxfw_status_notify
-> devlink_flash_update_status_notify
-> __devlink_flash_update_notify
-> WARN_ON(...)
It causes to the WARN_ON to trigger warning about devlink not registered.
Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Reported-by: Danielle Ratson <danieller@nvidia.com>
Tested-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bhupesh Sharma [Wed, 17 Nov 2021 11:05:38 +0000 (16:35 +0530)]
net: stmmac: dwmac-qcom-ethqos: add platform level clocks management
Split clocks settings from init callback into clks_config callback,
which could support platform level clock management.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Wed, 17 Nov 2021 07:56:52 +0000 (15:56 +0800)]
page_pool: Revert "page_pool: disable dma mapping support..."
This reverts commit
d00e60ee54b12de945b8493cf18c1ada9e422514.
As reported by Guillaume in [1]:
Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT
in 32-bit systems, which breaks the bootup proceess when a
ethernet driver is using page pool with PP_FLAG_DMA_MAP flag.
As we were hoping we had no active consumers for such system
when we removed the dma mapping support, and LPAE seems like
a common feature for 32 bits system, so revert it.
1. https://www.spinics.net/lists/netdev/msg779890.html
Fixes: d00e60ee54b1 ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: "kernelci.org bot" <bot@kernelci.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Teng Qi [Wed, 17 Nov 2021 03:44:53 +0000 (11:44 +0800)]
ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
The if statement:
if (port >= DSAF_GE_NUM)
return;
limits the value of port less than DSAF_GE_NUM (i.e., 8).
However, if the value of port is 6 or 7, an array overflow could occur:
port_rst_off = dsaf_dev->mac_cb[port]->port_rst_off;
because the length of dsaf_dev->mac_cb is DSAF_MAX_PORT_NUM (i.e., 6).
To fix this possible array overflow, we first check port and if it is
greater than or equal to DSAF_MAX_PORT_NUM, the function returns.
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Teng Qi <starmiku1207184332@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Mladek [Thu, 18 Nov 2021 09:03:47 +0000 (10:03 +0100)]
Merge branch 'rework/printk_safe-removal' into for-linus
Helge Deller [Wed, 17 Nov 2021 14:48:45 +0000 (15:48 +0100)]
parisc: Enable CONFIG_PRINTK_TIME=y in 32bit defconfig
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Wed, 17 Nov 2021 10:05:07 +0000 (11:05 +0100)]
Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
This reverts commit
e4f2006f1287e7ea17660490569cff323772dac4.
This patch shows problems with signal handling. Revert it for now.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.15
Helge Deller [Tue, 16 Nov 2021 12:12:21 +0000 (13:12 +0100)]
parisc: Wrap assembler related defines inside __ASSEMBLY__
Building allmodconfig shows errors in the gpu/drm/msm snapdragon drivers,
because a COND() define is used there which conflicts with the COND() for
PA-RISC assembly. Although the snapdragon driver isn't relevant for parisc, it
is nevertheless compiled when CONFIG_COMPILE_TEST is defined.
Move the COND() define and other PA-RISC mnemonics inside the #ifdef
__ASSEMBLY__ part to avoid this conflict.
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
Helge Deller [Tue, 16 Nov 2021 12:11:26 +0000 (13:11 +0100)]
parisc: Wire up futex_waitv
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Mon, 15 Nov 2021 16:34:50 +0000 (17:34 +0100)]
parisc: Include stringify.h to avoid build error in crypto/api.c
Include stringify.h to avoid this build error:
arch/parisc/include/asm/jump_label.h: error: expected ':' before '__stringify'
arch/parisc/include/asm/jump_label.h: error: label 'l_yes' defined but not used [-Werror=unused-label]
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:43 +0000 (17:34 +0100)]
KVM: x86: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211116163443.88707-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:42 +0000 (17:34 +0100)]
KVM: s390: Cap KVM_CAP_NR_VCPUS by num_online_cpus()
KVM_CAP_NR_VCPUS is a legacy advisory value which on other architectures
return num_online_cpus() caped by KVM_CAP_NR_VCPUS or something else
(ppc and arm64 are special cases). On s390, KVM_CAP_NR_VCPUS returns
the same as KVM_CAP_MAX_VCPUS and this may turn out to be a bad
'advice'. Switch s390 to returning caped num_online_cpus() too.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Message-Id: <
20211116163443.88707-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:41 +0000 (17:34 +0100)]
KVM: RISC-V: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Message-Id: <
20211116163443.88707-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:40 +0000 (17:34 +0100)]
KVM: PPC: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211116163443.88707-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:39 +0000 (17:34 +0100)]
KVM: MIPS: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211116163443.88707-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 16 Nov 2021 16:34:38 +0000 (17:34 +0100)]
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
Generally, it doesn't make sense to return the recommended maximum number
of vCPUs which exceeds the maximum possible number of vCPUs.
Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs
depending on whether it is a system-wide ioctl or a per-VM one. Previously,
KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to
keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
which is what gets returned by system-wide KVM_CAP_MAX_VCPUS.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211116163443.88707-2-vkuznets@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tom Lendacky [Mon, 24 May 2021 17:48:57 +0000 (12:48 -0500)]
KVM: x86: Assume a 64-bit hypercall for guests with protected state
When processing a hypercall for a guest with protected state, currently
SEV-ES guests, the guest CS segment register can't be checked to
determine if the guest is in 64-bit mode. For an SEV-ES guest, it is
expected that communication between the guest and the hypervisor is
performed to shared memory using the GHCB. In order to use the GHCB, the
guest must have been in long mode, otherwise writes by the guest to the
GHCB would be encrypted and not be able to be comprehended by the
hypervisor.
Create a new helper function, is_64_bit_hypercall(), that assumes the
guest is in 64-bit mode when the guest has protected state, and returns
true, otherwise invoking is_64_bit_mode() to determine the mode. Update
the hypercall related routines to use is_64_bit_hypercall() instead of
is_64_bit_mode().
Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurences of calls to
this helper function for a guest running with protected state.
Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <
e0b20c770c9d0d1403f23d83e785385104211f74.
1621878537.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Arnaldo Carvalho de Melo [Tue, 16 Nov 2021 15:03:25 +0000 (12:03 -0300)]
selftests: KVM: Add /x86_64/sev_migrate_tests to .gitignore
$ git status
nothing to commit, working tree clean
$
$ make -C tools/testing/selftests/kvm/ > /dev/null 2>&1
$ git status
Untracked files:
(use "git add <file>..." to include in what will be committed)
tools/testing/selftests/kvm/x86_64/sev_migrate_tests
nothing added to commit but untracked files present (use "git add" to track)
$
Fixes: 6a58150859fdec76 ("selftest: KVM: Add intra host migration tests")
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Message-Id: <YZPIPfvYgRDCZi/w@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Randy Dunlap [Sun, 7 Nov 2021 03:47:06 +0000 (20:47 -0700)]
riscv: kvm: fix non-kernel-doc comment block
Don't use "/**" to begin a comment block for a non-kernel-doc comment.
Prevents this docs build warning:
vcpu_sbi.c:3: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Copyright (c) 2019 Western Digital Corporation or its affiliates.
Fixes: dea8ee31a039 ("RISC-V: KVM: Add SBI v0.1 support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Anup Patel <anup.patel@wdc.com>
Cc: kvm@vger.kernel.org
Cc: kvm-riscv@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Message-Id: <
20211107034706.30672-1-rdunlap@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 16 Nov 2021 13:08:19 +0000 (08:08 -0500)]
Merge branch 'kvm-5.16-fixes' into kvm-master
* Fixes for Xen emulation
* Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache
* Fixes for migration of 32-bit nested guests on 64-bit hypervisor
* Compilation fixes
* More SEV cleanups
Sean Christopherson [Tue, 9 Nov 2021 21:51:01 +0000 (21:51 +0000)]
KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()
Rename cmd_allowed_from_miror() to is_cmd_allowed_from_mirror(), fixing
a typo and making it obvious that the result is a boolean where
false means "not allowed".
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109215101.
2211373-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 21:51:00 +0000 (21:51 +0000)]
KVM: SEV: Drop a redundant setting of sev->asid during initialization
Remove a fully redundant write to sev->asid during SEV/SEV-ES guest
initialization. The ASID is set a few lines earlier prior to the call to
sev_platform_init(), which doesn't take "sev" as a param, i.e. can't
muck with the ASID barring some truly magical behind-the-scenes code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109215101.
2211373-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 21:50:59 +0000 (21:50 +0000)]
KVM: SEV: WARN if SEV-ES is marked active but SEV is not
WARN if the VM is tagged as SEV-ES but not SEV. KVM relies on SEV and
SEV-ES being set atomically, and guards common flows with "is SEV", i.e.
observing SEV-ES without SEV means KVM has a fatal bug.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109215101.
2211373-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 21:50:58 +0000 (21:50 +0000)]
KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()
Set sev_info.active during SEV/SEV-ES activation before calling any code
that can potentially consume sev_info.es_active, e.g. set "active" and
"es_active" as a pair immediately after the initial sanity checks. KVM
generally expects that es_active can be true if and only if active is
true, e.g. sev_asid_new() deliberately avoids sev_es_guest() so that it
doesn't get a false negative. This will allow WARNing in sev_es_guest()
if the VM is tagged as SEV-ES but not SEV.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109215101.
2211373-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 9 Nov 2021 21:50:56 +0000 (21:50 +0000)]
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
Reject COPY_ENC_CONTEXT_FROM if the destination VM has created vCPUs.
KVM relies on SEV activation to occur before vCPUs are created, e.g. to
set VMCB flags and intercepts correctly.
Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211109215101.
2211373-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:27 +0000 (16:50 +0000)]
KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache
In commit
7e2175ebd695 ("KVM: x86: Fix recording of guest steal time /
preempted status") I removed the only user of these functions because
it was basically impossible to use them safely.
There are two stages to the GFN->PFN mapping; first through the KVM
memslots to a userspace HVA and then through the page tables to
translate that HVA to an underlying PFN. Invalidations of the former
were being handled correctly, but no attempt was made to use the MMU
notifiers to invalidate the cache when the HVA->GFN mapping changed.
As a prelude to reinventing the gfn_to_pfn_cache with more usable
semantics, rip it out entirely and untangle the implementation of
the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it.
All current users of kvm_vcpu_map() also look broken right now, and
will be dealt with separately. They broadly fall into two classes:
* Those which map, access the data and immediately unmap. This is
mostly gratuitous and could just as well use the existing user
HVA, and could probably benefit from a gfn_to_hva_cache as they
do so.
* Those which keep the mapping around for a longer time, perhaps
even using the PFN directly from the guest. These will need to
be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map()
can be removed too.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20211115165030.7422-8-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:26 +0000 (16:50 +0000)]
KVM: nVMX: Use a gfn_to_hva_cache for vmptrld
And thus another call to kvm_vcpu_map() can die.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20211115165030.7422-7-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:25 +0000 (16:50 +0000)]
KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check
Kill another mostly gratuitous kvm_vcpu_map() which could just use the
userspace HVA for it.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20211115165030.7422-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:23 +0000 (16:50 +0000)]
KVM: x86/xen: Use sizeof_field() instead of open-coding it
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20211115165030.7422-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:24 +0000 (16:50 +0000)]
KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12
Using kvm_vcpu_map() for reading from the guest is entirely gratuitous,
when all we do is a single memcpy and unmap it again. Fix it up to use
kvm_read_guest()... but in fact I couldn't bring myself to do that
without also making it use a gfn_to_hva_cache for both that *and* the
copy in the other direction.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20211115165030.7422-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 15 Nov 2021 16:50:21 +0000 (16:50 +0000)]
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
In commit
319afe68567b ("KVM: xen: do not use struct gfn_to_hva_cache") we
stopped storing this in-kernel as a GPA, and started storing it as a GFN.
Which means we probably should have stopped calling gpa_to_gfn() on it
when userspace asks for it back.
Cc: stable@vger.kernel.org
Fixes: 319afe68567b ("KVM: xen: do not use struct gfn_to_hva_cache")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20211115165030.7422-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maxim Levitsky [Mon, 15 Nov 2021 13:18:37 +0000 (15:18 +0200)]
KVM: x86/mmu: include EFER.LMA in extended mmu role
Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the
guest root level and is not reflected in kvm_mmu_page_role.level when TDP
is in use. When simply running the guest, it is impossible for EFER.LMA
and kvm_mmu.root_level to get out of sync, as the guest cannot transition
from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
first bouncing through a different MMU context. And stuffing guest state
via KVM_SET_SREGS{,2} also ensures a full MMU context reset.
However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
set guest state when migrating the VM while L2 is active, the vCPU state
will reflect L2, not L1. If L1 is using TDP for L2, then root_mmu will
have been configured using L2's state, despite not being used for L2. If
L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
be configured for guest PAE paging, but will match the mmu_role for 64-bit
paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.
Alternatively, the root_mmu's role could be invalidated after a successful
KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
tricky, and not even needed if L1 and L2 do have the same role (e.g., they
are both 64-bit guests and run with the same CR4).
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <
20211115131837.195527-3-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maxim Levitsky [Mon, 15 Nov 2021 13:18:36 +0000 (15:18 +0200)]
KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
When loading nested state, don't use check vcpu->arch.efer to get the
L1 host's 64-bit vs. 32-bit state and don't check it for consistency
with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU
may be stale when KVM_SET_NESTED_STATE is called---and architecturally
does not exist. When restoring L2 state in KVM, the CPU is placed in
non-root where nested VMX code has no snapshot of L1 host state: VMX
(conditionally) loads host state fields loaded on VM-exit, but they need
not correspond to the state before entry. A simple case occurs in KVM
itself, where the host RIP field points to vmx_vmexit rather than the
instruction following vmlaunch/vmresume.
However, for the particular case of L1 being in 32- or 64-bit mode
on entry, the exit controls can be treated instead as the source of
truth regarding the state of L1 on entry, and can be used to check
that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if
vmcs12.VM_EXIT_LOAD_IA32_EFER is set. The consistency check on CPU
EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only
on VM-Enter. That's because, again, there's conceptually no "current"
L1 EFER to check on KVM_SET_NESTED_STATE.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <
20211115131837.195527-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>