review.tizen.org Git - platform/kernel/linux-rpi.git/log

wifi: brcmfmac: chip: Only disable D11 cores; handle an arbitrary number

At least on BCM4387, the D11 cores are held in reset on cold startup and
firmware expects to release reset itself. Just assert reset here and let
firmware deassert it. Premature deassertion results in the firmware
failing to initialize properly some of the time, with strange AXI bus
errors.

Also, BCM4387 has 3 cores, up from 2. The logic for handling that is in
brcmf_chip_ai_resetcore(), but since we aren't using that any more, just
handle it here.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230214092423.15175-1-marcan@marcan.st

wifi: brcmfmac: support CQM RSSI notification with older firmware

Using the BCM4339 firmware from linux-firmware (version "BCM4339/2 wl0:
Sep 5 2019 11:05:52 version 6.37.39.113 (r722271 CY)" from
cypress/cyfmac4339-sdio.bin) the RSSI respose is only 4 bytes, which
results in an error being logged.

It seems that older devices send only the RSSI field and neither SNR nor
noise is included. Handle this by accepting a 4 byte message and
reading only the RSSI from it.

Fixes: 7dd56ea45a66 ("brcmfmac: add support for CQM RSSI notifications")
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230124104248.2917465-1-john@metanate.com

wifi: brcmfmac: pcie: Provide a buffer of random bytes to the device

Newer Apple firmwares on chipsets without a hardware RNG require the
host to provide a buffer of 256 random bytes to the device on
initialization. This buffer is present immediately before NVRAM,
suffixed by a footer containing a magic number and the buffer length.

This won't affect chips/firmwares that do not use this feature, so do it
unconditionally for all Apple platforms (those with an Apple OTP).

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Hector Martin <marcan@marcan.st>
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230214080034.3828-3-marcan@marcan.st

wifi: brcmfmac: acpi: Add support for fetching Apple ACPI properties

On DT platforms, the module-instance and antenna-sku-info properties
are passed in the DT. On ACPI platforms, module-instance is passed via
the analogous Apple device property mechanism, while the antenna SKU
info is instead obtained via an ACPI method that grabs it from
non-volatile storage.

Add support for this, to allow proper firmware selection on Apple
platforms.

Signed-off-by: Hector Martin <marcan@marcan.st>
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230214080034.3828-2-marcan@marcan.st

wifi: rtw89: refine FW feature judgement on packet drop

The newer chips use the newer firmware branches, e.g. v027, v029.
And, those firmware branches are supposed to support packet drop
when they are just branched out.

The initial firmware branch used by each chip is as below.
* 8852A: v009
* 8852C: v027
* 8852B: v027

So, only 8852A may use firmware which doesn't support packet drop at
runtime. To save trivial positive listing in firmware feature table,
we change to reverse judgment.

Besides, rtw89_mac_ptk_drop_by_band_and_wait() missed to check firmware
feature before calling rtw89_fw_h2c_pkt_drop(). We also fix it.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230220070202.29868-7-pkshih@realtek.com

wifi: rtw89: 8852b: enable hw_scan support

This enables hw_scan for 8852b after firmware version 0.29.29.0.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230220070202.29868-6-pkshih@realtek.com

wifi: rtw89: 8852b: add channel encoding for hw_scan

To obtain correct packet frequency for hw_scan, 52b needs to decode
the channel index obtained from hardware. Change the driver related
set channel part as well to make driver/firmware compatible.

Signed-off-by: Po-Hao Huang <phhuang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230220070202.29868-5-pkshih@realtek.com

wifi: rtw89: adjust channel encoding to common function

Since the range of channel table is identical among ICs. Make channel
encode/decode function common and not IC dependent. So all ICs with
matching firmware that needs this kind of coding can use it directly.
This patch doesn't change logic at all.

Signed-off-by: Po-Hao Huang <phhuang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230220070202.29868-4-pkshih@realtek.com

wifi: rtw89: fw: configure CRASH_TRIGGER feature for 8852B

RTL8852B firmware supports CRASH_TRIGGER feature from v0.29.29.0.
After this is configured, debugfs fw_crash can support type 1 on
RTL8852B to trigger firmware crash and verify L2 recovery through
simulation.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230220070202.29868-3-pkshih@realtek.com

wifi: rtw89: add tx_wake notify for 8852B

8852B has the same issue: management frames get stuck when wifi
chip enters low ps mode, so we alse add notify wake function to
trigger wifi chip wake before forwarding management frames.

Signed-off-by: Chin-Yen Lee <timlee@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230220070202.29868-2-pkshih@realtek.com

wifi: rtw88: rtw8822c: Implement RTL8822CS (SDIO) efuse parsing

The efuse of the SDIO RTL8822CS chip has only one known member: the mac
address is at offset 0x16a. Add a struct rtw8822cs_efuse describing this
and use it for copying the mac address when the SDIO bus is used.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230218152944.48842-6-martin.blumenstingl@googlemail.com

wifi: rtw88: rtw8822b: Implement RTL8822BS (SDIO) efuse parsing

The efuse of the SDIO RTL8822BS chip has only one known member: the mac
address is at offset 0x11a. Add a struct rtw8822bs_efuse describing this
and use it for copying the mac address when the SDIO bus is used.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230218152944.48842-5-martin.blumenstingl@googlemail.com

wifi: rtw88: rtw8821c: Implement RTL8821CS (SDIO) efuse parsing

The efuse of the SDIO RTL8821CS chip has only one known member: the mac
address is at offset 0x11a. Add a struct rtw8821cs_efuse describing this
and use it for copying the mac address when the SDIO bus is used.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230218152944.48842-4-martin.blumenstingl@googlemail.com

wifi: rtw88: mac: Add SDIO HCI support in the TX/page table setup

txdma_queue_mapping() and priority_queue_cfg() can use the first entry
of each chip's rqpn_table and page_table. Add this mapping so data
transmission is possible on SDIO based chipsets.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230218152944.48842-3-martin.blumenstingl@googlemail.com

wifi: rtw88: mac: Add support for the SDIO HCI in rtw_pwr_seq_parser()

rtw_pwr_seq_parser() needs to know about the HCI bus interface mask for
the SDIO bus so it can parse the chip state change sequences.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230218152944.48842-2-martin.blumenstingl@googlemail.com

wifi: rtl8xxxu: Remove always true condition in rtl8xxxu_print_chipinfo

Fix a new smatch warning:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:1580 rtl8xxxu_print_chipinfo() warn: always true condition '(priv->chip_cut <= 15) => (0-15 <= 15)'

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302140753.71IgU77A-lkp@intel.com/
Fixes: 7b0ac469e331 ("wifi: rtl8xxxu: Recognise all possible chip cuts")
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/68eff98b-a022-5a00-f330-adf623a35772@gmail.com

wifi: rtw89: add RNR support for 6 GHz scan

Since 6 GHz band has around 60 channels and more strict rules for
active probing. Reduced neighbor report can be used to reduce the
channels we scan and get specific target BSS info to probe for.

Declare flag WIPHY_FLAG_SPLIT_SCAN_6GHZ so the scan request could be
divided into two portions: legacy bands and 6 GHz bands. So RNR
information from legacy bands could later be used when 6 GHz scan.

When the scan flag NL80211_SCAN_FLAG_COLOCATED_6GHZ is set, cfg80211
will pass down a reduced channel set which contains PSCs and non-PSC
with RNR info received in the 2 GHz/5 GHz band. This reduces the
scan duration by allowing us to only scan for channels in which APs
are currently operating.

Signed-off-by: Po-Hao Huang <phhuang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230217120007.8835-1-pkshih@realtek.com

wifi: rtlwifi: rtl8192de: Remove the unused variable bcnfunc_enable

Variable bcnfunc_enable is not effectively used, so delete it.

drivers/net/wireless/realtek/rtlwifi/rtl8192de/hw.c:1050:5: warning: variable 'bcnfunc_enable' set but not used.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4110
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230217092529.105899-1-jiapeng.chong@linux.alibaba.com

wifi: rtl8xxxu: 8188e: parse single one element of RA report for station mode

Intentionally parsing single one element of RA report by breaking loop
causes a smatch warning:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8188e.c:1678 rtl8188e_handle_ra_tx_report2() warn:
ignoring unreachable code.

With existing comments, it intends to process single one element for
station mode, but it will parse more elements in AP mode if it's
implemented. Implement program logic according to existing comment to avoid
smatch warning, and also be usable for both AP and stations modes.

Compile test only.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202302142135.LCqUTVGY-lkp@intel.com/
Cc: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Tested-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230216004654.4642-1-pkshih@realtek.com

wifi: wfx: Remove some dead code

wait_for_completion_timeout() can not return a <0 value.
So simplify the logic and remove dead code.

-ERESTARTSYS can not be returned by do_wait_for_common() for tasks with
TASK_UNINTERRUPTIBLE, which is the case for wait_for_completion_timeout()

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/809c4a645c8d1306c0d256345515865c40ec731c.1676464422.git.christophe.jaillet@wanadoo.fr

wifi: rtlwifi: rtl8192ce: fix dealing empty EEPROM values

On OpenWRT platform, RTL8192CE could be soldered on board, but not standard PCI
module. In this case, some EEPROM values aren't programmed and left 0xff.
Load default values when the EEPROM values are empty to avoid problems.

Signed-off-by: Lu jicong <jiconglu58@gmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230214063602.2257263-1-jiconglu58@gmail.com

net/mlx5e: Align IPsec ASO result memory to be as required by hardware

Hardware requires an alignment to 64 bytes to return ASO data. Missing
this alignment caused to unpredictable results while ASO events were
generated.

Fixes: 8518d05b8f9a ("net/mlx5e: Create Advanced Steering Operation object for IPsec")
Reported-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/de0302c572b90c9224a72868d4e0d657b6313c4b.1676797613.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2023-02-15' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-02-15

1) From Gal Tariq and Parav, Few cleanups for mlx5 driver.

2) From Vlad: Allow offloading of ct 'new' match based on [1]

[1] https://lore.kernel.org/netdev/20230201163100.1001180-1-vladbu@nvidia.com/

* tag 'mlx5-updates-2023-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: RX, Remove doubtful unlikely call
  net/mlx5e: Fix outdated TLS comment
  net/mlx5e: Remove unused function mlx5e_sq_xmit_simple
  net/mlx5e: Allow offloading of ct 'new' match
  net/mlx5e: Implement CT entry update
  net/mlx5: Simplify eq list traversal
  net/mlx5e: Remove redundant page argument in mlx5e_xdp_handle()
  net/mlx5e: Remove redundant page argument in mlx5e_xmit_xdp_buff()
  net/mlx5e: Switch to using napi_build_skb()
====================

Link: https://lore.kernel.org/r/20230218090513.284718-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-sched-cls_api-support-hardware-miss-to-tc-action'

Paul Blakey says:

====================
net/sched: cls_api: Support hardware miss to tc action

This series adds support for hardware miss to instruct tc to continue execution
in a specific tc action instance on a filter's action list. The mlx5 driver patch
(besides the refactors) shows its usage instead of using just chain restore.

Currently a filter's action list must be executed all together or
not at all as driver are only able to tell tc to continue executing from a
specific tc chain, and not a specific filter/action.

This is troublesome with regards to action CT, where new connections should
be sent to software (via tc chain restore), and established connections can
be handled in hardware.

Checking for new connections is done when executing the ct action in hardware
(by checking the packet's tuple against known established tuples).
But if there is a packet modification (pedit) action before action CT and the
checked tuple is a new connection, hardware will need to revert the previous
packet modifications before sending it back to software so it can
re-match the same tc filter in software and re-execute its CT action.

The following is an example configuration of stateless nat
on mlx5 driver that isn't supported before this patchet:

#Setup corrosponding mlx5 VFs in namespaces
$ ip netns add ns0
$ ip netns add ns1
$ ip link set dev enp8s0f0v0 netns ns0
$ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
$ ip link set dev enp8s0f0v1 netns ns1
$ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up

#Setup tc arp and ct rules on mxl5 VF representors
$ tc qdisc add dev enp8s0f0_0 ingress
$ tc qdisc add dev enp8s0f0_1 ingress
$ ifconfig enp8s0f0_0 up
$ ifconfig enp8s0f0_1 up

#Original side
$ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
    ct_state -trk ip_proto tcp dst_port 8888 \
      action pedit ex munge tcp dport set 5001 pipe \
      action csum ip tcp pipe \
      action ct pipe \
      action goto chain 1
$ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
    ct_state +trk+est \
      action mirred egress redirect dev enp8s0f0_1
$ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
    ct_state +trk+new \
      action ct commit pipe \
      action mirred egress redirect dev enp8s0f0_1
$ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
      action mirred egress redirect dev enp8s0f0_1

#Reply side
$ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
      action mirred egress redirect dev enp8s0f0_0
$ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
    ct_state -trk ip_proto tcp \
      action ct pipe \
      action pedit ex munge tcp sport set 8888 pipe \
      action csum ip tcp pipe \
      action mirred egress redirect dev enp8s0f0_0

#Run traffic
$ ip netns exec ns1 iperf -s -p 5001&
$ sleep 2 #wait for iperf to fully open
$ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888

#dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
$ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
        Sent hardware 9310116832 bytes 6149672 pkt
        Sent hardware 9310116832 bytes 6149672 pkt
        Sent hardware 9310116832 bytes 6149672 pkt

A new connection executing the first filter in hardware will first rewrite
the dst port to the new port, and then the ct action is executed,
because this is a new connection, hardware will need to be send this back
to software, on chain 0, to execute the first filter again in software.
The dst port needs to be reverted otherwise it won't re-match the old
dst port in the first filter. Because of that, currently mlx5 driver will
reject offloading the above action ct rule.

This series adds support for hardware partially executing a filter's action list,
and letting tc software continue processing in the specific action instance
where hardware left off (in the above case after the "action pedit ex munge tcp
dport... of the first rule") allowing support for scenarios such as the above.
====================

Link: https://lore.kernel.org/r/20230217223620.28508-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, Set CT miss to the specific ct action instance

Currently, CT misses restore the missed chain on the tc skb extension so
tc will continue from the relevant chain. Instead, restore the CT action's
miss cookie on the extension, which will instruct tc to continue from the
this specific CT action instance on the relevant filter's action list.

Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
this miss mapping instead of the current chain miss object (CHAIN_MISS)
for CT action misses.

To restore this new miss mapping value, add a RX restore rule for each
such mapping value.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Sholmo <ozsh@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG

This reg usage is always a mapped object, not necessarily
containing chain info.

Rename to properly convey what it stores.
This patch doesn't change any functionality.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Refactor tc miss handling to a single function

Move tc miss handling code to en_tc.c, and remove
duplicate code.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Kconfig: Make tc offload depend on tc skb extension

Tc skb extension is a basic requirement for using tc
offload to support correct restoration on action miss.

Depend on it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: flower: Support hardware miss to tc action

To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: flower: Move filter handle initialization earlier

To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.

Move filter handle initialization earlier.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: cls_api: Support hardware miss to tc action

For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.

CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.

Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: Rename user cookie and act cookie

struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.

Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'ieee802154-for-net-next-2023-02-20' of git://git./linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2023-02-20

Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.

Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).

Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
  tx path

* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
  ieee802154: Drop device trackers
  mac802154: Fix an always true condition
  mac802154: Send beacons using the MLME Tx path
  ieee802154: Change error code on monitor scan netlink request
  ieee802154: Convert scan error messages to extack
  ieee802154: Use netlink policies when relevant on scan parameters
  ieee802154: at86rf230: switch to using gpiod API
  ieee802154: at86rf230: drop support for platform data
  Revert "at86rf230: convert to gpio descriptors"
  cc2520: move to gpio descriptors
  mac802154: Avoid superfluous endianness handling
  at86rf230: convert to gpio descriptors
  mac802154: Handle basic beaconing
  ieee802154: Add support for user beaconing requests
  mac802154: Handle passive scanning
  mac802154: Add MLME Tx locked helpers
  mac802154: Prepare forcing specific symbol duration
  ieee802154: Introduce a helper to validate a channel
  ieee802154: Define a beacon frame header
  ieee802154: Add support for user scanning requests
====================

Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: fix builds without CONFIG_RTC_LIB

Add an embarrassingly missed semicolon plus and embarrassingly missed
parenthesis breaking kernel building when CONFIG_RTC_LIB is not set
like the one reported with ia64 config.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302170047.EjCPizu3-lkp@intel.com/
Fixes: 14743ddd2495 ("sfc: add devlink info support for ef100")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Link: https://lore.kernel.org/r/20230220110133.29645-1-alejandro.lucero-palau@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: clean up some inconsistent indentings

Fix some indentngs and remove the warning below:
drivers/net/ethernet/sfc/mae.c:657 efx_mae_enumerate_mports() warn: inconsistent indenting

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4117
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230220065958.52941-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4_en: Introduce flexible array to silence overflow warning

The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY
memcpy() warning on ppc64 platform:

In function ‘fortify_memcpy_chk’,
    inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2,
    inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4,
    inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3:
./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with
attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?
[-Werror=attribute-warning]
  513 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Same behaviour on x86 you can get if you use "__always_inline" instead of
"inline" for skb_copy_from_linear_data() in skbuff.h

The call here copies data into inlined tx destricptor, which has 104
bytes (MAX_INLINE) space for data payload. In this case "spc" is known
in compile-time but the destination is used with hidden knowledge
(real structure of destination is different from that the compiler
can see). That cause the fortify warning because compiler can check
bounds, but the real bounds are different.  "spc" can't be bigger than
64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined
tx descriptor. The fact that "inl" points into inlined tx descriptor is
determined earlier in mlx4_en_xmit().

Avoid confusing the compiler with "inl + 1" constructions to get to past
the inl header by introducing a flexible array "data" to the struct so
that the compiler can see that we are not dealing with an array of inl
structs, but rather, arbitrary data following the structure. There are
no changes to the structure layout reported by pahole, and the resulting
machine code is actually smaller.

Reported-by: Josef Oskera <joskera@redhat.com>
Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com
Fixes: f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time")
Cc: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/ulp: Remove redundant ->clone() test in inet_clone_ulp().

Commit 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-17

We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).

The main changes are:

1) Add a rbtree data structure following the "next-gen data structure"
   precedent set by recently-added linked-list, that is, by using
   kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.

2) Add a new benchmark for hashmap lookups to BPF selftests,
   from Anton Protopopov.

3) Fix bpf_fib_lookup to only return valid neighbors and add an option
   to skip the neigh table lookup, from Martin KaFai Lau.

4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
   accouting for container environments, from Yafang Shao.

5) Batch of ice multi-buffer and driver performance fixes,
   from Alexander Lobakin.

6) Fix a bug in determining whether global subprog's argument is
   PTR_TO_CTX, which is based on type names which breaks kprobe progs,
   from Andrii Nakryiko.

7) Prep work for future -mcpu=v4 LLVM option which includes usage of
   BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
   from Eduard Zingerman.

8) More prep work for later building selftests with Memory Sanitizer
   in order to detect usages of undefined memory, from Ilya Leoshkevich.

9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
   dereference via sendmsg(), from Maciej Fijalkowski.

10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.

11) Fix BPF memory allocator in combination with BPF hashtab where it could
    corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.

12) Fix LoongArch BPF JIT to always use 4 instructions for function
    address so that instruction sequences don't change between passes,
    from Hengqi Chen.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
  selftests/bpf: Add bpf_fib_lookup test
  bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
  riscv, bpf: Add bpf trampoline support for RV64
  riscv, bpf: Add bpf_arch_text_poke support for RV64
  riscv, bpf: Factor out emit_call for kernel and bpf context
  riscv: Extend patch_text for multiple instructions
  Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
  selftests/bpf: Add global subprog context passing tests
  selftests/bpf: Convert test_global_funcs test to test_loader framework
  bpf: Fix global subprog context argument resolution logic
  LoongArch, bpf: Use 4 instructions for function address in JIT
  bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
  bpf: Disable bh in bpf_test_run for xdp and tc prog
  xsk: check IFF_UP earlier in Tx path
  Fix typos in selftest/bpf files
  selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
  ...
====================

Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sched/topology: fix KASAN warning in hop_cmp()

Despite that prev_hop is used conditionally on cur_hop
is not the first hop, it's initialized unconditionally.

Because initialization implies dereferencing, it might happen
that the code dereferences uninitialized memory, which has been
spotted by KASAN. Fix it by reorganizing hop_cmp() logic.

Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/Y+7avK6V9SyAWsXi@yury-laptop/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmgenet: Support wake-up from s2idle

When we suspend into s2idle we also need to enable the interrupt line
that generates the MPD and HFB interrupts towards the host CPU interrupt
controller (typically the ARM GIC or MIPS L1) to make it exit s2idle.

When we suspend into other modes such as "standby" or "mem" we engage a
power management state machine which will gate off the CPU L1 controller
(priv->irq0) and ungate the side band wake-up interrupt (priv->wol_irq).
It is safe to have both enabled as wake-up sources because they are
mutually exclusive given any suspend mode.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

scm: add user copy checks to put_cmsg()

This is a followup of commit 2558b8039d05 ("net: use a bounce
buffer for copying skb->mark")

x86 and powerpc define user_access_begin, meaning
that they are not able to perform user copy checks
when using user_write_access_begin() / unsafe_copy_to_user()
and friends [1]

Instead of waiting bugs to trigger on other arches,
add a check_object_size() in put_cmsg() to make sure
that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y
will perform more security checks.

[1] We can not generically call check_object_size() from
unsafe_copy_to_user() because UACCESS is enabled at this point.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: drop leftover duplicate/unused code

The recent merge from net left-over some unused code in
leftover.c - nomen omen.

Just drop the unused bits.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-6.3-20230217' of git://git./linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:

====================
pull-request: can-next 2023-02-17 - fixed

this is a pull request of 4 patches for net-next/master.

The first patch is by Yang Li and converts the ctucanfd driver to
devm_platform_ioremap_resource().

The last 3 patches are by Frank Jungclaus, target the esd_usb driver
and contains preparations for the upcoming support of the esd
CAN-USB/3 hardware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Use automatic selection of VCAP rule actionset

Since commit 81e164c4aec5 ("net: microchip: sparx5: Add automatic
selection of VCAP rule actionset") the VCAP API has the capability to
select automatically the actionset based on the actions that are attached
to the rule. So it is not needed anymore to hardcode the actionset in the
driver, therefore it is OK to remove this.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'default_rps_mask-follow-up'

Paolo Abeni says:

====================
net: default_rps_mask follow-up

The first patch namespacify the setting. In the common case, once
proper isolation is in place in the main namespace, forwarding
to/from each child netns will allways happen on the desidered CPUs.

Any additional RPS stage inside the child namespace will not provide
additional isolation and could hurt performance badly if picking a
CPU on a remote node.

The 2nd patch adds more self-tests coverage.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

self-tests: more rps self tests

Explicitly check for child netns and main ns independency

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make default_rps_mask a per netns attribute

That really was meant to be a per netns attribute from the beginning.

The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.

To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-next-2023-02-17' of git://git./linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.3

Third set of patches for v6.3. This time only a set of small fixes
submitted during the last day or two.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add safeguard to check for NULL tupe in objects updates via
   NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari.

2) Incorrect pointer check in the new destroy rule command,
   from Yang Yingliang.

3) Incorrect status bitcheck in nf_conntrack_udp_packet(),
   from Florian Westphal.

4) Simplify seq_print_acct(), from Ilia Gavrilov.

5) Use 2-arg optimal variant of kfree_rcu() in IPVS,
   from Julian Anastasov.

6) TCP connection enters CLOSE state in conntrack for locally
   originated TCP reset packet from the reject target,
   from Florian Westphal.

The fixes #2 and #3 in this series address issues from the previous pull
nf-next request in this net-next cycle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: reduce stack usage

The vcap_admin structures in vcap_api_next_lookup_advanced_test()
take several hundred bytes of stack frame, but when CONFIG_KASAN_STACK
is enabled, each one of them also has extra padding before and after
it, which ends up blowing the warning limit:

In file included from drivers/net/ethernet/microchip/vcap/vcap_api.c:3521:
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c: In function 'vcap_api_next_lookup_advanced_test':
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1954:1: error: the frame size of 1448 bytes is larger than 1400 bytes [-Werror=frame-larger-than=]
1954 | }

Reduce the total stack usage by replacing the five structures with
an array that only needs one pair of padding areas.

Fixes: 1f741f001160 ("net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: use IS_ENABLED() checks for CONFIG_SFC_SRIOV

One local variable has become unused after a recent change:

drivers/net/ethernet/sfc/ef100_nic.c: In function 'ef100_probe_netdev_pf':
drivers/net/ethernet/sfc/ef100_nic.c:1155:21: error: unused variable 'net_dev' [-Werror=unused-variable]
struct net_device *net_dev = efx->net_dev;
^~~~~~~

The variable is still used in an #ifdef. Replace the #ifdef with
an if(IS_ENABLED()) check that lets the compiler see where it is
used, rather than adding another #ifdef.

This also fixes an uninitialized return value in ef100_probe_netdev_pf()
that gcc did not spot.

Fixes: 7e056e2360d9 ("sfc: obtain device mac address based on firmware handle for ef100")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: properly alloc ICE_VSI_LB

Devlink reload patchset introduced regression. ICE_VSI_LB wasn't
taken into account when doing default allocation. Fix it by adding a
case for ICE_VSI_LB in ice_vsi_alloc_def().

Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Fix spelling mistake "creationg" -> "creating"

There is a spelling mistake in a pci_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Add NIX Errata workaround on CN10K silicon

This patch adds workaround for below 2 HW erratas

1. Due to improper clock gating, NIXRX may free the same
NPA buffer multiple times.. to avoid this, always enable
NIX RX conditional clock.

2. NIX FIFO does not get initialized on reset, if the SMQ
flush is triggered before the first packet is processed, it
will lead to undefined state. The workaround to perform SMQ
flush only if packet count is non-zero in MDQ.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Read EEE abilities when using .features

A PHY driver can use a static integer value to indicate what link mode
features it supports, i.e, its abilities.. This is the old way, but
useful when dynamically determining the devices features does not
work, e.g. support of fibre.

EEE support has been moved into phydev->supported_eee. This needs to
be set otherwise the code assumes EEE is not supported. It is normally
set as part of reading the devices abilities. However if a static
integer value was used, the dynamic reading of the abilities is not
performed. Add a call to genphy_c45_read_eee_abilities() to read the
EEE abilities.

Fixes: 8b68710a3121 ("net: phy: start using genphy_c45_ethtool_get/set_eee()")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phydev-locks'

Andrew Lunn says:

====================
Add additional phydev locks

The phydev lock should be held when accessing members of phydev, or
calling into the driver. Some of the phy_ethtool_ functions are
missing locks. Add them. To avoid deadlock the marvell driver is
modified since it calls one of the functions which gain locks, which
would result in a deadlock.

The missing locks have not caused noticeable issues, so these patches
are for net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add locks to ethtool functions

The phydev lock should be held while accessing members of phydev,
or calling into the driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: Use the unlocked genphy_c45_ethtool_get_eee()

phy_ethtool_get_eee() is about to gain locking of the phydev lock.
This means it cannot be used within a PHY driver without causing a
deadlock. Swap to using genphy_c45_ethtool_get_eee() which assumes the
lock has already been taken.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'icmp6-drop-reason'

Eric Dumazet says:

====================
ipv6: icmp6: better drop reason support

This series aims to have more precise drop reason reports for icmp6.

This should reduce false positives on most usual cases.

This can be extended as needed later.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add drop reason support to icmpv6_echo_reply()

Change icmpv6_echo_reply() to return a drop reason.

For the moment, return NOT_SPECIFIED or SKB_CONSUMED.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST

Hosts can often receive neighbour discovery messages
that are not for them.

Use a dedicated drop reason to make clear the packet is dropped
for this normal case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS

This is a generic drop reason for any error detected
in ndisc_parse_options().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add drop reason support to ndisc_redirect_rcv()

Change ndisc_redirect_rcv() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and values from icmpv6_notify().

More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add drop reason support to ndisc_router_discovery()

Change ndisc_router_discovery() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and SKB_CONSUMED.

More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add drop reason support to ndisc_recv_rs()

Change ndisc_recv_rs() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add drop reason support to ndisc_recv_na()

Change ndisc_recv_na() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: icmp6: add drop reason support to ndisc_recv_ns()

Change ndisc_recv_ns() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add location to trace_consume_skb()

kfree_skb() includes the location, it makes sense
to add it to consume_skb() as well.

After patch:

taskd_EventMana  8602 [004]   420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic
         swapper     0 [011]   422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc
      discipline  9141 [043]   423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp
         swapper     0 [010]   423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv
         borglet  8672 [014]   425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump
         swapper     0 [028]   426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action
            wget 14339 [009]   426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xsk: support use vaddr as ring

When we try to start AF_XDP on some machines with long running time, due
to the machine's memory fragmentation problem, there is no sufficient
contiguous physical memory that will cause the start failure.

If the size of the queue is 8 * 1024, then the size of the desc[] is
8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is
16page+. This is necessary to apply for a 4-order memory. If there are a
lot of queues, it is difficult to these machine with long running time.

Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but
we only use 17 pages.

This patch replaces __get_free_pages() by vmalloc() to allocate memory
to solve these problems.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'taprio-queuemaxsdu-fixes'

Vladimir Oltean says:

====================
taprio queueMaxSDU fixes

This fixes 3 issues noticed while attempting to reoffload the
dynamically calculated queueMaxSDU values. These are:
- Dynamic queueMaxSDU is not calculated correctly due to a lost patch
- Dynamically calculated queueMaxSDU needs to be clamped on the low end
- Dynamically calculated queueMaxSDU needs to be clamped on the high end
====================

Link: https://lore.kernel.org/r/20230215224632.2532685-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimited

It makes no sense to keep randomly large max_sdu values, especially if
larger than the device's max_mtu. These are visible in "tc qdisc show".
Such a max_sdu is practically unlimited and will cause no packets for
that traffic class to be dropped on enqueue.

Just set max_sdu_dynamic to U32_MAX, which in the logic below causes
taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user
space of 0 (unlimited).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: taprio: don't allow dynamic max_sdu to go negative after stab adjustment

The overhead specified in the size table comes from the user. With small
time intervals (or gates always closed), the overhead can be larger than
the max interval for that traffic class, and their difference is
negative.

What we want to happen is for max_sdu_dynamic to have the smallest
non-zero value possible (1) which means that all packets on that traffic
class are dropped on enqueue. However, since max_sdu_dynamic is u32, a
negative is represented as a large value and oversized dropping never
happens.

Use max_t with int to force a truncation of max_frm_len to no smaller
than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no
smaller than 1.

Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: taprio: fix calculation of maximum gate durations

taprio_calculate_gate_durations() depends on netdev_get_num_tc() and
this returns 0. So it calculates the maximum gate durations for no
traffic class.

I had tested the blamed commit only with another patch in my tree, one
which in the end I decided isn't valuable enough to submit ("net/sched:
taprio: mask off bits in gate mask that exceed number of TCs").

The problem is that having this patch threw off my testing. By moving
the netdev_set_num_tc() call earlier, we implicitly gave to
taprio_calculate_gate_durations() the information it needed.

Extract only the portion from the unsubmitted change which applies the
mqprio configuration to the netdev earlier.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/
Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

rxrpc: Fix overproduction of wakeups to recvmsg()

Fix three cases of overproduction of wakeups:

(1) rxrpc_input_split_jumbo() conditionally notifies the app that there's
     data for recvmsg() to collect if it queues some data - and then its
     only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway.

     Fix the rxrpc_input_data() to only do the wakeup in failure cases.

(2) If a DATA packet is received for a call by the I/O thread whilst
     recvmsg() is busy draining the call's rx queue in the app thread, the
     call will left on the recvmsg() queue for recvmsg() to pick up, even
     though there isn't any data on it.

     This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR
     set after the reply has been posted to a service call.

     Fix this by discarding pending calls from the recvmsg() queue that
     don't need servicing yet.

(3) Not-yet-completed calls get requeued after having data read from them,
     even if they have no data to read.

     Fix this by only requeuing them if they have data waiting on them; if
     they don't, the I/O thread will requeue them when data arrives or they
     fail.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-final-gsi-register-updates'

Alex Elder says:

====================
net: final GSI register updates

I believe this is the last set of changes required to allow IPA v5.0
to be supported.  There is a little cleanup work remaining, but that
can happen in the next Linux release cycle.  Otherwise we just need
config data and register definitions for IPA v5.0 (and DTS updates).
These are ready but won't be posted without further testing.

The first patch in this series fixes a minor bug in a patch just
posted, which I found too late.  The second eliminates the GSI
memory "adjustment"; this was done previously to avoid/delay the
need to implement a more general way to define GSI register offsets.
Note that this patch causes "checkpatch" warnings due to indentation
that aligns with an open parenthesis.

The third patch makes use of the newly-defined register offsets, to
eliminate the need for a function that hid a few details.  The next
modifies a different helper function to work properly for IPA v5.0+.
The fifth patch changes the way the event ring size is specified
based on how it's now done for IPA v5.0+.  And the last defines a
new register required for IPA v5.0+.
====================

Link: https://lore.kernel.org/r/20230215195352.755744-1-elder@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ipa: add HW_PARAM_4 GSI register

Starting at IPA v5.0, the number of event rings per EE is defined
in a field in a new HW_PARAM_4 GSI register rather than HW_PARAM_2.
Define this new register and its fields, and update the code that
checks the number of rings supported by hardware to use the proper
field based on IPA version.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ipa: support different event ring encoding

Starting with IPA v5.0, a channel's event ring index is encoded in
a field in the CH_C_CNTXT_1 GSI register rather than CH_C_CNTXT_0.
Define a new field ID for the former register and encode the event
ring in the appropriate register.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ipa: avoid setting an undefined field

The GSI channel protocol field in the CH_C_CNTXT_0 GSI register is
widened starting IPA v5.0, making the CHTYPE_PROTOCOL_MSB field
added in IPA v4.5 unnecessary. Update the code to reflect this.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ipa: kill ev_ch_e_cntxt_1_length_encode()

Now that we explicitly define each register field width there is no
need to have a special encoding function for the event ring length.
Add a field for this to the EV_CH_E_CNTXT_1 GSI register, and use it
in place of ev_ch_e_cntxt_1_length_encode() (which can be removed).

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ipa: kill gsi->virt_raw

Starting at IPA v4.5, almost all GSI registers had their offsets
changed by a fixed amount (shifted downward by 0xd000).  Rather than
defining offsets for all those registers dependent on version, an
adjustment was applied for most register accesses.  This was
implemented in commit cdeee49f3ef7f ("net: ipa: adjust GSI register
addresses").  It was later modified to be a bit more obvious about
the adjusment, in commit 571b1e7e58ad3 ("net: ipa: use a separate
pointer for adjusted GSI memory").

We now are able to define every GSI register with its own offset, so
there's no need to implement this special adjustment.

So get rid of the "virt_raw" pointer, and just maintain "virt" as
the (non-adjusted) base address of I/O mapped GSI register memory.

Redefine the offsets of all GSI registers (other than the INTER_EE
ones, which were not subject to the adjustment) for IPA v4.5+,
subtracting 0xd000 from their defined offsets instead.

Move the ERROR_LOG and ERROR_LOG_CLR definitions further down in the
register definition files so all registers are defined in order of
their offset.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ipa: fix an incorrect assignment

I spotted an error in a patch posted this week, unfortunately just
after it got accepted.  The effect of the bug is that time-based
interrupt moderation is disabled.  This is not technically a bug,
but it is not what is intended.  The problem is that a |= assignment
got implemented as a simple assignment, so the previously assigned
value was ignored.

Fixes: edc6158b18af ("net: ipa: define fields for event-ring related registers")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dpaa2-eth: do not always set xsk support in xdp_features flag

Do not always add NETDEV_XDP_ACT_XSK_ZEROCOPY bit in xdp_features flag
but check if the NIC really supports it.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/3dba6ea42dc343a9f2d7d1a6a6a6c173235e1ebf.1676471386.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ieee802154: Drop device trackers

In order to prevent a device from disappearing when a background job was
started, dev_hold() and dev_put() calls were made. During the
stabilization phase of the scan/beacon features, it was later decided
that removing the device while a background job was ongoing was a valid use
case, and we should instead stop the background job and then remove the
device, rather than prevent the device from being removed. This is what
is currently done, which means manually reference counting the device
during background jobs is no longer needed.

Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-7-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

mac802154: Fix an always true condition

At this stage we simply do not care about the delayed work value,
because active scan is not yet supported, so we can blindly queue
another work once a beacon has been sent.

It fixes a smatch warning:
mac802154_beacon_worker() warn: always true condition
'(local->beacon_interval >= 0) => (0-u32max >= 0)'

Fixes: 3accf4762734 ("mac802154: Handle basic beaconing")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-6-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

mac802154: Send beacons using the MLME Tx path

Using ieee802154_subif_start_xmit() to bypass the net queue when
sending beacons is broken because it does not acquire the
HARD_TX_LOCK(), hence not preventing datagram buffers to be smashed by
beacons upon contention situation. Using the mlme_tx helper is not the
best fit either but at least it is not buggy and has little-to-no
performance hit. More details are given in the comment explaining this
choice in the code.

Fixes: 3accf4762734 ("mac802154: Handle basic beaconing")
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-5-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: Change error code on monitor scan netlink request

Returning EPERM gives the impression that "right now" it is not
possible, but "later" it could be, while what we want to express is the
fact that this is not currently supported at all (might change in the
future). So let's return EOPNOTSUPP instead.

Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Suggested-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-4-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: Convert scan error messages to extack

Instead of printing error messages in the kernel log, let's use extack.
When there is a netlink error returned that could be further specified
with a string, use extack as well.

Apply this logic to the very recent scan/beacon infrastructure.

Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-3-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

ieee802154: Use netlink policies when relevant on scan parameters

Instead of open-coding scan parameters (page, channels, duration, etc),
let's use the existing NLA_POLICY* macros. This help greatly reducing
the error handling and clarifying the overall logic.

Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-2-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

net/mlx5e: RX, Remove doubtful unlikely call

When building an skb in non-linear mode, it is not likely nor unlikely
that the xdp buff has fragments, it depends on the size of the packet
received.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

net/mlx5e: Fix outdated TLS comment

Comment is outdated since
commit 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support").

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Remove unused function mlx5e_sq_xmit_simple

The last usage was removed as part of
commit 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support").

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Allow offloading of ct 'new' match

Allow offloading filters that match on conntrack 'new' state in order to
enable UDP NEW offload in the following patch.

Unhardcode ct 'established' from ct modify header infrastructure code and
determine correct ct state bit according to the metadata action 'cookie'
field.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Implement CT entry update

With support for UDP NEW offload the flow_table may now send updates for
existing flows. Support properly replacing existing entries by updating
flow restore_cookie and replacing the rule with new one with the same match
but new mod_hdr action that sets updated ctinfo.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Simplify eq list traversal

EQ list is read only while finding the matching EQ.
Hence, avoid *_safe() version.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

net/mlx5e: Remove redundant page argument in mlx5e_xdp_handle()

Remove the page parameter, it can be derived from the xdp_buff member
of mlx5e_xdp_buff.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Remove redundant page argument in mlx5e_xmit_xdp_buff()

Remove the page parameter, it can be derived from the xdp_buff.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Switch to using napi_build_skb()

Use napi_build_skb() which uses NAPI percpu caches to obtain
skbuff_head instead of inplace allocation.

napi_build_skb() calls napi_skb_cache_get(), which returns a cached
skb, or allocates a bulk of NAPI_SKB_CACHE_BULK (16) if cache is empty.

Performance test:
TCP single stream, single ring, single core, default MTU (1500B).

Before: 26.5 Gbits/sec
After: 30.1 Gbits/sec (+13.6%)

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>

selftests/bpf: Add bpf_fib_lookup test

This patch tests the bpf_fib_lookup helper when looking up
a neigh in NUD_FAILED and NUD_STALE state. It also adds test
for the new BPF_FIB_LOOKUP_SKIP_NEIGH flag.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-2-martin.lau@linux.dev

bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup

The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.

In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always call
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is skipped together when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac also.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev

riscv, bpf: Add bpf trampoline support for RV64

BPF trampoline is the critical infrastructure of the BPF subsystem, acting
as a mediator between kernel functions and BPF programs. Numerous important
features, such as using BPF program for zero overhead kernel introspection,
rely on this key component. We can't wait to support bpf trampoline on RV64.
The related tests have passed, as well as the test_verifier with no new
failure ceses.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-5-pulehui@huaweicloud.com