review.tizen.org Git - platform/kernel/linux-starfive.git/log

can: raw: add support for SO_MARK

Add support for SO_MARK to the CAN_RAW protocol. This makes it
possible to add traffic control filters based on the fwmark.

Link: https://lore.kernel.org/all/20221210113653.170346-1-mkl@pengutronix.de
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: Call the RAM init directly from m_can_chip_config

When we try to access the mcan message ram addresses during the probe,
hclk is gated by any other drivers or disabled, because of that probe
gets failed.

Move the mram init functionality to mcan chip config called by
m_can_start from mcan open function, by that time clocks are
enabled.

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Vivek Yadav <vivek.2311@samsung.com>
Link: https://lore.kernel.org/all/20221207100632.96200-2-vivek.2311@samsung.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: usb: remove pointers to struct usb_interface in device's priv structures"

Vincent Mailhol <mailhol.vincent@wanadoo.fr> says:

The gs_can and ucan drivers keep a pointer to struct usb_interface in
their private structure. This is not needed. For gs_can the only use
is to retrieve struct usb_device, which is already available in
gs_usb::udev. For ucan, the field is set but never used.

Remove the struct usb_interface fields and clean up.

Link: https://lore.kernel.org/all/20221208081142.16936-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: gs_usb: remove gs_can::iface

The iface field of struct gs_can is only used to retrieve the
usb_device which is already available in gs_can::udev.

Replace each occurrence of interface_to_usbdev(dev->iface) with
dev->udev. This done, remove gs_can::iface.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221208081142.16936-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ucan: remove unused ucan_priv::intf

Field intf of struct ucan_priv is set but never used. Remove it.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221208081142.16936-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: af_can: remove useless parameter 'err' in 'can_rx_register()'

Since commit bdfb5765e45b remove NULL-ptr checks from users of
can_dev_rcv_lists_find(). 'err' parameter is useless, so remove it.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/all/20221208090940.3695670-1-yebin@huaweicloud.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ucan: use strscpy() to instead of strncpy()

The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL terminated strings.

Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Link: https://lore.kernel.org/all/202212070909095189693@zte.com.cn
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: etas_es58x: report firmware, bootloader and hardware version"

Vincent Mailhol <mailhol.vincent@wanadoo.fr> says:

The goal of this series is to report the firmware version, the
bootloader version and the hardware revision of ETAS ES58x devices.

These are already reported in the kernel log but this isn't best
practice. Remove the kernel log and instead export all these through
devlink. The devlink core automatically exports the firmware and the
bootloader version to ethtool, so no need to implement the
ethtool_ops::get_drvinfo() callback anymore.

Patch one and two implement the core support for devlink (at device
level) and devlink port (at the network interface level).

Patch three export usb_cache_string() and patch four add a new info
attribute to devlink.h. Both are prerequisites for patch five.

Patch five is the actual goal: it parses the product information from
a custom usb string returned by the device and expose them through
devlink.

Patch six removes the product information from the kernel log.

Finally, patch seven add a devlink documentation page with list all
the information attributes reported by the driver.

* Sample outputs following this series *

| $ devlink dev info
| usb/1-9:1.1:
|   serial_number 0108954
|   versions:
|       fixed:
|         board.rev B012/000
|       running:
|         fw 04.00.01
|         fw.bootloader 02.00.00

| $ devlink port show can0
| usb/1-9:1.1/0: type eth netdev can0 flavour physical port 0 splittable false

| $ ethtool -i can0
| driver: etas_es58x
| version: 6.1.0-rc7+
| firmware-version: 04.00.01 02.00.00
| expansion-rom-version:
| bus-info: 1-9:1.1
| supports-statistics: no
| supports-test: no
| supports-eeprom-access: no
| supports-register-dump: no
| supports-priv-flags: no

* Changelog *

v4 -> v5: https://lore.kernel.org/all/20221130174658.29282-1-mailhol.vincent@wanadoo.fr

  * [PATH 2/7] add devlink port support. This extends devlink to the
    network interface.

  * thanks to devlink port, 'ethtool -i' is now able to retrieve the
    firmware version from devlink. No need to implement the
    ethtool_ops::get_drvinfo() callback anymore: remove one patch from
    the series.

  * [PATCH 4/7] A new patch to add a new info attribute for the
    bootloader version in devlink.h. This patch was initially sent as
    a standalone patch here:
      https://lore.kernel.org/netdev/20221129031406.3849872-1-mailhol.vincent@wanadoo.fr
    Merging it to this series so that it is both added and used at the
    same time.

  * [PATCH 5/7] use the newly info attribute defined in patch 4/7 to
    report the bootloader version instead of the custom string "bl".

  * [PATCH 5/7] because the series does not implement
    ethtool_ops::get_drvinfo() anymore, the two helper functions
    es58x_sw_version_is_set() and es58x_hw_revision_is_set() are only
    used in devlink.c. Move them from es58x_core.h to es58x_devlink.c.

  * [PATCH 5/7] small rework of the helper function
    es58x_hw_revision_is_set(): it is OK to only check the letter (if
    the letter is '\0', it will not be possible to print the next
    numbers).

  * [PATCH 5/7 and 6/7] add reviewed-by Andrew Lunn tag.

  * [PATCH 7/7] Now, 'ethtool -i' reports both the firmware version
    and the bootloader version (this is how the core export the
    information from devlink to ethtool). Update the documentation to
    reflect this fact.

  * Reoder the patches.

v3 -> v4: https://lore.kernel.org/all/20221126162211.93322-1-mailhol.vincent@wanadoo.fr

  * major rework to use devlink instead of sysfs following Andrew's
    comment.

  * split the series in 6 patches.

  * [PATCH 1/6] add Acked-by: Greg Kroah-Hartman

v2 -> v3: https://lore.kernel.org/all/20221113040108.68249-1-mailhol.vincent@wanadoo.fr

  * patch 2/3: do not spam the kernel log anymore with the product
    number. Instead parse the product information string, extract the
    firmware version, the bootloadar version and the hardware revision
    and export them through sysfs.

  * patch 2/3: rework the parsing in order not to need additional
    fields in struct es58x_parameters.

  * patch 3/3: only populate ethtool_drvinfo::fw_version because since
    commit edaf5df22cb8 ("ethtool: ethtool_get_drvinfo: populate
    drvinfo fields even if callback exits"), there is no need to
    populate ethtool_drvinfo::driver and ethtool_drvinfo::bus_info in
    the driver.

v1 -> v2: https://lore.kernel.org/all/20221104171604.24052-1-mailhol.vincent@wanadoo.fr

  * was a single patch. It is now a series of three patches.
  * add a first new patch to export  usb_cache_string().
  * add a second new patch to apply usb_cache_string() to existing code.
  * add missing check on product info string to prevent a buffer overflow.
  * add comma on the last entry of struct es58x_parameters.

v1: https://lore.kernel.org/all/20221104073659.414147-1-mailhol.vincent@wanadoo.fr

Link: https://lore.kernel.org/all/20221130174658.29282-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Documentation: devlink: add devlink documentation for the etas_es58x driver

List all the version information reported by the etas_es58x driver
through devlink. Also, update MAINTAINERS with the newly created file.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221130174658.29282-8-mailhol.vincent@wanadoo.fr
[mkl: fixed version information table: "bl" -> "fw.bootloader"
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: etas_es58x: remove es58x_get_product_info()

Now that the product information are available under devlink, no more
need to print them in the kernel log. Remove es58x_get_product_info().

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/all/20221130174658.29282-7-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: etas_es58x: export product information through devlink_ops::info_get()

ES58x devices report below product information through a custom usb
string:

  * the firmware version
  * the bootloader version
  * the hardware revision

Parse this string, store the results in struct es58x_dev, export:

  * the firmware version through devlink's "fw" name
  * the bootloader version through devlink's "fw.bootloader" name
  * the hardware revisionthrough devlink's "board.rev" name

Those devlink entries are not critical to use the device, if parsing
fails, print an informative log message and continue to probe the
device.

In addition to that, use usb_device::serial to report the device
serial number.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/all/20221130174658.29282-6-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: devlink: add DEVLINK_INFO_VERSION_GENERIC_FW_BOOTLOADER

As discussed in [1], abbreviating the bootloader to "bl" might not be
well understood. Instead, a bootloader technically being a firmware,
name it "fw.bootloader".

Add a new macro to devlink.h to formalize this new info attribute name
and update the documentation.

[1] https://lore.kernel.org/netdev/20221128142723.2f826d20@kernel.org/

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221130174658.29282-5-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: use devm_platform_get_and_ioremap_resource()

Convert platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Link: https://lore.kernel.org/all/202211111443005202576@zte.com.cn
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

USB: core: export usb_cache_string()

usb_cache_string() can also be useful for the drivers so export it.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/all/20221130174658.29282-4-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

dt-bindings: can: renesas,rcar-canfd: Document RZ/Five SoC

The CANFD block on the RZ/Five SoC is identical to one found on the
RZ/G2UL SoC. "renesas,r9a07g043-canfd" compatible string will be used
on the RZ/Five SoC so to make this clear, update the comment to include
RZ/Five SoC.

No driver changes are required as generic compatible string
"renesas,rzg2l-canfd" will be used as a fallback on RZ/Five SoC.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20221115123811.1182922-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: etas_es58x: add devlink port support

Add support for devlink port which extends the devlink support to the
network interface level. For now, the etas_es58x driver will only rely
on the default features that devlink port has to offer and not
implement additional feature ones.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221130174658.29282-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

dt-bindings: can: fsl,flexcan: add imx93 compatible

Add a new compatible string for imx93.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/all/1669116752-4260-2-git-send-email-haibo.chen@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: etas_es58x: add devlink support

Add basic support for devlink at the device level. The callbacks of
struct devlink_ops will be implemented next.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221130174658.29282-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: flexcan: add auto stop mode for IMX93 to support wakeup

IMX93 do not contain a GPR to config the stop mode, it will set
the flexcan into stop mode automatically once the ARM core go
into low power mode (WFI instruct) and gate off the flexcan
related clock automatically. But to let these logic work as
expect, before ARM core go into low power mode, need to make
sure the flexcan related clock keep on.

To support stop mode and wakeup feature on imx93, this patch
add a new fsl_imx93_devtype_data to separate from imx8mp.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://lore.kernel.org/all/1669116752-4260-1-git-send-email-haibo.chen@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: etas_es58x: sort the includes by alphabetic order

Follow the best practices, reorder the includes.

While doing so, bump up copyright year of each modified files.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221126160525.87036-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ctucanfd: Drop obsolete dependency on COMPILE_TEST

Since commit 0166dc11be91 ("of: make CONFIG_OF user selectable"), it
is possible to test-build any driver which depends on OF on any
architecture by explicitly selecting OF. Therefore depending on
COMPILE_TEST as an alternative is no longer needed.

It is actually better to always build such drivers with OF enabled,
so that the test builds are closer to how each driver will actually be
built on its intended target. Building them without OF may not test
much as the compiler will optimize out potentially large parts of the
code. In the worst case, this could even pop false positive warnings.
Dropping COMPILE_TEST here improves the quality of our testing and
avoids wasting time on non-existent issues.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: Ondrej Ille <ondrej.ille@gmail.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Link: https://lore.kernel.org/all/20221124141604.4265225f@endymion.delvare
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "R-Car CAN FD driver enhancements"

Biju Das <biju.das.jz@bp.renesas.com> says:

The CAN FD IP found on RZ/G2L SoC has some HW features different to
that of R-Car. For example, it has multiple resets, dedicated channel
tx and error interrupts, separate global rx and error interrupts
compared to shared irq for R-Car. it does not s ECC error flag
registers and clk post divider present on R-Car.

Similarly, R-Car V3U has 8 channels whereas other SoCs has only 2
channels. Currently all the HW differences are handled by comparing
with chip_id enum.

This patch series aims to replace chip_id with struct
rcar_canfd_hw_info to handle the HW feature differences and driver
data present on both IPs.

The changes are trivial and tested on RZ/G2L SMARC EVK.

This patch series depend upon [1].

[1] https://lore.kernel.org/all/20221025155657.1426948-1-biju.das.jz@bp.renesas.com

changes since v2: https://lore.kernel.org/all/20221026131732.1843105-1-biju.das.jz@bp.renesas.com
* Replaced data type of max_channels from unsigned int->u8 to save memory.
* Replaced data type of postdiv from unsigned int->u8 to save memory.

changes since v1: https://lore.kernel.org/all/20221022104357.1276740-1-biju.das.jz@bp.renesas.com
* Updated commit description for R-Car V3U SoC detection using
   driver data.
* Replaced data type of max_channels from u32->unsigned int.
* Replaced multi_global_irqs->shared_global_irqs to make it
   positive checks.
* Replaced clk_postdiv->postdiv driver data variable.
* Simplified the calcualtion for fcan_freq.
* Replaced info->has_gerfl to gpriv->info->has_gerfl and wrapped
   the ECC error flag checks inside a single if statement.
* Added Rb tag from Geert patch#1,#2,#3 and #5

Link: https://lore.kernel.org/all/20221027082158.95895-1-biju.das.jz@bp.renesas.com
[mkl: only take patches 1...5 to avoid merge conflict]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Add multi_channel_irqs to struct rcar_canfd_hw_info

RZ/G2L has separate IRQ lines for tx and error interrupt for each
channel whereas R-Car has a combined IRQ line for all the channel
specific tx and error interrupts.

Add multi_channel_irqs to struct rcar_canfd_hw_info to select the
driver to choose between combined and separate irq registration for
channel interrupts. This patch also removes enum rcanfd_chip_id and
chip_id from both struct rcar_canfd_hw_info, as it is unused.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20221027082158.95895-6-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Add postdiv to struct rcar_canfd_hw_info

R-Car has a clock divider for CAN FD clock within the IP, whereas
it is not available on RZ/G2L.

Add postdiv variable to struct rcar_canfd_hw_info to take care of this
difference.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20221027082158.95895-5-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Add shared_global_irqs to struct rcar_canfd_hw_info

RZ/G2L has separate IRQ lines for receive FIFO and global error interrupt
whereas R-Car has shared IRQ line.

Add shared_global_irqs to struct rcar_canfd_hw_info to select the driver to
choose between shared and separate irq registration for global
interrupts.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20221027082158.95895-4-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Add max_channels to struct rcar_canfd_hw_info

R-Car V3U supports a maximum of 8 channels whereas rest of the SoCs
support 2 channels.

Add max_channels variable to struct rcar_canfd_hw_info to handle this
difference.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20221027082158.95895-3-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: sort header inclusion alphabetically

Sort header inclusion alphabetically.

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Vivek Yadav <vivek.2311@samsung.com>
Link: https://lore.kernel.org/all/20221104051617.21173-1-vivek.2311@samsung.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: rcar_canfd_probe: Add struct rcar_canfd_hw_info to driver data

The CAN FD IP found on RZ/G2L SoC has some HW features different to that
of R-Car. For example, it has multiple resets and multiple IRQs for global
and channel interrupts. Also, it does not have ECC error flag registers
and clk post divider present on R-Car. Similarly, R-Car V3U has 8 channels
whereas other SoCs has only 2 channels.

This patch adds the struct rcar_canfd_hw_info to take care of the
HW feature differences and driver data present on both IPs. It also
replaces the driver data chip type with struct rcar_canfd_hw_info by
moving chip type to it.

Whilst started using driver data instead of chip_id for detecting
R-Car V3U SoCs.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/all/20221027082158.95895-2-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb: kvaser_usb_set_{,data}bittiming(): remove empty lines in variable declaration

Fix coding style by removing empty lines in variable declaration.

Fixes: 39d3df6b0ea8 ("can: kvaser_usb: Compare requested bittiming parameters with actual parameters in do_set_{,data}_bittiming")
Cc: Jimmy Assarsson <extja@kvaser.com>
Cc: Anssi Hannula <anssi.hannula@bitwise.fi>
Link: https://lore.kernel.org/all/20221031114513.81214-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb: kvaser_usb_set_bittiming(): fix redundant initialization warning for err

The variable err is initialized, but the initialized value is
Overwritten before it is read. Fix the warning by not initializing the
variable err at all.

Fixes: 39d3df6b0ea8 ("can: kvaser_usb: Compare requested bittiming parameters with actual parameters in do_set_{,data}_bittiming")
Cc: Jimmy Assarsson <extja@kvaser.com>
Cc: Anssi Hannula <anssi.hannula@bitwise.fi>
Link: https://lore.kernel.org/all/20221031114513.81214-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'ipsec-next-2022-12-09' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
ipsec-next 2022-12-09

1) Add xfrm packet offload core API.
   From Leon Romanovsky.

2) Add xfrm packet offload support for mlx5.
   From Leon Romanovsky and Raed Salem.

3) Fix a typto in a error message.
   From Colin Ian King.

* tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits)
  xfrm: Fix spelling mistake "oflload" -> "offload"
  net/mlx5e: Open mlx5 driver to accept IPsec packet offload
  net/mlx5e: Handle ESN update events
  net/mlx5e: Handle hardware IPsec limits events
  net/mlx5e: Update IPsec soft and hard limits
  net/mlx5e: Store all XFRM SAs in Xarray
  net/mlx5e: Provide intermediate pointer to access IPsec struct
  net/mlx5e: Skip IPsec encryption for TX path without matching policy
  net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows
  net/mlx5e: Improve IPsec flow steering autogroup
  net/mlx5e: Configure IPsec packet offload flow steering
  net/mlx5e: Use same coding pattern for Rx and Tx flows
  net/mlx5e: Add XFRM policy offload logic
  net/mlx5e: Create IPsec policy offload tables
  net/mlx5e: Generalize creation of default IPsec miss group and rule
  net/mlx5e: Group IPsec miss handles into separate struct
  net/mlx5e: Make clear what IPsec rx_err does
  net/mlx5e: Flatten the IPsec RX add rule path
  net/mlx5e: Refactor FTE setup code to be more clear
  net/mlx5e: Move IPsec flow table creation to separate function
  ...
====================

Link: https://lore.kernel.org/r/20221209093310.4018731-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: Add missing error check to devlink_resource_put()

When the resource size changes, the return value of the
'nla_put_u64_64bit' function is not checked. That has been fixed to avoid
rechecking at the next step.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Note that this is harmless, we'd error out at the next put().

Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
Link: https://lore.kernel.org/r/20221208082821.3927937-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

skbuff: Introduce slab_build_skb()

syzkaller reported:

BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295

For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
build_skb().

When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc. In these cases, ksize() is used to find its actual size,
but since the allocation may not have been made to that size, actually
perform the krealloc() call so that all the associated buffer size
checking will be correctly notified (and use the "new" pointer so that
compiler hinting works correctly). Split this logic out into a new
interface, slab_build_skb(), but leave the original 0 checking for now
to catch any stragglers.

Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: pepsipu <soopthegoop@gmail.com>
Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: ast@kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: jolsa@kernel.org
Cc: KP Singh <kpsingh@kernel.org>
Cc: martin.lau@linux.dev
Cc: Stanislav Fomichev <sdf@google.com>
Cc: song@kernel.org
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221208060256.give.994-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmgenet: Remove the unused function

The function dmadesc_get_addr() is defined in the bcmgenet.c file, but
not called elsewhere, so remove this unused function.

drivers/net/ethernet/broadcom/genet/bcmgenet.c:120:26: warning: unused function 'dmadesc_get_addr'.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3401
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221209033723.32452-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-miscellaneous-cleanup'

Mat Martineau says:

====================
mptcp: Miscellaneous cleanup

Two code cleanup patches for the 6.2 merge window that don't change
behavior:

Patch 1 makes proper use of nlmsg_free(), as suggested by Jakub while
reviewing f8c9dfbd875b ("mptcp: add pm listener events").

Patch 2 clarifies success status in a few mptcp functions, which
prevents some smatch false positives.
====================

Link: https://lore.kernel.org/r/20221209004431.143701-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: return 0 instead of 'err' var

When 'err' is 0, it looks clearer to return '0' instead of the variable
called 'err'.

The behaviour is then not modified, just a clearer code.

By doing this, we can also avoid false positive smatch warnings like
this one:

net/mptcp/pm_netlink.c:1169 mptcp_pm_parse_pm_addr_attr() warn: missing error code? 'err'

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: use nlmsg_free instead of kfree_skb

Use nlmsg_free() instead of kfree_skb() in pm_netlink.c.

The SKB's have been created by nlmsg_new(). The proper cleaning way
should then be done with nlmsg_free().

For the moment, nlmsg_free() is simply calling kfree_skb() so we don't
change the behaviour here.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2022-12-08' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-12-08

1) Support range match action in SW steering

Yevgeny Kliteynik says:
=======================

The following patch series adds support for a range match action in
SW Steering.

SW steering is able to match only on the exact values of the packet fields,
as requested by the user: the user provides mask for the fields that are of
interest, and the exact values to be matched on when the traffic is handled.

The following patch series add new type of action - Range Match, where the
user provides a field to be matched on and a range of values (min to max)
that will be considered as hit.

There are several new notions that were implemented in order to support
Range Match:
- MATCH_RANGES Steering Table Entry (STE): the new STE type that allows
   matching the packets' fields on the range of values instead of a specific
   value.
- Match Definer: this is a general FW object that defines which fields
   in the packet will be referenced by the mask and tag of each STE.
   Match definer ID is part of STE fields, and it defines how the HW needs
   to interpret the STE's mask/tag values.
   Till now SW steering used the definers that were managed by FW and
   implemented the STE layout as described by the HW spec.
   Now that we're adding a new type of STE, SW steering needs to also be
   able to define this new STE's layout, and this is do

=======================

2) From OZ add support for meter mtu offload
   2.1: Refactor the code to allow both metering and range post actions as a
        pre-step for adding police mtu offload support.
   2.2: Instantiate mtu green/red flow tables with a single match-all rule.
        Add the green/red actions to the hit/miss table accordingly
   2.3: Initialize the meter object with the TC police mtu parameter.
        Use the hardware range match action feature.

3) From MaorD, support routes with more than 2 nexthops in multipath

4) Michael and Or, improve and extend vport representor counters.

* tag 'mlx5-updates-2022-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Expose steering dropped packets counter
  net/mlx5: Refactor and expand rep vport stat group
  net/mlx5e: multipath, support routes with more than 2 nexthops
  net/mlx5e: TC, add support for meter mtu offload
  net/mlx5e: meter, add mtu post meter tables
  net/mlx5e: meter, refactor to allow multiple post meter tables
  net/mlx5: DR, Add support for range match action
  net/mlx5: DR, Add function that tells if STE miss addr has been initialized
  net/mlx5: DR, Some refactoring of miss address handling
  net/mlx5: DR, Manage definers with refcounts
  net/mlx5: DR, Handle FT action in a separate function
  net/mlx5: DR, Rework is_fw_table function
  net/mlx5: DR, Add functions to create/destroy MATCH_DEFINER general object
  net/mlx5: fs, add match on ranges API
  net/mlx5: mlx5_ifc updates for MATCH_DEFINER general object
====================

Link: https://lore.kernel.org/r/20221209001420.142794-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-12-08 (ice)

Jacob Keller says:

This series of patches primarily consists of changes to fix some corner
cases that can cause Tx timestamp failures. The issues were discovered and
reported by Siddaraju DH and primarily affect E822 hardware, though this
series also includes some improvements that affect E810 hardware as well.

The primary issue is regarding the way that E822 determines when to generate
timestamp interrupts. If the driver reads timestamp indexes which do not
have a valid timestamp, the E822 interrupt tracking logic can get stuck.
This is due to the way that E822 hardware tracks timestamp index reads
internally. I was previously unaware of this behavior as it is significantly
different in E810 hardware.

Most of the fixes target refactors to ensure that the ice driver does not
read timestamp indexes which are not valid on E822 hardware. This is done by
using the Tx timestamp ready bitmap register from the PHY. This register
indicates what timestamp indexes have outstanding timestamps waiting to be
captured.

Care must be taken in all cases where we read the timestamp registers, and
thus all flows which might have read these registers are refactored. The
ice_ptp_tx_tstamp function is modified to consolidate as much of the logic
relating to these registers as possible. It now handles discarding stale
timestamps which are old or which occurred after a PHC time update. This
replaces previously standalone thread functions like the periodic work
function and the ice_ptp_flush_tx_tracker function.

In addition, some minor cleanups noticed while writing these refactors are
included.

The remaining patches refactor the E822 implementation to remove the
"bypass" mode for timestamps. The E822 hardware has the ability to provide a
more precise timestamp by making use of measurements of the precise way that
packets flow through the hardware pipeline. These measurements are known as
"Vernier" calibration. The "bypass" mode disables many of these measurements
in favor of a faster start up time for Tx and Rx timestamping. Instead, once
these measurements were captured, the driver tries to reconfigure the PHY to
enable the vernier calibrations.

Unfortunately this recalibration does not work. Testing indicates that the
PHY simply remains in bypass mode without the increased timestamp precision.
Remove the attempt at recalibration and always use vernier mode. This has
one disadvantage that Tx and Rx timestamps cannot begin until after at least
one packet of that type goes through the hardware pipeline. Because of this,
further refactor the driver to separate Tx and Rx vernier calibration.
Complete the Tx and Rx independently, enabling the appropriate type of
timestamp as soon as the relevant packet has traversed the hardware
pipeline. This was reported by Milena Olech.

Note that although these might be considered "bug fixes", the required
changes in order to appropriately resolve these issues is large. Thus it
does not feel suitable to send this series to net.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: reschedule ice_ptp_wait_for_offset_valid during reset
  ice: make Tx and Rx vernier offset calibration independent
  ice: only check set bits in ice_ptp_flush_tx_tracker
  ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp
  ice: cleanup allocations in ice_ptp_alloc_tx_tracker
  ice: protect init and calibrating check in ice_ptp_request_ts
  ice: synchronize the misc IRQ when tearing down Tx tracker
  ice: check Tx timestamp memory register for ready timestamps
  ice: handle discarding old Tx requests in ice_ptp_tx_tstamp
  ice: always call ice_ptp_link_change and make it void
  ice: fix misuse of "link err" with "link status"
  ice: Reset TS memory for all quads
  ice: Remove the E822 vernier "bypass" logic
  ice: Use more generic names for ice_ptp_tx fields
====================

Link: https://lore.kernel.org/r/20221208213932.1274143-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: openvswitch: Add support to count upcall packets

Add support to count upall packets, when kmod of openvswitch
upcall to count the number of packets for upcall succeed and
failed, which is a better way to see how many packets upcalled
on every interfaces.

Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Allow rhashtable to be used from irq-safe contexts

rhashtable currently only does bh-safe synchronization making it impossible
to use from irq-safe contexts. Switch it to use irq-safe synchronization to
remove the restriction.

v2: Update the lock functions to return the ulong flags value and unlock
functions to take the value directly instead of passing around the
pointer. Suggested by Linus.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sched-retpoline'

Pedro Tammela says:

====================
net/sched: retpoline wrappers for tc

In tc all qdics, classifiers and actions can be compiled as modules.
This results today in indirect calls in all transitions in the tc hierarchy.
Due to CONFIG_RETPOLINE, CPUs with mitigations=on might pay an extra cost on
indirect calls. For newer Intel cpus with IBRS the extra cost is
nonexistent, but AMD Zen cpus and older x86 cpus still go through the
retpoline thunk.

Known built-in symbols can be optimized into direct calls, thus
avoiding the retpoline thunk. So far, tc has not been leveraging this
build information and leaving out a performance optimization for some
CPUs. In this series we wire up 'tcf_classify()' and 'tcf_action_exec()'
with direct calls when known modules are compiled as built-in as an
opt-in optimization.

We measured these changes in one AMD Zen 4 cpu (Retpoline), one AMD Zen 3 cpu (Retpoline),
one Intel 10th Gen CPU (IBRS), one Intel 3rd Gen cpu (Retpoline) and one
Intel Xeon CPU (IBRS) using pktgen with 64b udp packets. Our test setup is a
dummy device with clsact and matchall in a kernel compiled with every
tc module as built-in.  We observed a 3-8% speed up on the retpoline CPUs,
when going through 1 tc filter, and a 60-100% speed up when going through 100 filters.
For the IBRS cpus we observed a 1-2% degradation in both scenarios, we believe
the extra branches check introduced a small overhead therefore we added
a static key that bypasses the wrapper on kernels not using the retpoline mitigation,
but compiled with CONFIG_RETPOLINE.

1 filter:
CPU        | before (pps) | after (pps) | diff
R9 7950X   | 5914980      | 6380227     | +7.8%
R9 5950X   | 4237838      | 4412241     | +4.1%
R9 5950X   | 4265287      | 4413757     | +3.4%   [*]
i5-3337U   | 1580565      | 1682406     | +6.4%
i5-10210U  | 3006074      | 3006857     | +0.0%
i5-10210U  | 3160245      | 3179945     | +0.6%   [*]
Xeon 6230R | 3196906      | 3197059     | +0.0%
Xeon 6230R | 3190392      | 3196153     | +0.01%  [*]

100 filters:
CPU        | before (pps) | after (pps) | diff
R9 7950X   | 373598       | 820396      | +119.59%
R9 5950X   | 313469       | 633303      | +102.03%
R9 5950X   | 313797       | 633150      | +101.77% [*]
i5-3337U   | 127454       | 211210      | +65.71%
i5-10210U  | 389259       | 381765      | -1.9%
i5-10210U  | 408812       | 412730      | +0.9%    [*]
Xeon 6230R | 415420       | 406612      | -2.1%
Xeon 6230R | 416705       | 405869      | -2.6%    [*]

[*] In these tests we ran pktgen with clone set to 1000.

On the 7950x system we also tested the impact of filters if iteration order
placement varied, first by compiling a kernel with the filter under test being
the first one in the static iteration and then repeating it with being last (of 15 classifiers existing today).
We saw a difference of +0.5-1% in pps between being the first in the iteration vs being the last.
Therefore we order the classifiers and actions according to relevance per our current thinking.

v5->v6:
- Address Eric Dumazet suggestions

v4->v5:
- Rebase

v3->v4:
- Address Eric Dumazet suggestions

v2->v3:
- Address suggestions by Jakub, Paolo and Eric
- Dropped RFC tag (I forgot to add it on v2)

v1->v2:
- Fix build errors found by the bots
- Address Kuniyuki Iwashima suggestions

====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: avoid indirect classify functions on retpoline kernels

Expose the necessary tc classifier functions and wire up cls_api to use
direct calls in retpoline kernels.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: avoid indirect act functions on retpoline kernels

Expose the necessary tc act functions and wire up act_api to use
direct calls in retpoline kernels.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: add retpoline wrapper for tc

On kernels using retpoline as a spectrev2 mitigation,
optimize actions and filters that are compiled as built-ins into a direct call.

On subsequent patches we expose the classifiers and actions functions
and wire up the wrapper into tc.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: move struct action_ops definition out of ifdef

The type definition should be visible even in configurations not using
CONFIG_NET_CLS_ACT.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Fix spelling mistake "oflload" -> "offload"

There is a spelling mistake in a NL_SET_ERR_MSG message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Merge branch 'mlx5 IPsec packet offload support (Part II)'

Leon Romanovsky says:

============
This is second part with implementation of packet offload.
============

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

net: phy: remove redundant "depends on" lines

Delete a few lines of "depends on PHYLIB" since they are inside
an "if PHYLIB / endif # PHYLIB" block, i.e., they are redundant
and the other 50+ drivers there don't use "depends on PHYLIB"
since it is not needed.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Link: https://lore.kernel.org/r/20221207044257.30036-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP

Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
write_seq sockets instead of snd_una.

This should have been the behavior from the start. Because processes
may now exist that rely on the established behavior, do not change
behavior of the existing option, but add the right behavior with a new
flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on
stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.

Intuitively the contract is that the counter is zero after the
setsockopt, so that the next write N results in a notification for
the last byte N - 1.

On idle sockets snd_una == write_seq and this holds for both. But on
sockets with data in transmission, snd_una records the unacked offset
in the stream. This depends on the ACK response from the peer. A
process cannot learn this in a race free manner (ioctl SIOCOUTQ is one
racy approach).

write_seq records the offset at the last byte written by the process.
This is a better starting point. It matches the intuitive contract in
all circumstances, unaffected by external behavior.

The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
Move the field in struct sock to avoid growing the socket (for some
common CONFIG variants). The UAPI interface so_timestamping.flags is
already int, so 32 bits wide.

Reported-by: Sotirios Delimanolis <sotodel@meta.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20221207143701.29861-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'fix-possible-deadlock-during-wed-attach'

Lorenzo Bianconi says:

====================
fix possible deadlock during WED attach

Fix a possible deadlock in mtk_wed_attach if mtk_wed_wo_init routine fails.
Check wo pointer is properly allocated before running mtk_wed_wo_reset() and
mtk_wed_wo_deinit().
====================

Link: https://lore.kernel.org/r/cover.1670421354.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_wed: fix possible deadlock if mtk_wed_wo_init fails

Introduce __mtk_wed_detach() in order to avoid a deadlock in
mtk_wed_attach routine if mtk_wed_wo_init fails since both
mtk_wed_attach and mtk_wed_detach run holding hw_lock mutex.

Fixes: 4c5de09eb0d0 ("net: ethernet: mtk_wed: add configure wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_wed: fix some possible NULL pointer dereferences

Fix possible NULL pointer dereference in mtk_wed_detach routine checking
wo pointer is properly allocated before running mtk_wed_wo_reset() and
mtk_wed_wo_deinit().
Even if it is just a theoretical issue at the moment check wo pointer is
not NULL in mtk_wed_mcu_msg_update.
Moreover, honor mtk_wed_mcu_send_msg return value in mtk_wed_wo_reset()

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Fixes: 4c5de09eb0d0 ("net: ethernet: mtk_wed: add configure wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: Fix spelling mistake "tha" -> "the"

There is a spelling mistake in a nn_dp_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20221207094312.2281493-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: Fix O=dir builds

The BPF Makefile in net/bpf did incorrect path substitution for O=dir
builds, e.g.

  make O=/tmp/kselftest headers
  make O=/tmp/kselftest -C tools/testing/selftests

would fail in selftest builds [1] net/ with

  clang-16: error: no such file or directory: 'kselftest/net/bpf/nat6to4.c'
  clang-16: error: no input files

Add a pattern prerequisite and an order-only-prerequisite (for
creating the directory), to resolve the issue.

[1] https://lore.kernel.org/all/202212060009.34CkQmCN-lkp@intel.com/

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 837a3d66d698 ("selftests: net: Add cross-compilation support for BPF programs")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20221206102838.272584-1-bjorn@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-add-spectrum-1-ip6gre-support'

Petr Machata says:

====================
mlxsw: Add Spectrum-1 ip6gre support

Ido Schimmel writes:

Currently, mlxsw only supports ip6gre offload on Spectrum-2 and newer
ASICs. Spectrum-1 can also offload ip6gre tunnels, but it needs double
entry router interfaces (RIFs) for the RIFs representing these tunnels.
In addition, the RIF index needs to be even. This is handled in
patches #1-#3.

The implementation can otherwise be shared between all Spectrum
generations. This is handled in patches #4-#5.

Patch #6 moves a mlxsw ip6gre selftest to a shared directory, as ip6gre
is no longer only supported on Spectrum-2 and newer ASICs.

This work is motivated by users that require multiple GRE tunnels that
all share the same underlay VRF. Currently, mlxsw only supports
decapsulation based on the underlay destination IP (i.e., not taking the
GRE key into account), so users need to configure these tunnels with
different source IPs and IPv6 addresses are easier to spare than IPv4.

Tested using existing ip6gre forwarding selftests.
====================

Link: https://lore.kernel.org/r/cover.1670414573.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: Move IPv6 decap_error test to shared directory

Now that Spectrum-1 gained ip6gre support we can move the test out of
the Spectrum-2 directory.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_ipip: Add Spectrum-1 ip6gre support

As explained in the previous patch, the existing Spectrum-2 ip6gre
implementation can be reused for Spectrum-1. Change the Spectrum-1
ip6gre operations structure to use the common operations.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_ipip: Rename Spectrum-2 ip6gre operations

There are two main differences between Spectrum-1 and newer ASICs in
terms of IP-in-IP support:

1. In Spectrum-1, RIFs representing ip6gre tunnels require two entries
   in the RIF table.

2. In Spectrum-2 and newer ASICs, packets ingress the underlay (during
   encapsulation) and egress the underlay (during decapsulation) via a
   special generic loopback RIF.

The first difference was handled in previous patches by adding the
'double_rif_entry' field to the Spectrum-1 operations structure of
ip6gre RIFs. The second difference is handled during RIF creation, by
only creating a generic loopback RIF in Spectrum-2 and newer ASICs.

Therefore, the ip6gre operations can be shared between Spectrum-1 and
newer ASIC in a similar fashion to how the ipgre operations are shared.

Rename the operations to not be Spectrum-2 specific and move them
earlier in the file so that they could later be used for Spectrum-1.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_router: Add support for double entry RIFs

In Spectrum-1, loopback router interfaces (RIFs) used for IP-in-IP
encapsulation with an IPv6 underlay require two RIF entries and the RIF
index must be even.

Prepare for this change by extending the RIF parameters structure with a
'double_entry' field that indicates if the RIF being created requires
two RIF entries or not. Only set it for RIFs representing ip6gre tunnels
in Spectrum-1.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_router: Parametrize RIF allocation size

Currently, each router interface (RIF) consumes one entry in the RIFs
table. This is going to change in subsequent patches where some RIFs
will consume two table entries.

Prepare for this change by parametrizing the RIF allocation size. For
now, always pass '1'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_router: Use gen_pool for RIF index allocation

Currently, each router interface (RIF) consumes one entry in the RIFs
table and there are no alignment constraints. This is going to change in
subsequent patches where some RIFs will consume two table entries and
their indexes will need to be aligned to the allocation size (even).

Prepare for this change by converting the RIF index allocation to use
gen_pool with the 'gen_pool_first_fit_order_align' algorithm.

No Kconfig changes necessary as mlxsw already selects
'GENERIC_ALLOCATOR'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Expose steering dropped packets counter

Add rx steering discarded packets counter to the vnic_diag debugfs.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Refactor and expand rep vport stat group

Expand representor vport stat group to support all counters from the
vport stat group, to count all the traffic passing through the vport.

Fix current implementation where fill_stats and update_stats use
different structs.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: multipath, support routes with more than 2 nexthops

Today multipath offload is only supported when the number of
nexthops is 2 which block the use of it in case of system with
2 NICs.

This patch solve it by enabling multipath offload per NIC if
2 nexthops of the route are its uplinks.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC, add support for meter mtu offload

Initialize the meter object with the TC police mtu parameter.
Use the hardware range destination to compare the pkt len to the mtu setting.
Assign the range destination hit/miss ft to the police conform/exceed
attributes.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: meter, add mtu post meter tables

TC police action may configure the maximum packet size to be handled by
the policer, in addition to byte/packet rate.
MTU check is realized in hardware using the range destination, specifying
a hit ft, if packet len is in the range, or miss ft otherwise.

Instantiate mtu green/red flow tables with a single match-all rule.
Add the green/red actions to the hit/miss table accordingly.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: meter, refactor to allow multiple post meter tables

TC police action may configure the maximum packet size to be handled by
the policer, in addition to byte/packet rate.
Currently the post meter table steers the packet according to the meter
aso output.

Refactor the code to allow both metering and range post actions as a
pre-step for adding police mtu offload support.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add support for range match action

Add support for matching on range.
The supported type of range is L2 frame size.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add function that tells if STE miss addr has been initialized

Up until now miss address in all the STEs was used to connect miss lists
and to link the last STE in the list to end anchor.
Match range STE will require special handling because its miss address is
part of the 'action'. That is, range action has hit and miss addresses.
Since the range action is always the last action, need to make sure that
its miss address isn't overwritten by the end anchor.

Adding new function mlx5dr_ste_is_miss_addr_set() to answer the question
whether the STE's miss address has already been set as part of STE
initialization. Use a callback that always returns false right now. Once
match range is added, a different callback will be used for that STE type.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Some refactoring of miss address handling

In preparation for MATCH RANGE STE support, create a function
to set the miss address of an STE.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Manage definers with refcounts

In many cases different actions will ask for the same definer format.
Instead of allocating new definer general object and running out of
definers, have an xarray of allocated definers and keep track of their
usage with refcounts: allocate a new definer only when there isn't
one with the same format already created, and destroy definer only
when its refcount runs down to zero.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Handle FT action in a separate function

As preparation for range action support, moving the handling
of final ICM address for flow table action to a separate function.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Rework is_fw_table function

This patch handles the following two changes w.r.t. is_fw_table function:

1. When SW steering is asked to create/destroy FW table, we allow for
creation/destruction of only termination tables. Rename mlx5_dr_is_fw_table
both to comply with the static function naming and to reflect that we're
actually checking for FW termination table.

2. When the action 'go to flow table' is created, the destination flow
table can be any FW table, not only termination table. Adding function
to check if the dest table is FW table. This function will also be used
by the later creation of range match action, so putting it the header file.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add functions to create/destroy MATCH_DEFINER general object

SW steering is able to match only on the exact values of the packet fields,
as requested by the user: the user provides mask for the fields that are of
interest, and the exact values to be matched on when the traffic is handled.

Match Definer is a general FW object that defines which fields in the
packet will be referenced by the mask and tag of each STE. Match definer ID
is part of STE fields, and it defines how the HW needs to interpret the STE's
mask/tag values.
Till now SW steering used the definers that were managed by FW and implemented
the STE layout as described by the HW spec. Now that we're adding a new type
of STE, SW steering needs to define for the HW how it should interpret this
new STE's layout.
This is done with a programmable match definer.

The programmable definer allows to selects which fields will be included in
the definer, and their layout: it has up to 9 DW selectors 8 Byte selectors.
Each selector indicates a DW/Byte worth of fields out of the table that
is defined by HW spec by referencing the offset of the required DW/Byte.

This patch adds dr_cmd function to create and destroy MATCH_DEFINER
general object.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: fs, add match on ranges API

Range is a new flow destination type which allows matching on
a range of values instead of matching on a specific value.

Range flow destination has the following fields:
- hit_ft: flow table to forward the traffic in case of hit
- miss_ft: flow table to forward the traffic in case of miss
- field: which packet characteristic to match on
- min: minimal value for the selected field
- max: maximal value for the selected field

Note:
- In order to match, the value in the packet should meet
the following criteria: min <= value < max
- Currently, the only supported field type is L2 packet length

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: mlx5_ifc updates for MATCH_DEFINER general object

Update full structure of match definer and add an ID of
the SELECT match definer type.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Merge tag 'net-6.1-rc9' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, can and netfilter.

  Current release - new code bugs:

   - bonding: ipv6: correct address used in Neighbour Advertisement
     parsing (src vs dst typo)

   - fec: properly scope IRQ coalesce setup during link up to supported
     chips only

  Previous releases - regressions:

   - Bluetooth fixes for fake CSR clones (knockoffs):
       - re-add ERR_DATA_REPORTING quirk
       - fix crash when device is replugged

   - Bluetooth:
       - silence a user-triggerable dmesg error message
       - L2CAP: fix u8 overflow, oob access
       - correct vendor codec definition
       - fix support for Read Local Supported Codecs V2

   - ti: am65-cpsw: fix RGMII configuration at SPEED_10

   - mana: fix race on per-CQ variable NAPI work_done

  Previous releases - always broken:

   - af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
     avoid null-deref

   - af_can: fix NULL pointer dereference in can_rcv_filter

   - can: slcan: fix UAF with a freed work

   - can: can327: flush TX_work on ldisc .close()

   - macsec: add missing attribute validation for offload

   - ipv6: avoid use-after-free in ip6_fragment()

   - nft_set_pipapo: actually validate intervals in fields after the
     first one

   - mvneta: prevent oob access in mvneta_config_rss()

   - ipv4: fix incorrect route flushing when table ID 0 is used, or when
     source address is deleted

   - phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"

* tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
  net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
  s390/qeth: fix use-after-free in hsci
  macsec: add missing attribute validation for offload
  net: mvneta: Fix an out of bounds check
  net: thunderbolt: fix memory leak in tbnet_open()
  ipv6: avoid use-after-free in ip6_fragment()
  net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
  net: phy: mxl-gpy: add MDINT workaround
  net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
  xen/netback: don't call kfree_skb() under spin_lock_irqsave()
  dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
  ethernet: aeroflex: fix potential skb leak in greth_init_rings()
  tipc: call tipc_lxc_xmit without holding node_read_lock
  can: esd_usb: Allow REC and TEC to return to zero
  can: can327: flush TX_work on ldisc .close()
  can: slcan: fix freed work crash
  can: af_can: fix NULL pointer dereference in can_rcv_filter
  net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
  ipv4: Fix incorrect route flushing when table ID 0 is used
  ipv4: Fix incorrect route flushing when source address is deleted
  ...

Merge branch 'mlx4-better-big-tcp-support'

Eric Dumazet says:

====================
mlx4: better BIG-TCP support

mlx4 uses a bounce buffer in TX whenever the tx descriptors
wrap around the right edge of the ring.

Size of this bounce buffer was hard coded and can be
increased if/when needed.
====================

Link: https://lore.kernel.org/r/20221207141237.2575012-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4: small optimization in mlx4_en_xmit()

Test against MLX4_MAX_DESC_TXBBS only matters if the TX
bounce buffer is going to be used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS

Google production kernel has increased MAX_SKB_FRAGS to 45
for BIG-TCP rollout.

Unfortunately mlx4 TX bounce buffer is not big enough whenever
an skb has up to 45 page fragments.

This can happen often with TCP TX zero copy, as one frag usually
holds 4096 bytes of payload (order-0 page).

Tested:
Kernel built with MAX_SKB_FRAGS=45
ip link set dev eth0 gso_max_size 185000
netperf -t TCP_SENDFILE

I made sure that "ethtool -G eth0 tx 64" was properly working,
ring->full_size being set to 15.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wei Wang <weiwan@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4: rename two constants

MAX_DESC_SIZE is really the size of the bounce buffer used
when reaching the right side of TX ring buffer.

MAX_DESC_TXBBS get a MLX4_ prefix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: reschedule ice_ptp_wait_for_offset_valid during reset

If the ice_ptp_wait_for_offest_valid function is scheduled to run while the
driver is resetting, it will exit without completing calibration. The work
function gets scheduled by ice_ptp_port_phy_restart which will be called as
part of the reset recovery process.

It is possible for the first execution to occur before the driver has
completely cleared its resetting flags. Ensure calibration completes by
rescheduling the task until reset is fully completed.

Reported-by: Siddaraju DH <siddaraju.dh@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: make Tx and Rx vernier offset calibration independent

The Tx and Rx calibration and timestamp generation blocks are independent.
However, the ice driver waits until both blocks are ready before
configuring either block.

This can result in delay of configuring one block because we have not yet
received a packet in the other block.

There is no reason to wait to finish programming Tx just because we haven't
received a packet. Similarly there is no reason to wait to program Rx just
because we haven't transmitted a packet.

Instead of checking both offset status before programming either block,
refactor the ice_phy_cfg_tx_offset_e822 and ice_phy_cfg_rx_offset_e822
functions so that they perform their own offset status checks.
Additionally, make them also check the offset ready bit to determine if
the offset values have already been programmed.

Call the individual configure functions directly in
ice_ptp_wait_for_offset_valid. The functions will now correctly check
status, and program the offsets if ready. Once the offset is programmed,
the functions will exit quickly after just checking the offset ready
register.

Remove the ice_phy_calc_vernier_e822 in ice_ptp_hw.c, as well as the offset
valid check functions in ice_ptp.c entirely as they are no longer
necessary.

With this change, the Tx and Rx blocks will each be enabled as soon as
possible without waiting for the other block to complete calibration. This
can enable timestamps faster in setups which have a low rate of transmitted
or received packets. In particular, it can stop a situation where one port
never receives traffic, and thus never finishes calibration of the Tx
block, resulting in continuous faults reported by the ptp4l daemon
application.

Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: only check set bits in ice_ptp_flush_tx_tracker

The ice_ptp_flush_tx_tracker function is called to clear all outstanding Tx
timestamp requests when the port is being brought down. This function
iterates over the entire list, but this is unnecessary. We only need to
check the bits which are actually set in the ready bitmap.

Replace this logic with for_each_set_bit, and follow a similar flow as in
ice_ptp_tx_tstamp_cleanup. Note that it is safe to call dev_kfree_skb_any
on a NULL pointer as it will perform a no-op so we do not need to verify
that the skb is actually NULL.

The new implementation also avoids clearing (and thus reading!) the PHY
timestamp unless the index is marked as having a valid timestamp in the
timestamp status bitmap. This ensures that we properly clear the status
registers as appropriate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp

In the event of a PTP clock time change due to .adjtime or .settime, the
ice driver needs to update the cached copy of the PHC time and also discard
any outstanding Tx timestamps.

This is required because otherwise the wrong copy of the PHC time will be
used when extending the Tx timestamp. This could result in reporting
incorrect timestamps to the stack.

The current approach taken to handle this is to call
ice_ptp_flush_tx_tracker, which will discard any timestamps which are not
yet complete.

This is problematic for two reasons:

1) it could lead to a potential race condition where the wrong timestamp is
   associated with a future packet.

   This can occur with the following flow:

   1. Thread A gets request to transmit a timestamped packet, and picks an
      index and transmits the packet

   2. Thread B calls ice_ptp_flush_tx_tracker and sees the index in use,
      marking is as disarded. No timestamp read occurs because the status
      bit is not set, but the index is released for re-use

   3. Thread A gets a new request to transmit another timestamped packet,
      picks the same (now unused) index and transmits that packet.

   4. The PHY transmits the first packet and updates the timestamp slot and
      generates an interrupt.

   5. The ice_ptp_tx_tstamp thread executes and sees the interrupt and a
      valid timestamp but associates it with the new Tx SKB and not the one
      that actual timestamp for the packet as expected.

   This could result in the previous timestamp being assigned to a new
   packet producing incorrect timestamps and leading to incorrect behavior
   in PTP applications.

   This is most likely to occur when the packet rate for Tx timestamp
   requests is very high.

2) on E822 hardware, we must avoid reading a timestamp index more than once
   each time its status bit is set and an interrupt is generated by
   hardware.

   We do have some extensive checks for the unread flag to ensure that only
   one of either the ice_ptp_flush_tx_tracker or ice_ptp_tx_tstamp threads
   read the timestamp. However, even with this we can still have cases
   where we "flush" a timestamp that was actually completed in hardware.
   This can lead to cases where we don't read the timestamp index as
   appropriate.

To fix both of these issues, we must avoid calling ice_ptp_flush_tx_tracker
outside of the teardown path.

Rather than using ice_ptp_flush_tx_tracker, introduce a new state bitmap,
the stale bitmap. Start this as cleared when we begin a new timestamp
request. When we're about to extend a timestamp and send it up to the
stack, first check to see if that stale bit was set. If so, drop the
timestamp without sending it to the stack.

When we need to update the cached PHC timestamp out of band, just mark all
currently outstanding timestamps as stale. This will ensure that once
hardware completes the timestamp we'll ignore it correctly and avoid
reporting bogus timestamps to userspace.

With this change, we fix potential issues caused  by calling
ice_ptp_flush_tx_tracker during normal operation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: cleanup allocations in ice_ptp_alloc_tx_tracker

The ice_ptp_alloc_tx_tracker function must allocate the timestamp array and
the bitmap for tracking the currently in use indexes. A future change is
going to add yet another allocation to this function.

If these allocations fail we need to ensure that we properly cleanup and
ensure that the pointers in the ice_ptp_tx structure are NULL.

Simplify this logic by allocating to local variables first. If any
allocation fails, then free everything and exit. Only update the ice_ptp_tx
structure if all allocations succeed.

This ensures that we have no side effects on the Tx structure unless all
allocations have succeeded. Thus, no code will see an invalid pointer and
we don't need to re-assign NULL on cleanup.

This is safe because kernel "free" functions are designed to be NULL safe
and perform no action if passed a NULL pointer. Thus its safe to simply
always call kfree or bitmap_free even if one of those pointers was NULL.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: protect init and calibrating check in ice_ptp_request_ts

When requesting a new timestamp, the ice_ptp_request_ts function does not
hold the Tx tracker lock while checking init and calibrating. This means
that we might issue a new timestamp request just after the Tx timestamp
tracker starts being deinitialized. This could lead to incorrect access of
the timestamp structures. Correct this by moving the init and calibrating
checks under the lock, and updating the flows which modify these fields to
use the lock.

Note that we do not need to hold the lock while checking for tx->init in
ice_ptp_tx_tstamp. This is because the teardown function will use
synchronize_irq after clearing the flag to ensure that the threaded
interrupt completes. Either a) the tx->init flag will be cleared before the
ice_ptp_tx_tstamp function starts, thus it will exit immediately, or b) the
threaded interrupt will be executing and the synchronize_irq will wait
until the threaded interrupt has completed at which point we know the init
field has definitely been set and new interrupts will not execute the Tx
timestamp thread function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch 'mlx5-Support-tc-police-jump-conform-exceed-attribute'

Saeed Mahameed says:

====================
Support tc police jump conform-exceed attribute

The tc police action conform-exceed option defines how to handle
packets which exceed or conform to the configured bandwidth limit.
One of the possible conform-exceed values is jump, which skips over
a specified number of actions.
This series adds support for conform-exceed jump action.

The series adds platform support for branching actions by providing
true/false flow attributes to the branching action.
This is necessary for supporting police jump, as each branch may
execute a different action list.

The first five patches are preparation patches:
- Patches 1 and 2 add support for actions with no destinations (e.g. drop)
- Patch 3 refactor the code for subsequent function reuse
- Patch 4 defines an abstract way for identifying terminating actions
- Patch 5 updates action list validations logic considering branching actions

The following three patches introduce an interface for abstracting branching
actions:
- Patch 6 introduces an abstract api for defining branching actions
- Patch 7 generically instantiates the branching flow attributes using
  the abstract API

Patch 8 adds the platform support for jump actions, by executing the following
sequence:
  a. Store the jumping flow attr
  b. Identify the jump target action while iterating the actions list.
  c. Instantiate a new flow attribute after the jump target action.
     This is the flow attribute that the branching action should jump to.
  d. Set the target post action id on:
    d.1. The jumping attribute, thus realizing the jump functionality.
    d.2. The attribute preceding the target jump attr, if not terminating.

The next patches apply the platform's branching attributes to the police
action:
- Patch 9 is a refactor patch
- Patch 10 initializes the post meter table with the red/green flow attributes,
           as were initialized by the platform
- Patch 11 enables the offload of meter actions using jump conform-exceed
           value.
====================

Link: https://lore.kernel.org/all/20221203221337.29267-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, allow meter jump control action

Separate the matchall police action validation from flower validation.
Isolate the action validation logic in the police action parser.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, init post meter rules with branching attributes

Instantiate the post meter actions with the platform initialized branching
action attributes.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, rename post_meter actions

Currently post meter supports only the pipe/drop conform-exceed policy.
This assumption is reflected in several variable names.
Rename the following variables as a pre-step for using the generalized
branching action platform.

Rename fwd_green_rule/drop_red_rule to green_rule/red_rule respectively.
Repurpose red_counter/green_counter to act_counter/drop_counter to allow
police conform-exceed configurations that do not drop.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, initialize branching action with target attr

Identify the jump target action when iterating the action list.
Initialize the jump target attr with the jumping attribute during the
parsing phase. Initialize the jumping attr post action with the target
during the offload phase.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, initialize branch flow attributes

Initialize flow attribute for drop, accept, pipe and jump branching actions.

Instantiate a flow attribute instance according to the specified branch
control action. Store the branching attributes on the branching action
flow attribute during the parsing phase. Then, during the offload phase,
allocate the relevant mod header objects to the branching actions.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, set control params for branching actions

Extend the act tc api to set the branch control params aligning with
the police conform/exceed use case.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, validate action list per attribute

Currently the entire flow action list is validate for offload limitations.
For example, flow with both forward and drop actions are declared invalid
due to hardware restrictions.
However, a multi-table hardware model changes the limitations from a flow
scope to a single flow attribute scope.

Apply offload limitations to flow attributes instead of the entire flow.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, add terminating actions

Extend act api to identify actions that terminate action list.
Pre-step for terminating branching actions.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, reuse flow attribute post parser processing

After the tc action parsing phase the flow attribute is initialized with
relevant eswitch offload objects such as tunnel, vlan, header modify and
counter attributes. The post processing is done both for fdb and post-action
attributes.

Reuse the flow attribute post parsing logic by both fdb and post-action
offloads.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: fs, assert null dest pointer when dest_num is 0

Currently create_flow_handle() assumes a null dest pointer when there
are no destinations.
This might not be the case as the caller may pass an allocated dest
array while setting the dest_num parameter to 0.

Assert null dest array for flow rules that have no destinations (e.g. drop
rule).

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>