review.tizen.org Git - platform/kernel/linux-rpi.git/log

net/mlx5: DR, Remove hw_ste from mlx5dr_ste to reduce memory

It can be calculated via function mlx5dr_ste_get_hw_ste().
Very simple and lightweight, no need to use a dedicated member.

Reduce 8 bytes from struct mlx5dr_ste and its size is 48 bytes now.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Remove 4 members from mlx5dr_ste_htbl to reduce memory

Remove chunk_size in struct mlx5dr_icm_chunk and use
chunk->size instead.

Remove ste_arr/hw_ste_arr/miss_list since they can be accessed
from htbl->chunk pointer, no need to keep a copy.

This commit reduces 28 bytes from struct mlx5dr_ste_htbl and its
size is 32 bytes now.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Remove num_of_entries byte_size from struct mlx5_dr_icm_chunk

Target to reduce the memory consumption in large scale of flow rules.

They can be calculated quickly from buddy memory pool.
1. num_of_entries calls dr_icm_pool_get_chunk_num_of_entries().
2. byte_size calls dr_icm_pool_get_chunk_byte_size().

Use chunk size in dr_icm_chunk to speed up and the one in dr_ste_htbl
will be removed in the upcoming commit.

This commit reduce 8 bytes from struct mlx5_dr_icm_chunk and its
current size is 56 bytes.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Remove icm_addr from mlx5dr_icm_chunk to reduce memory

It can be calculated quickly from buddy memory pool by
function mlx5dr_icm_pool_get_chunk_icm_addr().
This function is very lightweight and straightforward.

Reduce 8 bytes and current size of struct mlx5_dr_icm_chunk
is 64 bytes.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Remove mr_addr rkey from struct mlx5dr_icm_chunk

Reduce memory footprint by removing mr_addr and rkey from
mlx5_dr_icm_chunk.
1. mr_addr is calculated by mlx5dr_icm_pool_get_chunk_mr_addr()
2. rkey is calculated by mlx5dr_icm_pool_get_chunk_rkey()
The two new functions are very lightweight and straightforward.

Reduce 8 bytes from struct mlx5_dr_icm_chunk, its current size is
72 bytes.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Adjust structure member to reduce memory hole

Accord to profiling, mlx5dr_ste/mlx5dr_icm_chunk are the two
hot structures. Their memory layout can be optimized by
adjusting member sequences.

Struct mlx5dr_ste size changes from 64 bytes to 56 bytes.

In the upcoming commits, struct mlx5dr_icm_chunk memory layout
will change automatically after removing some members.
Keep it untouched here.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Drop cqe_bcnt32 from mlx5e_skb_from_cqe_mpwrq_linear

The packet size in mlx5e_skb_from_cqe_mpwrq_linear can't overflow u16,
since the maximum packet size in linear striding RQ is 2^13 bytes. Drop
the unneeded u32 variable.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Drop the len output parameter from mlx5e_xdp_handle

The len parameter of mlx5e_xdp_handle is used to output the new packet
length after XDP has processed the packet and returned XDP_PASS.
However, this value can be calculated on the caller site, as the caller
knows if it was an XDP_PASS.

This commit drops the len parameter and moves the calculation to the
caller, reducing the number of parameters passed to the function and
preparing for XDP support in non-linear legacy RQ.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: RX, Test the XDP program existence out of the handler

Instead of early return inside mlx5e_xdp_handle(), let the caller check
if an XDP program is loaded. This allows saving a few unnecessary
function calls and calculations in case !prog.

Performance test: single core, drop packets in iptables
Before: 3,872,504 pps
After: 3,975,628 pps (+2.66%)

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Build SKB in place over the first fragment in non-linear legacy RQ

As a performance optimization and preparation to enabling XDP multi
buffer on non-linear legacy RQ, build the linear part of the SKB over
the first fragment, instead of allocating a new buffer and copying the
first 256 bytes there.

To achieve this, add headroom and tailroom to the first fragment.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add headroom only to the first fragment in legacy RQ

Currently, rq->buff.headroom is applied to all fragments in legacy RQ.
In the linear mode, there is a non-zero headroom, but there is only one
fragment per packet. In the non-linear mode, the headroom is zero.

This commit changes the logic to apply the headroom only to the first
fragment. The current behavior remains the same for both linear and
non-linear modes. However, it allows the next commit to enable headroom
for the non-linear mode, which will be applied only to the first
fragment.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Validate MTU when building non-linear legacy RQ fragments info

mlx5e_build_rq_frags_info() assumes that MTU is not bigger than
PAGE_SIZE * MLX5E_MAX_RX_FRAGS, which is 16K for 4K pages. Currently,
the firmware limits MTU to 10K, so the assumption doesn't lead to a bug.

This commits adds an additional driver check for reliability, since the
firmware boundary might be changed.

The calculation is taken to a separate function with a comment
explaining it. It's a preparation for the following patches that
introcuce XDP multi buffer support.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

selftests: net: fix array_size.cocci warning

Fix array_size.cocci warning in tools/testing/selftests/net.

Use `ARRAY_SIZE(arr)` instead of forms like `sizeof(arr)/sizeof(arr[0])`.

It has been tested with gcc (Debian 8.3.0-6) 8.3.0.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Link: https://lore.kernel.org/r/20220316092858.9398-1-guozhengkui@vivo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: clean up impossible condition

This code works but it has a static checker warning:

    drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1687 init_dma_rx_desc_rings()
    warn: always true condition '(queue >= 0) => (0-u32max >= 0)'

Obviously, it makes no sense to check if an unsigned int is >= 0.  What
prevents this code from being a forever loop is that later there is a
separate check for if (queue == 0).

The "queue" variable is less than MTL_MAX_RX_QUEUES (8) so it can easily
fit in an int type.  Any larger value for "queue" would lead to an array
overflow when we assign "rx_q = &priv->rx_queue[queue]".

Fixes: de0b90e52a11 ("net: stmmac: rearrange RX and TX desc init into per-queue basis")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220316083744.GB30941@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: geneve: support IPv4/IPv6 as inner protocol

This patch adds support for encapsulating IPv4/IPv6 within GENEVE.

In order to use this, a new IFLA_GENEVE_INNER_PROTO_INHERIT flag needs
to be provided at device creation. This property cannot be changed for
the time being.

In case IP traffic is received on a non-tun device the drop count is
increased.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20220316061557.431872-1-eyal.birger@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-mvneta-armada-98dx2530-soc'

Chris Packham says:

====================
net: mvneta: Armada 98DX2530 SoC

This is split off from [1] to let it go in via net-next rather than waiting for
the rest of the series to land.

[1] - https://lore.kernel.org/lkml/20220314213143.2404162-1-chris.packham@alliedtelesis.co.nz/
====================

Link: https://lore.kernel.org/r/20220315215207.2746793-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mvneta: Add support for 98DX2530 Ethernet port

The 98DX2530 SoC is similar to the Armada 3700 except it needs a
different MBUS window configuration. Add a new compatible string to
identify this device and the required MBUS window configuration.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: mvneta: Add marvell,armada-ac5-neta

The out of band port on the 98DX2530 SoC is similar to the armada-3700
except it requires a slightly different MBUS window configuration. Add a
new compatible string so this difference can be accounted for.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ptp: ocp: Fix PTP_PF_* verification requests

Update and check functionality for pin configuration requests:

PTP_PF_NONE: requests "IN: None", disabling the pin.

  # testptp -d /dev/ptp3 -L3,0 -i1
  set pin function okay
  # cat sma4
  IN: None

PTP_PF_EXTTS: should configure external timestamps, but since the
timecard can steer inputs to multiple inputs as well as timestamps,
allow the request, but don't change configurations.

  # testptp -d /dev/ptp3 -L3,1 -i1
  set pin function okay

  (no functional or configuration change here yet)

PTP_PF_PEROUT: Channel 0 is the PHC, at 1PPS.  Channels 1-4 are
the programmable frequency generators.

  # fails because period is not 1PPS.
  # testptp -d /dev/ptp3 -L3,2 -i0  -p 500000000
  PTP_PEROUT_REQUEST: Invalid argument

  # testptp -d /dev/ptp3 -L3,2 -i0  -p 1000000000
  periodic output request okay
  # cat sma4
  OUT: PHC

  # testptp -d /dev/ptp3 -L3,2 -i1 -p 500000000 -w 200000000
  periodic output request okay
  # cat sma4
  OUT: GEN1
  # cat gen1/signal
  500000000 40 0 1 2022-03-10T23:55:26 TAI
  # cat gen1/running
  1

  # testptp -d /dev/ptp3 -L3,2 -i1 -p 0
  periodic output request okay
  # cat gen1/running
  0

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220315194626.1895-1-jonathan.lemon@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'flow_offload-add-tc-vlan-push_eth-and-pop_eth-actions'

Roi Dayan says:

====================
flow_offload: add tc vlan push_eth and pop_eth actions

Offloading vlan push_eth and pop_eth actions is needed in order to
correctly offload MPLSoUDP encap and decap flows, this series extends
the flow offload API to support these actions and updates mlx5 to
parse them.
====================

Link: https://lore.kernel.org/r/20220315110211.1581468-1-roid@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: MPLSoUDP encap, support action vlan pop_eth explicitly

Currently the MPLSoUDP encap offload does the L2 pop implicitly
while adding such action explicitly (vlan eth_push) will cause
the rule to not be offloaded.

Solve it by adding offload support for vlan eth_push in case of
MPLSoUDP decap case.

Flow example:
filter root protocol ip pref 1 flower chain 0
filter root protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.2.2.22
  src_ip 2.2.2.21
  in_hw in_hw_count 1
        action order 1: vlan  pop_eth pipe
         index 1 ref 1 bind 1
        used_hw_stats delayed

        action order 2: mpls  push protocol mpls_uc label 555 tc 3 ttl 255 pipe
         index 1 ref 1 bind 1
        used_hw_stats delayed

        action order 3: tunnel_key  set
        src_ip 8.8.8.21
        dst_ip 8.8.8.22
        dst_port 6635
        csum
        tos 0x4
        ttl 6 pipe
         index 1 ref 1 bind 1
        used_hw_stats delayed

        action order 4: mirred (Egress Redirect to device bareudp0) stolen
        index 1 ref 1 bind 1
        used_hw_stats delayed

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: MPLSoUDP decap, use vlan push_eth instead of pedit

Currently action pedit of source and destination MACs is used
to fill the MACs in L2 push step in MPLSoUDP decap offload,
this isn't aligned to tc SW which use vlan eth_push action
to do this.

To fix that, offload support for vlan veth_push action is
added together with mpls pop action, and deprecate the use
of pedit of MACs.

Flow example:
filter protocol mpls_uc pref 1 flower chain 0
filter protocol mpls_uc pref 1 flower chain 0 handle 0x1
  eth_type 8847
  mpls_label 555
  enc_dst_port 6635
  in_hw in_hw_count 1
        action order 1: tunnel_key  unset pipe
         index 2 ref 1 bind 1
        used_hw_stats delayed

        action order 2: mpls  pop protocol ip pipe
         index 2 ref 1 bind 1
        used_hw_stats delayed

        action order 3: vlan  push_eth dst_mac de:a2:ec:d6:69:c8 src_mac de:a2:ec:d6:69:c8 pipe
         index 2 ref 1 bind 1
        used_hw_stats delayed

        action order 4: mirred (Egress Redirect to device enp8s0f0_0) stolen
        index 2 ref 1 bind 1
        used_hw_stats delayed

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: add vlan push_eth and pop_eth action to the hardware IR

Add vlan push_eth and pop_eth action to the hardware intermediate
representation model which would subsequently allow it to be used
by drivers for offload.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: Never offload FDB entries on standalone ports

If a port joins a bridge that it can't offload, it will fallback to
standalone mode and software bridging. In this case, we never want to
offload any FDB entries to hardware either.

Previously, for host addresses, we would eventually end up in
dsa_port_bridge_host_fdb_add, which would unconditionally dereference
dp->bridge and cause a segfault.

Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220315233033.1468071-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

phy: Remove duplicated include in phy-fsl-lynx-28g.c

Fix following includecheck warning:
./drivers/phy/freescale/phy-fsl-lynx-28g.c: linux/workqueue.h is
included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220315235603.59481-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mv643xx_eth: undo some opreations in mv643xx_eth_probe

Cannot directly return platform_get_irq return irq, there
are operations that need to be undone.

Fixes: bf2b83425b59 ("net: mv643xx_eth: use platform_get_irq() instead of platform_get_resource()")
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220316012444.2126070-1-chi.minghao@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: Fix spelling mistake "does't" -> "doesn't"

There is a spelling mistake in a dev_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220315222914.2960786-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: Fix spelling mistake "droping" -> "dropping"

There is a spelling mistake in a netdev_warn warning. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220315222615.2960504-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ksz884x: optimize netdev_open flow and remove static variable

remove the static next_jiffies variable, and reinitialize next_jiffies
to simplify netdev_open

Signed-off-by: wujunwen <wudaemon@163.com>
Link: https://lore.kernel.org/r/20220315122857.78601-1-wudaemon@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hamradio: Fix wrong assignment of 'bbc->cfg.loopback'

In file hamradio/baycom_epp.c, the baycom_setmode interface, there
is a problem with improper use of strstr.

Suppose that when modestr="noloopback", both conditions which are
'strstr(modestr,"noloopback")' and 'strstr(modestr,"loopback")'
will be true(not NULL), this lead the bc->cfg.loopback variable
will be first assigned to 0, and then reassigned to 1.

This will cause 'bc->cfg.loopback = 0' will never take effect. That
obviously violates the logic of the code, so adjust the order of
their execution to solve the problem.

Signed-off-by: Meng Tang <tangmeng@uniontech.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220315074851.6456-1-tangmeng@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bareudp: use ipv6_mod_enabled to check if IPv6 enabled

bareudp_create_sock() use AF_INET6 by default if IPv6 CONFIG enabled.
But if user start kernel with ipv6.disable=1, the bareudp sock will
created failed, which cause the interface open failed even with ethertype
ip. e.g.

# ip link add bareudp1 type bareudp dstport 2 ethertype ip
# ip link set bareudp1 up
RTNETLINK answers: Address family not supported by protocol

Fix it by using ipv6_mod_enabled() to check if IPv6 enabled. There is
no need to check IS_ENABLED(CONFIG_IPV6) as ipv6_mod_enabled() will
return false when CONFIG_IPV6 no enabled in include/linux/ipv6.h.

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20220315062618.156230-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'linux-can-next-for-5.18-20220316' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2022-03-16

the first 3 patches are by Oliver Hartkopp target the CAN ISOTP
protocol and fix a problem found by syzbot in isotp_bind(), return
-EADDRNOTAVAIL in unbound sockets in isotp_recvmsg() and add support
for MSG_TRUNC to isotp_recvmsg().

Amit Kumar Mahapatra converts the xilinx,can device tree bindings to
yaml.

The last patch is by Julia Lawall and fixes typos in the ucan driver.

* tag 'linux-can-next-for-5.18-20220316' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: ucan: fix typos in comments
  dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML
  can: isotp: support MSG_TRUNC flag when reading from socket
  can: isotp: return -EADDRNOTAVAIL when reading from unbound socket
  can: isotp: sanitize CAN ID checks in isotp_bind()
====================

Link: https://lore.kernel.org/r/20220316204710.716341-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

can: ucan: fix typos in comments

Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Link: https://lore.kernel.org/all/20220314115354.144023-28-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML

Convert Xilinx CAN binding documentation to YAML.

Link: https://lore.kernel.org/all/20220316171105.17654-1-amit.kumar-mahapatra@xilinx.com
Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@xilinx.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: support MSG_TRUNC flag when reading from socket

When providing the MSG_TRUNC flag via recvmsg() syscall the return value
provides the real length of the packet or datagram, even when it was longer
than the passed buffer.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1065932671
Link: https://lore.kernel.org/all/20220316164258.54155-3-socketcan@hartkopp.net
Suggested-by: Derek Will <derekrobertwill@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: return -EADDRNOTAVAIL when reading from unbound socket

When reading from an unbound can-isotp socket the syscall blocked
indefinitely. As unbound sockets (without given CAN address information)
do not make sense anyway we directly return -EADDRNOTAVAIL on read()
analogue to the known behavior from sendmsg().

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/linux-can/can-utils/issues/349
Link: https://lore.kernel.org/all/20220316164258.54155-2-socketcan@hartkopp.net
Suggested-by: Derek Will <derekrobertwill@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: sanitize CAN ID checks in isotp_bind()

Syzbot created an environment that lead to a state machine status that
can not be reached with a compliant CAN ID address configuration.
The provided address information consisted of CAN ID 0x6000001 and 0xC28001
which both boil down to 11 bit CAN IDs 0x001 in sending and receiving.

Sanitize the SFF/EFF CAN ID values before performing the address checks.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220316164258.54155-1-socketcan@hartkopp.net
Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'devlink-expose-instance-locking-and-simplify-port-splitting'

Jakub Kicinski says:

====================
devlink: expose instance locking and simplify port splitting

This series puts the devlink ports fully under the devlink instance
lock's protection. As discussed in the past it implements my preferred
solution of exposing the instance lock to the drivers. This way drivers
which want to support port splitting can lock the devlink instance
themselves on the probe path, and we can take that lock in the core
on the split/unsplit paths.

nfp and mlxsw are converted, with slightly deeper changes done in
nfp since I'm more familiar with that driver.

Now that the devlink port is protected we can pass a pointer to
the drivers, instead of passing a port index and forcing the drivers
to do their own lookups. Both nfp and mlxsw can container_of() to
their own structures.
====================

Link: https://lore.kernel.org/r/20220315060009.1028519-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: pass devlink_port to port_split / port_unsplit callbacks

Now that devlink ports are protected by the instance lock
it seems natural to pass devlink_port as an argument to
the port_split / port_unsplit callbacks.

This should save the drivers from doing a lookup.

In theory drivers may have supported unsplitting ports
which were not registered prior to this change.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: hold the instance lock in port_split / port_unsplit callbacks

Let the core take the devlink instance lock around port splitting
and remove the now redundant locking in the drivers.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: mlxsw: switch to explicit locking for port registration

Explicitly lock the devlink instance and use devl_ API.

This will be used by the subsequent patch to invoke
.port_split / .port_unsplit callbacks with devlink
instance lock held.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: nfp: replace driver's "pf" lock with devlink instance lock

The whole reason for existence of the pf mutex is that we could
not lock the devlink instance around port splitting. There are
more types of reconfig which can make ports appear or disappear.
Now that the devlink instance lock is exposed to drivers and
"locked" helpers exist we can switch to using the devlink lock
directly.

Next patches will move the locking inside .port_(un)split to
the core.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: nfp: wrap locking assertions in helpers

We can replace the PF lock with devlink instance lock in subsequent
changes. To make the patches easier to comprehend and limit line
lengths - factor out the existing locking assertions.

No functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: expose instance locking and add locked port registering

It should be familiar and beneficial to expose devlink instance
lock to the drivers. This way drivers can block devlink from
calling them during critical sections without breakneck locking.

Add port helpers, port splitting callbacks will be the first
target.

Use 'devl_' prefix for "explicitly locked" API. Initial RFC used
'__devlink' but that's too much typing.

devl_lock_is_held() is not defined without lockdep, which is
the same behavior as lockdep_is_held() itself.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mediatek-next'

Biao Huang says:

====================
MediaTek Ethernet Patches on MT8195

Changes in v13:
1. add reviewed-by in "net: dt-bindings: dwmac: add support for mt8195"
   as Rob's comments.
2. drop num_clks defined in mediatek_dwmac_plat_data struct in "stmmac:
   dwmac-mediatek: Reuse more common features" as Angelo's comments.

Changes in v12:
1. add a new patch "stmmac: dwmac-mediatek: re-arrange clock setting" to
   this series, to simplify clock handling in driver, which benefits to
   binding file mediatek-dwmac.yaml.
2. modify dt-binding description in patch "net: dt-bindings: dwmac: add
   support for mt8195" as Rob's comments in v10 series, put mac_cg to the
   end of clock list.
3. there are small changes in patch "stmmac: dwmac-mediatek: add support
   for mt8195", @AngeloGioacchino, please review it kindly.

Changes in v11:
1. add reivewed-by in "net: dt-bindings: dwmac: Convert mediatek-dwmac to
   DT schema" as Rob's comments.
2. fall back "net: dt-bindings: dwmac: add support for mt8195" to v8 version
   as mentioned in previous reply(https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20211216055328.15953-7-biao.huang@mediatek.com/):
   2.1 there is already a special clock named "rmii_internal", which need to
       be put to the end of the clock list(driver special handling),
       so we can't simply put new "mac_cg" for mt8195 to the end of the clock
       list.
   2.2 we prefer the if-then schema, which will make mt8195 clock list clearer
       with some duplicated information.
   2.3 we expect the future IC will follow mt2712 or mt8195, so we only need
       add new IC name to compatible list for future IC, and will not make the
       clock list binding files worse.

Changes in v10:
1. add detailed description in "arm64: dts: mt2712: update ethernet
   device node" to make the modifications clearer as Matthias's comments.
2. modify dt-binding description as Rob's comments, and "make dtbs_check" runs
   pass locally with "arm64: dts: mt2712: update ethernet device node"
   in this series.

Changes in v9:
1. remove oneOf for 1 entry as Rob's comments.
2. add new clocks to the end of existing clocks to simplify
   the binding as Rob's comments.

Changes in v8:
1. add acked-by in "stmmac: dwmac-mediatek: add platform level clocks
   management" patch

Changes in v7:
1. fix uninitialized warning as Jakub's comments.

Changes in v6:
1. update commit message as Jakub's comments.
2. split mt8195 eth dts patch("arm64: dts: mt8195: add ethernet device
   node") from this series, since mt8195 dtsi/dts basic patches is still
   under reviewing.
   https://patchwork.kernel.org/project/linux-mediatek/list/?series=579071
   we'll resend mt8195 eth dts patch once all the dependent patches are
   accepted.

Changes in v5:
1. remove useless inclusion in dwmac-mediatek.c as Angelo's comments.
2. add acked-by in "net-next: stmmac: dwmac-mediatek: add support for
   mt8195" patch

Changes in v4:
1. add changes in commit message in "net-next: dt-bindings: dwmac:
   Convert mediatek-dwmac to DT schema" patch.
2. remove ethernet-controller.yaml since snps,dwmac.yaml already include it.

Changes in v3:
1. Add prefix "net-next" to support new IC as Denis's suggestion.
2. Split dt-bindings to two patches, one for conversion, and the other for
   new IC.
3. add a new patch to update device node in mt2712-evb.dts to accommodate to
   changes in driver.
4. remove unnecessary wrapper as Angelo's suggestion.
5. Add acked-by in "net-next: stmmac: dwmac-mediatek: Reuse more common
   features" patch.

Changes in v2:
1. fix errors/warnings in mediatek-dwmac.yaml with upgraded dtschema tools

Changes in v1:
This series include 5 patches:
1. add platform level clocks management for dwmac-mediatek
2. resue more common features defined in stmmac_platform.c
3. add ethernet entry for mt8195
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dt-bindings: dwmac: add support for mt8195

Add binding document for the ethernet on mt8195.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: dwmac-mediatek: add support for mt8195

Add Ethernet support for MediaTek SoCs from the mt8195 family.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dt-bindings: dwmac: Convert mediatek-dwmac to DT schema

Convert mediatek-dwmac to DT schema, and delete old mediatek-dwmac.txt.
And there are some changes in .yaml than .txt, others almost keep the same:
  1. compatible "const: snps,dwmac-4.20".
  2. delete "snps,reset-active-low;" in example, since driver remove this
     property long ago.
  3. add "snps,reset-delay-us = <0 10000 10000>" in example.
  4. the example is for rgmii interface, keep related properties only.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: mt2712: update ethernet device node

Since there are some changes in ethernet driver:
update ethernet device node in dts to accommodate to it.

1. stmmac_probe_config_dt() in stmmac_platform.c will initialize specified
   parameters according to compatible string "snps,dwmac-4.20a", then,
   dwmac-mediatek.c can skip the initialization if add compatible string
   "snps,dwmac-4.20a" in eth device node.
2. commit 882007ed7832 ("net-next: dt-binding: dwmac-mediatek: add more
   description for RMII") added rmii internal support, we should add
   corresponding clocks/clocks-names in eth device node.
3. add "snps,reset-delays-us = <0 10000 10000>;" to ensure reset delay
   can meet PHY requirement.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: dwmac-mediatek: re-arrange clock setting

The rmii_internal clock is needed only when PHY
interface is RMII, and reference clock is from MAC.

Re-arrange the clock setting as following:
1. the optional "rmii_internal" is controlled by devm_clk_get(),
2. other clocks still be configured by devm_clk_bulk_get().

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: dwmac-mediatek: Reuse more common features

This patch makes dwmac-mediatek reuse more features
supported by stmmac_platform.c.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: dwmac-mediatek: add platform level clocks management

This patch implements clks_config callback for dwmac-mediatek platform,
which could support platform level clocks management.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-03-15

Jacob Keller says:

The ice_sriov.c file now houses almost all of the virtualization code in the
ice driver. This includes both Single Root specific implementation as well
as generic functionality such as the virtchnl interface.

We are planning to implement support for Scalable IOV in the ice driver in
the future. This implementation will want to use the generic functionality
in ice_sriov.c

Rather than dump the Scalable IOV code into ice_sriov.c, we will want to
implement it in a separate file, ice_siov.c

To help with this, refactor the code in ice_sriov.c and split the generic
functionality out into separate files.

Reorganize code to make the non-implementation specific bits into new files
with the following general guidelines:

* ice_vf_lib.[ch]

Basic VF structures and accessors. This is where scheme-independent
code will reside.

* ice_virtchnl.[ch]

Virtchnl message handling. This is where the bulk of the logic for
processing messages from VFs using the virtchnl messaging scheme will
reside. This is separated from ice_vf_lib.c because it is somewhat
distinct and stand alone.

* ice_sriov.[ch]

Single Root IOV implementation, including initialization and the
routines for interacting with SR-IOV based netdev operations.

* (future) ice_siov.[ch]

Scalable IOV implementation.

The end goal is to make it easier to re-use the generic parts of the
virtualization logic while keeping separate the concerns of the Single Root
implementation.

In addition to the pure code moves, this series has a reset refactor which
clean up the functionality to make it easier to reuse the reset code. A new
ops table is introduced to make the VF reset logic more generic. The Single
Root specific details are implemented in ice_sriov.c. A future series
implementing Scalable IOV support will use this ops table to allow re-use of
the reset logic which is now in ice_vf_lib.c
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sparx5: Use Switchdev fdb events for managing fdb entries

Changes the handling of fdb entries to use Switchdev events,
instead of the previous "sync_bridge" and "sync_port" which
only run when adding or removing VLANs on the bridge.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Link: https://lore.kernel.org/r/20220314160918.4rfrrfgmbsf2pxl3@wse-c0155
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Add l3mdev index to flow struct and avoid oif reset for port devices

The fundamental premise of VRF and l3mdev core code is binding a socket
to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
Legacy code resets flowi_oif to the l3mdev losing any original port
device binding. Ben (among others) has demonstrated use cases where the
original port device binding is important and needs to be retained.
This patch handles that by adding a new entry to the common flow struct
that can indicate the l3mdev index for later rule and table matching
avoiding the need to reset flowi_oif.

In addition to allowing more use cases that require port device binds,
this patch brings a few datapath simplications:

1. l3mdev_fib_rule_match is only called when walking fib rules and
   always after l3mdev_update_flow. That allows an optimization to bail
   early for non-VRF type uses cases when flowi_l3mdev is not set. Also,
   only that index needs to be checked for the FIB table id.

2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
   (e.g., VRF) device. By resetting flowi_oif only for this case the
   FLOWI_FLAG_SKIP_NH_OIF flag is not longer needed and can be removed,
   removing several checks in the datapath. The flowi_iif path can be
   simplified to only be called if the it is not loopback (loopback can
   not be assigned to an L3 domain) and the l3mdev index is not already
   set.

3. Avoid another device lookup in the output path when the fib lookup
   returns a reject failure.

Note: 2 functional tests for local traffic with reject fib rules are
updated to reflect the new direct failure at FIB lookup time for ping
rather than the failure on packet path. The current code fails like this:

    HINT: Fails since address on vrf device is out of device scope
    COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
    ping: Warning: source address might be selected on device other than: eth1
    PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.

    --- 172.16.3.1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

where the test now directly fails:

    HINT: Fails since address on vrf device is out of device scope
    COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
    ping: connect: No route to host

Signed-off-by: David Ahern <dsahern@kernel.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: remove PF pointer from ice_check_vf_init

The ice_check_vf_init function takes both a PF and a VF pointer. Every
caller looks up the PF pointer from the VF structure. Some callers only
use of the PF pointer is call this function. Move the lookup inside
ice_check_vf_init and drop the unnecessary argument.

Cleanup the callers to drop the now unnecessary local variables. In
particular, replace the local PF pointer with a HW structure pointer in
ice_vc_get_vf_res_msg which simplifies a few accesses to the HW
structure in that function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: introduce ice_virtchnl.c and ice_virtchnl.h

Just as we moved the generic virtualization library logic into
ice_vf_lib.c, move the virtchnl message handling into ice_virtchnl.c

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: cleanup long lines in ice_sriov.c

Before we move the virtchnl message handling from ice_sriov.c into
ice_virtchnl.c, cleanup some long line warnings to avoid checkpatch.pl
complaints.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: introduce ICE_VF_RESET_LOCK flag

The ice_reset_vf function performs actions which must be taken only
while holding the VF configuration lock. Some flows already acquired the
lock, while other flows must acquire it just for the reset function. Add
the ICE_VF_RESET_LOCK flag to the function so that it can handle taking
and releasing the lock instead at the appropriate scope.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: introduce ICE_VF_RESET_NOTIFY flag

In some cases of resetting a VF, the PF would like to first notify the
VF that a reset is impending. This is currently done via
ice_vc_notify_vf_reset. A wrapper to ice_reset_vf, ice_vf_reset_vf, is
used to call this function first before calling ice_reset_vf.

In fact, every single call to ice_vc_notify_vf_reset occurs just prior
to a call to ice_vc_reset_vf.

Now that ice_reset_vf has flags, replace this separate call with an
ICE_VF_RESET_NOTIFY flag. This removes an unnecessary exported function
of ice_vc_notify_vf_reset, and also makes there be a single function to
reset VFs (ice_reset_vf).

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: convert ice_reset_vf to take flags

The ice_reset_vf function takes a boolean parameter which indicates
whether or not the reset is due to a VFLR event.

This is somewhat confusing to read because readers must interpret what
"true" and "false" mean when seeing a line of code like
"ice_reset_vf(vf, false)".

We will want to add another toggle to the ice_reset_vf in a following
change. To avoid proliferating many arguments, convert this function to
take flags instead. ICE_VF_RESET_VFLR will indicate if this is a VFLR
reset. A value of 0 indicates no flags.

One could argue that "ice_reset_vf(vf, 0)" is no more readable than
"ice_reset_vf(vf, false)".. However, this type of flags interface is
somewhat common and using 0 to mean "no flags" makes sense in this
context. We could bother to add a define for "ICE_VF_RESET_PLAIN" or
something similar, but this can be confusing since its not an actual bit
flag.

This paves the way to add another flag to the function in a following
change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: convert ice_reset_vf to standard error codes

The ice_reset_vf function returns a boolean value indicating whether or
not the VF reset. This is a bit confusing since it means that callers
need to know how to interpret the return value when needing to indicate
an error.

Refactor the function and call sites to report a regular error code. We
still report success (i.e. return 0) in cases where the reset is in
progress or is disabled.

Existing callers don't care because they do not check the return value.
We keep the error code anyways instead of a void return because we
expect future code which may care about or at least report the error
value.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: make ice_reset_all_vfs void

The ice_reset_all_vfs function returns true if any VFs were reset, and
false otherwise. However, no callers check the return value.

Drop this return value and make the function void since the callers do
not care about this.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: drop is_vflr parameter from ice_reset_all_vfs

The ice_reset_all_vfs function takes a parameter to handle whether its
operating after a VFLR event or not. This is not necessary as every
caller always passes true. Simplify the interface by removing the
parameter.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: move reset functionality into ice_vf_lib.c

Now that the reset functions do not rely on Single Root specific
behavior, move the ice_reset_vf, ice_reset_all_vfs, and
ice_vf_rebuild_host_cfg functions and their dependent helper functions
out of ice_sriov.c and into ice_vf_lib.c

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix a long line warning in ice_reset_vf

We're about to move ice_reset_vf out of ice_sriov.c and into
ice_vf_lib.c

One of the dev_err statements has a checkpatch.pl violation due to
putting the vf->vf_id on the same line as the dev_err. Fix this style
issue first before moving the code.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: introduce VF operations structure for reset flows

The ice driver currently supports virtualization using Single Root IOV,
with code in the ice_sriov.c file. In the future, we plan to also
implement support for Scalable IOV, which uses slightly different
hardware implementations for some functionality.

To eventually allow this, we introduce a new ice_vf_ops structure which
will contain the basic operations that are different between the two IOV
implementations. This primarily includes logic for how to handle the VF
reset registers, as well as what to do before and after rebuilding the
VF's VSI.

Implement these ops structures and call the ops table instead of
directly calling the SR-IOV specific function. This will allow us to
easily add the Scalable IOV implementation in the future. Additionally,
it helps separate the generalized VF logic from SR-IOV specifics. This
change allows us to move the reset logic out of ice_sriov.c and into
ice_vf_lib.c without placing any Single Root specific details into the
generic file.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix incorrect dev_dbg print mistaking 'i' for vf->vf_id

If we fail to clear the malicious VF indication after a VF reset, the
dev_dbg message which is printed uses the local variable 'i' when it
meant to use vf->vf_id. Fix this.

Fixes: 0891c89674e8 ("ice: warn about potentially malicious VFs")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: introduce ice_vf_lib.c, ice_vf_lib.h, and ice_vf_lib_private.h

Introduce the ice_vf_lib.c file along with the ice_vf_lib.h and
ice_vf_lib_private.h header files.

These files will house the generic VF structures and access functions.
Move struct ice_vf and its dependent definitions into this new header
file.

The ice_vf_lib.c is compiled conditionally on CONFIG_PCI_IOV. Some of
its functionality is required by all driver files. However, some of its
functionality will only be required by other files also conditionally
compiled based on CONFIG_PCI_IOV.

Declaring these functions used only in CONFIG_PCI_IOV files in
ice_vf_lib.h is verbose. This is because we must provide a fallback
implementation for each function in this header since it is included in
files which may not be compiled with CONFIG_PCI_IOV.

Instead, introduce a new ice_vf_lib_private.h header which verifies that
CONFIG_PCI_IOV is enabled. This header is intended to be directly
included in .c files which are CONFIG_PCI_IOV only. Add a #error
indication that will complain if the file ever gets included by another
C file on a kernel with CONFIG_PCI_IOV disabled. Add a comment
indicating the nature of the file and why it is useful.

This makes it so that we can easily define functions exposed from
ice_vf_lib.c into other virtualization files without needing to add
fallback implementations for every single function.

This begins the path to separate out generic code which will be reused
by other virtualization implementations from ice_sriov.h and ice_sriov.c

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-03-14

Jacob Keller says:

The ice_virtchnl_pf.c file has become a single place for a lot of
virtualization functionality. This includes most of the virtchnl message
handling, integration with kernel hooks like the .ndo operations, reset
logic, and more.

We are planning in the future to implement and support Scalable IOV in the
ice driver. To do this, much (but not all) of the code in ice_virtchnl_pf.c
will want to be reused.

Rather than dump all of the Scalable IOV implementation into
ice_virtchnl_pf.c it makes sense to house it in a separate file. But that
still leaves all of the Single Root IOV code littered among more generic
logic.

The long term goal is to re-organize the code such that generic re-usable
code is split into separate files. The ice_sriov.c file would end up
containing all of the Single Root IOV implementation specific details, while
ice_vf_lib.[ch] and ice_virtchnl.[ch] contain the generic pieces.

As a first step, notice that ice_sriov.c currently does not contain much of
the SR-IOV implementation. This is housed primarily in ice_virtchnl_pf.c

The code in ice_sriov.c is really generic and relates to the VF mailbox,
including mailbox overflow detection.

Rename ice_sriov.c to ice_vf_mbx.c, and then rename ice_virtchnl_pf.c to
ice_sriov.c

A later series will finish the refactor by splitting ice_sriov.c into
multiple files, moving the generic code into ice_vf_lib.c and ice_virtchnl.c

To prepare for that series, perform some basic cleanup and other refactors
that we've accumulated during this development cycle.

This series builds on top of the recent hash table refactor work.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: use ice_is_vf_trusted helper function
  ice: log an error message when eswitch fails to configure
  ice: cleanup error logging for ice_ena_vfs
  ice: move ice_set_vf_port_vlan near other .ndo ops
  ice: refactor spoofchk control code in ice_sriov.c
  ice: rename ICE_MAX_VF_COUNT to avoid confusion
  ice: remove unused definitions from ice_sriov.h
  ice: convert vf->vc_ops to a const pointer
  ice: remove circular header dependencies on ice.h
  ice: rename ice_virtchnl_pf.c to ice_sriov.c
  ice: rename ice_sriov.c to ice_vf_mbx.c
====================

Link: https://lore.kernel.org/r/20220315011155.2166817-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Revert CHECKSUM_UNNECESSARY for UDP packet from conntrack.

2) Reject unsupported families when creating tables, from Phil Sutter.

3) GRE support for the flowtable, from Toshiaki Makita.

4) Add GRE offload support for act_ct, also from Toshiaki.

5) Update mlx5 driver to support for GRE flowtable offload,
   from Toshiaki Makita.

6) Oneliner to clean up incorrect indentation in nf_conntrack_bridge,
   from Jiapeng Chong.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: bridge: clean up some inconsistent indenting
  net/mlx5: Support GRE conntrack offload
  act_ct: Support GRE offload
  netfilter: flowtable: Support GRE
  netfilter: nf_tables: Reject tables of unsupported family
  Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY"
====================

Link: https://lore.kernel.org/r/20220315091513.66544-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: fix build error due to missing IEEE_8021QAZ_MAX_TCS

IEEE_8021QAZ_MAX_TCS is defined in include/uapi/linux/dcbnl.h, which is
included by net/dcbnl.h. Then, linux/netdevice.h conditionally includes
net/dcbnl.h if CONFIG_DCB is enabled.

Therefore, when CONFIG_DCB is disabled, this indirect dependency is
broken.

There isn't a good reason to include net/dcbnl.h headers into the ocelot
switch library which exports low-level hardware API, so replace
IEEE_8021QAZ_MAX_TCS with OCELOT_NUM_TC which has the same value.

Fixes: 978777d0fb06 ("net: dsa: felix: configure default-prio and dscp priorities")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220315131215.273450-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sparx5: fix a couple warning messages

The WARN_ON() macro takes a condition, not a warning message.

Fixes: 0933bd04047c ("net: sparx5: Add support for ptp clocks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220314140327.GB30883@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'netdevsim-support-for-l3-hw-stats'

Petr Machata says:

====================
netdevsim: Support for L3 HW stats

"L3 stats" is a suite of interface statistics aimed at reflecting traffic
taking place in a HW device, on an object corresponding to some software
netdevice. Support for this stats suite has been added recently, in commit
ca0a53dcec94 ("Merge branch 'net-hw-counters-for-soft-devices'").

In this patch set:

- Patch #1 adds support for L3 stats to netdevsim.

  Real devices can have various conditions for when an L3 counter is
  available. To simulate this, netdevsim maintains a list of devices
  suitable for HW stats collection. Only when l3_stats is enabled on both a
  netdevice itself, and in netdevsim, will netdevsim contribute values to
  L3 stats.

  This enablement and disablement is done via debugfs:

    # echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/enable_ifindex
    # echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/disable_ifindex

  Besides this, there is a third toggle to mark a device for future failure:

    # echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/fail_next_enable

- This allows HW-independent testing of stats reporting and in-kernel APIs,
  as well as a test for enablement rollback, which is difficult to do
  otherwise. This netdevsim-specific selftest is added in patch #2.

- Patch #3 adds another driver-specific selftest, namely a test aimed at
  checking mlxsw-induced stats monitoring events.

====================

Link: https://lore.kernel.org/r/cover.1647265833.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mlxsw: hw_stats_l3: Add a new test

Add a test that verifies that UAPI notifications are emitted, as mlxsw
installs and deinstalls HW counters for the L3 offload xstats.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: netdevsim: hw_stats_l3: Add a new test

Add a test that verifies basic UAPI contracts, netdevsim operation,
rollbacks after partial enablement in core, and UAPI notifications.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netdevsim: Introduce support for L3 offload xstats

Add support for testing of HW stats support that was added recently, namely
the L3 stats support. L3 stats are provided for devices for which the L3
stats have been turned on, and that were enabled for netdevsim through a
debugfs toggle:

    # echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/enable_ifindex

For fully enabled netdevices, netdevsim counts 10pps of ingress traffic and
20pps of egress traffic. Similarly, L3 stats can be disabled for a given
device, and netdevsim ceases pretending there is any HW traffic going on:

    # echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/disable_ifindex

Besides this, there is a third toggle to mark a device for future failure:

    # echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/fail_next_enable

A future request to enable L3 stats on such netdevice will be bounced by
netdevsim:

    # ip -j l sh dev d | jq '.[].ifindex'
    66
    # echo 66 > /sys/kernel/debug/netdevsim/netdevsim10/hwstats/l3/enable_ifindex
    # echo 66 > /sys/kernel/debug/netdevsim/netdevsim10/hwstats/l3/fail_next_enable
    # ip stats set dev d l3_stats on
    Error: netdevsim: Stats enablement set to fail.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: Kconfig: micrel_phy: fix dependency issue

When building driver CONFIG_MICREL_PHY the follow error shows up:

aarch64-linux-gnu-ld: drivers/net/phy/micrel.o: in function `lan8814_ts_info':
micrel.c:(.text+0x1764): undefined reference to `ptp_clock_index'
micrel.c:(.text+0x1764): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
aarch64-linux-gnu-ld: drivers/net/phy/micrel.o: in function `lan8814_probe':
micrel.c:(.text+0x4720): undefined reference to `ptp_clock_register'
micrel.c:(.text+0x4720): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_register'

Rework Kconfig for MICREL_PHY to depend on 'PTP_1588_CLOCK_OPTIONAL'.
Arnd describes in a good way why its needed to add this depends in patch
e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies").

Reported-by: kernel test robot <lkp@intel.com>
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220314110254.12498-1-anders.roxell@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: sfp: add 2500base-X quirk for Lantech SFP module

The Lantech 8330-262D-E module is 2500base-X capable, but it reports the
nominal bitrate as 2500MBd instead of 3125MBd. Add a quirk for the
module.

The following in an EEPROM dump of such a SFP with the serial number
redacted:

00: 03 04 07 00 00 00 01 20 40 0c 05 01 19 00 00 00    ???...? @????...
10: 1e 0f 00 00 4c 61 6e 74 65 63 68 20 20 20 20 20    ??..Lantech
20: 20 20 20 20 00 00 00 00 38 33 33 30 2d 32 36 32        ....8330-262
30: 44 2d 45 20 20 20 20 20 56 31 2e 30 03 52 00 cb    D-E     V1.0?R.?
40: 00 1a 00 00 46 43 XX XX XX XX XX XX XX XX XX XX    .?..FCXXXXXXXXXX
50: 20 20 20 20 32 32 30 32 31 34 20 20 68 b0 01 98        220214  h???
60: 45 58 54 52 45 4d 45 4c 59 20 43 4f 4d 50 41 54    EXTREMELY COMPAT
70: 49 42 4c 45 20 20 20 20 20 20 20 20 20 20 20 20    IBLE

Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20220312205014.4154907-1-michael@walle.cc
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ice: use ice_is_vf_trusted helper function

The ice_vc_cfg_promiscuous_mode_msg function directly checks
ICE_VIRTCHNL_VF_CAP_PRIVILEGE, instead of using the existing helper
function ice_is_vf_trusted. Switch this to use the helper function so
that all trusted checks are consistent. This aids in any potential
future refactor by ensuring consistent code.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: log an error message when eswitch fails to configure

When ice_eswitch_configure fails, print an error message to make it more
obvious why VF initialization did not succeed.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: cleanup error logging for ice_ena_vfs

The ice_ena_vfs function and some of its sub-functions like
ice_set_per_vf_res use a "if (<function>) { <print error> ; <exit> }"
flow. This flow discards specialized errors reported by the called
function.

This style is generally not preferred as it makes tracing error sources
more difficult. It also means we cannot log the actual error received
properly.

Refactor several calls in the ice_ena_vfs function that do this to catch
the error in the 'ret' variable. Report this in the messages, and then
return the more precise error value.

Doing this reveals that ice_set_per_vf_res returns -EINVAL or -EIO in
places where -ENOSPC makes more sense. Fix these calls up to return the
more appropriate value.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: move ice_set_vf_port_vlan near other .ndo ops

The ice_set_vf_port_vlan function is located in ice_sriov.c very far
away from the other .ndo operations that it is similar to. Move this so
that its located near the other .ndo operation definitions.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: refactor spoofchk control code in ice_sriov.c

The API to control the VSI spoof checking for a VF VSI has three
functions: enable, disable, and set. The set function takes the VSI and
the VF and decides whether to call enable or disable based on the
vf->spoofchk field.

In some flows, vf->spoofchk is not yet set, such as the function used to
control the setting for a VF. (vf->spoofchk is only updated after a
success).

Simplify this API by refactoring ice_vf_set_spoofchk_cfg to be
"ice_vsi_apply_spoofchk" which takes the boolean and allows all callers
to avoid having to determine whether to call enable or disable
themselves.

This matches the expected callers better, and will prevent the need to
export more than one function when this code must be called from another
file.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: rename ICE_MAX_VF_COUNT to avoid confusion

The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true
for SR-IOV but will not be true for all VF implementations, such as when
the ice driver supports Scalable IOV.

Rename this definition to clearly indicate ICE_MAX_SRIOV_VFS.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: remove unused definitions from ice_sriov.h

A few more macros exist in ice_sriov.h which are not used anywhere.
These can be safely removed. Note that ICE_VIRTCHNL_VF_CAP_L2 capability
is set but never checked anywhere in the driver. Thus it is also safe to
remove.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: convert vf->vc_ops to a const pointer

The vc_ops structure is used to allow different handlers for virtchnl
commands when the driver is in representor mode. The current
implementation uses a copy of the ops table in each VF, and modifies
this copy dynamically.

The usual practice in kernel code is to store the ops table in a
constant structure and point to different versions. This has a number of
advantages:

  1. Reduced memory usage. Each VF merely points to the correct table,
     so they're able to re-use the same constant lookup table in memory.
  2. Consistency. It becomes more difficult to accidentally update or
     edit only one op call. Instead, the code switches to the correct
     able by a single pointer write. In general this is atomic, either
     the pointer is updated or its not.
  3. Code Layout. The VF structure can store a pointer to the table
     without needing to have the full structure definition defined prior
     to the VF structure definition. This will aid in future refactoring
     of code by allowing the VF pointer to be kept in ice_vf_lib.h while
     the virtchnl ops table can be maintained in ice_virtchnl.h

There is one major downside in the case of the vc_ops structure. Most of
the operations in the table are the same between the two current
implementations. This can appear to lead to duplication since each
implementation must now fill in the complete table. It could make
spotting the differences in the representor mode more challenging.
Unfortunately, methods to make this less error prone either add
complexity overhead (macros using CPP token concatenation) or don't work
on all compilers we support (constant initializer from another constant
structure).

The cost of maintaining two structures does not out weigh the benefits
of the constant table model.

While we're making these changes, go ahead and rename the structure and
implementations with "virtchnl" instead of "vc_vf_". This will more
closely align with the planned file renaming, and avoid similar names when
we later introduce a "vf ops" table for separating Scalable IOV and
Single Root IOV implementations.

Leave the accessor/assignment functions in order to avoid issues with
compiling with options disabled. The interface makes it easier to handle
when CONFIG_PCI_IOV is disabled in the kernel.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: remove circular header dependencies on ice.h

Several headers in the ice driver include ice.h even though they are
themselves included by that header. The most notable of these is
ice_common.h, but several other headers also do this.

Such a recursive inclusion is problematic as it forces headers to be
included in a strict order, otherwise compilation errors can result. The
circular inclusions do not trigger an endless loop due to standard
header inclusion guards, however other errors can occur.

For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.

ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
creates a circular dependency.

The definition in ice_sriov.h requires things from ice_flow.h, but
ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
process for expanding ice.h. The current code avoids this issue by
having an implicit dependency without the include of ice_flow.h.

If we were to fix that so that ice_sriov.h explicitly depends on
ice_flow.h the following pattern would occur:

ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h

At this point, during the expansion of, the header guard for ice_flow.h
is already set, so when ice_sriov.h attempts to load the ice_flow.h
header it is skipped. Then, we go on to begin including the rest of
ice_sriov.h, including structure definitions which depend on
ice_rss_hash_cfg. This produces a compiler warning because
ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the
start of ice_flow.h!

If the order of headers is incorrect (ice_flow.h is not implicitly
loaded first in all files which include ice_sriov.h) then we get the
same failure.

Removing this recursive inclusion requires fixing a few cases where some
headers depended on the header inclusions from ice.h. In addition, a few
other changes are also required.

Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
which is the likely reason that ice_common.h includes ice.h at all. This
macro implementation requires the full definition of ice_pf in order to
properly compile.

Fix this by moving it to a function declared in ice_main.c, so that we
do not require all files to depend on the layout of the ice_pf
structure.

Note that this change only fixes circular dependencies, but it does not
fully resolve all implicit dependencies where one header may depend on
the inclusion of another. I tried to fix as many of the implicit
dependencies as I noticed, but fixing them all requires a somewhat
tedious analysis of each header and attempting to compile it separately.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: rename ice_virtchnl_pf.c to ice_sriov.c

The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
code for implementing Single Root IOV virtualization resides. This code
includes support for bringing up and tearing down VFs, hooks into the
kernel SR-IOV netdev operations, and for handling virtchnl messages from
VFs.

In the future, we plan to support Scalable IOV in addition to Single
Root IOV as an alternative virtualization scheme. This implementation
will re-use some but not all of the code in ice_virtchnl_pf.c

To prepare for this future, we want to refactor and split up the code in
ice_virtchnl_pf.c into the following scheme:

* ice_vf_lib.[ch]

   Basic VF structures and accessors. This is where scheme-independent
   code will reside.

* ice_virtchnl.[ch]

   Virtchnl message handling. This is where the bulk of the logic for
   processing messages from VFs using the virtchnl messaging scheme will
   reside. This is separated from ice_vf_lib.c because it is distinct
   and has a bulk of the processing code.

* ice_sriov.[ch]

   Single Root IOV implementation, including initialization and the
   routines for interacting with SR-IOV based netdev operations.

* (future) ice_siov.[ch]

   Scalable IOV implementation.

As a first step, lets assume that all of the code in
ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
ice_sriov.c and its header to ice_sriov.h

Future changes will further split out the code in these files following
the plan outlined here.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: rename ice_sriov.c to ice_vf_mbx.c

The ice_sriov.c file primarily contains code which handles the logic for
mailbox overflow detection and some other utility functions related to
the virtualization mailbox.

The bulk of the SR-IOV implementation is actually found in
ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific.

In the future, the ice driver will support an additional virtualization
scheme known as Scalable IOV, and the code in this file will be used
for this alternative implementation.

Rename this file (and its associated header) to ice_vf_mbx.c, so that we
can later re-use the ice_sriov.c file as the SR-IOV specific file.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

nfp: flower: avoid newline at the end of message in NL_SET_ERR_MSG_MOD

Fix the following coccicheck warning:

drivers/net/ethernet/netronome/nfp/flower/action.c:959:7-69: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD

Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220312095823.2425775-1-niklas.soderlund@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix use-after-free in mlx5e_stats_grp_sw_update_stats

We need to sync page pool stats only for active channels. Reading ethtool
stats on a down netdev or a netdev with modified number of channels will
result in a user-after-free, trying to access page pools that are freed
already.

BUG: KASAN: use-after-free in mlx5e_stats_grp_sw_update_stats+0x465/0xf80
Read of size 8 at addr ffff888004835e40 by task ethtool/720

Fixes: cc10e84b2ec3 ("mlx5: add support for page_pool_get_stats")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/r/20220312005353.786255-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx4_en: use kzalloc

Use kzalloc instead of kmalloc + memset.

The semantic patch that makes this change is:
(https://coccinelle.gitlabpages.inria.fr/website/)

//<smpl>
@@
expression res, size, flag;
@@
- res = kmalloc(size, flag);
+ res = kzalloc(size, flag);
...
- memset(res, 0, size);
//</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220312102705.71413-3-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: disable preemption in dev_core_stats_XXX_inc() helpers

syzbot was kind enough to remind us that dev->{tx_dropped|rx_dropped}
could be increased in process context.

BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor413/3593
caller is netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298
CPU: 1 PID: 3593 Comm: syz-executor413 Not tainted 5.17.0-rc7-syzkaller-02426-g97aeb877de7f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
check_preemption_disabled+0x16b/0x170 lib/smp_processor_id.c:49
netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298
dev_core_stats include/linux/netdevice.h:3855 [inline]
dev_core_stats_rx_dropped_inc include/linux/netdevice.h:3866 [inline]
tun_get_user+0x3455/0x3ab0 drivers/net/tun.c:1800
tun_chr_write_iter+0xe1/0x200 drivers/net/tun.c:2015
call_write_iter include/linux/fs.h:2074 [inline]
new_sync_write+0x431/0x660 fs/read_write.c:503
vfs_write+0x7cd/0xae0 fs/read_write.c:590
ksys_write+0x12d/0x250 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2cf4f887e3
Code: 5d 41 5c 41 5d 41 5e e9 9b fd ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
RSP: 002b:00007ffd50dd5fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ffd50dd6000 RCX: 00007f2cf4f887e3
RDX: 000000000000002a RSI: 0000000000000000 RDI: 00000000000000c8
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd50dd5ff0 R14: 00007ffd50dd5fe8 R15: 00007ffd50dd5fe4
</TASK>

Fixes: 625788b58445 ("net: add per-cpu storage and net->core_stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: jeffreyji <jeffreyji@google.com>
Cc: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220312214505.3294762-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drivers: net: packetengines: fix typos in comments

Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220314115354.144023-13-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dpaa2-mac-protocol-change'

Ioana Ciornei says:

====================
dpaa2-mac: add support for changing the protocol at runtime

This patch set adds support for changing the Ethernet protocol at
runtime on Layerscape SoCs which have the Lynx 28G SerDes block.

The first two patches add a new generic PHY driver for the Lynx 28G and
the bindings file associated. The driver reads the PLL configuration at
probe time (the frequency provided to the lanes) and determines what
protocols can be supported.
Based on this the driver can deny or approve a request from the
dpaa2-mac to setup a new protocol.

The next 2 patches add some MC APIs for inquiring what is the running
version of firmware and setting up a new protocol on the MAC.

Moving along, we extract the code for setting up the supported
interfaces on a MAC on a different function since in the next patches
will update the logic.

In the next patch, the dpaa2-mac is updated so that it retrieves the
SerDes PHY based on the OF node and in case of a major reconfig, call
the PHY driver to set up the new protocol on the associated lane and the
MC firmware to reconfigure the MAC side of things.

Finally, the LX2160A dtsi is annotated with the SerDes PHY nodes for the
1st SerDes block. Beside this, the LX2160A Clearfog dtsi is annotated
with the 'phys' property for the exposed SFP cages.

Changes in v2:
- 1/8: add MODULE_LICENSE
Changes in v3:
- 2/8: fix 'make dt_binding_check' errors
- 7/8: reverse order of dpaa2_mac_start() and phylink_start()
- 7/8: treat all RGMII variants in dpmac_eth_if_mode
- 7/8: remove the .mac_prepare callback
- 7/8: ignore PHY_INTERFACE_MODE_NA in validate
Changes in v4:
- 1/8: remove the DT nodes parsing
- 1/8: add an xlate function
- 2/8: remove the children phy nodes for each lane
- 7/8: rework the of_phy_get if statement
- 8/8: remove the DT nodes for each lane and the lane id in the
phys phandle
Changes in v5:
- 2/8: use phy as the name of the DT node in the example
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

arch: arm64: dts: lx2160a: describe the SerDes block #1

Describe the SerDes block #1 using the generic phys infrastructure. This
way, the ethernet nodes can each reference their serdes lanes
individually using the 'phys' dts property.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-mac: configure the SerDes phy on a protocol change

This patch integrates the dpaa2-eth driver with the generic PHY
infrastructure in order to search, find and reconfigure the SerDes lanes
in case of a protocol change.

On the .mac_config() callback, the phy_set_mode_ext() API is called so
that the Lynx 28G SerDes PHY driver can change the lane's configuration.
In the same phylink callback the MC firmware is called so that it
reconfigures the MAC side to run using the new protocol.

The consumer drivers - dpaa2-eth and dpaa2-switch - are updated to call
the dpaa2_mac_start/stop functions newly added which will
power_on/power_off the associated SerDes lane.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-mac: move setting up supported_interfaces into a function

The logic to setup the supported interfaces will get annotated based on
what the configuration of the SerDes PLLs supports. Move the current
setup into a separate function just to try to keep it clean.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-mac: retrieve API version and detect features

Retrieve the API version running on the firmware and based on it detect
which features are available for usage.
The first one to be listed is the capability to change the MAC protocol
at runtime.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>