review.tizen.org Git - platform/kernel/linux-starfive.git/log

Merge branch 'ip-neigh-skb-reason'

Menglong Dong says:

====================
net: use kfree_skb_reason() for ip/neighbour

In the series "net: use kfree_skb_reason() for ip/udp packet receive",
reasons for skb drops are added to the packet receive process of IP
layer. Link:

https://lore.kernel.org/netdev/20220205074739.543606-1-imagedong@tencent.com/

And in the first patch of this series, skb drop reasons are added to
the packet egress path of IP layer. As kfree_skb() is not used frequent,
I commit these changes at once and didn't create a patch for every
functions that involed. Following functions are handled:

__ip_queue_xmit()
ip_finish_output()
ip_mc_finish_output()
ip6_output()
ip6_finish_output()
ip6_finish_output2()

Following new drop reasons are introduced (what they mean can be seen
in the document of them):

SKB_DROP_REASON_IP_OUTNOROUTES
SKB_DROP_REASON_BPF_CGROUP_EGRESS
SKB_DROP_REASON_IPV6DISABLED
SKB_DROP_REASON_NEIGH_CREATEFAIL

In the 2th and 3th patches, kfree_skb_reason() is used in neighbour
subsystem instead of kfree_skb(). __neigh_event_send() and
arp_error_report() are involed, and following new drop reasons are
introduced:

SKB_DROP_REASON_NEIGH_FAILED
SKB_DROP_REASON_NEIGH_QUEUEFULL
SKB_DROP_REASON_NEIGH_DEAD

Changes since v2:
- fix typo in the 1th patch of 'SKB_DROP_REASON_IPV6DSIABLED' reported
  by Roman

Changes since v1:
- introduce SKB_DROP_REASON_NEIGH_CREATEFAIL for some path in the 1th
  patch
- introduce SKB_DROP_REASON_NEIGH_DEAD in the 2th patch
- simplify the document for the new drop reasons, as David Ahern
  suggested
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: neigh: add skb drop reasons to arp_error_report()

When neighbour become invalid or destroyed, neigh_invalidate() will be
called. neigh->ops->error_report() will be called if the neighbour's
state is NUD_FAILED, and seems here is the only use of error_report().
So we can tell that the reason of skb drops in arp_error_report() is
SKB_DROP_REASON_NEIGH_FAILED.

Replace kfree_skb() used in arp_error_report() with kfree_skb_reason().

Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: neigh: use kfree_skb_reason() for __neigh_event_send()

Replace kfree_skb() used in __neigh_event_send() with
kfree_skb_reason(). Following drop reasons are added:

SKB_DROP_REASON_NEIGH_FAILED
SKB_DROP_REASON_NEIGH_QUEUEFULL
SKB_DROP_REASON_NEIGH_DEAD

The first two reasons above should be the hot path that skb drops
in neighbour subsystem.

Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ip: add skb drop reasons for ip egress path

Replace kfree_skb() which is used in the packet egress path of IP layer
with kfree_skb_reason(). Functions that are involved include:

__ip_queue_xmit()
ip_finish_output()
ip_mc_finish_output()
ip6_output()
ip6_finish_output()
ip6_finish_output2()

Following new drop reasons are introduced:

SKB_DROP_REASON_IP_OUTNOROUTES
SKB_DROP_REASON_BPF_CGROUP_EGRESS
SKB_DROP_REASON_IPV6DISABLED
SKB_DROP_REASON_NEIGH_CREATEFAIL

Reviewed-by: Mengen Sun <mengensun@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-ocelot-phylink-updates'

Russell King says:

====================
net: dsa: ocelot: phylink updates

This series updates the Ocelot DSA driver for some of the recent
phylink changes. Specifically, we fill in the supported_interfaces
fields, convert to mac_select_pcs and mark the driver as non-legacy.
We do not convert to phylink_generic_validate() as Ocelot has
special support for its rate adapting PCS which makes the generic
validate method unsuitable for this driver.

The three changes mentioned above are implemented in their own
separate patches with one additional cleanup:

1) Populate the supported_interfaces bitmap
2) Remove the now unnecessary interface checks in the validate methods
3) Convert from phylink_set_pcs() to .mac_select_pcs.
4) Mark the driver as non-legacy

Thanks.

RFC -> non-RFC: add reviewed-by/tested-by's, update patch 1 to set the
supported_interfaces bitmap in felix.c rather than the sub-drivers as
requested by Vladimir.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ocelot: mark as non-legacy

The ocelot DSA driver does not make use of the speed, duplex, pause or
advertisement in its phylink_mac_config() implementation, so it can be
marked as a non-legacy driver.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ocelot: convert to mac_select_pcs()

Convert the PCS selection to use mac_select_pcs, which allows the PCS
to perform any validation it needs, and removes the need to set the PCS
in the mac_config() callback, delving into the higher DSA levels to do
so.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ocelot: remove interface checks

When the supported interfaces bitmap is populated, phylink will itself
check that the interface mode is present in this bitmap. Drivers no
longer need to perform this check themselves. Remove these checks.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ocelot: populate supported_interfaces

Populate the supported interfaces bitmap for the Ocelot DSA switches.

Since all sub-drivers only support a single interface mode, defined by
ocelot_port->phy_mode, we can handle this in the main driver code
without reference to the sub-driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'small-fixes-for-mctp'

Matt Johnston says:

====================
Small fixes for MCTP

This series has 3 fixes for MCTP.
====================

Link: https://lore.kernel.org/r/20220225053938.643605-1-matt@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mctp i2c: Fix hard head TX bounds length check

We should be testing the length before fitting into the u8 byte_count.
This is just a sanity check, the MCTP stack should have limited to MTU
which is checked, and we check consistency later in mctp_i2c_xmit().

Found by Smatch
mctp_i2c_header_create() warn: impossible condition
'(hdr->byte_count > 255) => (0-255 > 255)'

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mctp i2c: Fix potential use-after-free

The skb is handed off to netif_rx() which may free it.
Found by Smatch.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mctp: Avoid warning if unregister notifies twice

Previously if an unregister notify handler ran twice (waiting for
netdev to be released) it would print a warning in mctp_unregister()
every subsequent time the unregister notify occured.

Instead we only need to worry about the case where a mctp_ptr is
set on an unknown device type.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: intel: Enable 2.5Gbps for Intel AlderLake-S

Intel AlderLake-S platform is capable of running on 2.5GBps link speed.

This patch enables 2.5Gbps link speed on AlderLake-S platform.

Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Link: https://lore.kernel.org/r/20220225023325.474242-1-vee.khee.wong@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: qca8k: return with -EINVAL on invalid port

Currently an invalid port throws a WARN_ON warning however invalid
uninitialized values in reg and cpu_port_index are being used later
on. Fix this by returning -EINVAL for an invalid port value.

Addresses clang-scan warnings:
drivers/net/dsa/qca8k.c:1981:3: warning: 2nd function call argument is an
uninitialized value [core.CallAndMessage]
drivers/net/dsa/qca8k.c:1999:9: warning: 2nd function call argument is an
uninitialized value [core.CallAndMessage]

Fixes: 7544b3ff745b ("net: dsa: qca8k: move pcs configuration")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220224220557.147075-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'sja1105-phylink-updates'

Russell King says:

====================
net: dsa: sja1105: phylink updates

This series updates the phylink implementation in sja1105 to use the
supported_interfaces bitmap, convert to the mac_select_pcs() interface,
mark as non-legacy, and get rid of the validation method.

As a final step, enable switching between SGMII and 2500BASE-X as it
is a feature that Vladimir desires.

Specifically, the patches in this series:

1. Populates the supported_interfaces bitmap.
2. As a result of the supported_interfaces bitmap being populated,
   sja1105 no longer needs to check the interface mode as phylink
   will do this.
3. Switch away from using phylink_set_pcs(), using the mac_select_pcs()
   method instead.
4. Mark the driver as not-legacy
5. Fill in mac_capabilities using _exactly_ the same conditions as is
   currently used to decide which link modes to support, and convert
   to use phylink_generic_validate()
6. Add brand new support to permit switching between SGMII and
   2500BASE-X modes of operation as per Vladimir's single patch that
   performs steps 1, 2, 5 and 6 in one go.

There are some additional changes in Vladimir's single patch that I
have not included:

* validation of priv->phy_mode[] in sja1105_phylink_get_caps(). The
  driver has already validated the phy_mode for each port in
  sja1105_init_mii_settings(), and a failure here will prevent the
  driver reaching sja1105_phylink_get_caps().

* Changing the decisions on which mac_capabilities to set. Vladimir's
  patch always sets MAC_10FD | MAC_100FD | MAC_1000FD despite the
  current code clearly making the 1G speed conditional on the
  xmii_mode for the port. The change in decision making may be
  visible when in PHY_INTERFACE_MODE_INTERNAL mode, for which
  the phylink_generic_validate() will pass through all the MAC
  capabilities as ethtool link modes.

  Hence, if we have PHY_INTERFACE_MODE_INTERNAL but supports_rgmii[]
  or supports_sgmii[] is non-zero, currently we do not get 1G speeds.
  With Vladimir's additional change, we will get 1G speeds.

  While it is not clear whether that can happen, I feel changing the
  decision making should be a separate patch.

* The decision for MAC_2500FD is made differently -
  sja1105_init_mii_settings() allows PHY_INTERFACE_MODE_2500BASEX
  when supports_2500basex[] is non-zero, and is not based on any other
  condition such as supports_sgmii[] or supports_rgmii[]. Vladimir's
  patch makes it additionally conditional on those supports_.gmii[]
  settings, which is a functional change that should be made in a
  separate patch - and if desired, then sja1105_init_mii_settings()
  should also be updated at the same time.

Consequently, I believe that my previous objections to Vladimir's
single patch approach are well founded and justified, even through
Vladimir is the maintainer of this driver. I have no objection to
the additional changes, I just don't think they should all be wrapped
up into a single patch that converts the way validation is done _and_
also makes a bunch of other functional changes.

RFC->non-RFC: added Vladimir's Reviewed-by's, fixed the typo in the
commit message of patch 6, and removed the phrase at the end of a
comment as requested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: support switching between SGMII and 2500BASE-X

Vladimir Oltean suggests that sja1105 can support switching between
SGMII and 2500BASE-X modes. Augment sja1105_phylink_get_caps() to
fill in both interface modes if they can be supported.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: convert to phylink_generic_validate()

Populate the MAC capabilities for the SJA1105 DSA switch using the same
decision making which sja1105_phylink_validate() uses. Remove the now
obsolete sja1105_phylink_validate() implementation to allow DSA to use
phylink_generic_validate() for this switch driver.

As noted by Vladimir, this fixes an inconsequential bug which allowed
gigabit and lower interface modes to be indicated when operating in
2500base-X mode.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: mark as non-legacy

The sja1105 DSA driver does not have a phylink_mac_config() method
implementation, it is safe to mark this as a non-legacy driver.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: use .mac_select_pcs() interface

Convert the PCS selection to use mac_select_pcs, which allows the PCS
to perform any validation it needs, and removes the need to set the PCS
in the mac_config() callback, delving into the higher DSA levels to do
so.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: remove interface checks

When the supported interfaces bitmap is populated, phylink will itself
check that the interface mode is present in this bitmap. Drivers no
longer need to perform this check themselves. Remove these checks.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: populate supported_interfaces

Populate the supported interfaces bitmap for the SJA1105 DSA switch.

This switch only supports a static model of configuration, so we
restrict the interface modes to the configured setting.

Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir. │
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: IPv6: Add IPv6 extension header support

This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
packets can be filtered using ipv6_ext flag.

Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-flow-independent-tc-action-hardware-offload'

Simon Horman says:

====================
nfp: flow-independent tc action hardware offload

Baowen Zheng says:

Allow nfp NIC to offload tc actions independent of flows.

The motivation for this work is to offload tc actions independent of flows
for nfp NIC. We allow nfp driver to provide hardware offload of OVS
metering feature - which calls for policers that may be used by multiple
flows and whose lifecycle is independent of any flows that use them.

When nfp driver tries to offload a flow table using the independent action,
the driver will search if the action is already offloaded to the hardware.
If not, the flow table offload will fail.

When the nfp NIC successes to offload an action, the user can check
in_hw_count when dumping the tc action.

Tc cli command to offload and dump an action:

# tc actions add action police rate 100mbit burst 10000k index 200 skip_sw

# tc -s -d actions list action police

total acts 1

      action order 0:  police 0xc8 rate 100Mbit burst 10000Kb mtu 2Kb action reclassify
      overhead 0b linklayer ethernet
      ref 1 bind 0  installed 142 sec used 0 sec
      Action statistics:
      Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      backlog 0b 0p requeues 0
      skip_sw in_hw in_hw_count 1
      used_hw_stats delayed
====================

Link: https://lore.kernel.org/r/20220223162302.97609-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: add NFP_FL_FEATS_QOS_METER to host features to enable meter offload

Add NFP_FL_FEATS_QOS_METER to host features to enable meter
offload in driver.

Before adding this feature, we will not offload any police action
since we will check the host features before offloading any police
action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: add support to offload police action from flower table

Offload flow table if the action is already offloaded to hardware when
flow table uses this action.

Change meter id to type of u32 to support all the action index.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: add process to get action stats from hardware

Add a process to update action stats from hardware.

This stats data will be updated to tc action when dumping actions
or filters.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: add hash table to store meter table

Add a hash table to store meter table.

This meter table will also be used by flower action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: add support to offload tc action to hardware

Add process to offload tc action to hardware.

Currently we only support to offload police action.

Add meter capability to check if firmware supports
meter offload.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: refactor policer config to support ingress/egress meter

Add an policer API to support ingress/egress meter.

Change ingress police to compatible with the new API.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tcp: Merge TCP-MD5 inbound callbacks

The functions do essentially the same work to verify TCP-MD5 sign.
Code can be merged into one family-independent function in order to
reduce copy'n'paste and generated code.
Later with TCP-AO option added, this will allow to create one function
that's responsible for segment verification, that will have all the
different checks for MD5/AO/non-signed packets, which in turn will help
to see checks for all corner-cases in one function, rather than spread
around different families and functions.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220223175740.452397-1-dima@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'fdb-entries-on-dsa-lag-interfaces'

Vladimir Oltean says:

====================
FDB entries on DSA LAG interfaces

This work permits having static and local FDB entries on LAG interfaces
that are offloaded by DSA ports. New API needs to be introduced in
drivers. To maintain consistency with the bridging offload code, I've
taken the liberty to reorganize the data structures added by Tobias in
the DSA core a little bit.

Tested on NXP LS1028A (felix switch). Would appreciate feedback/testing
on other platforms too. Testing procedure was the one described here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/

with this script:

ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link add br0 type bridge && ip link set br0 up
ip link set br0 arp off
ip link set bond0 master br0 && ip link set bond0 up
ip link set swp0 master br0 && ip link set swp0 up
ip link set dev bond0 type bridge_slave flood off learning off
bridge fdb add dev bond0 <mac address of other eno0> master static

I'm noticing a problem in 'bridge fdb dump' with the 'self' entries, and
I didn't solve this. On Ocelot, an entry learned on a LAG is reported as
being on the first member port of it (so instead of saying 'self bond0',
it says 'self swp1'). This is better than not seeing the entry at all,
but when DSA queries for the FDBs on a port via ds->ops->port_fdb_dump,
it never queries for FDBs on a LAG. Not clear what we should do there,
we aren't in control of the ->ndo_fdb_dump of the bonding/team drivers.
Alternatively, we could just consider the 'self' entries reported via
ndo_fdb_dump as "better than nothing", and concentrate on the 'master'
entries that are in sync with the bridge when packets are flooded to
software.
====================

Link: https://lore.kernel.org/r/20220223140054.3379617-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: support FDB entries on offloaded LAG interfaces

This adds the logic in the Felix DSA driver and Ocelot switch library.
For Ocelot switches, the DEST_IDX that is the output of the MAC table
lookup is a logical port (equal to physical port, if no LAG is used, or
a dynamically allocated number otherwise). The allocation we have in
place for LAG IDs is different from DSA's, so we can't use that:
- DSA allocates a continuous range of LAG IDs starting from 1
- Ocelot appears to require that physical ports and LAG IDs are in the
same space of [0, num_phys_ports), and additionally, ports that aren't
in a LAG must have physical port id == logical port id

The implication is that an FDB entry towards a LAG might need to be
deleted and reinstalled when the LAG ID changes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: support FDB events on offloaded LAG interfaces

This change introduces support for installing static FDB entries towards
a bridge port that is a LAG of multiple DSA switch ports, as well as
support for filtering towards the CPU local FDB entries emitted for LAG
interfaces that are bridge ports.

Conceptually, host addresses on LAG ports are identical to what we do
for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply
be replicated towards all member ports like we do for multicast, or VLAN.
Instead we need new driver API. Hardware usually considers a LAG to be a
"logical port", and sets the entire LAG as the forwarding destination.
The physical egress port selection within the LAG is made by hashing
policy, as usual.

To represent the logical port corresponding to the LAG, we pass by value
a copy of the dsa_lag structure to all switches in the tree that have at
least one port in that LAG.

To illustrate why a refcounted list of FDB entries is needed in struct
dsa_lag, it is enough to say that:
- a LAG may be a bridge port and may therefore receive FDB events even
  while it isn't yet offloaded by any DSA interface
- DSA interfaces may be removed from a LAG while that is a bridge port;
  we don't want FDB entries lingering around, but we don't want to
  remove entries that are still in use, either

For all the cases below to work, the idea is to always keep an FDB entry
on a LAG with a reference count equal to the DSA member ports. So:
- if a port joins a LAG, it requests the bridge to replay the FDB, and
  the FDB entries get created, or their refcount gets bumped by one
- if a port leaves a LAG, the FDB replay deletes or decrements refcount
  by one
- if an FDB is installed towards a LAG with ports already present, that
  entry is created (if it doesn't exist) and its refcount is bumped by
  the amount of ports already present in the LAG

echo "Adding FDB entry to bond with existing ports"
ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static

ip link del br0
ip link del bond0

echo "Adding FDB entry to empty bond"
ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up

ip link del br0
ip link del bond0

echo "Adding FDB entry to empty bond, then removing ports one by one"
ip link del bond0
ip link add bond0 type bond mode 802.3ad
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up

ip link set swp1 nomaster
ip link set swp2 nomaster
ip link del br0
ip link del bond0

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: call SWITCHDEV_FDB_OFFLOADED for the orig_dev

When switchdev_handle_fdb_event_to_device() replicates a FDB event
emitted for the bridge or for a LAG port and DSA offloads that, we
should notify back to switchdev that the FDB entry on the original
device is what was offloaded, not on the DSA slave devices that the
event is replicated on.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: remove "ds" and "port" from struct dsa_switchdev_event_work

By construction, the struct net_device *dev passed to
dsa_slave_switchdev_event_work() via struct dsa_switchdev_event_work
is always a DSA slave device.

Therefore, it is redundant to pass struct dsa_switch and int port
information in the deferred work structure. This can be retrieved at all
times from the provided struct net_device via dsa_slave_to_port().

For the same reason, we can drop the dsa_is_user_port() check in
dsa_fdb_offload_notify().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device

When the switchdev_handle_fdb_event_to_device() event replication helper
was created, my original thought was that FDB events on LAG interfaces
should most likely be special-cased, not just replicated towards all
switchdev ports beneath that LAG. So this replication helper currently
does not recurse through switchdev lower interfaces of LAG bridge ports,
but rather calls the lag_mod_cb() if that was provided.

No switchdev driver uses this helper for FDB events on LAG interfaces
yet, so that was an assumption which was yet to be tested. It is
certainly usable for that purpose, as my RFC series shows:

https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/

however this approach is slightly convoluted because:

- the switchdev driver gets a "dev" that isn't its own net device, but
  rather the LAG net device. It must call switchdev_lower_dev_find(dev)
  in order to get a handle of any of its own net devices (the ones that
  pass check_cb).

- in order for FDB entries on LAG ports to be correctly refcounted per
  the number of switchdev ports beneath that LAG, we haven't escaped the
  need to iterate through the LAG's lower interfaces. Except that is now
  the responsibility of the switchdev driver, because the replication
  helper just stopped half-way.

So, even though yes, FDB events on LAG bridge ports must be
special-cased, in the end it's simpler to let switchdev_handle_fdb_*
just iterate through the LAG port's switchdev lowers, and let the
switchdev driver figure out that those physical ports are under a LAG.

The switchdev_handle_fdb_event_to_device() helper takes a
"foreign_dev_check" callback so it can figure out whether @dev can
autonomously forward to @foreign_dev. DSA fills this method properly:
if the LAG is offloaded by another port in the same tree as @dev, then
it isn't foreign. If it is a software LAG, it is foreign - forwarding
happens in software.

Whether an interface is foreign or not decides whether the replication
helper will go through the LAG's switchdev lowers or not. Since the
lan966x doesn't properly fill this out, FDB events on software LAG
uppers will get called. By changing lan966x_foreign_dev_check(), we can
suppress them.

Whereas DSA will now start receiving FDB events for its offloaded LAG
uppers, so we need to return -EOPNOTSUPP, since we currently don't do
the right thing for them.

Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: create a dsa_lag structure

The main purpose of this change is to create a data structure for a LAG
as seen by DSA. This is similar to what we have for bridging - we pass a
copy of this structure by value to ->port_lag_join and ->port_lag_leave.
For now we keep the lag_dev, id and a reference count in it. Future
patches will add a list of FDB entries for the LAG (these also need to
be refcounted to work properly).

The LAG structure is created using dsa_port_lag_create() and destroyed
using dsa_port_lag_destroy(), just like we have for bridging.

Because now, the dsa_lag itself is refcounted, we can simplify
dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in
the dst->lags array only as long as at least one port uses it. The
refcounting logic inside those functions can be removed now - they are
called only when we should perform the operation.

dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag
structure instead of the lag_dev net_device.

dsa_lag_foreach_port() now takes the dsa_lag structure as argument.

dst->lags holds an array of dsa_lag structures.

dsa_lag_map() now also saves the dsa_lag->id value, so that linear
walking of dst->lags in drivers using dsa_lag_id() is no longer
necessary. They can just look at lag.id.

dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(),
which can be used by drivers to get the LAG ID assigned by DSA to a
given port.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: use dsa_switch_for_each_port in mv88e6xxx_lag_sync_masks

Make the intent of the code more clear by using the dedicated helper for
iterating over the ports of a switch.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: make LAG IDs one-based

The DSA LAG API will be changed to become more similar with the bridge
data structures, where struct dsa_bridge holds an unsigned int num,
which is generated by DSA and is one-based. We have a similar thing
going with the DSA LAG, except that isn't stored anywhere, it is
calculated dynamically by dsa_lag_id() by iterating through dst->lags.

The idea of encoding an invalid (or not requested) LAG ID as zero for
the purpose of simplifying checks in drivers means that the LAG IDs
passed by DSA to drivers need to be one-based too. So back-and-forth
conversion is needed when indexing the dst->lags array, as well as in
drivers which assume a zero-based index.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: qca8k: rename references to "lag" as "lag_dev"

In preparation of converting struct net_device *dp->lag_dev into a
struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
all occurrences of the "lag" variable in qca8k to "lag_dev".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: rename references to "lag" as "lag_dev"

In preparation of converting struct net_device *dp->lag_dev into a
struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
all occurrences of the "lag" variable in mv88e6xxx to "lag_dev".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: rename references to "lag" as "lag_dev"

In preparation of converting struct net_device *dp->lag_dev into a
struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
all occurrences of the "lag" variable in the DSA core to "lag_dev".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: asix: remove code duplicates in asix_mdio_read/write and asix_mdio_read/write_nopm

This functions are mostly same except of one hard coded "in_pm" variable.
So, rework them to reduce maintenance overhead.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220223110633.3006551-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: marvell: prestera: Fix return value check in prestera_kern_fib_cache_find()

rhashtable_lookup_fast() returns NULL pointer not ERR_PTR(), so
it can return fib_node directly in prestera_kern_fib_cache_find().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220223084954.1771075-2-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: marvell: prestera: Fix return value check in prestera_fib_node_find()

rhashtable_lookup_fast() returns NULL pointer not ERR_PTR(), so
it can return fib_node directly in prestera_fib_node_find().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220223084954.1771075-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sparx5: Support offloading of bridge port flooding flags

Though the SparX-5i can control IPv4/6 multicasts separately from non-IP
multicasts, these are all muxed onto the bridge's BR_MCAST_FLOOD flag.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220223082700.qrot7lepwqcdnyzw@wse-c0155
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

tools/testing/selftests/net/mptcp/mptcp_join.sh
  34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers")

  857898eb4b28 ("selftests: mptcp: add missing join check")
  6ef84b1517e0 ("selftests: mptcp: more robust signal race test")
https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/

drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
  fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions")
  c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information")

  09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr")
  84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr")
  efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow")
  3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'pci-v5.17-fixes-5' of git://git./linux/kernel/git/helgaas/pci

Pull pci fixes from Bjorn Helgaas:

- Fix a merge error that broke PCI device enumeration on mvebu
   platforms, including Turris Omnia (Armada 385) (Pali Rohár)

- Avoid using ATS on all AMD Navi10 and Navi14 GPUs because some
   VBIOSes don't account for "harvested" (disabled) parts of the chip
   when initializing caches (Alex Deucher)

* tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
  PCI: mvebu: Fix device enumeration regression

Merge tag 'net-5.17-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.

  Current release - regressions:

   - bpf: fix crash due to out of bounds access into reg2btf_ids

   - mvpp2: always set port pcs ops, avoid null-deref

   - eth: marvell: fix driver load from initrd

   - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"

  Current release - new code bugs:

   - mptcp: fix race in overlapping signal events

  Previous releases - regressions:

   - xen-netback: revert hotplug-status changes causing devices to not
     be configured

   - dsa:
      - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
        held
      - fix panic when removing unoffloaded port from bridge

   - dsa: microchip: fix bridging with more than two member ports

  Previous releases - always broken:

   - bpf:
      - fix crash due to incorrect copy_map_value when both spin lock
        and timer are present in a single value
      - fix a bpf_timer initialization issue with clang
      - do not try bpf_msg_push_data with len 0
      - add schedule points in batch ops

   - nf_tables:
      - unregister flowtable hooks on netns exit
      - correct flow offload action array size
      - fix a couple of memory leaks

   - vsock: don't check owner in vhost_vsock_stop() while releasing

   - gso: do not skip outer ip header in case of ipip and net_failover

   - smc: use a mutex for locking "struct smc_pnettable"

   - openvswitch: fix setting ipv6 fields causing hw csum failure

   - mptcp: fix race in incoming ADD_ADDR option processing

   - sysfs: add check for netdevice being present to speed_show

   - sched: act_ct: fix flow table lookup after ct clear or switching
     zones

   - eth: intel: fixes for SR-IOV forwarding offloads

   - eth: broadcom: fixes for selftests and error recovery

   - eth: mellanox: flow steering and SR-IOV forwarding fixes

  Misc:

   - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
     friends not report freed skbs as drops

   - force inlining of checksum functions in net/checksum.h"

* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: mv643xx_eth: process retval from of_get_mac_address
  ping: remove pr_err from ping_lookup
  Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
  openvswitch: Fix setting ipv6 fields causing hw csum failure
  ipv6: prevent a possible race condition with lifetimes
  net/smc: Use a mutex for locking "struct smc_pnettable"
  bnx2x: fix driver load from initrd
  Revert "xen-netback: Check for hotplug-status existence before watching"
  Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
  net/mlx5e: Fix VF min/max rate parameters interchange mistake
  net/mlx5e: Add missing increment of count
  net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
  net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
  net/mlx5e: Add feature check for set fec counters
  net/mlx5e: TC, Skip redundant ct clear actions
  net/mlx5e: TC, Reject rules with forward and drop actions
  net/mlx5e: TC, Reject rules with drop and modify hdr action
  net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
  net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
  net/mlx5: Fix possible deadlock on rule deletion
  ...

Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/next-queue

Nguyen, Anthony L says:

====================
10GbE Intel Wired LAN Driver Updates 2022-02-23

Yang Li fixes incorrect indenting as reported by smatch for ixgbevf.

Piotr removes non-inclusive language from ixgbe driver.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: Remove non-inclusive language
ixgbevf: clean up some inconsistent indenting
====================

Link: https://lore.kernel.org/r/20220223185424.2129067-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request:
    - send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
    - fix passthrough to namespaces with unsupported features (Christoph
      Hellwig)

- Clear iocb->private at poll completion (Stefano)

* tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
  nvme-tcp: send H2CData PDUs based on MAXH2CDATA
  nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
  nvme: don't return an error from nvme_configure_metadata
  block: clear iocb->private in blkdev_bio_end_io_async()

Merge tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

- Add a conditional schedule point in io_add_buffers() (Eric)

- Fix for a quiesce speedup merged in this release (Dylan)

- Don't convert to jiffies for event timeout waiting, it's way too
   coarse when we accept a timespec as input (me)

* tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block:
  io_uring: disallow modification of rsrc_data during quiesce
  io_uring: don't convert to jiffies for waiting on timeouts
  io_uring: add a schedule point in io_add_buffers()

Merge tag 'platform-drivers-x86-v5.17-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull more x86 platform driver fixes from Hans de Goede:
"Two more fixes:

   - Fix suspend/resume regression on AMD Cezanne APUs in >= 5.16

   - Fix Microsoft Surface 3 battery readings"

* tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  surface: surface3_power: Fix battery readings on batteries without a serial number
  platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup

net: mv643xx_eth: process retval from of_get_mac_address

Obtaining a MAC address may be deferred in cases when the MAC is stored
in an NVMEM block, for example, and it may not be ready upon the first
retrieval attempt and return EPROBE_DEFER.

It is also possible that a port that does not rely on NVMEM has been
already created when getting the defer request. Thus, also the resources
allocated previously must be freed when doing a roll-back.

Fixes: 76723bca2802 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Mauri Sandberg <maukka@ext.kapsi.fi>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fi
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ping: remove pr_err from ping_lookup

As Jakub noticed, prints should be avoided on the datapath.
Also, as packets would never come to the else branch in
ping_lookup(), remove pr_err() from ping_lookup().

Fixes: 35a79e64de29 ("ping: fix the dif and sdif check in ping_lookup")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/1ef3f2fcd31bd681a193b1fcf235eee1603819bd.1645674068.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"

Revert of a patch that instead of fixing a AQ error when trying
to reset BW limit introduced several regressions related to
creation and managing TC. Currently there are errors when creating
a TC on both PF and VF.

Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391

This reverts commit 3d2504663c41104b4359a15f35670cfa82de1bbf.

Fixes: 3d2504663c41 (i40e: Fix reset bw limit when DCB enabled with 1 TC)
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

openvswitch: Fix setting ipv6 fields causing hw csum failure

Ipv6 ttl, label and tos fields are modified without first
pulling/pushing the ipv6 header, which would have updated
the hw csum (if available). This might cause csum validation
when sending the packet to the stack, as can be seen in
the trace below.

Fix this by updating skb->csum if available.

Trace resulted by ipv6 ttl dec and then sending packet
to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
[295241.900063] s_pf0vf2: hw csum failure
[295241.923191] Call Trace:
[295241.925728]  <IRQ>
[295241.927836]  dump_stack+0x5c/0x80
[295241.931240]  __skb_checksum_complete+0xac/0xc0
[295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
[295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
[295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
[295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
[295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
[295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
[295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
[295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
[295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
[295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
[295242.047498]  netif_receive_skb_internal+0x3e/0xc0
[295242.052291]  napi_gro_receive+0xba/0xe0
[295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
[295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
[295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
[295242.077958]  net_rx_action+0x149/0x3b0
[295242.086762]  __do_softirq+0xd7/0x2d6
[295242.090427]  irq_exit+0xf7/0x100
[295242.093748]  do_IRQ+0x7f/0xd0
[295242.096806]  common_interrupt+0xf/0xf
[295242.100559]  </IRQ>
[295242.102750] RIP: 0033:0x7f9022e88cbd
[295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
[295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
[295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
[295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
[295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40

Fixes: 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: prevent a possible race condition with lifetimes

valid_lft, prefered_lft and tstamp are always accessed under the lock
"lock" in other places. Reading these without taking the lock may result
in inconsistencies regarding the calculation of the valid and preferred
variables since decisions are taken on these fields for those variables.

Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: Use a mutex for locking "struct smc_pnettable"

smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
which, in turn, calls mutex_lock(&smc_ib_devices.mutex).

read_lock() disables preemption. Therefore, the code acquires a mutex while in
atomic context and it leads to a SAC bug.

Fix this bug by replacing the rwlock with a mutex.

Reported-and-tested-by: syzbot+4f322a6d84e991c38775@syzkaller.appspotmail.com
Fixes: 64e28b52c7a6 ("net/smc: add pnet table namespace support")
Confirmed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220223100252.22562-1-fmdefrancesco@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnx2x: fix driver load from initrd

Commit b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0") added
new firmware support in the driver with maintaining older firmware
compatibility. However, older firmware was not added in MODULE_FIRMWARE()
which caused missing firmware files in initrd image leading to driver load
failure from initrd. This patch adds MODULE_FIRMWARE() for older firmware
version to have firmware files included in initrd.

Fixes: b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215627
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220223085720.12021-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "xen-netback: Check for hotplug-status existence before watching"

This reverts commit 2afeec08ab5c86ae21952151f726bfe184f6b23d.

The reasoning in the commit was wrong - the code expected to setup the
watch even if 'hotplug-status' didn't exist. In fact, it relied on the
watch being fired the first time - to check if maybe 'hotplug-status' is
already set to 'connected'. Not registering a watch for non-existing
path (which is the case if hotplug script hasn't been executed yet),
made the backend not waiting for the hotplug script to execute. This in
turns, made the netfront think the interface is fully operational, while
in fact it was not (the vif interface on xen-netback side might not be
configured yet).

This was a workaround for 'hotplug-status' erroneously being removed.
But since that is reverted now, the workaround is not necessary either.

More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"

This reverts commit 1f2565780e9b7218cf92c7630130e82dcc0fe9c2.

The 'hotplug-status' node should not be removed as long as the vif
device remains configured. Otherwise the xen-netback would wait for
re-running the network script even if it was already called (in case of
the frontent re-connecting). But also, it _should_ be removed when the
vif device is destroyed (for example when unbinding the driver) -
otherwise hotplug script would not configure the device whenever it
re-appear.

Moving removal of the 'hotplug-status' node was a workaround for nothing
calling network script after xen-netback module is reloaded. But when
vif interface is re-created (on xen-netback unbind/bind for example),
the script should be called, regardless of who does that - currently
this case is not handled by the toolstack, and requires manual
script call. Keeping hotplug-status=connected to skip the call is wrong
and leads to not configured interface.

More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme into block-5.17

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.17

- send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
- fix passthrough to namespaces with unsupported features (me)"

* tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme:
  nvme-tcp: send H2CData PDUs based on MAXH2CDATA
  nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
  nvme: don't return an error from nvme_configure_metadata

surface: surface3_power: Fix battery readings on batteries without a serial number

The battery on the 2nd hand Surface 3 which I recently bought appears to
not have a serial number programmed in. This results in any I2C reads from
the registers containing the serial number failing with an I2C NACK.

This was causing mshw0011_bix() to fail causing the battery readings to
not work at all.

Ignore EREMOTEIO (I2C NACK) errors when retrieving the serial number and
continue with an empty serial number to fix this.

Fixes: b1f81b496b0d ("platform/x86: surface3_power: MSHW0011 rev-eng implementation")
BugLink: https://github.com/linux-surface/linux-surface/issues/608
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220224101848.7219-1-hdegoede@redhat.com

platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup

commit 59348401ebed ("platform/x86: amd-pmc: Add special handling for
timer based S0i3 wakeup") adds support for using another platform timer
in lieu of the RTC which doesn't work properly on some systems. This path
was validated and worked well before submission. During the 5.16-rc1 merge
window other patches were merged that caused this to stop working properly.

When this feature was used with 5.16-rc1 or later some OEM laptops with the
matching firmware requirements from that commit would shutdown instead of
program a timer based wakeup.

This was bisected to commit 8d89835b0467 ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path"). This wasn't supposed to cause any
negative impacts and also tested well on both Intel and ARM platforms.
However this changed the semantics of when CPUs are allowed to be in the
deepest state. For the AMD systems in question it appears this causes a
firmware crash for timer based wakeup.

It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
and all the CPUs going into a deep state while the timer is still being
programmed. It's likely a firmware bug, but to avoid it don't allow setting
CPUs into the deepest state while using CZN timer wakeup path.

If later it's discovered that this also occurs from "regular" suspends
without a timer as well or on other silicon, this may be later expanded to
run in the suspend path for more scenarios.

Cc: stable@vger.kernel.org # 5.16+
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@BL1PR12MB5157.namprd12.prod.outlook.com/T/#mee35f39c41a04b624700ab2621c795367f19c90e
Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Fixes: 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
Fixes: 59348401ebed ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup"
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220223175237.6209-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

Merge tag 'linux-can-next-for-5.18-20220224' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull request of 36 patches for net-next/master.

The first 5 patches are by me and update various CAN DT bindings.

Eric Dumazet's patch for the CAN GW replaces a costly
synchronize_rcu() by a call_rcu().

The next 2 patches by me enhance the CAN bit rate handling, the bit
rate checking is simplified and the arguments and local variables of
functions are marked as const.

A patch by me for the kvaser_usb driver removes a redundant variable.

The next patch by me lets the c_can driver use the default ethtool
drvinfo.

Minghao Chi's patch for the softing driver removes a redundant
variable.

Srinivas Neeli contributes an enhancement for the xilinx_can NAPI poll
function.

Vincent Mailhol's patch for the etas_es58x driver converts to
BITS_PER_TYPE() from of manual calculation.

The next 23 patches target the mcp251xfd driver and are by me. The
first 15 patches, add support for the internal PLL, which includes
simplifying runtime PM handling, better chip detection and error
handling after wakeup, and the PLL handling. The last 8 patches
prepare the driver to support multiple RX-FIFOs and runtime
configurable RX/TX rings. The actual runtime ring configuration via
ethtool will be added in a later patch series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

can: mcp251xfd: mcp251xfd_priv: introduce macros specifying the number of supported TEF/RX/TX rings

This patch introduces macros to define the number of supported TEF, RX
and TX rings. As well as some assertions as sanity checks.

Link: https://lore.kernel.org/all/20220217103826.2299157-9-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: prepare for multiple RX-FIFOs

This patch prepares the driver to use more than one RX-FIFO. Having a
bigger RX buffer is beneficial in high load situations, where the
system temporarily cannot keep up reading CAN frames from the chip.
Using a bigger RX buffer also allows to implement RX IRQ coalescing,
which will be added in a later patch series.

If using more than 1 RX-FIFO the driver has to figure out, which FIFOs
have RX'ed CAN frames pending. This is indicated by a set bit in the
RXIF register, which is positioned directly after the interrupt status
register INT. If more than 1 RX-FIFO is used, the driver reads both
registers in 1 transfer.

The mcp251xfd_handle_rxif() function iterates over all RX rings and
reads out the RX'ed CAN frames for for all pending FIFOs. To keep the
logic for the 1 RX-FIFO only case in mcp251xfd_handle_rxif() simple,
the driver marks that FIFO pending in mcp251xfd_ring_init().

The chip has a dedicated RX interrupt line to signal pending RX'ed
frames. If connected to an input GPIO and the driver will skip the
initial read of the interrupt status register (INT) and directly read
the pending RX'ed frames if the line is active. The driver assumes the
1st RX-FIFO pending (a read of the RXIF register would re-introduce
the skipped initial read of the INT register). Any other pending
RX-FIFO will be served in the main interrupt handler.

Link: https://lore.kernel.org/all/20220217103826.2299157-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: ring: update FIFO setup debug info

The recent change of the order of the TX and RX FIFOs is not reflected
in the debug info of the FIFO setup. This patch adjust the order and
additionally prints the base address of each FIFO.

Since the mcp251xfd_ring_init() may fail due to wrongly configured
FIFOs, printing of the FIFO setup is moved there. In case of an error
it would not be printed in mcp251xfd_ring_init().

Link: https://lore.kernel.org/all/20220217103826.2299157-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: ring: mcp251xfd_ring_init(): checked RAM usage of ring setup

With this patch the usage of the on-chip RAM is checked. In the
current driver the FIFO setup is fixed and always fits into the RAM.

With an upcoming patch series the ring and FIFO setup will be more
dynamic. Although using more RAM than available should not happen, but
add this safety check, just in case.

Link: https://lore.kernel.org/all/20220217103826.2299157-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: ring: change order of TX and RX FIFOs

This patch actually changes the order of the TX and RX FIFOs.

This gives the opportunity to minimize the number of SPI transfers in
the IRQ handler. The read of the IRQ status register and RX FIFO
status registers can be combined into single SPI transfer. If the RX
ring uses FIFO 1, the overall length of the transfer is smaller than
in the original layout, where the RX FIFO comes after the TX FIFO.

Link: https://lore.kernel.org/all/20220217103826.2299157-5-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: ring: prepare to change order of TX and RX FIFOs

This patch improves the initialization of the TX and RX rings. The
initialization functions are now called with pointers to the next free
address (in the on chip RAM) and next free hardware FIFO. The rings
are initialized using these values and the pointers are modified to
point to the next free elements.

This means the order of the mcp251xfd_ring_init_*() functions
specifies the order of the rings in the hardware FIFO. This makes it
possible to change the order of the TX and RX FIFOs, which is done in
the next patch.

This gives the opportunity to minimize the number of SPI transfers in
the IRQ handler. The read of the IRQ status register and RX FIFO
status registers can be combined into single SPI transfer. If the RX
ring uses FIFO 1, the overall length of the transfer is smaller than
in the original layout, where the RX FIFO comes after the TX FIFO.

Link: https://lore.kernel.org/all/20220217103826.2299157-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_ring_init(): split ring_init into separate functions

This patch splits the initialization of the TEF, TX and RX FIFO in the
mcp251xfd_ring_init() function into separate functions. This is a
preparation patch to move the RX FIFO in front of the TX FIFO.

Link: https://lore.kernel.org/all/20220217103826.2299157-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: introduce struct mcp251xfd_tx_ring::nr and ::fifo_nr and make use of it

This patch removes the hard coded assumption that the TX ring uses
hardware FIFO 1. This allows the hardware FIFO 1 to be used for RX and
the next free FIFO for TX.

This gives the opportunity to minimize the number of SPI transfers in
the IRQ handler. The read of the IRQ status register and RX FIFO
status registers can be combined into single SPI transfer. If the RX
ring uses FIFO 1, the overall length of the transfer is smaller than
in the original layout, where the RX FIFO comes after the TX FIFO.

Link: https://lore.kernel.org/all/20220217103826.2299157-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: add support for internal PLL

The PLL is enabled if the configured clock is less than or equal to 10 times
the max clock frequency.

The device will operate with two different SPI speeds. A slow speed determined
by the clock without the PLL enabled, and a fast speed derived from the
frequency with the PLL enabled.

Link: https://lore.kernel.org/all/20220207131047.282110-16-mkl@pengutronix.de
Link: https://lore.kernel.org/all/20201015124401.2766-3-mas@csselectronics.com
Co-developed-by: Magnus Aagaard Sørensen <mas@csselectronics.com>
Signed-off-by: Magnus Aagaard Sørensen <mas@csselectronics.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_register(): prepare to activate PLL after softreset

If the PLL is needed it must be switched on after chip reset. This
patch adds the required call to mcp251xfd_register().

Link: https://lore.kernel.org/all/20220207131047.282110-15-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_clock_init(): prepare for PLL support, wait for OSC ready

This patch prepares the mcp251xfd_chip_clock_init() function for PLL
support.

If the PLL is needed is must be switched on after chip reset. This
should be done in the mcp251xfd_chip_clock_init() function. Prepare
this function to wait for the OSC and PLL to be ready.

Link: https://lore.kernel.org/all/20220207131047.282110-14-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: __mcp251xfd_chip_set_mode(): prepare for PLL support: improve error handling and diagnostics

This patch prepares the __mcp251xfd_chip_set_mode() function for PLL
support by adding more error checks and diagnostics.

Link: https://lore.kernel.org/all/20220207131047.282110-13-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_wake(): renamed from mcp251xfd_chip_clock_enable()

This patch renames mcp251xfd_chip_clock_enable() into mcp251xfd_chip_wake() as
this function actually wakes the chip. Additionally the documentation is
adopted.

Link: https://lore.kernel.org/all/20220207131047.282110-12-mkl@pengutronix.de
Co-developed-by: Magnus Aagaard Sørensen <mas@csselectronics.com>
Signed-off-by: Magnus Aagaard Sørensen <mas@csselectronics.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_timestamp_init(): factor out into separate function

This patch factors out the timestamp initialization from the clock
initialization.

This is a preparation patch for the PLL support, where clock and
timestamp init must be done separately.

Link: https://lore.kernel.org/all/20220207131047.282110-11-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_softreset_check(): wait for OSC ready before accessing chip

This patch changes the order of reading the Mode and Oscillator Ready
bits.

Instead of reading the Mode of the chip directly after reset, first
wait for the oscillator to get ready and the chip to fully start up.
Read the Mode after this.

Link: https://lore.kernel.org/all/20220207131047.282110-10-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_wait_for_osc_ready(): prepare for PLL support

The function mcp251xfd_chip_wait_for_osc_ready() polls the Oscillator
Control Register for the oscillator to get ready. By passing the
appropriate parameters (osc_reference and osc_mask) it can also poll
for PLL ready.

This patch adjusts the error message if the Oscillator and/or PLL fail
to get ready.

Link: https://lore.kernel.org/all/20220207131047.282110-9-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_wait_for_osc_ready(): improve chip detection and error handling

The function mcp251xfd_chip_wait_for_osc_ready() polls the Oscillator
Control Register for the oscillator to get ready.

This is the first register the driver reads from. Reading implausible
values (all bits set or unset) can be caused by the chip starting up
after power on, waking up after sleep, or by the chip not being preset
at all. Add check for implausible register content
mcp251xfd_reg_invalid() to the regmap_read_poll_timeout() loop.

In case of a regmap_read_poll_timeout() returns a fatal error (and not
a timeout), forward it to the caller.

As mcp251xfd_chip_wait_for_osc_ready() will be called after the probe
function has finished, (currently during ifup), move error message
about failed chip detection from there into the probe function.

Link: https://lore.kernel.org/all/20220207131047.282110-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_wait_for_osc_ready(): factor out into separate function

This patch factors out mcp251xfd_chip_wait_for_osc_ready() into a
separate function, it will be used in several places in the next
patches.

Link: https://lore.kernel.org/all/20220207131047.282110-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_stop(): convert to a void function

The mcp251xfd_chip_stop() function tries the best to stop the chip and
put it into sleep mode. It continues, even if some intermediate steps
fail. As none of the callers use the return value, let this function
return void.

Link: https://lore.kernel.org/all/20220207131047.282110-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_chip_sleep(): introduce function to bring chip into sleep mode

This patch adds a new function to bring the chip into sleep mode, and
replaces several occurrences of open coded variants.

Link: https://lore.kernel.org/all/20220207131047.282110-5-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_unregister(): simplify runtime PM handling

The mcp251xfd driver supports runtime PM enabled kernels, but also
works on !CONFIG_PM configurations.

This patch simplifies the runtime PM handling in the
mcp251xfd_unregister(). In the CONFIG_PM case, runtime PM has been
enabled in the mcp251xfd_probe() function, so we can disable it here.
For !CONFIG_PM builds call mcp251xfd_clks_and_vdd_disable() directly.

Link: https://lore.kernel.org/all/20220207131047.282110-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_regmap_crc_read(): ignore CRC error only if solely OSC register is read

MCP251XFD_REG_OSC is the first register the driver reads from. The
chip may be in deep sleep and the SPI transfer (i.e. the assertion of
the CS) will wake the chip up. This takes about 3ms. The CRC of this
transfer is wrong, or there isn't any chip at all, in this case the
CRC will be wrong, too. The driver ignores the CRC error and returns
the read data to the caller.

To avoid any confusion, this patch changes the
mcp251xfd_regmap_crc_read() function to only ignore the CRC error if
solely the OSC register is read. So when reading more than the OSC
registers at once, CRC errors are not ignored.

Link: https://lore.kernel.org/all/20220207131047.282110-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_reg_invalid(): rename from mcp251xfd_osc_invalid()

This patch renames mcp251xfd_osc_invalid() to mcp251xfd_reg_invalid(),
as it will be used for other registers than the "osc" register in a
later patch.

This patch also moves this function to more towards the beginning of
the file, to be available for other functions, too.

Link: https://lore.kernel.org/all/20220207131047.282110-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: etas_es58x: use BITS_PER_TYPE() instead of manual calculation

The input to the GENMASK() macro was calculated by hand. Replaced it
with a dedicated macro: BITS_PER_TYPE() which does the exact same job.

Link: https://lore.kernel.org/all/20220212130737.3008-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: xilinx_can: Add check for NAPI Poll function

Add check for NAPI poll function to avoid enabling interrupts
with out completing the NAPI call.

Link: https://lore.kernel.org/all/20220208162053.39896-1-srinivas.neeli@xilinx.com
Signed-off-by: Srinivas Neeli <srinivas.neeli@xilinx.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: softing: softing_netdev_open(): remove redundant ret variable

Return value from softing_startstop() directly instead of taking this
in another redundant variable.

Link: https://lore.kernel.org/all/20220112080629.667191-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: ethtool: use default drvinfo

The ethtool core implements a default drvinfo.

There's no need to replicate this in the driver, no additional
information is added, so remove this and rely on the default.

Link: https://lore.kernel.org/all/20220124215642.3474154-10-mkl@pengutronix.de
Cc: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb: kvaser_usb_send_cmd(): remove redundant variable actual_len

The function usb_bulk_msg() can be called with a NULL pointer as the
"actual_length" parameter. This patch removes this variable.

Link: https://lore.kernel.org/all/20220124215642.3474154-9-mkl@pengutronix.de
Cc: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bittiming: mark function arguments and local variables as const

This patch marks the arguments of some functions as well as some local
variables as constant.

Link: https://lore.kernel.org/all/20220124215642.3474154-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: bittiming: can_validate_bitrate(): simplify bit rate checking

This patch simplifies the validation of the fixed bit rates. If a
supported bit rate is found, directly return 0.

If no valid bit rate is found return -EINVAL;

Link: https://lore.kernel.org/all/20220124215642.3474154-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: gw: use call_rcu() instead of costly synchronize_rcu()

Commit fb8696ab14ad ("can: gw: synchronize rcu operations
before removing gw job entry") added three synchronize_rcu() calls
to make sure one rcu grace period was observed before freeing
a "struct cgw_job" (which are tiny objects).

This should be converted to call_rcu() to avoid adding delays
in device / network dismantles.

Use the rcu_head that was already in struct cgw_job,
not yet used.

Link: https://lore.kernel.org/all/20220207190706.1499190-1-eric.dumazet@gmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

dt-binding: can: m_can: include common CAN controller bindings

Since commit

| 1f9234401ce0 ("dt-bindings: can: add can-controller.yaml")

there is a common CAN controller binding. Add this to the m_can
binding.

Link: https://lore.kernel.org/all/20220124220653.3477172-4-mkl@pengutronix.de
Reviewed-by: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

dt-binding: can: m_can: fix indention of table in bosch,mram-cfg description

This patch fixes the indention of the table in the description of the
bosch,mram-cfg property.

Link: https://lore.kernel.org/all/20220217101111.2291151-1-mkl@pengutronix.de
Reviewed-by: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>