Florian Westphal [Mon, 11 Apr 2022 11:01:16 +0000 (13:01 +0200)]
netfilter: ecache: use dedicated list for event redelivery
This disentangles event redelivery and the percpu dying list.
Because entries are now stored on a dedicated list, all
entries are in NFCT_ECACHE_DESTROY_FAIL state and all entries
still have confirmed bit set -- the reference count is at least 1.
The 'struct net' back-pointer can be removed as well.
The pcpu dying list will be removed eventually, it has no functionality.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Luiz Angelo Daros de Luca [Sat, 16 Apr 2022 05:27:37 +0000 (02:27 -0300)]
docs: net: dsa: describe issues with checksum offload
DSA tags before IP header (categories 1 and 2) or after the payload (3)
might introduce offload checksum issues.
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 18 Apr 2022 10:00:19 +0000 (11:00 +0100)]
Merge branch 'mlxsw-line-card'
Ido Schimmel says:
====================
mlxsw: Introduce line card support for modular switch
Jiri says:
This patchset introduces support for modular switch systems and also
introduces mlxsw support for NVIDIA Mellanox SN4800 modular switch.
It contains 8 slots to accommodate line cards - replaceable PHY modules
which may contain gearboxes.
Currently supported line card:
16X 100GbE (QSFP28)
Other line cards that are going to be supported:
8X 200GbE (QSFP56)
4X 400GbE (QSFP-DD)
There may be other types of line cards added in the future.
To be consistent with the port split configuration (splitter cabels),
the line card entities are treated in the similar way. The nature of
a line card is not "a pluggable device", but "a pluggable PHY module".
A concept of "provisioning" is introduced. The user may "provision"
certain slot with a line card type. Driver then creates all instances
(devlink ports, netdevices, etc) related to this line card type. It does
not matter if the line card is plugged-in at the time. User is able to
configure netdevices, devlink ports, setup port splitters, etc. From the
perspective of the switch ASIC, all is present and can be configured.
The carrier of netdevices stays down if the line card is not plugged-in.
Once the line card is inserted and activated, the carrier of
the related netdevices is then reflecting the physical line state,
same as for an ordinary fixed port.
Once user does not want to use the line card related instances
anymore, he can "unprovision" the slot. Driver then removes the
instances.
Patches 1-4 are extending devlink driver API and UAPI in order to
register, show, dump, provision and activate the line card.
Patches 5-17 are implementing the introduced API in mlxsw.
The last patch adds a selftest for mlxsw line cards.
Example:
$ devlink port # No ports are listed
$ devlink lc
pci/0000:01:00.0:
lc 1 state unprovisioned
supported_types:
16x100G
lc 2 state unprovisioned
supported_types:
16x100G
lc 3 state unprovisioned
supported_types:
16x100G
lc 4 state unprovisioned
supported_types:
16x100G
lc 5 state unprovisioned
supported_types:
16x100G
lc 6 state unprovisioned
supported_types:
16x100G
lc 7 state unprovisioned
supported_types:
16x100G
lc 8 state unprovisioned
supported_types:
16x100G
Note that driver exposes list supported line card types. Currently
there is only one: "16x100G".
To provision the slot #8:
$ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
lc 8 state active type 16x100G
supported_types:
16x100G
$ devlink port
pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4
To uprovision the slot #8:
$ devlink lc set pci/0000:01:00.0 lc 8 notype
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:41 +0000 (09:42 +0300)]
selftests: mlxsw: Introduce devlink line card provision/unprovision/activation tests
Introduce basic line card manipulation which consists of provisioning,
unprovisioning and activation of a line card.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:40 +0000 (09:42 +0300)]
mlxsw: spectrum: Add port to linecard mapping
For each port get slot_index using PMLP register. For ports residing
on a linecard, identify it with the linecard by setting mapping
using devlink_port_linecard_set() helper. Use linecard slot index for
PMTDB register queries.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:39 +0000 (09:42 +0300)]
mlxsw: core: Extend driver ops by remove selected ports op
In case of line card implementation, the core has to have a way to
remove relevant ports manually. Extend the Spectrum driver ops by an op
that implements port removal of selected ports upon request.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:38 +0000 (09:42 +0300)]
mlxsw: core_linecards: Implement line card activation process
Allow to process events generated upon line card getting "ready" and
"active".
When DSDSC event with "ready" bit set is delivered, that means the
line card is powered up. Use MDDC register to push the line card to
active state. Once FW is done with that, the DSDSC event with "active"
bit set is delivered.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:37 +0000 (09:42 +0300)]
mlxsw: core_linecards: Add line card objects and implement provisioning
Introduce objects for line cards and an infrastructure around that.
Use devlink_linecard_create/destroy() to register the line card with
devlink core. Implement provisioning ops with a list of supported
line cards.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:36 +0000 (09:42 +0300)]
mlxsw: reg: Add Management Binary Code Transfer Register
The MBCT register allows to transfer binary INI codes from the host to
the management FW by transferring it by chunks of maximum 1KB.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:35 +0000 (09:42 +0300)]
mlxsw: reg: Add Management DownStream Device Control Register
The MDDC register allows to control downstream devices and line cards.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:34 +0000 (09:42 +0300)]
mlxsw: reg: Add Management DownStream Device Query Register
The MDDQ register allows to query the DownStream device properties.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:33 +0000 (09:42 +0300)]
mlxsw: spectrum: Introduce port mapping change event processing
Register PMLPE trap and process the port mapping changes delivered
by it by creating related ports. Note that this happens after
provisioning. The INI of the linecard is processed and merged by FW.
PMLPE is generated for each port. Process this mapping change.
Layout of PMLPE is the same as layout of PMLP.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:32 +0000 (09:42 +0300)]
mlxsw: Narrow the critical section of devl_lock during ports creation/removal
No need to hold the lock for alloc and freecpu. So narrow the critical
section. Follow-up patch is going to benefit from this by adding more
code to the functions which will be out of the critical as well.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:31 +0000 (09:42 +0300)]
mlxsw: reg: Add Ports Mapping Event Configuration Register
The PMECR register is used to enable/disable event triggering
in case of local port mapping change.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:30 +0000 (09:42 +0300)]
mlxsw: spectrum: Allocate port mapping array of structs instead of pointers
Instead of array of pointers to port mapping structures, allocate the
array of structures directly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:29 +0000 (09:42 +0300)]
mlxsw: spectrum: Allow lane to start from non-zero index
So far, the lane index always started from zero. That is not true for
modular systems with gearbox-equipped linecards. Loose the check so the
lanes can start from non-zero index.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:28 +0000 (09:42 +0300)]
devlink: add port to line card relationship set
In order to properly inform user about relationship between port and
line card, introduce a driver API to set line card for a port. Use this
information to extend port devlink netlink message by line card index
and also include the line card index into phys_port_name and by that
into a netdevice name.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:27 +0000 (09:42 +0300)]
devlink: implement line card active state
Allow driver to mark a line card as active. Expose this state to the
userspace over devlink netlink interface with proper notifications.
'active' state means that line card was plugged in after
being provisioned.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:26 +0000 (09:42 +0300)]
devlink: implement line card provisioning
In order to be able to configure all needed stuff on a port/netdevice
of a line card without the line card being present, introduce line card
provisioning. Basically by setting a type, provisioning process will
start and driver is supposed to create a placeholder for instances
(ports/netdevices) for a line card type.
Allow the user to query the supported line card types over line card
get command. Then implement two netlink command SET to allow user to
set/unset the card type.
On the driver API side, add provision/unprovision ops and supported
types array to be advertised. Upon provision op call, the driver should
take care of creating the instances for the particular line card type.
Introduce provision_set/clear() functions to be called by the driver
once the provisioning/unprovisioning is done on its side. These helpers
are not to be called directly due to the async nature of provisioning.
Example:
$ devlink port # No ports are listed
$ devlink lc
pci/0000:01:00.0:
lc 1 state unprovisioned
supported_types:
16x100G
lc 2 state unprovisioned
supported_types:
16x100G
lc 3 state unprovisioned
supported_types:
16x100G
lc 4 state unprovisioned
supported_types:
16x100G
lc 5 state unprovisioned
supported_types:
16x100G
lc 6 state unprovisioned
supported_types:
16x100G
lc 7 state unprovisioned
supported_types:
16x100G
lc 8 state unprovisioned
supported_types:
16x100G
$ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
lc 8 state active type 16x100G
supported_types:
16x100G
$ devlink port
pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4
$ devlink lc set pci/0000:01:00.0 lc 8 notype
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 18 Apr 2022 06:42:25 +0000 (09:42 +0300)]
devlink: add support to create line card and expose to user
Extend the devlink API so the driver is going to be able to create and
destroy linecard instances. There can be multiple line cards per devlink
device. Expose this new type of object over devlink netlink API to the
userspace, with notifications.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 17 Apr 2022 18:34:32 +0000 (11:34 -0700)]
tcp: fix signed/unsigned comparison
Kernel test robot reported:
smatch warnings:
net/ipv4/tcp_input.c:5966 tcp_rcv_established() warn: unsigned 'reason' is never less than zero.
I actually had one packetdrill failing because of this bug,
and was about to send the fix :)
v2: Andreas Schwab also pointed out that @reason needs to be negated
before we reach tcp_drop_reason()
Fixes: 4b506af9c5b8 ("tcp: add two drop reasons for tcp_ack()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 17 Apr 2022 12:31:32 +0000 (13:31 +0100)]
Merge branch 'tcp-drop-reason-additions'
Eric Dumazet says:
====================
tcp: drop reason additions
Currently, TCP is either missing drop reasons,
or pretending that some useful packets are dropped.
This patch series makes "perf record -a -e skb:kfree_skb"
much more usable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:48 +0000 (17:10 -0700)]
tcp: add drop reason support to tcp_ofo_queue()
packets in OFO queue might be redundant, and dropped.
tcp_drop() is no longer needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:47 +0000 (17:10 -0700)]
tcp: add drop reasons to tcp_rcv_synsent_state_process()
Re-use existing reasons.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:46 +0000 (17:10 -0700)]
tcp: make tcp_rcv_synsent_state_process() drop monitor friend
1) A valid RST packet should be consumed, to not confuse drop monitor.
2) Same remark for packet validating cross syn setup,
even if we might ignore part of it.
3) When third packet of 3WHS is delayed, do not pretend
the SYNACK was dropped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:45 +0000 (17:10 -0700)]
tcp: add drop reason support to tcp_prune_ofo_queue()
Add one reason for packets dropped from OFO queue because
of memory pressure.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:44 +0000 (17:10 -0700)]
tcp: add two drop reasons for tcp_ack()
Add TCP_TOO_OLD_ACK and TCP_ACK_UNSENT_DATA drop
reasons so that tcp_rcv_established() can report
them.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:43 +0000 (17:10 -0700)]
tcp: add drop reasons to tcp_rcv_state_process()
Add basic support for drop reasons in tcp_rcv_state_process()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:42 +0000 (17:10 -0700)]
tcp: make tcp_rcv_state_process() drop monitor friendly
tcp_rcv_state_process() incorrectly drops packets
instead of consuming it, making drop monitor very noisy,
if not unusable.
Calling tcp_time_wait() or tcp_done() is part
of standard behavior, packets triggering these actions
were not dropped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:41 +0000 (17:10 -0700)]
tcp: add drop reason support to tcp_validate_incoming()
Creates four new drop reasons for the following cases:
1) packet being rejected by RFC 7323 PAWS check
2) packet being rejected by SEQUENCE check
3) Invalid RST packet
4) Invalid SYN packet
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:40 +0000 (17:10 -0700)]
tcp: get rid of rst_seq_match
Small cleanup in tcp_validate_incoming(), no need for rst_seq_match
setting and testing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 16 Apr 2022 00:10:39 +0000 (17:10 -0700)]
tcp: consume incoming skb leading to a reset
Whenever tcp_validate_incoming() handles a valid RST packet,
we should not pretend the packet was dropped.
Create a special section at the end of tcp_validate_incoming()
to handle this case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 17 Apr 2022 12:28:29 +0000 (13:28 +0100)]
Merge branch 'qca8k_preiv-shrink'
Ansuel Smith says:
====================
net: Reduce qca8k_priv space usage
These 6 patch is a first attempt at reducting qca8k_priv space.
The code changed a lot during times and we have many old logic
that can be replaced with new implementation
The first patch drop the tracking of MTU. We mimic what was done
for mtk and we change MTU only when CPU port is changed.
The second patch finally drop a piece of story of this driver.
The ar8xxx_port_status struct was used by the first implementation
of this driver to put all sort of status data for the port...
With the evolution of DSA all that stuff got dropped till only
the enabled state was the only part of the that struct.
Since it's overkill to keep an array of int, we convert the variable
to a simple u8 where we store the status of each port. This is needed
to don't reanable ports on system resume.
The third patch is a preparation for patch 4. As Vladimir explained
in another patch, we waste a tons of space by keeping a duplicate of
the switch dsa ops in qca8k_priv. The only reason for this is to
dynamically set the correct mdiobus configuration (a legacy dsa one,
or a custom dedicated one)
To solve this problem, we just drop the phy_read/phy_write and we
declare a custom mdiobus in any case.
This way we can use a static dsa switch ops struct and we can drop it
from qca8k_priv
Patch 4 drop the duplicated dsa_switch_ops.
Patch 5 is a fixup for mdio read error.
Patch 6 is an effort to standardize how bus name are done.
This series is just a start of more cleanup.
The idea is to move this driver to the qca dir and split common code
from specific code. Also the mgmt eth code still requires some love
and can totally be optimized by recycling the same skb over time.
Also while working on the MTU it was notice some problem with
the stmmac driver and with the reloading phase that cause all
sort of problems with qca8k.
I'm sending this here just to try to keep small series instead of
proposing monster series hard to review.
v2:
- Rework MTU patch
v3:
- Drop unrealated changes from patch 3
- Add fixup patch for mdio read
- Unify bus name for legacy and OF mdio bus
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Fri, 15 Apr 2022 23:30:17 +0000 (01:30 +0200)]
net: dsa: qca8k: unify bus id naming with legacy and OF mdio bus
Add support for multiple switch with OF mdio bus declaration.
Unify the bus id naming and use the same logic for both legacy and OF
mdio bus.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Fri, 15 Apr 2022 23:30:16 +0000 (01:30 +0200)]
net: dsa: qca8k: correctly handle mdio read error
Restore original way to handle mdio read error by returning 0xffff.
This was wrongly changed when the internal_mdio_read was introduced,
now that both legacy and internal use the same function, make sure that
they behave the same way.
Fixes: ce062a0adbfe ("net: dsa: qca8k: fix kernel panic with legacy mdio mapping")
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Fri, 15 Apr 2022 23:30:15 +0000 (01:30 +0200)]
net: dsa: qca8k: drop dsa_switch_ops from qca8k_priv
Now that dsa_switch_ops is not switch specific anymore, we can drop it
from qca8k_priv and use the static ops directly for the dsa_switch
pointer.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Fri, 15 Apr 2022 23:30:14 +0000 (01:30 +0200)]
net: dsa: qca8k: rework and simplify mdiobus logic
In an attempt to reduce qca8k_priv space, rework and simplify mdiobus
logic.
We now declare a mdiobus instead of relying on DSA phy_read/write even
if a mdio node is not present. This is all to make the qca8k ops static
and not switch specific. With a legacy implementation where port doesn't
have a phy map declared in the dts with a mdio node, we declare a
'qca8k-legacy' mdiobus. The conversion logic is used as legacy read and
write ops are used instead of the internal one.
Also drop the legacy_phy_port_mapping as we now declare mdiobus with ops
that already address the workaround.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Fri, 15 Apr 2022 23:30:13 +0000 (01:30 +0200)]
net: dsa: qca8k: drop port_sts from qca8k_priv
Port_sts is a thing of the past for this driver. It was something
present on the initial implementation of this driver and parts of the
original struct were dropped over time. Using an array of int to store if
a port is enabled or not to handle PM operation seems overkill. Switch
and use a simple u8 to store the port status where each bit correspond
to a port. (bit is set port is enabled, bit is not set, port is disabled)
Also add some comments to better describe why we need to track port
status.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ansuel Smith [Fri, 15 Apr 2022 23:30:12 +0000 (01:30 +0200)]
net: dsa: qca8k: drop MTU tracking from qca8k_priv
DSA set the CPU port based on the largest MTU of all the slave ports.
Based on this we can drop the MTU array from qca8k_priv and set the
port_change_mtu logic on DSA changing MTU of the CPU port as the switch
have a global MTU settingfor each port.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ajith S [Fri, 15 Apr 2022 08:34:02 +0000 (08:34 +0000)]
net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131
Add a new neighbour cache entry in STALE state for routers on receiving
an unsolicited (gratuitous) neighbour advertisement with
target link-layer-address option specified.
This is similar to the arp_accept configuration for IPv4.
A new sysctl endpoint is created to turn on this behaviour:
/proc/sys/net/ipv6/conf/interface/accept_unsolicited_na.
Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 13 Apr 2022 20:56:53 +0000 (13:56 -0700)]
ipv6: fix NULL deref in ip6_rcv_core()
idev can be NULL, as the surrounding code suggests.
Fixes: 4daf841a2ef3 ("net: ipv6: add skb drop reasons to ip6_rcv_core()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Cc: Jiang Biao <benbjiang@tencent.com>
Cc: Hao Peng <flyingpeng@tencent.com>
Link: https://lore.kernel.org/r/20220413205653.1178458-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 14 Apr 2022 01:10:04 +0000 (18:10 -0700)]
net_sched: make qdisc_reset() smaller
For some unknown reason qdisc_reset() is using
a convoluted way of freeing two lists of skbs.
Use __skb_queue_purge() instead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220414011004.2378350-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Thu, 14 Apr 2022 07:52:42 +0000 (10:52 +0300)]
octeon_ep: Remove custom driver version
In review comment [1] was pointed that new code is not supposed
to set driver version and should rely on kernel version instead.
As an outcome of that comment all the dance around setting such
driver version to FW should be removed too, because in upstream
kernel whole driver will have same version so read/write from/to
FW will give same result.
[1] https://lore.kernel.org/all/YladGTmon1x3dfxI@unreal
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/5d76f3116ee795071ec044eabb815d6c2bdc7dbd.1649922731.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 15 Apr 2022 21:02:09 +0000 (14:02 -0700)]
Merge branch 'ibmvnic-use-a-set-of-ltbs-per-pool'
Sukadev Bhattiprolu says:
====================
ibmvnic: Use a set of LTBs per pool
ibmvnic uses a single large long term buffer (LTB) per rx or tx
pool (queue). This has two limitations.
First, if we need to free/allocate an LTB (eg during a reset), under
low memory conditions, the allocation can fail.
Second, the kernel limits the size of single LTB (DMA buffer) to 16MB
(based on MAX_ORDER). With jumbo frames (mtu = 9000) we can only have
about 1763 buffers per LTB (16MB / 9588 bytes per frame) even though
the max supported buffers is 4096. (The 9588 instead of 9088 comes from
IBMVNIC_BUFFER_HLEN.)
To overcome these limitations, allow creating a set of LTBs per queue.
====================
Link: https://lore.kernel.org/r/20220413171026.1264294-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:26 +0000 (13:10 -0400)]
ibmvnic: Allow multiple ltbs in txpool ltb_set
Allow multiple LTBs in the txpool's ltb_set. i.e rather than using
a single large LTB, use several smaller LTBs.
The first n-1 LTBs will all be of the same size. The size of the last
LTB in the set depends on the number of buffers and buffer (mtu) size.
This strategy hopefully allows more reuse of the initial LTBs and also
reduces the chances of an allocation failure (of the large LTB) when
system is low in memory.
Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:25 +0000 (13:10 -0400)]
ibmvnic: Allow multiple ltbs in rxpool ltb_set
Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all
be of the same size. The size of the last LTB in the set depends on the
number of buffers and buffer (mtu) size.
Having a set of LTBs per pool provides a couple of benefits.
First, with the current value of IBMVNIC_MAX_LTB_SIZE of 16MB, with an
MTU of 9000, we need a LTB (DMA buffer) of that size but the allocation
can fail in low memory conditions. With a set of LTBs per pool, we can
use several smaller (8MB) LTBs and hopefully have fewer allocation
failures. (See also comments in ibmvnic.h on the trade-off with smaller
LTBs)
Second since the kernel limits the size of the DMA buffer to 16MB (based
on MAX_ORDER), with a single DMA buffer per pool, the pool is also limited
to 16MB. This in turn limits the number of buffers per pool to 1763 when
MTU is 9000. With a set of LTBs per pool, we can have upto the max of 4096
buffers per pool even when MTU is 9000.
Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:24 +0000 (13:10 -0400)]
ibmvnic: convert rxpool ltb to a set of ltbs
Define and use interfaces that treat the long term buffer (LTB) of an
rxpool as a set of LTBs rather than a single LTB. The set only has one
LTB for now.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:23 +0000 (13:10 -0400)]
ibmvnic: define map_txpool_buf_to_ltb()
Define a helper to map a given txpool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per txpool
so the mapping is trivial. When we add support for multiple LTBs per
txpool, this helper will be more useful.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:22 +0000 (13:10 -0400)]
ibmvnic: define map_rxpool_buf_to_ltb()
Define a helper to map a given rx pool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per pool so
the mapping is trivial. When we add support for multiple LTBs per pool,
this helper will be more useful.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sukadev Bhattiprolu [Wed, 13 Apr 2022 17:10:21 +0000 (13:10 -0400)]
ibmvnic: rename local variable index to bufidx
The local variable 'index' is heavily used in some functions and is
confusing with the presence of other "index" fields like pool->index,
->consumer_index, etc. Rename it to bufidx to better reflect that its
the index of a buffer in the pool.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 15 Apr 2022 18:41:56 +0000 (11:41 -0700)]
Merge branch 'net-ethool-add-support-to-get-set-tx-push-by-ethtool-g-g'
Jie Wang says:
====================
net: ethool: add support to get/set tx push by ethtool -G/g
These three patches add tx push in ring params and adapt the set and get APIs
of ring params.
====================
Link: https://lore.kernel.org/r/20220412020121.14140-1-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jie Wang [Tue, 12 Apr 2022 02:01:21 +0000 (10:01 +0800)]
net: hns3: add tx push support in hns3 ring param process
This patch adds tx push param to hns3 ring param and adapts the set and get
API of ring params. So users can set it by cmd ethtool -G and get it by cmd
ethtool -g.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jie Wang [Tue, 12 Apr 2022 02:01:20 +0000 (10:01 +0800)]
net: ethtool: move checks before rtnl_lock() in ethnl_set_rings
Currently these two checks in ethnl_set_rings are added after rtnl_lock()
which will do useless works if the request is invalid.
So this patch moves these checks before the rtnl_lock() to avoid these
costs.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jie Wang [Tue, 12 Apr 2022 02:01:19 +0000 (10:01 +0800)]
net: ethtool: extend ringparam set/get APIs for tx_push
Currently tx push is a standard driver feature which controls use of a fast
path descriptor push. So this patch extends the ringparam APIs and data
structures to support set/get tx push by ethtool -G/g.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yang Yingliang [Fri, 15 Apr 2022 02:39:57 +0000 (10:39 +0800)]
octeon_ep: fix error return code in octep_probe()
If register_netdev() fails , it should return error
code in octep_probe().
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Apr 2022 10:46:29 +0000 (11:46 +0100)]
Merge branch 'emaclite-cleanups'
Radhey Shyam Pandey says:
====================
net: emaclite: Trivial code cleanup
This patchset fix coding style issues, remove BUFFER_ALIGN
macro and also update copyright text.
I have to resend as earlier series didn't reach mailing list
due to some configuration issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shravya Kumbham [Thu, 14 Apr 2022 12:37:11 +0000 (18:07 +0530)]
net: emaclite: Remove custom BUFFER_ALIGN macro
BUFFER_ALIGN macro is used to calculate the number of bytes
required for the next alignment. Instead of this, we can directly
use the skb_reserve(skb, NET_IP_ALIGN) to make the protocol header
buffer aligned on at least a 4-byte boundary, where the NET_IP_ALIGN
is by default defined as 2. So removing the BUFFER_ALIGN and its
related defines which it can be done by the skb_reserve() itself.
Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Simek [Thu, 14 Apr 2022 12:37:10 +0000 (18:07 +0530)]
net: emaclite: Update copyright text to correct format
Based on recommended guidance Copyright term should be also present in
front of (c). That's why aligned driver to match this pattern.
It helps automated tools with source code scanning.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Radhey Shyam Pandey [Thu, 14 Apr 2022 12:37:09 +0000 (18:07 +0530)]
net: emaclite: Fix coding style
Make coding style changes to fix checkpatch script warnings.
There is no functional change. Fixes below check and warnings-
CHECK: Blank lines aren't necessary after an open brace '{'
CHECK: spinlock_t definition without comment
CHECK: Please don't use multiple blank lines
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
CHECK: braces {} should be used on all arms of this statement
CHECK: Unbalanced braces around else statement
CHECK: Alignment should match open parenthesis
WARNING: Missing a blank line after declarations
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minghao Chi [Thu, 14 Apr 2022 09:08:00 +0000 (09:08 +0000)]
net: ethernet: ti: davinci_emac: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Using pm_runtime_resume_and_get() to replace pm_runtime_get_sync and
pm_runtime_put_noidle. This change is just to simplify the code, no
actual functional changes.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 14 Apr 2022 08:08:34 +0000 (09:08 +0100)]
octeon_ep: Fix spelling mistake "inerrupts" -> "interrupts"
There is a spelling mistake in a dev_info message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Apr 2022 10:06:13 +0000 (11:06 +0100)]
Merge branch 'mlxsw-line-card-prep'
Ido Schimmel says:
====================
mlxsw: Preparations for line cards support
Currently, mlxsw registers thermal zones as well as hwmon entries for
objects such as transceiver modules and gearboxes. In upcoming modular
systems, these objects are no longer found on the main board (i.e., slot
0), but on plug-able line cards. This patchset prepares mlxsw for such
systems in terms of hwmon, thermal and cable access support.
Patches #1-#3 gradually prepare mlxsw for transceiver modules access
support for line cards by splitting some of the internal structures and
some APIs.
Patches #4-#5 gradually prepare mlxsw for hwmon support for line cards
by splitting some of the internal structures and augmenting them with a
slot index.
Patches #6-#7 do the same for thermal zones.
Patch #8 selects cooling device for binding to a thermal zone by exact
name match to prevent binding to non-relevant devices.
Patch #9 replaces internal define for thermal zone name length with a
common define.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:33 +0000 (18:17 +0300)]
mlxsw: core_thermal: Use common define for thermal zone name length
Replace internal define 'MLXSW_THERMAL_ZONE_MAX_NAME' by common
'THERMAL_NAME_LENGTH'.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:32 +0000 (18:17 +0300)]
mlxsw: core_thermal: Use exact name of cooling devices for binding
Modular system supports additional cooling devices "mlxreg_fan1",
"mlxreg_fan2", etcetera. Thermal zones in "mlxsw" driver should be
bound to the same device as before called "mlxreg_fan". Used exact
match for cooling device name to avoid binding to new additional
cooling devices.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:31 +0000 (18:17 +0300)]
mlxsw: core_thermal: Add line card id prefix to line card thermal zone name
Add prefix "lc#n" to thermal zones associated with the thermal objects
found on line cards.
For example thermal zone for module #9 located at line card #7 will
have type:
mlxsw-lc7-module9.
And thermal zone for gearbox #3 located at line card #5 will have type:
mlxsw-lc5-gearbox3.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:30 +0000 (18:17 +0300)]
mlxsw: core_thermal: Extend internal structures to support multi thermal areas
Introduce intermediate level for thermal zones areas.
Currently all thermal zones are associated with thermal objects located
within the main board. Such objects are created during driver
initialization and removed during driver de-initialization.
For line cards in modular system the thermal zones are to be associated
with the specific line card. They should be created whenever new line
card is available (inserted, validated, powered and enabled) and
removed, when line card is getting unavailable.
The thermal objects found on the line card #n are accessed by setting
slot index to #n, while for access to objects found on the main board
slot index should be set to default value zero.
Each thermal area contains the set of thermal zones associated with
particular slot index.
Thus introduction of thermal zone areas allows to use the same APIs for
the main board and line cards, by adding slot index argument.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:29 +0000 (18:17 +0300)]
mlxsw: core_hwmon: Introduce slot parameter in hwmon interfaces
Add 'slot' parameter to 'mlxsw_hwmon_dev' structure. Use this parameter
in mlxsw_reg_mtmp_pack(), mlxsw_reg_mtbr_pack(), mlxsw_reg_mgpir_pack()
and mlxsw_reg_mtmp_slot_index_set() routines.
For main board it'll always be zero, for line cards it'll be set to
the physical slot number in modular systems.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:28 +0000 (18:17 +0300)]
mlxsw: core_hwmon: Extend internal structures to support multi hwmon objects
Currently, mlxsw supports a single hwmon device and registers it with
attributes corresponding to the various objects found on the main
board such as fans and gearboxes.
Line cards can have the same objects, but unlike the main board they
can be added and removed while the system is running. The various
hwmon objects found on these line cards should be created when the
line card becomes available and destroyed when the line card becomes
unavailable.
The above can be achieved by representing each line card as a
different hwmon device and registering / unregistering it when the
line card becomes available / unavailable.
Prepare for multi hwmon device support by splitting
'struct mlxsw_hwmon' into 'struct mlxsw_hwmon' and
'struct mlxsw_hwmon_dev'. The first will hold information relevant to
all hwmon devices, whereas the second will hold per-hwmon device
information.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:27 +0000 (18:17 +0300)]
mlxsw: core: Move port module events enablement to a separate function
Use a separate function for enablement of port module events such
plug/unplug and temperature threshold crossing. The motivation is to
reuse the function for line cards.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:26 +0000 (18:17 +0300)]
mlxsw: core: Extend port module data structures for line cards
The port module core is tasked with module operations such as setting
power mode policy and reset. The per-module information is currently
stored in one large array suited for non-modular systems where only the
main board is present (i.e., slot index 0).
As a preparation for line cards support, allocate a per line card array
according to the queried number of slots in the system. For each line
card, allocate a module array according to the queried maximum number of
modules per-slot.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 13 Apr 2022 15:17:25 +0000 (18:17 +0300)]
mlxsw: core: Extend interfaces for cable info access with slot argument
Extend all cable info APIs with 'slot_index' argument.
For main board, slot will always be set to zero and these APIs will work
as before. If reading cable information is required from cages located
on line cards, slot should be set to the physical slot number, where
line card is located in modular systems.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minghao Chi [Wed, 13 Apr 2022 09:38:36 +0000 (09:38 +0000)]
net: ethernet: ti: cpsw_priv: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Using pm_runtime_resume_and_get is more appropriate
for simplifing code
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minghao Chi [Wed, 13 Apr 2022 09:38:01 +0000 (09:38 +0000)]
net: stmmac: stmmac_main: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Using pm_runtime_resume_and_get is more appropriate
for simplifing code
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minghao Chi [Wed, 13 Apr 2022 09:35:29 +0000 (09:35 +0000)]
net: ethernet: ti: cpsw_new: use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
Using pm_runtime_resume_and_get is more appropriate
for simplifing code
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 13 Apr 2022 08:44:40 +0000 (10:44 +0200)]
geneve: avoid indirect calls in GRO path, when possible
In the most common setups, the geneve tunnels use an inner
ethernet encapsulation. In the GRO path, when such condition is
true, we can call directly the relevant GRO helper and avoid
a few indirect calls.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Apr 2022 09:43:48 +0000 (10:43 +0100)]
Merge branch 'mneta-page_pool_get_stats'
Lorenzo Bianconi says:
====================
net: mvneta: add support for page_pool_get_stats
Introduce page_pool stats ethtool APIs in order to avoid driver duplicated
code.
Changes since v4:
- rebase on top of net-next
Changes since v3:
- get rid of wrong for loop in page_pool_ethtool_stats_get()
- add API stubs when page_pool_stats are not compiled in
Changes since v2:
- remove enum list of page_pool stats in page_pool.h
- remove leftover change in mvneta.c for ethtool_stats array allocation
Changes since v1:
- move stats accounting to page_pool code
- move stats string management to page_pool code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 12 Apr 2022 16:31:59 +0000 (18:31 +0200)]
net: mvneta: add support for page_pool_get_stats
Introduce support for the page_pool stats API into mvneta driver.
Report page_pool stats through ethtool.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 12 Apr 2022 16:31:58 +0000 (18:31 +0200)]
net: page_pool: introduce ethtool stats
Introduce page_pool APIs to report stats through ethtool and reduce
duplicated code in each driver.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 15 Apr 2022 07:26:00 +0000 (09:26 +0200)]
Merge git://git./linux/kernel/git/netdev/net
Linus Torvalds [Thu, 14 Apr 2022 18:58:19 +0000 (11:58 -0700)]
Merge tag 'net-5.18-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from wireless and netfilter.
Current release - regressions:
- smc: fix af_ops of child socket pointing to released memory
- wifi: ath9k: fix usage of driver-private space in tx_info
Previous releases - regressions:
- ipv6: fix panic when forwarding a pkt with no in6 dev
- sctp: use the correct skb for security_sctp_assoc_request
- smc: fix NULL pointer dereference in smc_pnet_find_ib()
- sched: fix initialization order when updating chain 0 head
- phy: don't defer probe forever if PHY IRQ provider is missing
- dsa: revert "net: dsa: setup master before ports"
- dsa: felix: fix tagging protocol changes with multiple CPU ports
- eth: ice:
- fix use-after-free when freeing @rx_cpu_rmap
- revert "iavf: fix deadlock occurrence during resetting VF
interface"
- eth: lan966x: stop processing the MAC entry is port is wrong
Previous releases - always broken:
- sched:
- flower: fix parsing of ethertype following VLAN header
- taprio: check if socket flags are valid
- nfc: add flush_workqueue to prevent uaf
- veth: ensure eth header is in skb's linear part
- eth: stmmac: fix altr_tse_pcs function when using a fixed-link
- eth: macb: restart tx only if queue pointer is lagging
- eth: macvlan: fix leaking skb in source mode with nodst option"
* tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
net: dsa: felix: fix tagging protocol changes with multiple CPU ports
tun: annotate access to queue->trans_start
nfc: nci: add flush_workqueue to prevent uaf
net: dsa: realtek: don't parse compatible string for RTL8366S
net: dsa: realtek: fix Kconfig to assure consistent driver linkage
net: ftgmac100: access hardware register after clock ready
Revert "net: dsa: setup master before ports"
macvlan: Fix leaking skb in source mode with nodst option
netfilter: nf_tables: nft_parse_register can return a negative value
net: lan966x: Stop processing the MAC entry is port is wrong.
net: lan966x: Fix when a port's upper is changed.
net: lan966x: Fix IGMP snooping when frames have vlan tag
net: lan966x: Update lan966x_ptp_get_nominal_value
sctp: Initialize daddr on peeled off socket
net/smc: Fix af_ops of child socket pointing to released memory
net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
net/smc: use memcpy instead of snprintf to avoid out of bounds read
net: macb: Restart tx only if queue pointer is lagging
...
Linus Torvalds [Thu, 14 Apr 2022 18:08:12 +0000 (11:08 -0700)]
Merge tag 'sound-5.18-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became an unexpectedly large pull request due to various
regression fixes in the previous kernels.
The majority of fixes are a series of patches to address the
regression at probe errors in devres'ed drivers, while there are yet
more fixes for the x86 SG allocations and for USB-audio buffer
management. In addition, a few HD-audio quirks and other small fixes
are found"
* tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
ALSA: usb-audio: Limit max buffer and period sizes per time
ALSA: memalloc: Add fallback SG-buffer allocations for x86
ALSA: nm256: Don't call card private_free at probe error path
ALSA: mtpav: Don't call card private_free at probe error path
ALSA: rme9652: Fix the missing snd_card_free() call at probe error
ALSA: hdspm: Fix the missing snd_card_free() call at probe error
ALSA: hdsp: Fix the missing snd_card_free() call at probe error
ALSA: oxygen: Fix the missing snd_card_free() call at probe error
ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
ALSA: cmipci: Fix the missing snd_card_free() call at probe error
ALSA: aw2: Fix the missing snd_card_free() call at probe error
ALSA: als300: Fix the missing snd_card_free() call at probe error
ALSA: lola: Fix the missing snd_card_free() call at probe error
ALSA: bt87x: Fix the missing snd_card_free() call at probe error
ALSA: sis7019: Fix the missing error handling
ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
ALSA: via82xx: Fix the missing snd_card_free() call at probe error
ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
ALSA: rme96: Fix the missing snd_card_free() call at probe error
ALSA: rme32: Fix the missing snd_card_free() call at probe error
...
Linus Torvalds [Thu, 14 Apr 2022 17:58:27 +0000 (10:58 -0700)]
Merge tag 'for-5.18-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more code and warning fixes.
There's one feature ioctl removal patch slated for 5.18 that did not
make it to the main pull request. It's just a one-liner and the ioctl
has a v2 that's in use for a long time, no point to postpone it to
5.19.
Late update:
- remove balance v1 ioctl, superseded by v2 in 2012
Fixes:
- add back cgroup attribution for compressed writes
- add super block write start/end annotations to asynchronous balance
- fix root reference count on an error handling path
- in zoned mode, activate zone at the chunk allocation time to avoid
ENOSPC due to timing issues
- fix delayed allocation accounting for direct IO
Warning fixes:
- simplify assertion condition in zoned check
- remove an unused variable"
* tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix btrfs_submit_compressed_write cgroup attribution
btrfs: fix root ref counts in error handling in btrfs_get_root_ref
btrfs: zoned: activate block group only for extent allocation
btrfs: return allocated block group from do_chunk_alloc()
btrfs: mark resumed async balance as writing
btrfs: remove support of balance v1 ioctl
btrfs: release correct delalloc amount in direct IO write path
btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
Linus Torvalds [Thu, 14 Apr 2022 17:51:20 +0000 (10:51 -0700)]
Merge tag 'fscache-fixes-
20220413' of git://git./linux/kernel/git/dhowells/linux-fs
Pull fscache fixes from David Howells:
"Here's a collection of fscache and cachefiles fixes and misc small
cleanups. The two main fixes are:
- Add a missing unmark of the inode in-use mark in an error path.
- Fix a KASAN slab-out-of-bounds error when setting the xattr on a
cachefiles volume due to the wrong length being given to memcpy().
In addition, there's the removal of an unused parameter, removal of an
unused Kconfig option, conditionalising a bit of procfs-related stuff
and some doc fixes"
* tag 'fscache-fixes-
20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: remove FSCACHE_OLD_API Kconfig option
fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
fscache: Remove the cookie parameter from fscache_clear_page_bits()
docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
cachefiles: unmark inode in use in error path
Paolo Abeni [Thu, 14 Apr 2022 13:08:16 +0000 (15:08 +0200)]
Merge branch 'rndis_host-handle-bogus-mac-addresses-in-zte-rndis-devices'
Lech Perczak says:
====================
rndis_host: handle bogus MAC addresses in ZTE RNDIS devices
When porting support of ZTE MF286R to OpenWrt [1], it was discovered,
that its built-in LTE modem fails to adjust its target MAC address,
when a random MAC address is assigned to the interface, due to detection of
"locally-administered address" bit. This leads to dropping of ingress
trafficat the host. The modem uses RNDIS as its primary interface,
with some variants exposing both of them simultaneously.
Then it was discovered, that cdc_ether driver contains a fixup for that
exact issue, also appearing on CDC ECM interfaces.
I discussed how to proceed with that with Bjørn Mork at OpenWrt forum [3],
with the first approach would be to trust the locally-administered MAC
again, and add a quirk for the problematic ZTE devices, as suggested by
Kristian Evensen. before [4], but reusing the fixup from cdc_ether looks
like a safer and more generic solution.
Finally, according to Bjørn's suggestion. limit the scope of bogus MAC
addressdetection to ZTE devices, the same way as it is done in cdc_ether,
as this trait wasn't really observed outside of ZTE devices.
Do that for both flavours of RNDIS devices, with interface classes
02/02/ff and e0/01/03, as both types are reported by different modems.
[1] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=
7ac8da00609f42b8aba74b7efc6b0d055b7cef3e
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=
bfe9b9d2df669a57a95d641ed46eb018e204c6ce
[3] https://forum.openwrt.org/t/problem-with-modem-in-zte-mf286r/120988
[4] https://lore.kernel.org/all/CAKfDRXhDp3heiD75Lat7cr1JmY-kaJ-MS0tt7QXX=s8RFjbpUQ@mail.gmail.com/T/
Cc: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
v3: Fixed wrong identifier commit description and whitespace in patch 2.
v2: ensure that MAC fixup is applied to all Ethernet frames in RNDIS
batch, by introducing a driver flag, and integrating the fixup inside
rndis_rx_fixup().
====================
Link: https://lore.kernel.org/r/20220413014416.2306843-1-lech.perczak@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lech Perczak [Wed, 13 Apr 2022 01:44:16 +0000 (03:44 +0200)]
rndis_host: limit scope of bogus MAC address detection to ZTE devices
Reporting of bogus MAC addresses and ignoring configuration of new
destination address wasn't observed outside of a range of ZTE devices,
among which this seems to be the common bug. Align rndis_host driver
with implementation found in cdc_ether, which also limits this workaround
to ZTE devices.
Suggested-by: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lech Perczak [Wed, 13 Apr 2022 01:44:15 +0000 (03:44 +0200)]
rndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether
Certain ZTE modems, namely: MF823. MF831, MF910, built-in modem from
MF286R, expose both CDC-ECM and RNDIS network interfaces.
They have a trait of ignoring the locally-administered MAC address
configured on the interface both in CDC-ECM and RNDIS part,
and this leads to dropping of incoming traffic by the host.
However, the workaround was only present in CDC-ECM, and MF286R
explicitly requires it in RNDIS mode.
Re-use the workaround in rndis_host as well, to fix operation of MF286R
module, some versions of which expose only the RNDIS interface. Do so by
introducing new flag, RNDIS_DRIVER_DATA_DST_MAC_FIXUP, and testing for it
in rndis_rx_fixup. This is required, as RNDIS uses frame batching, and all
of the packets inside the batch need the fixup. This might introduce a
performance penalty, because test is done for every returned Ethernet
frame.
Apply the workaround to both "flavors" of RNDIS interfaces, as older ZTE
modems, like MF823 found in the wild, report the USB_CLASS_COMM class
interfaces, while MF286R reports USB_CLASS_WIRELESS_CONTROLLER.
Suggested-by: Bjørn Mork <bjorn@mork.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lech Perczak [Wed, 13 Apr 2022 01:44:14 +0000 (03:44 +0200)]
cdc_ether: export usbnet_cdc_zte_rx_fixup
Commit
bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduces a workaround for certain ZTE modems reporting invalid MAC
addresses over CDC-ECM.
The same issue was present on their RNDIS interface,which was fixed in
commit
a5a18bdf7453 ("rndis_host: Set valid random MAC on buggy devices").
However, internal modem of ZTE MF286R router, on its RNDIS interface, also
exhibits a second issue fixed already in CDC-ECM, of the device not
respecting configured random MAC address. In order to share the fixup for
this with rndis_host driver, export the workaround function, which will
be re-used in the following commit in rndis_host.
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Bjørn Mork <bjorn@mork.no>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jeremy Linton [Tue, 12 Apr 2022 21:04:20 +0000 (16:04 -0500)]
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
It turns out after digging deeper into this bug, that it was being
triggered by GCC12 failing to call the bcmgenet_enable_dma()
routine. Given that a gcc12 fix has been merged [1] and the genet
driver now works properly when built with gcc12, this commit should
be reverted.
[1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=
aabb9a261ef060cf24fd626713f1d7d9df81aa57
Fixes: 8d3ea3d402db ("net: bcmgenet: Use stronger register read/writes to assure ordering")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220412210420.1129430-1-jeremy.linton@arm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Tue, 12 Apr 2022 20:25:06 +0000 (22:25 +0200)]
rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns
size of 0, which is supposed to be an indication that the corresponding
attribute should not be emitted. However, instead, the current code
reserves a 0-byte attribute.
The reason this does not show up as a citation on a kasan kernel is that
netdev_offload_xstats_get(), which is supposed to fill in the data, never
ends up getting called, because rtnl_offload_xstats_get_stats() notices
that the stats are not actually used and skips the call.
Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a
response, confusing the userspace.
Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill().
Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean [Tue, 12 Apr 2022 17:22:09 +0000 (20:22 +0300)]
net: dsa: felix: fix tagging protocol changes with multiple CPU ports
When the device tree has 2 CPU ports defined, a single one is active
(has any dp->cpu_dp pointers point to it). Yet the second one is still a
CPU port, and DSA still calls ->change_tag_protocol on it.
On the NXP LS1028A, the CPU ports are ports 4 and 5. Port 4 is the
active CPU port and port 5 is inactive.
After the following commands:
# Initial setting
cat /sys/class/net/eno2/dsa/tagging
ocelot
echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
echo ocelot > /sys/class/net/eno2/dsa/tagging
traffic is now broken, because the driver has moved the NPI port from
port 4 to port 5, unbeknown to DSA.
The problem can be avoided by detecting that the second CPU port is
unused, and not doing anything for it. Further rework will be needed
when proper support for multiple CPU ports is added.
Treat this as a bug and prepare current kernels to work in single-CPU
mode with multiple-CPU DT blobs.
Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220412172209.2531865-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Antoine Tenart [Tue, 12 Apr 2022 13:58:52 +0000 (15:58 +0200)]
tun: annotate access to queue->trans_start
Commit
5337824f4dc4 ("net: annotate accesses to queue->trans_start")
introduced a new helper, txq_trans_cond_update, to update
queue->trans_start using WRITE_ONCE. One snippet in drivers/net/tun.c
was missed, as it was introduced roughly at the same time.
Fixes: 5337824f4dc4 ("net: annotate accesses to queue->trans_start")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220412135852.466386-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lin Ma [Tue, 12 Apr 2022 16:04:30 +0000 (00:04 +0800)]
nfc: nci: add flush_workqueue to prevent uaf
Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the used delayed mechanism (timer and workqueue).
The race can be demonstrated below:
Thread-1 Thread-2
| nci_dev_up()
| nci_open_device()
| __nci_request(nci_reset_req)
| nci_send_cmd
| queue_work(cmd_work)
nci_unregister_device() |
nci_close_device() | ...
del_timer_sync(cmd_timer)[1] |
... | Worker
nci_free_device() | nci_cmd_work()
kfree(ndev)[3] | mod_timer(cmd_timer)[2]
In short, the cleanup routine thought that the cmd_timer has already
been detached by [1] but the mod_timer can re-attach the timer [2], even
it is already released [3], resulting in UAF.
This UAF is easy to trigger, crash trace by POC is like below
[ 66.703713] ==================================================================
[ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[ 66.703974] Write of size 8 at addr
ffff888009fb7058 by task kworker/u4:1/33
[ 66.703974]
[ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[ 66.703974] Call Trace:
[ 66.703974] <TASK>
[ 66.703974] dump_stack_lvl+0x57/0x7d
[ 66.703974] print_report.cold+0x5e/0x5db
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] kasan_report+0xbe/0x1c0
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] enqueue_timer+0x448/0x490
[ 66.703974] __mod_timer+0x5e6/0xb80
[ 66.703974] ? mark_held_locks+0x9e/0xe0
[ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 66.703974] ? queue_work_on+0x61/0x80
[ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130
[ 66.703974] process_one_work+0x8bb/0x1510
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230
[ 66.703974] ? rwlock_bug.part.0+0x90/0x90
[ 66.703974] ? _raw_spin_lock_irq+0x41/0x50
[ 66.703974] worker_thread+0x575/0x1190
[ 66.703974] ? process_one_work+0x1510/0x1510
[ 66.703974] kthread+0x2a0/0x340
[ 66.703974] ? kthread_complete_and_exit+0x20/0x20
[ 66.703974] ret_from_fork+0x22/0x30
[ 66.703974] </TASK>
[ 66.703974]
[ 66.703974] Allocated by task 267:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] __kasan_kmalloc+0x81/0xa0
[ 66.703974] nci_allocate_device+0xd3/0x390
[ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0
[ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd
[ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0
[ 66.703974] tty_ioctl+0x764/0x1310
[ 66.703974] __x64_sys_ioctl+0x122/0x190
[ 66.703974] do_syscall_64+0x3b/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 66.703974]
[ 66.703974] Freed by task 406:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] kasan_set_track+0x21/0x30
[ 66.703974] kasan_set_free_info+0x20/0x30
[ 66.703974] __kasan_slab_free+0x108/0x170
[ 66.703974] kfree+0xb0/0x330
[ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0
[ 66.703974] nci_uart_tty_close+0xdf/0x180
[ 66.703974] tty_ldisc_kill+0x73/0x110
[ 66.703974] tty_ldisc_hangup+0x281/0x5b0
[ 66.703974] __tty_hangup.part.0+0x431/0x890
[ 66.703974] tty_release+0x3a8/0xc80
[ 66.703974] __fput+0x1f0/0x8c0
[ 66.703974] task_work_run+0xc9/0x170
[ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0
[ 66.703974] syscall_exit_to_user_mode+0x19/0x50
[ 66.703974] do_syscall_64+0x48/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
To fix the UAF, this patch adds flush_workqueue() to ensure the
nci_cmd_work is finished before the following del_timer_sync.
This combination will promise the timer is actually detached.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alvin Šipraga [Tue, 12 Apr 2022 15:57:49 +0000 (17:57 +0200)]
net: dsa: realtek: don't parse compatible string for RTL8366S
This switch is not even supported, but if someone were to actually put
this compatible string "realtek,rtl8366s" in their device tree, they
would be greeted with a kernel panic because the probe function would
dereference NULL. So let's just remove it.
Link: https://lore.kernel.org/all/CACRpkdYdKZs0WExXc3=0yPNOwP+oOV60HRz7SRoGjZvYHaT=1g@mail.gmail.com/
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alvin Šipraga [Tue, 12 Apr 2022 15:55:27 +0000 (17:55 +0200)]
net: dsa: realtek: fix Kconfig to assure consistent driver linkage
The kernel test robot reported a build failure:
or1k-linux-ld: drivers/net/dsa/realtek/realtek-smi.o:(.rodata+0x16c): undefined reference to `rtl8366rb_variant'
... with the following build configuration:
CONFIG_NET_DSA_REALTEK=y
CONFIG_NET_DSA_REALTEK_SMI=y
CONFIG_NET_DSA_REALTEK_RTL8365MB=y
CONFIG_NET_DSA_REALTEK_RTL8366RB=m
The problem here is that the realtek-smi interface driver gets built-in,
while the rtl8366rb switch subdriver gets built as a module, hence the
symbol rtl8366rb_variant is not reachable when defining the OF device
table in the interface driver.
The Kconfig dependencies don't help in this scenario because they just
say that the subdriver(s) depend on at least one interface driver. In
fact, the subdrivers don't depend on the interface drivers at all, and
can even be built even in their absence. Somewhat strangely, the
interface drivers can also be built in the absence of any subdriver,
BUT, if a subdriver IS enabled, then it must be reachable according to
the linkage of the interface driver: effectively what the IS_REACHABLE()
macro achieves. If it is not reachable, the above kind of linker error
will be observed.
Rather than papering over the above build error by simply using
IS_REACHABLE(), we can do a little better and admit that it is actually
the interface drivers that have a dependency on the subdrivers. So this
patch does exactly that. Specifically, we ensure that:
1. The interface drivers' Kconfig symbols must have a value no greater
than the value of any subdriver Kconfig symbols.
2. The subdrivers should by default enable both interface drivers, since
most users probably want at least one of them; those interface
drivers can be explicitly disabled however.
What this doesn't do is prevent a user from building only a subdriver,
without any interface driver. To that end, add an additional line of
help in the menu to guide users in the right direction.
Link: https://lore.kernel.org/all/202204110757.XIafvVnj-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: aac94001067d ("net: dsa: realtek: add new mdio interface for drivers")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dylan Muller [Tue, 12 Apr 2022 15:26:00 +0000 (17:26 +0200)]
nfp: update nfp_X logging definitions
Previously it was not possible to determine which code path was responsible
for generating a certain message after a call to the nfp_X messaging
definitions for cases of duplicate strings. We therefore modify nfp_err,
nfp_warn, nfp_info, nfp_dbg and nfp_printk to print the corresponding file
and line number where the nfp_X definition is used.
Signed-off-by: Dylan Muller <dylan.muller@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Apr 2022 12:13:34 +0000 (13:13 +0100)]
Merge tag 'wireless-2022-04-13' of git://git./linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v5.18
First set of fixes for v5.18. Maintainers file updates, two
compilation warning fixes, one revert for ath11k and smaller fixes to
drivers and stack. All the usual stuff.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Apr 2022 12:09:57 +0000 (13:09 +0100)]
Merge branch 'ip-ingress-skb-reason'
Menglong Dong says:
====================
net: ip: add skb drop reasons to ip ingress
In the series "net: use kfree_skb_reason() for ip/udp packet receive",
skb drop reasons are added to the basic ingress path of IPv4. And in
the series "net: use kfree_skb_reason() for ip/neighbour", the egress
paths of IPv4 and IPv6 are handled. Related links:
https://lore.kernel.org/netdev/
20220205074739.543606-1-imagedong@tencent.com/
https://lore.kernel.org/netdev/
20220226041831.
2058437-1-imagedong@tencent.com/
Seems we still have a lot work to do with IP layer, including IPv6 basic
ingress path, IPv4/IPv6 forwarding, IPv6 exthdrs, fragment and defrag,
etc.
In this series, skb drop reasons are added to the basic ingress path of
IPv6 protocol and IPv4/IPv6 packet forwarding. Following functions, which
are used for IPv6 packet receiving are handled:
ip6_pkt_drop()
ip6_rcv_core()
ip6_protocol_deliver_rcu()
And following functions that used for IPv6 TLV parse are handled:
ip6_parse_tlv()
ipv6_hop_ra()
ipv6_hop_ioam()
ipv6_hop_jumbo()
ipv6_hop_calipso()
ipv6_dest_hao()
Besides, ip_forward() and ip6_forward(), which are used for IPv4/IPv6
forwarding, are also handled. And following new drop reasons are added:
/* host unreachable, corresponding to IPSTATS_MIB_INADDRERRORS */
SKB_DROP_REASON_IP_INADDRERRORS
/* network unreachable, corresponding to IPSTATS_MIB_INADDRERRORS */
SKB_DROP_REASON_IP_INNOROUTES
/* packet size is too big, corresponding to
* IPSTATS_MIB_INTOOBIGERRORS
*/
SKB_DROP_REASON_PKT_TOO_BIG
In order to simply the definition and assignment for
'enum skb_drop_reason', some helper functions are introduced in the 1th
patch. I'm not such if this is necessary, but it makes the code simpler.
For example, we can replace the code:
if (reason == SKB_DROP_REASON_NOT_SPECIFIED)
reason = SKB_DROP_REASON_IP_INHDR;
with:
SKB_DR_OR(reason, IP_INHDR);
In the 6th patch, the statistics for skb in ipv6_hop_jum() is removed,
as I think it is redundant. There are two call chains for
ipv6_hop_jumbo(). The first one is:
ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()
On this call chain, the drop statistics will be done in
ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
returns false.
The second call chain is:
ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()
And the drop statistics will also be done in ip6_rcv_core() with
'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.
Therefore, the statistics in ipv6_hop_jumbo() is redundant, which
means the drop is counted twice. The statistics in ipv6_hop_jumbo()
is almost the same as the outside, except the
'IPSTATS_MIB_INTRUNCATEDPKTS', which seems that we have to ignore it.
======================================================================
Here is a basic test for IPv6 forwarding packet drop that monitored by
'dropwatch' tool:
drop at: ip6_forward+0x81a/0xb70 (0xffffffff86c73f8a)
origin: software
input port ifindex: 7
timestamp: Wed Apr 13 11:51:06 2022
130010176 nsec
protocol: 0x86dd
length: 94
original length: 94
drop reason: IP_INADDRERRORS
The origin cause of this case is that IPv6 doesn't allow to forward the
packet with LOCAL-LINK saddr, and results the 'IP_INADDRERRORS' drop
reason.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 13 Apr 2022 08:16:00 +0000 (16:16 +0800)]
net: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu()
Replace kfree_skb() used in ip6_protocol_deliver_rcu() with
kfree_skb_reason().
No new reasons are added.
Some paths are ignored, as they are not common, such as encapsulation
on non-final protocol.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 13 Apr 2022 08:15:59 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to ip6_rcv_core()
Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason().
No new drop reasons are added.
Seems now we use 'SKB_DROP_REASON_IP_INHDR' for too many case during
ipv6 header parse or check, just like what 'IPSTATS_MIB_INHDRERRORS'
do. Will it be too general and hard to know what happened?
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 13 Apr 2022 08:15:58 +0000 (16:15 +0800)]
net: ipv6: add skb drop reasons to TLV parse
Replace kfree_skb() used in TLV encoded option header parsing with
kfree_skb_reason(). Following functions are involved:
ip6_parse_tlv()
ipv6_hop_ra()
ipv6_hop_ioam()
ipv6_hop_jumbo()
ipv6_hop_calipso()
ipv6_dest_hao()
Most skb drops during this process are regarded as 'InHdrErrors',
as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails,
which make we use 'SKB_DROP_REASON_IP_INHDR' correspondingly.
However, 'IP_INHDR' is a relatively general reason. Therefore, we
can use other reasons with higher priority in some cases. For example,
'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>