platform/kernel/linux-starfive.git
2 years agodevlink: Enable creation of the devlink-rate nodes from the driver
Michal Wilczynski [Tue, 15 Nov 2022 10:48:17 +0000 (11:48 +0100)]
devlink: Enable creation of the devlink-rate nodes from the driver

Intel 100G card internal firmware hierarchy for Hierarchicial QoS is very
rigid and can't be easily removed. This requires an ability to export
default hierarchy to allow user to modify it. Currently the driver is
only able to create the 'leaf' nodes, which usually represent the vport.
This is not enough for HQoS implemented in Intel hardware.

Introduce new function devl_rate_node_create() that allows for creation
of the devlink-rate nodes from the driver.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodevlink: Introduce new attribute 'tx_weight' to devlink-rate
Michal Wilczynski [Tue, 15 Nov 2022 10:48:16 +0000 (11:48 +0100)]
devlink: Introduce new attribute 'tx_weight' to devlink-rate

To fully utilize offload capabilities of Intel 100G card QoS capabilities
new attribute 'tx_weight' needs to be introduced. This attribute allows
for usage of Weighted Fair Queuing arbitration scheme among siblings.
This arbitration scheme can be used simultaneously with the strict
priority.

Introduce new attribute in devlink-rate that will allow for configuration
of Weighted Fair Queueing. New attribute is optional.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodevlink: Introduce new attribute 'tx_priority' to devlink-rate
Michal Wilczynski [Tue, 15 Nov 2022 10:48:15 +0000 (11:48 +0100)]
devlink: Introduce new attribute 'tx_priority' to devlink-rate

To fully utilize offload capabilities of Intel 100G card QoS capabilities
new attribute 'tx_priority' needs to be introduced. This attribute allows
for usage of strict priority arbiter among siblings. This arbitration
scheme attempts to schedule nodes based on their priority as long as the
nodes remain within their bandwidth limit.

Introduce new attribute in devlink-rate that will allow for configuration
of strict priority. New attribute is optional.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'autoload-dsa-tagging-driver-when-dynamically-changing-protocol'
Jakub Kicinski [Fri, 18 Nov 2022 05:16:46 +0000 (21:16 -0800)]
Merge branch 'autoload-dsa-tagging-driver-when-dynamically-changing-protocol'

Vladimir Oltean says:

====================
Autoload DSA tagging driver when dynamically changing protocol

This patch set solves the issue reported by Michael and Heiko here:
https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
making full use of Michael's suggestion of having two modaliases: one
gets used for loading the tagging protocol when it's the default one
reported by the switch driver, the other gets loaded at user's request,
by name.

  # modinfo tag_ocelot
  filename:       /lib/modules/6.1.0-rc4+/kernel/net/dsa/tag_ocelot.ko
  license:        GPL v2
  alias:          dsa_tag:seville
  alias:          dsa_tag:id-21
  alias:          dsa_tag:ocelot
  alias:          dsa_tag:id-15
  depends:        dsa_core
  intree:         Y
  name:           tag_ocelot
  vermagic:       6.1.0-rc4+ SMP preempt mod_unload modversions aarch64

Tested on NXP LS1028A-RDB with the following device tree addition:

&mscc_felix_port4 {
dsa-tag-protocol = "ocelot-8021q";
};

&mscc_felix_port5 {
dsa-tag-protocol = "ocelot-8021q";
};

CONFIG_NET_DSA and everything that depends on it is built as module.
Everything auto-loads, and "cat /sys/class/net/eno2/dsa/tagging" shows
"ocelot-8021q". Traffic works as well. Furthermore, "echo ocelot-8021q"
into the aforementioned sysfs file now auto-loads the driver for it.
====================

Link: https://lore.kernel.org/r/20221115011847.2843127-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: autoload tag driver module on tagging protocol change
Vladimir Oltean [Tue, 15 Nov 2022 01:18:47 +0000 (03:18 +0200)]
net: dsa: autoload tag driver module on tagging protocol change

Issue a request_module() call when an attempt to change the tagging
protocol is made, either by sysfs or by device tree. In the case of
ocelot (the only driver for which the default and the alternative
tagging protocol are compiled as different modules), the user is now no
longer required to insert tag_ocelot_8021q.ko manually.

In the particular case of ocelot, this solves a problem where
tag_ocelot_8021q.ko is built as module, and this is present in the
device tree:

&mscc_felix_port4 {
dsa-tag-protocol = "ocelot-8021q";
};

&mscc_felix_port5 {
dsa-tag-protocol = "ocelot-8021q";
};

Because no one attempts to load the module into the kernel at boot time,
the switch driver will fail to probe (actually forever defer) until
someone manually inserts tag_ocelot_8021q.ko. This is now no longer
necessary and happens automatically.

Rename dsa_find_tagger_by_name() to denote the change in functionality:
there is now feature parity with dsa_tag_driver_get_by_id(), i.o.w. we
also load the module if it's missing.

Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/
Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()
Vladimir Oltean [Tue, 15 Nov 2022 01:18:46 +0000 (03:18 +0200)]
net: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()

A future patch will introduce one more way of getting a reference on a
tagging protocl driver (by name). Rename the current method to "by_id".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: strip sysfs "tagging" string of trailing newline
Vladimir Oltean [Tue, 15 Nov 2022 01:18:45 +0000 (03:18 +0200)]
net: dsa: strip sysfs "tagging" string of trailing newline

Currently, dsa_find_tagger_by_name() uses sysfs_streq() which works both
with strings that contain \n at the end (echo ocelot > .../dsa/tagging)
and with strings that don't (printf ocelot > .../dsa/tagging).

There will be a problem once we'll want to construct the modalias string
based on which we auto-load the protocol kernel module. If the sysfs
buffer ends in a newline, we need to strip it first. This is a
preparatory patch specifically for that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: provide a second modalias to tag proto drivers based on their name
Vladimir Oltean [Tue, 15 Nov 2022 01:18:44 +0000 (03:18 +0200)]
net: dsa: provide a second modalias to tag proto drivers based on their name

Currently, tagging protocol drivers have a modalias of
"dsa_tag:id-<number>", where the number is one of DSA_TAG_PROTO_*_VALUE.

This modalias makes it possible for the request_module() call in
dsa_tag_driver_get() to work, given the input it has - an integer
returned by ds->ops->get_tag_protocol().

It is also possible to change tagging protocols at (pseudo-)runtime, via
sysfs or via device tree, and this works via the name string of the
tagging protocol rather than via its id (DSA_TAG_PROTO_*_VALUE).

In the latter case, there is no request_module() call, because there is
no association that the DSA core has between the string name and the ID,
to construct the modalias. The module is simply assumed to have been
inserted. This is actually slightly problematic when the tagging
protocol change should take place at probe time, since it's expected
that the dependency module should get autoloaded.

For this purpose, let's introduce a second modalias, so that the DSA
core can call request_module() by name. There is no reason to make the
modalias by name optional, so just modify the MODULE_ALIAS_DSA_TAG_DRIVER()
macro to take both the ID and the name as arguments, and generate two
modaliases behind the scenes.

Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: rename tagging protocol driver modalias
Vladimir Oltean [Tue, 15 Nov 2022 01:18:43 +0000 (03:18 +0200)]
net: dsa: rename tagging protocol driver modalias

It's autumn cleanup time, and today's target are modaliases.

Michael says that for users of modinfo, "dsa_tag-20" is not the most
suggestive name, and recommends a change to "dsa_tag-id-20".

Andrew points out that other modaliases have a prefix delimited by
colons, so he recommends "dsa_tag:20" instead of "dsa_tag-20".

To satisfy both proposals, Florian recommends "dsa_tag:id-20".

The modaliases are not stable ABI, and the essential information
(protocol ID) is still conveyed in the new string, which
request_module() must be adapted to form.

Link: 20221027210830.3577793-1-vladimir.oltean@nxp.com
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Michael Walle <michael@walle.cc>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: stop exposing tag proto module helpers to the world
Vladimir Oltean [Tue, 15 Nov 2022 01:18:42 +0000 (03:18 +0200)]
net: dsa: stop exposing tag proto module helpers to the world

The DSA tagging protocol driver macros are in the public include/net/dsa.h
probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are
(MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions).

But there is no reason to expose these helpers to <net/dsa.h>. That
header is shared between switch drivers (drivers/net/dsa/), tagging
protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c),
and the rest of the world (DSA master drivers, network stack, etc).
Too much exposure.

On the other hand, net/dsa/dsa_priv.h is included only by the DSA core
and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a
bit too much exposure - I've contemplated creating a new header which is
only included by tagging protocol drivers, but completely separating a
new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for
example dsa_slave_to_port() is used both from the fast path and from the
control path.

So for now, move these definitions to dsa_priv.h which at least hides
them from the world.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: ipq4019-mdio: document required clock-names
Robert Marko [Mon, 14 Nov 2022 19:47:33 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: document required clock-names

IPQ5018, IPQ6018 and IPQ8074 require clock-names to be set as driver is
requesting the clock based on it and not index, so document that and make
it required for the listed SoC-s.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-4-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: ipq4019-mdio: require and validate clocks
Robert Marko [Mon, 14 Nov 2022 19:47:32 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: require and validate clocks

Now that we can match the platforms requiring clocks by compatible start
using those to allow clocks per compatible and make them required.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-3-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: ipq4019-mdio: add IPQ8074 compatible
Robert Marko [Mon, 14 Nov 2022 19:47:31 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: add IPQ8074 compatible

Allow using IPQ8074 specific compatible along with the fallback IPQ4019
one in order to be able to specify which compatibles require clocks to
be able to validate them via schema.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-2-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: ipq4019-mdio: document IPQ6018 compatible
Robert Marko [Mon, 14 Nov 2022 19:47:30 +0000 (20:47 +0100)]
dt-bindings: net: ipq4019-mdio: document IPQ6018 compatible

Document IPQ6018 compatible that is already being used in the DTS along
with the fallback IPQ4019 compatible as driver itself only gets probed
on IPQ4019 and IPQ5018 compatibles.

This is also required in order to specify which platform require clock to
be defined and validate it in schema.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221114194734.3287854-1-robimarko@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-dsa-use-more-appropriate-net_name_-constants-for-user-ports'
Jakub Kicinski [Fri, 18 Nov 2022 03:40:13 +0000 (19:40 -0800)]
Merge branch 'net-dsa-use-more-appropriate-net_name_-constants-for-user-ports'

Rasmus Villemoes says:

====================
net: dsa: use more appropriate NET_NAME_* constants for user ports

The intention of commit 685343fc3ba6 ("net: add name_assign_type
netdev attribute") was clearly that drivers be switched over one by
one to select appropriate NET_NAME_* constants instead of
NET_NAME_UNKNOWN. This small series attempts to do that for DSA user
ports.

This is obviously and intentionally user-visible changes, so there's a
small chance that it could lead to a regression. To make it easy to
revert either of the "label in DT" and "fallback to eth%d" changes,
this is done as a refactoring which shouldn't introduce any functional
change (but by itself adds code which looks a little odd, with the two
identical assignments in the two branches), followed by changing the
constant used in each case in two different patches.
====================

Link: https://lore.kernel.org/r/20221116105205.1127843-1-linux@rasmusvillemoes.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: set name_assign_type to NET_NAME_ENUM for enumerated user ports
Rasmus Villemoes [Wed, 16 Nov 2022 10:52:04 +0000 (11:52 +0100)]
net: dsa: set name_assign_type to NET_NAME_ENUM for enumerated user ports

When a user port does not have a label in device tree, and we thus
fall back to the eth%d scheme, the proper constant to use is
NET_NAME_ENUM. See also commit e9f656b7a214 ("net: ethernet: set
default assignment identifier to NET_NAME_ENUM"), which in turn quoted
commit 685343fc3ba6 ("net: add name_assign_type netdev attribute"):

    ... when the kernel has given the interface a name using global
    device enumeration based on order of discovery (ethX, wlanY, etc)
    ... are labelled NET_NAME_ENUM.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: use NET_NAME_PREDICTABLE for user ports with name given in DT
Rasmus Villemoes [Wed, 16 Nov 2022 10:52:03 +0000 (11:52 +0100)]
net: dsa: use NET_NAME_PREDICTABLE for user ports with name given in DT

When a user port has a label in device tree, the corresponding
netdevice is, to quote include/uapi/linux/netdevice.h, "predictably
named by the kernel". This is also explicitly one of the intended use
cases for NET_NAME_PREDICTABLE, quoting 685343fc3ba6 ("net: add
name_assign_type netdev attribute"):

  NET_NAME_PREDICTABLE:
    The ifname has been assigned by the kernel in a predictable way
    [...] Examples include [...] and names deduced from hardware
    properties (including being given explicitly by the firmware).

Expose that information properly for the benefit of userspace tools
that make decisions based on the name_assign_type attribute,
e.g. a systemd-udev rule with "kernel" in NamePolicy.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: refactor name assignment for user ports
Rasmus Villemoes [Wed, 16 Nov 2022 10:52:02 +0000 (11:52 +0100)]
net: dsa: refactor name assignment for user ports

The following two patches each have a (small) chance of causing
regressions for userspace and will in that case of course need to be
reverted.

In order to prepare for that and make those two patches independent
and individually revertable, refactor the code which sets the names
for user ports by moving the "fall back to eth%d if no label is given
in device tree" to dsa_slave_create().

No functional change (at least none intended).

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.faineli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoethtool: doc: clarify what drivers can implement in their get_drvinfo()
Vincent Mailhol [Wed, 16 Nov 2022 17:18:28 +0000 (02:18 +0900)]
ethtool: doc: clarify what drivers can implement in their get_drvinfo()

Many of the drivers which implement ethtool_ops::get_drvinfo() will
prints the .driver, .version or .bus_info of struct ethtool_drvinfo.
To have a glance of current state, do:

  $ git grep -W "get_drvinfo(struct"

Printing in those three fields is useless because:

  - since [1], the driver version should be the kernel version (at
    least for upstream drivers). Arguably, out of tree drivers might
    still want to set a custom version, but out of tree is not our
    focus.

  - since [2], the core is able to provide default values for .driver
    and .bus_info.

In summary, drivers may provide .fw_version and .erom_version, the
rest is expected to be done by the core.

In struct ethtool_ops doc from linux/ethtool: rephrase field
get_drvinfo() doc to discourage developers from implementing this
callback.

In struct ethtool_drvinfo doc from uapi/linux/ethtool.h: remove the
paragraph mentioning what drivers should do. Rationale: no need to
repeat what is already written in struct ethtool_ops doc. But add a
note that .fw_version and .erom_version are driver defined.

Also update the dummy driver and simply remove the callback in order
not to confuse the newcomers: most of the drivers will not need this
callback function any more.

[1] commit 6a7e25c7fb48 ("net/core: Replace driver version to be
    kernel version")
Link: https://git.kernel.org/torvalds/linux/c/6a7e25c7fb48
[2] commit edaf5df22cb8 ("ethtool: ethtool_get_drvinfo: populate
    drvinfo fields even if callback exits")
Link: https://git.kernel.org/netdev/net-next/c/edaf5df22cb8
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20221116171828.4093-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 18 Nov 2022 00:19:14 +0000 (16:19 -0800)]
Merge git://git./linux/kernel/git/netdev/net

include/linux/bpf.h
  1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value")
  aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
  f71b2f64177a ("bpf: Refactor map->off_arr handling")
https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 17 Nov 2022 16:58:36 +0000 (08:58 -0800)]
Merge tag 'net-6.1-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf.

  Current release - regressions:

   - tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()

  Previous releases - regressions:

   - bridge: fix memory leaks when changing VLAN protocol

   - dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims

   - dsa: don't leak tagger-owned storage on switch driver unbind

   - eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6
     is removed

   - eth: stmmac: ensure tx function is not running in
     stmmac_xdp_release()

   - eth: hns3: fix return value check bug of rx copybreak

  Previous releases - always broken:

   - kcm: close race conditions on sk_receive_queue

   - bpf: fix alignment problem in bpf_prog_test_run_skb()

   - bpf: fix writing offset in case of fault in
     strncpy_from_kernel_nofault

   - eth: macvlan: use built-in RCU list checking

   - eth: marvell: add sleep time after enabling the loopback bit

   - eth: octeon_ep: fix potential memory leak in octep_device_setup()

  Misc:

   - tcp: configurable source port perturb table size

   - bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)"

* tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  net: use struct_group to copy ip/ipv6 header addresses
  net: usb: smsc95xx: fix external PHY reset
  net: usb: qmi_wwan: add Telit 0x103a composition
  netdevsim: Fix memory leak of nsim_dev->fa_cookie
  tcp: configurable source port perturb table size
  l2tp: Serialize access to sk_user_data with sk_callback_lock
  net: thunderbolt: Fix error handling in tbnet_init()
  net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start()
  net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()
  net: dsa: don't leak tagger-owned storage on switch driver unbind
  net/x25: Fix skb leak in x25_lapb_receive_frame()
  net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
  bridge: switchdev: Fix memory leaks when changing VLAN protocol
  net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process
  net: hns3: fix return value check bug of rx copybreak
  net: hns3: fix incorrect hw rss hash type of rx packet
  net: phy: marvell: add sleep time after enabling the loopback bit
  net: ena: Fix error handling in ena_init()
  kcm: close race conditions on sk_receive_queue
  net: ionic: Fix error handling in ionic_init_module()
  ...

2 years agonet: ethernet: renesas: Fix return type in rswitch_etha_wait_link_verification()
Dan Carpenter [Tue, 15 Nov 2022 13:09:55 +0000 (16:09 +0300)]
net: ethernet: renesas: Fix return type in rswitch_etha_wait_link_verification()

The rswitch_etha_wait_link_verification() is supposed to return zero
on success or negative error codes.  Unfortunately it is declared as a
bool so the caller treats everything as success.

Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/Y3OPo6AOL6PTvXFU@kili
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoNFC: nci: Allow to create multiple virtual nci devices
Dmitry Vyukov [Tue, 15 Nov 2022 10:00:17 +0000 (11:00 +0100)]
NFC: nci: Allow to create multiple virtual nci devices

The current virtual nci driver is great for testing and fuzzing.
But it allows to create at most one "global" device which does not allow
to run parallel tests and harms fuzzing isolation and reproducibility.
Restructure the driver to allow creation of multiple independent devices.
This should be backwards compatible for existing tests.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20221115100017.787929-1-dvyukov@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agosundance: remove unused variable cnt
Colin Ian King [Tue, 15 Nov 2022 09:31:37 +0000 (09:31 +0000)]
sundance: remove unused variable cnt

Variable cnt is just being incremented and it's never used
anywhere else. The variable and the increment are redundant so
remove it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221115093137.144002-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agosctp: sm_statefuns: Remove pointer casts of the same type
Li zeming [Tue, 15 Nov 2022 02:07:05 +0000 (10:07 +0800)]
sctp: sm_statefuns: Remove pointer casts of the same type

The subh.addip_hdr pointer is also of type (struct sctp_addiphdr *), so
it does not require a cast.

Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20221115020705.3220-1-zeming@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: use struct_group to copy ip/ipv6 header addresses
Hangbin Liu [Tue, 15 Nov 2022 14:24:00 +0000 (22:24 +0800)]
net: use struct_group to copy ip/ipv6 header addresses

kernel test robot reported warnings when build bonding module with
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:

                 from ../drivers/net/bonding/bond_main.c:35:
In function â€˜fortify_memcpy_chk’,
    inlined from â€˜iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
    inlined from â€˜bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
../include/linux/fortify-string.h:413:25: warning: call to â€˜__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  413 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function â€˜fortify_memcpy_chk’,
    inlined from â€˜iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
    inlined from â€˜bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
../include/linux/fortify-string.h:413:25: warning: call to â€˜__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  413 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is because we try to copy the whole ip/ip6 address to the flow_key,
while we only point the to ip/ip6 saddr. Note that since these are UAPI
headers, __struct_group() is used to avoid the compiler warnings.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: c3f8324188fa ("net: Add full IPv6 addresses to flow_keys")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: usb: smsc95xx: fix external PHY reset
Alexandru Tachici [Tue, 15 Nov 2022 11:44:34 +0000 (13:44 +0200)]
net: usb: smsc95xx: fix external PHY reset

An external PHY needs settling time after power up or reset.
In the bind() function an mdio bus is registered. If at this point
the external PHY is still initialising, no valid PHY ID will be
read and on phy_find_first() the bind() function will fail.

If an external PHY is present, wait the maximum time specified
in 802.3 45.2.7.1.1.

Fixes: 05b35e7eb9a1 ("smsc95xx: add phylib support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221115114434.9991-2-alexandru.tachici@analog.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: usb: qmi_wwan: add Telit 0x103a composition
Enrico Sau [Tue, 15 Nov 2022 10:58:59 +0000 (11:58 +0100)]
net: usb: qmi_wwan: add Telit 0x103a composition

Add the following Telit LE910C4-WWX composition:

0x103a: rmnet

Signed-off-by: Enrico Sau <enrico.sau@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20221115105859.14324-1-enrico.sau@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonetdevsim: Fix memory leak of nsim_dev->fa_cookie
Wang Yufen [Tue, 15 Nov 2022 09:30:25 +0000 (17:30 +0800)]
netdevsim: Fix memory leak of nsim_dev->fa_cookie

kmemleak reports this issue:

unreferenced object 0xffff8881bac872d0 (size 8):
  comm "sh", pid 58603, jiffies 4481524462 (age 68.065s)
  hex dump (first 8 bytes):
    04 00 00 00 de ad be ef                          ........
  backtrace:
    [<00000000c80b8577>] __kmalloc+0x49/0x150
    [<000000005292b8c6>] nsim_dev_trap_fa_cookie_write+0xc1/0x210 [netdevsim]
    [<0000000093d78e77>] full_proxy_write+0xf3/0x180
    [<000000005a662c16>] vfs_write+0x1c5/0xaf0
    [<000000007aabf84a>] ksys_write+0xed/0x1c0
    [<000000005f1d2e47>] do_syscall_64+0x3b/0x90
    [<000000006001c6ec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

nsim_dev_trap_fa_cookie_write()
  kmalloc() fa_cookie
  nsim_dev->fa_cookie = fa_cookie
..
nsim_drv_remove()

The fa_cookie allocked in nsim_dev_trap_fa_cookie_write() is not freed. To
fix, add kfree(nsim_dev->fa_cookie) to nsim_drv_remove().

Fixes: d3cbb907ae57 ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Link: https://lore.kernel.org/r/1668504625-14698-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomlxsw: update adjfine to use adjust_by_scaled_ppm
Jacob Keller [Mon, 14 Nov 2022 21:37:01 +0000 (13:37 -0800)]
mlxsw: update adjfine to use adjust_by_scaled_ppm

The mlxsw adjfine implementation in the spectrum_ptp.c file converts
scaled_ppm into ppb before updating a cyclecounter multiplier using the
standard "base * ppb / 1billion" calculation.

This can be re-written to use adjust_by_scaled_ppm, directly using the
scaled parts per million and reducing the amount of code required to
express this calculation.

We still calculate the parts per billion for passing into
mlxsw_sp_ptp_phc_adjfreq because this function requires the input to be in
parts per billion.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20221114213701.815132-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 16 Nov 2022 18:49:06 +0000 (10:49 -0800)]
Merge tag 'for-linus-6.1-rc6-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two trivial cleanups, and three simple fixes"

* tag 'for-linus-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/platform-pci: use define instead of literal number
  xen/platform-pci: add missing free_irq() in error path
  xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too
  xen/pcpu: fix possible memory leak in register_pcpu()
  x86/xen: Use kstrtobool() instead of strtobool()

2 years agoMerge tag 'pinctrl-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Wed, 16 Nov 2022 18:40:00 +0000 (10:40 -0800)]
Merge tag 'pinctrl-v6.1-4' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Aere is a hopefully final round of pin control fixes. Nothing special,
  driver fixes and we caught a potential NULL pointer exception.

   - Fix a potential NULL dereference in the core!

   - Fix all pin mux routes in the Rockchop PX30 driver

   - Fix the UFS pins in the Qualcomm SC8280XP driver

   - Fix bias disabling in the Mediatek driver

   - Fix debounce time settings in the Mediatek driver"

* tag 'pinctrl-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: mediatek: Export debounce time tables
  pinctrl: mediatek: Fix EINT pins input debounce time configuration
  pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map
  pinctrl: mediatek: common-v2: Fix bias-disable for PULL_PU_PD_RSEL_TYPE
  pinctrl: qcom: sc8280xp: Rectify UFS reset pins
  pinctrl: rockchip: list all pins in a possible mux route for PX30

2 years agoMerge tag 'platform-drivers-x86-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 16 Nov 2022 18:36:13 +0000 (10:36 -0800)]
Merge tag 'platform-drivers-x86-v6.1-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Surface Pro 9 and Surface Laptop 5 kbd, battery, etc support (this
   is just a few hw-id additions)

 - A couple of other hw-id / DMI-quirk additions

 - A few small bug fixes + 1 build fix

* tag 'platform-drivers-x86-v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ideapad-laptop: Add module parameters to match DMI quirk tables
  platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops
  platform/x86: hp-wmi: Ignore Smart Experience App event
  platform/surface: aggregator_registry: Add support for Surface Laptop 5
  platform/surface: aggregator_registry: Add support for Surface Pro 9
  platform/surface: aggregator: Do not check for repeated unsequenced packets
  platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
  platform/x86: asus-wmi: add missing pci_dev_put() in asus_wmi_set_xusb2pr()
  platform/x86/intel: pmc: Don't unconditionally attach Intel PMC when virtualized
  platform/x86: thinkpad_acpi: Enable s2idle quirk for 21A1 machine type
  platform/x86/amd: pmc: Add new ACPI ID AMDI0009
  platform/x86/amd: pmc: Remove more CONFIG_DEBUG_FS checks

2 years agotcp: annotate data-race around queue->synflood_warned
Eric Dumazet [Tue, 15 Nov 2022 09:18:51 +0000 (09:18 +0000)]
tcp: annotate data-race around queue->synflood_warned

Annotate the lockless read of queue->synflood_warned.

Following xchg() has the needed data-race resolution.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoax25: af_ax25: Remove unnecessary (void*) conversions
Li zeming [Tue, 15 Nov 2022 02:14:24 +0000 (10:14 +0800)]
ax25: af_ax25: Remove unnecessary (void*) conversions

The valptr pointer is of (void *) type, so other pointers need not be
forced to assign values to it.

Signed-off-by: Li zeming <zeming@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: configurable source port perturb table size
Gleb Mazovetskiy [Mon, 14 Nov 2022 22:56:16 +0000 (22:56 +0000)]
tcp: configurable source port perturb table size

On embedded systems with little memory and no relevant
security concerns, it is beneficial to reduce the size
of the table.

Reducing the size from 2^16 to 2^8 saves 255 KiB
of kernel RAM.

Makes the table size configurable as an expert option.

The size was previously increased from 2^8 to 2^16
in commit 4c2c8f03a5ab ("tcp: increase source port perturb table to
2^16").

Signed-off-by: Gleb Mazovetskiy <glex.spb@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agol2tp: Serialize access to sk_user_data with sk_callback_lock
Jakub Sitnicki [Mon, 14 Nov 2022 19:16:19 +0000 (20:16 +0100)]
l2tp: Serialize access to sk_user_data with sk_callback_lock

sk->sk_user_data has multiple users, which are not compatible with each
other. Writers must synchronize by grabbing the sk->sk_callback_lock.

l2tp currently fails to grab the lock when modifying the underlying tunnel
socket fields. Fix it by adding appropriate locking.

We err on the side of safety and grab the sk_callback_lock also inside the
sk_destruct callback overridden by l2tp, even though there should be no
refs allowing access to the sock at the time when sk_destruct gets called.

v4:
- serialize write to sk_user_data in l2tp sk_destruct

v3:
- switch from sock lock to sk_callback_lock
- document write-protection for sk_user_data

v2:
- update Fixes to point to origin of the bug
- use real names in Reported/Tested-by tags

Cc: Tom Parkin <tparkin@katalix.com>
Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
Reported-by: Haowei Yan <g1042620637@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'net-atomic-dev-stats'
David S. Miller [Wed, 16 Nov 2022 12:48:44 +0000 (12:48 +0000)]
Merge branch 'net-atomic-dev-stats'

Eric Dumazet says:

====================
net: add atomic dev->stats infra

Long standing KCSAN issues are caused by data-race around
some dev->stats changes.

Most performance critical paths already use per-cpu
variables, or per-queue ones.

It is reasonable (and more correct) to use atomic operations
for the slow paths.

First patch adds the infrastructure, then three patches address
the most common paths that syzbot is playing with.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv4: tunnels: use DEV_STATS_INC()
Eric Dumazet [Tue, 15 Nov 2022 08:53:58 +0000 (08:53 +0000)]
ipv4: tunnels: use DEV_STATS_INC()

Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx).

Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: tunnels: use DEV_STATS_INC()
Eric Dumazet [Tue, 15 Nov 2022 08:53:57 +0000 (08:53 +0000)]
ipv6: tunnels: use DEV_STATS_INC()

Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx).

Adopt SMP safe DEV_STATS_{INC|ADD}() to update dev->stats fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6/sit: use DEV_STATS_INC() to avoid data-races
Eric Dumazet [Tue, 15 Nov 2022 08:53:56 +0000 (08:53 +0000)]
ipv6/sit: use DEV_STATS_INC() to avoid data-races

syzbot/KCSAN reported that multiple cpus are updating dev->stats.tx_error
concurrently.

This is because sit tunnels are NETIF_F_LLTX, meaning their ndo_start_xmit()
is not protected by a spinlock.

While original KCSAN report was about tx path, rx path has the same issue.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: add atomic_long_t to net_device_stats fields
Eric Dumazet [Tue, 15 Nov 2022 08:53:55 +0000 (08:53 +0000)]
net: add atomic_long_t to net_device_stats fields

Long standing KCSAN issues are caused by data-race around
some dev->stats changes.

Most performance critical paths already use per-cpu
variables, or per-queue ones.

It is reasonable (and more correct) to use atomic operations
for the slow paths.

This patch adds an union for each field of net_device_stats,
so that we can convert paths that are not yet protected
by a spinlock or a mutex.

netdev_stats_to_stats64() no longer has an #if BITS_PER_LONG==64

Note that the memcpy() we were using on 64bit arches
had no provision to avoid load-tearing,
while atomic_long_read() is providing the needed protection
at no cost.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'net-try_cmpxchg-conversions'
David S. Miller [Wed, 16 Nov 2022 12:42:01 +0000 (12:42 +0000)]
Merge branch 'net-try_cmpxchg-conversions'

Eric Dumazet says:

====================
net: more try_cmpxchg() conversions

Adopt try_cmpxchg() and friends in more places, as this
is preferred nowadays.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: __sock_gen_cookie() cleanup
Eric Dumazet [Tue, 15 Nov 2022 09:11:01 +0000 (09:11 +0000)]
net: __sock_gen_cookie() cleanup

Adopt atomic64_try_cmpxchg() and remove the loop,
to make the intent more obvious.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: adopt try_cmpxchg() in napi_{enable|disable}()
Eric Dumazet [Tue, 15 Nov 2022 09:11:00 +0000 (09:11 +0000)]
net: adopt try_cmpxchg() in napi_{enable|disable}()

This makes code a bit cleaner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: adopt try_cmpxchg() in napi_schedule_prep() and napi_complete_done()
Eric Dumazet [Tue, 15 Nov 2022 09:10:59 +0000 (09:10 +0000)]
net: adopt try_cmpxchg() in napi_schedule_prep() and napi_complete_done()

This makes the code slightly more efficient.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: net_{enable|disable}_timestamp() optimizations
Eric Dumazet [Tue, 15 Nov 2022 09:10:58 +0000 (09:10 +0000)]
net: net_{enable|disable}_timestamp() optimizations

Adopting atomic_try_cmpxchg() makes the code cleaner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: fib6_new_sernum() optimization
Eric Dumazet [Tue, 15 Nov 2022 09:10:57 +0000 (09:10 +0000)]
ipv6: fib6_new_sernum() optimization

Adopt atomic_try_cmpxchg() which is slightly more efficient.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mm_account_pinned_pages() optimization
Eric Dumazet [Tue, 15 Nov 2022 09:10:56 +0000 (09:10 +0000)]
net: mm_account_pinned_pages() optimization

Adopt atomic_long_try_cmpxchg() in mm_account_pinned_pages()
as it is slightly more efficient.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down
Vladimir Oltean [Mon, 14 Nov 2022 14:42:56 +0000 (16:42 +0200)]
net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down

RFC 2863 says:

   The lowerLayerDown state is also a refinement on the down state.
   This new state indicates that this interface runs "on top of" one or
   more other interfaces (see ifStackTable) and that this interface is
   down specifically because one or more of these lower-layer interfaces
   are down.

DSA interfaces are virtual network devices, stacked on top of the DSA
master, but they have a physical MAC, with a PHY that reports a real
link status.

But since DSA (perhaps improperly) uses an iflink to describe the
relationship to its master since commit c084080151e1 ("dsa: set ->iflink
on slave interfaces to the ifindex of the parent"), default_operstate()
will misinterpret this to mean that every time the carrier of a DSA
interface is not ok, it is because of the master being not ok.

In fact, since commit c0a8a9c27493 ("net: dsa: automatically bring user
ports down when master goes down"), DSA cannot even in theory be in the
lowerLayerDown state, because it just calls dev_close_many(), thereby
going down, when the master goes down.

We could revert the commit that creates an iflink between a DSA user
port and its master, especially since now we have an alternative
IFLA_DSA_MASTER which has less side effects. But there may be tooling in
use which relies on the iflink, which has existed since 2009.

We could also probably do something local within DSA to overwrite what
rfc2863_policy() did, in a way similar to hsr_set_operstate(), but this
seems like a hack.

What seems appropriate is to follow the iflink, and check the carrier
status of that interface as well. If that's down too, yes, keep
reporting lowerLayerDown, otherwise just down.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'udp-pernetns-hash'
David S. Miller [Wed, 16 Nov 2022 09:43:36 +0000 (09:43 +0000)]
Merge branch 'udp-pernetns-hash'

Kuniyuki Iwashima says:

====================
udp: Introduce optional per-netns hash table.

This series is the UDP version of the per-netns ehash series [0],
which were initially in the same patch set. [1]

The notable difference with TCP is the max table size is 64K and the min
size is 128.  This is because the possible hash range by udp_hashfn()
always fits in 64K within the same netns and because we want to keep a
bitmap in udp_lib_get_port() on the stack.  Also, the UDP per-netns table
isolates both 1-tuple and 2-tuple tables.

For details, please see the last patch.

  patch 1 - 4: prep for per-netns hash table
  patch     5: add per-netns hash table

[0]: https://lore.kernel.org/netdev/20220908011022.45342-1-kuniyu@amazon.com/
[1]: https://lore.kernel.org/netdev/20220826000445.46552-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoudp: Introduce optional per-netns hash table.
Kuniyuki Iwashima [Mon, 14 Nov 2022 21:57:57 +0000 (13:57 -0800)]
udp: Introduce optional per-netns hash table.

The maximum hash table size is 64K due to the nature of the protocol. [0]
It's smaller than TCP, and fewer sockets can cause a performance drop.

On an EC2 c5.24xlarge instance (192 GiB memory), after running iperf3 in
different netns, creating 32Mi sockets without data transfer in the root
netns causes regression for the iperf3's connection.

  uhash_entries sockets length Gbps
    64K       1      1 5.69
    1Mi     16 5.27
    2Mi     32 4.90
    4Mi     64 4.09
    8Mi    128 2.96
   16Mi    256 2.06
   32Mi    512 1.12

The per-netns hash table breaks the lengthy lists into shorter ones.  It is
useful on a multi-tenant system with thousands of netns.  With smaller hash
tables, we can look up sockets faster, isolate noisy neighbours, and reduce
lock contention.

The max size of the per-netns table is 64K as well.  This is because the
possible hash range by udp_hashfn() always fits in 64K within the same
netns and we cannot make full use of the whole buckets larger than 64K.

  /* 0 < num < 64K  ->  X < hash < X + 64K */
  (num + net_hash_mix(net)) & mask;

Also, the min size is 128.  We use a bitmap to search for an available
port in udp_lib_get_port().  To keep the bitmap on the stack and not
fire the CONFIG_FRAME_WARN error at build time, we round up the table
size to 128.

The sysctl usage is the same with TCP:

  $ dmesg | cut -d ' ' -f 6- | grep "UDP hash"
  UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)

  # sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 65536  # can be changed by uhash_entries

  # sysctl net.ipv4.udp_child_hash_entries
  net.ipv4.udp_child_hash_entries = 0  # disabled by default

  # ip netns add test1
  # ip netns exec test1 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = -65536  # share the global table

  # sysctl -w net.ipv4.udp_child_hash_entries=100
  net.ipv4.udp_child_hash_entries = 100

  # ip netns add test2
  # ip netns exec test2 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 128  # own a per-netns table with 2^n buckets

We could optimise the hash table lookup/iteration further by removing
the netns comparison for the per-netns one in the future.  Also, we
could optimise the sparse udp_hslot layout by putting it in udp_table.

[0]: https://lore.kernel.org/netdev/4ACC2815.7010101@gmail.com/

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoudp: Access &udp_table via net.
Kuniyuki Iwashima [Mon, 14 Nov 2022 21:57:56 +0000 (13:57 -0800)]
udp: Access &udp_table via net.

We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use udp_table directly in most places.

Instead, access it via net->ipv4.udp_table.

The access will be valid only while initialising udp_table
itself and creating/destroying each netns.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoudp: Set NULL to udp_seq_afinfo.udp_table.
Kuniyuki Iwashima [Mon, 14 Nov 2022 21:57:55 +0000 (13:57 -0800)]
udp: Set NULL to udp_seq_afinfo.udp_table.

We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use the global udp_seq_afinfo.udp_table
to fetch a UDP hash table.

Instead, set NULL to udp_seq_afinfo.udp_table for UDP and get
a proper table from net->ipv4.udp_table.

Note that we still need udp_seq_afinfo.udp_table for UDP LITE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoudp: Set NULL to sk->sk_prot->h.udp_table.
Kuniyuki Iwashima [Mon, 14 Nov 2022 21:57:54 +0000 (13:57 -0800)]
udp: Set NULL to sk->sk_prot->h.udp_table.

We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use the global sk->sk_prot->h.udp_table
to fetch a UDP hash table.

Instead, set NULL to sk->sk_prot->h.udp_table for UDP and get
a proper table from net->ipv4.udp_table.

Note that we still need sk->sk_prot->h.udp_table for UDP LITE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoudp: Clean up some functions.
Kuniyuki Iwashima [Mon, 14 Nov 2022 21:57:53 +0000 (13:57 -0800)]
udp: Clean up some functions.

This patch adds no functional change and cleans up some functions
that the following patches touch around so that we make them tidy
and easy to review/revert.  The change is mainly to keep reverse
christmas tree order.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: thunderbolt: Fix error handling in tbnet_init()
Yuan Can [Mon, 14 Nov 2022 14:22:25 +0000 (14:22 +0000)]
net: thunderbolt: Fix error handling in tbnet_init()

A problem about insmod thunderbolt-net failed is triggered with following
log given while lsmod does not show thunderbolt_net:

 insmod: ERROR: could not insert module thunderbolt-net.ko: File exists

The reason is that tbnet_init() returns tb_register_service_driver()
directly without checking its return value, if tb_register_service_driver()
failed, it returns without removing property directory, resulting the
property directory can never be created later.

 tbnet_init()
   tb_register_property_dir() # register property directory
   tb_register_service_driver()
     driver_register()
       bus_add_driver()
         priv = kzalloc(...) # OOM happened
   # return without remove property directory

Fix by remove property directory when tb_register_service_driver() returns
error.

Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'microchip-fixes'
David S. Miller [Wed, 16 Nov 2022 09:10:29 +0000 (09:10 +0000)]
Merge branch 'microchip-fixes'

Shang XiaoJing says:

====================
net: microchip: Fix potential null-ptr-deref due to create_singlethread_workqueue()

There are some functions call create_singlethread_workqueue() without
checking ret value, and the NULL workqueue_struct pointer may causes
null-ptr-deref. Will be fixed by this patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5...
Shang XiaoJing [Mon, 14 Nov 2022 13:38:53 +0000 (21:38 +0800)]
net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start()

sparx_stats_init() calls create_singlethread_workqueue() and not
checked the ret value, which may return NULL. And a null-ptr-deref may
happen:

sparx_stats_init()
    create_singlethread_workqueue() # failed, sparx5->stats_queue is NULL
    queue_delayed_work()
        queue_delayed_work_on()
            __queue_delayed_work()  # warning here, but continue
                __queue_work()      # access wq->flags, null-ptr-deref

Check the ret value and return -ENOMEM if it is NULL. So as
sparx5_start().

Fixes: af4b11022e2d ("net: sparx5: add ethtool configuration and statistics support")
Fixes: b37a1bae742f ("net: sparx5: add mactable support")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()
Shang XiaoJing [Mon, 14 Nov 2022 13:38:52 +0000 (21:38 +0800)]
net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()

lan966x_stats_init() calls create_singlethread_workqueue() and not
checked the ret value, which may return NULL. And a null-ptr-deref may
happen:

lan966x_stats_init()
    create_singlethread_workqueue() # failed, lan966x->stats_queue is NULL
    queue_delayed_work()
        queue_delayed_work_on()
            __queue_delayed_work()  # warning here, but continue
                __queue_work()      # access wq->flags, null-ptr-deref

Check the ret value and return -ENOMEM if it is NULL.

Fixes: 12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'sfc-TC-offload-counters'
David S. Miller [Wed, 16 Nov 2022 09:07:03 +0000 (09:07 +0000)]
Merge branch 'sfc-TC-offload-counters'

Edward Cree says:

====================
sfc: TC offload counters

EF100 hardware supports attaching counters to action-sets in the MAE.
Use these counters to implement stats for TC flower offload.

The counters are delivered to the host over a special hardware RX queue
 which should only ever receive counter update messages, not 'real'
 network packets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: implement counters readout to TC stats
Edward Cree [Mon, 14 Nov 2022 13:16:01 +0000 (13:16 +0000)]
sfc: implement counters readout to TC stats

On FLOW_CLS_STATS, look up the MAE counter by TC cookie, and report the
 change in packet and byte count since the last time FLOW_CLS_STATS read
 them.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: validate MAE action order
Edward Cree [Mon, 14 Nov 2022 13:16:00 +0000 (13:16 +0000)]
sfc: validate MAE action order

Currently the only actions supported are COUNT and DELIVER, which can only
 happen in the right order; but when more actions are added, it will be
 necessary to check that they are only used in the same order in which the
 hardware performs them (since the hardware API takes an action *set* in
 which the order is implicit).  For instance, a VLAN pop must not follow a
 VLAN push.  Most practical use-cases should be unaffected by these
 restrictions.
Add a function efx_tc_flower_action_order_ok() that checks whether it is
 appropriate to add a specified action to the existing action-set.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: attach an MAE counter to TC actions that need it
Edward Cree [Mon, 14 Nov 2022 13:15:59 +0000 (13:15 +0000)]
sfc: attach an MAE counter to TC actions that need it

The only actions that expect stats (that sfc HW supports) are gact shot
 (drop), mirred redirect and mirred mirror.  Since these are 'deliverish'
 actions that end an action-set, we only require at most one counter per
 action-set.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: accumulate MAE counter values from update packets
Edward Cree [Mon, 14 Nov 2022 13:15:58 +0000 (13:15 +0000)]
sfc: accumulate MAE counter values from update packets

Add the packet and byte counts to the software running total, and store
 the latest jiffies every time the counter is bumped.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add functions to allocate/free MAE counters
Edward Cree [Mon, 14 Nov 2022 13:15:57 +0000 (13:15 +0000)]
sfc: add functions to allocate/free MAE counters

efx_tc_flower_get_counter_index() will create an MAE counter mapped to
 the passed (TC filter) cookie, or increment the reference if one already
 exists for that cookie.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add hashtables for MAE counters and counter ID mappings
Edward Cree [Mon, 14 Nov 2022 13:15:56 +0000 (13:15 +0000)]
sfc: add hashtables for MAE counters and counter ID mappings

Nothing populates them yet.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add extra RX channel to receive MAE counter updates on ef100
Edward Cree [Mon, 14 Nov 2022 13:15:55 +0000 (13:15 +0000)]
sfc: add extra RX channel to receive MAE counter updates on ef100

Currently there is no counter-allocating machinery to connect the
 resulting counter update values to; that will be added in a
 subsequent patch.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add ef100 MAE counter support functions
Edward Cree [Mon, 14 Nov 2022 13:15:54 +0000 (13:15 +0000)]
sfc: add ef100 MAE counter support functions

Start and stop MAE counter streaming, and grant credits.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add ability for extra channels to receive raw RX buffers
Edward Cree [Mon, 14 Nov 2022 13:15:53 +0000 (13:15 +0000)]
sfc: add ability for extra channels to receive raw RX buffers

The TC extra channel will need its own special RX handling, which must
 operate before any code that expects the RX buffer to contain a network
 packet; buffers on this RX queue contain MAE counter packets in a
 special format that does not resemble an Ethernet frame, and many fields
 of the RX packet prefix are not populated.
The USER_MARK field, however, is populated with the generation count from
 the counter subsystem, which needs to be passed on to the RX handler.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add start and stop methods to channels
Edward Cree [Mon, 14 Nov 2022 13:15:52 +0000 (13:15 +0000)]
sfc: add start and stop methods to channels

The TC extra channel needs to do extra work in efx_{start,stop}_channels()
 to start/stop MAE counter streaming from the hardware.  Add callbacks for
 it to implement.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add ability for an RXQ to grant credits on refill
Edward Cree [Mon, 14 Nov 2022 13:15:51 +0000 (13:15 +0000)]
sfc: add ability for an RXQ to grant credits on refill

EF100 hardware streams MAE counter updates to the driver over a dedicated
 RX queue; however, the MCPU is not able to detect when RX buffers have
 been posted to the ring.  Thus, the driver must call
 MC_CMD_MAE_COUNTERS_STREAM_GIVE_CREDITS; this patch adds the
 infrastructure to support that to the core RXQ handling code.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: fix ef100 RX prefix macro
Edward Cree [Mon, 14 Nov 2022 13:15:50 +0000 (13:15 +0000)]
sfc: fix ef100 RX prefix macro

Macro PREFIX_WIDTH_MASK uses unsigned long arithmetic for a shift of up
 to 32 bits, which breaks on 32-bit systems.  This did not previously
 show up as we weren't using any fields of width 32, but we now need to
 access ESF_GZ_RX_PREFIX_USER_MARK.
Change it to unsigned long long.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoplatform/x86: ideapad-laptop: Add module parameters to match DMI quirk tables
Hans de Goede [Tue, 15 Nov 2022 19:34:00 +0000 (20:34 +0100)]
platform/x86: ideapad-laptop: Add module parameters to match DMI quirk tables

Add module parameters to allow setting the hw_rfkill_switch and
set_fn_lock_led feature flags for testing these on laptops which are not
on the DMI-id based allow lists for these 2 flags.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20221115193400.376159-1-hdegoede@redhat.com
2 years agoplatform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga...
Arnav Rawat [Fri, 11 Nov 2022 14:32:09 +0000 (14:32 +0000)]
platform/x86: ideapad-laptop: Fix interrupt storm on fn-lock toggle on some Yoga laptops

Commit 3ae86d2d4704 ("platform/x86: ideapad-laptop: Fix Legion 5 Fn lock
LED") uses the WMI event-id for the fn-lock event on some Legion 5 laptops
to manually toggle the fn-lock LED because the EC does not do it itself.
However, the same WMI ID is also sent on some Yoga laptops. Here, setting
the fn-lock state is not valid behavior, and causes the EC to spam
interrupts until the laptop is rebooted.

Add a set_fn_lock_led_list[] DMI-id list and only enable the workaround to
manually set the LED on models on this list.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212671
Cc: Meng Dong <whenov@gmail.com>
Signed-off-by: Arnav Rawat <arnavr3@illinois.edu>
Link: https://lore.kernel.org/r/12093851.O9o76ZdvQC@fedora
[hdegoede@redhat.com: Check DMI-id list only once and store the result]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2 years agoplatform/x86: hp-wmi: Ignore Smart Experience App event
Kai-Heng Feng [Mon, 14 Nov 2022 07:38:41 +0000 (15:38 +0800)]
platform/x86: hp-wmi: Ignore Smart Experience App event

Sometimes hp-wmi driver complains on system resume:
[ 483.116451] hp_wmi: Unknown event_id - 33 - 0x0

According to HP it's a feature called "HP Smart Experience App" and it's
safe to be ignored.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20221114073842.205392-1-kai.heng.feng@canonical.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2 years agoplatform/surface: aggregator_registry: Add support for Surface Laptop 5
Maximilian Luz [Tue, 15 Nov 2022 23:14:40 +0000 (00:14 +0100)]
platform/surface: aggregator_registry: Add support for Surface Laptop 5

Add device nodes to enable support for battery and charger status, the
ACPI platform profile, as well as internal HID devices (including
touchpad and keyboard) on the Surface Laptop 5.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20221115231440.1338142-1-luzmaximilian@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2 years agonet: dsa: don't leak tagger-owned storage on switch driver unbind
Vladimir Oltean [Mon, 14 Nov 2022 14:35:51 +0000 (16:35 +0200)]
net: dsa: don't leak tagger-owned storage on switch driver unbind

In the initial commit dc452a471dba ("net: dsa: introduce tagger-owned
storage for private and shared data"), we had a call to
tag_ops->disconnect(dst) issued from dsa_tree_free(), which is called at
tree teardown time.

There were problems with connecting to a switch tree as a whole, so this
got reworked to connecting to individual switches within the tree. In
this process, tag_ops->disconnect(ds) was made to be called only from
switch.c (cross-chip notifiers emitted as a result of dynamic tag proto
changes), but the normal driver teardown code path wasn't replaced with
anything.

Solve this problem by adding a function that does the opposite of
dsa_switch_setup_tag_protocol(), which is called from the equivalent
spot in dsa_switch_teardown(). The positioning here also ensures that we
won't have any use-after-free in tagging protocol (*rcv) ops, since the
teardown sequence is as follows:

dsa_tree_teardown
-> dsa_tree_teardown_master
   -> dsa_master_teardown
      -> unsets master->dsa_ptr, making no further packets match the
         ETH_P_XDSA packet type handler
-> dsa_tree_teardown_ports
   -> dsa_port_teardown
      -> dsa_slave_destroy
         -> unregisters DSA net devices, there is even a synchronize_net()
            in unregister_netdevice_many()
-> dsa_tree_teardown_switches
   -> dsa_switch_teardown
      -> dsa_switch_teardown_tag_protocol
         -> finally frees the tagger-owned storage

Fixes: 7f2973149c22 ("net: dsa: make tagging protocols connect to individual switches from a tree")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221114143551.1906361-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'remove-phylink_validate-from-felix-dsa-driver'
Jakub Kicinski [Wed, 16 Nov 2022 04:34:30 +0000 (20:34 -0800)]
Merge branch 'remove-phylink_validate-from-felix-dsa-driver'

Vladimir Oltean says:

====================
Remove phylink_validate() from Felix DSA driver

The Felix DSA driver still uses its own phylink_validate() procedure
rather than the (relatively newly introduced) phylink_generic_validate()
because the latter did not cater for the case where a PHY provides rate
matching between the Ethernet cable side speed and the SERDES side
speed (and does not advertise other speeds except for the SERDES speed).

This changed with Sean Anderson's generic support for rate matching PHYs
in phylib and phylink:
https://patchwork.kernel.org/project/netdevbpf/cover/20220920221235.1487501-1-sean.anderson@seco.com/

Building upon that support, this patch set makes Linux understand that
the PHYs used in combination with the Felix DSA driver (SCH-30841 riser
card with AQR412 PHY, used with SERDES protocol 0x7777 - 4x2500base-x,
plugged into LS1028A-QDS) do support PAUSE rate matching. This requires
Aquantia PHY driver support for new PHY IDs.

To activate the rate matching support in phylink, config->mac_capabilities
must be populated. Coincidentally, this also opts the Felix driver into
the generic phylink validation.

Next, code that is no longer necessary is eliminated. This includes the
Felix driver validation procedures for VSC9959 and VSC9953, the
workaround in the Ocelot switch library to leave RX flow control always
enabled, as well as DSA plumbing necessary for a custom phylink
validation procedure to be propagated to the hardware driver level.

Many thanks go to Sean Anderson for providing generic support for rate
matching.
====================

Link: https://lore.kernel.org/r/20221114170730.2189282-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: remove phylink_validate() method
Vladimir Oltean [Mon, 14 Nov 2022 17:07:30 +0000 (19:07 +0200)]
net: dsa: remove phylink_validate() method

As of now, no DSA driver uses a custom link mode validation procedure
anymore. So remove this DSA operation and let phylink determine what is
supported based on config->mac_capabilities (if provided by the driver).
Leave a comment why we left the code that we did, and that there is more
work to do.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: drop workaround for forcing RX flow control
Vladimir Oltean [Mon, 14 Nov 2022 17:07:29 +0000 (19:07 +0200)]
net: mscc: ocelot: drop workaround for forcing RX flow control

As phylink gained generic support for PHYs with rate matching via PAUSE
frames, the phylink_mac_link_up() method will be called with the maximum
speed and with rx_pause=true if rate matching is in use. This means that
setups with 2500base-x as the SERDES protocol between the MAC/PCS and
the PHY now work with no need for the driver to do anything special.

Tested with fsl-ls1028a-qds-7777.dts.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: felix: use phylink_generic_validate()
Vladimir Oltean [Mon, 14 Nov 2022 17:07:28 +0000 (19:07 +0200)]
net: dsa: felix: use phylink_generic_validate()

Drop the custom implementation of phylink_validate() in favor of the
generic one, which requires config->mac_capabilities to be set.

This was used up until now because of the possibility of being paired
with Aquantia PHYs with support for rate matching. The phylink framework
gained generic support for these, and knows to advertise all 10/100/1000
lower speed link modes when our SERDES protocol is 2500base-x
(fixed speed).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: phy: aquantia: add AQR112 and AQR412 PHY IDs
Vladimir Oltean [Mon, 14 Nov 2022 17:07:27 +0000 (19:07 +0200)]
net: phy: aquantia: add AQR112 and AQR412 PHY IDs

These are Gen3 Aquantia N-BASET PHYs which support 5GBASE-T,
2.5GBASE-T, 1000BASE-T and 100BASE-TX (not 10G); also EEE, Sync-E,
PTP, PoE.

The 112 is a single PHY package, the 412 is a quad PHY package.

The system-side SERDES interface of these PHYs selects its protocol
depending on the negotiated media side link speed. That protocol can be
1000BASE-X, 2500BASE-X, 10GBASE-R, SGMII, USXGMII.

The configuration of which SERDES protocol to use for which link speed
is made by firmware; even though it could be overwritten over MDIO by
Linux, we assume that the firmware provisioning is ok for the board on
which the driver probes.

For cases when the system side runs at a fixed rate, we want phylib/phylink
to detect the PAUSE rate matching ability of these PHYs, so we need to
use the Aquantia rather than the generic C45 driver. This needs
aqr107_read_status() -> aqr107_read_rate() to set phydev->rate_matching,
as well as the aqr107_get_rate_matching() method.

I am a bit unsure about the naming convention in the driver. Since
AQR107 is a Gen2 PHY, I assume all functions prefixed with "aqr107_"
rather than "aqr_" mean Gen2+ features. So I've reused this naming
convention.

I've tested PHY "SGMII" statistics as well as the .link_change_notify
method, which prints:

Aquantia AQR412 mdio_mux-0.4:00: Link partner is Aquantia PHY, FW 4.3, fast-retrain downshift advertised, fast reframe advertised

Tested SERDES protocols are usxgmii and 2500base-x (the latter with
PAUSE rate matching). Tested link modes are 100/1000/2500 Base-T
(with Aquantia link partner and with other link partners). No notable
events observed.

The placement of these PHY IDs in the driver is right before AQR113C,
a Gen4 PHY.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Jakub Kicinski [Wed, 16 Nov 2022 04:33:19 +0000 (20:33 -0800)]
Merge git://git./linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Fix sparse warning in the new nft_inner expression, reported
   by Jakub Kicinski.

2) Incorrect vlan header check in nft_inner, from Peng Wu.

3) Two patches to pass reset boolean to expression dump operation,
   in preparation for allowing to reset stateful expressions in rules.
   This adds a new NFT_MSG_GETRULE_RESET command. From Phil Sutter.

4) Inconsistent indentation in nft_fib, from Jiapeng Chong.

5) Speed up siphash calculation in conntrack, from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: conntrack: use siphash_4u64
  netfilter: rpfilter/fib: clean up some inconsistent indenting
  netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET
  netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters
  netfilter: nft_inner: fix return value check in nft_inner_parse_l2l3()
  netfilter: nft_payload: use __be16 to store gre version
====================

Link: https://lore.kernel.org/r/20221115095922.139954-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoDocumentation: nfp: update documentation
Walter Heymans [Tue, 15 Nov 2022 09:08:34 +0000 (10:08 +0100)]
Documentation: nfp: update documentation

The NFP documentation is updated to include information about Corigine,
and the new NFP3800 chips. The 'Acquiring Firmware' section is updated
with new information about where to find firmware.

Two new sections are added to expand the coverage of the documentation.
The new sections include:
- Devlink Info
- Configure Device

Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20221115090834.738645-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'mtk_eth_soc-rx-vlan-offload-improvement-dsa-hardware-untag-support'
Jakub Kicinski [Wed, 16 Nov 2022 04:23:18 +0000 (20:23 -0800)]
Merge branch 'mtk_eth_soc-rx-vlan-offload-improvement-dsa-hardware-untag-support'

Felix Fietkau says:

====================
mtk_eth_soc rx vlan offload improvement + dsa hardware untag support

This series improves rx vlan offloading on mtk_eth_soc and extends it to
support hardware DSA untagging where possible.
This improves performance by avoiding calls into the DSA tag driver receive
function, including mangling of skb->data.

This is split out of a previous series, which added other fixes and
multiqueue support
====================

Link: https://lore.kernel.org/r/20221114124214.58199-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/x25: Fix skb leak in x25_lapb_receive_frame()
Wei Yongjun [Mon, 14 Nov 2022 11:05:19 +0000 (11:05 +0000)]
net/x25: Fix skb leak in x25_lapb_receive_frame()

x25_lapb_receive_frame() using skb_copy() to get a private copy of
skb, the new skb should be freed in the undersized/fragmented skb
error handling path. Otherwise there is a memory leak.

Fixes: cb101ed2c3c7 ("x25: Handle undersized/fragmented skbs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20221114110519.514538-1-weiyongjun@huaweicloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
Liu Jian [Mon, 14 Nov 2022 09:55:49 +0000 (17:55 +0800)]
net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()

If ag71xx_hw_enable() fails, call phylink_disconnect_phy() to clean up.
And if phylink_of_phy_connect() fails, nothing needs to be done.
Compile tested only.

Fixes: 892e09153fa3 ("net: ag71xx: port to phylink")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20221114095549.40342-1-liujian56@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: mtk_eth_soc: enable hardware DSA untagging
Felix Fietkau [Mon, 14 Nov 2022 12:42:14 +0000 (13:42 +0100)]
net: ethernet: mtk_eth_soc: enable hardware DSA untagging

- pass the tag to DSA via metadata dst
- disabled on 7986 for now, since it's not working yet
- disabled if a MAC is enabled that does not use DSA

This improves performance by bypassing the DSA tag driver and avoiding extra
skb data mangling

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: mtk_eth_soc: add support for configuring vlan rx offload
Felix Fietkau [Mon, 14 Nov 2022 12:42:13 +0000 (13:42 +0100)]
net: ethernet: mtk_eth_soc: add support for configuring vlan rx offload

Keep the vlan rx offload feature in sync across all netdevs belonging to the
device, since the feature is global and can't be turned off per MAC

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: mtk_eth_soc: pass correct VLAN protocol ID to the network stack
Felix Fietkau [Mon, 14 Nov 2022 12:42:12 +0000 (13:42 +0100)]
net: ethernet: mtk_eth_soc: pass correct VLAN protocol ID to the network stack

Use the id from the DMA descriptor instead of hardcoding 802.1q

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: add support for DSA rx offloading via metadata dst
Felix Fietkau [Mon, 14 Nov 2022 12:42:11 +0000 (13:42 +0100)]
net: dsa: add support for DSA rx offloading via metadata dst

If a metadata dst is present with the type METADATA_HW_PORT_MUX on a dsa cpu
port netdev, assume that it carries the port number and that there is no DSA
tag present in the skb data.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoAMerge tag 'netfs-fixes-20221115' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 15 Nov 2022 22:56:23 +0000 (14:56 -0800)]
AMerge tag 'netfs-fixes-20221115' of git://git./linux/kernel/git/dhowells/linux-fs

Pull netfx fixes from David Howells:
 "Two fixes, affecting the functions that iterates over the pagecache
  unmarking or unlocking pages after an op is complete:

   - xas_for_each() loops must call xas_retry() first thing and
     immediately do a "continue" in the case that the extracted value is
     a special value that indicates that the walk raced with a
     modification. Fix the unlock and unmark loops to do this.

   - The maths in the unlock loop is dodgy as it could, theoretically,
     at some point in the future end up with a starting file pointer
     that is in the middle of a folio. This will cause a subtraction to
     go negative - but the number is unsigned. Fix the maths to use
     absolute file positions instead of relative page indices"

* tag 'netfs-fixes-20221115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  netfs: Fix dodgy maths
  netfs: Fix missing xas_retry() calls in xarray iteration

2 years agoMerge tag 'erofs-for-6.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 15 Nov 2022 18:30:34 +0000 (10:30 -0800)]
Merge tag 'erofs-for-6.1-rc6-fixes' of git://git./linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "Most patches randomly fix error paths or corner cases in fscache mode
  reported recently. One fixes an invalid access relating to fragments
  on crafted images.

  Summary:

   - Fix packed_inode invalid access when reading fragments on crafted
     images

   - Add a missing erofs_put_metabuf() in an error path in fscache mode

   - Fix incorrect `count' for unmapped extents in fscache mode

   - Fix use-after-free of fsid and domain_id string when remounting

   - Fix missing xas_retry() in fscache mode"

* tag 'erofs-for-6.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix missing xas_retry() in fscache mode
  erofs: fix use-after-free of fsid and domain_id string
  erofs: get correct count for unmapped range in fscache mode
  erofs: put metabuf in error path in fscache mode
  erofs: fix general protection fault when reading fragment

2 years agox86/cpu: Restore AMD's DE_CFG MSR after resume
Borislav Petkov [Mon, 14 Nov 2022 11:44:01 +0000 (12:44 +0100)]
x86/cpu: Restore AMD's DE_CFG MSR after resume

DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way how they do S3.

Unify and correct naming while at it.

Fixes: e4d0e84e4907 ("x86/cpu/AMD: Make LFENCE a serializing instruction")
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agonetfs: Fix dodgy maths
David Howells [Fri, 4 Nov 2022 15:36:49 +0000 (15:36 +0000)]
netfs: Fix dodgy maths

Fix the dodgy maths in netfs_rreq_unlock_folios().  start_page could be
inside the folio, in which case the calculation of pgpos will be come up
with a negative number (though for the moment rreq->start is rounded down
earlier and folios would have to get merged whilst locked)

Alter how this works to just frame the tracking in terms of absolute file
positions, rather than offsets from the start of the I/O request.  This
simplifies the maths and makes it easier to follow.

Fix the issue by using folio_pos() and folio_size() to calculate the end
position of the page.

Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers")
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/Y2SJw7w1IsIik3nb@casper.infradead.org/
Link: https://lore.kernel.org/r/166757988611.950645.7626959069846893164.stgit@warthog.procyon.org.uk/
2 years agonetfs: Fix missing xas_retry() calls in xarray iteration
David Howells [Thu, 3 Nov 2022 16:08:14 +0000 (16:08 +0000)]
netfs: Fix missing xas_retry() calls in xarray iteration

netfslib has a number of places in which it performs iteration of an xarray
whilst being under the RCU read lock.  It *should* call xas_retry() as the
first thing inside of the loop and do "continue" if it returns true in case
the xarray walker passed out a special value indicating that the walk needs
to be redone from the root[*].

Fix this by adding the missing retry checks.

[*] I wonder if this should be done inside xas_find(), xas_next_node() and
    suchlike, but I'm told that's not an simple change to effect.

This can cause an oops like that below.  Note the faulting address - this
is an internal value (|0x2) returned from xarray.

BUG: kernel NULL pointer dereference, address: 0000000000000402
...
RIP: 0010:netfs_rreq_unlock+0xef/0x380 [netfs]
...
Call Trace:
 netfs_rreq_assess+0xa6/0x240 [netfs]
 netfs_readpage+0x173/0x3b0 [netfs]
 ? init_wait_var_entry+0x50/0x50
 filemap_read_page+0x33/0xf0
 filemap_get_pages+0x2f2/0x3f0
 filemap_read+0xaa/0x320
 ? do_filp_open+0xb2/0x150
 ? rmqueue+0x3be/0xe10
 ceph_read_iter+0x1fe/0x680 [ceph]
 ? new_sync_read+0x115/0x1a0
 new_sync_read+0x115/0x1a0
 vfs_read+0xf3/0x180
 ksys_read+0x5f/0xe0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Changes:
========
ver #2)
 - Changed an unsigned int to a size_t to reduce the likelihood of an
   overflow as per Willy's suggestion.
 - Added an additional patch to fix the maths.

Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers")
Reported-by: George Law <glaw@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/166749229733.107206.17482609105741691452.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/166757987929.950645.12595273010425381286.stgit@warthog.procyon.org.uk/
2 years agoplatform/surface: aggregator_registry: Add support for Surface Pro 9
Maximilian Luz [Sun, 13 Nov 2022 18:59:51 +0000 (19:59 +0100)]
platform/surface: aggregator_registry: Add support for Surface Pro 9

Add device nodes to enable support for battery and charger status, the
ACPI platform profile, as well as internal and type-cover HID devices
(including sensors, touchpad, keyboard, and other miscellaneous devices)
on the Surface Pro 9.

This does not include support for a tablet-mode switch yet, as that is
now handled via the POS subsystem (unlike the Surface Pro 8, where it is
handled via the KIP subsystem) and therefore needs further changes.

While we're at it, also add the missing comment for the Surface Pro 8.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20221113185951.224759-2-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2 years agoplatform/surface: aggregator: Do not check for repeated unsequenced packets
Maximilian Luz [Sun, 13 Nov 2022 18:59:50 +0000 (19:59 +0100)]
platform/surface: aggregator: Do not check for repeated unsequenced packets

Currently, we check any received packet whether we have already seen it
previously, regardless of the packet type (sequenced / unsequenced). We
do this by checking the sequence number. This assumes that sequence
numbers are valid for both sequenced and unsequenced packets. However,
this assumption appears to be incorrect.

On some devices, the sequence number field of unsequenced packets (in
particular HID input events on the Surface Pro 9) is always zero. As a
result, the current retransmission check kicks in and discards all but
the first unsequenced packet, breaking (among other things) keyboard and
touchpad input.

Note that we have, so far, only seen packets being retransmitted in
sequenced communication. In particular, this happens when there is an
ACK timeout, causing the EC (or us) to re-send the packet waiting for an
ACK. Arguably, retransmission / duplication of unsequenced packets
should not be an issue as there is no logical condition (such as an ACK
timeout) to determine when a packet should be sent again.

Therefore, remove the retransmission check for unsequenced packets
entirely to resolve the issue.

Fixes: c167b9c7e3d6 ("platform/surface: Add Surface Aggregator subsystem")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20221113185951.224759-1-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2 years agoplatform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)
Hans de Goede [Fri, 11 Nov 2022 11:16:39 +0000 (12:16 +0100)]
platform/x86: acer-wmi: Enable SW_TABLET_MODE on Switch V 10 (SW5-017)

Like the Acer Switch 10 (SW5-012) and Acer Switch 10 (S1003) models
the Acer Switch V 10 (SW5-017) supports reporting SW_TABLET_MODE
through acer-wmi.

Add a DMI quirk for the SW5-017 setting force_caps to ACER_CAP_KBD_DOCK
(these devices have no other acer-wmi based functionality).

Cc: Rudolf Polzer <rpolzer@google.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20221111111639.35730-1-hdegoede@redhat.com