platform/kernel/linux-rpi.git
2 years agomlxsw: Add support for more than 256 ports in SBSR register
Amit Cohen [Wed, 1 Dec 2021 08:12:38 +0000 (10:12 +0200)]
mlxsw: Add support for more than 256 ports in SBSR register

Add 'port_page' field in SBSR to be able to query occupancy of more than
256 ports. The field determines the range of the ports specified in the
'ingress_port_mask' and 'egress_port_mask' bit masks:
From '256 * port_page' to '256 * port_page + 255'.

For each local port, the appropriate port page is used. A query is never
performed for a port range that spans multiple port pages.
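
The page arithmetic described above can be sketched in plain C. This is only an illustration: the struct, the helper names, and the bit ordering inside the mask are assumptions made for the example and are not taken from the mlxsw register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Each query covers one page of 256 consecutive local ports. */
#define PORTS_PER_PAGE 256

/* Hypothetical stand-in for one query; not the real SBSR payload. */
struct demo_sbsr_query {
	uint16_t port_page;
	uint8_t port_mask[PORTS_PER_PAGE / 8];
};

/* Page that covers the given local port. */
static uint16_t demo_port_page(uint16_t local_port)
{
	return local_port / PORTS_PER_PAGE;
}

/* Position of the port inside the 256-bit mask of its page. */
static uint16_t demo_port_mask_bit(uint16_t local_port)
{
	return local_port % PORTS_PER_PAGE;
}

/* Mark a port in the query; the port must belong to the query's page. */
static void demo_sbsr_query_add_port(struct demo_sbsr_query *q,
				     uint16_t local_port)
{
	uint16_t bit = demo_port_mask_bit(local_port);

	assert(demo_port_page(local_port) == q->port_page);
	q->port_mask[bit / 8] |= (uint8_t)(1u << (bit % 8));
}
```

With these helpers, local port 300 falls in page 1 at mask position 44, so it cannot share a query with local port 255, which falls in page 0.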

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: Use u16 for local_port field instead of u8
Amit Cohen [Wed, 1 Dec 2021 08:12:37 +0000 (10:12 +0200)]
mlxsw: Use u16 for local_port field instead of u8

Currently, the local_port field is saved as u8, which means that at most
256 ports can be used.

As preparation for Spectrum-4, which will support more than 256 ports,
local_port field should be extended.

Save local_port as u16 to allow use of additional ports.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: reg: Adjust PPCNT register to support local port 255
Amit Cohen [Wed, 1 Dec 2021 08:12:36 +0000 (10:12 +0200)]
mlxsw: reg: Adjust PPCNT register to support local port 255

Local port 255 has a special meaning in the PPCNT register: it is used to
refer to all local ports. This wildcard ability is not currently used
by the driver.

Special casing local port 255 in Spectrum-4 systems where it is a valid
port is going to be a problem.

Work around this issue by adding and always setting the 'lp_gl' bit
which instructs the device's firmware to treat this local port like an
ordinary port.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: reg: Increase 'port_num' field in PMTDB register
Amit Cohen [Wed, 1 Dec 2021 08:12:35 +0000 (10:12 +0200)]
mlxsw: reg: Increase 'port_num' field in PMTDB register

'port_num' field is used to indicate the local port value which can be
assigned to a module.

Increase the field from 8 bits to 10 bits in order to support more than
255 ports.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: reg: Align existing registers to use extended local_port field
Amit Cohen [Wed, 1 Dec 2021 08:12:34 +0000 (10:12 +0200)]
mlxsw: reg: Align existing registers to use extended local_port field

Add support for 10-bit local ports in device registers by making use of the
MLXSW_ITEM32_LP() macro that was added in the previous patch.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: item: Add support for local_port field in a split form
Amit Cohen [Wed, 1 Dec 2021 08:12:33 +0000 (10:12 +0200)]
mlxsw: item: Add support for local_port field in a split form

Currently, the local_port field uses 8 bits, which means that at most 256
ports can be used.

As preparation for the next ASIC, which will support more than 256
ports, local_port field should be extended to 10 bits.

It is not possible to use 10 consecutive bits in all registers, and
therefore, the field is split into 2 fields:
1. local_port - the existing 8 bits, represent LSB of the extended
   field.
2. lp_msb - extra 2 bits, represent MSB of the extended field.

To avoid complex programming when reading/writing local_port, add a
dedicated macro which creates get and set functions which handle both parts
of local_port.
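
A minimal sketch of the split described above, written as ordinary C functions instead of the driver's item macros. The struct and function names are hypothetical; the real macro generates get and set helpers that operate on a register payload.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two register fields. */
struct demo_lp_fields {
	uint8_t local_port;	/* existing 8 bits, LSB of the extended field */
	uint8_t lp_msb;		/* extra 2 bits, MSB of the extended field */
};

/* Split a 10-bit local port into its two parts. */
static struct demo_lp_fields demo_lp_split(uint16_t port)
{
	struct demo_lp_fields f;

	assert(port < 1024);	/* only 10 bits are available */
	f.local_port = port & 0xff;
	f.lp_msb = (port >> 8) & 0x3;
	return f;
}

/* Reassemble the 10-bit local port from its two parts. */
static uint16_t demo_lp_join(struct demo_lp_fields f)
{
	return (uint16_t)(((f.lp_msb & 0x3) << 8) | f.local_port);
}
```

Callers then work with a single 10-bit value and never touch the two parts directly, which is the point of wrapping both fields behind one accessor pair.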

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: reg: Remove unused functions
Amit Cohen [Wed, 1 Dec 2021 08:12:32 +0000 (10:12 +0200)]
mlxsw: reg: Remove unused functions

The functions mlxsw_reg_sfd_uc_unpack() and
mlxsw_reg_sfd_uc_lag_unpack() are not used. Remove them.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: Bump minimum FW version to xx.2010.1006
Amit Cohen [Wed, 1 Dec 2021 08:12:31 +0000 (10:12 +0200)]
mlxsw: spectrum: Bump minimum FW version to xx.2010.1006

Add latest verified version of Nvidia Spectrum-family switch firmware,
for Spectrum (13.2010.1006), Spectrum-2 (29.2010.1006) and
Spectrum-3 (30.2010.1006).

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Wed, 1 Dec 2021 14:46:03 +0000 (14:46 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2021-11-30

This series contains updates to iavf driver only.

Patryk adds a debug message when MTU is changed.

Grzegorz adds messaging when transitioning in and out of multicast
promiscuous mode.

Jake returns correct error codes for iavf_parse_cls_flower().

Jedrzej adds messaging for when the driver is removed and refactors
struct usage to take less memory. He also adjusts ethtool statistics to
only display information on active queues.

Tony allows the user to specify the RSS hash key.

Karen resolves some static analysis warnings, corrects format specifiers,
and rewords a message to come across as informational.

v2:
- Dropped patch 1 (for net) and 5
- Change MTU message from info to debug
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Wed, 1 Dec 2021 14:42:23 +0000 (14:42 +0000)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-11-30

This series contains updates to ice driver only.

Shiraz corrects assignment of boolean variable and removes an unused
enum.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
David S. Miller [Wed, 1 Dec 2021 14:41:02 +0000 (14:41 +0000)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-11-30

Jesper Dangaard Brouer says:

Changes to fix and enable XDP metadata to a specific Intel driver igc.
Tested with hardware i225 that uses driver igc, while testing AF_XDP
access to metadata area.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: natsemi: fix hw address initialization for jazz and xtensa
Max Filippov [Tue, 30 Nov 2021 14:36:00 +0000 (06:36 -0800)]
net: natsemi: fix hw address initialization for jazz and xtensa

Use eth_hw_addr_set function instead of writing the address directly to
net_device::dev_addr.

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20211130143600.31970-1-jcmvbkbc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomctp: remove unnecessary check before calling kfree_skb()
Yang Yingliang [Tue, 30 Nov 2021 03:12:43 +0000 (11:12 +0800)]
mctp: remove unnecessary check before calling kfree_skb()

The skb will be checked inside kfree_skb(), so remove the
outside check.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20211130031243.768823-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoiavf: Fix displaying queue statistics shown by ethtool
Jedrzej Jagielski [Fri, 17 Sep 2021 08:52:52 +0000 (08:52 +0000)]
iavf: Fix displaying queue statistics shown by ethtool

The driver produced too many lines of output for the ethtool -S command.
Return the actual length of the ethtool stats string set: instead of a
predefined maximal value, use the actual value on the netdev and iterate
over active queues only. Without this patch, the ethtool -S report
contains additional erroneous lines for queues that are not configured.

Signed-off-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: Refactor string format to avoid static analysis warnings
Karen Sornek [Tue, 31 Aug 2021 11:39:01 +0000 (13:39 +0200)]
iavf: Refactor string format to avoid static analysis warnings

Change format to match variable type that is used in string.

Use %u format for unsigned variable and %d format for signed variable
to remove static analysis warnings.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: Refactor text of informational message
Karen Sornek [Tue, 31 Aug 2021 10:12:02 +0000 (12:12 +0200)]
iavf: Refactor text of informational message

This message is intended to be informational to indicate a reset is about
to happen, but the use of "warning" in the message text can cause concern
with users.  Reword the message to make it less alarming.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: Fix static code analysis warning
Karen Sornek [Mon, 30 Aug 2021 08:38:01 +0000 (10:38 +0200)]
iavf: Fix static code analysis warning

Change min() to min_t() to fix static code analysis warning of possible
overflow.
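
The commit text does not show the affected expression, so the following is only a generic illustration of why forcing both operands to one type matters. `demo_min_t` is a simplified stand-in written for this example; it is not the kernel's min_t() macro, which also guards against double evaluation.

```c
#include <assert.h>

/* Simplified stand-in: cast both operands to 'type' before comparing. */
#define demo_min_t(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Comparing a signed and an unsigned operand directly converts the
 * signed one to unsigned, so -1 becomes a very large value.
 */
static int demo_naive_less(int a, unsigned int b)
{
	return a < b;
}
```

Here `demo_naive_less(-1, 5u)` is false because of the implicit conversion, while `demo_min_t(int, -1, 5u)` compares both values as int and yields -1.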

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: Refactor iavf_mac_filter struct memory usage
Jedrzej Jagielski [Mon, 30 Aug 2021 08:25:36 +0000 (08:25 +0000)]
iavf: Refactor iavf_mac_filter struct memory usage

The iavf_mac_filter struct contained a couple of boolean
flags that used more memory than necessary.
Change the flags to be bitfields in an anonymous struct
so all the flags now fit in one byte.
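
A small sketch of the size difference, assuming a compiler that accepts C11 anonymous structs and bit-fields of type unsigned char (GCC and Clang both do). The flag names are hypothetical and are not the fields of iavf_mac_filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One bool per flag: each flag occupies at least one byte. */
struct demo_flags_bools {
	bool is_new;
	bool pending_add;
	bool pending_remove;
};

/* The same flags as 1-bit fields packed into an anonymous struct. */
struct demo_flags_bits {
	struct {
		unsigned char is_new:1;
		unsigned char pending_add:1;
		unsigned char pending_remove:1;
	};
};

/* Bytes saved per object by switching to bit-fields. */
static size_t demo_flags_saving(void)
{
	return sizeof(struct demo_flags_bools) - sizeof(struct demo_flags_bits);
}
```

The members of the anonymous struct are still accessed directly, for example `f.pending_add = 1`, so callers do not need to change how they name the flags.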

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: Enable setting RSS hash key
Tony Nguyen [Fri, 16 Jul 2021 22:16:37 +0000 (15:16 -0700)]
iavf: Enable setting RSS hash key

Driver support for changing the RSS hash key exists, however, checks
have caused it to be reported as unsupported. Remove the check and
allow the hash key to be specified.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
2 years agoiavf: Add trace while removing device
Jedrzej Jagielski [Tue, 22 Jun 2021 13:43:48 +0000 (15:43 +0200)]
iavf: Add trace while removing device

Add a kernel log message when the device is removed.
Currently there is no such information.
For example, when a host admin removes a PCI device from a VM,
the VM should log the event.

This patch adds an info log to the iavf_remove function.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: return errno code instead of status code
Jacob Keller [Fri, 4 Jun 2021 16:53:34 +0000 (09:53 -0700)]
iavf: return errno code instead of status code

The iavf_parse_cls_flower function returns an integer error code, and
not an iavf_status enumeration.

Fix the function to use the standard errno value EINVAL as its return
instead of using IAVF_ERR_CONFIG.
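
The distinction can be illustrated with a small stand-alone example. The enum, its values, and the function below are hypothetical and only mirror the pattern described above: a function declared to return an int error code should return a negative errno value, not a driver-private status.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver-private status codes, unrelated to errno values. */
enum demo_status {
	DEMO_SUCCESS = 0,
	DEMO_ERR_CONFIG = -4,
};

/*
 * Returns 0 on success or a negative errno value, so callers can treat
 * the result like any other kernel-style int return code.
 */
static int demo_parse_filter(const char *spec)
{
	if (!spec || !*spec)
		return -EINVAL;	/* not DEMO_ERR_CONFIG */
	return 0;
}
```

A caller that checks for `-EINVAL` would silently miss the failure if the function returned `DEMO_ERR_CONFIG`, because the two numbering schemes are unrelated.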

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: Log info when VF is entering and leaving Allmulti mode
Grzegorz Szczurek [Fri, 4 Jun 2021 16:53:32 +0000 (09:53 -0700)]
iavf: Log info when VF is entering and leaving Allmulti mode

Add log when VF is entering and leaving Allmulti mode.
The change of VF state is visible in dmesg now.
Without this commit, entering and leaving Allmulti mode
is not logged in dmesg.

Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoiavf: Add change MTU message
Patryk Małek [Fri, 4 Jun 2021 16:53:30 +0000 (09:53 -0700)]
iavf: Add change MTU message

Add a netdev_dbg log entry when the MTU changes so that the user is
notified about the change in the same manner as with the PF driver.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoigc: enable XDP metadata in driver
Jesper Dangaard Brouer [Mon, 15 Nov 2021 20:36:30 +0000 (21:36 +0100)]
igc: enable XDP metadata in driver

Enabling XDP bpf_prog access to the data_meta area is a very small
change. Hint: pass 'true' to xdp_prepare_buff().

The SKB layers can also access the data_meta area, which required more
driver changes to support. Reviewers, notice that the igc driver has two
different functions that can create SKBs, depending on driver config.

Hint for testers, ethtool priv-flags legacy-rx enables
the function igc_construct_skb()

 ethtool --set-priv-flags DEV legacy-rx on

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoigc: AF_XDP zero-copy metadata adjust breaks SKBs on XDP_PASS
Jesper Dangaard Brouer [Mon, 15 Nov 2021 20:36:25 +0000 (21:36 +0100)]
igc: AF_XDP zero-copy metadata adjust breaks SKBs on XDP_PASS

The driver already implicitly supports XDP metadata access in AF_XDP
zero-copy mode, as xsk_buff_pool's xp_alloc() naturally sets xdp_buff
data_meta equal to data.

This works fine for XDP and AF_XDP, but if a BPF prog adjusts the metadata
via bpf_xdp_adjust_meta() and chooses to return XDP_PASS, then the igc
function igc_construct_skb_zc() will construct an invalid SKB packet. The
function correctly includes the xdp->data_meta area in the memcpy, but
forgets to pull the header to take metasize into account.

Fixes: fc9df2a0b520 ("igc: Enable RX via AF_XDP zero-copy")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agonet/ice: Remove unused enum
Shiraz Saleem [Wed, 24 Nov 2021 12:41:36 +0000 (06:41 -0600)]
net/ice: Remove unused enum

Remove the ice_devlink_param_id enum as it's not used.

Suggested-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agonet/ice: Fix boolean assignment
Shiraz Saleem [Wed, 24 Nov 2021 12:41:35 +0000 (06:41 -0600)]
net/ice: Fix boolean assignment

vbool in ice_devlink_enable_roce_get can be assigned to a
non-0/1 constant.

Fix this assignment of vbool to be 0/1.
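
A minimal sketch of the normalization, assuming the value comes from masking a flags word. The flag constant and function names are hypothetical, and the result type is deliberately uint8_t here to make the raw value visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical protocol flag occupying bit 1. */
#define DEMO_PROTO_ROCE 0x2u

/* Raw mask result: yields 0 or 2, not 0 or 1. */
static uint8_t demo_roce_enabled_raw(uint32_t protocols)
{
	return (uint8_t)(protocols & DEMO_PROTO_ROCE);
}

/* Double negation collapses any nonzero value to exactly 1. */
static uint8_t demo_roce_enabled(uint32_t protocols)
{
	return !!(protocols & DEMO_PROTO_ROCE);
}
```

For an input of 0x3 the raw version yields 2 while the normalized version yields 1, which is the 0/1 form the message asks for.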

Fixes: e523af4ee560 ("net/ice: Add support for enable_iwarp and enable_roce devlink param")
Suggested-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agonet: ixp4xx_hss: drop kfree for memory allocated with devm_kzalloc
Wei Yongjun [Tue, 30 Nov 2021 10:48:40 +0000 (10:48 +0000)]
net: ixp4xx_hss: drop kfree for memory allocated with devm_kzalloc

It's not necessary to free memory allocated with devm_kzalloc
and using kfree leads to a double free.

Fixes: 35aefaad326b ("net: ixp4xx_hss: Convert to use DT probing")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: fix mutex_lock not released
Lv Ruyi [Tue, 30 Nov 2021 11:24:43 +0000 (11:24 +0000)]
net: mscc: ocelot: fix mutex_lock not released

If err is nonzero, the function returns without releasing the mutex.
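
The usual shape of such a fix is a single unlock label that every exit path goes through. The sketch below uses a toy lock so that it is self-contained; the lock type and function names are hypothetical and are not the ocelot driver code.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock that only tracks whether it is held. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Every exit path, including the error path, passes through out_unlock. */
static int demo_update(struct demo_lock *lock, int simulate_error)
{
	int err = 0;

	demo_lock_acquire(lock);

	if (simulate_error) {
		err = -EIO;
		goto out_unlock;	/* a bare 'return err' here would leak the lock */
	}

	/* ... work done while holding the lock ... */

out_unlock:
	demo_lock_release(lock);
	return err;
}
```

After either outcome the lock is free again, so a later call does not block or trip the acquire assertion.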

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: make symbol 'hclge_mac_speed_map_to_fw' static
Wei Yongjun [Tue, 30 Nov 2021 11:34:37 +0000 (11:34 +0000)]
net: hns3: make symbol 'hclge_mac_speed_map_to_fw' static

The sparse tool complains as follows:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2656:28: warning:
 symbol 'hclge_mac_speed_map_to_fw' was not declared. Should it be static?

This symbol is not used outside of hclge_main.c, so mark it static.

Fixes: e46da6a3d4d3 ("net: hns3: refine function hclge_cfg_mac_speed_dup_hw()")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'prestera-next'
David S. Miller [Tue, 30 Nov 2021 12:26:01 +0000 (12:26 +0000)]
Merge branch 'prestera-next'

Volodymyr Mytnyk says:

====================
net: prestera: acl: migrate to new vTcam/counter api

This patch series aims to use new vTcam and Counter API
provided by latest fw version. The advantage of using
this API is the following:

- provides a way to have a rule with desired Tcam size (improves
  Tcam memory utilization).
- batch support for acl counters gathering (improves performance)
- gives more control over HW ACL engine (actions/matches/bindings)
  to be able to support more features in the future driver
  versions

Note: the feature set is the same as it was before this patch series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: prestera: acl: add rule stats support
Volodymyr Mytnyk [Tue, 30 Nov 2021 10:33:00 +0000 (12:33 +0200)]
net: prestera: acl: add rule stats support

Make flower use the counter API to get rule HW statistics.

Co-developed-by: Serhiy Boiko <serhiy.boiko@marvell.com>
Signed-off-by: Serhiy Boiko <serhiy.boiko@marvell.com>
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: prestera: add counter HW API
Volodymyr Mytnyk [Tue, 30 Nov 2021 10:32:59 +0000 (12:32 +0200)]
net: prestera: add counter HW API

Add counter API for getting HW statistics.

- HW statistics gathered by this API are delayed.
- Batches of counters are supported.
- ACL stats are supported.

Co-developed-by: Serhiy Boiko <serhiy.boiko@marvell.com>
Signed-off-by: Serhiy Boiko <serhiy.boiko@marvell.com>
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: prestera: acl: migrate to new vTCAM api
Volodymyr Mytnyk [Tue, 30 Nov 2021 10:32:58 +0000 (12:32 +0200)]
net: prestera: acl: migrate to new vTCAM api

- Add new vTCAM HW API to configure HW ACLs.
- Migrate acl to use new vTCAM HW API.
- No counter support in this patch-set.

Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodevlink: Simplify devlink resources unregister call
Leon Romanovsky [Tue, 30 Nov 2021 10:16:20 +0000 (12:16 +0200)]
devlink: Simplify devlink resources unregister call

The devlink_resources_unregister() used second parameter as an
entry point for the recursive removal of devlink resources. None
of the callers outside of devlink core needed to use this field,
so let's remove it.

As part of this removal, the "struct devlink_resource" was moved
from .h to .c file as it is not possible to use in any place in
the code except devlink.c.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mdio: mscc-miim: Set back the optional resource.
Horatiu Vultur [Tue, 30 Nov 2021 09:57:45 +0000 (10:57 +0100)]
net: mdio: mscc-miim: Set back the optional resource.

In the blamed commit, the second memory resource was not considered
anymore as optional. On some platforms like sparx5 the second resource
is optional. So add it back as optional and restore the comment that
says so.

Fixes: a27a762828375a ("net: mdio: mscc-miim: convert to a regmap implementation")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agobond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device
Hangbin Liu [Tue, 30 Nov 2021 07:09:32 +0000 (15:09 +0800)]
bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device

We have VLAN PTP support (via get_ts_info) in the kernel, and bond support
(by getting the active interface via netlink message) in the userspace tool
linuxptp. But there are always some users who want to use PTP with VLAN over
bond, which is not possible with the current implementation.

This patch passes get_ts_info and the SIOC[SG]HWTSTAMP ioctl to the active
device with bond mode active-backup/tlb/alb. With this, users get kernel
native bond or VLAN over bond PTP support.

Test with ptp4l and it works with VLAN over bond after this patch:
]# ptp4l -m -i bond0.23
ptp4l[53377.141]: selected /dev/ptp4 as PTP clock
ptp4l[53377.142]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[53384.127]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[53384.127]: selected local clock e41d2d.fffe.123db0 as best master
ptp4l[53384.127]: port 1: assuming the grand master role

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: cxgb: fix a typo in kernel doc
Jean Sacren [Tue, 30 Nov 2021 07:03:11 +0000 (00:03 -0700)]
net: cxgb: fix a typo in kernel doc

Fix a trivial typo of 'pakcet' in cxgb kernel doc.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: cxgb3: fix typos in kernel doc
Jean Sacren [Tue, 30 Nov 2021 07:03:10 +0000 (00:03 -0700)]
net: cxgb3: fix typos in kernel doc

Fix two trivial typos of 'pakcet' in cxgb3 kernel doc.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoBonding: add arp_missed_max option
Hangbin Liu [Tue, 30 Nov 2021 04:29:47 +0000 (12:29 +0800)]
Bonding: add arp_missed_max option

Currently, we use a hard-coded number to verify if we are in the
arp_interval timeslice. But some users may want to reduce or extend
the verification timeslice. With an option similar to the team option
'missed_max', users can change that number based on their own environment.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lantiq: fix missing free_netdev() on error in ltq_etop_probe()
Yang Yingliang [Tue, 30 Nov 2021 03:38:37 +0000 (11:38 +0800)]
net: lantiq: fix missing free_netdev() on error in ltq_etop_probe()

Add the missing free_netdev() before return from ltq_etop_probe()
in the error handling case.

Fixes: 14d4e308e0aa ("net: lantiq: configure the burst length in ethernet drivers")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipv6: use the new fib6_nh_release_dsts helper in fib6_nh_release
Nikolay Aleksandrov [Mon, 29 Nov 2021 15:44:11 +0000 (17:44 +0200)]
net: ipv6: use the new fib6_nh_release_dsts helper in fib6_nh_release

We can remove a bit of code duplication by reusing the new
fib6_nh_release_dsts helper in fib6_nh_release. Their only difference is
that fib6_nh_release's version doesn't use atomic operation to swap the
pointers because it assumes the fib6_nh is no longer visible, while
fib6_nh_release_dsts can be used anywhere.

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: nexthop: reduce rcu synchronizations when replacing resilient groups
Nikolay Aleksandrov [Mon, 29 Nov 2021 12:09:24 +0000 (14:09 +0200)]
net: nexthop: reduce rcu synchronizations when replacing resilient groups

We can optimize resilient nexthop group replaces by reducing the number of
synchronize_net calls. After commit 1005f19b9357 ("net: nexthop: release
IPv6 per-cpu dsts when replacing a nexthop group") we always do a
synchronize_net because we must ensure no new dsts can be created for the
replaced group's removed nexthops, but we already did that when replacing
resilient groups, so if we always call synchronize_net after any group
type replacement we'll take care of both cases and reduce synchronize_net
calls for resilient groups.

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/tls: simplify the tls_set_sw_offload function
Tianjia Zhang [Mon, 29 Nov 2021 11:10:14 +0000 (19:10 +0800)]
net/tls: simplify the tls_set_sw_offload function

Assigning crypto_info variables in advance can simplify the logic
of accessing value and move related local variables to a smaller
scope.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: stmmac: Add platform level debug register dump feature
Bhupesh Sharma [Sun, 28 Nov 2021 19:58:54 +0000 (01:28 +0530)]
net: stmmac: Add platform level debug register dump feature

dwmac-qcom-ethqos currently exposes a mechanism to dump rgmii registers
after the 'stmmac_dvr_probe()' returns. However with commit
5ec55823438e ("net: stmmac: add clocks management for gmac driver"),
we now let 'pm_runtime_put()' disable the clocks before returning from
'stmmac_dvr_probe()'.

This causes a crash when 'rgmii_dump()' register dumps are enabled,
as the clocks are already off.

Since other dwmac drivers (possible future users as well) might
require a similar register dump feature, introduce a platform level
callback to allow the same.

This fixes the crash noticed while enabling rgmii_dump() dumps in
dwmac-qcom-ethqos driver as well. It also allows future changes
to keep invoking the register dump callback from the correct
place inside 'stmmac_dvr_probe()'.

Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver")
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoethtool: netlink: Slightly simplify 'ethnl_features_to_bitmap()'
Christophe JAILLET [Sun, 28 Nov 2021 11:03:30 +0000 (12:03 +0100)]
ethtool: netlink: Slightly simplify 'ethnl_features_to_bitmap()'

The 'dest' bitmap is fully initialized by the 'for' loop, so there is no
need to explicitly reset it.

This also makes this function in line with 'ethnl_features_to_bitmap32()'
which does not clear the destination before writing it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/17fca158231c6f03689bd891254f0dd1f4e84cb8.1638091829.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ifb: support ethtools stats
Tonghao Zhang [Sun, 28 Nov 2021 01:46:31 +0000 (09:46 +0800)]
net: ifb: support ethtools stats

With this feature, we can use ethtool to get tx/rx
queue stats. This patch introduces the ifb_update_q_stats
helper to update the queue stats, and ifb_q_stats to simplify
the code.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Link: https://lore.kernel.org/r/20211128014631.43627-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agostmmac: remove ethtool driver version info
Heiner Kallweit [Sun, 28 Nov 2021 18:45:56 +0000 (19:45 +0100)]
stmmac: remove ethtool driver version info

I think there's no benefit in reporting a date from almost 6 yrs ago.
Let ethtool report the default (kernel version) instead.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: felix: fix flexible_array.cocci warnings
kernel test robot [Sat, 27 Nov 2021 18:03:20 +0000 (19:03 +0100)]
net: dsa: felix: fix flexible_array.cocci warnings

Zero-length and one-element arrays are deprecated, see
Documentation/process/deprecated.rst
Flexible-array members should be used instead.
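
For readers unfamiliar with the pattern, here is a minimal user-space sketch of a flexible-array member. The struct and function names are hypothetical and are not taken from the felix driver; kernel code would typically size the allocation with struct_size() instead of open-coded arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * 'entries[]' is a flexible-array member: it must be last in the struct
 * and adds no storage of its own, unlike 'entries[0]' or 'entries[1]'.
 */
struct demo_gate_list {
	uint32_t num_entries;
	uint32_t entries[];
};

/* Allocate the header plus room for n trailing entries, zero-filled. */
static struct demo_gate_list *demo_gate_list_alloc(uint32_t n)
{
	struct demo_gate_list *gl;

	gl = calloc(1, sizeof(*gl) + (size_t)n * sizeof(gl->entries[0]));
	if (gl)
		gl->num_entries = n;
	return gl;
}
```

The caller owns the returned memory and releases it with a single free(), since the header and the entries live in one allocation.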

Generated by: scripts/coccinelle/misc/flexible_array.cocci

Fixes: 23ae3a787771 ("net: dsa: felix: add stream gate settings for psfp")
CC: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'hns3-cleanups'
David S. Miller [Mon, 29 Nov 2021 14:26:18 +0000 (14:26 +0000)]
Merge branch 'hns3-cleanups'

Guangbin Huang says:

====================
hns3: some cleanups for -next

To improve code readability and simplicity, this series refactors some
functions in the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: split function hns3_set_l2l3l4()
Yufeng Mo [Mon, 29 Nov 2021 14:00:27 +0000 (22:00 +0800)]
net: hns3: split function hns3_set_l2l3l4()

Function hns3_set_l2l3l4() is a bit too long. So add two
new functions hns3_set_l3_type() and hns3_set_l4_csum_length()
to simplify code and improve code readability.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: split function hns3_handle_bdinfo()
Yufeng Mo [Mon, 29 Nov 2021 14:00:26 +0000 (22:00 +0800)]
net: hns3: split function hns3_handle_bdinfo()

Function hns3_handle_bdinfo() is a bit too long. So add two
new functions hns3_handle_rx_ts_info() and hns3_handle_rx_vlan_tag()
to simplify code and improve code readability.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: split function hns3_nic_get_stats64()
Yufeng Mo [Mon, 29 Nov 2021 14:00:25 +0000 (22:00 +0800)]
net: hns3: split function hns3_nic_get_stats64()

Function hns3_nic_get_stats64() is a bit too long. So add a
new function hns3_fetch_stats() to simplify code and improve
code readability.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: refine function hclge_tm_pri_q_qs_cfg()
Guangbin Huang [Mon, 29 Nov 2021 14:00:24 +0000 (22:00 +0800)]
net: hns3: refine function hclge_tm_pri_q_qs_cfg()

This patch encapsulates the queue to qset config code for the two
modes (tc based and vnet based) into two functions, to make the code more
concise.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: add new function hclge_tm_schd_mode_tc_base_cfg()
Guangbin Huang [Mon, 29 Nov 2021 14:00:23 +0000 (22:00 +0800)]
net: hns3: add new function hclge_tm_schd_mode_tc_base_cfg()

This patch encapsulates the process code of tc based schedule mode of
function hclge_tm_lvl34_schd_mode_cfg() into a new function
hclge_tm_schd_mode_tc_base_cfg(). It makes the code more concise and the new
process code can be reused.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: refine function hclge_cfg_mac_speed_dup_hw()
Guangbin Huang [Mon, 29 Nov 2021 14:00:22 +0000 (22:00 +0800)]
net: hns3: refine function hclge_cfg_mac_speed_dup_hw()

To reuse the code of converting speed of driver to speed of firmware in
function hclge_cfg_mac_speed_dup_hw(), encapsulate them into a new
function hclge_convert_to_fw_speed().

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: split function hns3_get_tx_timeo_queue_info()
Yufeng Mo [Mon, 29 Nov 2021 14:00:21 +0000 (22:00 +0800)]
net: hns3: split function hns3_get_tx_timeo_queue_info()

Function hns3_get_tx_timeo_queue_info() is a bit too long. So add two
new functions hns3_dump_queue_stats() and hns3_dump_queue_reg() to
simplify code and improve code readability.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: refactor two hns3 debugfs functions
Jie Wang [Mon, 29 Nov 2021 14:00:20 +0000 (22:00 +0800)]
net: hns3: refactor two hns3 debugfs functions

Use for loops to simplify the printing code in functions
hclge_dbg_dump_rst_info() and hclge_dbg_dump_mac_enable_status().

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: refactor hns3_nic_reuse_page()
Hao Chen [Mon, 29 Nov 2021 14:00:19 +0000 (22:00 +0800)]
net: hns3: refactor hns3_nic_reuse_page()

Split rx copybreak handle into a separate function from function
hns3_nic_reuse_page() to improve code simplicity.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: refactor reset_prepare_general retry statement
Jiaran Zhang [Mon, 29 Nov 2021 14:00:18 +0000 (22:00 +0800)]
net: hns3: refactor reset_prepare_general retry statement

Currently, the hclge_reset_prepare_general function uses the goto
statement to jump upwards, which increases code complexity and makes
the program structure difficult to understand. In addition, if
reset_pending is set, retry_cnt is not incremented, so the retry may
never exit or may run more times than intended.

Use the while statement instead to make the program easier to understand
and solve the problem that the goto statement cannot be exited.
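
The shape of such a bounded retry can be sketched in plain C. This is not
the hclge code: MAX_RETRIES, prepare_with_retry() and the two stand-in
callbacks are hypothetical, and the point is only that the counter is
advanced on every failed pass so the loop always terminates:

```c
#define MAX_RETRIES 5	/* hypothetical bound, not the driver's value */

/* Stand-in for the prepare step: fails on the first two attempts. */
static int fails_twice(int attempt)
{
	return attempt < 2 ? -1 : 0;
}

/* Stand-in for a prepare step that never succeeds. */
static int always_fails(int attempt)
{
	(void)attempt;
	return -1;
}

/*
 * Returns the number of failed attempts before success, or -1 once the
 * retry budget is exhausted.
 */
static int prepare_with_retry(int (*try_prepare)(int attempt))
{
	int retry_cnt = 0;

	while (retry_cnt < MAX_RETRIES) {
		if (try_prepare(retry_cnt) == 0)
			return retry_cnt;
		retry_cnt++;	/* counted on every failed pass */
	}
	return -1;
}
```

With the stand-ins above, prepare_with_retry(fails_twice) succeeds on the
third attempt, while prepare_with_retry(always_fails) gives up after
MAX_RETRIES attempts instead of spinning.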

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: snmp: add statistics for tcp small queue check
Menglong Dong [Sun, 28 Nov 2021 06:01:02 +0000 (14:01 +0800)]
net: snmp: add statistics for tcp small queue check

When the tcp small queue check fails in tcp_small_queue_check(), the
throughput of tcp is limited, and it is hard to tell whether the limit
comes from this check or from tcp congestion control.

Add the LINUX_MIB_TCPSMALLQUEUEFAILURE statistic for this case.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodevlink: Remove misleading internal_flags from health reporter dump
Leon Romanovsky [Sun, 28 Nov 2021 12:14:46 +0000 (14:14 +0200)]
devlink: Remove misleading internal_flags from health reporter dump

DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET command doesn't have a .doit callback
and makes no use of internal_flags at all. Remove this misleading assignment.

Fixes: e44ef4e4516c ("devlink: Hang reporter's dump method on a dumpit cb")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'seville-shared-mdio'
David S. Miller [Mon, 29 Nov 2021 13:02:01 +0000 (13:02 +0000)]
Merge branch 'seville-shared-mdio'

Colin Foster says:

====================
update seville to use shared MDIO driver

This patch set exposes and utilizes the shared MDIO bus in
drivers/net/mdio/mdio-mscc-miim.c

v3:
    * Fix errors using uninitialized "dev" inside the probe function.
    * Remove phy_regmap from the setup function, since it currently
    isn't used
    * Remove GCB_PHY_PHY_CFG definition from ocelot.h - it isn't used
    yet...

v2:
    * Error handling (thanks Andrew Lunn)
    * Fix logic errors calling mscc_miim_setup during patch 1/3 (thanks
    Jakub Kicinski)
    * Remove unnecessary felix_mdio file (thanks Vladimir Oltean)
    * Pass NULL to mscc_miim_setup instead of GCB_PHY_PHY_CFG, since the
    phy reset isn't handled at that point of the Seville driver (patch
    3/3)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: ocelot: felix: utilize shared mscc-miim driver for indirect MDIO access
Colin Foster [Mon, 29 Nov 2021 01:57:37 +0000 (17:57 -0800)]
net: dsa: ocelot: felix: utilize shared mscc-miim driver for indirect MDIO access

Switch to a shared MDIO access implementation by way of the mdio-mscc-miim
driver.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: ocelot: seville: utilize of_mdiobus_register
Colin Foster [Mon, 29 Nov 2021 01:57:36 +0000 (17:57 -0800)]
net: dsa: ocelot: seville: utilize of_mdiobus_register

Switch seville to use of_mdiobus_register(bus, NULL) instead of just
mdiobus_register. This code is about to be pulled into a separate module
that can optionally define ports by the device_node.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mdio: mscc-miim: convert to a regmap implementation
Colin Foster [Mon, 29 Nov 2021 01:57:35 +0000 (17:57 -0800)]
net: mdio: mscc-miim: convert to a regmap implementation

Utilize regmap instead of __iomem to perform indirect mdio access. This
will allow for custom regmaps to be used by way of the mscc_miim_setup
function.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'lan966x-driver'
David S. Miller [Mon, 29 Nov 2021 12:58:39 +0000 (12:58 +0000)]
Merge branch 'lan966x-driver'

Horatiu Vultur says:

====================
net: lan966x: Add lan966x switch driver

This patch series adds support for the Microchip lan966x driver.

The lan966x switch is a multi-port Gigabit AVB/TSN Ethernet Switch with
two integrated 10/100/1000Base-T PHYs. In addition to the integrated PHYs,
it supports up to 2 RGMII/RMII, up to 3 BASE-X/SERDES/2.5GBASE-X and up to
2 Quad-SGMII/Quad-USGMII interfaces.

Initially it adds support only for the ports to behave as simple
NIC cards. In the future patches it would be extended with other
functionality like Switchdev, PTP, Frame DMA, VCAP, etc.

v4->v5:
- more fixes to the reset of the switch, require all resources before
  activating the hardware
- fix to lan966x-switch binding
- implement get/set_pauseparam in ethtool_ops
- stop calling lan966x_port_link_down when calling lan966x_port_pcs_set and
  call it in lan966x_phylink_mac_link_down

v3->v4:
- add timeouts when injecting/extracting frames, in case the HW breaks
- simplify the creation of the IFH
- fix the order of operations in lan966x_cleanup_ports
- fixes to phylink based on Russell's review

v2->v3:
- fix compiling issues for x86
- fix resource management in first patch

v1->v2:
- add new patch for MAINTAINERS
- add functions lan966x_mac_cpu_learn/forget
- fix build issues with second patch
- fix the reset of the switch, return error if there is no reset controller
- start to use phylink_mii_c22_pcs_decode_state and
  phylink_mii_c22_pcs_encode_advertisement to remove duplicate code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: Update MAINTAINERS to include lan966x driver
Horatiu Vultur [Mon, 29 Nov 2021 12:43:59 +0000 (13:43 +0100)]
net: lan966x: Update MAINTAINERS to include lan966x driver

Update MAINTAINERS to include lan966x driver

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: add ethtool configuration and statistics
Horatiu Vultur [Mon, 29 Nov 2021 12:43:58 +0000 (13:43 +0100)]
net: lan966x: add ethtool configuration and statistics

This patch adds support for statistics counters for the network
interfaces. It also adds support for configuring network interface
settings such as speed and duplex via ethtool.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: add mactable support
Horatiu Vultur [Mon, 29 Nov 2021 12:43:57 +0000 (13:43 +0100)]
net: lan966x: add mactable support

This patch adds support for MAC table operations like add and forget.
It also adds the functionality to read the MAC address from DT; if no
MAC address is set in DT, a random one is used.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: add port module support
Horatiu Vultur [Mon, 29 Nov 2021 12:43:56 +0000 (13:43 +0100)]
net: lan966x: add port module support

This patch adds support for netdev and phylink in the switch. The
injection + extraction is register based. This will be replaced with DMA
access.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: add the basic lan966x driver
Horatiu Vultur [Mon, 29 Nov 2021 12:43:55 +0000 (13:43 +0100)]
net: lan966x: add the basic lan966x driver

This patch adds basic SwitchDev driver framework for lan966x. It
includes only the IO range mapping and probing of the switch.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: lan966x: Add lan966x-switch bindings
Horatiu Vultur [Mon, 29 Nov 2021 12:43:54 +0000 (13:43 +0100)]
dt-bindings: net: lan966x: Add lan966x-switch bindings

Document the lan966x switch device driver bindings

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ixp4xx_hss: Convert to use DT probing
Linus Walleij [Mon, 22 Nov 2021 22:35:30 +0000 (23:35 +0100)]
net: ixp4xx_hss: Convert to use DT probing

IXP4xx is being migrated to device tree only. Convert this
driver to use device tree probing.

Pull in all the boardfile code from the one boardfile and
make it local, pull all the boardfile parameters from the
device tree instead of the board file.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: Add bindings for IXP4xx V.35 WAN HSS
Linus Walleij [Mon, 22 Nov 2021 22:35:29 +0000 (23:35 +0100)]
dt-bindings: net: Add bindings for IXP4xx V.35 WAN HSS

This adds device tree bindings for the IXP4xx V.35 WAN high
speed serial (HSS) link.

An example is added to the NPE example where the HSS appears
as a child.

Cc: devicetree@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: rtl8365mb: set RGMII RX delay in steps of 0.3 ns
Alvin Šipraga [Mon, 29 Nov 2021 10:30:19 +0000 (11:30 +0100)]
net: dsa: rtl8365mb: set RGMII RX delay in steps of 0.3 ns

A contact at Realtek has clarified what exactly the units of RGMII RX
delay are. The answer is that the unit of RX delay is "about 0.3 ns".
Take this into account when parsing rx-internal-delay-ps by
approximating the closest step value. Delays of more than 2.1 ns are
rejected.

This obviously contradicts the previous assumption in the driver that a
step value of 4 was "about 2 ns", but Realtek also points out that it is
easy to find more than one RX delay step value which makes RGMII work.
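
The rounding described above can be sketched as follows. This is an
illustration rather than the driver code: the macro and function names are
hypothetical, and the sketch assumes the 2.1 ns limit is checked against the
requested value before rounding:

```c
#define RX_DELAY_STEP_PS 300	/* one step is "about 0.3 ns" */
#define RX_DELAY_MAX_PS  2100	/* 7 steps; anything above is rejected */

/*
 * Convert an rx-internal-delay-ps value to the closest step value.
 * Returns a step in the range 0..7, or -1 if the delay is too large.
 */
static int rx_delay_ps_to_step(unsigned int delay_ps)
{
	if (delay_ps > RX_DELAY_MAX_PS)
		return -1;

	/* Round to the nearest multiple of the step size. */
	return (int)((delay_ps + RX_DELAY_STEP_PS / 2) / RX_DELAY_STEP_PS);
}
```

Under these assumptions a request of 2000 ps maps to step 7 (2.1 ns), which
is the closest available value, and a request of 2101 ps is rejected.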

Fixes: 4af2950c50c8 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Cc: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: rtl8365mb: fix garbled comment
Alvin Šipraga [Mon, 29 Nov 2021 10:30:18 +0000 (11:30 +0100)]
net: dsa: rtl8365mb: fix garbled comment

Fixes: 4af2950c50c8 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: realtek-smi: don't log an error on EPROBE_DEFER
Alvin Šipraga [Mon, 29 Nov 2021 10:30:17 +0000 (11:30 +0100)]
net: dsa: realtek-smi: don't log an error on EPROBE_DEFER

Probe deferral is not an error, so don't log this as an error:

[0.590156] realtek-smi ethernet-switch: unable to register switch ret = -517
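
The pattern is to filter out the deferral code before logging. A minimal
userspace sketch, in which report_register_failure() is a hypothetical
helper and EPROBE_DEFER is redefined locally to match the -517 in the log
line above:

```c
#include <stdio.h>

#define EPROBE_DEFER 517	/* kernel-internal errno, redefined for the sketch */

/*
 * Log a registration failure unless it is a probe deferral.
 * Returns 1 if a message was written, 0 if it was suppressed.
 */
static int report_register_failure(FILE *log, int ret)
{
	if (ret == -EPROBE_DEFER)
		return 0;	/* probe will be retried later: not an error */

	fprintf(log, "unable to register switch ret = %d\n", ret);
	return 1;
}
```

Inside the kernel, the dev_err_probe() helper exists for this purpose;
whether this particular patch uses it is not stated in the message.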

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: bridge: fix typo in vlan_filtering dependency test
Ivan Vecera [Mon, 29 Nov 2021 09:58:50 +0000 (10:58 +0100)]
selftests: net: bridge: fix typo in vlan_filtering dependency test

Prior patch:
# TESTS=vlmc_filtering_test ./bridge_vlan_mcast.sh
TEST: Vlan multicast snooping enable                                [ OK ]
Device "bridge" does not exist.
TEST: Disable multicast vlan snooping when vlan filtering is disabled   [FAIL]
        Vlan filtering is disabled but multicast vlan snooping is still enabled

After patch:
# TESTS=vlmc_filtering_test ./bridge_vlan_mcast.sh
TEST: Vlan multicast snooping enable                                [ OK ]
TEST: Disable multicast vlan snooping when vlan filtering is disabled   [ OK ]

Fixes: f5a9dd58f48b7c ("selftests: net: bridge: add test for vlan_filtering dependency")
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mpls-cleanups'
David S. Miller [Mon, 29 Nov 2021 12:46:52 +0000 (12:46 +0000)]
Merge branch 'mpls-cleanups'

Benjamin Poirier says:

====================
net: mpls: Cleanup nexthop iterator macros

The mpls macros for_nexthops and change_nexthops were probably copied
from decnet or ipv4 but they grew a superfluous variable and lost a
beneficial "const".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mpls: Make for_nexthops iterator const
Benjamin Poirier [Mon, 29 Nov 2021 06:23:16 +0000 (15:23 +0900)]
net: mpls: Make for_nexthops iterator const

There are separate for_nexthops and change_nexthops iterators. The
for_nexthops variant should use const.
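
A simplified sketch of the two iterator flavours follows. The structures
here are hypothetical and much smaller than the real mpls ones; only the
const/non-const split of the nh pointer is the point:

```c
/* Hypothetical, simplified types for illustration only. */
struct nexthop {
	int weight;
};

struct route {
	int num_nh;
	struct nexthop nh[4];
};

/* Read-only walk: nh points to const, so the body cannot modify it. */
#define for_nexthops(rt) {						\
	int nhsel; const struct nexthop *nh;				\
	for (nhsel = 0, nh = (rt)->nh; nhsel < (rt)->num_nh; nh++, nhsel++)

/* Mutating walk: nh is writable. */
#define change_nexthops(rt) {						\
	int nhsel; struct nexthop *nh;					\
	for (nhsel = 0, nh = (rt)->nh; nhsel < (rt)->num_nh; nh++, nhsel++)

#define endfor_nexthops(rt) }

static int total_weight(const struct route *rt)
{
	int sum = 0;

	for_nexthops(rt) {
		sum += nh->weight;	/* writing through nh would not compile */
	} endfor_nexthops(rt);

	return sum;
}

static void double_weights(struct route *rt)
{
	change_nexthops(rt) {
		nh->weight *= 2;
	} endfor_nexthops(rt);
}
```

With the const variant, an accidental write inside a for_nexthops body
becomes a compile-time error instead of a silent modification.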

Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mpls: Remove duplicate variable from iterator macro
Benjamin Poirier [Mon, 29 Nov 2021 06:23:15 +0000 (15:23 +0900)]
net: mpls: Remove duplicate variable from iterator macro

__nh is just a copy of nh with a different type.

Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'qualcomm-bam-dmux'
David S. Miller [Mon, 29 Nov 2021 12:27:34 +0000 (12:27 +0000)]
Merge branch 'qualcomm-bam-dmux'

Stephan Gerhold says:

====================
net: wwan: Add Qualcomm BAM-DMUX WWAN network driver

The BAM Data Multiplexer provides access to the network data channels
of modems integrated into many older Qualcomm SoCs, e.g. Qualcomm MSM8916
or MSM8974. This series adds a driver that allows using it.

All the changes in this patch series are based on a quite complicated
driver from Qualcomm [1]. The driver has been used in postmarketOS [2]
on various smartphones/tablets based on Qualcomm MSM8916 and MSM8974
for more than a year now with no reported problems. It works out of
the box with open-source WWAN userspace such as ModemManager.

[1]: https://source.codeaurora.org/quic/la/kernel/msm-3.10/tree/drivers/soc/qcom/bam_dmux.c?h=LA.BR.1.2.9.1-02310-8x16.0
[2]: https://postmarketos.org/

Changes in v3:
  - Clarify DT schema based on discussion
  - Drop bam_dma/dmaengine patches since they already landed in 5.16
  - Rebase on net-next
  - Simplify cover letter and commit messages

Changes in v2:
  - Rename "qcom,remote-power-collapse" -> "qcom,powered-remotely"
  - Rebase on net-next and fix conflicts
  - Rename network interfaces from "rmnet%d" -> "wwan%d"
  - Fix wrong file name in MAINTAINERS entry
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: Add Qualcomm BAM-DMUX WWAN network driver
Stephan Gerhold [Sat, 27 Nov 2021 17:31:08 +0000 (18:31 +0100)]
net: wwan: Add Qualcomm BAM-DMUX WWAN network driver

The BAM Data Multiplexer provides access to the network data channels of
modems integrated into many older Qualcomm SoCs, e.g. Qualcomm MSM8916 or
MSM8974. It is built using a simple protocol layer on top of a DMA engine
(Qualcomm BAM) and bidirectional interrupts to coordinate power control.

The modem announces a fixed set of channels by sending an OPEN command.
The driver exports each channel as a separate network interface so that
a connection can be established via QMI from userspace. The network
interface can work either in Ethernet or Raw-IP mode (configurable via
QMI). However, Ethernet mode seems to be broken with most firmwares
(network packets are actually received as Raw-IP), therefore the driver
only supports Raw-IP mode.

Note that the control channel (QMI/AT) is entirely separate from
BAM-DMUX and is already supported by the RPMSG_WWAN_CTRL driver.

The driver uses runtime PM to coordinate power control with the modem.
TX/RX buffers are put in a kind of "ring queue" and submitted via
the bam_dma driver of the DMAEngine subsystem.

The basic architecture looks roughly like this:

                   +------------+                +-------+
         [IPv4/6]  |  BAM-DMUX  |                |       |
         [Data...] |            |                |       |
        ---------->|wwan0       | [DMUX chan: x] |       |
         [IPv4/6]  | (chan: 0)  | [IPv4/6]       |       |
         [Data...] |            | [Data...]      |       |
        ---------->|wwan1       |--------------->| Modem |
                   | (chan: 1)  |      BAM       |       |
         [IPv4/6]  | ...        |  (DMA Engine)  |       |
         [Data...] |            |                |       |
        ---------->|wwan7       |                |       |
                   | (chan: 7)  |                |       |
                   +------------+                +-------+

Note that some newer firmware versions support QMAP ("rmnet" driver)
as an additional multiplexing layer on top of BAM-DMUX, but this is not
currently supported by this driver.

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: Add schema for Qualcomm BAM-DMUX
Stephan Gerhold [Sat, 27 Nov 2021 17:31:07 +0000 (18:31 +0100)]
dt-bindings: net: Add schema for Qualcomm BAM-DMUX

The BAM Data Multiplexer provides access to the network data channels of
modems integrated into many older Qualcomm SoCs, e.g. Qualcomm MSM8916 or
MSM8974. It is built using a simple protocol layer on top of a DMA engine
(Qualcomm BAM) and bidirectional interrupts to coordinate power control.

The device tree node combines the incoming interrupt with the outgoing
interrupts (smem-states) as well as the two DMA channels, which allows
the BAM-DMUX driver to request all necessary resources.

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'vxlan-port'
David S. Miller [Mon, 29 Nov 2021 12:19:53 +0000 (12:19 +0000)]
Merge branch 'vxlan-port'

Guangbin Huang says:

====================
net: vxlan: add macro definition for number of IANA VXLAN-GPE port

This series adds a macro definition for the IANA VXLAN-GPE port number as
a cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: use macro IANA_VXLAN_GPE_UDP_PORT to replace number 4790
Hao Chen [Sat, 27 Nov 2021 09:34:05 +0000 (17:34 +0800)]
net: hns3: use macro IANA_VXLAN_GPE_UDP_PORT to replace number 4790

This patch uses macro IANA_VXLAN_GPE_UDP_PORT to replace number 4790 for
cleanup.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: vxlan: add macro definition for number of IANA VXLAN-GPE port
Hao Chen [Sat, 27 Nov 2021 09:34:04 +0000 (17:34 +0800)]
net: vxlan: add macro definition for number of IANA VXLAN-GPE port

Add macro definition for number of IANA VXLAN-GPE port for generic use.
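
For illustration, a named constant replaces the bare 4790 at the point of
use. The macro name and value come from this series; the helper function is
hypothetical:

```c
#include <stdint.h>

#define IANA_VXLAN_GPE_UDP_PORT 4790

/* Hypothetical helper: dst_port is expected in host byte order. */
static int is_vxlan_gpe_port(uint16_t dst_port)
{
	return dst_port == IANA_VXLAN_GPE_UDP_PORT;
}
```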

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Write lock dev_base_lock without disabling bottom halves.
Sebastian Andrzej Siewior [Fri, 26 Nov 2021 16:15:29 +0000 (17:15 +0100)]
net: Write lock dev_base_lock without disabling bottom halves.

The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.

On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock is acquired with
disabled bottom halves (as in write_lock_bh()) and somewhere else with
enabled bottom halves (as by read_lock() in netstat_show()) followed by
disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
-> spin_lock_bh()). This is the reverse locking order.

All read_lock() invocations are from sysfs callbacks, which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.

Acquire the write lock of dev_base_lock without disabling bottom halves.

Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/l2tp: convert tunnel rwlock_t to rcu
Tom Parkin [Fri, 26 Nov 2021 16:09:03 +0000 (16:09 +0000)]
net/l2tp: convert tunnel rwlock_t to rcu

Previously commit e02d494d2c60 ("l2tp: Convert rwlock to RCU") converted
most, but not all, rwlock instances in the l2tp subsystem to RCU.

The remaining rwlock protects the per-tunnel hashlist of sessions which
is used for session lookups in the UDP-encap data path.

Convert the remaining rwlock to rcu to improve performance of UDP-encap
tunnels.

Note that the tunnel and session, which both live on RCU-protected
lists, use slightly different approaches to incrementing their refcounts
in the various getter functions.

The tunnel has to use refcount_inc_not_zero because the tunnel shutdown
process involves dropping the refcount to zero prior to synchronizing
RCU readers (via kfree_rcu).

By contrast, the session shutdown removes the session from the list(s)
it is on, synchronizes with readers, and then decrements the session
refcount.  Since the getter functions increment the session refcount
with the RCU read lock held we prevent getters seeing a zero session
refcount, and therefore don't need to use refcount_inc_not_zero.
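
The "increment unless zero" semantics the tunnel getters rely on can be
sketched with C11 atomics. This is a userspace illustration and not the
kernel's refcount_inc_not_zero() implementation; ref_inc_not_zero() is a
hypothetical name:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still live.
 * Returns true if the count was non-zero and has been incremented,
 * false if the count had already dropped to zero.
 */
static bool ref_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

A reader that finds the object on an RCU-protected list but gets false back
must treat the object as already gone, which is the tunnel case described
above.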

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mvneta-next'
David S. Miller [Mon, 29 Nov 2021 12:05:52 +0000 (12:05 +0000)]
Merge branch 'mvneta-next'

Maxime Chevallier says:

====================
net: mvneta: mqprio cleanups and shaping support

This is the second version of the series that adds some improvements to the
existing mqprio implementation in mvneta, and adds support for
egress shaping offload.

The first 3 patches are some minor cleanups, such as using the
tc_mqprio_qopt_offload structure to get access to more offloading
options, cleaning the logic to detect whether or not we should offload
mqprio setting, and allowing a 1 to N mapping between TCs and
queues.

The last patch adds traffic shaping offload, using mvneta's per-queue
token buckets, which allow limiting rates from 10Kbps up to 5Gbps in
10Kbps increments.

This was tested only on an Armada 3720, with traffic up to 2.5Gbps.

Changes since V1 fix the build for 32-bit kernels, using the right
div helpers as suggested by Jakub.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mvneta: Add TC traffic shaping offload
Maxime Chevallier [Fri, 26 Nov 2021 11:20:56 +0000 (12:20 +0100)]
net: mvneta: Add TC traffic shaping offload

The mvneta controller is able to do some token-bucket per-queue traffic
shaping. This commit adds support for setting these using the TC mqprio
interface.

The token-bucket parameters are customisable, but the current
implementation configures them to have a 10kbps resolution for the
rate limitation, since this covers the whole range of max_rate
values from 10kbps to 5Gbps with 10kbps increments.
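
The arithmetic behind that range can be sketched as below. The names are
hypothetical, the rate is assumed to be given in bits per second, and how
the resulting unit count is programmed into the hardware is not covered by
this message:

```c
#include <stdint.h>

#define RATE_STEP_BPS 10000ULL		/* 10 kbps resolution */
#define RATE_MAX_BPS  5000000000ULL	/* 5 Gbps upper bound */

/*
 * Convert a rate in bits per second to a count of 10 kbps units.
 * Returns 0 if the rate is outside the 10 kbps .. 5 Gbps range.
 * Rates that are not a multiple of 10 kbps are rounded down.
 */
static uint32_t rate_to_units(uint64_t bps)
{
	if (bps < RATE_STEP_BPS || bps > RATE_MAX_BPS)
		return 0;

	/* 64-bit division; at most 500000, so the result fits in 32 bits. */
	return (uint32_t)(bps / RATE_STEP_BPS);
}
```

5 Gbps does not fit in 32 bits, so the division is 64-bit here. In kernel
code built for 32-bit targets such a division needs the dedicated div
helpers, which is presumably the build issue the cover letter mentions.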

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mvneta: Allow having more than one queue per TC
Maxime Chevallier [Fri, 26 Nov 2021 11:20:55 +0000 (12:20 +0100)]
net: mvneta: Allow having more than one queue per TC

The current mqprio implementation assumed that we are only using one
queue per TC. Use the offset and count parameters to allow using
multiple queues per TC. In that case, the controller will use a standard
round-robin algorithm to pick queues assigned to the same TC, with the
same priority.
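
A small model of the offset/count mapping, with hypothetical names. The
round-robin selection itself is done by the controller; tc_next_queue()
only models the order in which queues of one TC would be visited:

```c
/* Queues [offset, offset + count) belong to one traffic class. */
struct tc_queues {
	unsigned int offset;
	unsigned int count;	/* assumed to be at least 1 */
};

/* Bitmap of the queues assigned to a TC (assumes queue numbers below 32). */
static unsigned int tc_queue_mask(struct tc_queues tc)
{
	unsigned int mask = 0, q;

	for (q = tc.offset; q < tc.offset + tc.count; q++)
		mask |= 1u << q;
	return mask;
}

/* Next queue after prev within the same TC, wrapping around. */
static unsigned int tc_next_queue(struct tc_queues tc, unsigned int prev)
{
	return tc.offset + (prev - tc.offset + 1) % tc.count;
}
```

For a TC with offset 2 and count 3, the mask covers queues 2, 3 and 4, and
the visiting order is 2, 3, 4, 2, and so on.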

This only applies to VLAN priorities in ingress traffic, each TC
corresponding to a vlan priority.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mvneta: Don't force-set the offloading flag
Maxime Chevallier [Fri, 26 Nov 2021 11:20:54 +0000 (12:20 +0100)]
net: mvneta: Don't force-set the offloading flag

The qopt->hw flag is set by the TC code according to the offloading mode
asked by user. Don't force-set it in the driver, but instead read it to
make sure we do what's asked.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mvneta: Use struct tc_mqprio_qopt_offload for MQPrio configuration
Maxime Chevallier [Fri, 26 Nov 2021 11:20:53 +0000 (12:20 +0100)]
net: mvneta: Use struct tc_mqprio_qopt_offload for MQPrio configuration

The struct tc_mqprio_qopt_offload is a container for struct tc_mqprio_qopt,
that allows passing extra parameters, such as traffic shaping. This commit
converts the current mqprio code to that new struct.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mdio: ipq8064: replace ioremap() with devm_ioremap()
Yang Yingliang [Fri, 26 Nov 2021 09:13:40 +0000 (17:13 +0800)]
net: mdio: ipq8064: replace ioremap() with devm_ioremap()

Use devm_ioremap() instead of ioremap() to avoid a missing iounmap().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'af_unix-replace-unix_table_lock-with-per-hash-locks'
Jakub Kicinski [Sat, 27 Nov 2021 02:02:02 +0000 (18:02 -0800)]
Merge branch 'af_unix-replace-unix_table_lock-with-per-hash-locks'

Kuniyuki Iwashima says:

====================
af_unix: Replace unix_table_lock with per-hash locks.

The hash table of AF_UNIX sockets is protected by a single big lock,
unix_table_lock.  This series replaces it with small per-hash locks.
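
The general idea can be sketched in userspace with one small lock per hash
bucket. This is not the af_unix code: HASH_SIZE and all function names are
hypothetical, and simple spin locks built on C11 atomic_flag stand in for
kernel spinlocks:

```c
#include <stdatomic.h>

#define HASH_SIZE 256	/* hypothetical number of buckets */

/* One lock per bucket instead of a single table-wide lock. */
static atomic_flag bucket_locks[HASH_SIZE];

static void bucket_locks_init(void)
{
	unsigned int i;

	for (i = 0; i < HASH_SIZE; i++)
		atomic_flag_clear(&bucket_locks[i]);
}

/* Returns 1 if the bucket lock was taken, 0 if it is already held. */
static int bucket_trylock(unsigned int hash)
{
	return !atomic_flag_test_and_set_explicit(
			&bucket_locks[hash % HASH_SIZE], memory_order_acquire);
}

static void bucket_lock(unsigned int hash)
{
	while (!bucket_trylock(hash))
		;	/* spin until the holder releases this bucket */
}

static void bucket_unlock(unsigned int hash)
{
	atomic_flag_clear_explicit(&bucket_locks[hash % HASH_SIZE],
				   memory_order_release);
}
```

Two sockets whose names hash to different buckets no longer contend with
each other; only operations on the same bucket serialize.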

1st -  2nd : Misc refactoring
3rd -  8th : Separate BSD/abstract address logics
9th - 11th : Prep to save a hash in each socket
12th       : Replace the big lock
13th       : Speed up autobind()

Note to maintainers:
The 12th patch adds two kinds of Sparse warnings on patchwork:

  about unix_table_double_lock/unlock()
    We can avoid this by adding two apparent acquires/releases annotations,
    but there are the same kinds of warnings about unix_state_double_lock().

  about unix_next_socket() and unix_seq_stop() (/proc/net/unix)
    This is because Sparse does not understand logic in unix_next_socket(),
    which leaves a spin lock held until it returns NULL.
    Also, tcp_seq_stop() causes a warning for the same reason.

These warnings seem reasonable, but let me know if there is any better way.
Please see [0] for details.

[0]: https://lore.kernel.org/netdev/20211117001611.74123-1-kuniyu@amazon.co.jp/
====================

Link: https://lore.kernel.org/r/20211124021431.48956-1-kuniyu@amazon.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoaf_unix: Relax race in unix_autobind().
Kuniyuki Iwashima [Wed, 24 Nov 2021 02:14:31 +0000 (11:14 +0900)]
af_unix: Relax race in unix_autobind().

When we bind an AF_UNIX socket without a name specified, the kernel selects
an available one from 0x00000 to 0xFFFFF.  unix_autobind() starts searching
from a number in the 'static' variable and increments it after acquiring
two locks.

If multiple processes try autobind, they obtain the same lock and check if
a socket in the hash list has the same name.  If not, one process uses it,
and all except one end up retrying the _next_ number (actually not, it may
be incremented by the other processes).  The more we autobind sockets in
parallel, the longer the latency gets.  We can avoid such a race by
searching for a name from a random number.
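
The search can be sketched as a wrapping linear probe from a caller-chosen
start. This is an illustration only: the function names and the in_use
callback are hypothetical, and the caller is assumed to pass a random start
value:

```c
#include <stdbool.h>
#include <stdint.h>

#define AUTOBIND_MASK 0xFFFFFu	/* names range from 0x00000 to 0xFFFFF */

/* Hypothetical lookup: returns true if the name is already bound. */
typedef bool (*name_in_use_fn)(uint32_t name);

/* Demo stand-ins for the lookup. */
static bool taken_below_16(uint32_t name) { return name < 16; }
static bool only_zero_free(uint32_t name) { return name != 0; }

/*
 * Probe names starting at start, wrapping at 0xFFFFF.
 * Returns the first free name, or -1 if every name is taken.
 */
static int32_t pick_autobind_name(uint32_t start, name_in_use_fn in_use)
{
	uint32_t name = start & AUTOBIND_MASK;
	uint32_t tried;

	for (tried = 0; tried <= AUTOBIND_MASK; tried++) {
		if (!in_use(name))
			return (int32_t)name;
		name = (name + 1) & AUTOBIND_MASK;
	}
	return -1;
}
```

When each caller starts from a different random point, concurrent callers
rarely probe the same name first, which is what reduces the retries
described above.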

These show the latency in unix_autobind() while 64 CPUs are simultaneously
autobinding 1024 sockets each.

  Without this patch:

     usec          : count     distribution
        0          : 1176     |***                                     |
        2          : 3655     |***********                             |
        4          : 4094     |*************                           |
        6          : 3831     |************                            |
        8          : 3829     |************                            |
        10         : 3844     |************                            |
        12         : 3638     |***********                             |
        14         : 2992     |*********                               |
        16         : 2485     |*******                                 |
        18         : 2230     |*******                                 |
        20         : 2095     |******                                  |
        22         : 1853     |*****                                   |
        24         : 1827     |*****                                   |
        26         : 1677     |*****                                   |
        28         : 1473     |****                                    |
        30         : 1573     |*****                                   |
        32         : 1417     |****                                    |
        34         : 1385     |****                                    |
        36         : 1345     |****                                    |
        38         : 1344     |****                                    |
        40         : 1200     |***                                     |

  With this patch:

     usec          : count     distribution
        0          : 1855     |******                                  |
        2          : 6464     |*********************                   |
        4          : 9936     |********************************        |
        6          : 12107    |****************************************|
        8          : 10441    |**********************************      |
        10         : 7264     |***********************                 |
        12         : 4254     |**************                          |
        14         : 2538     |********                                |
        16         : 1596     |*****                                   |
        18         : 1088     |***                                     |
        20         : 800      |**                                      |
        22         : 670      |**                                      |
        24         : 601      |*                                       |
        26         : 562      |*                                       |
        28         : 525      |*                                       |
        30         : 446      |*                                       |
        32         : 378      |*                                       |
        34         : 337      |*                                       |
        36         : 317      |*                                       |
        38         : 314      |*                                       |
        40         : 298      |                                        |

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years ago  af_unix: Replace the big lock with small locks.
Kuniyuki Iwashima [Wed, 24 Nov 2021 02:14:30 +0000 (11:14 +0900)]
af_unix: Replace the big lock with small locks.

The hash table of AF_UNIX sockets is protected by a single lock.  This
patch replaces it with per-hash locks.
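
A minimal userspace sketch of per-hash locking, for illustration only.
The table size, the struct layout, the function names and the C11
atomic_flag spin lock are all assumptions made for this example; the
kernel uses its own spinlock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_SIZE 256u   /* assumed bucket count, for illustration only */

struct sock_node {
	unsigned int hash;        /* bucket index, saved at insert time */
	struct sock_node *next;
};

struct bucket {
	atomic_flag lock;         /* one small lock per hash bucket */
	struct sock_node *head;
};

static struct bucket table[TABLE_SIZE];

/* Put every bucket lock into the unlocked state. */
static void table_init(void)
{
	for (unsigned int i = 0; i < TABLE_SIZE; i++) {
		atomic_flag_clear(&table[i].lock);
		table[i].head = NULL;
	}
}

static void bucket_lock(unsigned int hash)
{
	while (atomic_flag_test_and_set_explicit(&table[hash].lock,
						 memory_order_acquire))
		;  /* spin until the current holder releases the bucket */
}

static void bucket_unlock(unsigned int hash)
{
	atomic_flag_clear_explicit(&table[hash].lock, memory_order_release);
}

/* Insert while holding only the lock of the bucket being modified. */
static void insert_node(struct sock_node *n, unsigned int hash)
{
	hash %= TABLE_SIZE;
	n->hash = hash;

	bucket_lock(hash);
	n->next = table[hash].head;
	table[hash].head = n;
	bucket_unlock(hash);
}

/* Count the entries in one bucket, again under that bucket's lock only. */
static size_t bucket_len(unsigned int hash)
{
	size_t len = 0;

	hash %= TABLE_SIZE;
	bucket_lock(hash);
	for (struct sock_node *n = table[hash].head; n; n = n->next)
		len++;
	bucket_unlock(hash);

	return len;
}
```

Two inserts contend only when their hashes map to the same bucket, which
is where the reduced latency under parallel load comes from.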

The effect is noticeable when we handle multiple sockets simultaneously.
Here is a test result on an EC2 c5.24xlarge instance.  It shows latency
(under 10us only) in unix_insert_unbound_socket() while 64 CPUs each create
1024 sockets in parallel.

  Without this patch:

     nsec          : count     distribution
        0          : 179      |                                        |
        500        : 3021     |*********                               |
        1000       : 6271     |*******************                     |
        1500       : 6318     |*******************                     |
        2000       : 5828     |*****************                       |
        2500       : 5124     |***************                         |
        3000       : 4426     |*************                           |
        3500       : 3672     |***********                             |
        4000       : 3138     |*********                               |
        4500       : 2811     |********                                |
        5000       : 2384     |*******                                 |
        5500       : 2023     |******                                  |
        6000       : 1954     |*****                                   |
        6500       : 1737     |*****                                   |
        7000       : 1749     |*****                                   |
        7500       : 1520     |****                                    |
        8000       : 1469     |****                                    |
        8500       : 1394     |****                                    |
        9000       : 1232     |***                                     |
        9500       : 1138     |***                                     |
        10000      : 994      |***                                     |

  With this patch:

     nsec          : count     distribution
        0          : 1634     |****                                    |
        500        : 13170    |****************************************|
        1000       : 13156    |*************************************** |
        1500       : 9010     |***************************             |
        2000       : 6363     |*******************                     |
        2500       : 4443     |*************                           |
        3000       : 3240     |*********                               |
        3500       : 2549     |*******                                 |
        4000       : 1872     |*****                                   |
        4500       : 1504     |****                                    |
        5000       : 1247     |***                                     |
        5500       : 1035     |***                                     |
        6000       : 889      |**                                      |
        6500       : 744      |**                                      |
        7000       : 634      |*                                       |
        7500       : 498      |*                                       |
        8000       : 433      |*                                       |
        8500       : 355      |*                                       |
        9000       : 336      |*                                       |
        9500       : 284      |                                        |
        10000      : 243      |                                        |

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years ago  af_unix: Save hash in sk_hash.
Kuniyuki Iwashima [Wed, 24 Nov 2021 02:14:29 +0000 (11:14 +0900)]
af_unix: Save hash in sk_hash.

To replace unix_table_lock with per-hash locks in the next patch, we need
to save a hash in each socket because /proc/net/unix and BPF programs
iterate over sockets while holding a hash table lock and release it later
in a different function.

Currently, we store a real/pseudo hash in struct unix_address.  However, we
do not allocate one for unbound sockets, nor should we do so just for that.  For
this purpose, we can use sk_hash.  Then, we no longer use the hash field in
struct unix_address and can remove it.
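
The reason the hash has to live in the socket can be sketched as follows.
This is a simplified, single-threaded userspace model, not the kernel
code: struct sk, lock_held[], iter_first() and iter_stop() are
hypothetical names, and a plain flag stands in for each per-hash lock.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 4u   /* tiny table, for illustration only */

struct sk {
	unsigned int sk_hash;   /* bucket index saved at insert time */
	struct sk *next;
};

static struct sk *buckets[NBUCKETS];
static int lock_held[NBUCKETS];   /* stand-in for the per-hash locks */

/* Remember the bucket in the socket itself, even if it has no address. */
static void insert(struct sk *s, unsigned int hash)
{
	s->sk_hash = hash % NBUCKETS;
	s->next = buckets[s->sk_hash];
	buckets[s->sk_hash] = s;
}

/*
 * Return the first socket at or after bucket `from`.  The lock of the
 * bucket that holds the returned socket is left held on purpose.
 */
static struct sk *iter_first(unsigned int from)
{
	for (unsigned int i = from; i < NBUCKETS; i++) {
		lock_held[i] = 1;
		if (buckets[i])
			return buckets[i];
		lock_held[i] = 0;
	}
	return NULL;
}

/*
 * Called later from a different function that only knows the socket:
 * the saved hash tells it which bucket lock to release.
 */
static void iter_stop(const struct sk *s)
{
	if (s)
		lock_held[s->sk_hash] = 0;
}
```

With a single global lock the stop function needs no per-socket state;
with per-hash locks it must be able to recover the bucket from the socket
alone.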

This patch also makes the following changes:
  - rename unix_insert_socket() to unix_insert_unbound_socket()
  - remove the redundant list argument from __unix_insert_socket() and
     unix_insert_unbound_socket()
  - use 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash()
  - remove 'inline' from unix_remove_socket() and
     unix_insert_unbound_socket().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>