platform/kernel/linux-starfive.git
2 years agonet/smc: Unbind r/w buffer size from clcsock and make them tunable
Tony Lu [Tue, 20 Sep 2022 09:52:22 +0000 (17:52 +0800)]
net/smc: Unbind r/w buffer size from clcsock and make them tunable

Currently, SMC uses smc->sk.sk_{rcv|snd}buf to create buffers for
send buffer and RMB. And the values of buffer size are from tcp_{w|r}mem
in clcsock.

The buffer size from TCP socket doesn't fit SMC well. Generally, buffers
are usually larger than TCP for SMC-R/-D to get higher performance, for
they are different underlay devices and paths.

So this patch unbinds buffer size from TCP, and introduces two sysctl
knobs to tune them independently. Also, these knobs are per net
namespace and work for containers.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet/smc: Introduce a specific sysctl for TEST_LINK time
Wen Gu [Tue, 20 Sep 2022 09:52:21 +0000 (17:52 +0800)]
net/smc: Introduce a specific sysctl for TEST_LINK time

SMC-R tests the viability of link by sending out TEST_LINK LLC
messages over RoCE fabric when connections on link have been
idle for a time longer than keepalive interval (testlink time).

But using tcp_keepalive_time as testlink time maybe not quite
suitable because it is default no less than two hours[1], which
is too long for single link to find peer dead. The active host
will still use peer-dead link (QP) sending messages, and can't
find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes
more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally.

So this patch introduces a independent sysctl for SMC-R to set
link keepalive time, in order to detect link down in time. The
default value is 30 seconds.

[1] https://www.rfc-editor.org/rfc/rfc1122#page-101

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ethernet: altera: TSE: fix error return code in altera_tse_probe()
Sun Ke [Tue, 20 Sep 2022 02:00:41 +0000 (10:00 +0800)]
net: ethernet: altera: TSE: fix error return code in altera_tse_probe()

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: fef2998203e1 ("net: altera: tse: convert to phylink")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sun Ke <sunke32@huawei.com>
Link: https://lore.kernel.org/r/20220920020041.2685948-1-sunke32@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter...
Jakub Kicinski [Thu, 22 Sep 2022 01:42:54 +0000 (18:42 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter patches for net-next

Remove GPL license copypastry in uapi files, those have SPDX tags.
From Christophe Jaillet.

Remove unused variable in rpfilter, from Guillaume Nault.

Rework gc resched delay computation in conntrack, from Antoine Tenart.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: rpfilter: Remove unused variable 'ret'.
  headers: Remove some left-over license text in include/uapi/linux/netfilter/
  netfilter: conntrack: revisit the gc initial rescheduling bias
  netfilter: conntrack: fix the gc rescheduling delay
====================

Link: https://lore.kernel.org/r/20220921095000.29569-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoethernet: tundra: Drop forward declaration of static functions
Uwe Kleine-König [Mon, 19 Sep 2022 13:15:15 +0000 (15:15 +0200)]
ethernet: tundra: Drop forward declaration of static functions

Usually it's not necessary to declare static functions if the symbols are
in the right order. Moving the definition of tsi_eth_driver down in the
compilation unit allows to drop two such declarations.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20220919131515.885361-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoflow_dissector: Do not count vlan tags inside tunnel payload
Qingqing Yang [Mon, 19 Sep 2022 07:48:08 +0000 (15:48 +0800)]
flow_dissector: Do not count vlan tags inside tunnel payload

We've met the problem that when there is a vlan tag inside
GRE encapsulation, the match of num_of_vlans fails.
It is caused by the vlan tag inside GRE payload has been
counted into num_of_vlans, which is not expected.

One example packet is like this:
Ethernet II, Src: Broadcom_68:56:07 (00:10:18:68:56:07)
                   Dst: Broadcom_68:56:08 (00:10:18:68:56:08)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
Internet Protocol Version 4, Src: 192.168.1.4, Dst: 192.168.1.200
Generic Routing Encapsulation (Transparent Ethernet bridging)
Ethernet II, Src: Broadcom_68:58:07 (00:10:18:68:58:07)
                   Dst: Broadcom_68:58:08 (00:10:18:68:58:08)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 200
...
It should match the (num_of_vlans 1) rule, but it matches
the (num_of_vlans 2) rule.

The vlan tags inside the GRE or other tunnel encapsulated payload
should not be taken into num_of_vlans.
The fix is to stop counting the vlan number when the encapsulation
bit is set.

Fixes: 34951fcf26c5 ("flow_dissector: Add number of vlan tags dissector")
Signed-off-by: Qingqing Yang <qingqing.yang@broadcom.com>
Reviewed-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Link: https://lore.kernel.org/r/20220919074808.136640-1-qingqing.yang@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sched: remove unused tcf_result extension
Jamal Hadi Salim [Mon, 19 Sep 2022 13:06:27 +0000 (13:06 +0000)]
net: sched: remove unused tcf_result extension

Added by:
commit e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible")
but no longer useful.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220919130627.3551233-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'clean-up-ocelot_reset-routine'
Jakub Kicinski [Thu, 22 Sep 2022 01:29:38 +0000 (18:29 -0700)]
Merge branch 'clean-up-ocelot_reset-routine'

Colin Foster says:

====================
clean up ocelot_reset() routine

ocelot_reset() will soon be exported to a common library to be used by
the ocelot_ext system. This will make error values from regmap calls
possible, so they must be checked. Additionally, readx_poll_timeout()
can be substituted for the custom loop, as a simple cleanup.

I don't have hardware to verify this set directly, but there shouldn't
be any functional changes.
====================

Link: https://lore.kernel.org/r/20220917175127.161504-1-colin.foster@in-advantage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: check return values of writes during reset
Colin Foster [Sat, 17 Sep 2022 17:51:27 +0000 (10:51 -0700)]
net: mscc: ocelot: check return values of writes during reset

The ocelot_reset() function utilizes regmap_field_write() but wasn't
checking return values. While this won't cause issues for the current MMIO
regmaps, it could be an issue for externally controlled interfaces.

Add checks for these return values.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: utilize readx_poll_timeout() for chip reset
Colin Foster [Sat, 17 Sep 2022 17:51:26 +0000 (10:51 -0700)]
net: mscc: ocelot: utilize readx_poll_timeout() for chip reset

Clean up the reset code by utilizing readx_poll_timeout instead of a custom
loop.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-ll_temac-cleanup-for-clearing-static-warnings'
Jakub Kicinski [Thu, 22 Sep 2022 01:25:22 +0000 (18:25 -0700)]
Merge branch 'net-ll_temac-cleanup-for-clearing-static-warnings'

Haoyue Xu says:

====================
net: ll_temac: Cleanup for clearing static warnings

Most static warnings are detected by Checkpatch.pl, mainly about:
(1) #1: About the comments.
(2) #2: About function name in a string.
(3) #3: About the open parenthesis.
(4) #4: About the else branch.
(6) #6: About trailing statements.
(7) #5,#7: About blank lines and spaces.
====================

Link: https://lore.kernel.org/r/20220917103843.526877-1-xuhaoyue1@hisilicon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: axienet: delete unnecessary blank lines and spaces
huangjunxian [Sat, 17 Sep 2022 10:38:43 +0000 (18:38 +0800)]
net: ll_temac: axienet: delete unnecessary blank lines and spaces

Cleaning some static warnings of indent.

Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: move trailing statements to next line
huangjunxian [Sat, 17 Sep 2022 10:38:42 +0000 (18:38 +0800)]
net: ll_temac: move trailing statements to next line

Cleaning some static warnings of trailing statements.

Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: fix the missing spaces around '='
huangjunxian [Sat, 17 Sep 2022 10:38:41 +0000 (18:38 +0800)]
net: ll_temac: fix the missing spaces around '='

Cleaning some static warnings of missing spaces around '='.

Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: delete unnecessary else branch
huangjunxian [Sat, 17 Sep 2022 10:38:40 +0000 (18:38 +0800)]
net: ll_temac: delete unnecessary else branch

Cleaning some static warnings of unnecessary else branch.

Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: axienet: align with open parenthesis
huangjunxian [Sat, 17 Sep 2022 10:38:39 +0000 (18:38 +0800)]
net: ll_temac: axienet: align with open parenthesis

Cleaning some static warnings of open parenthesis.

Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: Cleanup for function name in a string
Haoyue Xu [Sat, 17 Sep 2022 10:38:38 +0000 (18:38 +0800)]
net: ll_temac: Cleanup for function name in a string

As Checkpatch.pl warns, prefer using '"%s...", __func__'
to using 'temac_device_reset', this function's name, in a string.

Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: fix the format of block comments
huangjunxian [Sat, 17 Sep 2022 10:38:37 +0000 (18:38 +0800)]
net: ll_temac: fix the format of block comments

Cleaning some static warnings of block comments.

Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: macvtap: add __init/__exit annotations to module init/exit funcs
Xiu Jianfeng [Sat, 17 Sep 2022 08:35:35 +0000 (16:35 +0800)]
net: macvtap: add __init/__exit annotations to module init/exit funcs

Add missing __init/__exit annotations to module init/exit funcs.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220917083535.20040-1-xiujianfeng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: hns3: add __init/__exit annotations to module init/exit funcs
Xiu Jianfeng [Sat, 17 Sep 2022 08:21:18 +0000 (16:21 +0800)]
net: hns3: add __init/__exit annotations to module init/exit funcs

Add missing __init/__exit annotations to module init/exit funcs.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220917082118.7971-1-xiujianfeng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sched: simplify code in mall_reoffload
William Dean [Sat, 17 Sep 2022 06:35:56 +0000 (14:35 +0800)]
net: sched: simplify code in mall_reoffload

such expression:
if (err)
return err;
return 0;
can simplify to:
return err;

Signed-off-by: William Dean <williamsukatube@163.com>
Link: https://lore.kernel.org/r/20220917063556.2673-1-williamsukatube@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: ti: am65-cpsw: remove unused parameter of am65_cpsw_nuss_common_open()
Jian Shen [Sat, 17 Sep 2022 02:04:51 +0000 (10:04 +0800)]
net: ethernet: ti: am65-cpsw: remove unused parameter of am65_cpsw_nuss_common_open()

The inptu parameter 'features' is unused now. so remove it.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Link: https://lore.kernel.org/r/20220917020451.63417-1-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: fjes: Reorder symbols to get rid of a few forward declarations
Uwe Kleine-König [Sat, 17 Sep 2022 22:51:42 +0000 (00:51 +0200)]
net: fjes: Reorder symbols to get rid of a few forward declarations

Quite a few of the functions and other symbols defined in this driver had
forward declarations. They can all be dropped after reordering them.

This saves a few lines of code and reduces code duplication.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20220917225142.473770-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-hns3-updates-for-next'
Jakub Kicinski [Wed, 21 Sep 2022 21:32:23 +0000 (14:32 -0700)]
Merge branch 'net-hns3-updates-for-next'

Guangbin Huang says:

====================
net: hns3: updates for -next

This series includes some updates for the HNS3 ethernet driver.
====================

Link: https://lore.kernel.org/r/20220916023803.23756-1-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: hns3: add judge fd ability for sync and clear process of flow director
Guangbin Huang [Fri, 16 Sep 2022 02:38:03 +0000 (10:38 +0800)]
net: hns3: add judge fd ability for sync and clear process of flow director

Currently, driver will always clear user defined field of flow director
in uninit process and sync flow director table in periodic task. However,
if hardware does not support flow director driver should not do those
processes, so add fd ability judgement.

The fd ability judgement in function hclge_clear_fd_rules_in_list() is
redundant, so delete it.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: hns3: refactor function hclge_mbx_handler()
Hao Lan [Fri, 16 Sep 2022 02:38:02 +0000 (10:38 +0800)]
net: hns3: refactor function hclge_mbx_handler()

Currently, the function hclge_mbx_handler() has too many switch-case
statements, it makes this function too long. To improve code readability,
refactor this function and use lookup table instead.

Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: hns3: optimize converting dscp to priority process of hns3_nic_select_queue()
Guangbin Huang [Fri, 16 Sep 2022 02:38:01 +0000 (10:38 +0800)]
net: hns3: optimize converting dscp to priority process of hns3_nic_select_queue()

Currently, when function hns3_nic_select_queue() converts dscp to priority,
it calls an indirect callback ae_algo->ops->get_dscp_prio to get priority.

However as function hns3_nic_select_queue() is in fast path, the indirect
call may cause degradation of performance. For optimization, this patch
moves dscp_app_cnt and dscp_prio from struct hclge_tm_info to struct
hnae3_knic_private_info, so they can be used in both hclge and hns3 layers.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: hns3: add support for external loopback test
Yonglong Liu [Fri, 16 Sep 2022 02:38:00 +0000 (10:38 +0800)]
net: hns3: add support for external loopback test

This patch add support for external loopback test.
The successful test need the link is up with duplex full. The
driver do external loopback first, and then the whole offline
test.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodocs: net: add an explanation of VF (and other) Representors
Edward Cree [Mon, 5 Sep 2022 13:55:57 +0000 (14:55 +0100)]
docs: net: add an explanation of VF (and other) Representors

There's no clear explanation of what VF Representors are for, their
 semantics, etc., outside of vendor docs and random conference slides.
Add a document explaining Representors and defining what drivers that
 implement them are expected to do.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220905135557.39233-1-ecree@xilinx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-dev_err_probe'
David S. Miller [Wed, 21 Sep 2022 12:50:03 +0000 (13:50 +0100)]
Merge branch 'net-dev_err_probe'

Yang Yingliang says:

====================
net: drivers: Switch to use dev_err_probe() helper

In the probe path, dev_err() can be replace with dev_err_probe()
which will check if error code is -EPROBE_DEFER. It will print
error code in a human readable way and simplify the code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ll_temac: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 11:42:14 +0000 (19:42 +0800)]
net: ll_temac: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: stmmac: dwc-qos: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 11:42:13 +0000 (19:42 +0800)]
net: stmmac: dwc-qos: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ibm: emac: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 11:42:12 +0000 (19:42 +0800)]
net: ibm: emac: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: lantiq: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 11:42:11 +0000 (19:42 +0800)]
net: dsa: lantiq: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: cpsw_new: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 11:42:10 +0000 (19:42 +0800)]
net: ethernet: ti: cpsw_new: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: cpsw: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 11:42:09 +0000 (19:42 +0800)]
net: ethernet: ti: cpsw: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpts: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 11:42:08 +0000 (19:42 +0800)]
net: ethernet: ti: am65-cpts: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/af_packet: registration process optimization in packet_init()
Ziyang Xuan [Thu, 15 Sep 2022 01:08:35 +0000 (09:08 +0800)]
net/af_packet: registration process optimization in packet_init()

Now, register_pernet_subsys() and register_netdevice_notifier() are both
after sock_register(). It can create PF_PACKET socket and process socket
once sock_register() successfully. It is possible PF_PACKET socket is
creating but register_pernet_subsys() and register_netdevice_notifier()
are not registered yet. Thus net->packet.sklist_lock and net->packet.sklist
will be accessed without initialization that is done in packet_net_init().
Although this is a low probability scenario.

Move register_pernet_subsys() and register_netdevice_notifier() to the
front in packet_init(). Correspondingly, adjust the unregister process
in packet_exit().

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sched: act_ct: remove redundant variable err
Jinpeng Cui [Tue, 13 Sep 2022 16:13:26 +0000 (16:13 +0000)]
net: sched: act_ct: remove redundant variable err

Return value directly from pskb_trim_rcsum() instead of
getting value from redundant variable err.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonetfilter: rpfilter: Remove unused variable 'ret'.
Guillaume Nault [Thu, 8 Sep 2022 17:29:23 +0000 (19:29 +0200)]
netfilter: rpfilter: Remove unused variable 'ret'.

Commit 91a178258aea ("netfilter: rpfilter: Convert
rpfilter_lookup_reverse to new dev helper") removed the need for the
'ret' variable. This went unnoticed because of the __maybe_unused
annotation.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2 years agoheaders: Remove some left-over license text in include/uapi/linux/netfilter/
Christophe JAILLET [Sun, 11 Sep 2022 12:03:09 +0000 (14:03 +0200)]
headers: Remove some left-over license text in include/uapi/linux/netfilter/

When the SPDX-License-Identifier tag has been added, the corresponding
license text has not been removed.

Remove it now.

Also, in xt_connmark.h, move the copyright text at the top of the file
which is a much more common pattern.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>
2 years agonetfilter: conntrack: revisit the gc initial rescheduling bias
Antoine Tenart [Fri, 16 Sep 2022 09:29:41 +0000 (11:29 +0200)]
netfilter: conntrack: revisit the gc initial rescheduling bias

The previous commit changed the way the rescheduling delay is computed
which has a side effect: the bias is now represented as much as the
other entries in the rescheduling delay which makes the logic to kick in
only with very large sets, as the initial interval is very large
(INT_MAX).

Revisit the GC initial bias to allow more frequent GC for smaller sets
while still avoiding wakeups when a machine is mostly idle. We're moving
from a large initial value to pretending we have 100 entries expiring at
the upper bound. This way only a few entries having a small timeout
won't impact much the rescheduling delay and non-idle machines will have
enough entries to lower the delay when needed. This also improves
readability as the initial bias is now linked to what is computed
instead of being an arbitrary large value.

Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2 years agonetfilter: conntrack: fix the gc rescheduling delay
Antoine Tenart [Fri, 16 Sep 2022 09:29:40 +0000 (11:29 +0200)]
netfilter: conntrack: fix the gc rescheduling delay

Commit 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
changed the eviction rescheduling to the use average expiry of scanned
entries (within 1-60s) by doing:

  for (...) {
      expires = clamp(nf_ct_expires(tmp), ...);
      next_run += expires;
      next_run /= 2;
  }

The issue is the above will make the average ('next_run' here) more
dependent on the last expiration values than the firsts (for sets > 2).
Depending on the expiration values used to compute the average, the
result can be quite different than what's expected. To fix this we can
do the following:

  for (...) {
      expires = clamp(nf_ct_expires(tmp), ...);
      next_run += (expires - next_run) / ++count;
  }

Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2 years agoliquidio: CN23XX: delete repeated words, add missing words and fix typo in comment
Ruffalo Lavoisier [Mon, 19 Sep 2022 05:34:46 +0000 (14:34 +0900)]
liquidio: CN23XX: delete repeated words, add missing words and fix typo in comment

- Delete the repeated word 'to' in the comment.

- Add the missing 'use' word within the sentence.

- Correct spelling on 'malformation', 'needs'.

Signed-off-by: Ruffalo Lavoisier <RuffaloLavoisier@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20220919053447.5702-1-RuffaloLavoisier@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoocteontx2-pf: Fix unused variable build error
Ren Zhijie [Mon, 19 Sep 2022 02:58:40 +0000 (10:58 +0800)]
octeontx2-pf: Fix unused variable build error

If CONFIG_DCB is not set,
make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-,
will be failed, like this:

drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c: In function ‘otx2_select_queue’:
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c:1886:19: error: unused variable ‘pf’ [-Werror=unused-variable]
  struct otx2_nic *pf = netdev_priv(netdev);
                   ^~
cc1: all warnings being treated as errors

To fix this build error, put the definition of *pf under the CONFIG_DCB.

Fixes: 99c969a83d82 ("octeontx2-pf: Add egress PFC support")
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Link: https://lore.kernel.org/r/20220919025840.256411-1-renzhijie2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoice: Add low latency Tx timestamp read
Karol Kolacinski [Fri, 16 Sep 2022 20:17:28 +0000 (13:17 -0700)]
ice: Add low latency Tx timestamp read

E810 products can support low latency Tx timestamp register read.
This requires usage of threaded IRQ instead of kthread to reduce the
kthread start latency (spikes up to 20 ms).
Add a check for the device capability and use the new method if
supported.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220916201728.241510-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'add-a-secondary-at-port-to-the-telit-fn990'
Jakub Kicinski [Tue, 20 Sep 2022 23:07:14 +0000 (16:07 -0700)]
Merge branch 'add-a-secondary-at-port-to-the-telit-fn990'

Fabio Porcedda says:

====================
Add a secondary AT port to the Telit FN990

In order to add a secondary AT port to the Telit FN990 first add "DUN2"
to mhi_wwan_ctrl.c, after that add a seconday AT port to the
Telit FN990 in pci_generic.c
====================

Link: https://lore.kernel.org/r/20220916144329.243368-1-fabio.porcedda@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobus: mhi: host: pci_generic: Add a secondary AT port to Telit FN990
Fabio Porcedda [Fri, 16 Sep 2022 14:43:29 +0000 (16:43 +0200)]
bus: mhi: host: pci_generic: Add a secondary AT port to Telit FN990

Add a secondary AT port using one of OEM reserved channel.

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: wwan: mhi_wwan_ctrl: Add DUN2 to have a secondary AT port
Fabio Porcedda [Fri, 16 Sep 2022 14:43:28 +0000 (16:43 +0200)]
net: wwan: mhi_wwan_ctrl: Add DUN2 to have a secondary AT port

In order to have a secondary AT port add "DUN2".

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'refactor-duplicate-codes-in-the-tc-cls-walk-function'
Jakub Kicinski [Tue, 20 Sep 2022 22:54:46 +0000 (15:54 -0700)]
Merge branch 'refactor-duplicate-codes-in-the-tc-cls-walk-function'

Zhengchao Shao says:

====================
refactor duplicate codes in the tc cls walk function

The walk implementation of most tc cls modules is basically the same.
That is, the values of count and skip are checked first. If count is
greater than or equal to skip, the registered fn function is executed.
Otherwise, increase the value of count. So the code can be refactored.
Then use helper function to replace the code of each cls module in
alphabetical order.

The walk function is invoked during dump. Therefore, test cases related
 to the tdc filter need to be added.
====================

Link: https://lore.kernel.org/r/20220916020251.190097-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/tc-testings: add list case for basic filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:51 +0000 (10:02 +0800)]
selftests/tc-testings: add list case for basic filter

Test 0811: Add multiple basic filter with cmp ematch u8/link layer and
default action and dump them
Test 5129: List basic filters

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/tc-testings: add selftests for tcindex filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:50 +0000 (10:02 +0800)]
selftests/tc-testings: add selftests for tcindex filter

Test 8293: Add tcindex filter with default action
Test 7281: Add tcindex filter with hash size and pass action
Test b294: Add tcindex filter with mask shift and reclassify action
Test 0532: Add tcindex filter with pass_on and continue actions
Test d473: Add tcindex filter with pipe action
Test 2940: Add tcindex filter with miltiple actions
Test 1893: List tcindex filters
Test 2041: Change tcindex filter with pass action
Test 9203: Replace tcindex filter with pass action
Test 7957: Delete tcindex filter with drop action

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/tc-testings: add selftests for rsvp filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:49 +0000 (10:02 +0800)]
selftests/tc-testings: add selftests for rsvp filter

Test 2141: Add rsvp filter with tcp proto and specific IP address
Test 5267: Add rsvp filter with udp proto and specific IP address
Test 2819: Add rsvp filter with src ip and src port
Test c967: Add rsvp filter with tunnelid and continue action
Test 5463: Add rsvp filter with tunnel and pipe action
Test 2332: Add rsvp filter with miltiple actions
Test 8879: Add rsvp filter with tunnel and skp flag
Test 8261: List rsvp filters
Test 8989: Delete rsvp filter

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/tc-testings: add selftests for route filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:48 +0000 (10:02 +0800)]
selftests/tc-testings: add selftests for route filter

Test e122: Add route filter with from and to tag
Test 6573: Add route filter with fromif and to tag
Test 1362: Add route filter with to flag and reclassify action
Test 4720: Add route filter with from flag and continue actions
Test 2812: Add route filter with form tag and pipe action
Test 7994: Add route filter with miltiple actions
Test 4312: List route filters
Test 2634: Delete route filter with pipe action

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/tc-testings: add selftests for flow filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:47 +0000 (10:02 +0800)]
selftests/tc-testings: add selftests for flow filter

Test 5294: Add flow filter with map key and ops
Test 3514: Add flow filter with map key or ops
Test 7534: Add flow filter with map key xor ops
Test 4524: Add flow filter with map key rshift ops
Test 0230: Add flow filter with map key addend ops
Test 2344: Add flow filter with src map key
Test 9304: Add flow filter with proto map key
Test 9038: Add flow filter with proto-src map key
Test 2a03: Add flow filter with proto-dst map key
Test a073: Add flow filter with iif map key
Test 3b20: Add flow filter with priority map key
Test 8945: Add flow filter with mark map key
Test c034: Add flow filter with nfct map key
Test 0205: Add flow filter with nfct-src map key
Test 5315: Add flow filter with nfct-src map key
Test 7849: Add flow filter with nfct-proto-src map key
Test 9902: Add flow filter with nfct-proto-dst map key
Test 6742: Add flow filter with rt-classid map key
Test 5432: Add flow filter with sk-uid map key
Test 4134: Add flow filter with sk-gid map key
Test 4522: Add flow filter with vlan-tag map key
Test 4253: Add flow filter with rxhash map key
Test 4452: Add flow filter with hash key list
Test 4341: Add flow filter with muliple ops
Test 4392: List flow filters
Test 4322: Change flow filter with map key num
Test 2320: Replace flow filter with map key num
Test 3213: Delete flow filter with map key num

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/tc-testings: add selftests for cgroup filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:46 +0000 (10:02 +0800)]
selftests/tc-testings: add selftests for cgroup filter

Test 6273: Add cgroup filter with cmp ematch u8/link layer and drop action
Test 4721: Add cgroup filter with cmp ematch u8/link layer with trans
flag and pass action
Test d392: Add cgroup filter with cmp ematch u16/link layer and pipe action
Test 0234: Add cgroup filter with cmp ematch u32/link layer and miltiple
actions
Test 8499: Add cgroup filter with cmp ematch u8/network layer and pass
action
Test b273: Add cgroup filter with cmp ematch u8/network layer with trans
flag and drop action
Test 1934: Add cgroup filter with cmp ematch u16/network layer and pipe
action
Test 2733: Add cgroup filter with cmp ematch u32/network layer and
miltiple actions
Test 3271: Add cgroup filter with NOT cmp ematch rule and pass action
Test 2362: Add cgroup filter with two ANDed cmp ematch rules and single
action
Test 9993: Add cgroup filter with two ORed cmp ematch rules and single
action
Test 2331: Add cgroup filter with two ANDed cmp ematch rules and one ORed
ematch rule and single action
Test 3645: Add cgroup filter with two ANDed cmp ematch rules and one NOT
ORed ematch rule and single action
Test b124: Add cgroup filter with u32 ematch u8/zero offset and drop
action
Test 7381: Add cgroup filter with u32 ematch u8/zero offset and invalid
value >0xFF
Test 2231: Add cgroup filter with u32 ematch u8/positive offset and drop
action
Test 1882: Add cgroup filter with u32 ematch u8/invalid mask >0xFF
Test 1237: Add cgroup filter with u32 ematch u8/missing offset
Test 3812: Add cgroup filter with u32 ematch u8/missing AT keyword
Test 1112: Add cgroup filter with u32 ematch u8/missing value
Test 3241: Add cgroup filter with u32 ematch u8/non-numeric value
Test e231: Add cgroup filter with u32 ematch u8/non-numeric mask
Test 4652: Add cgroup filter with u32 ematch u8/negative offset and pass
Test 1331: Add cgroup filter with u32 ematch u16/zero offset and pipe
action
Test e354: Add cgroup filter with u32 ematch u16/zero offset and invalid
value >0xFFFF
Test 3538: Add cgroup filter with u32 ematch u16/positive offset and drop
action
Test 4576: Add cgroup filter with u32 ematch u16/invalid mask >0xFFFF
Test b842: Add cgroup filter with u32 ematch u16/missing offset
Test c924: Add cgroup filter with u32 ematch u16/missing AT keyword
Test cc93: Add cgroup filter with u32 ematch u16/missing value
Test 123c: Add cgroup filter with u32 ematch u16/non-numeric value
Test 3675: Add cgroup filter with u32 ematch u16/non-numeric mask
Test 1123: Add cgroup filter with u32 ematch u16/negative offset and drop
action
Test 4234: Add cgroup filter with u32 ematch u16/nexthdr+ offset and pass
action
Test e912: Add cgroup filter with u32 ematch u32/zero offset and pipe
action
Test 1435: Add cgroup filter with u32 ematch u32/positive offset and drop
action
Test 1282: Add cgroup filter with u32 ematch u32/missing offset
Test 6456: Add cgroup filter with u32 ematch u32/missing AT keyword
Test 4231: Add cgroup filter with u32 ematch u32/missing value
Test 2131: Add cgroup filter with u32 ematch u32/non-numeric value
Test f125: Add cgroup filter with u32 ematch u32/non-numeric mask
Test 4316: Add cgroup filter with u32 ematch u32/negative offset and drop
action
Test 23ae: Add cgroup filter with u32 ematch u32/nexthdr+ offset and pipe
action
Test 23a1: Add cgroup filter with canid ematch and single SFF
Test 324f: Add cgroup filter with canid ematch and single SFF with mask
Test 2576: Add cgroup filter with canid ematch and multiple SFF
Test 4839: Add cgroup filter with canid ematch and multiple SFF with masks
Test 6713: Add cgroup filter with canid ematch and single EFF
Test ab9d: Add cgroup filter with canid ematch and multiple EFF with masks
Test 5349: Add cgroup filter with canid ematch and a combination of
SFF/EFF
Test c934: Add cgroup filter with canid ematch and a combination of
SFF/EFF with masks
Test 4319: Replace cgroup filter with diffferent match
Test 4636: Delete cgroup filter

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/tc-testings: add selftests for bpf filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:45 +0000 (10:02 +0800)]
selftests/tc-testings: add selftests for bpf filter

Test 23c3: Add cBPF filter with valid bytecode
Test 1563: Add cBPF filter with invalid bytecode
Test 2334: Add eBPF filter with valid object-file
Test 2373: Add eBPF filter with invalid object-file
Test 4423: Replace cBPF bytecode
Test 5122: Delete cBPF filter
Test e0a9: List cBPF filters

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: use tc_cls_stats_dump() in filter
Zhengchao Shao [Fri, 16 Sep 2022 02:02:44 +0000 (10:02 +0800)]
net/sched: use tc_cls_stats_dump() in filter

use tc_cls_stats_dump() in filter.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: cls_api: add helper for tc cls walker stats dump
Zhengchao Shao [Fri, 16 Sep 2022 02:02:43 +0000 (10:02 +0800)]
net/sched: cls_api: add helper for tc cls walker stats dump

The walk implementation of most tc cls modules is basically the same.
That is, the values of count and skip are checked first. If count is
greater than or equal to skip, the registered fn function is executed.
Otherwise, increase the value of count. So we can reconstruct them.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: broadcom: bcm4908_enet: handle -EPROBE_DEFER when getting MAC
Rafał Miłecki [Thu, 15 Sep 2022 13:30:13 +0000 (15:30 +0200)]
net: broadcom: bcm4908_enet: handle -EPROBE_DEFER when getting MAC

Reading MAC from OF may return -EPROBE_DEFER if underlaying NVMEM device
isn't ready yet. In such case pass that error code up and "wait" to be
probed later.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20220915133013.2243-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: make NET_(DEV|NS)_REFCNT_TRACKER depend on NET
Lukas Bulwahn [Thu, 15 Sep 2022 12:42:56 +0000 (14:42 +0200)]
net: make NET_(DEV|NS)_REFCNT_TRACKER depend on NET

It makes little sense to ask if networking namespace or net device refcount
tracking shall be enabled for debug kernel builds without network support.

This is similar to the commit eb0b39efb7d9 ("net: CONFIG_DEBUG_NET depends
on CONFIG_NET").

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220915124256.32512-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'small-tc-taprio-improvements'
Jakub Kicinski [Tue, 20 Sep 2022 20:54:39 +0000 (13:54 -0700)]
Merge branch 'small-tc-taprio-improvements'

Vladimir Oltean says:

====================
Small tc-taprio improvements

This series contains:
- the proper protected variant of rcu_dereference() of admin and oper
  schedules for accesses from the slow path
- a removal of an extra function pointer indirection for
  qdisc->dequeue() and qdisc->peek()
- a removal of WARN_ON_ONCE() checks that can never trigger
- the addition of netlink extack messages to some qdisc->init() failures

These were split from an earlier patch set, hence the v2.
====================

Link: https://lore.kernel.org/r/20220915105046.2404072-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: taprio: replace safety precautions with comments
Vladimir Oltean [Thu, 15 Sep 2022 10:50:46 +0000 (13:50 +0300)]
net/sched: taprio: replace safety precautions with comments

The WARN_ON_ONCE() checks introduced in commit 13511704f8d7 ("net:
taprio offload: enforce qdisc to netdev queue mapping") take a small
toll on performance, but otherwise, the conditions are never expected to
happen. Replace them with comments, such that the information is still
conveyed to developers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: taprio: add extack messages in taprio_init
Vladimir Oltean [Thu, 15 Sep 2022 10:50:45 +0000 (13:50 +0300)]
net/sched: taprio: add extack messages in taprio_init

Stop contributing to the proverbial user unfriendliness of tc, and tell
the user what is wrong wherever possible.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: taprio: stop going through private ops for dequeue and peek
Vladimir Oltean [Thu, 15 Sep 2022 10:50:44 +0000 (13:50 +0300)]
net/sched: taprio: stop going through private ops for dequeue and peek

Since commit 13511704f8d7 ("net: taprio offload: enforce qdisc to netdev
queue mapping"), taprio_dequeue_soft() and taprio_peek_soft() are de
facto the only implementations for Qdisc_ops :: dequeue and Qdisc_ops ::
peek that taprio provides.

This is because in full offload mode, __dev_queue_xmit() will select a
txq->qdisc which is never root taprio qdisc. So if nothing is enqueued
in the root qdisc, it will never be run and nothing will get dequeued
from it.

Therefore, we can remove the private indirection from taprio, and always
point Qdisc_ops :: dequeue to taprio_dequeue_soft (now simply named
taprio_dequeue) and Qdisc_ops :: peek to taprio_peek_soft (now simply
named taprio_peek).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: taprio: remove redundant FULL_OFFLOAD_IS_ENABLED check in taprio_enqueue
Vladimir Oltean [Thu, 15 Sep 2022 10:50:43 +0000 (13:50 +0300)]
net/sched: taprio: remove redundant FULL_OFFLOAD_IS_ENABLED check in taprio_enqueue

Since commit 13511704f8d7 ("net: taprio offload: enforce qdisc to netdev
queue mapping"), __dev_queue_xmit() will select a txq->qdisc for the
full offload case of taprio which isn't the root taprio qdisc, so
qdisc enqueues will never pass through taprio_enqueue().

That commit already introduced one safety precaution check for
FULL_OFFLOAD_IS_ENABLED(); a second one is really not needed, so
simplify the conditional for entering into the GSO segmentation logic.
Also reword the comment a little, to appear more natural after the code
change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: taprio: use rtnl_dereference for oper and admin sched in taprio_destroy()
Vladimir Oltean [Thu, 15 Sep 2022 10:50:42 +0000 (13:50 +0300)]
net/sched: taprio: use rtnl_dereference for oper and admin sched in taprio_destroy()

Sparse complains that taprio_destroy() dereferences q->oper_sched and
q->admin_sched without rcu_dereference(), since they are marked as __rcu
in the taprio private structure.

1671:28: warning: incorrect type in argument 1 (different address spaces)
1671:28:    expected struct callback_head *head
1671:28:    got struct callback_head [noderef] __rcu *
1674:28: warning: incorrect type in argument 1 (different address spaces)
1674:28:    expected struct callback_head *head
1674:28:    got struct callback_head [noderef] __rcu *

To silence that build warning, do actually use rtnl_dereference(), since
we know the rtnl_mutex is held at the time of q->destroy().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex
Vladimir Oltean [Thu, 15 Sep 2022 10:50:41 +0000 (13:50 +0300)]
net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex

Since the writer-side lock is taken here, we do not need to open an RCU
read-side critical section, instead we can use rtnl_dereference() to
tell lockdep we are serialized with concurrent writes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/sched: taprio: taprio_offload_config_changed() is protected by rtnl_mutex
Vladimir Oltean [Thu, 15 Sep 2022 10:50:40 +0000 (13:50 +0300)]
net/sched: taprio: taprio_offload_config_changed() is protected by rtnl_mutex

The locking in taprio_offload_config_changed() is wrong (but also
inconsequentially so). The current_entry_lock does not serialize changes
to the admin and oper schedules, only to the current entry. In fact, the
rtnl_mutex does that, and that is taken at the time when taprio_change()
is called.

Replace the rcu_dereference_protected() method with the proper RCU
annotation, and drop the unnecessary spin lock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'nfp-flower-police-validation-and-ct-enhancements'
Jakub Kicinski [Tue, 20 Sep 2022 20:25:15 +0000 (13:25 -0700)]
Merge branch 'nfp-flower-police-validation-and-ct-enhancements'

Simon Horman says:

====================
nfp: flower: police validation and ct enhancements

this series enhances the flower hardware offload
facility provided by the nfp driver.

1. Add validation of police actions created independently of flows

2. Add support offload of ct NAT action

3. Support offload of rule which has both vlan push/pop/mangle
   and ct action
====================

Link: https://lore.kernel.org/r/20220914160604.1740282-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonfp: flower: support vlan action in pre_ct
Hui Zhou [Wed, 14 Sep 2022 16:06:04 +0000 (17:06 +0100)]
nfp: flower: support vlan action in pre_ct

Support hardware offload of rule which has both vlan push/pop/mangle
and ct action.

Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonfp: flower: support hw offload for ct nat action
Hui Zhou [Wed, 14 Sep 2022 16:06:03 +0000 (17:06 +0100)]
nfp: flower: support hw offload for ct nat action

support ct nat action when pre_ct merge with post_ct
and nft. at the same time, add the extra checksum action
and hardware stats for nft to meet the action check when
do nat.

Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonfp: flower: add validation of for police actions which are independent of flows
Ziyang Chen [Wed, 14 Sep 2022 16:06:02 +0000 (17:06 +0100)]
nfp: flower: add validation of for police actions which are independent of flows

Validation of police actions was added to offload drivers in
commit d97b4b105ce7 ("flow_offload: reject offload for all drivers with
invalid police parameters")

This patch extends that validation in the nfp driver to include
police actions which are created independently of flows.

Signed-off-by: Ziyang Chen <ziyang.chen@corigine.com>
Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'tcp-introduce-optional-per-netns-ehash'
Jakub Kicinski [Tue, 20 Sep 2022 17:21:53 +0000 (10:21 -0700)]
Merge branch 'tcp-introduce-optional-per-netns-ehash'

Kuniyuki Iwashima says:

====================
tcp: Introduce optional per-netns ehash.

The more sockets we have in the hash table, the longer we spend looking
up the socket.  While running a number of small workloads on the same
host, they penalise each other and cause performance degradation.

The root cause might be a single workload that consumes much more
resources than the others.  It often happens on a cloud service where
different workloads share the same computing resource.

On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash
entries), after running iperf3 in different netns, creating 24Mi sockets
without data transfer in the root netns causes about 10% performance
regression for the iperf3's connection.

 thash_entries sockets length Gbps
524288       1      1 50.7
   24Mi     48 45.1

It is basically related to the length of the list of each hash bucket.
For testing purposes to see how performance drops along the length,
I set 131072 (1Mi / 8) to thash_entries, and here's the result.

 thash_entries sockets length Gbps
        131072       1      1 50.7
    1Mi      8 49.9
    2Mi     16 48.9
    4Mi     32 47.3
    8Mi     64 44.6
   16Mi    128 40.6
   24Mi    192 36.3
   32Mi    256 32.5
   40Mi    320 27.0
   48Mi    384 25.0

To resolve the socket lookup degradation, we introduce an optional
per-netns hash table for TCP, but it's just ehash, and we still share
the global bhash, bhash2 and lhash2.

With a smaller ehash, we can look up non-listener sockets faster and
isolate such noisy neighbours.  Also, we can reduce lock contention.

For details, please see the last patch.

  patch 1 - 4: prep for per-netns ehash
  patch     5: small optimisation for netns dismantle without TIME_WAIT sockets
  patch     6: add per-netns ehash

Many thanks to Eric Dumazet for reviewing and advising.
====================

Link: https://lore.kernel.org/r/20220908011022.45342-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Introduce optional per-netns ehash.
Kuniyuki Iwashima [Thu, 8 Sep 2022 01:10:22 +0000 (18:10 -0700)]
tcp: Introduce optional per-netns ehash.

The more sockets we have in the hash table, the longer we spend looking
up the socket.  While running a number of small workloads on the same
host, they penalise each other and cause performance degradation.

The root cause might be a single workload that consumes much more
resources than the others.  It often happens on a cloud service where
different workloads share the same computing resource.

On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash
entries), after running iperf3 in different netns, creating 24Mi sockets
without data transfer in the root netns causes about 10% performance
regression for the iperf3's connection.

 thash_entries sockets length Gbps
524288       1      1 50.7
   24Mi     48 45.1

It is basically related to the length of the list of each hash bucket.
For testing purposes to see how performance drops along the length,
I set 131072 (1Mi / 8) to thash_entries, and here's the result.

 thash_entries sockets length Gbps
        131072       1      1 50.7
    1Mi      8 49.9
    2Mi     16 48.9
    4Mi     32 47.3
    8Mi     64 44.6
   16Mi    128 40.6
   24Mi    192 36.3
   32Mi    256 32.5
   40Mi    320 27.0
   48Mi    384 25.0

To resolve the socket lookup degradation, we introduce an optional
per-netns hash table for TCP, but it's just ehash, and we still share
the global bhash, bhash2 and lhash2.

With a smaller ehash, we can look up non-listener sockets faster and
isolate such noisy neighbours.  In addition, we can reduce lock contention.

We can control the ehash size by a new sysctl knob.  However, depending
on workloads, it will require very sensitive tuning, so we disable the
feature by default (net.ipv4.tcp_child_ehash_entries == 0).  Moreover,
we can fall back to using the global ehash in case we fail to allocate
enough memory for a new ehash.  The maximum size is 16Mi, which is large
enough that even if we have 48Mi sockets, the average list length is 3,
and regression would be less than 1%.

We can check the current ehash size by another read-only sysctl knob,
net.ipv4.tcp_ehash_entries.  A negative value means the netns shares
the global ehash (per-netns ehash is disabled or failed to allocate
memory).

  # dmesg | cut -d ' ' -f 5- | grep "established hash"
  TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage)

  # sysctl net.ipv4.tcp_ehash_entries
  net.ipv4.tcp_ehash_entries = 524288  # can be changed by thash_entries

  # sysctl net.ipv4.tcp_child_ehash_entries
  net.ipv4.tcp_child_ehash_entries = 0  # disabled by default

  # ip netns add test1
  # ip netns exec test1 sysctl net.ipv4.tcp_ehash_entries
  net.ipv4.tcp_ehash_entries = -524288  # share the global ehash

  # sysctl -w net.ipv4.tcp_child_ehash_entries=100
  net.ipv4.tcp_child_ehash_entries = 100

  # ip netns add test2
  # ip netns exec test2 sysctl net.ipv4.tcp_ehash_entries
  net.ipv4.tcp_ehash_entries = 128  # own a per-netns ehash with 2^n buckets

When more than two processes in the same netns create per-netns ehash
concurrently with different sizes, we need to guarantee the size in
one of the following ways:

  1) Share the global ehash and create per-netns ehash

  First, unshare() with tcp_child_ehash_entries==0.  It creates dedicated
  netns sysctl knobs where we can safely change tcp_child_ehash_entries
  and clone()/unshare() to create a per-netns ehash.

  2) Control write on sysctl by BPF

  We can use BPF_PROG_TYPE_CGROUP_SYSCTL to allow/deny read/write on
  sysctl knobs.

Note that the global ehash allocated at the boot time is spread over
available NUMA nodes, but inet_pernet_hashinfo_alloc() will allocate
pages for each per-netns ehash depending on the current process's NUMA
policy.  By default, the allocation is done in the local node only, so
the per-netns hash table could fully reside on a random node.  Thus,
depending on the NUMA policy the netns is created with and the CPU the
current thread is running on, we could see some performance differences
for highly optimised networking applications.

Note also that the default values of two sysctl knobs depend on the ehash
size and should be tuned carefully:

  tcp_max_tw_buckets  : tcp_child_ehash_entries / 2
  tcp_max_syn_backlog : max(128, tcp_child_ehash_entries / 128)

As a bonus, we can dismantle netns faster.  Currently, while destroying
netns, we call inet_twsk_purge(), which walks through the global ehash.
It can be potentially big because it can have many sockets other than
TIME_WAIT in all netns.  Splitting ehash changes that situation, where
it's only necessary for inet_twsk_purge() to clean up TIME_WAIT sockets
in each netns.

With regard to this, we do not free the per-netns ehash in inet_twsk_kill()
to avoid UAF while iterating the per-netns ehash in inet_twsk_purge().
Instead, we do it in tcp_sk_exit_batch() after calling tcp_twsk_purge() to
keep it protocol-family-independent.

In the future, we could optimise ehash lookup/iteration further by removing
netns comparison for the per-netns ehash.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Save unnecessary inet_twsk_purge() calls.
Kuniyuki Iwashima [Thu, 8 Sep 2022 01:10:21 +0000 (18:10 -0700)]
tcp: Save unnecessary inet_twsk_purge() calls.

While destroying netns, we call inet_twsk_purge() in tcp_sk_exit_batch()
and tcpv6_net_exit_batch() for AF_INET and AF_INET6.  These commands
trigger the kernel to walk through the potentially big ehash twice even
though the netns has no TIME_WAIT sockets.

  # ip netns add test
  # ip netns del test

  or

  # unshare -n /bin/true >/dev/null

When tw_refcount is 1, we need not call inet_twsk_purge() at least
for the net.  We can save such unneeded iterations if all netns in
net_exit_list have no TIME_WAIT sockets.  This change eliminates
the tax by the additional unshare() described in the next patch to
guarantee the per-netns ehash size.

Tested:

  # mount -t debugfs none /sys/kernel/debug/
  # echo cleanup_net > /sys/kernel/debug/tracing/set_ftrace_filter
  # echo inet_twsk_purge >> /sys/kernel/debug/tracing/set_ftrace_filter
  # echo function > /sys/kernel/debug/tracing/current_tracer
  # cat ./add_del_unshare.sh
  for i in `seq 1 40`
  do
      (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
  done
  wait;
  # ./add_del_unshare.sh

Before the patch:

  # cat /sys/kernel/debug/tracing/trace_pipe
    kworker/u128:0-8       [031] ...1.   174.162765: cleanup_net <-process_one_work
    kworker/u128:0-8       [031] ...1.   174.240796: inet_twsk_purge <-cleanup_net
    kworker/u128:0-8       [032] ...1.   174.244759: inet_twsk_purge <-tcp_sk_exit_batch
    kworker/u128:0-8       [034] ...1.   174.290861: cleanup_net <-process_one_work
    kworker/u128:0-8       [039] ...1.   175.245027: inet_twsk_purge <-cleanup_net
    kworker/u128:0-8       [046] ...1.   175.290541: inet_twsk_purge <-tcp_sk_exit_batch
    kworker/u128:0-8       [037] ...1.   175.321046: cleanup_net <-process_one_work
    kworker/u128:0-8       [024] ...1.   175.941633: inet_twsk_purge <-cleanup_net
    kworker/u128:0-8       [025] ...1.   176.242539: inet_twsk_purge <-tcp_sk_exit_batch

After:

  # cat /sys/kernel/debug/tracing/trace_pipe
    kworker/u128:0-8       [038] ...1.   428.116174: cleanup_net <-process_one_work
    kworker/u128:0-8       [038] ...1.   428.262532: cleanup_net <-process_one_work
    kworker/u128:0-8       [030] ...1.   429.292645: cleanup_net <-process_one_work

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Access &tcp_hashinfo via net.
Kuniyuki Iwashima [Thu, 8 Sep 2022 01:10:20 +0000 (18:10 -0700)]
tcp: Access &tcp_hashinfo via net.

We will soon introduce an optional per-netns ehash.

This means we cannot use tcp_hashinfo directly in most places.

Instead, access it via net->ipv4.tcp_death_row.hashinfo.

The access will be valid only while initialising tcp_hashinfo
itself and creating/destroying each netns.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Set NULL to sk->sk_prot->h.hashinfo.
Kuniyuki Iwashima [Thu, 8 Sep 2022 01:10:19 +0000 (18:10 -0700)]
tcp: Set NULL to sk->sk_prot->h.hashinfo.

We will soon introduce an optional per-netns ehash.

This means we cannot use the global sk->sk_prot->h.hashinfo
to fetch a TCP hashinfo.

Instead, set NULL to sk->sk_prot->h.hashinfo for TCP and get
a proper hashinfo from net->ipv4.tcp_death_row.hashinfo.

Note that we need not use sk->sk_prot->h.hashinfo if DCCP is
disabled.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Don't allocate tcp_death_row outside of struct netns_ipv4.
Kuniyuki Iwashima [Thu, 8 Sep 2022 01:10:18 +0000 (18:10 -0700)]
tcp: Don't allocate tcp_death_row outside of struct netns_ipv4.

We will soon introduce an optional per-netns ehash and access hash
tables via net->ipv4.tcp_death_row->hashinfo instead of &tcp_hashinfo
in most places.

It could harm the fast path because dereferences of two fields in net
and tcp_death_row might incur two extra cache line misses.  To save one
dereference, let's place tcp_death_row back in netns_ipv4 and fetch
hashinfo via net->ipv4.tcp_death_row"."hashinfo.

Note tcp_death_row was initially placed in netns_ipv4, and commit
fbb8295248e1 ("tcp: allocate tcp_death_row outside of struct netns_ipv4")
changed it to a pointer so that we can fire TIME_WAIT timers after freeing
net.  However, we don't do so after commit 04c494e68a13 ("Revert "tcp/dccp:
get rid of inet_twsk_purge()""), so we need not define tcp_death_row as a
pointer.

Also, we move refcount_dec_and_test(&tw_refcount) from tcp_sk_exit() to
tcp_sk_exit_batch() as a debug check.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Clean up some functions.
Kuniyuki Iwashima [Thu, 8 Sep 2022 01:10:17 +0000 (18:10 -0700)]
tcp: Clean up some functions.

This patch adds no functional change and cleans up some functions
that the following patches touch around so that we make them tidy
and easy to review/revert.  The changes are

  - Keep reverse christmas tree order
  - Remove unnecessary init of port in inet_csk_find_open_port()
  - Use req_to_sk() once in reqsk_queue_unlink()
  - Use sock_net(sk) once in tcp_time_wait() and tcp_v[46]_connect()

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoheaders: Remove some left-over license text
Christophe JAILLET [Sun, 11 Sep 2022 11:39:01 +0000 (13:39 +0200)]
headers: Remove some left-over license text

Remove a left-over from commit 2874c5fd2842 ("treewide: Replace GPLv2
boilerplate/reference with SPDX - rule 152")

There is no need for an empty "License:".

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0e5ff727626b748238f4b78932f81572143d8f0b.1662896317.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests/bonding: add a test for bonding lladdr target
Hangbin Liu [Tue, 20 Sep 2022 03:30:47 +0000 (11:30 +0800)]
selftests/bonding: add a test for bonding lladdr target

This is a regression test for commit 592335a4164c ("bonding: accept
unsolicited NA message") and commit b7f14132bf58 ("bonding: use unspecified
address if no available link local address"). When the bond interface
up and no available link local address, unspecified address(::) is used to
send the NS message. The unsolicited NA message should also be accepted
for validation.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220920033047.173244-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mdio: mux-multiplexer: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 06:50:43 +0000 (14:50 +0800)]
net: mdio: mux-multiplexer: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-3-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mdio: mux-mmioreg: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 06:50:42 +0000 (14:50 +0800)]
net: mdio: mux-mmioreg: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-2-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mdio: mux-meson-g12a: Switch to use dev_err_probe() helper
Yang Yingliang [Thu, 15 Sep 2022 06:50:41 +0000 (14:50 +0800)]
net: mdio: mux-meson-g12a: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoravb: Add RZ/G2L MII interface support
Biju Das [Wed, 14 Sep 2022 19:26:04 +0000 (20:26 +0100)]
ravb: Add RZ/G2L MII interface support

EMAC IP found on RZ/G2L Gb ethernet supports MII interface.
This patch adds support for selecting MII interface mode.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20220914192604.265859-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: rtnetlink: Enslave device before bringing it up
Phil Sutter [Wed, 14 Sep 2022 15:06:23 +0000 (17:06 +0200)]
net: rtnetlink: Enslave device before bringing it up

Unlike with bridges, one can't add an interface to a bond and set it up
at the same time:

| # ip link set dummy0 down
| # ip link set dummy0 master bond0 up
| Error: Device can not be enslaved while up.

Of all drivers with ndo_add_slave callback, bond and team decline if
IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
immediately up again) and the others just don't care.

Support the common notion of setting the interface up after enslaving it
by sorting the operations accordingly.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220914150623.24152-1-phil@nwl.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'macb-add-zynqmp-sgmii-dynamic-configuration-support'
Jakub Kicinski [Tue, 20 Sep 2022 15:33:07 +0000 (08:33 -0700)]
Merge branch 'macb-add-zynqmp-sgmii-dynamic-configuration-support'

Radhey Shyam Pandey says:

====================
macb: add zynqmp SGMII dynamic configuration support

This patchset add firmware and driver support to do SD/GEM dynamic
configuration. In traditional flow GEM secure space configuration
is done by FSBL. However in specific usescases like dynamic designs
where GEM is not enabled in base vivado design, FSBL skips GEM
initialization and we need a mechanism to configure GEM secure space
in linux space at runtime.
====================

Link: https://lore.kernel.org/r/1663158796-14869-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: macb: Add zynqmp SGMII dynamic configuration support
Radhey Shyam Pandey [Wed, 14 Sep 2022 12:33:16 +0000 (18:03 +0530)]
net: macb: Add zynqmp SGMII dynamic configuration support

Add support for the dynamic configuration which takes care of
configuring the GEM secure space configuration registers
using EEMI APIs.
High level sequence is to:
- Check for the PM dynamic configuration support, if no error proceed with
  GEM dynamic configurations(next steps) otherwise skip the dynamic
  configuration.
- Configure GEM Fixed configurations.
- Configure GEM_CLK_CTRL (gemX_sgmii_mode).
- Trigger GEM reset.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Conor Dooley <conor.dooley@microchip.com> (for MPFS)
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agofirmware: xilinx: add support for sd/gem config
Ronak Jain [Wed, 14 Sep 2022 12:33:15 +0000 (18:03 +0530)]
firmware: xilinx: add support for sd/gem config

Add new APIs in firmware to configure SD/GEM registers. Internally
it calls PM IOCTL for below SD/GEM register configuration:
- SD/EMMC select
- SD slot type
- SD base clock
- SD 8 bit support
- SD fixed config
- GEM SGMII Mode
- GEM fixed config

Signed-off-by: Ronak Jain <ronak.jain@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoxen-netfront: make bounce_skb static
ruanjinjie [Wed, 14 Sep 2022 06:43:39 +0000 (14:43 +0800)]
xen-netfront: make bounce_skb static

The symbol is not used outside of the file, so mark it static.

Fixes the following warning:

./drivers/net/xen-netfront.c:676:16: warning: symbol 'bounce_skb' was not declared. Should it be static?

Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220914064339.49841-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: phy: micrel: Add interrupts support for LAN8804 PHY
Horatiu Vultur [Tue, 13 Sep 2022 14:29:26 +0000 (16:29 +0200)]
net: phy: micrel: Add interrupts support for LAN8804 PHY

Add support for interrupts for LAN8804 PHY.

Tested-by: Michael Walle <michael@walle.cc> # on kontron-kswitch-d10
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220913142926.816746-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'sfp-add-support-for-halny-gpon-module'
Jakub Kicinski [Tue, 20 Sep 2022 14:54:16 +0000 (07:54 -0700)]
Merge branch 'sfp-add-support-for-halny-gpon-module'

Russell King says:

====================
sfp: add support for HALNy GPON module

This series adds support for the HALNy GPON SFP module. In order to do
this sensibly, we need a more flexible quirk system, since we need to
change the behaviour of the SFP cage driver to ignore the LOS and
TX_FAULT signals after module detection.

Since we move the SFP quirks into the SFP cage driver, we can use it
for the MA5671A and 3FE46541AA modules as well.
====================

Link: https://lore.kernel.org/r/YyDUnvM1b0dZPmmd@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sfp: add support for HALNy GPON SFP
Russell King (Oracle) [Tue, 13 Sep 2022 19:06:48 +0000 (20:06 +0100)]
net: sfp: add support for HALNy GPON SFP

Add a quirk for the HALNy HL-GSFP module, which appears to have an
inverted RX_LOS signal, and maybe uses TX_FAULT as a serial port
transmit pin. Rather than use these hardware signals, switch to
using software polling for these status signals.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sfp: move Huawei MA5671A fixup
Russell King (Oracle) [Tue, 13 Sep 2022 19:06:42 +0000 (20:06 +0100)]
net: sfp: move Huawei MA5671A fixup

Move this module over to the new fixup mechanism.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sfp: move Alcatel Lucent 3FE46541AA fixup
Russell King (Oracle) [Tue, 13 Sep 2022 19:06:37 +0000 (20:06 +0100)]
net: sfp: move Alcatel Lucent 3FE46541AA fixup

Add a new fixup mechanism to the SFP quirks, and use it for this
module.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sfp: move quirk handling into sfp.c
Russell King (Oracle) [Tue, 13 Sep 2022 19:06:32 +0000 (20:06 +0100)]
net: sfp: move quirk handling into sfp.c

We need to handle more quirks than just those which affect the link
modes of the module. Move the quirk lookup into sfp.c, and pass the
quirk to sfp-bus.c

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sfp: re-implement soft state polling setup
Russell King (Oracle) [Tue, 13 Sep 2022 19:06:27 +0000 (20:06 +0100)]
net: sfp: re-implement soft state polling setup

Re-implement the decision making for soft state polling. Instead of
generating the soft state mask in sfp_soft_start_poll() by looking at
which GPIOs are available, record their availability in
sfp_sm_mod_probe() in sfp->state_hw_mask.

This will then allow us to clear bits in sfp->state_hw_mask in module
specific quirks when the hardware signals should not be used, thereby
allowing us to switch to using the software state polling.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: dsa: convert ocelot.txt to dt-schema
Vladimir Oltean [Tue, 13 Sep 2022 12:58:06 +0000 (15:58 +0300)]
dt-bindings: net: dsa: convert ocelot.txt to dt-schema

Replace the free-form description of device tree bindings for VSC9959
and VSC9953 with a YAML formatted dt-schema description. This contains
more or less the same information, but reworded to be a bit more
succint.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220913125806.524314-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-ipa-a-mix-of-cleanups'
Jakub Kicinski [Tue, 20 Sep 2022 14:45:51 +0000 (07:45 -0700)]
Merge branch 'net-ipa-a-mix-of-cleanups'

Alex Elder says:

====================
net: ipa: a mix of cleanups

This series contains a set of cleanups done in preparation for a
more substantitive upcoming series that reworks how IPA registers
and their fields are defined.

The first eliminates about half of the possible GSI register
constant symbols by removing offset definitions that are not
currently required.

The next two mainly rearrange code for some common enumerated types.

The next one fixes two spots that reuse local variable names in
inner scopes when defining offsets.

The next adds some additional restrictions on the value held in a
register.

And the last one just fixes two field mask symbol names so they
adhere to the common naming convention.
====================

Link: https://lore.kernel.org/r/20220910011131.1431934-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>