platform/kernel/linux-starfive.git
3 years agonet: hns3: refactor dump qos pause cfg of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:38 +0000 (10:21 +0800)]
net: hns3: refactor dump qos pause cfg of debugfs

Currently, user gets pause config by implementing debugfs command
"echo dump qos pause cfg > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.

To optimize it, create a single file "qos_pause_cfg" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.

The display style is below:
$ cat qos_pause_cfg
pause_trans_gap: 0x7f
pause_trans_time: 0xffff

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump tc of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:37 +0000 (10:21 +0800)]
net: hns3: refactor dump tc of debugfs

Currently, user gets tc schedule info by implementing debugfs command
"echo dump tc > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.

To optimize it, create a single file "tc_sch_info" and use cat command
to get info. It will return info to userspace, rather than record in
dmesg.

The display style is below:
$ cat tc_sch_info
enabled tc number: 4
weight_offset: 14
TC    MODE  WEIGHT
0     dwrr     25
1     dwrr     25
2     dwrr     25
3     dwrr     25
4     dwrr      0
5     dwrr      0
6     dwrr      0
7     dwrr      0

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump tm of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:36 +0000 (10:21 +0800)]
net: hns3: refactor dump tm of debugfs

Currently, user gets some tm info by implementing debugfs command
"echo dump tm > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.

In addition, the info of this command mixes info of qset, priority,
pg and port. Qset and priority have their own command to get info of
themself, so can remove info of qset and priority from this command.

To optimize it, create two new files "tm_pg", "tm_port" in tm directory
and use cat command to separately get info of pg and port.

The display style is below:
$ cat tm_pg
ID  PRI_MAP  MODE DWRR  C_IR_B  C_IR_U  C_IR_S  C_BS_B  C_BS_S ...
00   0x1f    dwrr  1       75       9       0      31      20  ...

$ cat tm_port
IR_B  IR_U  IR_S  BS_B  BS_S  FLAG  RATE(Mbps)
75     9     0    31    20    1     200000

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump tm map of debugfs
Guangbin Huang [Thu, 20 May 2021 02:21:35 +0000 (10:21 +0800)]
net: hns3: refactor dump tm map of debugfs

Currently, the debugfs command for tm map is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"tm_map" for it, and query it by command "cat tm_map",
return the result to userspace, rather than record in dmesg.

As user can't specify queue id in cat command, driver will return info
of all queue id.

The display style is below:
$ cat tm_map
queue_id   qset_id   pri_id   tc_id
0000         0000      00       00
INDEX | TM BP QSET MAPPING:
0000  | 00000000:00000000:00000000:00000000:00000000:00000000:00000000
0256  | 00000000:00000000:00000000:00000000:00000000:00000002:00000000
0512  | 00000000:00000000:00000000:00000004:00000000:00000000:00000000
0768  | 00000000:00000008:00000000:00000000:00000000:00000000:00000000

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump fd tcam of debugfs
Hao Chen [Thu, 20 May 2021 02:21:34 +0000 (10:21 +0800)]
net: hns3: refactor dump fd tcam of debugfs

Currently, the debugfs command for fd tcam is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"fd_tcam" for it, and query it by command "cat fd_tcam",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat fd_tcam
read result tcam key x(31):
00000000
00000000
00000000
08000000
00000600
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
read result tcam key y(31):
00000000
00000000
00000000
f7ff0000
0000f900
00000000
00000000
00000000
00000000
00000000
00000000
00000000
0000fff8

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor queue info of debugfs
Hao Chen [Thu, 20 May 2021 02:21:33 +0000 (10:21 +0800)]
net: hns3: refactor queue info of debugfs

Currently, the debugfs command for queue info is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create two files
"rx_queue_info" and "tx_queue_info" for it, and query it
by command "cat rx_queue_info" and "cat tx_queue_info",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat rx_queue_info
QUEUE_ID  BD_NUM  BD_LEN  TAIL  HEAD  FBDNUM  PKTNUM   ...
0           0       0     0     0       0       0      ...
1           0       0     0     0       0       0      ...
2           0       0     0     0       0       0      ...

$ cat tx_queue_info
QUEUE_ID  BD_NUM  TC  TAIL  HEAD  FBDNUM  OFFSET  PKTNUM  ...
0           0     0     0     0       0       0        0  ...
1           0     0     0     0       0       0        0  ...
2           0     0     0     0       0       0        0  ...

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor queue map of debugfs
Hao Chen [Thu, 20 May 2021 02:21:32 +0000 (10:21 +0800)]
net: hns3: refactor queue map of debugfs

Currently, the debugfs command for queue map is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"queue_map" for it, and query it by command "cat queue_map",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat queue_map
local_queue_id   global_queue_id   vector_id
0                0                 341

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump reg dcb info of debugfs
Yufeng Mo [Thu, 20 May 2021 02:21:31 +0000 (10:21 +0800)]
net: hns3: refactor dump reg dcb info of debugfs

Currently, the debugfs command for reg dcb info is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"dcb" for it, and query it by command "cat dcb",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat dcb
qset_id  roce_qset_mask  nic_qset_mask  qset_shaping_pass  qset_bp_status
0000           0x1            0x1             0x1               0x0
0001           0x1            0x1             0x1               0x0
0002           0x1            0x1             0x1               0x0
0003           0x1            0x1             0x1               0x0
0004           0x1            0x1             0x1               0x0
0005           0x1            0x1             0x1               0x0
0006           0x1            0x1             0x1               0x0
0007           0x1            0x1             0x1               0x0
pri_id  pri_mask  pri_cshaping_pass  pri_pshaping_pass
000       0x1           0x0                0x1
001       0x1           0x0                0x0
002       0x1           0x0                0x0
003       0x1           0x0                0x0
004       0x1           0x0                0x0
005       0x1           0x0                0x0
006       0x1           0x0                0x0
007       0x1           0x0                0x0
pg_id  pg_mask  pg_cshaping_pass  pg_pshaping_pass
000      0x1           0x0               0x1

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hns3: refactor dump reg of debugfs
Yufeng Mo [Thu, 20 May 2021 02:21:30 +0000 (10:21 +0800)]
net: hns3: refactor dump reg of debugfs

Currently, the debugfs command for reg is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create some files
"bios_common/ssu/igu_egu/rpu/ncsi/rtc/ppp/rcb/tqp/mac" for it,
and query it by command "cat xxx", return the result to
userspace, rather than record in dmesg.

The display style is below:
$ cat bios_common
BP_CPU_STATE: 0x0
DFX_MSIX_INFO_NIC_0: 0xc000
DFX_MSIX_INFO_NIC_1: 0x0
DFX_MSIX_INFO_NIC_2: 0x0
DFX_MSIX_INFO_NIC_3: 0x0
DFX_MSIX_INFO_ROC_0: 0xc000
DFX_MSIX_INFO_ROC_1: 0x0
DFX_MSIX_INFO_ROC_2: 0x0
DFX_MSIX_INFO_ROC_3: 0x0

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomISDN: Remove obsolete PIPELINE_DEBUG debugging information
Zhen Lei [Thu, 20 May 2021 02:14:11 +0000 (10:14 +0800)]
mISDN: Remove obsolete PIPELINE_DEBUG debugging information

As Leon Romanovsky's tips:
The definition of macro PIPELINE_DEBUG is commented more than 10 years ago
and can be seen as a dead code that should be removed.

Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mt7530-interrupt-support'
David S. Miller [Wed, 19 May 2021 20:27:42 +0000 (13:27 -0700)]
Merge branch 'mt7530-interrupt-support'

DENG Qingfang says:

====================
MT7530 interrupt support

Add support for MT7530 interrupt controller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agostaging: mt7621-dts: enable MT7530 interrupt controller
DENG Qingfang [Wed, 19 May 2021 03:32:02 +0000 (11:32 +0800)]
staging: mt7621-dts: enable MT7530 interrupt controller

Enable MT7530 interrupt controller in the MT7621 SoC.

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: dsa: add MT7530 interrupt controller binding
DENG Qingfang [Wed, 19 May 2021 03:32:01 +0000 (11:32 +0800)]
dt-bindings: net: dsa: add MT7530 interrupt controller binding

Add device tree binding to support MT7530 interrupt controller.

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: mt7530: add interrupt support
DENG Qingfang [Wed, 19 May 2021 03:32:00 +0000 (11:32 +0800)]
net: dsa: mt7530: add interrupt support

Add support for MT7530 interrupt controller to handle internal PHYs.
In order to assign an IRQ number to each PHY, the registration of MDIO bus
is also done in this driver.

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: add MediaTek Gigabit Ethernet PHY driver
DENG Qingfang [Wed, 19 May 2021 03:31:59 +0000 (11:31 +0800)]
net: phy: add MediaTek Gigabit Ethernet PHY driver

Add support for MediaTek Gigabit Ethernet PHYs found in MT7530 and
MT7531 switches.
The initialization procedure is from the vendor driver, but due to lack
of documentation, the function of some register values remains unknown.

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: qrtr: ns: Fix error return code in qrtr_ns_init()
Wei Yongjun [Wed, 19 May 2021 15:58:52 +0000 (15:58 +0000)]
net: qrtr: ns: Fix error return code in qrtr_ns_init()

Fix to return a negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: c6e08d6251f3 ("net: qrtr: Allocate workqueue before kernel_bind")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ixp4xx: Fix return value check in ixp4xx_eth_probe()
Wei Yongjun [Wed, 19 May 2021 14:16:27 +0000 (14:16 +0000)]
net: ethernet: ixp4xx: Fix return value check in ixp4xx_eth_probe()

In case of error, the function mdiobus_get_phy() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: cls_api: increase max_reclassify_loop
Davide Caratti [Wed, 19 May 2021 13:17:21 +0000 (15:17 +0200)]
net/sched: cls_api: increase max_reclassify_loop

modern userspace applications, like OVN, can configure the TC datapath to
"recirculate" packets several times. If more than 4 "recirculation" rules
are configured, packets can be dropped by __tcf_classify().
Changing the maximum number of reclassifications (from 4 to 16) should be
sufficient to prevent drops in most use cases, and guard against loops at
the same time.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Wed, 19 May 2021 19:58:29 +0000 (12:58 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-05-19

The following pull-request contains BPF updates for your *net-next* tree.

We've added 43 non-merge commits during the last 11 day(s) which contain
a total of 74 files changed, 3717 insertions(+), 578 deletions(-).

The main changes are:

1) syscall program type, fd array, and light skeleton, from Alexei.

2) Stop emitting static variables in skeleton, from Andrii.

3) Low level tc-bpf api, from Kumar.

4) Reduce verifier kmalloc/kfree churn, from Lorenz.
====================

3 years agoMerge branch 'mlxsw-mphash-policies'
David S. Miller [Wed, 19 May 2021 19:47:47 +0000 (12:47 -0700)]
Merge branch 'mlxsw-mphash-policies'

Ido Schimmel says:

====================
mlxsw: Add support for new multipath hash policies

This patchset adds support for two new multipath hash policies in mlxsw.

Patch #1 emits net events whenever the
net.ipv{4,6}.fib_multipath_hash_fields sysctls are changed. This allows
listeners to react to changes in the packet fields used for the
computation of the multipath hash.

Patches #2-#3 refactor the code in mlxsw that is responsible for the
configuration of the multipath hash, so that it will be easier to extend
for the two new policies.

Patch #4 adds the register fields required to support the new policies.

Patch #5-#7 add support for inner layer 3 and custom multipath hash
policies.

Tested using following forwarding selftests:

* custom_multipath_hash.sh
* gre_custom_multipath_hash.sh
* gre_inner_v4_multipath.sh
* gre_inner_v6_multipath.sh
====================

3 years agomlxsw: spectrum_router: Add support for custom multipath hash policy
Ido Schimmel [Wed, 19 May 2021 12:08:24 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Add support for custom multipath hash policy

When this policy is set, only enable the packet fields that were enabled
by user space for multipath hash computation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Add support for inner layer 3 multipath hash policy
Ido Schimmel [Wed, 19 May 2021 12:08:23 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Add support for inner layer 3 multipath hash policy

When this policy is set, the kernel uses the inner layer 3 fields for
multipath hash computation and falls back to the outer fields if no
encapsulation was encountered. This behavior is most likely influenced
by the behavior of the flow dissector, which is used for the packet
dissection.

The Spectrum ASIC, however, cannot fallback to outer fields if inner
fields are not available. This should not result in a discrepancy from
the software data path because if several flows have matching inner
fields, they will tend to have matching outer fields as well.

Therefore, implement this policy by enabling both outer and inner layer
3 fields for the multipath hash computation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_outer: Factor out helper for common outer fields
Ido Schimmel [Wed, 19 May 2021 12:08:22 +0000 (15:08 +0300)]
mlxsw: spectrum_outer: Factor out helper for common outer fields

Outer IPv4 and IPv6 addresses are used by multiple multipath hash
policies. Factor out helpers that set these fields to increase code
sharing between different policies.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: reg: Add inner packet fields to RECRv2 register
Ido Schimmel [Wed, 19 May 2021 12:08:21 +0000 (15:08 +0300)]
mlxsw: reg: Add inner packet fields to RECRv2 register

The RECRv2 register is used for setting up the router's ECMP hash
configuration. Extend it with inner packet fields to allow the ECMP hash
to be calculated based on inner flow information.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Move multipath hash configuration to a bitmap
Ido Schimmel [Wed, 19 May 2021 12:08:20 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Move multipath hash configuration to a bitmap

Currently, the multipath hash configuration is written directly to the
register payload. While this is OK for the two currently supported
policies, it is going to be hard to follow when more policies and more
packet fields are added.

Instead, set the required headers and fields in a bitmap and then dump
it to the register payload.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Replace if statement with a switch statement
Ido Schimmel [Wed, 19 May 2021 12:08:19 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Replace if statement with a switch statement

The code was written when only two multipath hash policies were present,
so the if statement was sufficient. The next patch and future patches
are going to add support for more policies, so move to a switch
statement.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: Add notifications when multipath hash field change
Ido Schimmel [Wed, 19 May 2021 12:08:18 +0000 (15:08 +0300)]
net: Add notifications when multipath hash field change

In-kernel notifications are already sent when the multipath hash policy
itself changes, but not when the multipath hash fields change.

Add these notifications, so that interested listeners (e.g., switch ASIC
drivers) could perform the necessary configuration.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonfc: s3fwrn5: i2c: Enable optional clock from device tree
Stephan Gerhold [Wed, 19 May 2021 09:16:13 +0000 (11:16 +0200)]
nfc: s3fwrn5: i2c: Enable optional clock from device tree

S3FWRN5 depends on a clock input ("XI" pin) to function properly.
Depending on the hardware configuration this could be an always-on
oscillator or some external clock that must be explicitly enabled.

So far we assumed that the clock is always-on.
Make the driver request an (optional) clock from the device tree
and make sure the clock is running before starting S3FWRN5.

Note: S3FWRN5 asserts "GPIO2" whenever it needs the clock input to
function correctly. On some hardware configurations, GPIO2 is
connected directly to an input pin of the external clock provider
(e.g. the main PMIC of the SoC). In that case, it can automatically
AND the clock enable bit and clock request from S3FWRN5 so that
the clock is actually only enabled when needed.

It is also conceivable that on some other hardware configuration
S3FWRN5's GPIO2 might be connected as a regular GPIO input
of the SoC. In that case, follow-up patches could extend the
driver to request the GPIO, set up an interrupt and only enable
the clock when requested by S3FWRN5.

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: nfc: s3fwrn5: Add optional clock
Stephan Gerhold [Wed, 19 May 2021 09:16:12 +0000 (11:16 +0200)]
dt-bindings: net: nfc: s3fwrn5: Add optional clock

On some systems, S3FWRN5 depends on having an external clock enabled
to function correctly. Allow declaring that clock in the device tree.

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetlabel: remove unused parameter in netlbl_netlink_auditinfo()
Zheng Yejian [Wed, 19 May 2021 07:34:38 +0000 (15:34 +0800)]
netlabel: remove unused parameter in netlbl_netlink_auditinfo()

loginuid/sessionid/secid have been read from 'current' instead of struct
netlink_skb_parms, the parameter 'skb' seems no longer needed.

Fixes: c53fa1ed92cd ("netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'intel-cleanups'
David S. Miller [Wed, 19 May 2021 19:23:25 +0000 (12:23 -0700)]
Merge branch 'intel-cleanups'

Guangbin Huang says:

====================
net: intel: some cleanups

This patchset adds some cleanups for intel e1000/e1000e ethernet driver.
====================

Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000e: fix misspell word "retreived"
Hao Chen [Wed, 19 May 2021 06:14:45 +0000 (14:14 +0800)]
net: e1000e: fix misspell word "retreived"

There is a misspell word "retreived" in comment, so fix it to "retrieved".

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000e: remove repeated word "slot" for netdev.c
Hao Chen [Wed, 19 May 2021 06:14:44 +0000 (14:14 +0800)]
net: e1000e: remove repeated word "slot" for netdev.c

There are double "slot" in comment, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000e: remove repeated word "the" for ich8lan.c
Hao Chen [Wed, 19 May 2021 06:14:43 +0000 (14:14 +0800)]
net: e1000e: remove repeated word "the" for ich8lan.c

There are double "the" in comment, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000: remove repeated words for e1000_hw.c
Hao Chen [Wed, 19 May 2021 06:14:42 +0000 (14:14 +0800)]
net: e1000: remove repeated words for e1000_hw.c

There are double "in" and "to" in comments, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000: remove repeated word "slot" for e1000_main.c
Hao Chen [Wed, 19 May 2021 06:14:41 +0000 (14:14 +0800)]
net: e1000: remove repeated word "slot" for e1000_main.c

There are double "slot" in comment, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-dev-leading-spaces'
David S. Miller [Wed, 19 May 2021 19:17:32 +0000 (12:17 -0700)]
Merge branch 'net-dev-leading-spaces'

Hui Tang says:

====================
net: ethernet: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

        $ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
        $ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: fujitsu: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:53 +0000 (13:30 +0800)]
net: fujitsu: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: 8390: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:52 +0000 (13:30 +0800)]
net: 8390: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: xircom: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:51 +0000 (13:30 +0800)]
net: xircom: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: fealnx: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:50 +0000 (13:30 +0800)]
net: fealnx: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sun: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:49 +0000 (13:30 +0800)]
net: sun: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: smsc: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:48 +0000 (13:30 +0800)]
net: smsc: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sis: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:47 +0000 (13:30 +0800)]
net: sis: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: seeq: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:46 +0000 (13:30 +0800)]
net: seeq: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: realtek: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:45 +0000 (13:30 +0800)]
net: realtek: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: natsemi: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:44 +0000 (13:30 +0800)]
net: natsemi: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: marvell: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:43 +0000 (13:30 +0800)]
net: marvell: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ibm: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:42 +0000 (13:30 +0800)]
net: ibm: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dlink: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:41 +0000 (13:30 +0800)]
net: dlink: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dec: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:40 +0000 (13:30 +0800)]
net: dec: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: chelsio: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:39 +0000 (13:30 +0800)]
net: chelsio: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: broadcom: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:38 +0000 (13:30 +0800)]
net: broadcom: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: apple: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:37 +0000 (13:30 +0800)]
net: apple: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: amd: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:36 +0000 (13:30 +0800)]
net: amd: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: alteon: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:35 +0000 (13:30 +0800)]
net: alteon: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: 3com: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:34 +0000 (13:30 +0800)]
net: 3com: remove leading spaces before tabs

There are a few leading space before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Steffen Klassert <klassert@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotun: use DEVICE_ATTR_RO macro
YueHaibing [Wed, 19 May 2021 02:38:50 +0000 (10:38 +0800)]
tun: use DEVICE_ATTR_RO macro

Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmveth: fix kobj_to_dev.cocci warnings
YueHaibing [Wed, 19 May 2021 02:28:49 +0000 (10:28 +0800)]
ibmveth: fix kobj_to_dev.cocci warnings

Use kobj_to_dev() instead of container_of()

Generated by: scripts/coccinelle/api/kobj_to_dev.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobpf: Make some symbols static
Pu Lehui [Wed, 19 May 2021 06:41:16 +0000 (14:41 +0800)]
bpf: Make some symbols static

The sparse tool complains as follows:

kernel/bpf/syscall.c:4567:29: warning:
 symbol 'bpf_sys_bpf_proto' was not declared. Should it be static?
kernel/bpf/syscall.c:4592:29: warning:
 symbol 'bpf_sys_close_proto' was not declared. Should it be static?

This symbol is not used outside of syscall.c, so marks it static.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519064116.240536-1-pulehui@huawei.com
3 years agobpf: Add cmd alias BPF_PROG_RUN
Alexei Starovoitov [Wed, 19 May 2021 01:40:32 +0000 (18:40 -0700)]
bpf: Add cmd alias BPF_PROG_RUN

Add BPF_PROG_RUN command as an alias to BPF_RPOG_TEST_RUN to better
indicate the full range of use cases done by the command.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519014032.20908-1-alexei.starovoitov@gmail.com
3 years agoMerge branch 'bpf-loader-progs'
Daniel Borkmann [Wed, 19 May 2021 13:27:32 +0000 (15:27 +0200)]
Merge branch 'bpf-loader-progs'

Alexei Starovoitov says:

====================
v5->v6:
- fixed issue found by bpf CI. The light skeleton generation was
  doing a dry-run of loading the program where all actual sys_bpf syscalls
  were replaced by calls into gen_loader. Turned out that search for valid
  vmlinux_btf was not stubbed out which was causing light skeleton gen
  to fail on older kernels.
- significantly reduced verbosity of gen_loader.c.
- an example trace_printk.lskel.h generated out of progs/trace_printk.c
  https://gist.github.com/4ast/774ea58f8286abac6aa8e3bf3bf3b903

v4->v5:
- addressed a bunch of minor comments from Andrii.
- the main difference is that lskel is now more robust in case of errors
  and a bit cleaner looking.

v3->v4:
- cleaned up closing of temporary FDs in case intermediate sys_bpf fails during
  execution of loader program.
- added support for rodata in the skeleton.
- enforce bpf_prog_type_syscall to be sleepable, since it needs bpf_copy_from_user
  to populate rodata map.
- converted test trace_printk to use lskel to test rodata access.
- various small bug fixes.

v2->v3: Addressed comments from Andrii and John.
- added support for setting max_entries after signature verification
  and used it in ringbuf test, since ringbuf's max_entries has to be updated
  after skeleton open() and before load(). See patch 20.
- bpf_btf_find_by_name_kind doesn't take btf_fd anymore.
  Because of that removed attach_prog_fd from bpf_prog_desc in lskel.
  Both features to be added later.
- cleaned up closing of fd==0 during loader gen by resetting fds back to -1.
- converted loader gen to use memset(&attr, cmd_specific_attr_size).
  would love to see this optimization in the rest of libbpf.
- fixed memory leak during loader_gen in case of enomem.
- support for fd_array kernel feature is added in patch 9 to have
  exhaustive testing across all selftests and then partially reverted
  in patch 15 to keep old style map_fd patching tested as well.
- since fentry_test/fexit_tests were extended with re-attach had to add
  support for per-program attach method in lskel and use it in the tests.
- cleanup closing of fds in lskel in case of partial failures.
- fixed numerous small nits.

v1->v2: Addressed comments from Al, Yonghong and Andrii.
- documented sys_close fdget/fdput requirement and non-recursion check.
- reduced internal api leaks between libbpf and bpftool.
  Now bpf_object__gen_loader() is the only new libbf api with minimal fields.
- fixed light skeleton __destroy() method to munmap and close maps and progs.
- refactored bpf_btf_find_by_name_kind to return btf_id | (btf_obj_fd << 32).
- refactored use of bpf_btf_find_by_name_kind from loader prog.
- moved auto-gen like code into skel_internal.h that is used by *.lskel.h
  It has minimal static inline bpf_load_and_run() method used by lskel.
- added lksel.h example in patch 15.
- replaced union bpf_map_prog_desc with struct bpf_map_desc and struct bpf_prog_desc.
- removed mark_feat_supported and added a patch to pass 'obj' into kernel_supports.
- added proper tracking of temporary FDs in loader prog and their cleanup via bpf_sys_close.
- rename gen_trace.c into gen_loader.c to better align the naming throughout.
- expanded number of available helpers in new prog type.
- added support for raw_tp attaching in lskel.
  lskel supports tracing and raw_tp progs now.
  It correctly loads all networking prog types too, but __attach() method is tbd.
- converted progs/test_ksyms_module.c to lskel.
- minor feedback fixes all over.

The description of V1 set is still valid:

This is a first step towards signed bpf programs and the third approach of that kind.
The first approach was to bring libbpf into the kernel as a user-mode-driver.
The second approach was to invent a new file format and let kernel execute
that format as a sequence of syscalls that create maps and load programs.
This third approach is using new type of bpf program instead of inventing file format.
1st and 2nd approaches had too many downsides comparing to this 3rd and were discarded
after months of work.

To make it work the following new concepts are introduced:
1. syscall bpf program type
A kind of bpf program that can do sys_bpf and sys_close syscalls.
It can only execute in user context.

2. FD array or FD index.
Traditionally BPF instructions are patched with FDs.
What it means that maps has to be created first and then instructions modified
which breaks signature verification if the program is signed.
Instead of patching each instruction with FD patch it with an index into array of FDs.
That makes the program signature stable if it uses maps.

3. loader program that is generated as "strace of libbpf".
When libbpf is loading bpf_file.o it does a bunch of sys_bpf() syscalls to
load BTF, create maps, populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map and single bpf program that
does those syscalls.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.

4. light skeleton
Instead of embedding the whole elf file into skeleton and using libbpf
to parse it later generate a loader program and embed it into "light skeleton".
Such skeleton can load the same set of elf files, but it doesn't need
libbpf and libelf to do that. It only needs few sys_bpf wrappers.

Future steps:
- support CO-RE in the kernel. This patch set is already too big,
  so that critical feature is left for the next step.
- generate light skeleton in golang to allow such users use BTF and
  all other features provided by libbpf
- generate light skeleton for kernel, so that bpf programs can be embeded
  in the kernel module. The UMD usage in bpf_preload will be replaced with
  such skeleton, so bpf_preload would become a standard kernel module
  without user space dependency.
- finally do the signing of the loader program.

The patches are work in progress with few rough edges.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
3 years agoselftests/bpf: Convert test trace_printk to lskel.
Alexei Starovoitov [Fri, 14 May 2021 00:36:23 +0000 (17:36 -0700)]
selftests/bpf: Convert test trace_printk to lskel.

Convert test trace_printk to light skeleton to check
rodata support in lskel.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-22-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Convert test printk to use rodata.
Alexei Starovoitov [Fri, 14 May 2021 00:36:22 +0000 (17:36 -0700)]
selftests/bpf: Convert test printk to use rodata.

Convert test trace_printk to more aggressively validate and use rodata.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-21-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Convert atomics test to light skeleton.
Alexei Starovoitov [Fri, 14 May 2021 00:36:21 +0000 (17:36 -0700)]
selftests/bpf: Convert atomics test to light skeleton.

Convert prog_tests/atomics.c to lskel.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-20-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Convert few tests to light skeleton.
Alexei Starovoitov [Fri, 14 May 2021 00:36:20 +0000 (17:36 -0700)]
selftests/bpf: Convert few tests to light skeleton.

Convert few tests that don't use CO-RE to light skeleton.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-19-alexei.starovoitov@gmail.com
3 years agobpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.
Alexei Starovoitov [Fri, 14 May 2021 00:36:19 +0000 (17:36 -0700)]
bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.

Add -L flag to bpftool to use libbpf gen_trace facility and syscall/loader program
for skeleton generation and program loading.

"bpftool gen skeleton -L" command will generate a "light skeleton" or "loader skeleton"
that is similar to existing skeleton, but has one major difference:
$ bpftool gen skeleton lsm.o > lsm.skel.h
$ bpftool gen skeleton -L lsm.o > lsm.lskel.h
$ diff lsm.skel.h lsm.lskel.h
@@ -5,34 +4,34 @@
 #define __LSM_SKEL_H__

 #include <stdlib.h>
-#include <bpf/libbpf.h>
+#include <bpf/bpf.h>

The light skeleton does not use majority of libbpf infrastructure.
It doesn't need libelf. It doesn't parse .o file.
It only needs few sys_bpf wrappers. All of them are in bpf/bpf.h file.
In future libbpf/bpf.c can be inlined into bpf.h, so not even libbpf.a would be
needed to work with light skeleton.

"bpftool prog load -L file.o" command is introduced for debugging of syscall/loader
program generation. Just like the same command without -L it will try to load
the programs from file.o into the kernel. It won't even try to pin them.

"bpftool prog load -L -d file.o" command will provide additional debug messages
on how syscall/loader program was generated.
Also the execution of syscall/loader program will use bpf_trace_printk() for
each step of loading BTF, creating maps, and loading programs.
The user can do "cat /.../trace_pipe" for further debug.

An example of fexit_sleep.lskel.h generated from progs/fexit_sleep.c:
struct fexit_sleep {
struct bpf_loader_ctx ctx;
struct {
struct bpf_map_desc bss;
} maps;
struct {
struct bpf_prog_desc nanosleep_fentry;
struct bpf_prog_desc nanosleep_fexit;
} progs;
struct {
int nanosleep_fentry_fd;
int nanosleep_fexit_fd;
} links;
struct fexit_sleep__bss {
int pid;
int fentry_cnt;
int fexit_cnt;
} *bss;
};

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-18-alexei.starovoitov@gmail.com
3 years agolibbpf: Introduce bpf_map__initial_value().
Alexei Starovoitov [Fri, 14 May 2021 00:36:18 +0000 (17:36 -0700)]
libbpf: Introduce bpf_map__initial_value().

Introduce bpf_map__initial_value() to read initial contents
of mmaped data/rodata/bss maps.
Note that bpf_map__set_initial_value() doesn't allow modifying
kconfig map while bpf_map__initial_value() allows reading
its values.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-17-alexei.starovoitov@gmail.com
3 years agolibbpf: Cleanup temp FDs when intermediate sys_bpf fails.
Alexei Starovoitov [Fri, 14 May 2021 00:36:17 +0000 (17:36 -0700)]
libbpf: Cleanup temp FDs when intermediate sys_bpf fails.

Fix loader program to close temporary FDs when intermediate
sys_bpf command fails.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-16-alexei.starovoitov@gmail.com
3 years agolibbpf: Generate loader program out of BPF ELF file.
Alexei Starovoitov [Fri, 14 May 2021 00:36:16 +0000 (17:36 -0700)]
libbpf: Generate loader program out of BPF ELF file.

The BPF program loading process performed by libbpf is quite complex
and consists of the following steps:
"open" phase:
- parse elf file and remember relocations, sections
- collect externs and ksyms including their btf_ids in prog's BTF
- patch BTF datasec (since llvm couldn't do it)
- init maps (old style map_def, BTF based, global data map, kconfig map)
- collect relocations against progs and maps
"load" phase:
- probe kernel features
- load vmlinux BTF
- resolve externs (kconfig and ksym)
- load program BTF
- init struct_ops
- create maps
- apply CO-RE relocations
- patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID
- reposition subprograms and adjust call insns
- sanitize and load progs

During this process libbpf does sys_bpf() calls to load BTF, create maps,
populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map with:
- union bpf_attr(s)
- BTF bytes
- map value bytes
- insns bytes
and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.

kconfig, typeless ksym, struct_ops and CO-RE are not supported yet.

The order of relocate_data and relocate_calls had to change, so that
bpf_gen__prog_load() can see all relocations for a given program with
correct insn_idx-es.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
3 years agolibbpf: Preliminary support for fd_idx
Alexei Starovoitov [Fri, 14 May 2021 00:36:15 +0000 (17:36 -0700)]
libbpf: Preliminary support for fd_idx

Prep libbpf to use FD_IDX kernel feature when generating loader program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-14-alexei.starovoitov@gmail.com
3 years agolibbpf: Add bpf_object pointer to kernel_supports().
Alexei Starovoitov [Fri, 14 May 2021 00:36:14 +0000 (17:36 -0700)]
libbpf: Add bpf_object pointer to kernel_supports().

Add a pointer to 'struct bpf_object' to kernel_supports() helper.
It will be used in the next patch.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-13-alexei.starovoitov@gmail.com
3 years agolibbpf: Change the order of data and text relocations.
Alexei Starovoitov [Fri, 14 May 2021 00:36:13 +0000 (17:36 -0700)]
libbpf: Change the order of data and text relocations.

In order to be able to generate loader program in the later
patches change the order of data and text relocations.
Also improve the test to include data relos.

If the kernel supports "FD array" the map_fd relocations can be processed
before text relos since generated loader program won't need to manually
patch ld_imm64 insns with map_fd.
But ksym and kfunc relocations can only be processed after all calls
are relocated, since loader program will consist of a sequence
of calls to bpf_btf_find_by_name_kind() followed by patching of btf_id
and btf_obj_fd into corresponding ld_imm64 insns. The locations of those
ld_imm64 insns are specified in relocations.
Hence process all data relocations (maps, ksym, kfunc) together after call relos.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-12-alexei.starovoitov@gmail.com
3 years agobpf: Add bpf_sys_close() helper.
Alexei Starovoitov [Fri, 14 May 2021 00:36:12 +0000 (17:36 -0700)]
bpf: Add bpf_sys_close() helper.

Add bpf_sys_close() helper to be used by the syscall/loader program to close
intermediate FDs and other cleanup.
Note this helper must never be allowed inside fdget/fdput bracketing.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-11-alexei.starovoitov@gmail.com
3 years agobpf: Add bpf_btf_find_by_name_kind() helper.
Alexei Starovoitov [Fri, 14 May 2021 00:36:11 +0000 (17:36 -0700)]
bpf: Add bpf_btf_find_by_name_kind() helper.

Add new helper:
long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
Description
Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
Return
Returns btf_id and btf_obj_fd in lower and upper 32 bits.

It will be used by loader program to find btf_id to attach the program to
and to find btf_ids of ksyms.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
3 years agobpf: Introduce fd_idx
Alexei Starovoitov [Fri, 14 May 2021 00:36:10 +0000 (17:36 -0700)]
bpf: Introduce fd_idx

Typical program loading sequence involves creating bpf maps and applying
map FDs into bpf instructions in various places in the bpf program.
This job is done by libbpf that is using compiler generated ELF relocations
to patch certain instruction after maps are created and BTFs are loaded.
The goal of fd_idx is to allow bpf instructions to stay immutable
after compilation. At load time the libbpf would still create maps as usual,
but it wouldn't need to patch instructions. It would store map_fds into
__u32 fd_array[] and would pass that pointer to sys_bpf(BPF_PROG_LOAD).

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-9-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Test for btf_load command.
Alexei Starovoitov [Fri, 14 May 2021 00:36:09 +0000 (17:36 -0700)]
selftests/bpf: Test for btf_load command.

Improve selftest to check that btf_load is working from bpf program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-8-alexei.starovoitov@gmail.com
3 years agobpf: Make btf_load command to be bpfptr_t compatible.
Alexei Starovoitov [Fri, 14 May 2021 00:36:08 +0000 (17:36 -0700)]
bpf: Make btf_load command to be bpfptr_t compatible.

Similar to prog_load make btf_load command to be availble to
bpf_prog_type_syscall program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-7-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Test for syscall program type
Alexei Starovoitov [Fri, 14 May 2021 00:36:07 +0000 (17:36 -0700)]
selftests/bpf: Test for syscall program type

bpf_prog_type_syscall is a program that creates a bpf map,
updates it, and loads another bpf program using bpf_sys_bpf() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-6-alexei.starovoitov@gmail.com
3 years agolibbpf: Support for syscall program type
Alexei Starovoitov [Fri, 14 May 2021 00:36:06 +0000 (17:36 -0700)]
libbpf: Support for syscall program type

Trivial support for syscall program type.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-5-alexei.starovoitov@gmail.com
3 years agobpf: Prepare bpf syscall to be used from kernel and user space.
Alexei Starovoitov [Fri, 14 May 2021 00:36:05 +0000 (17:36 -0700)]
bpf: Prepare bpf syscall to be used from kernel and user space.

With the help from bpfptr_t prepare relevant bpf syscall commands
to be used from kernel and user space.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-4-alexei.starovoitov@gmail.com
3 years agobpf: Introduce bpfptr_t user/kernel pointer.
Alexei Starovoitov [Fri, 14 May 2021 00:36:04 +0000 (17:36 -0700)]
bpf: Introduce bpfptr_t user/kernel pointer.

Similar to sockptr_t introduce bpfptr_t with few additions:
make_bpfptr() creates new user/kernel pointer in the same address space as
existing user/kernel pointer.
bpfptr_add() advances the user/kernel pointer.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-3-alexei.starovoitov@gmail.com
3 years agobpf: Introduce bpf_sys_bpf() helper and program type.
Alexei Starovoitov [Fri, 14 May 2021 00:36:03 +0000 (17:36 -0700)]
bpf: Introduce bpf_sys_bpf() helper and program type.

Add placeholders for bpf_sys_bpf() helper and new program type.
Make sure to check that expected_attach_type is zero for future extensibility.
Allow tracing helper functions to be used in this program type, since they will
only execute from user context via bpf_prog_test_run.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
3 years agonet: mdio: provide shim implementation of devm_of_mdiobus_register
Vladimir Oltean [Tue, 18 May 2021 17:49:24 +0000 (20:49 +0300)]
net: mdio: provide shim implementation of devm_of_mdiobus_register

Similar to the way in which of_mdiobus_register() has a fallback to the
non-DT based mdiobus_register() when CONFIG_OF is not set, we can create
a shim for the device-managed devm_of_mdiobus_register() which calls
devm_mdiobus_register() and discards the struct device_node *.

In particular, this solves a build issue with the qca8k DSA driver which
uses devm_of_mdiobus_register and can be compiled without CONFIG_OF.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dcb: Remove unnecessary INIT_LIST_HEAD()
Yang Yingliang [Tue, 18 May 2021 13:03:58 +0000 (21:03 +0800)]
net: dcb: Remove unnecessary INIT_LIST_HEAD()

The list_head dcb_app_list is initialized statically.
It is unnecessary to initialize by INIT_LIST_HEAD().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocxgb4: clip_tbl: use list_del_init instead of list_del/INIT_LIST_HEAD
Yang Yingliang [Tue, 18 May 2021 13:01:35 +0000 (21:01 +0800)]
cxgb4: clip_tbl: use list_del_init instead of list_del/INIT_LIST_HEAD

Using list_del_init() instead of list_del() + INIT_LIST_HEAD()
to simpify the code.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'wan-cleanups'
David S. Miller [Tue, 18 May 2021 20:42:42 +0000 (13:42 -0700)]
Merge branch 'wan-cleanups'

Guangbin Huang says:

====================
net: wan: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: fix variable definition style
Peng Li [Tue, 18 May 2021 12:29:54 +0000 (20:29 +0800)]
net: wan: fix variable definition style

Fix the checkpatch error: "foo* bar" should be "foo *bar".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove redundant space
Peng Li [Tue, 18 May 2021 12:29:53 +0000 (20:29 +0800)]
net: wan: remove redundant space

Space prohibited before that close parenthesis ')',
so removes the redundant space.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove redundant braces {}
Peng Li [Tue, 18 May 2021 12:29:52 +0000 (20:29 +0800)]
net: wan: remove redundant braces {}

Braces {} are not necessary for single statement blocks,
this patch removes redundant braces {}.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: add some required spaces
Peng Li [Tue, 18 May 2021 12:29:51 +0000 (20:29 +0800)]
net: wan: add some required spaces

Add space required before the open parenthesis '(',
and add spaces required around that '<', '>' and '!='.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove redundant blank lines
Peng Li [Tue, 18 May 2021 12:29:50 +0000 (20:29 +0800)]
net: wan: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: qca8k: fix missing unlock on error in qca8k_vlan_(add|del)
Wei Yongjun [Tue, 18 May 2021 11:24:13 +0000 (11:24 +0000)]
net: dsa: qca8k: fix missing unlock on error in qca8k_vlan_(add|del)

Add the missing unlock before return from function qca8k_vlan_add()
and qca8k_vlan_del() in the error handling case.

Fixes: 028f5f8ef44f ("net: dsa: qca8k: handle error with qca8k_read operation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocipso: correct comments of cipso_v4_cache_invalidate()
Zheng Yejian [Tue, 18 May 2021 09:11:41 +0000 (17:11 +0800)]
cipso: correct comments of cipso_v4_cache_invalidate()

Since cipso_v4_cache_invalidate() has no return value, so drop
related descriptions in its comments.

Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'custom-multipath-hash'
David S. Miller [Tue, 18 May 2021 20:27:32 +0000 (13:27 -0700)]
Merge branch 'custom-multipath-hash'

Ido Schimmel says:

====================
Add support for custom multipath hash

This patchset adds support for custom multipath hash policy for both
IPv4 and IPv6 traffic. The new policy allows user space to control the
outer and inner packet fields used for the hash computation.

Motivation
==========

Linux currently supports different multipath hash policies for IPv4 and
IPv6 traffic:

* Layer 3
* Layer 4
* Layer 3 or inner layer 3, if present

These policies hash on a fixed set of fields, which is inflexible and
against operators' requirements to control the hash input: "The ability
to control the inputs to the hash function should be a consideration in
any load-balancing RFP" [1].

An example of this inflexibility can be seen by the fact that none of
the current policies allows operators to use the standard 5-tuple and
the flow label for multipath hash computation. Such a policy is useful
in the following real-world example of a data center with the following
types of traffic:

* Anycast IPv6 TCP traffic towards layer 4 load balancers. Flow label is
constant (zero) to avoid breaking established connections

* Non-encapsulated IPv6 traffic. Flow label is used to re-route flows
around problematic (congested / failed) paths [2]

* IPv6 encapsulated traffic (IPv4-in-IPv6 or IPv6-in-IPv6). Outer flow
label is generated from encapsulated packet

* UDP encapsulated traffic. Outer source port is generated from
encapsulated packet

In the above example, using the inner flow information for hash
computation in addition to the outer flow information is useful during
failures of the BPF agent that selectively generates the flow label
based on the traffic type. In such cases, the self-healing properties of
the flow label are lost, but encapsulated flows are still load balanced.

Control over the inner fields is even more critical when encapsulation
is performed by hardware routers. For example, the Spectrum ASIC can
only encode 8 bits of entropy in the outer flow label / outer UDP source
port when performing IP / UDP encapsulation. In the case of IPv4 GRE
encapsulation there is no outer field to encode the inner hash in.

User interface
==============

In accordance with existing multipath hash configuration, the new custom
policy is added as a new option (3) to the
net.ipv{4,6}.fib_multipath_hash_policy sysctls. When the new policy is
used, the packet fields used for hash computation are determined by the
net.ipv{4,6}.fib_multipath_hash_fields sysctls. These sysctls accept a
bitmask according to the following table (from ip-sysctl.rst):

====== ============================
0x0001 Source IP address
0x0002 Destination IP address
0x0004 IP protocol
0x0008 Flow Label
0x0010 Source port
0x0020 Destination port
0x0040 Inner source IP address
0x0080 Inner destination IP address
0x0100 Inner IP protocol
0x0200 Inner Flow Label
0x0400 Inner source port
0x0800 Inner destination port
====== ============================

For example, to allow IPv6 traffic to be hashed based on standard
5-tuple and flow label:

 # sysctl -wq net.ipv6.fib_multipath_hash_fields=0x0037
 # sysctl -wq net.ipv6.fib_multipath_hash_policy=3

Implementation
==============

As with existing policies, the new policy relies on the flow dissector
to extract the packet fields for the hash computation. However, unlike
existing policies that either use the outer or inner flow, the new
policy might require both flows to be dissected.

To avoid unnecessary invocations of the flow dissector, the data path
skips dissection of the outer or inner flows if none of the outer or
inner fields are required.

In addition, inner flow dissection is not performed when no
encapsulation was encountered (i.e., 'FLOW_DIS_ENCAPSULATION' not set by
flow dissector) during dissection of the outer flow.

Testing
=======

Three new selftests are added with three different topologies that allow
testing of following traffic combinations:

* Non-encapsulated IPv4 / IPv6 traffic
* IPv4 / IPv6 overlay over IPv4 underlay
* IPv4 / IPv6 overlay over IPv6 underlay

All three tests follow the same pattern. Each time a different packet
field is used for hash computation. When the field changes in the packet
stream, traffic is expected to be balanced across the two paths. When
the field does not change, traffic is expected to be unbalanced across
the two paths.

Patchset overview
=================

Patches #1-#3 add custom multipath hash support for IPv4 traffic
Patches #4-#7 do the same for IPv6
Patches #8-#10 add selftests

Future work
===========

mlxsw support can be found here [3].

Changes since RFC v2 [4]:

* Patch #2: Document that 0x0008 is used for Flow Label
* Patch #2: Do not allow the bitmask to be zero
* Patch #6: Do not allow the bitmask to be zero

Changes since RFC v1 [5]:

* Use a bitmask instead of a bitmap

[1] https://blog.apnic.net/2018/01/11/ipv6-flow-label-misuse-hashing/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3acf3ec3f4b0fd4263989f2e4227bbd1c42b5fe1
[3] https://github.com/idosch/linux/tree/submit/custom_hash_mlxsw_v2
[4] https://lore.kernel.org/netdev/20210509151615.200608-1-idosch@idosch.org/
[5] https://lore.kernel.org/netdev/20210502162257.3472453-1-idosch@idosch.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: Add test for custom multipath hash with IPv6 GRE
Ido Schimmel [Mon, 17 May 2021 18:15:26 +0000 (21:15 +0300)]
selftests: forwarding: Add test for custom multipath hash with IPv6 GRE

Test that when the hash policy is set to custom, traffic is distributed
only according to the inner fields set in the fib_multipath_hash_fields
sysctl.

Each time set a different field and make sure traffic is only
distributed when the field is changed in the packet stream.

The test only verifies the behavior of IPv4/IPv6 overlays on top of an
IPv6 underlay network. The previous patch verified the same with an IPv4
underlay network.

Example output:

 # ./ip6gre_custom_multipath_hash.sh
 TEST: ping                                                          [ OK ]
 TEST: ping6                                                         [ OK ]
 INFO: Running IPv4 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6602 / 6002
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 12601
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 6802 / 5801
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12602 / 3
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16431 / 16344
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32773
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16431 / 16344
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 32772
 INFO: Running IPv6 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6704 / 5902
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 12600
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 5751 / 6852
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12602 / 0
 TEST: Multipath hash field: Inner flowlabel (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 8272 / 8181
 TEST: Multipath hash field: Inner flowlabel (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 3 / 12602
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16424 / 16351
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 3 / 32774
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16425 / 16350
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 32773

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: Add test for custom multipath hash with IPv4 GRE
Ido Schimmel [Mon, 17 May 2021 18:15:25 +0000 (21:15 +0300)]
selftests: forwarding: Add test for custom multipath hash with IPv4 GRE

Test that when the hash policy is set to custom, traffic is distributed
only according to the inner fields set in the fib_multipath_hash_fields
sysctl.

Each time set a different field and make sure traffic is only
distributed when the field is changed in the packet stream.

The test only verifies the behavior of IPv4/IPv6 overlays on top of an
IPv4 underlay network. A subsequent patch will do the same with an IPv6
underlay network.

Example output:

 # ./gre_custom_multipath_hash.sh
 TEST: ping                                                          [ OK ]
 TEST: ping6                                                         [ OK ]
 INFO: Running IPv4 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6601 / 6001
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 12600
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 6802 / 5802
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12601 / 1
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16430 / 16344
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32772
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16430 / 16343
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32772
 INFO: Running IPv6 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6702 / 5900
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 12601
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 5751 / 6851
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12602 / 1
 TEST: Multipath hash field: Inner flowlabel (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 8364 / 8065
 TEST: Multipath hash field: Inner flowlabel (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 12601 / 0
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16425 / 16349
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 32770
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16425 / 16349
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 32770

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: Add test for custom multipath hash
Ido Schimmel [Mon, 17 May 2021 18:15:24 +0000 (21:15 +0300)]
selftests: forwarding: Add test for custom multipath hash

Test that when the hash policy is set to custom, traffic is distributed
only according to the outer fields set in the fib_multipath_hash_fields
sysctl.

Each time set a different field and make sure traffic is only
distributed when the field is changed in the packet stream.

The test only verifies the behavior with non-encapsulated IPv4 and IPv6
packets. Subsequent patches will add tests for IPv4/IPv6 overlays on top
of IPv4/IPv6 underlay networks.

Example output:

 # ./custom_multipath_hash.sh
 TEST: ping                                                          [ OK ]
 TEST: ping6                                                         [ OK ]
 INFO: Running IPv4 custom multipath hash tests
 TEST: Multipath hash field: Source IP (balanced)                    [ OK ]
 INFO: Packets sent on path1 / path2: 6353 / 6254
 TEST: Multipath hash field: Source IP (unbalanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 12600
 TEST: Multipath hash field: Destination IP (balanced)               [ OK ]
 INFO: Packets sent on path1 / path2: 6102 / 6502
 TEST: Multipath hash field: Destination IP (unbalanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 12601
 TEST: Multipath hash field: Source port (balanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 16428 / 16345
 TEST: Multipath hash field: Source port (unbalanced)                [ OK ]
 INFO: Packets sent on path1 / path2: 32770 / 2
 TEST: Multipath hash field: Destination port (balanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 16428 / 16345
 TEST: Multipath hash field: Destination port (unbalanced)           [ OK ]
 INFO: Packets sent on path1 / path2: 32770 / 2
 INFO: Running IPv6 custom multipath hash tests
 TEST: Multipath hash field: Source IP (balanced)                    [ OK ]
 INFO: Packets sent on path1 / path2: 6704 / 5903
 TEST: Multipath hash field: Source IP (unbalanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 12600 / 0
 TEST: Multipath hash field: Destination IP (balanced)               [ OK ]
 INFO: Packets sent on path1 / path2: 5551 / 7052
 TEST: Multipath hash field: Destination IP (unbalanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 12603 / 0
 TEST: Multipath hash field: Flowlabel (balanced)                    [ OK ]
 INFO: Packets sent on path1 / path2: 8378 / 8080
 TEST: Multipath hash field: Flowlabel (unbalanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 12603
 TEST: Multipath hash field: Source port (balanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 16385 / 16388
 TEST: Multipath hash field: Source port (unbalanced)                [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32774
 TEST: Multipath hash field: Destination port (balanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 16386 / 16390
 TEST: Multipath hash field: Destination port (unbalanced)           [ OK ]
 INFO: Packets sent on path1 / path2: 32771 / 2

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: Add custom multipath hash policy
Ido Schimmel [Mon, 17 May 2021 18:15:23 +0000 (21:15 +0300)]
ipv6: Add custom multipath hash policy

Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.

The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.

In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: Add a sysctl to control multipath hash fields
Ido Schimmel [Mon, 17 May 2021 18:15:22 +0000 (21:15 +0300)]
ipv6: Add a sysctl to control multipath hash fields

A subsequent patch will add a new multipath hash policy where the packet
fields used for multipath hash calculation are determined by user space.
This patch adds a sysctl that allows user space to set these fields.

The packet fields are represented using a bitmask and are common between
IPv4 and IPv6 to allow user space to use the same numbering across both
protocols. For example, to hash based on standard 5-tuple:

 # sysctl -w net.ipv6.fib_multipath_hash_fields=0x0037
 net.ipv6.fib_multipath_hash_fields = 0x0037

To avoid introducing holes in 'struct netns_sysctl_ipv6', move the
'bindv6only' field after the multipath hash fields.

The kernel rejects unknown fields, for example:

 # sysctl -w net.ipv6.fib_multipath_hash_fields=0x1000
 sysctl: setting key "net.ipv6.fib_multipath_hash_fields": Invalid argument

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>