platform/kernel/linux-starfive.git
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Wed, 19 May 2021 19:58:29 +0000 (12:58 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-05-19

The following pull-request contains BPF updates for your *net-next* tree.

We've added 43 non-merge commits during the last 11 day(s) which contain
a total of 74 files changed, 3717 insertions(+), 578 deletions(-).

The main changes are:

1) syscall program type, fd array, and light skeleton, from Alexei.

2) Stop emitting static variables in skeleton, from Andrii.

3) Low level tc-bpf api, from Kumar.

4) Reduce verifier kmalloc/kfree churn, from Lorenz.
====================

3 years agoMerge branch 'mlxsw-mphash-policies'
David S. Miller [Wed, 19 May 2021 19:47:47 +0000 (12:47 -0700)]
Merge branch 'mlxsw-mphash-policies'

Ido Schimmel says:

====================
mlxsw: Add support for new multipath hash policies

This patchset adds support for two new multipath hash policies in mlxsw.

Patch #1 emits net events whenever the
net.ipv{4,6}.fib_multipath_hash_fields sysctls are changed. This allows
listeners to react to changes in the packet fields used for the
computation of the multipath hash.

Patches #2-#3 refactor the code in mlxsw that is responsible for the
configuration of the multipath hash, so that it will be easier to extend
for the two new policies.

Patch #4 adds the register fields required to support the new policies.

Patch #5-#7 add support for inner layer 3 and custom multipath hash
policies.

Tested using following forwarding selftests:

* custom_multipath_hash.sh
* gre_custom_multipath_hash.sh
* gre_inner_v4_multipath.sh
* gre_inner_v6_multipath.sh
====================

3 years agomlxsw: spectrum_router: Add support for custom multipath hash policy
Ido Schimmel [Wed, 19 May 2021 12:08:24 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Add support for custom multipath hash policy

When this policy is set, only enable the packet fields that were enabled
by user space for multipath hash computation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Add support for inner layer 3 multipath hash policy
Ido Schimmel [Wed, 19 May 2021 12:08:23 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Add support for inner layer 3 multipath hash policy

When this policy is set, the kernel uses the inner layer 3 fields for
multipath hash computation and falls back to the outer fields if no
encapsulation was encountered. This behavior is most likely influenced
by the behavior of the flow dissector, which is used for the packet
dissection.

The Spectrum ASIC, however, cannot fallback to outer fields if inner
fields are not available. This should not result in a discrepancy from
the software data path because if several flows have matching inner
fields, they will tend to have matching outer fields as well.

Therefore, implement this policy by enabling both outer and inner layer
3 fields for the multipath hash computation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_outer: Factor out helper for common outer fields
Ido Schimmel [Wed, 19 May 2021 12:08:22 +0000 (15:08 +0300)]
mlxsw: spectrum_outer: Factor out helper for common outer fields

Outer IPv4 and IPv6 addresses are used by multiple multipath hash
policies. Factor out helpers that set these fields to increase code
sharing between different policies.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: reg: Add inner packet fields to RECRv2 register
Ido Schimmel [Wed, 19 May 2021 12:08:21 +0000 (15:08 +0300)]
mlxsw: reg: Add inner packet fields to RECRv2 register

The RECRv2 register is used for setting up the router's ECMP hash
configuration. Extend it with inner packet fields to allow the ECMP hash
to be calculated based on inner flow information.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Move multipath hash configuration to a bitmap
Ido Schimmel [Wed, 19 May 2021 12:08:20 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Move multipath hash configuration to a bitmap

Currently, the multipath hash configuration is written directly to the
register payload. While this is OK for the two currently supported
policies, it is going to be hard to follow when more policies and more
packet fields are added.

Instead, set the required headers and fields in a bitmap and then dump
it to the register payload.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Replace if statement with a switch statement
Ido Schimmel [Wed, 19 May 2021 12:08:19 +0000 (15:08 +0300)]
mlxsw: spectrum_router: Replace if statement with a switch statement

The code was written when only two multipath hash policies were present,
so the if statement was sufficient. The next patch and future patches
are going to add support for more policies, so move to a switch
statement.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: Add notifications when multipath hash field change
Ido Schimmel [Wed, 19 May 2021 12:08:18 +0000 (15:08 +0300)]
net: Add notifications when multipath hash field change

In-kernel notifications are already sent when the multipath hash policy
itself changes, but not when the multipath hash fields change.

Add these notifications, so that interested listeners (e.g., switch ASIC
drivers) could perform the necessary configuration.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonfc: s3fwrn5: i2c: Enable optional clock from device tree
Stephan Gerhold [Wed, 19 May 2021 09:16:13 +0000 (11:16 +0200)]
nfc: s3fwrn5: i2c: Enable optional clock from device tree

S3FWRN5 depends on a clock input ("XI" pin) to function properly.
Depending on the hardware configuration this could be an always-on
oscillator or some external clock that must be explicitly enabled.

So far we assumed that the clock is always-on.
Make the driver request an (optional) clock from the device tree
and make sure the clock is running before starting S3FWRN5.

Note: S3FWRN5 asserts "GPIO2" whenever it needs the clock input to
function correctly. On some hardware configurations, GPIO2 is
connected directly to an input pin of the external clock provider
(e.g. the main PMIC of the SoC). In that case, it can automatically
AND the clock enable bit and clock request from S3FWRN5 so that
the clock is actually only enabled when needed.

It is also conceivable that on some other hardware configuration
S3FWRN5's GPIO2 might be connected as a regular GPIO input
of the SoC. In that case, follow-up patches could extend the
driver to request the GPIO, set up an interrupt and only enable
the clock when requested by S3FWRN5.

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: nfc: s3fwrn5: Add optional clock
Stephan Gerhold [Wed, 19 May 2021 09:16:12 +0000 (11:16 +0200)]
dt-bindings: net: nfc: s3fwrn5: Add optional clock

On some systems, S3FWRN5 depends on having an external clock enabled
to function correctly. Allow declaring that clock in the device tree.

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetlabel: remove unused parameter in netlbl_netlink_auditinfo()
Zheng Yejian [Wed, 19 May 2021 07:34:38 +0000 (15:34 +0800)]
netlabel: remove unused parameter in netlbl_netlink_auditinfo()

loginuid/sessionid/secid have been read from 'current' instead of struct
netlink_skb_parms, the parameter 'skb' seems no longer needed.

Fixes: c53fa1ed92cd ("netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'intel-cleanups'
David S. Miller [Wed, 19 May 2021 19:23:25 +0000 (12:23 -0700)]
Merge branch 'intel-cleanups'

Guangbin Huang says:

====================
net: intel: some cleanups

This patchset adds some cleanups for intel e1000/e1000e ethernet driver.
====================

Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000e: fix misspell word "retreived"
Hao Chen [Wed, 19 May 2021 06:14:45 +0000 (14:14 +0800)]
net: e1000e: fix misspell word "retreived"

There is a misspell word "retreived" in comment, so fix it to "retrieved".

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000e: remove repeated word "slot" for netdev.c
Hao Chen [Wed, 19 May 2021 06:14:44 +0000 (14:14 +0800)]
net: e1000e: remove repeated word "slot" for netdev.c

There are double "slot" in comment, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000e: remove repeated word "the" for ich8lan.c
Hao Chen [Wed, 19 May 2021 06:14:43 +0000 (14:14 +0800)]
net: e1000e: remove repeated word "the" for ich8lan.c

There are double "the" in comment, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000: remove repeated words for e1000_hw.c
Hao Chen [Wed, 19 May 2021 06:14:42 +0000 (14:14 +0800)]
net: e1000: remove repeated words for e1000_hw.c

There are double "in" and "to" in comments, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: e1000: remove repeated word "slot" for e1000_main.c
Hao Chen [Wed, 19 May 2021 06:14:41 +0000 (14:14 +0800)]
net: e1000: remove repeated word "slot" for e1000_main.c

There are double "slot" in comment, so remove the redundant one.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-dev-leading-spaces'
David S. Miller [Wed, 19 May 2021 19:17:32 +0000 (12:17 -0700)]
Merge branch 'net-dev-leading-spaces'

Hui Tang says:

====================
net: ethernet: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

        $ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
        $ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: fujitsu: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:53 +0000 (13:30 +0800)]
net: fujitsu: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: 8390: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:52 +0000 (13:30 +0800)]
net: 8390: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: xircom: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:51 +0000 (13:30 +0800)]
net: xircom: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: fealnx: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:50 +0000 (13:30 +0800)]
net: fealnx: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sun: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:49 +0000 (13:30 +0800)]
net: sun: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: smsc: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:48 +0000 (13:30 +0800)]
net: smsc: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sis: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:47 +0000 (13:30 +0800)]
net: sis: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: seeq: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:46 +0000 (13:30 +0800)]
net: seeq: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: realtek: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:45 +0000 (13:30 +0800)]
net: realtek: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: natsemi: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:44 +0000 (13:30 +0800)]
net: natsemi: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: marvell: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:43 +0000 (13:30 +0800)]
net: marvell: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ibm: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:42 +0000 (13:30 +0800)]
net: ibm: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dlink: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:41 +0000 (13:30 +0800)]
net: dlink: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dec: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:40 +0000 (13:30 +0800)]
net: dec: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: chelsio: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:39 +0000 (13:30 +0800)]
net: chelsio: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: broadcom: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:38 +0000 (13:30 +0800)]
net: broadcom: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: apple: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:37 +0000 (13:30 +0800)]
net: apple: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: amd: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:36 +0000 (13:30 +0800)]
net: amd: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: alteon: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:35 +0000 (13:30 +0800)]
net: alteon: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: 3com: remove leading spaces before tabs
Hui Tang [Wed, 19 May 2021 05:30:34 +0000 (13:30 +0800)]
net: 3com: remove leading spaces before tabs

There are a few leading space before tabs and remove it by running the
following commard:

$ find . -name '*.c' | xargs sed -r -i 's/^[ ]+\t/\t/'
$ find . -name '*.h' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Steffen Klassert <klassert@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotun: use DEVICE_ATTR_RO macro
YueHaibing [Wed, 19 May 2021 02:38:50 +0000 (10:38 +0800)]
tun: use DEVICE_ATTR_RO macro

Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmveth: fix kobj_to_dev.cocci warnings
YueHaibing [Wed, 19 May 2021 02:28:49 +0000 (10:28 +0800)]
ibmveth: fix kobj_to_dev.cocci warnings

Use kobj_to_dev() instead of container_of()

Generated by: scripts/coccinelle/api/kobj_to_dev.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobpf: Make some symbols static
Pu Lehui [Wed, 19 May 2021 06:41:16 +0000 (14:41 +0800)]
bpf: Make some symbols static

The sparse tool complains as follows:

kernel/bpf/syscall.c:4567:29: warning:
 symbol 'bpf_sys_bpf_proto' was not declared. Should it be static?
kernel/bpf/syscall.c:4592:29: warning:
 symbol 'bpf_sys_close_proto' was not declared. Should it be static?

This symbol is not used outside of syscall.c, so marks it static.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519064116.240536-1-pulehui@huawei.com
3 years agobpf: Add cmd alias BPF_PROG_RUN
Alexei Starovoitov [Wed, 19 May 2021 01:40:32 +0000 (18:40 -0700)]
bpf: Add cmd alias BPF_PROG_RUN

Add BPF_PROG_RUN command as an alias to BPF_RPOG_TEST_RUN to better
indicate the full range of use cases done by the command.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519014032.20908-1-alexei.starovoitov@gmail.com
3 years agoMerge branch 'bpf-loader-progs'
Daniel Borkmann [Wed, 19 May 2021 13:27:32 +0000 (15:27 +0200)]
Merge branch 'bpf-loader-progs'

Alexei Starovoitov says:

====================
v5->v6:
- fixed issue found by bpf CI. The light skeleton generation was
  doing a dry-run of loading the program where all actual sys_bpf syscalls
  were replaced by calls into gen_loader. Turned out that search for valid
  vmlinux_btf was not stubbed out which was causing light skeleton gen
  to fail on older kernels.
- significantly reduced verbosity of gen_loader.c.
- an example trace_printk.lskel.h generated out of progs/trace_printk.c
  https://gist.github.com/4ast/774ea58f8286abac6aa8e3bf3bf3b903

v4->v5:
- addressed a bunch of minor comments from Andrii.
- the main difference is that lskel is now more robust in case of errors
  and a bit cleaner looking.

v3->v4:
- cleaned up closing of temporary FDs in case intermediate sys_bpf fails during
  execution of loader program.
- added support for rodata in the skeleton.
- enforce bpf_prog_type_syscall to be sleepable, since it needs bpf_copy_from_user
  to populate rodata map.
- converted test trace_printk to use lskel to test rodata access.
- various small bug fixes.

v2->v3: Addressed comments from Andrii and John.
- added support for setting max_entries after signature verification
  and used it in ringbuf test, since ringbuf's max_entries has to be updated
  after skeleton open() and before load(). See patch 20.
- bpf_btf_find_by_name_kind doesn't take btf_fd anymore.
  Because of that removed attach_prog_fd from bpf_prog_desc in lskel.
  Both features to be added later.
- cleaned up closing of fd==0 during loader gen by resetting fds back to -1.
- converted loader gen to use memset(&attr, cmd_specific_attr_size).
  would love to see this optimization in the rest of libbpf.
- fixed memory leak during loader_gen in case of enomem.
- support for fd_array kernel feature is added in patch 9 to have
  exhaustive testing across all selftests and then partially reverted
  in patch 15 to keep old style map_fd patching tested as well.
- since fentry_test/fexit_tests were extended with re-attach had to add
  support for per-program attach method in lskel and use it in the tests.
- cleanup closing of fds in lskel in case of partial failures.
- fixed numerous small nits.

v1->v2: Addressed comments from Al, Yonghong and Andrii.
- documented sys_close fdget/fdput requirement and non-recursion check.
- reduced internal api leaks between libbpf and bpftool.
  Now bpf_object__gen_loader() is the only new libbf api with minimal fields.
- fixed light skeleton __destroy() method to munmap and close maps and progs.
- refactored bpf_btf_find_by_name_kind to return btf_id | (btf_obj_fd << 32).
- refactored use of bpf_btf_find_by_name_kind from loader prog.
- moved auto-gen like code into skel_internal.h that is used by *.lskel.h
  It has minimal static inline bpf_load_and_run() method used by lskel.
- added lksel.h example in patch 15.
- replaced union bpf_map_prog_desc with struct bpf_map_desc and struct bpf_prog_desc.
- removed mark_feat_supported and added a patch to pass 'obj' into kernel_supports.
- added proper tracking of temporary FDs in loader prog and their cleanup via bpf_sys_close.
- rename gen_trace.c into gen_loader.c to better align the naming throughout.
- expanded number of available helpers in new prog type.
- added support for raw_tp attaching in lskel.
  lskel supports tracing and raw_tp progs now.
  It correctly loads all networking prog types too, but __attach() method is tbd.
- converted progs/test_ksyms_module.c to lskel.
- minor feedback fixes all over.

The description of V1 set is still valid:

This is a first step towards signed bpf programs and the third approach of that kind.
The first approach was to bring libbpf into the kernel as a user-mode-driver.
The second approach was to invent a new file format and let kernel execute
that format as a sequence of syscalls that create maps and load programs.
This third approach is using new type of bpf program instead of inventing file format.
1st and 2nd approaches had too many downsides comparing to this 3rd and were discarded
after months of work.

To make it work the following new concepts are introduced:
1. syscall bpf program type
A kind of bpf program that can do sys_bpf and sys_close syscalls.
It can only execute in user context.

2. FD array or FD index.
Traditionally BPF instructions are patched with FDs.
What it means that maps has to be created first and then instructions modified
which breaks signature verification if the program is signed.
Instead of patching each instruction with FD patch it with an index into array of FDs.
That makes the program signature stable if it uses maps.

3. loader program that is generated as "strace of libbpf".
When libbpf is loading bpf_file.o it does a bunch of sys_bpf() syscalls to
load BTF, create maps, populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map and single bpf program that
does those syscalls.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.

4. light skeleton
Instead of embedding the whole elf file into skeleton and using libbpf
to parse it later generate a loader program and embed it into "light skeleton".
Such skeleton can load the same set of elf files, but it doesn't need
libbpf and libelf to do that. It only needs few sys_bpf wrappers.

Future steps:
- support CO-RE in the kernel. This patch set is already too big,
  so that critical feature is left for the next step.
- generate light skeleton in golang to allow such users use BTF and
  all other features provided by libbpf
- generate light skeleton for kernel, so that bpf programs can be embeded
  in the kernel module. The UMD usage in bpf_preload will be replaced with
  such skeleton, so bpf_preload would become a standard kernel module
  without user space dependency.
- finally do the signing of the loader program.

The patches are work in progress with few rough edges.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
3 years agoselftests/bpf: Convert test trace_printk to lskel.
Alexei Starovoitov [Fri, 14 May 2021 00:36:23 +0000 (17:36 -0700)]
selftests/bpf: Convert test trace_printk to lskel.

Convert test trace_printk to light skeleton to check
rodata support in lskel.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-22-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Convert test printk to use rodata.
Alexei Starovoitov [Fri, 14 May 2021 00:36:22 +0000 (17:36 -0700)]
selftests/bpf: Convert test printk to use rodata.

Convert test trace_printk to more aggressively validate and use rodata.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-21-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Convert atomics test to light skeleton.
Alexei Starovoitov [Fri, 14 May 2021 00:36:21 +0000 (17:36 -0700)]
selftests/bpf: Convert atomics test to light skeleton.

Convert prog_tests/atomics.c to lskel.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-20-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Convert few tests to light skeleton.
Alexei Starovoitov [Fri, 14 May 2021 00:36:20 +0000 (17:36 -0700)]
selftests/bpf: Convert few tests to light skeleton.

Convert few tests that don't use CO-RE to light skeleton.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-19-alexei.starovoitov@gmail.com
3 years agobpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.
Alexei Starovoitov [Fri, 14 May 2021 00:36:19 +0000 (17:36 -0700)]
bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.

Add -L flag to bpftool to use libbpf gen_trace facility and syscall/loader program
for skeleton generation and program loading.

"bpftool gen skeleton -L" command will generate a "light skeleton" or "loader skeleton"
that is similar to existing skeleton, but has one major difference:
$ bpftool gen skeleton lsm.o > lsm.skel.h
$ bpftool gen skeleton -L lsm.o > lsm.lskel.h
$ diff lsm.skel.h lsm.lskel.h
@@ -5,34 +4,34 @@
 #define __LSM_SKEL_H__

 #include <stdlib.h>
-#include <bpf/libbpf.h>
+#include <bpf/bpf.h>

The light skeleton does not use majority of libbpf infrastructure.
It doesn't need libelf. It doesn't parse .o file.
It only needs few sys_bpf wrappers. All of them are in bpf/bpf.h file.
In future libbpf/bpf.c can be inlined into bpf.h, so not even libbpf.a would be
needed to work with light skeleton.

"bpftool prog load -L file.o" command is introduced for debugging of syscall/loader
program generation. Just like the same command without -L it will try to load
the programs from file.o into the kernel. It won't even try to pin them.

"bpftool prog load -L -d file.o" command will provide additional debug messages
on how syscall/loader program was generated.
Also the execution of syscall/loader program will use bpf_trace_printk() for
each step of loading BTF, creating maps, and loading programs.
The user can do "cat /.../trace_pipe" for further debug.

An example of fexit_sleep.lskel.h generated from progs/fexit_sleep.c:
struct fexit_sleep {
struct bpf_loader_ctx ctx;
struct {
struct bpf_map_desc bss;
} maps;
struct {
struct bpf_prog_desc nanosleep_fentry;
struct bpf_prog_desc nanosleep_fexit;
} progs;
struct {
int nanosleep_fentry_fd;
int nanosleep_fexit_fd;
} links;
struct fexit_sleep__bss {
int pid;
int fentry_cnt;
int fexit_cnt;
} *bss;
};

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-18-alexei.starovoitov@gmail.com
3 years agolibbpf: Introduce bpf_map__initial_value().
Alexei Starovoitov [Fri, 14 May 2021 00:36:18 +0000 (17:36 -0700)]
libbpf: Introduce bpf_map__initial_value().

Introduce bpf_map__initial_value() to read initial contents
of mmaped data/rodata/bss maps.
Note that bpf_map__set_initial_value() doesn't allow modifying
kconfig map while bpf_map__initial_value() allows reading
its values.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-17-alexei.starovoitov@gmail.com
3 years agolibbpf: Cleanup temp FDs when intermediate sys_bpf fails.
Alexei Starovoitov [Fri, 14 May 2021 00:36:17 +0000 (17:36 -0700)]
libbpf: Cleanup temp FDs when intermediate sys_bpf fails.

Fix loader program to close temporary FDs when intermediate
sys_bpf command fails.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-16-alexei.starovoitov@gmail.com
3 years agolibbpf: Generate loader program out of BPF ELF file.
Alexei Starovoitov [Fri, 14 May 2021 00:36:16 +0000 (17:36 -0700)]
libbpf: Generate loader program out of BPF ELF file.

The BPF program loading process performed by libbpf is quite complex
and consists of the following steps:
"open" phase:
- parse elf file and remember relocations, sections
- collect externs and ksyms including their btf_ids in prog's BTF
- patch BTF datasec (since llvm couldn't do it)
- init maps (old style map_def, BTF based, global data map, kconfig map)
- collect relocations against progs and maps
"load" phase:
- probe kernel features
- load vmlinux BTF
- resolve externs (kconfig and ksym)
- load program BTF
- init struct_ops
- create maps
- apply CO-RE relocations
- patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID
- reposition subprograms and adjust call insns
- sanitize and load progs

During this process libbpf does sys_bpf() calls to load BTF, create maps,
populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map with:
- union bpf_attr(s)
- BTF bytes
- map value bytes
- insns bytes
and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.

kconfig, typeless ksym, struct_ops and CO-RE are not supported yet.

The order of relocate_data and relocate_calls had to change, so that
bpf_gen__prog_load() can see all relocations for a given program with
correct insn_idx-es.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
3 years agolibbpf: Preliminary support for fd_idx
Alexei Starovoitov [Fri, 14 May 2021 00:36:15 +0000 (17:36 -0700)]
libbpf: Preliminary support for fd_idx

Prep libbpf to use FD_IDX kernel feature when generating loader program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-14-alexei.starovoitov@gmail.com
3 years agolibbpf: Add bpf_object pointer to kernel_supports().
Alexei Starovoitov [Fri, 14 May 2021 00:36:14 +0000 (17:36 -0700)]
libbpf: Add bpf_object pointer to kernel_supports().

Add a pointer to 'struct bpf_object' to kernel_supports() helper.
It will be used in the next patch.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-13-alexei.starovoitov@gmail.com
3 years agolibbpf: Change the order of data and text relocations.
Alexei Starovoitov [Fri, 14 May 2021 00:36:13 +0000 (17:36 -0700)]
libbpf: Change the order of data and text relocations.

In order to be able to generate loader program in the later
patches change the order of data and text relocations.
Also improve the test to include data relos.

If the kernel supports "FD array" the map_fd relocations can be processed
before text relos since generated loader program won't need to manually
patch ld_imm64 insns with map_fd.
But ksym and kfunc relocations can only be processed after all calls
are relocated, since loader program will consist of a sequence
of calls to bpf_btf_find_by_name_kind() followed by patching of btf_id
and btf_obj_fd into corresponding ld_imm64 insns. The locations of those
ld_imm64 insns are specified in relocations.
Hence process all data relocations (maps, ksym, kfunc) together after call relos.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-12-alexei.starovoitov@gmail.com
3 years agobpf: Add bpf_sys_close() helper.
Alexei Starovoitov [Fri, 14 May 2021 00:36:12 +0000 (17:36 -0700)]
bpf: Add bpf_sys_close() helper.

Add bpf_sys_close() helper to be used by the syscall/loader program to close
intermediate FDs and other cleanup.
Note this helper must never be allowed inside fdget/fdput bracketing.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-11-alexei.starovoitov@gmail.com
3 years agobpf: Add bpf_btf_find_by_name_kind() helper.
Alexei Starovoitov [Fri, 14 May 2021 00:36:11 +0000 (17:36 -0700)]
bpf: Add bpf_btf_find_by_name_kind() helper.

Add new helper:
long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
Description
Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
Return
Returns btf_id and btf_obj_fd in lower and upper 32 bits.

It will be used by loader program to find btf_id to attach the program to
and to find btf_ids of ksyms.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
3 years agobpf: Introduce fd_idx
Alexei Starovoitov [Fri, 14 May 2021 00:36:10 +0000 (17:36 -0700)]
bpf: Introduce fd_idx

Typical program loading sequence involves creating bpf maps and applying
map FDs into bpf instructions in various places in the bpf program.
This job is done by libbpf that is using compiler generated ELF relocations
to patch certain instruction after maps are created and BTFs are loaded.
The goal of fd_idx is to allow bpf instructions to stay immutable
after compilation. At load time the libbpf would still create maps as usual,
but it wouldn't need to patch instructions. It would store map_fds into
__u32 fd_array[] and would pass that pointer to sys_bpf(BPF_PROG_LOAD).

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-9-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Test for btf_load command.
Alexei Starovoitov [Fri, 14 May 2021 00:36:09 +0000 (17:36 -0700)]
selftests/bpf: Test for btf_load command.

Improve selftest to check that btf_load is working from bpf program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-8-alexei.starovoitov@gmail.com
3 years agobpf: Make btf_load command to be bpfptr_t compatible.
Alexei Starovoitov [Fri, 14 May 2021 00:36:08 +0000 (17:36 -0700)]
bpf: Make btf_load command to be bpfptr_t compatible.

Similar to prog_load make btf_load command to be availble to
bpf_prog_type_syscall program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-7-alexei.starovoitov@gmail.com
3 years agoselftests/bpf: Test for syscall program type
Alexei Starovoitov [Fri, 14 May 2021 00:36:07 +0000 (17:36 -0700)]
selftests/bpf: Test for syscall program type

bpf_prog_type_syscall is a program that creates a bpf map,
updates it, and loads another bpf program using bpf_sys_bpf() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-6-alexei.starovoitov@gmail.com
3 years agolibbpf: Support for syscall program type
Alexei Starovoitov [Fri, 14 May 2021 00:36:06 +0000 (17:36 -0700)]
libbpf: Support for syscall program type

Trivial support for syscall program type.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-5-alexei.starovoitov@gmail.com
3 years agobpf: Prepare bpf syscall to be used from kernel and user space.
Alexei Starovoitov [Fri, 14 May 2021 00:36:05 +0000 (17:36 -0700)]
bpf: Prepare bpf syscall to be used from kernel and user space.

With the help from bpfptr_t prepare relevant bpf syscall commands
to be used from kernel and user space.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-4-alexei.starovoitov@gmail.com
3 years agobpf: Introduce bpfptr_t user/kernel pointer.
Alexei Starovoitov [Fri, 14 May 2021 00:36:04 +0000 (17:36 -0700)]
bpf: Introduce bpfptr_t user/kernel pointer.

Similar to sockptr_t introduce bpfptr_t with few additions:
make_bpfptr() creates new user/kernel pointer in the same address space as
existing user/kernel pointer.
bpfptr_add() advances the user/kernel pointer.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-3-alexei.starovoitov@gmail.com
3 years agobpf: Introduce bpf_sys_bpf() helper and program type.
Alexei Starovoitov [Fri, 14 May 2021 00:36:03 +0000 (17:36 -0700)]
bpf: Introduce bpf_sys_bpf() helper and program type.

Add placeholders for bpf_sys_bpf() helper and new program type.
Make sure to check that expected_attach_type is zero for future extensibility.
Allow tracing helper functions to be used in this program type, since they will
only execute from user context via bpf_prog_test_run.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
3 years agonet: mdio: provide shim implementation of devm_of_mdiobus_register
Vladimir Oltean [Tue, 18 May 2021 17:49:24 +0000 (20:49 +0300)]
net: mdio: provide shim implementation of devm_of_mdiobus_register

Similar to the way in which of_mdiobus_register() has a fallback to the
non-DT based mdiobus_register() when CONFIG_OF is not set, we can create
a shim for the device-managed devm_of_mdiobus_register() which calls
devm_mdiobus_register() and discards the struct device_node *.

In particular, this solves a build issue with the qca8k DSA driver which
uses devm_of_mdiobus_register and can be compiled without CONFIG_OF.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dcb: Remove unnecessary INIT_LIST_HEAD()
Yang Yingliang [Tue, 18 May 2021 13:03:58 +0000 (21:03 +0800)]
net: dcb: Remove unnecessary INIT_LIST_HEAD()

The list_head dcb_app_list is initialized statically.
It is unnecessary to initialize by INIT_LIST_HEAD().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocxgb4: clip_tbl: use list_del_init instead of list_del/INIT_LIST_HEAD
Yang Yingliang [Tue, 18 May 2021 13:01:35 +0000 (21:01 +0800)]
cxgb4: clip_tbl: use list_del_init instead of list_del/INIT_LIST_HEAD

Using list_del_init() instead of list_del() + INIT_LIST_HEAD()
to simpify the code.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'wan-cleanups'
David S. Miller [Tue, 18 May 2021 20:42:42 +0000 (13:42 -0700)]
Merge branch 'wan-cleanups'

Guangbin Huang says:

====================
net: wan: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: fix variable definition style
Peng Li [Tue, 18 May 2021 12:29:54 +0000 (20:29 +0800)]
net: wan: fix variable definition style

Fix the checkpatch error: "foo* bar" should be "foo *bar".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove redundant space
Peng Li [Tue, 18 May 2021 12:29:53 +0000 (20:29 +0800)]
net: wan: remove redundant space

Space prohibited before that close parenthesis ')',
so removes the redundant space.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove redundant braces {}
Peng Li [Tue, 18 May 2021 12:29:52 +0000 (20:29 +0800)]
net: wan: remove redundant braces {}

Braces {} are not necessary for single statement blocks,
this patch removes redundant braces {}.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: add some required spaces
Peng Li [Tue, 18 May 2021 12:29:51 +0000 (20:29 +0800)]
net: wan: add some required spaces

Add space required before the open parenthesis '(',
and add spaces required around that '<', '>' and '!='.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wan: remove redundant blank lines
Peng Li [Tue, 18 May 2021 12:29:50 +0000 (20:29 +0800)]
net: wan: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: qca8k: fix missing unlock on error in qca8k_vlan_(add|del)
Wei Yongjun [Tue, 18 May 2021 11:24:13 +0000 (11:24 +0000)]
net: dsa: qca8k: fix missing unlock on error in qca8k_vlan_(add|del)

Add the missing unlock before return from function qca8k_vlan_add()
and qca8k_vlan_del() in the error handling case.

Fixes: 028f5f8ef44f ("net: dsa: qca8k: handle error with qca8k_read operation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocipso: correct comments of cipso_v4_cache_invalidate()
Zheng Yejian [Tue, 18 May 2021 09:11:41 +0000 (17:11 +0800)]
cipso: correct comments of cipso_v4_cache_invalidate()

Since cipso_v4_cache_invalidate() has no return value, so drop
related descriptions in its comments.

Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'custom-multipath-hash'
David S. Miller [Tue, 18 May 2021 20:27:32 +0000 (13:27 -0700)]
Merge branch 'custom-multipath-hash'

Ido Schimmel says:

====================
Add support for custom multipath hash

This patchset adds support for custom multipath hash policy for both
IPv4 and IPv6 traffic. The new policy allows user space to control the
outer and inner packet fields used for the hash computation.

Motivation
==========

Linux currently supports different multipath hash policies for IPv4 and
IPv6 traffic:

* Layer 3
* Layer 4
* Layer 3 or inner layer 3, if present

These policies hash on a fixed set of fields, which is inflexible and
against operators' requirements to control the hash input: "The ability
to control the inputs to the hash function should be a consideration in
any load-balancing RFP" [1].

An example of this inflexibility can be seen by the fact that none of
the current policies allows operators to use the standard 5-tuple and
the flow label for multipath hash computation. Such a policy is useful
in the following real-world example of a data center with the following
types of traffic:

* Anycast IPv6 TCP traffic towards layer 4 load balancers. Flow label is
constant (zero) to avoid breaking established connections

* Non-encapsulated IPv6 traffic. Flow label is used to re-route flows
around problematic (congested / failed) paths [2]

* IPv6 encapsulated traffic (IPv4-in-IPv6 or IPv6-in-IPv6). Outer flow
label is generated from encapsulated packet

* UDP encapsulated traffic. Outer source port is generated from
encapsulated packet

In the above example, using the inner flow information for hash
computation in addition to the outer flow information is useful during
failures of the BPF agent that selectively generates the flow label
based on the traffic type. In such cases, the self-healing properties of
the flow label are lost, but encapsulated flows are still load balanced.

Control over the inner fields is even more critical when encapsulation
is performed by hardware routers. For example, the Spectrum ASIC can
only encode 8 bits of entropy in the outer flow label / outer UDP source
port when performing IP / UDP encapsulation. In the case of IPv4 GRE
encapsulation there is no outer field to encode the inner hash in.

User interface
==============

In accordance with existing multipath hash configuration, the new custom
policy is added as a new option (3) to the
net.ipv{4,6}.fib_multipath_hash_policy sysctls. When the new policy is
used, the packet fields used for hash computation are determined by the
net.ipv{4,6}.fib_multipath_hash_fields sysctls. These sysctls accept a
bitmask according to the following table (from ip-sysctl.rst):

====== ============================
0x0001 Source IP address
0x0002 Destination IP address
0x0004 IP protocol
0x0008 Flow Label
0x0010 Source port
0x0020 Destination port
0x0040 Inner source IP address
0x0080 Inner destination IP address
0x0100 Inner IP protocol
0x0200 Inner Flow Label
0x0400 Inner source port
0x0800 Inner destination port
====== ============================

For example, to allow IPv6 traffic to be hashed based on standard
5-tuple and flow label:

 # sysctl -wq net.ipv6.fib_multipath_hash_fields=0x0037
 # sysctl -wq net.ipv6.fib_multipath_hash_policy=3

Implementation
==============

As with existing policies, the new policy relies on the flow dissector
to extract the packet fields for the hash computation. However, unlike
existing policies that either use the outer or inner flow, the new
policy might require both flows to be dissected.

To avoid unnecessary invocations of the flow dissector, the data path
skips dissection of the outer or inner flows if none of the outer or
inner fields are required.

In addition, inner flow dissection is not performed when no
encapsulation was encountered (i.e., 'FLOW_DIS_ENCAPSULATION' not set by
flow dissector) during dissection of the outer flow.

Testing
=======

Three new selftests are added with three different topologies that allow
testing of following traffic combinations:

* Non-encapsulated IPv4 / IPv6 traffic
* IPv4 / IPv6 overlay over IPv4 underlay
* IPv4 / IPv6 overlay over IPv6 underlay

All three tests follow the same pattern. Each time a different packet
field is used for hash computation. When the field changes in the packet
stream, traffic is expected to be balanced across the two paths. When
the field does not change, traffic is expected to be unbalanced across
the two paths.

Patchset overview
=================

Patches #1-#3 add custom multipath hash support for IPv4 traffic
Patches #4-#7 do the same for IPv6
Patches #8-#10 add selftests

Future work
===========

mlxsw support can be found here [3].

Changes since RFC v2 [4]:

* Patch #2: Document that 0x0008 is used for Flow Label
* Patch #2: Do not allow the bitmask to be zero
* Patch #6: Do not allow the bitmask to be zero

Changes since RFC v1 [5]:

* Use a bitmask instead of a bitmap

[1] https://blog.apnic.net/2018/01/11/ipv6-flow-label-misuse-hashing/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3acf3ec3f4b0fd4263989f2e4227bbd1c42b5fe1
[3] https://github.com/idosch/linux/tree/submit/custom_hash_mlxsw_v2
[4] https://lore.kernel.org/netdev/20210509151615.200608-1-idosch@idosch.org/
[5] https://lore.kernel.org/netdev/20210502162257.3472453-1-idosch@idosch.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: Add test for custom multipath hash with IPv6 GRE
Ido Schimmel [Mon, 17 May 2021 18:15:26 +0000 (21:15 +0300)]
selftests: forwarding: Add test for custom multipath hash with IPv6 GRE

Test that when the hash policy is set to custom, traffic is distributed
only according to the inner fields set in the fib_multipath_hash_fields
sysctl.

Each time set a different field and make sure traffic is only
distributed when the field is changed in the packet stream.

The test only verifies the behavior of IPv4/IPv6 overlays on top of an
IPv6 underlay network. The previous patch verified the same with an IPv4
underlay network.

Example output:

 # ./ip6gre_custom_multipath_hash.sh
 TEST: ping                                                          [ OK ]
 TEST: ping6                                                         [ OK ]
 INFO: Running IPv4 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6602 / 6002
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 12601
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 6802 / 5801
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12602 / 3
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16431 / 16344
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32773
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16431 / 16344
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 32772
 INFO: Running IPv6 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6704 / 5902
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 12600
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 5751 / 6852
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12602 / 0
 TEST: Multipath hash field: Inner flowlabel (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 8272 / 8181
 TEST: Multipath hash field: Inner flowlabel (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 3 / 12602
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16424 / 16351
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 3 / 32774
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16425 / 16350
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 32773

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: Add test for custom multipath hash with IPv4 GRE
Ido Schimmel [Mon, 17 May 2021 18:15:25 +0000 (21:15 +0300)]
selftests: forwarding: Add test for custom multipath hash with IPv4 GRE

Test that when the hash policy is set to custom, traffic is distributed
only according to the inner fields set in the fib_multipath_hash_fields
sysctl.

Each time set a different field and make sure traffic is only
distributed when the field is changed in the packet stream.

The test only verifies the behavior of IPv4/IPv6 overlays on top of an
IPv4 underlay network. A subsequent patch will do the same with an IPv6
underlay network.

Example output:

 # ./gre_custom_multipath_hash.sh
 TEST: ping                                                          [ OK ]
 TEST: ping6                                                         [ OK ]
 INFO: Running IPv4 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6601 / 6001
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 12600
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 6802 / 5802
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12601 / 1
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16430 / 16344
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32772
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16430 / 16343
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32772
 INFO: Running IPv6 overlay custom multipath hash tests
 TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 6702 / 5900
 TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 12601
 TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
 INFO: Packets sent on path1 / path2: 5751 / 6851
 TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 12602 / 1
 TEST: Multipath hash field: Inner flowlabel (balanced)              [ OK ]
 INFO: Packets sent on path1 / path2: 8364 / 8065
 TEST: Multipath hash field: Inner flowlabel (unbalanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 12601 / 0
 TEST: Multipath hash field: Inner source port (balanced)            [ OK ]
 INFO: Packets sent on path1 / path2: 16425 / 16349
 TEST: Multipath hash field: Inner source port (unbalanced)          [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 32770
 TEST: Multipath hash field: Inner destination port (balanced)       [ OK ]
 INFO: Packets sent on path1 / path2: 16425 / 16349
 TEST: Multipath hash field: Inner destination port (unbalanced)     [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 32770

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: Add test for custom multipath hash
Ido Schimmel [Mon, 17 May 2021 18:15:24 +0000 (21:15 +0300)]
selftests: forwarding: Add test for custom multipath hash

Test that when the hash policy is set to custom, traffic is distributed
only according to the outer fields set in the fib_multipath_hash_fields
sysctl.

Each time set a different field and make sure traffic is only
distributed when the field is changed in the packet stream.

The test only verifies the behavior with non-encapsulated IPv4 and IPv6
packets. Subsequent patches will add tests for IPv4/IPv6 overlays on top
of IPv4/IPv6 underlay networks.

Example output:

 # ./custom_multipath_hash.sh
 TEST: ping                                                          [ OK ]
 TEST: ping6                                                         [ OK ]
 INFO: Running IPv4 custom multipath hash tests
 TEST: Multipath hash field: Source IP (balanced)                    [ OK ]
 INFO: Packets sent on path1 / path2: 6353 / 6254
 TEST: Multipath hash field: Source IP (unbalanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 12600
 TEST: Multipath hash field: Destination IP (balanced)               [ OK ]
 INFO: Packets sent on path1 / path2: 6102 / 6502
 TEST: Multipath hash field: Destination IP (unbalanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 1 / 12601
 TEST: Multipath hash field: Source port (balanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 16428 / 16345
 TEST: Multipath hash field: Source port (unbalanced)                [ OK ]
 INFO: Packets sent on path1 / path2: 32770 / 2
 TEST: Multipath hash field: Destination port (balanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 16428 / 16345
 TEST: Multipath hash field: Destination port (unbalanced)           [ OK ]
 INFO: Packets sent on path1 / path2: 32770 / 2
 INFO: Running IPv6 custom multipath hash tests
 TEST: Multipath hash field: Source IP (balanced)                    [ OK ]
 INFO: Packets sent on path1 / path2: 6704 / 5903
 TEST: Multipath hash field: Source IP (unbalanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 12600 / 0
 TEST: Multipath hash field: Destination IP (balanced)               [ OK ]
 INFO: Packets sent on path1 / path2: 5551 / 7052
 TEST: Multipath hash field: Destination IP (unbalanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 12603 / 0
 TEST: Multipath hash field: Flowlabel (balanced)                    [ OK ]
 INFO: Packets sent on path1 / path2: 8378 / 8080
 TEST: Multipath hash field: Flowlabel (unbalanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 2 / 12603
 TEST: Multipath hash field: Source port (balanced)                  [ OK ]
 INFO: Packets sent on path1 / path2: 16385 / 16388
 TEST: Multipath hash field: Source port (unbalanced)                [ OK ]
 INFO: Packets sent on path1 / path2: 0 / 32774
 TEST: Multipath hash field: Destination port (balanced)             [ OK ]
 INFO: Packets sent on path1 / path2: 16386 / 16390
 TEST: Multipath hash field: Destination port (unbalanced)           [ OK ]
 INFO: Packets sent on path1 / path2: 32771 / 2

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: Add custom multipath hash policy
Ido Schimmel [Mon, 17 May 2021 18:15:23 +0000 (21:15 +0300)]
ipv6: Add custom multipath hash policy

Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.

The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.

In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: Add a sysctl to control multipath hash fields
Ido Schimmel [Mon, 17 May 2021 18:15:22 +0000 (21:15 +0300)]
ipv6: Add a sysctl to control multipath hash fields

A subsequent patch will add a new multipath hash policy where the packet
fields used for multipath hash calculation are determined by user space.
This patch adds a sysctl that allows user space to set these fields.

The packet fields are represented using a bitmask and are common between
IPv4 and IPv6 to allow user space to use the same numbering across both
protocols. For example, to hash based on standard 5-tuple:

 # sysctl -w net.ipv6.fib_multipath_hash_fields=0x0037
 net.ipv6.fib_multipath_hash_fields = 0x0037

To avoid introducing holes in 'struct netns_sysctl_ipv6', move the
'bindv6only' field after the multipath hash fields.

The kernel rejects unknown fields, for example:

 # sysctl -w net.ipv6.fib_multipath_hash_fields=0x1000
 sysctl: setting key "net.ipv6.fib_multipath_hash_fields": Invalid argument

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: Calculate multipath hash inside switch statement
Ido Schimmel [Mon, 17 May 2021 18:15:21 +0000 (21:15 +0300)]
ipv6: Calculate multipath hash inside switch statement

A subsequent patch will add another multipath hash policy where the
multipath hash is calculated directly by the policy specific code and
not outside of the switch statement.

Prepare for this change by moving the multipath hash calculation inside
the switch statement.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: Use a more suitable label name
Ido Schimmel [Mon, 17 May 2021 18:15:20 +0000 (21:15 +0300)]
ipv6: Use a more suitable label name

The 'out_timer' label was added in commit 63152fc0de4d ("[NETNS][IPV6]
ip6_fib - gc timer per namespace") when the timer was allocated on the
heap.

Commit 417f28bb3407 ("netns: dont alloc ipv6 fib timer list") removed
the allocation, but kept the label name.

Rename it to a more suitable name.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv4: Add custom multipath hash policy
Ido Schimmel [Mon, 17 May 2021 18:15:19 +0000 (21:15 +0300)]
ipv4: Add custom multipath hash policy

Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.

The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.

In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv4: Add a sysctl to control multipath hash fields
Ido Schimmel [Mon, 17 May 2021 18:15:18 +0000 (21:15 +0300)]
ipv4: Add a sysctl to control multipath hash fields

A subsequent patch will add a new multipath hash policy where the packet
fields used for multipath hash calculation are determined by user space.
This patch adds a sysctl that allows user space to set these fields.

The packet fields are represented using a bitmask and are common between
IPv4 and IPv6 to allow user space to use the same numbering across both
protocols. For example, to hash based on standard 5-tuple:

 # sysctl -w net.ipv4.fib_multipath_hash_fields=0x0037
 net.ipv4.fib_multipath_hash_fields = 0x0037

The kernel rejects unknown fields, for example:

 # sysctl -w net.ipv4.fib_multipath_hash_fields=0x1000
 sysctl: setting key "net.ipv4.fib_multipath_hash_fields": Invalid argument

More fields can be added in the future, if needed.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv4: Calculate multipath hash inside switch statement
Ido Schimmel [Mon, 17 May 2021 18:15:17 +0000 (21:15 +0300)]
ipv4: Calculate multipath hash inside switch statement

A subsequent patch will add another multipath hash policy where the
multipath hash is calculated directly by the policy specific code and
not outside of the switch statement.

Prepare for this change by moving the multipath hash calculation inside
the switch statement.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobpf: Check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto
Dongseok Yi [Wed, 12 May 2021 07:27:33 +0000 (16:27 +0900)]
bpf: Check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto

In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
coalesced packet payload can be > MSS, but < MSS + 20.

bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload
length. After then tcp_gso_segment checks for the payload length if it
is <= MSS. The condition is causing the packet to be dropped.

tcp_gso_segment():
        [...]
        mss = skb_shinfo(skb)->gso_size;
        if (unlikely(skb->len <= mss))
                goto out;
        [...]

Allow to upgrade/downgrade MSS only when BPF_F_ADJ_ROOM_FIXED_GSO is
not set.

Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/1620804453-57566-1-git-send-email-dseok.yi@samsung.com
3 years agoskmsg: Remove unused parameters of sk_msg_wait_data()
Cong Wang [Mon, 17 May 2021 02:23:48 +0000 (19:23 -0700)]
skmsg: Remove unused parameters of sk_msg_wait_data()

'err' and 'flags' are not used, we can just get rid of them.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210517022348.50555-1-xiyou.wangcong@gmail.com
3 years agobpf, arm64: Remove redundant switch case about BPF_DIV and BPF_MOD
Tiezhu Yang [Tue, 18 May 2021 08:56:10 +0000 (16:56 +0800)]
bpf, arm64: Remove redundant switch case about BPF_DIV and BPF_MOD

After commit 96a71005bdcb ("bpf, arm64: remove obsolete exception handling
from div/mod"), there is no need to check twice about BPF_DIV and BPF_MOD,
remove the redundant switch case.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1621328170-17583-1-git-send-email-yangtiezhu@loongson.cn
3 years agodrivers/net: Remove leading spaces in Kconfig
Juerg Haefliger [Mon, 17 May 2021 09:58:33 +0000 (11:58 +0200)]
drivers/net: Remove leading spaces in Kconfig

Remove leading spaces before tabs in Kconfig file(s) by running the
following command:

  $ find drivers/net -name 'Kconfig*' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/packet: Remove redundant assignment to ret
Jiapeng Chong [Mon, 17 May 2021 10:15:25 +0000 (18:15 +0800)]
net/packet: Remove redundant assignment to ret

Variable ret is set to '0' or '-EBUSY', but this value is never read
as it is not used later on, hence it is a redundant assignment and
can be removed.

Clean up the following clang-analyzer warning:

net/packet/af_packet.c:3936:4: warning: Value stored to 'ret' is never
read [clang-analyzer-deadcode.DeadStores].

net/packet/af_packet.c:3933:4: warning: Value stored to 'ret' is never
read [clang-analyzer-deadcode.DeadStores].

No functional change.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'stmmac-xpcs-eee'
David S. Miller [Mon, 17 May 2021 22:53:59 +0000 (15:53 -0700)]
Merge branch 'stmmac-xpcs-eee'

Michael Sit Wei Hong says:

====================
Introducing support for DWC xpcs Energy Efficient Ethernet

The goal of this patch set is to enable EEE in the xpcs so that when
EEE is enabled, the MAC-->xpcs-->PHY have all the EEE related
configurations enabled.

Patch 1 adds the functions to enable EEE in the xpcs and sets it to
transparent mode.
Patch 2 adds the callbacks to configure the xpcs EEE mode.

The results are tested by checking the lpi counters of the tx and rx
path of the interface. When EEE is enabled, the lpi counters should
increament as it enters and exits lpi states.

host@EHL$ ethtool --show-eee enp0s30f4
EEE Settings for enp0s30f4:
        EEE status: disabled
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  Not reported
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full
host@EHL$ ethtool -S enp0s30f4 | grep lpi
     irq_tx_path_in_lpi_mode_n: 0
     irq_tx_path_exit_lpi_mode_n: 0
     irq_rx_path_in_lpi_mode_n: 0
     irq_rx_path_exit_lpi_mode_n: 0
host@EHL$ ethtool --set-eee enp0s30f4 eee on
host@EHL$ [  110.265154] intel-eth-pci 0000:00:1e.4 enp0s30f4: Link is Down
[  112.315155] intel-eth-pci 0000:00:1e.4 enp0s30f4: Link is Up - 1Gbps/Full - flow control off
[  112.324612] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s30f4: link becomes ready
host@EHL$ ethtool --show-eee enp0s30f4
EEE Settings for enp0s30f4:
        EEE status: enabled - active
        Tx LPI: 1000000 (us)
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full
host@EHL$ ethtool -S enp0s30f4 | grep lpi
     irq_tx_path_in_lpi_mode_n: 6
     irq_tx_path_exit_lpi_mode_n: 5
     irq_rx_path_in_lpi_mode_n: 7
     irq_rx_path_exit_lpi_mode_n: 6
host@EHL$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.02 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.510 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.489 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=0.484 ms
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=0.504 ms
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=0.466 ms
64 bytes from 192.168.1.1: icmp_seq=7 ttl=64 time=0.529 ms
64 bytes from 192.168.1.1: icmp_seq=8 ttl=64 time=0.519 ms
64 bytes from 192.168.1.1: icmp_seq=9 ttl=64 time=0.518 ms
64 bytes from 192.168.1.1: icmp_seq=10 ttl=64 time=0.501 ms

--- 192.168.1.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9216ms
rtt min/avg/max/mdev = 0.466/0.553/1.018/0.155 ms
host@EHL$ ethtool -S enp0s30f4 | grep lpi
     irq_tx_path_in_lpi_mode_n: 22
     irq_tx_path_exit_lpi_mode_n: 21
     irq_rx_path_in_lpi_mode_n: 21
     irq_rx_path_exit_lpi_mode_n: 20
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: Add callbacks for DWC xpcs Energy Efficient Ethernet
Michael Sit Wei Hong [Mon, 17 May 2021 09:43:32 +0000 (17:43 +0800)]
net: stmmac: Add callbacks for DWC xpcs Energy Efficient Ethernet

Link xpcs callback functions for MAC to configure the xpcs EEE feature.

The clk_eee frequency is used to calculate the MULT_FACT_100NS. This is
to adjust the clock tic closer to 100ns.

Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet
Michael Sit Wei Hong [Mon, 17 May 2021 09:43:31 +0000 (17:43 +0800)]
net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet

Add DWC xpcs EEE support callbacks.The callback function is used to
set EEE registers on xpcs.

xpcs transparent mode is enabled to allow PHY to detect MAC EEE status.

Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoalx: fix a double unlock in alx_probe()
Dan Carpenter [Mon, 17 May 2021 08:57:56 +0000 (11:57 +0300)]
alx: fix a double unlock in alx_probe()

We're not holding the lock at this point so "goto unlock;" should be
"goto unmap;"

Fixes: 4a5fe57e7751 ("alx: use fine-grained locking instead of RTNL")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wwan: Add WWAN port type attribute
Loic Poulain [Mon, 17 May 2021 09:53:34 +0000 (11:53 +0200)]
net: wwan: Add WWAN port type attribute

The port type is by default part of the WWAN port device name.
However device name can not be considered as a 'stable' API and
may be subject to change in the future. This change adds a proper
device attribute that can be used to determine the WWAN protocol/
type.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'stmmac-RK3568'
David S. Miller [Mon, 17 May 2021 22:34:43 +0000 (15:34 -0700)]
Merge branch 'stmmac-RK3568'

Ezequiel Garcia says:

====================
stmmmac: RK3568

Here's the third version of this patchset, taking
the feedback from Heiko and Chen-Yu Tsai.

Although this solution is a tad ugly as it hardcodes
the register addresses, we believe it's the most robust approach.

See:

https://lore.kernel.org/netdev/CAGb2v67ZBR=XDFPeXQc429HNu_dbY__-KN50tvBW44fXMs78_w@mail.gmail.com/

This is tested on RK3566 EVB2 and seems to work well.
Once the RK3568 devicetree lands upstream, we'll post
patches to add network support for RK3566 and RK3568.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: Add RK3566/RK3568 SoC support
David Wu [Mon, 17 May 2021 15:40:37 +0000 (12:40 -0300)]
net: stmmac: Add RK3566/RK3568 SoC support

Add constants and callback functions for the dwmac present
on RK3566/RK3568 SoCs.

RK3568 has two MACs, and RK3566 just one, but it's otherwise
the same IP core.

Signed-off-by: David Wu <david.wu@rock-chips.com>
[Ezequiel: Separate rk3566-gmac support]
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: rockchip-dwmac: add rk3568 compatible string
Ezequiel Garcia [Mon, 17 May 2021 15:40:36 +0000 (12:40 -0300)]
dt-bindings: net: rockchip-dwmac: add rk3568 compatible string

Add compatible string for RK3568 gmac, and constrain it to
be compatible with Synopsys dwmac 4.20a.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>