platform/kernel/linux-starfive.git
2 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Jakub Kicinski [Fri, 22 Jul 2022 23:55:43 +0000 (16:55 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 22 Jul 2022 21:53:33 +0000 (14:53 -0700)]
Merge branch 'io_uring-zerocopy-send' of git://git./linux/kernel/git/kuba/linux

Pull in Pavel's patch from a shared branch.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: fix uninitialised msghdr->sg_from_iter
Pavel Begunkov [Thu, 21 Jul 2022 14:25:46 +0000 (15:25 +0100)]
net: fix uninitialised msghdr->sg_from_iter

Because of how struct msghdr is usually initialised some fields and
sg_from_iter in particular might be left out not initialised, so we
can't safely use it in __zerocopy_sg_from_iter().

For now use the callback only when there is ->msg_ubuf set relying on
the fact that they're used together and we properly zero ->msg_ubuf.

Fixes: ebe73a284f4de8 ("net: Allow custom iter handler in msghdr")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Message-Id: <ce8b68b41351488f79fd998b032b3c56e9b1cc6c.1658401817.git.asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomlxsw: core: Fix use-after-free calling devl_unlock() in mlxsw_core_bus_device_unregi...
Jiri Pirko [Thu, 21 Jul 2022 14:24:24 +0000 (16:24 +0200)]
mlxsw: core: Fix use-after-free calling devl_unlock() in mlxsw_core_bus_device_unregister()

Do devl_unlock() before freeing the devlink in
mlxsw_core_bus_device_unregister() function.

Reported-by: Ido Schimmel <idosch@nvidia.com>
Fixes: 72a4c8c94efa ("mlxsw: convert driver to use unlocked devlink API during init/fini")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220721142424.3975704-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobpf: Simplify bpf_prog_pack_[size|mask]
Song Liu [Wed, 13 Jul 2022 20:49:50 +0000 (13:49 -0700)]
bpf: Simplify bpf_prog_pack_[size|mask]

Simplify the logic that selects bpf_prog_pack_size, and always use
(PMD_SIZE * num_possible_nodes()). This is a good tradeoff, as most of
the performance benefit observed is from less direct map fragmentation [0].

Also, module_alloc(4MB) may not allocate 4MB aligned memory. Therefore,
we cannot use (ptr & bpf_prog_pack_mask) to find the correct address of
bpf_prog_pack. Fix this by checking the header address falls in the range
of pack->ptr and (pack->ptr + bpf_prog_pack_size).

  [0] https://lore.kernel.org/bpf/20220707223546.4124919-1-song@kernel.org/

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220713204950.3015201-1-song@kernel.org
2 years agobpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
Song Liu [Wed, 20 Jul 2022 00:21:26 +0000 (17:21 -0700)]
bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)

When tracing a function with IPMODIFY ftrace_ops (livepatch), the bpf
trampoline must follow the instruction pointer saved on stack. This needs
extra handling for bpf trampolines with BPF_TRAMP_F_CALL_ORIG flag.

Implement bpf_tramp_ftrace_ops_func and use it for the ftrace_ops used
by BPF trampoline. This enables tracing functions with livepatch.

This also requires moving bpf trampoline to *_ftrace_direct_mult APIs.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-5-song@kernel.org
2 years agobpf, x64: Allow to use caller address from stack
Jiri Olsa [Wed, 20 Jul 2022 00:21:25 +0000 (17:21 -0700)]
bpf, x64: Allow to use caller address from stack

Currently we call the original function by using the absolute address
given at the JIT generation. That's not usable when having trampoline
attached to multiple functions, or the target address changes dynamically
(in case of live patch). In such cases we need to take the return address
from the stack.

Adding support to retrieve the original function address from the stack
by adding new BPF_TRAMP_F_ORIG_STACK flag for arch_prepare_bpf_trampoline
function.

Basically we take the return address of the 'fentry' call:

   function + 0: call fentry    # stores 'function + 5' address on stack
   function + 5: ...

The 'function + 5' address will be used as the address for the
original function to call.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220720002126.803253-4-song@kernel.org
2 years agoftrace: Allow IPMODIFY and DIRECT ops on the same function
Song Liu [Wed, 20 Jul 2022 00:21:24 +0000 (17:21 -0700)]
ftrace: Allow IPMODIFY and DIRECT ops on the same function

IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them work on the same function
at the same time.

First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().

Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.

If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.

If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
so error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that onwer of the DIRECT trampoline can
handle it properly.

For more details, please refer to comment before enum ftrace_ops_cmd.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2 years agoftrace: Add modify_ftrace_direct_multi_nolock
Song Liu [Wed, 20 Jul 2022 00:21:23 +0000 (17:21 -0700)]
ftrace: Add modify_ftrace_direct_multi_nolock

This is similar to modify_ftrace_direct_multi, but does not acquire
direct_mutex. This is useful when direct_mutex is already locked by the
user.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20220720002126.803253-2-song@kernel.org
2 years agobpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
Jie2x Zhou [Tue, 19 Jul 2022 08:24:30 +0000 (16:24 +0800)]
bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test

Before change:

  selftests: bpf: test_xdp_veth.sh
  Couldn't retrieve pinned program '/sys/fs/bpf/test_xdp_veth/progs/redirect_map_0': No such file or directory
  selftests: xdp_veth [SKIP]
  ok 20 selftests: bpf: test_xdp_veth.sh # SKIP

After change:

  PING 10.1.1.33 (10.1.1.33) 56(84) bytes of data.
  64 bytes from 10.1.1.33: icmp_seq=1 ttl=64 time=0.320 ms
  --- 10.1.1.33 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.320/0.320/0.320/0.000 ms
  selftests: xdp_veth [PASS]

For the test case, the following can be found:

  ls /sys/fs/bpf/test_xdp_veth/progs/redirect_map_0
  ls: cannot access '/sys/fs/bpf/test_xdp_veth/progs/redirect_map_0': No such file or directory
  ls /sys/fs/bpf/test_xdp_veth/progs/
  xdp_redirect_map_0  xdp_redirect_map_1  xdp_redirect_map_2

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220719082430.9916-1-jie2x.zhou@intel.com
2 years agobpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
Kumar Kartikeya Dwivedi [Fri, 22 Jul 2022 11:36:05 +0000 (13:36 +0200)]
bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF

BTF_ID_FLAGS macro needs to be able to take 0 or 1 args, so make it a
variable argument. BTF_SET8_END is incorrect, it should just be empty.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: ab21d6063c01 ("bpf: Introduce 8-byte BTF set")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220722113605.6513-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agonet: add missing includes and forward declarations under net/
Jakub Kicinski [Wed, 20 Jul 2022 23:57:58 +0000 (16:57 -0700)]
net: add missing includes and forward declarations under net/

This patch adds missing includes to headers under include/net.
All these problems are currently masked by the existing users
including the missing dependency before the broken header.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'sfc-E100-VF-respresenters'
David S. Miller [Fri, 22 Jul 2022 11:50:07 +0000 (12:50 +0100)]
Merge branch 'sfc-E100-VF-respresenters'

Edward Cree says:

====================
sfc: VF representors for EF100

This series adds representor netdevices for EF100 VFs, as a step towards
 supporting TC offload and vDPA usecases in future patches.
In this first series is basic netdevice creation and packet TX; the
 following series will add the RX path.

v3: dropped massive mcdi_pcol.h patch which was applied separately.
v2: converted comments on struct efx_nic members added in patch #4 to
 kernel-doc (Jakub).  While at it, also gave struct efx_rep its own kdoc
 since several members had comments on them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: attach/detach EF100 representors along with their owning PF
Edward Cree [Wed, 20 Jul 2022 18:33:49 +0000 (19:33 +0100)]
sfc: attach/detach EF100 representors along with their owning PF

Since representors piggy-back on the PF's queues for TX, they can
 only accept new TXes while the PF is up.  Thus, any operation which
 detaches the PF must first detach all its VFreps.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: hook up ef100 representor TX
Edward Cree [Wed, 20 Jul 2022 18:33:48 +0000 (19:33 +0100)]
sfc: hook up ef100 representor TX

Implement .ndo_start_xmit() by calling into the parent PF's TX path.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: support passing a representor to the EF100 TX path
Edward Cree [Wed, 20 Jul 2022 18:33:47 +0000 (19:33 +0100)]
sfc: support passing a representor to the EF100 TX path

A non-null efv in __ef100_enqueue_skb() indicates that the packet is
 from that representor, should be transmitted with a suitable option
 descriptor (to instruct the switch to deliver it to the representee),
 and should not be accounted to the parent PF's stats or BQL.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: determine representee m-port for EF100 representors
Edward Cree [Wed, 20 Jul 2022 18:29:34 +0000 (19:29 +0100)]
sfc: determine representee m-port for EF100 representors

An MAE port, or m-port, is a port (source/destination for traffic) on
 the Match-Action Engine (the internal switch on EF100).
Representors will use their representee's m-port for two purposes: as
 a destination override on TX from the representor, and as a source
 match in 'default rules' to steer representee traffic (when not
 matched by e.g. a TC flower rule) to representor RX via the parent
 PF's receive queue.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: phys port/switch identification for ef100 reps
Edward Cree [Wed, 20 Jul 2022 18:29:33 +0000 (19:29 +0100)]
sfc: phys port/switch identification for ef100 reps

Requires storing VF index in struct efx_rep.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add basic ethtool ops to ef100 reps
Edward Cree [Wed, 20 Jul 2022 18:29:30 +0000 (19:29 +0100)]
sfc: add basic ethtool ops to ef100 reps

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: add skeleton ef100 VF representors
Edward Cree [Wed, 20 Jul 2022 18:29:28 +0000 (19:29 +0100)]
sfc: add skeleton ef100 VF representors

No net_device_ops yet, just a placeholder netdev created per VF.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: detect ef100 MAE admin privilege/capability at probe time
Edward Cree [Wed, 20 Jul 2022 18:29:26 +0000 (19:29 +0100)]
sfc: detect ef100 MAE admin privilege/capability at probe time

One PCIe function per network port (more precisely, per m-port group) is
 responsible for configuring the Match-Action Engine which performs
 switching and packet modification in the slice to support flower/OVS
 offload.  The GRP_MAE bit in the privilege mask indicates whether a
 given function has this capability.
At probe time, call MCDIs to read the calling function's privilege mask,
 and store the GRP_MAE bit in a new ef100_nic_data member.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agosfc: update EF100 register descriptions
Edward Cree [Wed, 20 Jul 2022 18:29:24 +0000 (19:29 +0100)]
sfc: update EF100 register descriptions

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoping: support ipv6 ping socket flow labels
Alan Brady [Wed, 20 Jul 2022 18:13:10 +0000 (11:13 -0700)]
ping: support ipv6 ping socket flow labels

Ping sockets don't appear to make any attempt to preserve flow labels
created and set by userspace using IPV6_FLOWINFO_SEND. Instead they are
clobbered by autolabels (if enabled) or zero.

Grab the flowlabel out of the msghdr similar to how rawv6_sendmsg does
it and move the memset up so it doesn't get zeroed after.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: marvell: prestera: use netif_is_any_bridge_port instead of open code
Juhee Kang [Thu, 21 Jul 2022 10:26:48 +0000 (19:26 +0900)]
net: marvell: prestera: use netif_is_any_bridge_port instead of open code

The open code which is netif_is_bridge_port() || netif_is_ovs_port() is
defined as a new helper function on netdev.h like netif_is_any_bridge_port
that can check both IFF flags in 1 go. So use netif_is_any_bridge_port()
function instead of open code. This patch doesn't change logic.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: use netif_is_any_bridge_port() instead of open code
Juhee Kang [Thu, 21 Jul 2022 10:26:47 +0000 (19:26 +0900)]
mlxsw: use netif_is_any_bridge_port() instead of open code

The open code which is netif_is_bridge_port() || netif_is_ovs_port() is
defined as a new helper function on netdev.h like netif_is_any_bridge_port
that can check both IFF flags in 1 go. So use netif_is_any_bridge_port()
function instead of open code. This patch doesn't change logic.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'New nf_conntrack kfuncs for insertion, changing timeout, status'
Alexei Starovoitov [Fri, 22 Jul 2022 03:59:43 +0000 (20:59 -0700)]
Merge branch 'New nf_conntrack kfuncs for insertion, changing timeout, status'

Kumar Kartikeya Dwivedi says:

====================

Introduce the following new kfuncs:
 - bpf_{xdp,skb}_ct_alloc
 - bpf_ct_insert_entry
 - bpf_ct_{set,change}_timeout
 - bpf_ct_{set,change}_status

The setting of timeout and status on allocated or inserted/looked up CT
is same as the ctnetlink interface, hence code is refactored and shared
with the kfuncs. It is ensured allocated CT cannot be passed to kfuncs
that expected inserted CT, and vice versa. Please see individual patches
for details.

Changelog:
----------
v6 -> v7:
v6: https://lore.kernel.org/bpf/20220719132430.19993-1-memxor@gmail.com

 * Use .long to encode flags (Alexei)
 * Fix description of KF_RET_NULL in documentation (Toke)

v5 -> v6:
v5: https://lore.kernel.org/bpf/20220623192637.3866852-1-memxor@gmail.com

 * Introduce kfunc flags, rework verifier to work with them
 * Add documentation for kfuncs
 * Add comment explaining TRUSTED_ARGS kfunc flag (Alexei)
 * Fix missing offset check for trusted arguments (Alexei)
 * Change nf_conntrack test minimum delta value to 8

v4 -> v5:
v4: https://lore.kernel.org/bpf/cover.1653600577.git.lorenzo@kernel.org

 * Drop read-only PTR_TO_BTF_ID approach, use struct nf_conn___init (Alexei)
 * Drop acquire release pair code that is no longer required (Alexei)
 * Disable writes into nf_conn, use dedicated helpers (Florian, Alexei)
 * Refactor and share ctnetlink code for setting timeout and status
 * Do strict type matching on finding __ref suffix on argument to
   prevent passing nf_conn___init as nf_conn (offset = 0, match on walk)
 * Remove bpf_ct_opts parameter from bpf_ct_insert_entry
 * Update selftests for new additions, add more negative tests

v3 -> v4:
v3: https://lore.kernel.org/bpf/cover.1652870182.git.lorenzo@kernel.org

 * split bpf_xdp_ct_add in bpf_xdp_ct_alloc/bpf_skb_ct_alloc and
   bpf_ct_insert_entry
 * add verifier code to properly populate/configure ct entry
 * improve selftests

v2 -> v3:
v2: https://lore.kernel.org/bpf/cover.1652372970.git.lorenzo@kernel.org

 * add bpf_xdp_ct_add and bpf_ct_refresh_timeout kfunc helpers
 * remove conntrack dependency from selftests
 * add support for forcing kfunc args to be referenced and related selftests

v1 -> v2:
v1: https://lore.kernel.org/bpf/1327f8f5696ff2bc60400e8f3b79047914ccc837.1651595019.git.lorenzo@kernel.org

 * add bpf_ct_refresh_timeout kfunc selftest

Kumar Kartikeya Dwivedi (10):
  bpf: Introduce 8-byte BTF set
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Switch to new kfunc flags infrastructure
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Add documentation for kfuncs
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  net: netfilter: Add kfuncs to set and change CT timeout
  selftests/bpf: Add verifier tests for trusted kfunc args
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/bpf: Fix test_verifier failed test in unprivileged mode
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:45 +0000 (15:42 +0200)]
selftests/bpf: Fix test_verifier failed test in unprivileged mode

Loading the BTF won't be permitted without privileges, hence only test
for privileged mode by setting the prog type. This makes the
test_verifier show 0 failures when unprivileged BPF is enabled.

Fixes: 41188e9e9def ("selftest/bpf: Test for use-after-free bug fix in inline_bpf_loop")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-14-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/bpf: Add negative tests for new nf_conntrack kfuncs
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:44 +0000 (15:42 +0200)]
selftests/bpf: Add negative tests for new nf_conntrack kfuncs

Test cases we care about and ensure improper usage is caught and
rejected by the verifier.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-13-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/bpf: Add tests for new nf_conntrack kfuncs
Lorenzo Bianconi [Thu, 21 Jul 2022 13:42:43 +0000 (15:42 +0200)]
selftests/bpf: Add tests for new nf_conntrack kfuncs

Introduce selftests for the following kfunc helpers:
- bpf_xdp_ct_alloc
- bpf_skb_ct_alloc
- bpf_ct_insert_entry
- bpf_ct_set_timeout
- bpf_ct_change_timeout
- bpf_ct_set_status
- bpf_ct_change_status

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-12-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agoselftests/bpf: Add verifier tests for trusted kfunc args
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:42 +0000 (15:42 +0200)]
selftests/bpf: Add verifier tests for trusted kfunc args

Make sure verifier rejects the bad cases and ensure the good case keeps
working. The selftests make use of the bpf_kfunc_call_test_ref kfunc
added in the previous patch only for verification.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agonet: netfilter: Add kfuncs to set and change CT status
Lorenzo Bianconi [Thu, 21 Jul 2022 13:42:41 +0000 (15:42 +0200)]
net: netfilter: Add kfuncs to set and change CT status

Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in
order to set nf_conn field of allocated entry or update nf_conn status
field of existing inserted entry. Use nf_ct_change_status_common to
share the permitted status field changes between netlink and BPF side
by refactoring ctnetlink_change_status.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agonet: netfilter: Add kfuncs to set and change CT timeout
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:40 +0000 (15:42 +0200)]
net: netfilter: Add kfuncs to set and change CT timeout

Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in
order to change nf_conn timeout. This is same as ctnetlink_change_timeout,
hence code is shared between both by extracting it out to
__nf_ct_change_timeout. It is also updated to return an error when it
sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Apart from this, bpf_ct_set_timeout is only called for newly allocated
CT so it doesn't need to inspect the status field just yet. Sharing the
helpers even if it was possible would make timeout setting helper
sensitive to order of setting status and timeout after allocation.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.

Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agonet: netfilter: Add kfuncs to allocate and insert CT
Lorenzo Bianconi [Thu, 21 Jul 2022 13:42:39 +0000 (15:42 +0200)]
net: netfilter: Add kfuncs to allocate and insert CT

Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry
kfuncs in order to insert a new entry from XDP and TC programs.
Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common
code.

We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and
nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that
nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK.
Later this helper will be reused as a helper to set timeout of allocated
but not yet inserted CT entry.

The allocation functions return struct nf_conn___init instead of
nf_conn, to distinguish allocated CT from an already inserted or looked
up CT. This is later used to enforce restrictions on what kfuncs
allocated CT can be used with.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agonet: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:38 +0000 (15:42 +0200)]
net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup

Move common checks inside the common function, and maintain the only
difference the two being how to obtain the struct net * from ctx.
No functional change intended.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: Add documentation for kfuncs
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:37 +0000 (15:42 +0200)]
bpf: Add documentation for kfuncs

As the usage of kfuncs grows, we are starting to form consensus on the
kinds of attributes and annotations that kfuncs can have. To better help
developers make sense of the various options available at their disposal
to present an unstable API to the BPF users, document the various kfunc
flags and annotations, their expected usage, and explain the process of
defining and registering a kfunc set.

Cc: KP Singh <kpsingh@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: Add support for forcing kfunc args to be trusted
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:36 +0000 (15:42 +0200)]
bpf: Add support for forcing kfunc args to be trusted

Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which
means each pointer argument must be trusted, which we define as a
pointer that is referenced (has non-zero ref_obj_id) and also needs to
have its offset unchanged, similar to how release functions expect their
argument. This allows a kfunc to receive pointer arguments unchanged
from the result of the acquire kfunc.

This is required to ensure that kfunc that operate on some object only
work on acquired pointers and not normal PTR_TO_BTF_ID with same type
which can be obtained by pointer walking. The restrictions applied to
release arguments also apply to trusted arguments. This implies that
strict type matching (not deducing type by recursively following members
at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are
used for trusted pointer arguments.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: Switch to new kfunc flags infrastructure
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:35 +0000 (15:42 +0200)]
bpf: Switch to new kfunc flags infrastructure

Instead of populating multiple sets to indicate some attribute and then
researching the same BTF ID in them, prepare a single unified BTF set
which indicates whether a kfunc is allowed to be called, and also its
attributes if any at the same time. Now, only one call is needed to
perform the lookup for both kfunc availability and its attributes.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agotools/resolve_btfids: Add support for 8-byte BTF sets
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:34 +0000 (15:42 +0200)]
tools/resolve_btfids: Add support for 8-byte BTF sets

A flag is a 4-byte symbol that may follow a BTF ID in a set8. This is
used in the kernel to tag kfuncs in BTF sets with certain flags. Add
support to adjust the sorting code so that it passes size as 8 bytes
for 8-byte BTF sets.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220721134245.2450-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agobpf: Introduce 8-byte BTF set
Kumar Kartikeya Dwivedi [Thu, 21 Jul 2022 13:42:33 +0000 (15:42 +0200)]
bpf: Introduce 8-byte BTF set

Introduce support for defining flags for kfuncs using a new set of
macros, BTF_SET8_START/BTF_SET8_END, which define a set which contains
8 byte elements (each of which consists of a pair of BTF ID and flags),
using a new BTF_ID_FLAGS macro.

This will be used to tag kfuncs registered for a certain program type
as acquire, release, sleepable, ret_null, etc. without having to create
more and more sets which was proving to be an unscalable solution.

Now, when looking up whether a kfunc is allowed for a certain program,
we can also obtain its kfunc flags in the same call and avoid further
lookups.

The resolve_btfids change is split into a separate patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 years agonet: ipv6: avoid accepting values greater than 2 for accept_untracked_na
Jaehee Park [Wed, 20 Jul 2022 18:36:32 +0000 (14:36 -0400)]
net: ipv6: avoid accepting values greater than 2 for accept_untracked_na

The accept_untracked_na sysctl changed from a boolean to an integer
when a new knob '2' was added. This patch provides a safeguard to avoid
accepting values that are not defined in the sysctl. When setting a
value greater than 2, the user will get an 'invalid argument' warning.

Fixes: aaa5f515b16b ("net: ipv6: new accept_untracked_na option to accept na only if in-network")
Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Suggested-by: Roopa Prabhu <roopa@nvidia.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220720183632.376138-1-jhpark1013@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: tls: add a test for timeo vs lock
Jakub Kicinski [Wed, 20 Jul 2022 20:37:01 +0000 (13:37 -0700)]
selftests: tls: add a test for timeo vs lock

Add a test for recv timeout. Place it in the tls_err
group, so it only runs for TLS 1.2 and 1.3 but not
for every AEAD out there.

Link: https://lore.kernel.org/r/20220720203701.2179034-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotls: rx: release the sock lock on locking timeout
Jakub Kicinski [Wed, 20 Jul 2022 20:37:00 +0000 (13:37 -0700)]
tls: rx: release the sock lock on locking timeout

Eric reports we should release the socket lock if the entire
"grab reader lock" operation has failed. The callers assume
they don't have to release it or otherwise unwind.

Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+16e72110feb2b653ef27@syzkaller.appspotmail.com
Fixes: 4cbc325ed6b4 ("tls: rx: allow only one reader at a time")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220720203701.2179034-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'linux-can-next-for-5.20-20220721' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Fri, 22 Jul 2022 01:45:34 +0000 (18:45 -0700)]
Merge tag 'linux-can-next-for-5.20-20220721' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
can-next 2022-07-21

The patch is by Vincent Mailhol and fixes a use on an uninitialized
variable in the pch_can driver (introduced in last pull request to
net-next).

* tag 'linux-can-next-for-5.20-20220721' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: pch_can: pch_can_error(): initialize errc before using it
====================

Link: https://lore.kernel.org/r/20220721163042.3448384-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: fix build
Paolo Abeni [Thu, 21 Jul 2022 13:55:14 +0000 (15:55 +0200)]
net: ipa: fix build

After commit 2c7b9b936bdc ("net: ipa: move configuration data files
into a subdirectory"), build of the ipa driver fails with the
following error:

drivers/net/ipa/data/ipa_data-v3.1.c:9:10: fatal error: gsi.h: No such file or directory

After the mentioned commit, all the file included by the configuration
are in the parent directory. Fix the issue updating the include path.

Fixes: 2c7b9b936bdc ("net: ipa: move configuration data files into a subdirectory")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/7105112c38cfe0642a2d9e1779bf784a7aa63d16.1658411666.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobpf, docs: Use SPDX license identifier in bpf_doc.py
Alejandro Colomar [Thu, 21 Jul 2022 11:08:22 +0000 (13:08 +0200)]
bpf, docs: Use SPDX license identifier in bpf_doc.py

The Linux man-pages project now uses SPDX tags, instead of the full
license text.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://www.kernel.org/doc/man-pages/licenses.html
Link: https://spdx.org/licenses/Linux-man-pages-copyleft.html
Link: https://lore.kernel.org/bpf/20220721110821.8240-1-alx.manpages@gmail.com
2 years agobpf, arm64: Fix compile error in dummy_tramp()
Xu Kuohai [Thu, 21 Jul 2022 12:13:19 +0000 (08:13 -0400)]
bpf, arm64: Fix compile error in dummy_tramp()

dummy_tramp() uses "lr" to refer to the x30 register, but some assembler
does not recognize "lr" and reports a build failure:

/tmp/cc52xO0c.s: Assembler messages:
/tmp/cc52xO0c.s:8: Error: operand 1 should be an integer register -- `mov lr,x9'
/tmp/cc52xO0c.s:7: Error: undefined symbol lr used as an immediate value
make[2]: *** [scripts/Makefile.build:250: arch/arm64/net/bpf_jit_comp.o] Error 1
make[1]: *** [scripts/Makefile.build:525: arch/arm64/net] Error 2

So replace "lr" with "x30" to fix it.

Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/20220721121319.2999259-1-xukuohai@huaweicloud.com
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 21 Jul 2022 20:03:39 +0000 (13:03 -0700)]
Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 21 Jul 2022 18:08:35 +0000 (11:08 -0700)]
Merge tag 'net-5.19-rc8' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can.

  Still no major regressions, most of the changes are still due to data
  races fixes, plus the usual bunch of drivers fixes.

  Previous releases - regressions:

   - tcp/udp: make early_demux back namespacified.

   - dsa: fix issues with vlan_filtering_is_global

  Previous releases - always broken:

   - ip: fix data-races around ipv4_net_table (round 2, 3 & 4)

   - amt: fix validation and synchronization bugs

   - can: fix detection of mcp251863

   - eth: iavf: fix handling of dummy receive descriptors

   - eth: lan966x: fix issues with MAC table

   - eth: stmmac: dwmac-mediatek: fix clock issue

  Misc:

   - dsa: update documentation"

* tag 'net-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
  mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication
  net/sched: cls_api: Fix flow action initialization
  tcp: Fix data-races around sysctl_tcp_max_reordering.
  tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.
  tcp: Fix a data-race around sysctl_tcp_rfc1337.
  tcp: Fix a data-race around sysctl_tcp_stdurg.
  tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
  tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
  tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
  tcp: Fix data-races around sysctl_tcp_recovery.
  tcp: Fix a data-race around sysctl_tcp_early_retrans.
  tcp: Fix data-races around sysctl knobs related to SYN option.
  udp: Fix a data-race around sysctl_udp_l3mdev_accept.
  ip: Fix data-races around sysctl_ip_prot_sock.
  ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
  ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
  ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
  can: rcar_canfd: Add missing of_node_put() in rcar_canfd_probe()
  can: mcp251xfd: fix detection of mcp251863
  Documentation: fix udp_wmem_min in ip-sysctl.rst
  ...

2 years agommu_gather: Force tlb-flush VM_PFNMAP vmas
Peter Zijlstra [Fri, 8 Jul 2022 07:18:06 +0000 (09:18 +0200)]
mmu_gather: Force tlb-flush VM_PFNMAP vmas

Jann reported a race between munmap() and unmap_mapping_range(), where
unmap_mapping_range() will no-op once unmap_vmas() has unlinked the
VMA; however munmap() will not yet have invalidated the TLBs.

Therefore unmap_mapping_range() will complete while there are still
(stale) TLB entries for the specified range.

Mitigate this by force flushing TLBs for VM_PFNMAP ranges.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agommu_gather: Let there be one tlb_{start,end}_vma() implementation
Peter Zijlstra [Fri, 8 Jul 2022 07:18:05 +0000 (09:18 +0200)]
mmu_gather: Let there be one tlb_{start,end}_vma() implementation

Now that architectures are no longer allowed to override
tlb_{start,end}_vma() re-arrange code so that there is only one
implementation for each of these functions.

This much simplifies trying to figure out what they actually do.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agocsky/tlb: Remove tlb_flush() define
Peter Zijlstra [Fri, 8 Jul 2022 07:18:04 +0000 (09:18 +0200)]
csky/tlb: Remove tlb_flush() define

The previous patch removed the tlb_flush_end() implementation which
used tlb_flush_range(). This means:

 - csky did double invalidates, a range invalidate per vma and a full
   invalidate at the end

 - csky actually has range invalidates and as such the generic
   tlb_flush implementation is more efficient for it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agommu_gather: Remove per arch tlb_{start,end}_vma()
Peter Zijlstra [Fri, 8 Jul 2022 07:18:03 +0000 (09:18 +0200)]
mmu_gather: Remove per arch tlb_{start,end}_vma()

Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.

 - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
   but does *NOT* want to call it for each VMA.

 - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
   invalidate across multiple VMAs if possible.

With these it is possible to capture the three forms:

  1) empty stubs;
     select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS

  2) start: flush_cache_range(), end: empty;
     select MMU_GATHER_MERGE_VMAS

  3) start: flush_cache_range(), end: flush_tlb_range();
     default

Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoscripts/gdb: Fix gdb 'lx-symbols' command
Khalid Masum [Thu, 21 Jul 2022 09:30:42 +0000 (15:30 +0600)]
scripts/gdb: Fix gdb 'lx-symbols' command

Currently the command 'lx-symbols' in gdb exits with the error`Function
"do_init_module" not defined in "kernel/module.c"`. This occurs because
the file kernel/module.c was moved to kernel/module/main.c.

Fix this breakage by changing the path to "kernel/module/main.c" in
LoadModuleBreakpoint.

Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Fixes: cfc1d277891e ("module: Move all into module/")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agowatch-queue: remove spurious double semicolon
Linus Torvalds [Thu, 21 Jul 2022 17:30:14 +0000 (10:30 -0700)]
watch-queue: remove spurious double semicolon

Sedat Dilek noticed that I had an extraneous semicolon at the end of a
line in the previous patch.

It's harmless, but unintentional, and while compilers just treat it as
an extra empty statement, for all I know some other tooling might warn
about it. So clean it up before other people notice too ;)

Fixes: 353f7988dd84 ("watchqueue: make sure to serialize 'wqueue->defunct' properly")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
2 years agocan: pch_can: pch_can_error(): initialize errc before using it
Vincent Mailhol [Thu, 21 Jul 2022 16:00:32 +0000 (01:00 +0900)]
can: pch_can: pch_can_error(): initialize errc before using it

After commit 3a5c7e4611dd, the variable errc is accessed before being
initialized, c.f. below W=2 warning:

| In function 'pch_can_error',
|     inlined from 'pch_can_poll' at drivers/net/can/pch_can.c:739:4:
| drivers/net/can/pch_can.c:501:29: warning: 'errc' may be used uninitialized [-Wmaybe-uninitialized]
|   501 |                 cf->data[6] = errc & PCH_TEC;
|       |                             ^
| drivers/net/can/pch_can.c: In function 'pch_can_poll':
| drivers/net/can/pch_can.c:484:13: note: 'errc' was declared here
|   484 |         u32 errc, lec;
|       |             ^~~~

Moving errc initialization up solves this issue.

Fixes: 3a5c7e4611dd ("can: pch_can: do not report txerr and rxerr during bus-off")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/all/20220721160032.9348-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agobpf: Check attach_func_proto more carefully in check_helper_call
Stanislav Fomichev [Wed, 20 Jul 2022 16:47:29 +0000 (09:47 -0700)]
bpf: Check attach_func_proto more carefully in check_helper_call

Syzkaller found a problem similar to d1a6edecc1fd ("bpf: Check
attach_func_proto more carefully in check_return_code") where
attach_func_proto might be NULL:

RIP: 0010:check_helper_call+0x3dcb/0x8d50 kernel/bpf/verifier.c:7330
 do_check kernel/bpf/verifier.c:12302 [inline]
 do_check_common+0x6e1e/0xb980 kernel/bpf/verifier.c:14610
 do_check_main kernel/bpf/verifier.c:14673 [inline]
 bpf_check+0x661e/0xc520 kernel/bpf/verifier.c:15243
 bpf_prog_load+0x11ae/0x1f80 kernel/bpf/syscall.c:2620

With the following reproducer:

  bpf$BPF_PROG_RAW_TRACEPOINT_LOAD(0x5, &(0x7f0000000780)={0xf, 0x4, &(0x7f0000000040)=@framed={{}, [@call={0x85, 0x0, 0x0, 0xbb}]}, &(0x7f0000000000)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80)

Let's do the same here, only check attach_func_proto for the prog types
where we are certain that attach_func_proto is defined.

Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
Reported-by: syzbot+0f8d989b1fba1addc5e0@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220720164729.147544-1-sdf@google.com
2 years agolibbpf: Fix str_has_sfx()'s return value
Dan Carpenter [Tue, 19 Jul 2022 09:53:01 +0000 (12:53 +0300)]
libbpf: Fix str_has_sfx()'s return value

The return from strcmp() is inverted so it wrongly returns true instead
of false and vice versa.

Fixes: a1c9d61b19cb ("libbpf: Improve library identification for uprobe binary path resolution")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/YtZ+/dAA195d99ak@kili
2 years agolibbpf: Fix sign expansion bug in btf_dump_get_enum_value()
Dan Carpenter [Tue, 19 Jul 2022 09:49:34 +0000 (12:49 +0300)]
libbpf: Fix sign expansion bug in btf_dump_get_enum_value()

The code here is supposed to take a signed int and store it in a signed
long long. Unfortunately, the way that the type promotion works with
this conditional statement is that it takes a signed int, type promotes
it to a __u32, and then stores that as a signed long long. The result is
never negative.

This is from static analysis, but I made a little test program just to
test it before I sent the patch:

  #include <stdio.h>

  int main(void)
  {
        unsigned long long src = -1ULL;
        signed long long dst1, dst2;
        int is_signed = 1;

        dst1 = is_signed ? *(int *)&src : *(unsigned int *)0;
        dst2 = is_signed ? (signed long long)*(int *)&src : *(unsigned int *)0;

        printf("%lld\n", dst1);
        printf("%lld\n", dst2);

        return 0;
  }

Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/YtZ+LpgPADm7BeEd@kili
2 years agonet/cdc_ncm: Increase NTB max RX/TX values to 64kb
Łukasz Spintzyk [Wed, 20 Jul 2022 06:05:18 +0000 (08:05 +0200)]
net/cdc_ncm: Increase NTB max RX/TX values to 64kb

DisplayLink ethernet devices require NTB buffers larger then 32kb
in order to run with highest performance.

This patch is changing upper limit of the rx and tx buffers.
Those buffers are initialized with CDC_NCM_NTB_DEF_SIZE_RX and
CDC_NCM_NTB_DEF_SIZE_TX which is 16kb so by default no device is
affected by increased limit.

Rx and tx buffer is increased under two conditions:
 - Device need to advertise that it supports higher buffer size in
   dwNtbMaxInMaxSize and dwNtbMaxOutMaxSize.
 - cdc_ncm/rx_max and cdc_ncm/tx_max driver parameters must be adjusted
   with udev rule or ethtool.

Summary of testing and performance results:
Tests were performed on following devices:
 - DisplayLink DL-3xxx family device
 - DisplayLink DL-6xxx family device
 - ASUS USB-C2500 2.5G USB3 ethernet adapter
 - Plugable USB3 1G USB3 ethernet adapter
 - EDIMAX EU-4307 USB-C ethernet adapter
 - Dell DBQBCBC064 USB-C ethernet adapter

Performance measurements were done with:
 - iperf3 between two linux boxes
 - http://openspeedtest.com/ instance running on local test machine

Insights from tests results:
 - All except one from third party usb adapters were not affected by
   increased buffer size to their advertised dwNtbOutMaxSize and
   dwNtbInMaxSize.
   Devices were generally reaching 912-940Mbps both download and upload.

   Only EDIMAX adapter experienced decreased download size from
   929Mbps to 827Mbps with iper3, with openspeedtest decrease was from
   968Mbps to 886Mbps.

 - DisplayLink DL-3xxx family devices experienced performance increase
   with iperf3 download from 300Mbps to 870Mbps and
   upload from 782Mbps to 844Mbps.
   With openspeedtest download increased from 556Mbps to 873Mbps
   and upload from 727Mbps to 973Mbps

 - DiplayLink DL-6xxx family devices are not affected by
   increased buffer size.

Signed-off-by: Łukasz Spintzyk <lukasz.spintzyk@synaptics.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220720060518.541-2-lukasz.spintzyk@synaptics.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet/cdc_ncm: Enable ZLP for DisplayLink ethernet devices
Dominik Czerwik [Wed, 20 Jul 2022 06:05:17 +0000 (08:05 +0200)]
net/cdc_ncm: Enable ZLP for DisplayLink ethernet devices

This improves performance and stability of
DL-3xxx/DL-5xxx/DL-6xxx device series.

Specifically prevents device from temporary network dropouts when
playing video from the web and network traffic going through is high.

Signed-off-by: Dominik Czerwik <dominik.czerwik@synaptics.com>
Signed-off-by: Łukasz Spintzyk <lukasz.spintzyk@synaptics.com>
Link: https://lore.kernel.org/r/20220720060518.541-1-lukasz.spintzyk@synaptics.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'net-ipa-move-configuration-data-files'
Jakub Kicinski [Thu, 21 Jul 2022 04:05:44 +0000 (21:05 -0700)]
Merge branch 'net-ipa-move-configuration-data-files'

Alex Elder says:

====================
net: ipa: move configuration data files

This series moves the "ipa_data-vX.Y.c" files into a subdirectory.
The first patch adds a Makefile variable containing the list of
supported IPA versions, and uses it to simplify the way these files
are specified.
====================

Link: https://lore.kernel.org/r/20220719150827.295248-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: move configuration data files into a subdirectory
Alex Elder [Tue, 19 Jul 2022 15:08:27 +0000 (10:08 -0500)]
net: ipa: move configuration data files into a subdirectory

Reduce the clutter in the main IPA source directory by creating a
new "data" subdirectory, and locating all of the configuration data
files in there.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: list supported IPA versions in the Makefile
Alex Elder [Tue, 19 Jul 2022 15:08:26 +0000 (10:08 -0500)]
net: ipa: list supported IPA versions in the Makefile

Create a variable in the Makefile listing the IPA versions supported
by the driver.  Use that to create the list of configuration data
object files used (rather than listing them all individually).

Add a SPDX license comment.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-ipa-small-transaction-updates'
Jakub Kicinski [Thu, 21 Jul 2022 04:05:07 +0000 (21:05 -0700)]
Merge branch 'net-ipa-small-transaction-updates'

Alex Elder says:

====================
net: ipa: small transaction updates

Version 2 of this series corrects a misspelling of "outstanding"
pointed out by the netdev test bots.  (For some reason I don't see
that when I run "checkpatch".)  I found and fixed a second instance
of that word being misspelled as well.

This series includes three changes to the transaction code.  The
first adds a new transaction list that represents a distinct state
that has not been maintained.  The second moves a field in the
transaction information structure, and reorders its initialization
a bit.  The third skips a function call when it is known not to be
necessary.

The last two are very small "leftover" patches.
====================

Link: https://lore.kernel.org/r/20220719181020.372697-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: fix an outdated comment
Alex Elder [Tue, 19 Jul 2022 18:10:20 +0000 (13:10 -0500)]
net: ipa: fix an outdated comment

Since commit 8797972afff3d ("net: ipa: remove command info pool"),
we don't allocate "command info" entries for command channel
transactions.  Fix a comment that seems to suggest we still do.
(Even before that commit, the comment was out of place.)

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: report when the driver has been removed
Alex Elder [Tue, 19 Jul 2022 18:10:19 +0000 (13:10 -0500)]
net: ipa: report when the driver has been removed

When the IPA driver has completed its initialization and setup
stages, it emits a brief message to the log.  Add a small message
that reports when it has been removed.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: skip some cleanup for unused transactions
Alex Elder [Tue, 19 Jul 2022 18:10:18 +0000 (13:10 -0500)]
net: ipa: skip some cleanup for unused transactions

In gsi_trans_free(), there's no point in ipa_gsi_trans_release() if
a transaction is unused.  No used TREs means no IPA layer resources
to clean up.  So only call ipa_gsi_trans_release() if at least one
TRE was used.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: rearrange transaction initialization
Alex Elder [Tue, 19 Jul 2022 18:10:17 +0000 (13:10 -0500)]
net: ipa: rearrange transaction initialization

The transaction map is really associated with the transaction pool;
move its definition earlier in the gsi_trans_info structure.

Rearrange initialization in gsi_channel_trans_init() so it
sets the tre_avail value first, then initializes the transaction
pool, and finally allocating the transaction map.

Update comments.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: add a transaction committed list
Alex Elder [Tue, 19 Jul 2022 18:10:16 +0000 (13:10 -0500)]
net: ipa: add a transaction committed list

We currently put a transaction on the pending list when it has
been committed.  But until the channel's doorbell rings, these
transactions aren't actually "owned" by the hardware yet.

Add a new "committed" state (and list), to represent transactions
that have been committed but not yet sent to hardware.  Define
"pending" to mean committed transactions that have been sent
to hardware but have not yet completed.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ipa: add an endpoint device attribute group
Alex Elder [Tue, 19 Jul 2022 19:16:39 +0000 (14:16 -0500)]
net: ipa: add an endpoint device attribute group

Create a new attribute group meant to provide a single place that
defines endpoint IDs that might be needed by user space.  Not all
defined endpoints are presented, and only those that are defined
will be made visible.

The new attributes use "extended" device attributes to hold endpoint
IDs, which is a little more compact and efficient.  Reimplement the
existing modem endpoint ID attribute files using common code.

Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220719191639.373249-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: net: af_unix: Fix a build error of unix_connect.c.
Kuniyuki Iwashima [Wed, 20 Jul 2022 00:57:50 +0000 (17:57 -0700)]
selftests: net: af_unix: Fix a build error of unix_connect.c.

This patch fixes a build error reported in the link. [0]

  unix_connect.c: In function ‘unix_connect_test’:
  unix_connect.c:115:55: error: expected identifier before ‘(’ token
   #define offsetof(type, member) ((size_t)&((type *)0)->(member))
                                                       ^
  unix_connect.c:128:12: note: in expansion of macro ‘offsetof’
    addrlen = offsetof(struct sockaddr_un, sun_path) + variant->len;
              ^~~~~~~~

We can fix this by removing () around member, but checkpatch will complain
about it, and the root cause of the build failure is that I followed the
warning and fixed this in the v2 -> v3 change of the blamed commit. [1]

  CHECK: Macro argument 'member' may be better as '(member)' to avoid precedence issues
  #33: FILE: tools/testing/selftests/net/af_unix/unix_connect.c:115:
  +#define offsetof(type, member) ((size_t)&((type *)0)->member)

To avoid this warning, let's use offsetof() defined in stddef.h instead.

[0]: https://lore.kernel.org/linux-mm/202207182205.FrkMeDZT-lkp@intel.com/
[1]: https://lore.kernel.org/netdev/20220702154818.66761-1-kuniyu@amazon.com/

Fixes: e95ab1d85289 ("selftests: net: af_unix: Test connect() with different netns.")
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220720005750.16600-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: amd8111e: remove repeated dev->features assignement
Jian Shen [Tue, 19 Jul 2022 14:24:24 +0000 (22:24 +0800)]
net: amd8111e: remove repeated dev->features assignement

It's repeated with line 1793-1795, and there isn't any other
handling for it. So remove it.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Link: https://lore.kernel.org/r/20220719142424.4528-1-shenjian15@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Jakub Kicinski [Thu, 21 Jul 2022 01:05:51 +0000 (18:05 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Simplify nf_ct_get_tuple(), from Jackie Liu.

2) Add format to request_module() call, from Bill Wendling.

3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending
   hardware offload objects to be processed, from Vlad Buslov.

4) Missing rcu annotation and accessors in the netfilter tree,
   from Florian Westphal.

5) Merge h323 conntrack helper nat hooks into single object,
   also from Florian.

6) A batch of update to fix sparse warnings treewide,
   from Florian Westphal.

7) Move nft_cmp_fast_mask() where it used, from Florian.

8) Missing const in nf_nat_initialized(), from James Yonan.

9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet.

10) Use refcount_inc instead of _inc_not_zero in flowtable,
    from Florian Westphal.

11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: xt_TPROXY: remove pr_debug invocations
  netfilter: flowtable: prefer refcount_inc
  netfilter: ipvs: Use the bitmap API to allocate bitmaps
  netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
  netfilter: nf_tables: move nft_cmp_fast_mask to where its used
  netfilter: nf_tables: use correct integer types
  netfilter: nf_tables: add and use BE register load-store helpers
  netfilter: nf_tables: use the correct get/put helpers
  netfilter: x_tables: use correct integer types
  netfilter: nfnetlink: add missing __be16 cast
  netfilter: nft_set_bitmap: Fix spelling mistake
  netfilter: h323: merge nat hook pointers into one
  netfilter: nf_conntrack: use rcu accessors where needed
  netfilter: nf_conntrack: add missing __rcu annotations
  netfilter: nf_flow_table: count pending offload workqueue tasks
  net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
  netfilter: conntrack: use correct format characters
  netfilter: conntrack: use fallthrough to cleanup
====================

Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Thu, 21 Jul 2022 00:54:46 +0000 (17:54 -0700)]
Merge tag 'mlx5-updates-2022-07-17' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-07-17

1) Add resiliency for lost completions for PTP TX port timestamp

2) Report Header-data split state via ethtool

3) Decouple HTB code from main regular TX code

* tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: CT: Remove warning of ignore_flow_level support for non PF
  net/mlx5e: Add resiliency for PTP TX port timestamp
  net/mlx5: Expose ts_cqe_metadata_size2wqe_counter
  net/mlx5e: HTB, move htb functions to a new file
  net/mlx5e: HTB, change functions name to follow convention
  net/mlx5e: HTB, remove priv from htb function calls
  net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure
  net/mlx5e: HTB, move stats and max_sqs to priv
  net/mlx5e: HTB, move section comment to the right place
  net/mlx5e: HTB, move ids to selq_params struct
  net/mlx5e: HTB, reduce visibility of htb functions
  net/mlx5e: Fix mqprio_rl handling on devlink reload
  net/mlx5e: Report header-data split state through ethtool
====================

Link: https://lore.kernel.org/r/20220719203529.51151-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetfilter: xt_TPROXY: remove pr_debug invocations
Justin Stitt [Tue, 12 Jul 2022 20:49:00 +0000 (13:49 -0700)]
netfilter: xt_TPROXY: remove pr_debug invocations

pr_debug calls are no longer needed in this file.

Pablo suggested "a patch to remove these pr_debug calls". This patch has
some other beneficial collateral as it also silences multiple Clang
-Wformat warnings that were present in the pr_debug calls.

diff from v1 -> v2:
* converted if statement one-liner style
* x == NULL is now !x

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: flowtable: prefer refcount_inc
Florian Westphal [Thu, 7 Jul 2022 19:30:56 +0000 (21:30 +0200)]
netfilter: flowtable: prefer refcount_inc

With refcount_inc_not_zero, we'd also need a smp_rmb or similar,
followed by a test of the CONFIRMED bit.

However, the ct pointer is taken from skb->_nfct, its refcount must
not be 0 (else, we'd already have a use-after-free bug).

Use refcount_inc() instead to clarify the ct refcount is expected to
be at least 1.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agonetfilter: ipvs: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Mon, 4 Jul 2022 19:24:46 +0000 (21:24 +0200)]
netfilter: ipvs: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 years agobpf: Fix bpf_trampoline_{,un}link_cgroup_shim ifdef guards
Stanislav Fomichev [Wed, 20 Jul 2022 15:52:20 +0000 (08:52 -0700)]
bpf: Fix bpf_trampoline_{,un}link_cgroup_shim ifdef guards

They were updated in kernel/bpf/trampoline.c to fix another build
issue. We should to do the same for include/linux/bpf.h header.

Fixes: 3908fcddc65d ("bpf: fix lsm_cgroup build errors on esoteric configs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220720155220.4087433-1-sdf@google.com
2 years agowatchqueue: make sure to serialize 'wqueue->defunct' properly
Linus Torvalds [Tue, 19 Jul 2022 18:09:01 +0000 (11:09 -0700)]
watchqueue: make sure to serialize 'wqueue->defunct' properly

When the pipe is closed, we mark the associated watchqueue defunct by
calling watch_queue_clear().  However, while that is protected by the
watchqueue lock, new watchqueue entries aren't actually added under that
lock at all: they use the pipe->rd_wait.lock instead, and looking up
that pipe happens without any locking.

The watchqueue code uses the RCU read-side section to make sure that the
wqueue entry itself hasn't disappeared, but that does not protect the
pipe_info in any way.

So make sure to actually hold the wqueue lock when posting watch events,
properly serializing against the pipe being torn down.

Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agolockdown: Fix kexec lockdown bypass with ima policy
Eric Snowberg [Wed, 20 Jul 2022 16:40:27 +0000 (12:40 -0400)]
lockdown: Fix kexec lockdown bypass with ima policy

The lockdown LSM is primarily used in conjunction with UEFI Secure Boot.
This LSM may also be used on machines without UEFI.  It can also be
enabled when UEFI Secure Boot is disabled.  One of lockdown's features
is to prevent kexec from loading untrusted kernels.  Lockdown can be
enabled through a bootparam or after the kernel has booted through
securityfs.

If IMA appraisal is used with the "ima_appraise=log" boot param,
lockdown can be defeated with kexec on any machine when Secure Boot is
disabled or unavailable.  IMA prevents setting "ima_appraise=log" from
the boot param when Secure Boot is enabled, but this does not cover
cases where lockdown is used without Secure Boot.

To defeat lockdown, boot without Secure Boot and add ima_appraise=log to
the kernel command line; then:

  $ echo "integrity" > /sys/kernel/security/lockdown
  $ echo "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig" > \
    /sys/kernel/security/ima/policy
  $ kexec -ls unsigned-kernel

Add a call to verify ima appraisal is set to "enforce" whenever lockdown
is enabled.  This fixes CVE-2022-21505.

Cc: stable@vger.kernel.org
Fixes: 29d3c1c8dfe7 ("kexec: Allow kexec_file() with appropriate IMA policy when locked down")
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'linux-can-fixes-for-5.19-20220720' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 20 Jul 2022 10:13:54 +0000 (11:13 +0100)]
Merge tag 'linux-can-fixes-for-5.19-20220720' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
this is a pull request of 2 patches for net/master.

The first patch is by me and fixes the detection of the mcp251863 in
the mcp251xfd driver.

The last patch is by Liang He and adds a missing of_node_put() in the
rcar_canfd driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: initialize ring indexes to 0
Alex Elder [Tue, 19 Jul 2022 14:18:55 +0000 (09:18 -0500)]
net: ipa: initialize ring indexes to 0

When a GSI channel is initially allocated, and after it has been
reset, the hardware assumes its ring index is 0.  And although we
do initialize channels this way, the comments in the IPA code don't
really explain this.  For event rings, it doesn't matter what value
we use initially, so using 0 is just fine.

Add some information about the assumptions made by hardware above
the definition of the gsi_ring structure in "gsi.h".

Zero the index field for all rings (channel and event) when the ring
is allocated.  As a result, that function initializes all fields in
the structure.

Stop zeroing the index the top of gsi_channel_program().  Initially
we'll use the index value set when the channel ring was allocated.
And we'll explicitly zero the index value in gsi_channel_reset()
before programming the hardware, adding a comment explaining why
it's required.

For event rings, use the index initialized by gsi_ring_alloc()
rather than 0 when ringing the doorbell in gsi_evt_ring_program().
(It'll still be zero, but we won't assume that to be the case.)

Use a local variable in gsi_evt_ring_program() that represents the
address of the event ring's ring structure.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_router: Fix IPv4 nexthop gateway indication
Ido Schimmel [Tue, 19 Jul 2022 12:26:26 +0000 (15:26 +0300)]
mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication

mlxsw needs to distinguish nexthops with a gateway from connected
nexthops in order to write the former to the adjacency table of the
device. The check used to rely on the fact that nexthops with a gateway
have a 'link' scope whereas connected nexthops have a 'host' scope. This
is no longer correct after commit 747c14307214 ("ip: fix dflt addr
selection for connected nexthop").

Fix that by instead checking the address family of the gateway IP. This
is a more direct way and also consistent with the IPv6 counterpart in
mlxsw_sp_rt6_is_gateway().

Cc: stable@vger.kernel.org
Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/sched: cls_api: Fix flow action initialization
Oz Shlomo [Tue, 19 Jul 2022 12:24:09 +0000 (15:24 +0300)]
net/sched: cls_api: Fix flow action initialization

The cited commit refactored the flow action initialization sequence to
use an interface method when translating tc action instances to flow
offload objects. The refactored version skips the initialization of the
generic flow action attributes for tc actions, such as pedit, that allocate
more than one offload entry. This can cause potential issues for drivers
mapping flow action ids.

Populate the generic flow action fields for all the flow action entries.

Fixes: c54e1d920f04 ("flow_offload: add ops to tc_action_ops for flow action setup")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
----
v1 -> v2:
 - coalese the generic flow action fields initialization to a single loop
Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: marvell: prestera: add phylink support
Oleksandr Mazur [Tue, 19 Jul 2022 10:57:16 +0000 (13:57 +0300)]
net: marvell: prestera: add phylink support

For SFP port prestera driver will use kernel
phylink infrastucture to configure port mode based on
the module that has beed inserted

Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agovmxnet3: Implement ethtool's get_channels command
Andrey Turkin [Tue, 19 Jul 2022 04:29:55 +0000 (04:29 +0000)]
vmxnet3: Implement ethtool's get_channels command

Some tools (e.g. libxdp) use that information.

Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'linux-can-next-for-5.20-20220720' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 20 Jul 2022 09:17:05 +0000 (10:17 +0100)]
Merge tag 'linux-can-next-for-5.20-20220720' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull request of 29 patches for net-next/master.

The first 6 patches target the slcan driver. Dan Carpenter contributes
a hardening patch, followed by 5 cleanup patches.

Biju Das contributes 5 patches to prepare the sja1000 driver to
support the Renesas RZ/N1 SJA1000 CAN controller.

Dario Binacchi's patch for the slcan driver fixes a sleep with held
spin lock.

Another patch by Dario Binacchi fixes a wrong comment in the c_can
driver.

Pavel Pisa updates the CTU CAN FD IP core registers.

Stephane Grosjean contributes 3 patches to the peak_usb driver for
cleanups and support of a new MCU.

The last 12 patches are by Vincent Mailhol, they fix and improve the
txerr and rxerr reporting in all CAN drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'net-sysctl-races-round-4'
David S. Miller [Wed, 20 Jul 2022 09:14:50 +0000 (10:14 +0100)]
Merge branch 'net-sysctl-races-round-4'

Kuniyuki Iwashima says:

====================
sysctl: Fix data-races around ipv4_net_table (Round 4).

This series fixes data-races around 17 knobs after fib_multipath_use_neigh
in ipv4_net_table.

tcp_fack was skipped because it's obsolete and there's no readers.

So, round 5 will start with tcp_dsack, 2 rounds left for 27 knobs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix data-races around sysctl_tcp_max_reordering.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:53 +0000 (10:26 -0700)]
tcp: Fix data-races around sysctl_tcp_max_reordering.

While reading sysctl_tcp_max_reordering, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: dca145ffaa8d ("tcp: allow for bigger reordering level")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix a data-race around sysctl_tcp_abort_on_overflow.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:52 +0000 (10:26 -0700)]
tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.

While reading sysctl_tcp_abort_on_overflow, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix a data-race around sysctl_tcp_rfc1337.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:51 +0000 (10:26 -0700)]
tcp: Fix a data-race around sysctl_tcp_rfc1337.

While reading sysctl_tcp_rfc1337, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix a data-race around sysctl_tcp_stdurg.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:50 +0000 (10:26 -0700)]
tcp: Fix a data-race around sysctl_tcp_stdurg.

While reading sysctl_tcp_stdurg, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix a data-race around sysctl_tcp_retrans_collapse.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:49 +0000 (10:26 -0700)]
tcp: Fix a data-race around sysctl_tcp_retrans_collapse.

While reading sysctl_tcp_retrans_collapse, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:48 +0000 (10:26 -0700)]
tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.

While reading sysctl_tcp_slow_start_after_idle, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 35089bb203f4 ("[TCP]: Add tcp_slow_start_after_idle sysctl.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:47 +0000 (10:26 -0700)]
tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.

While reading sysctl_tcp_thin_linear_timeouts, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 36e31b0af587 ("net: TCP thin linear timeouts")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix data-races around sysctl_tcp_recovery.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:46 +0000 (10:26 -0700)]
tcp: Fix data-races around sysctl_tcp_recovery.

While reading sysctl_tcp_recovery, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix a data-race around sysctl_tcp_early_retrans.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:45 +0000 (10:26 -0700)]
tcp: Fix a data-race around sysctl_tcp_early_retrans.

While reading sysctl_tcp_early_retrans, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: eed530b6c676 ("tcp: early retransmit")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: Fix data-races around sysctl knobs related to SYN option.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:44 +0000 (10:26 -0700)]
tcp: Fix data-races around sysctl knobs related to SYN option.

While reading these knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - tcp_sack
  - tcp_window_scaling
  - tcp_timestamps

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoudp: Fix a data-race around sysctl_udp_l3mdev_accept.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:43 +0000 (10:26 -0700)]
udp: Fix a data-race around sysctl_udp_l3mdev_accept.

While reading sysctl_udp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoip: Fix data-races around sysctl_ip_prot_sock.
Kuniyuki Iwashima [Mon, 18 Jul 2022 17:26:42 +0000 (10:26 -0700)]
ip: Fix data-races around sysctl_ip_prot_sock.

sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>