platform/kernel/linux-starfive.git
4 years agosamples, bpf: Refactor perf_event user program with libbpf bpf_link
Daniel T. Lee [Sat, 21 Mar 2020 10:04:24 +0000 (19:04 +0900)]
samples, bpf: Refactor perf_event user program with libbpf bpf_link

The bpf_program__attach of libbpf(using bpf_link) is much more intuitive
than the previous method using ioctl.

bpf_program__attach_perf_event manages the enable of perf_event and
attach of BPF programs to it, so there's no neeed to do this
directly with ioctl.

In addition, bpf_link provides consistency in the use of API because it
allows disable (detach, destroy) for multiple events to be treated as
one bpf_link__destroy. Also, bpf_link__destroy manages the close() of
perf_event fd.

This commit refactors samples that attach the bpf program to perf_event
by using libbbpf instead of ioctl. Also the bpf_load in the samples were
removed and migrated to use libbbpf API.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200321100424.1593964-3-danieltimlee@gmail.com
4 years agosamples, bpf: Move read_trace_pipe to trace_helpers
Daniel T. Lee [Sat, 21 Mar 2020 10:04:23 +0000 (19:04 +0900)]
samples, bpf: Move read_trace_pipe to trace_helpers

To reduce the reliance of trace samples (trace*_user) on bpf_load,
move read_trace_pipe to trace_helpers. By moving this bpf_loader helper
elsewhere, trace functions can be easily migrated to libbbpf.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200321100424.1593964-2-danieltimlee@gmail.com
4 years agobpf: Add tests for bpf_sk_storage to bpf_tcp_ca
Martin KaFai Lau [Fri, 20 Mar 2020 15:21:07 +0000 (08:21 -0700)]
bpf: Add tests for bpf_sk_storage to bpf_tcp_ca

This patch adds test to exercise the bpf_sk_storage_get()
and bpf_sk_storage_delete() helper from the bpf_dctcp.c.

The setup and check on the sk_storage is done immediately
before and after the connect().

This patch also takes this chance to move the pthread_create()
after the connect() has been done.  That will remove the need of
the "wait_thread" label.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320152107.2169904-1-kafai@fb.com
4 years agobpf: Add bpf_sk_storage support to bpf_tcp_ca
Martin KaFai Lau [Fri, 20 Mar 2020 15:21:01 +0000 (08:21 -0700)]
bpf: Add bpf_sk_storage support to bpf_tcp_ca

This patch adds bpf_sk_storage_get() and bpf_sk_storage_delete()
helper to the bpf_tcp_ca's struct_ops.  That would allow
bpf-tcp-cc to:
1) share sk private data with other bpf progs.
2) use bpf_sk_storage as a private storage for a bpf-tcp-cc
   if the existing icsk_ca_priv is not big enough.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320152101.2169498-1-kafai@fb.com
4 years agoselftests/bpf: Fix mix of tabs and spaces
Bill Wendling [Fri, 20 Mar 2020 20:15:10 +0000 (13:15 -0700)]
selftests/bpf: Fix mix of tabs and spaces

Clang's -Wmisleading-indentation warns about misleading indentations if
there's a mixture of spaces and tabs. Remove extraneous spaces.

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200320201510.217169-1-morbo@google.com
4 years agobpf, tcp: Make tcp_bpf_recvmsg static
YueHaibing [Fri, 20 Mar 2020 02:34:26 +0000 (10:34 +0800)]
bpf, tcp: Make tcp_bpf_recvmsg static

After commit f747632b608f ("bpf: sockmap: Move generic sockmap
hooks from BPF TCP"), tcp_bpf_recvmsg() is not used out of
tcp_bpf.c, so make it static and remove it from tcp.h. Also move
it to BPF_STREAM_PARSER #ifdef to fix unused function warnings.

Fixes: f747632b608f ("bpf: sockmap: Move generic sockmap hooks from BPF TCP")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320023426.60684-3-yuehaibing@huawei.com
4 years agobpf, tcp: Fix unused function warnings
YueHaibing [Fri, 20 Mar 2020 02:34:25 +0000 (10:34 +0800)]
bpf, tcp: Fix unused function warnings

If BPF_STREAM_PARSER is not set, gcc warns:

  net/ipv4/tcp_bpf.c:483:12: warning: 'tcp_bpf_sendpage' defined but not used [-Wunused-function]
  net/ipv4/tcp_bpf.c:395:12: warning: 'tcp_bpf_sendmsg' defined but not used [-Wunused-function]
  net/ipv4/tcp_bpf.c:13:13: warning: 'tcp_bpf_stream_read' defined but not used [-Wunused-function]

Moves the unused functions into the #ifdef CONFIG_BPF_STREAM_PARSER.

Fixes: f747632b608f ("bpf: sockmap: Move generic sockmap hooks from BPF TCP")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320023426.60684-2-yuehaibing@huawei.com
4 years agobpftool: Add struct_ops support
Martin KaFai Lau [Wed, 18 Mar 2020 17:16:56 +0000 (10:16 -0700)]
bpftool: Add struct_ops support

This patch adds struct_ops support to the bpftool.

To recap a bit on the recent bpf_struct_ops feature on the kernel side:
It currently supports "struct tcp_congestion_ops" to be implemented
in bpf.  At a high level, bpf_struct_ops is struct_ops map populated
with a number of bpf progs.  bpf_struct_ops currently supports the
"struct tcp_congestion_ops".  However, the bpf_struct_ops design is
generic enough that other kernel struct ops can be supported in
the future.

Although struct_ops is map+progs at a high lever, there are differences
in details.  For example,
1) After registering a struct_ops, the struct_ops is held by the kernel
   subsystem (e.g. tcp-cc).  Thus, there is no need to pin a
   struct_ops map or its progs in order to keep them around.
2) To iterate all struct_ops in a system, it iterates all maps
   in type BPF_MAP_TYPE_STRUCT_OPS.  BPF_MAP_TYPE_STRUCT_OPS is
   the current usual filter.  In the future, it may need to
   filter by other struct_ops specific properties.  e.g. filter by
   tcp_congestion_ops or other kernel subsystem ops in the future.
3) struct_ops requires the running kernel having BTF info.  That allows
   more flexibility in handling other kernel structs.  e.g. it can
   always dump the latest bpf_map_info.
4) Also, "struct_ops" command is not intended to repeat all features
   already provided by "map" or "prog".  For example, if there really
   is a need to pin the struct_ops map, the user can use the "map" cmd
   to do that.

While the first attempt was to reuse parts from map/prog.c,  it ended up
not a lot to share.  The only obvious item is the map_parse_fds() but
that still requires modifications to accommodate struct_ops map specific
filtering (for the immediate and the future needs).  Together with the
earlier mentioned differences, it is better to part away from map/prog.c.

The initial set of subcmds are, register, unregister, show, and dump.

For register, it registers all struct_ops maps that can be found in an
obj file.  Option can be added in the future to specify a particular
struct_ops map.  Also, the common bpf_tcp_cc is stateless (e.g.
bpf_cubic.c and bpf_dctcp.c).  The "reuse map" feature is not
implemented in this patch and it can be considered later also.

For other subcmds, please see the man doc for details.

A sample output of dump:
[root@arch-fb-vm1 bpf]# bpftool struct_ops dump name cubic
[{
        "bpf_map_info": {
            "type": 26,
            "id": 64,
            "key_size": 4,
            "value_size": 256,
            "max_entries": 1,
            "map_flags": 0,
            "name": "cubic",
            "ifindex": 0,
            "btf_vmlinux_value_type_id": 18452,
            "netns_dev": 0,
            "netns_ino": 0,
            "btf_id": 52,
            "btf_key_type_id": 0,
            "btf_value_type_id": 0
        }
    },{
        "bpf_struct_ops_tcp_congestion_ops": {
            "refcnt": {
                "refs": {
                    "counter": 1
                }
            },
            "state": "BPF_STRUCT_OPS_STATE_INUSE",
            "data": {
                "list": {
                    "next": 0,
                    "prev": 0
                },
                "key": 0,
                "flags": 0,
                "init": "void (struct sock *) bictcp_init/prog_id:138",
                "release": "void (struct sock *) 0",
                "ssthresh": "u32 (struct sock *) bictcp_recalc_ssthresh/prog_id:141",
                "cong_avoid": "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140",
                "set_state": "void (struct sock *, u8) bictcp_state/prog_id:142",
                "cwnd_event": "void (struct sock *, enum tcp_ca_event) bictcp_cwnd_event/prog_id:139",
                "in_ack_event": "void (struct sock *, u32) 0",
                "undo_cwnd": "u32 (struct sock *) tcp_reno_undo_cwnd/prog_id:144",
                "pkts_acked": "void (struct sock *, const struct ack_sample *) bictcp_acked/prog_id:143",
                "min_tso_segs": "u32 (struct sock *) 0",
                "sndbuf_expand": "u32 (struct sock *) 0",
                "cong_control": "void (struct sock *, const struct rate_sample *) 0",
                "get_info": "size_t (struct sock *, u32, int *, union tcp_cc_info *) 0",
                "name": "bpf_cubic",
                "owner": 0
            }
        }
    }
]

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200318171656.129650-1-kafai@fb.com
4 years agobpftool: Translate prog_id to its bpf prog_name
Martin KaFai Lau [Wed, 18 Mar 2020 17:16:50 +0000 (10:16 -0700)]
bpftool: Translate prog_id to its bpf prog_name

The kernel struct_ops obj has kernel's func ptrs implemented by bpf_progs.
The bpf prog_id is stored as the value of the func ptr for introspection
purpose.  In the latter patch, a struct_ops dump subcmd will be added
to introspect these func ptrs.  It is desired to print the actual bpf
prog_name instead of only printing the prog_id.

Since struct_ops is the only usecase storing prog_id in the func ptr,
this patch adds a prog_id_as_func_ptr bool (default is false) to
"struct btf_dumper" in order not to mis-interpret the ptr value
for the other existing use-cases.

While printing a func_ptr as a bpf prog_name,
this patch also prefix the bpf prog_name with the ptr's func_proto.
[ Note that it is the ptr's func_proto instead of the bpf prog's
  func_proto ]
It reuses the current btf_dump_func() to obtain the ptr's func_proto
string.

Here is an example from the bpf_cubic.c:
"void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140"

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200318171650.129252-1-kafai@fb.com
4 years agobpftool: Print as a string for char array
Martin KaFai Lau [Wed, 18 Mar 2020 17:16:43 +0000 (10:16 -0700)]
bpftool: Print as a string for char array

A char[] is currently printed as an integer array.
This patch will print it as a string when
1) The array element type is an one byte int
2) The array element type has a BTF_INT_CHAR encoding or
   the array element type's name is "char"
3) All characters is between (0x1f, 0x7f) and it is terminated
   by a null character.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200318171643.129021-1-kafai@fb.com
4 years agobpftool: Print the enum's name instead of value
Martin KaFai Lau [Wed, 18 Mar 2020 17:16:37 +0000 (10:16 -0700)]
bpftool: Print the enum's name instead of value

This patch prints the enum's name if there is one found in
the array of btf_enum.

The commit 9eea98497951 ("bpf: fix BTF verification of enums")
has details about an enum could have any power-of-2 size (up to 8 bytes).
This patch also takes this chance to accommodate these non 4 byte
enums.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200318171637.128862-1-kafai@fb.com
4 years agobpf: Support llvm-objcopy for vmlinux BTF
Fangrui Song [Wed, 18 Mar 2020 22:27:46 +0000 (15:27 -0700)]
bpf: Support llvm-objcopy for vmlinux BTF

Simplify gen_btf logic to make it work with llvm-objcopy. The existing
'file format' and 'architecture' parsing logic is brittle and does not
work with llvm-objcopy/llvm-objdump.

'file format' output of llvm-objdump>=11 will match GNU objdump, but
'architecture' (bfdarch) may not.

.BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag
because it is part of vmlinux image used for introspection. C code
can reference the section via linker script defined __start_BTF and
__stop_BTF. This fixes a small problem that previous .BTF had the
SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data).

Additionally, `objcopy -I binary` synthesized symbols
_binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not
used elsewhere) are replaced with more commonplace __start_BTF and
__stop_BTF.

Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns
"empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?"

We use a dd command to change the e_type field in the ELF header from
ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o.  Accepting
ET_EXEC as an input file is an extremely rare GNU ld feature that lld
does not intend to support, because this is error-prone.

The output section description .BTF in include/asm-generic/vmlinux.lds.h
avoids potential subtle orphan section placement issues and suppresses
--orphan-handling=warn warnings.

Fixes: df786c9b9476 ("bpf: Force .BTF section start to zero when dumping from vmlinux")
Fixes: cb0cc635c7a9 ("powerpc: Include .BTF section")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Stanislav Fomichev <sdf@google.com>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://github.com/ClangBuiltLinux/linux/issues/871
Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com
4 years agobpf, libbpf: Fix ___bpf_kretprobe_args1(x) macro definition
Wenbo Zhang [Sun, 15 Mar 2020 08:32:52 +0000 (04:32 -0400)]
bpf, libbpf: Fix ___bpf_kretprobe_args1(x) macro definition

Use PT_REGS_RC instead of PT_REGS_RET to get ret correctly.

Fixes: df8ff35311c8 ("libbpf: Merge selftests' bpf_trace_helpers.h into libbpf's bpf_tracing.h")
Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200315083252.22274-1-ethercflow@gmail.com
4 years agoselftests/bpf: Reset process and thread affinity after each test/sub-test
Andrii Nakryiko [Sat, 14 Mar 2020 01:39:32 +0000 (18:39 -0700)]
selftests/bpf: Reset process and thread affinity after each test/sub-test

Some tests and sub-tests are setting "custom" thread/process affinity and
don't reset it back. Instead of requiring each test to undo all this, ensure
that thread affinity is restored by test_progs test runner itself.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200314013932.4035712-3-andriin@fb.com
4 years agoselftests/bpf: Fix test_progs's parsing of test numbers
Andrii Nakryiko [Sat, 14 Mar 2020 01:39:31 +0000 (18:39 -0700)]
selftests/bpf: Fix test_progs's parsing of test numbers

When specifying disjoint set of tests, test_progs doesn't set skipped test's
array elements to false. This leads to spurious execution of tests that should
have been skipped. Fix it by explicitly initializing them to false.

Fixes: 3a516a0a3a7b ("selftests/bpf: add sub-tests support for test_progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200314013932.4035712-2-andriin@fb.com
4 years agoselftests/bpf: Fix race in tcp_rtt test
Andrii Nakryiko [Sat, 14 Mar 2020 01:39:30 +0000 (18:39 -0700)]
selftests/bpf: Fix race in tcp_rtt test

Previous attempt to make tcp_rtt more robust introduced a new race, in which
server_done might be set to true before server can actually accept any
connection. Fix this by unconditionally waiting for accept(). Given socket is
non-blocking, if there are any problems with client side, it should eventually
close listening FD and let server thread exit with failure.

Fixes: 4cd729fa022c ("selftests/bpf: Make tcp_rtt test more robust to failures")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200314013932.4035712-1-andriin@fb.com
4 years agoselftests/bpf: Fix nanosleep for real this time
Andrii Nakryiko [Sat, 14 Mar 2020 00:27:43 +0000 (17:27 -0700)]
selftests/bpf: Fix nanosleep for real this time

Amazingly, some libc implementations don't call __NR_nanosleep syscall from
their nanosleep() APIs. Hammer it down with explicit syscall() call and never
get back to it again. Also simplify code for timespec initialization.

I verified that nanosleep is called w/ printk and in exactly same Linux image
that is used in Travis CI. So it should both sleep and call correct syscall.

v1->v2:
  - math is too hard, fix usec -> nsec convertion (Martin);
  - test_vmlinux has explicit nanosleep() call, convert that one as well.

Fixes: 4e1fd25d19e8 ("selftests/bpf: Fix usleep() implementation")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200314002743.3782677-1-andriin@fb.com
4 years agoselftest/bpf: Fix compilation warning in sockmap_parse_prog.c
Andrii Nakryiko [Sat, 14 Mar 2020 00:18:34 +0000 (17:18 -0700)]
selftest/bpf: Fix compilation warning in sockmap_parse_prog.c

Remove unused len variable, which causes compilation warnings.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200314001834.3727680-1-andriin@fb.com
4 years agosfc: fix XDP-redirect in this driver
Jesper Dangaard Brouer [Fri, 13 Mar 2020 13:25:19 +0000 (14:25 +0100)]
sfc: fix XDP-redirect in this driver

XDP-redirect is broken in this driver sfc. XDP_REDIRECT requires
tailroom for skb_shared_info when creating an SKB based on the
redirected xdp_frame (both in cpumap and veth).

The fix requires some initial explaining. The driver uses RX page-split
when possible. It reserves the top 64 bytes in the RX-page for storing
dma_addr (struct efx_rx_page_state). It also have the XDP recommended
headroom of XDP_PACKET_HEADROOM (256 bytes). As it doesn't reserve any
tailroom, it can still fit two standard MTU (1500) frames into one page.

The sizeof struct skb_shared_info in 320 bytes. Thus drivers like ixgbe
and i40e, reduce their XDP headroom to 192 bytes, which allows them to
fit two frames with max 1536 bytes into a 4K page (192+1536+320=2048).

The fix is to reduce this drivers headroom to 128 bytes and add the 320
bytes tailroom. This account for reserved top 64 bytes in the page, and
still fit two frame in a page for normal MTUs.

We must never go below 128 bytes of headroom for XDP, as one cacheline
is for xdp_frame area and next cacheline is reserved for metadata area.

Fixes: eb9a36be7f3e ("sfc: perform XDP processing on received packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoremoteproc: clean up notification config
Alex Elder [Mon, 16 Mar 2020 22:51:21 +0000 (17:51 -0500)]
remoteproc: clean up notification config

Rearrange the config files for remoteproc and IPA to fix their
interdependencies.

First, have CONFIG_QCOM_Q6V5_MSS select QCOM_Q6V5_IPA_NOTIFY so the
notification code is built regardless of whether IPA needs it.

Next, represent QCOM_IPA as being dependent on QCOM_Q6V5_MSS rather
than setting its value to match QCOM_Q6V5_COMMON (which is selected
by QCOM_Q6V5_MSS).

Drop all dependencies from QCOM_Q6V5_IPA_NOTIFY.  The notification
code will be built whenever QCOM_Q6V5_MSS is set, and it has no other
dependencies.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: kcm: kcmproc.c: Fix RCU list suspicious usage warning
Madhuparna Bhowmik [Mon, 16 Mar 2020 17:13:52 +0000 (22:43 +0530)]
net: kcm: kcmproc.c: Fix RCU list suspicious usage warning

This path fixes the suspicious RCU usage warning reported by
kernel test robot.

net/kcm/kcmproc.c:#RCU-list_traversed_in_non-reader_section

There is no need to use list_for_each_entry_rcu() in
kcm_stats_seq_show() as the list is always traversed under
knet->mutex held.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoqede: remove some unused code in function qede_selftest_receive_traffic
Zheng Zengkai [Mon, 16 Mar 2020 13:05:24 +0000 (21:05 +0800)]
qede: remove some unused code in function qede_selftest_receive_traffic

Remove set but not used variables 'sw_comp_cons' and 'hw_comp_cons'
to fix gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/qlogic/qede/qede_ethtool.c: In function qede_selftest_receive_traffic:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c:1569:20:
 warning: variable sw_comp_cons set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/qlogic/qede/qede_ethtool.c: In function qede_selftest_receive_traffic:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c:1569:6:
 warning: variable hw_comp_cons set but not used [-Wunused-but-set-variable]

After removing 'hw_comp_cons',the memory barrier 'rmb()' and its comments become useless,
so remove them as well.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sched: set the hw_stats_type in pedit loop
Jiri Pirko [Mon, 16 Mar 2020 08:03:25 +0000 (09:03 +0100)]
net: sched: set the hw_stats_type in pedit loop

For a single pedit action, multiple offload entries may be used. Set the
hw_stats_type to all of them.

Fixes: 44f865801741 ("sched: act: allow user to specify type of HW stats for a filter")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-stmmac-Use-readl_poll_timeout-to-simplify-the-code'
David S. Miller [Mon, 16 Mar 2020 09:10:10 +0000 (02:10 -0700)]
Merge branch 'net-stmmac-Use-readl_poll_timeout-to-simplify-the-code'

Dejin Zheng says:

====================
net: stmmac: Use readl_poll_timeout() to simplify the code

This patch sets just for replace the open-coded loop to the
readl_poll_timeout() helper macro for simplify the code in
stmmac driver.

v2 -> v3:
- return whatever error code by readl_poll_timeout() returned.
v1 -> v2:
- no changed. I am a newbie and sent this patch a month
  ago (February 6th). So far, I have not received any comments or
  suggestion. I think it may be lost somewhere in the world, so
  resend it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: use readl_poll_timeout() function in dwmac4_dma_reset()
Dejin Zheng [Mon, 16 Mar 2020 02:32:54 +0000 (10:32 +0800)]
net: stmmac: use readl_poll_timeout() function in dwmac4_dma_reset()

The dwmac4_dma_reset() function use an open coded of readl_poll_timeout().
Replace the open coded handling with the proper function.

Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: use readl_poll_timeout() function in init_systime()
Dejin Zheng [Mon, 16 Mar 2020 02:32:53 +0000 (10:32 +0800)]
net: stmmac: use readl_poll_timeout() function in init_systime()

The init_systime() function use an open coded of readl_poll_timeout().
Replace the open coded handling with the proper function.

Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agochcr: remove set but not used variable 'status'
YueHaibing [Sat, 14 Mar 2020 10:51:20 +0000 (18:51 +0800)]
chcr: remove set but not used variable 'status'

drivers/crypto/chelsio/chcr_ktls.c: In function chcr_ktls_cpl_set_tcb_rpl:
drivers/crypto/chelsio/chcr_ktls.c:662:11: warning:
 variable status set but not used [-Wunused-but-set-variable]

commit 8a30923e1598 ("cxgb4/chcr: Save tx keys and handle HW response")
involved this unused variable, remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomacsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)
Era Mayflower [Mon, 9 Mar 2020 19:47:02 +0000 (19:47 +0000)]
macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)

Netlink support of extended packet number cipher suites,
allows adding and updating XPN macsec interfaces.

Added support in:
    * Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
    * Setting and getting 64bit packet numbers with of SAs.
    * Setting (only on SA creation) and getting ssci of SAs.
    * Setting salt when installing a SAK.

Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
    * MACSEC_CIPHER_ID_GCM_AES_XPN_128
    * MACSEC_CIPHER_ID_GCM_AES_XPN_256

In addition, added 2 new netlink attribute types:
    * MACSEC_SA_ATTR_SSCI
    * MACSEC_SA_ATTR_SALT

Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw.

Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomacsec: Support XPN frame handling - IEEE 802.1AEbw
Era Mayflower [Mon, 9 Mar 2020 19:47:01 +0000 (19:47 +0000)]
macsec: Support XPN frame handling - IEEE 802.1AEbw

Support extended packet number cipher suites (802.1AEbw) frames handling.
This does not include the needed netlink patches.

    * Added xpn boolean field to `struct macsec_secy`.
    * Added ssci field to `struct_macsec_tx_sa` (802.1AE figure 10-5).
    * Added ssci field to `struct_macsec_rx_sa` (802.1AE figure 10-5).
    * Added salt field to `struct macsec_key` (802.1AE 10.7 NOTE 1).
    * Created pn_t type for easy access to lower and upper halves.
    * Created salt_t type for easy access to the "ssci" and "pn" parts.
    * Created `macsec_fill_iv_xpn` function to create IV in XPN mode.
    * Support in PN recovery and preliminary replay check in XPN mode.

In addition, according to IEEE 802.1AEbw figure 10-5, the PN of incoming
frame can be 0 when XPN cipher suite is used, so fixed the function
`macsec_validate_skb` to fail on PN=0 only if XPN is off.

Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-dsa-improve-serdes-integration'
David S. Miller [Mon, 16 Mar 2020 00:11:13 +0000 (17:11 -0700)]
Merge branch 'net-dsa-improve-serdes-integration'

Russell King says:

====================
net: dsa: improve serdes integration

Depends on "net: mii clause 37 helpers".

Andrew Lunn mentioned that the Serdes PCS found in Marvell DSA switches
does not automatically update the switch MACs with the link parameters.
Currently, the DSA code implements a work-around for this.

This series improves the Serdes integration, making use of the recent
phylink changes to support split MAC/PCS setups.  One noticable
improvement for userspace is that ethtool can now report the link
partner's advertisement.

This repost has no changes compared to the previous posting; however,
the regression Andrew had found which exists even without this patch
set has now been fixed by Andrew and merged into the net-next tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down
Russell King [Sat, 14 Mar 2020 10:16:03 +0000 (10:16 +0000)]
net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down

Use the status of the PHY_DETECT bit to determine whether we need to
force the MAC settings in mac_link_up() and mac_link_down().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: remove port_link_state functions
Russell King [Sat, 14 Mar 2020 10:15:58 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: remove port_link_state functions

The port_link_state method is only used by mv88e6xxx_port_setup_mac(),
which is now only called during port setup, rather than also being
called via phylink's mac_config method.

Remove this now unnecessary optimisation, which allows us to remove the
port_link_state methods as well.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: combine port_set_speed and port_set_duplex
Russell King [Sat, 14 Mar 2020 10:15:53 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: combine port_set_speed and port_set_duplex

Setting the speed independently of duplex makes little sense; the two
parameters result from negotiation or fixed setup, and may have inter-
dependencies. Moreover, they are always controlled via the same
register - having them split means we have to read-modify-write this
register twice.

Combine the two operations into a single port_set_speed_duplex()
operation. Not only is this more efficient, it reduces the size of the
code as well.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: fix Serdes link changes
Russell King [Sat, 14 Mar 2020 10:15:48 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: fix Serdes link changes

phylink_mac_change() is supposed to be called with a 'false' argument
if the link has gone down since it was last reported up; this is to
ensure that link events along with renegotiation events are always
correctly reported to userspace.

Read the BMSR once when we have an interrupt, and report the link
latched status to phylink via phylink_mac_change().  phylink will deal
automatically with re-reading the link state once it has processed the
link-down event.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: extend phylink to Serdes PHYs
Russell King [Sat, 14 Mar 2020 10:15:43 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: extend phylink to Serdes PHYs

Extend the mv88e6xxx phylink implementation down to Serdes PHYs, which
handle the PCS layer of such links.

- Implement phylink PCS link state reading, so that we can provide
  ethtool with the linkmodes and link speed in the expected manner.
  Note: this will only be called for in-band negotiation, which is
  only supported by the serdes interfaces.
- Implement phylink PCS configuration, so that the in-band AN and
  advertisement can be configured.
- Implement phylink PCS negotiation restart, so that the in-band AN
  can be restarted.
- Implement phylink PCS link up, so that when operating out-of-band,
  the Serdes can be configured for the appropriate fixed speed mode.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: configure interface settings in mac_config
Russell King [Sat, 14 Mar 2020 10:15:38 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: configure interface settings in mac_config

Only configure the interface settings in mac_config(), leaving the
speed and duplex settings to mac_link_up to deal with.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: mv88e6xxx: use BMCR definitions for serdes control register
Russell King [Sat, 14 Mar 2020 10:15:33 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: use BMCR definitions for serdes control register

The SGMII/1000base-X serdes register set is a clause 22 register set
offset at 0x2000 in the PHYXS device. Rather than inventing our own
defintions, use those that already exist, and name the register
MV88E6390_SGMII_BMCR.  Also remove the unused MV88E6390_SGMII_STATUS
definitions.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: warn if phylink_mac_link_state returns error
Russell King [Sat, 14 Mar 2020 10:15:28 +0000 (10:15 +0000)]
net: dsa: warn if phylink_mac_link_state returns error

Issue a warning to the kernel log if phylink_mac_link_state() returns
an error. This should not occur, but let's make it visible.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-mii-clause-37-helpers'
David S. Miller [Mon, 16 Mar 2020 00:10:14 +0000 (17:10 -0700)]
Merge branch 'net-mii-clause-37-helpers'

Russell King says:

====================
net: mii clause 37 helpers

This is a re-post of two patches that are common to two series that
I've sent in recent weeks; I'm re-posting them separately in the hope
that they can be merged.  No changes from either of the previous
postings.

These patches:

1. convert the existing (unused) mii_lpa_to_ethtool_lpa_x() function
   to a linkmode variant.

2. add a helper for clause 37 advertisements, supporting both the
   1000baseX and defacto 2500baseX variants. Note that ethtool does
   not support half duplex for either of these, and we make no effort
   to do so.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mii: add linkmode_adv_to_mii_adv_x()
Russell King [Sat, 14 Mar 2020 10:09:58 +0000 (10:09 +0000)]
net: mii: add linkmode_adv_to_mii_adv_x()

Add a helper to convert a linkmode advertisement to a clause 37
advertisement value for 1000base-x and 2500base-x.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mii: convert mii_lpa_to_ethtool_lpa_x() to linkmode variant
Russell King [Sat, 14 Mar 2020 10:09:53 +0000 (10:09 +0000)]
net: mii: convert mii_lpa_to_ethtool_lpa_x() to linkmode variant

Add a LPA to linkmode decoder for 1000BASE-X protocols; this decoder
only provides the modify semantics similar to other such decoders.
This replaces the unused mii_lpa_to_ethtool_lpa_x() helper.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocdc_ncm: Fix the build warning
Alexander Bersenev [Sat, 14 Mar 2020 05:33:24 +0000 (10:33 +0500)]
cdc_ncm: Fix the build warning

The ndp32->wLength is two bytes long, so replace cpu_to_le32 with cpu_to_le16.

Fixes: 0fa81b304a79 ("cdc_ncm: Implement the 32-bit version of NCM Transfer Block")
Signed-off-by: Alexander Bersenev <bay@hackerdom.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'mptcp-simplify-mptcp_accept'
David S. Miller [Sun, 15 Mar 2020 07:19:03 +0000 (00:19 -0700)]
Merge branch 'mptcp-simplify-mptcp_accept'

Paolo Abeni says:

====================
mptcp: simplify mptcp_accept()

Currently we allocate the MPTCP master socket at accept time.

The above makes mptcp_accept() quite complex, and requires checks is several
places for NULL MPTCP master socket.

These series simplify the MPTCP accept implementation, moving the master socket
allocation at syn-ack time, so that we drop unneeded checks with the follow-up
patch.

v1 -> v2:
- rebased on top of 2398e3991bda7caa6b112a6f650fbab92f732b91
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: drop unneeded checks
Paolo Abeni [Fri, 13 Mar 2020 15:52:42 +0000 (16:52 +0100)]
mptcp: drop unneeded checks

After the previous patch subflow->conn is always != NULL and
is never changed. We can drop a bunch of now unneeded checks.

v1 -> v2:
 - rebased on top of commit 2398e3991bda ("mptcp: always
   include dack if possible.")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomptcp: create msk early
Paolo Abeni [Fri, 13 Mar 2020 15:52:41 +0000 (16:52 +0100)]
mptcp: create msk early

This change moves the mptcp socket allocation from mptcp_accept() to
subflow_syn_recv_sock(), so that subflow->conn is now always set
for the non fallback scenario.

It allows cleaning up a bit mptcp_accept() reducing the additional
locking and will allow fourther cleanup in the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: stmmac: platform: convert to devm_platform_ioremap_resource
Dejin Zheng [Fri, 13 Mar 2020 14:42:57 +0000 (22:42 +0800)]
net: stmmac: platform: convert to devm_platform_ioremap_resource

Use devm_platform_ioremap_resource() to simplify code, which
contains platform_get_resource and devm_ioremap_resource.

Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mscc: ocelot: adjust maxlen on NPI port, not CPU
Vladimir Oltean [Fri, 13 Mar 2020 13:46:51 +0000 (15:46 +0200)]
net: mscc: ocelot: adjust maxlen on NPI port, not CPU

Being a non-physical port, the CPU port does not have an ocelot_port
structure, so the ocelot_port_writel call inside the
ocelot_port_set_maxlen() function would access data behind a NULL
pointer.

This is a patch for net-next only, the net tree boots fine, the bug was
introduced during the net -> net-next merge.

Fixes: 1d3435793123 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Fixes: a8015ded89ad ("net: mscc: ocelot: properly account for VLAN header length when setting MRU")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: add NULL pointer check to prevent kernel oops
Hoang Le [Fri, 13 Mar 2020 03:18:03 +0000 (10:18 +0700)]
tipc: add NULL pointer check to prevent kernel oops

Calling:
tipc_node_link_down()->
   - tipc_node_write_unlock()->tipc_mon_peer_down()
   - tipc_mon_peer_down()
  just after disabling bearer could be caused kernel oops.

Fix this by adding a sanity check to make sure valid memory
access.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: simplify trivial boolean return
Hoang Le [Fri, 13 Mar 2020 03:18:02 +0000 (10:18 +0700)]
tipc: simplify trivial boolean return

Checking and returning 'true' boolean is useless as it will be
returning at end of function

Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ethtool-consolidate-irq-coalescing-part-5'
David S. Miller [Sun, 15 Mar 2020 04:13:55 +0000 (21:13 -0700)]
Merge branch 'ethtool-consolidate-irq-coalescing-part-5'

Jakub Kicinski says:

====================
ethtool: consolidate irq coalescing - part 5

Convert more drivers following the groundwork laid in a recent
patch set [1] and continued in [2], [3], [4]. The aim of the effort
is to consolidate irq coalescing parameter validation in the core.

This set converts further 15 drivers in drivers/net/ethernet.
One more conversion sets to come.

[1] https://lore.kernel.org/netdev/20200305051542.991898-1-kuba@kernel.org/
[2] https://lore.kernel.org/netdev/20200306010602.1620354-1-kuba@kernel.org/
[3] https://lore.kernel.org/netdev/20200310021512.1861626-1-kuba@kernel.org/
[4] https://lore.kernel.org/netdev/20200311223302.2171564-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: via: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:08:03 +0000 (21:08 -0700)]
net: via: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sxgbe: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:08:02 +0000 (21:08 -0700)]
net: sxgbe: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: r8169: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:08:01 +0000 (21:08 -0700)]
net: r8169: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: qlnic: let core reject the unsupported coalescing parameters
Jakub Kicinski [Fri, 13 Mar 2020 04:08:00 +0000 (21:08 -0700)]
net: qlnic: let core reject the unsupported coalescing parameters

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver already correctly rejected almost all
unsupported parameters (missing sample_rate_interval).

As a side effect of these changes the error code for
unsupported params changes from EINVAL to EOPNOTSUPP.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: qede: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:59 +0000 (21:07 -0700)]
net: qede: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: netxen: let core reject the unsupported coalescing parameters
Jakub Kicinski [Fri, 13 Mar 2020 04:07:58 +0000 (21:07 -0700)]
net: netxen: let core reject the unsupported coalescing parameters

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

As a side effect of these changes the error code for
unsupported params changes from EINVAL to EOPNOTSUPP.

The driver was missing a check for rate_sample_interval.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: nixge: let core reject the unsupported coalescing parameters
Jakub Kicinski [Fri, 13 Mar 2020 04:07:57 +0000 (21:07 -0700)]
net: nixge: let core reject the unsupported coalescing parameters

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver correctly rejects all unsupported
parameters, no functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: myri10ge: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:56 +0000 (21:07 -0700)]
net: myri10ge: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sky2: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:55 +0000 (21:07 -0700)]
net: sky2: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: skge: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:54 +0000 (21:07 -0700)]
net: skge: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: octeontx2-pf: let core reject the unsupported coalescing parameters
Jakub Kicinski [Fri, 13 Mar 2020 04:07:53 +0000 (21:07 -0700)]
net: octeontx2-pf: let core reject the unsupported coalescing parameters

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver correctly rejects all unsupported
parameters, no functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mvpp2: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:52 +0000 (21:07 -0700)]
net: mvpp2: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mvneta: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:51 +0000 (21:07 -0700)]
net: mvneta: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mv643xx_eth: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:50 +0000 (21:07 -0700)]
net: mv643xx_eth: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: jme: reject unsupported coalescing params
Jakub Kicinski [Fri, 13 Mar 2020 04:07:49 +0000 (21:07 -0700)]
net: jme: reject unsupported coalescing params

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-phy-split-the-mscc-driver'
David S. Miller [Sun, 15 Mar 2020 04:06:45 +0000 (21:06 -0700)]
Merge branch 'net-phy-split-the-mscc-driver'

Antoine Tenart says:

====================
net: phy: split the mscc driver

This is a proposal to split the MSCC PHY driver, as its code base grew a
lot lately (it's already 3800+ lines). It also supports features
requiring a lot of code (MACsec), which would gain in being split from
the driver core, for readability and maintenance. This is also done as
other features should be coming later, which will also need lots of code
addition.

This series shouldn't change the way the driver works.

I checked, and there were no patch pending on this driver. This change
was done on top of all the modifications done on this driver in net-next.

Since v2:
  - Defined inline functions as static inline.
  - Fixed a locking issue reported by Kbuild.

Since v1:
  - Moved more definitions into the mscc_macsec.h header.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: mscc: fix header defines and descriptions
Antoine Tenart [Fri, 13 Mar 2020 09:48:02 +0000 (10:48 +0100)]
net: phy: mscc: fix header defines and descriptions

Cosmetic commit fixing the MSCC PHY header defines and descriptions,
which were referring the to MSCC Ocelot MAC driver (see
drivers/net/ethernet/mscc/).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: mscc: split the driver into separate files
Antoine Tenart [Fri, 13 Mar 2020 09:48:01 +0000 (10:48 +0100)]
net: phy: mscc: split the driver into separate files

This patch splits the MSCC driver into separate files, per
functionality, to improve readability and maintenance as the codebase
grew a lot. The MACsec code is moved to a dedicated mscc_macsec.c file,
the mscc.c file is renamed to mscc_main.c to keep the driver binary to
be named mscc and common definition are put into a new mscc.h header.

Most of the code was just moved around, except for a few exceptions:
- Header inclusions were reworked to only keep what's needed.
- Three helpers were created in the MACsec code, to avoid #ifdef's in
  the main C file: vsc8584_macsec_init, vsc8584_handle_macsec_interrupt
  and vsc8584_config_macsec_intr.

The patch should not introduce any functional modification.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: move the mscc driver to its own directory
Antoine Tenart [Fri, 13 Mar 2020 09:48:00 +0000 (10:48 +0100)]
net: phy: move the mscc driver to its own directory

The MSCC PHY driver is growing, with lots of space consuming features
(firmware support, full initialization, MACsec...). It's becoming hard
to read and navigate in its source code. This patch moves the MSCC
driver to its own directory, without modifying anything, as a
preparation for splitting up its features into dedicated files.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'RED-Introduce-an-ECN-tail-dropping-mode'
David S. Miller [Sun, 15 Mar 2020 04:03:47 +0000 (21:03 -0700)]
Merge branch 'RED-Introduce-an-ECN-tail-dropping-mode'

Petr Machata says:

====================
RED: Introduce an ECN tail-dropping mode

When the RED qdisc is currently configured to enable ECN, the RED algorithm
is used to decide whether a certain SKB should be marked. If that SKB is
not ECN-capable, it is early-dropped.

It is also possible to keep all traffic in the queue, and just mark the
ECN-capable subset of it, as appropriate under the RED algorithm. Some
switches support this mode, and some installations make use of it.
There is currently no way to put the RED qdiscs to this mode.

Therefore this patchset adds a new RED flag, TC_RED_TAILDROP. When the
qdisc is configured with this flag, non-ECT traffic is enqueued (and
tail-dropped when the queue size is exhausted) instead of being
early-dropped.

Unfortunately, adding a new RED flag is not as simple as it sounds. RED
flags are passed in tc_red_qopt.flags. However RED neglects to validate the
flag field, and just copies it over wholesale to its internal structure,
and later dumps it back.

A broken userspace can therefore configure a RED qdisc with arbitrary
unsupported flags, and later expect to see the flags on qdisc dump. The
current ABI thus allows storage of 5 bits of custom data along with the
qdisc instance.

GRED, SFQ and CHOKE qdiscs are in the same situation. (GRED validates VQ
flags, but not the flags for the main queue.) E.g. if SFQ ever needs to
support TC_RED_ADAPTATIVE, it needs another way of doing it, and at the
same time it needs to retain the possibility to store 6 bits of
uninterpreted data.

For RED, this problem is resolved in patch #2, which adds a new attribute,
and a way to separate flags from userbits that can be reused by other
qdiscs. The flag itself and related behavioral changes are added in patch

To test the new feature, patch #1 first introduces a TDC testsuite that
covers the existing RED flags. Patch #5 later extends it with taildrop
coverage. Patch #6 contains a forwarding selftest for the offloaded
datapath.

To test the SW datapath, I took the mlxsw selftest and adapted it in mostly
obvious ways. The test is stable enough to verify that RED, ECN and ECN
taildrop actually work. However, I have no confidence in its portability to
other people's machines or mildly different configurations. I therefore do
not find it suitable for upstreaming.

GRED and CHOKE can use the same method as RED if they ever need to support
extra flags. SFQ uses the length of TCA_OPTIONS to dispatch on binary
control structure version, and would therefore need a different approach.

v2:
- Patch #1
    - Require nsPlugin in each RED test
    - Match end-of-line to catch cases of more flags reported than
      requested
- Patch #2:
    - Replaced with another patch.
- Patch #3:
    - Fix red_use_taildrop() condition in red_enqueue switch for
      probabilistic case.
- Patch #5:
    - Require nsPlugin in each RED test
    - Match end-of-line to catch cases of more flags reported than
      requested
    - Add a test for creation of non-ECN taildrop, which should fail
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: mlxsw: RED: Test RED ECN nodrop offload
Petr Machata [Thu, 12 Mar 2020 23:11:00 +0000 (01:11 +0200)]
selftests: mlxsw: RED: Test RED ECN nodrop offload

Extend RED testsuite to cover the new nodrop mode of RED-ECN. This test is
really similar to ECN test, diverging only in the last step, where UDP
traffic should go to backlog instead of being dropped. Thus extract a
common helper, ecn_test_common(), make do_ecn_test() into a relatively
simple wrapper, and add another one, do_ecn_nodrop_test().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: qdiscs: RED: Add nodrop tests
Petr Machata [Thu, 12 Mar 2020 23:10:59 +0000 (01:10 +0200)]
selftests: qdiscs: RED: Add nodrop tests

Add tests for the new "nodrop" flag.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: spectrum_qdisc: Offload RED ECN nodrop mode
Petr Machata [Thu, 12 Mar 2020 23:10:58 +0000 (01:10 +0200)]
mlxsw: spectrum_qdisc: Offload RED ECN nodrop mode

RED ECN nodrop mode means that non-ECT traffic should not be early-dropped,
but enqueued normally instead. In Spectrum systems, this is achieved by
disabling CWTPM.ew (enable WRED) for a given traffic class.

So far CWTPM.ew was unconditionally enabled. Instead disable it when the
RED qdisc is in nodrop mode.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sched: RED: Introduce an ECN nodrop mode
Petr Machata [Thu, 12 Mar 2020 23:10:57 +0000 (01:10 +0200)]
net: sched: RED: Introduce an ECN nodrop mode

When the RED Qdisc is currently configured to enable ECN, the RED algorithm
is used to decide whether a certain SKB should be marked. If that SKB is
not ECN-capable, it is early-dropped.

It is also possible to keep all traffic in the queue, and just mark the
ECN-capable subset of it, as appropriate under the RED algorithm. Some
switches support this mode, and some installations make use of it.

To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is
configured with this flag, non-ECT traffic is enqueued instead of being
early-dropped.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sched: Allow extending set of supported RED flags
Petr Machata [Thu, 12 Mar 2020 23:10:56 +0000 (01:10 +0200)]
net: sched: Allow extending set of supported RED flags

The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool
of global RED flags. These are passed in tc_red_qopt.flags. However none of
these qdiscs validate the flag field, and just copy it over wholesale to
internal structures, and later dump it back. (An exception is GRED, which
does validate for VQs -- however not for the main setup.)

A broken userspace can therefore configure a qdisc with arbitrary
unsupported flags, and later expect to see the flags on qdisc dump. The
current ABI therefore allows storage of several bits of custom data to
qdisc instances of the types mentioned above. How many bits, depends on
which flags are meaningful for the qdisc in question. E.g. SFQ recognizes
flags ECN and HARDDROP, and the rest is not interpreted.

If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it,
and at the same time it needs to retain the possibility to store 6 bits of
uninterpreted data. Likewise RED, which adds a new flag later in this
patchset.

To that end, this patch adds a new function, red_get_flags(), to split the
passed flags of RED-like qdiscs to flags and user bits, and
red_validate_flags() to validate the resulting configuration. It further
adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests: qdiscs: Add TDC test for RED
Petr Machata [Thu, 12 Mar 2020 23:10:55 +0000 (01:10 +0200)]
selftests: qdiscs: Add TDC test for RED

Add a handful of tests for creating RED with different flags.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosfc: support configuring vf spoofchk on EF10 VFs
Edward Cree [Thu, 12 Mar 2020 19:21:39 +0000 (19:21 +0000)]
sfc: support configuring vf spoofchk on EF10 VFs

Corresponds to the MAC_SPOOFING_TX privilege in the hardware.
Some firmware versions on some cards don't support the feature, so check
 the TX_MAC_SECURITY capability and fail EOPNOTSUPP if trying to enable
 spoofchk on a NIC that doesn't support it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-phy-XLGMII-define-and-usage-in-PHYLINK'
David S. Miller [Sun, 15 Mar 2020 03:55:12 +0000 (20:55 -0700)]
Merge branch 'net-phy-XLGMII-define-and-usage-in-PHYLINK'

Jose Abreu says:

====================
net: phy: XLGMII define and usage in PHYLINK

Adds XLGMII defines and usage in PHYLINK.

Patch 1/2, adds the define for it, whilst 2/2 adds the usage of it in
PHYLINK.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: Add XLGMII support
Jose Abreu [Thu, 12 Mar 2020 17:10:10 +0000 (18:10 +0100)]
net: phylink: Add XLGMII support

Add XLGMII interface and the list of XLGMII speeds to PHYLINK.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: Add XLGMII interface define
Jose Abreu [Thu, 12 Mar 2020 17:10:09 +0000 (18:10 +0100)]
net: phy: Add XLGMII interface define

Add a define for XLGMII interface.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: ethtool: clean up minor indentation issue
Colin Ian King [Thu, 12 Mar 2020 14:05:22 +0000 (14:05 +0000)]
net: ena: ethtool: clean up minor indentation issue

There is a statement that is indented incorrectly, remove a space.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: sja1105: move MAC configuration to .phylink_mac_link_up
Vladimir Oltean [Thu, 12 Mar 2020 12:19:51 +0000 (12:19 +0000)]
net: dsa: sja1105: move MAC configuration to .phylink_mac_link_up

The switches supported so far by the driver only have non-SerDes ports,
so they should be configured in the PHYLINK callback that provides the
resolved PHY link parameters.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocxgb4: update T5/T6 adapter register ranges
Shahjada Abul Husain [Thu, 12 Mar 2020 11:42:40 +0000 (17:12 +0530)]
cxgb4: update T5/T6 adapter register ranges

Add more T5/T6 registers to be collected in register dump:

1. MPS register range 0x9810 to 0x9864 and 0xd000 to 0xd004.
2. NCSI register range 0x1a114 to 0x1a130 and 0x1a138 to 0x1a1c4.

Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag 'mlx5-updates-2020-03-13' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 14 Mar 2020 04:04:03 +0000 (21:04 -0700)]
Merge tag 'mlx5-updates-2020-03-13' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-03-13

Misc update to mlx5 core and E-Switch driver:

1) Blue-Field, Update VF vports config when num of VFs changed

From Bodon, Various misc cleanups and refactoring
for vport enabling/disabling routines to allow them to be called
dynamically and not only on E-Switch load.

This will allow ECPF (ConnectX BlueField Smartnic) support for dynamic
num vf changes and dynamic vport creation and configuration as introduced
in "Update VF vports config when num of VFs changed" patch.

2) From Parav and Mark, trivial clean-ups.

3) Software steering support for flow table id as destination
and a clean-up patch to remove unnecessary function stubs, from Alex.
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Sat, 14 Mar 2020 03:52:03 +0000 (20:52 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 86 non-merge commits during the last 12 day(s) which contain
a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).

The main changes are:

1) Add modify_return attach type which allows to attach to a function via
   BPF trampoline and is run after the fentry and before the fexit programs
   and can pass a return code to the original caller, from KP Singh.

2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher
   objects to be visible in /proc/kallsyms so they can be annotated in
   stack traces, from Jiri Olsa.

3) Extend BPF sockmap to allow for UDP next to existing TCP support in order
   in order to enable this for BPF based socket dispatch, from Lorenz Bauer.

4) Introduce a new bpftool 'prog profile' command which attaches to existing
   BPF programs via fentry and fexit hooks and reads out hardware counters
   during that period, from Song Liu. Example usage:

   bpftool prog profile id 337 duration 3 cycles instructions llc_misses

        4228 run_cnt
     3403698 cycles                                              (84.08%)
     3525294 instructions   #  1.04 insn per cycle               (84.05%)
          13 llc_misses     #  3.69 LLC misses per million isns  (83.50%)

5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition
   of a new bpf_link abstraction to keep in particular BPF tracing programs
   attached even when the applicaion owning them exits, from Andrii Nakryiko.

6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering
   and which returns the PID as seen by the init namespace, from Carlos Neira.

7) Refactor of RISC-V JIT code to move out common pieces and addition of a
   new RV32G BPF JIT compiler, from Luke Nelson.

8) Add gso_size context member to __sk_buff in order to be able to know whether
   a given skb is GSO or not, from Willem de Bruijn.

9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output
   implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/mlx5: DR, Remove unneeded functions deceleration
Alex Vesker [Sun, 8 Mar 2020 11:21:41 +0000 (13:21 +0200)]
net/mlx5: DR, Remove unneeded functions deceleration

Remove dummy functions declaration, the dummy functions are not needed
since fs_dr is the only one to call mlx5dr and both fs_dr and dr files
depend on the same config flag (MLX5_SW_STEERING).

Fixes: 70605ea545e8 ("net/mlx5: DR, Expose APIs for direct rule managing")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: DR, Add support for flow table id destination action
Alex Vesker [Wed, 26 Feb 2020 09:39:45 +0000 (11:39 +0200)]
net/mlx5: DR, Add support for flow table id destination action

This action allows to go to a flow table based on the table id.
Goto flow table id is required for supporting user space SW.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: Avoid deriving mlx5_core_dev second time
Parav Pandit [Wed, 18 Dec 2019 05:16:11 +0000 (23:16 -0600)]
net/mlx5: Avoid deriving mlx5_core_dev second time

All callers needs to work on mlx5_core_dev and it is already derived
before calling mlx5_devlink_eswitch_check().
Hence, accept mlx5_core_dev in mlx5_devlink_eswitch_check().

Given that it works on mlx5_core_dev change helper function name to
drop devlink prefix.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-switch, Annotate esw state_lock mutex destroy
Parav Pandit [Wed, 18 Dec 2019 04:51:24 +0000 (22:51 -0600)]
net/mlx5: E-switch, Annotate esw state_lock mutex destroy

Invoke mutex_destroy() to catch any esw state_lock errors.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-switch, Annotate termtbl_mutex mutex destroy
Parav Pandit [Sat, 14 Dec 2019 09:24:25 +0000 (03:24 -0600)]
net/mlx5: E-switch, Annotate termtbl_mutex mutex destroy

Annotate mutex destroy to keep it symmetric to init sequence.
It should be destroyed after its users (representor netdevices) are
destroyed in below flow.

esw_offloads_disable()
  esw_offloads_unload_rep()

Hence, initialize the mutex before creating the representors which uses
it.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: Accept flow rules without match
Mark Bloch [Fri, 17 Jan 2020 18:30:32 +0000 (18:30 +0000)]
net/mlx5: Accept flow rules without match

Allow passing NULL spec when creating a flow rule. Such rules will act
as "catch all" flow rules.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Refactor unload all reps per rep type
Bodong Wang [Tue, 12 Nov 2019 17:56:12 +0000 (11:56 -0600)]
net/mlx5: E-Switch, Refactor unload all reps per rep type

Following introduction of per vport configuration of vport and rep,
unload all reps per rep type is still needed as IB reps can be
unloaded individually. However, a few internal functions exist purely
for this purpose, merge them to a single function.

This patch doesn't change any existing functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Update VF vports config when num of VFs changed
Bodong Wang [Tue, 12 Nov 2019 17:30:10 +0000 (11:30 -0600)]
net/mlx5: E-Switch, Update VF vports config when num of VFs changed

Currently, ECPF eswitch manager does one-time only configuration for
VF vports when device switches to offloads mode. However, when num of
VFs changed from host side, driver doesn't update VF vports
configurations.

Hence, perform VFs vport configuration update whenever num_vfs change
event occurs.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Introduce per vport configuration for eswitch modes
Bodong Wang [Mon, 11 Nov 2019 22:40:35 +0000 (16:40 -0600)]
net/mlx5: E-Switch, Introduce per vport configuration for eswitch modes

Both legacy and offload modes require vport setup, only offload mode
requires rep setup. Before this patch, vport and rep operations are
separated applied to all relevant vports in different stages.

Change to use per vport configuration, so that vport and rep operations
are modularized per vport.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-switch, Make vport setup/cleanup sequence symmetric
Bodong Wang [Wed, 16 Oct 2019 16:19:34 +0000 (11:19 -0500)]
net/mlx5: E-switch, Make vport setup/cleanup sequence symmetric

Vport enable and disable sequence is incorrect. It should be:
  enable()
  esw_vport_setup_acl,
  esw_vport_setup,
  esw_vport_enable_qos.

  disable()
  esw_vport_disable_qos,
  esw_vport_cleanup,
  esw_vport_cleanup_acl.

Instead of having two setup functions for port and acl, merge
acl setup to port setup function.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Prepare for vport enable/disable refactor
Bodong Wang [Fri, 30 Aug 2019 20:41:09 +0000 (15:41 -0500)]
net/mlx5: E-Switch, Prepare for vport enable/disable refactor

Rename esw_apply_vport_config() to esw_vport_setup(), and add new
helper function esw_vport_cleanup() to make them symmetric.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Remove redundant warning when QoS enable failed
Bodong Wang [Thu, 17 Oct 2019 19:55:52 +0000 (14:55 -0500)]
net/mlx5: E-Switch, Remove redundant warning when QoS enable failed

esw_vport_enable_qos can return error in cases below:
1. QoS is already enabled. Warnning is useless in this case.
2. Create scheduling element cmd failed. There is already a warning.

Remove the redundant warnning if esw_vport_enable_qos returns err.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Hold mutex when querying drop counter in legacy mode
Bodong Wang [Fri, 13 Sep 2019 21:24:19 +0000 (16:24 -0500)]
net/mlx5: E-Switch, Hold mutex when querying drop counter in legacy mode

Consider scenario below, CPU 1 is at risk to query already destroyed
drop counters. Need to apply the same state mutex when disabling vport.

+-------------------------------+-------------------------------------+
| CPU 0                         | CPU 1                               |
+-------------------------------+-------------------------------------+
| mlx5_device_disable_sriov     | mlx5e_get_vf_stats                  |
| mlx5_eswitch_disable          | mlx5_eswitch_get_vport_stats        |
| esw_disable_vport             | mlx5_eswitch_query_vport_drop_stats |
| mlx5_fc_destroy(drop_counter) | mlx5_fc_query(drop_counter)         |
+-------------------------------+-------------------------------------+

Fixes: b8a0dbe3a90b ("net/mlx5e: E-switch, Add steering drop counters")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: E-Switch, Remove redundant check of eswitch manager cap
Bodong Wang [Thu, 2 Jan 2020 21:30:52 +0000 (15:30 -0600)]
net/mlx5: E-Switch, Remove redundant check of eswitch manager cap

esw_vport_create_legacy_acl_tables bails out immediately for eswitch
manager, hence remove all the check of esw manager cap after.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agoMerge branch 'bpf-core-fixes'
Daniel Borkmann [Fri, 13 Mar 2020 22:30:53 +0000 (23:30 +0100)]
Merge branch 'bpf-core-fixes'

Andrii Nakryiko says:

====================
This patch set fixes bug in CO-RE relocation candidate finding logic, which
currently allows matching against forward declarations, functions, and other
named types, even though it makes no sense to even attempt. As part of
verifying the fix, add test using vmlinux.h with preserve_access_index
attribute and utilizing struct pt_regs heavily to trace nanosleep syscall
using 5 different types of tracing BPF programs.

This test also demonstrated problems using struct pt_regs in syscall
tracepoints and required a new set of macro, which were added in patch #3
into bpf_tracing.h.

Patch #1 fixes annoying issue with selftest failure messages being out of
sync.

v1->v2:
  - drop unused handle__probed() function (Martin).
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>