review.tizen.org Git - platform/kernel/linux-rpi.git/log

tools/bpftool: Fix error handing in do_skeleton()

Fix pass 0 to PTR_ERR, also dump more err info using
libbpf_strerror.

Fixes: 5dc7a8b21144 ("bpftool, selftests/bpf: Embed object file inside skeleton")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200717123059.29624-1-yuehaibing@huawei.com

libbpf bpf_helpers: Use __builtin_offsetof for offsetof

The non-builtin route for offsetof has a dependency on size_t from
stdlib.h/stdint.h that is undeclared and may break targets.
The offsetof macro in bpf_helpers may disable the same macro in other
headers that have a #ifdef offsetof guard. Rather than add additional
dependencies improve the offsetof macro declared here to use the
builtin that is available since llvm 3.7 (the first with a BPF backend).

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200720061741.1514673-1-irogers@google.com

s390/bpf: Use bpf_skip() in bpf_jit_prologue()

Now that we have bpf_skip() for emitting nops, use it in
bpf_jit_prologue() in order to reduce code duplication.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-6-iii@linux.ibm.com

s390/bpf: Tolerate not converging code shrinking

"BPF_MAXINSNS: Maximum possible literals" unnecessarily falls back to
the interpreter because of failing sanity check in bpf_set_addr. The
problem is that there are a lot of branches that can be shrunk, and
doing so opens up the possibility to shrink even more. This process
does not converge after 3 passes, causing code offsets to change during
the codegen pass, which must never happen.

Fix by inserting nops during codegen pass in order to preserve code
offets.

Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-5-iii@linux.ibm.com

s390/bpf: Use brcl for jumping to exit_ip if necessary

"BPF_MAXINSNS: Maximum possible literals" test causes panic with
bpf_jit_harden = 2. The reason is that BPF_JMP | BPF_EXIT is always
emitted as brc, however, after removal of JITed image size
limitations, brcl might be required.

Fix by using brcl when necessary.

Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-4-iii@linux.ibm.com

s390/bpf: Fix sign extension in branch_ku

Both signed and unsigned variants of BPF_JMP | BPF_K require
sign-extending the immediate. JIT emits cgfi for the signed case,
which is correct, and clgfi for the unsigned case, which is not
correct: clgfi zero-extends the immediate.

s390 does not provide an instruction that does sign-extension and
unsigned comparison at the same time. Therefore, fix by first loading
the sign-extended immediate into work register REG_1 and proceeding
as if it's BPF_X.

Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Link: https://lore.kernel.org/bpf/20200717165326.6786-3-iii@linux.ibm.com

selftests: bpf: test_kmod.sh: Fix running out of srctree

When running out of srctree, relative path to lib/test_bpf.ko is
different than when running in srctree. Check $building_out_of_srctree
environment variable and use a different relative path if needed.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-2-iii@linux.ibm.com

bpf: cpumap: Fix possible rcpu kthread hung

Fix the following cpumap kthread hung. The issue is currently occurring
when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not
BPF_XDP_CPUMAP as expected_attach_type)

$./test_progs -n 101
101/1 cpumap_with_progs:OK
101 xdp_cpumap_attach:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
[  369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds.
[  369.998463]       Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212
[  370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.001918] cpumap/0/map:7  D    0   205      2 0x00004000
[  370.003228] Call Trace:
[  370.003930]  __schedule+0x5c7/0xf50
[  370.004901]  ? io_schedule_timeout+0xb0/0xb0
[  370.005934]  ? static_obj+0x31/0x80
[  370.006788]  ? mark_held_locks+0x24/0x90
[  370.007752]  ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0
[  370.008930]  schedule+0x6f/0x160
[  370.009728]  schedule_preempt_disabled+0x14/0x20
[  370.010829]  kthread+0x17b/0x240
[  370.011433]  ? kthread_create_worker_on_cpu+0xd0/0xd0
[  370.011944]  ret_from_fork+0x1f/0x30
[  370.012348]
               Showing all locks held in the system:
[  370.013025] 1 lock held by khungtaskd/33:
[  370.013432]  #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3

[  370.014461] =============================================

Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/e54f2aabf959f298939e5507b09c48f8c2e380be.1595170625.git.lorenzo@kernel.org

bpf, netns: Fix build without CONFIG_INET

When CONFIG_NET is set but CONFIG_INET isn't, build fails with:

  ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_unneed':
  kernel/bpf/net_namespace.c:32: undefined reference to `bpf_sk_lookup_enabled'
  ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_need':
  kernel/bpf/net_namespace.c:43: undefined reference to `bpf_sk_lookup_enabled'

This is because without CONFIG_INET bpf_sk_lookup_enabled symbol is not
available. Wrap references to bpf_sk_lookup_enabled with preprocessor
conditionals.

Fixes: 1559b4aa1db4 ("inet: Run SK_LOOKUP BPF program on socket lookup")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/bpf/20200721100716.720477-1-jakub@cloudflare.com

Merge branch 'bpf-socket-lookup'

Jakub Sitnicki says:

====================
Changelog
=========
v4 -> v5:
- Enforce BPF prog return value to be SK_DROP or SK_PASS. (Andrii)
- Simplify prog runners now that only SK_DROP/PASS can be returned.
- Enable bpf_perf_event_output from the start. (Andrii)
- Drop patch
  "selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c"
- Remove tests for narrow loads from context at an offset wider in size
  than target field, while we are discussing how to fix it:
  https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/
- Rebase onto recent bpf-next (bfdfa51702de)
- Other minor changes called out in per-patch changelogs,
  see patches: 2, 4, 6, 13-15
- Carried over Andrii's Acks where nothing changed.

v3 -> v4:
- Reduce BPF prog return codes to SK_DROP/SK_PASS (Lorenz)
- Default to drop on illegal return value from BPF prog (Lorenz)
- Extend bpf_sk_assign to accept NULL socket pointer.
- Switch to saner return values and add docs for new prog_array API (Andrii)
- Add support for narrow loads from BPF context fields (Yonghong)
- Fix broken build when IPv6 is compiled as a module (kernel test robot)
- Fix null/wild-ptr-deref on BPF context access
- Rebase to recent bpf-next (eef8a42d6ce0)
- Other minor changes called out in per-patch changelogs,
  see patches 1-2, 4, 6, 8, 10-12, 14, 16

v2 -> v3:
- Switch to link-based program attachment
- Support for multi-prog attachment
- Ability to skip reuseport socket selection
- Code on RX path is guarded by a static key
- struct in6_addr's are no longer copied into BPF prog context
- BPF prog context is initialized as late as possible
- Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
- Patches dropped:
  01/17 flow_dissector: Extract attach/detach/query helpers
  03/17 inet: Store layer 4 protocol in inet_hashinfo
  08/17 udp: Store layer 4 protocol in udp_table

v1 -> v2:
- Changes called out in patches 2, 13-15, 17
- Rebase to recent bpf-next (b4563facdcae)

RFCv2 -> v1:

- Switch to fetching a socket from a map and selecting a socket with
  bpf_sk_assign, instead of having a dedicated helper that does both.
- Run reuseport logic on sockets selected by BPF sk_lookup.
- Allow BPF sk_lookup to fail the lookup with no match.
- Go back to having just 2 hash table lookups in UDP.

RFCv1 -> RFCv2:

- Make socket lookup redirection map-based. BPF program now uses a
  dedicated helper and a SOCKARRAY map to select the socket to redirect to.
  A consequence of this change is that bpf_inet_lookup context is now
  read-only.
- Look for connected UDP sockets before allowing redirection from BPF.
  This makes connected UDP socket work as expected in the presence of
  inet_lookup prog.
- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
  the only other per-netns BPF prog type.

Overview
========

This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
or BPF sk_lookup for short.

BPF sk_lookup program runs when transport layer is looking up a listening
socket for a new connection request (TCP), or when looking up an
unconnected socket for a packet (UDP).

This serves as a mechanism to overcome the limits of what bind() API allows
to express. Two use-cases driving this work are:

(1) steer packets destined to an IP range, fixed port to a single socket

     192.0.2.0/24, port 80 -> NGINX socket

(2) steer packets destined to an IP address, any port to a single socket

     198.51.100.1, any port -> L7 proxy socket

In its context, program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection, and returns SK_PASS code. Transport layer
then uses the selected socket as a result of socket lookup.

Alternatively, program can also fail the lookup (SK_DROP), or let the
lookup continue as usual (SK_PASS without selecting a socket).

This lets the user match packets with listening (TCP) or receiving (UDP)
sockets freely at the last possible point on the receive path, where we
know that packets are destined for local delivery after undergoing
policing, filtering, and routing.

Program is attached to a network namespace, similar to BPF flow_dissector.
We add a new attach type, BPF_SK_LOOKUP, for this. Multiple programs can be
attached at the same time, in which case their return values are aggregated
according the rules outlined in patch #4 description.

Series structure
================

Patches are organized as so:

1: enables multiple link-based prog attachments for bpf-netns
2: introduces sk_lookup program type
3-4: hook up the program to run on ipv4/tcp socket lookup
5-6: hook up the program to run on ipv6/tcp socket lookup
7-8: hook up the program to run on ipv4/udp socket lookup
9-10: hook up the program to run on ipv6/udp socket lookup
11-13: libbpf & bpftool support for sk_lookup
14-15: verifier and selftests for sk_lookup

Patches are also available on GH:

  https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v5

Follow-up work
==============

I'll follow up with below items, which IMHO don't block the review:

- benchmark results for udp6 small packet flood scenario,
- user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst,
- timeout for accept() in tests after extending network_helper.[ch].

Thanks to the reviewers for their feedback to this patch series:

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
-jkbs

[RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
[RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
[v1] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@cloudflare.com/
[v2] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/
[v3] https://lore.kernel.org/bpf/20200702092416.11961-1-jakub@cloudflare.com/
[v4] https://lore.kernel.org/bpf/20200713174654.642628-1-jakub@cloudflare.com/
====================

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Tests for BPF_SK_LOOKUP attach point

Add tests to test_progs that exercise:

- attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
- redirecting socket lookup to a socket selected by BPF program,
- failing a socket lookup on BPF program's request,
- error scenarios for selecting a socket from BPF program,
- accessing BPF program context,
- attaching and running multiple BPF programs.

Run log:

  bash-5.0# ./test_progs -n 70
  #70/1 query lookup prog:OK
  #70/2 TCP IPv4 redir port:OK
  #70/3 TCP IPv4 redir addr:OK
  #70/4 TCP IPv4 redir with reuseport:OK
  #70/5 TCP IPv4 redir skip reuseport:OK
  #70/6 TCP IPv6 redir port:OK
  #70/7 TCP IPv6 redir addr:OK
  #70/8 TCP IPv4->IPv6 redir port:OK
  #70/9 TCP IPv6 redir with reuseport:OK
  #70/10 TCP IPv6 redir skip reuseport:OK
  #70/11 UDP IPv4 redir port:OK
  #70/12 UDP IPv4 redir addr:OK
  #70/13 UDP IPv4 redir with reuseport:OK
  #70/14 UDP IPv4 redir skip reuseport:OK
  #70/15 UDP IPv6 redir port:OK
  #70/16 UDP IPv6 redir addr:OK
  #70/17 UDP IPv4->IPv6 redir port:OK
  #70/18 UDP IPv6 redir and reuseport:OK
  #70/19 UDP IPv6 redir skip reuseport:OK
  #70/20 TCP IPv4 drop on lookup:OK
  #70/21 TCP IPv6 drop on lookup:OK
  #70/22 UDP IPv4 drop on lookup:OK
  #70/23 UDP IPv6 drop on lookup:OK
  #70/24 TCP IPv4 drop on reuseport:OK
  #70/25 TCP IPv6 drop on reuseport:OK
  #70/26 UDP IPv4 drop on reuseport:OK
  #70/27 TCP IPv6 drop on reuseport:OK
  #70/28 sk_assign returns EEXIST:OK
  #70/29 sk_assign honors F_REPLACE:OK
  #70/30 sk_assign accepts NULL socket:OK
  #70/31 access ctx->sk:OK
  #70/32 narrow access to ctx v4:OK
  #70/33 narrow access to ctx v6:OK
  #70/34 sk_assign rejects TCP established:OK
  #70/35 sk_assign rejects UDP connected:OK
  #70/36 multi prog - pass, pass:OK
  #70/37 multi prog - drop, drop:OK
  #70/38 multi prog - pass, drop:OK
  #70/39 multi prog - drop, pass:OK
  #70/40 multi prog - pass, redir:OK
  #70/41 multi prog - redir, pass:OK
  #70/42 multi prog - drop, redir:OK
  #70/43 multi prog - redir, drop:OK
  #70/44 multi prog - redir, redir:OK
  #70 sk_lookup:OK
  Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-16-jakub@cloudflare.com

selftests/bpf: Add verifier tests for bpf_sk_lookup context access

Exercise verifier access checks for bpf_sk_lookup context fields.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-15-jakub@cloudflare.com

tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type

Make bpftool show human-friendly identifiers for newly introduced program
and attach type, BPF_PROG_TYPE_SK_LOOKUP and BPF_SK_LOOKUP, respectively.

Also, add the new prog type bash-completion, man page and help message.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-14-jakub@cloudflare.com

libbpf: Add support for SK_LOOKUP program type

Make libbpf aware of the newly added program type, and assign it a
section name.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-13-jakub@cloudflare.com

bpf: Sync linux/bpf.h to tools/

Newly added program, context type and helper is used by tests in a
subsequent patch. Synchronize the header file.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-12-jakub@cloudflare.com

udp6: Run SK_LOOKUP BPF program on socket lookup

Same as for udp4, let BPF program override the socket lookup result, by
selecting a receiving socket of its choice or failing the lookup, if no
connected UDP socket matched packet 4-tuple.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-11-jakub@cloudflare.com

udp6: Extract helper for selecting socket from reuseport group

Prepare for calling into reuseport from __udp6_lib_lookup as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-10-jakub@cloudflare.com

udp: Run SK_LOOKUP BPF program on socket lookup

Following INET/TCP socket lookup changes, modify UDP socket lookup to let
BPF program select a receiving socket before searching for a socket by
destination address and port as usual.

Lookup of connected sockets that match packet 4-tuple is unaffected by this
change. BPF program runs, and potentially overrides the lookup result, only
if a 4-tuple match was not found.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-9-jakub@cloudflare.com

udp: Extract helper for selecting socket from reuseport group

Prepare for calling into reuseport from __udp4_lib_lookup as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-8-jakub@cloudflare.com

inet6: Run SK_LOOKUP BPF program on socket lookup

Following ipv4 stack changes, run a BPF program attached to netns before
looking up a listening socket. Program can return a listening socket to use
as result of socket lookup, fail the lookup, or take no action.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-7-jakub@cloudflare.com

inet6: Extract helper for selecting socket from reuseport group

Prepare for calling into reuseport from inet6_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-6-jakub@cloudflare.com

inet: Run SK_LOOKUP BPF program on socket lookup

Run a BPF program before looking up a listening socket on the receive path.
Program selects a listening socket to yield as result of socket lookup by
calling bpf_sk_assign() helper and returning SK_PASS code. Program can
revert its decision by assigning a NULL socket with bpf_sk_assign().

Alternatively, BPF program can also fail the lookup by returning with
SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
no socket has been selected with bpf_sk_assign().

This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.

In case multiple programs are attached, they are run in series in the order
in which they were attached. The end result is determined from return codes
of all the programs according to following rules:

1. If any program returned SK_PASS and selected a valid socket, the socket
    is used as result of socket lookup.
2. If more than one program returned SK_PASS and selected a socket,
    last selection takes effect.
3. If any program returned SK_DROP, and no program returned SK_PASS and
    selected a socket, socket lookup fails with -ECONNREFUSED.
4. If all programs returned SK_PASS and none of them selected a socket,
    socket lookup continues to htable-based lookup.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com

inet: Extract helper for selecting socket from reuseport group

Prepare for calling into reuseport from __inet_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-4-jakub@cloudflare.com

bpf: Introduce SK_LOOKUP program type with a dedicated attach point

Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:

(1) steer packets destined to an IP range, on fixed port to a socket

192.0.2.0/24, port 80 -> NGINX socket

(2) steer packets destined to an IP address, on any port to a socket

198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com

bpf, netns: Handle multiple link attachments

Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
prog_array at given position when link gets attached/updated/released.

This let's us lift the limit of having just one link attached for the new
attach type introduced by subsequent patch.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com

bpf: Drop duplicated words in uapi helper comments

Drop doubled words "will" and "attach".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/6b9f71ae-4f8e-0259-2c5d-187ddaefe6eb@infradead.org

selftests/bpf: Fix possible hang in sockopt_inherit

Andrii reported that sockopt_inherit occasionally hangs up on 5.5 kernel [0].
This can happen if server_thread runs faster than the main thread.
In that case, pthread_cond_wait will wait forever because
pthread_cond_signal was executed before the main thread was blocking.
Let's move pthread_mutex_lock up a bit to make sure server_thread
runs strictly after the main thread goes to sleep.

(Not sure why this is 5.5 specific, maybe scheduling is less
deterministic? But I was able to confirm that it does indeed
happen in a VM.)

[0] https://lore.kernel.org/bpf/CAEf4BzY0-bVNHmCkMFPgObs=isUAyg-dFzGDY7QWYkmm7rmTSg@mail.gmail.com/

Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200715224107.3591967-1-sdf@google.com

bpf: revert "test_bpf: Flag tests that cannot be jited on s390"

This reverts commit 3203c9010060 ("test_bpf: flag tests that cannot
be jited on s390").

The s390 bpf JIT previously had a restriction on the maximum program
size, which required some tests in test_bpf to be flagged as expected
failures. The program size limitation has been removed, and the tests
now pass, so these tests should no longer be flagged.

Fixes: d1242b10ff03 ("s390/bpf: Remove JITed image size limitations")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20200716143931.330122-1-seth.forshee@canonical.com

selftest: Add tests for XDP programs in CPUMAP entries

Similar to what have been done for DEVMAP, introduce tests to verify
ability to add a XDP program to an entry in a CPUMAP.
Verify CPUMAP programs can not be attached to devices as a normal
XDP program, and only programs with BPF_XDP_CPUMAP attach type can
be loaded in a CPUMAP.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/9c632fcea5382ea7b4578bd06b6eddf382c3550b.1594734381.git.lorenzo@kernel.org

samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap

Extend xdp_redirect_cpu_{usr,kern}.c adding the possibility to load
a XDP program on cpumap entries. The following options have been added:
- mprog-name: cpumap entry program name
- mprog-filename: cpumap entry program filename
- redirect-device: output interface if the cpumap program performs a
XDP_REDIRECT to an egress interface
- redirect-map: bpf map used to perform XDP_REDIRECT to an egress
interface
- mprog-disable: disable loading XDP program on cpumap entries

Add xdp_pass, xdp_drop, xdp_redirect stats accounting

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/aa5a9a281b9dac425620fdabe82670ffb6bbdb92.1594734381.git.lorenzo@kernel.org

libbpf: Add SEC name for xdp programs attached to CPUMAP

As for DEVMAP, support SEC("xdp_cpumap/") as a short cut for loading
the program with type BPF_PROG_TYPE_XDP and expected attach type
BPF_XDP_CPUMAP.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/33174c41993a6d860d9c7c1f280a2477ee39ed11.1594734381.git.lorenzo@kernel.org

bpf: cpumap: Implement XDP_REDIRECT for eBPF programs attached to map entries

Introduce XDP_REDIRECT support for eBPF programs attached to cpumap
entries.
This patch has been tested on Marvell ESPRESSObin using a modified
version of xdp_redirect_cpu sample in order to attach a XDP program
to CPUMAP entries to perform a redirect on the mvneta interface.
In particular the following scenario has been tested:

rq (cpu0) --> mvneta - XDP_REDIRECT (cpu0) --> CPUMAP - XDP_REDIRECT (cpu1) --> mvneta

$./xdp_redirect_cpu -p xdp_cpu_map0 -d eth0 -c 1 -e xdp_redirect \
-f xdp_redirect_kern.o -m tx_port -r eth0

tx: 285.2 Kpps rx: 285.2 Kpps

Attaching a simple XDP program on eth0 to perform XDP_TX gives
comparable results:

tx: 288.4 Kpps rx: 288.4 Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/2cf8373a731867af302b00c4ff16c122630c4980.1594734381.git.lorenzo@kernel.org

bpf: cpumap: Add the possibility to attach an eBPF program to cpumap

Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define on
which CPU run the eBPF program if the underlying hw does not support
RSS. Current supported verdicts are XDP_DROP and XDP_PASS.

This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:

$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org

cpumap: Formalize map value as a named struct

As it has been already done for devmap, introduce 'struct bpf_cpumap_val'
to formalize the expected values that can be passed in for a CPUMAP.
Update cpumap code to use the struct.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org

samples/bpf: xdp_redirect_cpu_user: Do not update bpf maps in option loop

Do not update xdp_redirect_cpu maps running while option loop but
defer it after all available options have been parsed. This is a
preliminary patch to pass the program name we want to attach to the
map entries as a user option

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/95dc46286fd2c609042948e04bb7ae1f5b425538.1594734381.git.lorenzo@kernel.org

net: Refactor xdp_convert_buff_to_frame

Move the guts of xdp_convert_buff_to_frame to a new helper,
xdp_update_frame_from_buff so it can be reused removing code duplication

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/90a68c283d7ebeb48924934c9b7ac79492300472.1594734381.git.lorenzo@kernel.org

cpumap: Use non-locked version __ptr_ring_consume_batched

Commit 77361825bb01 ("bpf: cpumap use ptr_ring_consume_batched") changed
away from using single frame ptr_ring dequeue (__ptr_ring_consume) to
consume a batched, but it uses a locked version, which as the comment
explain isn't needed.

Change to use the non-locked version __ptr_ring_consume_batched.

Fixes: 77361825bb01 ("bpf: cpumap use ptr_ring_consume_batched")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/a9c7d06f9a009e282209f0c8c7b2c5d9b9ad60b9.1594734381.git.lorenzo@kernel.org

net: ipv6: drop duplicate word in comment

Drop the doubled word "by" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sctp: drop duplicate words in comments

Drop doubled words in several comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ip6_fib.h: drop duplicate word in comment

Drop doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa.h: drop duplicate word in comment

Drop doubled word "to" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: caif: drop duplicate words in comments

Drop doubled words "or" and "the" in several comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: 9p: drop duplicate word in comment

Drop doubled word "not" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wimax: fix duplicate words in comments

Drop doubled words in two comments.
Fix a spello/typo.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: skbuff.h: drop duplicate words in comments

Drop doubled words in several comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: qed: drop duplicate words in comments

Drop doubled word "the" in two comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drivers: net: wan: Fix trivial spelling

The word 'descriptor' is misspelled throughout the tree.

Fix it up accordingly:
decriptor -> descriptor

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-reg-add-policer-bandwidth-limits'

Ido Schimmel says:

====================
mlxsw: Offload tc police action

This patch set adds support for tc police action in mlxsw.

Patches #1-#2 add defines for policer bandwidth limits and resource
identifiers (e.g., maximum number of policers).

Patch #3 adds a common policer core in mlxsw. Currently it is only used
by the policy engine, but future patch sets will use it for trap
policers and storm control policers. The common core allows us to share
common logic between all policer types and abstract certain details from
the various users in mlxsw.

Patch #4 exposes the maximum number of supported policers and their
current usage to user space via devlink-resource. This provides better
visibility and also used for selftests purposes.

Patches #5-#7 gradually add support for tc police action in the policy
engine by calling into previously mentioned policer core.

Patch #8 adds a generic selftest for tc-police that can be used with
veth pairs or physical loopbacks.

Patches #9-#11 add mlxsw-specific selftests.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: Test policers' occupancy

Test that policers shared by different tc filters are correctly
reference counted by observing policers' occupancy via devlink-resource.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: Add scale test for tc-police

Query the maximum number of supported policers using devlink-resource
and test that this number can be reached by configuring tc filters with
police action. Test that an error is returned in case the maximum number
is exceeded.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: tc_restrictions: Test tc-police restrictions

Test that upper and lower limits on rate and burst size imposed by the
device are rejected by the kernel.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: Add tc-police tests

Test tc-police action in various scenarios such as Rx policing, Tx
policing, shared policer and police piped to mirred. The test passes
with both veth pairs and loopbacked ports.

# ./tc_police.sh
TEST: police on rx                                                  [ OK ]
TEST: police on tx                                                  [ OK ]
TEST: police with shared policer - rx                               [ OK ]
TEST: police with shared policer - tx                               [ OK ]
TEST: police rx and mirror                                          [ OK ]
TEST: police tx and mirror                                          [ OK ]

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_acl: Offload FLOW_ACTION_POLICE

Offload action police when used with a flower classifier. The number of
dropped packets is read from the policer and reported to tc.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: core_acl_flex_actions: Add police action

Add core functionality required to support police action in the policy
engine.

The utilized hardware policers are stored in a hash table keyed by the
flow action index. This allows to support policer sharing between
multiple ACL rules.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: core_acl_flex_actions: Work around hardware limitation

In the policy engine, each ACL rule points to an action block where the
ACL actions are stored. Each action block consists of one or more action
sets. Each action set holds one or more individual actions, up to a
maximum queried from the device. For example:

                        Action set #1               Action set #2

+----------+          +--------------+            +--------------+
| ACL rule +---------->  Action #1   |      +----->  Action #4   |
+----------+          +--------------+      |     +--------------+
                      |  Action #2   |      |     |  Action #5   |
                      +--------------+      |     +--------------+
                      |  Action #3   +------+     |              |
                      +--------------+            +--------------+

                      <---------+ Action block +----------------->

The hardware has a limitation that prevents a policing action
(MLXSW_AFA_POLCNT_CODE when used with a policer, not a counter) from
being configured in the same action set with a trap action (i.e.,
MLXSW_AFA_TRAP_CODE or MLXSW_AFA_TRAPWU_CODE). Note that the latter used
to implement multiple actions: 'trap', 'mirred', 'drop'.

Work around this limitation by teaching mlxsw_afa_block_append_action()
to create a new action set not only when there is no more room left in
the current set, but also when there is a conflict between previously
mentioned actions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_policer: Add devlink resource support

Expose via devlink-resource the maximum number of single-rate policers
and their current occupancy. Example:

$ devlink resource show pci/0000:01:00.0
...
  name global_policers size 1000 unit entry dpipe_tables none
    resources:
      name single_rate_policers size 968 occ 0 unit entry dpipe_tables none

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_policer: Add policer core

Add common code to handle all policer-related functionality in mlxsw.
Currently, only policer for policy engines are supported, but it in the
future more policer families will be added such as CPU (trap) policers
and storm control policers.

The API allows different modules to add / delete policers and read their
drop counter.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: resources: Add resource identifier for global policers

Add a resource identifier for maximum global policers so that it could
be later used to query the information from firmware.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: reg: Add policer bandwidth limits

Add policer bandwidth limits for both rate and burst size so that they
could be enforced by a later patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic: add firmware update support

add support to update firmware by the devlink flashing API

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

decnet: dn_dev: Remove an unnecessary label.

Remove the unnecessary label from dn_dev_ioctl() and make its error
handling simpler to read.

Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: networking: timestamping: add section for stacked PHC devices

The concept of timestamping DSA switches / Ethernet PHYs is becoming
more and more popular, however the Linux kernel timestamping code has
evolved quite organically and there's layers upon layers of new and old
code that need to work together for things to behave as expected.

Add this chapter to explain what the overall goals are.

Loosely based upon this email discussion plus some more info:
https://lkml.org/lkml/2020/7/6/481

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sundance: Replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: Remove unused inline function netpoll_netdev_init()

commit d565b0a1a9b6 ("net: Add Generic Receive Offload infrastructure")
left behind this, remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: Remove unused inline function mptcp_rcv_synsent()

commit 263e1201a2c3 ("mptcp: consolidate synack processing.")
left behind this, remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: flow: Remove unused inline function

It is not used since commit 09c7570480f7 ("xfrm: remove flow cache")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cipso: Remove unused inline functions

They are not used any more since commit b1edeb102397 ("netlabel: Replace
protocol/NetLabel linking with refrerence counts")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'udp_tunnel-NIC-RX-port-offload-infrastructure'

Jakub Kicinski says:

====================
udp_tunnel: NIC RX port offload infrastructure

This set of patches converts further drivers to use the new
infrastructure to UDP tunnel port offload merged in
commit 0ea460474d70 ("Merge branch 'udp_tunnel-add-NIC-RX-port-offload-infrastructure'").

v3:
- fix a W=1 build warning in qede.
v2:
- fix a W=1 build warning in xgbe,
- expand the size of tables for lio.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: convert to new udp_tunnel_nic infra

Straightforward conversion to new infra, 1 VxLAN port, handler
may sleep.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: convert to new udp_tunnel_nic infra

Covert to new infra. Looks like this driver was not doing
ref counting, and sleeping in the callback.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fm10k: convert to new udp_tunnel_nic infra

Straightforward conversion to new infra. Driver restores info
after close/open cycle by calling its internal restore function
so just use that, no need for udp_tunnel_nic_reset_ntf() here.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio_vf: convert to new udp_tunnel_nic infra

Carbon copy of the previous change.

This driver is just a super thin FW interface, but Derek let us
know the table has 1024 entries.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: convert to new udp_tunnel_nic infra

This driver is just a super thin FW interface, but Derek let us
know the table has 1024 entries.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: convert to new udp_tunnel_nic infra

Convert to new infra, now the refcounting will be correct,
and driver gets port replay of other ports when offloaded
port gets removed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: convert to new udp_tunnel_nic infra

Convert to new infra, this driver is very simple. The check of
adapter->rawf_cnt in cxgb_udp_tunnel_unset_port() is kept from
the old port deletion function but it's dodgy since nothing ever
updates that member once its set during init. Also .set_port
callback always adds the raw mac filter..

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: convert to new udp_tunnel_nic infra

Fairly straightforward conversion - no need to keep track
of the use count, and replay when ports get removed, also
callbacks can just sleep.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

xgbe: convert to new udp_tunnel_nic infra

Make use of the new udp_tunnel_nic infra. Don't clear the features
when VxLAN port is not present to make all drivers behave the same.
Driver will now (until we address the problem in the core) leave
the RX UDP tunnel feature always on, since this is what most drivers
do.

Remove the list of VxLAN ports, just program the one core told us to.

The driver seem to want to clear the VxLAN ports on close but it
doesn't seem to flush the port list properly so it'd get wrong
use counts after close/open. Again since it calls its own open
handler we need the reset notification hack.

v2:
- fix kbuild warning

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

xgbe: switch to more generic VxLAN detection

Instead of looping though the list of ports just check
if the geometry of the packet is correct for VxLAN.
HW most likely doesn't care about the exact port, anyway,
since only first port is actually offloaded, and this way
we won't have to maintain the port list at all.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: convert to new udp_tunnel_nic infra

Convert be2net to new udp_tunnel_nic infra. NIC only takes one VxLAN
port. Remove the port tracking using a list. The warning in
be_work_del_vxlan_port() looked suspicious - like the driver expected
ports to be removed in order of addition.

be2net unregisters ports when going down and re-registers them (for
skyhawk) when coming up, but it never checks if the device is up
in the add_port / del_port callbacks. Make it use
UDP_TUNNEL_NIC_INFO_OPEN_ONLY. Sadly this driver calls its own
open/close functions directly so the udp_tunnel_nic_reset_ntf()
workaround is needed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: convert to new udp_tunnel_nic infra

NFP conversion is pretty straightforward. We want to be able
to sleep, and only get callbacks when the device is open.

NFP did not ask for port replay when ports were removed, now
new infra will provide this feature for free.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-14

The following pull-request contains BPF updates for your *net-next* tree.

We've added 21 non-merge commits during the last 1 day(s) which contain
a total of 20 files changed, 308 insertions(+), 279 deletions(-).

The main changes are:

1) Fix selftests/bpf build, from Alexei.

2) Fix resolve_btfids build issues, from Jiri.

3) Pull usermode-driver-cleanup set, from Eric.

4) Two minor fixes to bpfilter, from Alexei and Masahiro.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ksz884x: switch from 'pci_' to 'dma_' API

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.

When memory is allocated in 'ksz_alloc_desc()', GFP_KERNEL can be used
because a few lines below, GFP_KERNEL is also used in the
'ksz_alloc_soft_desc()' calls.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-add-support-for-RTL8125B'

Heiner Kallweit says:

====================
r8169: add support for RTL8125B

This series adds support for RTL8125B rev.b.
Tested with a Delock 89564 PCIe card.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add support for RTL8125B

Add support for RTL8125B rev.b. In my tests 2.5Gbps worked well
w/o firmware, however for a stable link at 1Gbps firmware revision
0.0.2 is needed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: realtek: add support for RTL8125B-internal PHY

Realtek assigned a new PHY ID for the RTL8125B-internal PHY.
It's however compatible with the RTL8125A-internal PHY.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-qeth-next'

Julian Wiedmann says:

====================
s390/qeth: updates 2020-07-14

please apply the following patch series for qeth to netdev's net-next tree.

This brings a mix of cleanups for various parts of the control code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: constify the MPC initialization data

We're not modifying these data blobs, so mark them as constant.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: unify RX-mode hashtables

To keep track of the addresses programmed from an RX modeset, we have
two separate hashtables (L2: mac_htable, L3: ip_mc_htable).

These are never used at the same time, so unify them into a single
rx_mode_addrs hashtable.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: cleanup OAT code

While initially just trying to fix up the indentation, condense a few
lines and get rid of a goto label.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: clean up a magic number in the OAT callback

Use the correct struct member instead of hardcoding its offset.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: use u64_to_user_ptr() in the OAT code

Use the correct helper for casting to a user pointer.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: clean up error handling for isolation mode cmds

As the cmd IO path has learned to propagate errnos back to its callers,
let them deal with errors instead of trying to restore their previous
configuration from within the IO error path.

Also translate the HW error to a meaningful errno, instead of returning
-EIO for all cases (and don't map this to -EOPNOTSUPP later on...).

While at it, add a READ_ONCE() / WRITE_ONCE() pair to ensure that the
data path always sees a valid isolation mode during reconfiguration.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: don't clear the configured isolation mode

When qeth_set_access_ctrl_online() is called during the device's
initialization and discovers that isolation mode isn't supported, don't
clear the user's currently configured mode.
They intentionally choose to operate the device in this specific mode,
and degrading the isolation is not an option.

Only adjust the configuration when called via sysfs (ie. fallback = 1),
and here follow the common pattern and restore it from prev_isolation.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: only init the isolation mode when necessary

A newly initialized device defaults to ISOLATION_MODE_NONE, don't bother
with programming this a second time.

Then remove the OSD/OSX check, it's already done in the sysfs path
whenever the user actually changes the configuration.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: fine-tune errno when cmds are cancelled

If we cancel all pending cmds (eg. when tearing down the device), don't
blame it on an IO error.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: reject unsupported link type earlier

Rather than delaying the decision until netdev setup, immediately reject
a device when we discover that it has an unsupported link type
(ie. Token Ring).

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Mirror-to-CPU-preparations'

Ido Schimmel says:

====================
mlxsw: Mirror to CPU preparations

A future patch set will add the ability to trap packets that were
dropped due to buffer related reasons (e.g., early drop). Internally
this is implemented by mirroring these packets towards the CPU port.
This patch set adds the required infrastructure to enable such
mirroring.

Patches #1-#2 extend two registers needed for above mentioned
functionality.

Patches #3-#6 gradually add support for setting the mirroring target of
a SPAN (mirroring) agent as the CPU port. This is only supported from
Spectrum-2 onwards, so an error is returned for Spectrum-1.

Patches #7-#8 add the ability to set a policer on a SPAN agent. This is
required because unlike regularly trapped packets, a policer cannot be
set on the trap group with which the mirroring trap is associated.

Patches #9-#12 parse the mirror reason field from the Completion Queue
Element (CQE). Unlike other trapped packets, the trap identifier of
mirrored packets only indicates that the packet was mirrored, but not
why. The reason (e.g., tail drop) is encoded in the mirror reason field.

Patch #13 utilizes the mirror reason field in order to lookup the
matching Rx listener. This allows us to maintain the abstraction that an
Rx listener is mapped to a single trap reason. Without taking the mirror
reason into account we would need to register a single Rx listener for
all mirrored packets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Use mirror reason during Rx listener lookup

The Rx listener abstraction allows the switch driver (e.g.,
mlxsw_spectrum) to register a function that is called when a packet is
received (trapped) for a specific reason.

Up until now, the Rx listener lookup was solely based on the trap
identifier. However, when a packet is mirrored to the CPU the trap
identifier merely indicates that the packet was mirrored, but not why it
was mirrored. This makes it impossible for the switch driver to register
different Rx listeners for different mirror reasons.

Solve this by allowing the switch driver to register a Rx listener with
a mirror reason and by extending the Rx listener lookup to take the
mirror reason into account.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: pci: Retrieve mirror reason from CQE during receive

In case the mirror reason is valid, retrieve it into the Rx information
so that it could be used during listener lookup in a later patch.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: pci: Add mirror reason field to CQEv2

The Completion Queue Element version 2 (CQEv2) includes a field called
'mirror_reason' which indicates why the packet was mirrored to the CPU.

Add the field so that it can be used by a later patch.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>