platform/kernel/linux-rpi.git
3 years agobpftool: Remove unused includes to <bpf/bpf_gen_internal.h>
Quentin Monnet [Thu, 7 Oct 2021 19:44:28 +0000 (20:44 +0100)]
bpftool: Remove unused includes to <bpf/bpf_gen_internal.h>

It seems that the header file was never necessary to compile bpftool,
and it is not part of the headers exported from libbpf. Let's remove the
includes from prog.c and gen.c.

Fixes: d510296d331a ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007194438.34443-3-quentin@isovalent.com
3 years agolibbpf: Skip re-installing headers file if source is older than target
Quentin Monnet [Thu, 7 Oct 2021 19:44:27 +0000 (20:44 +0100)]
libbpf: Skip re-installing headers file if source is older than target

The "install_headers" target in libbpf's Makefile would unconditionally
export all API headers to the target directory. When those headers are
installed to compile another application, this means that make always
finds newer dependencies for the source files relying on those headers,
and deduces that the targets should be rebuilt.

Avoid that by making "install_headers" depend on the source header
files, and (re-)install them only when necessary.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007194438.34443-2-quentin@isovalent.com
3 years agoselftests/bpf: Fix btf_dump test under new clang
Yucong Sun [Fri, 8 Oct 2021 17:31:39 +0000 (10:31 -0700)]
selftests/bpf: Fix btf_dump test under new clang

New clang version changed ([0]) type name in dwarf from "long int" to "long",
this is causing btf_dump tests to fail.

  [0] https://github.com/llvm/llvm-project/commit/f6a561c4d6754b13165a49990e8365d819f64c86

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211008173139.1457407-1-fallentree@fb.com
3 years agoselftests/bpf: Remove SEC("version") from test progs
Dave Marchevsky [Thu, 7 Oct 2021 23:12:34 +0000 (16:12 -0700)]
selftests/bpf: Remove SEC("version") from test progs

Since commit 6c4fc209fcf9d ("bpf: remove useless version check for prog
load") these "version" sections, which result in bpf_attr.kern_version
being set, have been unnecessary.

Remove them so that it's obvious to folks using selftests as a guide that
"modern" BPF progs don't need this section.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007231234.2223081-1-davemarchevsky@fb.com
3 years agoselftests/bpf: Skip the second half of get_branch_snapshot in vm
Song Liu [Thu, 7 Oct 2021 05:02:31 +0000 (22:02 -0700)]
selftests/bpf: Skip the second half of get_branch_snapshot in vm

VMs running on upstream 5.12+ kernel support LBR. However,
bpf_get_branch_snapshot couldn't stop the LBR before too many entries
are flushed. Skip the hit/waste test for VMs before we find a proper fix
for LBR in VM.

Fixes: 025bd7c753aa ("selftests/bpf: Add test for bpf_get_branch_snapshot")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007050231.728496-1-songliubraving@fb.com
3 years agobpf, tests: Add more LD_IMM64 tests
Johan Almbladh [Thu, 7 Oct 2021 14:30:06 +0000 (16:30 +0200)]
bpf, tests: Add more LD_IMM64 tests

This patch adds new tests for the two-instruction LD_IMM64. The new tests
verify the operation with immediate values of different byte patterns.
Mainly intended to cover JITs that want to be clever when loading 64-bit
constants.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211007143006.634308-1-johan.almbladh@anyfinetworks.com
3 years agomips, bpf: Optimize loading of 64-bit constants
Johan Almbladh [Thu, 7 Oct 2021 14:28:28 +0000 (16:28 +0200)]
mips, bpf: Optimize loading of 64-bit constants

This patch shaves off a few instructions when loading sparse 64-bit
constants to register. The change is covered by additional tests in
lib/test_bpf.c.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211007142828.634182-1-johan.almbladh@anyfinetworks.com
3 years agomips, bpf: Fix Makefile that referenced a removed file
Johan Almbladh [Thu, 7 Oct 2021 14:23:39 +0000 (16:23 +0200)]
mips, bpf: Fix Makefile that referenced a removed file

This patch removes a stale Makefile reference to the cBPF JIT that was
removed.

Fixes: ebcbacfa50ec ("mips, bpf: Remove old BPF JIT implementations")
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211007142339.633899-1-johan.almbladh@anyfinetworks.com
3 years agobpf, x64: Factor out emission of REX byte in more cases
Jie Meng [Wed, 6 Oct 2021 19:41:35 +0000 (12:41 -0700)]
bpf, x64: Factor out emission of REX byte in more cases

Introduce a single reg version of maybe_emit_mod() and factor out
common code in more cases.

Signed-off-by: Jie Meng <jmeng@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211006194135.608932-1-jmeng@fb.com
3 years agoMerge branch 'libbpf: Deprecate bpf_{map,program}__{prev,next} APIs since v0.7'
Andrii Nakryiko [Wed, 6 Oct 2021 17:58:34 +0000 (10:58 -0700)]
Merge branch 'libbpf: Deprecate bpf_{map,program}__{prev,next} APIs since v0.7'

Hengqi Chen says:

====================

bpf_{map,program}__{prev,next} don't follow the libbpf API naming
convention. Deprecate them and replace them with a new set of APIs
named bpf_object__{prev,next}_{program,map}.

v1->v2: [0]
  * Addressed Andrii's comments

  [0]: https://patchwork.kernel.org/project/netdevbpf/patch/20210906165456.325999-1-hengqi.chen@gmail.com/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
3 years agolibbpf: Deprecate bpf_object__unload() API since v0.6
Hengqi Chen [Sat, 2 Oct 2021 16:10:00 +0000 (00:10 +0800)]
libbpf: Deprecate bpf_object__unload() API since v0.6

BPF objects are not reloadable after unload. Users are expected to use
bpf_object__close() to unload and free up resources in one operation.
No need to expose bpf_object__unload() as a public API, deprecate it
([0]).  Add bpf_object__unload() as an alias to internal
bpf_object_unload() and replace all bpf_object__unload() uses to avoid
compilation errors.

  [0] Closes: https://github.com/libbpf/libbpf/issues/290

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211002161000.3854559-1-hengqi.chen@gmail.com
3 years agoselftests/bpf: Switch to new bpf_object__next_{map,program} APIs
Hengqi Chen [Sun, 3 Oct 2021 16:58:44 +0000 (00:58 +0800)]
selftests/bpf: Switch to new bpf_object__next_{map,program} APIs

Replace deprecated bpf_{map,program}__next APIs with newly added
bpf_object__next_{map,program} APIs, so that no compilation warnings
emit.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211003165844.4054931-3-hengqi.chen@gmail.com
3 years agolibbpf: Add API documentation convention guidelines
Grant Seltzer [Mon, 4 Oct 2021 21:56:44 +0000 (17:56 -0400)]
libbpf: Add API documentation convention guidelines

This adds a section to the documentation for libbpf
naming convention which describes how to document
API features in libbpf, specifically the format of
which API doc comments need to conform to.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211004215644.497327-1-grantseltzer@gmail.com
3 years agolibbpf: Deprecate bpf_{map,program}__{prev,next} APIs since v0.7
Hengqi Chen [Sun, 3 Oct 2021 16:58:43 +0000 (00:58 +0800)]
libbpf: Deprecate bpf_{map,program}__{prev,next} APIs since v0.7

Deprecate bpf_{map,program}__{prev,next} APIs. Replace them with
a new set of APIs named bpf_object__{prev,next}_{program,map} which
follow the libbpf API naming convention ([0]). No functionality changes.

  [0] Closes: https://github.com/libbpf/libbpf/issues/296

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211003165844.4054931-2-hengqi.chen@gmail.com
3 years agoselftest/bpf: Switch recursion test to use htab_map_delete_elem
Jiri Olsa [Sun, 3 Oct 2021 16:49:25 +0000 (18:49 +0200)]
selftest/bpf: Switch recursion test to use htab_map_delete_elem

Currently the recursion test is hooking __htab_map_lookup_elem
function, which is invoked both from bpf_prog and bpf syscall.

But in our kernel build, the __htab_map_lookup_elem gets inlined
within the htab_map_lookup_elem, so it's not trigered and the
test fails.

Fixing this by using htab_map_delete_elem, which is not inlined
for bpf_prog calls (like htab_map_lookup_elem is) and is used
directly as pointer for map_delete_elem, so it won't disappear
by inlining.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/YVnfFTL/3T6jOwHI@krava
3 years agobpf: Use $(pound) instead of \# in Makefiles
Quentin Monnet [Wed, 6 Oct 2021 11:10:49 +0000 (12:10 +0100)]
bpf: Use $(pound) instead of \# in Makefiles

Recent-ish versions of make do no longer consider number signs ("#") as
comment symbols when they are inserted inside of a macro reference or in
a function invocation. In such cases, the symbols should not be escaped.

There are a few occurrences of "\#" in libbpf's and samples' Makefiles.
In the former, the backslash is harmless, because grep associates no
particular meaning to the escaped symbol and reads it as a regular "#".
In samples' Makefile, recent versions of make will pass the backslash
down to the compiler, making the probe fail all the time and resulting
in the display of a warning about "make headers_install" being required,
even after headers have been installed.

A similar issue has been addressed at some other locations by commit
9564a8cf422d ("Kbuild: fix # escaping in .cmd files for future Make").
Let's address it for libbpf's and samples' Makefiles in the same
fashion, by using a "$(pound)" variable (pulled from
tools/scripts/Makefile.include for libbpf, or re-defined for the
samples).

Reference for the change in make:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57

Fixes: 2f3830412786 ("libbpf: Make libbpf_version.h non-auto-generated")
Fixes: 07c3bbdb1a9b ("samples: bpf: print a warning about headers_install")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211006111049.20708-1-quentin@isovalent.com
3 years agobpf, arm: Remove dummy bpf_jit_compile stub
Daniel Borkmann [Wed, 6 Oct 2021 14:08:25 +0000 (16:08 +0200)]
bpf, arm: Remove dummy bpf_jit_compile stub

The BPF core defines a __weak bpf_jit_compile() dummy function already
which should only be overridden by JITs if they actually implement a
legacy cBPF JIT. Given arm implements an eBPF JIT, this stub is not
needed.

Now that MIPS cBPF JIT is finally gone, the only JIT left that is still
implementing bpf_jit_compile() is the sparc32 one.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
3 years agoMerge branch 'bpf-mips-jit'
Daniel Borkmann [Wed, 6 Oct 2021 13:52:24 +0000 (15:52 +0200)]
Merge branch 'bpf-mips-jit'

Johan Almbladh says:

====================
This is an implementation of an eBPF JIT for MIPS I-V and MIPS32/64 r1-r6.
The new JIT is written from scratch, but uses the same overall structure
as other eBPF JITs.

Before, the MIPS JIT situation looked like this.

  - 32-bit: MIPS32, cBPF-only, tests fail
  - 64-bit: MIPS64r2-r6, eBPF, tests fail, incomplete eBPF ISA support

The new JIT implementation raises the bar to the following level.

  - 32/64-bit: all MIPS ISA, eBPF, all tests pass, full eBPF ISA support

Overview
--------
The implementation supports all 32-bit and 64-bit eBPF instructions
defined as of this writing, including the recently-added atomics. It is
intended to provide good performance for native word size operations,
while also being complete so the JIT never has to fall back to the
interpreter. The new JIT replaces the current cBPF and eBPF JITs for MIPS.

The implementation is divided into separate files as follows. The source
files contains comments describing internal mechanisms and details on
things like eBPF-to-CPU register mappings, so I won't repeat that here.

  - jit_comp.[ch]    code shared between 32-bit and 64-bit JITs
  - jit_comp32.c     32-bit JIT implementation
  - jit_comp64.c     64-bit JIT implementation

Both the 32-bit and 64-bit versions map all eBPF registers to native MIPS
CPU registers. There are also enough unmapped CPU registers available to
allow all eBPF operations implemented natively by the JIT to use only CPU
registers without having to resort to stack scratch space.

Some operations are deemed too complex to implement natively in the JIT.
Those are instead implemented as a function call to a helper that performs
the operation. This is done in the following cases.

  - 64-bit div and mod on a 32-bit CPU
  - 64-bit atomics on a 32-bit CPU
  - 32-bit atomics on a 32-bit CPU that lacks ll/sc instructions

CPU errata workarounds
----------------------
The JIT implements workarounds for R10000, Loongson-2F and Loongson-3 CPU
errata. For the Loongson workarounds, I have used the public information
available on the matter.

Link: https://sourceware.org/legacy-ml/binutils/2009-11/msg00387.html
Testing
-------
During the development of the JIT, I have added a number of new test cases
to the test_bpf.ko test suite to be able to verify correctness of JIT
implementations in a more systematic way. The new additions increase the
test suite roughly three-fold, with many of the new tests being very
extensive and even exhaustive when feasible.

Link: https://lore.kernel.org/bpf/20211001130348.3670534-1-johan.almbladh@anyfinetworks.com/
Link: https://lore.kernel.org/bpf/20210914091842.4186267-1-johan.almbladh@anyfinetworks.com/
Link: https://lore.kernel.org/bpf/20210809091829.810076-1-johan.almbladh@anyfinetworks.com/
The JIT has been tested by running the test_bpf.ko test suite in QEMU with
the following MIPS ISAs, in both big and little endian mode, with and
without JIT hardening enabled.

  MIPS32r2, MIPS32r6, MIPS64r2, MIPS64r6

For the QEMU r2 targets, the correctness of pre-r2 code emitted has been
tested by manually overriding each of the following macros with 0.

  cpu_has_llsc, cpu_has_mips_2, cpu_has_mips_r1, cpu_has_mips_r2

Similarly, CPU errata workaround code has been tested by enabling the
each of the following configurations for the MIPS64r2 targets.

  CONFIG_WAR_R10000
  CONFIG_CPU_LOONGSON3_WORKAROUNDS
  CONFIG_CPU_NOP_WORKAROUNDS
  CONFIG_CPU_JUMP_WORKAROUNDS

The JIT passes all tests in all configurations. Below is the summary for
MIPS32r2 in little endian mode.

  test_bpf: Summary: 1006 PASSED, 0 FAILED, [994/994 JIT'ed]
  test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
  test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

According to MIPS ISA reference documentation, the result of a 32-bit ALU
arithmetic operation on a 64-bit CPU is unpredictable if an operand
register value is not properly sign-extended to 64 bits. To verify the
code emitted by the JIT, the code generation engine in QEMU was modifed to
flip all low 32 bits if the above condition was not met. With this
trip-wire installed, the kernel booted properly in qemu-system-mips64el
and all test_bpf.ko tests passed.

Remaining features
------------------
While the JIT is complete is terms of eBPF ISA support, this series does
not include support for BPF-to-BPF calls and BPF trampolines. Those
features are planned to be added in another patch series.

The BPF_ST | BPF_NOSPEC instruction currently emits nothing. This is
consistent with the behavior if the MIPS interpreter and the existing
eBPF JIT.

Why not build on the existing eBPF JIT?
---------------------------------------
The existing eBPF JIT was originally written for MIPS64. An effort was
made to add MIPS32 support to it in commit 716850ab104d ("MIPS: eBPF:
Initial eBPF support for MIPS32 architecture."). That turned out to
contain a number of flaws, so eBPF support for MIPS32 was disabled in
commit 36366e367ee9 ("MIPS: BPF: Restore MIPS32 cBPF JIT").

Link: https://lore.kernel.org/bpf/5deaa994.1c69fb81.97561.647e@mx.google.com/
The current eBPF JIT for MIPS64 lacks a lot of functionality regarding
ALU32, JMP32 and atomic operations. It also lacks 32-bit CPU support on a
fundamental level, for example 32-bit CPU register mappings and o32 ABI
calling conventions. For optimization purposes, it tracks register usage
through the program control flow in order to do zero-extension and sign-
extension only when necessary, a static analysis of sorts. In my opinion,
having this kind of complexity in JITs, and for which there is not
adequate test coverage, is a problem. Such analysis should be done by the
verifier, if needed at all. Finally, when I run the BPF test suite
test_bpf.ko on the current JIT, there are errors and warnings.

I believe that an eBPF JIT should strive to be correct, complete and
optimized, and in that order. The JIT runs after the verifer has audited
the program and given its approval. If the JIT then emits code that does
something else, it will undermine the eBPF security model. A simple
implementation is easier to get correct than a complex one. Furthermore,
the real performance hit is not an extra CPU instruction here and there,
but when the JIT bails on an unimplemented eBPF instruction and cause the
whole program to fall back to the interpreter. My reasoning here boils
down to the following.

* The JIT should not contain a static analyzer that tracks branches.

* It is acceptable to emit possibly superfluous sign-/zero-extensions for
  ALU32 and JMP32 operations on a 64-bit MIPS to guarantee correctness.

* The JIT should handle all eBPF instructions on all MIPS CPUs.

I conclude that the current eBPF MIPS JIT is complex, incomplete and
incorrect. For the reasons stated above, I decided to not use the existing
JIT implementation.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
3 years agomips, bpf: Remove old BPF JIT implementations
Johan Almbladh [Tue, 5 Oct 2021 16:54:08 +0000 (18:54 +0200)]
mips, bpf: Remove old BPF JIT implementations

This patch removes the old 32-bit cBPF and 64-bit eBPF JIT implementations.
They are replaced by a new eBPF implementation that supports both 32-bit
and 64-bit MIPS CPUs.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211005165408.2305108-8-johan.almbladh@anyfinetworks.com
3 years agomips, bpf: Enable eBPF JITs
Johan Almbladh [Tue, 5 Oct 2021 16:54:07 +0000 (18:54 +0200)]
mips, bpf: Enable eBPF JITs

This patch enables the new eBPF JITs for 32-bit and 64-bit MIPS. It also
disables the old cBPF JIT to so cBPF programs are converted to use the
new JIT.

Workarounds for R4000 CPU errata are not implemented by the JIT, so the
JIT is disabled if any of those workarounds are configured.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211005165408.2305108-7-johan.almbladh@anyfinetworks.com
3 years agomips, bpf: Add JIT workarounds for CPU errata
Johan Almbladh [Tue, 5 Oct 2021 16:54:06 +0000 (18:54 +0200)]
mips, bpf: Add JIT workarounds for CPU errata

This patch adds workarounds for the following CPU errata to the MIPS
eBPF JIT, if enabled in the kernel configuration.

  - R10000 ll/sc weak ordering
  - Loongson-3 ll/sc weak ordering
  - Loongson-2F jump hang

The Loongson-2F nop errata is implemented in uasm, which the JIT uses,
so no additional mitigations are needed for that.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Link: https://lore.kernel.org/bpf/20211005165408.2305108-6-johan.almbladh@anyfinetworks.com
3 years agomips, bpf: Add new eBPF JIT for 64-bit MIPS
Johan Almbladh [Tue, 5 Oct 2021 16:54:05 +0000 (18:54 +0200)]
mips, bpf: Add new eBPF JIT for 64-bit MIPS

This is an implementation on of an eBPF JIT for 64-bit MIPS III-V and
MIPS64r1-r6. It uses the same framework introduced by the 32-bit JIT.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211005165408.2305108-5-johan.almbladh@anyfinetworks.com
3 years agomips, bpf: Add eBPF JIT for 32-bit MIPS
Johan Almbladh [Tue, 5 Oct 2021 16:54:04 +0000 (18:54 +0200)]
mips, bpf: Add eBPF JIT for 32-bit MIPS

This is an implementation of an eBPF JIT for 32-bit MIPS I-V and MIPS32.
The implementation supports all 32-bit and 64-bit ALU and JMP operations,
including the recently-added atomics. 64-bit div/mod and 64-bit atomics
are implemented using function calls to math64 and atomic64 functions,
respectively. All 32-bit operations are implemented natively by the JIT,
except if the CPU lacks ll/sc instructions.

Register mapping
================
All 64-bit eBPF registers are mapped to native 32-bit MIPS register pairs,
and does not use any stack scratch space for register swapping. This means
that all eBPF register data is kept in CPU registers all the time, and
this simplifies the register management a lot. It also reduces the JIT's
pressure on temporary registers since we do not have to move data around.

Native register pairs are ordered according to CPU endiannes, following
the O32 calling convention for passing 64-bit arguments and return values.
The eBPF return value, arguments and callee-saved registers are mapped to
their native MIPS equivalents.

Since the 32 highest bits in the eBPF FP (frame pointer) register are
always zero, only one general-purpose register is actually needed for the
mapping. The MIPS fp register is used for this purpose. The high bits are
mapped to MIPS register r0. This saves us one CPU register, which is much
needed for temporaries, while still allowing us to treat the R10 (FP)
register just like any other eBPF register in the JIT.

The MIPS gp (global pointer) and at (assembler temporary) registers are
used as internal temporary registers for constant blinding. CPU registers
t6-t9 are used internally by the JIT when constructing more complex 64-bit
operations. This is precisely what is needed - two registers to store an
operand value, and two more as scratch registers when performing the
operation.

The register mapping is shown below.

    R0 - $v1, $v0   return value
    R1 - $a1, $a0   argument 1, passed in registers
    R2 - $a3, $a2   argument 2, passed in registers
    R3 - $t1, $t0   argument 3, passed on stack
    R4 - $t3, $t2   argument 4, passed on stack
    R5 - $t4, $t3   argument 5, passed on stack
    R6 - $s1, $s0   callee-saved
    R7 - $s3, $s2   callee-saved
    R8 - $s5, $s4   callee-saved
    R9 - $s7, $s6   callee-saved
    FP - $r0, $fp   32-bit frame pointer
    AX - $gp, $at   constant-blinding
         $t6 - $t9  unallocated, JIT temporaries

Jump offsets
============
The JIT tries to map all conditional JMP operations to MIPS conditional
PC-relative branches. The MIPS branch offset field is 18 bits, in bytes,
which is equivalent to the eBPF 16-bit instruction offset. However, since
the JIT may emit more than one CPU instruction per eBPF instruction, the
field width may overflow. If that happens, the JIT converts the long
conditional jump to a short PC-relative branch with the condition
inverted, jumping over a long unconditional absolute jmp (j).

This conversion will change the instruction offset mapping used for jumps,
and may in turn result in more branch offset overflows. The JIT therefore
dry-runs the translation until no more branches are converted and the
offsets do not change anymore. There is an upper bound on this of course,
and if the JIT hits that limit, the last two iterations are run with all
branches being converted.

Tail call count
===============
The current tail call count is stored in the 16-byte area of the caller's
stack frame that is reserved for the callee in the o32 ABI. The value is
initialized in the prologue, and propagated to the tail-callee by skipping
the initialization instructions when emitting the tail call.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211005165408.2305108-4-johan.almbladh@anyfinetworks.com
3 years agomips, uasm: Add workaround for Loongson-2F nop CPU errata
Johan Almbladh [Tue, 5 Oct 2021 16:54:03 +0000 (18:54 +0200)]
mips, uasm: Add workaround for Loongson-2F nop CPU errata

This patch implements a workaround for the Loongson-2F nop in generated,
code, if the existing option CONFIG_CPU_NOP_WORKAROUND is set. Before,
the binutils option -mfix-loongson2f-nop was enabled, but no workaround
was done when emitting MIPS code. Now, the nop pseudo instruction is
emitted as "or ax,ax,zero" instead of the default "sll zero,zero,0". This
is consistent with the workaround implemented by binutils.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Link: https://sourceware.org/legacy-ml/binutils/2009-11/msg00387.html
Link: https://lore.kernel.org/bpf/20211005165408.2305108-3-johan.almbladh@anyfinetworks.com
3 years agomips, uasm: Enable muhu opcode for MIPS R6
Tony Ambardar [Tue, 5 Oct 2021 16:54:02 +0000 (18:54 +0200)]
mips, uasm: Enable muhu opcode for MIPS R6

Enable the 'muhu' instruction, complementing the existing 'mulu', needed
to implement a MIPS32 BPF JIT.

Also fix a typo in the existing definition of 'dmulu'.

Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211005165408.2305108-2-johan.almbladh@anyfinetworks.com
3 years agoselftests/bpf: Test new btf__add_btf() API
Andrii Nakryiko [Wed, 6 Oct 2021 05:11:07 +0000 (22:11 -0700)]
selftests/bpf: Test new btf__add_btf() API

Add a test that validates that btf__add_btf() API is correctly copying
all the types from the source BTF into destination BTF object and
adjusts type IDs and string offsets properly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211006051107.17921-4-andrii@kernel.org
3 years agoselftests/bpf: Refactor btf_write selftest to reuse BTF generation logic
Andrii Nakryiko [Wed, 6 Oct 2021 05:11:06 +0000 (22:11 -0700)]
selftests/bpf: Refactor btf_write selftest to reuse BTF generation logic

Next patch will need to reuse BTF generation logic, which tests every
supported BTF kind, for testing btf__add_btf() APIs. So restructure
existing selftests and make it as a single subtest that uses bulk
VALIDATE_RAW_BTF() macro for raw BTF dump checking.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211006051107.17921-3-andrii@kernel.org
3 years agolibbpf: Add API that copies all BTF types from one BTF object to another
Andrii Nakryiko [Wed, 6 Oct 2021 05:11:05 +0000 (22:11 -0700)]
libbpf: Add API that copies all BTF types from one BTF object to another

Add a bulk copying api, btf__add_btf(), that speeds up and simplifies
appending entire contents of one BTF object to another one, taking care
of copying BTF type data, adjusting resulting BTF type IDs according to
their new locations in the destination BTF object, as well as copying
and deduplicating all the referenced strings and updating all the string
offsets in new BTF types as appropriate.

This API is intended to be used from tools that are generating and
otherwise manipulating BTFs generically, such as pahole. In pahole's
case, this API is useful for speeding up parallelized BTF encoding, as
it allows pahole to offload all the intricacies of BTF type copying to
libbpf and handle the parallelization aspects of the process.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/bpf/20211006051107.17921-2-andrii@kernel.org
3 years agobpf, x64: Save bytes for DIV by reducing reg copies
Jie Meng [Sat, 2 Oct 2021 03:56:26 +0000 (20:56 -0700)]
bpf, x64: Save bytes for DIV by reducing reg copies

Instead of unconditionally performing push/pop on %rax/%rdx in case of
division/modulo, we can save a few bytes in case of destination register
being either BPF r0 (%rax) or r3 (%rdx) since the result is written in
there anyway.

Also, we do not need to copy the source to %r11 unless the source is either
%rax, %rdx or an immediate.

For example, before the patch:

  22:   push   %rax
  23:   push   %rdx
  24:   mov    %rsi,%r11
  27:   xor    %edx,%edx
  29:   div    %r11
  2c:   mov    %rax,%r11
  2f:   pop    %rdx
  30:   pop    %rax
  31:   mov    %r11,%rax

After:

  22:   push   %rdx
  23:   xor    %edx,%edx
  25:   div    %rsi
  28:   pop    %rdx

Signed-off-by: Jie Meng <jmeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211002035626.2041910-1-jmeng@fb.com
3 years agobpf: Avoid retpoline for bpf_for_each_map_elem
Andrey Ignatov [Wed, 6 Oct 2021 00:18:38 +0000 (17:18 -0700)]
bpf: Avoid retpoline for bpf_for_each_map_elem

Similarly to 09772d92cd5a ("bpf: avoid retpoline for
lookup/update/delete calls on maps") and 84430d4232c3 ("bpf, verifier:
avoid retpoline for map push/pop/peek operation") avoid indirect call
while calling bpf_for_each_map_elem.

Before (a program fragment):

  ; if (rules_map) {
   142: (15) if r4 == 0x0 goto pc+8
   143: (bf) r3 = r10
  ; bpf_for_each_map_elem(rules_map, process_each_rule, &ctx, 0);
   144: (07) r3 += -24
   145: (bf) r1 = r4
   146: (18) r2 = subprog[+5]
   148: (b7) r4 = 0
   149: (85) call bpf_for_each_map_elem#143680  <-- indirect call via
                                                    helper

After (same program fragment):

   ; if (rules_map) {
    142: (15) if r4 == 0x0 goto pc+8
    143: (bf) r3 = r10
   ; bpf_for_each_map_elem(rules_map, process_each_rule, &ctx, 0);
    144: (07) r3 += -24
    145: (bf) r1 = r4
    146: (18) r2 = subprog[+5]
    148: (b7) r4 = 0
    149: (85) call bpf_for_each_array_elem#170336  <-- direct call

On a benchmark that calls bpf_for_each_map_elem() once and does many
other things (mostly checking fields in skb) with CONFIG_RETPOLINE=y it
makes program faster.

Before:

  ============================================================================
  Benchmark.cpp                                              time/iter iters/s
  ============================================================================
  IngressMatchByRemoteEndpoint                                80.78ns 12.38M
  IngressMatchByRemoteIP                                      80.66ns 12.40M
  IngressMatchByRemotePort                                    80.87ns 12.37M

After:

  ============================================================================
  Benchmark.cpp                                              time/iter iters/s
  ============================================================================
  IngressMatchByRemoteEndpoint                                73.49ns 13.61M
  IngressMatchByRemoteIP                                      71.48ns 13.99M
  IngressMatchByRemotePort                                    70.39ns 14.21M

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211006001838.75607-1-rdna@fb.com
3 years agoMerge branch 'Support kernel module function calls from eBPF'
Alexei Starovoitov [Wed, 6 Oct 2021 00:07:42 +0000 (17:07 -0700)]
Merge branch 'Support kernel module function calls from eBPF'

Kumar Kartikeya says:

====================

This set enables kernel module function calls, and also modifies verifier logic
to permit invalid kernel function calls as long as they are pruned as part of
dead code elimination. This is done to provide better runtime portability for
BPF objects, which can conditionally disable parts of code that are pruned later
by the verifier (e.g. const volatile vars, kconfig options). libbpf
modifications are made along with kernel changes to support module function
calls.

It also converts TCP congestion control objects to use the module kfunc support
instead of relying on IS_BUILTIN ifdef.

Changelog:
----------
v6 -> v7
v6: https://lore.kernel.org/bpf/20210930062948.1843919-1-memxor@gmail.com

 * Let __bpf_check_kfunc_call take kfunc_btf_id_list instead of generating
   callbacks (Andrii)
 * Rename it to bpf_check_mod_kfunc_call to reflect usage
 * Remove OOM checks (Alexei)
 * Remove resolve_btfids invocation for bpf_testmod (Andrii)
 * Move fd_array_cnt initialization near fd_array alloc (Andrii)
 * Rename helper to btf_find_by_name_kind and pass start_id (Andrii)
 * memset when data is NULL in add_data (Alexei)
 * Fix other nits

v5 -> v6
v5: https://lore.kernel.org/bpf/20210927145941.1383001-1-memxor@gmail.com

 * Rework gen_loader relocation emits
   * Only emit bpf_btf_find_by_name_kind call when required (Alexei)
   * Refactor code to emit ksym var and func relo into separate helpers, this
     will be easier to add future weak/typeless ksym support to (for my followup)
   * Count references for both ksym var and funcs, and avoid calling helpers
     unless required for both of them. This also means we share fds between
     ksym vars for the module BTFs. Also be careful with this when closing
     BTF fd so that we only close one instance of the fd for each ksym

v4 -> v5
v4: https://lore.kernel.org/bpf/20210920141526.3940002-1-memxor@gmail.com

 * Address comments from Alexei
   * Use reserved fd_array area in loader map instead of creating a new map
   * Drop selftest testing the 256 kfunc limit, however selftest testing reuse
     of BTF fd for same kfunc in gen_loader and libbpf is kept
 * Address comments from Andrii
   * Make --no-fail the default for resolve_btfids, i.e. only fail if we find
     BTF section and cannot process it
   * Use obj->btf_modules array to store index in the fd_array, so that we don't
     have to do any searching to reuse the index, instead only set it the first
     time a module BTF's fd is used
   * Make find_ksym_btf_id to return struct module_btf * in last parameter
   * Improve logging when index becomes bigger than INT16_MAX
   * Add btf__find_by_name_kind_own internal helper to only start searching for
     kfunc ID in module BTF, since find_ksym_btf_id already checks vmlinux BTF
     before iterating over module BTFs.
   * Fix various other nits
 * Fixes for failing selftests on BPF CI
 * Rearrange/cleanup selftests
   * Avoid testing kfunc limit (Alexei)
   * Do test gen_loader and libbpf BTF fd index dedup with 256 calls
   * Move invalid kfunc failure test to verifier selftest
   * Minimize duplication
 * Use consistent bpf_<type>_check_kfunc_call naming for module kfunc callback
 * Since we try to add fd using add_data while we can, cherry pick Alexei's
   patch from CO-RE RFC series to align gen_loader data.

v3 -> v4
v3: https://lore.kernel.org/bpf/20210915050943.679062-1-memxor@gmail.com

 * Address comments from Alexei
   * Drop MAX_BPF_STACK change, instead move map_fd and BTF fd to BPF array map
     and pass fd_array using BPF_PSEUDO_MAP_IDX_VALUE
 * Address comments from Andrii
   * Fix selftest to store to variable for observing function call instead of
     printk and polluting CI logs
 * Drop use of raw_tp for testing, instead reuse classifier based prog_test_run
 * Drop index + 1 based insn->off convention for kfunc module calls
 * Expand selftests to cover more corner cases
 * Misc cleanups

v2 -> v3
v2: https://lore.kernel.org/bpf/20210914123750.460750-1-memxor@gmail.com

 * Fix issues pointed out by Kernel Test Robot
 * Fix find_kfunc_desc to also take offset into consideration when comparing

RFC v1 -> v2
v1: https://lore.kernel.org/bpf/20210830173424.1385796-1-memxor@gmail.com

 * Address comments from Alexei
   * Reuse fd_array instead of introducing kfunc_btf_fds array
   * Take btf and module reference as needed, instead of preloading
   * Add BTF_KIND_FUNC relocation support to gen_loader infrastructure
 * Address comments from Andrii
   * Drop hashmap in libbpf for finding index of existing BTF in fd_array
   * Preserve invalid kfunc calls only when the symbol is weak
 * Adjust verifier selftests
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: selftests: Add selftests for module kfunc support
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:57 +0000 (06:47 +0530)]
bpf: selftests: Add selftests for module kfunc support

This adds selftests that tests the success and failure path for modules
kfuncs (in presence of invalid kfunc calls) for both libbpf and
gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can
add module BTF ID set from bpf_testmod.

This also introduces  a couple of test cases to verifier selftests for
validating whether we get an error or not depending on if invalid kfunc
call remains after elimination of unreachable instructions.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-10-memxor@gmail.com
3 years agolibbpf: Update gen_loader to emit BTF_KIND_FUNC relocations
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:56 +0000 (06:47 +0530)]
libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations

This change updates the BPF syscall loader to relocate BTF_KIND_FUNC
relocations, with support for weak kfunc relocations. The general idea
is to move map_fds to loader map, and also use the data for storing
kfunc BTF fds. Since both reuse the fd_array parameter, they need to be
kept together.

For map_fds, we reserve MAX_USED_MAPS slots in a region, and for kfunc,
we reserve MAX_KFUNC_DESCS. This is done so that insn->off has more
chances of being <= INT16_MAX than treating data map as a sparse array
and adding fd as needed.

When the MAX_KFUNC_DESCS limit is reached, we fall back to the sparse
array model, so that as long as it does remain <= INT16_MAX, we pass an
index relative to the start of fd_array.

We store all ksyms in an array where we try to avoid calling the
bpf_btf_find_by_name_kind helper, and also reuse the BTF fd that was
already stored. This also speeds up the loading process compared to
emitting calls in all cases, in later tests.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-9-memxor@gmail.com
3 years agolibbpf: Resolve invalid weak kfunc calls with imm = 0, off = 0
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:55 +0000 (06:47 +0530)]
libbpf: Resolve invalid weak kfunc calls with imm = 0, off = 0

Preserve these calls as it allows verifier to succeed in loading the
program if they are determined to be unreachable after dead code
elimination during program load. If not, the verifier will fail at
runtime. This is done for ext->is_weak symbols similar to the case for
variable ksyms.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-8-memxor@gmail.com
3 years agolibbpf: Support kernel module function calls
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:54 +0000 (06:47 +0530)]
libbpf: Support kernel module function calls

This patch adds libbpf support for kernel module function call support.
The fd_array parameter is used during BPF program load to pass module
BTFs referenced by the program. insn->off is set to index into this
array, but starts from 1, because insn->off as 0 is reserved for
btf_vmlinux.

We try to use existing insn->off for a module, since the kernel limits
the maximum distinct module BTFs for kfuncs to 256, and also because
index must never exceed the maximum allowed value that can fit in
insn->off (INT16_MAX). In the future, if kernel interprets signed offset
as unsigned for kfunc calls, this limit can be increased to UINT16_MAX.

Also introduce a btf__find_by_name_kind_own helper to start searching
from module BTF's start id when we know that the BTF ID is not present
in vmlinux BTF (in find_ksym_btf_id).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-7-memxor@gmail.com
3 years agobpf: Enable TCP congestion control kfunc from modules
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:53 +0000 (06:47 +0530)]
bpf: Enable TCP congestion control kfunc from modules

This commit moves BTF ID lookup into the newly added registration
helper, in a way that the bbr, cubic, and dctcp implementation set up
their sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not
dependent on modules are looked up from the wrapper function.

This lifts the restriction for them to be compiled as built in objects,
and can be loaded as modules if required. Also modify Makefile.modfinal
to call resolve_btfids for each module.

Note that since kernel kfunc_ids never overlap with module kfunc_ids, we
only match the owner for module btf id sets.

See following commits for background on use of:

 CONFIG_X86 ifdef:
 569c484f9995 (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86)

 CONFIG_DYNAMIC_FTRACE ifdef:
 7aae231ac93b (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE)

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-6-memxor@gmail.com
3 years agotools: Allow specifying base BTF file in resolve_btfids
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:52 +0000 (06:47 +0530)]
tools: Allow specifying base BTF file in resolve_btfids

This commit allows specifying the base BTF for resolving btf id
lists/sets during link time in the resolve_btfids tool. The base BTF is
set to NULL if no path is passed. This allows resolving BTF ids for
module kernel objects.

Also, drop the --no-fail option, as it is only used in case .BTF_ids
section is not present, instead make no-fail the default mode. The long
option name is same as that of pahole.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-5-memxor@gmail.com
3 years agobpf: btf: Introduce helpers for dynamic BTF set registration
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:51 +0000 (06:47 +0530)]
bpf: btf: Introduce helpers for dynamic BTF set registration

This adds helpers for registering btf_id_set from modules and the
bpf_check_mod_kfunc_call callback that can be used to look them up.

With in kernel sets, the way this is supposed to work is, in kernel
callback looks up within the in-kernel kfunc whitelist, and then defers
to the dynamic BTF set lookup if it doesn't find the BTF id. If there is
no in-kernel BTF id set, this callback can be used directly.

Also fix includes for btf.h and bpfptr.h so that they can included in
isolation. This is in preparation for their usage in tcp_bbr, tcp_cubic
and tcp_dctcp modules in the next patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-4-memxor@gmail.com
3 years agobpf: Be conservative while processing invalid kfunc calls
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:50 +0000 (06:47 +0530)]
bpf: Be conservative while processing invalid kfunc calls

This patch also modifies the BPF verifier to only return error for
invalid kfunc calls specially marked by userspace (with insn->imm == 0,
insn->off == 0) after the verifier has eliminated dead instructions.
This can be handled in the fixup stage, and skip processing during add
and check stages.

If such an invalid call is dropped, the fixup stage will not encounter
insn->imm as 0, otherwise it bails out and returns an error.

This will be exposed as weak ksym support in libbpf in later patches.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-3-memxor@gmail.com
3 years agobpf: Introduce BPF support for kernel module function calls
Kumar Kartikeya Dwivedi [Sat, 2 Oct 2021 01:17:49 +0000 (06:47 +0530)]
bpf: Introduce BPF support for kernel module function calls

This change adds support on the kernel side to allow for BPF programs to
call kernel module functions. Userspace will prepare an array of module
BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter.
In the kernel, the module BTFs are placed in the auxilliary struct for
bpf_prog, and loaded as needed.

The verifier then uses insn->off to index into the fd_array. insn->off
0 is reserved for vmlinux BTF (for backwards compat), so userspace must
use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is
sorted based on offset in an array, and each offset corresponds to one
descriptor, with a max limit up to 256 such module BTFs.

We also change existing kfunc_tab to distinguish each element based on
imm, off pair as each such call will now be distinct.

Another change is to check_kfunc_call callback, which now include a
struct module * pointer, this is to be used in later patch such that the
kfunc_id and module pointer are matched for dynamically registered BTF
sets from loadable modules, so that same kfunc_id in two modules doesn't
lead to check_kfunc_call succeeding. For the duration of the
check_kfunc_call, the reference to struct module exists, as it returns
the pointer stored in kfunc_btf_tab.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com
3 years agoMerge tag 'for-net-next-2021-10-01' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Tue, 5 Oct 2021 14:41:14 +0000 (07:41 -0700)]
Merge tag 'for-net-next-2021-10-01' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Add support for MediaTek MT7922 and MT7921
 - Enable support for AOSP extention in Qualcomm WCN399x and Realtek
   8822C/8852A.
 - Add initial support for link quality and audio/codec offload.
 - Rework of sockets sendmsg to avoid locking issues.
 - Add vhci suspend/resume emulation.

====================

Link: https://lore.kernel.org/r/20211001230850.3635543-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: usb: use eth_hw_addr_set() for dev->addr_len cases
Jakub Kicinski [Mon, 4 Oct 2021 16:05:22 +0000 (09:05 -0700)]
net: usb: use eth_hw_addr_set() for dev->addr_len cases

Convert usb drivers from memcpy(... dev->addr_len)
to eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, dev->addr_len)
  + eth_hw_addr_set(dev, np)

Manually checked these are either usbnet or pure etherdevs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoethernet: use eth_hw_addr_set() for dev->addr_len cases
Jakub Kicinski [Mon, 4 Oct 2021 16:05:21 +0000 (09:05 -0700)]
ethernet: use eth_hw_addr_set() for dev->addr_len cases

Convert all Ethernet drivers from memcpy(... dev->addr_len)
to eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, dev->addr_len)
  + eth_hw_addr_set(dev, np)

In theory addr_len may not be ETH_ALEN, but we don't expect
non-Ethernet devices to live under this directory, and only
the following cases of setting addr_len exist:
 - cxgb4 for mgmt device,
and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mlx4-const-dev_addr'
David S. Miller [Tue, 5 Oct 2021 12:15:35 +0000 (13:15 +0100)]
Merge branch 'mlx4-const-dev_addr'

Jakub Kicinski says:

====================
mlx4: prep for constant dev->dev_addr

This patch converts mlx4 for dev->dev_addr being const. It converts
to use of common helpers but also removes some seemingly unnecessary
idiosyncrasies.

Please review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlx4: constify args for const dev_addr
Jakub Kicinski [Mon, 4 Oct 2021 19:14:46 +0000 (12:14 -0700)]
mlx4: constify args for const dev_addr

netdev->dev_addr will become const soon. Make sure all
functions which pass it around mark appropriate args
as const.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlx4: remove custom dev_addr clearing
Jakub Kicinski [Mon, 4 Oct 2021 19:14:45 +0000 (12:14 -0700)]
mlx4: remove custom dev_addr clearing

mlx4_en_u64_to_mac() takes the dev->dev_addr pointer and writes
to it byte by byte. It also clears the two bytes _after_ ETH_ALEN
which seems unnecessary. dev->addr_len is set to ETH_ALEN just
before the call.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlx4: replace mlx4_u64_to_mac() with u64_to_ether_addr()
Jakub Kicinski [Mon, 4 Oct 2021 19:14:44 +0000 (12:14 -0700)]
mlx4: replace mlx4_u64_to_mac() with u64_to_ether_addr()

mlx4_u64_to_mac() predates the common helper but doesn't
make the argument constant.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlx4: replace mlx4_mac_to_u64() with ether_addr_to_u64()
Jakub Kicinski [Mon, 4 Oct 2021 19:14:43 +0000 (12:14 -0700)]
mlx4: replace mlx4_mac_to_u64() with ether_addr_to_u64()

mlx4_mac_to_u64() predates and opencodes ether_addr_to_u64().
It doesn't make the argument constant so it'll be problematic
when dev->dev_addr becomes a const. Convert to the generic helper.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetlink: remove netlink_broadcast_filtered
Florian Westphal [Tue, 5 Oct 2021 11:52:42 +0000 (13:52 +0200)]
netlink: remove netlink_broadcast_filtered

No users in tree since commit a3498436b3a0 ("netns: restrict uevents"),
so remove this functionality.

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mlx5-updates-2021-10-04' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Tue, 5 Oct 2021 10:42:38 +0000 (11:42 +0100)]
Merge tag 'mlx5-updates-2021-10-04' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-10-04

Misc updates for mlx5 driver

1) Add TX max rate support for MQPRIO channel mode
2) Trivial TC action and modify header refactoring
3) TC support for accept action in fdb offloads
4) Allow single IRQ for PCI functions

5) Bridge offload: Pop PVID VLAN header on egress miss

Vlad Buslov says:
=================

With current architecture of mlx5 bridge offload it is possible for a
packet to match in ingress table by source MAC (resulting VLAN header push
in case of port with configured PVID) and then miss in egress table when
destination MAC is not in FDB. Due to the lack of hardware learning in
NICs, this, in turn, results packet going to software data path with PVID
VLAN already added by hardware. This doesn't break software bridge since it
accepts either untagged packets or packets with any provisioned VLAN on
ports with PVID, but can break ingress TC, if affected part of Ethernet
header is matched by classifier.

Improve compatibility with software TC by restoring the packet header on
egress miss. Effectively, this change implements atomicity of mlx5 bridge
offload implementation - packet is either modified and redirected to
destination port or appears unmodified in software.

=================

=================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bgmac: support MDIO described in DT
Rafał Miłecki [Sat, 2 Oct 2021 17:58:12 +0000 (19:58 +0200)]
net: bgmac: support MDIO described in DT

Check ethernet controller DT node for "mdio" subnode and use it with
of_mdiobus_register() when present. That allows specifying MDIO and its
PHY devices in a standard DT based way.

This is required for BCM53573 SoC support. That family is sometimes
called Northstar (by marketing?) but is quite different from it. It uses
different CPU(s) and many different hw blocks.

One of shared blocks in BCM53573 is Ethernet controller. Switch however
is not SRAB accessible (as it Northstar) but is MDIO attached.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: bgmac: improve handling PHY
Rafał Miłecki [Sat, 2 Oct 2021 17:58:11 +0000 (19:58 +0200)]
net: bgmac: improve handling PHY

1. Use info from DT if available

It allows describing for example a fixed link. It's more accurate than
just guessing there may be one (depending on a chipset).

2. Verify PHY ID before trying to connect PHY

PHY addr 0x1e (30) is special in Broadcom routers and means a switch
connected as MDIO devices instead of a real PHY. Don't try connecting to
it.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoethernet: ehea: add missing cast
Jakub Kicinski [Tue, 5 Oct 2021 01:11:14 +0000 (18:11 -0700)]
ethernet: ehea: add missing cast

We need to cast the pointer, unlike memcpy() eth_hw_addr_set()
does not take void *. The driver already casts &port->mac_addr
to u8 * in other places.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a96d317fb1a3 ("ethernet: use eth_hw_addr_set()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosparc: Fix typo.
David S. Miller [Tue, 5 Oct 2021 10:32:56 +0000 (11:32 +0100)]
sparc: Fix typo.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlx5: Enable single IRQ for PCI Function
Shay Drory [Sun, 1 Aug 2021 09:08:49 +0000 (12:08 +0300)]
net/mlx5: Enable single IRQ for PCI Function

Prior to this patch the driver requires two IRQs to function properly,
one required IRQ for control and at least one required IRQ for IO.

This requirement can be relaxed to one as the driver now allows
sharing of IRQs, so control and IO EQs can share the same irq.

This is needed for high scale amount of VFs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Shift control IRQ to the last index
Shay Drory [Thu, 19 Aug 2021 13:18:57 +0000 (16:18 +0300)]
net/mlx5: Shift control IRQ to the last index

Control IRQ is the first IRQ vector. This complicates handling of
completion irqs as we need to offset them by one.
in the next patch, there are scenarios where completion and control EQs
will share the same irq. for example: functions with single IRQ. To ease
such scenarios, we shift control IRQ to the end of the irq array.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Bridge, pop VLAN on egress table miss
Vlad Buslov [Wed, 8 Sep 2021 15:22:12 +0000 (18:22 +0300)]
net/mlx5: Bridge, pop VLAN on egress table miss

Create lowest priority flow group in egress table with single rule that
matches on special reg_c1 value that is set on ingress VLAN push with
single action that pops VLAN. The flow destination is skip table that is
used to skip any further processing of packet in FDB bridge priority.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Bridge, mark reg_c1 when pushing VLAN
Vlad Buslov [Wed, 1 Sep 2021 16:05:02 +0000 (19:05 +0300)]
net/mlx5: Bridge, mark reg_c1 when pushing VLAN

On ingress VLAN push also assign value 0x7FE to reg_c1 tunnel id+opts
bits (tunnel id 0, which is not a valid tunnel id, and option 0x7FE which
was reserved by one of previous patches in the series). In following patch
the reg value is matched on egress miss to restore the packet to its
original state by removing the VLAN before passing it to the software data
path.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Bridge, extract VLAN pop code to dedicated functions
Vlad Buslov [Thu, 2 Sep 2021 08:12:16 +0000 (11:12 +0300)]
net/mlx5: Bridge, extract VLAN pop code to dedicated functions

Following patches in series need to pop VLAN when packet misses on egress.
To reuse existing bridge VLAN pop handling code, extract it to dedicated
helpers mlx5_esw_bridge_pkt_reformat_vlan_pop_supported() and
mlx5_esw_bridge_pkt_reformat_vlan_pop_create().

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Bridge, refactor eswitch instance usage
Vlad Buslov [Thu, 2 Sep 2021 08:42:36 +0000 (11:42 +0300)]
net/mlx5: Bridge, refactor eswitch instance usage

Several functions in bridge.c excessively obtain pointer to parent eswitch
instance by dereferencing br_offloads->esw on every usage and following
patches in this series add even more usages of eswitch. Introduce local
variable 'esw' and use it instead.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Support accept action
Vlad Buslov [Tue, 7 Sep 2021 15:47:02 +0000 (18:47 +0300)]
net/mlx5e: Support accept action

Support TC generic 'accept' action in mlx5 by introducing
MLX5_ESW_ATTR_FLAG_ACCEPT attribute flag. Flag has similar semantics to
existing MLX5_ESW_ATTR_FLAG_SLOW_PATH flag, however, dedicated flag is
required because existing 'slow path' flag can be flipped by tunneling
subsystem when neighbor changes state.

Introduce new helper function mlx5_esw_attr_flags_skip() to check whether
attribute flags for 'slow path' or 'accept' action are set and use it in
eswitch code instead of direct bit manipulation.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Specify out ifindex when looking up encap route
Chris Mi [Sun, 26 Sep 2021 09:17:49 +0000 (17:17 +0800)]
net/mlx5e: Specify out ifindex when looking up encap route

There is a use case that the local and remote VTEPs are in the same
host. Currently, the out ifindex is not specified when looking up the
encap route for offloads. So in this case, a local route is returned
and the route dev is lo.

Actual tunnel interface can be created with a parameter "dev" [1],
which specifies the physical device to use for tunnel endpoint
communication. Pass this parameter to driver when looking up encap
route for offloads. So that a unicast route will be returned.

[1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Reserve a value from TC tunnel options mapping
Vlad Buslov [Wed, 1 Sep 2021 15:22:37 +0000 (18:22 +0300)]
net/mlx5e: Reserve a value from TC tunnel options mapping

Reserve one more value from TC tunnel options range to be used by bridge
offload in following patches.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Move parse fdb check into actions_match_supported_fdb()
Roi Dayan [Sun, 15 Aug 2021 10:36:01 +0000 (13:36 +0300)]
net/mlx5e: Move parse fdb check into actions_match_supported_fdb()

The parse fdb/nic actions funcs parse the actions and then call
actions_match_supported() for final check.
Move related check in parse_tc_fdb_actions() into
actions_match_supported_fdb() for more organized code.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Split actions_match_supported() into a sub function
Roi Dayan [Sun, 15 Aug 2021 09:53:13 +0000 (12:53 +0300)]
net/mlx5e: Split actions_match_supported() into a sub function

There will probably be more checks, some for nic flows, some for fdb
flows and some are shared checks. Split it for fdb and nic to avoid
the function getting too big.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Move mod hdr allocation to a single place
Roi Dayan [Thu, 12 Aug 2021 12:46:34 +0000 (15:46 +0300)]
net/mlx5e: Move mod hdr allocation to a single place

Move mod hdr allocation chunk from parse_tc_fdb_actions() and
parse_tc_nic_actions() to a shared function.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Refactor sample offload error flow
Roi Dayan [Tue, 14 Sep 2021 08:32:51 +0000 (11:32 +0300)]
net/mlx5e: TC, Refactor sample offload error flow

Refactor sample unoffload to be symmetric to sample offload.
Use the existing del_post_rule() to release the post rule.
Also mlx5e_tc_sample_unoffload() should not return post_rule
which is NULL when post actions are supported.
Sample offload works with this NULL because many places of the
code use IS_ERR() instead of IS_ERR_OR_NULL() to check rule is valid
and when rule is detected as sample offload the code is not using the
rule. Let's be persistent and avoid returning NULL anyway and return the
pre rule, like in CT case, which is not NULL.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Add TX max rate support for MQPRIO channel mode
Tariq Toukan [Wed, 29 Sep 2021 13:30:38 +0000 (16:30 +0300)]
net/mlx5e: Add TX max rate support for MQPRIO channel mode

Add driver max_rate support for the MQPRIO bw_rlimit shaper
in channel mode.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Specify SQ stats struct for mlx5e_open_txqsq()
Tariq Toukan [Wed, 18 Aug 2021 11:17:33 +0000 (14:17 +0300)]
net/mlx5e: Specify SQ stats struct for mlx5e_open_txqsq()

Let the caller of mlx5e_open_txqsq() directly pass the SQ stats
structure pointer.
This replaces logic involving the qos_queue_group_id parameter,
and helps generalizing its role in the next patch.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoMerge branch 'phy-10g-mode-helper'
David S. Miller [Mon, 4 Oct 2021 12:50:05 +0000 (13:50 +0100)]
Merge branch 'phy-10g-mode-helper'

Russell King says:

====================
Add phylink helper for 10G modes

During the last cycle, there was discussion about adding a helper
to set the 10G link modes for phylink, which resulted in these two
patches introduce such a helper.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: use phylink_set_10g_modes()
Russell King (Oracle) [Mon, 4 Oct 2021 11:03:33 +0000 (12:03 +0100)]
net: ethernet: use phylink_set_10g_modes()

Update three drivers to use the new phylink_set_10g_modes() helper:
Cadence macb, Freescale DPAA2 and Marvell PP2.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phylink: add phylink_set_10g_modes() helper
Russell King (Oracle) [Mon, 4 Oct 2021 11:03:28 +0000 (12:03 +0100)]
net: phylink: add phylink_set_10g_modes() helper

Add a helper for setting 10Gigabit modes, so we have one central
place that sets all appropriate 10G modes for a driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ipv6: fix use after free of struct seg6_pernet_data
MichelleJin [Sat, 2 Oct 2021 22:33:32 +0000 (22:33 +0000)]
net: ipv6: fix use after free of struct seg6_pernet_data

sdata->tun_src should be freed before sdata is freed
because sdata->tun_src is allocated after sdata allocation.
So, kfree(sdata) and kfree(rcu_dereference_raw(sdata->tun_src)) are
changed code order.

Fixes: f04ed7d277e8 ("net: ipv6: check return value of rhashtable_init")

Signed-off-by: MichelleJin <shjy180909@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'qed-new-fw'
David S. Miller [Mon, 4 Oct 2021 11:55:49 +0000 (12:55 +0100)]
Merge branch 'qed-new-fw'

Prabhakar Kushwaha says:

====================
qed: new firmware version 8.59.1.0 support

This series integrate new firmware version 8.59.1.0, along with updated
HSI (hardware software interface) to use the FW, into the family of
qed drivers (fastlinq devices). This FW does not reside in the NVRAM.
It needs to be programmed to device during driver load as the part of
initialization sequence.

Similar to previous FW support series, this FW is tightly linked to
software and pf function driver. This means FW release is not backward
compatible, and driver should always run with the FW it was designed
against.

FW binary blob is already submitted & accepted in linux-firmware repo.

Patches in the series include:
patch 1     - qed: Fix kernel-doc warnings
patch 2     - qed: Remove e4_ and _e4 from FW HSI
patch 3     - qed: split huge qed_hsi.h header file
patch 4-8   - HSI (hardware software interface) changes
patch 9     - qed: Add '_GTT' suffix to the IRO RAM macros
patch 10    - qed: Update debug related changes
patch 11    - qed: rdma: Update TCP silly-window-syndrome timeout
patch 12    - qed: Update the TCP active termination 2  MSL timer
patch 13    - qed: fix ll2 establishment during load of RDMA driver

In addition, this patch series also fixes existing checkpatch warnings
and checks which are missing.

Changes for v2:
 - Incorporated Jakub's comments.
   - New patch introduced to fix all kernel-doc issue in qed driver.
   - Fixed warning: ‘qed_mfw_ext_20g’ defined but not used.
   - Fixed warning related to kernel-doc wrt to this series.
   - Removed inline function declaration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: fix ll2 establishment during load of RDMA driver
Manish Chopra [Mon, 4 Oct 2021 06:58:51 +0000 (09:58 +0300)]
qed: fix ll2 establishment during load of RDMA driver

If stats ID of a LL2 (light l2) queue exceeds than the total amount
of statistics counters, it may cause system crash upon enabling
RDMA on all PFs.

This patch makes sure that the stats ID of the LL2 queue doesn't exceed
the max allowed value.

Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Update the TCP active termination 2 MSL timer ("TIME_WAIT")
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:50 +0000 (09:58 +0300)]
qed: Update the TCP active termination 2 MSL timer ("TIME_WAIT")

Initialize 2 MSL timeout value used for the TCP TIME_WAIT state to
non-zero default.

This patch also removes magic number from qedi/qedi_main.c.

Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Update TCP silly-window-syndrome timeout for iwarp, scsi
Nikolay Assa [Mon, 4 Oct 2021 06:58:49 +0000 (09:58 +0300)]
qed: Update TCP silly-window-syndrome timeout for iwarp, scsi

Update TCP silly-window-syndrome timeout, for the cases where
initiator's small TCP window size prevents FW from transmitting
packets on the connection. Timeout causes FW to retransmit
window probes if needed, preventing I/O stall if initiator ignores
first window probe.

Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Update debug related changes
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:48 +0000 (09:58 +0300)]
qed: Update debug related changes

qed_debug features are updated to support FW version 8.59.1.0 along
with few enhancements.
  - Removal of _BB_K2 from register defines.
  - Add new condition cond14.
  - Add dump of new area sw-platform, epoch, iscsi_task_pages,
    fcoe_task_pages, roce_task_pages and eth_task_pages.
  - Introduced new functions qed_dbg_phy_size().
  - Update in qed_mcp_nvm_rd_cmd() declaration.
  - Allow QED to control init/exit at pf level.
  - Dump partial "ILT-dump" if buffer size is not sufficient.

This patch also fixes the existing checkpatch warnings and few important
checks.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Add '_GTT' suffix to the IRO RAM macros
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:47 +0000 (09:58 +0300)]
qed: Add '_GTT' suffix to the IRO RAM macros

GTT (Global translation table) is a fast-access window in the BAR into
the register space, which only maps certain register addresses.
This change helps enforce that only those addresses which are indeed
mapped by the GTT are being accessed through it.

Adding the '_GTT' suffix to the IRO FW memory (“RAM”) macros that
access GTT-able region in FW memories (“RAM”) and use GTT macros
to access RAM BAR from drivers.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Update FW init functions to support FW 8.59.1.0
Omkar Kulkarni [Mon, 4 Oct 2021 06:58:46 +0000 (09:58 +0300)]
qed: Update FW init functions to support FW 8.59.1.0

The qed_init_fw_func.c and qed_init_ops.c updated to support FW
version 8.59.1.0.
  - Support 16-bit VPORT WFQ (weighted fair queueing) weights.
  - Support WFQ (weighted fair queueing) weight per VPORT + TC.
  - Support allocation of Tx PQs(physical queues) per PF,VF.
  - Modify Global RL (rate limiter) upper bound configuration.
  - Update FW operation functions.
  - Update iro_arr[] array.

This patch also fixes the existing checkpatch warnings and few important
checks.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Use enum as per FW 8.59.1.0 in qed_iro_hsi.h
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:45 +0000 (09:58 +0300)]
qed: Use enum as per FW 8.59.1.0 in qed_iro_hsi.h

qed_iro_hsi.h contains HSI changes related to storm memories access.
Existing code is based on hard-coded index.
Use enum as defined for FW HSI 8.59.1.0, instead of hard-coded index.

This patch also removes unnecessary header file inclusion.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Update qed_hsi.h for fw 8.59.1.0
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:44 +0000 (09:58 +0300)]
qed: Update qed_hsi.h for fw 8.59.1.0

The qed_hsi.h has been updated to support new FW version 8.59.1.0 with
changes.
 - Updates FW HSI (Hardware Software interface) structures.
 - Addition/update in function declaration and defines as per HSI.
 - Add generic infrastructure for FW error reporting as part of
   common event queue handling.
 - Move malicious VF error reporting to FW error reporting
   infrastructure.
 - Move consolidation queue initialization from FW context to ramrod
   message.

qed_hsi.h header file changes lead to change in many files to ensure
compilation.

This patch also fixes the existing checkpatch warnings and few important
checks.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Update qed_mfw_hsi.h for FW ver 8.59.1.0
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:43 +0000 (09:58 +0300)]
qed: Update qed_mfw_hsi.h for FW ver 8.59.1.0

The qed_mfw_hsi.h contains HSI (Hardware Software Interface) changes
related to management firmware. It has been updated to support new FW
version 8.59.1.0 with below changes.
 - New defines for VF bitmap.
 - fec_mode and extended_speed defines updated in struct eth_phy_cfg.
 - Updated structutres lldp_system_tlvs_buffer_s, public_global,
   public_port, public_func, drv_union_data, public_drv_mb
   with all dependent new structures.
 - Updates in NVM related structures and defines.
 - Msg defines are added in enum drv_msg_code and fw_msg_code.
 - Updated/added new defines.

This patch also fixes the existing checkpatch warnings and few important
checks.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Update common_hsi for FW ver 8.59.1.0
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:42 +0000 (09:58 +0300)]
qed: Update common_hsi for FW ver 8.59.1.0

The common_hsi.h has been updated for FW version 8.59.1.0 with below
changes.
  - FW and Tools version.
  - New structures related to search table, packet duplication.
  - Structure for doorbell address for legacy mode without DEM.
  - Enhanced union rdma_eqe_data for RoCE Suspend Event Data.
  - New defines.

This patch also fixes the existing checkpatch warnings and few important
checks.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Split huge qed_hsi.h header file
Omkar Kulkarni [Mon, 4 Oct 2021 06:58:41 +0000 (09:58 +0300)]
qed: Split huge qed_hsi.h header file

The qed_hsi.h is a huge header file containing HSI (Hardware Software
Interface) definitions of storm memory access, debug related, general
and management firmware specific. In order to have a better
code-organization HSI definition, this patch split the code across
multiple files, i.e.
- storm memory access HSI : qed_iro_hsi.h
- debug related HSI       : qed_dbg_hsi.h
- Management firmware HSI : qed_mfg_hsi.h
- General HSI             : qed_hsi.h

In addition, this patch also fixes existing checkpatch warnings and
few important checks.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Remove e4_ and _e4 from FW HSI
Shai Malin [Mon, 4 Oct 2021 06:58:40 +0000 (09:58 +0300)]
qed: Remove e4_ and _e4 from FW HSI

The existing qed/qede/qedr/qedi/qedf code uses chip-specific naming in
structures,  functions, variables and defines in FW HSI (Hardware
Software Interface).

The new FW version introduced a generic naming convention in HSI
in-which the same code will be used across different versions
for simpler maintainability. It also eases in providing support for
new features.

With this patch every "_e4" or "e4_" prefix or suffix is not needed
anymore and it will be removed.

Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoqed: Fix kernel-doc warnings
Prabhakar Kushwaha [Mon, 4 Oct 2021 06:58:39 +0000 (09:58 +0300)]
qed: Fix kernel-doc warnings

This patch fixes all the qed and qede kernel-doc warnings
according to the guidelines that are described in
Documentation/doc-guide/kernel-doc.rst.

Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ipv6-ioam-encap'
David S. Miller [Mon, 4 Oct 2021 11:53:36 +0000 (12:53 +0100)]
Merge branch 'ipv6-ioam-encap'

Justin Iurman says:

====================
Support for the ip6ip6 encapsulation of IOAM

v2:
 - add prerequisite patches
 - keep uapi backwards compatible by adding two new attributes
 - add more comments to document the ioam6_iptunnel uapi

In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.

This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:

 - inline: directly inserts IOAM inside packets (by default).

 - encap:  ip6ip6 encapsulation of IOAM inside packets.

 - auto:   either inline mode for packets generated locally or encap mode for
           in-transit packets.

With current iproute2 implementation, it is configured this way:

$ ip -6 r [...] encap ioam6 trace prealloc [...]

The old syntax does not change (for backwards compatibility) and implicitly uses
the inline mode. With the new syntax, an encap mode can be specified:

(inline mode)
$ ip -6 r [...] encap ioam6 mode inline trace prealloc [...]

(encap mode)
$ ip -6 r [...] encap ioam6 mode encap tundst fc00::2 trace prealloc [...]

(auto mode)
$ ip -6 r [...] encap ioam6 mode auto tundst fc00::2 trace prealloc [...]

A tunnel destination address must be configured when using the encap mode or the
auto mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: Test for the IOAM encapsulation with IPv6
Justin Iurman [Sun, 3 Oct 2021 18:45:39 +0000 (20:45 +0200)]
selftests: net: Test for the IOAM encapsulation with IPv6

This patch adds support for testing the encap (ip6ip6) mode of IOAM.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: ioam: Add support for the ip6ip6 encapsulation
Justin Iurman [Sun, 3 Oct 2021 18:45:38 +0000 (20:45 +0200)]
ipv6: ioam: Add support for the ip6ip6 encapsulation

This patch adds support for the ip6ip6 encapsulation by providing three encap
modes: inline, encap and auto.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: ioam: Prerequisite patch for ioam6_iptunnel
Justin Iurman [Sun, 3 Oct 2021 18:45:37 +0000 (20:45 +0200)]
ipv6: ioam: Prerequisite patch for ioam6_iptunnel

This prerequisite patch provides some minor edits (alignments, renames) and a
minor modification inside a function to facilitate the next patch by using
existing nla_* functions.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: ioam: Distinguish input and output for hop-limit
Justin Iurman [Sun, 3 Oct 2021 18:45:36 +0000 (20:45 +0200)]
ipv6: ioam: Distinguish input and output for hop-limit

This patch anticipates the support for the IOAM insertion inside in-transit
packets, by making a difference between input and output in order to determine
the right value for its hop-limit (inherited from the IPv6 hop-limit).

Input case: happens before ip6_forward, the IPv6 hop-limit is not decremented
yet -> decrement the IOAM hop-limit to reflect the new hop inside the trace.

Output case: happens after ip6_forward, the IPv6 hop-limit has already been
decremented -> keep the same value for the IOAM hop-limit.

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlx4_en: avoid one cache line miss to ring doorbell
Eric Dumazet [Fri, 1 Oct 2021 00:52:49 +0000 (17:52 -0700)]
net/mlx4_en: avoid one cache line miss to ring doorbell

This patch caches doorbell address directly in struct mlx4_en_tx_ring.

This removes the need to bring in cpu caches whole struct mlx4_uar
in fast path.

Note that mlx4_uar is not guaranteed to be on a local node,
because mlx4_bf_alloc() uses a single free list (priv->bf_list)
regardless of its node parameter.

This kind of change does matter in presence of light/moderate traffic.
In high stress, this read-only line would be kept hot in caches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mctp-kunit-tests'
David S. Miller [Sun, 3 Oct 2021 13:35:42 +0000 (14:35 +0100)]
Merge branch 'mctp-kunit-tests'

Jeremy Kerr says:

====================
MCTP kunit tests

This change adds some initial kunit tests for the MCTP core. We'll
expand the coverage in a future series, and augment with a few
selftests, but this establishes a baseline set of tests for now.

Thanks to the kunit folks for the framework!

---

v2:
 - fix MCTP=m, KUNIT={y,m} breakage
 - fix mctp test netdev initialisation
 - strict route reference count checking
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomctp: Add input reassembly tests
Jeremy Kerr [Sun, 3 Oct 2021 03:17:08 +0000 (11:17 +0800)]
mctp: Add input reassembly tests

Add multi-packet route input tests, for message reassembly. These will
feed packets to be received by a bound socket, or dropped.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomctp: Add route input to socket tests
Jeremy Kerr [Sun, 3 Oct 2021 03:17:07 +0000 (11:17 +0800)]
mctp: Add route input to socket tests

Add a few tests for single-packet route inputs, testing the
mctp_route_input function.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomctp: Add packet rx tests
Jeremy Kerr [Sun, 3 Oct 2021 03:17:06 +0000 (11:17 +0800)]
mctp: Add packet rx tests

Add a few tests for the initial packet ingress through
mctp_pkttype_receive function; mainly packet header sanity checks. Full
input routing checks will be added as a separate change.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomctp: Add test utils
Jeremy Kerr [Sun, 3 Oct 2021 03:17:05 +0000 (11:17 +0800)]
mctp: Add test utils

Add a new object for shared test utilities

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomctp: Add initial test structure and fragmentation test
Jeremy Kerr [Sun, 3 Oct 2021 03:17:04 +0000 (11:17 +0800)]
mctp: Add initial test structure and fragmentation test

This change adds the first kunit test for the mctp subsystem, and an
initial test for the fragmentation path.

We're adding tests under a new net/mctp/test/ directory.

Incorporates a fix for module configs:

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: wwan: iosm: correct devlink extra params
M Chetan Kumar [Sat, 2 Oct 2021 14:32:12 +0000 (20:02 +0530)]
net: wwan: iosm: correct devlink extra params

1. Removed driver specific extra params like download_region,
   address & region_count. The required information is passed
   as part of flash API.
2. IOSM Devlink documentation updated to reflect the same.

Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>