review.tizen.org Git - platform/kernel/linux-rpi.git/log

nfp: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on NFP.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

s390: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on s390.

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ppc: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on ppc.

For JMP32 | JSET, instruction encoding for PPC_RLWINM_DOT is added to check
the result of ANDing low 32-bit of operands.

Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

arm: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on arm.

For JSET, "ands" (AND with flags updated) is used, so corresponding
encoding helper is added.

Cc: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

arm64: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on arm64.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

x32: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on x32.
Also fixed several reverse xmas tree coding style issues as I am there.

Cc: Wang YanQing <udknight@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

x86_64: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on x86_64.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: JIT blinds support JMP32

This patch adds JIT blinds support for JMP32.

Like BPF_JMP_REG/IMM, JMP32 version are needed for building raw bpf insn.
They are added to both include/linux/filter.h and
tools/include/linux/filter.h.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: interpreter support for JMP32

This patch implements interpreting new JMP32 instructions.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: teach cfg code about JMP32

The cfg code need to be aware of the new JMP32 instruction class so it
could partition functions correctly.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: disassembler support JMP32

This patch teaches disassembler about JMP32. There are two places to
update:

  - Class 0x6 now used by BPF_JMP32, not "unused".

  - BPF_JMP32 need to show comparison operands properly.
    The disassemble format is to add an extra "(32)" before the operands if
    it is a sub-register. A better disassemble format for both JMP32 and
    ALU32 just show the register prefix as "w" instead of "r", this is the
    format using by LLVM assembler.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: verifier support JMP32

This patch teach verifier about the new BPF_JMP32 instruction class.
Verifier need to treat it similar as the existing BPF_JMP class.
A BPF_JMP32 insn needs to go through all checks that have been done on
BPF_JMP.

Also, verifier is doing runtime optimizations based on the extra info
conditional jump instruction could offer, especially when the comparison is
between constant and register that the value range of the register could be
improved based on the comparison results. These code are updated
accordingly.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: refactor verifier min/max code for condition jump

The current min/max code does both signed and unsigned comparisons against
the input argument "val" which is "u64" and there is explicit type casting
when the comparison is signed.

As we will need slightly more complexer type casting when JMP32 introduced,
it is better to host the signed type casting. This makes the code more
clean with ignorable runtime overhead.

Also, code for J*GE/GT/LT/LE and JEQ/JNE are very similar, this patch
combine them.

The main purpose for this refactor is to make sure the min/max code will
still be readable and with minimum code duplication after JMP32 introduced.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: allocate 0x06 to new eBPF instruction class JMP32

The new eBPF instruction class JMP32 uses the reserved class number 0x6.
Kernel BPF ISA documentation updated accordingly.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'af-xdp-sock-diag'

Björn Töpel says:

====================
This series adds an AF_XDP sock_diag interface for querying sockets
from user-space. Tools like iproute2 ss(8) can use this interface to
list open AF_XDP sockets.

The diagnostic provides information about the Rx/Tx/fill/completetion
rings, umem, memory usage and such. For a complete list, please refer
to the xsk_diag.c file.

The AF_XDP sock_diag interface is optional, and can be built as a
module. A separate patch series, adding ss(8) iproute2 support, will
follow.

v1->v2:
  * Removed extra newline
  * Zero-out all user-space facing structures prior setting the
    members
  * Added explicit "pad" member in _msg struct
  * Removed unused variable "req" in xsk_diag_handler_dump()

Thanks to Daniel for reviewing the series!
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

xsk: add sock_diag interface for AF_XDP

This patch adds the sock_diag interface for querying sockets from user
space. Tools like iproute2 ss(8) can use this interface to list open
AF_XDP sockets.

The user-space ABI is defined in linux/xdp_diag.h and includes netlink
request and response structs. The request can query sockets and the
response contains socket information about the rings, umems, inode and
more.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

xsk: add id to umem

This commit adds an id to the umem structure. The id uniquely
identifies a umem instance, and will be exposed to user-space via the
socket monitoring interface.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

net: xsk: track AF_XDP sockets on a per-netns list

Track each AF_XDP socket in a per-netns list. This will be used later
by the sock_diag interface for querying sockets from userspace.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: suppress readelf stderr when probing for BTF support

Before:
$ make -s -C tools/testing/selftests/bpf
readelf: Error: Missing knowledge of 32-bit reloc types used in DWARF
sections of machine number 247
readelf: Warning: unable to apply unsupported reloc type 10 to section
.debug_info
readelf: Warning: unable to apply unsupported reloc type 1 to section
.debug_info
readelf: Warning: unable to apply unsupported reloc type 10 to section
.debug_info

After:
$ make -s -C tools/testing/selftests/bpf

v2:
* use llvm-readelf instead of redirecting binutils' readelf stderr to
/dev/null

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: allow BPF programs access skb_shared_info->gso_segs field

This adds the ability to read gso_segs from a BPF program.

v3: Use BPF_REG_AX instead of BPF_REG_TMP for the temporary register,
as suggested by Martin.

v2: refined Eddie Hao patch to address Alexei feedback.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eddie Hao <eddieh@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: feature probing, change default action

When 'bpftool feature' is executed it shows incorrect help string.

test# bpftool feature
Usage: bpftool bpftool probe [COMPONENT] [macros [prefix PREFIX]]
bpftool bpftool help

COMPONENT := { kernel | dev NAME }

Instead of fixing the help text by tweaking argv[] indices, this
patch changes the default action to 'probe'. It makes the behavior
consistent with other subcommands, where first subcommand without
extra parameter results in 'show' action.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'dead-code-elimination'

Jakub Kicinski says:

====================
This set adds support for complete removal of dead code.

Patch 3 contains all the code removal logic, patches 2 and 4
additionally optimize branches around and to dead code.

Patches 6 and 7 allow offload JITs to take advantage of the
optimization.  After a few small clean ups (8, 9, 10) nfp
support is added (11, 12).

Removing code directly in the verifier makes it easy to adjust
the relevant metadata (line info, subprogram info).  JITs for
code store constrained architectures would have hard time
performing such adjustments at JIT level.  Removing subprograms
or line info is very hard once BPF core finished the verification.
For user space to perform dead code removal it would have to perform
the execution simulation/analysis similar to what the verifier does.

v3:
- fix uninitilized var warning in GCC 6 (buildbot).
v4:
- simplify the linfo-keeping logic (Yonghong).  Instead of
   trying to figure out that we are removing first instruction
   of a subprogram, just always keep last dead line info, if
   first live instruction doesn't have one.
v5:
- improve comments (Martin Lau).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: support removing dead code

Add a verifier callback to the nfp JIT to remove the instructions
the verifier deemed to be dead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: support optimizing dead branches

Verifier will now optimize out branches to dead code, implement
the replace_insn callback to take advantage of that optimization.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: save original program length

Instead of passing env->prog->len around, and trying to adjust
for optimized out instructions just save the initial number
of instructions in struct nfp_prog.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: split up the skip flag

We fail program loading if jump lands on a skipped instruction.
This is for historical reasons, it used to be that we only skipped
instructions optimized out based on prior context, and therefore
the optimization would be buggy if we jumped directly to such
instruction (because the context would be skipped by the jump).

There are cases where instructions can be skipped without any
context, for example there is no point in generating code for:

r0 |= 0

We will also soon support dropping dead code, so make the skip
logic differentiate between "optimized with preceding context"
vs other skip types.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: don't use instruction number for jump target

Instruction number is meaningless at code gen phase.  The target
of the instruction is overwritten by nfp_fixup_branches().  The
convention is to put the raw offset in target address as a place
holder.  See cmp_* functions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: notify offload JITs about optimizations

Let offload JITs know when instructions are replaced and optimized
out, so they can update their state appropriately. The optimizations
are best effort, if JIT returns an error from any callback verifier
will stop notifying it as state may now be out of sync, but the
verifier continues making progress.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: verifier: record original instruction index

The communication between the verifier and advanced JITs is based
on instruction indexes. We have to keep them stable throughout
the optimizations otherwise referring to a particular instruction
gets messy quickly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests: bpf: add tests for dead code removal

Add tests for newly added dead code elimination. Both verifier
and BTF tests are added. BTF test infrastructure has to be
extended to be able to account for line info which is eliminated
during dead code removal.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: verifier: remove unconditional branches by 0

Unconditional branches by 0 instructions are basically noops
but they can result from earlier optimizations, e.g. a conditional
jumps which would never be taken or a conditional jump around
dead code.

Remove those branches.

v0.2:
- s/opt_remove_dead_branches/opt_remove_nops/ (Jiong).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: verifier: remove dead code

Instead of overwriting dead code with jmp -1 instructions
remove it completely for root. Adjust verifier state and
line info appropriately.

v2:
- adjust func_info (Alexei);
- make sure first instruction retains line info (Alexei).
v4: (Yonghong)
- remove unnecessary if (!insn to remove) checks;
- always keep last line info if first live instruction lacks one.
v5: (Martin Lau)
- improve and clarify comments.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: verifier: hard wire branches to dead code

Loading programs with dead code becomes more and more
common, as people begin to patch constants at load time.
Turn conditional jumps to unconditional ones, to avoid
potential branch misprediction penalty.

This optimization is enabled for privileged users only.

For branches which just fall through we could just mark
them as not seen and have dead code removal take care of
them, but that seems less clean.

v0.2:
- don't call capable(CAP_SYS_ADMIN) twice (Jiong).
v3:
- fix GCC warning;

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: change parameters of call/branch offset adjustment

In preparation for code removal change parameters to branch
and call adjustment functions to be more universal.  The
current parameters assume we are patching a single instruction
with a longer set.

A diagram may help reading the change, this is for the patch
single case, patching instruction 1 with a replacement of 4:
   ____
0 |____|
1 |____| <-- pos                ^
2 |    | <-- end old  ^         |
3 |    |              |  delta  |  len
4 |____|              |         |  (patch region)
5 |    | <-- end new  v         v
6 |____|

end_old = pos + 1
end_new = pos + delta + 1

If we are before the patch region - curr variable and the target
are fully in old coordinates (hence comparing against end_old).
If we are after the region curr is in new coordinates (hence
the comparison to end_new) but target is in mixed coordinates,
so we just check if it falls before end_new, and if so it needs
the adjustment.

Note that we will not fix up branches which land in removed region
in case of removal, which should be okay, as we are only going to
remove dead code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: don't hardcode iptables/nc path in test_tcpnotify_user

system() is calling shell which should find the appropriate full path
via $PATH. On some systems, full path to iptables and/or nc might be
different that we one we have hardcoded.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

libbpf: Show supported ELF section names when failing to guess prog/attach type

We need to let users check their wrong ELF section name with proper
ELF section names when they fail to get a prog/attach type from it.
Because users can't realize libbpf guess prog/attach types from given
ELF section names. For example, when a 'cgroup' section name of a
BPF program is used, show available ELF section names(types).

Before:

    $ bpftool prog load bpf-prog.o /sys/fs/bpf/prog1
    Error: failed to guess program type based on ELF section name cgroup

After:

    libbpf: failed to guess program type based on ELF section name 'cgroup'
    libbpf: supported section(type) names are: socket kprobe/ kretprobe/ classifier action tracepoint/ raw_tracepoint/ xdp perf_event lwt_in lwt_out lwt_xmit lwt_seg6local cgroup_skb/ingress cgroup_skb/egress cgroup/skb cgroup/sock cgroup/post_bind4 cgroup/post_bind6 cgroup/dev sockops sk_skb/stream_parser sk_skb/stream_verdict sk_skb sk_msg lirc_mode2 flow_dissector cgroup/bind4 cgroup/bind6 cgroup/connect4 cgroup/connect6 cgroup/sendmsg4 cgroup/sendmsg6

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Andrey Ignatov <rdna@fb.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: btf: add btf documentation

This patch added documentation for BTF (BPF Debug Format).
The document is placed under linux:Documentation/bpf directory.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpftool-probes'

Quentin Monnet says:

====================
Hi,
This set adds a new command to bpftool in order to dump a list of
eBPF-related parameters for the system (or for a specific network
device) to the console. Once again, this is based on a suggestion from
Daniel.

At this time, output includes:

    - Availability of bpf() system call
    - Availability of bpf() system call for unprivileged users
    - JIT status (enabled or not, with or without debugging traces)
    - JIT hardening status
    - JIT kallsyms exports status
    - Global memory limit for JIT compiler for unprivileged users
    - Status of kernel compilation options related to BPF features
    - Availability of known eBPF program types
    - Availability of known eBPF map types
    - Availability of known eBPF helper functions

There are three different ways to dump this information at this time:

    - Plain output dumps probe results in plain text. It is the most
      flexible options for providing descriptive output to the user, but
      should not be relied upon for parsing the output.
    - JSON output is supported.
    - A third mode, available through the "macros" keyword appended to the
      command line, dumps some of those parameters (not all) as a series of
      "#define" directives, that can be included into a C header file for
      example.

Probes for supported program and map types, and supported helpers, are
directly added to libbpf, so that other applications (or selftests) can
reuse them as necessary.

If the user does not have root privileges (or more precisely, the
CAP_SYS_ADMIN capability) detection will be erroneous for most
parameters. Therefore, forbid non-root users to run the command.

v5:
- Move exported symbols to a new LIBBPF_0.0.2 section in libbpf.map
  (patches 4 to 6).
- Minor fixes on patches 3 and 4.

v4:
- Probe bpf_jit_limit parameter (patch 2).
- Probe some additional kernel config options (patch 3).
- Minor fixes on patch 6.

v3:
- Do not probe kernel version in bpftool (just retrieve it to probe support
  for kprobes in libbpf).
- Change the way results for helper support is displayed: now one list of
  compatible helpers for each program type (and C-style output gets a
  HAVE_PROG_TYPE_HELPER(prog_type, helper) macro to help with tests. See
  patches 6, 7.
- Address other comments from feedback from v2 (please refer to individual
  patches' history).

v2 (please also refer to individual patches' history):
- Move probes for prog/map types, helpers, from bpftool to libbpf.
- Move C-style output as a separate patch, and restrict it to a subset of
  collected information (bpf() availability, prog/map types, helpers).
- Now probe helpers with all supported program types, and display a list of
  compatible program types (as supported on the system) for each helper.
- NOT addressed: grouping compilation options for kernel into subsections
  (patch 3) (I don't see an easy way of grouping them at the moment, please
  see also the discussion on v1 thread).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add bash completion for bpftool probes

Add the bash completion related to the newly introduced "bpftool feature
probe" command.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add probes for a network device

bpftool gained support for probing the current system in order to see
what program and map types, and what helpers are available on that
system. This patch adds the possibility to pass an interface index to
libbpf (and hence to the kernel) when trying to load the programs or to
create the maps, in order to see what items a given network device can
support.

A new keyword "dev <ifname>" can be used as an alternative to "kernel"
to indicate that the given device should be tested. If no target ("dev"
or "kernel") is specified bpftool defaults to probing the kernel.

Sample output:

    # bpftool -p feature probe dev lo
    {
        "syscall_config": {
            "have_bpf_syscall": true
        },
        "program_types": {
            "have_sched_cls_prog_type": false,
            "have_xdp_prog_type": false
        },
        ...
    }

As the target is a network device, /proc/ parameters and kernel
configuration are NOT dumped. Availability of the bpf() syscall is
still probed, so we can return early if that syscall is not usable
(since there is no point in attempting the remaining probes in this
case).

Among the program types, only the ones that can be offloaded are probed.
All map types are probed, as there is no specific rule telling which one
could or could not be supported by a device in the future. All helpers
are probed (but only for offload-able program types).

Caveat: as bpftool does not attempt to attach programs to the device at
the moment, probes do not entirely reflect what the device accepts:
typically, for Netronome's nfp, results will announce that TC cls
offload is available even if support has been deactivated (with e.g.
ethtool -K eth1 hw-tc-offload off).

v2:
- All helpers are probed, whereas previous version would only probe the
  ones compatible with an offload-able program type. This is because we
  do not keep a default compatible program type for each helper anymore.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add C-style "#define" output for probes

Make bpftool able to dump a subset of the parameters collected by
probing the system as a listing of C-style #define macros, so that
external projects can reuse the result of this probing and build
BPF-based project in accordance with the features available on the
system.

The new "macros" keyword is used to select this output. An additional
"prefix" keyword is added so that users can select a custom prefix for
macro names, in order to avoid any namespace conflict.

Sample output:

    # bpftool feature probe kernel macros prefix FOO_
    /*** System call availability ***/
    #define FOO_HAVE_BPF_SYSCALL

    /*** eBPF program types ***/
    #define FOO_HAVE_SOCKET_FILTER_PROG_TYPE
    #define FOO_HAVE_KPROBE_PROG_TYPE
    #define FOO_HAVE_SCHED_CLS_PROG_TYPE
    ...

    /*** eBPF map types ***/
    #define FOO_HAVE_HASH_MAP_TYPE
    #define FOO_HAVE_ARRAY_MAP_TYPE
    #define FOO_HAVE_PROG_ARRAY_MAP_TYPE
    ...

    /*** eBPF helper functions ***/
    /*
     * Use FOO_HAVE_PROG_TYPE_HELPER(prog_type_name, helper_name)
     * to determine if <helper_name> is available for <prog_type_name>,
     * e.g.
     *      #if FOO_HAVE_PROG_TYPE_HELPER(xdp, bpf_redirect)
     *              // do stuff with this helper
     *      #elif
     *              // use a workaround
     *      #endif
     */
    #define FOO_HAVE_PROG_TYPE_HELPER(prog_type, helper)        \
            FOO_BPF__PROG_TYPE_ ## prog_type ## __HELPER_ ## helper
    ...
    #define FOO_BPF__PROG_TYPE_socket_filter__HELPER_bpf_probe_read 0
    #define FOO_BPF__PROG_TYPE_socket_filter__HELPER_bpf_ktime_get_ns 1
    #define FOO_BPF__PROG_TYPE_socket_filter__HELPER_bpf_trace_printk 1
    ...

v3:
- Change output for helpers again: add a
  HAVE_PROG_TYPE_HELPER(type, helper) macro that can be used to tell
  if <helper> is available for program <type>.

v2:
- #define-based output added as a distinct patch.
- "HAVE_" prefix appended to macro names.
- Output limited to bpf() syscall availability, BPF prog and map types,
  helper functions. In this version kernel config options, procfs
  parameter or kernel version are intentionally left aside.
- Following the change on helper probes, format for helper probes in
  this output style has changed (now a list of compatible program
  types).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add probes for eBPF helper functions

Similarly to what was done for program types and map types, add a set of
probes to test the availability of the different eBPF helper functions
on the current system.

For each known program type, all known helpers are tested, in order to
establish a compatibility matrix. Output is provided as a set of lists
of available helpers, one per program type.

Sample output:

    # bpftool feature probe kernel
    ...
    Scanning eBPF helper functions...
    eBPF helpers supported for program type socket_filter:
            - bpf_map_lookup_elem
            - bpf_map_update_elem
            - bpf_map_delete_elem
    ...
    eBPF helpers supported for program type kprobe:
            - bpf_map_lookup_elem
            - bpf_map_update_elem
            - bpf_map_delete_elem
    ...

    # bpftool --json --pretty feature probe kernel
    {
        ...
        "helpers": {
            "socket_filter_available_helpers": ["bpf_map_lookup_elem", \
                    "bpf_map_update_elem","bpf_map_delete_elem", ...
            ],
            "kprobe_available_helpers": ["bpf_map_lookup_elem", \
                    "bpf_map_update_elem","bpf_map_delete_elem", ...
            ],
            ...
        }
    }

v5:
- In libbpf.map, move global symbol to the new LIBBPF_0.0.2 section.

v4:
- Use "enum bpf_func_id" instead of "__u32" in bpf_probe_helper()
  declaration for the type of the argument used to pass the id of
  the helper to probe.
- Undef BPF_HELPER_MAKE_ENTRY after using it.

v3:
- Do not pass kernel version from bpftool to libbpf probes (kernel
  version for testing program with kprobes is retrieved directly from
  libbpf).
- Dump one list of available helpers per program type (instead of one
  list of compatible program types per helper).

v2:
- Move probes from bpftool to libbpf.
- Test all program types for each helper, print a list of working prog
  types for each helper.
- Fall back on include/uapi/linux/bpf.h for names and ids of helpers.
- Remove C-style macros output from this patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add probes for eBPF map types

Add new probes for eBPF map types, to detect what are the ones available
on the system. Try creating one map of each type, and see if the kernel
complains.

Sample output:

    # bpftool feature probe kernel
    ...
    Scanning eBPF map types...
    eBPF map_type hash is available
    eBPF map_type array is available
    eBPF map_type prog_array is available
    ...

    # bpftool --json --pretty feature probe kernel
    {
        ...
        "map_types": {
            "have_hash_map_type": true,
            "have_array_map_type": true,
            "have_prog_array_map_type": true,
            ...
        }
    }

v5:
- In libbpf.map, move global symbol to the new LIBBPF_0.0.2 section.

v3:
- Use a switch with all enum values for setting specific map parameters,
  so that gcc complains at compile time (-Wswitch-enum) if new map types
  were added to the kernel but libbpf was not updated.

v2:
- Move probes from bpftool to libbpf.
- Remove C-style macros output from this patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add probes for eBPF program types

Introduce probes for supported BPF program types in libbpf, and call it
from bpftool to test what types are available on the system. The probe
simply consists in loading a very basic program of that type and see if
the verifier complains or not.

Sample output:

    # bpftool feature probe kernel
    ...
    Scanning eBPF program types...
    eBPF program_type socket_filter is available
    eBPF program_type kprobe is available
    eBPF program_type sched_cls is available
    ...

    # bpftool --json --pretty feature probe kernel
    {
        ...
        "program_types": {
            "have_socket_filter_prog_type": true,
            "have_kprobe_prog_type": true,
            "have_sched_cls_prog_type": true,
            ...
        }
    }

v5:
- In libbpf.map, move global symbol to a new LIBBPF_0.0.2 section.
- Rename (non-API function) prog_load() as probe_load().

v3:
- Get kernel version for checking kprobes availability from libbpf
  instead of from bpftool. Do not pass kernel_version as an argument
  when calling libbpf probes.
- Use a switch with all enum values for setting specific program
  parameters just before probing, so that gcc complains at compile time
  (-Wswitch-enum) if new prog types were added to the kernel but libbpf
  was not updated.
- Add a comment in libbpf.h about setrlimit() usage to allow many
  consecutive probe attempts.

v2:
- Move probes from bpftool to libbpf.
- Remove C-style macros output from this patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add probes for kernel configuration options

Add probes to dump a number of options set (or not set) for compiling
the kernel image. These parameters provide information about what BPF
components should be available on the system. A number of them are not
directly related to eBPF, but are in fact used in the kernel as
conditions on which to compile, or not to compile, some of the eBPF
helper functions.

Sample output:

    # bpftool feature probe kernel
    Scanning system configuration...
    ...
    CONFIG_BPF is set to y
    CONFIG_BPF_SYSCALL is set to y
    CONFIG_HAVE_EBPF_JIT is set to y
    ...

    # bpftool --pretty --json feature probe kernel
    {
        "system_config": {
            ...
            "CONFIG_BPF": "y",
            "CONFIG_BPF_SYSCALL": "y",
            "CONFIG_HAVE_EBPF_JIT": "y",
            ...
        }
    }

v5:
- Declare options[] array in probe_kernel_image_config() as static.

v4:
- Add some options to the list:
    - CONFIG_TRACING
    - CONFIG_KPROBE_EVENTS
    - CONFIG_UPROBE_EVENTS
    - CONFIG_FTRACE_SYSCALLS
- Add comments about those options in the source code.

v3:
- Add a comment about /proc/config.gz not being supported as a path for
  the config file at this time.
- Use p_info() instead of p_err() on failure to get options from config
  file, as bpftool keeps probing other parameters and that would
  possibly create duplicate "error" entries for JSON.

v2:
- Remove C-style macros output from this patch.
- NOT addressed: grouping of those config options into subsections
  (I don't see an easy way of grouping them at the moment, please see
  also the discussion on v1 thread).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add probes for /proc/ eBPF parameters

Add a set of probes to dump the eBPF-related parameters available from
/proc/: availability of bpf() syscall for unprivileged users,
JIT compiler status and hardening status, kallsyms exports status.

Sample output:

    # bpftool feature probe kernel
    Scanning system configuration...
    bpf() syscall for unprivileged users is enabled
    JIT compiler is disabled
    JIT compiler hardening is disabled
    JIT compiler kallsyms exports are disabled
    Global memory limit for JIT compiler for unprivileged users \
            is 264241152 bytes
    ...

    # bpftool --json --pretty feature probe kernel
    {
        "system_config": {
            "unprivileged_bpf_disabled": 0,
            "bpf_jit_enable": 0,
            "bpf_jit_harden": 0,
            "bpf_jit_kallsyms": 0,
            "bpf_jit_limit": 264241152
        },
        ...
    }

These probes are skipped if procfs is not mounted.

v4:
- Add bpf_jit_limit parameter.

v2:
- Remove C-style macros output from this patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools: bpftool: add basic probe capability, probe syscall availability

Add a new component and command for bpftool, in order to probe the
system to dump a set of eBPF-related parameters so that users can know
what features are available on the system.

Parameters are dumped in plain or JSON output (with -j/-p options).

The current patch introduces probing of one simple parameter:
availability of the bpf() system call. Later commits
will add other probes.

Sample output:

    # bpftool feature probe kernel
    Scanning system call availability...
    bpf() syscall is available

    # bpftool --json --pretty feature probe kernel
    {
        "syscall_config": {
            "have_bpf_syscall": true
        }
    }

The optional "kernel" keyword enforces probing of the current system,
which is the only possible behaviour at this stage. It can be safely
omitted.

The feature comes with the relevant man page, but bash completion will
come in a dedicated commit.

v3:
- Do not probe kernel version. Contrarily to what is written below for
  v2, we can have the kernel version retrieved in libbpf instead of
  bpftool (in the patch adding probing for program types).

v2:
- Remove C-style macros output from this patch.
- Even though kernel version is no longer needed for testing kprobes
  availability, note that we still collect it in this patch so that
  bpftool gets able to probe (in next patches) older kernels as well.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: fix a (false) compiler warning

An older GCC compiler complains:

kernel/bpf/verifier.c: In function 'bpf_check':
kernel/bpf/verifier.c:4***:13: error: 'prev_offset' may be used uninitialized
      in this function [-Werror=maybe-uninitialized]
   } else if (krecord[i].insn_offset <= prev_offset) {
             ^
kernel/bpf/verifier.c:4***:38: note: 'prev_offset' was declared here
  u32 i, nfuncs, urec_size, min_size, prev_offset;

Although the compiler is wrong here, the patch makes sure
that prev_offset is always initialized, just to silence the warning.

v2: fix a spelling error in the commit message.

Signed-off-by: Peter Oskolkov <posk@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-bpftool-queue-stack'

Stanislav Fomichev says:

====================
This patch series add support for queue/stack manipulations.

It goes like this:

commands by permitting empty keys.

v2:
* removed unneeded jsonw_null from patch #6
* improved bash completions (and moved them into separate patch #7)
====================

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: add bash completion for peek/push/enqueue/pop/dequeue

bpftool map peek id <TAB>      - suggests only queue and stack map ids
bpftool map pop id <TAB>       - suggests only stack map ids
bpftool map dequeue id <TAB>   - suggests only queue map ids

bpftool map push id <TAB>      - suggests only stack map ids
bpftool map enqueue id <TAB>   - suggests only queue map ids

bpftool map push id 1 <TAB>    - suggests 'value', not 'key'
bpftool map enqueue id 2 <TAB> - suggests 'value', not 'key'

bpftool map update id <stack/queue type> - suggests 'value', not 'key'
bpftool map lookup id <stack/queue type> - suggests nothing

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: add pop and dequeue commands

This is intended to be used with queues and stacks, it pops and prints
the last element via bpf_map_lookup_and_delete_elem.

Example:

bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map push pinned /sys/fs/bpf/q value 0 1 2 3
bpftool map pop pinned /sys/fs/bpf/q
value: 00 01 02 03
bpftool map pop pinned /sys/fs/bpf/q
Error: empty map

bpftool map create /sys/fs/bpf/s type stack value 4 entries 10 name s
bpftool map enqueue pinned /sys/fs/bpf/s value 0 1 2 3
bpftool map dequeue pinned /sys/fs/bpf/s
value: 00 01 02 03
bpftool map dequeue pinned /sys/fs/bpf/s
Error: empty map

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: add push and enqueue commands

This is intended to be used with queues and stacks and be more
user-friendly than 'update' without the key.

Example:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map push pinned /sys/fs/bpf/q value 0 1 2 3
bpftool map peek pinned /sys/fs/bpf/q
value: 00 01 02 03

bpftool map create /sys/fs/bpf/s type stack value 4 entries 10 name s
bpftool map enqueue pinned /sys/fs/bpf/s value 0 1 2 3
bpftool map peek pinned /sys/fs/bpf/s
value: 00 01 02 03

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: add peek command

This is intended to be used with queues and stacks and be more
user-friendly than 'lookup' without key/value.

Example:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3
bpftool map peek pinned /sys/fs/bpf/q
value: 00 01 02 03

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: don't print empty key/value for maps

When doing dump or lookup, don't print key if key_size == 0 or value if
value_size == 0. The initial usecase is queue and stack, where we have
only values.

This is for regular output only, json still has all the fields.

Before:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3
bpftool map lookup pinned /sys/fs/bpf/q
key: value: 00 01 02 03

After:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3
bpftool map lookup pinned /sys/fs/bpf/q
value: 00 01 02 03

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: make key optional in lookup command

Bpftool expects key for 'lookup' operations. For some map types, key should
not be specified. Support looking up those map types.

Before:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3
bpftool map lookup pinned /sys/fs/bpf/q
Error: did not find key

After:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3
bpftool map lookup pinned /sys/fs/bpf/q
key: value: 00 01 02 03

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpftool: make key and value optional in update command

Bpftool expects both key and value for 'update' operations. For some
map types, key should not be specified. Support updating those map types.

Before:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3
Error: did not find key

After:
bpftool map create /sys/fs/bpf/q type queue value 4 entries 10 name q
bpftool map update pinned /sys/fs/bpf/q value 0 1 2 3

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-int128-btf'

Yonghong Song says:

====================
Previous maximum supported integer bit width is 64. But
the __int128 type has been supported by most (if not all)
64bit architectures including bpf for both gcc and clang.

The kernel itself uses __int128 for x64 and arm64. Some bcc
tools are using __int128 in bpf programs to describe ipv6
addresses. Without 128bit int support, the vmlinux BTF won't
work and those bpf programs using __int128 cannot utilize BTF.

This patch set therefore implements BTF __int128 support.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpf: support __int128 in bpftool map pretty dumper

For formatted output, currently when json is enabled, the decimal
number is required. Similar to kernel bpffs printout,
for int128 numbers, only hex numbers are dumped, which are
quoted as strings.

The below is an example to show plain and json pretty print
based on the map in test_btf pretty print test.

  $ bpftool m s
  75: hash  name pprint_test_has  flags 0x0
        key 4B  value 112B  max_entries 4  memlock 4096B
  $ bpftool m d id 75
  ......
    {
        "key": 3,
        "value": {
            "ui32": 3,
            "ui16": 0,
            "si32": -3,
            "unused_bits2a": 0x3,
            "bits28": 0x3,
            "unused_bits2b": 0x3,
            "": {
                "ui64": 3,
                "ui8a": [3,0,0,0,0,0,0,0
                ]
            },
            "aenum": 3,
            "ui32b": 4,
            "bits2c": 0x1,
            "si128a": 0x3,
            "si128b": 0xfffffffd,
            "bits3": 0x3,
            "bits80": 0x10000000000000003,
            "ui128": 0x20000000000000003
        }
    },
  ......

  $ bptfool -p -j m d id 75
  ......
  {
        "key": ["0x03","0x00","0x00","0x00"
        ],
        "value": ["0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0xfd","0xff","0xff","0xff","0x0f","0x00","0x00","0xc0",
                  "0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x03","0x00","0x00","0x00","0x04","0x00","0x00","0x00",
                  "0x01","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x00","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x00","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0xfd","0xff","0xff","0xff","0x00","0x00","0x00","0x00",
                  "0x00","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x1b","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x08","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x03","0x00","0x00","0x00","0x00","0x00","0x00","0x00",
                  "0x02","0x00","0x00","0x00","0x00","0x00","0x00","0x00"
        ],
        "formatted": {
            "key": 3,
            "value": {
                "ui32": 3,
                "ui16": 0,
                "si32": -3,
                "unused_bits2a": "0x3",
                "bits28": "0x3",
                "unused_bits2b": "0x3",
                "": {
                    "ui64": 3,
                    "ui8a": [3,0,0,0,0,0,0,0
                    ]
                },
                "aenum": 3,
                "ui32b": 4,
                "bits2c": "0x1",
                "si128a": "0x3",
                "si128b": "0xfffffffd",
                "bits3": "0x3",
                "bits80": "0x10000000000000003",
                "ui128": "0x20000000000000003"
            }
        }
    }
  ......

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpf: add bpffs pretty print test for int128

The bpffs pretty print test is extended to cover int128 types.
Tested on an x64 machine.
  $ test_btf -p
  ......
  BTF pretty print array(#3)......OK
  PASS:9 SKIP:0 FAIL:0

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpf: refactor test_btf pretty printing for multiple map value formats

The test_btf pretty print is refactored in order to easily
support multiple map value formats. The next patch will
add __int128 type tests which needs macro guard __SIZEOF_INT128__.
There is no functionality change with this patch.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpf: add int128 raw test in test_btf

Several int128 raw type tests are added to test_btf.
Currently these tests are enabled only for x64 and arm64
for which kernel has CONFIG_ARCH_SUPPORTS_INT128 set.

  $ test_btf
  ......
  BTF raw test[106] (128-bit int): OK
  BTF raw test[107] (struct, 128-bit int member): OK
  BTF raw test[108] (struct, 120-bit int member bitfield): OK
  BTF raw test[109] (struct, kind_flag, 128-bit int member): OK
  BTF raw test[110] (struct, kind_flag, 120-bit int member bitfield): OK
  ......

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: btf: support 128 bit integer type

Currently, btf only supports up to 64-bit integer.
On the other hand, 128bit support for gcc and clang
has existed for a long time. For example, both gcc 4.8
and llvm 3.7 supports types "__int128" and
"unsigned __int128" for virtually all 64bit architectures
including bpf.

The requirement for __int128 support comes from two areas:
  . bpf program may use __int128. For example, some bcc tools
    (https://github.com/iovisor/bcc/tree/master/tools),
    mostly tcp v6 related, tcpstates.py, tcpaccept.py, etc.,
    are using __int128 to represent the ipv6 addresses.
  . linux itself is using __int128 types. Hence supporting
    __int128 type in BTF is required for vmlinux BTF,
    which will be used by "compile once and run everywhere"
    and other projects.

For 128bit integer, instead of base-10, hex numbers are pretty
printed out as large decimal number is hard to decipher, e.g.,
for ipv6 addresses.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

libbpf: don't define CC and AR

We are already including tools/scripts/Makefile.include which correctly
handles CROSS_COMPILE, no need to define our own vars.

See related commit 7ed1c1901fe5 ("tools: fix cross-compile var clobbering")
for more details.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

networking: Documentation: fix snmp_counters.rst Sphinx warnings

Fix over 100 documentation warnings in snmp_counter.rst by
extending the underline string lengths and inserting a blank line
after bullet items.

Examples:

Documentation/networking/snmp_counter.rst:1: WARNING: Title overline too short.
Documentation/networking/snmp_counter.rst:14: WARNING: Bullet list ends without a blank line; unexpected unindent.

Fixes: 2b96547223e3 ("add document for TCP OFO, PAWS and skip ACK counters")
Fixes: 8e2ea53a83df ("add snmp counters document")
Fixes: 712ee16c230f ("add documents for snmp counters")
Fixes: 80cc49507ba4 ("net: Add part of TCP counts explanations in snmp_counters.rst")
Fixes: b08794a922c4 ("documentation of some IP/ICMP snmp counters")

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net, decnet: use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_nve: Use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl_bloom_filter: use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This issue was detected with the help of Coccinelle.

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: ksz9477: fix indentation for switch spi bindings

Switch bindings for spi managed mode are using spaces instead of tabs.
Fix them to get a file with a proper kernel indentation style.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Fix ERROR:do not initialise statics to 0 in af_vsock.c

Found by scripts/checkpatch.pl
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-01-15

This series contains updates to the ice driver only.

Bruce fixes an unused variable build warning, which was introduced with
the commit 2fd527b72bb6 ("net: ndo_bridge_setlink: Add extack").  Added
ethtool support for get_eeprom and get_eeprom_len operations.  Added
support for bringing down the PHY link optional when the interface is
administratively downed.

Anirudh refactors the transmit scheduler functions, which results in
reduced code duplication and adds a helper function, which all the
scheduler functions call instead.  Added an LED blinking handler to
ethtool.  Reworked the queue management code to allow for reuse in
future XDP feature support.  Updates the driver to be able to preserve
the aggregator list after reset by moving it out of port_info and into
ice_hw.  Added the ability to offload SCTP checksum calculation to the
hardware.  Added support for new PHY types, which support higher link
speeds.

Md Fahad makes sure that RSS lookup table and hash key get configured
during the rebuild path after a reset.

Brett updates the driver to set the physical link state according to the
netdev state (up/down).  Added support for adaptive/dynamic interrupt
moderation in the ice driver, along with the ethtool operations needed.

Tony adds software timestamping support by using
ethtool_op_get_ts_info().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ice: add const qualifier to mac_addr parameter

The function ice_aq_manage_mac_write takes a pointer to a MAC address.
The parameter is not marked const, even though the function doesn't need
to modify it. This prevents passing a parameter that is already marked
const. Update the function prototype to take a const pointer, to allow
passing constant pointers to this function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Add support for new PHY types

This patch adds code for the detection and operation of several
additional PHY types that support higher link speeds.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Offload SCTP checksum

This patch adds the ability to offload SCTP checksum calculations to the
NIC.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Allow for software timestamping

Use ethtool_op_get_ts_info to provide software timestamping.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Implement getting and setting ethtool coalesce

This patch includes the following ethtool operations:

1. get_coalesce
2. set_coalesce
3. get_per_q_coalesce
4. set_per_q_coalesce

Each ITR value (current_itr/target_itr) are stored on a per
ice_ring_container basis. This is because each valid ice_ring_container
can have 1 or more rings that are tied to the same q_vector ITR index.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Add support for adaptive interrupt moderation

Currently the driver does not support adaptive/dynamic interrupt
moderation. This patch adds support for this. Also, adaptive/dynamic
interrupt moderation is turned on by default upon driver load.

In order to support adaptive interrupt moderation, two functions were
added, ice_update_itr() and ice_itr_divisor(). These are used to
determine the current packet load and to determine a divisor based
on link speed respectively.

This patch also adds the ICE_ITR_GRAN_S define that is used in the
hot-path when setting a new ITR value. The shift is used to pet two
birds with one hand, set the ITR value while re-enabling the
interrupt. Also, the ICE_ITR_GRAN_S is defined as 1 because the device
has a ITR granularity of 2usecs.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Move aggregator list into ice_hw instance

The aggregator list needs to be preserved for use after a reset. This
patch moves it out of the port_info instance and into the ice_hw instance.

Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Rework queue management code for reuse

This patch reworks the queue management code to allow for reuse with the
XDP feature (to be added in a future patch).

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Add ethtool private flag to make forcing link down optional

Add new infrastructure for implementing ethtool private flags using the
existing pf->flags bitmap to store them, and add the link-down-on-close
ethtool private flag to optionally bring down the PHY link when the
interface is administratively downed.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Set physical link up/down when an interface is set up/down

When a netdev is set up/down we need to set the phsyical link state
accordingly. This patch adds that functionality by calling
ice_force_phys_link_state(vsi, link_up) in both the ice_stop() and
ice_open() paths.

In order to force link, ice_force_phys_link_state(vsi, link_up) will
first determine the current phy capabilities. If link has not changed
there is nothing to do. If link has changed, previous PHY capabilities
are saved and the "Enable Automatic Link Update" and "Link Establishment
State Machine (LESM)" enable bits are set. Then the new PHY config is
saved. The "Enable Automatic Link Update" will force the FW to execute
Setup link and restart auto-negotiation. This *should* then result in a
"Link Status Event (LSE)" which will cause the driver to get the current
link status.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Implement support for normal get_eeprom[_len] ethtool ops

Add support for get_eeprom and get_eeprom_len ethtool ops

Specification states that PF software accesses NVM (shadow-ram) via AQ
commands (e.g. NVM Read, NVM Write) in the range 0x000000-0x00FFFF (64KB),
so the get_eeprom_len op should return 64KB. If additional regions of the
16MB NVM must be read, another access method must be used.

The ethtool kernel code, by default, will ask for multiple page-size hunks
of the NVM not to exceed the value returned by ice_get_eeprom_len().
ice_read_sr_buf() deals with arch page sizes different than 4KB.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Add ethtool set_phys_id handler

Add led blinking handler to ethtool. Since led blinking is
controlled by FW/HW only ETHTOOL_ID_ACTIVE and ETHTOOL_ID_INACTIVE
are really needed.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Configure RSS LUT and HASH KEY in rebuild path

This patch configures the RSS lookup table and hash key post reset.

Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Refactor a few Tx scheduler functions

The following functions were refactored to call a new common function,
ice_aqc_send_sched_elem_cmd():

- ice_aq_add_sched_elems()
- ice_aq_delete_sched_elems()
- ice_aq_move_sched_elems()
- ice_aq_query_sched_elems()
- ice_aq_cfg_sched_elems()
- ice_aq_suspend_sched_elems()
- ice_aq_resume_sched_elems()

Signed-off-by: Greg Priest <greg.priest@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge tag 'trace-v5.0-rc1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Andrea Righi fixed a NULL pointer dereference in trace_kprobe_create()

  It is possible to trigger a NULL pointer dereference by writing an
  incorrectly formatted string to the krpobe_events file"

* tag 'trace-v5.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur
    Gautier.

2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei.

3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano
    Brivio.

4) Use after free in hns driver, from Yonglong Liu.

5) icmp6_send() needs to handle the case of NULL skb, from Eric
    Dumazet.

6) Missing rcu read lock in __inet6_bind() when operating on mapped
    addresses, from David Ahern.

7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva.

8) Fix PHY vs r8169 module loading ordering issues, from Heiner
    Kallweit.

9) Fix bridge vlan memory leak, from Ido Schimmel.

10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe.

11) Infoleak in ipv6_local_error(), flow label isn't completely
    initialized. From Eric Dumazet.

12) Handle mv88e6390 errata, from Andrew Lunn.

13) Making vhost/vsock CID hashing consistent, from Zha Bin.

14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo.

15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  bnxt_en: Fix context memory allocation.
  bnxt_en: Fix ring checking logic on 57500 chips.
  mISDN: hfcsusb: Use struct_size() in kzalloc()
  net: clear skb->tstamp in bridge forwarding path
  net: bpfilter: disallow to remove bpfilter module while being used
  net: bpfilter: restart bpfilter_umh when error occurred
  net: bpfilter: use cleanup callback to release umh_info
  umh: add exit routine for UMH process
  isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
  vhost/vsock: fix vhost vsock cid hashing inconsistent
  net: stmmac: Prevent RX starvation in stmmac_napi_poll()
  net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
  net: stmmac: Check if CBS is supported before configuring
  net: stmmac: dwxgmac2: Only clear interrupts that are active
  net: stmmac: Fix PCI module removal leak
  tools/bpf: fix bpftool map dump with bitfields
  tools/bpf: test btf bitfield with >=256 struct member offset
  bpf: fix bpffs bitfield pretty print
  net: ethernet: mediatek: fix warning in phy_start_aneg
  tcp: change txhash on SYN-data timeout
  ...

ice: Fix unused variable build warning

Commit 2fd527b72bb6 ("net: ndo_bridge_setlink: Add extack") added a new
parameter "extack" to ice_bridge_setlink but this parameter isn't used
by the function. This results in a warning: unused parameter ‘extack’
[-Wunused-parameter]. Fix that by adding an "__always_unused" qualifier.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()

It is possible to trigger a NULL pointer dereference by writing an
incorrectly formatted string to krpobe_events (trying to create a
kretprobe omitting the symbol).

Example:

echo "r:event_1 " >> /sys/kernel/debug/tracing/kprobe_events

That triggers this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 6 PID: 1757 Comm: bash Not tainted 5.0.0-rc1+ #125
Hardware name: Dell Inc. XPS 13 9370/0F6P3V, BIOS 1.5.1 08/09/2018
RIP: 0010:kstrtoull+0x2/0x20
Code: 28 00 00 00 75 17 48 83 c4 18 5b 41 5c 5d c3 b8 ea ff ff ff eb e1 b8 de ff ff ff eb da e8 d6 36 bb ff 66 0f 1f 44 00 00 31 c0 <80> 3f 2b 55 48 89 e5 0f 94 c0 48 01 c7 e8 5c ff ff ff 5d c3 66 2e
RSP: 0018:ffffb5d482e57cb8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff82b12720
RDX: ffffb5d482e57cf8 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffb5d482e57d70 R08: ffffa0c05e5a7080 R09: ffffa0c05e003980
R10: 0000000000000000 R11: 0000000040000000 R12: ffffa0c04fe87b08
R13: 0000000000000001 R14: 000000000000000b R15: ffffa0c058d749e1
FS:  00007f137c7f7740(0000) GS:ffffa0c05e580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000497d46004 CR4: 00000000003606e0
Call Trace:
  ? trace_kprobe_create+0xb6/0x840
  ? _cond_resched+0x19/0x40
  ? _cond_resched+0x19/0x40
  ? __kmalloc+0x62/0x210
  ? argv_split+0x8f/0x140
  ? trace_kprobe_create+0x840/0x840
  ? trace_kprobe_create+0x840/0x840
  create_or_delete_trace_kprobe+0x11/0x30
  trace_run_command+0x50/0x90
  trace_parse_run_command+0xc1/0x160
  probes_write+0x10/0x20
  __vfs_write+0x3a/0x1b0
  ? apparmor_file_permission+0x1a/0x20
  ? security_file_permission+0x31/0xf0
  ? _cond_resched+0x19/0x40
  vfs_write+0xb1/0x1a0
  ksys_write+0x55/0xc0
  __x64_sys_write+0x1a/0x20
  do_syscall_64+0x5a/0x120
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by doing the proper argument checks in trace_kprobe_create().

Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/lkml/20190111095108.b79a2ee026185cbd62365977@kernel.org
Link: http://lkml.kernel.org/r/20190111060113.GA22841@xps-13
Fixes: 6212dd29683e ("tracing/kprobes: Use dyn_event framework for kprobe events")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

sbitmap: Protect swap_lock from hardirq

Because we may call blk_mq_get_driver_tag() directly from
blk_mq_dispatch_rq_list() without holding any lock, then HARDIRQ may
come and the above DEADLOCK is triggered.

Commit ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq") tries to
fix this issue by using 'spin_lock_bh', which isn't enough because we
complete request from hardirq context direclty in case of multiqueue.

Cc: Clark Williams <williams@redhat.com>
Fixes: ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sbitmap: Protect swap_lock from softirqs

The swap_lock used by sbitmap has a chain with locks taken from softirq,
but the swap_lock is not protected from being preempted by softirqs.

A chain exists of:

sbq->ws[i].wait -> dispatch_wait_lock -> swap_lock

Where the sbq->ws[i].wait lock can be taken from softirq context, which
means all locks below it in the chain must also be protected from
softirqs.

Reported-by: Clark Williams <williams@redhat.com>
Fixes: 58ab5e32e6fd ("sbitmap: silence bogus lockdep IRQ warning")
Fixes: ea86ea2cdced ("sbitmap: amortize cost of clearing bits")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'gpio-v5.0-2' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"The patch hitting the MMC/SD subsystem is fixing up my own mess when
  moving semantics from MMC/SD over to gpiolib. Ulf is on vacation but I
  managed to reach him on chat and obtain his ACK.

  The other two are early-rc fixes that are not super serious but pretty
  annoying so I'd like to get rid of them.

  Summary:

   - Get rid of some WARN_ON() from the ACPI code

   - Staticize a symbol

   - Fix MMC polarity detection"

* tag 'gpio-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  mmc: core: don't override the CD GPIO level when "cd-inverted" is set
  gpio: pca953x: Make symbol 'pca953x_i2c_regmap' static
  gpiolib-acpi: Remove unnecessary WARN_ON from acpi_gpiochip_free_interrupts

Merge tag 'mfd-next-4.21' of git://git./linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
"New Device Support
   - Add support for Power Supply to AXP813
   - Add support for GPIO, ADC, AC and Battery Power Supply to AXP803
   - Add support for UART to Exynos LPASS

  Fix-ups:
   - Use supplied MACROS; ti_am335x_tscadc
   - Trivial spelling/whitespace/alignment; tmio, axp20x, rave-sp
   - Regmap changes; bd9571mwv, wm5110-tables
   - Kconfig dependencies; MFD_AT91_USART
   - Supply shared data for child-devices; madera-core
   - Use new of_node_name_eq() API call; max77620, stmpe
   - Use managed resources (devm_*); tps65218
   - Comment descriptions; ingenic-tcu
   - Coding style; madera-core

  Bug Fixes:
   - Fix section mismatches; twl-core, db8500-prcmu
   - Correct error path related issues; mt6397-core, ab8500-core, mc13xxx-core
   - IRQ related fixes; tps6586x
   - Ensure proper initialisation sequence; qcom_rpm
   - Repair potential memory leak; cros_ec_dev"

* tag 'mfd-next-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (25 commits)
  mfd: exynos-lpass: Enable UART module support
  mfd: mc13xxx: Fix a missing check of a register-read failure
  mfd: cros_ec: Add commands to control codec
  mfd: madera: Remove spurious semicolon in while loop
  mfd: rave-sp: Fix typo in rave_sp_checksum comment
  mfd: ingenic-tcu: Fix bit field description in header
  mfd: tps65218: Use devm_regmap_add_irq_chip and clean up error path in probe()
  mfd: Use of_node_name_eq() for node name comparisons
  mfd: cros_ec_dev: Add missing mfd_remove_devices() call in remove
  mfd: axp20x: Add supported cells for AXP803
  mfd: axp20x: Re-align MFD cell entries
  mfd: axp20x: Add AC power supply cell for AXP813
  mfd: wm5110: Add missing ASRC rate register
  mfd: qcom_rpm: write fw_version to CTRL_REG
  mfd: tps6586x: Handle interrupts on suspend
  mfd: madera: Add shared data for accessory detection
  mfd: at91-usart: Add platform dependency
  mfd: bd9571mwv: Add volatile register to make DVFS work
  mfd: ab8500-core: Return zero in get_register_interruptible()
  mfd: tmio: Typo s/use use/use/
  ...

Merge tag 'backlight-next-4.21' of git://git./linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
"Fix-ups:
   - Use new of_node_name_eq() API call

  Bug Fixes:
   - Internally track 'enabled' state in pwm_bl
   - Fix auto-generated pwm_bl brightness tables parsed by DT

* tag 'backlight-next-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: 88pm860x_bl: Use of_node_name_eq for node name comparisons
  backlight: pwm_bl: Fix devicetree parsing with auto-generated brightness tables
  backlight: pwm_bl: Re-add driver internal enabled tracking

Linux 5.0-rc2

kernel/sys.c: Clarify that UNAME26 does not generate unique versions anymore

UNAME26 is a mechanism to report Linux's version as 2.6.x, for
compatibility with old/broken software.  Due to the way it is
implemented, it would have to be updated after 5.0, to keep the
resulting versions unique.  Linus Torvalds argued:

"Do we actually need this?

  I'd rather let it bitrot, and just let it return random versions. It
  will just start again at 2.4.60, won't it?

  Anybody who uses UNAME26 for a 5.x kernel might as well think it's
  still 4.x. The user space is so old that it can't possibly care about
  differences between 4.x and 5.x, can it?

  The only thing that matters is that it shows "2.4.<largeenough>",
  which it will do regardless"

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"A bigger batch than I anticipated this week, for two reasons:

   - Some fallout on Davinci from board file -> DTB conversion, that
     also includes a few longer-standing fixes (i.e. not recent
     regressions).

   - drivers/reset material that has been in linux-next for a while, but
     didn't get sent to us until now for a variety of reasons
     (maintainer out sick, holidays, etc). There's a functional
     dependency in there such that one platform (Altera's SoCFPGA) won't
     boot without one of the patches; instead of reverting the patch
     that got merged, I looked at this set and decided it was small
     enough that I'll pick it up anyway. If you disagree I can revisit
     with a smaller set.

  That being said, there's also a handful of the usual stuff:

   - Fix for a crash on Armada 7K/8K when the kernel touches
     PSCI-reserved memory

   - Fix for PCIe reset on Macchiatobin (Armada 8K development board,
     what this email is sent from in fact :)

   - Enable a few new-merged modules for Amlogic in arm64 defconfig

   - Error path fixes on Integrator

   - Build fix for Renesas and Qualcomm

   - Initialization fix for Renesas RZ/G2E

  .. plus a few more fixlets"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
  ARM: integrator: impd1: use struct_size() in devm_kzalloc()
  qcom-scm: Include <linux/err.h> header
  gpio: pl061: handle failed allocations
  ARM: dts: kirkwood: Fix polarity of GPIO fan lines
  arm64: dts: marvell: mcbin: fix PCIe reset signal
  arm64: dts: marvell: armada-ap806: reserve PSCI area
  ARM: dts: da850-lcdk: Correct the sound card name
  ARM: dts: da850-lcdk: Correct the audio codec regulators
  ARM: dts: da850-evm: Correct the sound card name
  ARM: dts: da850-evm: Correct the audio codec regulators
  ARM: davinci: omapl138-hawk: fix label names in GPIO lookup entries
  ARM: davinci: dm644x-evm: fix label names in GPIO lookup entries
  ARM: davinci: dm355-evm: fix label names in GPIO lookup entries
  ARM: davinci: da850-evm: fix label names in GPIO lookup entries
  ARM: davinci: da830-evm: fix label names in GPIO lookup entries
  arm64: defconfig: enable modules for amlogic s400 sound card
  reset: uniphier-glue: Add AHCI reset control support in glue layer
  dt-bindings: reset: uniphier: Add AHCI core reset description
  reset: uniphier-usb3: Rename to reset-uniphier-glue
  dt-bindings: reset: uniphier: Replace the expression of USB3 with generic peripherals
  ...

Merge tag 'for-5.0-rc1-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- two regression fixes in clone/dedupe ioctls, the generic check
   callback needs to lock extents properly and wait for io to avoid
   problems with writeback and relocation

- fix deadlock when using free space tree due to block group creation

- a recently added check refuses a valid fileystem with seeding device,
   make that work again with a quickfix, proper solution needs more
   intrusive changes

* tag 'for-5.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Use real device structure to verify dev extent
  Btrfs: fix deadlock when using free space tree due to block group creation
  Btrfs: fix race between reflink/dedupe and relocation
  Btrfs: fix race between cloning range ending at eof and writeback

Merge tag 'driver-core-5.0-rc2' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here is one small sysfs change, and a documentation update for 5.0-rc2

  The sysfs change moves from using BUG_ON to WARN_ON, as discussed in
  an email thread on lkml while trying to track down another driver bug.
  sysfs should not be crashing and preventing people from seeing where
  they went wrong. Now it properly recovers and warns the developer.

  The documentation update removes the use of BUS_ATTR() as the kernel
  is moving away from this to use the specific BUS_ATTR_RW() and friends
  instead. There are pending patches in all of the different subsystems
  to remove the last users of this macro, but for now, don't advertise
  it should be used anymore to keep new ones from being introduced.

  Both have been in linux-next with no reported issues"

* tag 'driver-core-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: driver core: remove use of BUS_ATTR
  sysfs: convert BUG_ON to WARN_ON

Merge tag 'staging-5.0-rc2' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for some reported issues.

  One reverts a patch that was made to the rtl8723bs driver that turned
  out to not be needed at all as it was a bug in clang. The others fix
  up some reported issues in the rtl8188eu driver and update the
  MAINTAINERS file to point to Larry for this driver so he can get the
  bug reports easier.

  All have been in linux-next with no reported issues"

* tag 'staging-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  Revert "staging: rtl8723bs: Mark ACPI table declaration as used"
  staging: rtl8188eu: Fix module loading from tasklet for WEP encryption
  staging: rtl8188eu: Fix module loading from tasklet for CCMP encryption
  MAINTAINERS: Add entry for staging driver r8188eu

Merge tag 'tty-5.0-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are 2 tty and serial fixes for 5.0-rc2 that resolve some reported
  issues.

  The first is a simple serial driver fix for a regression that showed
  up in 5.0-rc1. The second one resolves a number of reported issues
  with the recent tty locking fixes that went into 5.0-rc1. Lots of
  people have tested the second one and say it resolves their issues.

  Both have been in linux-next with no reported issues"

* tag 'tty-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: Don't hold ldisc lock in tty_reopen() if ldisc present
  serial: lantiq: Do not swap register read/writes