platform/kernel/linux-rpi.git
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Mon, 25 Nov 2019 23:46:58 +0000 (15:46 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-11-26

The following pull-request contains BPF updates for your *net-next* tree.

We've added 2 non-merge commits during the last 1 day(s) which contain
a total of 2 files changed, 14 insertions(+), 3 deletions(-).

The main changes, 2 small fixes are:

1) Fix libbpf out of tree compilation which complained about unknown u32
   type used in libbpf_find_vmlinux_btf_id() which needs to be __u32 instead,
   from Andrii Nakryiko.

2) Follow-up fix for the prior BPF mmap series where kbuild bot complained
   about missing vmalloc_user_node_flags() for no-MMU, also from Andrii Nakryiko.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
David S. Miller [Mon, 25 Nov 2019 22:57:26 +0000 (14:57 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Merge in networking bug fixes for merge window.

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agolibbpf: Fix usage of u32 in userspace code
Andrii Nakryiko [Mon, 25 Nov 2019 21:29:48 +0000 (13:29 -0800)]
libbpf: Fix usage of u32 in userspace code

u32 is not defined for libbpf when compiled outside of kernel sources (e.g.,
in Github projection). Use __u32 instead.

Fixes: b8c54ea455dc ("libbpf: Add support to attach to fentry/fexit tracing progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191125212948.1163343-1-andriin@fb.com
4 years agomm: Implement no-MMU variant of vmalloc_user_node_flags
Andrii Nakryiko [Sat, 23 Nov 2019 22:08:35 +0000 (14:08 -0800)]
mm: Implement no-MMU variant of vmalloc_user_node_flags

To fix build with !CONFIG_MMU, implement it for no-MMU configurations as well.

Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191123220835.1237773-1-andriin@fb.com
4 years agoslip: Fix use-after-free Read in slip_open
Jouni Hogander [Mon, 25 Nov 2019 12:23:43 +0000 (14:23 +0200)]
slip: Fix use-after-free Read in slip_open

Slip_open doesn't clean-up device which registration failed from the
slip_devs device list. On next open after failure this list is iterated
and freed device is accessed. Fix this by calling sl_free_netdev in error
path.

Here is the trace from the Syzbot:

__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:634
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
sl_sync drivers/net/slip/slip.c:725 [inline]
slip_open+0xecd/0x11b7 drivers/net/slip/slip.c:801
tty_ldisc_open.isra.0+0xa3/0x110 drivers/tty/tty_ldisc.c:469
tty_set_ldisc+0x30e/0x6b0 drivers/tty/tty_ldisc.c:596
tiocsetd drivers/tty/tty_io.c:2334 [inline]
tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2594
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:696
ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 3b5a39979daf ("slip: Fix memory leak in slip_open error path")
Reported-by: syzbot+4d5170758f3762109542@syzkaller.appspotmail.com
Cc: David Miller <davem@davemloft.net>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: sja1105: fix sja1105_parse_rgmii_delays()
Oleksij Rempel [Mon, 25 Nov 2019 11:43:51 +0000 (12:43 +0100)]
net: dsa: sja1105: fix sja1105_parse_rgmii_delays()

This function was using configuration of port 0 in devicetree for all ports.
In case CPU port was not 0, the delay settings was ignored. This resulted not
working communication between CPU and the switch.

Fixes: f5b8631c293b ("net: dsa: sja1105: Error out if RGMII delays are requested in DT")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomacvlan: schedule bc_work even if error
Menglong Dong [Mon, 25 Nov 2019 08:58:09 +0000 (16:58 +0800)]
macvlan: schedule bc_work even if error

While enqueueing a broadcast skb to port->bc_queue, schedule_work()
is called to add port->bc_work, which processes the skbs in
bc_queue, to "events" work queue. If port->bc_queue is full, the
skb will be discarded and schedule_work(&port->bc_work) won't be
called. However, if port->bc_queue is full and port->bc_work is not
running or pending, port->bc_queue will keep full and schedule_work()
won't be called any more, and all broadcast skbs to macvlan will be
discarded. This case can happen:

macvlan_process_broadcast() is the pending function of port->bc_work,
it moves all the skbs in port->bc_queue to the queue "list", and
processes the skbs in "list". During this, new skbs will keep being
added to port->bc_queue in macvlan_broadcast_enqueue(), and
port->bc_queue may already full when macvlan_process_broadcast()
return. This may happen, especially when there are a lot of real-time
threads and the process is preempted.

Fix this by calling schedule_work(&port->bc_work) even if
port->bc_work is full in macvlan_broadcast_enqueue().

Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoenetc: add support Credit Based Shaper(CBS) for hardware offload
Po Liu [Mon, 25 Nov 2019 05:56:56 +0000 (05:56 +0000)]
enetc: add support Credit Based Shaper(CBS) for hardware offload

The ENETC hardware support the Credit Based Shaper(CBS) which part
of the IEEE-802.1Qav. The CBS driver was loaded by the sch_cbs
interface when set in the QOS in the kernel.

Here is an example command to set 20Mbits bandwidth in 1Gbits port
for taffic class 7:

tc qdisc add dev eth0 root handle 1: mqprio \
   num_tc 8 map 0 1 2 3 4 5 6 7 hw 1

tc qdisc replace dev eth0 parent 1:8 cbs \
   locredit -1470 hicredit 30 \
   sendslope -980000 idleslope 20000 offload 1

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: add helpers phy_(un)lock_mdio_bus
Heiner Kallweit [Sat, 23 Nov 2019 16:28:37 +0000 (17:28 +0100)]
net: phy: add helpers phy_(un)lock_mdio_bus

Add helpers to make locking/unlocking the MDIO bus easier.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomdio_bus: don't use managed reset-controller
David Bauer [Fri, 22 Nov 2019 21:44:51 +0000 (22:44 +0100)]
mdio_bus: don't use managed reset-controller

Geert Uytterhoeven reported that using devm_reset_controller_get leads
to a WARNING when probing a reset-controlled PHY. This is because the
device devm_reset_controller_get gets supplied is not actually the
one being probed.

Acquire an unmanaged reset-control as well as free the reset_control on
unregister to fix this.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David Bauer <mail@david-bauer.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Mon, 25 Nov 2019 06:05:47 +0000 (22:05 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-11-24

The following pull-request contains BPF updates for your *net-next* tree.

We've added 27 non-merge commits during the last 4 day(s) which contain
a total of 50 files changed, 2031 insertions(+), 548 deletions(-).

The main changes are:

1) Optimize bpf_tail_call() from retpoline-ed indirect jump to direct jump,
   from Daniel.

2) Support global variables in libbpf, from Andrii.

3) Cleanup selftests with BPF_TRACE_x() macro, from Martin.

4) Fix devmap hash, from Toke.

5) Fix register bounds after 32-bit conditional jumps, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
Jakub Kicinski [Mon, 25 Nov 2019 02:46:13 +0000 (18:46 -0800)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2019-11-24

Here's one last bluetooth-next pull request for the 5.5 kernel:

 - Fix BDADDR_PROPERTY & INVALID_BDADDR quirk handling
 - Added support for BCM4334B0 and BCM4335A0 controllers
 - A few other smaller fixes related to locking and memory leaks
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoax88179_178a: add ethtool_op_get_ts_info()
Andreas K. Besslein [Sat, 23 Nov 2019 21:04:47 +0000 (22:04 +0100)]
ax88179_178a: add ethtool_op_get_ts_info()

This enables the use of SW timestamping.

ax88179_178a uses the usbnet transmit function usbnet_start_xmit() which
implements software timestamping. ax88179_178a overrides ethtool_ops but
missed to set .get_ts_info. This caused SOF_TIMESTAMPING_TX_SOFTWARE
capability to be not available.

Signed-off-by: Andreas K. Besslein <besslein.andreas@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge branch 'mlxsw-Two-small-updates'
Jakub Kicinski [Mon, 25 Nov 2019 02:35:05 +0000 (18:35 -0800)]
Merge branch 'mlxsw-Two-small-updates'

Ido Schimmel says:

====================
mlxsw: Two small updates

Patch #1 from Petr handles a corner case in GRE tunnel offload.

Patch #2 from Amit fixes a recent issue where the driver was programming
the device to use an adjacency index (for a nexthop) that was not
properly initialized.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agomlxsw: spectrum_router: Fix use of uninitialized adjacency index
Amit Cohen [Sun, 24 Nov 2019 07:48:03 +0000 (09:48 +0200)]
mlxsw: spectrum_router: Fix use of uninitialized adjacency index

When mlxsw_sp_adj_discard_write() is called for the first time, the
value stored in 'mlxsw_sp->router->adj_discard_index' is invalid, as
indicated by 'mlxsw_sp->router->adj_discard_index_valid' being set to
'false'.

In this case, we should not use the value initially stored in
'mlxsw_sp->router->adj_discard_index' (0) and instead use the value
allocated later in the function.

Fixes: 983db6198f0d ("mlxsw: spectrum_router: Allocate discard adjacency entry when needed")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agomlxsw: spectrum_router: After underlay moves, demote conflicting tunnels
Petr Machata [Sun, 24 Nov 2019 07:48:02 +0000 (09:48 +0200)]
mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels

When a GRE tunnel is bound to an underlay netdevice and that netdevice is
moved to a different VRF, that could cause two tunnels to have the same
underlay local address in the same VRF. Linux in this situation dispatches
the traffic according to the tunnel key (or lack thereof), but that cannot
be offloaded to Spectrum devices.

Detect this situation and unoffload the two impacted tunnels when it
happens.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobpf: Simplify __bpf_arch_text_poke poke type handling
Daniel Borkmann [Sun, 24 Nov 2019 00:39:42 +0000 (01:39 +0100)]
bpf: Simplify __bpf_arch_text_poke poke type handling

Given that we have BPF_MOD_NOP_TO_{CALL,JUMP}, BPF_MOD_{CALL,JUMP}_TO_NOP
and BPF_MOD_{CALL,JUMP}_TO_{CALL,JUMP} poke types and that we also pass in
old_addr as well as new_addr, it's a bit redundant and unnecessarily
complicates __bpf_arch_text_poke() itself since we can derive the same from
the *_addr that were passed in. Hence simplify and use BPF_MOD_{CALL,JUMP}
as types which also allows to clean up call-sites.

In addition to that, __bpf_arch_text_poke() currently verifies that text
matches expected old_insn before we invoke text_poke_bp(). Also add a check
on new_insn and skip rewrite if it already matches. Reason why this is rather
useful is that it avoids making any special casing in prog_array_map_poke_run()
when old and new prog were NULL and has the benefit that also for this case
we perform a check on text whether it really matches our expectations.

Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/fcb00a2b0b288d6c73de4ef58116a821c8fe8f2f.1574555798.git.daniel@iogearbox.net
4 years agobpf: Introduce BPF_TRACE_x helper for the tracing tests
Martin KaFai Lau [Sat, 23 Nov 2019 20:25:04 +0000 (12:25 -0800)]
bpf: Introduce BPF_TRACE_x helper for the tracing tests

For BPF_PROG_TYPE_TRACING, the bpf_prog's ctx is an array of u64.
This patch borrows the idea from BPF_CALL_x in filter.h to
convert a u64 to the arg type of the traced function.

The new BPF_TRACE_x has an arg to specify the return type of a bpf_prog.
It will be used in the future TCP-ops bpf_prog that may return "void".

The new macros are defined in the new header file "bpf_trace_helpers.h".
It is under selftests/bpf/ for now.  It could be moved to libbpf later
after seeing more upcoming non-tracing use cases.

The tests are changed to use these new macros also.  Hence,
the k[s]u8/16/32/64 are no longer needed and they are removed
from the bpf_helpers.h.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191123202504.1502696-1-kafai@fb.com
4 years agobpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT
Daniel Borkmann [Sat, 23 Nov 2019 20:37:31 +0000 (21:37 +0100)]
bpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT

Add a definition of bpf_jit_blinding_enabled() when CONFIG_BPF_JIT is not set
in order to fix a recent build regression:

  [...]
  CC      kernel/bpf/verifier.o
  CC      kernel/bpf/inode.o
kernel/bpf/verifier.c: In function ‘fixup_bpf_calls’:
kernel/bpf/verifier.c:9132:25: error: implicit declaration of function ‘bpf_jit_blinding_enabled’; did you mean ‘bpf_jit_kallsyms_enabled’? [-Werror=implicit-function-declaration]
 9132 |  bool expect_blinding = bpf_jit_blinding_enabled(prog);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~
      |                         bpf_jit_kallsyms_enabled
  CC      kernel/bpf/helpers.o
  CC      kernel/bpf/hashtab.o
  [...]

Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/40baf8f3507cac4851a310578edfb98ce73b5605.1574541375.git.daniel@iogearbox.net
4 years agoMerge branch 'optimize-bpf_tail_call'
Alexei Starovoitov [Sat, 23 Nov 2019 02:02:44 +0000 (18:02 -0800)]
Merge branch 'optimize-bpf_tail_call'

Daniel Borkmann says:

====================
This gets rid of indirect jumps for BPF tail calls whenever possible.
The series adds emission for *direct* jumps for tail call maps in order
to avoid the retpoline overhead from a493a87f38cf ("bpf, x64: implement
retpoline for tail call") for situations that allow for it, meaning,
for known constant keys at verification time which are used as index
into the tail call map. See patch 7/8 for more general details.

Thanks!

v1  -> v2:
  - added more test cases
  - u8 ip_stable -> bool (Andrii)
  - removed bpf_map_poke_{un,}lock and simplified the code (Andrii)
  - added break into prog_array_map_poke_untrack since there's just
    one prog (Andrii)
  - fixed typo: for for in commit msg (Andrii)
  - reworked __bpf_arch_text_poke (Andrii)
  - added subtests, and comment on tests themselves, NULL-NULL
    transistion (Andrii)
  - in constant map key tracking I've moved the map_poke_track callback
    to once we've finished creating the poke tab as otherwise concurrent
    access from tail call map would blow up (since we realloc the table)
rfc -> v1:
  - Applied Alexei's and Andrii's feeback from
    https://lore.kernel.org/bpf/cover.1573779287.git.daniel@iogearbox.net/T/#t
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 years agobpf, testing: Add various tail call test cases
Daniel Borkmann [Fri, 22 Nov 2019 20:08:01 +0000 (21:08 +0100)]
bpf, testing: Add various tail call test cases

Add several BPF kselftest cases for tail calls which test the various
patch directions, and that multiple locations are patched in same and
different programs.

  # ./test_progs -n 45
   #45/1 tailcall_1:OK
   #45/2 tailcall_2:OK
   #45/3 tailcall_3:OK
   #45/4 tailcall_4:OK
   #45/5 tailcall_5:OK
   #45 tailcalls:OK
  Summary: 1/5 PASSED, 0 SKIPPED, 0 FAILED

I've also verified the JITed dump after each of the rewrite cases that
it matches expectations.

Also regular test_verifier suite passes fine which contains further tail
call tests:

  # ./test_verifier
  [...]
  Summary: 1563 PASSED, 0 SKIPPED, 0 FAILED

Checked under JIT, interpreter and JIT + hardening.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/3d6cbecbeb171117dccfe153306e479798fb608d.1574452833.git.daniel@iogearbox.net
4 years agobpf, x86: Emit patchable direct jump as tail call
Daniel Borkmann [Fri, 22 Nov 2019 20:08:00 +0000 (21:08 +0100)]
bpf, x86: Emit patchable direct jump as tail call

Add initial code emission for *direct* jumps for tail call maps in
order to avoid the retpoline overhead from a493a87f38cf ("bpf, x64:
implement retpoline for tail call") for situations that allow for
it, meaning, for known constant keys at verification time which are
used as index into the tail call map. In case of Cilium which makes
heavy use of tail calls, constant keys are used in the vast majority,
only for a single occurrence we use a dynamic key.

High level outline is that if the target prog is NULL in the map, we
emit a 5-byte nop for the fall-through case and if not, we emit a
5-byte direct relative jmp to the target bpf_func + skipped prologue
offset. Later during runtime, we patch these 5-byte nop/jmps upon
tail call map update or deletions dynamically. Note that on x86-64
the direct jmp works as we reuse the same stack frame and skip
prologue (as opposed to some other JIT implementations).

One of the issues is that the tail call map slots can change at any
given time even during JITing. Therefore, we have two passes: i) emit
nops for all patchable locations during main JITing phase until we
declare prog->jited = 1 eventually. At this point the image is stable,
not public yet and with all jmps disabled. While JITing, we collect
additional info like poke->ip in order to remember the patch location
for later modifications. In ii) bpf_tail_call_direct_fixup() walks
over the progs poke_tab, locks the tail call maps poke_mutex to
prevent from parallel updates and patches in the right locations via
__bpf_arch_text_poke(). Note, the main bpf_arch_text_poke() cannot
be used at this point since we're not yet exposed to kallsyms. For
the update we use plain memcpy() since the image is not public and
still in read-write mode. After patching, we activate that poke entry
through poke->ip_stable. Meaning, at this point any tail call map
updates/deletions are not going to ignore that poke entry anymore.
Then, bpf_arch_text_poke() might still occur on the read-write image
until we finally locked it as read-only. Both modifications on the
given image are under text_mutex to avoid interference with each
other when update requests come in in parallel for different tail
call maps (current one we have locked in JIT and different one where
poke->ip_stable was already set).

Example prog:

  # ./bpftool p d x i 1655
   0: (b7) r3 = 0
   1: (18) r2 = map[id:526]
   3: (85) call bpf_tail_call#12
   4: (b7) r0 = 1
   5: (95) exit

Before:

  # ./bpftool p d j i 1655
  0xffffffffc076e55c:
   0:   nopl   0x0(%rax,%rax,1)
   5:   push   %rbp
   6:   mov    %rsp,%rbp
   9:   sub    $0x200,%rsp
  10:   push   %rbx
  11:   push   %r13
  13:   push   %r14
  15:   push   %r15
  17:   pushq  $0x0                      _
  19:   xor    %edx,%edx                |_ index (arg 3)
  1b:   movabs $0xffff88d95cc82600,%rsi |_ map (arg 2)
  25:   mov    %edx,%edx                |  index >= array->map.max_entries
  27:   cmp    %edx,0x24(%rsi)          |
  2a:   jbe    0x0000000000000066       |_
  2c:   mov    -0x224(%rbp),%eax        |  tail call limit check
  32:   cmp    $0x20,%eax               |
  35:   ja     0x0000000000000066       |
  37:   add    $0x1,%eax                |
  3a:   mov    %eax,-0x224(%rbp)        |_
  40:   mov    0xd0(%rsi,%rdx,8),%rax   |_ prog = array->ptrs[index]
  48:   test   %rax,%rax                |  prog == NULL check
  4b:   je     0x0000000000000066       |_
  4d:   mov    0x30(%rax),%rax          |  goto *(prog->bpf_func + prologue_size)
  51:   add    $0x19,%rax               |
  55:   callq  0x0000000000000061       |  retpoline for indirect jump
  5a:   pause                           |
  5c:   lfence                          |
  5f:   jmp    0x000000000000005a       |
  61:   mov    %rax,(%rsp)              |
  65:   retq                            |_
  66:   mov    $0x1,%eax
  6b:   pop    %rbx
  6c:   pop    %r15
  6e:   pop    %r14
  70:   pop    %r13
  72:   pop    %rbx
  73:   leaveq
  74:   retq

After; state after JIT:

  # ./bpftool p d j i 1655
  0xffffffffc08e8930:
   0:   nopl   0x0(%rax,%rax,1)
   5:   push   %rbp
   6:   mov    %rsp,%rbp
   9:   sub    $0x200,%rsp
  10:   push   %rbx
  11:   push   %r13
  13:   push   %r14
  15:   push   %r15
  17:   pushq  $0x0                      _
  19:   xor    %edx,%edx                |_ index (arg 3)
  1b:   movabs $0xffff9d8afd74c000,%rsi |_ map (arg 2)
  25:   mov    -0x224(%rbp),%eax        |  tail call limit check
  2b:   cmp    $0x20,%eax               |
  2e:   ja     0x000000000000003e       |
  30:   add    $0x1,%eax                |
  33:   mov    %eax,-0x224(%rbp)        |_
  39:   jmpq   0xfffffffffffd1785       |_ [direct] goto *(prog->bpf_func + prologue_size)
  3e:   mov    $0x1,%eax
  43:   pop    %rbx
  44:   pop    %r15
  46:   pop    %r14
  48:   pop    %r13
  4a:   pop    %rbx
  4b:   leaveq
  4c:   retq

After; state after map update (target prog):

  # ./bpftool p d j i 1655
  0xffffffffc08e8930:
   0:   nopl   0x0(%rax,%rax,1)
   5:   push   %rbp
   6:   mov    %rsp,%rbp
   9:   sub    $0x200,%rsp
  10:   push   %rbx
  11:   push   %r13
  13:   push   %r14
  15:   push   %r15
  17:   pushq  $0x0
  19:   xor    %edx,%edx
  1b:   movabs $0xffff9d8afd74c000,%rsi
  25:   mov    -0x224(%rbp),%eax
  2b:   cmp    $0x20,%eax               .
  2e:   ja     0x000000000000003e       .
  30:   add    $0x1,%eax                .
  33:   mov    %eax,-0x224(%rbp)        |_
  39:   jmpq   0xffffffffffb09f55       |_ goto *(prog->bpf_func + prologue_size)
  3e:   mov    $0x1,%eax
  43:   pop    %rbx
  44:   pop    %r15
  46:   pop    %r14
  48:   pop    %r13
  4a:   pop    %rbx
  4b:   leaveq
  4c:   retq

After; state after map update (no prog):

  # ./bpftool p d j i 1655
  0xffffffffc08e8930:
   0:   nopl   0x0(%rax,%rax,1)
   5:   push   %rbp
   6:   mov    %rsp,%rbp
   9:   sub    $0x200,%rsp
  10:   push   %rbx
  11:   push   %r13
  13:   push   %r14
  15:   push   %r15
  17:   pushq  $0x0
  19:   xor    %edx,%edx
  1b:   movabs $0xffff9d8afd74c000,%rsi
  25:   mov    -0x224(%rbp),%eax
  2b:   cmp    $0x20,%eax               .
  2e:   ja     0x000000000000003e       .
  30:   add    $0x1,%eax                .
  33:   mov    %eax,-0x224(%rbp)        |_
  39:   nopl   0x0(%rax,%rax,1)         |_ fall-through nop
  3e:   mov    $0x1,%eax
  43:   pop    %rbx
  44:   pop    %r15
  46:   pop    %r14
  48:   pop    %r13
  4a:   pop    %rbx
  4b:   leaveq
  4c:   retq

Nice bonus is that this also shrinks the code emission quite a bit
for every tail call invocation.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/6ada4c1c9d35eeb5f4ecfab94593dafa6b5c4b09.1574452833.git.daniel@iogearbox.net
4 years agobpf: Constant map key tracking for prog array pokes
Daniel Borkmann [Fri, 22 Nov 2019 20:07:59 +0000 (21:07 +0100)]
bpf: Constant map key tracking for prog array pokes

Add tracking of constant keys into tail call maps. The signature of
bpf_tail_call_proto is that arg1 is ctx, arg2 map pointer and arg3
is a index key. The direct call approach for tail calls can be enabled
if the verifier asserted that for all branches leading to the tail call
helper invocation, the map pointer and index key were both constant
and the same.

Tracking of map pointers we already do from prior work via c93552c443eb
("bpf: properly enforce index mask to prevent out-of-bounds speculation")
and 09772d92cd5a ("bpf: avoid retpoline for lookup/update/ delete calls
on maps").

Given the tail call map index key is not on stack but directly in the
register, we can add similar tracking approach and later in fixup_bpf_calls()
add a poke descriptor to the progs poke_tab with the relevant information
for the JITing phase.

We internally reuse insn->imm for the rewritten BPF_JMP | BPF_TAIL_CALL
instruction in order to point into the prog's poke_tab, and keep insn->imm
as 0 as indicator that current indirect tail call emission must be used.
Note that publishing to the tracker must happen at the end of fixup_bpf_calls()
since adding elements to the poke_tab reallocates its memory, so we need
to wait until its in final state.

Future work can generalize and add similar approach to optimize plain
array map lookups. Difference there is that we need to look into the key
value that sits on stack. For clarity in bpf_insn_aux_data, map_state
has been renamed into map_ptr_state, so we get map_{ptr,key}_state as
trackers.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/e8db37f6b2ae60402fa40216c96738ee9b316c32.1574452833.git.daniel@iogearbox.net
4 years agobpf: Add poke dependency tracking for prog array maps
Daniel Borkmann [Fri, 22 Nov 2019 20:07:58 +0000 (21:07 +0100)]
bpf: Add poke dependency tracking for prog array maps

This work adds program tracking to prog array maps. This is needed such
that upon prog array updates/deletions we can fix up all programs which
make use of this tail call map. We add ops->map_poke_{un,}track()
helpers to maps to maintain the list of programs and ops->map_poke_run()
for triggering the actual update.

bpf_array_aux is extended to contain the list head and poke_mutex in
order to serialize program patching during updates/deletions.
bpf_free_used_maps() will untrack the program shortly before dropping
the reference to the map. For clearing out the prog array once all urefs
are dropped we need to use schedule_work() to have a sleepable context.

The prog_array_map_poke_run() is triggered during updates/deletions and
walks the maintained prog list. It checks in their poke_tabs whether the
map and key is matching and runs the actual bpf_arch_text_poke() for
patching in the nop or new jmp location. Depending on the type of update,
we use one of BPF_MOD_{NOP_TO_JUMP,JUMP_TO_NOP,JUMP_TO_JUMP}.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1fb364bb3c565b3e415d5ea348f036ff379e779d.1574452833.git.daniel@iogearbox.net
4 years agobpf: Add initial poke descriptor table for jit images
Daniel Borkmann [Fri, 22 Nov 2019 20:07:57 +0000 (21:07 +0100)]
bpf: Add initial poke descriptor table for jit images

Add initial poke table data structures and management to the BPF
prog that can later be used by JITs. Also add an instance of poke
specific data for tail call maps; plan for later work is to extend
this also for BPF static keys.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1db285ec2ea4207ee0455b3f8e191a4fc58b9ade.1574452833.git.daniel@iogearbox.net
4 years agobpf: Move owner type, jited info into array auxiliary data
Daniel Borkmann [Fri, 22 Nov 2019 20:07:56 +0000 (21:07 +0100)]
bpf: Move owner type, jited info into array auxiliary data

We're going to extend this with further information which is only
relevant for prog array at this point. Given this info is not used
in critical path, move it into its own structure such that the main
array map structure can be kept on diet.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/b9ddccdb0f6f7026489ee955f16c96381e1e7238.1574452833.git.daniel@iogearbox.net
4 years agobpf: Move bpf_free_used_maps into sleepable section
Daniel Borkmann [Fri, 22 Nov 2019 20:07:55 +0000 (21:07 +0100)]
bpf: Move bpf_free_used_maps into sleepable section

We later on are going to need a sleepable context as opposed to plain
RCU callback in order to untrack programs we need to poke at runtime
and tracking as well as image update is performed under mutex.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/09823b1d5262876e9b83a8e75df04cf0467357a4.1574452833.git.daniel@iogearbox.net
4 years agobpf, x86: Generalize and extend bpf_arch_text_poke for direct jumps
Daniel Borkmann [Fri, 22 Nov 2019 20:07:54 +0000 (21:07 +0100)]
bpf, x86: Generalize and extend bpf_arch_text_poke for direct jumps

Add BPF_MOD_{NOP_TO_JUMP,JUMP_TO_JUMP,JUMP_TO_NOP} patching for x86
JIT in order to be able to patch direct jumps or nop them out. We need
this facility in order to patch tail call jumps and in later work also
BPF static keys.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/aa4784196a8e5e985af4b30a4fe5336bce6e9643.1574452833.git.daniel@iogearbox.net
4 years agoselftests/bpf: Add BPF trampoline performance test
Alexei Starovoitov [Fri, 22 Nov 2019 01:15:15 +0000 (17:15 -0800)]
selftests/bpf: Add BPF trampoline performance test

Add a test that benchmarks different ways of attaching BPF program to a kernel function.
Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations:
$ ./test_progs -n 49 -v|grep events
task_rename base 2743K events per sec
task_rename kprobe 2419K events per sec
task_rename kretprobe 1876K events per sec
task_rename raw_tp 2578K events per sec
task_rename fentry 2710K events per sec
task_rename fexit 2685K events per sec

On a kernel with retpoline:
$ ./test_progs -n 49 -v|grep events
task_rename base 2401K events per sec
task_rename kprobe 1930K events per sec
task_rename kretprobe 1485K events per sec
task_rename raw_tp 2053K events per sec
task_rename fentry 2351K events per sec
task_rename fexit 2185K events per sec

All 5 approaches:
- kprobe/kretprobe in __set_task_comm()
- raw tracepoint in trace_task_rename()
- fentry/fexit in __set_task_comm()
are roughly equivalent.

__set_task_comm() by itself is quite fast, so any extra instructions add up.
Until BPF trampoline was introduced the fastest mechanism was raw tracepoint.
kprobe via ftrace was second best. kretprobe is slow due to trap. New
fentry/fexit methods via BPF trampoline are clearly the fastest and the
difference is more pronounced with retpoline on, since BPF trampoline doesn't
use indirect jumps.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
4 years agoMerge branch 'jmp32-reg-bounds'
Alexei Starovoitov [Fri, 22 Nov 2019 18:29:38 +0000 (10:29 -0800)]
Merge branch 'jmp32-reg-bounds'

Yonghong Song says:

====================
With latest llvm, bpf selftest test_progs, which has +alu32 enabled, failed for
strobemeta.o and a few other subtests. The reason is due to that
verifier did not provide better var_off.mask after jmp32 instructions.
This patch set addressed this issue and after the fix, test_progs passed
with alu32.

Patch #1 provided detailed explanation of the problem and the fix.
Patch #2 added three tests in test_verifier.

Changelog:
  v1 -> v2:
    - do not directly manipulate tnum.{value,mask} in __reg_bound_offset32(),
      using tnum_lshift/tnum_rshift functions instead
    - do __reg_bound_offset32() after regular 64bit __reg_bound_offset()
      since the latter may give a better upper 32bit var_off, which can
      be inherited by __reg_bound_offset32().
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 years agoselftests/bpf: Ensure core_reloc_kernel is reading test_progs's data only
Andrii Nakryiko [Thu, 21 Nov 2019 17:59:00 +0000 (09:59 -0800)]
selftests/bpf: Ensure core_reloc_kernel is reading test_progs's data only

test_core_reloc_kernel.c selftest is the only CO-RE test that reads and
returns for validation calling thread's information (pid, tgid, comm). Thus it
has to make sure that only test_prog's invocations are honored.

Fixes: df36e621418b ("selftests/bpf: add CO-RE relocs testing setup")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191121175900.3486133-1-andriin@fb.com
4 years agoselftests/bpf: Add verifier tests for better jmp32 register bounds
Yonghong Song [Thu, 21 Nov 2019 17:06:51 +0000 (09:06 -0800)]
selftests/bpf: Add verifier tests for better jmp32 register bounds

Three test cases are added.
Test 1: jmp32 'reg op imm'.
Test 2: jmp32 'reg op reg' where dst 'reg' has unknown constant
        and src 'reg' has known constant
Test 3: jmp32 'reg op reg' where dst 'reg' has known constant
        and src 'reg' has unknown constant

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121170651.449096-1-yhs@fb.com
4 years agolibbpf: Fix bpf_object name determination for bpf_object__open_file()
Andrii Nakryiko [Fri, 22 Nov 2019 00:35:27 +0000 (16:35 -0800)]
libbpf: Fix bpf_object name determination for bpf_object__open_file()

If bpf_object__open_file() gets path like "some/dir/obj.o", it should derive
BPF object's name as "obj" (unless overriden through opts->object_name).
Instead, due to using `path` as a fallback value for opts->obj_name, path is
used as is for object name, so for above example BPF object's name will be
verbatim "some/dir/obj", which leads to all sorts of troubles, especially when
internal maps are concern (they are using up to 8 characters of object name).
Fix that by ensuring object_name stays NULL, unless overriden.

Fixes: 291ee02b5e40 ("libbpf: Refactor bpf_object__open APIs to use common opts")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191122003527.551556-1-andriin@fb.com
4 years agobpf: Provide better register bounds after jmp32 instructions
Yonghong Song [Thu, 21 Nov 2019 17:06:50 +0000 (09:06 -0800)]
bpf: Provide better register bounds after jmp32 instructions

With latest llvm (trunk https://github.com/llvm/llvm-project),
test_progs, which has +alu32 enabled, failed for strobemeta.o.
The verifier output looks like below with edit to replace large
decimal numbers with hex ones.
 193: (85) call bpf_probe_read_user_str#114
   R0=inv(id=0)
 194: (26) if w0 > 0x1 goto pc+4
   R0_w=inv(id=0,umax_value=0xffffffff00000001)
 195: (6b) *(u16 *)(r7 +80) = r0
 196: (bc) w6 = w0
   R6_w=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
 197: (67) r6 <<= 32
   R6_w=inv(id=0,smax_value=0x7fffffff00000000,umax_value=0xffffffff00000000,
            var_off=(0x0; 0xffffffff00000000))
 198: (77) r6 >>= 32
   R6=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
 ...
 201: (79) r8 = *(u64 *)(r10 -416)
   R8_w=map_value(id=0,off=40,ks=4,vs=13872,imm=0)
 202: (0f) r8 += r6
   R8_w=map_value(id=0,off=40,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
 203: (07) r8 += 9696
   R8_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
 ...
 255: (bf) r1 = r8
   R1_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=0xffffffff,var_off=(0x0; 0xffffffff))
 ...
 257: (85) call bpf_probe_read_user_str#114
 R1 unbounded memory access, make sure to bounds check any array access into a map

The value range for register r6 at insn 198 should be really just 0/1.
The umax_value=0xffffffff caused later verification failure.

After jmp instructions, the current verifier already tried to use just
obtained information to get better register range. The current mechanism is
for 64bit register only. This patch implemented to tighten the range
for 32bit sub-registers after jmp32 instructions.
With the patch, we have the below range ranges for the
above code sequence:
 193: (85) call bpf_probe_read_user_str#114
   R0=inv(id=0)
 194: (26) if w0 > 0x1 goto pc+4
   R0_w=inv(id=0,smax_value=0x7fffffff00000001,umax_value=0xffffffff00000001,
            var_off=(0x0; 0xffffffff00000001))
 195: (6b) *(u16 *)(r7 +80) = r0
 196: (bc) w6 = w0
   R6_w=inv(id=0,umax_value=0xffffffff,var_off=(0x0; 0x1))
 197: (67) r6 <<= 32
   R6_w=inv(id=0,umax_value=0x100000000,var_off=(0x0; 0x100000000))
 198: (77) r6 >>= 32
   R6=inv(id=0,umax_value=1,var_off=(0x0; 0x1))
 ...
 201: (79) r8 = *(u64 *)(r10 -416)
   R8_w=map_value(id=0,off=40,ks=4,vs=13872,imm=0)
 202: (0f) r8 += r6
   R8_w=map_value(id=0,off=40,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1))
 203: (07) r8 += 9696
   R8_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1))
 ...
 255: (bf) r1 = r8
   R1_w=map_value(id=0,off=9736,ks=4,vs=13872,umax_value=1,var_off=(0x0; 0x1))
 ...
 257: (85) call bpf_probe_read_user_str#114
 ...

At insn 194, the register R0 has better var_off.mask and smax_value.
Especially, the var_off.mask ensures later lshift and rshift
maintains proper value range.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121170650.449030-1-yhs@fb.com
4 years agoxdp: Fix cleanup on map free for devmap_hash map type
Toke Høiland-Jørgensen [Thu, 21 Nov 2019 13:36:12 +0000 (14:36 +0100)]
xdp: Fix cleanup on map free for devmap_hash map type

Tetsuo pointed out that it was not only the device unregister hook that was
broken for devmap_hash types, it was also cleanup on map free. So better
fix this as well.

While we're at it, there's no reason to allocate the netdev_map array for
DEVMAP_HASH, so skip that and adjust the cost accordingly.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191121133612.430414-1-toke@redhat.com
4 years agoMerge branch 'libbpf-global-vars'
Alexei Starovoitov [Thu, 21 Nov 2019 17:12:52 +0000 (09:12 -0800)]
Merge branch 'libbpf-global-vars'

Andrii Nakryiko says:

====================
This patch set salvages all the non-extern-specific changes out of blocked
externs patch set ([0]). In addition to small clean ups, it also refactors
libbpf's handling of relocations and allows support for global (non-static)
variables.

  [0] https://patchwork.ozlabs.org/project/netdev/list/?series=143358&state=*
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 years agoselftests/bpf: Integrate verbose verifier log into test_progs
Andrii Nakryiko [Wed, 20 Nov 2019 00:35:48 +0000 (16:35 -0800)]
selftests/bpf: Integrate verbose verifier log into test_progs

Add exra level of verboseness, activated by -vvv argument. When -vv is
specified, verbose libbpf and verifier log (level 1) is output, even for
successful tests. With -vvv, verifier log goes to level 2.

This is extremely useful to debug verifier failures, as well as just see the
state and flow of verification. Before this, you'd have to go and modify
load_program()'s source code inside libbpf to specify extra log_level flags,
which is suboptimal to say the least.

Currently -vv and -vvv triggering verifier output is integrated into
test_stub's bpf_prog_load as well as bpf_verif_scale.c tests.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191120003548.4159797-1-andriin@fb.com
4 years agolibbpf: Support initialized global variables
Andrii Nakryiko [Thu, 21 Nov 2019 07:07:43 +0000 (23:07 -0800)]
libbpf: Support initialized global variables

Initialized global variables are no different in ELF from static variables,
and don't require any extra support from libbpf. But they are matching
semantics of global data (backed by BPF maps) more closely, preventing
LLVM/Clang from aggressively inlining constant values and not requiring
volatile incantations to prevent those. This patch enables global variables.
It still disables uninitialized variables, which will be put into special COM
(common) ELF section, because BPF doesn't allow uninitialized data to be
accessed.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121070743.1309473-5-andriin@fb.com
4 years agoselftests, bpftool: Skip the build test if not in tree
Jakub Kicinski [Tue, 19 Nov 2019 10:50:10 +0000 (10:50 +0000)]
selftests, bpftool: Skip the build test if not in tree

If selftests are copied over to another machine/location
for execution the build test of bpftool will obviously
not work, since the sources are not copied.
Skip it if we can't find bpftool's Makefile.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191119105010.19189-3-quentin.monnet@netronome.com
4 years agolibbpf: Fix various errors and warning reported by checkpatch.pl
Andrii Nakryiko [Thu, 21 Nov 2019 07:07:42 +0000 (23:07 -0800)]
libbpf: Fix various errors and warning reported by checkpatch.pl

Fix a bunch of warnings and errors reported by checkpatch.pl, to make it
easier to spot new problems.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121070743.1309473-4-andriin@fb.com
4 years agoselftests, bpftool: Set EXIT trap after usage function
Quentin Monnet [Tue, 19 Nov 2019 10:50:09 +0000 (10:50 +0000)]
selftests, bpftool: Set EXIT trap after usage function

The trap on EXIT is used to clean up any temporary directory left by the
build attempts. It is not needed when the user simply calls the script
with its --help option, and may not be needed either if we add checks
(e.g. on the availability of bpftool files) before the build attempts.

Let's move this trap and related variables lower down in the code, so
that we don't accidentally change the value returned from the script
on early exits at pre-checks.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Link: https://lore.kernel.org/bpf/20191119105010.19189-2-quentin.monnet@netronome.com
4 years agolibbpf: Refactor relocation handling
Andrii Nakryiko [Thu, 21 Nov 2019 07:07:41 +0000 (23:07 -0800)]
libbpf: Refactor relocation handling

Relocation handling code is convoluted and unnecessarily deeply nested. Split
out per-relocation logic into separate function. Also refactor the logic to be
more a sequence of per-relocation type checks and processing steps, making it
simpler to follow control flow. This makes it easier to further extends it to
new kinds of relocations (e.g., support for extern variables).

This patch also makes relocation's section verification more robust.
Previously relocations against not yet supported externs were silently ignored
because of obj->efile.text_shndx was zero, when all BPF programs had custom
section names and there was no .text section. Also, invalid LDIMM64 relocations
against non-map sections were passed through, if they were pointing to a .text
section (or 0, which is invalid section). All these bugs are fixed within this
refactoring and checks are made more appropriate for each type of relocation.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121070743.1309473-3-andriin@fb.com
4 years agotools, bpf: Fix build for 'make -s tools/bpf O=<dir>'
Quentin Monnet [Tue, 19 Nov 2019 10:56:26 +0000 (10:56 +0000)]
tools, bpf: Fix build for 'make -s tools/bpf O=<dir>'

Building selftests with 'make TARGETS=bpf kselftest' was fixed in commit
55d554f5d140 ("tools: bpf: Use !building_out_of_srctree to determine
srctree"). However, by updating $(srctree) in tools/bpf/Makefile for
in-tree builds only, we leave out the case where we pass an output
directory to build BPF tools, but $(srctree) is not set. This
typically happens for:

    $ make -s tools/bpf O=/tmp/foo
    Makefile:40: /tools/build/Makefile.feature: No such file or directory

Fix it by updating $(srctree) in the Makefile not only for out-of-tree
builds, but also if $(srctree) is empty.

Detected with test_bpftool_build.sh.

Fixes: 55d554f5d140 ("tools: bpf: Use !building_out_of_srctree to determine srctree")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Link: https://lore.kernel.org/bpf/20191119105626.21453-1-quentin.monnet@netronome.com
4 years agoselftests/bpf: Ensure no DWARF relocations for BPF object files
Andrii Nakryiko [Thu, 21 Nov 2019 07:07:40 +0000 (23:07 -0800)]
selftests/bpf: Ensure no DWARF relocations for BPF object files

Add -mattr=dwarfris attribute to llc to avoid having relocations against DWARF
data. These relocations make it impossible to inspect DWARF contents: all
strings are invalid.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121070743.1309473-2-andriin@fb.com
4 years agotools, bpftool: Fix warning on ignored return value for 'read'
Quentin Monnet [Tue, 19 Nov 2019 11:17:06 +0000 (11:17 +0000)]
tools, bpftool: Fix warning on ignored return value for 'read'

When building bpftool, a warning was introduced by commit a94364603610
("bpftool: Allow to read btf as raw data"), because the return value
from a call to 'read()' is ignored. Let's address it.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191119111706.22440-1-quentin.monnet@netronome.com
4 years agoxsk: Fix xsk_poll()'s return type
Luc Van Oostenryck [Wed, 20 Nov 2019 00:10:42 +0000 (01:10 +0100)]
xsk: Fix xsk_poll()'s return type

xsk_poll() is defined as returning 'unsigned int' but the
.poll method is declared as returning '__poll_t', a bitwise type.

Fix this by using the proper return type and using the EPOLL
constants instead of the POLL ones, as required for __poll_t.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20191120001042.30830-1-luc.vanoostenryck@gmail.com
4 years agopowerpc: Add const qual to local_read() parameter
Eric Dumazet [Wed, 20 Nov 2019 01:14:51 +0000 (12:14 +1100)]
powerpc: Add const qual to local_read() parameter

A patch in net-next triggered a compile error on powerpc:

  include/linux/u64_stats_sync.h: In function 'u64_stats_read':
  include/asm-generic/local64.h:30:37: warning: passing argument 1 of 'local_read' discards 'const' qualifier from pointer target type

This seems reasonable to relax powerpc local_read() requirements.

Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build only
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge branch 'bnxt_en-Updates'
Jakub Kicinski [Sun, 24 Nov 2019 22:50:43 +0000 (14:50 -0800)]
Merge branch 'bnxt_en-Updates'

Michael Chan says:

====================
bnxt_en: Updates.

This patchset contains these main features:

1. Add the proper logic to support suspend/resume on the new 57500
   chips.
2. Allow Phy configurations from user on a Multihost function if
   supported by fw.
3. devlink NVRAM flashing support.
4. Add a couple of chip IDs, PHY loopback enhancement, and provide
   more RSS contexts to VFs.

v2: Dropped the devlink info patches to address some feedback
    and resubmit for the 5.6 kernel.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Add support for flashing the device via devlink
Vasundhara Volam [Sun, 24 Nov 2019 03:30:50 +0000 (22:30 -0500)]
bnxt_en: Add support for flashing the device via devlink

Use the same bnxt_flash_package_from_file() function to support
devlink flash operation.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Allow PHY settings on multi-function or NPAR PFs if allowed by FW.
Michael Chan [Sun, 24 Nov 2019 03:30:49 +0000 (22:30 -0500)]
bnxt_en: Allow PHY settings on multi-function or NPAR PFs if allowed by FW.

Currently, the driver does not allow PHY settings on a multi-function or
NPAR NIC whose port is shared by more than one function.  Newer
firmware now allows PHY settings on some of these NICs.  Check for
this new firmware setting and allow the user to set the PHY settings
accordingly.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Add async. event logic for PHY configuration changes.
Michael Chan [Sun, 24 Nov 2019 03:30:48 +0000 (22:30 -0500)]
bnxt_en: Add async. event logic for PHY configuration changes.

If the link settings have been changed by another function sharing the
port, firmware will send us an async. message.  In response, we will
call the new bnxt_init_ethtool_link_settings() function to update
the current settings.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Refactor the initialization of the ethtool link settings.
Michael Chan [Sun, 24 Nov 2019 03:30:47 +0000 (22:30 -0500)]
bnxt_en: Refactor the initialization of the ethtool link settings.

Refactor this logic in bnxt_probe_phy() into a separate function
bnxt_init_ethtool_link_settings().  It used to be that the settable
link settings will never be changed without going through ethtool.
So we only needed to do this once in bnxt_probe_phy().  Now, another
function sharing the port may change it and we may need to re-initialize
the ethtool settings again in run-time.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Skip disabling autoneg before PHY loopback when appropriate.
Michael Chan [Sun, 24 Nov 2019 03:30:46 +0000 (22:30 -0500)]
bnxt_en: Skip disabling autoneg before PHY loopback when appropriate.

New firmware allows PHY loopback to be set without disabling autoneg
first.  Check this capability and skip disabling autoneg when
it is supported by firmware.  Using this scheme, loopback will
always work even if the PHY only supports autoneg.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Assign more RSS context resources to the VFs.
Michael Chan [Sun, 24 Nov 2019 03:30:45 +0000 (22:30 -0500)]
bnxt_en: Assign more RSS context resources to the VFs.

The driver currently only assignes 1 RSS context to each VF.  This works
for the Linux VF driver.  But other drivers, such as DPDK, can make use
of additional RSS contexts.  Modify the code to divide up and assign
RSS contexts to VFs just like other resources.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Initialize context memory to the value specified by firmware.
Michael Chan [Sun, 24 Nov 2019 03:30:44 +0000 (22:30 -0500)]
bnxt_en: Initialize context memory to the value specified by firmware.

Some chips that need host context memory as a backing store requires
the memory to be initialized to a non-zero value.  Query the
value from firmware and initialize the context memory accordingly.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Fix suspend/resume path on 57500 chips
Vasundhara Volam [Sun, 24 Nov 2019 03:30:43 +0000 (22:30 -0500)]
bnxt_en: Fix suspend/resume path on 57500 chips

Driver calls HWRM_FUNC_RESET firmware call while resuming the device
which clears the context memory backing store. Because of which
allocating firmware resources would eventually fail. Fix it by freeing
all context memory during suspend and reallocate the memory during resume.

Call bnxt_hwrm_queue_qportcfg() in resume path.  This firmware call
is needed on the 57500 chips so that firmware will set up the proper
queue mapping in relation to the context memory.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Send FUNC_RESOURCE_QCAPS command in bnxt_resume()
Vasundhara Volam [Sun, 24 Nov 2019 03:30:42 +0000 (22:30 -0500)]
bnxt_en: Send FUNC_RESOURCE_QCAPS command in bnxt_resume()

After driver unregister, firmware is erasing the information that
driver supports new resource management. Send FUNC_RESOURCE_QCAPS
command to inform the firmware that driver supports new resource
management while resuming from hibernation.  Otherwise, we fallback
to the older resource allocation scheme.

Also, move driver register after sending FUNC_RESOURCE_QCAPS command
to be consistent with the normal initialization sequence.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Combine 2 functions calling the same HWRM_DRV_RGTR fw command.
Vasundhara Volam [Sun, 24 Nov 2019 03:30:41 +0000 (22:30 -0500)]
bnxt_en: Combine 2 functions calling the same HWRM_DRV_RGTR fw command.

Everytime driver registers with firmware, driver is required to
register for async event notifications as well. These 2 calls
are done using the same firmware command and can be combined.

We are also missing the 2nd step to register for async events
in the suspend/resume path and this will fix it.  Prior to this,
we were getting only default notifications.

ULP can register for additional async events for the RDMA driver,
so we add a parameter to the new function to only do step 2 when
it is called from ULP.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Do driver unregister cleanup in bnxt_init_one() failure path.
Vasundhara Volam [Sun, 24 Nov 2019 03:30:40 +0000 (22:30 -0500)]
bnxt_en: Do driver unregister cleanup in bnxt_init_one() failure path.

In the bnxt_init_one() failure path, if the driver has already called
firmware to register the driver, it is not undoing the driver
registration.  Add this missing step to unregister for correctness,
so that the firmware knows that the driver has unloaded.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Disable/enable Bus master during suspend/resume.
Michael Chan [Sun, 24 Nov 2019 03:30:39 +0000 (22:30 -0500)]
bnxt_en: Disable/enable Bus master during suspend/resume.

Disable Bus master during suspend to prevent DMAs after the device
goes into D3hot state.  The new 57500 devices may continue to DMA
from context memory after the system goes into D3hot state.  This
may cause some PCIe errors on some system.  Re-enable it during resume.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobnxt_en: Add chip IDs for 57452 and 57454 chips.
Michael Chan [Sun, 24 Nov 2019 03:30:38 +0000 (22:30 -0500)]
bnxt_en: Add chip IDs for 57452 and 57454 chips.

Fix BNXT_CHIP_NUM_5645X() to include 57452 and 56454 chip IDs, so
that these chips will be properly classified as P4 chips to take
advantage of the P4 fixes and features.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agosfc: fix build without CONFIG_RFS_ACCEL
Jakub Kicinski [Sat, 23 Nov 2019 17:45:42 +0000 (09:45 -0800)]
sfc: fix build without CONFIG_RFS_ACCEL

The rfs members of struct efx_channel are under CONFIG_RFS_ACCEL.
Ethtool stats which access those need to be as well.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: ca70bd423f10 ("sfc: add statistics for ARFS")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agohv_netvsc: make recording RSS hash depend on feature flag
Stephen Hemminger [Sat, 23 Nov 2019 23:50:17 +0000 (15:50 -0800)]
hv_netvsc: make recording RSS hash depend on feature flag

The recording of RSS hash should be controlled by NETIF_F_RXHASH.

Fixes: 1fac7ca4e63b ("hv_netvsc: record hardware hash in skb")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agosctp: cache netns in sctp_ep_common
Xin Long [Sat, 23 Nov 2019 03:56:49 +0000 (11:56 +0800)]
sctp: cache netns in sctp_ep_common

This patch is to fix a data-race reported by syzbot:

  BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj

  write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1:
    sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091
    sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465
    sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916
    inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734
    __sys_accept4+0x224/0x430 net/socket.c:1754
    __do_sys_accept net/socket.c:1795 [inline]
    __se_sys_accept net/socket.c:1792 [inline]
    __x64_sys_accept+0x4e/0x60 net/socket.c:1792
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

  read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0:
    sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894
    rht_key_get_hash include/linux/rhashtable.h:133 [inline]
    rht_key_hashfn include/linux/rhashtable.h:159 [inline]
    rht_head_hashfn include/linux/rhashtable.h:174 [inline]
    head_hashfn lib/rhashtable.c:41 [inline]
    rhashtable_rehash_one lib/rhashtable.c:245 [inline]
    rhashtable_rehash_chain lib/rhashtable.c:276 [inline]
    rhashtable_rehash_table lib/rhashtable.c:316 [inline]
    rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

It was caused by rhashtable access asoc->base.sk when sctp_assoc_migrate
is changing its value. However, what rhashtable wants is netns from asoc
base.sk, and for an asoc, its netns won't change once set. So we can
simply fix it by caching netns since created.

Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable")
Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agosctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
Navid Emamdoost [Fri, 22 Nov 2019 22:17:56 +0000 (16:17 -0600)]
sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook

In the implementation of sctp_sf_do_5_2_4_dupcook() the allocated
new_asoc is leaked if security_sctp_assoc_request() fails. Release it
via sctp_association_free().

Fixes: 2277c7cd75e3 ("sctp: Add LSM hooks")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: gro: use vlan API instead of accessing directly
Tonghao Zhang [Fri, 22 Nov 2019 12:38:01 +0000 (20:38 +0800)]
net: gro: use vlan API instead of accessing directly

Use vlan common api to access the vlan_tag info.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge tag 'mlx5-updates-2019-11-22' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Sun, 24 Nov 2019 00:36:49 +0000 (16:36 -0800)]
Merge tag 'mlx5-updates-2019-11-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-11-22

1) Misc Cleanups
2) Software steering support for Geneve
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: phylink: rename mac_link_state() op to mac_pcs_get_state()
Russell King [Thu, 21 Nov 2019 00:36:22 +0000 (00:36 +0000)]
net: phylink: rename mac_link_state() op to mac_pcs_get_state()

Rename the mac_link_state() method to mac_pcs_get_state() to make it
clear that it should be returning the MACs PCS current state, which
is used for inband negotiation rather than just reading back what the
MAC has been configured for. Update the documentation to explicitly
mention that this is for inband.

We drop the return value as well; most of phylink doesn't check the
return value and it is not clear what it should do on error - instead
arrange for state->link to be false.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: use rhashtable_lookup() instead of rhashtable_lookup_fast()
Taehee Yoo [Fri, 22 Nov 2019 08:15:19 +0000 (08:15 +0000)]
net: use rhashtable_lookup() instead of rhashtable_lookup_fast()

rhashtable_lookup_fast() internally calls rcu_read_lock() then,
calls rhashtable_lookup(). So if rcu_read_lock() is already held,
rhashtable_lookup() is enough.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge tag 'wireless-drivers-next-2019-11-22' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Sat, 23 Nov 2019 20:00:54 +0000 (12:00 -0800)]
Merge tag 'wireless-drivers-next-2019-11-22' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.5

Last set of patches for v5.5. Major features here 802.11ax support for
qtnfmac and airtime fairness support to mt76. And naturally smaller
fixes and improvements all over.

Major changes:

qtnfmac

* add 802.11ax support in AP mode

* enable offload bridging support

iwlwifi

* support TX/RX antennas reporting

mt76

* mt7615 smart carrier sense support

* aggregation statistics via debugfs

* airtime fairness (ATF) support

* mt76x0 OF mac address support
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge branch 'nfc-convert-from-txt-to-rst'
Jakub Kicinski [Sat, 23 Nov 2019 19:01:13 +0000 (11:01 -0800)]
Merge branch 'nfc-convert-from-txt-to-rst'

Robert Schwebel says:

====================
here is v2 of the series converting the NFC documentation from txt to
rst. Thanks to Jonathan and Dave for the input.

Changes since (implicit) v1:

* replace code-block by more compact :: syntax

* really add the rst file to the index
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agodocs: networking: nfc: change to rst format
Robert Schwebel [Fri, 22 Nov 2019 07:43:06 +0000 (08:43 +0100)]
docs: networking: nfc: change to rst format

Now that the sphinx syntax has been fixed, change the document from txt
to rst and add it to the index.

Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agodocs: networking: nfc: fix code block syntax
Robert Schwebel [Fri, 22 Nov 2019 07:43:05 +0000 (08:43 +0100)]
docs: networking: nfc: fix code block syntax

Silence this warning:

Documentation/networking/nfc.rst:113: WARNING: Definition list ends without
a blank line; unexpected unindent.

Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agodocs: networking: nfc: fix bullet list syntax
Robert Schwebel [Fri, 22 Nov 2019 07:43:04 +0000 (08:43 +0100)]
docs: networking: nfc: fix bullet list syntax

Fix this warning:

Documentation/networking/nfc.rst:87: WARNING: Bullet list ends without
a blank line; unexpected unindent.

Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agodocs: networking: nfc: change block diagram to sphinx syntax
Robert Schwebel [Fri, 22 Nov 2019 07:43:03 +0000 (08:43 +0100)]
docs: networking: nfc: change block diagram to sphinx syntax

Change the block diagram to match the sphinx syntax. This will make it
possible to switch this file to rst in the future.

Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agodocs: networking: nfc: change headlines to sphinx syntax
Robert Schwebel [Fri, 22 Nov 2019 07:43:02 +0000 (08:43 +0100)]
docs: networking: nfc: change headlines to sphinx syntax

The headlines in this file do are not in the standard kernel docu-
mentation headline format. Change it, so this file can be switched to
rst in the future.

Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: phy: initialise phydev speed and duplex sanely
Russell King [Fri, 22 Nov 2019 15:23:23 +0000 (15:23 +0000)]
net: phy: initialise phydev speed and duplex sanely

When a phydev is created, the speed and duplex are set to zero and
-1 respectively, rather than using the predefined SPEED_UNKNOWN and
DUPLEX_UNKNOWN constants.

There is a window at initialisation time where we may report link
down using the 0/-1 values.  Tidy this up and use the predefined
constants, so debug doesn't complain with:

"Unsupported (update phy-core.c)/Unsupported (update phy-core.c)"

when the speed and duplex settings are printed.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: phy: remove phy_ethtool_sset()
Russell King [Fri, 22 Nov 2019 12:37:08 +0000 (12:37 +0000)]
net: phy: remove phy_ethtool_sset()

There are no users of phy_ethtool_sset() in the kernel anymore, and
as of commit 3c1bcc8614db ("net: ethernet: Convert phydev advertize
and supported from u32 to link mode"), the implementation is slightly
buggy - it doesn't correctly check the masked advertising mask as it
used to.

Remove it, and update the phy documentation to refer to its replacement
function.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoRevert "bpf: Emit audit messages upon successful prog load and unload"
Jakub Kicinski [Sat, 23 Nov 2019 17:54:58 +0000 (09:54 -0800)]
Revert "bpf: Emit audit messages upon successful prog load and unload"

This commit reverts commit 91e6015b082b ("bpf: Emit audit messages
upon successful prog load and unload") and its follow up commit
7599a896f2e4 ("audit: Move audit_log_task declaration under
CONFIG_AUDITSYSCALL") as requested by Paul Moore. The change needs
close review on linux-audit, tests etc.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge branch 'sfc-ARFS-expiry-improvements'
Jakub Kicinski [Sat, 23 Nov 2019 01:51:14 +0000 (17:51 -0800)]
Merge branch 'sfc-ARFS-expiry-improvements'

Edward Cree says:

====================
A series of changes to how we check filters for expiry, manage how much
 of that work to do & when, etc.
Prompted by some pathological behaviour under heavy load, which was
Reported-by: David Ahern <dahern@digitalocean.com>
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agosfc: do ARFS expiry work occasionally even without NAPI poll
Edward Cree [Fri, 22 Nov 2019 17:57:40 +0000 (17:57 +0000)]
sfc: do ARFS expiry work occasionally even without NAPI poll

If there's no traffic on a channel, its ARFS expiry work will never get
 scheduled by efx_poll() as that isn't being run.
So make efx_filter_rfs_expire() reschedule itself to run after 30 seconds.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agosfc: add statistics for ARFS
Edward Cree [Fri, 22 Nov 2019 17:57:27 +0000 (17:57 +0000)]
sfc: add statistics for ARFS

Report the number of successful and failed insertions, and also the
 current count of filters, to aid in tuning e.g. rps_flow_cnt.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agosfc: suppress MCDI errors from ARFS
Edward Cree [Fri, 22 Nov 2019 17:57:19 +0000 (17:57 +0000)]
sfc: suppress MCDI errors from ARFS

In high connection count usage, the NIC's filter table may be filled with
 sufficiently many ARFS filters that further insertions fail.  As this
 does not represent a correctness issue, do not log the resulting MCDI
 errors.  Add a debug-level message under the (by default disabled)
 rx_status category instead; and take the opportunity to do a little extra
 expiry work.

Since there are now multiple workitems able to call __efx_filter_rfs_expire
 on a given channel, it is possible for them to race and thus pass quotas
 which, combined, exceed rfs_filter_count.  Thus, don't WARN_ON if we loop
 all the way around the table with quota left over.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Tested-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agosfc: change ARFS expiry mechanism
Edward Cree [Fri, 22 Nov 2019 17:57:03 +0000 (17:57 +0000)]
sfc: change ARFS expiry mechanism

The old rfs_filters_added method for determining the quota could potentially
 allow the NIC to become filled with old filters, which never get tested for
 expiry.  Instead, explicitly make expiry check work depend on the number of
 filters installed, and don't count checking slots without filters in as
 doing work.  This guarantees that each filter will be checked for expiry at
 least once every thirty seconds (assuming the channel to which it belongs is
 NAPI polling actively) regardless of fill level.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Tested-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
Jakub Kicinski [Sat, 23 Nov 2019 00:59:00 +0000 (16:59 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
This series contains updates to the ice driver only.

Bruce updates the driver to store the number of functions the device has
so that it won't have to compute it when setting safe mode capabilities.
Adds a check to adjust the reporting of capabilities for devices with
more than 4 ports, which differ for devices with less than 4 ports.

Brett adds a helper function to determine if the VF is allowed to do
VLAN operations based on the host's VF configuration.  Also adds a new
function that initializes VLAN stripping (enabled/disabled) for the VF
based on the device supported capabilities.  Adds a check if the vector
index is valid with the respect to the number of transmit and receive
queues configured when we set coalesce settings for DCB.  Adds a check
if the promisc_mask contains ICE_PROMISC_VLAN_RX or ICE_PROMISC_VLAN_TX
so that VLAN 0 promiscuous rules to be removed.  Add a helper macro for
a commonly used de-reference of a pointer to &pf->dev->pdev.

Jesse fixes an issue where if an invalid virtchnl request from the VF,
the driver would return uninitialized data to the VF from the PF stack,
so ensure the stack variable is initialized earlier.  Add helpers to the
virtchnl interface make the reporting of strings consistent and help
reduce stack space.  Implements VF statistics gathering via the kernel
ndo_get_vf_stats().

Akeem ensures we disable the state flag for each VF when its resources
are returned to the device.

Tony does additional cleanup in the driver to ensure the when we
allocate and free memory within the same function, we should not be
using devm_* variants; use regular alloc and free functions.

Henry implements code to query and set the number of channels on the
primary VSI for a PF via ethtool.

Jake cleans up needless NULL checks in ice_sched_cleanup_all().

Kevin updates the firmware API version to align with current NVM images.

v2: Added "Fixes:" tag to patch 5 commit description and added the use
    of netif_is_rxfh_configured() in patch 13 to see if RSS has been
    configured by the user, if so do not overwrite that configuration.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: inet_is_local_reserved_port() should return bool not int
Maciej Żenczykowski [Fri, 22 Nov 2019 21:50:52 +0000 (13:50 -0800)]
net: inet_is_local_reserved_port() should return bool not int

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge branch 'cxgb4-add-udp-segmentation-offload-support'
Jakub Kicinski [Sat, 23 Nov 2019 00:46:33 +0000 (16:46 -0800)]
Merge branch 'cxgb4-add-udp-segmentation-offload-support'

Rahul Lakkireddy says:

====================
This series of patches add UDP Segmentation Offload (USO) supported
by Chelsio T5/T6 NICs.

Patch 1 updates the current Scatter Gather List (SGL) DMA unmap logic
for USO requests.

Patch 2 adds USO support for NIC and MQPRIO QoS offload Tx path.

Patch 3 adds missing stats for MQPRIO QoS offload Tx path.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agocxgb4: add stats for MQPRIO QoS offload Tx path
Rahul Lakkireddy [Fri, 22 Nov 2019 01:00:03 +0000 (06:30 +0530)]
cxgb4: add stats for MQPRIO QoS offload Tx path

Export necessary stats for traffic flowing through MQPRIO QoS offload
Tx path.

v2:
- No change.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agocxgb4: add UDP segmentation offload support
Rahul Lakkireddy [Fri, 22 Nov 2019 01:00:02 +0000 (06:30 +0530)]
cxgb4: add UDP segmentation offload support

Implement and export UDP segmentation offload (USO) support for both
NIC and MQPRIO QoS offload Tx path. Update appropriate logic in Tx to
parse GSO info in skb and configure FW_ETH_TX_EO_WR request needed to
perform USO.

v2:
- Remove inline keyword from write_eo_udp_wr() in sge.c. Let the
  compiler decide.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agocxgb4/chcr: update SGL DMA unmap for USO
Rahul Lakkireddy [Fri, 22 Nov 2019 01:00:01 +0000 (06:30 +0530)]
cxgb4/chcr: update SGL DMA unmap for USO

The FW_ETH_TX_EO_WR used for sending UDP Segmentation Offload (USO)
requests expects the headers to be part of the descriptor and the
payload to be part of the SGL containing the DMA mapped addresses.
Hence, the DMA address in the first entry of the SGL can start after
the packet headers. Currently, unmap_sgl() tries to unmap from this
wrong offset, instead of the originally mapped DMA address.

So, use existing unmap_skb() instead, which takes originally saved DMA
addresses as input. Update all necessary Tx paths to save the original
DMA addresses, so that unmap_skb() can unmap them properly.

v2:
- No change.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Sat, 23 Nov 2019 00:27:24 +0000 (16:27 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock
from commit c8183f548902 ("s390/qeth: fix potential deadlock on
workqueue flush"), removed the code which was removed by commit
9897d583b015 ("s390/qeth: consolidate some duplicated HW cmd code").

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 22 Nov 2019 22:28:14 +0000 (14:28 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Validate tunnel options length in act_tunnel_key, from Xin Long.

 2) Fix DMA sync bug in gve driver, from Adi Suresh.

 3) TSO kills performance on some r8169 chips due to HW issues, disable
    by default in that case, from Corinna Vinschen.

 4) Fix clock disable mismatch in fec driver, from Chubong Yuan.

 5) Fix interrupt status bits define in hns3 driver, from Huazhong Tan.

 6) Fix workqueue deadlocks in qeth driver, from Julian Wiedmann.

 7) Don't napi_disable() twice in r8152 driver, from Hayes Wang.

 8) Fix SKB extension memory leak, from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
  r8152: avoid to call napi_disable twice
  MAINTAINERS: Add myself as maintainer of virtio-vsock
  udp: drop skb extensions before marking skb stateless
  net: rtnetlink: prevent underflows in do_setvfinfo()
  can: m_can_platform: remove unnecessary m_can_class_resume() call
  can: m_can_platform: set net_device structure as driver data
  hv_netvsc: Fix send_table offset in case of a host bug
  hv_netvsc: Fix offset usage in netvsc_send_table()
  net-ipv6: IPV6_TRANSPARENT - check NET_RAW prior to NET_ADMIN
  sfc: Only cancel the PPS workqueue if it exists
  nfc: port100: handle command failure cleanly
  net-sysfs: fix netdev_queue_add_kobject() breakage
  r8152: Re-order napi_disable in rtl8152_close
  net: qca_spi: Move reset_count to struct qcaspi
  net: qca_spi: fix receive buffer size check
  net/ibmvnic: Ignore H_FUNCTION return from H_EOI to tolerate XIVE mode
  Revert "net/ibmvnic: Fix EOI when running in XIVE mode"
  net/mlxfw: Verify FSM error code translation doesn't exceed array size
  net/mlx5: Update the list of the PCI supported devices
  net/mlx5: Fix auto group size calculation
  ...

4 years agoafs: Fix large file support
Marc Dionne [Thu, 21 Nov 2019 15:37:26 +0000 (15:37 +0000)]
afs: Fix large file support

By default s_maxbytes is set to MAX_NON_LFS, which limits the usable
file size to 2GB, enforced by the vfs.

Commit b9b1f8d5930a ("AFS: write support fixes") added support for the
64-bit fetch and store server operations, but did not change this value.
As a result, attempts to write past the 2G mark result in EFBIG errors:

 $ dd if=/dev/zero of=foo bs=1M count=1 seek=2048
 dd: error writing 'foo': File too large

Set s_maxbytes to MAX_LFS_FILESIZE.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoafs: Fix possible assert with callbacks from yfs servers
Marc Dionne [Thu, 21 Nov 2019 15:26:15 +0000 (15:26 +0000)]
afs: Fix possible assert with callbacks from yfs servers

Servers sending callback breaks to the YFS_CM_SERVICE service may
send up to YFSCBMAX (1024) fids in a single RPC.  Anything over
AFSCBMAX (50) will cause the assert in afs_break_callbacks to trigger.

Remove the assert, as the count has already been checked against
the appropriate max values in afs_deliver_cb_callback and
afs_deliver_yfs_cb_callback.

Fixes: 35dbfba3111a ("afs: Implement the YFS cache manager service")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoice: Update FW API minor version
Kevin Scott [Fri, 8 Nov 2019 14:23:31 +0000 (06:23 -0800)]
ice: Update FW API minor version

Update FW API minor version to align to current value advertised
by FW in new NVM images.

Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: remove pointless NULL check of port_info
Jacob Keller [Fri, 8 Nov 2019 14:23:30 +0000 (06:23 -0800)]
ice: remove pointless NULL check of port_info

The code in ice_sched_cleanup_all checks whether the port info is NULL
prior to calling ice_sched_clear_port. However, ice_sched_clear_port
already checks whether port info is non-NULL.

More importantly, it also checks whether the port structure has been
initialized by checking its port_state field as well.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agoice: Implement ethtool ops for channels
Henry Tieman [Fri, 8 Nov 2019 14:23:29 +0000 (06:23 -0800)]
ice: Implement ethtool ops for channels

Add code to query and set the number of channels on the primary VSI for a
PF. This is accessed from the 'ethtool -l' and 'ethtool -L' commands,
respectively.  Though the ice driver supports asymmetric queues report an
IRQ vector that has both Rx and Tx queues attached and is counted as a
'combined' channel.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
4 years agonet/mlx5e: Remove redundant pointer check
Eli Cohen [Thu, 31 Oct 2019 07:12:18 +0000 (09:12 +0200)]
net/mlx5e: Remove redundant pointer check

When code reaches the "out" label, n is guaranteed to be valid so we can
unconditionally call neigh_release.

Also change the label to release_neigh to better reflect the fact that
we unconditionally free the neighbour and also match other labels
convention.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5e: TC, Stub out ipv6 tun create header function
Saeed Mahameed [Fri, 1 Nov 2019 22:38:30 +0000 (15:38 -0700)]
net/mlx5e: TC, Stub out ipv6 tun create header function

Improve mlx5e_route_lookup_ipv6 function structure by avoiding #ifdef then
return -EOPNOTSUPP in the middle of the function code.

To do so, we stub out mlx5e_tc_tun_create_header_ipv6 which is the only
caller of this helper function to avoid calling it altogether
when ipv6 is compiled out, which should also cleanup some compiler
warnings of unused variables.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
4 years agonet/mlx5: DR, Add support for Geneve packets SW steering
Yevgeny Kliteynik [Mon, 28 Oct 2019 15:22:06 +0000 (17:22 +0200)]
net/mlx5: DR, Add support for Geneve packets SW steering

Add support for SW steering matching on Geneve header fields:
 - VNI
 - OAM
 - protocol type
 - options length

Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>