platform/kernel/linux-rpi.git
5 years agorxrpc: Fix read-after-free in rxrpc_queue_local()
David Howells [Tue, 13 Aug 2019 21:26:36 +0000 (22:26 +0100)]
rxrpc: Fix read-after-free in rxrpc_queue_local()

rxrpc_queue_local() attempts to queue the local endpoint it is given and
then, if successful, prints a trace line.  The trace line includes the
current usage count - but we're not allowed to look at the local endpoint
at this point as we passed our ref on it to the workqueue.

Fix this by reading the usage count before queuing the work item.

Also fix the reading of local->debug_id for trace lines, which must be done
with the same consideration as reading the usage count.

Fixes: 09d2bf595db4 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting")
Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
5 years agorxrpc: Fix local endpoint replacement
David Howells [Mon, 12 Aug 2019 22:30:06 +0000 (23:30 +0100)]
rxrpc: Fix local endpoint replacement

When a local endpoint (struct rxrpc_local) ceases to be in use by any
AF_RXRPC sockets, it starts the process of being destroyed, but this
doesn't cause it to be removed from the namespace endpoint list immediately
as tearing it down isn't trivial and can't be done in softirq context, so
it gets deferred.

If a new socket comes along that wants to bind to the same endpoint, a new
rxrpc_local object will be allocated and rxrpc_lookup_local() will use
list_replace() to substitute the new one for the old.

Then, when the dying object gets to rxrpc_local_destroyer(), it is removed
unconditionally from whatever list it is on by calling list_del_init().

However, list_replace() doesn't reset the pointers in the replaced
list_head and so the list_del_init() will likely corrupt the local
endpoints list.

Fix this by using list_replace_init() instead.

Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting")
Reported-by: syzbot+193e29e9387ea5837f1d@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
5 years agonet: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx
Nathan Chancellor [Mon, 12 Aug 2019 03:13:45 +0000 (20:13 -0700)]
net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx

clang warns:

drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical
'&&' with constant operand [-Wconstant-logical-operand]
                        if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                  ^  ~~~~~~~~~~~~
drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a
bitwise operation
                        if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                  ^~
                                                  &
drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to
silence this warning
                        if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                 ~^~~~~~~~~~~~~~~
1 warning generated.

Explicitly check that NET_IP_ALIGN is not zero, which matches how this
is checked in other parts of the tree. Because NET_IP_ALIGN is a build
time constant, this check will be constant folded away during
optimization.

Fixes: 82a9928db560 ("tc35815: Enable StripCRC feature")
Link: https://github.com/ClangBuiltLinux/linux/issues/608
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: initialise addr_trail_end when setting node addresses
Chris Packham [Sun, 11 Aug 2019 20:18:25 +0000 (08:18 +1200)]
tipc: initialise addr_trail_end when setting node addresses

We set the field 'addr_trial_end' to 'jiffies', instead of the current
value 0, at the moment the node address is initialized. This guarantees
we don't inadvertently enter an address trial period when the node
address is explicitly set by the user.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Check existence of .port_mdb_add callback before calling it
Chen-Yu Tsai [Sun, 11 Aug 2019 14:18:25 +0000 (22:18 +0800)]
net: dsa: Check existence of .port_mdb_add callback before calling it

The dsa framework has optional .port_mdb_{prepare,add,del} callback fields
for drivers to handle multicast database entries. When adding an entry, the
framework goes through a prepare phase, then a commit phase. Drivers not
providing these callbacks should be detected in the prepare phase.

DSA core may still bypass the bridge layer and call the dsa_port_mdb_add
function directly with no prepare phase or no switchdev trans object,
and the framework ends up calling an undefined .port_mdb_add callback.
This results in a NULL pointer dereference, as shown in the log below.

The other functions seem to be properly guarded. Do the same for
.port_mdb_add in dsa_switch_mdb_add_bitmap() as well.

    8<--- cut here ---
    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = (ptrval)
    [00000000] *pgd=00000000
    Internal error: Oops: 80000005 [#1] SMP ARM
    Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211
    CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1
    Hardware name: Allwinner sun7i (A20) Family
    Workqueue: events switchdev_deferred_process_work
    PC is at 0x0
    LR is at dsa_switch_event+0x570/0x620
    pc : [<00000000>]    lr : [<c08533ec>]    psr: 80070013
    sp : ee871db8  ip : 00000000  fp : ee98d0a4
    r10: 0000000c  r9 : 00000008  r8 : ee89f710
    r7 : ee98d040  r6 : ee98d088  r5 : c0f04c48  r4 : ee98d04c
    r3 : 00000000  r2 : ee89f710  r1 : 00000008  r0 : ee98d040
    Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
    Control: 10c5387d  Table: 6deb406a  DAC: 00000051
    Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval))
    Stack: (0xee871db8 to 0xee872000)
    1da0:                                                       ee871e14 103ace2d
    1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000
    1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0
    1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000
    1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff
    1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4
    1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500
    1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000
    1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8
    1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122
    1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec
    1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc
    1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00
    1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000
    1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4
    1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000
    1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
    1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
    [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
    [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20)
    [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74)
    [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4)
    [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14)
    [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4)
    [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
    [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68)
    [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8)
    [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104)
    [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c)
    [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104)
    [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14)
    [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c)
    [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc)
    [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150)
    [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
    Exception stack(0xee871fb0 to 0xee871ff8)
    1fa0:                                     00000000 00000000 00000000 00000000
    1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    Code: bad PC value
    ---[ end trace 1292c61abd17b130 ]---

    [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
    corresponds to

$ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec

linux/net/dsa/switch.c:156
linux/net/dsa/switch.c:178
linux/net/dsa/switch.c:328

Fixes: e6db98db8a95 ("net: dsa: add switch mdb bitmap functions")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_ptp: Keep unmatched entries in a linked list
Petr Machata [Sun, 11 Aug 2019 07:48:37 +0000 (10:48 +0300)]
mlxsw: spectrum_ptp: Keep unmatched entries in a linked list

To identify timestamps for matching with their packets, Spectrum-1 uses a
five-tuple of (port, direction, domain number, message type, sequence ID).
If there are several clients from the same domain behind a single port
sending Delay_Req's, the only thing differentiating these packets, as far
as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between
individual clients be similar, conflicts may arise. That is not a problem
to hardware, which will simply deliver timestamps on a first comes, first
served basis.

However the driver uses a simple hash table to store the unmatched pieces.
When a new conflicting piece arrives, it pushes out the previously stored
one, which if it is a packet, is delivered without timestamp. Later on as
the corresponding timestamps arrive, the first one is mismatched to the
second packet, and the second one is never matched and eventually is GCd.

To correct this issue, instead of using a simple rhashtable, use rhltable
to keep the unmatched entries.

Previously, a found unmatched entry would always be removed from the hash
table. That is not the case anymore--an incompatible entry is left in the
hash table. Therefore removal from the hash table cannot be used to confirm
the validity of the looked-up pointer, instead the lookup would simply need
to be redone. Therefore move it inside the critical section. This
simplifies a lot of the code.

Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Reported-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: nps_enet: Fix function names in doc comments
Jonathan Neuschäfer [Sat, 10 Aug 2019 11:11:56 +0000 (13:11 +0200)]
net: nps_enet: Fix function names in doc comments

Adjust the function names in two doc comments to match the corresponding
functions.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorxrpc: Fix local refcounting
David Howells [Fri, 9 Aug 2019 21:47:47 +0000 (22:47 +0100)]
rxrpc: Fix local refcounting

Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called
on an unbound socket on which rx->local is not yet set.

The following reproduced (includes omitted):

int main(void)
{
socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
return 0;
}

causes the following oops to occur:

BUG: kernel NULL pointer dereference, address: 0000000000000010
...
RIP: 0010:rxrpc_unuse_local+0x8/0x1b
...
Call Trace:
 rxrpc_release+0x2b5/0x338
 __sock_release+0x37/0xa1
 sock_close+0x14/0x17
 __fput+0x115/0x1e9
 task_work_run+0x72/0x98
 do_exit+0x51b/0xa7a
 ? __context_tracking_exit+0x4e/0x10e
 do_group_exit+0xab/0xab
 __x64_sys_exit_group+0x14/0x17
 do_syscall_64+0x89/0x1d4
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com
Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetdevsim: Restore per-network namespace accounting for fib entries
David Ahern [Tue, 6 Aug 2019 19:15:17 +0000 (12:15 -0700)]
netdevsim: Restore per-network namespace accounting for fib entries

Prior to the commit in the fixes tag, the resource controller in netdevsim
tracked fib entries and rules per network namespace. Restore that behavior.

Fixes: 5fc494225c1e ("netdevsim: create devlink instance per netdevsim instance")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Sun, 11 Aug 2019 21:49:34 +0000 (14:49 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2019-08-11

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei.

2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii.

3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak.

4) Fix bpftool to dump an error if pinning fails, from Jakub.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: swap sk_write_space on close
Jakub Kicinski [Sat, 10 Aug 2019 01:36:23 +0000 (18:36 -0700)]
net/tls: swap sk_write_space on close

Now that we swap the original proto and clear the ULP pointer
on close we have to make sure no callback will try to access
the freed state. sk_write_space is not part of sk_prot, remember
to swap it.

Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
Fixes: 95fa145479fb ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohv_netvsc: Fix a warning of suspicious RCU usage
Dexuan Cui [Fri, 9 Aug 2019 01:58:08 +0000 (01:58 +0000)]
hv_netvsc: Fix a warning of suspicious RCU usage

This fixes a warning of "suspicious rcu_dereference_check() usage"
when nload runs.

Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-fixes-2019-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 9 Aug 2019 20:23:33 +0000 (13:23 -0700)]
Merge tag 'mlx5-fixes-2019-08-08' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-08-08

This series introduces some fixes to mlx5 driver.

Highlights:
1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs.
2) From Aya, Fixes to mlx5 tx devlink health reporter.
3) From Maxim, aRFs parsing to use flow dissector to avoid relying on
invalid skb fields.

Please pull and let me know if there is any problem.

For -stable v4.3
 ('net/mlx5e: Only support tx/rx pause setting for port owner')
For -stable v4.9
 ('net/mlx5e: Use flow keys dissector to parse packets for ARFS')
For -stable v5.1
 ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery')
 ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter')
 ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off')

Note: when merged with net-next this minor conflict will pop up:
++<<<<<<< (net-next)
 +      if (is_eswitch_flow) {
 +              flow->esw_attr->match_level = match_level;
 +              flow->esw_attr->tunnel_match_level = tunnel_match_level;
++=======
+       if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
+               flow->esw_attr->inner_match_level = inner_match_level;
+               flow->esw_attr->outer_match_level = outer_match_level;
++>>>>>>> (net)

To resolve, use hunks from net (2nd) and replace:
if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
with
if (is_eswitch_flow)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoixgbe: fix possible deadlock in ixgbe_service_task()
Taehee Yoo [Thu, 8 Aug 2019 16:37:56 +0000 (09:37 -0700)]
ixgbe: fix possible deadlock in ixgbe_service_task()

ixgbe_service_task() calls unregister_netdev() under rtnl_lock().
But unregister_netdev() internally calls rtnl_lock().
So deadlock would occur.

Fixes: 59dd45d550c5 ("ixgbe: firmware recovery mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Fix-collisions-in-socket-cookie-generation'
David S. Miller [Fri, 9 Aug 2019 20:14:46 +0000 (13:14 -0700)]
Merge branch 'Fix-collisions-in-socket-cookie-generation'

Daniel Borkmann says:

====================
Fix collisions in socket cookie generation

This change makes the socket cookie generator as a global counter
instead of per netns in order to fix cookie collisions for BPF use
cases we ran into. See main patch #1 for more details.

Given the change is small/trivial and fixes an issue we're seeing
my preference would be net tree (though it cleanly applies to
net-next as well). Went for net tree instead of bpf tree here given
the main change is in net/core/sock_diag.c, but either way would be
fine with me.

v1 -> v2:
  - Fix up commit description in patch #1, thanks Eric!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobpf: sync bpf.h to tools infrastructure
Daniel Borkmann [Thu, 8 Aug 2019 11:57:26 +0000 (13:57 +0200)]
bpf: sync bpf.h to tools infrastructure

Pull in updates in BPF helper function description.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosock: make cookie generation global instead of per netns
Daniel Borkmann [Thu, 8 Aug 2019 11:57:25 +0000 (13:57 +0200)]
sock: make cookie generation global instead of per netns

Generating and retrieving socket cookies are a useful feature that is
exposed to BPF for various program types through bpf_get_socket_cookie()
helper.

The fact that the cookie counter is per netns is quite a limitation
for BPF in practice in particular for programs in host namespace that
use socket cookies as part of a map lookup key since they will be
causing socket cookie collisions e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in host namespace handling container traffic
from veth or ipvlan devices with peer in different netns. Change the
counter to be global instead.

Socket cookie consumers must assume the value as opqaue in any case.
Not every socket must have a cookie generated and knowledge of the
counter value itself does not provide much value either way hence
conversion to global is fine.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'rxrpc-fixes-20190809' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 9 Aug 2019 18:27:17 +0000 (11:27 -0700)]
Merge tag 'rxrpc-fixes-20190809' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
Here's a couple of fixes for rxrpc:

 (1) Fix refcounting of the local endpoint.

 (2) Don't calculate or report packet skew information.  This has been
     obsolete since AFS 3.1 and so is a waste of resources.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'bpf-bpftool-pinning-error-msg'
Daniel Borkmann [Fri, 9 Aug 2019 15:38:53 +0000 (17:38 +0200)]
Merge branch 'bpf-bpftool-pinning-error-msg'

Jakub Kicinski says:

====================
First make sure we don't use "prog" in error messages because
the pinning operation could be performed on a map. Second add
back missing error message if pin syscall failed.
====================

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agotools: bpftool: add error message on pin failure
Jakub Kicinski [Wed, 7 Aug 2019 00:19:23 +0000 (17:19 -0700)]
tools: bpftool: add error message on pin failure

No error message is currently printed if the pin syscall
itself fails. It got lost in the loadall refactoring.

Fixes: 77380998d91d ("bpftool: add loadall command")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agotools: bpftool: fix error message (prog -> object)
Jakub Kicinski [Wed, 7 Aug 2019 00:19:22 +0000 (17:19 -0700)]
tools: bpftool: fix error message (prog -> object)

Change an error message to work for any object being
pinned not just programs.

Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agorxrpc: Don't bother generating maxSkew in the ACK packet
David Howells [Fri, 9 Aug 2019 14:20:41 +0000 (15:20 +0100)]
rxrpc: Don't bother generating maxSkew in the ACK packet

Don't bother generating maxSkew in the ACK packet as it has been obsolete
since AFS 3.1.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
5 years agorxrpc: Fix local endpoint refcounting
David Howells [Fri, 9 Aug 2019 14:20:41 +0000 (15:20 +0100)]
rxrpc: Fix local endpoint refcounting

The object lifetime management on the rxrpc_local struct is broken in that
the rxrpc_local_processor() function is expected to clean up and remove an
object - but it may get requeued by packets coming in on the backing UDP
socket once it starts running.

This may result in the assertion in rxrpc_local_rcu() firing because the
memory has been scheduled for RCU destruction whilst still queued:

rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:468!

Note that if the processor comes around before the RCU free function, it
will just do nothing because ->dead is true.

Fix this by adding a separate refcount to count active users of the
endpoint that causes the endpoint to be destroyed when it reaches 0.

The original refcount can then be used to refcount objects through the work
processor and cause the memory to be rcu freed when that reaches 0.

Fixes: 4f95dd78a77e ("rxrpc: Rework local endpoint management")
Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
5 years agonet: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context
Fuqian Huang [Fri, 9 Aug 2019 05:35:39 +0000 (13:35 +0800)]
net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context

As spin_unlock_irq will enable interrupts.
Function tsi108_stat_carry is called from interrupt handler tsi108_irq.
Interrupts are enabled in interrupt handler.
Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq
in IRQ context to avoid this.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoteam: Add vlan tx offload to hw_enc_features
YueHaibing [Thu, 8 Aug 2019 06:22:47 +0000 (14:22 +0800)]
team: Add vlan tx offload to hw_enc_features

We should also enable team's vlan tx offload in hw_enc_features,
pass the vlan packets to the slave devices with vlan tci, let the
slave handle vlan tunneling offload implementation.

Fixes: 3268e5cb494d ("team: Advertise tunneling offload features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: prevent skb_orphan() from leaking TLS plain text with offload
Jakub Kicinski [Thu, 8 Aug 2019 00:03:59 +0000 (17:03 -0700)]
net/tls: prevent skb_orphan() from leaking TLS plain text with offload

sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.

Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.

Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.

Second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.

Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).

See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.

v2:
 - remove superfluous decrypted mark copy (Willem);
 - remove the stale doc entry (Boris);
 - rely entirely on EOR marking to prevent coalescing (Boris);
 - use an internal sendpages flag instead of marking the socket
   (Boris).
v3 (Willem):
 - reorganize the can_skb_orphan_partial() condition;
 - fix the flag leak-in through tcp_bpf_sendmsg.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'skbedit-batch-fixes'
David S. Miller [Fri, 9 Aug 2019 05:37:06 +0000 (22:37 -0700)]
Merge branch 'skbedit-batch-fixes'

Roman Mashak says:

====================
Fix batched event generation for skbedit action

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.

For example, consider the following script adding 32 entries with all
supported skbedit parameters and cookie (in order to maximize netlink
messages size):

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action skbedit
for i in `seq 1 $1`;
do
   cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \
               ptype host inheritdsfield \
               index $i cookie aabbccddeeff112233445566778800a1 "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

patch 1 adds callback in tc_action_ops of skbedit action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().

patch 2 updates the TDC test suite with relevant skbedit test cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotc-testing: updated skbedit action tests with batch create/delete
Roman Mashak [Wed, 7 Aug 2019 19:57:29 +0000 (15:57 -0400)]
tc-testing: updated skbedit action tests with batch create/delete

Update TDC tests with cases varifying ability of TC to install or delete
batches of skbedit actions.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet sched: update skbedit action for batched events operations
Roman Mashak [Wed, 7 Aug 2019 19:57:28 +0000 (15:57 -0400)]
net sched: update skbedit action for batched events operations

Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: remove set but not used variables 'tx_vid' and 'rx_vid'
YueHaibing [Wed, 7 Aug 2019 13:08:56 +0000 (21:08 +0800)]
net: dsa: sja1105: remove set but not used variables 'tx_vid' and 'rx_vid'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump:
drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning:
 variable tx_vid set but not used [-Wunused-but-set-variable]
drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning:
 variable rx_vid set but not used [-Wunused-but-set-variable]

They are not used since commit 6d7c7d948a2e ("net: dsa:
sja1105: Fix broken learning with vlan_filtering disabled")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobonding: Add vlan tx offload to hw_enc_features
YueHaibing [Wed, 7 Aug 2019 02:19:59 +0000 (10:19 +0800)]
bonding: Add vlan tx offload to hw_enc_features

As commit 30d8177e8ac7 ("bonding: Always enable vlan tx offload")
said, we should always enable bonding's vlan tx offload, pass the
vlan packets to the slave devices with vlan tci, let them to handle
vlan implementation.

Now if encapsulation protocols like VXLAN is used, skb->encapsulation
may be set, then the packet is passed to vlan device which based on
bonding device. However in netif_skb_features(), the check of
hw_enc_features:

 if (skb->encapsulation)
                 features &= dev->hw_enc_features;

clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
in same issue in commit 30d8177e8ac7 like this:

vlan_dev_hard_start_xmit
  -->dev_queue_xmit
    -->validate_xmit_skb
      -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
      -->validate_xmit_vlan
        -->__vlan_hwaccel_push_inside //skb->tci is cleared
...
 --> bond_start_xmit
   --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
     --> __skb_flow_dissect // nhoff point to IP header
        -->  case htons(ETH_P_8021Q)
             // skb_vlan_tag_present is false, so
             vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
             //vlan point to ip header wrongly

Fixes: b2a103e6d0af ("bonding: convert to ndo_fix_features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: sch_taprio: fix memleak in error path for sched list parse
Ivan Khoronzhuk [Tue, 6 Aug 2019 22:45:40 +0000 (01:45 +0300)]
net: sched: sch_taprio: fix memleak in error path for sched list parse

In error case, all entries should be freed from the sched list
before deleting it. For simplicity use rcu way.

Fixes: 5a781ccbd19e46 ("tc: Add support for configuring the taprio scheduler")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: docs: replace IPX in tuntap documentation
Stephen Hemminger [Mon, 5 Aug 2019 22:30:03 +0000 (15:30 -0700)]
net: docs: replace IPX in tuntap documentation

IPX is no longer supported, but the example in the documentation
might useful. Replace it with IPv6.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodocs: admin-guide: remove references to IPX and token-ring
Stephen Hemminger [Mon, 5 Aug 2019 22:30:02 +0000 (15:30 -0700)]
docs: admin-guide: remove references to IPX and token-ring

Both IPX and TR have not been supported for a while now.
Remove them from the /proc/sys/net documentation.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoxen/netback: Reset nr_frags before freeing skb
Ross Lagerwall [Mon, 5 Aug 2019 15:34:34 +0000 (16:34 +0100)]
xen/netback: Reset nr_frags before freeing skb

At this point nr_frags has been incremented but the frag does not yet
have a page assigned so freeing the skb results in a crash. Reset
nr_frags before freeing the skb to prevent this.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: re-introduce skb coalescing for local delivery
Guillaume Nault [Fri, 2 Aug 2019 15:15:03 +0000 (17:15 +0200)]
inet: frags: re-introduce skb coalescing for local delivery

Before commit d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).

By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e33462 ("Revert
"ipv4: use skb coalescing in defragmentation"")).

Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.

When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.

Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.

This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.

[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1

Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Remove redundant check in CQE recovery flow of tx reporter
Aya Levin [Thu, 8 Aug 2019 12:55:48 +0000 (15:55 +0300)]
net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter

Remove check of recovery bit, in the beginning of the CQE recovery
function. This test is already performed right before the reporter
is invoked, when CQE error is detected.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Fix error flow of CQE recovery on tx reporter
Aya Levin [Tue, 6 Aug 2019 12:19:19 +0000 (15:19 +0300)]
net/mlx5e: Fix error flow of CQE recovery on tx reporter

CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Fix false negative indication on tx reporter CQE recovery
Aya Levin [Wed, 7 Aug 2019 12:59:06 +0000 (15:59 +0300)]
net/mlx5e: Fix false negative indication on tx reporter CQE recovery

Remove wrong error return value when SQ is not in error state.
CQE recovery on TX reporter queries the sq state. If the sq is not in
error state, the sq is either in ready or reset state. Ready state is
good state which doesn't require recovery and reset state is a temporal
state which ends in ready state. With this patch, CQE recovery in this
scenario is successful.

Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: kTLS, Fix tisn field placement
Tariq Toukan [Tue, 30 Jul 2019 10:45:42 +0000 (13:45 +0300)]
net/mlx5e: kTLS, Fix tisn field placement

Shift the tisn field in the WQE control segment, per the
HW specification.

Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: kTLS, Fix tisn field name
Tariq Toukan [Thu, 8 Aug 2019 09:26:57 +0000 (12:26 +0300)]
net/mlx5e: kTLS, Fix tisn field name

Use the proper tisn field name from the union in struct mlx5_wqe_ctrl_seg.

Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: kTLS, Fix progress params context WQE layout
Tariq Toukan [Tue, 30 Jul 2019 08:55:25 +0000 (11:55 +0300)]
net/mlx5e: kTLS, Fix progress params context WQE layout

The TLS progress params context WQE should not include an
Eth segment, drop it.
In addition, align the tls_progress_params layout with the
HW specification document:
- fix the tisn field name.
- remove the valid bit.

Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: kTLS, Fix wrong TIS opmod constants
Tariq Toukan [Sun, 21 Jul 2019 11:18:42 +0000 (14:18 +0300)]
net/mlx5: kTLS, Fix wrong TIS opmod constants

Fix the used constants for TLS TIS opmods, per the HW specification.

Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: crypto, Fix wrong offset in encryption key command
Tariq Toukan [Sun, 21 Jul 2019 11:13:15 +0000 (14:13 +0300)]
net/mlx5: crypto, Fix wrong offset in encryption key command

Fix the 128b key offset in key encryption key creation command,
per the HW specification.

Fixes: 45d3b55dc665 ("net/mlx5: Add crypto library to support create/destroy encryption key")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off
Mohamad Heib [Tue, 23 Apr 2019 18:13:48 +0000 (21:13 +0300)]
net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off

Setting speed to 56GBASE is allowed only with auto-negotiation enabled.

This patch prevent setting speed to 56GBASE when auto-negotiation disabled.

Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Only support tx/rx pause setting for port owner
Huy Nguyen [Thu, 1 Aug 2019 16:10:19 +0000 (11:10 -0500)]
net/mlx5e: Only support tx/rx pause setting for port owner

Only support changing tx/rx pause frame setting if the net device
is the vport group manager.

Fixes: 3c2d18ef22df ("net/mlx5e: Support ethtool get/set_pauseparam")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Support inner header match criteria for non decap flow action
Huy Nguyen [Wed, 17 Jul 2019 19:04:31 +0000 (14:04 -0500)]
net/mlx5: Support inner header match criteria for non decap flow action

We have an issue that OVS application creates an offloaded drop rule
that drops VXLAN traffic with both inner and outer header match
criteria. mlx5_core driver detects correctly the inner and outer
header match criteria but does not enable the inner header match criteria
due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that
only decap rule needs inner header criteria.

Solution:
Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add
two new members: inner_match_level and outer_match_level.
inner/outer_match_level is set to NONE if the inner/outer match criteria
is not specified in the tc rule creation request. The decap assumption is
removed and the code just needs to check for inner/outer_match_level to
enable the corresponding bit in firmware's match_criteria_enable value.

Fixes: 6363651d6dd7 ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Use flow keys dissector to parse packets for ARFS
Maxim Mikityanskiy [Fri, 5 Jul 2019 14:59:28 +0000 (17:59 +0300)]
net/mlx5e: Use flow keys dissector to parse packets for ARFS

The current ARFS code relies on certain fields to be set in the SKB
(e.g. transport_header) and extracts IP addresses and ports by custom
code that parses the packet. The necessary SKB fields, however, are not
always set at that point, which leads to an out-of-bounds access. Use
skb_flow_dissect_flow_keys() to get the necessary information reliably,
fix the out-of-bounds access and reuse the code.

Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge tag 'batadv-net-for-davem-20190808' of git://git.open-mesh.org/linux-merge
David S. Miller [Thu, 8 Aug 2019 18:25:39 +0000 (11:25 -0700)]
Merge tag 'batadv-net-for-davem-20190808' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann

 - Fix deletion of RTR(4|6) mcast list entries, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Wed, 7 Aug 2019 00:11:59 +0000 (17:11 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from David Miller:
 "Yeah I should have sent a pull request last week, so there is a lot
  more here than usual:

   1) Fix memory leak in ebtables compat code, from Wenwen Wang.

   2) Several kTLS bug fixes from Jakub Kicinski (circular close on
      disconnect etc.)

   3) Force slave speed check on link state recovery in bonding 802.3ad
      mode, from Thomas Falcon.

   4) Clear RX descriptor bits before assigning buffers to them in
      stmmac, from Jose Abreu.

   5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
      loops, from Nishka Dasgupta.

   6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.

   7) Need to hold sock across skb->destructor invocation, from Cong
      Wang.

   8) IP header length needs to be validated in ipip tunnel xmit, from
      Haishuang Yan.

   9) Use after free in ip6 tunnel driver, also from Haishuang Yan.

  10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
      Heiner Kallweit.

  11) Upon bridge device init failure, we need to delete the local fdb.
      From Nikolay Aleksandrov.

  12) Handle erros from of_get_mac_address() properly in stmmac, from
      Martin Blumenstingl.

  13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
      Kadlecsik.

  14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
      some devices, so revert. From Johannes Berg.

  15) Fix deadlock in rxrpc, from David Howells.

  16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
      Haibing.

  17) Fix mvpp2 crash on module removal, from Matteo Croce.

  18) Fix race in genphy_update_link, from Heiner Kallweit.

  19) bpf_xdp_adjust_head() stopped working with generic XDP when we
      fixes generic XDP to support stacked devices properly, fix from
      Jesper Dangaard Brouer.

  20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
      David Ahern.

  21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
  net: dsa: sja1105: Fix memory leak on meta state machine error path
  net: dsa: sja1105: Fix memory leak on meta state machine normal path
  net: dsa: sja1105: Really fix panic on unregistering PTP clock
  net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
  net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
  net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
  net: sched: sample: allow accessing psample_group with rtnl
  net: sched: police: allow accessing police->params with rtnl
  net: hisilicon: Fix dma_map_single failed on arm64
  net: hisilicon: fix hip04-xmit never return TX_BUSY
  net: hisilicon: make hip04_tx_reclaim non-reentrant
  tc-testing: updated vlan action tests with batch create/delete
  net sched: update vlan action for batched events operations
  net: stmmac: tc: Do not return a fragment entry
  net: stmmac: Fix issues when number of Queues >= 4
  net: stmmac: xgmac: Fix XGMAC selftests
  be2net: disable bh with spin_lock in be_process_mcc
  net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
  net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
  net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
  ...

5 years agoMerge branch 'sja1105-fixes'
David S. Miller [Tue, 6 Aug 2019 21:37:02 +0000 (14:37 -0700)]
Merge branch 'sja1105-fixes'

Vladimir Oltean says:

====================
Fixes for SJA1105 DSA: FDBs, Learning and PTP

This is an assortment of functional fixes for the sja1105 switch driver
targeted for the "net" tree (although they apply on net-next just as
well).

Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering
disabled") repairs a breakage introduced in the early development stages
of the driver: support for traffic from the CPU has broken "normal"
frame forwarding (based on DMAC) - there is connectivity through the
switch only because all frames are flooded.
I debated whether this patch qualifies as a fix, since it puts the
switch into a mode it has never operated in before (aka SVL). But
"normal" forwarding did use to work before the "Traffic support for
SJA1105 DSA driver" patchset, and arguably this patch should have been
part of that.
Also, it would be strange for this feature to be broken in the 5.2 LTS.

Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as
well") is a simplification of a previous FDB-related patch that is
currently in the 5.3 rc's.

Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the
switch ports for extended periods of time, or in conjunction with other
error conditions. The fixed-up commits were all introduced in 5.2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: Fix memory leak on meta state machine error path
Vladimir Oltean [Sun, 4 Aug 2019 22:38:48 +0000 (01:38 +0300)]
net: dsa: sja1105: Fix memory leak on meta state machine error path

When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.

The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).

However, when dropping the initial link-local frame, its backing memory
was being leaked.

Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: Fix memory leak on meta state machine normal path
Vladimir Oltean [Sun, 4 Aug 2019 22:38:47 +0000 (01:38 +0300)]
net: dsa: sja1105: Fix memory leak on meta state machine normal path

After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.

Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.

The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed.  That is the job of kfree_skb(), which
also calls skb_unref().

But it turns out that freeing the cached stampable_skb is in fact not
necessary.  It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp.  So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.

Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: Really fix panic on unregistering PTP clock
Vladimir Oltean [Sun, 4 Aug 2019 22:38:46 +0000 (01:38 +0300)]
net: dsa: sja1105: Really fix panic on unregistering PTP clock

The IS_ERR_OR_NULL(priv->clock) check inside
sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync
from actually being run.

Additionally, sja1105_ptp_clock_unregister() does not actually get run,
when placed in sja1105_remove(). The DSA switch gets torn down, but the
sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister
needs to be moved to sja1105_teardown, to be symmetrical with
sja1105_ptp_clock_register which is called from the DSA sja1105_setup.

It is strange to fix a "fixes" patch, but the probe failure can only be
seen when the attached PHY does not respond to MDIO (issue which I can't
pinpoint the reason to) and it goes away after I power-cycle the board.
This time the patch was validated on a failing board, and the kernel
panic from the fixed commit's message can no longer be seen.

Fixes: 29dd908d355f ("net: dsa: sja1105: Cancel PTP delayed work on unregister")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
Vladimir Oltean [Sun, 4 Aug 2019 22:38:45 +0000 (01:38 +0300)]
net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well

It looks like the FDB dump taken from first-generation switches also
contains information on whether entries are static or not. So use that
instead of searching through the driver's tables.

Fixes: d763778224ea ("net: dsa: sja1105: Implement is_static for FDB entries on E/T")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: Fix broken learning with vlan_filtering disabled
Vladimir Oltean [Sun, 4 Aug 2019 22:38:44 +0000 (01:38 +0300)]
net: dsa: sja1105: Fix broken learning with vlan_filtering disabled

When put under a bridge with vlan_filtering 0, the SJA1105 ports will
flood all traffic as if learning was broken. This is because learning
interferes with the rx_vid's configured by dsa_8021q as unique pvid's.

So learning technically still *does* work, it's just that the learnt
entries never get matched due to their unique VLAN ID.

The setting that saves the day is Shared VLAN Learning, which on this
switch family works exactly as desired: VLAN tagging still works
(untagged traffic gets the correct pvid) and FDB entries are still
populated with the correct contents including VID. Also, a frame cannot
violate the forwarding domain restrictions enforced by its classified
VLAN. It is just that the VID is ignored when looking up the FDB for
taking a forwarding decision (selecting the egress port).

This patch activates SVL, and the result is that frames with a learnt
DMAC are no longer flooded in the scenario described above.

Now exactly *because* SVL works as desired, we have to revisit some
earlier patches:

- It is no longer necessary to manipulate the VID of the 'bridge fdb
  {add,del}' command when vlan_filtering is off. This is because now,
  SVL is enabled for that case, so the actual VID does not matter*.

- It is still desirable to hide dsa_8021q VID's in the FDB dump
  callback. But right now the dump callback should no longer hide
  duplicates (one per each front panel port's pvid, plus one for the
  VLAN that the CPU port is going to tag a TX frame with), because there
  shouldn't be any (the switch will match a single FDB entry no matter
  its VID anyway).

* Not really... It's no longer necessary to transform a 'bridge fdb add'
  into 5 fdb add operations, but the user might still add a fdb entry with
  any vid, and all of them would appear as duplicates in 'bridge fdb
  show'. So force a 'bridge fdb add' to insert the VID of 0**, so that we
  can prune the duplicates at insertion time.

** The VID of 0 is better than 1 because it is always guaranteed to be
   in the ports' hardware filter. DSA also avoids putting the VID inside
   the netlink response message towards the bridge driver when we return
   this particular VID, which makes it suitable for FDB entries learnt
   with vlan_filtering off.

Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
Nishka Dasgupta [Sun, 4 Aug 2019 15:30:18 +0000 (21:00 +0530)]
net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()

Each iteration of for_each_available_child_of_node() puts the previous
node, but in the case of a return from the middle of the loop, there
is no put, thus causing a memory leak. Hence add an of_node_put() before
the return.
Additionally, the local variable ports in the function
qca8k_setup_mdio_bus() takes the return value of of_get_child_by_name(),
which gets a node but does not put it. If the function returns without
putting ports, it may cause a memory leak. Hence put ports before the
mid-loop return statement, and also outside the loop after its last usage
in this function.
Issues found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'flow_offload-action-fixes'
David S. Miller [Tue, 6 Aug 2019 21:15:39 +0000 (14:15 -0700)]
Merge branch 'flow_offload-action-fixes'

Vlad Buslov says:

====================
action fixes for flow_offload infra compatibility

Fix rcu warnings due to usage of action helpers that expect rcu read lock
protection from rtnl-protected context of flow_offload infra.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: sample: allow accessing psample_group with rtnl
Vlad Buslov [Sat, 3 Aug 2019 13:36:19 +0000 (16:36 +0300)]
net: sched: sample: allow accessing psample_group with rtnl

Recently implemented support for sample action in flow_offload infra leads
to following rcu usage warning:

[ 1938.234856] =============================
[ 1938.234858] WARNING: suspicious RCU usage
[ 1938.234863] 5.3.0-rc1+ #574 Not tainted
[ 1938.234866] -----------------------------
[ 1938.234869] include/net/tc_act/tc_sample.h:47 suspicious rcu_dereference_check() usage!
[ 1938.234872]
               other info that might help us debug this:

[ 1938.234875]
               rcu_scheduler_active = 2, debug_locks = 1
[ 1938.234879] 1 lock held by tc/19540:
[ 1938.234881]  #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970
[ 1938.234900]
               stack backtrace:
[ 1938.234905] CPU: 2 PID: 19540 Comm: tc Not tainted 5.3.0-rc1+ #574
[ 1938.234908] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1938.234911] Call Trace:
[ 1938.234922]  dump_stack+0x85/0xc0
[ 1938.234930]  tc_setup_flow_action+0xed5/0x2040
[ 1938.234944]  fl_hw_replace_filter+0x11f/0x2e0 [cls_flower]
[ 1938.234965]  fl_change+0xd24/0x1b30 [cls_flower]
[ 1938.234990]  tc_new_tfilter+0x3e0/0x970
[ 1938.235021]  ? tc_del_tfilter+0x720/0x720
[ 1938.235028]  rtnetlink_rcv_msg+0x389/0x4b0
[ 1938.235038]  ? netlink_deliver_tap+0x95/0x400
[ 1938.235044]  ? rtnl_dellink+0x2d0/0x2d0
[ 1938.235053]  netlink_rcv_skb+0x49/0x110
[ 1938.235063]  netlink_unicast+0x171/0x200
[ 1938.235073]  netlink_sendmsg+0x224/0x3f0
[ 1938.235091]  sock_sendmsg+0x5e/0x60
[ 1938.235097]  ___sys_sendmsg+0x2ae/0x330
[ 1938.235111]  ? __handle_mm_fault+0x12cd/0x19e0
[ 1938.235125]  ? __handle_mm_fault+0x12cd/0x19e0
[ 1938.235138]  ? find_held_lock+0x2b/0x80
[ 1938.235147]  ? do_user_addr_fault+0x22d/0x490
[ 1938.235160]  __sys_sendmsg+0x59/0xa0
[ 1938.235178]  do_syscall_64+0x5c/0xb0
[ 1938.235187]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1938.235192] RIP: 0033:0x7ff9a4d597b8
[ 1938.235197] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83
 ec 28 89 54
[ 1938.235200] RSP: 002b:00007ffcfe381c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1938.235205] RAX: ffffffffffffffda RBX: 000000005d4497f9 RCX: 00007ff9a4d597b8
[ 1938.235208] RDX: 0000000000000000 RSI: 00007ffcfe381cb0 RDI: 0000000000000003
[ 1938.235211] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1938.235214] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001
[ 1938.235217] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001

Change tcf_sample_psample_group() helper to allow using it from both rtnl
and rcu protected contexts.

Fixes: a7a7be6087b0 ("net/sched: add sample action to the hardware intermediate representation")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: police: allow accessing police->params with rtnl
Vlad Buslov [Sat, 3 Aug 2019 13:36:18 +0000 (16:36 +0300)]
net: sched: police: allow accessing police->params with rtnl

Recently implemented support for police action in flow_offload infra leads
to following rcu usage warning:

[ 1925.881092] =============================
[ 1925.881094] WARNING: suspicious RCU usage
[ 1925.881098] 5.3.0-rc1+ #574 Not tainted
[ 1925.881100] -----------------------------
[ 1925.881104] include/net/tc_act/tc_police.h:57 suspicious rcu_dereference_check() usage!
[ 1925.881106]
               other info that might help us debug this:

[ 1925.881109]
               rcu_scheduler_active = 2, debug_locks = 1
[ 1925.881112] 1 lock held by tc/18591:
[ 1925.881115]  #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970
[ 1925.881124]
               stack backtrace:
[ 1925.881127] CPU: 2 PID: 18591 Comm: tc Not tainted 5.3.0-rc1+ #574
[ 1925.881130] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1925.881132] Call Trace:
[ 1925.881138]  dump_stack+0x85/0xc0
[ 1925.881145]  tc_setup_flow_action+0x1771/0x2040
[ 1925.881155]  fl_hw_replace_filter+0x11f/0x2e0 [cls_flower]
[ 1925.881175]  fl_change+0xd24/0x1b30 [cls_flower]
[ 1925.881200]  tc_new_tfilter+0x3e0/0x970
[ 1925.881231]  ? tc_del_tfilter+0x720/0x720
[ 1925.881243]  rtnetlink_rcv_msg+0x389/0x4b0
[ 1925.881250]  ? netlink_deliver_tap+0x95/0x400
[ 1925.881257]  ? rtnl_dellink+0x2d0/0x2d0
[ 1925.881264]  netlink_rcv_skb+0x49/0x110
[ 1925.881275]  netlink_unicast+0x171/0x200
[ 1925.881284]  netlink_sendmsg+0x224/0x3f0
[ 1925.881299]  sock_sendmsg+0x5e/0x60
[ 1925.881305]  ___sys_sendmsg+0x2ae/0x330
[ 1925.881309]  ? task_work_add+0x43/0x50
[ 1925.881314]  ? fput_many+0x45/0x80
[ 1925.881329]  ? __lock_acquire+0x248/0x1930
[ 1925.881342]  ? find_held_lock+0x2b/0x80
[ 1925.881347]  ? task_work_run+0x7b/0xd0
[ 1925.881359]  __sys_sendmsg+0x59/0xa0
[ 1925.881375]  do_syscall_64+0x5c/0xb0
[ 1925.881381]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1925.881384] RIP: 0033:0x7feb245047b8
[ 1925.881388] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83
 ec 28 89 54
[ 1925.881391] RSP: 002b:00007ffc2d2a5788 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1925.881395] RAX: ffffffffffffffda RBX: 000000005d4497ed RCX: 00007feb245047b8
[ 1925.881398] RDX: 0000000000000000 RSI: 00007ffc2d2a57f0 RDI: 0000000000000003
[ 1925.881400] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1925.881403] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001
[ 1925.881406] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001

Change tcf_police_rate_bytes_ps() and tcf_police_tcfp_burst() helpers to
allow using them from both rtnl and rcu protected contexts.

Fixes: 8c8cfc6ed274 ("net/sched: add police action to the hardware intermediate representation")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hisilicon-fixes'
David S. Miller [Tue, 6 Aug 2019 21:14:01 +0000 (14:14 -0700)]
Merge branch 'hisilicon-fixes'

Jiangfeng Xiao says:

====================
net: hisilicon: Fix a few problems with hip04_eth

During the use of the hip04_eth driver,
several problems were found,
which solved the hip04_tx_reclaim reentry problem,
fixed the problem that hip04_mac_start_xmit never
returns NETDEV_TX_BUSY
and the dma_map_single failed on the arm64 platform.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hisilicon: Fix dma_map_single failed on arm64
Jiangfeng Xiao [Sat, 3 Aug 2019 12:31:41 +0000 (20:31 +0800)]
net: hisilicon: Fix dma_map_single failed on arm64

On the arm64 platform, executing "ifconfig eth0 up" will fail,
returning "ifconfig: SIOCSIFFLAGS: Input/output error."

ndev->dev is not initialized, dma_map_single->get_dma_ops->
dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE
directly, so when we use dma_map_single, the first parameter
is to use the device of platform_device.

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hisilicon: fix hip04-xmit never return TX_BUSY
Jiangfeng Xiao [Sat, 3 Aug 2019 12:31:40 +0000 (20:31 +0800)]
net: hisilicon: fix hip04-xmit never return TX_BUSY

TX_DESC_NUM is 256, in tx_count, the maximum value of
mod(TX_DESC_NUM - 1) is 254, the variable "count" in
the hip04_mac_start_xmit function is never equal to
(TX_DESC_NUM - 1), so hip04_mac_start_xmit never
return NETDEV_TX_BUSY.

tx_count is modified to mod(TX_DESC_NUM) so that
the maximum value of tx_count can reach
(TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn
NETDEV_TX_BUSY.

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hisilicon: make hip04_tx_reclaim non-reentrant
Jiangfeng Xiao [Sat, 3 Aug 2019 12:31:39 +0000 (20:31 +0800)]
net: hisilicon: make hip04_tx_reclaim non-reentrant

If hip04_tx_reclaim is interrupted while it is running
and then __napi_schedule continues to execute
hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs
and oops is generated. So you need to mask the interrupt
during the hip04_tx_reclaim run.

The kernel oops exception stack is as follows:

Unable to handle kernel NULL pointer dereference
at virtual address 00000050
pgd = c0003000
[00000050] *pgd=80000000a04003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
Modules linked in: hip04_eth mtdblock mtd_blkdevs mtd
ohci_platform ehci_platform ohci_hcd ehci_hcd
vfat fat sd_mod usb_storage scsi_mod usbcore usb_common
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.4.185 #1
Hardware name: Hisilicon A15
task: c0a250e0 task.stack: c0a00000
PC is at hip04_tx_reclaim+0xe0/0x17c [hip04_eth]
LR is at hip04_tx_reclaim+0x30/0x17c [hip04_eth]
pc : [<bf30c3a4>]    lr : [<bf30c2f4>]    psr: 600e0313
sp : c0a01d88  ip : 00000000  fp : c0601f9c
r10: 00000000  r9 : c3482380  r8 : 00000001
r7 : 00000000  r6 : 000000e1  r5 : c3482000  r4 : 0000000c
r3 : f2209800  r2 : 00000000  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 32c5387d  Table: 03d28c80  DAC: 55555555
Process swapper/0 (pid: 0, stack limit = 0xc0a00190)
Stack: (0xc0a01d88 to 0xc0a02000)
[<bf30c3a4>] (hip04_tx_reclaim [hip04_eth]) from [<bf30d2e0>]
                                                (hip04_rx_poll+0x88/0x368 [hip04_eth])
[<bf30d2e0>] (hip04_rx_poll [hip04_eth]) from [<c04c2d9c>] (net_rx_action+0x114/0x34c)
[<c04c2d9c>] (net_rx_action) from [<c021eed8>] (__do_softirq+0x218/0x318)
[<c021eed8>] (__do_softirq) from [<c021f284>] (irq_exit+0x88/0xac)
[<c021f284>] (irq_exit) from [<c0240090>] (msa_irq_exit+0x11c/0x1d4)
[<c0240090>] (msa_irq_exit) from [<c02677e0>] (__handle_domain_irq+0x110/0x148)
[<c02677e0>] (__handle_domain_irq) from [<c0201588>] (gic_handle_irq+0xd4/0x118)
[<c0201588>] (gic_handle_irq) from [<c0551700>] (__irq_svc+0x40/0x58)
Exception stack(0xc0a01f30 to 0xc0a01f78)
1f20:                                     c0ae8b40 00000000 00000000 00000000
1f40: 00000002 ffffe000 c0601f9c 00000000 ffffffff c0a2257c c0a22440 c0831a38
1f60: c0a01ec4 c0a01f80 c0203714 c0203718 600e0213 ffffffff
[<c0551700>] (__irq_svc) from [<c0203718>] (arch_cpu_idle+0x20/0x3c)
[<c0203718>] (arch_cpu_idle) from [<c025bfd8>] (cpu_startup_entry+0x244/0x29c)
[<c025bfd8>] (cpu_startup_entry) from [<c054b0d8>] (rest_init+0xc8/0x10c)
[<c054b0d8>] (rest_init) from [<c0800c58>] (start_kernel+0x468/0x514)
Code: a40599e5 016086e2 018088e2 7660efe6 (503090e5)
---[ end trace 1db21d6d09c49d74 ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU3: stopping
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D    O    4.4.185 #1

Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Fix-batched-event-generation-for-vlan-action'
David S. Miller [Tue, 6 Aug 2019 21:05:40 +0000 (14:05 -0700)]
Merge branch 'Fix-batched-event-generation-for-vlan-action'

Roman Mashak says:

====================
Fix batched event generation for vlan action

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.

For example, consider the following script adding 32 entries with all
supported vlan parameters (in order to maximize netlink messages size):

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action vlan
for i in `seq 1 $1`;
do
   cmd="action vlan push protocol 802.1q id 4094 priority 7 pipe \
               index $i cookie aabbccddeeff112233445566778800a1 "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

patch 1 adds callback in tc_action_ops of vlan action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().

patch 2 updates the TDC test suite with relevant vlan test cases.
====================

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotc-testing: updated vlan action tests with batch create/delete
Roman Mashak [Fri, 2 Aug 2019 19:16:47 +0000 (15:16 -0400)]
tc-testing: updated vlan action tests with batch create/delete

Update TDC tests with cases varifying ability of TC to install or delete
batches of vlan actions.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet sched: update vlan action for batched events operations
Roman Mashak [Fri, 2 Aug 2019 19:16:46 +0000 (15:16 -0400)]
net sched: update vlan action for batched events operations

Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Fixes: c7e2b9689 ("sched: introduce vlan action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mips_fixes_5.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Tue, 6 Aug 2019 21:01:08 +0000 (14:01 -0700)]
Merge tag 'mips_fixes_5.3_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
 "A few MIPS fixes for 5.3:

   - Various switch fall through annotations to fixup warnings & errors
     resulting from -Wimplicit-fallthrough.

   - A fix for systems (at least jazz) using an i8253 PIT as clocksource
     when it's not suitably configured.

   - Set struct cacheinfo's cpu_map_populated field to true, indicating
     that we filled in cache info detected from cop0 registers &
     avoiding complaints about that info being (intentionally) missing
     in devicetree"

* tag 'mips_fixes_5.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: BCM63XX: Mark expected switch fall-through
  MIPS: OProfile: Mark expected switch fall-throughs
  MIPS: Annotate fall-through in Cavium Octeon code
  MIPS: Annotate fall-through in kvm/emulate.c
  mips: fix cacheinfo
  MIPS: kernel: only use i8253 clocksource with periodic clockevent

5 years agoMerge branch 'stmmac-fixes'
David S. Miller [Tue, 6 Aug 2019 19:26:11 +0000 (12:26 -0700)]
Merge branch 'stmmac-fixes'

Jose Abreu says:

====================
net: stmmac: Fixes for -net

Couple of fixes for -net. More info in commit log.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: tc: Do not return a fragment entry
Jose Abreu [Tue, 6 Aug 2019 13:16:18 +0000 (15:16 +0200)]
net: stmmac: tc: Do not return a fragment entry

Do not try to return a fragment entry from TC list. Otherwise we may not
clean properly allocated entries.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: Fix issues when number of Queues >= 4
Jose Abreu [Tue, 6 Aug 2019 13:16:17 +0000 (15:16 +0200)]
net: stmmac: Fix issues when number of Queues >= 4

When queues >= 4 we use different registers but we were not subtracting
the offset of 4. Fix this.

Found out by Coverity.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: xgmac: Fix XGMAC selftests
Jose Abreu [Tue, 6 Aug 2019 13:16:16 +0000 (15:16 +0200)]
net: stmmac: xgmac: Fix XGMAC selftests

Fixup the XGMAC selftests by correctly finishing the implementation of
set_filter callback.

Result:
$ ethtool -t enp4s0
The test result is PASS
The test extra info:
 1. MAC Loopback           0
 2. PHY Loopback           -95
 3. MMC Counters           -95
 4. EEE                    -95
 5. Hash Filter MC         0
 6. Perfect Filter UC      0
 7. MC Filter              0
 8. UC Filter              0
 9. Flow Control           0

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'wireless-drivers-for-davem-2019-08-06' of git://git.kernel.org/pub/scm...
David S. Miller [Tue, 6 Aug 2019 18:48:48 +0000 (11:48 -0700)]
Merge tag 'wireless-drivers-for-davem-2019-08-06' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 5.3

Second set of fixes for 5.3. Lots of iwlwifi fixes have accumulated
which consists most of patches in this pull request. Only most notable
iwlwifi fixes are listed below.

mwifiex

* fix a regression related to WPA1 networks since v5.3-rc1

iwlwifi

* fix use-after-free issues

* fix DMA mapping API usage errors

* fix frame drop occurring due to reorder buffer handling in
  RSS in certain conditions

* fix rate scale locking issues

* disable TX A-MSDU on older NICs as it causes problems and was
  never supposed to be supported

* new PCI IDs

* GEO_TX_POWER_LIMIT API issue that many people were hitting
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Linus Torvalds [Tue, 6 Aug 2019 18:47:23 +0000 (11:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - functional regression fix for some of the Logitech unifying devices,
   from Hans de Goede

 - race condition fix in hid-sony for bug severely affecting
   Valve/Android deployments, from Roderick Colenbrander

 - several fixes for issues found by syzbot/kasan, from Oliver Neukum
   and Hillf Danton

 - functional regression fix for Wacom Cintiq device, from Aaron
   Armstrong Skomra

 - a few other assorted device-specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: sony: Fix race condition between rumble and device remove.
  HID: hiddev: do cleanup in failure of opening a device
  HID: hiddev: avoid opening a disconnected device
  HID: input: fix a4tech horizontal wheel custom usage
  HID: Add quirk for HP X1200 PIXART OEM mouse
  HID: holtek: test for sanity of intfdata
  HID: wacom: fix bit shift for Cintiq Companion 2
  HID: quirks: Set the INCREMENT_USAGE_ON_DUPLICATE quirk on Saitek X52
  HID: logitech-dj: Really fix return value of logi_dj_recv_query_hidpp_devices
  HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
  HID: logitech-dj: add the Powerplay receiver
  HID: logitech-hidpp: add USB PID for a few more supported mice
  HID: logitech-dj: rename "gaming" receiver to "lightspeed"

5 years agobe2net: disable bh with spin_lock in be_process_mcc
Denis Kirjanov [Tue, 6 Aug 2019 10:51:11 +0000 (12:51 +0200)]
be2net: disable bh with spin_lock in be_process_mcc

be_process_mcc() is invoked in 3 different places and
always with BHs disabled except the be_poll function
but since it's invoked from softirq with BHs
disabled it won't hurt.

v1->v2: added explanation to the patch
v2->v3: add a missing call from be_cmds.c

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
Christophe JAILLET [Tue, 6 Aug 2019 08:55:12 +0000 (10:55 +0200)]
net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'

A call to 'kfree_skb()' is missing in the error handling path of
'init_one()'.
This is already present in 'remove_one()' but is missing here.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
Chen-Yu Tsai [Tue, 6 Aug 2019 07:35:39 +0000 (15:35 +0800)]
net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs

The sun4i-emac uses the "phy" property to find the PHY it's supposed to
use. This property was deprecated in favor of "phy-handle" in commit
8c5b09447625 ("dt-bindings: net: sun4i-emac: Convert the binding to a
schemas").

Add support for this new property name, and fall back to the old one in
case the device tree hasn't been updated.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 6 Aug 2019 18:22:22 +0000 (11:22 -0700)]
Merge branch 'x86/grand-schemozzle' of git://git./linux/kernel/git/tip/tip

Pull pti updates from Thomas Gleixner:
 "The performance deterioration departement is not proud at all to
  present yet another set of speculation fences to mitigate the next
  chapter in the 'what could possibly go wrong' story.

  The new vulnerability belongs to the Spectre class and affects GS
  based data accesses and has therefore been dubbed 'Grand Schemozzle'
  for secret communication purposes. It's officially listed as
  CVE-2019-1125.

  Conditional branches in the entry paths which contain a SWAPGS
  instruction (interrupts and exceptions) can be mis-speculated which
  results in speculative accesses with a wrong GS base.

  This can happen on entry from user mode through a mis-speculated
  branch which takes the entry from kernel mode path and therefore does
  not execute the SWAPGS instruction. The following speculative accesses
  are done with user GS base.

  On entry from kernel mode the mis-speculated branch executes the
  SWAPGS instruction in the entry from user mode path which has the same
  effect that the following GS based accesses are done with user GS
  base.

  If there is a disclosure gadget available in these code paths the
  mis-speculated data access can be leaked through the usual side
  channels.

  The entry from user mode issue affects all CPUs which have speculative
  execution. The entry from kernel mode issue affects only Intel CPUs
  which can speculate through SWAPGS. On CPUs from other vendors SWAPGS
  has semantics which prevent that.

  SMAP migitates both problems but only when the CPU is not affected by
  the Meltdown vulnerability.

  The mitigation is to issue LFENCE instructions in the entry from
  kernel mode path for all affected CPUs and on the affected Intel CPUs
  also in the entry from user mode path unless PTI is enabled because
  the CR3 write is serializing.

  The fences are as usual enabled conditionally and can be completely
  disabled on the kernel command line. The Spectre V1 documentation is
  updated accordingly.

  A big "Thank You!" goes to Josh for doing the heavy lifting for this
  round of hardware misfeature 'repair'. Of course also "Thank You!" to
  everybody else who contributed in one way or the other"

* 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Add swapgs description to the Spectre v1 documentation
  x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
  x86/entry/64: Use JMP instead of JMPQ
  x86/speculation: Enable Spectre v1 swapgs mitigations
  x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations

5 years agoHID: sony: Fix race condition between rumble and device remove.
Roderick Colenbrander [Fri, 2 Aug 2019 22:50:19 +0000 (15:50 -0700)]
HID: sony: Fix race condition between rumble and device remove.

Valve reported a kernel crash on Ubuntu 18.04 when disconnecting a DS4
gamepad while rumble is enabled. This issue is reproducible with a
frequency of 1 in 3 times in the game Borderlands 2 when using an
automatic weapon, which triggers many rumble operations.

We found the issue to be a race condition between sony_remove and the
final device destruction by the HID / input system. The problem was
that sony_remove didn't clean some of its work_item state in
"struct sony_sc". After sony_remove work, the corresponding evdev
node was around for sufficient time for applications to still queue
rumble work after "sony_remove".

On pre-4.19 kernels the race condition caused a kernel crash due to a
NULL-pointer dereference as "sc->output_report_dmabuf" got freed during
sony_remove. On newer kernels this crash doesn't happen due the buffer
now being allocated using devm_kzalloc. However we can still queue work,
while the driver is an undefined state.

This patch fixes the described problem, by guarding the work_item
"state_worker" with an initialized variable, which we are setting back
to 0 on cleanup.

Signed-off-by: Roderick Colenbrander <roderick.colenbrander@sony.com>
CC: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
5 years agoHID: hiddev: do cleanup in failure of opening a device
Hillf Danton [Tue, 6 Aug 2019 08:40:15 +0000 (16:40 +0800)]
HID: hiddev: do cleanup in failure of opening a device

Undo what we did for opening before releasing the memory slice.

Reported-by: syzbot <syzbot+62a1e04fd3ec2abf099e@syzkaller.appspotmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
5 years agoHID: hiddev: avoid opening a disconnected device
Hillf Danton [Tue, 6 Aug 2019 08:38:58 +0000 (16:38 +0800)]
HID: hiddev: avoid opening a disconnected device

syzbot found the following crash on:

HEAD commit:    e96407b4 usb-fuzzer: main usb gadget fuzzer driver
git tree:       https://github.com/google/kasan.git usb-fuzzer
console output: https://syzkaller.appspot.com/x/log.txt?x=147ac20c600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=792eb47789f57810
link: https://syzkaller.appspot.com/bug?extid=62a1e04fd3ec2abf099e
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x302a/0x3b50
kernel/locking/lockdep.c:3753
Read of size 8 at addr ffff8881cf591a08 by task syz-executor.1/26260

CPU: 1 PID: 26260 Comm: syz-executor.1 Not tainted 5.3.0-rc2+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0xca/0x13e lib/dump_stack.c:113
  print_address_description+0x6a/0x32c mm/kasan/report.c:351
  __kasan_report.cold+0x1a/0x33 mm/kasan/report.c:482
  kasan_report+0xe/0x12 mm/kasan/common.c:612
  __lock_acquire+0x302a/0x3b50 kernel/locking/lockdep.c:3753
  lock_acquire+0x127/0x320 kernel/locking/lockdep.c:4412
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x32/0x50 kernel/locking/spinlock.c:159
  hiddev_release+0x82/0x520 drivers/hid/usbhid/hiddev.c:221
  __fput+0x2d7/0x840 fs/file_table.c:280
  task_work_run+0x13f/0x1c0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x8ef/0x2c50 kernel/exit.c:878
  do_group_exit+0x125/0x340 kernel/exit.c:982
  get_signal+0x466/0x23d0 kernel/signal.c:2728
  do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
  exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f75b2a6ccf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000075c078 RCX: 0000000000459829
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000000075c078
RBP: 000000000075c070 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000075c07c
R13: 00007ffcdfe1023f R14: 00007f75b2a6d9c0 R15: 000000000075c07c

Allocated by task 104:
  save_stack+0x1b/0x80 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_kmalloc mm/kasan/common.c:487 [inline]
  __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:460
  kmalloc include/linux/slab.h:552 [inline]
  kzalloc include/linux/slab.h:748 [inline]
  hiddev_connect+0x242/0x5b0 drivers/hid/usbhid/hiddev.c:900
  hid_connect+0x239/0xbb0 drivers/hid/hid-core.c:1882
  hid_hw_start drivers/hid/hid-core.c:1981 [inline]
  hid_hw_start+0xa2/0x130 drivers/hid/hid-core.c:1972
  appleir_probe+0x13e/0x1a0 drivers/hid/hid-appleir.c:308
  hid_device_probe+0x2be/0x3f0 drivers/hid/hid-core.c:2209
  really_probe+0x281/0x650 drivers/base/dd.c:548
  driver_probe_device+0x101/0x1b0 drivers/base/dd.c:709
  __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:816
  bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
  __device_attach+0x217/0x360 drivers/base/dd.c:882
  bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
  device_add+0xae6/0x16f0 drivers/base/core.c:2114
  hid_add_device+0x33c/0x990 drivers/hid/hid-core.c:2365
  usbhid_probe+0xa81/0xfa0 drivers/hid/usbhid/hid-core.c:1386
  usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
  really_probe+0x281/0x650 drivers/base/dd.c:548
  driver_probe_device+0x101/0x1b0 drivers/base/dd.c:709
  __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:816
  bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
  __device_attach+0x217/0x360 drivers/base/dd.c:882
  bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
  device_add+0xae6/0x16f0 drivers/base/core.c:2114
  usb_set_configuration+0xdf6/0x1670 drivers/usb/core/message.c:2023
  generic_probe+0x9d/0xd5 drivers/usb/core/generic.c:210
  usb_probe_device+0x99/0x100 drivers/usb/core/driver.c:266
  really_probe+0x281/0x650 drivers/base/dd.c:548
  driver_probe_device+0x101/0x1b0 drivers/base/dd.c:709
  __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:816
  bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
  __device_attach+0x217/0x360 drivers/base/dd.c:882
  bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
  device_add+0xae6/0x16f0 drivers/base/core.c:2114
  usb_new_device.cold+0x6a4/0xe79 drivers/usb/core/hub.c:2536
  hub_port_connect drivers/usb/core/hub.c:5098 [inline]
  hub_port_connect_change drivers/usb/core/hub.c:5213 [inline]
  port_event drivers/usb/core/hub.c:5359 [inline]
  hub_event+0x1b5c/0x3640 drivers/usb/core/hub.c:5441
  process_one_work+0x92b/0x1530 kernel/workqueue.c:2269
  worker_thread+0x96/0xe20 kernel/workqueue.c:2415
  kthread+0x318/0x420 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Freed by task 104:
  save_stack+0x1b/0x80 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x130/0x180 mm/kasan/common.c:449
  slab_free_hook mm/slub.c:1423 [inline]
  slab_free_freelist_hook mm/slub.c:1470 [inline]
  slab_free mm/slub.c:3012 [inline]
  kfree+0xe4/0x2f0 mm/slub.c:3953
  hiddev_connect.cold+0x45/0x5c drivers/hid/usbhid/hiddev.c:914
  hid_connect+0x239/0xbb0 drivers/hid/hid-core.c:1882
  hid_hw_start drivers/hid/hid-core.c:1981 [inline]
  hid_hw_start+0xa2/0x130 drivers/hid/hid-core.c:1972
  appleir_probe+0x13e/0x1a0 drivers/hid/hid-appleir.c:308
  hid_device_probe+0x2be/0x3f0 drivers/hid/hid-core.c:2209
  really_probe+0x281/0x650 drivers/base/dd.c:548
  driver_probe_device+0x101/0x1b0 drivers/base/dd.c:709
  __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:816
  bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
  __device_attach+0x217/0x360 drivers/base/dd.c:882
  bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
  device_add+0xae6/0x16f0 drivers/base/core.c:2114
  hid_add_device+0x33c/0x990 drivers/hid/hid-core.c:2365
  usbhid_probe+0xa81/0xfa0 drivers/hid/usbhid/hid-core.c:1386
  usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
  really_probe+0x281/0x650 drivers/base/dd.c:548
  driver_probe_device+0x101/0x1b0 drivers/base/dd.c:709
  __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:816
  bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
  __device_attach+0x217/0x360 drivers/base/dd.c:882
  bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
  device_add+0xae6/0x16f0 drivers/base/core.c:2114
  usb_set_configuration+0xdf6/0x1670 drivers/usb/core/message.c:2023
  generic_probe+0x9d/0xd5 drivers/usb/core/generic.c:210
  usb_probe_device+0x99/0x100 drivers/usb/core/driver.c:266
  really_probe+0x281/0x650 drivers/base/dd.c:548
  driver_probe_device+0x101/0x1b0 drivers/base/dd.c:709
  __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:816
  bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
  __device_attach+0x217/0x360 drivers/base/dd.c:882
  bus_probe_device+0x1e4/0x290 drivers/base/bus.c:514
  device_add+0xae6/0x16f0 drivers/base/core.c:2114
  usb_new_device.cold+0x6a4/0xe79 drivers/usb/core/hub.c:2536
  hub_port_connect drivers/usb/core/hub.c:5098 [inline]
  hub_port_connect_change drivers/usb/core/hub.c:5213 [inline]
  port_event drivers/usb/core/hub.c:5359 [inline]
  hub_event+0x1b5c/0x3640 drivers/usb/core/hub.c:5441
  process_one_work+0x92b/0x1530 kernel/workqueue.c:2269
  worker_thread+0x96/0xe20 kernel/workqueue.c:2415
  kthread+0x318/0x420 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff8881cf591900
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 264 bytes inside of
  512-byte region [ffff8881cf591900ffff8881cf591b00)
The buggy address belongs to the page:
page:ffffea00073d6400 refcount:1 mapcount:0 mapping:ffff8881da002500
index:0x0 compound_mapcount: 0
flags: 0x200000000010200(slab|head)
raw: 0200000000010200 0000000000000000 0000000100000001 ffff8881da002500
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8881cf591900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8881cf591980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8881cf591a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
  ffff8881cf591a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8881cf591b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

In order to avoid opening a disconnected device, we need to check exist
again after acquiring the existance lock, and bail out if necessary.

Reported-by: syzbot <syzbot+62a1e04fd3ec2abf099e@syzkaller.appspotmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
5 years agoMIPS: BCM63XX: Mark expected switch fall-through
Gustavo A. R. Silva [Mon, 5 Aug 2019 18:55:33 +0000 (13:55 -0500)]
MIPS: BCM63XX: Mark expected switch fall-through

Mark switch cases where we are expecting to fall through.

This patch fixes the following warning (Building: bcm63xx_defconfig mips):

arch/mips/pci/ops-bcm63xx.c: In function ‘bcm63xx_pcie_can_access’:
arch/mips/pci/ops-bcm63xx.c:474:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
   if (PCI_SLOT(devfn) == 0)
      ^
arch/mips/pci/ops-bcm63xx.c:477:2: note: here
  default:
  ^~~~~~~

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
5 years agoMerge tag 'linux-can-fixes-for-5.3-20190802' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Mon, 5 Aug 2019 21:44:53 +0000 (14:44 -0700)]
Merge tag 'linux-can-fixes-for-5.3-20190802' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2019-08-02

this is a pull request of 4 patches for net/master.

The first two patches are by Wang Xiayang, they force that the string buffer
during a dev_info() is properly NULL terminated.

The last two patches are by Tomas Bortoli and fix both a potential info leak of
kernel memory to USB devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
Nikolay Aleksandrov [Fri, 2 Aug 2019 10:57:36 +0000 (13:57 +0300)]
net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER

Most of the bridge device's vlan init bugs come from the fact that its
default pvid is created at the wrong time, way too early in ndo_init()
before the device is even assigned an ifindex. It introduces a bug when the
bridge's dev_addr is added as fdb during the initial default pvid creation
the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
which really makes no sense for user-space[0] and is wrong.
Usually user-space software would ignore such entries, but they are
actually valid and will eventually have all necessary attributes.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail or to receive
it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
from br_vlan_flush() since that case can no longer happen. At
NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
depending why it was called (if called due to NETDEV_REGISTER error
it'll still be == 1, otherwise it could be any value changed during the
device life time).

For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent

After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent

v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
v3: send the correct v2 patch with all changes (stub should return 0)
v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
    the br_vlan_bridge_event stub when bridge vlans are disabled

[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389

Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodrivers/net/ethernet/marvell/mvmdio.c: Fix non OF case
Arnaud Patard (Rtp) [Fri, 2 Aug 2019 08:32:40 +0000 (10:32 +0200)]
drivers/net/ethernet/marvell/mvmdio.c: Fix non OF case

Orion5.x systems are still using machine files and not device-tree.
Commit 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be
specified for orion-mdio") has replaced devm_clk_get() with of_clk_get(),
leading to a oops at boot and not working network, as reported in
https://lists.debian.org/debian-arm/2019/07/msg00088.html and possibly in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908712.

Link: https://lists.debian.org/debian-arm/2019/07/msg00088.html
Fixes: 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be specified for orion-mdio")
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: avoid fallback in case of non-blocking connect
Ursula Braun [Fri, 2 Aug 2019 08:47:50 +0000 (10:47 +0200)]
net/smc: avoid fallback in case of non-blocking connect

FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
triggers a fallback to TCP if the socket is in state SMC_INIT.
But if a nonblocking connect is already started, fallback to TCP
is no longer possible, even though the socket may still be in state
SMC_INIT.
And if a nonblocking connect is already started, a listen() call
does not make sense.

Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
Fixes: 50717a37db032 ("net/smc: nonblocking connect rework")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/smc: do not schedule tx_work in SMC_CLOSED state
Ursula Braun [Fri, 2 Aug 2019 08:16:38 +0000 (10:16 +0200)]
net/smc: do not schedule tx_work in SMC_CLOSED state

The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
tx worker. Make sure the socket is not yet moved into SMC_CLOSED
state (for instance by a shutdown SHUT_RDWR call).

Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogve: Fix case where desc_cnt and data_cnt can get out of sync
Catherine Sullivan [Thu, 1 Aug 2019 23:07:31 +0000 (16:07 -0700)]
gve: Fix case where desc_cnt and data_cnt can get out of sync

desc_cnt and data_cnt should always be equal. In the case of a dropped
packet desc_cnt was still getting updated (correctly), data_cnt
was not. To eliminate this bug and prevent it from recurring this
patch combines them into one ring level cnt.

Signed-off-by: Catherine Sullivan <csully@google.com>
Reviewed-by: Sagi Shahar <sagis@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Fix unbalanced rcu locking in rt6_update_exception_stamp_rt
David Ahern [Thu, 1 Aug 2019 21:36:35 +0000 (14:36 -0700)]
ipv6: Fix unbalanced rcu locking in rt6_update_exception_stamp_rt

The nexthop path in rt6_update_exception_stamp_rt needs to call
rcu_read_unlock if it fails to find a fib6_nh match rather than
just returning.

Fixes: e659ba31d806 ("ipv6: Handle all fib6_nh in a nexthop in exception handling")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests/tls: add a litmus test for the socket reuse through shutdown
Jakub Kicinski [Thu, 1 Aug 2019 21:36:02 +0000 (14:36 -0700)]
selftests/tls: add a litmus test for the socket reuse through shutdown

Make sure that shutdown never works, and at the same time document how
I tested to came to the conclusion that currently reuse is not possible.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: partially revert fix transition through disconnect with close
Jakub Kicinski [Thu, 1 Aug 2019 21:36:01 +0000 (14:36 -0700)]
net/tls: partially revert fix transition through disconnect with close

Looks like we were slightly overzealous with the shutdown()
cleanup. Even though the sock->sk_state can reach CLOSED again,
socket->state will not got back to SS_UNCONNECTED once
connections is ESTABLISHED. Meaning we will see EISCONN if
we try to reconnect, and EINVAL if we try to listen.

Only listen sockets can be shutdown() and reused, but since
ESTABLISHED sockets can never be re-connected() or used for
listen() we don't need to try to clean up the ULP state early.

Fixes: 32857cf57f92 ("net/tls: fix transition through disconnect with close")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'spi-fix-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Mon, 5 Aug 2019 18:49:02 +0000 (11:49 -0700)]
Merge tag 'spi-fix-v5.3-rc3' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A bunch of small, device specific things here plus a DT bindings fix
  for the new validatable YAML binding format.

  The most notable thing is the fix for GPIO chip selects which fixes a
  corner case in updates of that code to modern APIs, unfortunately due
  to a historical mess the code around GPIO support is obscure, fragile
  and an ABI which makes and attempt to improve the situation painful"

* tag 'spi-fix-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: pxa2xx: Add support for Intel Tiger Lake
  spi: bcm2835: Fix 3-wire mode if DMA is enabled
  spi: pxa2xx: Balance runtime PM enable/disable on error
  spi: gpio: Add SPI_MASTER_GPIO_SS flag
  spi: spi-fsl-qspi: change i.MX7D RX FIFO size
  spi: dt-bindings: spi-controller: remove unnecessary 'maxItems: 1' from reg

5 years agoMerge tag 'regulator-fix-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 5 Aug 2019 18:47:02 +0000 (11:47 -0700)]
Merge tag 'regulator-fix-v5.3-rc3' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A few small driver specific fixes here plus one core fix for a
  refcounting problem with DT which will have little practical impact
  unless overlays are used"

* tag 'regulator-fix-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: of: Add of_node_put() before return in function
  regulator: lp87565: Fix probe failure for "ti,lp87565"
  regulator: axp20x: fix DCDC5 and DCDC6 for AXP803
  regulator: axp20x: fix DCDCA and DCDCD for AXP806

5 years agoMerge tag 'linux-kselftest-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 5 Aug 2019 18:43:16 +0000 (11:43 -0700)]
Merge tag 'linux-kselftest-5.3-rc4' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "A fix to the Kselftest framework to save and restore errno and a fix
  to livepatch to push and pop dynamic debug config"

* tag 'linux-kselftest-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/livepatch: push and pop dynamic debug config
  kselftest: save-and-restore errno to allow for %m formatting

5 years agoMerge branch 'net-fix-regressions-for-generic-XDP'
David S. Miller [Mon, 5 Aug 2019 18:17:40 +0000 (11:17 -0700)]
Merge branch 'net-fix-regressions-for-generic-XDP'

Jesper Dangaard Brouer says:

====================
net: fix regressions for generic-XDP

Thanks to Brandon Cazander, who wrote a very detailed bug report that
even used perf probe's on xdp-newbies mailing list, we discovered that
generic-XDP contains some regressions when using bpf_xdp_adjust_head().

First issue were that my selftests script, that use bpf_xdp_adjust_head(),
by mistake didn't use generic-XDP any-longer. That selftest should have
caught the real regression introduced in commit 458bf2f224f0 ("net: core:
support XDP generic on stacked devices.").

To verify this patchset fix the regressions, you can invoked manually via:

  cd tools/testing/selftests/bpf/
  sudo ./test_xdp_vlan_mode_generic.sh
  sudo ./test_xdp_vlan_mode_native.sh
====================

Link: https://www.spinics.net/lists/xdp-newbies/msg01231.html
Fixes: 458bf2f224f0 ("net: core: support XDP generic on stacked devices.")
Reported by: Brandon Cazander <brandon.cazander@multapplied.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fix bpf_xdp_adjust_head regression for generic-XDP
Jesper Dangaard Brouer [Thu, 1 Aug 2019 18:00:31 +0000 (20:00 +0200)]
net: fix bpf_xdp_adjust_head regression for generic-XDP

When generic-XDP was moved to a later processing step by commit
458bf2f224f0 ("net: core: support XDP generic on stacked devices.")
a regression was introduced when using bpf_xdp_adjust_head.

The issue is that after this commit the skb->network_header is now
changed prior to calling generic XDP and not after. Thus, if the header
is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header
also need to be updated again.  Fix by calling skb_reset_network_header().

Fixes: 458bf2f224f0 ("net: core: support XDP generic on stacked devices.")
Reported-by: Brandon Cazander <brandon.cazander@multapplied.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests/bpf: reduce time to execute test_xdp_vlan.sh
Jesper Dangaard Brouer [Thu, 1 Aug 2019 18:00:26 +0000 (20:00 +0200)]
selftests/bpf: reduce time to execute test_xdp_vlan.sh

Given the increasing number of BPF selftests, it makes sense to
reduce the time to execute these tests.  The ping parameters are
adjusted to reduce the time from measures 9 sec to approx 2.8 sec.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests/bpf: add wrapper scripts for test_xdp_vlan.sh
Jesper Dangaard Brouer [Thu, 1 Aug 2019 18:00:21 +0000 (20:00 +0200)]
selftests/bpf: add wrapper scripts for test_xdp_vlan.sh

In-order to test both native-XDP (xdpdrv) and generic-XDP (xdpgeneric)
create two wrapper test scripts, that start the test_xdp_vlan.sh script
with these modes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobpf: fix XDP vlan selftests test_xdp_vlan.sh
Jesper Dangaard Brouer [Thu, 1 Aug 2019 18:00:16 +0000 (20:00 +0200)]
bpf: fix XDP vlan selftests test_xdp_vlan.sh

Change BPF selftest test_xdp_vlan.sh to (default) use generic XDP.

This selftest was created together with a fix for generic XDP, in commit
297249569932 ("net: fix generic XDP to handle if eth header was
mangled"). And was suppose to catch if generic XDP was broken again.

The tests are using veth and assumed that veth driver didn't support
native driver XDP, thus it used the (ip link set) 'xdp' attach that fell
back to generic-XDP. But veth gained native-XDP support in 948d4f214fde
("veth: Add driver XDP"), which caused this test script to use
native-XDP.

Fixes: 948d4f214fde ("veth: Add driver XDP")
Fixes: 97396ff0bc2d ("selftests/bpf: add XDP selftests for modifying and popping VLAN headers")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: always initialize frag->last_in_page
Qian Cai [Thu, 1 Aug 2019 13:52:54 +0000 (09:52 -0400)]
net/mlx5e: always initialize frag->last_in_page

The commit 069d11465a80 ("net/mlx5e: RX, Enhance legacy Receive Queue
memory scheme") introduced an undefined behaviour below due to
"frag->last_in_page" is only initialized in mlx5e_init_frags_partition()
when,

if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE)

or after bailed out the loop,

for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)

As the result, there could be some "frag" have uninitialized
value of "last_in_page".

Later, get_frag() obtains those "frag" and check "frag->last_in_page" in
mlx5e_put_rx_frag() and triggers the error during boot. Fix it by always
initializing "frag->last_in_page" to "false" in
mlx5e_init_frags_partition().

UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:325:12
load of value 170 is not a valid value for type 'bool' (aka '_Bool')
Call trace:
 dump_backtrace+0x0/0x264
 show_stack+0x20/0x2c
 dump_stack+0xb0/0x104
 __ubsan_handle_load_invalid_value+0x104/0x128
 mlx5e_handle_rx_cqe+0x8e8/0x12cc [mlx5_core]
 mlx5e_poll_rx_cq+0xca8/0x1a94 [mlx5_core]
 mlx5e_napi_poll+0x17c/0xa30 [mlx5_core]
 net_rx_action+0x248/0x940
 __do_softirq+0x350/0x7b8
 irq_exit+0x200/0x26c
 __handle_domain_irq+0xc8/0x128
 gic_handle_irq+0x138/0x228
 el1_irq+0xb8/0x140
 arch_cpu_idle+0x1a4/0x348
 do_idle+0x114/0x1b0
 cpu_startup_entry+0x24/0x28
 rest_init+0x1ac/0x1dc
 arch_call_rest_init+0x10/0x18
 start_kernel+0x4d4/0x57c

Fixes: 069d11465a80 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>