platform/kernel/linux-rpi.git
6 years agonet/ipv6: Add support for path selection using hash of 5-tuple
David Ahern [Fri, 2 Mar 2018 16:32:18 +0000 (08:32 -0800)]
net/ipv6: Add support for path selection using hash of 5-tuple

Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow the label. To that end
add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Pass skb to route lookup
David Ahern [Fri, 2 Mar 2018 16:32:17 +0000 (08:32 -0800)]
net/ipv6: Pass skb to route lookup

IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Rename NETEVENT_MULTIPATH_HASH_UPDATE
David Ahern [Fri, 2 Mar 2018 16:32:16 +0000 (08:32 -0800)]
net: Rename NETEVENT_MULTIPATH_HASH_UPDATE

Rename NETEVENT_MULTIPATH_HASH_UPDATE to
NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
in the IPv4 hash policy.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Make rt6_multipath_hash similar to fib_multipath_hash
David Ahern [Fri, 2 Mar 2018 16:32:15 +0000 (08:32 -0800)]
net/ipv6: Make rt6_multipath_hash similar to fib_multipath_hash

Make rt6_multipath_hash more of a direct parallel to fib_multipath_hash
and reduce stack and overhead in the process: get_hash_from_flowi6 is
just a wrapper around __get_hash_from_flowi6 with another stack
allocation for flow_keys. Move setting the addresses, protocol and
label into rt6_multipath_hash and allow it to make the call to
flow_hash_from_keys.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv4: Simplify fib_multipath_hash with optional flow keys
David Ahern [Fri, 2 Mar 2018 16:32:14 +0000 (08:32 -0800)]
net/ipv4: Simplify fib_multipath_hash with optional flow keys

As of commit e37b1e978bec5 ("ipv6: route: dissect flow in input path if
fib rules need it") fib_multipath_hash takes an optional flow keys. If
non-NULL it means the skb has already been dissected. If not set, then
fib_multipath_hash needs to call skb_flow_dissect_flow_keys.

Simplify the logic by setting flkeys to the local stack variable keys.
Simplifies fib_multipath_hash by only have 1 set of instructions
setting hash_keys.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Align ip_multipath_l3_keys and ip6_multipath_l3_keys
David Ahern [Fri, 2 Mar 2018 16:32:13 +0000 (08:32 -0800)]
net: Align ip_multipath_l3_keys and ip6_multipath_l3_keys

Symmetry is good and allows easy comparison that ipv4 and ipv6 are
doing the same thing. To that end, change ip_multipath_l3_keys to
set addresses at the end after the icmp compares, and move the
initialization of ipv6 flow keys to rt6_multipath_hash.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv4: Pass net to fib_multipath_hash instead of fib_info
David Ahern [Fri, 2 Mar 2018 16:32:12 +0000 (08:32 -0800)]
net/ipv4: Pass net to fib_multipath_hash instead of fib_info

fib_multipath_hash only needs net struct to check a sysctl. Make it
clear by passing net instead of fib_info. In the end this allows
alignment between the ipv4 and ipv6 versions.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'sctp-clean-up-sctp_sendmsg'
David S. Miller [Sun, 4 Mar 2018 18:00:58 +0000 (13:00 -0500)]
Merge branch 'sctp-clean-up-sctp_sendmsg'

Xin Long says:

====================
sctp: clean up sctp_sendmsg

This cleanup mostly does three things:

 - extract some codes into functions to make sendmsg more readable.

 - tidy up some codes to avoid the unnecessary checks.

 - adjust some logic so that it will be easier to add the send flags
   and cmsgs features that I will post after this.

To make it easy to review and to check if the code is compatible with
before, this patchset is to do it step by step in 9 patches.

NOTE:
There will be a conflict when merging
Commit 2277c7cd75e3 ("sctp: Add LSM hooks") from selinux tree,
the solution is to:

1. remove all the lines in [B]:

    <<<<<<< HEAD
    [A]
    =======
    [B]
    >>>>>>> 2277c7c... sctp: Add LSM hooks

2. and apply the following diff-output:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 980621e..d6803c8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1686,6 +1686,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
  struct net *net = sock_net(sk);
  struct sctp_association *asoc;
  enum sctp_scope scope;
+ struct sctp_af *af;
  int err = -EINVAL;

  *tp = NULL;
@@ -1711,6 +1712,22 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,

  scope = sctp_scope(daddr);

+ /* Label connection socket for first association 1-to-many
+  * style for client sequence socket()->sendmsg(). This
+  * needs to be done before sctp_assoc_add_peer() as that will
+  * set up the initial packet that needs to account for any
+  * security ip options (CIPSO/CALIPSO) added to the packet.
+  */
+ af = sctp_get_af_specific(daddr->sa.sa_family);
+ if (!af)
+ return -EINVAL;
+
+ err = security_sctp_bind_connect(sk, SCTP_SENDMSG_CONNECT,
+  (struct sockaddr *)daddr,
+  af->sockaddr_len);
+ if (err < 0)
+ return err;
+
  asoc = sctp_association_new(ep, sk, scope, GFP_KERNEL);
  if (!asoc)
  return -ENOMEM;
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: adjust some codes in a better order in sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:18 +0000 (23:05 +0800)]
sctp: adjust some codes in a better order in sctp_sendmsg

sctp_sendmsg_new_asoc and SCTP_ADDR_OVER check is only necessary
when daddr is set, so move them up to if (daddr) statement.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: improve some variables in sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:17 +0000 (23:05 +0800)]
sctp: improve some variables in sctp_sendmsg

This patch mostly is to:

  - rename sinfo_flags as sflags, to make the indents look better, and
    also keep consistent with other sctp_sendmsg_xx functions.

  - replace new_asoc with bool new, no need to define a pointer here,
    as if new_asoc is set, it must be asoc.

  - rename the 'out_nounlock:' as 'out', shorter and nicer.

  - remove associd, only one place is using it now, just use
    sinfo->sinfo_assoc_id directly.

  - remove 'cmsgs' initialization in sctp_sendmsg, as it will be done
    in sctp_sendmsg_parse.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: remove the unnecessary transport looking up from sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:16 +0000 (23:05 +0800)]
sctp: remove the unnecessary transport looking up from sctp_sendmsg

Now sctp_assoc_lookup_paddr can only be called only if daddr has
been set. But if daddr has been set, sctp_endpoint_lookup_assoc
would be done, where it could already have the transport.

So this unnecessary transport looking up should be removed, but
only reset transport as NULL when SCTP_ADDR_OVER is not set for
UDP type socket.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: factor out sctp_sendmsg_update_sinfo from sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:15 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_update_sinfo from sctp_sendmsg

This patch is to move the codes for trying to get sinfo from
asoc into sctp_sendmsg_update_sinfo.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: factor out sctp_sendmsg_parse from sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:14 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_parse from sctp_sendmsg

This patch is to move the codes for parsing msghdr and checking
sk into sctp_sendmsg_parse.

Note that different from before, 'sinfo' in sctp_sendmsg won't
be NULL any more. It gets the value either from cmsgs->srinfo,
cmsgs->sinfo or asoc. With it, the 'sinfo' and 'fill_sinfo_ttl'
check can be removed from sctp_sendmsg.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: factor out sctp_sendmsg_get_daddr from sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:13 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_get_daddr from sctp_sendmsg

This patch is to move the codes for trying to get daddr from
msg->msg_name into sctp_sendmsg_get_daddr.

Note that after adding 'daddr', 'to' and 'msg_name' can be
deleted.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: factor out sctp_sendmsg_check_sflags from sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:12 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_check_sflags from sctp_sendmsg

This patch is to move the codes for checking sinfo_flags on one asoc
after this asoc has been found into sctp_sendmsg_check_sflags.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: factor out sctp_sendmsg_new_asoc from sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:11 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_new_asoc from sctp_sendmsg

This patch is to move the codes for creating a new asoc if
no asoc was found into sctp_sendmsg_new_asoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg
Xin Long [Thu, 1 Mar 2018 15:05:10 +0000 (23:05 +0800)]
sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg

This patch is to move the codes for checking and sending on
one asoc after this asoc has been found or created into
sctp_sendmsg_to_asoc.

Note that 'err != -ESRCH' check is for the case that asoc is
freed when waiting for tx buffer in sctp_sendmsg_to_asoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Sat, 3 Mar 2018 02:53:11 +0000 (21:53 -0500)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-03-03

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend bpftool to build up CFG information of eBPF programs and add an
   option to dump this in DOT format such that this can later be used with
   DOT graphic tools (xdot, graphviz, etc) to visualize it. Part of the
   analysis performed is sub-program detection and basic-block partitioning,
   from Jiong.

2) Multiple enhancements for bpftool's batch mode, more specifically the
   parser now understands comments (#), continuation lines (\), and arguments
   enclosed between quotes. Also, allow to read from stdin via '-' as input
   file, all from Quentin.

3) Improve BPF kselftests by i) unifying the rlimit handling into a helper
   that is then used by all tests, and ii) add support for testing tail calls
   to test_verifier plus add tests covering all corner cases. The latter is
   especially useful for testing JITs, from Daniel.

4) Remove x64 JIT's bpf_flush_icache() since flush_icache_range() is a noop
   on x64, from Daniel.

5) Fix one more occasion in BPF samples where we do not detach the BPF program
   from the cgroup after completion, from Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/usb/kalmia: use ARRAY_SIZE for various array sizing calculations
Colin Ian King [Fri, 2 Mar 2018 13:42:39 +0000 (13:42 +0000)]
net/usb/kalmia: use ARRAY_SIZE for various array sizing calculations

Use the ARRAY_SIZE macro on a couple of arrays to determine
size of the arrays. Also fix up alignment to clean up a checkpatch
warning. Improvement suggested by Coccinelle.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: Add TP Congestion map entry for single-port
Ganesh Goudar [Fri, 2 Mar 2018 10:27:07 +0000 (15:57 +0530)]
cxgb4: Add TP Congestion map entry for single-port

Add TP Congestion Map entry for single-port T6 cards.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mac80211-next-for-davem-2018-03-02' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Fri, 2 Mar 2018 14:50:21 +0000 (09:50 -0500)]
Merge tag 'mac80211-next-for-davem-2018-03-02' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Only a few new things:
 * hwsim net namespace stuff from Kirill Tkhai
 * A-MSDU support in fast-RX
 * 4-addr mode support in fast-RX
 * support for a spec quirk in Add-BA negotiation
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: remove dead code when allocating filter
Ganesh Goudar [Fri, 2 Mar 2018 09:05:49 +0000 (14:35 +0530)]
cxgb4: remove dead code when allocating filter

Error code is already returned earlier if filter exists
at specified location. So, remove dead code trying to
free existing filter.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert hwsim_net_ops
Kirill Tkhai [Thu, 1 Mar 2018 11:30:17 +0000 (14:30 +0300)]
net: Convert hwsim_net_ops

These pernet_operations allocate and destroy IDA identifier,
and these actions are synchronized by IDA subsystem locks.
Exit method removes mac80211_hwsim_data enteries from the lists,
and this is synchronized by hwsim_radio_lock with the rest
parallel pernet_operations. Also it queues destroy_radio()
work, and these work already may be executed in parallel
with any pernet_operations (as it's a work :). So, we may
mark these pernet_operations as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211_hwsim: Make hwsim_netgroup IDA
Kirill Tkhai [Thu, 1 Mar 2018 11:30:09 +0000 (14:30 +0300)]
mac80211_hwsim: Make hwsim_netgroup IDA

hwsim_netgroup counter is declarated as int, and it is incremented
every time a new net is created. After sizeof(int) net are created,
it will overflow, and different net namespaces will have the same
identifier. This patch fixes the problem by introducing IDA instead
of int counter. IDA guarantees, all the net namespaces have the uniq
identifier.

Note, that after we do ida_simple_remove() in hwsim_exit_net(),
and we destroy the ID, later there may be executed destroy_radio()
from the workqueue. But destroy_radio() does not use the ID, so it's OK.

Out of bounds of this patch, just as a report to wireless subsystem
maintainer, destroy_radio() increaments hwsim_radios_generation
without hwsim_radio_lock, so this may need one more patch to fix.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agoMerge branch 'bpf-bpftool-batch-improvements'
Daniel Borkmann [Fri, 2 Mar 2018 08:46:41 +0000 (09:46 +0100)]
Merge branch 'bpf-bpftool-batch-improvements'

Quentin Monnet says:

====================
Several enhancements for bpftool batch mode are introduced in this series.

More specifically, input files for batch mode gain support for:
  * comments (starting with '#'),
  * continuation lines (after a line ending with '\'),
  * arguments enclosed between quotes.

Also, make bpftool able to read from standard input when "-" is provided as
input file name.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools: bpftool: add support for quotations in batch files
Quentin Monnet [Fri, 2 Mar 2018 04:20:11 +0000 (20:20 -0800)]
tools: bpftool: add support for quotations in batch files

Improve argument parsing from batch input files in order to support
arguments enclosed between single (') or double quotes ("). For example,
this command can now be parsed in batch mode:

    bpftool prog dump xlated id 1337 file "/tmp/my file with spaces"

The function responsible for parsing command arguments is copied from
its counterpart in lib/utils.c in iproute2 package.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools: bpftool: read from stdin when batch file name is "-"
Quentin Monnet [Fri, 2 Mar 2018 04:20:10 +0000 (20:20 -0800)]
tools: bpftool: read from stdin when batch file name is "-"

Make bpftool read its command list from standard input when the name if
the input file is a single dash.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools: bpftool: support continuation lines in batch files
Quentin Monnet [Fri, 2 Mar 2018 04:20:09 +0000 (20:20 -0800)]
tools: bpftool: support continuation lines in batch files

Add support for continuation lines, such as in the following example:

    prog show
    prog dump xlated \
        id 1337 opcodes

This patch is based after the code for support for continuation lines
from file lib/utils.c from package iproute2.

"Lines" in error messages are renamed as "commands", as we count the
number of commands (but we ignore empty lines, comments, and do not add
continuation lines to the count).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools: bpftool: support comments in batch files
Quentin Monnet [Fri, 2 Mar 2018 04:20:08 +0000 (20:20 -0800)]
tools: bpftool: support comments in batch files

Replace '#' by '\0' in commands read from batch files in order to avoid
processing the remaining part of the line, thus allowing users to use
comments in the files.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch 'tcp_bbr-more-GSO-work'
David S. Miller [Fri, 2 Mar 2018 02:44:29 +0000 (21:44 -0500)]
Merge branch 'tcp_bbr-more-GSO-work'

Eric Dumazet says:

====================
tcp_bbr: more GSO work

Playing with r8152 USB 1Gbit NIC, on both USB2 and USB3 slots, I found
that BBR was performing poorly, because of TSO being limited to 16KB

This patch series makes sure BBR is not under estimating number of
packets that are needed to fill the pipe when a device has suboptimal
TSO limits.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp_bbr: remove bbr->tso_segs_goal
Eric Dumazet [Wed, 28 Feb 2018 22:40:47 +0000 (14:40 -0800)]
tcp_bbr: remove bbr->tso_segs_goal

Its value is computed then immediately used,
there is no need to store it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp_bbr: better deal with suboptimal GSO (II)
Eric Dumazet [Wed, 28 Feb 2018 22:40:46 +0000 (14:40 -0800)]
tcp_bbr: better deal with suboptimal GSO (II)

This is second part of dealing with suboptimal device gso parameters.
In first patch (350c9f484bde "tcp_bbr: better deal with suboptimal GSO")
we dealt with devices having low gso_max_segs

Some devices lower gso_max_size from 64KB to 16 KB (r8152 is an example)

In order to probe an optimal cwnd, we want BBR being not sensitive
to whatever GSO constraint a device can have.

This patch removes tso_segs_goal() CC callback in favor of
min_tso_segs() for CC wanting to override sysctl_tcp_min_tso_segs

Next patch will remove bbr->tso_segs_goal since it does not have
to be persistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'bpftool-visualization'
Alexei Starovoitov [Fri, 2 Mar 2018 02:29:50 +0000 (18:29 -0800)]
Merge branch 'bpftool-visualization'

Jakub Kicinski says:

====================
Jiong says:

This patch set is an application of CFG information on eBPF program
visualization. It presents some initial code for building CFG information
from eBPF instruction sequences.

After we get eBPF program bytecode, we do sub-program detection and
basic-block partition. These information then are visualized into DOT
graph.

The user could use any DOT graphic tools (xdot, graphviz etc) to view it.

For example:

  bpftool prog dump xlated id 2 visual &>output.dot

  [xdot | dotty] output.dot
  dot -Tpng -o output.png

This initial patch set hasn't tuned much on the dot description layout
nor decoration, we could improve them later once the direction of the patch
set is agreed on. We could also visualize some static analysis performance
data.

v2 (Jakub):
 - update license headers and add SPDX tags.
====================

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: add bash completion for CFG dump
Quentin Monnet [Fri, 2 Mar 2018 02:01:23 +0000 (18:01 -0800)]
tools: bpftool: add bash completion for CFG dump

Add bash completion for the "visual" keyword used for dumping the CFG of
eBPF programs with bpftool. Make sure we only complete with this keyword
when we dump "xlated" (and not "jited") instructions.

Acked-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: new command-line option and documentation for 'visual'
Jiong Wang [Fri, 2 Mar 2018 02:01:22 +0000 (18:01 -0800)]
tools: bpftool: new command-line option and documentation for 'visual'

This patch adds new command-line option for visualizing the xlated eBPF
sequence.

Documentations are updated accordingly.

Usage:

  bpftool prog dump xlated id 2 visual

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: generate .dot graph from CFG information
Jiong Wang [Fri, 2 Mar 2018 02:01:21 +0000 (18:01 -0800)]
tools: bpftool: generate .dot graph from CFG information

This patch let bpftool print .dot graph file into stdout.

This graph is generated by the following steps:

  - iterate through the function list.
  - generate basic-block(BB) definition for each BB in the function.
  - draw out edges to connect BBs.

This patch is the initial support, the layout and decoration of the .dot
graph could be improved.

Also, it will be useful if we could visualize some performance data from
static analysis.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: add out edges for each basic-block
Jiong Wang [Fri, 2 Mar 2018 02:01:20 +0000 (18:01 -0800)]
tools: bpftool: add out edges for each basic-block

This patch adds out edges for each basic-block. We will need these out
edges to finish the .dot graph drawing.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: partition basic-block for each function in the CFG
Jiong Wang [Fri, 2 Mar 2018 02:01:19 +0000 (18:01 -0800)]
tools: bpftool: partition basic-block for each function in the CFG

This patch partition basic-block for each function in the CFG. The
algorithm is simple, we identify basic-block head in a first traversal,
then second traversal to identify the tail.

We could build extended basic-block (EBB) in next steps. EBB could make the
graph more readable when the eBPF sequence is big.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: detect sub-programs from the eBPF sequence
Jiong Wang [Fri, 2 Mar 2018 02:01:18 +0000 (18:01 -0800)]
tools: bpftool: detect sub-programs from the eBPF sequence

This patch detect all sub-programs from the eBPF sequence and keep the
information in the new CFG data structure.

The detection algorithm is basically the same as the one in verifier except
we need to use insn->off instead of insn->imm to get the pc-relative call
offset. Because verifier has modified insn->off/insn->imm during finishing
the verification.

Also, we don't need to do some sanity checks as verifier has done them.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: factor out xlated dump related code into separate file
Jiong Wang [Fri, 2 Mar 2018 02:01:17 +0000 (18:01 -0800)]
tools: bpftool: factor out xlated dump related code into separate file

This patch factors out those code of dumping xlated eBPF instructions into
xlated_dumper.[h|c].

They are quite independent dumper functions, so better to be kept
separately.

New dumper support will be added in later patches in this set.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agotools: bpftool: remove unnecessary 'if' to reduce indentation
Jiong Wang [Fri, 2 Mar 2018 02:01:16 +0000 (18:01 -0800)]
tools: bpftool: remove unnecessary 'if' to reduce indentation

It is obvious we could use 'else if' instead of start a new 'if' in the
touched code.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agosocket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)
Soheil Hassas Yeganeh [Tue, 27 Feb 2018 23:22:40 +0000 (18:22 -0500)]
socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)

recvmmsg does not call ___sys_recvmsg when sk_err is set.
That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
should always call ___sys_recvmsg regardless of sk->sk_err to
be able to clear error queue. Otherwise, users are not able to
drain the error queue using recvmmsg.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-phy-Reduce-duplication'
David S. Miller [Fri, 2 Mar 2018 02:23:42 +0000 (21:23 -0500)]
Merge branch 'net-phy-Reduce-duplication'

Florian Fainelli says:

====================
net: phy: Reduce duplication

This patch series reduces the duplication among 10G PHY drivers that just
essentially stub most functions, but do that while replicating what the existing
generic functions do.

Changes in v3:

- removed unused "reg" variable in teranetics.c
- fixed subject for patch 5 since we actually use gen10g_no_soft_reset()

Changes in v2:

- rename gen10g_soft_reset() to gen10g_no_soft_reset() to better illustrate
  what it does (or does not)
- removed stray comment in marvell10g.c
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: marvell10g: Utilize gen10g_no_soft_reset()
Florian Fainelli [Fri, 2 Mar 2018 00:08:59 +0000 (16:08 -0800)]
net: phy: marvell10g: Utilize gen10g_no_soft_reset()

We do the same thing as the generic function: nothing, so utilize it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
6 years agonet: phy: cortina: Utilize generic functions
Florian Fainelli [Fri, 2 Mar 2018 00:08:58 +0000 (16:08 -0800)]
net: phy: cortina: Utilize generic functions

cortina_soft_reset() does the same thing as gen10g_soft_reset(), and
cortina_config_aneg() is actually doing what gen10g_config_init() does
for 10G capable PHYs.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
6 years agonet: phy: teranetics: Utilize generic functions
Florian Fainelli [Fri, 2 Mar 2018 00:08:57 +0000 (16:08 -0800)]
net: phy: teranetics: Utilize generic functions

Update teranetics_aneg_done() to use genphy_c45_aneg_done() instead of
duplicating that code, and switch to gen10g_* functions where
appropriate instead of maintaining identical copies doing nothing.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
6 years agonet: phy: Export gen10g_* functions
Florian Fainelli [Fri, 2 Mar 2018 00:08:56 +0000 (16:08 -0800)]
net: phy: Export gen10g_* functions

In order to remove a fair amount of duplication in the different 10G PHY
drivers, export all gen10g_* functions to be able to make use of those.
While we are at it, rename gen10g_soft_reset() to gen10g_no_soft_reset()
to illustrate what it does.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
6 years agonet: phy: aquantia: Utilize genphy_c45_aneg_done()
Florian Fainelli [Fri, 2 Mar 2018 00:08:55 +0000 (16:08 -0800)]
net: phy: aquantia: Utilize genphy_c45_aneg_done()

The driver duplicates what the generic function does, so use the generic
function intead.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
6 years agoMerge branch 'mac89x0-fixes-and-cleanups'
David S. Miller [Fri, 2 Mar 2018 02:21:36 +0000 (21:21 -0500)]
Merge branch 'mac89x0-fixes-and-cleanups'

Finn Thain says:

====================
Fixes, cleanup and modernization for mac89x0 driver

Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Added acked-by tags from Geert Uytterhoeven.
- Omitted patches unrelated to mac89x0 driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mac89x0: Replace custom debug logging with netif_* calls
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Replace custom debug logging with netif_* calls

Adopt the conventional style of debug logging because it is both
shorter and more flexible.
Remove the 'version_printed' flag as the version will be printed
only once anyway (when the module loads).

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mac89x0: Fix and modernize log messages
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Fix and modernize log messages

Fix log message fragments that no longer produce the desired output
since the behaviour of printk() was changed.
Add missing printk severity levels.
Drop deprecated "out of memory" message as per checkpatch advice.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mac89x0: Convert to platform_driver
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Convert to platform_driver

Apparently these Dayna cards don't have a pseudoslot declaration ROM
which means they can't be probed like NuBus cards.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mac89x0: Remove redundant code
Finn Thain [Thu, 1 Mar 2018 23:29:28 +0000 (18:29 -0500)]
net/mac89x0: Remove redundant code

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'forwarding-selftest-fixes'
David S. Miller [Fri, 2 Mar 2018 02:19:03 +0000 (21:19 -0500)]
Merge branch 'forwarding-selftest-fixes'

David Ahern says:

====================
selftests: forwarding: misc bug fixes and enhancements

Bug fixes and an enhancement for the recent forwarding tests:
- only check tc version on tc tests
- handle multipath tests failing with 0 packet count
- fix ping command for IPv6 on Debian jessie
- improve summary of multipath tests

v2
- add CHECK_TC to bridge_vlan_aware.sh (Ido)
- dropped patch 2; always check for mz given its use
- fixed commit message for the last patch (Multipath: was dropped)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Add description to the multipath tests
David Ahern [Thu, 1 Mar 2018 21:49:33 +0000 (13:49 -0800)]
selftests: forwarding: Add description to the multipath tests

Add a better description to the summary for multipath tests. e.g.,

INFO: Running IPv6 multipath tests
TEST: ECMP                                               [PASS]
INFO: Expected ratio 1.00 Measured ratio 1.02
TEST: Weighted MP 2:1                                    [PASS]
INFO: Expected ratio 2.00 Measured ratio 2.02
TEST: Weighted MP 11:45                                  [PASS]
INFO: Expected ratio 4.09 Measured ratio 4.03

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Use PING6 instead of ping for ipv6 multipath test
David Ahern [Thu, 1 Mar 2018 21:49:32 +0000 (13:49 -0800)]
selftests: forwarding: Use PING6 instead of ping for ipv6 multipath test

On Debian jessie ping can not handle IPv6 addresses so the command
fails. Use PING6 which is set to ping6.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Handle 0 for packet difference in multipath tests
David Ahern [Thu, 1 Mar 2018 21:49:31 +0000 (13:49 -0800)]
selftests: forwarding: Handle 0 for packet difference in multipath tests

If the packet stats have a difference of 0, the test output shows:
INFO: Expected ratio 2.00 Measured ratio
Runtime error (func=(main), adr=9): Divide by zero
(standard_in) 2: syntax error
(standard_in) 1: syntax error
./router_multipath.sh: line 187: test: : integer expression expected
TEST: Multipath                                                     [FAIL]
Too large discrepancy between expected and measured ratios

Handle the 0 and display a cleaner message:
INFO: Running IPv6 multipath tests
TEST: Multipath                                                     [FAIL]
Packet difference is 0

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Only check tc version for tc tests
David Ahern [Thu, 1 Mar 2018 21:49:30 +0000 (13:49 -0800)]
selftests: forwarding: Only check tc version for tc tests

Capabilities of tc command are irrelevant for router tests:
    $ ./router.sh
    SKIP: iproute2 too old, missing shared block support

Add a CHECK_TC flag and only check tc capabilities if set. Add flag to
tc_common.sh and have it sourced before lib.sh

Also, if the command lacks some feature the test should exit non-0.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agofib_rules: FRA_GENERIC_POLICY updates for ip proto, sport and dport attrs
Roopa Prabhu [Fri, 2 Mar 2018 01:55:37 +0000 (17:55 -0800)]
fib_rules: FRA_GENERIC_POLICY updates for ip proto, sport and dport attrs

Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport and dport")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosamples/bpf: detach prog from cgroup
Prashant Bhole [Thu, 1 Mar 2018 05:47:40 +0000 (14:47 +0900)]
samples/bpf: detach prog from cgroup

test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program
attached to cgroup even after completion.
Using detach functionality of test_cgrp2_sock in both scripts.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoipv6: allow userspace to add IFA_F_OPTIMISTIC addresses
Sabrina Dubroca [Wed, 28 Feb 2018 15:40:08 +0000 (16:40 +0100)]
ipv6: allow userspace to add IFA_F_OPTIMISTIC addresses

According to RFC 4429 (section 3.1), adding new IPv6 addresses as
optimistic addresses is acceptable, as long as the implementation
follows some rules:

   * Optimistic DAD SHOULD only be used when the implementation is aware
        that the address is based on a most likely unique interface
        identifier (such as in [RFC2464]), generated randomly [RFC3041],
        or by a well-distributed hash function [RFC3972] or assigned by
        Dynamic Host Configuration Protocol for IPv6 (DHCPv6) [RFC3315].
        Optimistic DAD SHOULD NOT be used for manually entered
        addresses.

Thus, it seems reasonable to allow userspace to set the optimistic flag
when adding new addresses.

We must not let userspace set NODAD + OPTIMISTIC, since if the kernel is
not performing DAD we would never clear the optimistic flag. We must
also ignore userspace's request to add OPTIMISTIC flag to addresses that
have already completed DAD (addresses that don't have the TENTATIVE
flag, or that have the DADFAILED flag).

Then we also need to clear the OPTIMISTIC flag on permanent addresses
when DAD fails. Otherwise, IFA_F_OPTIMISTIC addresses added by userspace
can still be used after DAD has failed, because in
ipv6_chk_addr_and_flags(), IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE.

Setting IFA_F_OPTIMISTIC from userspace is conditional on
CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Fix spelling mistake "greater then" -> "greater than"
Gal Pressman [Wed, 28 Feb 2018 13:59:15 +0000 (15:59 +0200)]
net: Fix spelling mistake "greater then" -> "greater than"

Fix trivial spelling mistake "greater then" -> "greater than".

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phylink: Remove redundant netdev.phydev assignment
Richard Cochran [Fri, 23 Feb 2018 15:25:46 +0000 (07:25 -0800)]
net: phylink: Remove redundant netdev.phydev assignment

As a part of working on MII time stamping infrastructure, I was trying
to figure out how netdev->phydev gets assigned, and I stumbled across
this.  Ever since the new phylink code came in, the field is assigned
twice.

The function, phylink_connect_phy(), calls

phy_attach_direct()
phylink_bringup_phy()

and phy_attach_direct() sets

dev->phydev = phydev;

but phylink_bringup_phy() then sets the same field again:

pl->netdev->phydev = phy;

Similarly, the function, phylink_of_phy_connect(), calls

of_phy_attach()
phy_attach_direct()
phylink_bringup_phy()

The removal code is also duplicated:

phylink_disconnect_phy()
pl->netdev->phydev = NULL;
phy_disconnect()
phy_detach()
phydev->attached_dev->phydev = NULL;

This patch removes the redundant assignments, restricting manipulation
of the netdev.phydev field to phy_attach_direct() and phy_detach().

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'smc-link-layer-control-enhancements'
David S. Miller [Thu, 1 Mar 2018 18:21:32 +0000 (13:21 -0500)]
Merge branch 'smc-link-layer-control-enhancements'

Ursula Braun says:

====================
net/smc: Link Layer Control enhancements

here is a series of smc patches enabling SMC communication with peers
supporting more than one link per link group.

The first three patches are preparing code cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: prevent new connections on link group
Karsten Graul [Thu, 1 Mar 2018 12:51:33 +0000 (13:51 +0100)]
net/smc: prevent new connections on link group

When the processing of a DELETE LINK message has started,
new connections should not be added to the link group that
is about to terminate.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: process add/delete link messages
Karsten Graul [Thu, 1 Mar 2018 12:51:32 +0000 (13:51 +0100)]
net/smc: process add/delete link messages

Add initial support for the LLC messages ADD LINK and DELETE LINK.
Introduce a link state field. Extend the initial LLC handshake with
ADD LINK processing.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: do not allow eyecatchers in rmbe
Karsten Graul [Thu, 1 Mar 2018 12:51:31 +0000 (13:51 +0100)]
net/smc: do not allow eyecatchers in rmbe

SMC does not support eyecatchers in RMB elements,
decline peers requesting this support.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: process confirm/delete rkey messages
Karsten Graul [Thu, 1 Mar 2018 12:51:30 +0000 (13:51 +0100)]
net/smc: process confirm/delete rkey messages

Process and respond to CONFIRM RKEY and DELETE RKEY messages.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: respond to test link messages
Karsten Graul [Thu, 1 Mar 2018 12:51:29 +0000 (13:51 +0100)]
net/smc: respond to test link messages

Add TEST LINK message responses, which also serves as preparation for
support of sockopt TCP_KEEPALIVE.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: remove unused fields from smc structures
Karsten Graul [Thu, 1 Mar 2018 12:51:28 +0000 (13:51 +0100)]
net/smc: remove unused fields from smc structures

The daddr field holds the destination IPv4 address. The field was set but
never used and can be removed. The addr field was a left-over from an
earlier version of non-blocking connects and can be removed.
The result of the call to kernel_getpeername is not used, the call can be
removed. Non-blocking connects are working, so remove restriction comment.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: move netinfo function to file smc_clc.c
Karsten Graul [Thu, 1 Mar 2018 12:51:27 +0000 (13:51 +0100)]
net/smc: move netinfo function to file smc_clc.c

The function smc_netinfo_by_tcpsk() belongs to CLC handling.
Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: cleanup smc_llc.h and smc_clc.h headers
Stefan Raspl [Thu, 1 Mar 2018 12:51:26 +0000 (13:51 +0100)]
net/smc: cleanup smc_llc.h and smc_clc.h headers

Remove structures used internal only from headers.
And remove an extra function parameter.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ipv4-ipv6-mcast-align'
David S. Miller [Thu, 1 Mar 2018 18:13:24 +0000 (13:13 -0500)]
Merge branch 'ipv4-ipv6-mcast-align'

Yuval Mintz says:

====================
ipmr, ip6mr: Align multicast routing for IPv4 & IPv6

Historically ip6mr was based [cut-n-paste] on ipmr and the two have not
diverged too much. Apparently as ipv4 multicast routing is more common
than its ipv6 brethren modifications since then are mostly one-way,
affecting ipmr while leaving ip6mr unchanged.

This series is meant to re-factor both ipmr and ip6mr into having common
structures [and some functionality], adding 2 new common files -
mroute_base.h and ipmr_base.c.

The series begins by bringing ip6mr up to speed to some of the changes
applied in the past to ipmr [#2, #3].
It is then possible to re-factor a lot of the common structures -
vif devices [#1], mr_table [#4] mfc_cache [#6], and use the common
structures in both ipmr and ip6mr.

The rest of the patches re-factor some choice flows used by both ipmr
and ip6mr and eliminates duplicity.

This series would later allow for easy extension of ipmr offloading
to support ip6mr offloading as well, as almost all structures
related to the offloading would be shared between the two protocols.

Changes from previous versions
------------------------------
v2:
  - #6 Corrected reporting logic when hitting an unresolved cache
  - #7 Addressed kernel doc style [Thanks Nikolay]

RFC -> v1:
  - Corrected support for CONFIG_IP{,V6}_MROUTE_MULTIPLE_TABLES
  - Addressed a couple of kbuild test robot issues
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipmr, ip6mr: Unite dumproute flows
Yuval Mintz [Wed, 28 Feb 2018 21:29:39 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite dumproute flows

The various MFC entries are being held in the same kind of mr_tables
for both ipmr and ip6mr, and their traversal logic is identical.
Also, with the exception of the addresses [and other small tidbits]
the major bulk of the nla setting is identical.

Unite as much of the dumping as possible between the two.
Notice this requires creating an mr_table iterator for each, as the
for-each preprocessor macro can't be used by the common logic.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip6mr: Remove MFC_NOTIFY and refactor flags
Yuval Mintz [Wed, 28 Feb 2018 21:29:38 +0000 (23:29 +0200)]
ip6mr: Remove MFC_NOTIFY and refactor flags

MFC_NOTIFY exists in ip6mr, probably as some legacy code
[was already removed for ipmr in commit
06bd6c0370bb ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum").
Remove it from ip6mr as well, and move the enum into a common file;
Notice MFC_OFFLOAD is currently only used by ipmr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipmr, ip6mr: Unite vif seq functions
Yuval Mintz [Wed, 28 Feb 2018 21:29:37 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite vif seq functions

Same as previously done with the mfc seq, the logic for the vif seq is
refactored to be shared between ipmr and ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipmr, ip6mr: Unite mfc seq logic
Yuval Mintz [Wed, 28 Feb 2018 21:29:36 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite mfc seq logic

With the exception of the final dump, ipmr and ip6mr have the exact same
seq logic for traversing a given mr_table. Refactor that code and make
it common.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipmr, ip6mr: Unite logic for searching in MFC cache
Yuval Mintz [Wed, 28 Feb 2018 21:29:35 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite logic for searching in MFC cache

ipmr and ip6mr utilize the exact same methods for searching the
hashed resolved connections, difference being only in the construction
of the hash comparison key.

In order to unite the flow, introduce an mr_table operation set that
would contain the protocol specific information required for common
flows, in this case - the hash parameters and a comparison key
representing a (*,*) route.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipmr, ip6mr: Make mfc_cache a common structure
Yuval Mintz [Wed, 28 Feb 2018 21:29:34 +0000 (23:29 +0200)]
ipmr, ip6mr: Make mfc_cache a common structure

mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic  - mr_mfc
and convert both ipmr and ip6mr into using it.

For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipmr, ip6mr: Unite creation of new mr_table
Yuval Mintz [Wed, 28 Feb 2018 21:29:33 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite creation of new mr_table

Now that both ipmr and ip6mr are using the same mr_table structure,
we can have a common function to allocate & initialize a new instance.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomroute*: Make mr_table a common struct
Yuval Mintz [Wed, 28 Feb 2018 21:29:32 +0000 (23:29 +0200)]
mroute*: Make mr_table a common struct

Following previous changes to ip6mr, mr_table and mr6_table are
basically the same [up to mr6_table having additional '6' suffixes to
its variable names].
Move the common structure definition into a common header; This
requires renaming all references in ip6mr to variables that had the
distinct suffix.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip6mr: Align hash implementation to ipmr
Yuval Mintz [Wed, 28 Feb 2018 21:29:31 +0000 (23:29 +0200)]
ip6mr: Align hash implementation to ipmr

Since commit 8fb472c09b9d ("ipmr: improve hash scalability") ipmr has
been using rhashtable as a basis for its mfc routes, but ip6mr is
currently still using the old private MFC hash implementation.

Align ip6mr to the current ipmr implementation.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip6mr: Make mroute_sk rcu-based
Yuval Mintz [Wed, 28 Feb 2018 21:29:30 +0000 (23:29 +0200)]
ip6mr: Make mroute_sk rcu-based

In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipmr,ipmr6: Define a uniform vif_device
Yuval Mintz [Wed, 28 Feb 2018 21:29:29 +0000 (23:29 +0200)]
ipmr,ipmr6: Define a uniform vif_device

The two implementations have almost identical structures - vif_device and
mif_device. As a step toward uniforming the mr_tables, eliminate the
mif_device and relocate the vif_device definition into a new common
header file.

Also, introduce a common initializing function for setting most of the
vif_device fields in a new common source file. This requires modifying
the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common
config option - CONFIG_IP_MROUTE_COMMON.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'fib_rules-support-sport-dport-and-proto-match'
David S. Miller [Thu, 1 Mar 2018 03:45:05 +0000 (22:45 -0500)]
Merge branch 'fib_rules-support-sport-dport-and-proto-match'

Roopa Prabhu says:

====================
fib_rules: support sport, dport and proto match

This series extends fib rule match support to include sport, dport
and ip proto match (to complete the 5-tuple match support).
Common use-cases of Policy based routing in the data center require
5-tuple match. The last 2 patches in the series add a call to flow dissect
in the fwd path if required by the installed fib rules (controlled by a flag).

v1:
  - Fix errors reported by kbuild and feedback on RFC
  - extend port match uapi to accomodate port ranges

v2:
  - address comments from Nikolay, David Ahern and Paolo (Thanks!)

Pending things I will submit separate patches for:
  - extack for fib rules
  - fib rules test (as requested by david ahern)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: route: dissect flow in input path if fib rules need it
Roopa Prabhu [Thu, 1 Mar 2018 03:43:22 +0000 (22:43 -0500)]
ipv6: route: dissect flow in input path if fib rules need it

Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: route: dissect flow in input path if fib rules need it
Roopa Prabhu [Thu, 1 Mar 2018 03:42:41 +0000 (22:42 -0500)]
ipv6: route: dissect flow in input path if fib rules need it

Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations (Thanks to Nikolay Aleksandrov for some review here).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: fib6_rules: support for match on sport, dport and ip proto
Roopa Prabhu [Thu, 1 Mar 2018 03:41:37 +0000 (22:41 -0500)]
ipv6: fib6_rules: support for match on sport, dport and ip proto

support to match on src port, dst port and ip protocol.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv4: fib_rules: support match on sport, dport and ip proto
Roopa Prabhu [Thu, 1 Mar 2018 03:41:06 +0000 (22:41 -0500)]
ipv4: fib_rules: support match on sport, dport and ip proto

support to match on src port, dst port and ip protocol.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: fib_rules: support for match on ip_proto, sport and dport
Roopa Prabhu [Thu, 1 Mar 2018 03:40:16 +0000 (22:40 -0500)]
net: fib_rules: support for match on ip_proto, sport and dport

uapi for ip_proto, sport and dport range match
in fib rules.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: fix interrupt number after adding support for MSI-X interrupts
Heiner Kallweit [Wed, 28 Feb 2018 19:43:38 +0000 (20:43 +0100)]
r8169: fix interrupt number after adding support for MSI-X interrupts

In case of MSI-X the interrupt number may differ from pcidev->irq.
Fix this by using pci_irq_vector().

Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 28 Feb 2018 17:34:20 +0000 (12:34 -0500)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-02-28

This series contains updates to fm10k only.

Jake provides all the changes in this series, starting with making the
function header comments consistent and to align with how the kernel
documentation expects it.  Also cleaned up code comment as well as bump
the driver version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'selftests-forwarding-Add-VRF-based-tests'
David S. Miller [Wed, 28 Feb 2018 17:25:49 +0000 (12:25 -0500)]
Merge branch 'selftests-forwarding-Add-VRF-based-tests'

Ido Schimmel says:

====================
selftests: forwarding: Add VRF-based tests

One of the nice things about network namespaces is that they allow one
to easily create and test complex environments.

Unfortunately, these namespaces can not be used with actual switching
ASICs, as their ports can not be migrated to other network namespaces
(NETIF_F_NETNS_LOCAL) and most of them probably do not support the
L1-separation provided by namespaces.

However, a similar kind of flexibility can be achieved by using VRFs and
by looping the switch ports together. For example:

                             br0
                              +
               vrf-h1         |           vrf-h2
                 +        +---+----+        +
                 |        |        |        |
    192.0.2.1/24 +        +        +        + 192.0.2.2/24
               swp1     swp2     swp3     swp4
                 +        +        +        +
                 |        |        |        |
                 +--------+        +--------+

The VRFs act as lightweight namespaces representing hosts connected to
the switch.

This approach for testing switch ASICs has several advantages over the
traditional method that requires multiple physical machines, to name a
few:

1. Only the device under test (DUT) is being tested without noise from
other system.

2. Ability to easily provision complex topologies. Testing bridging
between 4-ports LAGs or 8-way ECMP requires many physical links that are
not always available. With the VRF-based approach one merely needs to
loopback more ports.

These tests are written with switch ASICs in mind, but they can be run
on any Linux box using veth pairs to emulate physical loopbacks.

v2:
* Order local variables declaration according to function arguments
  order (Petr)

v1:
* Change location to net/forwarding instead of forwarding/
* Add ability to pause on failure
* Add ability to pause on cleanup
* Make configuration file optional
* Make ping/ping6/mz configurable
* Add more tc tests
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Introduce basic shared blocks tests
Jiri Pirko [Wed, 28 Feb 2018 10:25:19 +0000 (12:25 +0200)]
selftests: forwarding: Introduce basic shared blocks tests

Test shared block infrastructure. This is a basic test that shares TC
block in between 2 clsact qdiscs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Introduce basic tc chains tests
Jiri Pirko [Wed, 28 Feb 2018 10:25:18 +0000 (12:25 +0200)]
selftests: forwarding: Introduce basic tc chains tests

Tests chains matching and goto chain action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Introduce tc actions tests
Jiri Pirko [Wed, 28 Feb 2018 10:25:17 +0000 (12:25 +0200)]
selftests: forwarding: Introduce tc actions tests

Add first part of actions tests. This patch only contains tests of gact
ok/drop/trap and mirred redirect egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Introduce tc flower matching tests
Jiri Pirko [Wed, 28 Feb 2018 10:25:16 +0000 (12:25 +0200)]
selftests: forwarding: Introduce tc flower matching tests

Add first part of flower tests. This patch only contains dst/src ip/mac
matching.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Allow to get netdev interfaces names from commandline
Jiri Pirko [Wed, 28 Feb 2018 10:25:15 +0000 (12:25 +0200)]
selftests: forwarding: Allow to get netdev interfaces names from commandline

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Add MAC get helper
Jiri Pirko [Wed, 28 Feb 2018 10:25:14 +0000 (12:25 +0200)]
selftests: forwarding: Add MAC get helper

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Add tc offload check helper
Jiri Pirko [Wed, 28 Feb 2018 10:25:13 +0000 (12:25 +0200)]
selftests: forwarding: Add tc offload check helper

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>