platform/kernel/linux-rpi3.git
6 years agonet: Convert tcpv6_net_ops
Kirill Tkhai [Mon, 19 Feb 2018 08:49:40 +0000 (11:49 +0300)]
net: Convert tcpv6_net_ops

These pernet_operations create and destroy net::ipv6.tcp_sk
socket, which is used in tcp_v6_send_response() only. It looks
like foreign pernet_operations don't want to set ipv6 connection
inside destroyed net, so this socket may be created in destroyed
in parallel with anything else. inet_twsk_purge() is also safe
for that, as described in patch for tcp_sk_ops. So, it's possible
to mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert fib6_rules_net_ops
Kirill Tkhai [Mon, 19 Feb 2018 08:49:31 +0000 (11:49 +0300)]
net: Convert fib6_rules_net_ops

These pernet_operations register and unregister
net::ipv6.fib6_rules_ops, which are used for
routing. It looks like there are no pernet_operations,
which send ipv6 packages to another net, so we
are able to mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ipv6_inetpeer_ops
Kirill Tkhai [Mon, 19 Feb 2018 08:49:20 +0000 (11:49 +0300)]
net: Convert ipv6_inetpeer_ops

net->ipv6.peers is dereferenced in three places via inet_getpeer_v6(),
and it's used to handle skb. All the users of inet_getpeer_v6() do not
look like be able to be called from foreign net pernet_operations, so
we may mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert raw6_net_ops, udplite6_net_ops, ipv6_proc_ops, if6_proc_net_ops and...
Kirill Tkhai [Mon, 19 Feb 2018 08:49:10 +0000 (11:49 +0300)]
net: Convert raw6_net_ops, udplite6_net_ops, ipv6_proc_ops, if6_proc_net_ops and ip6_route_net_late_ops

These pernet_operations create and destroy /proc entries
and safely may be converted and safely may be mark as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert icmpv6_sk_ops, ndisc_net_ops and igmp6_net_ops
Kirill Tkhai [Mon, 19 Feb 2018 08:48:57 +0000 (11:48 +0300)]
net: Convert icmpv6_sk_ops, ndisc_net_ops and igmp6_net_ops

These pernet_operations create and destroy net::ipv6.icmp_sk
socket, used to send ICMP or error reply.

Nobody can dereference the socket to handle a packet before
net is initialized, as there is no routing; nobody can do
that in parallel with exit, as all of devices are moved
to init_net or destroyed and there are no packets it-flight.
So, it's possible to mark these pernet_operations as async.

The same for ndisc_net_ops and for igmp6_net_ops. The last
one also creates and destroys /proc entries.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ip6mr_net_ops
Kirill Tkhai [Mon, 19 Feb 2018 08:48:37 +0000 (11:48 +0300)]
net: Convert ip6mr_net_ops

These pernet_operations create and destroy /proc entries,
populate and depopulate net::rules_ops and multiroute table.
All the structures are pernet, and they are not touched
by foreign net pernet_operations. So, it's possible to mark
them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert cfg80211_pernet_ops
Kirill Tkhai [Mon, 19 Feb 2018 08:48:14 +0000 (11:48 +0300)]
net: Convert cfg80211_pernet_ops

This patch finishes converting pernet_operations
registered in net/wireless directory.

These pernet_operations have only exit method,
which moves devices to init_net. This action
is not pernet_operations-specific, and function
cfg80211_switch_netns() may be called all time
during the system life. All necessary protection
against concurrent cfg80211_pernet_exit() is made
by rtnl_lock(). So, cfg80211_pernet_ops is able
to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert inet6_net_ops
Kirill Tkhai [Mon, 19 Feb 2018 08:48:04 +0000 (11:48 +0300)]
net: Convert inet6_net_ops

init method initializes sysctl defaults, allocates
percpu arrays and creates /proc entries.
exit method reverts the above.

There are no pernet_operations, which are interested
in the above entities of foreign net namespace, so
inet6_net_ops are able to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodpaa_eth: fix pause capability advertisement logic
Jake Moroni [Sun, 18 Feb 2018 20:26:04 +0000 (15:26 -0500)]
dpaa_eth: fix pause capability advertisement logic

The ADVERTISED_Asym_Pause bit was being improperly set when both
rx and tx pause were enabled. When rx and tx are both enabled, only
the ADVERTISED_Pause bit is supposed to be set.

Signed-off-by: Jake Moroni <mail@jakemoroni.com>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: simplify sh_eth_check_reset()
Sergei Shtylyov [Sun, 18 Feb 2018 15:21:05 +0000 (18:21 +0300)]
sh_eth: simplify sh_eth_check_reset()

The *while* loop in this function  can be turned into a normal *for* loop.
And getting rid  of the  single return point saves us a few more LoCs...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'dwmac-meson8b-small-cleanup'
David S. Miller [Mon, 19 Feb 2018 16:26:32 +0000 (11:26 -0500)]
Merge branch 'dwmac-meson8b-small-cleanup'

Martin Blumenstingl says:

====================
dwmac-meson8b: small cleanup

This is a follow-up to my previous series "dwmac-meson8b: clock fixes for
Meson8b" from [0].
during the review of that series it was found that the clock registration
could be simplified. now that the previous series has landed we can start
cleaning up the clock registration.

the goal of this series is to simplify the code in the dwmac-meson8b
driver. no functional changes are intended.

I have tested this on my Khadas VIM2 (GXM SoC, with RGMII PHY) and my
Endless Mini (EC-100, Meson8b SoC with RMII PHY, .dts support is not part
of mainline yet). no problems were found.

[0] http://lists.infradead.org/pipermail/linux-amlogic/2018-January/006143.html
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: dwmac-meson8b: make the clock configurations private
Martin Blumenstingl [Sat, 17 Feb 2018 14:08:20 +0000 (15:08 +0100)]
net: stmmac: dwmac-meson8b: make the clock configurations private

The common clock framework needs access to the "clock configuration"
structs during runtime.
However, only the common clock framework should access these. Ensure
this by moving the configuration structs out of struct meson8b_dwmac,
so only meson8b_init_rgmii_tx_clk() and the common clock framework know
about these configurations.

Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: dwmac-meson8b: only keep struct device around
Martin Blumenstingl [Sat, 17 Feb 2018 14:08:19 +0000 (15:08 +0100)]
net: stmmac: dwmac-meson8b: only keep struct device around

Nothing in the dwmac-meson8b driver (except .probe itself) requires the
platform_device anymore after .probe has finished. Replace it with a
pointer to struct device since this is what the functions inside the
driver are actually accessing.
No functional changes.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: dwmac-meson8b: simplify clock registration
Martin Blumenstingl [Sat, 17 Feb 2018 14:08:18 +0000 (15:08 +0100)]
net: stmmac: dwmac-meson8b: simplify clock registration

To goal of this patch is to simplify the registration of the RGMII TX
clock (and it's parent clocks). This is achieved by:
- introducing the meson8b_dwmac_register_clk helper-function to remove
  code duplication when registering a single clock (this saves a few
  lines since we have 4 clocks internally)
- using devm_add_action_or_reset to disable the RGMII TX clock
  automatically when needed. This also allows us to re-use the standard
  stmmac_pltfr_remove function.
- devm_kasprintf() and devm_kstrdup() are not used anymore to generate
  the clock name (these are replaced by a variable on the stack) because
  the common clock framework already uses kstrdup() internally.

No functional changes intended.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: hwtstamp: remove unnecessary range checking tests
Gustavo A. R. Silva [Fri, 16 Feb 2018 17:47:39 +0000 (11:47 -0600)]
net: dsa: mv88e6xxx: hwtstamp: remove unnecessary range checking tests

_port_ is already known to be a valid index in the callers [1]. So
these checks are unnecessary.

[1] https://lkml.org/lkml/2018/2/16/469

Addresses-Coverity-ID: 1465287
Addresses-Coverity-ID: 1465291
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoptr_ring: Remove now-redundant smp_read_barrier_depends()
Andrea Parri [Fri, 16 Feb 2018 11:06:13 +0000 (12:06 +0100)]
ptr_ring: Remove now-redundant smp_read_barrier_depends()

Because READ_ONCE() now implies smp_read_barrier_depends(), the
smp_read_barrier_depends() in __ptr_ring_consume() is redundant;
this commit removes it and updates the comments.

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <linux-kernel@vger.kernel.org>
Cc: <netdev@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotun: export flags, uid, gid, queue information over netlink
Sabrina Dubroca [Fri, 16 Feb 2018 10:03:07 +0000 (11:03 +0100)]
tun: export flags, uid, gid, queue information over netlink

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Only honor ifindex in IP_PKTINFO if non-0
David Ahern [Fri, 16 Feb 2018 19:03:03 +0000 (11:03 -0800)]
net: Only honor ifindex in IP_PKTINFO if non-0

Only allow ifindex from IP_PKTINFO to override SO_BINDTODEVICE settings
if the index is actually set in the message.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: avoid unintended sign extension on a 16 bit shift
Colin Ian King [Fri, 16 Feb 2018 16:55:05 +0000 (16:55 +0000)]
net: dsa: mv88e6xxx: avoid unintended sign extension on a 16 bit shift

The shifting of timehi by 16 bits to the left will be promoted to
a 32 bit signed int and then sign-extended to an u64. If the top bit
of timehi is set then all then all the upper bits of ns end up as also
being set because of the sign-extension. Fix this by making timehi and
timelo u64.  Also move the declaration of ns.

Detected by CoverityScan, CID#1465288 ("Unintended sign extension")

Fixes: c6fe0ad2c349 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoravb: add support for changing MTU
Niklas Söderlund [Fri, 16 Feb 2018 16:10:08 +0000 (17:10 +0100)]
ravb: add support for changing MTU

Allow for changing the MTU within the limit of the maximum size of a
descriptor (2048 bytes). Add the callback to change MTU from user-space
and take the configurable MTU into account when configuring the
hardware.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'nfp-whitespace-sync-and-flower-TCP-flags'
David S. Miller [Fri, 16 Feb 2018 21:24:24 +0000 (16:24 -0500)]
Merge branch 'nfp-whitespace-sync-and-flower-TCP-flags'

Jakub Kicinski says:

====================
nfp: whitespace sync and flower TCP flags

Whitespace cleanup from Michael and flower offload support for matching
on TCP flags from Pieter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: flower: implement tcp flag match offload
Pieter Jansen van Vuuren [Fri, 16 Feb 2018 04:19:09 +0000 (20:19 -0800)]
nfp: flower: implement tcp flag match offload

Implement tcp flag match offloading. Current tcp flag match support include
FIN, SYN, RST, PSH and URG flags, other flags are unsupported. The PSH and
URG flags are only set in the hardware fast path when used in combination
with the SYN, RST and PSH flags.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: standardize FW header whitespace
Michael Rapson [Fri, 16 Feb 2018 04:19:08 +0000 (20:19 -0800)]
nfp: standardize FW header whitespace

The nfp_net_ctrl.h file used spaces for indentation in the past but
tabs have crept in.  Host driver files use tabs for indentation by
default, so let's convert to tabs for consistency across the file
and our drivers.

Signed-off-by: Michael Rapson <michael.rapson@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-sched-act-add-extack-support'
David S. Miller [Fri, 16 Feb 2018 21:05:59 +0000 (16:05 -0500)]
Merge branch 'net-sched-act-add-extack-support'

Alexander Aring says:

====================
net: sched: act: add extack support

this patch series adds extack support for the TC action subsystem.
As example I for the extack support in a TC action I choosed mirred
action.

- Alex

Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- adapt recommended changes from Davide Caratti, please check if
  I catch everything. Thanks.

changes since v2:

- remove newline in extack of generic walker handling
  Thanks to Davide Caratti
- add kernel@mojatatu.com in cc
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: mirred: add extack support
Alexander Aring [Thu, 15 Feb 2018 15:55:00 +0000 (10:55 -0500)]
net: sched: act: mirred: add extack support

This patch adds extack support for TC mirred action.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: handle extack in tcf_generic_walker
Alexander Aring [Thu, 15 Feb 2018 15:54:59 +0000 (10:54 -0500)]
net: sched: act: handle extack in tcf_generic_walker

This patch adds extack handling for a common used TC act function
"tcf_generic_walker()" to add an extack message on failures.
The tcf_generic_walker() function can fail if get a invalid command
different than DEL and GET. The naming "action" here is wrong, the
correct naming would be command.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: add extack for walk callback
Alexander Aring [Thu, 15 Feb 2018 15:54:58 +0000 (10:54 -0500)]
net: sched: act: add extack for walk callback

This patch adds extack support for act walker callback api. This
prepares to handle extack support inside each specific act
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: add extack for lookup callback
Alexander Aring [Thu, 15 Feb 2018 15:54:57 +0000 (10:54 -0500)]
net: sched: act: add extack for lookup callback

This patch adds extack support for act lookup callback api. This
prepares to handle extack support inside each specific act
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: add extack to init callback
Alexander Aring [Thu, 15 Feb 2018 15:54:56 +0000 (10:54 -0500)]
net: sched: act: add extack to init callback

This patch adds extack support for act init callback api. This
prepares to handle extack support inside each specific act
implementation.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: handle generic action errors
Alexander Aring [Thu, 15 Feb 2018 15:54:55 +0000 (10:54 -0500)]
net: sched: act: handle generic action errors

This patch adds extack support for generic act handling. The extack
will be set deeper to each called function which is not part of netdev
core api.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: add extack to init
Alexander Aring [Thu, 15 Feb 2018 15:54:54 +0000 (10:54 -0500)]
net: sched: act: add extack to init

This patch adds extack to tcf_action_init and tcf_action_init_1
functions. These are necessary to make individual extack handling in
each act implementation.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: fix code style
Alexander Aring [Thu, 15 Feb 2018 15:54:53 +0000 (10:54 -0500)]
net: sched: act: fix code style

This patch is used by subsequent patches. It fixes code style issues
caught by checkpatch.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'RDS-zerocopy-support'
David S. Miller [Fri, 16 Feb 2018 21:04:18 +0000 (16:04 -0500)]
Merge branch 'RDS-zerocopy-support'

Sowmini Varadhan says:

====================
RDS: zerocopy support

This is version 3 of the series, following up on review comments for
 http://patchwork.ozlabs.org/project/netdev/list/?series=28530

Review comments addressed
Patch 4
  - fix fragile use of skb->cb[], do not set ee_code incorrectly.
Patch 5:
  - remove needless bzero of skb->cb[], consolidate err cleanup

A brief overview of this feature follows.

This patch series provides support for MSG_ZERCOCOPY
on a PF_RDS socket based on the APIs and infrastructure added
by Commit f214f915e7db ("tcp: enable MSG_ZEROCOPY")

For single threaded rds-stress testing using rds-tcp with the
ixgbe driver using 1M message sizes (-a 1M -q 1M) preliminary
results show that  there is a significant reduction in latency: about
90 usec with zerocopy, compared with 200 usec without zerocopy.

This patchset modifies the above for zerocopy in the following manner.
- if the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
- if the SO_ZEROCOPY  socket option has been set on the PF_RDS socket,
application pages sent down with rds_sendmsg are pinned. The pinning
uses the accounting infrastructure added by a91dbff551a6 ("sock: ulimit
on MSG_ZEROCOPY pages"). The message is unpinned when all references
to the message go down to 0, and the message is freed by rds_message_purge.

A multithreaded application using this infrastructure must send down
a unique 32 bit cookie as ancillary data with each sendmsg invocation.
The format of this ancillary data is described in Patch 5 of the series.
The cookie is passed up to the application on the sk_error_queue when
the message is unpinned, indicating to the application that it is now
safe to free/reuse the message buffer. The details of the completion
notification are provided in Patch 4 of this series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: add zerocopy support for PF_RDS test case
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:38 +0000 (10:49 -0800)]
selftests/net: add zerocopy support for PF_RDS test case

Send a cookie with sendmsg() on PF_RDS sockets, and process the
returned batched cookies in do_recv_completion()

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: add support for PF_RDS sockets
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:37 +0000 (10:49 -0800)]
selftests/net: add support for PF_RDS sockets

Add support for basic PF_RDS client-server testing in msg_zerocopy

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: zerocopy Tx support.
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:36 +0000 (10:49 -0800)]
rds: zerocopy Tx support.

If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
if the SO_ZEROCOPY socket option has been set on the PF_RDS socket,
application pages sent down with rds_sendmsg() are pinned.

The pinning uses the accounting infrastructure added by
Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages")

The payload bytes in the message may not be modified for the
duration that the message has been pinned. A multi-threaded
application using this infrastructure may thus need to be notified
about send-completion so that it can free/reuse the buffers
passed to rds_sendmsg(). Notification of send-completion will
identify each message-buffer by a cookie that the application
must specify as ancillary data to rds_sendmsg().
The ancillary data in this case has cmsg_level == SOL_RDS
and cmsg_type == RDS_CMSG_ZCOPY_COOKIE.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: support for zcopy completion notification
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:35 +0000 (10:49 -0800)]
rds: support for zcopy completion notification

RDS removes a datagram (rds_message) from the retransmit queue when
an ACK is received. The ACK indicates that the receiver has queued
the RDS datagram, so that the sender can safely forget the datagram.
When all references to the rds_message are quiesced, rds_message_purge
is called to release resources used by the rds_message

If the datagram to be removed had pinned pages set up, add
an entry to the rs->rs_znotify_queue so that the notifcation
will be sent up via rds_rm_zerocopy_callback() when the
rds_message is eventually freed by rds_message_purge.

rds_rm_zerocopy_callback() attempts to batch the number of cookies
sent with each notification  to a max of SO_EE_ORIGIN_MAX_ZCOOKIES.
This is achieved by checking the tail skb in the sk_error_queue:
if this has room for one more cookie, the cookie from the
current notification is added; else a new skb is added to the
sk_error_queue. Every invocation of rds_rm_zerocopy_callback() will
trigger a ->sk_error_report to notify the application.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosock: permit SO_ZEROCOPY on PF_RDS socket
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:34 +0000 (10:49 -0800)]
sock: permit SO_ZEROCOPY on PF_RDS socket

allow the application to set SO_ZEROCOPY on the underlying sk
of a PF_RDS socket

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: hold a sock ref from rds_message to the rds_sock
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:33 +0000 (10:49 -0800)]
rds: hold a sock ref from rds_message to the rds_sock

The existing model holds a reference from the rds_sock to the
rds_message, but the rds_message does not itself hold a sock_put()
on the rds_sock. Instead the m_rs field in the rds_message is
assigned when the message is queued on the sock, and nulled when
the message is dequeued from the sock.

We want to be able to notify userspace when the rds_message
is actually freed (from rds_message_purge(), after the refcounts
to the rds_message go to 0). At the time that rds_message_purge()
is called, the message is no longer on the rds_sock retransmit
queue. Thus the explicit reference for the m_rs is needed to
send a notification that will signal to userspace that
it is now safe to free/reuse any pages that may have
been pinned down for zerocopy.

This patch manages the m_rs assignment in the rds_message with
the necessary refcount book-keeping.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoskbuff: export mm_[un]account_pinned_pages for other modules
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:32 +0000 (10:49 -0800)]
skbuff: export mm_[un]account_pinned_pages for other modules

RDS would like to use the helper functions for managing pinned pages
added by Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Revert sched action extack support series.
David S. Miller [Fri, 16 Feb 2018 21:03:39 +0000 (16:03 -0500)]
net: Revert sched action extack support series.

It was mis-applied and the changes had rejects.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-sched-act-add-extack-support'
David S. Miller [Fri, 16 Feb 2018 20:44:42 +0000 (15:44 -0500)]
Merge branch 'net-sched-act-add-extack-support'

Alexander Aring says:

====================
net: sched: act: add extack support

this patch series adds extack support for the TC action subsystem.
As example I for the extack support in a TC action I choosed mirred
action.

- Alex

Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- adapt recommended changes from Davide Caratti, please check if
  I catch everything. Thanks.

changes since v2:

- remove newline in extack of generic walker handling
  Thanks to Davide Caratti
- add kernel@mojatatu.com in cc
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: add extack to init
Alexander Aring [Thu, 15 Feb 2018 15:54:54 +0000 (10:54 -0500)]
net: sched: act: add extack to init

This patch adds extack to tcf_action_init and tcf_action_init_1
functions. These are necessary to make individual extack handling in
each act implementation.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: act: fix code style
Alexander Aring [Thu, 15 Feb 2018 15:54:53 +0000 (10:54 -0500)]
net: sched: act: fix code style

This patch is used by subsequent patches. It fixes code style issues
caught by checkpatch.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: fix unbalance in the error path of tca_action_flush()
Davide Caratti [Thu, 15 Feb 2018 14:50:57 +0000 (15:50 +0100)]
net: sched: fix unbalance in the error path of tca_action_flush()

When tca_action_flush() calls the action walk() and gets an error,
a successful call to nla_nest_start() is not followed by a call to
nla_nest_cancel(). It's harmless, as the skb is freed in the error
path - but it's worth to fix this unbalance.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'dsa-mv88e6xxx-Improve-PTP-access-latency'
David S. Miller [Fri, 16 Feb 2018 20:37:10 +0000 (15:37 -0500)]
Merge branch 'dsa-mv88e6xxx-Improve-PTP-access-latency'

Andrew Lunn says:

====================
net: dsa: mv88e6xxx: Improve PTP access latency

PTP needs to retrieve the hardware timestamps from the switch device
in a low latency manor. However ethtool -S and bridge fdb show can
hold the switch register access mutex for a long time. These patches
changes the reading the statistics and the ATU so that the mutex is
released and taken again between each statistic or ATU entry. The PTP
code can then interleave its access to the hardware, keeping its
latency low.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Release mutex between each ATU read
Andrew Lunn [Thu, 15 Feb 2018 13:38:35 +0000 (14:38 +0100)]
net: dsa: mv88e6xxx: Release mutex between each ATU read

The PTP code needs low latency access to the PTP hardware timestamps.
Reading all the ATU entries in one go adds a lot of latency to the PTP
code. So take and release the reg_lock mutex for each individual MAC
address in the ATU, allowing the PTP thread jump in between.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Release mutex between each statistics read
Andrew Lunn [Thu, 15 Feb 2018 13:38:34 +0000 (14:38 +0100)]
net: dsa: mv88e6xxx: Release mutex between each statistics read

The PTP code needs low latency access to the PTP hardware timestamps.
Reading all the statistics in one go adds a lot of latency to the PTP
code. So take and release the reg_lock mutex for each individual
statistics, allowing the PTP thread jump in between.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tipc-de-generealize-topology-server'
David S. Miller [Fri, 16 Feb 2018 20:26:35 +0000 (15:26 -0500)]
Merge branch 'tipc-de-generealize-topology-server'

Jon Maloy says:

====================
tipc: de-generealize topology server

The topology server is partially based on a template that is much
more generic than what we need. This results in a code that is
unnecessarily hard to follow and keeping bug free.

We now take the consequence of the fact that we only have one such
server in TIPC, - with no prospects for introducing any more, and
adapt the code to the specialized task is really is doing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: rename tipc_server to tipc_topsrv
Jon Maloy [Thu, 15 Feb 2018 09:40:51 +0000 (10:40 +0100)]
tipc: rename tipc_server to tipc_topsrv

We rename struct tipc_server to struct tipc_topsrv. This reflect its now
specialized role as topology server. Accoringly, we change or add function
prefixes to make it clearer which functionality those belong to.

There are no functional changes in this commit.

Acked-by: Ying.Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: separate topology server listener socket from subcsriber sockets
Jon Maloy [Thu, 15 Feb 2018 09:40:50 +0000 (10:40 +0100)]
tipc: separate topology server listener socket from subcsriber sockets

We move the listener socket to struct tipc_server and give it its own
work item. This makes it easier to follow the code, and entails some
simplifications in the reception code in subscriber sockets.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: make struct tipc_server private for server.c
Jon Maloy [Thu, 15 Feb 2018 09:40:49 +0000 (10:40 +0100)]
tipc: make struct tipc_server private for server.c

In order to narrow the interface and dependencies between the topology
server and the subscription/binding table functionality we move struct
tipc_server inside the file server.c. This requires some code
adaptations in other files, but those are mostly minor.

The most important change is that we have to move the start/stop
functions for the topology server to server.c, where they logically
belong anyway.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: some prefix changes
Jon Maloy [Thu, 15 Feb 2018 09:40:48 +0000 (10:40 +0100)]
tipc: some prefix changes

Since we now have removed struct tipc_subscriber from the code, and
only struct tipc_subscription remains, there is no longer need for long
and awkward prefixes to distinguish between their pertaining functions.

We now change all tipc_subscrp_* prefixes to tipc_sub_*. This is
a purely cosmetic change.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: collapse subscription creation functions
Jon Maloy [Thu, 15 Feb 2018 09:40:47 +0000 (10:40 +0100)]
tipc: collapse subscription creation functions

After the previous changes it becomes logical to collapse the two-level
creation of subscription instances into one. We do that here.

We also rename the creation and deletion functions for more consistency.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: simplify endianness handling in topology subscriber
Jon Maloy [Thu, 15 Feb 2018 09:40:46 +0000 (10:40 +0100)]
tipc: simplify endianness handling in topology subscriber

Because of the requirement for total distribution transparency, users
send subscriptions and receive topology events in their own host format.
It is up to the topology server to determine this format and do the
correct conversions to and from its own host format when needed.

Until now, this has been handled in a rather non-transparent way inside
the topology server and subscriber code, leading to unnecessary
complexity when creating subscriptions and issuing events.

We now improve this situation by adding two new macros, tipc_sub_read()
and tipc_evt_write(). Both those functions calculate the need for
conversion internally before performing their respective operations.
Hence, all handling of such conversions become transparent to the rest
of the code.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: simplify interaction between subscription and topology connection
Jon Maloy [Thu, 15 Feb 2018 09:40:45 +0000 (10:40 +0100)]
tipc: simplify interaction between subscription and topology connection

The message transmission and reception in the topology server is more
generic than is currently necessary. By basing the funtionality on the
fact that we only send items of type struct tipc_event and always
receive items of struct tipc_subcr we can make several simplifications,
and also get rid of some unnecessary dynamic memory allocations.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: eliminate struct tipc_subscriber
Jon Maloy [Thu, 15 Feb 2018 09:40:44 +0000 (10:40 +0100)]
tipc: eliminate struct tipc_subscriber

It is unnecessary to keep two structures, struct tipc_conn and struct
tipc_subscriber, with a one-to-one relationship and still with different
life cycles. The fact that the two often run in different contexts, and
still may access each other via direct pointers constitutes an additional
hazard, something we have experienced at several occasions, and still
see happening.

We have identified at least two remaining problems that are easier to
fix if we simplify the topology server data structure somewhat.

- When there is a race between a subscription up/down event and a
  timeout event, it is fully possible that the former might be delivered
  after the latter, leading to confusion for the receiver.

- The function tipc_subcrp_timeout() is executing in interrupt context,
  while the following call chain is at least theoretically possible:
  tipc_subscrp_timeout()
    tipc_subscrp_send_event()
      tipc_conn_sendmsg()
        conn_put()
          tipc_conn_kref_release()
            sock_release(sock)

I.e., we end up calling a function that might try to sleep in
interrupt context. To eliminate this, we need to ensure that the
tipc_conn structure and the socket, as well as the subscription
instances, only are deleted in work queue context, i.e., after the
timeout event really has been sent out.

We now remove this unnecessary complexity, by merging data and
functionality of the subscriber structure into struct tipc_conn
and the associated file server.c. We thereafter add a spinlock and
a new 'inactive' state to the subscription structure. Using those,
both problems described above can be easily solved.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: remove unnecessary function pointers
Jon Maloy [Thu, 15 Feb 2018 09:40:43 +0000 (10:40 +0100)]
tipc: remove unnecessary function pointers

Interaction between the functionality in server.c and subscr.c is
done via function pointers installed in struct server. This makes
the code harder to follow, and doesn't serve any obvious purpose.

Here, we replace the function pointers with direct function calls.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: remove redundant code in topology server
Jon Maloy [Thu, 15 Feb 2018 09:40:42 +0000 (10:40 +0100)]
tipc: remove redundant code in topology server

The socket handling in the topology server is unnecessarily generic.
It is prepared to handle both SOCK_RDM, SOCK_DGRAM and SOCK_STREAM
type sockets, as well as the only socket type which is really used,
SOCK_SEQPACKET.

We now remove this redundant code to make the code more readable.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Thu, 15 Feb 2018 20:43:49 +0000 (15:43 -0500)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2018-02-15

Here's the first bluetooth-next pull request targetting the 4.17 kernel
release.

 - Fixes & cleanups to Atheros and Marvell drivers
 - Support for two new Realtek controllers
 - Support for new Intel Bluetooth controller
 - Fix for supporting multiple slave-role Bluetooth LE connections
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: fixes psock_fanout eBPF test case
Prashant Bhole [Thu, 15 Feb 2018 00:19:26 +0000 (09:19 +0900)]
selftests/net: fixes psock_fanout eBPF test case

eBPF test fails due to verifier failure because log_buf is too small.
Fixed by increasing log_buf size

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv4: Remove fib table id from rtable
David Ahern [Wed, 14 Feb 2018 22:24:28 +0000 (14:24 -0800)]
net/ipv4: Remove fib table id from rtable

Remove rt_table_id from rtable. It was added for getroute to return the
table id that was hit in the lookup. With the changes for fibmatch the
table id can be extracted from the fib_info returned in the fib_result
so it no longer needs to be in rtable directly.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tc-testing-plugin-architecture'
David S. Miller [Thu, 15 Feb 2018 20:38:33 +0000 (15:38 -0500)]
Merge branch 'tc-testing-plugin-architecture'

Brenda J. Butler says:

====================
tools: tc-testing: Plugin Architecture

To make tdc.py more general, we are introducing a plugin architecture.

This patch set first organizes the command line parameters, then
introduces the plugin architecture and some example plugins.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotools: tc-testing: Update README and TODO
Brenda J. Butler [Wed, 14 Feb 2018 19:09:25 +0000 (14:09 -0500)]
tools: tc-testing: Update README and TODO

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotools: tc-testing: valgrindPlugin
Brenda J. Butler [Wed, 14 Feb 2018 19:09:24 +0000 (14:09 -0500)]
tools: tc-testing: valgrindPlugin

Run the command under test under valgrind.  Produce an extra set of
tap output for the memory check on each test.

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotools: tc-testing: nsPlugin
Brenda J. Butler [Wed, 14 Feb 2018 19:09:23 +0000 (14:09 -0500)]
tools: tc-testing: nsPlugin

Move the functionality of creating a namespace before the test suite
and destroying it afterwards to a plugin.

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotools: tc-testing: rootPlugin
Brenda J. Butler [Wed, 14 Feb 2018 19:09:22 +0000 (14:09 -0500)]
tools: tc-testing: rootPlugin

Move the functionality that checks for root permissions into a plugin.

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotools: tc-testing: Introduce plugin architecture
Brenda J. Butler [Wed, 14 Feb 2018 19:09:21 +0000 (14:09 -0500)]
tools: tc-testing: Introduce plugin architecture

This should be a general test architecture, and yet allow specific
tests to be done.  Introduce a plugin architecture.

An individual test has 4 stages, setup/execute/verify/teardown.  Each
plugin gets a chance to run a function at each stage, plus one call
before all the tests are called ("pre" suite) and one after all the
tests are called ("post" suite).  In addition, just before each
command is executed, the plugin gets a chance to modify the command
using the "adjust_command" hook.  This makes the test suite quite
flexible.

Future patches will take some functionality out of the tdc.py script and
place it in plugins.

To use the plugins, place the implementation in the plugins directory
and run tdc.py.  It will notice the plugins and use them.

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotools: tc-testing: Refactor test-runner
Brenda J. Butler [Wed, 14 Feb 2018 19:09:20 +0000 (14:09 -0500)]
tools: tc-testing: Refactor test-runner

Split the test_runner function into the loop part (test_runner)
and the contents (run_one_test) for maintainability.
It makes it a little easier to catch exceptions
in an individual test, and keep going (and flush a bunch
of tap results for the skipped tests).

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotools: tc-testing: Command line parms
Brenda J. Butler [Wed, 14 Feb 2018 19:09:19 +0000 (14:09 -0500)]
tools: tc-testing: Command line parms

Separate the functionality of the command line parameters into "selection"
parameters, "action" parameters and other parameters.

"Selection" parameters are for choosing which tests on which to act.
"Action" parameters are for choosing what to do with the selected tests.
"Other" parameters are for global effect (like "help" or "verbose").

With this commit, we add the ability to name a directory as another
selection mechanism.  We can accumulate a number of tests by directory,
file, category, or even by test id, instead of being constrained to
run all tests in one collection or just one test.

Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tunchr-get-netns'
David S. Miller [Thu, 15 Feb 2018 20:34:42 +0000 (15:34 -0500)]
Merge branch 'tunchr-get-netns'

Kirill Tkhai says:

====================
net: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device

Currently, it's not possible to get or check net namespace,
which was used to create tun socket. User may have two tun
devices with the same names in different nets, and there
is no way to differ them each other.

The patchset adds support for ioctl() cmd SIOCGSKNS for tun
devices. It will allow people to obtain net namespace file
descriptor like we allow to do that for sockets in general.

v2: Add new patch [2/3] to export open_related_ns().
====================

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
Kirill Tkhai [Wed, 14 Feb 2018 13:40:14 +0000 (16:40 +0300)]
tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device

This patch adds possibility to get tun device's net namespace fd
in the same way we allow to do that for sockets.

Socket ioctl numbers do not intersect with tun-specific, and there
is already SIOCSIFHWADDR used in tun code. So, SIOCGSKNS number
is choosen instead of custom-made for this functionality.

Note, that open_related_ns() uses plain get_net_ns() and it's safe
(net can't be already dead at this moment):

  tun socket is allocated via sk_alloc() with zero last arg (kern = 0).
  So, each alive socket increments net::count, and the socket is definitely
  alive during ioctl syscall.

Also, common variable net is introduced, so small cleanup in TUNSETIFF
is made.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Export open_related_ns()
Kirill Tkhai [Wed, 14 Feb 2018 13:40:05 +0000 (16:40 +0300)]
net: Export open_related_ns()

This function will be used to obtain net of tun device.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Make extern and export get_net_ns()
Kirill Tkhai [Wed, 14 Feb 2018 13:39:56 +0000 (16:39 +0300)]
net: Make extern and export get_net_ns()

This function will be used to obtain net of tun device.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 14 Feb 2018 20:44:05 +0000 (15:44 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-02-14

This patch series enables the new mqprio hardware offload mechanism
creating traffic classes on VFs for XL710 devices. The parameters
needed to configure these traffic classes/queue channels are provides
by the user via the tc tool. A maximum of four traffic classes can be
created on each VF. This patch series also enables application of cloud
filters to each of these traffic classes. The cloud filters are applied
using the tc-flower classifier.

Example:
    1. tc qdisc add dev vf0 root mqprio num_tc 4 map 0 0 0 0 1 2 2 3\
        queues 2@0 2@2 1@4 1@5 hw 1 mode channel
    2. tc qdisc add dev vf0 ingress
    3. ethtool -K vf0 hw-tc-offload on
    4. ip link set eth0 vf 0 spoofchk off
    5. tc filter add dev vf0 protocol ip parent ffff: prio 1 flower dst_ip\
        192.168.3.5/32 ip_proto udp dst_port 25 skip_sw hw_tc 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agokcm: Call strp_stop before strp_done in kcm_attach
Tom Herbert [Wed, 14 Feb 2018 17:22:42 +0000 (09:22 -0800)]
kcm: Call strp_stop before strp_done in kcm_attach

In kcm_attach strp_done is called when sk_user_data is already
set to fail the attach. strp_done needs the strp to be stopped and
warns if it isn't. Call strp_stop in this case to eliminate the
warning message.

Reported-by: syzbot+88dfb55e4c8b770d86e3@syzkaller.appspotmail.com
Fixes: e5571240236c5652f ("kcm: Check if sk_user_data already set in kcm_attach"
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: dp83867: Add documentation for CLK_OUT pin muxing
Wadim Egorov [Wed, 14 Feb 2018 16:07:12 +0000 (17:07 +0100)]
net: phy: dp83867: Add documentation for CLK_OUT pin muxing

Add documentation of ti,clk-output-sel which can be used to select
a specific clock for CLK_OUT.

Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Daniel Schultz <d.schultz@phytec.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: dp83867: Add binding for the CLK_OUT pin muxing option
Wadim Egorov [Wed, 14 Feb 2018 16:07:11 +0000 (17:07 +0100)]
net: phy: dp83867: Add binding for the CLK_OUT pin muxing option

The DP83867 has a muxing option for the CLK_OUT pin. It is possible
to set CLK_OUT for different channels.
Create a binding to select a specific clock for CLK_OUT pin.

Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Daniel Schultz <d.schultz@phytec.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: apply bearer link tolerance on running links
Jon Maloy [Wed, 14 Feb 2018 12:34:39 +0000 (13:34 +0100)]
tipc: apply bearer link tolerance on running links

Currently, the default link tolerance set in struct tipc_bearer only
has effect on links going up after that moment. I.e., a user has to
reset all the node's links across that bearer to have the new value
applied. This is too limiting and disturbing on a running cluster to
be useful.

We now change this so that also already existing links are updated
dynamically, without any need for a reset, when the bearer value is
changed. We leverage the already existing per-link functionality
for this to achieve the wanted effect.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'cxgb4-speed-up-reading-on-chip-memory'
David S. Miller [Wed, 14 Feb 2018 20:01:52 +0000 (15:01 -0500)]
Merge branch 'cxgb4-speed-up-reading-on-chip-memory'

Rahul Lakkireddy says:

====================
cxgb4: speed up reading on-chip memory

This series of patches speed up reading on-chip memory (EDC and MC)
by reading 64-bits at a time.

Patch 1 reworks logic to read EDC and MC.

Patch 2 adds logic to read EDC and MC 64-bits at a time.

v2:
- Dropped AVX CPU intrinsic instructions.
- Use readq() to read 64-bits at a time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: speed up on-chip memory read
Rahul Lakkireddy [Wed, 14 Feb 2018 07:26:28 +0000 (12:56 +0530)]
cxgb4: speed up on-chip memory read

Use readq() (via t4_read_reg64()) to read 64-bits at a time.
Read residual in 32-bit multiples.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: rework on-chip memory read
Rahul Lakkireddy [Wed, 14 Feb 2018 07:26:27 +0000 (12:56 +0530)]
cxgb4: rework on-chip memory read

Rework logic to read EDC and MC. Do 32-bit reads at a time.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Move ipv4 set_lwt_redirect helper to lwtunnel
David Ahern [Wed, 14 Feb 2018 04:32:04 +0000 (20:32 -0800)]
net: Move ipv4 set_lwt_redirect helper to lwtunnel

IPv4 uses set_lwt_redirect to set the lwtunnel redirect functions as
needed. Move it to lwtunnel.h as lwtunnel_set_redirect and change
IPv6 to also use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'PTP-support-for-DSA-and-mv88e6xxx-driver'
David S. Miller [Wed, 14 Feb 2018 19:33:38 +0000 (14:33 -0500)]
Merge branch 'PTP-support-for-DSA-and-mv88e6xxx-driver'

Andrew Lunn says:

====================
PTP support for DSA and mv88e6xxx driver.

This patchset adds support for using the PTP hardware in switches
supported by the mv88e6xxx driver. The code was produces in
collaboration with Brandon Streiff doing the initial implementation,
and then Richard Cochran and Andrew Lunn making further changes and
cleanups.

The code is sufficient to use ptp4l on a single DSA interface, either
as a master or a slave. Due to the use of an MDIO bus to access the
switch, reading hardware timestamps is slower than what ptp4l
expects. Thus it is necessary to use the option
--tx_timestamp_timeout=32. Heavy use of ethtool -S, or bridge fdb show
can also upset ptp4l. Patches to address this will follow.

Further work is requires to support bridges using Boundary Clock or
Transparent Clock mode.

Since the RFC, an overflow bug has been fixed. Brandon Streiff
has also Acked-by: the updates to his initial patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: add workaround for 6341 timestamping
Brandon Streiff [Wed, 14 Feb 2018 00:07:51 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add workaround for 6341 timestamping

88E6341 devices default to timestamping at the PHY, but due to a
hardware issue, timestamps via this component are unreliable. For
this family, configure the PTP hardware to force the timestamping
to occur at the MAC.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: add rx/tx timestamping support
Brandon Streiff [Wed, 14 Feb 2018 00:07:50 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add rx/tx timestamping support

This patch implements RX/TX timestamping support.

The Marvell PTP hardware supports RX timestamping individual message
types, but for simplicity we only support the EVENT receive filter since
few if any clients bother with the more specific filter types.

checkpatch and reverse Christmas tree changes by Andrew Lunn.

Re-factor duplicated code paths and avoid IfOk anti-pattern, use the
common ptp worker thread from the class layer and time stamp UDP/IPv4
frames as well as Layer-2 frame by Richard Cochran.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: forward timestamping callbacks to switch drivers
Brandon Streiff [Wed, 14 Feb 2018 00:07:49 +0000 (01:07 +0100)]
net: dsa: forward timestamping callbacks to switch drivers

Forward the rx/tx timestamp machinery from the dsa infrastructure to the
switch driver.

On the rx side, defer delivery of skbs until we have an rx timestamp.
This mimicks the behavior of skb_defer_rx_timestamp.

On the tx side, identify PTP packets, clone them, and pass them to the
underlying switch driver before we transmit. This mimicks the behavior
of skb_tx_timestamp.

Adjusted txstamp API to keep the allocation and freeing of the clone
in the same central function by Richard Cochran

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: forward hardware timestamping ioctls to switch driver
Brandon Streiff [Wed, 14 Feb 2018 00:07:48 +0000 (01:07 +0100)]
net: dsa: forward hardware timestamping ioctls to switch driver

This patch adds support to the dsa slave network device so that
switch drivers can implement the SIOC[GS]HWTSTAMP ioctls and the
ethtool timestamp-info interface.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: add support for event capture
Brandon Streiff [Wed, 14 Feb 2018 00:07:47 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add support for event capture

This patch adds support for configuring mv88e6xxx GPIO lines as PTP
pins, so that they may be used for time stamping external events or for
periodic output.

Checkpatch and reverse Christmas tree fixes by Andrew Lunn

Periodic output removed by Richard Cochran, until a better abstraction
of a VCO is added to Linux in general.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: add support for GPIO configuration
Brandon Streiff [Wed, 14 Feb 2018 00:07:46 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add support for GPIO configuration

MV88E6352 and later switches support GPIO control through the "Scratch
& Misc" global2 register. (Older switches do too, though with a slightly
different register interface. Only the 6352-style is implemented here.)

Add a new file, global2_scratch.c, for operations in the Scratch & Misc
space. Additionally, add a GPIO operations structure to present an
abstract view over GPIO manipulation.

Reverse Christmas tree and unsigned has been replaced with unsigned
int by Andrew Lunn.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: expose switch time as a PTP hardware clock
Brandon Streiff [Wed, 14 Feb 2018 00:07:45 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: expose switch time as a PTP hardware clock

This patch adds basic support for exposing the 32-bit timestamp counter
inside the mv88e6xxx switch as a ptp_clock.

Adjfine implemented by Richard Cochran.
Andrew Lunn: fix return value of PTP stub function.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: add accessors for PTP/TAI registers
Brandon Streiff [Wed, 14 Feb 2018 00:07:44 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add accessors for PTP/TAI registers

This patch implements support for accessing the Precision Time Protocol
and Time Application Interface registers via the AVB register interface
in the Global 2 register.

The register interface differs slightly between different models; older
models use a 3-bit operations field, while newer models use a 2-bit
field. The operations values and the special "global port" values are
different between the two. This is a similar split to the differences
in the "Ingress Rate" register between models, so, like in that case,
we call the two variants "6352" and "6390" and create an ops structure
to abstract between the two.

checkpatch fixups by Andrew Lunn

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: export g2 register accessors
Brandon Streiff [Wed, 14 Feb 2018 00:07:43 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: export g2 register accessors

Let the mv88e6xxx_g2_* register accessor functions be accessible
outside of global2.c.

Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ptp: Add stub for ptp_classify_raw()
Andrew Lunn [Wed, 14 Feb 2018 00:07:42 +0000 (01:07 +0100)]
net: ptp: Add stub for ptp_classify_raw()

When NET_PTP_CLASSIFY is disabled, a stub function is required in
order that the drivers compile.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: try to keep packet if SYN_RCV race is lost
Eric Dumazet [Tue, 13 Feb 2018 14:14:12 +0000 (06:14 -0800)]
tcp: try to keep packet if SYN_RCV race is lost

배석진 reported that in some situations, packets for a given 5-tuple
end up being processed by different CPUS.

This involves RPS, and fragmentation.

배석진 is seeing packet drops when a SYN_RECV request socket is
moved into ESTABLISH state. Other states are protected by socket lock.

This is caused by a CPU losing the race, and simply not caring enough.

Since this seems to occur frequently, we can do better and perform
a second lookup.

Note that all needed memory barriers are already in the existing code,
thanks to the spin_lock()/spin_unlock() pair in inet_ehash_insert()
and reqsk_put(). The second lookup must find the new socket,
unless it has already been accepted and closed by another cpu.

Note that the fragmentation could be avoided in the first place by
use of a correct TCP MSS option in the SYN{ACK} packet, but this
does not mean we can not be more robust.

Many thanks to 배석진 for a very detailed analysis.

Reported-by: 배석진 <soukjin.bae@samsung.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoi40e: Add and delete cloud filter
Avinash Dayanand [Tue, 23 Jan 2018 16:51:06 +0000 (08:51 -0800)]
i40e: Add and delete cloud filter

This patch provides support to add or delete cloud filter for queue
channels created for ADq on VF.
We are using the HW's cloud filter feature and programming it to act
as a TC filter applied to a group of queues.

There are two possible modes for a VF when applying a cloud filter
1. Basic Mode: Intended to apply filters that don't need a VF to be
Trusted. This would include the following
  Dest MAC + L4 port
  Dest MAC + VLAN + L4 port
2. Advanced Mode: This mode is only for filters with combination that
  requires VF to be Trusted.
  Dest IP + L4 port

When cloud filters are applied on a trusted VF and for some reason
the same VF is later made as untrusted then all cloud filters
will be deleted. All cloud filters has to be re-applied in
such a case.
Cloud filters are also deleted when queue channel is deleted.

Testing-Hints:
=============
1. Adding Basic Mode filter should be possible on a VF in
   Non-Trusted mode.
2. In Advanced mode all filters should be able to be created.

Steps:
======
1. Enable ADq and create TCs using TC mqprio command
2. Apply cloud filter.
3. Turn-off the spoof check.
4. Pass traffic.

Example:
========
1. tc qdisc add dev enp4s2 root mqprio num_tc 4 map 0 0 0 0 1 2 2 3\
queues 2@0 2@2 1@4 1@5 hw 1 mode channel
2. tc qdisc add dev enp4s2 ingress
3. ethtool -K enp4s2 hw-tc-offload on
4. ip link set ens261f0 vf 0 spoofchk off
5. tc filter add dev enp4s2 protocol ip parent ffff: prio 1 flower\
dst_ip 192.168.3.5/32 ip_proto udp dst_port 25 skip_sw hw_tc 2

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: Add support to apply cloud filters
Harshitha Ramamurthy [Tue, 23 Jan 2018 16:51:05 +0000 (08:51 -0800)]
i40evf: Add support to apply cloud filters

This patch enables a tc filter to be applied as a cloud
filter for the VF. This patch adds functions which parse the
tc filter, extract the necessary fields needed to configure the
filter and package them in a virtchnl message to be sent to the
PF to apply them.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agovirtchnl: Add filter data structures
Harshitha Ramamurthy [Tue, 23 Jan 2018 16:51:04 +0000 (08:51 -0800)]
virtchnl: Add filter data structures

This patch adds infrastructure to send virtchnl messages to the
PF to configure filters on the VF. The patch adds a struct
called virtchnl_filter which contains information about the fields
in the user-specified tc filter.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agovirtchnl: Add a macro to check the size of a union
Harshitha Ramamurthy [Tue, 23 Jan 2018 16:51:03 +0000 (08:51 -0800)]
virtchnl: Add a macro to check the size of a union

This patch adds a macro to check if the size of a union is correct.
It throws a divide by zero error if the union is not of the correct
size.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: Service request to configure bandwidth for ADq on a VF
Avinash Dayanand [Tue, 23 Jan 2018 16:51:02 +0000 (08:51 -0800)]
i40e: Service request to configure bandwidth for ADq on a VF

This patch handles the request from ADq enabled VF to allocate
bandwidth to each traffic class which means for each VSI.

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>