platform/kernel/linux-exynos.git
David S. Miller [Mon, 22 May 2017 16:12:21 +0000 (12:12 -0400)]
Merge branch 'netlink-extack-route-add-del'

David Ahern says:

====================
net: Add extack for route add/delete failures

Use the extack feature to improve error messages to user on route
add and delete failures.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:05 +0000 (10:12 -0600)]
net: ipv6: Add extack messages for route add failures

Add messages for non-obvious errors (e.g., no need to add text for malloc
failures or ENODEV failures). This mostly covers the annoying EINVAL errors.
Some message strings violate the 80-column limit, but searchable strings need
to trump that rule.
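
The pattern described above can be sketched in userspace C. This is only an
illustration: struct extack, SET_ERR_MSG(), struct route_cfg and route_add()
are hypothetical stand-ins, not the kernel's own types or macros, and the
message strings are made up.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an extended-ack object carried with a request. */
struct extack {
	const char *msg;
};

/* Record a message only when the caller supplied an extack. */
#define SET_ERR_MSG(extack, text)		\
	do {					\
		if (extack)			\
			(extack)->msg = (text);	\
	} while (0)

/* Hypothetical route request, reduced to two fields. */
struct route_cfg {
	unsigned int prefix_len;
	unsigned int table_id;
};

/* Each -EINVAL path gets its own single-line, searchable string. */
int route_add(const struct route_cfg *cfg, struct extack *extack)
{
	if (cfg->prefix_len > 32) {
		SET_ERR_MSG(extack, "Invalid prefix length");
		return -EINVAL;
	}
	if (cfg->table_id == 0) {
		SET_ERR_MSG(extack, "Invalid table id");
		return -EINVAL;
	}
	return 0;
}
```

A caller that passes NULL for extack still gets the plain error code, so
plumbing the extra argument through does not change existing behaviour.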

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:04 +0000 (10:12 -0600)]
net: ipv6: Plumb extack through route add functions

Plumb extack argument down to route add functions.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:03 +0000 (10:12 -0600)]
net: ipv4: Add extack messages for route add failures

Add messages for non-obvious errors (e.g., no need to add text for malloc
failures or ENODEV failures). This mostly covers the annoying EINVAL errors.
Some message strings violate the 80-column limit, but searchable strings need
to trump that rule.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 May 2017 16:12:02 +0000 (10:12 -0600)]
net: ipv4: Plumb extack through route add functions

Plumb extack argument down to route add functions.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Girish Moodalbail [Fri, 19 May 2017 22:25:44 +0000 (15:25 -0700)]
macsec: double accounting of dropped rx/tx packets

The macsec implementation shouldn't account for rx/tx packets that are
dropped in the netdev framework. The netdev framework itself accounts
for such packets by atomically updating the rx_dropped and tx_dropped
fields of struct net_device. Later on, when the stats for a macsec
link are retrieved, the packets dropped in the netdev framework are
included by dev_get_stats() after it calls macsec_get_stats64() in macsec.c.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 May 2017 14:26:24 +0000 (10:26 -0400)]
net: Fix parisc SCM_TIMESTAMPING_PKTINFO value.

Needs to follow the existing sequence.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 May 2017 03:13:37 +0000 (23:13 -0400)]
net: Define SCM_TIMESTAMPING_PKTINFO on all architectures.

A definition was provided only for platforms using
asm-generic/socket.h; define it for the others as well.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 21 May 2017 17:39:00 +0000 (10:39 -0700)]
tcp: fix tcp_probe_timer() for TCP_USER_TIMEOUT

TCP_USER_TIMEOUT is still converted to a jiffies value in
icsk_user_timeout.

So we need to make a conversion for the cases where HZ != 1000.
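
A minimal sketch of why the conversion matters, assuming the elapsed time is
measured in milliseconds (as the 1ms clock in the Fixes: commit suggests)
while the timeout is stored in jiffies. jiffies_to_ms() and
user_timeout_expired() are hypothetical userspace helpers, and hz is passed
explicitly only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a tick count to milliseconds for a given tick rate (hz > 0). */
uint32_t jiffies_to_ms(uint32_t jiffies, uint32_t hz)
{
	return (uint32_t)(((uint64_t)jiffies * 1000u) / hz);
}

/* Compare like with like: both sides of the test are in milliseconds. */
bool user_timeout_expired(uint32_t elapsed_ms, uint32_t user_timeout_jiffies,
			  uint32_t hz)
{
	if (user_timeout_jiffies == 0)
		return false;	/* no user timeout configured */
	return elapsed_ms >= jiffies_to_ms(user_timeout_jiffies, hz);
}
```

With hz = 250, a 30 second timeout is stored as 7500 jiffies. Comparing 10000
elapsed milliseconds against the raw value 7500 would fire the timeout after
10 seconds; after conversion the comparison is against 30000.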

Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:55 +0000 (09:55 -0700)]
ipv6: drop unused variables in seg6_genl_dumphac

The seg6_pernet_data variable was set but never used.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:54 +0000 (09:55 -0700)]
fou: make local function static

The build header functions are not used by any other code.

net/ipv6/fou6.c:36:5: warning: no previous prototype for ‘fou6_build_header’ [-Wmissing-prototypes]
net/ipv6/fou6.c:54:5: warning: no previous prototype for ‘gue6_build_header’ [-Wmissing-prototypes]

Need to do some code rearranging to satisfy different Kconfig possibilities.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:52 +0000 (09:55 -0700)]
tcpnv: do not export local function

The TCP New Vegas congestion control was exporting an internal
function, tcpnv_get_info, which is not used by any other in-tree
kernel code. Make it static.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:51 +0000 (09:55 -0700)]
inet: fix warning about missing prototype

The prototype for inet_rcv_saddr_equal was not being included.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:49 +0000 (09:55 -0700)]
ila: propagate error code in ila_output

This warning:
net/ipv6/ila/ila_lwt.c: In function ‘ila_output’:
net/ipv6/ila/ila_lwt.c:42:6: warning: variable ‘err’ set but not used [-Wunused-but-set-variable]

It looks like the code attempts to propagate different error
values, but always returns -EINVAL.
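
The shape of such a fix can be sketched as follows. output_fixed() and its
two boolean inputs are hypothetical, and -EHOSTUNREACH is only an
illustrative second error value; the commit message does not say which codes
the real function sets.

```c
#include <errno.h>
#include <stdbool.h>

/* Return the specific error recorded on each failure path. */
int output_fixed(bool is_ipv6, bool route_found)
{
	int err = -EINVAL;

	if (!is_ipv6)
		goto drop;

	if (!route_found) {
		err = -EHOSTUNREACH;
		goto drop;
	}
	return 0;

drop:
	/*
	 * The pre-fix shape would end with a hard-coded 'return -EINVAL;'
	 * here, which is what makes err "set but not used".
	 */
	return err;
}
```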

Compile tested only. Needs review by original author.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 19 May 2017 16:55:48 +0000 (09:55 -0700)]
dcb: enforce minimum length on IEEE_APPS attribute

Found by reviewing the warning about an unused policy table.
The code implies that it meant to check the attribute size, but since
the attribute validation loop was unrolled, the policy table is never used.
Instead, do an explicit check on the attribute length.
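
An explicit minimum-length check of this kind might look like the sketch
below. struct attr, struct app_entry and parse_app() are hypothetical
userspace stand-ins, and the -ERANGE return value is an assumption made for
the example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute: a payload pointer plus its length in bytes. */
struct attr {
	uint16_t len;
	const void *data;
};

/* Hypothetical fixed-size payload the handler expects. */
struct app_entry {
	uint8_t selector;
	uint8_t priority;
	uint16_t protocol;
};

/* Refuse to copy from a payload shorter than the structure being read. */
int parse_app(const struct attr *a, struct app_entry *out)
{
	if (a->len < sizeof(*out))
		return -ERANGE;

	memcpy(out, a->data, sizeof(*out));
	return 0;
}
```

Without the length test, a truncated attribute would cause a read past the
end of the payload.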

Compile tested only. Needs review by original author.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 May 2017 17:37:35 +0000 (13:37 -0400)]
Merge branch 'net-extend-socket-timestamping-API'

Miroslav Lichvar says:

====================
Extend socket timestamping API

Changes v5->v6:
- fixed skb_is_swtx_tstamp() when OPT_TX_SWHW is disabled and improved
  its description
- improved OPT_PKTINFO documentation
- improved scm_timestamping documentation

Changes v4->v5:
- fixed initialization of reserved fields in struct scm_ts_pktinfo

Changes v3->v4:
- added reserved fields to struct scm_ts_pktinfo
- replaced patch fixing false SW timestamps with a documentation fix
- updated OPT_TX_SWHW patch to handle false SW timestamps

Changes v2->v3:
- modified struct scm_ts_pktinfo to use fixed-width integer types
- added WARN_ON_ONCE for missing RCU lock in dev_get_by_napi_id()
- modified dev_get_by_napi_id() to not return dev in unexpected branch
- modified recv to return SCM_TIMESTAMPING_PKTINFO even if the interface
  index is unknown

Changes v1->v2:
- added separate patch for new NAPI functions
- split code from __sock_recv_timestamp() for better readability
- fixed RCU locking
- fixed compiler warning (missing case in switch in first patch)
- inline sw_tx_timestamp() in its only user

Changes RFC->v1:
- reworked SOF_TIMESTAMPING_OPT_PKTINFO patch to not add new fields to
  skb shared info (net device is now looked up by napi_id), not require
  any changes in drivers, and restrict the cmsg to incoming packets
- renamed SOF_TIMESTAMPING_OPT_MULTIMSG to SOF_TIMESTAMPING_OPT_TX_SWHW
  and fixed its description
- moved struct scm_ts_pktinfo from errqueue.h to net_tstamp.h as it
  can't be received from the error queue anymore
- improved commit descriptions and removed incorrect comment

This patchset adds new options to the timestamping API that will be
useful for NTP implementations and possibly other applications.

The first patch specifies a timestamp filter for NTP packets. The second
patch updates drivers that can timestamp all packets, or need to list
the filter as unsupported. There is no attempt to add the support to the
phyter driver.

The third patch adds two helper functions working with NAPI ID, which is
needed by the next patch. The fourth patch adds a new option to get a
new control message with the L2 length and interface index for incoming
packets with hardware timestamps.

The fifth patch fixes documentation on number of non-zero fields in
scm_timestamping and warns about false software timestamps when
SO_TIMESTAMP(NS) is combined with SCM_TIMESTAMPING.

The sixth patch adds a new option to request both software and hardware
timestamps for outgoing packets. The seventh patch updates drivers that
assumed software timestamping cannot be used together with hardware
timestamping.

The patches have been tested on x86_64 machines with igb and e1000e
drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:41 +0000 (17:52 +0200)]
net: ethernet: update drivers to make both SW and HW TX timestamps

Some drivers were calling the skb_tx_timestamp() function only when
a hardware timestamp was not requested. Now that applications can use
the SOF_TIMESTAMPING_OPT_TX_SWHW option to request both software and
hardware timestamps, the drivers need to be modified to unconditionally
call skb_tx_timestamp().

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:40 +0000 (17:52 +0200)]
net: allow simultaneous SW and HW transmit timestamping

Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
be looped to the socket's error queue with a software timestamp even
when a hardware transmit timestamp is expected to be provided by the
driver.

Applications using this option will receive two separate messages from
the error queue, one with a software timestamp and the other with a
hardware timestamp. As the hardware timestamp is saved to the shared skb
info, which may happen before the first message with software timestamp
is received by the application, the hardware timestamp is copied to the
SCM_TIMESTAMPING control message only when the skb has no software
timestamp or it is an incoming packet.
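
From userspace, requesting both timestamps could look like the sketch below.
It assumes kernel headers new enough to define SOF_TIMESTAMPING_OPT_TX_SWHW
in linux/net_tstamp.h; tx_swhw_flags() and enable_tx_swhw() are hypothetical
helper names, and the particular combination of reporting flags is one
plausible choice rather than a required one.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* Flag mask asking for software and hardware TX timestamps together. */
unsigned int tx_swhw_flags(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_TX_HARDWARE |
	       SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_RAW_HARDWARE |
	       SOF_TIMESTAMPING_OPT_TX_SWHW;
}

/* Returns 0 on success, -1 with errno set on failure (setsockopt rules). */
int enable_tx_swhw(int fd)
{
	unsigned int flags = tx_swhw_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```

An application using this would then expect two messages per packet on the
socket's error queue, as described above, and should not assume which of the
two arrives first.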

While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
there are no other users.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:39 +0000 (17:52 +0200)]
net: fix documentation of struct scm_timestamping

The scm_timestamping struct may return multiple non-zero fields, e.g.
when both software and hardware RX timestamping is enabled, or when the
SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
software timestamp is generated in the recvmsg() call in order to always
return a SCM_TIMESTAMP(NS) message.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:38 +0000 (17:52 +0200)]
net: add new control message for incoming HW-timestamped packets

Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
for incoming packets with hardware timestamps. It contains the index of
the real interface which received the packet and the length of the
packet at layer 2.

The index is useful with bonding, bridges and other interfaces, where
IP_PKTINFO doesn't allow applications to determine which PHC made the
timestamp. With the L2 length (and link speed) it is possible to
transpose preamble timestamps to trailer timestamps, which are used in
the NTP protocol.
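
The transposition is simple arithmetic: add the time the frame occupies the
wire to the preamble timestamp. The sketch below assumes the reported L2
length is exactly the number of bytes between the two reference points;
whether preamble or FCS bytes must be added is hardware-specific and is not
covered here. Function names are hypothetical and link_mbps must be nonzero.

```c
#include <stdint.h>

/* Nanoseconds needed to serialize l2_len bytes at link_mbps Mb/s. */
uint64_t frame_duration_ns(uint32_t l2_len, uint32_t link_mbps)
{
	return ((uint64_t)l2_len * 8u * 1000u) / link_mbps;
}

/* Shift a timestamp taken at the start of the frame to its end. */
uint64_t trailer_ts_ns(uint64_t preamble_ts_ns, uint32_t l2_len,
		       uint32_t link_mbps)
{
	return preamble_ts_ns + frame_duration_ns(l2_len, link_mbps);
}
```

For example, a 1000 byte frame on a 1000 Mb/s link takes 8000 ns, so the
correction is far from negligible at NTP hardware-timestamping precision.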

While this information could be provided by two new socket options
independently from timestamping, it doesn't look like they would be very
useful. With this option any performance impact is limited to hardware
timestamping.

Use dev_get_by_napi_id() to get the device and its index. On kernels
with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
index will be returned in the control message.

CC: Richard Cochran <richardcochran@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:37 +0000 (17:52 +0200)]
net: add function to retrieve original skb device using NAPI ID

Since commit b68581778cd0 ("net: Make skb->skb_iif always track
skb->dev") skbs don't have the original index of the interface which
received the packet. This information is now needed for a new control
message related to hardware timestamping.

Instead of adding a new field to skb, we can find the device by the NAPI
ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the
driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to
hide the CONFIG_NET_RX_BUSY_POLL ifdef.

CC: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:36 +0000 (17:52 +0200)]
net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALL

Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid
filter and update drivers which can timestamp all packets, or which
explicitly list unsupported filters instead of using a default case, to
handle the filter.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miroslav Lichvar [Fri, 19 May 2017 15:52:35 +0000 (17:52 +0200)]
net: define receive timestamp filter for NTP

Add HWTSTAMP_FILTER_NTP_ALL to the hwtstamp_rx_filters enum for
timestamping of NTP packets. There is currently only one driver
(phyter) that could support it directly.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 19 May 2017 12:20:15 +0000 (17:50 +0530)]
cxgb4 : retrieve port information from firmware

Issue the get port information command to firmware to retrieve port
information and update it if it differs from what was last
recorded. Also add indication of supported link modes for
firmware port types FW_PORT_TYPE_SFP28, FW_PORT_TYPE_KR_SFP28 and
FW_PORT_TYPE_CR4_QSFP.

Based on the original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sivakumar Krishnasamy [Fri, 19 May 2017 09:30:38 +0000 (05:30 -0400)]
ibmveth: Support to enable LSO/CSO for Trunk VEA.

Current largesend and checksum offload feature in ibmveth driver,
 - Source VM sends the TCP packets with ip_summed field set as
   CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in
   checksum field
 - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark
   "no checksum" and "checksum good" bits in transmit buffer descriptor
   before the packet is delivered to pseries PowerVM Hypervisor
 - If ibmveth has largesend capability enabled, transmit buffer descriptors
   are marked accordingly before packet is delivered to Hypervisor
   (along with mss value for packets with length > MSS)
 - Destination VM's ibmveth driver receives the packet with "checksum good"
   bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY
 - If "largesend" bit was on, mss value is copied from receive descriptor
   into SKB's gso_size and other flags are appropriately set for
   packets > MSS size
 - The packet is now successfully delivered up the stack in destination VM

The offloads described above work fine for TCP communication among VMs in
the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B )

We are now enabling support for OVS in the pseries PowerVM environment. One
of our requirements is to have ibmveth drivers configured in "Trunk" mode
when they are used with OVS. This is because the PowerVM Hypervisor will no
longer bridge the packets between VMs; instead, the packets are delivered to
the IO Server, which hosts OVS to bridge them between VMs or to external
networks (flow shown below):
  VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor
                                                                   <=> VM B
In "IO server" the packet is received by inbound Trunk ibmveth and then
delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown
below),
        Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth

In this model, we hit the following issues which impacted the VM
communication performance,

 - Issue 1: ibmveth doesn't support largesend and checksum offload features
   when configured as "Trunk". Driver has explicit checks to prevent
   enabling these offloads.

 - Issue 2: SYN packet drops seen at destination VM. When the packet
   originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to
   IO server's inbound Trunk ibmveth, on validating "checksum good" bits
   in ibmveth receive routine, SKB's ip_summed field is set with
   CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux
   Bridge) and delivered to outbound Trunk ibmveth. At this point the
   outbound ibmveth transmit routine will not set "no checksum" and
   "checksum good" bits in transmit buffer descriptor, as it does so only
   when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets
   delivered to destination VM, TCP layer receives the packet with checksum
   value of 0 and with no checksum related flags in ip_summed field. This
   leads to packet drops, so TCP connections never complete successfully.

 - Issue 3: First packet of a TCP connection will be dropped, if there is
   no OVS flow cached in datapath. OVS while trying to identify the flow,
   computes the checksum. The computed checksum will be invalid at the
   receiving end, as ibmveth transmit routine zeroes out the pseudo
   checksum value in the packet. This leads to packet drop.

 - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list.
   When Physical NIC has GRO enabled and when OVS bridges these packets,
   OVS vport send code will end up calling dev_queue_xmit, which in turn
   calls validate_xmit_skb.
   In validate_xmit_skb routine, the larger packets will get segmented into
   MSS sized segments, if SKB has a frag_list and if the driver to which
   they are delivered to doesn't support NETIF_F_FRAGLIST feature.

This patch addresses the above four issues, thereby enabling end to end
largesend and checksum offload support for better performance.

 - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and
   checksum offloads.
 - Fix for Issue 2 : When ibmveth receives a packet with "checksum good"
   bit set and if it's configured in Trunk mode, set appropriate SKB fields
   using skb_partial_csum_set (ip_summed field is set with
   CHECKSUM_PARTIAL)
 - Fix for Issue 3: Recompute the pseudo header checksum before sending the
   SKB up the stack.
 - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up
   allocating buffers and copying data, this fix gives
   up to 4X throughput increase.

Note: All these fixes need to be dropped together as fixing just one of
them will lead to other issues immediately (especially for Issues 1,2 & 3).
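
As background for the fix for Issue 3, the pseudo header checksum for TCP
over IPv4 can be sketched as below. This is not the driver's code: fold16()
and tcp4_pseudo_sum() are hypothetical helpers, addresses are taken in host
byte order for readability, and the function returns the folded,
uncomplemented sum (the form assumed here to seed the checksum field of a
CHECKSUM_PARTIAL packet).

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum source, destination, protocol and TCP length as 16-bit words. */
uint16_t tcp4_pseudo_sum(uint32_t saddr, uint32_t daddr, uint16_t tcp_len)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffffu;
	sum += daddr >> 16;
	sum += daddr & 0xffffu;
	sum += 6;		/* IPPROTO_TCP */
	sum += tcp_len;
	return fold16(sum);
}
```

For 192.168.0.1 to 192.168.0.2 with a 20 byte TCP segment, the words add up
to 0x1816D, which folds to 0x816E.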

Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 May 2017 16:56:57 +0000 (12:56 -0400)]
Merge branch 'qed-next'

Yuval Mintz says:

====================
qed/qede updates

This series contains some general minor fixes and enhancements:

 - #1, #2 and #9 correct small missing ethtool functionality.
 - #3, #6  and #8 correct minor issues in driver, but those are either
   print-related or unexposed in existing code.
 - #4 adds proper support to TLB mode bonding.
 - #10 is meant to improve performance on varying cache-line sizes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Sun, 21 May 2017 09:11:00 +0000 (12:11 +0300)]
qede: Support 1G advertisment.

Some variants of adapters support the 1G speed capability. Need to
allow the configuration of 1G speed if adapter supports it.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 21 May 2017 09:10:59 +0000 (12:10 +0300)]
qed: Fix setting of Management bitfields

The management firmware HSI contains masks which are already
shifted to their right place, so QED_MFW_SET_FIELD() is clearing
incorrect fields by shifting the mask by the offset.

Luckily, today we set the fields in an incrementing order [so we're
not erasing any previously set fields], but this still needs fixing.
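
The effect of shifting an already-shifted mask can be reproduced with a small
sketch. FIELD_B_MASK, FIELD_B_SHIFT and both helpers are hypothetical; they
only mirror the description above and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical field in bits 8..11; the mask is already in position. */
#define FIELD_B_MASK	0x00000f00u
#define FIELD_B_SHIFT	8

/* Buggy: shifting the mask again clears bits 16..19 instead of 8..11. */
uint32_t set_field_buggy(uint32_t reg, uint32_t mask, unsigned int shift,
			 uint32_t val)
{
	reg &= ~(mask << shift);
	reg |= (val << shift) & mask;
	return reg;
}

/* Fixed: the pre-shifted mask is used as-is to clear the field. */
uint32_t set_field_fixed(uint32_t reg, uint32_t mask, unsigned int shift,
			 uint32_t val)
{
	reg &= ~mask;
	reg |= (val << shift) & mask;
	return reg;
}
```

Starting from 0x00050300 and writing 4 into the field, the buggy helper
yields 0x00000700 (the higher neighbour is erased and the old value 3 is
OR-ed with 4), while the fixed helper yields 0x00050400. Only bits above the
field are damaged, which matches the remark that setting fields in
incrementing order hides the problem.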

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 21 May 2017 09:10:58 +0000 (12:10 +0300)]
qede: qedr closure after setting state

This is benign, but it makes more sense to start the close sequence
only after changing the internal state [in case it ever matters].

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 21 May 2017 09:10:57 +0000 (12:10 +0300)]
qed: Correct print in iscsi error-flow

If too many CQs are requested, qed would print the available
number as if it's a resource and not a feature leading to the
wrong print.

Fixes: 08737a3fa30a ("qed: Inform qedi the number of possible CQs")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Sun, 21 May 2017 09:10:56 +0000 (12:10 +0300)]
qed: Revise alloc/setup/free flow

Re-organize the logic that allocates and frees memory of various
sub-components of the hw-function -

 a. No need to pass pointers to said structure as parameters;
    The internal logic knows exactly where to find/set the data.

 b. Nullify pointers after cleanup to prevent possible errors to
    re-entrant code.
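
Both points can be illustrated with a small userspace sketch. struct hwfn,
hwfn_alloc() and hwfn_free() are hypothetical names; the helpers take the
owning structure and locate the sub-component themselves, and the free path
clears the pointer.

```c
#include <stdlib.h>

/* Hypothetical hw-function owning one sub-component. */
struct hwfn {
	int *cid_map;
};

/* (a) The helper knows where the data lives; callers pass only the owner. */
int hwfn_alloc(struct hwfn *h, size_t n)
{
	h->cid_map = calloc(n, sizeof(*h->cid_map));
	return h->cid_map ? 0 : -1;
}

/* (b) Clearing the pointer makes a repeated free a harmless no-op. */
void hwfn_free(struct hwfn *h)
{
	free(h->cid_map);
	h->cid_map = NULL;
}
```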

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 21 May 2017 09:10:55 +0000 (12:10 +0300)]
qede: Don't use an internal MAC field

Driver maintains its primary MAC in a private field which
gets updated when ndo_dev_set_mac() gets called.

However, there are flows where the primary MAC of the device can change
without said NDO being called [bond device in TLB mode configuring
slaves' addresses], resulting in a configuration where there's a mismatch
between what's apparent to user [the netdevice's value] and what's
configured in the HW [the private value].

As we don't have any real motivation for maintaining this
private field, simply remove it and start using the netdevice's
field instead.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Sun, 21 May 2017 09:10:54 +0000 (12:10 +0300)]
qede: Add missing Status-block free

When destroying the datapath channels, qede doesn't notify qed of the
released status blocks which were acquired during the initialization.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Sun, 21 May 2017 09:10:53 +0000 (12:10 +0300)]
qede: Honor user request for Tx buffers

Driver always allocates the maximal number of tx-buffers irrespective of
actual Tx ring config.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 21 May 2017 09:10:52 +0000 (12:10 +0300)]
qede: Allow WoL to activate by default

When management firmware declares that the device is WoL-capable,
the default driver behavior would be to allow the management firmware
to take the decision of whether it's actually needed or not.

The problem is that the ethtool interface doesn't have a 'default' kind
of option, and the user would see the interface WoL as disabled,
which doesn't accurately reflect the actual configuration.
Moreover, if the user actually wants to explicitly disable WoL, he'd have
to first enable it [otherwise ethtool would block the command].

Instead of allowing management to make the decision, enable WoL by
default on all devices capable of it.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 May 2017 23:41:45 +0000 (19:41 -0400)]
Merge branch 'xgene-check-all-RGMII-phy-mode-variants'

Iyappan Subramanian says:

====================
Check all RGMII phy mode variants

This patch set,
     - adds phy_interface_mode_is_rgmii() helper function
     - addresses review comment from previous patch set, by calling
       phy_interface_mode_is_rgmii() to address all RGMII variants

v2: Address review comments from v1
     - adds phy_interface_mode_is_rgmii() helper function
     - addresses review comment from previous patch set, by calling
       phy_interface_mode_is_rgmii() to address all RGMII variants
v1:
     - Initial version
====================

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Thu, 18 May 2017 22:13:44 +0000 (15:13 -0700)]
xgene: Check all RGMII phy mode variants

This patch addresses the review comment from the previous patch set,
by using phy_interface_mode_is_rgmii() helper function to address
all RGMII phy mode variants.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Iyappan Subramanian [Thu, 18 May 2017 22:13:43 +0000 (15:13 -0700)]
phy: Add helper function to check phy interface mode

Added helper function that checks phy_mode is RGMII (all variants)
'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'

Changed the following function, to use the above.
'bool phy_interface_is_rgmii(struct phy_device *phydev)'
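
The relationship between the two helpers can be sketched as follows. The
enum is a hypothetical subset with shortened names, and the range check is
one possible implementation that relies on the RGMII variants being
contiguous in the enum; neither is quoted from the patch.

```c
#include <stdbool.h>

/* Hypothetical subset of interface modes; RGMII variants are contiguous. */
enum phy_mode {
	MODE_MII,
	MODE_SGMII,
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID,
	MODE_QSGMII,
};

/* Hypothetical device object carrying its interface mode. */
struct phy_dev {
	enum phy_mode interface;
};

/* Mode-based check: usable before any device object exists. */
bool mode_is_rgmii(enum phy_mode mode)
{
	return mode >= MODE_RGMII && mode <= MODE_RGMII_TXID;
}

/* Device-based check, now a thin wrapper around the mode-based one. */
bool phy_is_rgmii(const struct phy_dev *phydev)
{
	return mode_is_rgmii(phydev->interface);
}
```

A driver that only has a mode value parsed from firmware or device tree can
call the mode-based helper directly instead of comparing against a single
RGMII variant.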

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 19 May 2017 23:21:32 +0000 (19:21 -0400)]
Merge branch 'net-fix-CRC32c-in-the-forwarding-path'

Davide Caratti says:

====================
net: fix CRC32c in the forwarding path

Current kernel allows offloading CRC32c computation when SCTP packets
are generated, setting skb->ip_summed to CHECKSUM_PARTIAL, if the
underlying device features have NETIF_F_SCTP_CRC set. However, after these
packets are forwarded, they may land on a device where CRC32c offloading is
not available: as a consequence, transmission is done with wrong CRC32c.
It's not possible to use sctp_compute_cksum() in the forwarding path
and in most drivers, because it needs symbols exported by libcrc32c module.

Patch 1 and 2 of this series try to solve this problem, introducing a new
helper function, namely skb_crc32c_csum_help(), that can be used to resolve
CHECKSUM_PARTIAL when crc32c is needed instead of Internet Checksum.
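
For reference, CRC32c itself can be computed in software with a short
bitwise routine. The sketch below is a plain, unoptimized userspace version
using the reflected Castagnoli polynomial; it is not the kernel helper named
above and ignores how the result is stored in the SCTP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected form, init and final XOR of ~0. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

The standard check input "123456789" yields 0xE3069283. Running the Internet
Checksum over an SCTP packet instead cannot produce this value, which is why
such packets end up corrupted.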

Currently, we need to parse the packet headers to understand what algorithm
is needed to resolve CHECKSUM_PARTIAL. We can speed things up by storing
this information in the skb metadata, and use it to call an appropriate
helper (skb_checksum_help or skb_crc32c_csum_help), or leave the packet
unmodified when the NIC is able to offload the checksum computation.
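
The decision described here can be sketched as a three-way choice. The types
and the function resolve_partial() are hypothetical reductions of the packet
metadata and device features; only the csum_not_inet name is taken from the
series.

```c
#include <stdbool.h>

enum csum_action {
	CSUM_LEAVE_TO_HW,	/* NIC can finish the checksum itself */
	CSUM_SW_INET,		/* compute Internet Checksum in software */
	CSUM_SW_CRC32C,		/* compute CRC32c in software */
};

/* Hypothetical, reduced views of packet metadata and device features. */
struct pkt_meta {
	bool csum_not_inet;
};

struct dev_feat {
	bool hw_inet_csum;
	bool hw_sctp_crc;
};

/* Pick the algorithm from metadata instead of parsing packet headers. */
enum csum_action resolve_partial(const struct pkt_meta *p,
				 const struct dev_feat *d)
{
	if (p->csum_not_inet)
		return d->hw_sctp_crc ? CSUM_LEAVE_TO_HW : CSUM_SW_CRC32C;

	return d->hw_inet_csum ? CSUM_LEAVE_TO_HW : CSUM_SW_INET;
}
```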

Patch 3 deprecates skb->csum_bad to free one bit in skb metadata; patch 4
introduces skb->csum_not_inet, providing skb with an indication on the
algorithm needed to resolve CHECKSUM_PARTIAL.
Patch 5 and 6 fix the kernel forwarding path and openvswitch datapath,
where skb_checksum_help was unconditionally called to resolve CHECKSUM_PARTIAL,
thus generating wrong CRC32c in forwarded SCTP packets.
Finally, patch 7 updates documentation to provide a better description of
possible values of skb->ip_summed.

Some further work is still possible:
* drivers that parse the packet header to correctly resolve CHECKSUM_PARTIAL
(e.g. ixgbe_tx_csum()) can benefit from testing skb->csum_not_inet to avoid
calling ip_hdr(skb)->protocol or ixgbe_ipv6_csum_is_sctp(skb).

* drivers that call skb_checksum_help() to resolve CHECKSUM_PARTIAL can
call skb_csum_hwoffload_help to avoid corrupting SCTP packets.

Changes v2->v3:
- patch 1/7: more standard declaration of stub variables

Changes v1->v2:
- none

Changes RFCv4->v1:
- patch 2/7: use WARN_ON_ONCE() instead of BUG_ON(), and avoid computing
CRC32c on the error path.
- patch 3/7: don't invert tests on the values of same_flow and
NAPI_GRO_CB(skb)->flush in dev_gro_receive(), it's useless and it breaks
GRO functionality as reported by kernel test robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosk_buff.h: improve description of CHECKSUM_{COMPLETE, UNNECESSARY}
Davide Caratti [Thu, 18 May 2017 13:44:43 +0000 (15:44 +0200)]
sk_buff.h: improve description of CHECKSUM_{COMPLETE, UNNECESSARY}

Add FCoE to the list of protocols that can set CHECKSUM_UNNECESSARY; add a
note to CHECKSUM_COMPLETE section to specify that it does not apply to SCTP
and FCoE protocols.

Suggested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoopenvswitch: more accurate checksumming in queue_userspace_packet()
Davide Caratti [Thu, 18 May 2017 13:44:42 +0000 (15:44 +0200)]
openvswitch: more accurate checksumming in queue_userspace_packet()

If skb carries an SCTP packet and ip_summed is CHECKSUM_PARTIAL, it needs
CRC32c in place of Internet Checksum: use skb_csum_hwoffload_help to avoid
corrupting such packets while queueing them towards userspace.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: more accurate checksumming in validate_xmit_skb()
Davide Caratti [Thu, 18 May 2017 13:44:41 +0000 (15:44 +0200)]
net: more accurate checksumming in validate_xmit_skb()

skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to
determine if skb needs software computation of Internet Checksum or crc32c
(or nothing, if this computation can be done by the hardware). Use it in
place of skb_checksum_help() in validate_xmit_skb() to avoid corruption
of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL.
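
As a rough illustration of the decision described above, the following
userspace sketch models the choice between hardware offload, software
Internet Checksum and software crc32c. All demo_* and DEMO_* names are
hypothetical stand-ins, and the feature bit values are not the kernel's
NETIF_F_* values; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the kernel's real NETIF_F_* values differ. */
#define DEMO_F_CSUM_MASK (1u << 0) /* device can offload Internet Checksum */
#define DEMO_F_SCTP_CRC  (1u << 1) /* device can offload crc32c for SCTP */

enum demo_csum_action {
	DEMO_CSUM_HW_OFFLOAD, /* leave the packet untouched */
	DEMO_CSUM_SW_INET,    /* where skb_checksum_help() would run */
	DEMO_CSUM_SW_CRC32C,  /* where skb_crc32c_csum_help() would run */
};

/* Minimal stand-in for the two sk_buff fields the decision depends on. */
struct demo_skb {
	bool checksum_partial; /* models ip_summed == CHECKSUM_PARTIAL */
	bool csum_not_inet;    /* models skb->csum_not_inet */
};

/* Pick how CHECKSUM_PARTIAL gets resolved for this device. */
enum demo_csum_action demo_csum_hwoffload_help(const struct demo_skb *skb,
					       uint32_t features)
{
	/* Only packets that still need a checksum are handled here. */
	assert(skb != NULL && skb->checksum_partial);

	if (skb->csum_not_inet)
		return (features & DEMO_F_SCTP_CRC) ?
			DEMO_CSUM_HW_OFFLOAD : DEMO_CSUM_SW_CRC32C;

	return (features & DEMO_F_CSUM_MASK) ?
		DEMO_CSUM_HW_OFFLOAD : DEMO_CSUM_SW_INET;
}
```

In this model, an SCTP packet sent through a device that only offloads
Internet Checksum resolves to DEMO_CSUM_SW_CRC32C, which is the case an
unconditional call to skb_checksum_help() handles incorrectly.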

While at it, remove references to skb_csum_off_chk* functions, since they
are not present anymore in Linux; see commit cf53b1da73bd ("Revert
"net: Add driver helper functions to determine checksum offloadability"").

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: use skb->csum_not_inet to identify packets needing crc32c
Davide Caratti [Thu, 18 May 2017 13:44:40 +0000 (15:44 +0200)]
net: use skb->csum_not_inet to identify packets needing crc32c

skb->csum_not_inet carries the indication on which algorithm is needed to
compute checksum on skb in the transmit path, when skb->ip_summed is equal
to CHECKSUM_PARTIAL. If skb carries an SCTP packet and crc32c hasn't yet
been written in the L4 header, skb->csum_not_inet is set to 1; otherwise,
assume Internet Checksum is needed and thus set skb->csum_not_inet to 0.

Suggested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosk_buff: remove support for csum_bad in sk_buff
Davide Caratti [Thu, 18 May 2017 13:44:39 +0000 (15:44 +0200)]
sk_buff: remove support for csum_bad in sk_buff

This bit was introduced with commit 5a21232983aa ("net: Support for
csum_bad in skbuff") to reduce the stack workload when processing RX
packets carrying a wrong Internet Checksum. Up to now, only one driver and
GRO core are setting it.

Suggested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: introduce skb_crc32c_csum_help
Davide Caratti [Thu, 18 May 2017 13:44:38 +0000 (15:44 +0200)]
net: introduce skb_crc32c_csum_help

skb_crc32c_csum_help is like skb_checksum_help, but it is designed for
checksumming SCTP packets using crc32c (see RFC3309), provided that
libcrc32c.ko has been loaded before. In case libcrc32c is not loaded,
invoking skb_crc32c_csum_help on a skb results in one of the following
printouts:

warn_crc32c_csum_update: attempt to compute crc32c without libcrc32c.ko
warn_crc32c_csum_combine: attempt to compute crc32c without libcrc32c.ko
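
For readers unfamiliar with the algorithm itself, here is a minimal
bit-at-a-time sketch of crc32c (Castagnoli polynomial, reflected form
0x82F63B78). The function name demo_crc32c is hypothetical. The sketch
covers only the CRC over a flat buffer; it does not model how the result
is placed in the SCTP header, nor the table-driven or hardware-assisted
variants a kernel would normally use.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise crc32c over a flat buffer: initial value and final XOR are
 * both 0xFFFFFFFF, bits are processed least-significant first.
 */
uint32_t demo_crc32c(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++) {
			/* XOR in the polynomial only when the low bit is set. */
			uint32_t mask = 0u - (crc & 1u);

			crc = (crc >> 1) ^ (0x82F63B78u & mask);
		}
	}
	return crc ^ 0xFFFFFFFFu;
}
```

The standard check input "123456789" yields 0xE3069283 with these
parameters, which is a convenient sanity test for any faster variant.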

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoskbuff: add stub to help computing crc32c on SCTP packets
Davide Caratti [Thu, 18 May 2017 13:44:37 +0000 (15:44 +0200)]
skbuff: add stub to help computing crc32c on SCTP packets

sctp_compute_checksum requires crc32c symbol (provided by libcrc32c), so
it can't be used in net core. Like it has been done previously with other
symbols (e.g. ipv6_dst_lookup), introduce a stub struct skb_checksum_ops
to allow computation of crc32c checksum in net core after sctp.ko (and thus
libcrc32c) has been loaded.
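
The stub approach can be pictured with the userspace sketch below: core
code calls through a pointer to an ops table, which initially points at
a default that only warns, and a provider swaps in a working
implementation when it loads. Every demo_* name is hypothetical, and the
ops table has a single member for brevity; it is not the layout of
struct skb_checksum_ops.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical ops table holding the one operation this sketch needs. */
struct demo_checksum_ops {
	uint32_t (*update)(const void *buf, size_t len, uint32_t crc);
};

/* Default used while no provider is loaded: warn and return 0. */
static uint32_t demo_warn_update(const void *buf, size_t len, uint32_t crc)
{
	(void)buf; (void)len; (void)crc;
	fprintf(stderr, "demo: attempt to compute crc32c without provider\n");
	return 0;
}

/* Provider implementation: raw crc32c update, no init or final XOR. */
static uint32_t demo_real_update(const void *buf, size_t len, uint32_t crc)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

static const struct demo_checksum_ops demo_default_ops = {
	.update = demo_warn_update,
};
static const struct demo_checksum_ops demo_provider_ops = {
	.update = demo_real_update,
};

/* The stub pointer that core code dereferences. */
static const struct demo_checksum_ops *demo_crc32c_stub = &demo_default_ops;

/* What a provider would do from its init and exit paths. */
void demo_provider_load(void)   { demo_crc32c_stub = &demo_provider_ops; }
void demo_provider_unload(void) { demo_crc32c_stub = &demo_default_ops; }

/* Core-side entry point: has no link-time dependency on the provider. */
uint32_t demo_core_crc32c_update(const void *buf, size_t len, uint32_t crc)
{
	return demo_crc32c_stub->update(buf, len, crc);
}
```

The design choice here is that the core never references the provider's
symbols directly, so it can be built and linked without them.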

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: warn on negative reordering values
Soheil Hassas Yeganeh [Tue, 16 May 2017 21:39:02 +0000 (17:39 -0400)]
tcp: warn on negative reordering values

Commit bafbb9c73241 ("tcp: eliminate negative reordering
in tcp_clean_rtx_queue") fixes an issue for negative
reordering metrics.

To be resilient to such errors, warn and return
when a negative metric is passed to tcp_update_reordering().
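
A minimal userspace sketch of the "warn and return" guard follows. The
struct and function names are hypothetical, the warn-once flag stands in
for a kernel-style one-shot warning, and the update rule (keep the
maximum) is a simplifying assumption rather than the actual logic of
tcp_update_reordering().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the socket state that holds the metric. */
struct demo_tcp {
	int reordering;
};

/* Returns false, leaving the state untouched, when metric is negative. */
bool demo_update_reordering(struct demo_tcp *tp, int metric)
{
	static bool warned; /* emit the warning only once */

	if (metric < 0) {
		if (!warned) {
			fprintf(stderr, "demo: negative reordering metric %d\n",
				metric);
			warned = true;
		}
		return false;
	}

	/* Assumed update rule for the sketch: remember the largest value. */
	if (metric > tp->reordering)
		tp->reordering = metric;
	return true;
}
```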

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Thu, 18 May 2017 20:11:32 +0000 (16:11 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

7 years agonet/mlx5e: Fix possible memory leak
Wei Yongjun [Thu, 18 May 2017 15:34:41 +0000 (15:34 +0000)]
net/mlx5e: Fix possible memory leak

'encap_header' is malloced and should be freed before leaving the
error handling paths; otherwise it will be leaked.

Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Remove unused including <linux/version.h>
Wei Yongjun [Thu, 18 May 2017 15:26:29 +0000 (15:26 +0000)]
qed: Remove unused including <linux/version.h>

Remove the include of <linux/version.h>, which is not needed.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: fix missing unlock on error in __ibmvnic_reset()
Wei Yongjun [Thu, 18 May 2017 15:24:52 +0000 (15:24 +0000)]
ibmvnic: fix missing unlock on error in __ibmvnic_reset()

Add the missing unlock before return from function __ibmvnic_reset()
in the error handling case.

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Linus Torvalds [Thu, 18 May 2017 19:04:41 +0000 (12:04 -0700)]
Merge tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md

Pull MD fixes from Shaohua Li:

 - Several bug fixes for raid5-cache from Song Liu, mainly handle
   journal disk error

 - Fix bad block handling in choosing raid1 disk from Tomasz Majchrzak

 - Simplify external metadata array sysfs handling from Artur
   Paszkiewicz

 - Optimize raid0 discard handling from me, now raid0 will dispatch
   large discard IO directly to underlying disks.

* tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  raid1: prefer disk without bad blocks
  md/r5cache: handle sync with data in write back cache
  md/r5cache: gracefully handle journal device errors for writeback mode
  md/raid1/10: avoid unnecessary locking
  md/raid5-cache: in r5l_do_submit_io(), submit io->split_bio first
  md/md0: optimize raid0 discard handling
  md: don't return -EAGAIN in md_allow_write for external metadata arrays
  md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock

7 years agonet1080: Remove unused function nc_dump_ttl()
Matthias Kaehlcke [Thu, 18 May 2017 17:57:19 +0000 (10:57 -0700)]
net1080: Remove unused function nc_dump_ttl()

The function is not used, removing it fixes the following warning when
building with clang:

drivers/net/usb/net1080.c:271:20: error: unused function
    'nc_dump_ttl' [-Werror,-Wunused-function]

Also remove the definition of TTL_THIS, which is only used in
nc_dump_ttl()

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agor8152: Remove unused function usb_ocp_read()
Matthias Kaehlcke [Thu, 18 May 2017 17:45:33 +0000 (10:45 -0700)]
r8152: Remove unused function usb_ocp_read()

The function is not used, removing it fixes the following warning when
building with clang:

drivers/net/usb/r8152.c:825:5: error: unused function 'usb_ocp_read'
    [-Werror,-Wunused-function]

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 18 May 2017 18:40:21 +0000 (11:40 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Don't allow negative TCP reordering values, from Soheil Hassas
    Yeganeh.

 2) Don't overflow while parsing ipv6 header options, from Craig Gallek.

 3) Handle more cleanly the case where an individual route entry during
    a dump will not fit into the allocated netlink SKB, from David
    Ahern.

 4) Add missing CONFIG_INET dependency for mlx5e, from Arnd Bergmann.

 5) Allow neighbour updates to converge more quickly via gratuitous
    ARPs, from Ihar Hrachyshka.

 6) Fix compile error when CONFIG_INET is disabled, from Eric Dumazet.

 7) Fix use after free in x25 protocol init, from Lin Zhang.

 8) Validate VLAN pvid ranges passed into br_validate(), from Tobias
    Jungel.

 9) NULL out address lists in child sockets in SCTP, this is similar to
    the fix we made for inet connection sockets last week. From Eric
    Dumazet.

10) Fix NULL deref in mlxsw driver, from Ido Schimmel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  mlxsw: spectrum: Avoid possible NULL pointer dereference
  sh_eth: Do not print an error message for probe deferral
  sh_eth: Use platform device for printing before register_netdev()
  mlxsw: spectrum_router: Fix rif counter freeing routine
  mlxsw: spectrum_dpipe: Fix incorrect entry index
  cxgb4: update latest firmware version supported
  qmi_wwan: add another Lenovo EM74xx device ID
  sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
  udp: make *udp*_queue_rcv_skb() functions static
  bridge: netlink: check vlan_default_pvid range
  net: ethernet: faraday: To support device tree usage.
  net: x25: fix one potential use-after-free issue
  bpf: adjust verifier heuristics
  ipv6: Check ip6_find_1stfragopt() return value properly.
  selftests/bpf: fix broken build due to types.h
  bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.
  bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.
  net: fix compile error in skb_orphan_partial()
  ipv6: Prevent overrun when parsing v6 header options
  neighbour: update neigh timestamps iff update is effective
  ...

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Thu, 18 May 2017 18:21:10 +0000 (11:21 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
 "Three sparc bug fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc/ftrace: Fix ftrace graph time measurement
  sparc: Fix -Wstringop-overflow warning
  sparc64: Fix mapping of 64k pages with MAP_FIXED

7 years agoMerge tag 'kbuild-fixes-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masah...
Linus Torvalds [Thu, 18 May 2017 18:17:34 +0000 (11:17 -0700)]
Merge tag 'kbuild-fixes-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fix from Masahiro Yamada:
 "Fix headers_install to not delete pre-existing headers in the install
  destination"

* tag 'kbuild-fixes-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: skip install/check of headers right under uapi directories

7 years agoqed: Utilize FW 8.20.0.0
Mintz, Yuval [Thu, 18 May 2017 16:41:04 +0000 (19:41 +0300)]
qed: Utilize FW 8.20.0.0

This pushes qed [and as a result, all qed* drivers] into using 8.20.0.0
firmware. The changes are mostly contained in qed with minor changes
to qedi due to some HSI changes.

Content-wise, the firmware contains fixes to various issues exposed
since the release of the previous firmware, including:
 - Corrects iSCSI fast retransmit when data digest is enabled.
 - Stop draining packets when receiving several consecutive PFCs.
 - Prevent possible assertion when consecutively opening/closing
   many connections.
 - Prevent possible assertion due to too long BDQ fetch time.

In addition, the new firmware would allow us to later add iWARP support
in qed and qedr.

Changes from previous version
-----------------------------
 - V2: Fix warning in qed_debug.c

Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotcp: fix tcp_rearm_rto()
Eric Dumazet [Thu, 18 May 2017 16:15:58 +0000 (09:15 -0700)]
tcp: fix tcp_rearm_rto()

skbs in (re)transmit queue no longer have a copy of jiffies
at the time of the transmit : skb->skb_mstamp is now in usec unit,
with no correlation to tcp_jiffies32.

We have to convert rto from jiffies to usec, compute a time difference
in usec, then convert the delta to HZ units.
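
The unit conversions can be sketched in userspace as follows. DEMO_HZ is
an assumed tick rate of 1000, the demo_* helpers are hypothetical
stand-ins for jiffies_to_usecs() and usecs_to_jiffies() (rounding up),
and keeping the full rto when the deadline has already passed is an
assumption of this sketch.

```c
#include <stdint.h>

#define DEMO_HZ 1000u /* assumed tick rate for the sketch */
#define DEMO_USEC_PER_JIFFY (1000000u / DEMO_HZ)

static uint64_t demo_jiffies_to_usecs(uint32_t j)
{
	return (uint64_t)j * DEMO_USEC_PER_JIFFY;
}

/* Round up so that a non-zero delay never becomes zero ticks. */
static uint32_t demo_usecs_to_jiffies(uint64_t us)
{
	return (uint32_t)((us + DEMO_USEC_PER_JIFFY - 1) / DEMO_USEC_PER_JIFFY);
}

/* Remaining timeout, in jiffies, for a packet sent at sent_us. */
uint32_t demo_rearm_rto(uint32_t rto_jiffies, uint64_t sent_us,
			uint64_t now_us)
{
	/* Step 1: express the deadline in usec, the unit of the timestamp. */
	uint64_t deadline_us = sent_us + demo_jiffies_to_usecs(rto_jiffies);
	/* Step 2: compute the time difference in usec. */
	int64_t delta_us = (int64_t)(deadline_us - now_us);

	/* Step 3: convert a positive remainder back to jiffies. */
	if (delta_us > 0)
		return demo_usecs_to_jiffies((uint64_t)delta_us);
	return rto_jiffies;
}
```

With these assumptions, an rto of 200 jiffies for a packet sent 50 ms ago
leaves 150 jiffies on the timer.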

Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm...
Linus Torvalds [Thu, 18 May 2017 17:04:42 +0000 (10:04 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull pid namespace fixes from Eric Biederman:
 "These are two bugs that turn out to have simple fixes that were
  reported during the merge window. Both of these issues have existed
  for a while and it just happens that they both were reported at almost
  the same time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
  pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes

7 years agoMerge tag 'hwmon-for-linus-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 May 2017 16:38:09 +0000 (09:38 -0700)]
Merge tag 'hwmon-for-linus-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Fix problem with hotplug state machine in coretemp driver"

* tag 'hwmon-for-linus-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (coretemp) Handle frozen hotplug state correctly

7 years agomlxsw: spectrum: Avoid possible NULL pointer dereference
Ido Schimmel [Thu, 18 May 2017 11:03:52 +0000 (13:03 +0200)]
mlxsw: spectrum: Avoid possible NULL pointer dereference

In case we got an FDB notification for a port that doesn't exist we
execute an FDB entry delete to prevent it from re-appearing the next
time we poll for notifications.

If the operation failed we would trigger a NULL pointer dereference as
'mlxsw_sp_port' is NULL.

Fix it by reporting the error using the underlying bus device instead.

Fixes: 12f1501e7511 ("mlxsw: spectrum: remove FDB entry in case we get unknown object notification")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: make the spinlock octeon_devices_lock static
Colin Ian King [Thu, 18 May 2017 09:14:01 +0000 (10:14 +0100)]
liquidio: make the spinlock octeon_devices_lock static

octeon_devices_lock can be made static as it does not need to be
in global scope.

Cleans up sparse warning: "warning: symbol 'octeon_devices_lock'
was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosh_eth: Do not print an error message for probe deferral
Geert Uytterhoeven [Thu, 18 May 2017 13:01:35 +0000 (15:01 +0200)]
sh_eth: Do not print an error message for probe deferral

EPROBE_DEFER is not an error, hence printing an error message like

    sh-eth ee700000.ethernet: failed to initialise MDIO

may confuse the user.

To fix this, suppress the error message in case of probe deferral.
While at it, shorten the message, and add the actual error code.
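
The pattern looks roughly like the userspace sketch below. The demo_*
names are hypothetical, DEMO_EPROBE_DEFER is assumed to mirror the
kernel's EPROBE_DEFER value, and the message wording is illustrative
only.

```c
#include <stdbool.h>
#include <stdio.h>

#define DEMO_EPROBE_DEFER 517 /* assumed to mirror the kernel's value */

/* Deferral is a normal "try again later" result, not an error to log. */
bool demo_should_log_probe_error(int ret)
{
	return ret < 0 && ret != -DEMO_EPROBE_DEFER;
}

/* Log genuine failures with their error code, then pass ret through. */
int demo_mdio_probe_result(const char *dev_name, int ret)
{
	if (demo_should_log_probe_error(ret))
		fprintf(stderr, "%s: MDIO init failed: %d\n", dev_name, ret);
	return ret;
}
```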

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosh_eth: Use platform device for printing before register_netdev()
Geert Uytterhoeven [Thu, 18 May 2017 13:01:34 +0000 (15:01 +0200)]
sh_eth: Use platform device for printing before register_netdev()

The MDIO initialization failure message is printed using the network
device, before it has been registered, leading to:

     (null): failed to initialise MDIO

Use the platform device instead to fix this:

    sh-eth ee700000.ethernet: failed to initialise MDIO

Fixes: daacf03f0bbfefee ("sh_eth: Register MDIO bus before registering the network device")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'linux-can-next-for-4.13-20170518' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 18 May 2017 15:18:20 +0000 (11:18 -0400)]
Merge tag 'linux-can-next-for-4.13-20170518' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-05-18

this is a pull request of 4 patches for net-next/master.

All 4 patches are by Quentin Schulz, they add deep Suspend/Resume
support to the m_can driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_dpipe: Fix sparse warnings
Arkadi Sharshevsky [Thu, 18 May 2017 07:22:45 +0000 (09:22 +0200)]
mlxsw: spectrum_dpipe: Fix sparse warnings

drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:52: warning:
Using plain integer as NULL pointer
drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:74: warning:
Using plain integer as NULL pointer

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-fixes'
David S. Miller [Thu, 18 May 2017 15:04:00 +0000 (11:04 -0400)]
Merge branch 'mlxsw-fixes'

Jiri Pirko says:

====================
mlxsw: couple of fixes

Couple of fixes from Arkadi
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Fix rif counter freeing routine
Arkadi Sharshevsky [Thu, 18 May 2017 07:18:53 +0000 (09:18 +0200)]
mlxsw: spectrum_router: Fix rif counter freeing routine

During rif counter freeing the counter index can be invalid. Add a
validity check before freeing the counter.

Fixes: e0c0afd8aa4e ("mlxsw: spectrum: Support for counters on router interfaces")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_dpipe: Fix incorrect entry index
Arkadi Sharshevsky [Thu, 18 May 2017 07:18:52 +0000 (09:18 +0200)]
mlxsw: spectrum_dpipe: Fix incorrect entry index

In case of disabled counters the entry index will be incorrect. Fix this
by moving the entry index set before the counter status check.

Fixes: 2ba5999f009d ("mlxsw: spectrum: Add Support for erif table entries access")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: b53: Add compatible strings for the Cygnus-family BCM11360.
Eric Anholt [Thu, 18 May 2017 00:32:12 +0000 (17:32 -0700)]
net: dsa: b53: Add compatible strings for the Cygnus-family BCM11360.

Cygnus is a small family of SoCs, of which we currently have
devicetree for BCM11360 and BCM58300.  The 11360's B53 is mostly the
same as 58xx, just requiring a tiny bit of setup that was previously
missing.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'dsa-headers-cleanup'
David S. Miller [Thu, 18 May 2017 14:40:20 +0000 (10:40 -0400)]
Merge branch 'dsa-headers-cleanup'

Vivien Didelot says:

====================
net: dsa: headers cleanup

The DSA core files share a common private header file. Include the DSA
public header there instead of independently in each core source file.

DSA core and its drivers use switchdev, thus include switchdev.h in the
public DSA header. This allows us to get rid of the forward declaration
and use typedef defined by switchdev.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: use switchdev_obj_dump_cb_t everywhere
Vivien Didelot [Wed, 17 May 2017 19:46:05 +0000 (15:46 -0400)]
net: dsa: use switchdev_obj_dump_cb_t everywhere

Now that the DSA public header includes switchdev.h, use the provided
switchdev_obj_dump_cb_t typedef for the object dump callback.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: include switchdev.h only once
Vivien Didelot [Wed, 17 May 2017 19:46:04 +0000 (15:46 -0400)]
net: dsa: include switchdev.h only once

DSA drivers and core use switchdev. Include switchdev.h only once, in
the dsa.h public header, so that inclusion in DSA drivers or forward
declarations of switchdev structures is not necessary anymore.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: include dsa.h only once
Vivien Didelot [Wed, 17 May 2017 19:46:03 +0000 (15:46 -0400)]
net: dsa: include dsa.h only once

The public include/net/dsa.h file is meant for DSA drivers, while all
DSA core files share a common private header net/dsa/dsa_priv.h file.

Ensure that dsa_priv.h is the only DSA core file to include net/dsa.h,
and add a new line to separate absolute and relative headers at the same
time.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: fix __skb_try_recv_from_queue to return the old behavior
Andrey Vagin [Wed, 17 May 2017 18:39:05 +0000 (11:39 -0700)]
net: fix __skb_try_recv_from_queue to return the old behavior

This function has to return NULL in an error case, because there is a
separate error variable.

The offset has to be changed only if skb is returned.

v2: fix udp code to not use an extra variable

Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Fixes: 65101aeca522 ("net/sock: factor out dequeue/peek with offset code")
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: update latest firmware version supported
Ganesh Goudar [Wed, 17 May 2017 18:38:16 +0000 (00:08 +0530)]
cxgb4: update latest firmware version supported

Change t4fw_version.h to update latest firmware version
number to 1.16.43.0.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: make struct dst_entry::dev first member
Alexey Dobriyan [Wed, 17 May 2017 16:31:39 +0000 (19:31 +0300)]
net: make struct dst_entry::dev first member

struct dst_entry::dev is used most often. Move it so it can be
accessed without imm8 offset on x86_64.
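
As a small illustration of the layout change, the sketch below uses
hypothetical demo_* structs (not the real struct dst_entry) and checks
at compile time that the hot pointer sits at offset 0, so that loading it
needs no displacement from the base address.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real structures have many more members. */
struct demo_net_device {
	int ifindex;
};

struct demo_dst_entry {
	struct demo_net_device *dev; /* most frequently used: keep first */
	void *other;
	unsigned long expires;
};

/* Fails the build if someone moves dev away from the start. */
_Static_assert(offsetof(struct demo_dst_entry, dev) == 0,
	       "dev must stay the first member");

/* Accessor whose load uses the dst pointer itself as the address. */
struct demo_net_device *demo_dst_dev(const struct demo_dst_entry *dst)
{
	return dst->dev;
}
```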

add/remove: 0/0 grow/shrink: 9/239 up/down: 52/-413 (-361)
function                                     old     new   delta
dst_rcu_free                                 126     138     +12
fnhe_flush_routes                            211     219      +8
rt_set_nexthop                               747     754      +7
rt_cache_route                                85      91      +6
rt6_release                                  209     215      +6
dst_release                                  107     111      +4
dst_destroy_rcu                               29      33      +4
dn_dst_check_expire                          329     333      +4
dn_insert_route                              484     485      +1
xfrm_resolve_and_create_bundle              2991    2990      -1
...
ip_route_me_harder                          1163    1157      -6
__ip_append_data.isra                       2730    2724      -6
ip6_forward                                 3052    3045      -7
callforward_do_filter                        659     651      -8
dst_gc_task                                  571     549     -22

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'fsl_ucc_hdlc-enhancements'
David S. Miller [Thu, 18 May 2017 14:28:49 +0000 (10:28 -0400)]
Merge branch 'fsl_ucc_hdlc-enhancements'

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agopowerpc/85xx/kmcent2: use hdlc busmode for UCC1
Holger Brunck [Wed, 17 May 2017 15:24:39 +0000 (17:24 +0200)]
powerpc/85xx/kmcent2: use hdlc busmode for UCC1

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/wan/fsl_ucc_hdlc: add hdlc-bus support
Holger Brunck [Wed, 17 May 2017 15:24:38 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: add hdlc-bus support

This adds support for hdlc-bus mode to the fsl_ucc_hdlc driver. This can
be enabled with the "fsl,hdlc-bus" property in the DTS node of the
corresponding ucc.

This aligns the configuration of the UPSMR and GUMR registers to what is
done in our ucc_hdlc driver (that only supports hdlc-bus mode) and with
the QuickEngine's documentation for hdlc-bus mode.

GUMR/SYNL is set to AUTO for the busmode as in this case the CD signal
is ignored. The brkpt_support is enabled to set the HBM1 bit in the
CMXUCR register to configure an open-drain connected HDLC bus.

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agofsl/qe: add bit description for SYNL register for GUMR
Holger Brunck [Wed, 17 May 2017 15:24:37 +0000 (17:24 +0200)]
fsl/qe: add bit description for SYNL register for GUMR

Add the bitmask for the two bit SYNL register according to the QUICK
Engine Reference Manual.

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/wan/fsl_ucc_hdlc: call qe_setbrg only for loopback mode
Holger Brunck [Wed, 17 May 2017 15:24:36 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: call qe_setbrg only for loopback mode

We can't assume that we are always in loopback mode if rx and tx clock
have the same clock source. If we want to use HDLC busmode we also have
the same clock source but we are not in loopback mode. So move the
setting of the baudrate generator after the check for property for the
loopback mode.

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/wan/fsl_ucc_hdlc: fix incorrect memory allocation
Holger Brunck [Wed, 17 May 2017 15:24:35 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: fix incorrect memory allocation

We need space for the struct qe_bd and not for a pointer to this struct.
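
The class of bug can be shown with a short userspace sketch. struct
demo_bd is a hypothetical descriptor layout and calloc() stands in for
the kernel allocator; the point is only the sizeof(*ring) idiom, which
sizes the allocation by the pointed-to type rather than by the pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer descriptor; not the real struct qe_bd layout. */
struct demo_bd {
	uint16_t status;
	uint16_t length;
	uint32_t buf;
	uint64_t cookie;
};

/* Bytes needed for n descriptors (the buggy form used sizeof(ring)). */
size_t demo_bd_ring_bytes(size_t n)
{
	return n * sizeof(struct demo_bd);
}

/* Allocate a zeroed ring of n descriptors; caller frees with free(). */
struct demo_bd *demo_alloc_bd_ring(size_t n)
{
	struct demo_bd *ring;

	/* sizeof(*ring) tracks the element type even if it changes later. */
	ring = calloc(n, sizeof(*ring));
	return ring;
}
```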

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/wan/fsl_ucc_hdlc: fix wrong indentation
Holger Brunck [Wed, 17 May 2017 15:24:34 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: fix wrong indentation

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/wan/fsl_ucc_hdlc: fix uninitialized variable warnings
Holger Brunck [Wed, 17 May 2017 15:24:33 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: fix uninitialized variable warnings

This fixes the following compiler warnings:
drivers/net/wan/fsl_ucc_hdlc.c: In function 'ucc_hdlc_poll':
warning: 'skb' may be used uninitialized in this function
[-Wmaybe-uninitialized]
  skb->mac_header = skb->data - skb->head;

and

drivers/net/wan/fsl_ucc_hdlc.c: In function 'ucc_hdlc_probe':
drivers/net/wan/fsl_ucc_hdlc.c:1127:3: warning: 'utdm' may be used
uninitialized in this function [-Wmaybe-uninitialized]
   kfree(utdm);

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/wan/fsl_ucc_hdlc: cleanup debug traces
Holger Brunck [Wed, 17 May 2017 15:24:32 +0000 (17:24 +0200)]
net/wan/fsl_ucc_hdlc: cleanup debug traces

Some of the traces appear to be leftovers from basic driver
development. They can be removed now, as they cause noisy printouts.

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Cc: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqmi_wwan: add another Lenovo EM74xx device ID
Bjørn Mork [Wed, 17 May 2017 14:31:41 +0000 (16:31 +0200)]
qmi_wwan: add another Lenovo EM74xx device ID

In their infinite wisdom, and never ending quest for end user frustration,
Lenovo has decided to use a new USB device ID for the wwan modules in
their 2017 laptops.  The actual hardware is still the Sierra Wireless
EM7455 or EM7430, depending on region.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: do not inherit ipv6_{mc|ac|fl}_list from parent
Eric Dumazet [Wed, 17 May 2017 14:16:40 +0000 (07:16 -0700)]
sctp: do not inherit ipv6_{mc|ac|fl}_list from parent

SCTP needs fixes similar to 83eaddab4378 ("ipv6/dccp: do not inherit
ipv6_mc_list from parent"), otherwise bad things can happen.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoudp: make *udp*_queue_rcv_skb() functions static
Paolo Abeni [Wed, 17 May 2017 12:52:16 +0000 (14:52 +0200)]
udp: make *udp*_queue_rcv_skb() functions static

Since the udp memory accounting refactor, we no longer need to export
the *udp*_queue_rcv_skb() functions. Make them static and fix
a couple of sparse warnings:

net/ipv4/udp.c:1615:5: warning: symbol 'udp_queue_rcv_skb' was not
declared. Should it be static?
net/ipv6/udp.c:572:5: warning: symbol 'udpv6_queue_rcv_skb' was not
declared. Should it be static?

Fixes: 850cbaddb52d ("udp: use it's own memory accounting schema")
Fixes: c915fe13cbaa ("udplite: fix NULL pointer dereference")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago net: make struct net_device::tx_queue_len unsigned int
Alexey Dobriyan [Wed, 17 May 2017 10:30:44 +0000 (13:30 +0300)]
net: make struct net_device::tx_queue_len unsigned int

A 4 billion packet queue is unthinkable, so use a 32-bit value
for now.

Space savings on x86_64:

add/remove: 0/0 grow/shrink: 3/70 up/down: 16/-131 (-115)
function                                     old     new   delta
change_tx_queue_len                           94     108     +14
qdisc_create                                1176    1177      +1
alloc_netdev_mqs                            1124    1125      +1
xenvif_alloc                                 533     532      -1
x25_asy_setup                                167     166      -1
...
tun_queue_resize                             945     940      -5
pfifo_fast_enqueue                           167     162      -5
qfq_init_qdisc                               168     158     -10
tap_queue_resize                             810     799     -11
transmit                                     719     698     -21

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago bridge: netlink: check vlan_default_pvid range
Tobias Jungel [Wed, 17 May 2017 07:29:12 +0000 (09:29 +0200)]
bridge: netlink: check vlan_default_pvid range

Currently it is allowed to set the default pvid of a bridge to a value
above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
returns -EINVAL in case the pvid is out of bounds.

Reproduce by calling:

[root@test ~]# ip l a type bridge
[root@test ~]# ip l a type dummy
[root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
[root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
[root@test ~]# ip l s dummy0 master bridge0
[root@test ~]# bridge vlan
port vlan ids
bridge0  9999 PVID Egress Untagged

dummy0  9999 PVID Egress Untagged
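
The range check described above could look roughly like the following.
This is a minimal sketch, not the code from the patch:
check_default_pvid() is a hypothetical helper, and only the
VLAN_VID_MASK value (0xfff) and the -EINVAL return are taken from the
text. Whether 0xfff itself should also be rejected is left open here.

```c
#include <errno.h>
#include <stdint.h>

/* VLAN IDs are 12 bits wide; value as quoted in the commit message. */
#define VLAN_VID_MASK 0x0fff

/*
 * Hypothetical helper (not the kernel's br_validate()): returns 0 when
 * the requested default PVID fits in the 12-bit VLAN ID space and
 * -EINVAL when it is above VLAN_VID_MASK.
 */
static int check_default_pvid(uint16_t pvid)
{
	if (pvid > VLAN_VID_MASK)
		return -EINVAL;
	return 0;
}
```

With such a check in place, the vlan_default_pvid 9999 request from the
reproducer would be refused instead of being applied to the bridge.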

Fixes: 0f963b7592ef ("bridge: netlink: add support for default_pvid")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago udp: make function udp_skb_dtor_locked static
Colin Ian King [Wed, 17 May 2017 08:50:36 +0000 (09:50 +0100)]
udp: make function udp_skb_dtor_locked static

Function udp_skb_dtor_locked does not need to be in global scope,
so make it static to fix the sparse warning:

net/ipv4/udp.c: warning: symbol 'udp_skb_dtor_locked' was not
declared. Should it be static?

Fixes: 6dfb4367cd911d ("udp: keep the sk_receive_queue held when splicing")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago net: ethernet: faraday: To support device tree usage.
Greentime Hu [Wed, 17 May 2017 07:28:19 +0000 (15:28 +0800)]
net: ethernet: faraday: To support device tree usage.

Add device tree support for ftmac100.

Signed-off-by: Greentime Hu <green.hu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago Merge branch 'vhost_net-rx-batch-dequeuing'
David S. Miller [Thu, 18 May 2017 14:07:42 +0000 (10:07 -0400)]
Merge branch 'vhost_net-rx-batch-dequeuing'

Jason Wang says:

====================
vhost_net rx batch dequeuing

This series implements rx batching for vhost-net. It batches the
dequeuing from the skb_array exported by the underlying socket and
passes the skb back through msg_control to finish the userspace copy.
This is also a prerequisite for further batching on the rx path.

Tests show up to a 7.56% improvement in rx pps on top of batch
zeroing and no obvious change in TCP_STREAM/TCP_RR results.

Please review.

Thanks

Changes from V4:
- drop batch zeroing patch
- renew the performance numbers
- move skb pointer array out of vhost_net structure

Changes from V3:
- add batch zeroing patch to fix the build warnings

Changes from V2:
- rebase to net-next HEAD
- use unconsume helpers to put skb back on releasing
- introduce and use vhost_net internal buffer helpers
- renew performance numbers on top of batch zeroing

Changes from V1:
- switch to use for() in __ptr_ring_consume_batched()
- rename peek_head_len_batched() to fetch_skbs()
- use skb_array_consume_batched() instead of
  skb_array_consume_batched_bh() since no consumer runs in bh
- drop the lockless peeking patch since skb_array could be resized, so
  it's not safe to call the lockless one
====================

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago vhost_net: try batch dequing from skb array
Jason Wang [Wed, 17 May 2017 04:14:45 +0000 (12:14 +0800)]
vhost_net: try batch dequing from skb array

We used to dequeue one skb from skb_array per recvmsg() call. This can
be inefficient because of poor cache utilization and because the
spinlock is touched for every packet. This patch batches the dequeuing
by calling the batch dequeuing helpers explicitly on the exported skb
array and passing the skb back through msg_control so the underlying
socket can finish the userspace copy. Batch dequeuing is also a
prerequisite for further batching improvements on the receive path.
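
The batching idea can be sketched in user space as follows. This is an
illustrative model only, not the vhost_net or skb_array code: struct
ring, ring_produce() and ring_consume_batched() are hypothetical names,
and a counter stands in for the spinlock so the number of lock
acquisitions per batch is visible.

```c
#include <stddef.h>

#define RING_SIZE 256	/* hypothetical capacity, power of two */

struct ring {
	void *slot[RING_SIZE];
	size_t head;			/* next slot to consume */
	size_t tail;			/* next slot to fill */
	unsigned long lock_acquisitions; /* models spinlock traffic */
};

/* Append one item; returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring *r, void *item)
{
	if (r->tail - r->head == RING_SIZE)
		return -1;
	r->slot[r->tail++ % RING_SIZE] = item;
	return 0;
}

/*
 * Dequeue up to max items into out[] while "holding the lock" once,
 * instead of once per item. Returns the number of items dequeued.
 */
static size_t ring_consume_batched(struct ring *r, void **out, size_t max)
{
	size_t n = 0;

	r->lock_acquisitions++;		/* one acquisition per batch */
	while (n < max && r->head != r->tail)
		out[n++] = r->slot[r->head++ % RING_SIZE];
	return n;
}
```

In this model, draining 100 queued items with a batch size of 64 takes
two lock acquisitions rather than 100, which is the per-packet cost the
message above refers to.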

Tests were done by pktgen on tap with XDP1 in guest. Host is Intel(R)
Xeon(R) CPU E5-2650 0 @ 2.00GHz.

rx batch | pps
---------+------------------------------------------
0        | 2.25Mpps
1        | 2.33Mpps (+3.56%)
4        | 2.33Mpps (+3.56%)
16       | 2.35Mpps (+4.44%)
64       | 2.42Mpps (+7.56%) <- Default rx batching
128      | 2.40Mpps (+6.67%)
256      | 2.38Mpps (+5.78%)
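
The percentages in the table are consistent with relative gains over
the unbatched baseline of 2.25Mpps. A minimal sketch of that
arithmetic, with gain_percent() as a hypothetical helper:

```c
/*
 * Relative gain of a measured rate over a baseline rate, in percent.
 * Both rates must use the same unit (here: Mpps).
 */
static double gain_percent(double baseline, double measured)
{
	return (measured - baseline) / baseline * 100.0;
}
```

For example, gain_percent(2.25, 2.42) evaluates to roughly 7.56, which
matches the row for the default batch size of 64.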

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago tap: support receiving skb from msg_control
Jason Wang [Wed, 17 May 2017 04:14:44 +0000 (12:14 +0800)]
tap: support receiving skb from msg_control

This patch lets tap_recvmsg() receive an skb from its caller
through msg_control. vhost_net will be the first user.
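
The hand-off through msg_control can be modelled in user space as
below. This is a simplified sketch under stated assumptions, not the
tap code: struct message is a hypothetical stand-in for struct msghdr,
and struct queue, queue_pop() and recv_packet() are invented for
illustration.

```c
#include <stddef.h>

#define QUEUE_SIZE 8	/* hypothetical capacity */

struct packet {
	const char *data;
	size_t len;
};

/* Hypothetical, reduced stand-in for struct msghdr. */
struct message {
	void *control;	/* optional packet already dequeued by the caller */
};

struct queue {
	struct packet *items[QUEUE_SIZE];
	size_t head, tail;
};

/* Remove and return the oldest packet, or NULL if the queue is empty. */
static struct packet *queue_pop(struct queue *q)
{
	if (q->head == q->tail)
		return NULL;
	return q->items[q->head++ % QUEUE_SIZE];
}

/*
 * If the caller already holds a packet, take it from m->control and
 * leave the queue alone; otherwise fall back to dequeuing one.
 */
static struct packet *recv_packet(struct queue *q, const struct message *m)
{
	if (m && m->control)
		return m->control;
	return queue_pop(q);
}
```

The point of the design is that a caller that has already dequeued a
batch can feed the packets in one at a time without the receive path
touching the queue again.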

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago tun: support receiving skb through msg_control
Jason Wang [Wed, 17 May 2017 04:14:43 +0000 (12:14 +0800)]
tun: support receiving skb through msg_control

This patch lets tun_recvmsg() receive an skb from its caller
through msg_control. vhost_net will be the first user.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago tap: export skb_array
Jason Wang [Wed, 17 May 2017 04:14:42 +0000 (12:14 +0800)]
tap: export skb_array

This patch exports skb_array through tap_get_skb_array(). The caller
can then manipulate the skb array directly.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago tun: export skb_array
Jason Wang [Wed, 17 May 2017 04:14:41 +0000 (12:14 +0800)]
tun: export skb_array

This patch exports skb_array through tun_get_skb_array(). The caller
can then manipulate the skb array directly.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>