platform/kernel/linux-rpi.git
8 years agonet/mlx5e: Delay skb->data access
Saeed Mahameed [Wed, 20 Apr 2016 19:02:18 +0000 (22:02 +0300)]
net/mlx5e: Delay skb->data access

Move mlx5e_handle_csum and eth_type_trans to the end of
mlx5e_build_rx_skb to gain some more time before accessing
skb->data, to reduce cache misses.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Remove redundant barrier
Tariq Toukan [Wed, 20 Apr 2016 19:02:17 +0000 (22:02 +0300)]
net/mlx5e: Remove redundant barrier

The bit-op operation one line before is an explicit barrier
by itself.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Use napi_alloc_skb for RX SKB allocations
Tariq Toukan [Wed, 20 Apr 2016 19:02:16 +0000 (22:02 +0300)]
net/mlx5e: Use napi_alloc_skb for RX SKB allocations

Instead of netdev_alloc_skb, we use the napi_alloc_skb function
which is designated to allocate skbuff's for RX in a
channel-specific NAPI instance, and implies the IP packet alignment.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add fragmented memory support for RX multi packet WQE
Tariq Toukan [Wed, 20 Apr 2016 19:02:15 +0000 (22:02 +0300)]
net/mlx5e: Add fragmented memory support for RX multi packet WQE

If the allocation of a linear (physically continuous) MPWQE fails,
we allocate a fragmented MPWQE.

This is implemented via device's UMR (User Memory Registration)
which allows to register multiple memory fragments into ConnectX
hardware as a continuous buffer.
UMR registration is an asynchronous operation and is done via
ICO SQs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Added ICO SQs
Tariq Toukan [Wed, 20 Apr 2016 19:02:14 +0000 (22:02 +0300)]
net/mlx5e: Added ICO SQs

Added ICO (Internal Control Operations) SQ per channel to be used
for driver internal operations such as memory registration for
fragmented memory and nop requests upon ifconfig up.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Support RX multi-packet WQE (Striding RQ)
Tariq Toukan [Wed, 20 Apr 2016 19:02:13 +0000 (22:02 +0300)]
net/mlx5e: Support RX multi-packet WQE (Striding RQ)

Introduce the feature of multi-packet WQE (RX Work Queue Element)
referred to as (MPWQE or Striding RQ), in which WQEs are larger
and serve multiple packets each.

Every WQE consists of many strides of the same size, every received
packet is aligned to a beginning of a stride and is written to
consecutive strides within a WQE.

In the regular approach, each regular WQE is big enough to be capable
of serving one received packet of any size up to MTU or 64K in case of
device LRO is enabled, making it very wasteful when dealing with
small packets or device LRO is enabled.

For its flexibility, MPWQE allows a better memory utilization
(implying improvements in CPU utilization and packet rate) as packets
consume strides according to their size, preserving the rest of
the WQE to be available for other packets.

MPWQE default configuration:
Num of WQEs = 16
Strides Per WQE = 2048
Stride Size = 64 byte

The default WQEs memory footprint went from 1024*mtu (~1.5MB) to
16 * 2048 * 64 = 2MB per ring.
However, HW LRO can now be supported at no additional cost in memory
footprint, and hence we turn it on by default and get an even better
performance.

Performance tested on ConnectX4-Lx 50G.
To isolate the feature under test, the numbers below were measured with
HW LRO turned off. We verified that the performance just improves when
LRO is turned back on.

* Netperf single TCP stream:
- BW raised by 10-15% for representative packet sizes:
  default, 64B, 1024B, 1478B, 65536B.

* Netperf multi TCP stream:
- No degradation, line rate reached.

* Pktgen: packet rate raised by 2-10% for traffic of different message
sizes: 64B, 128B, 256B, 1024B, and 1500B.

* Pktgen: packet loss in bursts of small messages (64byte),
single stream:
- | num packets | packets loss before | packets loss after
  |     2K      |       ~ 1K          |       0
  |     8K      |       ~ 6K          |       0
  |     16K     |       ~13K          |       0
  |     32K     |       ~28K          |       0
  |     64K     |       ~57K          |     ~24K

As expected as the driver can receive as many small packets (<=64B) as
the number of total strides in the ring (default = 2048 * 16) vs. 1024
(default ring size regardless of packets size) before this feature.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Use function pointers for RX data path handling
Tariq Toukan [Wed, 20 Apr 2016 19:02:12 +0000 (22:02 +0300)]
net/mlx5e: Use function pointers for RX data path handling

In preparation for Striding RQ feature, which will need its own
RX handlers.
This patch does not change any functionality.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Use only close NUMA node for default RSS
Tariq Toukan [Wed, 20 Apr 2016 19:02:11 +0000 (22:02 +0300)]
net/mlx5e: Use only close NUMA node for default RSS

Distribute default RSS table uniformly over the rings of the
close NUMA node, instead of all available channels.
This way we enforce the preference of close rings over far ones.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Allocate set of queue counters per netdev
Rana Shahout [Wed, 20 Apr 2016 19:02:10 +0000 (22:02 +0300)]
net/mlx5e: Allocate set of queue counters per netdev

Connect all netdev RQs to this set of queue counters.
Also, add an "rx_out_of_buffer" counter to ethtool,
which indicates RX packet drops due to lack of receive
buffers.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Introduce device queue counters
Tariq Toukan [Wed, 20 Apr 2016 19:02:09 +0000 (22:02 +0300)]
net/mlx5: Introduce device queue counters

A queue counter can collect several statistics for one or more
hardware queues (QPs, RQs, etc ..) that the counter is attached to.

For Ethernet it will provide an "out of buffer" counter which
collects the number of all packets that are dropped due to lack
of software buffers.

Here we add device commands to alloc/query/dealloc queue counters.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bcmsysport-napi-updates'
David S. Miller [Thu, 21 Apr 2016 19:06:05 +0000 (15:06 -0400)]
Merge branch 'bcmsysport-napi-updates'

Florian Fainelli says:

====================
net: bcmsysport: utilize newer NAPI APIs

These two patches are very analoguous to what was already submitted for
BCMGENET and switch the SYSTEMPORT driver to utilizing __napi_schedule_irqoff()
and napi_complete_done for the RX NAPI context.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bcmsysport: use napi_complete_done()
Florian Fainelli [Wed, 20 Apr 2016 18:37:09 +0000 (11:37 -0700)]
net: bcmsysport: use napi_complete_done()

By using napi_complete_done(), we allow fine tuning of
/sys/class/net/ethX/gro_flush_timeout for higher GRO aggregation
efficiency for a Gbit NIC.

Check commit 24d2e4a50737 ("tg3: use napi_complete_done()") for details.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: bcmsysport: use __napi_schedule_irqoff()
Florian Fainelli [Wed, 20 Apr 2016 18:37:08 +0000 (11:37 -0700)]
net: bcmsysport: use __napi_schedule_irqoff()

Both bcm_sysport_tx_isr() and bcm_sysport_rx_isr() run in hard irq
context, we do not need to block irq again.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'nlattr_align'
David S. Miller [Thu, 21 Apr 2016 18:22:14 +0000 (14:22 -0400)]
Merge branch 'nlattr_align'

Nicolas Dichtel says:

====================
libnl: enhance API to ease 64bit alignment for attribute

Here is a proposal to add more helpers in the libnetlink to manage 64-bit
alignment issues.
Note that this series was only tested on x86 by tweeking
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and adding some traces.

The first patch adds helpers for 64bit alignment and other patches
use them.

We could also add helpers for nla_put_u64() and its variants if needed.

v1 -> v2:
 - remove patch #1
 - split patch #2 (now #1 and #2)
 - add nla_need_padding_for_64bit()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoip6mr: align RTA_MFC_STATS on 64-bit
Nicolas Dichtel [Thu, 21 Apr 2016 16:58:27 +0000 (18:58 +0200)]
ip6mr: align RTA_MFC_STATS on 64-bit

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipmr: align RTA_MFC_STATS on 64-bit
Nicolas Dichtel [Thu, 21 Apr 2016 16:58:26 +0000 (18:58 +0200)]
ipmr: align RTA_MFC_STATS on 64-bit

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agortnl: use the new API to align IFLA_STATS*
Nicolas Dichtel [Thu, 21 Apr 2016 16:58:25 +0000 (18:58 +0200)]
rtnl: use the new API to align IFLA_STATS*

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agolibnl: add more helpers to align attributes on 64-bit
Nicolas Dichtel [Thu, 21 Apr 2016 16:58:24 +0000 (18:58 +0200)]
libnl: add more helpers to align attributes on 64-bit

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoveth: Update features to include all tunnel GSO types
Alexander Duyck [Tue, 19 Apr 2016 18:02:26 +0000 (14:02 -0400)]
veth: Update features to include all tunnel GSO types

This patch adds support for the checksum enabled versions of UDP and GRE
tunnels.  With this change we should be able to send and receive GSO frames
of these types over the veth pair without needing to segment the packets.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE
Alexander Duyck [Tue, 19 Apr 2016 18:02:19 +0000 (14:02 -0400)]
netdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE

This patch folds NETIF_F_ALL_TSO into the bitmask for NETIF_F_GSO_SOFTWARE.
The idea is to avoid duplication of defines since the only difference
between the two was the GSO_UDP bit.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agogeneve: testing the wrong variable in geneve6_build_skb()
Dan Carpenter [Tue, 19 Apr 2016 14:30:56 +0000 (17:30 +0300)]
geneve: testing the wrong variable in geneve6_build_skb()

We intended to test "err" and not "skb".

Fixes: aed069df099c ('ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoNLA_BINARY misuse bug in HSR
Peter Heise [Tue, 19 Apr 2016 11:34:28 +0000 (13:34 +0200)]
NLA_BINARY misuse bug in HSR

Removed .type field from NLA to do proper length checking.
Reported by Daniel Borkmann and Julia Lawall.

Signed-off-by: Peter Heise <peter.heise@airbus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diag
Xin Long [Tue, 19 Apr 2016 07:10:01 +0000 (15:10 +0800)]
net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diag

EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates
back to before jiffies_to_msecs() has been introduced.

Now we can remove it and use jiffies_to_msecs().

Suggested-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoperf, bpf: minimize the size of perf_trace_() tracepoint handler
Alexei Starovoitov [Tue, 19 Apr 2016 03:11:50 +0000 (20:11 -0700)]
perf, bpf: minimize the size of perf_trace_() tracepoint handler

move trace_call_bpf() into helper function to minimize the size
of perf_trace_*() tracepoint handlers.
    text    data     bss     dec      hex filename
10541679 5526646 2945024 19013349 1221ee5 vmlinux_before
10509422 5526646 2945024 18981092 121a0e4 vmlinux_after

It may seem that perf_fetch_caller_regs() can also be moved,
but that is incorrect, since ip/sp will be wrong.

bpf+tracepoint performance is not affected, since
perf_swevent_put_recursion_context() is now inlined.
export_symbol_gpl can also be dropped.

No measurable change in normal perf tracepoints.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: remove tag_protocol from dsa_switch
Vivien Didelot [Mon, 18 Apr 2016 22:24:04 +0000 (18:24 -0400)]
net: dsa: remove tag_protocol from dsa_switch

Having the tag protocol in dsa_switch_driver for setup time and in
dsa_switch_tree for runtime is enough. Remove dsa_switch's one.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Thu, 21 Apr 2016 15:52:05 +0000 (11:52 -0400)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2016-04-20

This series contains updates to fm10k only.

Jacob provides majority of the changes in this series, starting with the
addition of helper functions to reduce code duplication and the amount
of code indentation.  Fixed the use or should we say abuse of the ethtool
stats API, which could result in corrupt memory or misleading statistic
output.  Added the appropriate rtnl_lock() and rtnl_unlock() to avoid
RCU warnings during AER events.  Come to find out, the PTP/1588 support
is not working with the current version of switch management software
and possibly never worked, so just remove support for PTP/1588 for now.
Fixed how error responses from the switch manager after a LPORT_MAP
request are handled, originally which were silently being ignored.
Fixed up code documentation to hopefully ease the code and comment
comprehension.  Fixed a possible NULL pointer dereference after a
kcalloc(), where when writing a new default redirection table, and we
needed to populate a new RSS table using ethtool_rxfh_indir_default().
We populate this table into a region of memory allocated using kcalloc()
but never check it for NULL.

Alex adds support for bulk transmit cleanup for fm10k, like he did for
all of our other drivers.

Ngai-Mint fixes a number of issues with the unicast and multicast address
syncs.  Where an issue would occur when the netdev is pre-configured to
either multicast mode and is enabled for the first time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofm10k: fix incorrect IPv6 extended header checksum
Jacob Keller [Thu, 7 Apr 2016 15:52:53 +0000 (08:52 -0700)]
fm10k: fix incorrect IPv6 extended header checksum

Check for and handle IPv6 extended headers so that Tx checksum offload
can be done. Also use skb_checksum_help for unexpected cases. This was
originally discovered in ixgbe.

Reported-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: consistently use Intel(R) for driver names
Jacob Keller [Thu, 7 Apr 2016 15:21:21 +0000 (08:21 -0700)]
fm10k: consistently use Intel(R) for driver names

Update every header file and other locations to consistently use
Intel(R) instead of just Intel. Also update copyright year of files
which we modified.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: fix possible null pointer deref after kcalloc
Jacob Keller [Thu, 7 Apr 2016 15:21:20 +0000 (08:21 -0700)]
fm10k: fix possible null pointer deref after kcalloc

When writing a new default redirection table, we needed to populate
a new RSS table using ethtool_rxfh_indir_default. We populated this
table into a region of memory allocated using kcalloc, but never checked
this for NULL. Fix this by moving the default table generation into
fm10k_write_reta. If this function is passed a table, use it. Otherwise,
generate the default table using ethtool_rxfh_indir_default, 4 at at
time.

Fixes: 0ea7fae44094 ("fm10k: use ethtool_rxfh_indir_default for default redirection table")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: Reset multicast mode when deleting lport
Ngai-Mint Kwan [Fri, 1 Apr 2016 23:17:39 +0000 (16:17 -0700)]
fm10k: Reset multicast mode when deleting lport

Deleting lport when multicast mode is configured to
FM10K_XCAST_MODE_ALLMULTI or FM10K_XCAST_MODE_PROMISC will result in
generating orphaned multicast-group entries in the switch manager.
Before deleting the lport, reset multicast mode to FM10K_XCAST_MODE_NONE
to flush out these multicast-group entries.

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: update comment regarding reserved bits check
Jacob Keller [Fri, 1 Apr 2016 23:17:38 +0000 (16:17 -0700)]
fm10k: update comment regarding reserved bits check

The original comment may be read incorrectly as referring to checking
the *entire* length is zero. However, it merely checks only the reserved
bits of both length and reserved in a small amount of code. Update the
comment to indicate this is a clever trick and clearly spell out that it
only checks the reserve bits.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: use different name than FM10K_VLAN_CLEAR for override bit
Jacob Keller [Fri, 1 Apr 2016 23:17:37 +0000 (16:17 -0700)]
fm10k: use different name than FM10K_VLAN_CLEAR for override bit

Use a new #define FM10K_VLAN_OVERRIDE even though we're using the exact
same bit. The reason for this is clarity in the code, otherwise you can
read FM10K_VLAN_CLEAR and think it should be removed. Also add a comment
explaining why the FM10K_VLAN_OVERRIDE bit is set.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: use 8bit notation instead of 10bit notation for diagram
Jacob Keller [Fri, 1 Apr 2016 23:17:36 +0000 (16:17 -0700)]
fm10k: use 8bit notation instead of 10bit notation for diagram

The diagram represents bit layout of the multi-bit VLAN update message
format. Typically these diagrams are drawn using some power of 2 as the
base, to more easily grasp where fields split. Although the numbers
above can make it somewhat easy to understand which bit you're looking
at, it makes the break points not line up. Re-draw the numbers using
base 8, and mark the bit values every 8 bits at the top. This should
make it more easy to grasp the table quickly.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: fix documentation of fm10k_tlv_parse_attr
Jacob Keller [Fri, 1 Apr 2016 23:17:35 +0000 (16:17 -0700)]
fm10k: fix documentation of fm10k_tlv_parse_attr

fm10k_tlv_parse_attr is supposed to return FM10K_NOT_IMPLEMENTED for any
TLV who's attribute id lies outside the range of results. It does not do
this today. In addition, the documentation does not indicate that other
attributes which are not implemented for a given TLV will be silently
ignored. Fix this. Clean up the logic so that we don't rely on the fact
that FM10K_NOT_IMPLEMENTED is greater than zero, as this can easily
cause confusion.

A future extension could look into some way of reporting unknown TLVs
in order to make issues more easily discoverable. We can't just return
FM10K_NOT_IMPLEMENTED here because we don't want to drop the entire
message if it has an unknown TLV.

While here, update the copyright year.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: do not disable PCI device in fm10k_io_error_detected
Jacob Keller [Fri, 1 Apr 2016 23:17:34 +0000 (16:17 -0700)]
fm10k: do not disable PCI device in fm10k_io_error_detected

fm10k_io_error_detected() does not need to call pci_disable_device(). In
the cases where the reset needs to occur, the stack flow will result in
calling fm10k_remove() which already disables the PCI device. If we
leave the pci_disable_device(), we result in a warning about disabling
an already disabled device.

Many PCI drivers do call pci_disable_device() in their .error_detected()
routines, but it does not appear to be required. In addition, these
drivers have a check "is_pci_enabled()" call in their remove routines,
which is how they chose to handle the duplicate device disable.

This seems incorrect, since the PCI device structure is reference
counted. It is very possible that the reference count for the PCI device
could be greater than 1. In this case, you would remove the PCI device
within the error_detected routine, reducing count to 1, then remove it
again in the remove function, reducing it to zero. This would result in
yet another disable somewhere else failing. Thus, we shouldn't be using
is_pci_enabled() to check for this issue. Instead, just remove the
extraneous pci_device_disable() found within the error_detected routine.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: correctly handle LPORT_MAP error
Jacob Keller [Fri, 1 Apr 2016 23:17:33 +0000 (16:17 -0700)]
fm10k: correctly handle LPORT_MAP error

Currently, any error responses from the switch manager after an
LPORT_MAP request are silently ignored. At most the mailbox message will
be reported as an error. This can result in unexpected behavior when the
switch manager has configured a port with zero bandwidth. Add support
for reading the fm10k_swapi_error structure from LPORT_MAP responses.

If the message contains the necessary TLV and has a non-zero error code,
report link down, clear the dglort_map, and delay the next
get_host_state call by a reasonable delay. Also log an error message
indicating that the LPORT_MAP request failed.

The delay ensures preventing an interrupt storm on the switch manager,
and reduces the number of mailbox messages we send in this scenario
drastically.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: Fix multicast mode sync issues
Ngai-Mint Kwan [Fri, 1 Apr 2016 23:17:32 +0000 (16:17 -0700)]
fm10k: Fix multicast mode sync issues

Multicast mode checking is no longer a requirement to perform unicast
and multicast address syncs. Specifically, a device operating in
promiscuous and/or all multicast mode is not excluded. The issue occurs
when the netdev is pre-configured to either multicast mode and is
enabled for the first time. The multicast-group table in the Switch
Manager will be missing obvious multicast entries associated to this
netdev.

Changes were also made to disallow unicast and multicast syncs with
VLAN 0. The Switch Manager considers VLAN 0 to be an invalid entry.
Requests with VLAN 0 by the netdev are only generated when the driver is
freshly installed and the default VID is not set.

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: drop 1588 support
Jacob Keller [Fri, 1 Apr 2016 23:17:31 +0000 (16:17 -0700)]
fm10k: drop 1588 support

The 1588 support within fm10k does not work correctly with the current
version of the switch management software, and likely never worked
correctly to begin with. Remove support for PTP/1588. Update copyright
year for all these files while we're touching them.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: prevent RCU issues during AER events
Jacob Keller [Fri, 11 Mar 2016 17:52:32 +0000 (09:52 -0800)]
fm10k: prevent RCU issues during AER events

During an AER action response, we were calling fm10k_close without
holding the rtnl_lock() which could lead to possible RCU warnings being
produced due to 64bit stat updates among other causes. Similarly, we
need rtnl_lock() around fm10k_open during fm10k_io_resume. Follow the
same pattern elsewhere in the driver and protect the entire open/close
sequence.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: use DRV_SUMMARY to reduce code duplication
Jacob Keller [Thu, 10 Mar 2016 00:36:08 +0000 (16:36 -0800)]
fm10k: use DRV_SUMMARY to reduce code duplication

Use DRV_SUMMARY, similar to DRV_VERSION so that we don't have to
duplicate the driver summary in multiple places.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: Add support for bulk Tx cleanup & cleanup boolean logic
Alexander Duyck [Mon, 7 Mar 2016 17:30:15 +0000 (09:30 -0800)]
fm10k: Add support for bulk Tx cleanup & cleanup boolean logic

This patch enables bulk free in Tx cleanup for fm10k and cleans up the
boolean logic in the polling routines for fm10k in the hopes of avoiding
any mix-ups similar to what occurred with i40e and i40evf.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: remove debug-statistics support
Jacob Keller [Fri, 4 Mar 2016 23:37:48 +0000 (15:37 -0800)]
fm10k: remove debug-statistics support

This change fixes an (ab)use of the ethtool stats API, which could
result in corrupt memory or misleading stat output. The ethtool stats
API is not robust enough to handle varying number of statistics due to
how it requests the size and allocates memory. Remove the poorly conceived
support originally added for extra debug statistics. In the future,
a new stats API may open up the ability to display these statistics.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agofm10k: add helper functions to set strings and data for ethtool stats
Jacob Keller [Fri, 1 Apr 2016 18:15:09 +0000 (11:15 -0700)]
fm10k: add helper functions to set strings and data for ethtool stats

Reduce duplicate code and the amount of indentation by adding
fm10k_add_stat_strings and fm10k_add_ethtool_stats functions which help
add fm10k_stat structures to the ethtool stats callbacks. This helps
increase ease of use for future stat additions, and increases code
readability.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
8 years agortnetlink: add new RTM_GETSTATS message to dump link stats
Roopa Prabhu [Wed, 20 Apr 2016 15:43:43 +0000 (08:43 -0700)]
rtnetlink: add new RTM_GETSTATS message to dump link stats

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a light weight netlink message
to explicity query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
        [IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.

Future possible new types of stat attributes:
link af stats:
    - IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
    - IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
extended stats:
    - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
      vlan, vxlan etc)
    - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
      available via ethtool today)

This patch also declares a filter mask for all stat attributes.
User has to provide a mask of stats attributes to query. filter mask
can be specified in the new hdr 'struct if_stats_msg' for stats messages.
Other important field in the header is the ifindex.

This api can also include attributes for global stats (eg tcp) in the future.
When global stats are included in a stats msg, the ifindex in the header
must be zero. A single stats message cannot contain both global and
netdev specific stats. To easily distinguish them, netdev specific stat
attributes name are prefixed with IFLA_STATS_LINK_

Without any attributes in the filter_mask, no stats will be returned.

This patch has been tested with mofified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: nla_align_64bit() needs to test the right pointer.
David S. Miller [Wed, 20 Apr 2016 19:32:54 +0000 (15:32 -0400)]
net: nla_align_64bit() needs to test the right pointer.

Netlink messages are appended, one object at a time, to the end of
the SKB.  Therefore we need to test skb_tail_pointer() not skb->data
for alignment.

Fixes: 35c5845957c7 ("net: Add helpers for 64-bit aligning netlink attributes.")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fix HAVE_EFFICIENT_UNALIGNED_ACCESS typos
Eric Dumazet [Wed, 20 Apr 2016 14:31:31 +0000 (07:31 -0700)]
net: fix HAVE_EFFICIENT_UNALIGNED_ACCESS typos

HAVE_EFFICIENT_UNALIGNED_ACCESS needs CONFIG_ prefix.

Also add a comment in nla_align_64bit() explaining we have
to add a padding if current skb->data is aligned, as it
certainly can be confusing.

Fixes: 35c5845957c7 ("net: Add helpers for 64-bit aligning netlink attributes.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/hsr: Fixed version field in ENUM
Peter Heise [Wed, 20 Apr 2016 07:08:29 +0000 (09:08 +0200)]
net/hsr: Fixed version field in ENUM

New field (IFLA_HSR_VERSION) was added in the middle of an existing
ENUM and would break kernel ABI, therefore moved to the end.
Reported by Stephen Hemminger.

Signed-off-by: Peter Heise <peter.heise@airbus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: kill circular reference with slave priv
Vivien Didelot [Mon, 18 Apr 2016 20:10:24 +0000 (16:10 -0400)]
net: dsa: kill circular reference with slave priv

The dsa_slave_priv structure does not need a pointer to its net_device.
Kill it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bpf_event_output'
David S. Miller [Wed, 20 Apr 2016 00:26:11 +0000 (20:26 -0400)]
Merge branch 'bpf_event_output'

Daniel Borkmann says:

====================
BPF updates

This minor set adds a new helper bpf_event_output() for eBPF cls/act
program types which allows to pass events to user space applications.
For details, please see individual patches.

v1 -> v2:
  - Address kbuild bot found compile issue in patch 2
  - Rest as is
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: add event output helper for notifications/sampling/logging
Daniel Borkmann [Mon, 18 Apr 2016 19:01:24 +0000 (21:01 +0200)]
bpf: add event output helper for notifications/sampling/logging

This patch adds a new helper for cls/act programs that can push events
to user space applications. For networking, this can be f.e. for sampling,
debugging, logging purposes or pushing of arbitrary wake-up events. The
idea is similar to a43eec304259 ("bpf: introduce bpf_perf_event_output()
helper") and 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example").

The eBPF program utilizes a perf event array map that user space populates
with fds from perf_event_open(), the eBPF program calls into the helper
f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw))
so that the raw data is pushed into the fd f.e. at the map index of the
current CPU.

User space can poll/mmap/etc on this and has a data channel for receiving
events that can be post-processed. The nice thing is that since the eBPF
program and user space application making use of it are tightly coupled,
they can define their own arbitrary raw data format and what/when they
want to push.

While f.e. packet headers could be one part of the meta data that is being
pushed, this is not a substitute for things like packet sockets as whole
packet is not being pushed and push is only done in a single direction.
Intention is more of a generically usable, efficient event pipe to applications.
Workflow is that tc can pin the map and applications can attach themselves
e.g. after cls/act setup to one or multiple map slots, demuxing is done by
the eBPF program.

Adding this facility is with minimal effort, it reuses the helper
introduced in a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
and we get its functionality for free by overloading its BPF_FUNC_ identifier
for cls/act programs, ctx is currently unused, but will be made use of in
future. Example will be added to iproute2's BPF example files.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_output
Daniel Borkmann [Mon, 18 Apr 2016 19:01:23 +0000 (21:01 +0200)]
bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_output

Add a BPF_F_CURRENT_CPU flag to optimize the use-case where user space has
per-CPU ring buffers and the eBPF program pushes the data into the current
CPU's ring buffer which saves us an extra helper function call in eBPF.
Also, make sure to properly reserve the remaining flags which are not used.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoarcnet: com90xx: add __init attribute
Julia Lawall [Mon, 18 Apr 2016 14:55:35 +0000 (16:55 +0200)]
arcnet: com90xx: add __init attribute

Add __init attribute on a function that is only called from other __init
functions and that is not inlined, at least with gcc version 4.8.4 on an
x86 machine with allyesconfig.  Currently, the function is put in the
.text.unlikely segment.  Declaring it as __init will cause it to be put in
the .init.text and to disappear after initialization.

The result of objdump -x on the function before the change is as follows:

0000000000000000 l     F .text.unlikely 00000000000000bf check_mirror

And after the change it is as follows:

0000000000000000 l     F .init.text 00000000000000ba check_mirror

Done with the help of Coccinelle.  The semantic patch checks for local
static non-init functions that are called from an __init function and are
not called from any other function.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Michael Grzeschik <mgr@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ipv6/addrconf: fix sysctl table indentation
Konstantin Khlebnikov [Mon, 18 Apr 2016 11:41:17 +0000 (14:41 +0300)]
net/ipv6/addrconf: fix sysctl table indentation

Separated from previous patch for readability.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/ipv6/addrconf: simplify sysctl registration
Konstantin Khlebnikov [Mon, 18 Apr 2016 11:41:10 +0000 (14:41 +0300)]
net/ipv6/addrconf: simplify sysctl registration

Struct ctl_table_header holds pointer to sysctl table which could be used
for freeing it after unregistration. IPv4 sysctls already use that.
Remove redundant NULL assignment: ndev allocated using kzalloc.

This also saves some bytes: sysctl table could be shorter than
DEVCONF_MAX+1 if some options are disable in config.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Add helpers for 64-bit aligning netlink attributes.
David S. Miller [Tue, 19 Apr 2016 23:49:29 +0000 (19:49 -0400)]
net: Add helpers for 64-bit aligning netlink attributes.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: Align IFLA_STATS64 attributes properly on architectures that need it.
David S. Miller [Tue, 19 Apr 2016 18:30:10 +0000 (14:30 -0400)]
net: Align IFLA_STATS64 attributes properly on architectures that need it.

Since the nlattr header is 4 bytes in size, it can cause the netlink
attribute payload to not be 8-byte aligned.

This is particularly troublesome for IFLA_STATS64 which contains 64-bit
statistic values.

Solve this by creating a dummy IFLA_PAD attribute which has a payload
which is zero bytes in size.  When HAVE_EFFICIENT_UNALIGNED_ACCESS is
false, we insert an IFLA_PAD attribute into the netlink response when
necessary such that the IFLA_STATS64 payload will be properly aligned.

With help and suggestions from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: w5100: don't build spi driver without w5100
Arnd Bergmann [Mon, 18 Apr 2016 21:58:30 +0000 (23:58 +0200)]
net: w5100: don't build spi driver without w5100

The w5100-spi driver front-end only makes sense when the w5100
core driver is enabled, not for a configuration that only has w5300:

drivers/net/built-in.o: In function `w5100_spi_remove':
drivers/net/ethernet/wiznet/w5100-spi.c:277: undefined reference to `w5100_remove'
drivers/net/built-in.o: In function `w5100_spi_probe':
drivers/net/ethernet/wiznet/w5100-spi.c:272: undefined reference to `w5100_probe'
drivers/net/built-in.o: In function `w5200_spi_init':
drivers/net/ethernet/wiznet/w5100-spi.c:125: undefined reference to `w5100_ops_priv'
drivers/net/built-in.o: In function `w5200_spi_readbulk':
drivers/net/ethernet/wiznet/w5100-spi.c:125: undefined reference to `w5100_ops_priv'
drivers/net/built-in.o: In function `w5200_spi_writebulk':
drivers/net/ethernet/wiznet/w5100-spi.c:125: undefined reference to `w5100_ops_priv'
drivers/net/built-in.o:(.data+0x3ed1c): undefined reference to `w5100_pm_ops'

This adds an appropriate Kconfig dependency.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 630cf09751fe ("net: w5100: support SPI interface mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: avoid warning for wrong pointer cast
Arnd Bergmann [Sat, 16 Apr 2016 20:29:33 +0000 (22:29 +0200)]
bpf: avoid warning for wrong pointer cast

Two new functions in bpf contain a cast from a 'u64' to a
pointer. This works on 64-bit architectures but causes a warning
on all 32-bit architectures:

kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output_tp':
kernel/trace/bpf_trace.c:350:13: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  u64 ctx = *(long *)r1;

This changes the cast to first convert the u64 argument into a uintptr_t,
which is guaranteed to be the same size as a pointer.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 9940d67c93b5 ("bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint programs")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoof_mdio: make of_mdiobus_register_{device|phy}() *void*
Sergei Shtylyov [Sat, 16 Apr 2016 22:05:19 +0000 (01:05 +0300)]
of_mdio: make of_mdiobus_register_{device|phy}() *void*

The results of of_mdiobus_register_{device|phy}() are never checked, so we
can make  both these functions *void*...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoenic: set netdev->vlan_features
Govindarajulu Varadarajan [Fri, 15 Apr 2016 19:10:43 +0000 (00:40 +0530)]
enic: set netdev->vlan_features

Driver sets vlan_feature to netdev->features as hardware supports all of
them on vlan interface.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: Implement support for VF drivers on Hyper-V
KY Srinivasan [Thu, 14 Apr 2016 23:31:54 +0000 (16:31 -0700)]
hv_netvsc: Implement support for VF drivers on Hyper-V

Support VF drivers on Hyper-V. On Hyper-V, each VF instance presented to
the guest has an associated synthetic interface that shares the MAC address
with the VF instance. Typically these are bonded together to support
live migration. By default, the host delivers all the incoming packets
on the synthetic interface. Once the VF is up, we need to explicitly switch
the data path on the host to divert traffic onto the VF interface. Even after
switching the data path, broadcast and multicast packets are always delivered
on the synthetic interface and these will have to be injected back onto the
VF interface (if VF is up).
This patch implements the necessary support in netvsc to support Linux
VF drivers.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'fec-ksettings'
David S. Miller [Mon, 18 Apr 2016 18:45:09 +0000 (14:45 -0400)]
Merge branch 'fec-ksettings'

Philippe Reynes says:

====================
fec: ethtool: move to new api {get|set}_link_ksettings

Ethtool has a new api {get|set}_link_ksettings that deprecate
the old api {get|set}_settings. We update the fec driver to use
this new ethtool api.

For this first version, I've converted old u32 value in phy structure
to link_modes structure. Another way would be to replace u32 in
phy structure to use DECLARE_LINK_MODE_MASK for advertising, ....
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofec: move to new ethtool api {get|set}_link_ksettings
Philippe Reynes [Thu, 14 Apr 2016 22:35:01 +0000 (00:35 +0200)]
fec: move to new ethtool api {get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move the fec driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agophy: add generic function to support ksetting support
Philippe Reynes [Thu, 14 Apr 2016 22:35:00 +0000 (00:35 +0200)]
phy: add generic function to support ksetting support

The old ethtool api (get_setting and set_setting) has
generic phy functions phy_ethtool_sset and phy_ethtool_gset.
To supprt the new ethtool api (get_link_ksettings and
set_link_ksettings), we add generic phy function
phy_ethtool_ksettings_get and phy_ethtool_ksettings_set.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: ethtool: export conversion function between u32 and link mode
Philippe Reynes [Thu, 14 Apr 2016 22:34:59 +0000 (00:34 +0200)]
net: ethtool: export conversion function between u32 and link mode

The function convert_legacy_u32_to_link_mode and
convert_link_mode_to_legacy_u32 may be used outside
of ethtool.c. We rename them to ethtool_convert_...
and export them, so we could use them in others
drivers and modules.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotun: don't require serialization lock on tx
Paolo Abeni [Thu, 14 Apr 2016 16:39:39 +0000 (18:39 +0200)]
tun: don't require serialization lock on tx

The current tun_net_xmit() implementation don't need any external
lock since it relies on rcu protection for the tun data structure
and on socket queue lock for skb queuing.

This patch set the NETIF_F_LLTX feature bit in the tun device, so
that on xmit, in absence of qdisc, no serialization lock is acquired
by the caller.

The user space can remove the default tun qdisc with:

tc qdisc replace dev <tun device name> root noqueue

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoudp: fix if statement in SIOCINQ ioctl
Dan Carpenter [Mon, 18 Apr 2016 08:44:49 +0000 (11:44 +0300)]
udp: fix if statement in SIOCINQ ioctl

We deleted a line of code and accidentally made the "return put_user()"
part of the if statement when it's supposed to be unconditional.

Fixes: 9f9a45beaa96 ('udp: do not expect udp headers on ioctl SIOCINQ')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agortnetlink: rtnl_fill_stats: avoid an unnecssary stats copy
Roopa Prabhu [Sat, 16 Apr 2016 03:36:25 +0000 (20:36 -0700)]
rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy

This patch passes netlink attr data ptr directly to dev_get_stats
thus elimiating a stats copy.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'dsa-mv88e6xxx-switch-factorization'
David S. Miller [Sun, 17 Apr 2016 22:54:15 +0000 (18:54 -0400)]
Merge branch 'dsa-mv88e6xxx-switch-factorization'

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: factorize switch info

This patchset factorizes the mv88e6xxx code by sharing a new extendable
info structure to store static data such as switch family, product
number, number of ports, number of databases and the name.

The next step is to add a "flags" bitmap member to the info structure in
order to simplify the shared code with a feature-based logic instead of
checking their family/ID.

This is a step forward having a single mv88e6xxx driver supporting many
similar devices, like any usual Linux driver.

Changes v3 -> v4:
  - constify probed name in DSA
  - rebase patchset above conflicting commit 48ace4e

Changes v2 -> v3:
  - update commit messages and add Andrew's tags
  - keep the info lookup code in a separated function
  - split the single switch ID reading in probe in a new commit

Changes v1 -> v2:
  - define PORT_SWITCH_ID_PROD_NUM_* values
  - use plain struct mv88e6xxx_info
  - remove non used yet ps->rev
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: remove switch ID from ps
Vivien Didelot [Sun, 17 Apr 2016 17:24:03 +0000 (13:24 -0400)]
net: dsa: mv88e6xxx: remove switch ID from ps

ps->id is not needed anymore, so remove it as well as the related
defined values.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: add number of db to info
Vivien Didelot [Sun, 17 Apr 2016 17:24:02 +0000 (13:24 -0400)]
net: dsa: mv88e6xxx: add number of db to info

Add the number of databases to the info structure.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: add number of ports to info
Vivien Didelot [Sun, 17 Apr 2016 17:24:01 +0000 (13:24 -0400)]
net: dsa: mv88e6xxx: add number of ports to info

Drop the ps->num_ports variable in favor of a new member of the info
structure. This removes the need to assign it at setup time.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: add family to info
Vivien Didelot [Sun, 17 Apr 2016 17:24:00 +0000 (13:24 -0400)]
net: dsa: mv88e6xxx: add family to info

Add an mv88e6xxx_family enum to the info structure for better family
indentification.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: add switch info
Vivien Didelot [Sun, 17 Apr 2016 17:23:59 +0000 (13:23 -0400)]
net: dsa: mv88e6xxx: add switch info

Add a new switch info structure which is meant to store switch models
static information, such as product number, name, number of ports,
number of databases, etc.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: read switch ID in probe
Vivien Didelot [Sun, 17 Apr 2016 17:23:58 +0000 (13:23 -0400)]
net: dsa: mv88e6xxx: read switch ID in probe

Read the switch ID only once, at probe time, to avoid multiple read
accesses and MII bus checking.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: drop revision probing
Vivien Didelot [Sun, 17 Apr 2016 17:23:57 +0000 (13:23 -0400)]
net: dsa: mv88e6xxx: drop revision probing

There is no point in having a special case for the revision when probing
a switch model. The code gets cluttered with unnecessary defines, and
leads to errors when code such as mv88e6131_setup compares
PORT_SWITCH_ID_6131_B2 to ps->id which masks the revision.

Drop every revision definition, and lookup only the product number.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: drop double ds assignment
Vivien Didelot [Sun, 17 Apr 2016 17:23:56 +0000 (13:23 -0400)]
net: dsa: mv88e6xxx: drop double ds assignment

Every driver assigns ps->ds even though it gets assigned in the shared
mv88e6xxx_setup_common function. Kill redundancy.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: constify probed name
Vivien Didelot [Sun, 17 Apr 2016 17:23:55 +0000 (13:23 -0400)]
net: dsa: constify probed name

Change the dsa_switch_driver.probe function to return a const char *.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'nfp-next'
David S. Miller [Sun, 17 Apr 2016 02:34:40 +0000 (22:34 -0400)]
Merge branch 'nfp-next'

Jakub Kicinski says

====================
nfp: cleanups and improvements

Main purpose of this set is to get rid of doing potentially long
mdelay()s but it also contains some trivial changes I've accumulated.
First two patches fix harmless copy-paste errors, next two clean up
the documentation and remove unused defines.  Patch 5 clarifies the
interpretation of RX descriptor fields.  Patch 6, by far the biggest,
adds ability to perform FW reconfig asynchronously thanks to which
we can stop using mdelay().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: add async reconfiguration mechanism
Jakub Kicinski [Sat, 16 Apr 2016 10:25:54 +0000 (11:25 +0100)]
nfp: add async reconfiguration mechanism

Some callers of nfp_net_reconfig() are in atomic context so
we used to busy wait for commands to complete.  In worst case
scenario that means locking up a core for up to 5 seconds
when a command times out.  Lets add a timer-based mechanism
of asynchronously checking whether reconfiguration completed
successfully for atomic callers to use.  Non-atomic callers
can now just sleep.

The approach taken is quite simple because (1) synchronous
reconfigurations always happen under RTNL (or before device
is registered); (2) we can coalesce pending reconfigs.
There is no need for request queues, timer which eventually
takes a look at reconfiguration result to report errors is
good enough.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: remove buggy RX buffer length validation
Jakub Kicinski [Sat, 16 Apr 2016 10:25:53 +0000 (11:25 +0100)]
nfp: remove buggy RX buffer length validation

Meaning of data_len and meta_len RX WB descriptor fields is
slightly confusing.  Add a comment with a diagram clarifying
the layout.  Also remove the buffer length validation:
(a) it's imprecise for static rx-offsets; (b) if firmware
is buggy enough to DMA past the end of the buffer
WARN_ON_ONCE() doesn't seem like a strong enough response.
skb_put() will do the checking for us anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: remove unused suspicious mask defines
Jakub Kicinski [Sat, 16 Apr 2016 10:25:52 +0000 (11:25 +0100)]
nfp: remove unused suspicious mask defines

NFP_NET_RXR_MASK sounds like a mask which could be used on
NFP_NET_CFG_RXRS_ENABLE register but its value is quite
strange.  In fact there are no users of this define so let's
just remove it.  Same for TX rings.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: correct names of constants in comments
Jakub Kicinski [Sat, 16 Apr 2016 10:25:51 +0000 (11:25 +0100)]
nfp: correct names of constants in comments

Documentation in comments lacks CFG in some names.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: remove unnecessary static
Jakub Kicinski [Sat, 16 Apr 2016 10:25:50 +0000 (11:25 +0100)]
nfp: remove unnecessary static

There is no reason for those local variables to be static.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonfp: check the right pointer for errors
Jakub Kicinski [Sat, 16 Apr 2016 10:25:49 +0000 (11:25 +0100)]
nfp: check the right pointer for errors

Correct checking error condition on wrong pointer -
copy/paste mistake most likely.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'IFF_NO_QUEUE-followups'
David S. Miller [Sun, 17 Apr 2016 02:02:14 +0000 (22:02 -0400)]
Merge branch 'IFF_NO_QUEUE-followups'

Phil Sutter says:

====================
Minor IFF_NO_QUEUE conversion follow-up

The following series converts two further drivers away from setting
'tx_queue_len = 0' to adding IFF_NO_QUEUE to priv_flags instead.

The first one, rtl8188eu in staging didn't exist back when all drivers
were converted. The second one, openvswitch seems to have slipped through
my grep'ing back then, no idea why.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoopenvswitch: Convert to using IFF_NO_QUEUE
Phil Sutter [Fri, 15 Apr 2016 17:14:20 +0000 (19:14 +0200)]
openvswitch: Convert to using IFF_NO_QUEUE

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostaging: rtl8188eu: Convert to using IFF_NO_QUEUE
Phil Sutter [Fri, 15 Apr 2016 17:14:19 +0000 (19:14 +0200)]
staging: rtl8188eu: Convert to using IFF_NO_QUEUE

Cc: Jakub Sitnicki <jsitnicki@gmail.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'fjes-next'
David S. Miller [Sun, 17 Apr 2016 01:51:01 +0000 (21:51 -0400)]
Merge branch 'fjes-next'

Taku Izumi says:

====================
FUJITSU Extended Socket driver version 1.1

This patchsets update FUJITSU Extended Socket network driver into version 1.1.
This mainly includes some improvements and minor bugfix.

v1->v2:
  - Remove ioctl and debugfs facility according to David comment
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofjes: Update fjes driver version : 1.1
Taku Izumi [Fri, 15 Apr 2016 02:25:52 +0000 (11:25 +0900)]
fjes: Update fjes driver version : 1.1

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofjes: Introduce spinlock for rx_status
Taku Izumi [Fri, 15 Apr 2016 02:25:46 +0000 (11:25 +0900)]
fjes: Introduce spinlock for rx_status

This patch introduces spinlock of rx_status for
proper excusive control.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofjes: Enhance changing MTU related work
Taku Izumi [Fri, 15 Apr 2016 02:25:40 +0000 (11:25 +0900)]
fjes: Enhance changing MTU related work

This patch enhances the fjes_change_mtu() method
by introducing new flag named FJES_RX_MTU_CHANGING_DONE
in rx_status. At the same time, default MTU value is
changed into 65510 bytes.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofjes: fix bitwise check bug in fjes_raise_intr_rxdata_task
Taku Izumi [Fri, 15 Apr 2016 02:25:34 +0000 (11:25 +0900)]
fjes: fix bitwise check bug in fjes_raise_intr_rxdata_task

In fjes_raise_intr_rxdata_task(), there's a bug of bitwise
check because of missing "& FJES_RX_POLL_WORK".
This patch fixes this bug.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofjes: fix incorrect statistics information in fjes_xmit_frame()
Taku Izumi [Fri, 15 Apr 2016 02:25:27 +0000 (11:25 +0900)]
fjes: fix incorrect statistics information in fjes_xmit_frame()

There are bugs of acounting statistics in fjes_xmit_frame().
Accounting self stats is wrong. accounting stats of other
EPs to be transmitted  is right.
This patch fixes this bug.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofjes: optimize timeout value
Taku Izumi [Fri, 15 Apr 2016 02:25:21 +0000 (11:25 +0900)]
fjes: optimize timeout value

This patch optimizes the following timeout value.
 - FJES_DEVICE_RESET_TIMEOUT
 - FJES_COMMAND_REQ_TIMEOUT
 - FJES_COMMAND_REQ_BUFF_TIMEOUT

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostmmac: socfpga: remove extra call to socfpga_dwmac_setup
Dinh Nguyen [Fri, 15 Apr 2016 01:42:29 +0000 (20:42 -0500)]
stmmac: socfpga: remove extra call to socfpga_dwmac_setup

In the socfpga_dwmac_probe function, we have a call to socfpga_dwmac_setup,
which is already called from socfpga_dwmac_init later in the probe function.
Remove this extra call to socfpga_dwmac_setup.

Also we should not be calling socfpga_dwmac_setup() directly without wrapping
it around the proper reset assert/deasserts. That is because the
socfpga_dwmac_setup() is setting up PHY modes in the system manager, and it
is requires the EMAC's to be in reset during the PHY setup.

Reported-by: Matthew Gerlach <mgerlach@opensource.altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodsa: mv88e6xxx: Kill the REG_READ and REG_WRITE macros
Andrew Lunn [Thu, 14 Apr 2016 21:47:12 +0000 (23:47 +0200)]
dsa: mv88e6xxx: Kill the REG_READ and REG_WRITE macros

These macros hide a ds variable and a return statement on error, which
can lead to locking issues. Kill them off.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetdev_features: Add NETIF_F_TSO_MANGLEID to NETIF_F_ALL_TSO
Alexander Duyck [Thu, 14 Apr 2016 21:04:34 +0000 (17:04 -0400)]
netdev_features: Add NETIF_F_TSO_MANGLEID to NETIF_F_ALL_TSO

I realized that when I added NETIF_F_TSO_MANGLEID as a TSO type I forgot to
add it to NETIF_F_ALL_TSO.  This patch corrects that so the flag will be
included correctly.

The result should be minor as it was only used by a few drivers and in a
few specific cases such as when NETIF_F_SG was not supported on a device so
the TSO flags were cleared.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'ipv6-gre-offloads'
David S. Miller [Sat, 16 Apr 2016 23:09:14 +0000 (19:09 -0400)]
Merge branch 'ipv6-gre-offloads'

Alexander Duyck says:

====================
Add support for offloads with IPv6 GRE tunnels

This patch series enables the use of segmentation and checksum offloads
with IPv6 based GRE tunnels.

In order to enable this series I had to make a change to
iptunnel_handle_offloads so that it would no longer free the skb.  This was
necessary as there were multiple paths in the IPv6 GRE code that required
the skb to still be present so it could be freed.  As it turned out I
believe this actually fixes a bug that was present in FOU/GUE based tunnels
anyway.

Below is a quick breakdown of the performance gains seen with a simple
netperf test passing traffic through a ip6gretap tunnel and then an i40e
interface:

Throughput Throughput  Local Local   Result
           Units       CPU   Service Tag
                       Util  Demand
                       %
3544.93    10^6bits/s  6.30  4.656   "before"
13081.75   10^6bits/s  3.75  0.752   "after"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoip6gre: Add support for GSO
Alexander Duyck [Thu, 14 Apr 2016 19:34:04 +0000 (15:34 -0400)]
ip6gre: Add support for GSO

This patch adds code borrowed from bits and pieces of other protocols to
the IPv6 GRE path so that we can support GSO over IPv6 based GRE tunnels.
By adding this support we are able to significantly improve the throughput
for GRE tunnels as we are able to make use of GSO.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>