platform/kernel/linux-rpi.git
9 years agoipv6: directly include libc-compat.h in ipv6.h
Willem de Bruijn [Mon, 12 Jan 2015 19:29:34 +0000 (14:29 -0500)]
ipv6: directly include libc-compat.h in ipv6.h

Patch 3b50d9029809 ("ipv6: fix redefinition of in6_pktinfo ...")
fixed a libc compatibility issue in ipv6 structure definitions
as described in include/uapi/linux/libc-compat.h.

It relies on including linux/in6.h to include libc-compat.h itself.
Include that file directly to clearly communicate the dependency
(libc-compat.h: "This include must be as early as possible").

Signed-off-by: Willem de Bruijn <willemb@google.com>
----

As discussed in http://patchwork.ozlabs.org/patch/427384/
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Tue, 13 Jan 2015 21:06:07 +0000 (16:06 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-01-13

This series contains updates to i40e and i40evf.

Mitch provides a fix for i40e to move the call to pci_disable_sriov() so
that it is called earlier to ensure that the PF driver won't free VF
resources before the VF remove routine can complete.  Also cleans up
redundant and duplicate code in the i40evf.  Refactors the i40evf
shutdown code and let the watchdog take care of shutting things down.
Fix a possible memory leak, if we are using VLANs and the communication
with the PF fail during shutdown.  On some versions of the firmware, the
VF admin send queue may become stalled.  In this case, the easiest
solution is to place another descriptor on the queue and the firmware
will then process both requests.

Greg adds a warning when the NPAR enabled partitions detected a link speed
less than 10 Gpbs.

Vasu removes redundant VN2VN MAC address which were already added by
the FCoE stack.

Shannon adds code to find how many partitions there are per port and
what is the current partition_id when in NPAR mode.  In multifunction
mode, make sure we only allow SR/IOV on the master PF of a port and
only allow partition 1 to set WoL, speed and flow control.

Kamil adds code to read the PBA block from shadow RAM and returns
the part number in a string format.

Catherine provides a fix to check if link state and link speed has
changed before exiting link event

v2: remove un-needed {} in patch #3 of the series based on feedback from
    Sergei Shtylyov
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoi40e: limit sriov to partition 1 of NPAR configurations
Shannon Nelson [Thu, 11 Dec 2014 07:06:34 +0000 (07:06 +0000)]
i40e: limit sriov to partition 1 of NPAR configurations

Make sure we only allow SR/IOV on the master PF of a port in multifunction
mode.  This should be the case anyway based on the num_vfs configured in
the NVM, but this will help make sure there's no question.  If we're not
in multifunction mode the partition_id will always be 1.

Change-ID: I8b2592366fe6782f15301bde2ebd1d4da240109d
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Don't exit link event early if link speed has changed
Catherine Sullivan [Thu, 11 Dec 2014 07:06:33 +0000 (07:06 +0000)]
i40e: Don't exit link event early if link speed has changed

Previously we were only checking if the link up state had changed,
and if it hadn't exiting the link event routine early. We should
also check if speed has changed, and if it has, stay and finish
processing the link event.

Change-ID: I9c8e0991b3f0279108a7858898c3c5ce0a9856b8
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: limit WoL and link settings to partition 1
Shannon Nelson [Thu, 11 Dec 2014 07:06:32 +0000 (07:06 +0000)]
i40e: limit WoL and link settings to partition 1

When in multi-function mode, e.g. Dell's NPAR, only partition 1
of each MAC is allowed to set WoL, speed, and flow control.

Change-ID: I87a9debc7479361c55a71f0120294ea319f23588
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Adding function for reading PBA String
Kamil Krawczyk [Thu, 11 Dec 2014 07:06:31 +0000 (07:06 +0000)]
i40e: Adding function for reading PBA String

Function will read PBA Block from Shadow RAM and return it in a string format.

Change-ID: I4ee7059f6e21bd0eba38687da15e772e0b4ab36e
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: find partition_id in npar mode
Shannon Nelson [Thu, 11 Dec 2014 07:06:30 +0000 (07:06 +0000)]
i40e/i40evf: find partition_id in npar mode

When in NPAR mode the driver instance might be controlling the base
partition or one of the other "fake" PFs.  There are some things that
can only be done by the base partition, aka partition_id 1.  This code
does a bit of work to find how many partitions are there per port and
what is the current partition_id.

Change-ID: Iba427f020a1983d02147d86f121b3627e20ee21d
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: remove VN2VN related mac filters
Vasu Dev [Thu, 11 Dec 2014 07:06:28 +0000 (07:06 +0000)]
i40e: remove VN2VN related mac filters

These mac address already added by FCoE stack above netdev,
therefore adding them here is redundant.

Change-ID: Ia5b59f426f57efd20f8945f7c6cc5d741fbe06e5
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Add warning for NPAR partitions with link speed less than 10Gbps
Greg Rose [Thu, 11 Dec 2014 07:06:27 +0000 (07:06 +0000)]
i40e: Add warning for NPAR partitions with link speed less than 10Gbps

NPAR enabled partitions should warn the user when detected link speed is
less than 10Gpbs.

Change-ID: I7728bb8ce279bf0f4f755d78d7071074a4eb5f69
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: kick a stalled admin queue
Mitch A Williams [Tue, 9 Dec 2014 08:53:08 +0000 (08:53 +0000)]
i40evf: kick a stalled admin queue

On some versions of the firmware, the VF admin send queue may become
stalled. In this case, the easiest solution is to just place another
descriptor on the queue; the firmware will then process both requests.

The early init code already accounts for this, but the runtime code does
not. In the watchdog task, check for the stall condition, and if it's
found, send our API version to the PF. When the PF replies, just ignore
the reply.

Change-ID: I380d78185a4f284d649c44d263e648afc9b4d50c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: enable interrupt 0 appropriately
Mitch A Williams [Tue, 9 Dec 2014 08:53:07 +0000 (08:53 +0000)]
i40evf: enable interrupt 0 appropriately

Don't enable vector 0 in the ISR, just schedule the adminq task and let
it enable the vector. This prevents the task from being called
reentrantly. Make sure that the vector is enabled on all exit paths of
the adminq task, including error exits.

Change-ID: I53f3d14f91ed7a9e90291ea41c681122a5eca5b5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: don't fire traffic IRQs when the interface is down
Mitch A Williams [Tue, 9 Dec 2014 08:53:06 +0000 (08:53 +0000)]
i40evf: don't fire traffic IRQs when the interface is down

There is always a possibility that MSI-X interrupts can get lost. To
keep this problem from stalling the driver, we fire all of our MSI-X
vectors during the watchdog routine. However, we should not fire the
traffic vectors when the interface is closed. In this case, just fire
vector 0, which is used for admin queue events.

As a result, we do not enable the interrupt cause for vector 0. This
can cause the admin queue handler to be called reentrantly, which
causes a scary "critical section violation" message to be logged,
even though no real damage is done.

Change-ID: Ic43a5184708ab2cb9a23fca7dedd808a46717795
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: remove leftover VLAN filters
Mitch A Williams [Tue, 9 Dec 2014 08:53:05 +0000 (08:53 +0000)]
i40evf: remove leftover VLAN filters

If we're using VLANs and communications with the PF fail during
shutdown, we will leak memory because not all of the VLAN filters will
be removed. To eliminate this possibility, go through the list again
right before the module is removed and delete any leftover entries.

Change-ID: Id3b5315c47ca0a61ae123a96ff345d010bc41aed
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: refactor shutdown code
Mitch A Williams [Tue, 9 Dec 2014 08:53:04 +0000 (08:53 +0000)]
i40evf: refactor shutdown code

If the VF driver is running in the host, the shutdown code is completely
broken. We cannot wait in our down routine for the PF to respond to our
requests, as its admin queue task will never run while we hold the lock.

Instead, we schedule operations, then let the watchdog take care of
shutting things down. If the driver is being removed, then wait in the
remove routine until the watchdog is done before continuing.

Change-ID: I93a58d17389e8d6b58f21e430b56ed7b4590b2c5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoopenvswitch: Remove unnecessary version.h inclusion
Syam Sidhardhan [Mon, 12 Jan 2015 15:19:35 +0000 (20:49 +0530)]
openvswitch: Remove unnecessary version.h inclusion

version.h inclusion is not necessary as detected by versioncheck.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoi40evf: Remove some scary log messages
Mitch A Williams [Tue, 9 Dec 2014 08:53:03 +0000 (08:53 +0000)]
i40evf: Remove some scary log messages

These messages may be triggered during normal init of the driver if the
PF or FW take a long time to respond. There's nothing really wrong, so
don't freak people out logging messages.

If the communication channel really is dead, then we'll retry a few
times and give up. This will log a different more scary message that
should cause consternation. This allows the user to more easily detect a
genuine failure.

Change-ID: I6e2b758d4234a3a09c1015c82c8f2442a697cbdb
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: remove redundant code
Mitch A Williams [Tue, 9 Dec 2014 08:53:02 +0000 (08:53 +0000)]
i40evf: remove redundant code

These functions are redundant and duplicate functionality found in
i40evf_free_all_[tx|rx]_resources.

Change-ID: Ia199908926d7a1a4b8247f75f89b5da24c9b149c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: disable IOV before freeing resources
Mitch A Williams [Tue, 9 Dec 2014 08:53:01 +0000 (08:53 +0000)]
i40e: disable IOV before freeing resources

If VF drivers are loaded in the host OS, the call to pci_disable_sriov()
will cause these drivers' remove routines to be called. If the PF driver
has already freed VF resources before this happens, then the VF remove
routine can't properly communicate with the PF driver causing all sorts
of mayhem and error messages and hurt feelings.

To fix this, we move the call to pci_disable_sriov() up to the top of
the function and let it complete before freeing any VF resources.

Change-ID: I397c3997a00f6408e32b7735273911e499600236
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agotcp: avoid reducing cwnd when ACK+DSACK is received
Sébastien Barré [Mon, 12 Jan 2015 09:30:40 +0000 (10:30 +0100)]
tcp: avoid reducing cwnd when ACK+DSACK is received

With TLP, the peer may reply to a probe with an
ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
such ACK+DSACK will be missed and only at next, higher ack will the TLP
episode be considered done. Since the DSACK is not present anymore,
this will cost a cwnd reduction.

This patch ensures that this scenario does not cause a cwnd reduction, since
receiving an ACK+DSACK indicates that both the initial segment and the probe
have been received by the peer.

The following packetdrill test, from Neal Cardwell, validates this patch:

// Establish a connection.
0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.020 < . 1:1(0) ack 1 win 257
+0    accept(3, ..., ...) = 4

// Send 1 packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1:1001(1000) ack 1

// Loss probe retransmission.
// packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
// In this case, this means: 1.5*RTT + 200ms = 230ms
+.230 > P. 1:1001(1000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Receiver ACKs at tlp_high_seq with a DSACK,
// indicating they received the original packet and probe.
+.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Send another packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1001:2001(1000) ack 1

// Receiver ACKs above tlp_high_seq, which should end the TLP episode
// if we haven't already. We should not reduce cwnd.
+.020 < . 1:1(0) ack 2001 win 257
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

Credits:
-Gregory helped in finding that tcp_process_tlp_ack was where the cwnd
got reduced in our MPTCP tests.
-Neal wrote the packetdrill test above
-Yuchung reworked the patch to make it more readable.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Nandita Dukkipati <nanditad@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'rhashtable-next'
David S. Miller [Tue, 13 Jan 2015 19:01:06 +0000 (14:01 -0500)]
Merge branch 'rhashtable-next'

Ying Xue says:

====================
remove nl_sk_hash_lock from netlink socket

After tipc socket successfully avoids the involvement of an extra lock
with rhashtable_lookup_insert(), it's possible for netlink socket to
remove its hash socket lock now. But as netlink socket needs a compare
function to look for an object, we first introduce a new function
called rhashtable_lookup_compare_insert() in commit #1 which is
implemented based on original rhashtable_lookup_insert(). We
subsequently remove nl_sk_hash_lock from netlink socket with the new
introduced function in commit #2. Lastly, as Thomas requested, we add
commit #3 to indicate the implementation of what the grow and shrink
decision function must enforce min/max shift.

v2:
 As Thomas pointed out, there was a race between checking portid and
 then setting it in commit #2. Now use socket lock to make the process
 of both checking and setting portid atomic, and then eliminate the
 race.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: add a note for grow and shrink decision functions
Ying Xue [Mon, 12 Jan 2015 06:52:24 +0000 (14:52 +0800)]
rhashtable: add a note for grow and shrink decision functions

As commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for
worker queue") moves condition statements of verifying whether hash
table size exceeds its maximum threshold or reaches its minimum
threshold from resizing functions to resizing decision functions,
we should add a note in rhashtable.h to indicate the implementation
of what the grow and shrink decision function must enforce min/max
shift, otherwise, it's failed to take min/max shift's set watermarks
into effect.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonetlink: eliminate nl_sk_hash_lock
Ying Xue [Mon, 12 Jan 2015 06:52:23 +0000 (14:52 +0800)]
netlink: eliminate nl_sk_hash_lock

As rhashtable_lookup_compare_insert() can guarantee the process
of search and insertion is atomic, it's safe to eliminate the
nl_sk_hash_lock. After this, object insertion or removal will
be protected with per bucket lock on write side while object
lookup is guarded with rcu read lock on read side.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: involve rhashtable_lookup_compare_insert routine
Ying Xue [Mon, 12 Jan 2015 06:52:22 +0000 (14:52 +0800)]
rhashtable: involve rhashtable_lookup_compare_insert routine

Introduce a new function called rhashtable_lookup_compare_insert()
which is very similar to rhashtable_lookup_insert(). But the former
makes use of users' given compare function to look for an object,
and then inserts it into hash table if found. As the entire process
of search and insertion is under protection of per bucket lock, this
can help users to avoid the involvement of extra lock.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tuntap_queues'
David S. Miller [Mon, 12 Jan 2015 22:05:14 +0000 (17:05 -0500)]
Merge branch 'tuntap_queues'

Pankaj Gupta says:

====================
Increase the limit of tuntap queues

Networking under KVM works best if we allocate a per-vCPU rx and tx
queue in a virtual NIC. This requires a per-vCPU queue on the host side.
Modern physical NICs have multiqueue support for large number of queues.
To scale vNIC to run multiple queues parallel to maximum number of vCPU's
we need to increase number of queues support in tuntap.

Changes from v4:
PATCH2: Michael.S.Tsirkin - Updated change comment message.

Changes from v3:
PATCH1: Michael.S.Tsirkin - Some cleanups and updated commit message.
                            Perf numbers on 10 Gbs NIC
Changes from v2:
PATCH 3: David Miller     - flex array adds extra level of indirection
                            for preallocated array.(dropped, as flow array
    is allocated using kzalloc with failover to zalloc).
Changes from v1:
PATCH 2: David Miller     - sysctl changes to limit number of queues
                            not required for unprivileged users(dropped).

Changes from RFC
PATCH 1: Sergei Shtylyov  - Add an empty line after declarations.
PATCH 2: Jiri Pirko -       Do not introduce new module paramaters.
 Michael.S.Tsirkin- We can use sysctl for limiting max number
                            of queues.

This series is to increase the number of tuntap queues. Original work is being
done by 'jasowang@redhat.com'. I am taking this 'https://lkml.org/lkml/2013/6/19/29'
patch series as a reference. As per discussion in the patch series:

There were two reasons which prevented us from increasing number of tun queues:

- The netdev_queue array in netdevice were allocated through kmalloc, which may
  cause a high order memory allocation too when we have several queues.
  E.g. sizeof(netdev_queue) is 320, which means a high order allocation would
  happens when the device has more than 16 queues.

- We store the hash buckets in tun_struct which results a very large size of
  tun_struct, this high order memory allocation fail easily when the memory is
  fragmented.

The patch 60877a32bce00041528576e6b8df5abe9251fa73 increases the number of tx
queues. Memory allocation fallback to vzalloc() when kmalloc() fails.

This series tries to address following issues:

- Increase the number of netdev_queue queues for rx similarly its done for tx
  queues by falling back to vzalloc() when memory allocation with kmalloc() fails.

- Increase number of queues to 256, maximum number is equal to maximum number
  of vCPUS allowed in a guest.

I have also done testing with multiple parallel Netperf sessions for different
combination of queues and CPU's. It seems to be working fine without much increase
in cpu load with increase in number of queues. I also see good increase in throughput
with increase in number of queues. Though i had limitation of 8 physical CPU's.

For this test: Two Hosts(Host1 & Host2) are directly connected with cable
Host1 is running Guest1. Data is sent from Host2 to Guest1 via Host1.

Host kernel: 3.19.0-rc2+, AMD Opteron(tm) Processor 6320
NIC : Emulex Corporation OneConnect 10Gb NIC (be3)

Patch Applied  %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle  throughput
Single Queue, 2 vCPU's
-------------
Before Patch :all    0.19    0.00    0.16    0.07    0.04    0.10    0.00    0.18    0.00   99.26  57864.18
After  Patch :all    0.99    0.00    0.64    0.69    0.07    0.26    0.00    1.58    0.00   95.77  57735.77

With 2 Queues, 2 vCPU's
---------------
Before Patch :all    0.19    0.00    0.19    0.10    0.04    0.11    0.00    0.28    0.00   99.08  63083.09
After  Patch :all    0.87    0.00    0.73    0.78    0.09    0.35    0.00    2.04    0.00   95.14  62917.03

With 4 Queues, 4 vCPU's
--------------
Before Patch :all    0.20    0.00    0.21    0.11    0.04    0.12    0.00    0.32    0.00   99.00  80865.06
After  Patch :all    0.71    0.00    0.93    0.85    0.11    0.51    0.00    2.62    0.00   94.27  86463.19

With 8 Queues, 8 vCPU's
--------------
Before Patch :all    0.19    0.00    0.18    0.09    0.04    0.11    0.00    0.23    0.00   99.17  86795.31
After  Patch :all    0.65    0.00    1.18    0.93    0.13    0.68    0.00    3.38    0.00   93.05  89459.93

With 16 Queues, 8 vCPU's
--------------
After  Patch :all    0.61    0.00    1.59    0.97    0.18    0.92    0.00    4.32    0.00   91.41  120951.60
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotuntap: Increase the number of queues in tun.
Pankaj Gupta [Mon, 12 Jan 2015 06:11:29 +0000 (11:41 +0530)]
tuntap: Increase the number of queues in tun.

Networking under kvm works best if we allocate a per-vCPU RX and TX
queue in a virtual NIC. This requires a per-vCPU queue on the host side.

It is now safe to increase the maximum number of queues.
Preceding patch: 'net: allow large number of rx queues'
made sure this won't cause failures due to high order memory
allocations. Increase it to 256: this is the max number of vCPUs
KVM supports.

Size of tun_struct changes from 8512 to 10496 after this patch. This keeps
pages allocated for tun_struct before and after the patch to 3.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: allow large number of rx queues
Pankaj Gupta [Mon, 12 Jan 2015 06:11:28 +0000 (11:41 +0530)]
net: allow large number of rx queues

netif_alloc_rx_queues() uses kcalloc() to allocate memory
for "struct netdev_queue *_rx" array.
If we are doing large rx queue allocation kcalloc() might
fail, so this patch does a fallback to vzalloc().
Similar implementation is done for tx queue allocation in
netif_alloc_netdev_queues().

We avoid failure of high order memory allocation
with the help of vzalloc(), this allows us to do large
rx and tx queue allocation which in turn helps us to
increase the number of queues in tun.

As vmalloc() adds overhead on a critical network path,
__GFP_REPEAT flag is used with kzalloc() to do this fallback
only when really needed.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoteam: Remove dead code
Kenneth Williams [Mon, 12 Jan 2015 00:43:54 +0000 (16:43 -0800)]
team: Remove dead code

The deleted lines are called from a function which is called:
1) Only through __team_options_register via team_options_register and
2) Only during initialization / mode initialization when there are no
ports attached.
Therefore the ports list is guarenteed to be empty and this code will
never be executed.

Signed-off-by: Kenneth Williams <ken@williamsclan.us>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bnx2x: avoid macro redefinition
David Decotigny [Sun, 11 Jan 2015 19:42:37 +0000 (11:42 -0800)]
net: bnx2x: avoid macro redefinition

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: sched: sch_teql: Remove unused function
Rickard Strandqvist [Sun, 11 Jan 2015 14:08:46 +0000 (15:08 +0100)]
net: sched: sch_teql: Remove unused function

Remove the function teql_neigh_release() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: xfrm: xfrm_algo: Remove unused function
Rickard Strandqvist [Sun, 11 Jan 2015 13:03:35 +0000 (14:03 +0100)]
net: xfrm: xfrm_algo: Remove unused function

Remove the function aead_entries() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bridge_vlan_ranges'
David S. Miller [Mon, 12 Jan 2015 21:47:13 +0000 (16:47 -0500)]
Merge branch 'bridge_vlan_ranges'

Roopa Prabhu says:

====================
bridge: support for vlan range in setlink/dellink

This series adds new flags in IFLA_BRIDGE_VLAN_INFO to indicate
vlan range.

Will post corresponding iproute2 patches if these get accepted.

v1-> v2
    - changed patches to use a nested list attribute
    IFLA_BRIDGE_VLAN_INFO_LIST as suggested by scott feldman
    - dropped notification changes from the series. Will post them
    separately after this range message is accepted.

v2 -> v3
    - incorporated some review feedback
    - include patches to fill vlan ranges during getlink
    - Dropped IFLA_BRIDGE_VLAN_INFO_LIST. I think it may get
    confusing to userspace if we introduce yet another way to
    send lists. With getlink already sending nested
    IFLA_BRIDGE_VLAN_INFO in IFLA_AF_SPEC, It seems better to
    use the existing format for lists and just use the flags from v2
    to mark vlan ranges
====================

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: new function to pack vlans into ranges during gets
Roopa Prabhu [Sat, 10 Jan 2015 15:31:14 +0000 (07:31 -0800)]
bridge: new function to pack vlans into ranges during gets

This patch adds new function to pack vlans into ranges
whereever applicable using the flags BRIDGE_VLAN_INFO_RANGE_BEGIN
and BRIDGE VLAN_INFO_RANGE_END

Old vlan packing code is moved to a new function and continues to be
called when filter_mask is RTEXT_FILTER_BRVLAN.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agortnetlink: new filter RTEXT_FILTER_BRVLAN_COMPRESSED
Roopa Prabhu [Sat, 10 Jan 2015 15:31:13 +0000 (07:31 -0800)]
rtnetlink: new filter RTEXT_FILTER_BRVLAN_COMPRESSED

This filter is same as RTEXT_FILTER_BRVLAN except that it tries
to compress the consecutive vlans into ranges.

This helps on systems with large number of configured vlans.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: support for multiple vlans and vlan ranges in setlink and dellink requests
Roopa Prabhu [Sat, 10 Jan 2015 15:31:12 +0000 (07:31 -0800)]
bridge: support for multiple vlans and vlan ranges in setlink and dellink requests

This patch changes bridge IFLA_AF_SPEC netlink attribute parser to
look for more than one IFLA_BRIDGE_VLAN_INFO attribute. This allows
userspace to pack more than one vlan in the setlink msg.

The dumps were already sending more than one vlan info in the getlink msg.

This patch also adds bridge_vlan_info flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
BRIDGE_VLAN_INFO_RANGE_END to indicate start and end of vlan range

This patch also deletes unused ifla_br_policy.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodrivers: net: xen-netfront: remove residual dead code
Vincenzo Maffione [Sat, 10 Jan 2015 09:20:25 +0000 (10:20 +0100)]
drivers: net: xen-netfront: remove residual dead code

This patch removes some unused arrays from the netfront private
data structures. These arrays were used in "flip" receive mode.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoDriver: Vmxnet3: Reinitialize vmxnet3 backend on wakeup from hibernate
Shrikrishna Khare [Fri, 9 Jan 2015 23:19:14 +0000 (15:19 -0800)]
Driver: Vmxnet3: Reinitialize vmxnet3 backend on wakeup from hibernate

Failing to reinitialize on wakeup results in loss of network connectivity for
vmxnet3 interface.

Signed-off-by: Srividya Murali <smurali@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobonding: cleanup bond_opts array
Jonathan Toppins [Fri, 9 Jan 2015 18:31:08 +0000 (13:31 -0500)]
bonding: cleanup bond_opts array

Remove the empty array element initializer and size the array with
BOND_OPT_LAST so the compiler will complain if more elements are in
there than should be.

An interesting unwanted side effect of this initializer is that if one
inserts new options into the middle of the array then this initializer
will zero out the option that equals BOND_OPT_TLB_DYNAMIC_LB+1.

Example:
Extend the OPTS enum:
enum {
   ...
   BOND_OPT_TLB_DYNAMIC_LB,
   BOND_OPT_LACP_NEW1,
   BOND_OPT_LAST
};

Now insert into bond_opts array:
static const struct bond_option bond_opts[] = {
      ...
      [BOND_OPT_LACP_RATE] = { .... unchanged stuff .... },
      [BOND_OPT_LACP_NEW1] = { ... new stuff ... },
      ...
      [BOND_OPT_TLB_DYNAMIC_LB] = { .... unchanged stuff ....},
      { } // MARK A
};

Since BOND_OPT_LACP_NEW1 = BOND_OPT_TLB_DYNAMIC_LB+1, the last
initializer (MARK A) will overwrite the contents of BOND_OPT_LACP_NEW1
and can be easily viewed with the crash utility.

Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tipc-namespaces'
David S. Miller [Mon, 12 Jan 2015 21:24:39 +0000 (16:24 -0500)]
Merge branch 'tipc-namespaces'

Ying Xue says:

====================
tipc: make tipc support namespace

This patchset aims to add net namespace support for TIPC stack.

Currently TIPC module declares the following global resources:
- TIPC network idenfication number
- TIPC node table
- TIPC bearer list table
- TIPC broadcast link
- TIPC socket reference table
- TIPC name service table
- TIPC node address
- TIPC service subscriber server
- TIPC random value
- TIPC netlink

In order that TIPC is aware of namespace, above each resource must be
allocated, initialized and destroyed inside per namespace. Therefore,
the major works of this patchset are to isolate these global resources
and make them private for each namespace. However, before these changes
come true, some necessary preparation works must be first done: convert
socket reference table with generic rhashtable, cleanup core.c and
core.h files, remove unnecessary wrapper functions for kernel timer
interfaces and so on.

It should be noted that commit ##1 ("tipc: fix bug in broadcast
retransmit code") was already submitted to 'net' tree, so please see
below link:

http://patchwork.ozlabs.org/patch/426717/

Since it is prerequisite for the rest of the series to apply, I
prepend them to the series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make netlink support net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:13 +0000 (15:27 +0800)]
tipc: make netlink support net namespace

Currently tipc module only allows users sitting on "init_net" namespace
to configure it through netlink interface. But now almost each tipc
component is able to be aware of net namespace, so it's time to open
the permission for users residing in other namespaces, allowing them
to configure their own tipc stack instance through netlink interface.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make tipc random value aware of net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:12 +0000 (15:27 +0800)]
tipc: make tipc random value aware of net namespace

After namespace is supported, each namespace should own its private
random value. So the global variable representing the random value
must be moved to tipc_net structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make subscriber server support net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:11 +0000 (15:27 +0800)]
tipc: make subscriber server support net namespace

TIPC establishes one subscriber server which allows users to subscribe
their interesting name service status. After tipc supports namespace,
one dedicated tipc stack instance is created for each namespace, and
each instance can be deemed as one independent TIPC node. As a result,
subscriber server must be built for each namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make tipc node address support net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:10 +0000 (15:27 +0800)]
tipc: make tipc node address support net namespace

If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: name tipc name table support net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:09 +0000 (15:27 +0800)]
tipc: name tipc name table support net namespace

TIPC name table is used to store the mapping relationship between
TIPC service name and socket port ID. When tipc supports namespace,
it allows users to publish service names only owned by a certain
namespace. Therefore, every namespace must have its private name
table to prevent service names published to one namespace from being
contaminated by other service names in another namespace. Therefore,
The name table global variable (ie, nametbl) and its lock must be
moved to tipc_net structure, and a parameter of namespace must be
added for necessary functions so that they can obtain name table
variable defined in tipc_net structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make tipc socket support net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:08 +0000 (15:27 +0800)]
tipc: make tipc socket support net namespace

Now tipc socket table is statically allocated as a global variable.
Through it, we can look up one socket instance with port ID, insert
a new socket instance to the table, and delete a socket from the
table. But when tipc supports net namespace, each namespace must own
its specific socket table. So the global variable of socket table
must be redefined in tipc_net structure. As a concequence, a new
socket table will be allocated when a new namespace is created, and
a socket table will be deallocated when namespace is destroyed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make tipc broadcast link support net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:07 +0000 (15:27 +0800)]
tipc: make tipc broadcast link support net namespace

TIPC broadcast link is statically established and its relevant states
are maintained with the global variables: "bcbearer", "bclink" and
"bcl". Allowing different namespace to own different broadcast link
instances, these variables must be moved to tipc_net structure and
broadcast link instances would be allocated and initialized when
namespace is created.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make bearer list support net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:06 +0000 (15:27 +0800)]
tipc: make bearer list support net namespace

Bearer list defined as a global variable is used to store bearer
instances. When tipc supports net namespace, bearers created in
one namespace must be isolated with others allocated in other
namespaces, which requires us that the bearer list(bearer_list)
must be moved to tipc_net structure. As a result, a net namespace
pointer has to be passed to functions which access the bearer list.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: make tipc node table aware of net namespace
Ying Xue [Fri, 9 Jan 2015 07:27:05 +0000 (15:27 +0800)]
tipc: make tipc node table aware of net namespace

Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: involve namespace infrastructure
Ying Xue [Fri, 9 Jan 2015 07:27:04 +0000 (15:27 +0800)]
tipc: involve namespace infrastructure

Involve namespace infrastructure, make the "tipc_net_id" global
variable aware of per namespace, and rename it to "net_id". In
order that the conversion can be successfully done, an instance
of networking namespace must be passed to relevant functions,
allowing them to access the "net_id" variable of per namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove unused tipc_link_get_max_pkt routine
Ying Xue [Fri, 9 Jan 2015 07:27:03 +0000 (15:27 +0800)]
tipc: remove unused tipc_link_get_max_pkt routine

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: feed tipc sock pointer to tipc_sk_timeout routine
Ying Xue [Fri, 9 Jan 2015 07:27:02 +0000 (15:27 +0800)]
tipc: feed tipc sock pointer to tipc_sk_timeout routine

In order to make tipc socket table aware of namespace, a networking
namespace instance must be passed to tipc_sk_lookup(), allowing it
to look up tipc socket instance with a given port ID from a concrete
socket table. However, as now tipc_sk_timeout() only has one port ID
parameter and is not namespace aware, it's unable to obtain a correct
socket instance through tipc_sk_lookup() just with a port ID,
especially after namespace is completely supported.

If port ID is replaced with socket instance as tipc_sk_timeout()'s
parameter, it's unnecessary to look up socket table. But as the timer
handler - tipc_sk_timeout() is run asynchronously, socket reference
must be held before its timer is launched, and must be carefully
checked to identify whether the socket reference needs to be put or
not when its timer is terminated.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: cleanup core.c and core.h files
Ying Xue [Fri, 9 Jan 2015 07:27:01 +0000 (15:27 +0800)]
tipc: cleanup core.c and core.h files

Only the works of initializing and shutting down tipc module are done
in core.h and core.c files, so all stuffs which are not closely
associated with the two tasks should be moved to appropriate places.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove unnecessary wrapper functions of kernel timer APIs
Ying Xue [Fri, 9 Jan 2015 07:27:00 +0000 (15:27 +0800)]
tipc: remove unnecessary wrapper functions of kernel timer APIs

Not only some wrapper function like k_term_timer() is empty, but also
some others including k_start_timer() and k_cancel_timer() don't return
back any value to its caller, what's more, there is no any component
in the kernel world to do such thing. Therefore, these timer interfaces
defined in tipc module should be purged.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove tipc_core_start/stop routines
Ying Xue [Fri, 9 Jan 2015 07:26:59 +0000 (15:26 +0800)]
tipc: remove tipc_core_start/stop routines

Remove redundant wrapper functions like tipc_core_start() and
tipc_core_stop(), and directly move them to their callers, such
as tipc_init() and tipc_exit(), having us clearly know what are
really done in both initialization and deinitialzation functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: fix bug in broadcast retransmit code
Jon Maloy [Fri, 9 Jan 2015 07:26:58 +0000 (15:26 +0800)]
tipc: fix bug in broadcast retransmit code

In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'cxgb4-next'
David S. Miller [Mon, 12 Jan 2015 21:19:43 +0000 (16:19 -0500)]
Merge branch 'cxgb4-next'

Anish Bhatt says:

====================
All Chelsio drivers : Cleanup CPL messages macros

This patch series cleans up all register defines/MACROS defined in t4_msg.h and
affected files as part of the continuing cleanup effort

The patches series is created against 'net-next' tree and  includes patches
to the cxgb4, cxgb4vf, iw_cxgb4, cxgb4i and csiostor drivers.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoiw_cxgb4/cxgb4/cxgb4vf/cxgb4i/csiostor: Cleanup register defines/macros related to...
Hariprasad Shenai [Fri, 9 Jan 2015 05:38:16 +0000 (21:38 -0800)]
iw_cxgb4/cxgb4/cxgb4vf/cxgb4i/csiostor: Cleanup register defines/macros related to all other cpl messages

This patch cleanups all other macros/register define related to
CPL messages that are defined in t4_msg.h and the affected files

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoiw_cxgb4/cxgb4/cxgb4i: Cleanup register defines/MACROS related to CM CPL messages
Hariprasad Shenai [Fri, 9 Jan 2015 05:38:15 +0000 (21:38 -0800)]
iw_cxgb4/cxgb4/cxgb4i: Cleanup register defines/MACROS related to CM CPL messages

This patch cleanups all macros/register define related to connection management
CPL messages that are defined in t4_msg.h and the affected files

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: Add ability to enable TSO
Toshiaki Makita [Fri, 9 Jan 2015 05:16:40 +0000 (14:16 +0900)]
bridge: Add ability to enable TSO

Currently a bridge device turns off TSO feature if no bridge ports
support it. We can always enable it, since packets can be segmented on
ports by software as well as on the bridge device.
This will reduce the number of packets processed in the bridge.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'r8152-next'
David S. Miller [Mon, 12 Jan 2015 21:10:32 +0000 (16:10 -0500)]
Merge branch 'r8152-next'

Hayes Wang says:

====================
r8152: adjust r8152_submit_rx

v2:
Replace the patch #1 with "call rtl_start_rx after netif_carrier_on".

For patch #2, replace checking tp->speed with netif_carrier_ok.

v1:
Avoid r8152_submit_rx() from submitting rx during unexpected
moment. This could reduce the time of stopping rx.

For patch #1, the tp->speed should be updated early. Then,
the patch #2 could use it to check the current linking status.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: check the status before submitting rx
hayeswang [Fri, 9 Jan 2015 02:26:36 +0000 (10:26 +0800)]
r8152: check the status before submitting rx

Don't submit the rx if the device is unplugged, stopped, or
linking down.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: call rtl_start_rx after netif_carrier_on
hayeswang [Fri, 9 Jan 2015 02:26:35 +0000 (10:26 +0800)]
r8152: call rtl_start_rx after netif_carrier_on

Remove rtl_start_rx() from rtl_enable() and put it after calling
netif_carrier_on().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agovxlan: Improve support for header flags
Tom Herbert [Thu, 8 Jan 2015 20:31:18 +0000 (12:31 -0800)]
vxlan: Improve support for header flags

This patch cleans up the header flags of VXLAN in anticipation of
defining some new ones:

- Move header related definitions from vxlan.c to vxlan.h
- Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag)
- Move check for unknown flags to after we find vxlan_sock, this
  assumes that some flags may be processed based on tunnel
  configuration
- Add a comment about why the stack treating unknown set flags as an
  error instead of ignoring them

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agopacket: make packet too small warning match condition
Willem de Bruijn [Thu, 8 Jan 2015 16:29:18 +0000 (11:29 -0500)]
packet: make packet too small warning match condition

The expression in ll_header_truncated() tests less than or equal, but
the warning prints less than. Update the warning.

Reported-by: Jouni Malinen <jkmalinen@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotg3: move init/deinit from open/close to probe/remove
Ivan Vecera [Thu, 8 Jan 2015 15:13:07 +0000 (16:13 +0100)]
tg3: move init/deinit from open/close to probe/remove

Move init and deinit of PTP support from open/close functions
to probe/remove funcs to avoid removing/re-adding of associated PTP
device(s) during ifup/ifdown.

v2: tg3_ptp_init call moved to correct place (thx. Prashant)

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: eth: xgene: devm_ioremap() returns NULL on error
Dan Carpenter [Thu, 8 Jan 2015 10:52:12 +0000 (13:52 +0300)]
net: eth: xgene: devm_ioremap() returns NULL on error

devm_ioremap() returns NULL on failure, it doesn't return an ERR_PTR.

Fixes: de7b5b3d790a ('net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocsiostor:fix sparse warnings
Praveen Madhavan [Fri, 9 Jan 2015 15:55:16 +0000 (21:25 +0530)]
csiostor:fix sparse warnings

This patch fixes sparse warning reported by kbuild.
Apply this on net-next since it depends on previous commit.

drivers/scsi/csiostor/csio_hw.c:259:17: sparse: cast to restricted __le32
drivers/scsi/csiostor/csio_hw.c:536:31: sparse: incorrect type in assignment
(different base types)
drivers/scsi/csiostor/csio_hw.c:536:31:    expected unsigned int [unsigned]
[usertype] <noident>
drivers/scsi/csiostor/csio_hw.c:536:31:    got restricted __be32 [usertype]
<noident>
>> drivers/scsi/csiostor/csio_hw.c:2012:5: sparse: symbol 'csio_hw_prep_fw' was
not declared. Should it be static?

Signed-off-by: Praveen Madhavan <praveenm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodoc: fix the compile fix of txtimestamp.c
Willem de Bruijn [Sat, 10 Jan 2015 17:08:18 +0000 (12:08 -0500)]
doc: fix the compile fix of txtimestamp.c

A fix to ipv6 structure definitions removed the now superfluous
definition of in6_pktinfo in this file.

But, use of the glibc definition requires defining _GNU_SOURCE
(see also https://sourceware.org/bugzilla/show_bug.cgi?id=6775).

Before this change, the following would fail for me:

  make
  make headers_install
  make M=Documentation/networking/timestamping

with

  Documentation/networking/timestamping/txtimestamp.c: In function '__recv_errmsg_cmsg':
  Documentation/networking/timestamping/txtimestamp.c:205:33: error: dereferencing pointer to incomplete type
  Documentation/networking/timestamping/txtimestamp.c:206:23: error: dereferencing pointer to incomplete type

After this patch compilation succeeded.

Fixes: cd91cc5bdddf ("doc: fix the compile error of txtimestamp.c")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'irda-next'
David S. Miller [Mon, 12 Jan 2015 02:40:07 +0000 (21:40 -0500)]
Merge branch 'irda-next'

Chunyan Zhang says:

====================
irda: Use ktime_t instead of timeval

This patch-set removed all uses of timeval and used ktime_t instead if
needed, since 32-bit time types will break in the year 2038.

This patch-set also used the ktime_xxx functions accordingly.
e.g.
* Used ktime_get to get the current time instead of do_gettimeofday.
* And, used ktime_us_delta to get the elapsed time directly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoirda: vlsi_ir: Replace timeval with ktime_t
Chunyan Zhang [Thu, 8 Jan 2015 04:01:32 +0000 (12:01 +0800)]
irda: vlsi_ir: Replace timeval with ktime_t

The vlsi ir driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.

This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.

This patch uses ktime_us_delta to get the elapsed time of microsecond,
and uses div_s64_rem to get what seconds & microseconds time elapsed
for printing.

This patch also changes the function 'vlsi_hard_start_xmit' to do the
same things as the others drivers, that is passing the remaining time
into udelay() instead of looping until enough time has passed.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoirda: stir4200: Replace timeval with ktime_t
Chunyan Zhang [Thu, 8 Jan 2015 04:01:31 +0000 (12:01 +0800)]
irda: stir4200: Replace timeval with ktime_t

The stir4200 driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.

This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.

This patch uses ktime_us_delta to get the elapsed time of microsecond.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoirda: nsc-ircc: Replace timeval with ktime_t
Chunyan Zhang [Thu, 8 Jan 2015 04:01:30 +0000 (12:01 +0800)]
irda: nsc-ircc: Replace timeval with ktime_t

The nsc ircc driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.

This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.

This patch uses ktime_us_delta to get the elapsed time, and in this
way it no longer needs to check for the overflow, because
ktime_us_delta returns time difference of microsecond.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoirda: irda-usb: Replace timeval with ktime_t
Chunyan Zhang [Thu, 8 Jan 2015 04:01:29 +0000 (12:01 +0800)]
irda: irda-usb: Replace timeval with ktime_t

The irda usb driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.

This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.

This patch uses ktime_us_delta to get the elapsed time, and in this
way it no longer needs to check for the overflow, because
ktime_us_delta returns time difference of microsecond.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoirda: ali-ircc: Replace timeval with ktime_t
Chunyan Zhang [Thu, 8 Jan 2015 04:01:28 +0000 (12:01 +0800)]
irda: ali-ircc: Replace timeval with ktime_t

The ali ircc driver uses 'timeval', which we try to remove in the kernel
because all 32-bit time types will break in the year 2038.

This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.

This patch uses ktime_us_delta to get the elapsed time, and in this
way it no longer needs to check for the overflow, because
ktime_us_delta returns time difference of microsecond.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoirda: Removed all unused timeval variables
Chunyan Zhang [Thu, 8 Jan 2015 04:01:27 +0000 (12:01 +0800)]
irda: Removed all unused timeval variables

In the file au1k_ir.c & via-ircc.h, there were two unused definitions of the
timeval type members, this commit therefore removes this unneeded code.

In other three files, the same problem is the rx_time member is only ever
written, never read, so removed it entirely.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'sti_drivers'
David S. Miller [Sun, 11 Jan 2015 23:53:46 +0000 (18:53 -0500)]
Merge branch 'sti_drivers'

Peter Griffin says:

====================
Fix sti drivers whcih mix reg address spaces

A V2 of this old series incorporating Arnd and Lees Feedback form v1.

Following on from Arnds comments about the picophy driver here
https://lkml.org/lkml/2014/11/13/161, this series fixes the
remaining upstreamed drivers for STI, which are mixing address spaces
in the reg property. We do this in a way similar to the keystone
and bcm7445 platforms, by having sysconfig phandle/ offset pair
(where only one register is required). Or phandle / integer array
where multiple offsets in the same bank are needed).

This series breaks DT compatability! But the platform support
is WIP and only being used by the few developers who are upstreaming
support for it. I've made each change to the driver / dt doc / dt
file as a single atomic commit so the kernel will remain bisectable.

This series then also enables the picophy driver, and adds back in
the ehci/ohci dt nodes for stih410 which make use of the picophy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agostmmac: dwmac-sti: Pass sysconfig register offset via syscon dt property.
Peter Griffin [Wed, 7 Jan 2015 15:04:12 +0000 (15:04 +0000)]
stmmac: dwmac-sti: Pass sysconfig register offset via syscon dt property.

Based on Arnds review comments here https://lkml.org/lkml/2014/11/13/161,
we should not be mixing address spaces in the reg property like this driver
currently does. This patch updates the driver, dt docs and also the existing
dt nodes to pass the sysconfig offset in the syscon dt property.

This patch breaks DT compatibility! But this platform is considered WIP,
and is only used by a few developers who are upstreaming support for it.
This change has been done as a single atomic commit to ensure it is
bisectable.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoARM: multi_v7_defconfig: Enable stih407 usb picophy
Peter Griffin [Wed, 7 Jan 2015 15:04:11 +0000 (15:04 +0000)]
ARM: multi_v7_defconfig: Enable stih407 usb picophy

This patch enables the picoPHY usb phy which is used by
the usb2 and usb3 host controllers when controlling usb2/1.1
devices. It is found in stih407 family SoC's from STMicroelectronics.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoARM: STi: DT: STiH410: Add DT nodes for the ehci and ohci usb controllers.
Peter Griffin [Wed, 7 Jan 2015 15:04:10 +0000 (15:04 +0000)]
ARM: STi: DT: STiH410: Add DT nodes for the ehci and ohci usb controllers.

This patch adds the DT nodes for the extra ehci and ohci usb controllers
on the stih410 SoC.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoARM: STi: DT: STiH410: Add usb2 picophy dt nodes
Peter Griffin [Wed, 7 Jan 2015 15:04:09 +0000 (15:04 +0000)]
ARM: STi: DT: STiH410: Add usb2 picophy dt nodes

This patch adds the dt nodes for the extra usb2 picophys found on
the stih410.

These two picophys are used in conjunction with the extra ehci/ohci usb
controllers also found on the stih410 SoC.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoARM: STi: DT: STiH407: Add usb2 picophy dt nodes
Peter Griffin [Wed, 7 Jan 2015 15:04:08 +0000 (15:04 +0000)]
ARM: STi: DT: STiH407: Add usb2 picophy dt nodes

This patch adds the dt nodes for the usb2 picophy found on the stih407
device family. It is used on stih407 by the dwc3 usb3 controller when
controlling usb2/1.1 devices.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agophy: miphy365x: Pass sysconfig register offsets via syscfg dt property.
Peter Griffin [Wed, 7 Jan 2015 15:04:07 +0000 (15:04 +0000)]
phy: miphy365x: Pass sysconfig register offsets via syscfg dt property.

Based on Arnds review comments here https://lkml.org/lkml/2014/11/13/161,
update the miphy365 phy driver to access sysconfig register offsets via
syscfg dt property.

This is because the reg property should not be mixing address spaces
like it does currently for miphy365. This change then also aligns us
to how other platforms such as keystone and bcm7445 pass there syscon
offsets via DT.

This patch breaks DT compatibility, but this platform is considered WIP,
and is only used by a few developers who are upstreaming support for it.
This change has been done as a single atomic commit to ensure it is
bisectable.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agophy: phy-stih407-usb: Pass sysconfig register offsets via syscfg property.
Peter Griffin [Wed, 7 Jan 2015 15:04:06 +0000 (15:04 +0000)]
phy: phy-stih407-usb: Pass sysconfig register offsets via syscfg property.

Based on Arnds review comments here https://lkml.org/lkml/2014/11/13/161,
update the phy driver to not use the reg property to access the sysconfig
register offsets.

This is because other phy's (miphy28, miphy365) have a combination of
memory mapped registers and sysconfig control regs, and we shouldn't
be mixing address spaces in the reg property. In addition we would
ideally like the sysconfig offsets to be passed via DT in a uniform way.

This new method will also allow us to support devices which have sysconfig
registers in different banks more easily and it is also analagous to how
keystone and bcm7745 platforms pass there syscon offsets in DT.

This breaks DT compatibility, but this platform is considered WIP, and
is only used by a few developers who are upstreaming support for it.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
David S. Miller [Fri, 9 Jan 2015 04:14:32 +0000 (20:14 -0800)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- remove useless return in void functions
- remove unused member 'primary_iface' from 'struct orig_node'
- improve existing kernel doc
- fix several checkpatch complaints
- ensure socket's control block is cleared for received skbs
- add missing DEBUG_FS dependency to BATMAN_ADV_DEBUG symbol

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocsiostor:firmware upgrade fix
Praveen Madhavan [Wed, 7 Jan 2015 13:46:28 +0000 (19:16 +0530)]
csiostor:firmware upgrade fix

This patch fixes removes older means of upgrading Firmware using MAJOR version
and adds newer interface version checking mechanism.

Please apply this patch on net-next since it depends on previous commits.

Signed-off-by: Praveen Madhavan <praveenm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoRevert "ARM: dts: imx6qdl: enable FEC magic-packet feature"
Fabio Estevam [Wed, 7 Jan 2015 12:39:53 +0000 (10:39 -0200)]
Revert "ARM: dts: imx6qdl: enable FEC magic-packet feature"

As 456062b3ec6f ("ARM: imx: add FEC sleep mode callback function") has been
reverted, also revert the dts part.

This reverts commit 07b4d2dda0c00f56248 ("ARM: dts: imx6qdl: enable FEC
magic-packet feature").

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoRevert "ARM: imx: add FEC sleep mode callback function"
Fabio Estevam [Wed, 7 Jan 2015 12:39:52 +0000 (10:39 -0200)]
Revert "ARM: imx: add FEC sleep mode callback function"

i.MX platform maintainer Shawn Guo is not happy with the such commit as
explained below [1]:

"The GPR difference between SoCs can be encoded in device tree as well.
It's pointless to repeat the same code pattern for every single
platform, that need to set up GPR bits for enabling magic packet wake
up, while the only difference is the register and bit offset.

The platform code will become quite messy and unmaintainable if every
device driver dump their GPR register setup code into platform.

Sorry, but it's NACK from me."

This reverts commit 456062b3ec6f5b9 ("ARM: imx: add FEC sleep mode callback
function").

[1] http://www.spinics.net/lists/netdev/msg310922.html

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8169: add support for xmit_more
Florian Westphal [Wed, 7 Jan 2015 09:49:49 +0000 (10:49 +0100)]
r8169: add support for xmit_more

Delay update of hw tail descriptor if we know that another skb is going
to be sent.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'rhashtable-next'
David S. Miller [Fri, 9 Jan 2015 03:47:19 +0000 (19:47 -0800)]
Merge branch 'rhashtable-next'

Ying Xue says:

====================
Involve rhashtable_lookup_insert routine

The series aims to involve rhashtable_lookup_insert() to guarantee
that the process of lookup and insertion of an object from/into hash
table is finished atomically, allowing rhashtable's users not to
introduce an extra lock during search and insertion. For example,
tipc socket is the first user benefiting from this enhancement.

v2 changes:
 - fix the issue of waking up worker thread under a wrong condition in
   patch #2, which is pointed by Thomas.
 - move a comment from rhashtable_inser() to rhashtable_wakeup_worker()
   according to Thomas's suggestion in patch #2.
 - indent the third line of condition statement in
   rhashtable_wakeup_worker() to inner bracket in patch #2.
 - drop patch #3 of v1 series
 - fix an issue of being unable to remove an object from hash table in
   certain special case in patch #4.
 - involve a new patch #5 to avoid unnecessary wakeup for worker queue
   thread
 - involve a new patch #6 to initialize atomic "nelems" variable
 - adjust "nelem_hint" value from 256 to 192 avoiding to unnecessarily
   to shrink hash table from the beginning phase in patch #7.

v1 changes:
 But before rhashtable_lookup_insert() is involved, the following
 optimizations need to be first done:
- simplify rhashtable_lookup by reusing rhashtable_lookup_compare()
- introduce rhashtable_wakeup_worker() to further reduce duplicated
  code in patch #2
- fix an issue in patch #3
- involve rhashtable_lookup_insert(). But in this version, we firstly
  use rhashtable_lookup() to search duplicate key in both old and new
  bucket table; secondly introduce another __rhashtable_insert() helper
  function to reduce the duplicated code between rhashtable_insert()
  and rhashtable_lookup_insert().
- add patch #5 into the series as it depends on above patches. But in
  this version, no change is made comparing with its previous version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: convert tipc reference table to use generic rhashtable
Ying Xue [Wed, 7 Jan 2015 05:41:58 +0000 (13:41 +0800)]
tipc: convert tipc reference table to use generic rhashtable

As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.

If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.

Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: initialize atomic nelems variable
Ying Xue [Wed, 7 Jan 2015 05:41:57 +0000 (13:41 +0800)]
rhashtable: initialize atomic nelems variable

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: avoid unnecessary wakeup for worker queue
Ying Xue [Wed, 7 Jan 2015 05:41:56 +0000 (13:41 +0800)]
rhashtable: avoid unnecessary wakeup for worker queue

Move condition statements of verifying whether hash table size exceeds
its maximum threshold or reaches its minimum threshold from resizing
functions to resizing decision functions, avoiding unnecessary wakeup
for worker queue thread.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: future table needs to be traversed when remove an object
Ying Xue [Wed, 7 Jan 2015 05:41:55 +0000 (13:41 +0800)]
rhashtable: future table needs to be traversed when remove an object

When remove an object from hash table, we currently only traverse old
bucket table to check whether the object exists. If the object is not
found in it, we will try again. But in the second search loop, we still
search the object from the old table instead of future table. As a
result, the object may be not removed from hash table especially when
resizing is currently in progress and the object is just saved in the
future table.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: involve rhashtable_lookup_insert routine
Ying Xue [Wed, 7 Jan 2015 05:41:54 +0000 (13:41 +0800)]
rhashtable: involve rhashtable_lookup_insert routine

Involve a new function called rhashtable_lookup_insert() which makes
lookup and insertion atomic under bucket lock protection, helping us
avoid to introduce an extra lock when we search and insert an object
into hash table.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: introduce rhashtable_wakeup_worker helper function
Ying Xue [Wed, 7 Jan 2015 05:41:53 +0000 (13:41 +0800)]
rhashtable: introduce rhashtable_wakeup_worker helper function

Introduce rhashtable_wakeup_worker() helper function to reduce
duplicated code where to wake up worker.

By the way, as long as the both "future_tbl" and "tbl" bucket table
pointers point to the same bucket array, we should try to wake up
the resizing worker thread, otherwise, it indicates the work of
resizing hash table is not finished yet. However, currently we will
wake up the worker thread only when the two pointers point to
different bucket array. Obviously this is wrong. So, the issue is
also fixed as well in the patch.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: optimize rhashtable_lookup routine
Ying Xue [Wed, 7 Jan 2015 05:41:52 +0000 (13:41 +0800)]
rhashtable: optimize rhashtable_lookup routine

Define an internal compare function and relevant compare argument,
and then make use of rhashtable_lookup_compare() to lookup key in
hash table, reducing duplicated code between rhashtable_lookup()
and rhashtable_lookup_compare().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'cxgb4-next'
David S. Miller [Fri, 9 Jan 2015 03:39:18 +0000 (19:39 -0800)]
Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
Add support for few debugfs entries

This patch series adds support for devlog, cim_la, cim_qcfg and mps_tcam
debugfs entries.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support for mps_tcam debugfs
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:03 +0000 (08:48 +0530)]
cxgb4: Add support for mps_tcam debugfs

Debug log to get the MPS TCAM table

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support for cim_qcfg entry in debugfs
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:02 +0000 (08:48 +0530)]
cxgb4: Add support for cim_qcfg entry in debugfs

Adds debug log to get cim queue config

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support for cim_la entry in debugfs
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:01 +0000 (08:48 +0530)]
cxgb4: Add support for cim_la entry in debugfs

The CIM LA captures the embedded processor’s internal state. Optionally, it can
also trace the flow of data in and out of the embedded processor. Therefore, the
CIM LA output contains detailed information of what code the embedded processor
executed prior to the CIM LA capture.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: Add support for devlog
Hariprasad Shenai [Wed, 7 Jan 2015 03:18:00 +0000 (08:48 +0530)]
cxgb4: Add support for devlog

Add support for device log entry in debugfs

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>