David S. Miller [Tue, 25 Aug 2015 20:35:32 +0000 (13:35 -0700)]
Merge branch 'rds-assorted-bug-fixes'
Santosh Shilimkar says:
====================
RDS: Assorted bug fixes
We would like to improve RDS upstream support and in that context, I
started playing with it. But run into number of issues including as
basic is RDS IB RDMA doesn't work. As part of the debug, I ended up
creating the $subject series which has bunch of assorted fixes. At
least with this series I can run RDS IB RDMA and other tests
successfully.
Some of these fixes have been done by Chris Meson, Andy Grover and
Zach Brown while at Oracle. There are still more kinks with FMR and
error handling and I plan to address them in a follow up series.
Series generated against Linus's master(v4.2-rc-7) but also applies
against next-next cleanly. Its tested on Oracle hardware with IB
fabric for both bcopy as well as RDMA mode. I don't have access
to iWARP hardware so any testing help on iWARP hardware appreciated.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:35 +0000 (15:45 -0700)]
RDS: check for valid cm_id before initiating connection
Connection could have been dropped while the route is being resolved
so check for valid cm_id before initiating the connection.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mukesh Kacker [Sat, 22 Aug 2015 22:45:34 +0000 (15:45 -0700)]
RDS: return EMSGSIZE for oversize requests before processing/queueing
rds_send_queue_rm() allows for the "current datagram" being queued
to exceed SO_SNDBUF thresholds by checking bytes queued without
counting in length of current datagram. (Since sk_sndbuf is set
to twice requested SO_SNDBUF value as a kernel heuristic this
is usually fine!)
If this "current datagram" squeezing past the threshold is itself
many times the size of the sk_sndbuf threshold itself then even
twice the SO_SNDBUF does not save us and it gets queued but
cannot be transmitted. Threads block and deadlock and device
becomes unusable. The check for this datagram not exceeding
SNDBUF thresholds (EMSGSIZE) is not done on this datagram as
that check is only done if queueing attempt fails.
(Datagrams that follow this datagram fail queueing attempts, go
through the check and eventually trip EMSGSIZE error but zero
length datagrams silently fail!)
This fix moves the check for datagrams exceeding SNDBUF limits
before any processing or queueing is attempted and returns EMSGSIZE
early in the rds_sndmsg() code. This change also ensures that all
datagrams get checked for exceeding SNDBUF/sk_sndbuf size limits
and the large datagrams that exceed those limits do not get to
rds_send_queue_rm() code for processing.
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:33 +0000 (15:45 -0700)]
RDS: make sure rds_send_drop_to properly takes the m_rs_lock
rds_send_drop_to() is used during socket tear down to find all the
messages on the socket and flush them . It can race with the
acking code unless it takes the m_rs_lock on each and every message.
This plugs a hole where we didn't take m_rs_lock on any message that
didn't have the RDS_MSG_ON_CONN set. Taking m_rs_lock avoids
double frees and other memory corruptions as the ack code trusts
the message m_rs pointer on a socket that had actually been freed.
We must take m_rs_lock to access m_rs. Because of lock nesting and
rs access, we also need to acquire rs_lock.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Santosh Shilimkar [Sat, 22 Aug 2015 22:45:32 +0000 (15:45 -0700)]
RDS: Don't destroy the rdma id until after we're done using it
During connection resets, we are destroying the rdma id too soon. We can't
destroy it when it is still in use. So lets move rdma_destroy_id() after
we clear the rings.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:31 +0000 (15:45 -0700)]
RDS: Fix assertion level from fatal to warning
Fix the asserion level since its not fatal and can be hit
in normal execution paths. There is no need to take the
system down.
We keep the WARN_ON() to detect the condition if we get
here with bad pages.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:30 +0000 (15:45 -0700)]
RDS: Make sure we do a signaled send for large-send
WR(Work Requests )always generate a WC(Work Completion) with
signaled send. Default RDS ib code is setup for un-signaled
completion. Since RDS connction is persistent, we can end up
sending the data even after large-send when the remote end is
not active(for any reason).
By doing a signaled send at least once per large-send,
we can at least detect the problem in work completion
handler there by avoiding sending more data to
inactive remote.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:29 +0000 (15:45 -0700)]
RDS: Mark message mapped before transmit
rds_send_xmit() marks the rds message map flag after
xmit_[rdma/atomic]() which is clearly wrong. We need
to maintain the ownership between transport and rds.
Also take care of error path.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:28 +0000 (15:45 -0700)]
RDS: add a sock_destruct callback debug aid
This helps to detect the accidental processes/apps trying to destroy
the RDS socket which they are sharing with other processes/apps.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:27 +0000 (15:45 -0700)]
RDS: check for congestion updates during rds_send_xmit
Ensure we don't keep sending the data if the link is congested.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:26 +0000 (15:45 -0700)]
RDS: make sure we post recv buffers
If we get an ENOMEM during rds_ib_recv_refill, we might never come
back and refill again later. Patch makes sure to kick krdsd into
helping out.
To achieve this we add RDS_RECV_REFILL flag and update in the refill
path based on that so that at least some therad will keep posting
receive buffers.
Since krdsd and softirq both might race for refill, we decide to
schedule on work queue based on ring_low instead of ring_empty.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:25 +0000 (15:45 -0700)]
RDS: don't update ip address tables if the address hasn't changed
If the ip address tables hasn't changed, there is no need to remove
them only to be added back again.
Lets fix it.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:24 +0000 (15:45 -0700)]
RDS: destroy the ib state earlier during shutdown
Destroy ib state early during shutdown. Otherwise we can get callbacks
after the QP isn't really able to handle them.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:23 +0000 (15:45 -0700)]
RDS: always free recv frag as we free its ring entry
We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which
indicates that the recv refill path was finding allocated frags in ring
entries that were marked free. These were usually followed by OOM crashes.
They only seem to be occurring in the presence of completion errors and
connection resets.
This patch ensures that we free the frag as we mark the ring entry free.
This should stop the refill path from finding allocated frags in ring
entries that were marked free.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
santosh.shilimkar@oracle.com [Sat, 22 Aug 2015 22:45:22 +0000 (15:45 -0700)]
RDS: restore return value in rds_cmsg_rdma_args()
In rds_cmsg_rdma_args() 'ret' is used by rds_pin_pages() which returns
number of pinned pages on success. And the same value is returned to the
caller of rds_cmsg_rdma_args() on success which is not intended.
Commit
f4a3fc03c1d7 ("RDS: Clean up error handling in rds_cmsg_rdma_args")
removed the 'ret = 0' line which broke RDS RDMA mode.
Fix it by restoring the return value on rds_pin_pages() success
keeping the clean-up in place.
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 22 Aug 2015 00:38:02 +0000 (17:38 -0700)]
tcp: refine pacing rate determination
When TCP pacing was added back in linux-3.12, we chose
to apply a fixed ratio of 200 % against current rate,
to allow probing for optimal throughput even during
slow start phase, where cwnd can be doubled every other gRTT.
At Google, we found it was better applying a different ratio
while in Congestion Avoidance phase.
This ratio was set to 120 %.
We've used the normal tcp_in_slow_start() helper for a while,
then tuned the condition to select the conservative ratio
as soon as cwnd >= ssthresh/2 :
- After cwnd reduction, it is safer to ramp up more slowly,
as we approach optimal cwnd.
- Initial ramp up (ssthresh == INFINITY) still allows doubling
cwnd every other RTT.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 20 Aug 2015 22:06:30 +0000 (15:06 -0700)]
xfrm: Use VRF master index if output device is enslaved
Directs route lookups to VRF table. Compiles out if NET_VRF is not
enabled. With this patch able to successfully bring up ipsec tunnels
in VRFs, even with duplicate network configuration.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Aug 2015 19:30:00 +0000 (12:30 -0700)]
tcp: fix slow start after idle vs TSO/GSO
slow start after idle might reduce cwnd, but we perform this
after first packet was cooked and sent.
With TSO/GSO, it means that we might send a full TSO packet
even if cwnd should have been reduced to IW10.
Moving the SSAI check in skb_entail() makes sense, because
we slightly reduce number of times this check is done,
especially for large send() and TCP Small queue callbacks from
softirq context.
As Neal pointed out, we also need to perform the check
if/when receive window opens.
Tested:
Following packetdrill test demonstrates the problem
// Test of slow start after idle
`sysctl -q net.ipv4.tcp_slow_start_after_idle=1`
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.100 < . 1:1(0) ack 1 win 511
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0
+0 write(4, ..., 26000) = 26000
+0 > . 1:5001(5000) ack 1
+0 > . 5001:10001(5000) ack 1
+0 %{ assert tcpi_snd_cwnd == 10 }%
+.100 < . 1:1(0) ack 10001 win 511
+0 %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }%
+0 > . 10001:20001(10000) ack 1
+0 > P. 20001:26001(6000) ack 1
+.100 < . 1:1(0) ack 26001 win 511
+0 %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }%
+4 write(4, ..., 20000) = 20000
// If slow start after idle works properly, we should send 5 MSS here (cwnd/2)
+0 > . 26001:31001(5000) ack 1
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
+0 > . 31001:36001(5000) ack 1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Aug 2015 21:06:37 +0000 (14:06 -0700)]
Merge branch 'fjes'
Taku Izumi says:
====================
FUJITSU Extended Socket network device driver
This patchsets adds FUJITSU Extended Socket network device driver.
Extended Socket network device is a shared memory based high-speed
network interface between Extended Partitions of PRIMEQUEST 2000 E2
series.
You can get some information about Extended Partition and Extended
Socket by referring the following manual.
http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf
3.2.1 Extended Partitioning
3.2.2 Extended Socke
v2.2 -> v3:
- Fix up according to David's comment (No functional change)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:38 +0000 (17:29 +0900)]
fjes: ethtool support
This patch adds implementation for ethtool support.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:37 +0000 (17:29 +0900)]
fjes: handle receive cancellation request interrupt
This patch adds implementation of handling IRQ
of other receiver's receive cancellation request.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:36 +0000 (17:29 +0900)]
fjes: epstop_task
This patch adds epstop_task.
This task is used to process other receiver's
cancellation request.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:35 +0000 (17:29 +0900)]
fjes: update_zone_task
This patch adds update_zone_task.
Zoning information can be changed by user.
This task is used to monitor if zoning information is
changed or not.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:34 +0000 (17:29 +0900)]
fjes: unshare_watch_task
This patch adds unshare_watch_task.
Shared buffer's status can be changed into unshared.
This task is used to monitor shared buffer's status.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:33 +0000 (17:29 +0900)]
fjes: force_close_task
This patch adds force_close_task.
This task is used to close network device forcibly.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:32 +0000 (17:29 +0900)]
fjes: interrupt_watch_task
This patch adds interrupt_watch_task.
This task is used to prevent delay of interrupts.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:31 +0000 (17:29 +0900)]
fjes: net_device_ops.ndo_vlan_rx_add/kill_vid
This patch adds net_device_ops.ndo_vlan_rx_add_vid and
net_device_ops.ndo_vlan_rx_kill_vid callback.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:30 +0000 (17:29 +0900)]
fjes: net_device_ops.ndo_tx_timeout
This patch adds net_device_ops.ndo_tx_timeout callback.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:29 +0000 (17:29 +0900)]
fjes: net_device_ops.ndo_change_mtu
This patch adds net_device_ops.ndo_change_mtu.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:28 +0000 (17:29 +0900)]
fjes: net_device_ops.ndo_get_stats64
This patch adds net_device_ops.ndo_get_stats64 callback.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:27 +0000 (17:29 +0900)]
fjes: NAPI polling function
This patch adds NAPI polling function and receive related work.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:26 +0000 (17:29 +0900)]
fjes: tx_stall_task
This patch adds tx_stall_task.
When receiver's buffer is full, sender stops
its tx queue. This task is used to monitor
receiver's status and when receiver's buffer
is avairable, it resumes tx queue.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:25 +0000 (17:29 +0900)]
fjes: raise_intr_rxdata_task
This patch add raise_intr_rxdata_task.
Extended Socket Network Device is shared memory
based, so someone's transmission denotes other's
reception. In order to notify receivers, sender
has to raise interruption of receivers.
raise_intr_rxdata_task does this work.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:24 +0000 (17:29 +0900)]
fjes: net_device_ops.ndo_start_xmit
This patch adds net_device_ops.ndo_start_xmit callback,
which is called when sending packets.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:23 +0000 (17:29 +0900)]
fjes: net_device_ops.ndo_open and .ndo_stop
This patch adds net_device_ops.ndo_open and .ndo_stop
callback. These function is called when network device
activation and deactivation.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:22 +0000 (17:29 +0900)]
fjes: buffer address regist/unregistration routine
This patch adds buffer address regist/unregistration routine.
This function is mainly invoked when network device's
activation (open) and deactivation (close)
in order to retist/unregist shared buffer address.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:21 +0000 (17:29 +0900)]
fjes: ES information acquisition routine
This patch adds ES information acquisition routine.
ES information can be retrieved issuing information
request command. ES information includes which
receiver is same zone.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:20 +0000 (17:29 +0900)]
fjes: platform_driver's .probe and .remove routine
This patch implements platform_driver's .probe and .remove
routine, and also adds board specific private data structure.
This driver registers net_device at platform_driver's .probe
routine and unregisters net_device at its .remove routine.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:19 +0000 (17:29 +0900)]
fjes: Hardware cleanup routine
This patch adds hardware cleanup routine to be
invoked at driver's .remove routine.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:18 +0000 (17:29 +0900)]
fjes: Hardware initialization routine
This patch adds hardware initialization routine to be
invoked at driver's .probe routine.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taku Izumi [Fri, 21 Aug 2015 08:29:17 +0000 (17:29 +0900)]
fjes: Introduce FUJITSU Extended Socket Network Device driver
This patch adds the basic code of FUJITSU Extended Socket
Network Device driver.
When "PNP0C02" is found in ACPI DSDT, it evaluates "_STR"
to check if "PNP0C02" is for Extended Socket device driver
and retrieves ACPI resource information. Then creates
platform_device.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Loganaden Velvindron [Fri, 21 Aug 2015 02:22:18 +0000 (19:22 -0700)]
3c59x: Add BQL support for 3c59x ethernet driver.
This BQL patch is based on work done by Tino Reichardt.
Tested on 0000:05:00.0: 3Com PCI 3c905C Tornado at
ffffc90000e6e000 by running
Flent several times.
Signed-off-by: Loganaden Velvindron <logan@elandsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Aug 2015 17:34:40 +0000 (10:34 -0700)]
Merge branch 'ila-precompute'
Tom Herbert says:
====================
ila: Precompute checksums
This patch set:
- Adds argument ot LWT build_state that holds a pointer to the fib
configuration being applied to the new route
- Adds support in ILA to precompute checksum difference for
performance optimization
v2:
- Move return argument in build_state to end of arguments
v3:
- Update the signature for ip6_tun_build_state()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 24 Aug 2015 16:45:42 +0000 (09:45 -0700)]
ila: Precompute checksum difference for translations
In the ILA build state for LWT compute the checksum difference to apply
to transport checksums that include the IPv6 pseudo header. The
difference is between the route destination (from fib6_config) and the
locator to write.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 24 Aug 2015 16:45:41 +0000 (09:45 -0700)]
lwt: Add cfg argument to build_state
Add cfg and family arguments to lwt build state functions. cfg is a void
pointer and will either be a pointer to a fib_config or fib6_config
structure. The family parameter indicates which one (either AF_INET
or AF_INET6).
LWT encpasulation implementation may use the fib configuration to build
the LWT state.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaohui Xie [Fri, 21 Aug 2015 07:29:29 +0000 (15:29 +0800)]
net: phy: add interrupt support for aquantia phy
By implementing config_intr & ack_interrupt, now the phy can support
link connect/disconnect interrupt.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Aug 2015 03:42:57 +0000 (20:42 -0700)]
Merge tag 'nfc-next-4.3-1' of git://git./linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.3 pull request
This is the NFC pull request for 4.3.
With this one we have:
- A new driver for Samsung's S3FWRN5 NFC chipset. In order to
properly support this driver, a few NCI core routines needed
to be exported. Future drivers like Intel's Fields Peak will
benefit from this.
- SPI support as a physical transport for STM st21nfcb.
- An additional netlink API for sending replies back to userspace
from vendor commands.
- 2 small fixes for TI's trf7970a
- A few st-nci fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Fri, 21 Aug 2015 10:41:14 +0000 (12:41 +0200)]
route: fix breakage after moving lwtunnel state
__recnt and related fields need to be in its own cacheline for performance
reasons. Commit
61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
broke that on 32bit archs, causing BUILD_BUG_ON in dst_hold to be triggered.
This patch fixes the breakage by moving the lwtunnel state to the end of
dst_entry on 32bit archs. Unfortunately, this makes it share the cacheline
with __refcnt and may affect performance, thus further patches may be
needed.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes:
61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 23 Aug 2015 23:28:18 +0000 (16:28 -0700)]
Merge tag 'linux-can-next-for-4.3-
20150820' of git://git./linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
this is a pull request of a two patches for net-next.
The first patch is by Nik Nyby and fixes a typo in a function name. The
second patch by Lucas Stach demotes register output to debug level.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 23 Aug 2015 23:14:46 +0000 (16:14 -0700)]
Merge branch 'tipc-failover-fixes'
Jon Maloy says:
====================
tipc: fix link failover/synch problems
We fix three problems with the new link failover/synch implementation,
which was introduced earlier in this release cycle. They are all related
to situations where there is a very short interval between the disabling
and enabling of interfaces.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 20 Aug 2015 06:12:56 +0000 (02:12 -0400)]
tipc: fix stale link problem during synchronization
Recent changes to the link synchronization means that we can now just
drop packets arriving on the synchronizing link before the synch point
is reached. This has lead to significant simplifications to the
implementation, but also turns out to have a flip side that we need
to consider.
Under unlucky circumstances, the two endpoints may end up
repeatedly dropping each other's packets, while immediately
asking for retransmission of the same packets, just to drop
them once more. This pattern will eventually be broken when
the synch point is reached on the other link, but before that,
the endpoints may have arrived at the retransmission limit
(stale counter) that indicates that the link should be broken.
We see this happen at rare occasions.
The fix for this is to not ask for retransmissions when a link is in
state LINK_SYNCHING. The fact that the link has reached this state
means that it has already received the first SYNCH packet, and that it
knows the synch point. Hence, it doesn't need any more packets until the
other link has reached the synch point, whereafter it can go ahead and
ask for the missing packets.
However, because of the reduced traffic on the synching link that
follows this change, it may now take longer to discover that the
synch point has been reached. We compensate for this by letting all
packets, on any of the links, trig a check for synchronization
termination. This is possible because the packets themselves don't
contain any information that is needed for discovering this condition.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 20 Aug 2015 06:12:55 +0000 (02:12 -0400)]
tipc: interrupt link synchronization when a link goes down
When we introduced the new link failover/synch mechanism
in commit
6e498158a827fd515b514842e9a06bdf0f75ab86
("tipc: move link synch and failover to link aggregation level"),
we missed the case when the non-tunnel link goes down during the link
synchronization period. In this case the tunnel link will remain in
state LINK_SYNCHING, something leading to unpredictable behavior when
the failover procedure is initiated.
In this commit, we ensure that the node and remaining link goes
back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
when one of the parallel links goes down. We also ensure that we don't
re-enter synch mode if subsequent SYNCH packets arrive on the remaining
link.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 20 Aug 2015 06:12:54 +0000 (02:12 -0400)]
tipc: eliminate risk of premature link setup during failover
When a link goes down, and there is still a working link towards its
destination node, a failover is initiated, and the failed link is not
allowed to re-establish until that procedure is finished. To ensure
this, the concerned link endpoints are set to state LINK_FAILINGOVER,
and the node endpoints to NODE_FAILINGOVER during the failover period.
However, if the link reset is due to a disabled bearer, the corres-
ponding link endpoint is deleted, and only the node endpoint knows
about the ongoing failover. Now, if the disabled bearer is re-enabled
during the failover period, the discovery mechanism may create a new
link endpoint that is ready to be established, despite that this is not
permitted. This situation may cause both the ongoing failover and any
subsequent link synchronization to fail.
In this commit, we ensure that a newly created link goes directly to
state LINK_FAILINGOVER if the corresponding node state is
NODE_FAILINGOVER. This eliminates the problem described above.
Furthermore, we tighten the criteria for which packets are allowed
to end a failover state in the function tipc_node_check_state().
By checking that the receiving link is up and running, instead of just
checking that it is not in failover mode, we eliminate the risk that
protocol packets from the re-created link may cause the failover to
be prematurely terminated.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 23 Aug 2015 23:08:54 +0000 (16:08 -0700)]
Merge branch 'nps_enet_fixes'
Noam Camus says:
====================
*** nps_enet fixups ***
Change v2
TX done is handled back with NAPI poll.
Change v1
This patch set is a bunch of fixes to make nps_enet work correctly with
all platforms, i.e. real device, emulation system, and simulation system.
The main trigger for this patch set was that in our emulation system
the TX end interrupt is "edge-sensitive" and therefore we cannot use the
cause register since it is not sticky.
Also:
TX is handled during HW interrupt context and not NAPI job.
race with TX done was fixed.
added acknowledge for TX when device is "level sensitive".
enable drop of control frames which is not needed for regular usage.
So most of this patch set is about TX handling, which is now more complete.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Noam Camus [Thu, 20 Aug 2015 05:00:05 +0000 (08:00 +0300)]
NET: nps_enet: minor namespace cleanup
We define buf_int_enable in the minimal namespace it is used.
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noam Camus [Thu, 20 Aug 2015 05:00:04 +0000 (08:00 +0300)]
NET: nps_enet: TX done acknowledge.
This is needed for when TX done interrupt is in
"level mode".
For example it is true for some simulators of this device.
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noam Camus [Thu, 20 Aug 2015 05:00:03 +0000 (08:00 +0300)]
NET: nps_enet: drop control frames
We set controller to drop control frames and not trying
to pass them on. This is only needed for debug reasons.
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noam Camus [Thu, 20 Aug 2015 05:00:02 +0000 (08:00 +0300)]
NET: nps_enet: TX done race condition
We need to set tx_skb pointer before send frame.
If we receive interrupt before we set pointer we will try
to free SKB with wrong pointer.
Now we are sure that SKB pointer will never be NULL during
handling TX done and check is removed.
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noam Camus [Thu, 20 Aug 2015 05:00:01 +0000 (08:00 +0300)]
NET: nps_enet: replace use of cause register
When interrupt is received we read directly from control
register for RX/TX instead of reading cause register
since this register fails to indicate TX done when
TX interrupt is "edge mode".
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 23 Aug 2015 22:59:57 +0000 (15:59 -0700)]
Merge branch 'gro_tunnels'
Tom Herbert says:
====================
gro: Fixes for tunnels and GRO
This patch set addresses some issue related to tunneling and GRO:
- Fix remote checksum offload to properly deal with frag0 in GRO.
- Add support for GRO at VXLAN tunnel (call gro_cells)
Testing: Ran one netperf TCP_STREAM to highlight impact of different
configurations:
GUE
Zero UDP checksum
4628.42 MBps
UDP checksums enabled
6800.51 MBps
UDP checksums and remote checksum offload
7663.82 MBps
UDP checksums and remote checksum offload using no-partial
7287.25 MBps
VXLAN
Zero UDP checksum
4112.02
UDP checksums enabled
6785.80 MBps
UDP checksums and remote checksum offload
7075.56 MBps
v2:
- Drop "gro: Pull headers into skb head for 1st skb in gro list"
from patch set
- In vxlan_remcsum and gue_remcsum return immediately if remcsum
processing was already done
- Add gro callbacks for sit offload
- Use WARN_ON_ONCE if we get a GUE protocol that does not have
GRO offload support
v3:
- Don't restore gro callbacks for sit offload
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Thu, 20 Aug 2015 00:07:34 +0000 (17:07 -0700)]
fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks
Do WARN_ON_ONCE instead of WARN_ON in gue_gro_receive when the offload
callcaks are bad (either don't exist or gro_receive is not specified).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Thu, 20 Aug 2015 00:07:33 +0000 (17:07 -0700)]
vxlan: GRO support at tunnel layer
Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.
Testing:
Ran 200 netperf TCP_STREAM instance
- With fix (GRO enabled on VXLAN interface)
Verify GRO is happening.
9084 MBps tput
3.44% CPU utilization
- Without fix (GRO disabled on VXLAN interface)
Verified no GRO is happening.
9084 MBps tput
5.54% CPU utilization
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Thu, 20 Aug 2015 00:07:32 +0000 (17:07 -0700)]
gro: Fix remcsum offload to deal with frags in GRO
The remote checksum offload GRO did not consider the case that frag0
might be in use. This patch fixes that by accessing headers using the
skb_gro functions and not saving offsets relative to skb->head.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chas Williams [Wed, 19 Aug 2015 23:14:20 +0000 (19:14 -0400)]
net/xen-netfront: only clean up queues if present
If you simply load and unload the module without starting the interfaces,
the queues are never created and you get a bad pointer dereference.
Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 19 Aug 2015 22:54:55 +0000 (18:54 -0400)]
net: dsa: mv88e6xxx: set 802.1Q mode to Fallback
The current Secure port mode requires the port-based VLANs to also be
valid in the 802.1Q VLAN Table Unit. The current hardware bridging
support only configures the port-based VLANs, thus is broken.
A new patchset is required to adapt the hardware bridging code to fully
support the Secure port mode.
In the meantime, change the 802.1Q mode of every ports to Fallback,
which filtering is more permissive, and doesn't add this restriction to
handle port-based and tagged-based VLANs.
Fixes:
8efdda4a1b60 ("net: dsa: mv88e6xxx: use port 802.1Q mode Secure")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Aug 2015 18:44:04 +0000 (11:44 -0700)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/usb/qmi_wwan.c
Overlapping additions of new device IDs to qmi_wwan.c
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Aug 2015 18:38:41 +0000 (11:38 -0700)]
enic: Fix build failure with SRIOV disabled.
err_out_vnic_unregister is used regardless of whether
SRIOV is enabled or not.
Reported-by: Jesse Brandeburg <jesse.brangeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Aug 2015 05:18:45 +0000 (22:18 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
This is second pull request includes the conflict resolution patch that
resulted from the updates that we got for the conntrack template through
kmalloc. No changes with regards to the previously sent 15 patches.
The following patchset contains Netfilter updates for your net-next tree, they
are:
1) Rework the existing nf_tables counter expression to make it per-cpu.
2) Prepare and factor out common packet duplication code from the TEE target so
it can be reused from the new dup expression.
3) Add the new dup expression for the nf_tables IPv4 and IPv6 families.
4) Convert the nf_tables limit expression to use a token-based approach with
64-bits precision.
5) Enhance the nf_tables limit expression to support limiting at packet byte.
This comes after several preparation patches.
6) Add a burst parameter to indicate the amount of packets or bytes that can
exceed the limiting.
7) Add netns support to nfacct, from Andreas Schultz.
8) Pass the nf_conn_zone structure instead of the zone ID in nf_tables to allow
accessing more zone specific information, from Daniel Borkmann.
9) Allow to define zone per-direction to support netns containers with
overlapping network addressing, also from Daniel.
10) Extend the CT target to allow setting the zone based on the skb->mark as a
way to support simple mappings from iptables, also from Daniel.
11) Make the nf_tables payload expression aware of the fact that VLAN offload
may have removed a vlan header, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Thu, 20 Aug 2015 23:10:19 +0000 (01:10 +0200)]
Merge branch 'master' of git://git./linux/kernel/git/davem/net-next
Resolve conflicts with conntrack template fixes.
Conflicts:
net/netfilter/nf_conntrack_core.c
net/netfilter/nf_synproxy_core.c
net/netfilter/xt_CT.c
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Fri, 21 Aug 2015 00:06:11 +0000 (17:06 -0700)]
Merge tag 'pm+acpi-4.2-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These fix a recent regression in the ACPI backlight code and a memory
leak in the Exynos cpufreq driver.
Specifics:
- Fix a recently introduced issue in the ACPI backlight code which
causes lockdep to complain about a circular lock dependency during
initialization (Hans de Goede).
- Fix a possible memory during initialization in the Exynos cpufreq
driver (Shailendra Verma)"
* tag 'pm+acpi-4.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: exynos: Fix for memory leak in case SoC name does not match
ACPI / video: Fix circular lock dependency issue in the video-detect code
David S. Miller [Thu, 20 Aug 2015 22:42:38 +0000 (15:42 -0700)]
Merge branch 'lwt-ipv6'
Jiri Benc says:
====================
lwtunnel: per route ipv6 support for vxlan
v3: Moved LWTUNNEL_ENCAP_IP6 definition in patch 13.
v2: Fixed issues in patch 4 pointed out by Alexei.
This series enables IPv6 tunnels based on lwtunnel infrastructure. Only
vxlan is supported for now.
Tested in all combinations of IPv4 over IPv6, IPv6 over IPv4 and IPv6 over
IPv6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:32 +0000 (13:56 +0200)]
ipv6: route: per route IP tunnel metadata via lightweight tunnel
Allow specification of per route IP tunnel instructions also for IPv6.
This complements commit
3093fbe7ff4b ("route: Per route IP tunnel metadata
via lightweight tunnel").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
CC: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:31 +0000 (13:56 +0200)]
ipv6: route: extend flow representation with tunnel key
Use flowi_tunnel in flowi6 similarly to what is done with IPv4.
This complements commit
1b7179d3adff ("route: Extend flow representation
with tunnel key").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:30 +0000 (13:56 +0200)]
vxlan: metadata based tunneling for IPv6
Support metadata based (formerly flow based) tunneling also for IPv6.
This complements commit
ee122c79d422 ("vxlan: Flow based tunneling").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:29 +0000 (13:56 +0200)]
vxlan: do not shadow flags variable
The 'flags' variable is already defined in the outer scope.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:28 +0000 (13:56 +0200)]
vxlan: provide access function for vxlan socket address family
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:27 +0000 (13:56 +0200)]
ipv6: ndisc: inherit metadata dst when creating ndisc requests
If output device wants to see the dst, inherit the dst of the original skb
in the ndisc request.
This is an IPv6 counterpart of commit
0accfc268f4d ("arp: Inherit metadata
dst when creating ARP requests").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:26 +0000 (13:56 +0200)]
ipv6: drop metadata dst in ip6_route_input
The fix in commit
48fb6b554501 is incomplete, as now ip6_route_input can be
called with non-NULL dst if it's a metadata dst and the reference is leaked.
Drop the reference.
Fixes:
48fb6b554501 ("ipv6: fix crash over flow-based vxlan device")
Fixes:
ee122c79d422 ("vxlan: Flow based tunneling")
CC: Wei-Chun Chao <weichunc@plumgrid.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:25 +0000 (13:56 +0200)]
route: move lwtunnel state to dst_entry
Currently, the lwtunnel state resides in per-protocol data. This is
a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
The xmit function of the tunnel does not know whether the packet has been
routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
lwtstate data to dst_entry makes such inter-protocol tunneling possible.
As a bonus, this brings a nice diffstat.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:24 +0000 (13:56 +0200)]
ip_tunnels: use tos and ttl fields also for IPv6
Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll
be used with IPv6 tunnels, too.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:23 +0000 (13:56 +0200)]
ip_tunnels: add IPv6 addresses to ip_tunnel_key
Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the
newly introduced padding after the IPv4 addresses needs to be zeroed out.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:22 +0000 (13:56 +0200)]
ip_tunnels: use offsetofend
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:21 +0000 (13:56 +0200)]
ip_tunnels: use u8/u16/u32
The ip_tunnels.h include file uses mixture of __u16 and u16 (etc.) types.
Unify it to the non-underscore variants.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 20 Aug 2015 11:56:20 +0000 (13:56 +0200)]
ip_tunnels: remove custom alignment and packing
The custom alignment of struct ip_tunnel_key is unnecessary. In struct
sw_flow_key, it starts at offset 256, in struct ip_tunnel_info it's the
first field.
The structure is also packed even without the __packed keyword.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Thu, 20 Aug 2015 22:19:29 +0000 (00:19 +0200)]
Merge branches 'acpi-video' and 'cpufreq-fixes'
* acpi-video:
ACPI / video: Fix circular lock dependency issue in the video-detect code
* cpufreq-fixes:
cpufreq: exynos: Fix for memory leak in case SoC name does not match
Jeremy Linton [Wed, 19 Aug 2015 18:56:42 +0000 (13:56 -0500)]
net: xgene Remove xgene specific phy and MAC lookup functions
Convert the xgene_get_mac_address to device_get_mac_address(), and
xgene_get_phy_mode() to device_get_phy_mode().
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 19 Aug 2015 18:40:31 +0000 (11:40 -0700)]
net: Fix nexthop lookups
Andreas reported breakage adding routes with local nexthops:
$ ip route show table main
...
172.28.0.0/24 dev vnf-xe1p0 proto kernel scope link src 172.28.0.16
$ ip route add 10.0.0.0/8 via 172.28.0.32 table 100 dev vnf-xe1p0
RTNETLINK answers: Resource temporarily unavailable
3bfd847203c changed the lookup to use the passed in table but for cases like
this the nexthop is in the local table rather than the passed in table.
Fixes:
3bfd847203c ("net: Use passed in table for nexthop lookups")
Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Wed, 19 Aug 2015 18:29:35 +0000 (11:29 -0700)]
bridge: fix netlink max attr size
.maxtype should match .policy. Probably just been getting lucky here
because IFLA_BRPORT_MAX > IFLA_BR_MAX.
Fixes:
13323516 ("bridge: implement rtnl_link_ops->changelink")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Linton [Wed, 19 Aug 2015 16:46:43 +0000 (11:46 -0500)]
smsc911x: Remove dev==NULL check.
The dev==NULL check in smsc911x_probe_config is useless
and isn't providing any additional protection. If a fwnode
doesn't exist then an appropriate error should be returned
by device_get_phy_mode() covering the original case
of a missing of/fwnode.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Linton [Wed, 19 Aug 2015 16:46:42 +0000 (11:46 -0500)]
device property: Add ETH_ALEN check, update comments.
This patch adds MAC address length check back into
the device_get_mac_addr() function before calling
is_valid_ether_addr() similar to the way the OF
routine does it.
Update the comments for the two new functions.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Aug 2015 21:13:25 +0000 (14:13 -0700)]
Merge tag 'wireless-drivers-next-for-davem-2015-08-19' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
Major changes:
ath10k:
* add support for qca99x0 family of devices
* improve performance of tx_lock
* add support for raw mode (802.11 frame format) and software crypto
engine enabled via a module parameter
ath9k:
* add fast-xmit support
wil6210:
* implement TSO support
* support bootloader v1 and onwards
iwlwifi:
* Deprecate -10.ucode
* Clean ups towards multiple Rx queues
* Add support for longer CMD IDs. This will be required by new
firmwares since we are getting close to the u8 limit.
* bugfixes for the D0i3 power state
* Add basic support for FTM
* polish the Miracast operation
* fix a few power consumption issues
* scan cleanup
* fixes for D0i3 system state
* add paging for devices that support it
* add again the new RBD allocation model
* add more options to the firmware debug system
* add support for frag SKBs in Tx
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 19 Aug 2015 08:04:51 +0000 (16:04 +0800)]
ipv4: Make fib_encap_match static
Make fib_encap_match() static as it isn't used outside the file.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Aug 2015 21:10:23 +0000 (14:10 -0700)]
Merge branch 'ewma'
Johannes Berg says:
====================
average: convert users to inline implementation
Since there's very little benefit of the out-of-line implementation
(a single byte of .text in one driver as far as I've seen), convert
all drivers to the inline implementation, saving memory, and remove
the out-of-line implementation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 19 Aug 2015 07:46:22 +0000 (09:46 +0200)]
average: remove out-of-line implementation
Since all users are now converted to the inline implementation,
remove the out-of-line implementation entirely.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 19 Aug 2015 07:46:21 +0000 (09:46 +0200)]
rt2x00: use DECLARE_EWMA
Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA()
to create static inlines. On x86/64 this results in code that's one
byte larger (for me), but reduces struct link_ant and struct link
size by the two unsigned long values that store the parameters each.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 19 Aug 2015 07:46:20 +0000 (09:46 +0200)]
ath5k: use DECLARE_EWMA
This reduces code size slightly (at least on x86/64) while also
removing memory consumption by two unsigned long values for each
ath5k device.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 19 Aug 2015 07:48:40 +0000 (09:48 +0200)]
virtio_net: use DECLARE_EWMA
Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA()
to create static inlines. On x86/64 this results in no change in code
size for me, but reduces the struct receive_queue size by the two
unsigned long values that store the parameters.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 19 Aug 2015 07:21:58 +0000 (10:21 +0300)]
bnx2x: Fix vxlan endianity issue
Commit
f34fa14cc033 ("bnx2x: Add vxlan RSS support") has introduced an
endianity issue when passing the vxlan UDP port to the HW.
Reported-by: <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Aug 2015 20:01:57 +0000 (13:01 -0700)]
Merge branch 'vrf-cleanups-part-2'
Nikolay Aleksandrov says:
====================
vrf: cleanups part 2
This is the next part of vrf cleanups, patch 1 drops the SLAB_PANIC
when creating kmem cache since it's handled, patch 02 removes a slave
duplicate check which is already done by the lower/upper code, patch 3
moves the ndo_add_slave code around a bit so we can drop an error
label and patch 4 drops the master device checks which are unnecessary
because the ops are taken from the master device itself so it can't be
different.
====================
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Nikolay Aleksandrov [Wed, 19 Aug 2015 03:27:10 +0000 (06:27 +0300)]
vrf: ndo_add|del_slave drop unnecessary checks
When ndo_add|del_slave ops are used, they're taken from the respective
master device's netdev ops, so if the master device is a VRF only then
the VRF ops will get called thus no need to check the type of the
master.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>