David S. Miller [Sat, 19 Oct 2013 23:45:46 +0000 (19:45 -0400)]
Merge branch 'net_get_random_once'
Hannes Frederic Sowa says:
====================
This series implements support for delaying the initialization of secret
keys, e.g. used for hashing, for as long as possible. This functionality
is implemented by a new macro, net_get_random_once.
I already used it to protect the socket hashes, the syncookie secret
(most important) and the tcp_fastopen secrets.
Changelog:
v2) Use static_keys in net_get_random_once to keep the impact on
the fast path as small as possible.
v3) added patch "static_key: WARN on usage before jump_label_init was called":
Patch "x86/jump_label: expect default_nop if static_key gets enabled
on boot-up" relaxes the checks for using static_key primitives before
jump_label_init. So tighten them first.
v4) Update changelog on the patch "static_key: WARN on usage before
jump_label_init was called"
Included patches:
ipv4: split inet_ehashfn to hash functions per compilation unit
ipv6: split inet6_ehashfn to hash functions per compilation unit
static_key: WARN on usage before jump_label_init was called
x86/jump_label: expect default_nop if static_key gets enabled on boot-up
net: introduce new macro net_get_random_once
inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_once
inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
tcp: switch tcp_fastopen key generation to net_get_random_once
net: switch net_secret key generation to net_get_random_once
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:59 +0000 (21:48 +0200)]
net: switch net_secret key generation to net_get_random_once
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:58 +0000 (21:48 +0200)]
tcp: switch tcp_fastopen key generation to net_get_random_once
Changed key initialization of tcp_fastopen cookies to net_get_random_once.
If the user sets a custom key net_get_random_once must be called at
least once to ensure we don't overwrite the user provided key when the
first cookie is generated later on.
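The ordering requirement above can be illustrated with a small user-space model. This is only a sketch: key_init_once(), set_user_key(), key_for_cookie() and fake_random_bytes() are hypothetical names, not the kernel functions, and the "random" fill is a fixed pattern so the behaviour is easy to observe.

```c
#include <stdbool.h>
#include <string.h>

#define KEY_LEN 16

static unsigned char fastopen_key[KEY_LEN];
static bool key_initialized;	/* stands in for the once-only state */

/* Hypothetical stand-in for entropy extraction; fills a fixed pattern. */
static void fake_random_bytes(unsigned char *buf, size_t len)
{
	memset(buf, 0xAA, len);
}

/* Returns true only on the call that actually initialized the key. */
static bool key_init_once(void)
{
	if (key_initialized)
		return false;
	fake_random_bytes(fastopen_key, KEY_LEN);
	key_initialized = true;
	return true;
}

/* User-supplied key: consume the one-time init first, then overwrite. */
static void set_user_key(const unsigned char *key)
{
	key_init_once();
	memcpy(fastopen_key, key, KEY_LEN);
}

/* First cookie generation: the init is a no-op if a user key was set. */
static const unsigned char *key_for_cookie(void)
{
	key_init_once();
	return fastopen_key;
}
```

If set_user_key() skipped the key_init_once() call, the first key_for_cookie() would still see an uninitialized state and replace the user's key with the generated one.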
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:57 +0000 (21:48 +0200)]
inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
Initialize the ehash and ipv6_hash_secrets with net_get_random_once.
Each compilation unit gets its own secret now:
ipv4/inet_hashtables.o
ipv4/udp.o
ipv6/inet6_hashtables.o
ipv6/udp.o
rds/connection.o
The functions still get inlined into the hashing functions. In the fast
path we have at most two (needed in ipv6) if (unlikely(...)).
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:56 +0000 (21:48 +0200)]
inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_once
This patch splits the secret key for syncookies for ipv4 and ipv6 and
initializes them with net_get_random_once. This change was the reason I
did this series. I think the initialization of the syncookie_secret is
way too early.
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:55 +0000 (21:48 +0200)]
net: introduce new macro net_get_random_once
net_get_random_once is a new macro which handles the initialization
of secret keys. It is possible to call it in the fast path. Only the
initialization depends on the spinlock and is rather slow. Otherwise
it should get used just before the key is used to delay the entropy
extraction as late as possible to get better randomness. It returns true
if the key got initialized.
The usage of static_keys for net_get_random_once is a bit uncommon so
it needs some further explanation why this actually works:
=== In the simple non-HAVE_JUMP_LABEL case we actually have ===
no constraints on using static_key_(true|false) on keys initialized with
STATIC_KEY_INIT_(FALSE|TRUE). So this path just expands in favor of
the likely case that the initialization is already done. The key is
initialized like this:
___done_key = { .enabled = ATOMIC_INIT(0) }
The check
if (!static_key_true(&___done_key)) \
expands into (pseudo code)
if (!likely(___done_key > 0))
, so we take the fast path as soon as ___done_key is increased from the
helper function.
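The fallback expansion described above can be modeled in user space roughly as follows. This is a sketch under assumptions: struct toy_key, toy_key_true(), toy_slow_init() and toy_get_once() are illustrative names, likely() is defined locally via the GCC/Clang builtin, and the increment happens immediately here rather than from a deferred helper.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define likely(x) __builtin_expect(!!(x), 1)

/* Toy model of a static key without jump labels: just a counter. */
struct toy_key {
	atomic_int enabled;
};

static struct toy_key toy_done_key;	/* zero-initialized: not done yet */
static int toy_slow_calls;		/* counts slow-path entries */

/* Models static_key_true(): true once the counter is above zero. */
static bool toy_key_true(struct toy_key *k)
{
	return likely(atomic_load(&k->enabled) > 0);
}

/* Slow path: do the one-time work and raise the counter. */
static bool toy_slow_init(struct toy_key *k)
{
	toy_slow_calls++;
	atomic_fetch_add(&k->enabled, 1);
	return true;
}

/* Returns true only for the call that performed the initialization. */
static bool toy_get_once(void)
{
	bool ret = false;

	if (!toy_key_true(&toy_done_key))
		ret = toy_slow_init(&toy_done_key);
	return ret;
}
```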
=== If HAVE_JUMP_LABELs are available this depends ===
on patching of jumps into the prepared NOPs, which is done in
jump_label_init at boot-up time (from start_kernel). It is forbidden
and dangerous to use net_get_random_once in functions which are called
before that!
At compilation time NOPs are generated at the call sites of
net_get_random_once. E.g. net/ipv6/inet6_hashtables.c:inet6_ehashfn (we
need to call net_get_random_once two times in inet6_ehashfn, so two NOPs):
71: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
76: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
Both will be patched to the actual jumps to the end of the function to
call __net_get_random_once at boot time as explained above.
arch_static_branch is optimized and inlined for false as return value and
actually also returns false in case the NOP is placed in the instruction
stream. So in the fast case we get a "return false". But because we
initialize ___done_key with (enabled != (entries & 1)) this call-site
will get patched up at boot thus returning true. The final check looks
like this:
if (!static_key_true(&___done_key)) \
___ret = __net_get_random_once(buf, \
expands to
if (!!static_key_false(&___done_key)) \
___ret = __net_get_random_once(buf, \
So we get true at boot time and as soon as static_key_slow_inc is called
on the key it will invert the logic and return false for the fast path.
static_key_slow_inc will change the branch because it got initialized
with .enabled == 0. After static_key_slow_inc is called on the key the
branch is replaced with a nop again.
=== Misc: ===
The helper defers the increment into a workqueue so we don't
have problems calling this code from atomic sections. A separate boolean
(___done) guards the case where we enter net_get_random_once again before
the increment happened.
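The interplay between the deferred increment and the separate boolean can be sketched in user space. Everything here is an assumption made for illustration: the toy_* names are hypothetical, an atomic_flag stands in for the spinlock, a plain bool models the static key, and toy_run_deferred_work() is called by hand where the kernel would run a work item.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

static atomic_flag toy_once_lock = ATOMIC_FLAG_INIT;	/* toy spinlock */
static bool toy_fast_path_enabled;	/* models the static key */
static bool toy_work_pending;		/* models the queued work item */

/* Slow path: only the first caller fills the buffer. */
static bool toy_slow_get_once(unsigned char *buf, size_t len, bool *done)
{
	while (atomic_flag_test_and_set(&toy_once_lock))
		;	/* spin until the lock is free */
	if (*done) {
		atomic_flag_clear(&toy_once_lock);
		return false;
	}
	memset(buf, 0x5A, len);	/* placeholder for real entropy */
	*done = true;
	atomic_flag_clear(&toy_once_lock);
	toy_work_pending = true;	/* defer flipping the fast-path key */
	return true;
}

/* Stands in for the work item that increments the key later. */
static void toy_run_deferred_work(void)
{
	if (toy_work_pending) {
		toy_fast_path_enabled = true;
		toy_work_pending = false;
	}
}

/* Returns true only for the call that initialized the buffer. */
static bool toy_get_random_once(unsigned char *buf, size_t len)
{
	static bool done;	/* guards re-entry before the key is flipped */

	if (toy_fast_path_enabled)
		return false;
	return toy_slow_get_once(buf, len, &done);
}
```

A second call made before the deferred work has run still reaches the slow path, but the done flag keeps it from overwriting the key.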
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:54 +0000 (21:48 +0200)]
x86/jump_label: expect default_nop if static_key gets enabled on boot-up
net_get_random_once (introduced in the next patch) uses static_keys in
a way that they get enabled on boot-up instead of replaced with an
ideal_nop. So check for default_nop on initial enabling.
Other architectures don't check for this.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:53 +0000 (21:48 +0200)]
static_key: WARN on usage before jump_label_init was called
The static key primitives that toggle a branch must not be used
before jump_label_init() is called from init/main.c. jump_label_init
reorganizes and wires up the jump_entries so usage before that could
have unforeseen consequences.
The following primitives are now checked for correct use:
* static_key_slow_inc
* static_key_slow_dec
* static_key_slow_dec_deferred
* jump_label_rate_limit
The x86 architecture already checks this by testing if the default_nop
was already replaced with an optimal nop or with a branch instruction. It
will panic then. Other architectures don't check for this.
Because we need to relax this check for the x86 arch to allow code to
transition from default_nop to the enabled state and other architectures
did not check for this at all this patch introduces checking on the
static_key primitives in a non-arch dependent manner.
All checked functions are considered slow-path so the additional check
does no harm to performance.
The warnings are best observed with earlyprintk.
Based on a patch from Andi Kleen.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:52 +0000 (21:48 +0200)]
ipv6: split inet6_ehashfn to hash functions per compilation unit
This patch splits the inet6_ehashfn into separate ones in
ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of
separate secret keys later.
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sat, 19 Oct 2013 19:48:51 +0000 (21:48 +0200)]
ipv4: split inet_ehashfn to hash functions per compilation unit
This duplicates a bit of code but lets us easily introduce
separate secret keys later. The separate compilation units are
ipv4/inet_hashtables.o, ipv4/udp.o and rds/connection.o.
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 19 Oct 2013 23:37:06 +0000 (19:37 -0400)]
Merge branch 'ipip_gso'
Eric Dumazet says:
====================
net: Implement GSO/TSO support for IPIP
This patch series implements GSO/TSO support for IPIP.
David, please note it applies after "ipv4: gso: send_check() & segment() cleanups"
( http://patchwork.ozlabs.org/patch/284714/ )
Broadcom bnx2x driver is now enabled for TSO support of IPIP traffic
Before patch :
lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  16384   16384    10.00    3357.88     5.09     3.70     2.983   2.167
After patch :
lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  16384   16384    10.00    8532.40     2.55     7.73     0.588   1.781
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 19 Oct 2013 18:42:58 +0000 (11:42 -0700)]
bnx2x: add TSO support for IPIP
bnx2x driver already handles TSO for GRE, current code
is the same for IPIP.
Performance results : (Note we are now limited by receiver,
as it does not support GRO for IPIP yet)
Before patch :
lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  16384   16384    10.00    7710.19     4.52     6.62     1.152   1.687
After patch :
lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  16384   16384    10.00    8532.40     2.55     7.73     0.588   1.781
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 19 Oct 2013 18:42:57 +0000 (11:42 -0700)]
ipip: add GSO/TSO support
Now that inet_gso_segment() is stackable, it's relatively easy to
implement GSO/TSO support for IPIP.
Performance results, when segmentation is done after tunnel
device (as no NIC is yet enabled for TSO IPIP support) :
Before patch :
lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  16384   16384    10.00    3357.88     5.09     3.70     2.983   2.167
After patch :
lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  16384   16384    10.00    7710.19     4.52     6.62     1.152   1.687
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 19 Oct 2013 18:42:56 +0000 (11:42 -0700)]
ipv4: gso: make inet_gso_segment() stackable
In order to support GSO on IPIP, we need to make
inet_gso_segment() stackable.
It should not assume network header starts right after mac
header.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 19 Oct 2013 18:42:55 +0000 (11:42 -0700)]
ipv4: generalize gre_handle_offloads
This patch makes gre_handle_offloads() more generic
and renames it to iptunnel_handle_offloads().
This will be used to add GSO/TSO support to IPIP tunnels.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 19 Oct 2013 18:42:54 +0000 (11:42 -0700)]
net: generalize skb_segment()
While implementing GSO/TSO support for IPIP, I found skb_segment()
was assuming network header was immediately following mac header.
It's not really true in the case where inet_gso_segment() is stacked:
By the time tcp_gso_segment() is called, network header points
to the inner IP header.
Let's instead assume nothing and pick the current offsets found in
original skb, we have skb_headers_offset_update() helper for that.
Also move the csum_start update inside skb_headers_offset_update()
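The idea of carrying over the current offsets, instead of recomputing them from assumed header sizes, can be shown with a toy model. This is a sketch only: struct toy_skb and the toy_* helpers are hypothetical and track plain integer offsets from the start of a buffer; they are not the kernel's sk_buff or skb_headers_offset_update().

```c
/* Toy packet descriptor: all header positions are offsets from the
 * start of the buffer, so they shift when the headroom changes. */
struct toy_skb {
	int headroom;
	int mac_header;
	int network_header;
	int transport_header;
	int csum_start;
};

/* Shift every header offset, including csum_start, by the same amount. */
static void toy_headers_offset_update(struct toy_skb *skb, int off)
{
	skb->mac_header += off;
	skb->network_header += off;
	skb->transport_header += off;
	skb->csum_start += off;
}

/* Derive a segment's offsets from the original packet. Nothing is
 * assumed about how far the network header sits from the mac header. */
static struct toy_skb toy_segment_headers(const struct toy_skb *orig,
					  int new_headroom)
{
	struct toy_skb seg = *orig;

	toy_headers_offset_update(&seg, new_headroom - orig->headroom);
	seg.headroom = new_headroom;
	return seg;
}
```

With an IPIP packet whose network header already points at the inner IP header, the distance between mac and network header is larger than a bare Ethernet header, and the segment keeps that distance unchanged.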
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 18 Oct 2013 21:43:55 +0000 (14:43 -0700)]
ipv6: gso: remove redundant locking
ipv6_gso_send_check() and ipv6_gso_segment() are called by
skb_mac_gso_segment() under rcu lock, no need to use
rcu_read_lock() / rcu_read_unlock()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ajit Khaparde [Fri, 18 Oct 2013 21:06:24 +0000 (16:06 -0500)]
be2net: Rework PCIe error report log messaging
Currently we log a message whenever pcie_enable_error_reporting fails.
The message clutters up logs, especially when we don't support it for VFs.
Instead enable this only for PFs and log a message when the call succeeds.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Fri, 18 Oct 2013 20:48:25 +0000 (13:48 -0700)]
net: misc: Remove extern from function prototypes
There is a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
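A minimal illustration of the point, using a hypothetical function toy_add(): both declarations below declare the same function with external linkage, so the extern keyword adds nothing.

```c
/* With extern: declares toy_add with external linkage. */
extern int toy_add(int a, int b);

/* Without extern: exactly the same declaration as far as the
 * compiler is concerned. */
int toy_add(int a, int b);

/* A single definition satisfies both declarations. */
int toy_add(int a, int b)
{
	return a + b;
}
```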
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Fri, 18 Oct 2013 20:48:24 +0000 (13:48 -0700)]
net: ipv4/ipv6: Remove extern from function prototypes
There is a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Fri, 18 Oct 2013 20:48:23 +0000 (13:48 -0700)]
net: dccp: Remove extern from function prototypes
There is a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Fri, 18 Oct 2013 20:48:22 +0000 (13:48 -0700)]
net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes
There is a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 18 Oct 2013 20:13:27 +0000 (13:13 -0700)]
ipv4: gso: send_check() & segment() cleanups
inet_gso_segment() and inet_gso_send_check() are called by
skb_mac_gso_segment() under rcu lock, no need to use
rcu_read_lock() / rcu_read_unlock()
Avoid calling ip_hdr() twice per function.
We can use ip_send_check() helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 19 Oct 2013 23:09:18 +0000 (19:09 -0400)]
bonding: Remove __exit tag from bond_netlink_fini().
It can be called from the module init function, so it cannot
be in the exit section.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 19 Oct 2013 22:59:25 +0000 (18:59 -0400)]
Merge branch 'bonding'
Jiri Pirko says:
====================
bonding: introduce bonding options Netlink support
This patchset basically allows "mode" and "active_slave" bonding options
to be propagated and set up via standard RT Netlink interface.
In future other options can be easily added as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 18 Oct 2013 15:43:39 +0000 (17:43 +0200)]
bonding: add Netlink support active_slave option
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 18 Oct 2013 15:43:38 +0000 (17:43 +0200)]
bonding: add Netlink support mode option
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 18 Oct 2013 15:43:37 +0000 (17:43 +0200)]
bonding: move active_slave getting into separate function
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 18 Oct 2013 15:43:36 +0000 (17:43 +0200)]
bonding: remove bond_ioctl_change_active()
no longer needed since bond_option_active_slave_set() can be used
instead.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 18 Oct 2013 15:43:35 +0000 (17:43 +0200)]
bonding: move active_slave setting into separate function
Do a bit of refactoring on the way.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 18 Oct 2013 15:43:34 +0000 (17:43 +0200)]
bonding: move mode setting into separate function
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 18 Oct 2013 15:43:33 +0000 (17:43 +0200)]
bonding: push Netlink bits into separate file
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Thu, 17 Oct 2013 00:29:34 +0000 (17:29 -0700)]
em_ipset: use dev_net() accessor
Randy found that if network namespaces are not enabled,
nd_net does not exist, which causes a compilation failure.
This is handled correctly by using the dev_net() macro.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Wed, 16 Oct 2013 16:36:51 +0000 (12:36 -0400)]
tcp: remove redundant code in __tcp_retransmit_skb()
Remove the specialized code in __tcp_retransmit_skb() that tries to trim
any ACKed payload preceding a FIN before we retransmit (this was added
in 1999 in v2.2.3pre3). This trimming code was made unreachable by the
more general code added above it that uses tcp_trim_head() to trim any
ACKed payload, with or without a FIN (this was added in "[NET]: Add
segmentation offload support to TCP." in 2002 circa v2.5.33).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Mon, 14 Oct 2013 14:05:09 +0000 (17:05 +0300)]
gianfar: Simplify MQ polling to avoid soft lockup
Under certain low traffic conditions, the single core
devices with multiple Rx/Tx queues (MQ mode) may reach
soft lockup due to gfar_poll not returning in proper time.
The following exception was obtained using iperf on a 100Mbit
half-duplex link, for a p1010 single core device:
BUG: soft lockup - CPU#0 stuck for 23s! [iperf:2847]
Modules linked in:
CPU: 0 PID: 2847 Comm: iperf Not tainted 3.12.0-rc3 #16
task: e8bf8000 ti: eeb16000 task.ti: ee646000
NIP: c0255b6c LR: c0367ae8 CTR: c0461c18
REGS: eeb17e70 TRAP: 0901 Not tainted (3.12.0-rc3)
MSR: 00029000 <CE,EE,ME> CR: 44228428 XER: 20000000
GPR00: c0367ad4 eeb17f20 e8bf8000 ee01f4b4 00000008 ffffffff ffffffff 00000000
GPR08: 000000c0 00000008 000000ff ffffffc0 000193fe
NIP [c0255b6c] find_next_bit+0xb8/0xc4
LR [c0367ae8] gfar_poll+0xc8/0x1d8
Call Trace:
[eeb17f20] [c0367ad4] gfar_poll+0xb4/0x1d8 (unreliable)
[eeb17f70] [c0422100] net_rx_action+0xa4/0x158
[eeb17fa0] [c003ec6c] __do_softirq+0xcc/0x17c
[eeb17ff0] [c000c28c] call_do_softirq+0x24/0x3c
[ee647cc0] [c0004660] do_softirq+0x6c/0x94
[ee647ce0] [c003eb9c] local_bh_enable+0x9c/0xa0
[ee647cf0] [c0454fe8] tcp_prequeue_process+0xa4/0xdc
[ee647d10] [c0457e44] tcp_recvmsg+0x498/0x96c
[ee647d80] [c047b630] inet_recvmsg+0x40/0x64
[ee647da0] [c040ca8c] sock_recvmsg+0x90/0xc0
[ee647e30] [c040edb8] SyS_recvfrom+0x98/0xfc
To prevent this, the outer while() loop has been removed
allowing gfar_poll() to return faster even if there's
still budget left. Also, there's no need to recompute
the budget per Rx queue anymore.
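A single-pass poll of this kind might look like the following user-space sketch. It is not the gianfar code: struct toy_queue and toy_poll() are hypothetical, and the even split of the budget across queues is an assumption made to keep the example small.

```c
struct toy_queue {
	int pending;	/* packets waiting in this Rx queue */
};

/* One pass over all queues, then return, even if budget is left over.
 * Returns the number of packets processed. */
static int toy_poll(struct toy_queue *q, int num_q, int budget)
{
	int per_q = budget / num_q;	/* computed once, not per queue */
	int work_done = 0;

	for (int i = 0; i < num_q; i++) {
		int n = q[i].pending < per_q ? q[i].pending : per_q;

		q[i].pending -= n;
		work_done += n;
	}
	return work_done;
}
```

Because there is no outer loop that keeps retrying while budget remains, the function's run time is bounded by one sweep over the queues; leftover packets are picked up on the next poll.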
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 17 Oct 2013 20:34:11 +0000 (13:34 -0700)]
fib: Use const struct nl_info * in rtmsg_fib
The rtmsg_fib function doesn't modify this argument so mark
it const.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 18 Oct 2013 09:06:56 +0000 (12:06 +0300)]
ax25: cleanup a range test
The current test works fine in practice. The "amount" variable is
actually used as a boolean so negative values or any non-zero values
count as "true". However since we don't allow numbers greater than one,
let's not allow negative numbers either.
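The tightened test amounts to a two-sided range check. A minimal sketch, assuming a hypothetical setter toy_set_flag() rather than the actual ax25 code:

```c
#include <errno.h>

/* Accept only 0 or 1; reject anything above one and, now, anything
 * below zero as well. */
static int toy_set_flag(int amount, int *flag)
{
	if (amount < 0 || amount > 1)
		return -EINVAL;
	*flag = amount;
	return 0;
}
```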
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
baker.zhang [Sun, 13 Oct 2013 11:50:09 +0000 (19:50 +0800)]
fib_trie: remove duplicated rcu lock
fib_table_lookup has included the rcu lock protection.
Signed-off-by: baker.zhang <baker.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Oct 2013 17:51:35 +0000 (13:51 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
1) Don't use a wildcard SA if a more precise one is in acquire state,
from Fan Du.
2) Simplify the SA lookup when using wildcard source. We need to check
only the destination in this case, from Fan Du.
3) Add a receive path hook for IPsec virtual tunnel interfaces
to xfrm6_mode_tunnel.
4) Add support for IPsec virtual tunnel interfaces to ipv6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Oct 2013 17:42:30 +0000 (13:42 -0400)]
Merge branch 'qlcnic'
Himanshu Madhani says:
====================
qlcnic: ethtool enhancements and code cleanup.
This patch series contains
o updates to ethtool for pause settings and enhance
register dump to display mask and ring indices.
o cleanup in DCB code and remove redundant eSwitch enablement command.
o fixed firmware dump collection logic to skip unknown entries.
Changes from v3 -> v4
o Dropped patch for Tx queue validation to be submitted in net.
Changes from v2 -> v3
o Updated patch to print informational messages as per Joe Perches's comment.
Changes from v1 -> v2
o Dropped patch to register device if adapter is in FAILED state for more rework.
o Updated patch to display ring indices via ethtool per Ben Hutchings's comment.
o Update patch for DCB cleanup per Stephen Hemminger's comment.
Please apply to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Himanshu Madhani [Fri, 18 Oct 2013 16:22:35 +0000 (12:22 -0400)]
qlcnic: update version to 5.3.51
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Fri, 18 Oct 2013 16:22:34 +0000 (12:22 -0400)]
qlcnic: Skip unknown entry type while collecting firmware dump
o Driver aborts the minidump collection operation when it finds
an unknown entry opcode. This patch skips unknown entry type
and resumes the minidump collection operation.
o Removed a comparison of collected dump size with expected dump size.
Size may differ when driver decides to skip an entry.
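The skip-and-continue behaviour can be sketched as follows. The opcodes, struct toy_entry and toy_collect_dump() are hypothetical and do not reflect the qlcnic minidump format; the point is only that an unknown entry no longer aborts the loop, which is also why the collected size may fall short of the expected size.

```c
#include <stddef.h>

enum toy_opcode {
	TOY_OP_READ_REG = 1,
	TOY_OP_READ_MEM = 2,
};

struct toy_entry {
	int opcode;
	size_t size;	/* bytes this entry would contribute */
};

/* Returns the number of bytes collected. Unknown opcodes are counted
 * in *skipped and the loop moves on to the next entry. */
static size_t toy_collect_dump(const struct toy_entry *e, size_t n,
			       size_t *skipped)
{
	size_t collected = 0;

	*skipped = 0;
	for (size_t i = 0; i < n; i++) {
		switch (e[i].opcode) {
		case TOY_OP_READ_REG:
		case TOY_OP_READ_MEM:
			collected += e[i].size;
			break;
		default:
			(*skipped)++;	/* skip, do not abort */
			break;
		}
	}
	return collected;
}
```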
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Fri, 18 Oct 2013 16:22:33 +0000 (12:22 -0400)]
qlcnic: dcb code cleanup and refactoring.
o Move dcb specific function definitions to dcb files.
o Move dcb specific variables to qlcnic_dcb structure.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sony Chacko [Fri, 18 Oct 2013 16:22:32 +0000 (12:22 -0400)]
qlcnic: Remove redundant eSwitch enable commands
When more than one NIC physical function is enabled on a port,
the eSwitch on that port gets enabled automatically. The driver
need not explicitly enable the eSwitch.
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Fri, 18 Oct 2013 16:22:31 +0000 (12:22 -0400)]
qlcnic: Update ethtool standard pause settings.
Update ethtool standard pause parameter settings and display
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pratik Pujar [Fri, 18 Oct 2013 16:22:30 +0000 (12:22 -0400)]
qlcnic: Firmware dump collection when auto recovery is disabled.
o Allow collecting the firmware dump of halted firmware when auto
recovery is disabled.
Signed-off-by: Pratik Pujar <pratik.pujar@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pratik Pujar [Fri, 18 Oct 2013 16:22:29 +0000 (12:22 -0400)]
qlcnic: Enhance ethtool to display ring indices and interrupt mask
o Updated ethtool -d <ethX> option to display ring indices for Transmit(Tx),
Receive(Rx), and Status(St) rings.
o Updated ethtool -d <ethX> option to display ring interrupt mask for Transmit(Tx),
and Status(St) rings.
Signed-off-by: Pratik Pujar <pratik.pujar@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Fri, 18 Oct 2013 16:22:28 +0000 (12:22 -0400)]
qlcnic: Print informational messages only once during driver load.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 18 Oct 2013 17:36:17 +0000 (10:36 -0700)]
tcp: rename tcp_tso_segment()
Rename tcp_tso_segment() to tcp_gso_segment(), to better reflect
what is going on, and ease grep games.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Oct 2013 17:22:19 +0000 (13:22 -0400)]
Merge branch 'tipc'
Jon Maloy says:
====================
Some small and relatively straightforward patches. With exception of
the two first ones they are all unrelated and address minor issues.
v2: update of v1 (http://patchwork.ozlabs.org/patch/277404/)
-added commit to use memcpy_fromiovec on user data as per v1 feedback
-updated sparse fix commit to drop chunks covered by above commit
-added new commit that greatly simplifies the link lookup routine
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Fri, 18 Oct 2013 05:23:21 +0000 (07:23 +0200)]
tipc: simplify the link lookup routine
When checking statistics or changing parameters on a link, the
link_find_link function is used to locate the link with a given
name. The complex method of deconstructing the name into local
and remote address/interface is error prone and may fail if the
interface names contain special characters. We change the lookup
method to iterate over the list of nodes and compare the link
names.
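A lookup by plain name comparison could be sketched like this. The data structures are hypothetical singly linked lists, not the TIPC node and link structures; the sketch only shows that comparing whole names avoids parsing them.

```c
#include <stddef.h>
#include <string.h>

struct toy_link {
	const char *name;
	struct toy_link *next;
};

struct toy_node {
	struct toy_link *links;
	struct toy_node *next;
};

/* Walk every node and every link, comparing the full link name.
 * No parsing of the name, so special characters are harmless. */
static struct toy_link *toy_find_link(struct toy_node *nodes,
				      const char *name)
{
	for (struct toy_node *n = nodes; n; n = n->next)
		for (struct toy_link *l = n->links; l; l = l->next)
			if (strcmp(l->name, name) == 0)
				return l;
	return NULL;
}
```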
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 18 Oct 2013 05:23:20 +0000 (07:23 +0200)]
tipc: correct return value of link_cmd_set_value routine
link_cmd_set_value() takes commands for link, bearer and media related
configuration. Generally the function returns 0 when a command is
recognized, and -EINVAL when it is not. However, in the switch for link
related commands it returns 0 even when the command is unrecognized. This
will sometimes make it look as if a failed configuration command has been
successful, but has otherwise no negative effects.
We remove this anomaly by returning -EINVAL even for link commands. We also
rework all three switches to make them conforming to common kernel coding
style.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
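
A minimal sketch of the return-value convention described above, assuming
made-up command codes and a made-up property struct (enum link_cmd,
struct link_props and set_link_value() are hypothetical, not the TIPC code):

```c
#include <errno.h>

/* Hypothetical command codes; the real TIPC values are not shown here. */
enum link_cmd { CMD_SET_TOL = 1, CMD_SET_PRI = 2, CMD_SET_WINDOW = 3 };

struct link_props {
	unsigned int tolerance;
	unsigned int priority;
	unsigned int window;
};

/* Return 0 for a recognized command and -EINVAL for anything else. */
static int set_link_value(struct link_props *p, int cmd, unsigned int value)
{
	switch (cmd) {
	case CMD_SET_TOL:
		p->tolerance = value;
		break;
	case CMD_SET_PRI:
		p->priority = value;
		break;
	case CMD_SET_WINDOW:
		p->window = value;
		break;
	default:
		return -EINVAL;   /* unrecognized: report failure, not success */
	}
	return 0;
}
```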
Ying Xue [Fri, 18 Oct 2013 05:23:19 +0000 (07:23 +0200)]
tipc: correct return value of recv_msg routine
Currently, recv_msg() always returns zero on a packet delivery upcall
from net_device.
To make its behavior more compliant with the way this API should be
used, we change this to let it return NET_RX_SUCCESS (which is zero
anyway) when it is able to handle the packet, and NET_RX_DROP otherwise.
The latter does not imply any functional change, it only enables the
driver to keep more accurate statistics about the fate of delivered
packets.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
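
The convention can be sketched in userspace C as below. NET_RX_SUCCESS and
NET_RX_DROP carry the values the kernel gives them in
include/linux/netdevice.h; struct packet and recv_packet() are hypothetical
stand-ins for sk_buff and the real handler.

```c
#include <stddef.h>

/* Same values as in the kernel's include/linux/netdevice.h. */
#define NET_RX_SUCCESS 0
#define NET_RX_DROP    1

/* Hypothetical stand-in for struct sk_buff. */
struct packet {
	const void *data;
	size_t len;
};

/*
 * Hypothetical receive handler: report a drop when the packet cannot be
 * handled, so the caller can account for it, and success otherwise.
 */
static int recv_packet(const struct packet *pkt, int bearer_enabled)
{
	if (pkt == NULL || pkt->len == 0 || !bearer_enabled)
		return NET_RX_DROP;

	/* ... hand the packet to the protocol core here ... */
	return NET_RX_SUCCESS;
}
```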
Ying Xue [Fri, 18 Oct 2013 05:23:18 +0000 (07:23 +0200)]
tipc: avoid unnecessary lookup for tipc bearer instance
tipc_block_bearer() currently takes a bearer name (const char*)
as argument. This requires the function to make a lookup to find
the pointer to the corresponding bearer struct. In the current
code base this is not necessary, since the only two callers
(tipc_continue(),recv_notification()) already have validated
copies of this pointer, and hence can pass it directly in the
function call.
We change tipc_block_bearer() to directly take struct tipc_bearer*
as argument instead.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 18 Oct 2013 05:23:17 +0000 (07:23 +0200)]
tipc: make bearer and media naming consistent
TIPC 'bearer' exists as an abstract concept, while 'media'
is deemed a specific implementation of a bearer, such as Ethernet
or Infiniband media. When a component inside TIPC wants to control
a specific media, it only needs to access the generic bearer API
to achieve this. However, in the current media implementations,
the 'bearer' name is also extensively used in media specific
function and variable names.
This may create confusion, so we choose to replace the term 'bearer'
with 'media' in all function names, variable names, and prefixes
where this is what really is meant.
Note that this change is cosmetic only, and no runtime behaviour
changes are made here.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 18 Oct 2013 05:23:16 +0000 (07:23 +0200)]
tipc: silence sparse warnings
Eliminate below sparse warnings:
net/tipc/link.c:1210:37: warning: cast removes address space of expression
net/tipc/link.c:1218:59: warning: incorrect type in argument 2 (different address spaces)
net/tipc/link.c:1218:59: expected void const [noderef] <asn:1>*from
net/tipc/link.c:1218:59: got unsigned char const [usertype] *[assigned] sect_crs
net/tipc/socket.c:341:49: warning: Using plain integer as NULL pointer
net/tipc/socket.c:1371:36: warning: Using plain integer as NULL pointer
net/tipc/socket.c:1694:57: warning: Using plain integer as NULL pointer
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 18 Oct 2013 05:23:15 +0000 (07:23 +0200)]
tipc: remove iovec length parameter from all sending functions
tipc_msg_build() now copies message data from iovec to sk_buff
using memcpy_fromiovecend(), which doesn't need to be passed the
iovec length to perform the copying.
So we remove the parameter indicating iovec length in all
functions where TIPC messages are built and sent.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 18 Oct 2013 05:23:14 +0000 (07:23 +0200)]
tipc: don't use memcpy to copy from user space
tipc_msg_build() calls skb_copy_to_linear_data_offset() to copy data
from user space to kernel space. However, the latter function does
in its turn call memcpy() to perform the actual copying. This poses
an obvious security and robustness risk, since memcpy() never makes
any validity check on the pointer it is copying from.
To correct this, we replace the offending function call with
a call to memcpy_fromiovecend(), which uses copy_from_user() to
perform the copying.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
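
The difference between an unchecked memcpy() and a validated copy can be
modelled in userspace, under heavy simplification. This is only an analogy:
copy_from_user() validates the address range and handles faults inside the
kernel, which plain C cannot reproduce. struct user_region and checked_copy()
are hypothetical names invented for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the memory a caller may read from. */
struct user_region {
	const unsigned char *start;
	size_t len;
};

/*
 * Copy len bytes starting at offset off, but only if the whole range
 * lies inside the permitted region. A bare memcpy() performs no such
 * check and would read whatever the pointer happens to reference.
 */
static int checked_copy(void *dst, const struct user_region *ur,
			size_t off, size_t len)
{
	if (off > ur->len || len > ur->len - off)
		return -EFAULT;

	memcpy(dst, ur->start + off, len);
	return 0;
}
```

The bounds test is written as two comparisons so that off + len is never
computed and cannot overflow.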
Ben Hutchings [Mon, 14 Oct 2013 20:49:21 +0000 (21:49 +0100)]
net: Delete trailing semi-colon from definition of netdev_WARN()
Macro definitions should not normally end with a semi-colon, as this
makes it dangerous to use them in an if...else statement. Happily this
has not happened yet.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
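
To make the hazard concrete, here is a small sketch with a hypothetical
logging macro, my_warn(). It is not the netdev_WARN() definition; it only
shows why a macro body should not carry its own trailing semicolon.

```c
#include <stdio.h>

static int warn_count;   /* counts calls, so the effect is observable */

/*
 * Hypothetical logging macro, defined WITHOUT a trailing semicolon.
 * The do { } while (0) wrapper makes it behave as a single statement.
 */
#define my_warn(msg) \
	do { warn_count++; fprintf(stderr, "WARN: %s\n", msg); } while (0)

/*
 * Had the definition ended in "while (0);", the call below would expand
 * to "... while (0);;". The extra ';' is an empty statement that ends
 * the if, leaving the else without a matching if: a compile error.
 */
static int check(int bad)
{
	if (bad)
		my_warn("bad value");
	else
		return 0;
	return 1;
}
```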
Eric Dumazet [Thu, 17 Oct 2013 23:27:07 +0000 (16:27 -0700)]
net: refactor sk_page_frag_refill()
While working on virtio_net new allocation strategy to increase
payload/truesize ratio, we found that refactoring sk_page_frag_refill()
was needed.
This patch splits sk_page_frag_refill() into two parts, adding
skb_page_frag_refill() which can be used without a socket.
While we are at it, add a minimum frag size of 32 for
sk_page_frag_refill()
Michael will either use netdev_alloc_frag() from softirq context,
or skb_page_frag_refill() from process context in refill_work()
(GFP_KERNEL allocations)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
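
The shape of such a split can be sketched in userspace C, assuming malloc()ed
buffers in place of pages. All names here (struct frag, struct sock_like,
frag_refill(), sock_frag_refill(), FRAG_BUF_SIZE) are hypothetical; only the
minimum of 32 bytes is taken from the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>

#define FRAG_BUF_SIZE 4096u   /* hypothetical buffer size */
#define MIN_FRAG_SIZE 32u     /* minimum named in the commit message */

struct frag {
	char *buf;
	unsigned int offset;
	unsigned int size;
};

struct sock_like {
	struct frag frag;
	bool memory_pressure;
};

/*
 * Socket-independent part: make sure at least sz bytes are available.
 * This sketch assumes nobody else still uses an exhausted buffer; the
 * kernel would drop a page reference instead of freeing outright.
 */
static bool frag_refill(unsigned int sz, struct frag *f)
{
	if (f->buf != NULL) {
		if (f->offset + sz <= f->size)
			return true;      /* enough room left */
		free(f->buf);             /* exhausted: start over */
		f->buf = NULL;
	}
	if (sz > FRAG_BUF_SIZE)
		return false;
	f->buf = malloc(FRAG_BUF_SIZE);
	if (f->buf == NULL)
		return false;
	f->offset = 0;
	f->size = FRAG_BUF_SIZE;
	return true;
}

/* Socket-level wrapper: adds the minimum size and socket bookkeeping. */
static bool sock_frag_refill(struct sock_like *sk)
{
	if (frag_refill(MIN_FRAG_SIZE, &sk->frag))
		return true;
	sk->memory_pressure = true;
	return false;
}
```

Callers without a socket can use the inner helper directly, which is the
point of the refactoring.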
David S. Miller [Fri, 18 Oct 2013 04:04:05 +0000 (00:04 -0400)]
Merge branch 'pci_set_drvdata'
Jingoo Han says:
====================
net: ethernet: remove unnecessary pci_set_drvdata() part 1
Since commit
0998d0631001288a5974afc0b2a5f568bcdecb4d
(device-core: Ensure drvdata = NULL when no driver is bound),
the driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
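
Why the manual clearing becomes redundant can be shown with a toy userspace
model of a driver core. Nothing below is the kernel API: struct model_device,
struct model_driver, core_bind() and core_unbind() are hypothetical and only
mimic the behaviour the cover letter describes.

```c
#include <stddef.h>

struct model_device {
	void *drvdata;
};

struct model_driver {
	int  (*probe)(struct model_device *dev);
	void (*remove)(struct model_device *dev);
};

/* Model of the bind path: the core clears drvdata when probe fails. */
static int core_bind(struct model_device *dev, const struct model_driver *drv)
{
	int err = drv->probe(dev);

	if (err)
		dev->drvdata = NULL;
	return err;
}

/* Model of the unbind path: the core clears drvdata after remove. */
static void core_unbind(struct model_device *dev,
			const struct model_driver *drv)
{
	drv->remove(dev);
	dev->drvdata = NULL;
}

/* Example drivers that never clear drvdata themselves. */
static int private_state;

static int good_probe(struct model_device *dev)
{
	dev->drvdata = &private_state;
	return 0;
}

static int bad_probe(struct model_device *dev)
{
	dev->drvdata = &private_state;   /* set, then fail without cleanup */
	return -1;
}

static void quiet_remove(struct model_device *dev)
{
	(void)dev;                       /* no manual drvdata reset needed */
}
```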
Jingoo Han [Fri, 18 Oct 2013 00:25:29 +0000 (09:25 +0900)]
net: enic: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:24:54 +0000 (09:24 +0900)]
net: cxgb4vf: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:24:07 +0000 (09:24 +0900)]
net: cxgb2: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:23:27 +0000 (09:23 +0900)]
net: cxgb3: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:23:00 +0000 (09:23 +0900)]
net: cxgb4: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:22:22 +0000 (09:22 +0900)]
net: bna: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:21:54 +0000 (09:21 +0900)]
net: tg3: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:21:26 +0000 (09:21 +0900)]
net: bnx2x: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:21:10 +0000 (09:21 +0900)]
net: bnx2: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:20:24 +0000 (09:20 +0900)]
net: alx: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:19:54 +0000 (09:19 +0900)]
net: amd8111e: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:19:23 +0000 (09:19 +0900)]
net: pcnet32: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Don Fry <pcnet32@frontier.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:18:53 +0000 (09:18 +0900)]
net: starfire: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:18:18 +0000 (09:18 +0900)]
net: 8390: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Fri, 18 Oct 2013 00:17:58 +0000 (09:17 +0900)]
net: typhoon: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Oct 2013 20:14:29 +0000 (16:14 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
This is a batch of updates intended for the 3.13 stream...
The biggest item of interest in here is wcn36xx, the new mac80211
driver for Qualcomm WCN3660/WCN3680 hardware.
Regarding the mac80211 bits, Johannes says:
"We have an assortment of cleanups and new features, of which the
biggest one is probably the channel-switch support in IBSS. Nothing
else really stands out much."
On top of that, the ath9k and rt2x00 get a lot of update action from
Felix Fietkau and Gabor Juhos, respectively. There are a handful of
updates to other drivers here and there as well.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Oct 2013 09:49:04 +0000 (02:49 -0700)]
ipv4: shrink rt_cache_stat
Half of the rt_cache_stat fields are no longer used after IP
route cache removal, let's shrink this per cpu area.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Mon, 14 Oct 2013 19:36:32 +0000 (12:36 -0700)]
netdev: inet_timewait_sock.h missing semi-colon when KMEMCHECK is enabled
Fix (a few hundred) build errors due to missing semi-colon when
KMEMCHECK is enabled:
include/net/inet_timewait_sock.h:139:2: error: expected ',', ';' or '}' before 'int'
include/net/inet_timewait_sock.h:148:28: error: 'const struct inet_timewait_sock' has no member named 'tw_death_node'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Oct 2013 19:36:04 +0000 (15:36 -0400)]
Merge branch 'xen_netback'
xen-netback: IPv6 offload support
====================
This patch series adds support for checksum and large packet offloads
into xen-netback. Testing has mainly been done using the Microsoft
network hardware certification suite running in Server 2008R2 VMs with
Citrix PV frontends.
v2:
- Fixed Wei's email address in Cc lines
v3:
- Responded to Wei's comments:
- netif.h now updated with comments and a definition of
XEN_NETIF_GSO_TYPE_NONE.
- limited number of pullups
- Responded to Annie's comments:
- New GSO_BIT macro
v4:
- Responded to more of Wei's comments
- Remove parsing of IPv6 fragment header and added warning
v5:
- Added comment concerning the value chosen for PKT_PROT_LEN
- Dropped deprecation of feature-no-csum-offload
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Wed, 16 Oct 2013 16:50:32 +0000 (17:50 +0100)]
xen-netback: enable IPv6 TCP GSO to the guest
This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
extra or prefix segments to pass the large packet to the frontend. New
xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
to determine if the frontend is capable of handling such packets.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Wed, 16 Oct 2013 16:50:31 +0000 (17:50 +0100)]
xen-netback: handle IPv6 TCP GSO packets from the guest
This patch adds a xenstore feature flag, feature-gso-tcpv6, to advertise
that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
if the frontend passes an extra segment with the new type
XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Wed, 16 Oct 2013 16:50:30 +0000 (17:50 +0100)]
xen-netback: Unconditionally set NETIF_F_RXCSUM
There is no mechanism to insist that a guest always generates a packet
with good checksum (at least for IPv4) so we must handle checksum
offloading from the guest and hence should set NETIF_F_RXCSUM.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Wed, 16 Oct 2013 16:50:29 +0000 (17:50 +0100)]
xen-netback: add support for IPv6 checksum offload from guest
For performance of VM to VM traffic on a single host it is better to avoid
calculation of TCP/UDP checksum in the sending frontend. To allow this, this
patch adds the code necessary to set up partial checksum for IPv6 packets
and xenstore flag feature-ipv6-csum-offload to advertise that fact to
frontends.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
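
As background, a partial checksum generally means that the checksum field
holds only the sum over the pseudo-header, and a later stage finishes it over
the transport header and payload. The sketch below follows the IPv6
pseudo-header layout from RFC 2460, section 8.1. The function names are
hypothetical and this is not the xen-netback code.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running ones'-complement sum, as big-endian words. */
static uint32_t sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)
		sum += (uint32_t)p[i] << 8;   /* odd trailing byte, zero padded */
	return sum;
}

/* Fold the carries back in to obtain a 16-bit ones'-complement sum. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum over the IPv6 pseudo-header: addresses, length, next header. */
static uint16_t ipv6_pseudo_sum(const uint8_t src[16], const uint8_t dst[16],
				uint32_t upper_len, uint8_t next_hdr)
{
	uint32_t sum = 0;

	sum = sum_bytes(sum, src, 16);
	sum = sum_bytes(sum, dst, 16);
	sum += upper_len >> 16;
	sum += upper_len & 0xffff;
	sum += next_hdr;
	return fold(sum);
}

/*
 * Completion step: seg already carries the pseudo-header sum in its
 * checksum field at the (even) byte offset csum_off. Sum the segment
 * and store the complement in that field.
 */
static void complete_csum(uint8_t *seg, size_t len, size_t csum_off)
{
	uint16_t c = (uint16_t)~fold(sum_bytes(0, seg, len));

	seg[csum_off] = (uint8_t)(c >> 8);
	seg[csum_off + 1] = (uint8_t)(c & 0xff);
}
```

A receiver can verify the result by summing the pseudo-header and the whole
segment: a correct checksum folds to 0xffff.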
Paul Durrant [Wed, 16 Oct 2013 16:50:28 +0000 (17:50 +0100)]
xen-netback: add support for IPv6 checksum offload to guest
Check xenstore flag feature-ipv6-csum-offload to determine if a
guest is happy to accept IPv6 packets with only partial checksum.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Oct 2013 19:32:15 +0000 (15:32 -0400)]
Merge branch 'bonding_rcu'
bonding: patchset for rcu use in bonding
====================
This patch set converts the xmit path of 3ad and alb modes to use the RCU lock,
and adds the rtnl lock and removes the read lock for bond sysfs.
v2: because bond_for_each_slave_rcu without rcu_read_lock() triggers a warning,
add a new function for the alb xmit path to avoid the warning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Tue, 15 Oct 2013 08:28:42 +0000 (16:28 +0800)]
bonding: add rtnl lock and remove read lock for bond sysfs
bond_for_each_slave() will not be protected by read_lock(),
only by rtnl_lock(), so we need to replace read_lock()
with rtnl_lock().
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Tue, 15 Oct 2013 08:28:39 +0000 (16:28 +0800)]
bonding: use RCU protection for alb xmit path
The commit
278b20837511776dc9d5f6ee1c7fabd5479838bb
(bonding: initial RCU conversion) converted the roundrobin,
active-backup, broadcast and xor xmit paths to RCU protection,
which improves performance for these modes, so this time,
convert the xmit path for alb mode.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Tue, 15 Oct 2013 08:28:35 +0000 (16:28 +0800)]
bonding: use RCU protection for 3ad xmit path
The commit
278b20837511776dc9d5f6ee1c7fabd5479838bb
(bonding: initial RCU conversion) converted the roundrobin,
active-backup, broadcast and xor xmit paths to RCU protection,
which improves performance for these modes, so this time,
convert the xmit path for 3ad mode.
Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Oct 2013 19:22:05 +0000 (15:22 -0400)]
Merge branch 'net-next' of git://git./linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:
====================
netfilter updates: nf_tables pull request
The following patchset contains the current original nf_tables tree
condensed in 17 patches. I have organized them by chronological order
since the original nf_tables code was released in 2009 and by
dependencies between the different patches.
The patches are:
1) Adapt all existing hooks in the tree to pass hook ops to the
hook callback function, required by nf_tables, from Patrick McHardy.
2) Move alloc_null_binding to nf_nat_core, as it is now also needed by
nf_tables and ip_tables, original patch from Patrick McHardy but
required major changes to adapt it to the current tree that I made.
3) Add nf_tables core, including the netlink API, the packet filtering
engine, expressions and built-in tables, from Patrick McHardy. This
patch includes accumulated fixes since 2009 and minor enhancements.
The patch description contains a list of references to the original
patches for the record. For those that are not familiar with the
original work, see [1], [2] and [3].
4) Add netlink set API, this replaces the original set infrastructure
to introduce a netlink API to add/delete sets and to add/delete
set elements. This includes two set types: the hash and the rb-tree
sets (used for interval based matching). The main difference with
ipset is that this infrastructure is data type agnostic. Patch from
Patrick McHardy.
5) Allow expression operation overload, this API change allows us to
define expression subtypes depending on the configuration
that is received from user-space via Netlink. It is used by follow
up patches to provide optimized versions of the payload and cmp
expressions and the x_tables compatibility layer, from Patrick
McHardy.
6) Add optimized data comparison operation, it requires the previous
patch, from Patrick McHardy.
7) Add optimized payload implementation, it requires patch 5, from
Patrick McHardy.
8) Convert built-in tables to chain types. Each chain type has special
semantics (filter, route and nat) that are used by userspace to
configure the chain behaviour. The main change regarding iptables
is that tables become containers of chains, with no specific semantics.
However, you may still configure your tables and chains to retain
iptables like semantics, patch from me.
9) Add compatibility layer for x_tables. This patch adds support to
use all existing x_tables extensions from nf_tables, this is used
to provide a userspace utility that accepts iptables syntax but
uses the nf_tables kernel core internally. This patch includes
missing features in the nf_tables core such as the per-chain
stats, default chain policy and number of chain references, which
are required by the iptables compatibility userspace tool. Patch
from me.
10) Fix transport protocol matching, this fix is a side effect of the
x_tables compatibility layer, which now provides a pointer to the
transport header, from me.
11) Add support for dormant tables, this feature allows you to disable
all chains and rules that are contained in one table, from me.
12) Add IPv6 NAT support. At the time nf_tables was made, there was no
NAT IPv6 support yet, from Tomasz Bursztyka.
13) Complete net namespace support. This patch registers the protocol
family per net namespace, so tables (thus, other objects contained
in tables such as sets, chains and rules) are only visible from the
corresponding net namespace, from me.
14) Add the insert operation to the nf_tables netlink API, this requires
adding a new position attribute that allows us to locate where in the
ruleset a rule needs to be inserted, from Eric Leblond.
15) Add rule batching support, including atomic rule-set updates by
using rule-set generations. This patch includes a change to nfnetlink
to include two new control messages to indicate the beginning and
the end of a batch. The end message is interpreted as the commit
message, if it's missing, then the rule-set updates contained in the
batch are aborted, from me.
16) Add trace support to the nf_tables packet filtering core, from me.
17) Add ARP filtering support, original patch from Patrick McHardy, but
adapted to fit into the chain type infrastructure. This was recovered
to be used by nft userspace tool and our compatibility arptables
userspace tool.
There is still work to do to fully replace x_tables [4] [5] but that can
be done incrementally by extending our netlink API. Moreover, looking at
netfilter-devel and the amount of contributions to nf_tables we've been
getting, I think it would be good to have it mainstream to avoid accumulating
large patchsets and continuous rebases.
I tried to provide a reasonable patchset, we have more than 100 accumulated
patches in the original nf_tables tree, so I collapsed many of the small
fixes to the main patch we had since 2009 and provide a small batch for
review to netdev, while trying to retain part of the history.
For those who haven't tried nf_tables yet, there's a quick howto
available from Eric Leblond that describes how to get things working [6].
Comments/reviews welcome.
Thanks!
[1] http://lwn.net/Articles/324251/
[2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf
[3] http://lwn.net/Articles/564095/
[4] http://people.netfilter.org/pablo/map-pending-work.txt
[5] http://people.netfilter.org/pablo/nftables-todo.txt
[6] https://home.regit.org/netfilter-en/nftables-quick-howto/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
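
The batching idea from item 15 above (staged updates become visible together
on commit, and are discarded if the end message never arrives) can be modelled
with a generation counter. This is a deliberately simplified sketch with
hypothetical names and a fixed-size table; it does not reproduce how nf_tables
actually tracks generations.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RULES 8   /* hypothetical capacity */

struct rule {
	int id;
	unsigned int active_from;   /* first generation that sees the rule */
	bool used;
};

struct ruleset {
	struct rule rules[MAX_RULES];
	unsigned int gen;           /* generation seen by packet processing */
};

/* A rule is visible only once the current generation has reached it. */
static bool rule_active(const struct ruleset *rs, const struct rule *r)
{
	return r->used && r->active_from <= rs->gen;
}

/* Stage a rule: it becomes active in the NEXT generation only. */
static bool batch_add(struct ruleset *rs, int id)
{
	for (size_t i = 0; i < MAX_RULES; i++) {
		if (!rs->rules[i].used) {
			rs->rules[i] = (struct rule){ id, rs->gen + 1, true };
			return true;
		}
	}
	return false;
}

/* End-of-batch message seen: one increment activates all staged rules. */
static void batch_commit(struct ruleset *rs)
{
	rs->gen++;
}

/* End-of-batch message missing: discard everything staged so far. */
static void batch_abort(struct ruleset *rs)
{
	for (size_t i = 0; i < MAX_RULES; i++)
		if (rs->rules[i].used && rs->rules[i].active_from > rs->gen)
			rs->rules[i].used = false;
}

/* Count the rules that packet processing would currently see. */
static size_t active_rules(const struct ruleset *rs)
{
	size_t n = 0;

	for (size_t i = 0; i < MAX_RULES; i++)
		n += rule_active(rs, &rs->rules[i]) ? 1 : 0;
	return n;
}
```

Because visibility depends on a single counter, readers see either the whole
batch or none of it, never a partially applied update.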
Michael Opdenacker [Sun, 13 Oct 2013 06:26:31 +0000 (08:26 +0200)]
irda: update comment mentioning IRQF_DISABLED
This patch removes a comment mentioning IRQF_DISABLED,
which is deprecated.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Opdenacker [Sun, 13 Oct 2013 05:24:29 +0000 (07:24 +0200)]
isdn: remove deprecated IRQF_DISABLED
This patch proposes to remove the use of the IRQF_DISABLED flag.
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Oct 2013 19:11:31 +0000 (15:11 -0400)]
Merge branch 'mlx4'
Amir Vadai says:
====================
net/mlx4: Mellanox driver update 15-10-2013
This patchset contains small code cleaning patches, and a patch to make
mlx4_core use request_module() in order to load the relevant link layer module
(mlx4_en or mlx4_ib) according to the port type.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eyal Perry [Tue, 15 Oct 2013 14:55:24 +0000 (16:55 +0200)]
net/mlx4_core: Load higher level modules according to ports type
The Mellanox ConnectX architecture is as follows: mlx4_core is the lower level
PCI driver which registers on the PCI ID, and the protocol specific drivers
depend on it: mlx4_en for Ethernet and mlx4_ib for Infiniband.
A NIC can have multiple ports which can change their type dynamically.
We use the request_module() call to load the relevant protocol driver
when needed: at load time or on a port type change event.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
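
The selection logic can be sketched in userspace as a mapping from port type
to module name. Only the module names mlx4_en and mlx4_ib come from the commit
message; enum port_type, module_for_port() and modules_for_ports() are
hypothetical. In the kernel, each resulting name would be passed to
request_module().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical port-type codes for this sketch. */
enum port_type { PORT_TYPE_NONE, PORT_TYPE_IB, PORT_TYPE_ETH };

/* Map a port type to the protocol driver named in the commit message. */
static const char *module_for_port(enum port_type type)
{
	switch (type) {
	case PORT_TYPE_ETH:
		return "mlx4_en";
	case PORT_TYPE_IB:
		return "mlx4_ib";
	default:
		return NULL;
	}
}

/*
 * Fill out[] with the distinct modules required by the given ports and
 * return how many there are (at most two, one per protocol).
 */
static size_t modules_for_ports(const enum port_type *ports, size_t nports,
				const char *out[2])
{
	size_t count = 0;

	for (size_t i = 0; i < nports; i++) {
		const char *name = module_for_port(ports[i]);
		int seen = 0;

		if (name == NULL)
			continue;
		for (size_t j = 0; j < count; j++)
			if (strcmp(out[j], name) == 0)
				seen = 1;
		if (!seen)
			out[count++] = name;
	}
	return count;
}
```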
Amir Vadai [Tue, 15 Oct 2013 14:55:23 +0000 (16:55 +0200)]
net/mlx4: Unused local variable in mlx4_opreq_action
Clean up warning added by commit
fe6f700d "net/mlx4_core: Respond to
operation request by firmware".
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Tue, 15 Oct 2013 14:55:22 +0000 (16:55 +0200)]
net/mlx4: Fix typo, move similar defs to same location
Small code cleanup:
1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN
2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
other MLX4_SET_PORT_yyy
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Tue, 15 Oct 2013 14:55:21 +0000 (16:55 +0200)]
net/mlx4: Clean the code to eliminate trivial build warnings
Remove code that triggers trivial build warnings.
drivers/net/ethernet/mellanox/mlx4/cmd.c: In function ‘mlx4_set_vf_vlan’:
drivers/net/ethernet/mellanox/mlx4/cmd.c:2256: warning: variable ‘vf_oper’ set but not used
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_mode’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:648: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_id’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:685: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_hw_rule_sz’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:712: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/fw.c: In function ‘mlx4_opreq_action’:
drivers/net/ethernet/mellanox/mlx4/fw.c:1732: warning: variable ‘type_m’ set but not used
drivers/net/ethernet/mellanox/mlx4/srq.c:302: warning: no previous prototype for ‘mlx4_srq_lookup’
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
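
The "comparison of unsigned expression < 0 is always false" warnings listed
above arise from a pattern like the one below. The names are hypothetical
(STEERING_MODE_MAX and mode_is_valid() are not from mcg.c); the sketch only
shows the corrected form of such a range check.

```c
#include <stdbool.h>

#define STEERING_MODE_MAX 3u   /* hypothetical number of valid modes */

/*
 * Range check on an unsigned value. Writing
 * "mode < 0 || mode >= STEERING_MODE_MAX" would draw the warning,
 * because an unsigned value can never be negative. Only the upper
 * bound carries information, so only it is tested.
 */
static bool mode_is_valid(unsigned int mode)
{
	return mode < STEERING_MODE_MAX;
}
```

A negative value converted to unsigned wraps to a large number, so the upper
bound alone still rejects it.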
Eric Dumazet [Fri, 11 Oct 2013 15:54:49 +0000 (08:54 -0700)]
inet_diag: use sock_gen_put()
TCP listener refactoring, part 6 :
Use sock_gen_put() from inet_diag_dump_one_icsk() for future
SYN_RECV support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Oct 2013 18:27:09 +0000 (14:27 -0400)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- ensure RecordRoute information is added to BAT_ICMP echo_request/reply only
- use VLAN_ETH_HLEN when possible
- use htons when possible
- substitute old fragmentation code with a new improved implementation by
Martin Hundebøll
- create common header for BAT_ICMP packets to improve extendibility
- consider the network coding overhead when computing the overall room needed by
batman headers
- add dummy soft-interface rx mode handler
- minor code refactoring and cleanups
Signed-off-by: David S. Miller <davem@davemloft.net>
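
On the "use htons when possible" item above: a common reason for this kind of
cleanup is that converting the constant, rather than the packet field, lets
the compiler do the byte swap at build time. The merge message does not spell
out the motivation, so treat that as an assumption. The sketch uses the POSIX
htons()/ntohs() from arpa/inet.h; the struct and function names are
hypothetical.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_8021Q_SKETCH 0x8100   /* 802.1Q VLAN ethertype */

/* Hypothetical Ethernet header; proto is kept in network byte order. */
struct eth_hdr_sketch {
	uint8_t dst[6];
	uint8_t src[6];
	uint16_t proto;
};

/* Converts the packet field on every call. */
static bool is_vlan_ntohs(const struct eth_hdr_sketch *h)
{
	return ntohs(h->proto) == ETH_P_8021Q_SKETCH;
}

/* Converts the constant instead; the field is compared as stored. */
static bool is_vlan_htons(const struct eth_hdr_sketch *h)
{
	return h->proto == htons(ETH_P_8021Q_SKETCH);
}
```

Both forms give the same answer; they differ only in which side of the
comparison is byte swapped.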
John W. Linville [Thu, 17 Oct 2013 18:02:07 +0000 (14:02 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem