Eric Dumazet [Thu, 30 Sep 2010 03:33:58 +0000 (03:33 +0000)]
ipv4: rcu conversion in ip_route_output_slow
ip_route_output_slow() is enclosed in an rcu_read_lock() protected
section, so that no references are taken/released on device, thanks to
__ip_dev_find() & dev_get_by_index_rcu()
Tested with ip route cache disabled, and a stress test :
Before patch:
elapsed time :
real 1m38.347s
user 0m11.909s
sys 23m51.501s
Profile:
13788.00 22.7% ip_route_output_slow [kernel]
7875.00 13.0% dst_destroy [kernel]
3925.00 6.5% fib_semantic_match [kernel]
3144.00 5.2% fib_rules_lookup [kernel]
3061.00 5.0% dst_alloc [kernel]
2276.00 3.7% rt_set_nexthop [kernel]
1762.00 2.9% fib_table_lookup [kernel]
1538.00 2.5% _raw_read_lock [kernel]
1358.00 2.2% ip_output [kernel]
After patch:
real 1m28.808s
user 0m13.245s
sys 20m37.293s
10950.00 17.2% ip_route_output_slow [kernel]
10726.00 16.9% dst_destroy [kernel]
5170.00 8.1% fib_semantic_match [kernel]
3937.00 6.2% dst_alloc [kernel]
3635.00 5.7% rt_set_nexthop [kernel]
2900.00 4.6% fib_rules_lookup [kernel]
2240.00 3.5% fib_table_lookup [kernel]
1427.00 2.2% _raw_read_lock [kernel]
1157.00 1.8% kmem_cache_alloc [kernel]
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Sep 2010 03:31:56 +0000 (03:31 +0000)]
ipv4: introduce __ip_dev_find()
ip_dev_find(net, addr) finds a device given an IPv4 source address and
takes a reference on it.
Introduce __ip_dev_find(), taking a third argument, to optionally take
the device reference. Callers not asking the reference to be taken
should be in an rcu_read_lock() protected section.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 29 Sep 2010 21:39:37 +0000 (21:39 +0000)]
e1000e: 82579 performance improvements
The initial support for 82579 was tuned poorly for performance. Adjust the
packet buffer allocation appropriately for both standard and jumbo frames;
and for jumbo frames increase the receive descriptor pre-fetch, disable
adaptive interrupt moderation and set the DMA latency tolerance.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Wed, 29 Sep 2010 21:38:49 +0000 (21:38 +0000)]
e1000e: use hardware writeback batching
Most e1000e parts support batching writebacks. The problem with this is
that when some of the TADV or TIDV timers are not set, Tx can sit forever.
This is solved in this patch with write flushes using the Flush Partial
Descriptors (FPD) bit in TIDV and RDTR.
This improves bus utilization and removes partial writes on e1000e,
particularly from 82571 parts in S5500 chipset based machines.
Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
testing load.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Wed, 29 Sep 2010 21:35:23 +0000 (21:35 +0000)]
ixgbe: fix link issues and panic with shared interrupts for 82598
Fix possible panic/hang with shared Legacy interrupts by not enabling
interrupts when interface is down.
Also fixes an intermittent link by enabling LSC upon exit from ixgbe_intr()
This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
when enabling interrupts.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 29 Sep 2010 11:53:50 +0000 (11:53 +0000)]
ipv4: __mkroute_output() speedup
While doing stress tests with a disabled IP route cache, I found
__mkroute_output() was touching three times in_device atomic refcount.
Use RCU to touch it once to reduce cache line ping pongs.
Before patch
time to perform the test
real 1m42.009s
user 0m12.545s
sys 25m0.726s
Profile :
16109.00 26.4% ip_route_output_slow vmlinux
7434.00 12.2% dst_destroy vmlinux
3280.00 5.4% fib_rules_lookup vmlinux
3252.00 5.3% fib_semantic_match vmlinux
2622.00 4.3% fib_table_lookup vmlinux
2535.00 4.1% dst_alloc vmlinux
1750.00 2.9% _raw_read_lock vmlinux
1532.00 2.5% rt_set_nexthop vmlinux
After patch
real 1m36.503s
user 0m12.977s
sys 23m25.608s
14234.00 22.4% ip_route_output_slow vmlinux
8717.00 13.7% dst_destroy vmlinux
4052.00 6.4% fib_rules_lookup vmlinux
3951.00 6.2% fib_semantic_match vmlinux
3191.00 5.0% dst_alloc vmlinux
1764.00 2.8% fib_table_lookup vmlinux
1692.00 2.7% _raw_read_lock vmlinux
1605.00 2.5% rt_set_nexthop vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Wed, 29 Sep 2010 22:33:50 +0000 (22:33 +0000)]
Phonet: restore flow control credits when sending fails
This patch restores the below flow control patch submitted by Rémi
Denis-Courmont, which accidentaly got lost due to Pipe controller patch
on Phonet.
commit
1a98214feef2221cd7c24b17cd688a5a9d85b2ea
Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Date: Mon Aug 30 12:57:03 2010 +0000
Phonet: restore flow control credits when sending fails
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philip Rakity [Tue, 28 Sep 2010 04:26:30 +0000 (04:26 +0000)]
net: pxa168_etc.c recognize additional contributors
Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Sachin Sanap <ssanap@marvell.com>
Signed-off-by: Mark Brown <markb@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Sep 2010 06:35:10 +0000 (23:35 -0700)]
ip_gre: comments change
HARD_TX_LOCK no longer protects tunnels from dead loops,
but xmit_recursion percpu counter.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Tue, 28 Sep 2010 08:46:17 +0000 (08:46 +0000)]
de2104x: remove experimental status
It should be ready after 8 years...remove the experimental dependency.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Tue, 28 Sep 2010 08:18:55 +0000 (08:18 +0000)]
de2104x: disable media debug messages by default
Print media debug messages only when HW debug is enabled.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Gallatin [Tue, 28 Sep 2010 08:13:12 +0000 (08:13 +0000)]
myri10ge: DCA update (resubmit)
This patch contains the following DCA improvements to myri10ge:
1) Finally move myri10ge to use dca3 API
2) Disable PCIe relaxed ordering when enabling DCA on
myri10ge. This provides a performance boost on Nehalem
based Xeons
3) Make sure to properly initialize NIC's DCA state when it is enabled,
rather than giving the NIC a bogus tag (0) and waiting for
the first received packet to trigger an update. Not using a
real tag can cause hardware exceptions on some motherboards
when a CPU socket is empty.
3) Always update the cached CPU when our interrupt affinity changes
so as to avoid excessive calls to dca3_get_tag()
Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: Loic Prylli <loic@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Wed, 29 Sep 2010 01:05:37 +0000 (01:05 +0000)]
bnx2x: Moved enabling of MSI to the bnx2x_set_num_queues()
Moved enabling of MSI to the bnx2x_set_num_queues() - the same functions that
handles the initialization of the MSI-X.
From: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 28 Sep 2010 19:30:14 +0000 (19:30 +0000)]
tcp: tcp_enter_quickack_mode can be static
Function only used in tcp_input.c
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 28 Sep 2010 17:08:02 +0000 (17:08 +0000)]
arp: remove unnecessary export of arp_broken_ops
arp_broken_ops is only used in arp.c
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 05:58:37 +0000 (05:58 +0000)]
net: rename netdev rx_queue to ingress_queue
There is some confusion with rx_queue name after RPS, and net drivers
private rx_queue fields.
I suggest to rename "struct net_device"->rx_queue to ingress_queue.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 03:23:34 +0000 (03:23 +0000)]
ip6tnl: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 00:17:17 +0000 (00:17 +0000)]
ipip: enable lockless xmits
IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one ipip tunnel (size:200 bytes per frame)
Before patch :
real 2m53.321s
user 0m10.277s
sys 46m0.597s
After patch:
real 0m32.063s
user 0m9.237s
sys 8m16.255s
Last problem to solve is the contention on dst :
16118.00 28.3% __ip_route_output_key vmlinux
6135.00 10.8% dst_release vmlinux
3220.00 5.6% ip_finish_output vmlinux
2149.00 3.8% ip_route_output_flow vmlinux
1575.00 2.8% ip_append_data vmlinux
1481.00 2.6% ip_push_pending_frames vmlinux
1349.00 2.4% __xfrm_lookup vmlinux
1216.00 2.1% csum_partial_copy_generic vmlinux
1208.00 2.1% udp_sendmsg vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 23:05:47 +0000 (23:05 +0000)]
ip_gre: lockless xmit
GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :
Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
Before patch :
real 3m0.094s
user 0m9.365s
sys 47m50.103s
After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s
Last problem to solve is the contention on dst :
38660.00 21.4% __ip_route_output_key vmlinux
20786.00 11.5% dst_release vmlinux
14191.00 7.8% __xfrm_lookup vmlinux
12410.00 6.9% ip_finish_output vmlinux
4540.00 2.5% ip_push_pending_frames vmlinux
4427.00 2.4% ip_append_data vmlinux
4265.00 2.4% __alloc_skb vmlinux
4140.00 2.3% __ip_local_out vmlinux
3991.00 2.2% dev_queue_xmit vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 02:53:41 +0000 (02:53 +0000)]
sit: enable lockless xmits
SIT tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one sit tunnel (size:220 bytes per frame)
Before patch :
real 3m15.399s
user 0m9.185s
sys 51m55.403s
75029.00 87.5% _raw_spin_lock vmlinux
1090.00 1.3% dst_release vmlinux
902.00 1.1% dev_queue_xmit vmlinux
627.00 0.7% sock_wfree vmlinux
613.00 0.7% ip6_push_pending_frames ipv6.ko
505.00 0.6% __ip_route_output_key vmlinux
After patch:
real 1m1.387s
user 0m12.489s
sys 15m58.868s
28239.00 23.3% dst_release vmlinux
13570.00 11.2% ip6_push_pending_frames ipv6.ko
13118.00 10.8% ip6_append_data ipv6.ko
7995.00 6.6% __ip_route_output_key vmlinux
7924.00 6.5% sk_dst_check vmlinux
5015.00 4.1% udpv6_sendmsg ipv6.ko
3594.00 3.0% sock_alloc_send_pskb vmlinux
3135.00 2.6% sock_wfree vmlinux
3055.00 2.5% ip6_sk_dst_lookup ipv6.ko
2473.00 2.0% ip_finish_output vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 02:17:58 +0000 (02:17 +0000)]
sit: fix percpu stats accounting
commit
15fc1f7056ebd (sit: percpu stats accounting) forgot the fallback
tunnel case (sit0), and can crash pretty fast.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 23:56:46 +0000 (23:56 +0000)]
ipip: fix percpu stats accounting
commit
3c97af99a5aa1 (ipip: percpu stats accounting) forgot the fallback
tunnel case (tunl0), and can crash pretty fast.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 20:50:33 +0000 (20:50 +0000)]
dummy: percpu stats and lockless xmit
Converts dummy network device driver to :
- percpu stats
- 64bit stats
- lockless xmit (NETIF_F_LLTX)
- performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 29 Sep 2010 20:23:09 +0000 (13:23 -0700)]
net: add a recursion limit in xmit path
As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.
Add a percpu variable, and limit to three the number of stacked xmits.
Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Mon, 27 Sep 2010 00:07:02 +0000 (00:07 +0000)]
ipv6: Implement Any-IP support for IPv6.
AnyIP is the capability to receive packets and establish incoming
connections on IPs we have not explicitly configured on the machine.
An example use case is to configure a machine to accept all incoming
traffic on eth0, and leave the policy of whether traffic for a given IP
should be delivered to the machine up to the load balancer.
Can be setup as follows:
ip -6 rule from all iif eth0 lookup 200
ip -6 route add local default dev lo table 200
(in this case for all IPv6 addresses)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 23 May 2010 19:54:12 +0000 (19:54 +0000)]
ipv4: Allow configuring subnets as local addresses
This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface. This is done through routing
table configuration. For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:
ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200
This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route). On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback). Presumably, external routing can be
configured to make sense out of this.
To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find). We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.
This patch is useful to implement IP-anycast for subnets of virtual
addresses.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Tue, 28 Sep 2010 17:34:27 +0000 (10:34 -0700)]
pch_gbe: add header files
Fix build errors, more header files needed.
drivers/net/pch_gbe/pch_gbe_main.c:965: error: implicit declaration of function 'tcp_hdr'
drivers/net/pch_gbe/pch_gbe_main.c:965: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:968: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:976: error: implicit declaration of function 'udp_hdr'
drivers/net/pch_gbe/pch_gbe_main.c:976: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:980: error: invalid type argument of '->' (have 'int')
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Mon, 27 Sep 2010 19:35:06 +0000 (19:35 +0000)]
Documentation: Update Phonet doc for Pipe Controller implementation
Updates the Phonet document with description related to Pipe controller
implementation
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 28 Sep 2010 05:11:51 +0000 (22:11 -0700)]
tg3: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:32:59 +0000 (08:32 +0000)]
8021q: Use netif_copy_real_num_queues() to set queue counts
This covers RX if necessary, as well as TX.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:31:07 +0000 (08:31 +0000)]
sfc: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:30:59 +0000 (08:30 +0000)]
niu: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:30:34 +0000 (08:30 +0000)]
myri10ge: Use netif_set_real_num_{rx, tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:30:05 +0000 (08:30 +0000)]
mv643xx_eth: Use netif_set_real_num_{rx, tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:29:34 +0000 (08:29 +0000)]
mlx4_en: Use netif_set_real_num_{rx, tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:28:56 +0000 (08:28 +0000)]
ixgbe: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:28:39 +0000 (08:28 +0000)]
igb: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:27:37 +0000 (08:27 +0000)]
gianfar: Use netif_set_real_num_rx_queues()
Do not set num_tx_queues or real_num_tx_queues, since
alloc_etherdev_mq() does that.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:26:10 +0000 (08:26 +0000)]
cxgb4vf: Use netif_set_real_num_{rx, tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:25:57 +0000 (08:25 +0000)]
cxgb4: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:25:44 +0000 (08:25 +0000)]
cxgb3: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:25:27 +0000 (08:25 +0000)]
bnx2x: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:25:16 +0000 (08:25 +0000)]
bnx2: Use netif_set_real_num_{rx,tx}_queues()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:24:49 +0000 (08:24 +0000)]
net: Add netif_copy_real_num_queues() for use by virtual net drivers
This sets the active numbers of queues on a net device to match another.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 27 Sep 2010 08:24:33 +0000 (08:24 +0000)]
net: Allow changing number of RX queues after device allocation
For RPS, we create a kobject for each RX queue based on the number of
queues passed to alloc_netdev_mq(). However, drivers generally do not
determine the numbers of hardware queues to use until much later, so
this usually represents the maximum number the driver may use and not
the actual number in use.
For TX queues, drivers can update the actual number using
netif_set_real_num_tx_queues(). Add a corresponding function for RX
queues, netif_set_real_num_rx_queues().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 06:07:30 +0000 (06:07 +0000)]
net: sk_{detach|attach}_filter() rcu fixes
sk_attach_filter() and sk_detach_filter() are run with socket locked.
Use the appropriate rcu_dereference_protected() instead of blocking BH,
and rcu_dereference_bh().
There is no point adding BH prevention and memory barrier.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 04:18:27 +0000 (04:18 +0000)]
fib: use atomic_inc_not_zero() in fib_rules_lookup
It seems we dont use appropriate refcount increment in an
rcu_read_lock() protected section.
fib_rule_get() might increment a null refcount and bad things could
happen.
While fib_nl_delrule() respects an rcu grace period before calling
fib_rule_put(), fib_rules_cleanup_ops() calls fib_rule_put() without a
grace period.
Note : after this patch, we might avoid the synchronize_rcu() call done
in fib_nl_delrule()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 00:38:18 +0000 (00:38 +0000)]
sit: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 00:35:50 +0000 (00:35 +0000)]
ipip: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 03:57:11 +0000 (03:57 +0000)]
ip_gre: percpu stats accounting
Le lundi 27 septembre 2010 à 14:29 +0100, Ben Hutchings a écrit :
> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index 5d6ddcb..de39b22 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> [...]
> > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net,
> > if (parms->name[0])
> > strlcpy(name, parms->name, IFNAMSIZ);
> > else
> > - sprintf(name, "gre%%d");
> > + strcpy(name, "gre%d");
> >
> > dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup);
> > if (!dev)
> [...]
>
> This is a valid fix, but doesn't belong in this patch!
>
Sorry ? It was not a fix, but at most a cleanup ;)
Anyway I forgot the gretap case...
[PATCH 2/4 v2] ip_gre: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 00:33:35 +0000 (00:33 +0000)]
tunnels: prepare percpu accounting
Tunnels are going to use percpu for their accounting.
They are going to use a new tstats field in net_device.
skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx()
IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 26 Sep 2010 22:47:09 +0000 (22:47 +0000)]
vlan: use this_cpu_ptr() in vlan_skb_recv()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Sun, 26 Sep 2010 19:07:59 +0000 (19:07 +0000)]
Phonet: Implement Pipe Controller to support Nokia Slim Modems
Phonet stack assumes the presence of Pipe Controller, either in Modem or
on Application Processing Engine user-space for the Pipe data.
Nokia Slim Modems like WG2.5 used in ST-Ericsson U8500 platform do not
implement Pipe controller in them.
This patch adds Pipe Controller implemenation to Phonet stack to support
Pipe data over Phonet stack for Nokia Slim Modems.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Sep 2010 08:03:03 +0000 (01:03 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6
Conflicts:
drivers/net/qlcnic/qlcnic_init.c
net/ipv4/ip_output.c
Otavio Salvador [Mon, 27 Sep 2010 02:58:07 +0000 (19:58 -0700)]
net: r6040: store BIOS default MAC in perm_add
Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Fri, 24 Sep 2010 09:55:52 +0000 (09:55 +0000)]
ipv6: add a missing unregister_pernet_subsys call
Clean up a missing exit path in the ipv6 module init routines. In
addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
for the ipv6_addr_label_ops structure. But if module loading fails, or if the
ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
which leaves a now-bogus address on the pernet_list, leading to oopses in
subsequent registrations. This patch cleans up both the failed load path and
the unload path. Tested by myself with good results.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
include/net/addrconf.h | 1 +
net/ipv6/addrconf.c | 11 ++++++++---
net/ipv6/addrlabel.c | 5 +++++
3 files changed, 14 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 23 Sep 2010 23:51:51 +0000 (23:51 +0000)]
net: loopback driver cleanup
loopback driver uses dev->ml_priv to store its percpu stats pointer.
It uses ugly casts "(void __percpu __force *)" to shut up sparse
complains.
Define an union to better document we use ml_priv in loopback driver and
define a lstats field with appropriate types.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 23 Sep 2010 21:46:03 +0000 (21:46 +0000)]
net: fix rcu use in ip_route_output_slow
__in_dev_get_rtnl(dev_out) is called while RTNL is not held, thus
triggers a lockdep fault.
At this point, we only perform a raw test of dev_out->ip_ptr being NULL,
we dont need to make sure ip_ptr cant changed right after.
We can use rcu_dereference_raw() for this.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 23 Sep 2010 17:26:35 +0000 (17:26 +0000)]
rps: allocate rx queues in register_netdevice only
Instead of having two places were we allocate dev->_rx, introduce
netif_alloc_rx_queues() helper and call it only from
register_netdevice(), not from alloc_netdev_mq()
Goal is to let drivers change dev->num_rx_queues after allocating netdev
and before registering it.
This also removes a lot of ifdefs in net/core/dev.c
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasiliy Kulikov [Mon, 27 Sep 2010 01:56:06 +0000 (18:56 -0700)]
s390: use free_netdev(netdev) instead of kfree()
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
Kulikov Vasiliy [Sat, 25 Sep 2010 23:58:06 +0000 (23:58 +0000)]
sgiseeq: use free_netdev(netdev) instead of kfree()
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
Kulikov Vasiliy [Sat, 25 Sep 2010 23:58:03 +0000 (23:58 +0000)]
rionet: use free_netdev(netdev) instead of kfree()
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
Kulikov Vasiliy [Sat, 25 Sep 2010 23:58:00 +0000 (23:58 +0000)]
ibm_newemac: use free_netdev(netdev) instead of kfree()
Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
@@
struct net_device* dev;
@@
-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 01:53:07 +0000 (18:53 -0700)]
net: update SOCK_MIN_RCVBUF
SOCK_MIN_RCVBUF current value is 256 bytes
It doesnt permit to receive the smallest possible frame, considering
socket sk_rmem_alloc/sk_rcvbuf account skb truesizes. On 64bit arches,
sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
headroom, and we go over the limit.
With old kernels and 32bit arches, we were under the limit, if netdriver
was doing copybreak.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Stehlé [Mon, 27 Sep 2010 01:50:05 +0000 (18:50 -0700)]
smsc911x: Add MODULE_ALIAS()
This enables auto loading for the smsc911x ethernet driver.
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Thu, 23 Sep 2010 11:19:54 +0000 (11:19 +0000)]
net: reset skb queue mapping when rx'ing over tunnel
Reset queue mapping when an skb is reentering the stack via a tunnel.
On second pass, the queue mapping from the original device is no
longer valid.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 23 Sep 2010 05:40:09 +0000 (05:40 +0000)]
drivers/net: return operator cleanup
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 23 Sep 2010 05:06:54 +0000 (05:06 +0000)]
net: skb_frag_t can be smaller on small arches
On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit
offset and size fields. This patch saves 72 bytes per skb on i386, or
128 bytes after rounding.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karl Hiramoto [Thu, 23 Sep 2010 01:50:54 +0000 (01:50 +0000)]
br2684: fix scheduling while atomic
You can't call atomic_notifier_chain_unregister() while in atomic context.
Fix, call un/register_atmdevice_notifier in module __init and __exit.
Bug report:
http://comments.gmane.org/gmane.linux.network/172603
Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 23 Sep 2010 00:46:11 +0000 (00:46 +0000)]
net: propagate NETIF_F_HIGHDMA to vlans
Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device
supports it.
On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it
can help to reduce cost of illegal_highdma() and __skb_linearize()
calls.
Tested on tg3 , bnx2, bonding, this worked very well.
This is a generalization of a patch provided by Yi Zou & Jeff Kirsher.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Sat, 25 Sep 2010 10:39:17 +0000 (10:39 +0000)]
de2104x: fix TP link detection
Compex FreedomLine 32 PnP-PCI2 cards have only TP and BNC connectors but the
SROM contains AUI port too. When TP loses link, the driver switches to
non-existing AUI port (which reports that carrier is always present).
Connecting TP back generates LinkPass interrupt but de_media_interrupt() is
broken - it only updates the link state of currently connected media, ignoring
the fact that LinkPass and LinkFail bits of MacStatus register belong to the
TP port only (the chip documentation says that).
This patch changes de_media_interrupt() to switch media to TP when link goes
up (and media type is not locked) and also to update the link state only when
the TP port is used.
Also the NonselPortActive (and also SelPortActive) bits of SIAStatus register
need to be cleared (by writing 1) after reading or they're useless.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Fri, 24 Sep 2010 23:57:02 +0000 (23:57 +0000)]
de2104x: fix power management
At least my 21041 cards come out of suspend with bus mastering disabled so
they did not work after resume(no data transferred).
After adding pci_set_master(), the driver oopsed immediately on resume -
because de_clean_rings() is called on suspend but de_init_rings() call
was missing in resume.
Also disable link (reset SIA) before sleep (de4x5 does this too).
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Thu, 23 Sep 2010 12:46:25 +0000 (12:46 +0000)]
ixgbevf: Refactor ring parameter re-size
The function to resize the Tx/Rx rings had the potential to
dereference a NULL pointer and the code would attempt to resize
the Tx ring even if the Rx ring allocation had failed. This
would cause some confusion in the return code semantics. Fixed
up to just unwind the allocations if any of them fail and return
an error.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Thu, 23 Sep 2010 10:59:18 +0000 (10:59 +0000)]
de2104x: disable autonegotiation on broken hardware
At least on older 21041-AA chips (mine is rev. 11), TP duplex autonegotiation
causes the card not to work at all (link is up but no packets are transmitted).
de4x5 disables autonegotiation completely. But it seems to work on newer
(21041-PA rev. 21) so disable it only on rev<20 chips.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Sep 2010 12:43:39 +0000 (12:43 +0000)]
net: fix a lockdep splat
We have for each socket :
One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)
Possible scenarios are :
(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...
(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)
This (C) case conflicts with (A) :
CPU1 [A] CPU2 [C]
read_lock(callback_lock)
<BH> spin_lock_bh(slock)
<wait to spin_lock(slock)>
<wait to write_lock_bh(callback_lock)>
We have one problematic (C) use case in inet_csk_listen_stop() :
local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)
lockdep is not happy with this, as reported by Tetsuo Handa
It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.
Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe Cavallaro [Sat, 25 Sep 2010 04:27:41 +0000 (21:27 -0700)]
stmmac: review the wake-up support
If the PM support is available this is passed
through the platform instead to be hard-coded
in the core files.
WoL on Magic Frame can be enabled by using
the ethtool support.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masayuki Ohtake [Tue, 21 Sep 2010 01:44:11 +0000 (01:44 +0000)]
net: Add Gigabit Ethernet driver of Topcliff PCH
Signed-off-by: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 20 Sep 2010 20:16:27 +0000 (20:16 +0000)]
ip: take care of last fragment in ip_append_data
While investigating a bit, I found ip_fragment() slow path was taken
because ip_append_data() provides following layout for a send(MTU +
N*(MTU - 20)) syscall :
- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
last fragment gets 17 bytes of trail data because of following bit:
if (datalen == length + fraggap)
alloclen += rt->dst.trailer_len;
Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug ?)
In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu,
so we take slow path, building another skb chain.
In order to avoid taking slow path, we should correct ip_append_data()
to make sure last fragment has real trail space, under mtu...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Sep 2010 20:43:57 +0000 (20:43 +0000)]
net: return operator cleanup
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Wed, 22 Sep 2010 18:23:05 +0000 (18:23 +0000)]
e1000: use GRO for receive
E1000 can benefit from calling the GRO receive functions.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Wed, 22 Sep 2010 18:22:42 +0000 (18:22 +0000)]
e1000: fix occasional panic on unload
Net drivers in general have an issue where timers fired
by mod_timer or work threads with schedule_work are running
outside of the rtnl_lock.
With no other lock protection these routines are vulnerable
to races with driver unload or reset paths.
The longer term solution to this might be a redesign with
safer locks being taken in the driver to guarantee no
reentrance, but for now a safe and effective fix is
to take the rtnl_lock in these routines.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Wed, 22 Sep 2010 18:22:17 +0000 (18:22 +0000)]
e1000: use work queues
E1000 is using several timers that in a follow on patch
will need to acquire the rtnl_lock in order to be safe.
This patch moves the timer bodies into work queues which
will allow the next patch to add rtnl_lock.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yi Zou [Wed, 22 Sep 2010 17:57:58 +0000 (17:57 +0000)]
e1000/e1000e/igb/ixgb/ixgbe: set NETIF_F_HIGHDMA for VLAN feature flags
If the netdev->features is set with NETIF_F_HIGHDMA, we should set the
corresponding netdev->vlan_features as well to allow VLAN netdev created
on top of the real netdev to be able to also benefit from HIGHDMA on 32bit
system, reducing the performance hit that is caused by __skb_linearize(),
particularly for large send. This is fixed in this patch for all Intel e1000,
e1000e, igb, ixgbe, and ixgbe drivers since this should be beneficial
to all devices supported by these drivers.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joseph Gasparakis [Wed, 22 Sep 2010 17:56:44 +0000 (17:56 +0000)]
igb: Add support for DH89xxCC
This patch adds support for the Intel(r) DH89xxCC series. The new
device will be using Intel(r) i347-AT4 and Marvell(r) M88E1322 and
M88E1112 PHYs. Support for these devices has also been added here.
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 22 Sep 2010 17:56:20 +0000 (17:56 +0000)]
igb: clear VF_PROMISC bits instead of setting all other bits
This change corrects an issue in which we were setting all flag bits except
for promisc instead of clearing the promisc bits due to the incorrect use
of an |= instead of an &=.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 22 Sep 2010 17:17:01 +0000 (17:17 +0000)]
e1000e: 82579 do not gate auto config of PHY by hardware during nominal use
For non-managed versions of 82579, set the bit that prevents the hardware
from automatically configuring the PHY after resets only when the driver
performs a reset, clear the bit after resets. This is so the hardware can
configure the PHY automatically when the part is reset in a manner that is
not controlled by the driver (e.g. in a virtual environment via PCI FLR)
otherwise the PHY will be mis-configured causing issues such as failing to
link at 1000Mbps.
For managed versions of 82579, keep the previous behavior since the
manageability firmware will handle the PHY configuration.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 22 Sep 2010 17:16:40 +0000 (17:16 +0000)]
e1000e: 82579 jumbo frame workaround causing CRC errors
The subject workaround was causing CRC errors due to writing the wrong
register with updates of the RCTL register. It was also found that the
workaround function which modifies the RCTL register was being called in
the middle of a read-modify-write operation of the RCTL register, so the
function call has been moved appropriately. Lastly, jumbo frames must not
be allowed when CRC stripping is disabled by a module parameter because the
workaround requires the CRC be stripped.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 22 Sep 2010 17:16:18 +0000 (17:16 +0000)]
e1000e: 82579 unaccounted missed packets
On 82579, there is a hardware bug that can cause received packets to not
get transferred from the PHY to the MAC due to K1 (a power saving feature
of the PHY-MAC interconnect similar to ASPM L1). Since the MAC controls
the accounting of missed packets, these will go unnoticed. Workaround the
issue by setting the K1 beacon duration according to the link speed.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 22 Sep 2010 17:15:54 +0000 (17:15 +0000)]
e1000e: 82566DC fails to get link
Two recent patches to cleanup the reset[1] and initial PHY configuration[2]
code paths for ICH/PCH devices inadvertently left out a 10msec delay and
device ID check respectively which are necessary for the 82566DC (device id
0x104b) to be configured properly, otherwise it will not get link.
[1] commit
e98cac447cc1cc418dff1d610a5c79c4f2bdec7f
[2] commit
3f0c16e84438d657d29446f85fe375794a93f159
CC: stable@kernel.org
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 22 Sep 2010 17:15:33 +0000 (17:15 +0000)]
e1000e: 82579 SMBus address and LEDs incorrect after device reset
Since the hardware is prevented from performing automatic PHY configuration
(the driver does it instead), the OEM_WRITE_ENABLE bit in the EXTCNF_CTRL
register will not get cleared preventing the SMBus address and the LED
configuration to be written to the PHY registers. On 82579, do not check
the OEM_WRITE_ENABLE bit.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 22 Sep 2010 17:15:08 +0000 (17:15 +0000)]
e1000e: 82577/8/9 issues with device in Sx
When going to Sx, disable gigabit in PHY (e1000_oem_bits_config_ich8lan)
in addition to the MAC before configuring PHY wakeup otherwise the PHY
configuration writes might be missed. Also write the LED configuration
and SMBus address to the PHY registers (e1000_oem_bits_config_ich8lan and
e1000_write_smbus_addr, respectively). The reset is no longer needed
since re-auto-negotiation is forced in e1000_oem_bits_config_ich8lan and
leaving it in causes issues with auto-negotiating the link.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ulrich Weber [Wed, 22 Sep 2010 06:45:11 +0000 (06:45 +0000)]
xfrm4: strip ECN bits from tos field
otherwise ECT(1) bit will get interpreted as RTO_ONLINK
and routing will fail with XfrmOutBundleGenError.
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luca Tettamanti [Wed, 22 Sep 2010 10:42:31 +0000 (10:42 +0000)]
atl1: zero out CMB and SBM in atl1_free_ring_resources
They are allocated in atl1_setup_ring_resources, zero out the pointers
in atl1_free_ring_resources (like the other resources).
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luca Tettamanti [Wed, 22 Sep 2010 10:41:58 +0000 (10:41 +0000)]
atl1: fix resume
adapter->cmb.cmb is initialized when the device is opened and freed when
it's closed. Accessing it unconditionally during resume results either
in a crash (NULL pointer dereference, when the interface has not been
opened yet) or data corruption (when the interface has been used and
brought down adapter->cmb.cmb points to a deallocated memory area).
Cc: stable@kernel.org
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andrew hendry [Tue, 21 Sep 2010 15:24:45 +0000 (15:24 +0000)]
X.25 remove bkl in poll
The x25_datagram_poll didn't add anything, removed it.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andrew hendry [Tue, 21 Sep 2010 15:24:25 +0000 (15:24 +0000)]
X.25 remove bkl in getsockname
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 22 Sep 2010 10:00:47 +0000 (10:00 +0000)]
sfc: Add support for SFE4003 board and TXC43128 PHY
This board never went into production, but some engineering samples
are in use.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 22 Sep 2010 10:00:11 +0000 (10:00 +0000)]
sfc: Remove support for SFN4111T, SFT9001 and Falcon GMAC
SFN4111T never reached production and is not being used for internal
or customer testing.
Since we have no production Falcon boards using the SFT9001 or the
GMAC, remove support for them as well.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ollie Wild [Wed, 22 Sep 2010 05:54:54 +0000 (05:54 +0000)]
net: Move "struct net" declaration inside the __KERNEL__ macro guard
This patch reduces namespace pollution by moving the "struct net" declaration
out of the userspace-facing portion of linux/netlink.h. It has no impact on
the kernel.
(This came up because we have several C++ applications which use "net" as a
namespace name.)
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Olsa [Tue, 21 Sep 2010 21:17:34 +0000 (21:17 +0000)]
netfilter: nf_conntrack_defrag: check socket type before touching nodefrag flag
we need to check proper socket type within ipv4_conntrack_defrag
function before referencing the nodefrag flag.
For example the tun driver receive path produces skbs with
AF_UNSPEC socket type, and so current code is causing unwanted
fragmented packets going out.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>