profile/ivi/kernel-adaptation-intel-automotive.git
14 years agoipv4: rcu conversion in ip_route_output_slow
Eric Dumazet [Thu, 30 Sep 2010 03:33:58 +0000 (03:33 +0000)]
ipv4: rcu conversion in ip_route_output_slow

ip_route_output_slow() is enclosed in an rcu_read_lock() protected
section, so that no references are taken/released on device, thanks to
__ip_dev_find() & dev_get_by_index_rcu()

Tested with ip route cache disabled, and a stress test :

Before patch:

elapsed time :

real 1m38.347s
user 0m11.909s
sys 23m51.501s

Profile:

13788.00 22.7% ip_route_output_slow [kernel]
 7875.00 13.0% dst_destroy          [kernel]
 3925.00  6.5% fib_semantic_match   [kernel]
 3144.00  5.2% fib_rules_lookup     [kernel]
 3061.00  5.0% dst_alloc            [kernel]
 2276.00  3.7% rt_set_nexthop       [kernel]
 1762.00  2.9% fib_table_lookup     [kernel]
 1538.00  2.5% _raw_read_lock       [kernel]
 1358.00  2.2% ip_output            [kernel]

After patch:

real 1m28.808s
user 0m13.245s
sys 20m37.293s

10950.00 17.2% ip_route_output_slow [kernel]
10726.00 16.9% dst_destroy          [kernel]
 5170.00  8.1% fib_semantic_match   [kernel]
 3937.00  6.2% dst_alloc            [kernel]
 3635.00  5.7% rt_set_nexthop       [kernel]
 2900.00  4.6% fib_rules_lookup     [kernel]
 2240.00  3.5% fib_table_lookup     [kernel]
 1427.00  2.2% _raw_read_lock       [kernel]
 1157.00  1.8% kmem_cache_alloc     [kernel]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv4: introduce __ip_dev_find()
Eric Dumazet [Thu, 30 Sep 2010 03:31:56 +0000 (03:31 +0000)]
ipv4: introduce __ip_dev_find()

ip_dev_find(net, addr) finds a device given an IPv4 source address and
takes a reference on it.

Introduce __ip_dev_find(), taking a third argument, to optionally take
the device reference. Callers not asking the reference to be taken
should be in an rcu_read_lock() protected section.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: 82579 performance improvements
Bruce Allan [Wed, 29 Sep 2010 21:39:37 +0000 (21:39 +0000)]
e1000e: 82579 performance improvements

The initial support for 82579 was tuned poorly for performance.  Adjust the
packet buffer allocation appropriately for both standard and jumbo frames;
and for jumbo frames increase the receive descriptor pre-fetch, disable
adaptive interrupt moderation and set the DMA latency tolerance.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: use hardware writeback batching
Jesse Brandeburg [Wed, 29 Sep 2010 21:38:49 +0000 (21:38 +0000)]
e1000e: use hardware writeback batching

Most e1000e parts support batching writebacks.  The problem with this is
that when some of the TADV or TIDV timers are not set, Tx can sit forever.

This is solved in this patch with write flushes using the Flush Partial
Descriptors (FPD) bit in TIDV and RDTR.

This improves bus utilization and removes partial writes on e1000e,
particularly from 82571 parts in S5500 chipset based machines.

Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
testing load.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: fix link issues and panic with shared interrupts for 82598
Emil Tantilov [Wed, 29 Sep 2010 21:35:23 +0000 (21:35 +0000)]
ixgbe: fix link issues and panic with shared interrupts for 82598

Fix possible panic/hang with shared Legacy interrupts by not enabling
interrupts when interface is down.

Also fixes an intermittent link by enabling LSC upon exit from ixgbe_intr()

This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
when enabling interrupts.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv4: __mkroute_output() speedup
Eric Dumazet [Wed, 29 Sep 2010 11:53:50 +0000 (11:53 +0000)]
ipv4: __mkroute_output() speedup

While doing stress tests with a disabled IP route cache, I found
__mkroute_output() was touching three times in_device atomic refcount.

Use RCU to touch it once to reduce cache line ping pongs.

Before patch

time to perform the test
real 1m42.009s
user 0m12.545s
sys 25m0.726s

Profile :

16109.00 26.4% ip_route_output_slow   vmlinux
 7434.00 12.2% dst_destroy            vmlinux
 3280.00  5.4% fib_rules_lookup       vmlinux
 3252.00  5.3% fib_semantic_match     vmlinux
 2622.00  4.3% fib_table_lookup       vmlinux
 2535.00  4.1% dst_alloc              vmlinux
 1750.00  2.9% _raw_read_lock         vmlinux
 1532.00  2.5% rt_set_nexthop         vmlinux

After patch

real 1m36.503s
user 0m12.977s
sys 23m25.608s

14234.00 22.4% ip_route_output_slow   vmlinux
 8717.00 13.7% dst_destroy            vmlinux
 4052.00  6.4% fib_rules_lookup       vmlinux
 3951.00  6.2% fib_semantic_match     vmlinux
 3191.00  5.0% dst_alloc              vmlinux
 1764.00  2.8% fib_table_lookup       vmlinux
 1692.00  2.7% _raw_read_lock         vmlinux
 1605.00  2.5% rt_set_nexthop         vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoPhonet: restore flow control credits when sending fails
Rémi Denis-Courmont [Wed, 29 Sep 2010 22:33:50 +0000 (22:33 +0000)]
Phonet: restore flow control credits when sending fails

This patch restores the below flow control patch submitted by Rémi
Denis-Courmont, which accidentaly got lost due to Pipe controller patch
on Phonet.

commit 1a98214feef2221cd7c24b17cd688a5a9d85b2ea
Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Date:   Mon Aug 30 12:57:03 2010 +0000

Phonet: restore flow control credits when sending fails

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: pxa168_etc.c recognize additional contributors
Philip Rakity [Tue, 28 Sep 2010 04:26:30 +0000 (04:26 +0000)]
net: pxa168_etc.c recognize additional contributors

Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Sachin Sanap <ssanap@marvell.com>
Signed-off-by: Mark Brown <markb@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip_gre: comments change
Eric Dumazet [Thu, 30 Sep 2010 06:35:10 +0000 (23:35 -0700)]
ip_gre: comments change

HARD_TX_LOCK no longer protects tunnels from dead loops,
but xmit_recursion percpu counter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agode2104x: remove experimental status
Ondrej Zary [Tue, 28 Sep 2010 08:46:17 +0000 (08:46 +0000)]
de2104x: remove experimental status

It should be ready after 8 years...remove the experimental dependency.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agode2104x: disable media debug messages by default
Ondrej Zary [Tue, 28 Sep 2010 08:18:55 +0000 (08:18 +0000)]
de2104x: disable media debug messages by default

Print media debug messages only when HW debug is enabled.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomyri10ge: DCA update (resubmit)
Andrew Gallatin [Tue, 28 Sep 2010 08:13:12 +0000 (08:13 +0000)]
myri10ge: DCA update (resubmit)

This patch contains the following DCA improvements to myri10ge:

1) Finally move myri10ge to use dca3 API

2) Disable PCIe relaxed ordering when enabling DCA on
     myri10ge.  This provides a performance boost on Nehalem
     based Xeons

3) Make sure to properly initialize NIC's DCA state when it is enabled,
     rather than giving the NIC a bogus tag (0) and waiting for
     the first received packet to trigger an update.  Not using a
     real tag can cause hardware exceptions on some motherboards
     when a CPU socket is empty.

3) Always update the cached CPU when our interrupt affinity changes
     so as to avoid excessive calls to dca3_get_tag()

Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: Loic Prylli <loic@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2x: Moved enabling of MSI to the bnx2x_set_num_queues()
Dmitry Kravkov [Wed, 29 Sep 2010 01:05:37 +0000 (01:05 +0000)]
bnx2x: Moved enabling of MSI to the bnx2x_set_num_queues()

Moved enabling of MSI to the bnx2x_set_num_queues() - the same functions that
 handles the initialization of the MSI-X.

From: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotcp: tcp_enter_quickack_mode can be static
stephen hemminger [Tue, 28 Sep 2010 19:30:14 +0000 (19:30 +0000)]
tcp: tcp_enter_quickack_mode can be static

Function only used in tcp_input.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoarp: remove unnecessary export of arp_broken_ops
stephen hemminger [Tue, 28 Sep 2010 17:08:02 +0000 (17:08 +0000)]
arp: remove unnecessary export of arp_broken_ops

arp_broken_ops is only used in arp.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: rename netdev rx_queue to ingress_queue
Eric Dumazet [Tue, 28 Sep 2010 05:58:37 +0000 (05:58 +0000)]
net: rename netdev rx_queue to ingress_queue

There is some confusion with rx_queue name after RPS, and net drivers
private rx_queue fields.

I suggest to rename "struct net_device"->rx_queue to ingress_queue.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip6tnl: percpu stats accounting
Eric Dumazet [Tue, 28 Sep 2010 03:23:34 +0000 (03:23 +0000)]
ip6tnl: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipip: enable lockless xmits
Eric Dumazet [Tue, 28 Sep 2010 00:17:17 +0000 (00:17 +0000)]
ipip: enable lockless xmits

IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one ipip tunnel (size:200 bytes per frame)

Before patch :
real 2m53.321s
user 0m10.277s
sys 46m0.597s

After patch:
real 0m32.063s
user 0m9.237s
sys 8m16.255s

Last problem to solve is the contention on dst :

16118.00 28.3% __ip_route_output_key         vmlinux
 6135.00 10.8% dst_release                   vmlinux
 3220.00  5.6% ip_finish_output              vmlinux
 2149.00  3.8% ip_route_output_flow          vmlinux
 1575.00  2.8% ip_append_data                vmlinux
 1481.00  2.6% ip_push_pending_frames        vmlinux
 1349.00  2.4% __xfrm_lookup                 vmlinux
 1216.00  2.1% csum_partial_copy_generic     vmlinux
 1208.00  2.1% udp_sendmsg                   vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip_gre: lockless xmit
Eric Dumazet [Mon, 27 Sep 2010 23:05:47 +0000 (23:05 +0000)]
ip_gre: lockless xmit

GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :

Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)

Before patch :
real 3m0.094s
user 0m9.365s
sys 47m50.103s

After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s

Last problem to solve is the contention on dst :

38660.00 21.4% __ip_route_output_key          vmlinux
20786.00 11.5% dst_release                    vmlinux
14191.00  7.8% __xfrm_lookup                  vmlinux
12410.00  6.9% ip_finish_output               vmlinux
 4540.00  2.5% ip_push_pending_frames         vmlinux
 4427.00  2.4% ip_append_data                 vmlinux
 4265.00  2.4% __alloc_skb                    vmlinux
 4140.00  2.3% __ip_local_out                 vmlinux
 3991.00  2.2% dev_queue_xmit                 vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosit: enable lockless xmits
Eric Dumazet [Tue, 28 Sep 2010 02:53:41 +0000 (02:53 +0000)]
sit: enable lockless xmits

SIT tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one sit tunnel (size:220 bytes per frame)

Before patch :

real 3m15.399s
user 0m9.185s
sys 51m55.403s

75029.00 87.5% _raw_spin_lock            vmlinux
 1090.00  1.3% dst_release               vmlinux
  902.00  1.1% dev_queue_xmit            vmlinux
  627.00  0.7% sock_wfree                vmlinux
  613.00  0.7% ip6_push_pending_frames   ipv6.ko
  505.00  0.6% __ip_route_output_key     vmlinux

After patch:

real 1m1.387s
user 0m12.489s
sys 15m58.868s

28239.00 23.3% dst_release               vmlinux
13570.00 11.2% ip6_push_pending_frames   ipv6.ko
13118.00 10.8% ip6_append_data           ipv6.ko
 7995.00  6.6% __ip_route_output_key     vmlinux
 7924.00  6.5% sk_dst_check              vmlinux
 5015.00  4.1% udpv6_sendmsg             ipv6.ko
 3594.00  3.0% sock_alloc_send_pskb      vmlinux
 3135.00  2.6% sock_wfree                vmlinux
 3055.00  2.5% ip6_sk_dst_lookup         ipv6.ko
 2473.00  2.0% ip_finish_output          vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosit: fix percpu stats accounting
Eric Dumazet [Tue, 28 Sep 2010 02:17:58 +0000 (02:17 +0000)]
sit: fix percpu stats accounting

commit 15fc1f7056ebd (sit: percpu stats accounting) forgot the fallback
tunnel case (sit0), and can crash pretty fast.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipip: fix percpu stats accounting
Eric Dumazet [Mon, 27 Sep 2010 23:56:46 +0000 (23:56 +0000)]
ipip: fix percpu stats accounting

commit 3c97af99a5aa1 (ipip: percpu stats accounting) forgot the fallback
tunnel case (tunl0), and can crash pretty fast.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodummy: percpu stats and lockless xmit
Eric Dumazet [Mon, 27 Sep 2010 20:50:33 +0000 (20:50 +0000)]
dummy: percpu stats and lockless xmit

Converts dummy network device driver to :

- percpu stats

- 64bit stats

- lockless xmit (NETIF_F_LLTX)

- performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: add a recursion limit in xmit path
Eric Dumazet [Wed, 29 Sep 2010 20:23:09 +0000 (13:23 -0700)]
net: add a recursion limit in xmit path

As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.

Add a percpu variable, and limit to three the number of stacked xmits.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv6: Implement Any-IP support for IPv6.
Maciej Żenczykowski [Mon, 27 Sep 2010 00:07:02 +0000 (00:07 +0000)]
ipv6: Implement Any-IP support for IPv6.

AnyIP is the capability to receive packets and establish incoming
connections on IPs we have not explicitly configured on the machine.

An example use case is to configure a machine to accept all incoming
traffic on eth0, and leave the policy of whether traffic for a given IP
should be delivered to the machine up to the load balancer.

Can be setup as follows:
  ip -6 rule from all iif eth0 lookup 200
  ip -6 route add local default dev lo table 200
(in this case for all IPv6 addresses)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv4: Allow configuring subnets as local addresses
Tom Herbert [Sun, 23 May 2010 19:54:12 +0000 (19:54 +0000)]
ipv4: Allow configuring subnets as local addresses

This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface.  This is done through routing
table configuration.  For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:

ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200

This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route).  On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback).  Presumably, external routing can be
configured to make sense out of this.

To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find).  We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.

This patch is useful to implement IP-anycast for subnets of virtual
addresses.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agopch_gbe: add header files
Randy Dunlap [Tue, 28 Sep 2010 17:34:27 +0000 (10:34 -0700)]
pch_gbe: add header files

Fix build errors, more header files needed.

drivers/net/pch_gbe/pch_gbe_main.c:965: error: implicit declaration of function 'tcp_hdr'
drivers/net/pch_gbe/pch_gbe_main.c:965: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:968: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:976: error: implicit declaration of function 'udp_hdr'
drivers/net/pch_gbe/pch_gbe_main.c:976: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:980: error: invalid type argument of '->' (have 'int')

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoDocumentation: Update Phonet doc for Pipe Controller implementation
Kumar Sanghvi [Mon, 27 Sep 2010 19:35:06 +0000 (19:35 +0000)]
Documentation: Update Phonet doc for Pipe Controller implementation

Updates the Phonet document with description related to Pipe controller
implementation

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotg3: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Tue, 28 Sep 2010 05:11:51 +0000 (22:11 -0700)]
tg3: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years ago8021q: Use netif_copy_real_num_queues() to set queue counts
Ben Hutchings [Mon, 27 Sep 2010 08:32:59 +0000 (08:32 +0000)]
8021q: Use netif_copy_real_num_queues() to set queue counts

This covers RX if necessary, as well as TX.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosfc: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:31:07 +0000 (08:31 +0000)]
sfc: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoniu: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:30:59 +0000 (08:30 +0000)]
niu: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomyri10ge: Use netif_set_real_num_{rx, tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:30:34 +0000 (08:30 +0000)]
myri10ge: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomv643xx_eth: Use netif_set_real_num_{rx, tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:30:05 +0000 (08:30 +0000)]
mv643xx_eth: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomlx4_en: Use netif_set_real_num_{rx, tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:29:34 +0000 (08:29 +0000)]
mlx4_en: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbe: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:28:56 +0000 (08:28 +0000)]
ixgbe: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigb: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:28:39 +0000 (08:28 +0000)]
igb: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agogianfar: Use netif_set_real_num_rx_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:27:37 +0000 (08:27 +0000)]
gianfar: Use netif_set_real_num_rx_queues()

Do not set num_tx_queues or real_num_tx_queues, since
alloc_etherdev_mq() does that.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocxgb4vf: Use netif_set_real_num_{rx, tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:26:10 +0000 (08:26 +0000)]
cxgb4vf: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocxgb4: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:25:57 +0000 (08:25 +0000)]
cxgb4: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agocxgb3: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:25:44 +0000 (08:25 +0000)]
cxgb3: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2x: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:25:27 +0000 (08:25 +0000)]
bnx2x: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobnx2: Use netif_set_real_num_{rx,tx}_queues()
Ben Hutchings [Mon, 27 Sep 2010 08:25:16 +0000 (08:25 +0000)]
bnx2: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Add netif_copy_real_num_queues() for use by virtual net drivers
Ben Hutchings [Mon, 27 Sep 2010 08:24:49 +0000 (08:24 +0000)]
net: Add netif_copy_real_num_queues() for use by virtual net drivers

This sets the active numbers of queues on a net device to match another.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Allow changing number of RX queues after device allocation
Ben Hutchings [Mon, 27 Sep 2010 08:24:33 +0000 (08:24 +0000)]
net: Allow changing number of RX queues after device allocation

For RPS, we create a kobject for each RX queue based on the number of
queues passed to alloc_netdev_mq().  However, drivers generally do not
determine the numbers of hardware queues to use until much later, so
this usually represents the maximum number the driver may use and not
the actual number in use.

For TX queues, drivers can update the actual number using
netif_set_real_num_tx_queues().  Add a corresponding function for RX
queues, netif_set_real_num_rx_queues().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: sk_{detach|attach}_filter() rcu fixes
Eric Dumazet [Mon, 27 Sep 2010 06:07:30 +0000 (06:07 +0000)]
net: sk_{detach|attach}_filter() rcu fixes

sk_attach_filter() and sk_detach_filter() are run with socket locked.

Use the appropriate rcu_dereference_protected() instead of blocking BH,
and rcu_dereference_bh().
There is no point adding BH prevention and memory barrier.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agofib: use atomic_inc_not_zero() in fib_rules_lookup
Eric Dumazet [Mon, 27 Sep 2010 04:18:27 +0000 (04:18 +0000)]
fib: use atomic_inc_not_zero() in fib_rules_lookup

It seems we dont use appropriate refcount increment in an
rcu_read_lock() protected section.

fib_rule_get() might increment a null refcount and bad things could
happen.

While fib_nl_delrule() respects an rcu grace period before calling
fib_rule_put(), fib_rules_cleanup_ops() calls fib_rule_put() without a
grace period.

Note : after this patch, we might avoid the synchronize_rcu() call done
in fib_nl_delrule()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosit: percpu stats accounting
Eric Dumazet [Mon, 27 Sep 2010 00:38:18 +0000 (00:38 +0000)]
sit: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipip: percpu stats accounting
Eric Dumazet [Mon, 27 Sep 2010 00:35:50 +0000 (00:35 +0000)]
ipip: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip_gre: percpu stats accounting
Eric Dumazet [Mon, 27 Sep 2010 03:57:11 +0000 (03:57 +0000)]
ip_gre: percpu stats accounting

Le lundi 27 septembre 2010 à 14:29 +0100, Ben Hutchings a écrit :

> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index 5d6ddcb..de39b22 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> [...]
> > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net,
> >   if (parms->name[0])
> >   strlcpy(name, parms->name, IFNAMSIZ);
> >   else
> > - sprintf(name, "gre%%d");
> > + strcpy(name, "gre%d");
> >
> >   dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup);
> >   if (!dev)
> [...]
>
> This is a valid fix, but doesn't belong in this patch!
>

Sorry ? It was not a fix, but at most a cleanup ;)

Anyway I forgot the gretap case...

[PATCH 2/4 v2] ip_gre: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agotunnels: prepare percpu accounting
Eric Dumazet [Mon, 27 Sep 2010 00:33:35 +0000 (00:33 +0000)]
tunnels: prepare percpu accounting

Tunnels are going to use percpu for their accounting.

They are going to use a new tstats field in net_device.

skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx()

IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agovlan: use this_cpu_ptr() in vlan_skb_recv()
Eric Dumazet [Sun, 26 Sep 2010 22:47:09 +0000 (22:47 +0000)]
vlan: use this_cpu_ptr() in vlan_skb_recv()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoPhonet: Implement Pipe Controller to support Nokia Slim Modems
Kumar Sanghvi [Sun, 26 Sep 2010 19:07:59 +0000 (19:07 +0000)]
Phonet: Implement Pipe Controller to support Nokia Slim Modems

Phonet stack assumes the presence of Pipe Controller, either in Modem or
on Application Processing Engine user-space for the Pipe data.
Nokia Slim Modems like WG2.5 used in ST-Ericsson U8500 platform do not
implement Pipe controller in them.
This patch adds Pipe Controller implemenation to Phonet stack to support
Pipe data over Phonet stack for Nokia Slim Modems.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Mon, 27 Sep 2010 08:03:03 +0000 (01:03 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/qlcnic/qlcnic_init.c
net/ipv4/ip_output.c

14 years agonet: r6040: store BIOS default MAC in perm_add
Otavio Salvador [Mon, 27 Sep 2010 02:58:07 +0000 (19:58 -0700)]
net: r6040: store BIOS default MAC in perm_add

Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoipv6: add a missing unregister_pernet_subsys call
Neil Horman [Fri, 24 Sep 2010 09:55:52 +0000 (09:55 +0000)]
ipv6: add a missing unregister_pernet_subsys call

Clean up a missing exit path in the ipv6 module init routines.  In
addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
for the ipv6_addr_label_ops structure.  But if module loading fails, or if the
ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
which leaves a now-bogus address on the pernet_list, leading to oopses in
subsequent registrations.  This patch cleans up both the failed load path and
the unload path.  Tested by myself with good results.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
 include/net/addrconf.h |    1 +
 net/ipv6/addrconf.c    |   11 ++++++++---
 net/ipv6/addrlabel.c   |    5 +++++
 3 files changed, 14 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: loopback driver cleanup
Eric Dumazet [Thu, 23 Sep 2010 23:51:51 +0000 (23:51 +0000)]
net: loopback driver cleanup

loopback driver uses dev->ml_priv to store its percpu stats pointer.
It uses ugly casts "(void __percpu __force *)" to shut up sparse
complains.

Define an union to better document we use ml_priv in loopback driver and
define a lstats field with appropriate types.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: fix rcu use in ip_route_output_slow
Eric Dumazet [Thu, 23 Sep 2010 21:46:03 +0000 (21:46 +0000)]
net: fix rcu use in ip_route_output_slow

__in_dev_get_rtnl(dev_out) is called while RTNL is not held, thus
triggers a lockdep fault.

At this point, we only perform a raw test of dev_out->ip_ptr being NULL,
we dont need to make sure ip_ptr cant changed right after.

We can use rcu_dereference_raw() for this.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agorps: allocate rx queues in register_netdevice only
Eric Dumazet [Thu, 23 Sep 2010 17:26:35 +0000 (17:26 +0000)]
rps: allocate rx queues in register_netdevice only

Instead of having two places were we allocate dev->_rx, introduce
netif_alloc_rx_queues() helper and call it only from
register_netdevice(), not from alloc_netdev_mq()

Goal is to let drivers change dev->num_rx_queues after allocating netdev
and before registering it.

This also removes a lot of ifdefs in net/core/dev.c

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agos390: use free_netdev(netdev) instead of kfree()
Vasiliy Kulikov [Mon, 27 Sep 2010 01:56:06 +0000 (18:56 -0700)]
s390: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosgiseeq: use free_netdev(netdev) instead of kfree()
Kulikov Vasiliy [Sat, 25 Sep 2010 23:58:06 +0000 (23:58 +0000)]
sgiseeq: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agorionet: use free_netdev(netdev) instead of kfree()
Kulikov Vasiliy [Sat, 25 Sep 2010 23:58:03 +0000 (23:58 +0000)]
rionet: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoibm_newemac: use free_netdev(netdev) instead of kfree()
Kulikov Vasiliy [Sat, 25 Sep 2010 23:58:00 +0000 (23:58 +0000)]
ibm_newemac: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: update SOCK_MIN_RCVBUF
Eric Dumazet [Mon, 27 Sep 2010 01:53:07 +0000 (18:53 -0700)]
net: update SOCK_MIN_RCVBUF

SOCK_MIN_RCVBUF current value is 256 bytes

It doesnt permit to receive the smallest possible frame, considering
socket sk_rmem_alloc/sk_rcvbuf account skb truesizes. On 64bit arches,
sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
headroom, and we go over the limit.

With old kernels and 32bit arches, we were under the limit, if netdriver
was doing copybreak.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosmsc911x: Add MODULE_ALIAS()
Vincent Stehlé [Mon, 27 Sep 2010 01:50:05 +0000 (18:50 -0700)]
smsc911x: Add MODULE_ALIAS()

This enables auto loading for the smsc911x ethernet driver.

Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: reset skb queue mapping when rx'ing over tunnel
Tom Herbert [Thu, 23 Sep 2010 11:19:54 +0000 (11:19 +0000)]
net: reset skb queue mapping when rx'ing over tunnel

Reset queue mapping when an skb is reentering the stack via a tunnel.
On second pass, the queue mapping from the original device is no
longer valid.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agodrivers/net: return operator cleanup
Eric Dumazet [Thu, 23 Sep 2010 05:40:09 +0000 (05:40 +0000)]
drivers/net: return operator cleanup

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: skb_frag_t can be smaller on small arches
Eric Dumazet [Thu, 23 Sep 2010 05:06:54 +0000 (05:06 +0000)]
net: skb_frag_t can be smaller on small arches

On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit
offset and size fields. This patch saves 72 bytes per skb on i386, or
128 bytes after rounding.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agobr2684: fix scheduling while atomic
Karl Hiramoto [Thu, 23 Sep 2010 01:50:54 +0000 (01:50 +0000)]
br2684: fix scheduling while atomic

You can't call atomic_notifier_chain_unregister() while in atomic context.

Fix, call un/register_atmdevice_notifier in module __init and __exit.

Bug report:
http://comments.gmane.org/gmane.linux.network/172603

Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: propagate NETIF_F_HIGHDMA to vlans
Eric Dumazet [Thu, 23 Sep 2010 00:46:11 +0000 (00:46 +0000)]
net: propagate NETIF_F_HIGHDMA to vlans

Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device
supports it.

On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it
can help to reduce cost of illegal_highdma() and __skb_linearize()
calls.

Tested on tg3 , bnx2, bonding, this worked very well.

This is a generalization of a patch provided by Yi Zou & Jeff Kirsher.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agode2104x: fix TP link detection
Ondrej Zary [Sat, 25 Sep 2010 10:39:17 +0000 (10:39 +0000)]
de2104x: fix TP link detection

Compex FreedomLine 32 PnP-PCI2 cards have only TP and BNC connectors but the
SROM contains AUI port too. When TP loses link, the driver switches to
non-existing AUI port (which reports that carrier is always present).

Connecting TP back generates LinkPass interrupt but de_media_interrupt() is
broken - it only updates the link state of currently connected media, ignoring
the fact that LinkPass and LinkFail bits of MacStatus register belong to the
TP port only (the chip documentation says that).

This patch changes de_media_interrupt() to switch media to TP when link goes
up (and media type is not locked) and also to update the link state only when
the TP port is used.

Also the NonselPortActive (and also SelPortActive) bits of SIAStatus register
need to be cleared (by writing 1) after reading or they're useless.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agode2104x: fix power management
Ondrej Zary [Fri, 24 Sep 2010 23:57:02 +0000 (23:57 +0000)]
de2104x: fix power management

At least my 21041 cards come out of suspend with bus mastering disabled so
they did not work after resume(no data transferred).
After adding pci_set_master(), the driver oopsed immediately on resume -
because de_clean_rings() is called on suspend but de_init_rings() call
was missing in resume.

Also disable link (reset SIA) before sleep (de4x5 does this too).

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoixgbevf: Refactor ring parameter re-size
Greg Rose [Thu, 23 Sep 2010 12:46:25 +0000 (12:46 +0000)]
ixgbevf: Refactor ring parameter re-size

The function to resize the Tx/Rx rings had the potential to
dereference a NULL pointer and the code would attempt to resize
the Tx ring even if the Rx ring allocation had failed.  This
would cause some confusion in the return code semantics.  Fixed
up to just unwind the allocations if any of them fail and return
an error.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agode2104x: disable autonegotiation on broken hardware
Ondrej Zary [Thu, 23 Sep 2010 10:59:18 +0000 (10:59 +0000)]
de2104x: disable autonegotiation on broken hardware

At least on older 21041-AA chips (mine is rev. 11), TP duplex autonegotiation
causes the card not to work at all (link is up but no packets are transmitted).

de4x5 disables autonegotiation completely. But it seems to work on newer
(21041-PA rev. 21) so disable it only on rev<20 chips.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: fix a lockdep splat
Eric Dumazet [Wed, 22 Sep 2010 12:43:39 +0000 (12:43 +0000)]
net: fix a lockdep splat

We have for each socket :

One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A]                         CPU2 [C]
read_lock(callback_lock)
<BH>                             spin_lock_bh(slock)
<wait to spin_lock(slock)>
                                 <wait to write_lock_bh(callback_lock)>

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa

It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agostmmac: review the wake-up support
Giuseppe Cavallaro [Sat, 25 Sep 2010 04:27:41 +0000 (21:27 -0700)]
stmmac: review the wake-up support

If the PM support is available this is passed
through the platform instead to be hard-coded
in the core files.
WoL on Magic Frame can be enabled by using
the ethtool support.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Add Gigabit Ethernet driver of Topcliff PCH
Masayuki Ohtake [Tue, 21 Sep 2010 01:44:11 +0000 (01:44 +0000)]
net: Add Gigabit Ethernet driver of Topcliff PCH

Signed-off-by: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoip: take care of last fragment in ip_append_data
Eric Dumazet [Mon, 20 Sep 2010 20:16:27 +0000 (20:16 +0000)]
ip: take care of last fragment in ip_append_data

While investigating a bit, I found ip_fragment() slow path was taken
because ip_append_data() provides following layout for a send(MTU +
N*(MTU - 20)) syscall :

- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
last fragment gets 17 bytes of trail data because of following bit:

if (datalen == length + fraggap)
alloclen += rt->dst.trailer_len;

Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug ?)

In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu,
so we take slow path, building another skb chain.

In order to avoid taking slow path, we should correct ip_append_data()
to make sure last fragment has real trail space, under mtu...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: return operator cleanup
Eric Dumazet [Wed, 22 Sep 2010 20:43:57 +0000 (20:43 +0000)]
net: return operator cleanup

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000: use GRO for receive
Jesse Brandeburg [Wed, 22 Sep 2010 18:23:05 +0000 (18:23 +0000)]
e1000: use GRO for receive

E1000 can benefit from calling the GRO receive functions.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000: fix occasional panic on unload
Jesse Brandeburg [Wed, 22 Sep 2010 18:22:42 +0000 (18:22 +0000)]
e1000: fix occasional panic on unload

Net drivers in general have an issue where timers fired
by mod_timer or work threads with schedule_work are running
outside of the rtnl_lock.

With no other lock protection these routines are vulnerable
to races with driver unload or reset paths.

The longer term solution to this might be a redesign with
safer locks being taken in the driver to guarantee no
reentrance, but for now a safe and effective fix is
to take the rtnl_lock in these routines.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000: use work queues
Jesse Brandeburg [Wed, 22 Sep 2010 18:22:17 +0000 (18:22 +0000)]
e1000: use work queues

E1000 is using several timers that in a follow on patch
will need to acquire the rtnl_lock in order to be safe.

This patch moves the timer bodies into work queues which
will allow the next patch to add rtnl_lock.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000/e1000e/igb/ixgb/ixgbe: set NETIF_F_HIGHDMA for VLAN feature flags
Yi Zou [Wed, 22 Sep 2010 17:57:58 +0000 (17:57 +0000)]
e1000/e1000e/igb/ixgb/ixgbe: set NETIF_F_HIGHDMA for VLAN feature flags

If the netdev->features is set with NETIF_F_HIGHDMA, we should set the
corresponding netdev->vlan_features as well to allow VLAN netdev created
on top of the real netdev to be able to also benefit from HIGHDMA on 32bit
system, reducing the performance hit that is caused by __skb_linearize(),
particularly for large send. This is fixed in this patch for all Intel e1000,
e1000e, igb, ixgbe, and ixgbe drivers since this should be beneficial
to all devices supported by these drivers.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigb: Add support for DH89xxCC
Joseph Gasparakis [Wed, 22 Sep 2010 17:56:44 +0000 (17:56 +0000)]
igb: Add support for DH89xxCC

This patch adds support for the Intel(r) DH89xxCC series. The new
device will be using Intel(r) i347-AT4 and Marvell(r) M88E1322 and
M88E1112 PHYs. Support for these devices has also been added here.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoigb: clear VF_PROMISC bits instead of setting all other bits
Alexander Duyck [Wed, 22 Sep 2010 17:56:20 +0000 (17:56 +0000)]
igb: clear VF_PROMISC bits instead of setting all other bits

This change corrects an issue in which we were setting all flag bits except
for promisc instead of clearing the promisc bits due to the incorrect use
of an |= instead of an &=.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: 82579 do not gate auto config of PHY by hardware during nominal use
Bruce Allan [Wed, 22 Sep 2010 17:17:01 +0000 (17:17 +0000)]
e1000e: 82579 do not gate auto config of PHY by hardware during nominal use

For non-managed versions of 82579, set the bit that prevents the hardware
from automatically configuring the PHY after resets only when the driver
performs a reset, clear the bit after resets.  This is so the hardware can
configure the PHY automatically when the part is reset in a manner that is
not controlled by the driver (e.g. in a virtual environment via PCI FLR)
otherwise the PHY will be mis-configured causing issues such as failing to
link at 1000Mbps.
For managed versions of 82579, keep the previous behavior since the
manageability firmware will handle the PHY configuration.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: 82579 jumbo frame workaround causing CRC errors
Bruce Allan [Wed, 22 Sep 2010 17:16:40 +0000 (17:16 +0000)]
e1000e: 82579 jumbo frame workaround causing CRC errors

The subject workaround was causing CRC errors due to writing the wrong
register with updates of the RCTL register.  It was also found that the
workaround function which modifies the RCTL register was being called in
the middle of a read-modify-write operation of the RCTL register, so the
function call has been moved appropriately.  Lastly, jumbo frames must not
be allowed when CRC stripping is disabled by a module parameter because the
workaround requires the CRC be stripped.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: 82579 unaccounted missed packets
Bruce Allan [Wed, 22 Sep 2010 17:16:18 +0000 (17:16 +0000)]
e1000e: 82579 unaccounted missed packets

On 82579, there is a hardware bug that can cause received packets to not
get transferred from the PHY to the MAC due to K1 (a power saving feature
of the PHY-MAC interconnect similar to ASPM L1).  Since the MAC controls
the accounting of missed packets, these will go unnoticed.  Workaround the
issue by setting the K1 beacon duration according to the link speed.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: 82566DC fails to get link
Bruce Allan [Wed, 22 Sep 2010 17:15:54 +0000 (17:15 +0000)]
e1000e: 82566DC fails to get link

Two recent patches to cleanup the reset[1] and initial PHY configuration[2]
code paths for ICH/PCH devices inadvertently left out a 10msec delay and
device ID check respectively which are necessary for the 82566DC (device id
0x104b) to be configured properly, otherwise it will not get link.

[1] commit e98cac447cc1cc418dff1d610a5c79c4f2bdec7f
[2] commit 3f0c16e84438d657d29446f85fe375794a93f159

CC: stable@kernel.org
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: 82579 SMBus address and LEDs incorrect after device reset
Bruce Allan [Wed, 22 Sep 2010 17:15:33 +0000 (17:15 +0000)]
e1000e: 82579 SMBus address and LEDs incorrect after device reset

Since the hardware is prevented from performing automatic PHY configuration
(the driver does it instead), the OEM_WRITE_ENABLE bit in the EXTCNF_CTRL
register will not get cleared preventing the SMBus address and the LED
configuration to be written to the PHY registers.  On 82579, do not check
the OEM_WRITE_ENABLE bit.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoe1000e: 82577/8/9 issues with device in Sx
Bruce Allan [Wed, 22 Sep 2010 17:15:08 +0000 (17:15 +0000)]
e1000e: 82577/8/9 issues with device in Sx

When going to Sx, disable gigabit in PHY (e1000_oem_bits_config_ich8lan)
in addition to the MAC before configuring PHY wakeup otherwise the PHY
configuration writes might be missed.  Also write the LED configuration
and SMBus address to the PHY registers (e1000_oem_bits_config_ich8lan and
e1000_write_smbus_addr, respectively).  The reset is no longer needed
since re-auto-negotiation is forced in e1000_oem_bits_config_ich8lan and
leaving it in causes issues with auto-negotiating the link.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoxfrm4: strip ECN bits from tos field
Ulrich Weber [Wed, 22 Sep 2010 06:45:11 +0000 (06:45 +0000)]
xfrm4: strip ECN bits from tos field

otherwise ECT(1) bit will get interpreted as RTO_ONLINK
and routing will fail with XfrmOutBundleGenError.

Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoatl1: zero out CMB and SBM in atl1_free_ring_resources
Luca Tettamanti [Wed, 22 Sep 2010 10:42:31 +0000 (10:42 +0000)]
atl1: zero out CMB and SBM in atl1_free_ring_resources

They are allocated in atl1_setup_ring_resources, zero out the pointers
in atl1_free_ring_resources (like the other resources).

Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoatl1: fix resume
Luca Tettamanti [Wed, 22 Sep 2010 10:41:58 +0000 (10:41 +0000)]
atl1: fix resume

adapter->cmb.cmb is initialized when the device is opened and freed when
it's closed. Accessing it unconditionally during resume results either
in a crash (NULL pointer dereference, when the interface has not been
opened yet) or data corruption (when the interface has been used and
brought down adapter->cmb.cmb points to a deallocated memory area).

Cc: stable@kernel.org
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoX.25 remove bkl in poll
andrew hendry [Tue, 21 Sep 2010 15:24:45 +0000 (15:24 +0000)]
X.25 remove bkl in poll

The x25_datagram_poll didn't add anything, removed it.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoX.25 remove bkl in getsockname
andrew hendry [Tue, 21 Sep 2010 15:24:25 +0000 (15:24 +0000)]
X.25 remove bkl in getsockname

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosfc: Add support for SFE4003 board and TXC43128 PHY
Ben Hutchings [Wed, 22 Sep 2010 10:00:47 +0000 (10:00 +0000)]
sfc: Add support for SFE4003 board and TXC43128 PHY

This board never went into production, but some engineering samples
are in use.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agosfc: Remove support for SFN4111T, SFT9001 and Falcon GMAC
Ben Hutchings [Wed, 22 Sep 2010 10:00:11 +0000 (10:00 +0000)]
sfc: Remove support for SFN4111T, SFT9001 and Falcon GMAC

SFN4111T never reached production and is not being used for internal
or customer testing.

Since we have no production Falcon boards using the SFT9001 or the
GMAC, remove support for them as well.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonet: Move "struct net" declaration inside the __KERNEL__ macro guard
Ollie Wild [Wed, 22 Sep 2010 05:54:54 +0000 (05:54 +0000)]
net: Move "struct net" declaration inside the __KERNEL__ macro guard

This patch reduces namespace pollution by moving the "struct net" declaration
out of the userspace-facing portion of linux/netlink.h.  It has no impact on
the kernel.

(This came up because we have several C++ applications which use "net" as a
namespace name.)

Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agonetfilter: nf_conntrack_defrag: check socket type before touching nodefrag flag
Jiri Olsa [Tue, 21 Sep 2010 21:17:34 +0000 (21:17 +0000)]
netfilter: nf_conntrack_defrag: check socket type before touching nodefrag flag

we need to check proper socket type within ipv4_conntrack_defrag
function before referencing the nodefrag flag.

For example the tun driver receive path produces skbs with
AF_UNSPEC socket type, and so current code is causing unwanted
fragmented packets going out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>