platform/kernel/linux-exynos.git
11 years agonetfilter: remove unneeded variable proc_net_netfilter
Pablo Neira Ayuso [Fri, 5 Apr 2013 17:40:10 +0000 (19:40 +0200)]
netfilter: remove unneeded variable proc_net_netfilter

Now that this supports net namespace for nflog and nfqueue,
we can remove the global proc_net_netfilter which has no
clients anymore.

Based on patch from Gao feng.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: nfnetlink_queue: add net namespace support for nfnetlink_queue
Gao feng [Sun, 24 Mar 2013 23:50:47 +0000 (23:50 +0000)]
netfilter: nfnetlink_queue: add net namespace support for nfnetlink_queue

This patch makes /proc/net/netfilter/nfnetlink_queue pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: enable per netns support for nf_loggers
Gao feng [Fri, 5 Apr 2013 15:04:35 +0000 (17:04 +0200)]
netfilter: enable per netns support for nf_loggers

After this patch, all nf_loggers support net namespace. Still
xt_LOG and ebt_log require syslog netns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: nfnetlink_log: add net namespace support for nfnetlink_log
Gao feng [Sun, 24 Mar 2013 23:50:45 +0000 (23:50 +0000)]
netfilter: nfnetlink_log: add net namespace support for nfnetlink_log

This patch makes /proc/net/netfilter/nfnetlink_log pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: ipt_ULOG: add net namespace support for ipt_ULOG
Gao feng [Sun, 24 Mar 2013 23:50:44 +0000 (23:50 +0000)]
netfilter: ipt_ULOG: add net namespace support for ipt_ULOG

Add pernet support to ipt_ULOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
nflognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: ebt_ulog: add net namespace support for ebt_ulog
Gao feng [Sun, 24 Mar 2013 23:50:43 +0000 (23:50 +0000)]
netfilter: ebt_ulog: add net namespace support for ebt_ulog

Add pernet support to ebt_ulog by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
ebtulognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: xt_LOG: add net namespace support for xt_LOG
Gao feng [Sun, 24 Mar 2013 23:50:42 +0000 (23:50 +0000)]
netfilter: xt_LOG: add net namespace support for xt_LOG

Add pernet support to xt_LOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: ebt_log: add net namespace support for ebt_log
Gao feng [Sun, 24 Mar 2013 23:50:41 +0000 (23:50 +0000)]
netfilter: ebt_log: add net namespace support for ebt_log

Add pernet support to ebt_log by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: nf_log: prepare net namespace support for loggers
Gao feng [Sun, 24 Mar 2013 23:50:40 +0000 (23:50 +0000)]
netfilter: nf_log: prepare net namespace support for loggers

This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: make /proc/net/netfilter pernet
Gao feng [Sun, 24 Mar 2013 23:50:39 +0000 (23:50 +0000)]
netfilter: make /proc/net/netfilter pernet

This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: fix struct ip6t_frag field description
Michal Kubeček [Tue, 2 Apr 2013 06:12:11 +0000 (08:12 +0200)]
netfilter: fix struct ip6t_frag field description

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashing
holger@eitzenberger.org [Sat, 23 Mar 2013 10:04:04 +0000 (10:04 +0000)]
netfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashing

Because rev1 and rev3 of the target share the same hashing
generalize it by introduing nfqueue_hash().

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: xt_NFQUEUE: introduce CPU fanout
holger@eitzenberger.org [Sat, 23 Mar 2013 10:04:03 +0000 (10:04 +0000)]
netfilter: xt_NFQUEUE: introduce CPU fanout

Current NFQUEUE target uses a hash, computed over source and
destination address (and other parameters), for steering the packet
to the actual NFQUEUE. This, however forgets about the fact that the
packet eventually is handled by a particular CPU on user request.

If E. g.

  1) IRQ affinity is used to handle packets on a particular CPU already
     (both single-queue or multi-queue case)

and/or

  2) RPS is used to steer packets to a specific softirq

the target easily chooses an NFQUEUE which is not handled by a process
pinned to the same CPU.

The idea is therefore to use the CPU index for determining the
NFQUEUE handling the packet.

E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it
looks like this:

 +-----+  +-----+  +-----+  +-----+
 |NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
 +-----+  +-----+  +-----+  +-----+
    ^        ^        ^        ^
    |        |NFQUEUE |        |
    +        +        +        +
 +-----+  +-----+  +-----+  +-----+
 |rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
 +-----+  +-----+  +-----+  +-----+

The NFQUEUEs not necessarily have to start with number 0, setups with
less NFQUEUEs than packet-handling CPUs are not a problem as well.

This patch extends the NFQUEUE target to accept a new
NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
CPU index for determining the NFQUEUE being used. I have to introduce
rev3 for this. The 'flags' are folded into _v2 'bypass'.

By changing the way which queue is assigned, I'm able to improve the
performance if the processes reading on the NFQUEUs are pinned
correctly.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: use IS_ENABLE to replace if defined in TRACE target
Gao feng [Thu, 21 Mar 2013 19:48:42 +0000 (19:48 +0000)]
netfilter: use IS_ENABLE to replace if defined in TRACE target

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agoipvs: do not disable bh for long time
Julian Anastasov [Fri, 22 Mar 2013 09:46:54 +0000 (11:46 +0200)]
ipvs: do not disable bh for long time

We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert services to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:53 +0000 (11:46 +0200)]
ipvs: convert services to rcu

This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
the schedulers are prepared for such RCU readers
even after done_service is called but they need
to use synchronize_rcu because last ip_vs_scheduler_put
can happen while RCU read-side critical sections
use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
If dests were present, the services are freed from
the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert dests to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:52 +0000 (11:46 +0200)]
ipvs: convert dests to rcu

In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert sched_lock to spin lock
Julian Anastasov [Fri, 22 Mar 2013 09:46:51 +0000 (11:46 +0200)]
ipvs: convert sched_lock to spin lock

As all read_locks are gone spin lock is preferred.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: do not expect result from done_service
Julian Anastasov [Fri, 22 Mar 2013 09:46:50 +0000 (11:46 +0200)]
ipvs: do not expect result from done_service

This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: reorganize dest trash
Julian Anastasov [Fri, 22 Mar 2013 09:46:49 +0000 (11:46 +0200)]
ipvs: reorganize dest trash

All dests will go to trash, no exceptions.
But we have to use new list node t_list for this, due
to RCU changes in following patches. Dests will wait there
initial grace period and later all conns and schedulers to
put their reference. The dests don't get reference for
staying in dest trash as before.

As result, we do not load ip_vs_dest_put with
extra checks for last refcnt and the schedulers do not
need to play games with atomic_inc_not_zero while
selecting best destination.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert wrr scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:48 +0000 (11:46 +0200)]
ipvs: convert wrr scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the weight for some
dest can be reduced during dest selection, change the
algorithm to check weights by using minimum weights in the
1 .. max_weight-(di-1) range, with the same step (di). By this
way we ensure that there will be always a weight >= 1 check
before claiming that all destinations are overloaded.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert wlc scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:47 +0000 (11:46 +0200)]
ipvs: convert wlc scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert sh scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:46 +0000 (11:46 +0200)]
ipvs: convert sh scheduler to rcu

Use the 3 new methods to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert sed scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:45 +0000 (11:46 +0200)]
ipvs: convert sed scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert rr scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:44 +0000 (11:46 +0200)]
ipvs: convert rr scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the previous entry
could be unlinked, limit the list traversals to 2 when
lookup started from previous entry.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert nq scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:43 +0000 (11:46 +0200)]
ipvs: convert nq scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert lc scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:42 +0000 (11:46 +0200)]
ipvs: convert lc scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert lblcr scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:41 +0000 (11:46 +0200)]
ipvs: convert lblcr scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. The set.lock is removed because now it is used in
rare cases, mostly under sched_lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert lblc scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:40 +0000 (11:46 +0200)]
ipvs: convert lblc scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. Use a dead flag to prevent new entries to be created
while scheduler is reclaimed. Use hlist for the hash table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert dh scheduler to rcu
Julian Anastasov [Fri, 22 Mar 2013 09:46:39 +0000 (11:46 +0200)]
ipvs: convert dh scheduler to rcu

Use the new add_dest and del_dest methods
to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: add ip_vs_dest_hold and ip_vs_dest_put
Julian Anastasov [Fri, 22 Mar 2013 09:46:38 +0000 (11:46 +0200)]
ipvs: add ip_vs_dest_hold and ip_vs_dest_put

ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: preparations for using rcu in schedulers
Julian Anastasov [Fri, 22 Mar 2013 09:46:37 +0000 (11:46 +0200)]
ipvs: preparations for using rcu in schedulers

Allow schedulers to use rcu_dereference when
returning destination on lookup. The RCU read-side critical
section will allow ip_vs_bind_dest to get dest refcnt as
preparation for the step where destinations will be
deleted without an IP_VS_WAIT_WHILE guard that holds the
packet processing during update.

Add new optional scheduler methods add_dest,
del_dest and upd_dest. For now the methods are called
together with update_service but update_service will be
removed in a following change.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: change ip_vs_sched_lock to mutex
Julian Anastasov [Fri, 22 Mar 2013 09:46:36 +0000 (11:46 +0200)]
ipvs: change ip_vs_sched_lock to mutex

The global list with schedulers ip_vs_schedulers
is accessed only from user context - configuration and
scheduler module [un]registration. Use ip_vs_sched_mutex
instead.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: avoid kmem_cache_zalloc in ip_vs_conn_new
Julian Anastasov [Thu, 21 Mar 2013 09:58:12 +0000 (11:58 +0200)]
ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new

We have many fields to set and few to reset,
use kmem_cache_alloc instead to save some cycles.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: reorder keys in connection structure
Julian Anastasov [Thu, 21 Mar 2013 09:58:11 +0000 (11:58 +0200)]
ipvs: reorder keys in connection structure

__ip_vs_conn_in_get and ip_vs_conn_out_get are
hot places. Optimize them, so that ports are matched first.
By moving net and fwmark below, on 32-bit arch we can fit
caddr in 32-byte cache line and all addresses in 64-byte
cache line.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert connection locking
Julian Anastasov [Thu, 21 Mar 2013 09:58:10 +0000 (11:58 +0200)]
ipvs: convert connection locking

Convert __ip_vs_conntbl_lock_array as follows:

- readers that do not modify conn lists will use RCU lock
- updaters that modify lists will use spinlock_t

Now for conn lookups we will use RCU read-side
critical section. Without using __ip_vs_conn_get such
places have access to connection fields and can
dereference some pointers like pe and pe_data plus
the ability to update timer expiration. If full access
is required we contend for reference.

We add barrier in __ip_vs_conn_put, so that
other CPUs see the refcnt operation after other writes.

With the introduction of ip_vs_conn_unlink()
we try to reorganize ip_vs_conn_expire(), so that
unhashing of connections that should stay more time is
avoided, even if it is for very short time.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert locks used in persistence engines
Julian Anastasov [Thu, 21 Mar 2013 09:58:09 +0000 (11:58 +0200)]
ipvs: convert locks used in persistence engines

Allow the readers to use RCU lock and for
PE module registrations use global mutex instead of
spinlock. All PE modules need to use synchronize_rcu
in their module exit handler.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: remove rs_lock by using RCU
Julian Anastasov [Thu, 21 Mar 2013 09:58:08 +0000 (11:58 +0200)]
ipvs: remove rs_lock by using RCU

rs_lock was used to protect rs_table (hash table)
from updaters (under global mutex) and readers (packet handlers).
We can remove rs_lock by using RCU lock for readers. Reclaiming
dest only with kfree_rcu is enough because the readers access
only fields from the ip_vs_dest structure.

Use hlist for rs_table.

As we are now using hlist_del_rcu, introduce in_rs_table
flag as replacement for the list_empty checks which do not
work with RCU. It is needed because only NAT dests are in
the rs_table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert app locks
Julian Anastasov [Thu, 21 Mar 2013 09:58:07 +0000 (11:58 +0200)]
ipvs: convert app locks

We use locks like tcp_app_lock, udp_app_lock,
sctp_app_lock to protect access to the protocol hash tables
from readers in packet context while the application
instances (inc) are [un]registered under global mutex.

As the hash tables are mostly read when conns are
created and bound to app, use RCU for readers and reclaim
app instance after grace period.

Simplify ip_vs_app_inc_get because we use usecnt
only for statistics and rely on module refcounting.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: optimize dst usage for real server
Julian Anastasov [Thu, 21 Mar 2013 09:58:06 +0000 (11:58 +0200)]
ipvs: optimize dst usage for real server

Currently when forwarding requests to real servers
we use dst_lock and atomic operations when cloning the
dst_cache value. As the dst_cache value does not change
most of the time it is better to use RCU and to lock
dst_lock only when we need to replace the obsoleted dst.
For this to work we keep dst_cache in new structure protected
by RCU. For packets to remote real servers we will use noref
version of dst_cache, it will be valid while we are in RCU
read-side critical section because now dst_release for replaced
dsts will be invoked after the grace period. Packets to
local real servers that are passed to local stack with
NF_ACCEPT need a dst clone.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: consolidate all dst checks on transmit in one place
Julian Anastasov [Thu, 21 Mar 2013 09:58:05 +0000 (11:58 +0200)]
ipvs: consolidate all dst checks on transmit in one place

Consolidate the PMTU checks, ICMP sending and
skb_dst modification in __ip_vs_get_out_rt and
__ip_vs_get_out_rt_v6. Now skb_dst is changed early
to simplify the transmitters.

Make sure update_pmtu is called only for local clients.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: do not use skb_share_check
Julian Anastasov [Thu, 21 Mar 2013 09:58:04 +0000 (11:58 +0200)]
ipvs: do not use skb_share_check

We run in contexts like ip_rcv, ipv6_rcv, br_handle_frame,
do not expect shared skbs.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: no need to reroute anymore on DNAT over loopback
Julian Anastasov [Thu, 21 Mar 2013 09:58:03 +0000 (11:58 +0200)]
ipvs: no need to reroute anymore on DNAT over loopback

After commit 70e7341673 (ipv4: Show that ip_send_reply()
is purely unicast routine.) we do not need to reroute DNAT-ed
traffic over loopback because reply uses iph daddr and not
rt_spec_dst.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: rename functions related to dst_cache reset
Julian Anastasov [Thu, 21 Mar 2013 09:58:02 +0000 (11:58 +0200)]
ipvs: rename functions related to dst_cache reset

Move and give better names to two functions:

- ip_vs_dst_reset to __ip_vs_dst_cache_reset
- __ip_vs_dev_reset to ip_vs_forget_dev

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: convert the IP_VS_XMIT macros to functions
Julian Anastasov [Thu, 21 Mar 2013 09:58:01 +0000 (11:58 +0200)]
ipvs: convert the IP_VS_XMIT macros to functions

It was a bad idea to hide return statements in macros.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: prefer NETDEV_DOWN event to free cached dsts
Julian Anastasov [Thu, 21 Mar 2013 09:58:00 +0000 (11:58 +0200)]
ipvs: prefer NETDEV_DOWN event to free cached dsts

The real server becomes unreachable on down event,
no need to wait device unregistration. Should help in
releasing dsts early before dst->dev is replaced with lo.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agoipvs: avoid routing by TOS for real server
Julian Anastasov [Thu, 21 Mar 2013 09:57:59 +0000 (11:57 +0200)]
ipvs: avoid routing by TOS for real server

Avoid replacing the cached route for real server
on every packet with different TOS. I doubt that routing
by TOS for real server is used at all, so we should be
better with such optimization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agonet: add skb_dst_set_noref_force
Julian Anastasov [Thu, 21 Mar 2013 09:57:58 +0000 (11:57 +0200)]
net: add skb_dst_set_noref_force

Rename skb_dst_set_noref to __skb_dst_set_noref and
add force flag as suggested by David Miller. The new wrapper
skb_dst_set_noref_force will force dst entries that are not
cached to be attached as skb dst without taking reference
as long as provided dst is reclaimed after RCU grace period.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
11 years agonet: add ETH_P_802_3_MIN
Simon Horman [Thu, 28 Mar 2013 04:38:25 +0000 (13:38 +0900)]
net: add ETH_P_802_3_MIN

Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
an 802.3 frame. Frames with a lower value in the ethernet type field
are Ethernet II.

Also update all the users of this value that David Miller and
I could find to use the new constant.

Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
should be >= not >.

As suggested by Jesse Gross.

Compile tested only.

Cc: David Miller <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Bart De Schuymer <bart.de.schuymer@pandora.be>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-wireless@vger.kernel.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-media@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: dev@openvswitch.org
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: fix compilation without CONFIG_BNX2X_SRIOV
Dmitry Kravkov [Wed, 27 Mar 2013 08:56:10 +0000 (08:56 +0000)]
bnx2x: fix compilation without CONFIG_BNX2X_SRIOV

Move mutex initialization by allocation of the mailbox it protects.

introduced in commit 1d6f3cd89 'bnx2x: Prevent VF race'

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotokenring: delete last holdout of CONFIG_TR
Paul Bolle [Wed, 27 Mar 2013 10:52:28 +0000 (10:52 +0000)]
tokenring: delete last holdout of CONFIG_TR

Tokenring support was deleted in v3.5. One last holdout of the macro
CONFIG_TR escaped that fate. Until now.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fix compile error of implicit declaration of skb_probe_transport_header
Ying Xue [Wed, 27 Mar 2013 16:46:06 +0000 (16:46 +0000)]
net: fix compile error of implicit declaration of skb_probe_transport_header

The commit 40893fd(net: switch to use skb_probe_transport_header())
involes a new error accidently. When NET_SKBUFF_DATA_USES_OFFSE is
not enabled, below compile error happens:

  CC      net/packet/af_packet.o
  net/packet/af_packet.c: In function ‘packet_sendmsg_spkt’:
  net/packet/af_packet.c:1516:2: error: implicit declaration of function ‘skb_probe_transport_header’ [-Werror=implicit-function-declaration]
  cc1: some warnings being treated as errors
  make[2]: *** [net/packet/af_packet.o] Error 1
  make[1]: *** [net/packet] Error 2
  make: *** [net] Error 2

As it seems skb_probe_transport_header() is not related to
NET_SKBUFF_DATA_USES_OFFSE, we should move the definition of
skb_probe_transport_header() out of scope of
NET_SKBUFF_DATA_USES_OFFSE macro.

Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoyam: remove redundant null check on dev
Colin Ian King [Wed, 27 Mar 2013 18:25:04 +0000 (18:25 +0000)]
yam: remove redundant null check on dev

yam_open has a redundant null check on null,  it will
never be called with dev == NULL. Remove this redundant check.
This also cleans up a smatch warning:

drivers/net/hamradio/yam.c:869 yam_open() warn: variable
  dereferenced before check 'dev' (see line 867)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Wed, 27 Mar 2013 17:52:49 +0000 (13:52 -0400)]
Merge git://git./linux/kernel/git/davem/net

Conflicts:
include/net/ipip.h

The changes made to ipip.h in 'net' were already included
in 'net-next' before that header was moved to another location.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
David S. Miller [Wed, 27 Mar 2013 17:14:03 +0000 (13:14 -0400)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

===================
this is a pull-request for net-next/master. It consists of three
patches by Lars-Peter Clausen to clean up the mcp251x spi-can driver,
two patches from Ludovic Desroches to bring device tree support to the
at91_can driver and a patch by me to fix a sparse warning in the
blackfin driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agophy: Elimination the forced speed reduction algorithm.
Kirill Kapranov [Wed, 27 Mar 2013 01:16:13 +0000 (01:16 +0000)]
phy: Elimination the forced speed reduction algorithm.

In case of fixed speed set up for a NIC (e.g. ethtool -s eth0 autoneg off speed
100 duplex full) with an ethernet cable plugged off, the mentioned algorithm
slows down a NIC speed, so further cable hook-up leads to nonoperable link state.

Signed-off-by: Kirill Kapranov <kapranoff@inbox.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: use the frag lru_lock to protect netns_frags.nqueues update
Jesper Dangaard Brouer [Wed, 27 Mar 2013 05:55:56 +0000 (05:55 +0000)]
net: use the frag lru_lock to protect netns_frags.nqueues update

Move the protection of netns_frags.nqueues updates under the LRU_lock,
instead of the write lock.  As they are located on the same cacheline,
and this is also needed when transitioning to use per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop
Jesper Dangaard Brouer [Wed, 27 Mar 2013 05:55:25 +0000 (05:55 +0000)]
net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop

The LRU list is protected by its own lock, since commit 3ef0eb0db4
(net: frag, move LRU list maintenance outside of rwlock), and
no-longer by a read_lock.

This makes it possible, to remove the inet_frag_queue, which is about
to be "evicted", from the LRU list head.  This avoids the problem, of
several CPUs grabbing the same frag queue.

Note, cannot remove the inet_frag_lru_del() call in fq_unlink()
called by inet_frag_kill(), because inet_frag_kill() is also used in
other situations.  Thus, we use list_del_init() to allow this
double list_del to work.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoppp: reuse print_hex_dump_bytes
Andy Shevchenko [Wed, 27 Mar 2013 05:54:14 +0000 (05:54 +0000)]
ppp: reuse print_hex_dump_bytes

There is a native function to dump hex buffers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: core: let's use native isxdigit instead of custom
Andy Shevchenko [Wed, 27 Mar 2013 05:54:13 +0000 (05:54 +0000)]
net: core: let's use native isxdigit instead of custom

In kernel we have fast and pretty implementation of the isxdigit() function.
Let's use it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Cosmetic changes
Yaniv Rosner [Wed, 27 Mar 2013 01:05:19 +0000 (01:05 +0000)]
bnx2x: Cosmetic changes

Make few alignments, comment fixes and debug messages.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Support reading I2C EEPROM SFF8472
Yaniv Rosner [Wed, 27 Mar 2013 01:05:18 +0000 (01:05 +0000)]
bnx2x: Support reading I2C EEPROM SFF8472

Add full support for "ethtool -m" command.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Prevent VF race
Dmitry Kravkov [Wed, 27 Mar 2013 01:05:17 +0000 (01:05 +0000)]
bnx2x: Prevent VF race

The mail box containing the Vf-Pf messages is susceptible
to a race - it's possible for 2 flows to try and write commands,
causing one to override the other's message.
Use a mutex to synchronize the access, preventing said race.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Fix VF outer vlan removal
Ariel Elior [Wed, 27 Mar 2013 01:05:16 +0000 (01:05 +0000)]
bnx2x: Fix VF outer vlan removal

Outer vlan removal in VF queues was made according to the VF's
multi-function mode (which is never set).
Instead, the PF's multi-function mode should be used to determine
if outer vlan removal is needed.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Fix VF statistics
Ariel Elior [Wed, 27 Mar 2013 01:05:15 +0000 (01:05 +0000)]
bnx2x: Fix VF statistics

After a VF performs load/unload its statistics become corrupt -
we now zero the statistics structures upon a VF device load.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: missing ARI should not be lethal
Ariel Elior [Wed, 27 Mar 2013 01:05:14 +0000 (01:05 +0000)]
bnx2x: missing ARI should not be lethal

If ARI forwarding flag is missing from the PCI bridge, remove SR-IOV
support instead of failing the probe process.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Fix AER semaphore release
Yuval Mintz [Tue, 26 Mar 2013 23:28:03 +0000 (23:28 +0000)]
bnx2x: Fix AER semaphore release

Commit 7fa6f34 "AER revised" erroneously inserted an error-flow
in which a semaphore is released even though the attempt to take it
has failed.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: switch to use skb_probe_transport_header()
Jason Wang [Tue, 26 Mar 2013 23:11:22 +0000 (23:11 +0000)]
net: switch to use skb_probe_transport_header()

Switch to use the new help skb_probe_transport_header() to do the l4 header
probing for untrusted sources. For packets with partial csum, the header should
already been set by skb_partial_csum_set().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: core: introduce skb_probe_transport_header()
Jason Wang [Tue, 26 Mar 2013 23:11:21 +0000 (23:11 +0000)]
net: core: introduce skb_probe_transport_header()

Sometimes, we need probe and set the transport header for packets (e.g from
untrusted source). This patch introduces a new helper
skb_probe_transport_header() which tries to probe and set the l4 header through
skb_flow_dissect(), if not just set the transport header to the hint passed by
caller.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: core: let skb_partial_csum_set() set transport header
Jason Wang [Tue, 26 Mar 2013 23:11:20 +0000 (23:11 +0000)]
net: core: let skb_partial_csum_set() set transport header

For untrusted packets with partial checksum, we need to set the transport header
for precise packet length estimation. We can just let skb_pratial_csum_set() to
do this to avoid extra call to skb_flow_dissect() and simplify the caller.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoptp_pch: eliminate a number of sparse warnings
Sahara [Tue, 26 Mar 2013 19:07:23 +0000 (19:07 +0000)]
ptp_pch: eliminate a number of sparse warnings

This fixes a number of sparse warnings like:
  warning: incorrect type in argument 2 (different address spaces)
    expected void volatile [noderef] <asn:2>*addr
    got unsigned int *<noident>

  warning: Using plain integer as NULL pointer

Additionally this fixes a warning from checkpatch.pl like:
  WARNING: sizeof pch_param.station should be sizeof(pch_param.station)

Signed-off-by: Sahara <keun-o.park@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge tag 'iommu-fixes-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 27 Mar 2013 16:25:11 +0000 (09:25 -0700)]
Merge tag 'iommu-fixes-v3.9-rc4' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "Here are some fixes which have collected since Linux v3.9-rc1.

  The most important one fixes a long-standing regressen which make
  re-hotplugged devices unusable when AMD IOMMU is used.

  The other patches fix build issues (build regression on OMAP and a
  section mismatch).  One patch just removes a duplicate header include."

* tag 'iommu-fixes-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Make sure dma_ops are set for hotplug devices
  x86, io_apic: remove duplicated include from irq_remapping.c
  iommu: OMAP: build only on OMAP2+
  amd_iommu_init: remove __init from amd_iommu_erratum_746_workaround

11 years agovfs/splice: Fix missed checks in new __kernel_write() helper
Al Viro [Wed, 27 Mar 2013 15:20:30 +0000 (15:20 +0000)]
vfs/splice: Fix missed checks in new __kernel_write() helper

Commit 06ae43f34bcc ("Don't bother with redoing rw_verify_area() from
default_file_splice_from()") lost the checks to test existence of the
write/aio_write methods.  My apologies ;-/

Eventually, we want that in fs/splice.c side of things (no point
repeating it for every buffer, after all), but for now this is the
obvious minimal fix.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agocan: bfin_can: declare locally used functions static
Marc Kleine-Budde [Thu, 11 Oct 2012 21:20:27 +0000 (23:20 +0200)]
can: bfin_can: declare locally used functions static

This patch fixes the following sparse warning:

drivers/net/can/bfin_can.c:415:13: warning: symbol 'bfin_can_interrupt' was not declared. Should it be static?
drivers/net/can/bfin_can.c:507:19: warning: symbol 'alloc_bfin_candev' was not declared. Should it be static?

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agocan: mcp251x: Use dev_pm_ops
Lars-Peter Clausen [Tue, 12 Mar 2013 12:13:53 +0000 (13:13 +0100)]
can: mcp251x: Use dev_pm_ops

Use dev_pm_ops instead of the deprecated legacy suspend/resume callbacks.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agocan: mcp251x: Use module_spi_driver
Lars-Peter Clausen [Tue, 12 Mar 2013 12:13:52 +0000 (13:13 +0100)]
can: mcp251x: Use module_spi_driver

By using module_spi_driver we can eliminate a few lines of boilerplate code.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agocan: mcp251x: Remove redundant spi driver bus initialization
Lars-Peter Clausen [Tue, 12 Mar 2013 12:13:51 +0000 (13:13 +0100)]
can: mcp251x: Remove redundant spi driver bus initialization

In ancient times it was necessary to manually initialize the bus field of an
spi_driver to spi_bus_type. These days this is done in spi_driver_register() so
we can drop the manual assignment.

The patch was generated using the following coccinelle semantic patch:
// <smpl>
@@
identifier _driver;
@@
struct spi_driver _driver = {
.driver = {
- .bus = &spi_bus_type,
},
};
// </smpl>

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agocan: Kconfig: CAN_AT91 depends on ARM
Ludovic Desroches [Mon, 11 Mar 2013 17:26:04 +0000 (18:26 +0100)]
can: Kconfig: CAN_AT91 depends on ARM

SAMA5D3 devices also embed CAN feature. Moreover if we want to produce a single
kernel image it is not useful to be too restrictive.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agocan: at91_can: add dt support
Ludovic Desroches [Mon, 11 Mar 2013 17:26:03 +0000 (18:26 +0100)]
can: at91_can: add dt support

Add device tree support.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agoiommu/amd: Make sure dma_ops are set for hotplug devices
Joerg Roedel [Tue, 26 Mar 2013 21:48:23 +0000 (22:48 +0100)]
iommu/amd: Make sure dma_ops are set for hotplug devices

There is a bug introduced with commit 27c2127 that causes
devices which are hot unplugged and then hot-replugged to
not have per-device dma_ops set. This causes these devices
to not function correctly. Fixed with this patch.

Cc: stable@vger.kernel.org
Reported-by: Andreas Degert <andreas.degert@googlemail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
11 years ago6lowpan: use IEEE802154_ADDR_LEN instead of a magic number
Tony Cheneau [Tue, 26 Mar 2013 18:09:25 +0000 (18:09 +0000)]
6lowpan: use IEEE802154_ADDR_LEN instead of a magic number

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years ago6lowpan: fix a small formatting issue
Tony Cheneau [Tue, 26 Mar 2013 18:09:24 +0000 (18:09 +0000)]
6lowpan: fix a small formatting issue

This formatting issue was introduced with commit
d4ac32365dcbfd341a87eae444c26679f889249a

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoieee802154/at86rf230: Fix register names for RX_AACK_ON and TX_ARET_ON
stefan@datenfreihafen.org [Tue, 26 Mar 2013 12:41:31 +0000 (12:41 +0000)]
ieee802154/at86rf230: Fix register names for RX_AACK_ON and TX_ARET_ON

The register names have been wrong since the beginning but it only showed up now
as they are actualy used for the upcoming auto ACK support.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoieee802154/at86rf230: Implement hardware address filter callback.
stefan@datenfreihafen.org [Tue, 26 Mar 2013 12:41:30 +0000 (12:41 +0000)]
ieee802154/at86rf230: Implement hardware address filter callback.

Implement the filter function to update short address, pan id and ieee
address on change. Allowing for hardware address filtering needed for
auto ACK.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoieee802154/dgram: Pass source address in dgram_recvmsg
Stephen Röttger [Tue, 26 Mar 2013 12:41:29 +0000 (12:41 +0000)]
ieee802154/dgram: Pass source address in dgram_recvmsg

This patch lets dgram_recvmsg fill in the sockaddr struct in
msg->msg_name with the source address of the packet.
This is used by the userland functions recvmsg and recvfrom to get the
senders address.

[Stefan: Changed from old zigbee legacy tree to mainline]

Signed-off-by: Stephen Röttger <stephen.roettger@zero-entropy.de>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoVXLAN: Fix sparse warnings.
Pravin B Shelar [Tue, 26 Mar 2013 08:29:30 +0000 (08:29 +0000)]
VXLAN: Fix sparse warnings.

Fixes following warning:-
drivers/net/vxlan.c:471:35: warning: symbol 'dev' shadows an earlier one
drivers/net/vxlan.c:433:26: originally declared here
drivers/net/vxlan.c:794:34: warning: symbol 'vxlan' shadows an earlier one
drivers/net/vxlan.c:757:26: originally declared here

CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 27 Mar 2013 00:42:55 +0000 (17:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "stable fodder; assorted deadlock fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vt: synchronize_rcu() under spinlock is not nice...
  Nest rename_lock inside vfsmount_lock
  Don't bother with redoing rw_verify_area() from default_file_splice_from()

11 years agovt: synchronize_rcu() under spinlock is not nice...
Al Viro [Wed, 27 Mar 2013 00:30:17 +0000 (20:30 -0400)]
vt: synchronize_rcu() under spinlock is not nice...

vcs_poll_data_free() calls unregister_vt_notifier(), which calls
atomic_notifier_chain_unregister(), which calls synchronize_rcu().
Do it *after* we'd dropped ->f_lock.

Cc: stable@vger.kernel.org (all kernels since 2.6.37)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoNest rename_lock inside vfsmount_lock
Al Viro [Tue, 26 Mar 2013 22:25:57 +0000 (18:25 -0400)]
Nest rename_lock inside vfsmount_lock

... lest we get livelocks between path_is_under() and d_path() and friends.

The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.

As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.

Spotted-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 26 Mar 2013 21:24:29 +0000 (14:24 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Always increment IPV4 ID field in encapsulated GSO packets, even
    when DF is set.  Regression fix from Pravin B Shelar.

 2) Fix per-net subsystem initialization in netfilter conntrack,
    otherwise we may access dynamically allocated memory before it is
    actually allocated.  From Gao Feng.

 3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka.

 4) Fix race between submission of sync vs async commands in mwifiex
    driver, from Amitkumar Karwar.

 5) Add missing cancel of command timer in mwifiex driver, from Bing
    Zhao.

 6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna.

 7) Thermal layer tries to use a genetlink multicast string that is
    longer than the 16 character limit.  Fix it and add a BUG check to
    prevent this kind of thing from happening in the future.

 From Masatake YAMATO.

 8) Fix many bugs in the handling of the teardown of L2TP connections,
    UDP encapsulation instances, and sockets.  From Tom Parkin.

 9) Missing socket release in IRDA, from Kees Cook.

10) Fix fec driver modular build, from Fabio Estevam.

11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop,
    from Wei Yongjun.

12) Fix bugs in handling of queue numbers and steering rules in mlx4
    driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz.

13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey
    Vagin.

14) TCP segmentation deferral is unintentionally done too strongly,
    breaking ACK clocking.  Fix from Eric Dumazet.

15) net_enable_timestamp() can legitimately be invoked from software
    interrupts, and in a way that is safe, so remove the WARN_ON().
    Also from Eric Dumazet.

16) Fix use after free in VLANs, from Cong Wang.

17) Fix TCP slow start retransmit storms after SACK reneging, from
    Yuchung Cheng.

18) Unix socket release should mark a socket dead before NULL'ing out
    sock->sk, otherwise we can race.  Fix from Paul Moore.

19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo.

20) Fix register mis-programming, NULL pointer derefs, and wrong PHC
    clock frequency in IGB driver.  From Lior LevyAlex Williamson, Jiri
    Benc, and Jeff Kirsher.

21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet
    forwarding.  Fix from Veaceslav Falico.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  ipv4: Fix ip-header identification for gso packets.
  bonding: remove already created master sysfs link on failure
  af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
  pch_gbe: fix ip_summed checksum reporting on rx
  igb: fix PHC stopping on max freq
  igb: make sensor info static
  igb: SR-IOV init reordering
  igb: Fix null pointer dereference
  igb: fix i350 anti spoofing config
  ixgbevf: don't release the soft entries
  ipv6: fix bad free of addrconf_init_net
  unix: fix a race condition in unix_release()
  tcp: undo spurious timeout after SACK reneging
  bnx2x: fix assignment of signed expression to unsigned variable
  bridge: fix crash when set mac address of br interface
  8021q: fix a potential use-after-free
  net: remove a WARN_ON() in net_enable_timestamp()
  tcp: preserve ACK clocking in TSO
  net: fix *_DIAG_MAX constants
  net/mlx4_core: Disallow releasing VF QPs which have steering rules
  ...

11 years agoMerge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Tue, 26 Mar 2013 21:23:45 +0000 (14:23 -0700)]
Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix an NFSv4 idmapper regression
 - Fix an Oops in the pNFS blocks client
 - Fix up various issues with pNFS layoutcommit
 - Ensure correct read ordering of variables in
   rpc_wake_up_task_queue_locked

* tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
  NFSv4.1: Add a helper pnfs_commit_and_return_layout
  NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
  NFSv4.1: Fix a race in pNFS layoutcommit
  pnfs-block: removing DM device maybe cause oops when call dev_remove
  NFSv4: Fix the string length returned by the idmapper

11 years agoipv4: Fix ip-header identification for gso packets.
Pravin B Shelar [Sun, 24 Mar 2013 17:36:29 +0000 (17:36 +0000)]
ipv4: Fix ip-header identification for gso packets.

ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab08127cebc25e3a26
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.

Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: remove already created master sysfs link on failure
Veaceslav Falico [Tue, 26 Mar 2013 16:43:28 +0000 (17:43 +0100)]
bonding: remove already created master sysfs link on failure

If slave sysfs symlink failes to be created - we end up without removing
the master sysfs symlink. Remove it in case of failure.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMAINTAINERS: Update qlge maintainers list
Shahed Shaikh [Tue, 26 Mar 2013 05:34:55 +0000 (05:34 +0000)]
MAINTAINERS: Update qlge maintainers list

Add myself to qlge maintainers list.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: fec: TX Buffer incorrectly initialized
Jim Baxter [Tue, 26 Mar 2013 05:25:07 +0000 (05:25 +0000)]
net: fec: TX Buffer incorrectly initialized

The TX Buffer in fec_enet_alloc_buffers was being initialized
with the receive register define BD_ENET_RX_INT instead of
the transmit register define BD_ENET_TX_INT

Signed-off-by: Jim Baxter <jim_baxter@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'stmmac'
David S. Miller [Tue, 26 Mar 2013 16:54:13 +0000 (12:54 -0400)]
Merge branch 'stmmac'

Giuseppe CAVALLARO says:

====================
These patches enhance the driver adding the PTP support and the initial code
for RGMII/SGMII/TBI/RTBI modes.
Also this patches review the driver removing some Koption for selecting between
chain and ring modes. REally useful to validate the driver also at build time.
Before adding PTP, the extended descriptor support has been added because it
is mandatory to save HW timestamp in new dedicated descriptors. Also in this
case no Koption added.

Concerning the PTP, I have hacked/reviewed and tested many
part of these patches also verifying the back compatibility on
several HW and chips.

Concerning the SGMII/RGMII we have already discussed about the support
in the net.dev Mailing list with Byungho where these patchs were partially
analysed.
So I have only ported them against the latest net-next (and on
top of PTP). I have added some missing things: e.g. some parts of the
ethtool for ANE. As we clarified with Byungho, we will add further
enhancements on top of these patches if needed.

I have also built all against ARM/SH/X68 platforms and no issues on
ST-Boxes.

Thx goes to Rayagond that wrote and tested the PTP and to Byungho for SGMII.

V2: This Version 2 has the fixes discussed in the ML, for example:
    o completely remove the Koption... all the decisions are made at probe time
    o review the PTP patches and better organize them just in two patches
    o added all the fixes provided by Richard on PTP and CLK driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agostmmac: update the Doc and Version (PTP+SGMII)
Giuseppe CAVALLARO [Tue, 26 Mar 2013 04:43:12 +0000 (04:43 +0000)]
stmmac: update the Doc and Version (PTP+SGMII)

This patch updates the stmmac.txt file adding information related to the PTP
and SGMII/RGMII supports.

Also the patch updates the driver version to: March_2013.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agostmmac: add the support for PTP hw clock driver
Rayagond Kokatanur [Tue, 26 Mar 2013 04:43:11 +0000 (04:43 +0000)]
stmmac: add the support for PTP hw clock driver

This patch implements PHC (ptp hardware clock) driver for stmmac
driver to support 1588 PTP.

V2: added support for FINE method, reduced loop delay and review spinlock.

Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Hacked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agostmmac: add IEEE PTPv1 and PTPv2 support.
Rayagond Kokatanur [Tue, 26 Mar 2013 04:43:10 +0000 (04:43 +0000)]
stmmac: add IEEE PTPv1 and PTPv2 support.

This patch enhances the stmmac driver to support IEEE 1588-2002
PTP (Precision Time Protocol) version 1 and IEEE 1588-2008 PPT
version 2.

Precision Time Protocol(PTP),which enables precise synchronization
of clocks in measurement and control systems implemented with
technologies such as network communication,local computing,
& distributed objects.

Both PTPv1 and PTPv2 is selected at run-time using the HW capability
register.

The PTPv1 TimeStamp support can be used on chips that have the normal
descriptor structures and PTPv2 TimeStamp support can be used on chips
that have the Extended descriptors(DES4-5-6-7). All such sanity checks
are done and verified by using HW capability register.

V2: in this version the ethtool support has been included in this patch;
Koptions have been completely removed (previously added to select
PTP and PTPv2). PTPv1 and PTPv2 is now added in a single patch instead of
two patches.
get_timestamp() and get_systemtime() L/H have been combined into single APIs.

Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agostmmac: add tx_skbuff_dma to save descriptors used by PTP
Rayagond Kokatanur [Tue, 26 Mar 2013 04:43:09 +0000 (04:43 +0000)]
stmmac: add tx_skbuff_dma to save descriptors used by PTP

This patch adds a new pointer variable called "tx_skbuff_dma" to private
data structure. This variable will holds the physical address of packet
to be transmitted & same will be used to free/unmap the memory once the
corresponding packet is transmitted by device.

Prior to this patch the descriptor buffer pointer(ie des2) itself was
being used for freeing/unmapping the buffer memory. But in case PTP v1
with normal descriptor the field(des2) will be overwritten by device
with timestamp value, hence driver will loose the buffer pointer to be
freed/unmapped.

Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>