platform/kernel/linux-arm64.git
10 years agotipc: remove unnecessary variables and conditions
wangweidong [Thu, 12 Dec 2013 01:36:39 +0000 (09:36 +0800)]
tipc: remove unnecessary variables and conditions

We remove a number of unnecessary variables and branches
in TIPC. This patch is cosmetic and does not change the
operation of TIPC in any way.

Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet-ipv6: Fix alleged compiler warning in ipv6_exthdrs_len()
Jerry Chu [Mon, 16 Dec 2013 02:48:07 +0000 (18:48 -0800)]
net-ipv6: Fix alleged compiler warning in ipv6_exthdrs_len()

It was reported that Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36
("net-gro: Prepare GRO stack for the upcoming tunneling support")
triggered a compiler warning in ipv6_exthdrs_len():

net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’:
net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-u
    opth = (void *)opth + optlen;
     ^
    net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here
    int len = 0, proto, optlen;
                        ^
Note that there was no real bug here - optlen was never uninitialized
before use. (Was the version of gcc I used smarter to not complain?)

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Sun, 15 Dec 2013 03:33:45 +0000 (22:33 -0500)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
1. Change PTP clock name to 'sfc'.
2. Complete support for hardware timestamping and PTP clock on the
SFC9100 family.
3. Various cleanups for the PTP code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: fix compiler warning in ipv6_exthdrs_len
Hannes Frederic Sowa [Sat, 14 Dec 2013 06:29:29 +0000 (07:29 +0100)]
ipv6: fix compiler warning in ipv6_exthdrs_len

Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO
stack for the upcoming tunneling support") used an uninitialized variable
which leads to the following compiler warning:

net/ipv6/ip6_offload.c: In function ‘ipv6_gro_complete’:
net/ipv6/ip6_offload.c:178:24: warning: ‘optlen’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    opth = (void *)opth + optlen;
                        ^
net/ipv6/ip6_offload.c:164:22: note: ‘optlen’ was declared here
  int len = 0, proto, optlen;
                      ^
Fix it up.

Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'bonding_rcu'
David S. Miller [Sat, 14 Dec 2013 06:58:19 +0000 (01:58 -0500)]
Merge branch 'bonding_rcu'

Ding Tianhong says:

====================
bonding: rebuild the lock use for bond monitor

Now the bond slave list is not protected by bond lock, only by RTNL,
but the monitor still use the bond lock to protect the slave list,
it is useless, according to the Veaceslav's opinion, there were
three way to fix the protect problem:

1. add bond_master_upper_dev_link() and bond_upper_dev_unlink()
   in bond->lock, but it is unsafe to call call_netdevice_notifiers()
   in write lock.
2. remove unused bond->lock for monitor function, only use the exist
   rtnl lock(), it will take performance loss in fast path.
3. use RCU to protect the slave list, of course, performance is better,
   but in slow path, it is ignored.

obviously the solution 1 is not fit here, I will consider the 2 and 3
solution. My principle is simple, if in fast path, RCU is better,
otherwise in slow path, both is well, but according to the Jay Vosburgh's
opinion, the monitor will loss performace if use RTNL to protect the all
slave list, so remove the bond lock and replace with RCU.

The second problem is the curr_slave_lock for bond, it is too old and
unwanted in many place, because the curr_active_slave would only be
changed in 3 place:

1. enslave slave.
2. release slave.
3. change active slave.

all above were already holding bond lock, RTNL and curr_slave_lock
together, it is tedious and no need to add so mach lock, when change
the curr_active_slave, you have to hold the RTNL and curr_slave_lock
together, and when you read the curr_active_slave, RTNL or curr_slave_lock,
any one of them is no problem.

for the stability, I did not change the logic for the monitor,
all change is clear and simple, I have test the patch set for lockdep,
it work well and stability.

v2. accept the Jay Vosburgh's opinion, remove the RTNL and replace with RCU,
    also add some rcu function for bond use, so the patch set reach 10.

v3. accept the Nikolay Aleksandrov's opinion, remove no needed bond_has_slave_rcu(),
    add protection for several 3ad mode handler functions and current_arp_slave.
    rebuild the bond_first_slave_rcu(), make it more clear.

v4. because the struct netdev_adjacent should not be exist in netdevice.h, so I have
    to make a new function to support micro bond_first_slave_rcu().
    also add a new patch to simplify the bond_resend_igmp_join_requests_delayed().

v5. according the Jay Vosburgh's opinion, in patch 2 and 6, the calling of notify
    peer is hardly to happen with the bond_xxx_commit() when the monitoring is running,
    so the performance impact about make two round trips to one trip on RTNL is minimal,
    no need to do that,the reason is very clear, so modify the patch 2 and 6, recover
    the notify peer in RTNL alone.
====================

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: rebuild the bond_resend_igmp_join_requests_delayed()
dingtianhong [Fri, 13 Dec 2013 02:20:26 +0000 (10:20 +0800)]
bonding: rebuild the bond_resend_igmp_join_requests_delayed()

The bond_resend_igmp_join_requests_delayed() and
bond_resend_igmp_join_requests() should be integrated,
because the bond_resend_igmp_join_requests_delayed() did
nothing except bond_resend_igmp_join_requests().

The bond igmp_retrans could only be changed in bond_change_active_slave
and here, bond_change_active_slave will be called in RTNL and curr_slave_lock,
the bond_resend_igmp_join_requests already hold RTNL, so no need
to free RTNL and hold curr_slave_lock again, it may be a small optimization,
so move the igmp_retrans in RTNL and remove the curr_slave_lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: remove unwanted lock for bond_store_primaryxxx()
dingtianhong [Fri, 13 Dec 2013 02:20:22 +0000 (10:20 +0800)]
bonding: remove unwanted lock for bond_store_primaryxxx()

The bond_select_active_slave() will not release and acquire
bond lock, so it is no need to read the bond lock for them,
and the bond_store_primaryxxx() is already in RTNL, so remove the
unwanted lock.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: remove unwanted lock for bond_option_active_slave_set()
dingtianhong [Fri, 13 Dec 2013 02:20:17 +0000 (10:20 +0800)]
bonding: remove unwanted lock for bond_option_active_slave_set()

The bond_option_active_slave_set() is always called in RTNL,
the RTNL could protect bond slave list, so remove the unwanted
bond lock.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add RCU for bond_3ad_state_machine_handler()
dingtianhong [Fri, 13 Dec 2013 02:20:12 +0000 (10:20 +0800)]
bonding: add RCU for bond_3ad_state_machine_handler()

The bond_3ad_state_machine_handler() use the bond lock to protect
the bond slave list and slave port together, but it is not enough,
the bond slave list was link and unlink in RTNL, not bond lock,
so I add RCU to protect the slave list from leaving.

The bond lock is still used here, because when the slave has been
removed from the list by the time the state machine runs, it appears
to be possible for both function to manupulate the same aggregator->lag_ports
by finding the aggregator via two different ports that are both members of
that aggregator (i.e., port A of the agg is being unbound, and port B
of the agg is runing its state machine).

If I remove the bond lock, there are nothing to mutex changes
to aggregator->lag_ports between bond_3ad_state_machine_handler and
bond_3ad_unbind_slave, So the bond lock is the simplest way to protect
aggregator->lag_ports.

There was a lot of function need RCU protect, I have two choice
to make the function in RCU-safe, (1) create new similar functions
and make the bond slave list in RCU. (2) modify the existed functions
and make them in read-side critical section, because the RCU
read-side critical sections may be nested.

I choose (2) because it is no need to create more similar functions.

The nots in the function is still too old, clean up the nots.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: remove unwanted lock for bond enslave and release
dingtianhong [Fri, 13 Dec 2013 02:20:07 +0000 (10:20 +0800)]
bonding: remove unwanted lock for bond enslave and release

The bond_change_active_slave() and bond_select_active_slave()
do't need bond lock anymore, so remove the unwanted bond lock
for these two functions.

The bond_select_active_slave() will release and acquire
curr_slave_lock, so the curr_slave_lock need to protect
the function.

In bond enslave and bond release, the bond slave list is also
protected by RTNL, so bond lock is no need to exist, remove
the lock and clean the functions.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: rebuild the lock use for bond_activebackup_arp_mon()
dingtianhong [Fri, 13 Dec 2013 02:20:02 +0000 (10:20 +0800)]
bonding: rebuild the lock use for bond_activebackup_arp_mon()

The bond_activebackup_arp_mon() use the bond lock for read to
protect the slave list, it is no effect, and the RTNL is only
called for bond_ab_arp_commit() and peer notify, for the performance
better, use RCU to replace with the bond lock, to the bond slave
list need to called in RCU, add a new bond_first_slave_rcu()
to get the first slave in RCU protection.

In bond_ab_arp_probe(), the bond->current_arp_slave may changd
if bond release slave, just like:

        bond_ab_arp_probe()                     bond_release()
        cpu 0                                   cpu 1
        ...
        if (bond->current_arp_slave...)         ...
        ...                             bond->current_arp_slave = NULl
        bond->current_arp_slave->dev->name      ...

So the current_arp_slave need to dereference in the section.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: create bond_first_slave_rcu()
dingtianhong [Fri, 13 Dec 2013 02:19:55 +0000 (10:19 +0800)]
bonding: create bond_first_slave_rcu()

The bond_first_slave_rcu() will be used to instead of bond_first_slave()
in rcu_read_lock().

According to the Jay Vosburgh's suggestion, the struct netdev_adjacent
should hide from users who wanted to use it directly. so I package a
new function to get the first slave of the bond.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: rebuild the lock use for bond_loadbalance_arp_mon()
dingtianhong [Fri, 13 Dec 2013 02:19:50 +0000 (10:19 +0800)]
bonding: rebuild the lock use for bond_loadbalance_arp_mon()

The bond_loadbalance_arp_mon() use the bond lock to protect the
bond slave list, it is no effect, so I could use RTNL or RCU to
replace it, considering the performance impact, the RCU is more
better here, so the bond lock replace with the RCU.

The bond_select_active_slave() need RTNL and curr_slave_lock
together, but there is no RTNL lock here, so add a rtnl_rtylock.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: rebuild the lock use for bond_alb_monitor()
dingtianhong [Fri, 13 Dec 2013 02:19:45 +0000 (10:19 +0800)]
bonding: rebuild the lock use for bond_alb_monitor()

The bond_alb_monitor use bond lock to protect the bond slave list,
it is no effect here, we need to use RTNL or RCU to replace bond lock,
the bond_alb_monitor will called 10 times one second, RTNL may loss
performance here, so I replace bond lock with RCU to protect the
bond slave list, also the RTNL is preserved, the logic of the monitor
did not changed.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: rebuild the lock use for bond_mii_monitor()
dingtianhong [Fri, 13 Dec 2013 02:19:39 +0000 (10:19 +0800)]
bonding: rebuild the lock use for bond_mii_monitor()

The bond_mii_monitor() still use bond lock to protect bond slave list,
it is no effect, I have 2 way to fix the problem, move the RTNL to the
top of the function, or add RCU to protect the bond slave list,
according to the Jay Vosburgh's opinion, 10 times one second is a
truely big performance loss if use RTNL to protect the whole monitor,
so I would take the advice and use RCU to protect the bond slave list.

The bond_has_slave() will not protect by anything, there will no things
happen if the slave list is be changed, unless the bond was free, but
it will not happened before the monitor, the bond will closed before
be freed.

The peers notify for the bond will calling curr_active_slave, so
derefence the slave to make sure we will accessing the same slave
if the curr_active_slave changed, as the rcu dereference need in
read-side critical sector and bond_change_active_slave() will call
it with no RCU hold,  so add peer notify in rcu_read_lock which
will be nested in monitor.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: remove the no effect lock for bond_select_active_slave()
dingtianhong [Fri, 13 Dec 2013 02:19:32 +0000 (10:19 +0800)]
bonding: remove the no effect lock for bond_select_active_slave()

The bond slave list was no longer protected by bond lock and only
protected by RTNL or RCU, so anywhere that use bond lock to protect
slave list is meaningless.

remove the release and acquire bond lock for bond_select_active_slave().

The curr_active_slave could only be changed in 3 place:

1. enslave slave.
2. release slave.
3. change_active_slave.

all above place were holding bond lock, RTNL and curr_slave_lock
together, it is tedious and meaningless, obviously bond lock is no
need here, but RTNL or curr_slave_lock is needed, so if you want
to access active slave, you have to choose one lock, RTNL or
curr_slave_lock, if RTNL is exist, no need to add curr_slave_lock,
otherwise curr_slave_lock is better, because of the performance.

there are several place calling bond_select_active_slave() and
bond_change_active_slave(), the next step I will clean these place
and remove the no effect lock.

there are some document changed together when update the function.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agopkt_sched: set root qdisc before change() in attach_default_qdiscs()
Eric Dumazet [Thu, 12 Dec 2013 23:41:56 +0000 (15:41 -0800)]
pkt_sched: set root qdisc before change() in attach_default_qdiscs()

After commit 95dc19299f74 ("pkt_sched: give visibility to mq slave
qdiscs") we call disc_list_add() while the device qdisc might be
the noop_qdisc one.

This shows up as duplicates in "tc qdisc show", as all inactive devices
point to noop_qdisc.

Fix this by setting dev->qdisc to the new qdisc before calling
ops->change() in attach_default_qdiscs()

Add a WARN_ON_ONCE() to catch any future similar problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Sat, 14 Dec 2013 06:11:22 +0000 (01:11 -0500)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
An assortment of changes for Linux 3.14:

1. Merge the sfc fixes that you have already merged into net.git.
   (The branch point for those was such that this does not bring in any
   other changes.)
2. Reduce log level for a generally useless warning message, from
   Robert Stonehouse.
3. Include BISTs in ethtool offline self-test for EF10 and recover from
   BISTs initiated through other functions, from Jon Cooper.
4. Improve a sanity check on RX completions.
5. Avoid incrementing RX dropped count while the interface is down, from
   Jon Cooper.
6. Improve hardware sensor naming and log messages, from Edward Cree.
7. Log all unexpected errors returned by firmware, from Edward Cree.
8. Expose another NVRAM partition to userland.
9. Some refactoring of the PTP code in preparation for EF10 support.
10. Various minor cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'bonding_netlink'
David S. Miller [Sat, 14 Dec 2013 06:08:14 +0000 (01:08 -0500)]
Merge branch 'bonding_netlink'

Scott Feldman says:

====================
bonding: add more netlink attributes

v2:

Addressed v1 review comments.  In particular, Jay's concern about
current sysfs ordering limitations carrying over to iproute.  Netlink
attributes are processed in a priority order in
bond_netlink.c:bond_changelink().  Lower priority attributes can't undo
higher priority attributes when attempting to set both with iproute
command.  For example, this command will fail:

  ip link add bond1 type bond mode active-backup miimon 10 arp_interval 10

Because we're trying to create a new bond to use incompatible miimon
and ARP interval attributes.  However, if attributes are applied
one-at-a-time, previously applied attributes can be overridden:

  ip link add bond1 type bond mode active-backup miimon 10
  ip link set dev bond1 type bond arp_interval 10

These two commands succeed.  The bond is first created to use miimon.
Next, the bond is converted to use ARP interval, which undoes miimon.

v1:

Following Jiri Pirko's lead, add more bonding netlink attributes.  Sending
matching iproute2 patch separately.  sysfs access to attributes is
retained.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add arp_all_targets netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:10:45 +0000 (14:10 -0800)]
bonding: add arp_all_targets netlink support

Add IFLA_BOND_ARP_ALL_TARGETS to allow get/set of bonding parameter
arp_all_targets via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add arp_validate netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:10:38 +0000 (14:10 -0800)]
bonding: add arp_validate netlink support

Add IFLA_BOND_ARP_VALIDATE to allow get/set of bonding parameter
arp_validate via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add arp_ip_target netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:10:31 +0000 (14:10 -0800)]
bonding: add arp_ip_target netlink support

Add IFLA_BOND_ARP_IP_TARGET to allow get/set of bonding parameter
arp_ip_target via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add arp_interval netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:10:24 +0000 (14:10 -0800)]
bonding: add arp_interval netlink support

Add IFLA_BOND_ARP_INTERVAL to allow get/set of bonding parameter
arp_interval via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add use_carrier netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:10:16 +0000 (14:10 -0800)]
bonding: add use_carrier netlink support

Add IFLA_BOND_USE_CARRIER to allow get/set of bonding parameter
use_carrier via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add downdelay netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:10:09 +0000 (14:10 -0800)]
bonding: add downdelay netlink support

Add IFLA_BOND_DOWNDELAY to allow get/set of bonding parameter
downdelay via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add updelay netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:10:02 +0000 (14:10 -0800)]
bonding: add updelay netlink support

Add IFLA_BOND_UPDELAY to allow get/set of bonding parameter
updelay via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add miimon netlink support
sfeldma@cumulusnetworks.com [Thu, 12 Dec 2013 22:09:55 +0000 (14:09 -0800)]
bonding: add miimon netlink support

Add IFLA_BOND_MIIMON to allow get/set of bonding parameter
miimon via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agopacket: fix using smp_processor_id() in preemptible code
Li Zhong [Thu, 12 Dec 2013 21:39:55 +0000 (22:39 +0100)]
packet: fix using smp_processor_id() in preemptible code

This patches fixes the following warning by replacing smp_processor_id()
with raw_smp_processor_id():

[   11.120893] BUG: using smp_processor_id() in preemptible [00000000] code: arping/3510
[   11.120913] caller is .packet_sendmsg+0xc14/0xe68
[   11.120920] CPU: 13 PID: 3510 Comm: arping Not tainted 3.13.0-rc3-next-20131211-dirty #1
[   11.120926] Call Trace:
[   11.120932] [c0000001f803f6f0] [c0000000000138dc] .show_stack+0x110/0x25c (unreliable)
[   11.120942] [c0000001f803f7e0] [c00000000083dd24] .dump_stack+0xa0/0x37c
[   11.120951] [c0000001f803f870] [c000000000493fd4] .debug_smp_processor_id+0xfc/0x12c
[   11.120959] [c0000001f803f900] [c0000000007eba78] .packet_sendmsg+0xc14/0xe68
[   11.120968] [c0000001f803fa80] [c000000000700968] .sock_sendmsg+0xa0/0xe0
[   11.120975] [c0000001f803fbf0] [c0000000007014d8] .SyS_sendto+0x100/0x148
[   11.120983] [c0000001f803fd60] [c0000000006fff10] .SyS_socketcall+0x1c4/0x2e8
[   11.120990] [c0000001f803fe30] [c00000000000a1e4] syscall_exit+0x0/0x9c

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonetconf: add proxy-arp support
stephen hemminger [Thu, 12 Dec 2013 21:06:50 +0000 (13:06 -0800)]
netconf: add proxy-arp support

Add support to netconf to show changes to proxy-arp status on a per
interface basis via netlink in a manner similar to forwarding
and reverse path state.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosfc: Remove unnecessary condition for processing the TX timestamp queue
Ben Hutchings [Thu, 28 Nov 2013 18:58:12 +0000 (18:58 +0000)]
sfc: Remove unnecessary condition for processing the TX timestamp queue

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Don't clear timestamps in efx_ptp_rx()
Ben Hutchings [Thu, 28 Nov 2013 18:58:11 +0000 (18:58 +0000)]
sfc: Don't clear timestamps in efx_ptp_rx()

A freshly allocated skb starts with timestamps clear.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Enable PTP clock and timestamping for all functions on EF10
Ben Hutchings [Thu, 5 Dec 2013 21:28:42 +0000 (21:28 +0000)]
sfc: Enable PTP clock and timestamping for all functions on EF10

The SFC9100 family has only one clock per controller, shared by all
functions.  Therefore only create a clock device under the primary
function, and make all other functions refer to the primary's clock
device.

Since PTP functionality is limited to port 0 and PF 0 on the earlier
SFN[56]322F boards, and we also set the primary flag for that
function, we can make the creation of a clock device conditional only
on this flag.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Associate primary and secondary functions of controller
Ben Hutchings [Fri, 18 Oct 2013 18:21:45 +0000 (19:21 +0100)]
sfc: Associate primary and secondary functions of controller

The primary function of an EF10 controller will share its clock
device with other functions in the same domain (which we call
secondary functions).  To this end, we need to associate functions
on the same controller.

We do not control probe order, so allow primary and secondary
functions to appear in any order.  Maintain global lists of all
primary functions and of unassociated secondary functions,
and a list of secondary functions on each primary function.

Use the VPD serial number to tell whether functions are part of the
same controller.  VPD will not be readable by virtual functions, so
this may need to be revisited later.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Store VPD serial number at probe time
Ben Hutchings [Thu, 5 Dec 2013 20:13:22 +0000 (20:13 +0000)]
sfc: Store VPD serial number at probe time

Original version by Stuart Hodgson.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Add RX packet timestamping for EF10
Jon Cooper [Mon, 18 Nov 2013 12:54:41 +0000 (12:54 +0000)]
sfc: Add RX packet timestamping for EF10

The EF10 firmware can optionally insert RX timestamps in the packet
prefix.  These only include the clock minor value.  We must also
enable periodic time sync events on each event queue which provide
the high bits of the clock value.

[bwh: Combined and rebased several changes.
 Added the above description and some sanity checks for inline vs
 separate timestamps.
 Changed efx_rx_skb_attach_timestamp() to read the packet prefix
 from the skb head area.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Copy RX prefix into skb head area in efx_rx_mk_skb()
Ben Hutchings [Thu, 28 Nov 2013 18:58:11 +0000 (18:58 +0000)]
sfc: Copy RX prefix into skb head area in efx_rx_mk_skb()

We can potentially pull the entire packet contents into the head area
and then free the page it was in.  In order to read an inline
timestamp safely, we need to copy the prefix into the head area as
well.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: split setup of hardware timestamping into NIC-type operation
Daniel Pieczko [Thu, 21 Nov 2013 17:11:25 +0000 (17:11 +0000)]
sfc: split setup of hardware timestamping into NIC-type operation

I added efx_ptp_get_mode() to avoid moving the definition for
efx_ptp_data, since the current PTP mode is needed for
siena.c:siena_set_ptp_hwtstamp.

[bwh: Also move the rx_filters mask, and add kernel-doc]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Add support for SFC9100 timestamp format
Laurence Evans [Wed, 4 Dec 2013 23:47:56 +0000 (23:47 +0000)]
sfc: Add support for SFC9100 timestamp format

The clock minor tick on the SFC9100 family is 2^-27 s, not 1 ns.
There are also various pipeline delays which we need to correct for
when interpreting timestamps.

We query the firmware for the clock format and corrections at run-time.

[bwh: Combined and rebased several changes]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Tidy up PTP synchronization code
Laurence Evans [Thu, 21 Nov 2013 10:38:24 +0000 (10:38 +0000)]
sfc: Tidy up PTP synchronization code

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: PTP - tidy up unused/useless variables
Laurence Evans [Thu, 21 Nov 2013 10:38:24 +0000 (10:38 +0000)]
sfc: PTP - tidy up unused/useless variables

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Remove kernel-doc for efx_ptp_data fields not present in this version
Ben Hutchings [Fri, 6 Dec 2013 18:52:34 +0000 (18:52 +0000)]
sfc: Remove kernel-doc for efx_ptp_data fields not present in this version

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Initialise efx_ptp_data::phc_clock_info from a static template
Ben Hutchings [Wed, 16 Oct 2013 17:32:41 +0000 (18:32 +0100)]
sfc: Initialise efx_ptp_data::phc_clock_info from a static template

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Do not use MAC address as clock name
Ben Hutchings [Wed, 16 Oct 2013 17:32:39 +0000 (18:32 +0100)]
sfc: Do not use MAC address as clock name

We'll be sharing clocks between multiple functions with their own MAC
addresses.  The name field is now documented as 'A short "friendly
name" to identify the clock ...' and '... not meant to be a unique
id.'  So use the name 'sfc'.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Store flags from MC_CMD_DRV_ATTACH for later use
Ben Hutchings [Wed, 16 Oct 2013 17:32:34 +0000 (18:32 +0100)]
sfc: Store flags from MC_CMD_DRV_ATTACH for later use

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Remove dependency of PTP on having a dedicated channel
Ben Hutchings [Tue, 15 Oct 2013 16:54:56 +0000 (17:54 +0100)]
sfc: Remove dependency of PTP on having a dedicated channel

We need a dedicated channel on Siena to ensure we can match up
the separate RX and timestamp events for each PTP packet.  We won't
do this for EF10 as timestamps are delivered inline.

Pass a channel index of 0 to MC_CMD_PTP_OP_ENABLE when there is no
dedicated channel.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Split PTP multicast filter insertion/removal out of efx_ptp_{start,stop}()
Ben Hutchings [Tue, 15 Oct 2013 16:54:56 +0000 (17:54 +0100)]
sfc: Split PTP multicast filter insertion/removal out of efx_ptp_{start,stop}()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Return EBUSY for filter insertion on EF10, matching Falcon/Siena
Ben Hutchings [Wed, 9 Oct 2013 13:17:27 +0000 (14:17 +0100)]
sfc: Return EBUSY for filter insertion on EF10, matching Falcon/Siena

The MC firmware will return error MC_CMD_ERR_ENOSPC if filter
insertion fails due to lack of resources.  The net driver's filter
implementation for Falcon-architecture returns EBUSY.  They should
behave consistently, so for EF10 change ENOSPC to EBUSY.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Expose NVRAM_PARTITION_TYPE_LICENSE on EF10
Ben Hutchings [Wed, 9 Oct 2013 13:14:41 +0000 (14:14 +0100)]
sfc: Expose NVRAM_PARTITION_TYPE_LICENSE on EF10

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Fold efx_flush_all() into efx_stop_port() and update comments
Ben Hutchings [Tue, 8 Oct 2013 16:33:20 +0000 (17:33 +0100)]
sfc: Fold efx_flush_all() into efx_stop_port() and update comments

efx_flush_all() is a really misleading name - it has nothing to do
with e.g. flushing DMA queues.  Since it's called immediately after
efx_stop_port() and is highly dependent on what that does, combine
the two functions.

Update comments to explain what this is doing a little better.
Also update an related and erroneous comment in efx_start_port().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Map MCDI error MC_CMD_ERR_ENOTSUP to Linux EOPNOTSUPP
Ben Hutchings [Tue, 8 Oct 2013 15:36:58 +0000 (16:36 +0100)]
sfc: Map MCDI error MC_CMD_ERR_ENOTSUP to Linux EOPNOTSUPP

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Log all unexpected MCDI errors
Edward Cree [Fri, 31 May 2013 17:36:12 +0000 (18:36 +0100)]
sfc: Log all unexpected MCDI errors

Split each of efx_mcdi_rpc, efx_mcdi_rpc_finish, and efx_mcdi_rpc_async into
a normal and a _quiet version; made the former log MCDI errors with
netif_err (and include the raw MCDI error code), and the latter never log
them at all.  Changed various callers; any where some errors are expected
(but others are not) call the _quiet version and then if necessary log the
MCDI error themselves.  Said logging is done by new efx_mcdi_display_error.

Callers of efx_mcdi_rpc*_quiet functions which may want to log the error
need to ensure that their outbuf is big enough to hold an MCDI error; to
this end, they now use MCDI_DECLARE_BUF_OUT_OR_ERR, which always allocates
at least 8 bytes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Add new sensor names
Ben Hutchings [Wed, 4 Dec 2013 20:17:28 +0000 (20:17 +0000)]
sfc: Add new sensor names

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Revise sensor names to be more understandable and consistent
Edward Cree [Thu, 3 Oct 2013 18:06:18 +0000 (19:06 +0100)]
sfc: Revise sensor names to be more understandable and consistent

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Report units in sensor warnings
Edward Cree [Mon, 30 Sep 2013 09:52:49 +0000 (10:52 +0100)]
sfc: Report units in sensor warnings

Add units to the "Sensor reports condition X for raw value Y" messages.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Correct RX dropped count for drops while interface is down
Jon Cooper [Mon, 30 Sep 2013 16:36:50 +0000 (17:36 +0100)]
sfc: Correct RX dropped count for drops while interface is down

We don't directly control RX ingress on Siena or any later
controllers, and so we cannot prevent packets from entering the RX
datapath while the RX queues are not set up.  This results in
the hardware incrementing RX_NODESC_DROP_CNT, but it's not an
error and we should not include it in error stats.

When bringing an interface up or down, pull (or wait for) stats and
count the number of packets that were dropped while the interface was
down.  Subtract this from the reported RX dropped count.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Make initial fill of RX descriptors synchronous
Jon Cooper [Wed, 2 Oct 2013 10:04:14 +0000 (11:04 +0100)]
sfc: Make initial fill of RX descriptors synchronous

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Tighten the check for RX merged completion events
Ben Hutchings [Tue, 24 Sep 2013 22:21:57 +0000 (23:21 +0100)]
sfc: Tighten the check for RX merged completion events

The addition of RX event merging support means we don't reliably
detect dropped RX events now.  Currently we will only detect them if
the previous event for the RX queue had the CONT bit set.

Only accept RX completion events as merged if the
GET_CAPABILITIES_OUT_RX_BATCHING bit is set in datapath_caps (which it
won't be for the low-latency datapath) and the CONT bit is not set on
the event.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agosfc: Add MC BISTs to ethtool offline self test on EF10
Jon Cooper [Mon, 16 Sep 2013 13:18:51 +0000 (14:18 +0100)]
sfc: Add MC BISTs to ethtool offline self test on EF10

To run BISTs the MC goes down in to a special mode where it will only
respond to MCDI from the testing PF, and TX, RX and event queues are
torn down. Other PFs get a message as it goes down to tell them it's
going down.

When the other PFs get this message, they check the soft status
register to tell when the MC has rebooted after BIST mode and they can
start recovery.

[bwh: Convert the test result to 1 or -1 as for earlier NICs]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
10 years agoipv6: fix incorrect type in declaration
Florent Fourcot [Thu, 12 Dec 2013 16:07:58 +0000 (17:07 +0100)]
ipv6: fix incorrect type in declaration

Introduced by 1397ed35f22d7c30d0b89ba74b6b7829220dfcfd
  "ipv6: add flowinfo for tcp6 pkt_options for all cases"

Reported-by: kbuild test robot <fengguang.wu@intel.com>
V2: fix the title, add empty line after the declaration (Sergei Shtylyov
feedbacks)

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: eth: 8390: remove section warning in etherh.c
Olof Johansson [Thu, 12 Dec 2013 08:39:37 +0000 (00:39 -0800)]
net: eth: 8390: remove section warning in etherh.c

Commit c45f812f0280 ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_*
feature') ended up moving the printout of version[] from something that
will be compiled out due to defines, to something that is now evaluated
at runtime.

That means that what always used to be an access to an __initdata string
from non-__init code started showing up as a section mismatch when it
didn't before.

All other 8390 versions skip __initdata on the version string, and
starting to annotate the whole chain of callers with __init seems like
more churn than it's worth on this driver, so remove it from etherh.c as well.

Fixes: c45f812f0280 ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature')
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet-gro: Prepare GRO stack for the upcoming tunneling support
Jerry Chu [Thu, 12 Dec 2013 04:53:45 +0000 (20:53 -0800)]
net-gro: Prepare GRO stack for the upcoming tunneling support

This patch modifies the GRO stack to avoid the use of "network_header"
and associated macros like ip_hdr() and ipv6_hdr() in order to allow
an arbitary number of IP hdrs (v4 or v6) to be used in the
encapsulation chain. This lays the foundation for various IP
tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.

With this patch, the GRO stack traversing now is mostly based on
skb_gro_offset rather than special hdr offsets saved in skb (e.g.,
skb->network_header). As a result all but the top layer (i.e., the
the transport layer) must have hdrs of the same length in order for
a pkt to be considered for aggregation. Therefore when adding a new
encap layer (e.g., for tunneling), one must check and skip flows
(e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
different hdr length.

Note that unlike the network header, the transport header can and
will continue to be set by the GRO code since there will be at
most one "transport layer" in the encap chain.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'macvtap_capture'
David S. Miller [Thu, 12 Dec 2013 18:38:46 +0000 (13:38 -0500)]
Merge branch 'macvtap_capture'

Vlad Yasevich says:

====================
Add packet capture support on macvtap device

Change from RFC:
  - moved to the rx_handler approach.

This series adds support for packet capturing on macvtap device.
The initial approach was to simply export the capturing code as
a function from the core network.  While simple, it was not
a very architecturally clean approach.

The new appraoch is to provide macvtap with its rx_handler which can
is attached to the macvtap device itself.   Macvlan will simply requeue
the packet with an updated skb->dev.  BTW, macvlan layer already does this
for macvlan devices.  So, now macvtap and macvlan have almost the
same exact input path.

I've toyed with short-circuting the input path for macvtap by returning
RX_HANDLER_ANOTHER, but that just made the code more complicated and
didn't provide any kind of measurable gain (at least according to
netperf and perf runs on the host).

To see if there was a performance regression, I ran 1, 2 and 4 netperf
STREAM and MAERTS tests agains the VM from both remote host and another
guest on the same system.   The command ran was
    netperf -H $host -t $test -l 20 -i 10 -I 95 -c -C

The numbers I was getting with the new code were consistently very
slightly (1-2%) better then the old code.  I don't consider this
an improvement, but it's not a regression! :)

Running 'perf record' on the host didn't show any new hot spots
and cpu utilization stayed about the same.  This was better
then I expected from simply looking at the code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agomacvlan: Remove custom recieve and forward handlers
Vlad Yasevich [Wed, 11 Dec 2013 18:27:11 +0000 (13:27 -0500)]
macvlan: Remove custom recieve and forward handlers

Since now macvlan and macvtap use the same receive and
forward handlers, we can remove them completely and use
netif_rx and dev_forward_skb() directly.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agomacvtap: Add support of packet capture on macvtap device.
Vlad Yasevich [Wed, 11 Dec 2013 18:27:10 +0000 (13:27 -0500)]
macvtap: Add support of packet capture on macvtap device.

Macvtap device currently doesn not allow a user to capture
traffic on due to the fact that it steals the packets
from the network stack before the skb->dev is set correctly
on the receive side, and that use uses macvlan transmit
path directly on the send side.  As a result, we never
get a change to give traffic to the taps while the correct
device is set in the skb.

This patch makes macvtap device behave almost exaclty like
macvlan.  On the send side, we switch to using dev_queue_xmit().
On the receive side, to deliver packets to macvtap, we now
use netif_rx and dev_forward_skb just like macvlan.  The only
differnce now is that macvtap has its own rx_handler which is
attached to the macvtap netdev.  It is here that we now steal
the packet and provide it to the socket.

As a result, we can now capture traffic on the macvtap device:
   tcpdump -i macvtap0

It also gives us the abilit to add tc actions to the macvtap
device and actually utilize different bandwidth management
queues on output.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'bpf'
David S. Miller [Thu, 12 Dec 2013 01:28:41 +0000 (20:28 -0500)]
Merge branch 'bpf'

Daniel Borkmann says:

====================
bpf/filter updates

This set adds just two minimal helper tools that complement the
already available bpf_jit_disasm and complete BPF tooling; plus
it adds and an extensive documentation update of filter.txt.

Please see individual descriptions for details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agofilter: doc: improve BPF documentation
Daniel Borkmann [Wed, 11 Dec 2013 22:43:45 +0000 (23:43 +0100)]
filter: doc: improve BPF documentation

This patch significantly updates the BPF documentation and describes
its internal architecture, Linux extensions, and handling of the
kernel's BPF and JIT engine, plus documents how development can be
facilitated with the help of bpf_dbg, bpf_asm, bpf_jit_disasm.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agofilter: bpf_asm: add minimal bpf asm tool
Daniel Borkmann [Wed, 11 Dec 2013 22:43:44 +0000 (23:43 +0100)]
filter: bpf_asm: add minimal bpf asm tool

There are a couple of valid use cases for a minimal low-level bpf asm
like tool, for example, using/linking to libpcap is not an option, the
required BPF filters use Linux extensions that are not supported by
libpcap's compiler, a filter might be more complex and not cleanly
implementable with libpcap's compiler, particular filter codes should
be optimized differently than libpcap's internal BPF compiler does,
or for security audits of emitted BPF JIT code for prepared set of BPF
instructions resp. BPF JIT compiler development in general.

Then, in such cases writing such a filter in low-level syntax can be
an good alternative, for example, xt_bpf and cls_bpf users might have
requirements that could result in more complex filter code, or one that
cannot be expressed with libpcap (e.g. different return codes in
cls_bpf for flowids on various BPF code paths).

Moreover, BPF JIT implementors may wish to manually write test cases
in order to verify the resulting JIT image, and thus need low-level
access to BPF code generation as well. Therefore, complete the available
toolchain for BPF with this small bpf_asm helper tool for the tools/net/
directory. These 3 complementary minimal helper tools round up and
facilitate BPF development.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agofilter: bpf_dbg: add minimal bpf debugger
Daniel Borkmann [Wed, 11 Dec 2013 22:43:43 +0000 (23:43 +0100)]
filter: bpf_dbg: add minimal bpf debugger

This patch adds a minimal BPF debugger that "emulates" the kernel's
BPF engine (w/o extensions) and allows for single stepping (forwards
and backwards through BPF code) or running with >=1 breakpoints through
selected or all packets from a pcap file with a provided user filter
in order to facilitate verification of a BPF program. When a breakpoint
is being hit, it dumps all register contents, decoded instructions and
in case of branches both decoded branch targets as well as other useful
information.

Having this facility is in particular useful to verify BPF programs
against given test traffic *before* attaching to a live system.

With the general availability of cls_bpf, xt_bpf, socket filters,
team driver and e.g. PTP code, all BPF users, quite often a single
more complex BPF program is being used. Reasons for a more complex
BPF program are primarily to optimize execution time for making a
verdict when multiple simple BPF programs are combined into one in
order to prevent parsing same headers multiple times. In particular,
for cls_bpf that can have various return paths for encoding flowids,
and xt_bpf to come to a fw verdict this can be the case.

Therefore, as this can result in more complex and harder to debug
code, it would be very useful to have this minimal tool for testing
purposes. It can also be of help for BPF JIT developers as filters
are "test attached" to the kernel on a temporary socket thus
triggering a JIT image dump when enabled. The tool uses an interactive
libreadline shell with auto-completion and history support.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: eth: cpsw: 64-bit phys_addr_t and sparse cleanup
Olof Johansson [Wed, 11 Dec 2013 23:58:07 +0000 (15:58 -0800)]
net: eth: cpsw: 64-bit phys_addr_t and sparse cleanup

Minor fix for printk format of a phys_addr_t, and the switch of two local
functions to static since they're not used outside of the file.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: eth: davinci_cpdma: Mark a local variable static
Olof Johansson [Wed, 11 Dec 2013 23:51:21 +0000 (15:51 -0800)]
net: eth: davinci_cpdma: Mark a local variable static

Only used locally. Found by sparse.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: eth: davinci_cpdma: 64-bit phys/dma_addr_t cleanup
Olof Johansson [Wed, 11 Dec 2013 23:51:20 +0000 (15:51 -0800)]
net: eth: davinci_cpdma: 64-bit phys/dma_addr_t cleanup

Silences the below warnings when building with ARM_LPAE enabled, which
gives longer dma_addr_t by default:

drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_desc_pool_create':
drivers/net/ethernet/ti/davinci_cpdma.c:182:3: warning: passing argument 3 of 'dma_alloc_attrs' from incompatible pointer type [enabled by default]
drivers/net/ethernet/ti/davinci_cpdma.c: In function 'desc_phys':
drivers/net/ethernet/ti/davinci_cpdma.c:222:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
drivers/net/ethernet/ti/davinci_cpdma.c:223:8: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
Matthew Whitehead [Wed, 11 Dec 2013 22:00:59 +0000 (17:00 -0500)]
8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature

Removed the shared ei_debug variable. Replaced it by adding u32 msg_enable to
the private struct ei_device. Now each 8390 ethernet instance has a per-device
logging variable.

Changed older style printk() calls to more canonical forms.

Tested on: ne, ne2k-pci, smc-ultra, and wd hardware.

V4.0
- Substituted pr_info() and pr_debug() for printk() KERN_INFO and KERN_DEBUG

V3.0
- Checked for cases where pr_cont() was most appropriate choice.
- Changed module parameter from 'debug' to 'msg_enable' because debug was
no longer the best description.

V2.0
- Changed netif_msg_(drv|probe|ifdown|rx_err|tx_err|tx_queued|intr|rx_status|hw)
to netif_(dbg|info|warn|err) where possible.

Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: router reachability probing
Jiri Benc [Wed, 11 Dec 2013 12:48:20 +0000 (13:48 +0100)]
ipv6: router reachability probing

RFC 4191 states in 3.5:

   When a host avoids using any non-reachable router X and instead sends
   a data packet to another router Y, and the host would have used
   router X if router X were reachable, then the host SHOULD probe each
   such router X's reachability by sending a single Neighbor
   Solicitation to that router's address.  A host MUST NOT probe a
   router's reachability in the absence of useful traffic that the host
   would have sent to the router if it were reachable.  In any case,
   these probes MUST be rate-limited to no more than one per minute per
   router.

Currently, when the neighbour corresponding to a router falls into
NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
should be probed with a single NS. The probe is ratelimited by the existing
code. To better distinguish meanings of the failure values, rename
RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove redundant null check on asoc
wangweidong [Wed, 11 Dec 2013 08:42:14 +0000 (16:42 +0800)]
sctp: remove redundant null check on asoc

In sctp_err_lookup, goto out while the asoc is not NULL, so remove the
check NULL. Also, in sctp_err_finish which called by sctp_v4_err and
sctp_v6_err, they pass asoc to sctp_err_finish while the asoc is not
NULL, so remove the check.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosch_htb: remove unnecessary NULL pointer judgment
Yang Yingliang [Wed, 11 Dec 2013 07:48:37 +0000 (15:48 +0800)]
sch_htb: remove unnecessary NULL pointer judgment

It already has a NULL pointer judgment of rtab in qdisc_put_rtab().
Remove the judgment outside of qdisc_put_rtab().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv4: fix wildcard search with inet_confirm_addr()
Nicolas Dichtel [Tue, 10 Dec 2013 14:02:40 +0000 (15:02 +0100)]
ipv4: fix wildcard search with inet_confirm_addr()

Help of this function says: "in_dev: only on this interface, 0=any interface",
but since commit 39a6d0630012 ("[NETNS]: Process inet_confirm_addr in the
correct namespace."), the code supposes that it will never be NULL. This
function is never called with in_dev == NULL, but it's exported and may be used
by an external module.

Because this patch restore the ability to call inet_confirm_addr() with in_dev
== NULL, I partially revert the above commit, as suggested by Julian.

CC: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agovxlan: leave multicast group when vxlan device down
Gao feng [Tue, 10 Dec 2013 08:37:33 +0000 (16:37 +0800)]
vxlan: leave multicast group when vxlan device down

vxlan_group_used only allows device to leave multicast group
when the remote_ip of this vxlan device is difference from
other vxlan devices' remote_ip. this will cause device not
leave multicast group untile the vn_sock of this vxlan deivce
being released.

The check in vxlan_group_used is not quite precise. since even
the remote_ip is same, but these vxlan devices may use different
lower devices, and they may use different vn_socks.

Only when some vxlan devices use the same vn_sock,same lower
device and same remote_ip, the mc_list of the vn_sock should
not be changed.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agovxlan: remove vxlan_group_used in vxlan_open
Gao feng [Tue, 10 Dec 2013 08:37:32 +0000 (16:37 +0800)]
vxlan: remove vxlan_group_used in vxlan_open

In vxlan_open, vxlan_group_used always returns true,
because the state of the vxlan deivces which we want
to open has alreay been running. and it has already
in vxlan_list.

Since ip_mc_join_group takes care of the reference
of struct ip_mc_list. removing vxlan_group_used here
is safe.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobgmac: replace some magic values with defines
Rafał Miłecki [Wed, 11 Dec 2013 07:44:37 +0000 (08:44 +0100)]
bgmac: replace some magic values with defines

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobgmac: reset cached MAC state during chip reset
Rafał Miłecki [Wed, 11 Dec 2013 06:44:14 +0000 (07:44 +0100)]
bgmac: reset cached MAC state during chip reset

Without this bgmac_adjust_link didn't know it should re-initialize MAC
state. This led to the MAC not working after if down & up routine.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: expand control flow of macro SKIP_NONLOCAL
Yang Yingliang [Wed, 11 Dec 2013 07:17:11 +0000 (15:17 +0800)]
net_sched: expand control flow of macro SKIP_NONLOCAL

SKIP_NONLOCAL hides the control flow. The control flow should be
inlined and expanded explicitly in code so that someone who reads
it can tell the control flow can be changed by the statement.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: macb: Fix build warning
Soren Brinkmann [Wed, 11 Dec 2013 04:57:57 +0000 (20:57 -0800)]
net: macb: Fix build warning

When adjusting the link speed, the target frequency is determined by a
'swith (LINK_SPEED)' statement, that assigns the target rate only for
valid and expected LINK_SPEED values. This incomplete switch statement
leads to the following build warning:
     drivers/net/ethernet/cadence/macb.c: In function 'macb_handle_link_change':
  >> drivers/net/ethernet/cadence/macb.c:241:14: warning: 'rate' may be used uninitialized in this function [-Wmaybe-uninitialized]
        netdev_warn(dev, "unable to generate target frequency: %ld Hz\n",
                   ^
     drivers/net/ethernet/cadence/macb.c:215:13: note: 'rate' was declared here
       long ferr, rate, rate_rounded;

Fixing this by bailing out of that function in the switch's default case
before the rate variable is used.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'tipc'
David S. Miller [Wed, 11 Dec 2013 05:17:51 +0000 (00:17 -0500)]
Merge branch 'tipc'

Jon Maloy says:

====================
tipc: cleanups in media and bearer layer

This commit series performs a number cleanups in order to make the
bearer and media part of the code more comprehensible and manageable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: remove unused 'blocked' flag from tipc_link struct
Ying Xue [Wed, 11 Dec 2013 04:45:44 +0000 (20:45 -0800)]
tipc: remove unused 'blocked' flag from tipc_link struct

In early versions of TIPC it was possible to administratively block
individual links through the use of the member flag 'blocked'. This
functionality was deemed redundant, and since commit 7368dd ("tipc:
clean out all instances of #if 0'd unused code"), this flag has been
unused.

In the current code, a link only needs to be blocked for sending and
reception if it is subject to an ongoing link failover. In that case,
it is sufficient to check if the number of expected failover packets
is non-zero, something which is done via the funtion 'link_blocked()'.

This commit finally removes the redundant 'blocked' flag completely.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: eliminate code duplication in media layer
Ying Xue [Wed, 11 Dec 2013 04:45:43 +0000 (20:45 -0800)]
tipc: eliminate code duplication in media layer

Currently TIPC supports two L2 media types, Ethernet and Infiniband.
Because both these media are accessed through the common net_device API,
several functions in the two media adaptation files turn out to be
fully or almost identical, leading to unnecessary code duplication.

In this commit we extract this common code from the two media files
and move them to the generic bearer.c. Additionally, we change
the function names to reflect their real role: to access L2 media,
irrespective of type.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: relocate common functions from media to bearer
Ying Xue [Wed, 11 Dec 2013 04:45:42 +0000 (20:45 -0800)]
tipc: relocate common functions from media to bearer

Currently, registering a TIPC stack handler in the network device layer
is done twice, once for Ethernet (eth_media) and Infiniband (ib_media)
repectively. But, as this registration is not media specific, we can
avoid some code duplication by moving the registering function to
the generic bearer layer, to the file bearer.c, and call it only once.
The same is true for the network device event notifier.

As a side effect, the two workqueues we are using for for setting up/
cleaning up media can now be eliminated. Furthermore, the array for
storing the specific media type structs, media_array[], can be entirely
deleted.

Note that the eth_started and ib_started flags were removed during the
code relocation.  There is now only one call to bearer_setup and
bearer_cleanup, and these can logically not race against each other.

Despite its size, this cleanup work incurs no functional changes in TIPC.
In particular, it should be noted that the sequence ordering of received
packets is unaffected by this change, since packet reception never was
subject to any work queue handling in the first place.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: remove TIPC usage of field af_packet_priv in struct net_device
Ying Xue [Wed, 11 Dec 2013 04:45:41 +0000 (20:45 -0800)]
tipc: remove TIPC usage of field af_packet_priv in struct net_device

TIPC is currently using the field 'af_packet_priv' in struct net_device
as a handle to find the bearer instance associated to the given network
device. But, by doing so it is blocking other networking cleanups, such
as the one discussed here:

http://patchwork.ozlabs.org/patch/178044/

This commit removes this usage from TIPC. Instead, we introduce a new
field, 'tipc_ptr', to the net_device structure, to serve this purpose.
When TIPC bearer is enabled, the bearer object is associated to
'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall
from a networking device, the bearer object can now be obtained from
'tipc_ptr'. When a bearer is disabled, the bearer object is detached
from its underlying network device by setting 'tipc_ptr' to NULL.

Additionally, an RCU lock is used to protect the new pointer.
Henceforth, the existing tipc_net_lock is used in write mode to
serialize write accesses to this pointer, while the new RCU lock is
applied on the read side to ensure that the pointer is 100% valid
within its wrapped area for all readers.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: improve naming and comment consistency in media layer
Jon Paul Maloy [Wed, 11 Dec 2013 04:45:40 +0000 (20:45 -0800)]
tipc: improve naming and comment consistency in media layer

struct 'tipc_media' represents the specific info that the media
layer adaptors (eth_media and ib_media) expose to the generic
bearer layer. We clarify this by improved commenting, and by giving
the 'media_list' array the more appropriate name 'media_info_array'.

There are no functional changes in this commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: initiate media type array at compile time
Jon Paul Maloy [Wed, 11 Dec 2013 04:45:39 +0000 (20:45 -0800)]
tipc: initiate media type array at compile time

Communication media types are abstracted through the struct 'tipc_media',
one per media type. These structs are allocated statically inside their
respective media file.

Furthermore, in order to be able to reach all instances from a central
location, we keep a static array with pointers to these structs. This
array is currently initialized at runtime, under protection of
tipc_net_lock. However, since the contents of the array itself never
changes after initialization, we can just as well initialize it at
compile time and make it 'const', at the same time making it obvious
that no lock protection is needed here.

This commit makes the array constant and removes the redundant lock
protection.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: eliminate redundant code with kfree_skb_list routine
Ying Xue [Wed, 11 Dec 2013 04:45:38 +0000 (20:45 -0800)]
tipc: eliminate redundant code with kfree_skb_list routine

sk_buff lists are currently relased by looping over the list and
explicitly releasing each buffer.

We replace all occurrences of this loop with a call to kfree_skb_list().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'macb'
David S. Miller [Wed, 11 Dec 2013 03:56:31 +0000 (22:56 -0500)]
Merge branch 'macb'

From: Soren Brinkmann <soren.brinkmann@xilinx.com>

====================
net: macb updates

I'd really like to have Ethernet working for Zynq, so I want to at least
revive this discussion regarding this patchset. And the first four
patches should not even be too controversial.
I didn't change anything compared to my original RFC submission, except
for a typo in one of the commit messages.
Handling the tx_clk as optional clock input seems a little bit weird,
but it works on my Zynq platform and should be compatible with other
users of macb and their DT descriptions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: macb: Adjust tx_clk when link speed changes
Soren Brinkmann [Wed, 11 Dec 2013 00:07:23 +0000 (16:07 -0800)]
net: macb: Adjust tx_clk when link speed changes

Adjust the ethernet clock according to the negotiated link speed.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: macb: Use devm_request_irq()
Soren Brinkmann [Wed, 11 Dec 2013 00:07:22 +0000 (16:07 -0800)]
net: macb: Use devm_request_irq()

Use the device managed interface to request the IRQ, simplifying error
paths.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: macb: Use devm_ioremap()
Soren Brinkmann [Wed, 11 Dec 2013 00:07:21 +0000 (16:07 -0800)]
net: macb: Use devm_ioremap()

Use the device managed version of ioremap to remap IO memory,
simplifying error paths.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: macb: Migrate to devm clock interface
Soren Brinkmann [Wed, 11 Dec 2013 00:07:20 +0000 (16:07 -0800)]
net: macb: Migrate to devm clock interface

Migrate to using the device managed interface for clocks and clean up
the associated error paths.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: macb: Migrate to dev_pm_ops
Soren Brinkmann [Wed, 11 Dec 2013 00:07:19 +0000 (16:07 -0800)]
net: macb: Migrate to dev_pm_ops

Migrate the suspend/resume functions to use the dev_pm_ops PM interface.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: sfq: put sfq_unlink in a do - while loop
Yang Yingliang [Tue, 10 Dec 2013 12:55:33 +0000 (20:55 +0800)]
net_sched: sfq: put sfq_unlink in a do - while loop

Macros with multiple statements should be enclosed in a do - while loop

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: add space around '>' and before '('
Yang Yingliang [Tue, 10 Dec 2013 12:55:32 +0000 (20:55 +0800)]
net_sched: add space around '>' and before '('

Spaces required around that '>' (ctx:VxV) and
before the open parenthesis '('.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: change "foo* bar" to "foo *bar"
Yang Yingliang [Tue, 10 Dec 2013 12:55:31 +0000 (20:55 +0800)]
net_sched: change "foo* bar" to "foo *bar"

"foo* bar" or "foo * bar" should be "foo *bar".

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: cls_bpf: use tabs to do indent
Yang Yingliang [Tue, 10 Dec 2013 12:55:30 +0000 (20:55 +0800)]
net_sched: cls_bpf: use tabs to do indent

Code indent should use tabs where possible

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>