review.tizen.org Git - platform/kernel/linux-amlogic.git/log

r8169: adjust the flow of hw_start

The suggestion as following:
- initial settings or default settings
- rtl_hw_start_xxx. rtl_hw_start_xxx may change some default settings.
- enable tx/rx. This has to be after the above two steps.
- rtl_set_rx_mode. AcceptXXXs have to be enabled after enabling tx/rx.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add a new chip for RTL8111G

Add a new chip for RTL8111G series.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: Update the RTL8111G parameters

- replace rtl8168g-1.fw with rtl8168g-2.fw which support new method.
- fix PHY power down is useless.
- disable rx early which causes the rx abnormal.
- enable auto fifo.
- set 10M IFG to default value.
- fix the conflict between jumbo frame and flow control.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: Modify the method for setting firmware

Remove useless action PHY_READ_EFUSE, PHY_READ_MAC_BYTE, PHY_WRITE_MAC_BYTE,
PHY_WRITE_ERI_WORD. And define the new action PHY_MDIO_CHG.

PHY_MDIO_CHG is used to modify the mdio operation. By the way, the
firmware could support setting mac ocp.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: Update PHY settings of RTL8111G

Add the new settings and correct the wrong settings.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: Modify the mothod for PHY settings of RTL8111G

Replace the current settings with rtl_writephy and rtl_readphy.
For the hardware, the settings are same with previous ones. This
make the setting method like the previous chips.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: Remove firmware code

Some codes are belong to binary codes and should be removed.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://1984.lsi.us.es/nf-next

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter and IPVS updates for
your net-next tree, most relevantly they are:

* Add net namespace support to NFLOG, ULOG and ebt_ulog and NFQUEUE.
  The LOG and ebt_log target has been also adapted, but they still
  depend on the syslog netnamespace that seems to be missing, from
  Gao Feng.

* Don't lose indications of congestion in IPv6 fragmentation handling,
  from Hannes Frederic Sowa.i

* IPVS conversion to use RCU, including some code consolidation patches
  and optimizations, also some from Julian Anastasov.

* cpu fanout support for NFQUEUE, from Holger Eitzenberger.

* Better error reporting to userspace when dropping packets from
  all our _*_[xfrm|route]_me_harder functions, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This change brings netfilter reassembly logic on par with
reassembly.c. The corresponding change in net-next is
(eec2e61 ipv6: implement RFC3168 5.3 (ecn protection) for
ipv6 fragmentation handling)

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: remove unneeded variable proc_net_netfilter

Now that this supports net namespace for nflog and nfqueue,
we can remove the global proc_net_netfilter which has no
clients anymore.

Based on patch from Gao feng.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_queue: add net namespace support for nfnetlink_queue

This patch makes /proc/net/netfilter/nfnetlink_queue pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: enable per netns support for nf_loggers

After this patch, all nf_loggers support net namespace. Still
xt_LOG and ebt_log require syslog netns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_log: add net namespace support for nfnetlink_log

This patch makes /proc/net/netfilter/nfnetlink_log pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipt_ULOG: add net namespace support for ipt_ULOG

Add pernet support to ipt_ULOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
nflognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ebt_ulog: add net namespace support for ebt_ulog

Add pernet support to ebt_ulog by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
ebtulognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_LOG: add net namespace support for xt_LOG

Add pernet support to xt_LOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ebt_log: add net namespace support for ebt_log

Add pernet support to ebt_log by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_log: prepare net namespace support for loggers

This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: make /proc/net/netfilter pernet

This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: frag queue per hash bucket locking

This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduce the
readers-writer lock to a rebuild lock.

This patch is part of "net: frag performance followup"
http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted:

Same test setup as previous:
(http://thread.gmane.org/gmane.linux.network/257155)
Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses
Ethernet flow-control.  A third interface is used for generating the
DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before it would only hit one RX queue, now its
sending packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
Test-20G64K     == 2x10G with 65K fragments
Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
Test-20G64K+DoS == Same as 20G64K with frag DoS
Test-20G3F+DoS  == Same as 20G3F  with frag DoS
Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this-patch(03) (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in-between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02.
Test (C) is what I reported before for this-patch

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) function as a new base-test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks.

I cannot explain the slow down for 20G64K (but its an artificial
"lab-test" so I'm not worried).  But the other results does show
improvements.  And test (E) "with _bh" version is slightly better.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
----
V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail
V3:
- Drop the chain_len counter per hash bucket.
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull net into net-next to get the synchronize_net() bug fix in
bonding.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix VSOCK layer handling of context ID changes, from Reilly Grant.

2) Now that we have a synchronize_net() in netdev_rx_handler_unregister(),
    we can't let any call sites hold locks.  Unfortunately bonding does,
    so we have to drop the rwlock there a little bit earlier, fix from
    Veaceslav Falico.

3) MAC address setting loop exits one iteration too early in mlx4
    driver, from Yan Burman.

4) Restore ipv6 routes properly upon ifdown/ifup of loopback, from
    Balakumaran Kannan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  VSOCK: Handle changes to the VMCI context ID.
  net IPv6 : Fix broken IPv6 routing table after loopback down-up
  cbq: incorrect processing of high limits
  net/mlx4_en: Fix setting initial MAC address
  bonding: get netdev_rx_handler_unregister out of locks

Merge tag 'regmap-v3.9-rc4' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"A small collection of fixes.  The most important ones are those from
  Stephen and Lars-Peter both of which fix cache issues that have been
  lurking for a while but not manifesting noticably enough for anyone to
  report them."

* tag 'regmap-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: async: Add missing return
  regmap: don't corrupt work buffer in _regmap_raw_write()
  regmap: cache Fix regcache-rbtree sync
  regmap: Initialize `map->debugfs' before regcache

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull DRM fixes from Dave Airlie:
"Two core fixes, both regressions, along with some intel and some
  nouveau fixes for regressions and oopses"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm: correctly restore mappings if drm_open fails
  drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
  drm/nouveau: fix handling empty channel list in ioctl's
  drm: don't unlock in the addfb error paths
  drm/i915: Fix build failure
  drm/i915: Be sure to turn hsync/vsync back on at crt enable (v2)
  drm/i915: duct-tape locking when eDP init fails

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"A collection of fixes pretty much across the MIPS code.  Even the
  change to include/linux/signal.h by David Howells' 2a1486981c13 ("Fix
  breakage in MIPS siginfo handling") should be considered MIPS-specific
  as it touches an ifdefed segment that is only relevant to MIPS and
  which unfortunately can't be made to go away entirely."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  Fix breakage in MIPS siginfo handling
  Revert "MIPS: BCM63XX: Call board_register_device from device_initcall()"
  MIPS: BCM63XX: Make nvram checksum failure non fatal
  MIPS: Fix code generation for non-DSP capable CPUs
  MIPS: Fix inconsistent formatting inside /proc/cpuinfo
  MIPS: SEAD3: Enable LL/SC.
  MIPS: Get rid of CONFIG_CPU_HAS_LLSC again
  MIPS: Add dependencies for HAVE_ARCH_TRANSPARENT_HUGEPAGE
  MIPS: VR4133: Fix probe for LL/SC.
  MIPS: Fix logic errors in bitops.c
  MIPS: Use CONFIG_CPU_MIPSR2 in csum_partial.S
  MIPS: compat: Return same error ENOSYS as native for invalid operation.

net: fix smatch warnings inside datagram_poll

Commit 7d4c04fc170087119727119074e72445f2bb192b ("net: add option to enable
error queue packets waking select") has an issue due to operator precedence
causing the bit-wise OR to bind to the sock_flags call instead of the result of
the terniary conditional. This fixes the *_poll functions to work properly. The
old code results in "mask |= POLLPRI" instead of what was intended, which is to
only include POLLPRI when the socket option is enabled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drm: correctly restore mappings if drm_open fails

If first drm_open fails, the error-handling path will
incorrectly restore inode's mapping to NULL. This can
cause the crash later on. Fix by separately storing
away mapping pointers that drm_open can touch and
restore each from its own respective variable if the
call fails.

Fixes: https://bugzilla.novell.com/show_bug.cgi?id=807850
(thanks to Michal Hocko for investigating investigating and
finding the root cause of the bug)

Reference:
http://lists.freedesktop.org/archives/dri-devel/2013-March/036564.html

v2: Use one variable to store file and inode mapping
since they are the same at the function entry.
Fix spelling mistakes in commit message.

v3: Add reference to the original bug report.

Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Tested-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next

Oops fixers.
* 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
drm/nouveau: fix handling empty channel list in ioctl's

net/nxp/lpc_eth: Drop ifdef CONFIG_OF_NET

Since of_get_mac_address() is now declared even if CONFIG_OF_NET
is not configured, the ifdef is no longer necessary and can be
removed.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/freescale/fec: Simplify OF dependencies

Since of_get_mac_address() is now defined even if CONFIG_OF_NET
is not configured, the ifdef around the code calling it is no longer
necessary and can be removed.

Similar, since of_get_phy_mode() is now defined as dummy function
if OF_NET is not configured, it is no longer necessary to provide
an OF dependent function as front-end. Also, the function depends
on OF_NET, not on OF, so the conditional code was not correct anyway.
Drop the front-end function and call of_get_phy_mode() directly.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/cadence/macb: Simplify OF dependencies

With of_get_mac_address() and of_get_phy_mode() now defined as dummy
functions if OF_NET is not configured, it is no longer necessary to
provide OF dependent functions as front-end. Also, the two functions
depend on OF_NET, not on OF, so the conditional code was not correct
anyway.

Drop the front-end functions and call of_get_mac_address() and
of_get_phy_mode() directly instead.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/cadence/at91_ether: Simplify OF dependencies

With of_get_mac_address() and of_get_phy_mode() now defined as dummy
functions if OF_NET is not configured, it is no longer necessary to
provide OF dependent functions as front-end. Also, the two functions
depend on OF_NET, not on OF, so the conditional code was not correct
anyway.

Drop the front-end functions and call of_get_mac_address() and
of_get_phy_mode() directly instead.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

of_net.h: Provide empty functions if OF_NET is not configured

of_get_mac_address() and of_get_phy_mode() are only provided if OF_NET
is configured. While most callers check for the define, not all do, and those
who do require #ifdef around the code. For those who don't, the missing check
can result in errors such as

arch/powerpc/sysdev/tsi108_dev.c:107:3: error: implicit declaration of
function 'of_get_mac_address' [-Werror=implicit-function-declaration]
arch/powerpc/sysdev/mv64x60_dev.c:253:2: error: implicit declaration of
function 'of_get_mac_address' [-Werror=implicit-function-declaration]

Provide empty functions if OF_NET is not configured.
This is safe because all callers do check the return values.

Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next

One locking regression fix, and a couple of other i915 ones.

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm: don't unlock in the addfb error paths
  drm/i915: Fix build failure
  drm/i915: Be sure to turn hsync/vsync back on at crt enable (v2)
  drm/i915: duct-tape locking when eDP init fails

VSOCK: Handle changes to the VMCI context ID.

The VMCI context ID of a virtual machine may change at any time. There
is a VMCI event which signals this but datagrams may be processed before
this is handled. It is therefore necessary to be flexible about the
destination context ID of any datagrams received. (It can be assumed to
be correct because it is provided by the hypervisor.) The context ID on
existing sockets should be updated to reflect how the hypervisor is
currently referring to the system.

Signed-off-by: Reilly Grant <grantr@vmware.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net IPv6 : Fix broken IPv6 routing table after loopback down-up

IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.

This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipconfig: add informative timeout messages while waiting for carrier

Commit 3fb72f1e6e6165c5f495e8dc11c5bbd14c73385c ("ipconfig wait
for carrier") added a "wait for carrier on at least one interface"
policy, with a worst case maximum wait of two minutes.

However, if you encounter this, you won't get any feedback from
the console as to the nature of what is going on. You just see
the booting process hang for two minutes and then continue.

Here we add a message so the user knows what is going on, and
hence can take action to rectify the situation (e.g. fix network
cable or whatever.) After the 1st 10s pause, output now begins
that looks like this:

Waiting up to 110 more seconds for network.
Waiting up to 100 more seconds for network.
Waiting up to 90 more seconds for network.
Waiting up to 80 more seconds for network.
...

Since most systems will have no problem getting link/carrier in the
1st 10s, the only people who will see these messages are people with
genuine issues that need to be resolved.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: Do a dma_mapping_error check after skb_frag_dma_map

This backtrace was recently reported on a 3.9 kernel:

Actual results: from syslog /var/log/messsages:
kernel: [17539.340285] ------------[ cut here ]------------
kernel: [17539.341012] WARNING: at lib/dma-debug.c:937 check_unmap+0x493/0x960()
kernel: [17539.341012] Hardware name: MS-7125
kernel: [17539.341012] forcedeth 0000:00:0a.0: DMA-API: device driver failed to
check map error[device address=0x0000000013c88000] [size=544 bytes] [mapped as
page]
kernel: [17539.341012] Modules linked in: fuse ebtable_nat ipt_MASQUERADE
nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_nat nf_nat_ipv6
ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack bnep bluetooth rfkill ebtable_filter ebtables ip6table_filter
ip6_tables snd_hda_codec_hdmi snd_cmipci snd_mpu401_uart snd_hda_intel
snd_intel8x0 snd_opl3_lib snd_ac97_codec gameport snd_hda_codec snd_rawmidi
ac97_bus snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd
k8temp soundcore serio_raw i2c_nforce2 forcedeth ata_generic pata_acpi nouveau
video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core sata_sil pata_amd
sata_nv uinput
kernel: [17539.341012] Pid: 17340, comm: sshd Not tainted
3.9.0-0.rc4.git0.1.fc19.i686.PAE #1
kernel: [17539.341012] Call Trace:
kernel: [17539.341012]  [<c045573c>] warn_slowpath_common+0x6c/0xa0
kernel: [17539.341012]  [<c0701953>] ? check_unmap+0x493/0x960
kernel: [17539.341012]  [<c0701953>] ? check_unmap+0x493/0x960
kernel: [17539.341012]  [<c04557a3>] warn_slowpath_fmt+0x33/0x40
kernel: [17539.341012]  [<c0701953>] check_unmap+0x493/0x960
kernel: [17539.341012]  [<c049238f>] ? sched_clock_cpu+0xdf/0x150
kernel: [17539.341012]  [<c0701e87>] debug_dma_unmap_page+0x67/0x70
kernel: [17539.341012]  [<f7eae8f2>] nv_unmap_txskb.isra.32+0x92/0x100

Its pretty plainly the result of an skb fragment getting unmapped without having
its initial mapping operation checked for errors.  This patch corrects that.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ISDN:divert: beautify code: useless 'break', 'return (0)', additional comments.

  delete useless 'break' after 'return'.
  let 'return 0' instead of 'return (0)'
  also give a comment for 'break' to let readers notice it.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbq: incorrect processing of high limits

currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link

In shaper | Actual Result
-----------+---------------
  100M     | 108 Mbps
  200M     | 244 Mbps
  300M     | 412 Mbps
  500M     | 893 Mbps

This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.

To fix this problem we prevent change of q->now until its synchronization
with real time.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Bump up the version to 5.2.40

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix sparse warnings.

warning: 'pf_info.max_tx_ques' may be used uninitialized in this function
[-Wmaybe-uninitialized]

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix NULL dereference in error path.

o Fix for smatch tool reported error
   drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c:2029
   qlcnic_probe() error: potential NULL dereference 'adapter'.
o While returning from an error path in probe, adapter is not
  initialized. So do not access adapter in cleanup path.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix potential NULL dereference

[net-next:master 301/302] drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c:563
qlcnic_set_multi() error: potential null dereference 'cur'. (kzalloc returns null)

o Break out of the loop after memory allocation failure. Program all the
MAC addresses that were cached in the return path.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Remove dead sysctl_tcp_cookie_size declaration

Remove a declaration left over from the TCPCT-ectomy. This sysctl is
no longer referenced anywhere since 1a2c6181c4 ("tcp: Remove TCPCT").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipc: set msg back to -EAGAIN if copy wasn't performed

Make sure that msg pointer is set back to error value in case of
MSG_COPY flag is set and desired message to copy wasn't found. This
garantees that msg is either a error pointer or a copy address.

Otherwise the last message in queue will be freed without unlinking from
the queue (which leads to memory corruption) and the dummy allocated
copy won't be released.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bridge: remove a redundant synchronize_net()

commit 00cfec37484761 (net: add a synchronize_net() in
netdev_rx_handler_unregister())
allows us to remove the synchronized_net() call from del_nbp()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix setting initial MAC address

Commit 6bbb6d9 "net/mlx4_en: Optimize Rx fast path filter checks" introduced a regression
under which the MAC address read from the card was not converted correctly
(the most significant byte was not handled), fix that.

Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: get netdev_rx_handler_unregister out of locks

Now that netdev_rx_handler_unregister contains synchronize_net(), we need
to call it outside of bond->lock, cause it might sleep. Also, remove the
already unneded synchronize_net().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC bug fixes from Arnd Bergmann:
"After a quiet set of fixes for 3.9-rc4, a lot of people woke up and
  sent urgent fixes for 3.9.  I pushed back on a number of them that got
  deferred to 3.10, but these are the ones that seemed important.

  Regression in 3.9:

   - Multiple regressions in OMAP2+ clock cleanup
   - SH-Mobile frame buffer bug fix that merged here because of
     maintainer MIA
   - ux500 prcmu changes broke DT booting
   - MMCI duplicated regulator setup on ux500
   - New ux500 clock driver broke ethernet on snowball
   - Local interrupt driver for mvebu broke ethernet
   - MVEBU GPIO driver did not get set up right on Orion DT
   - incorrect interrupt number on Orion crypto for DT

  Long-standing bugs, including candidates for stable:

   - Kirkwood MMC needs to disable invalid card detect pins
   - MV SDIO pinmux was wrong on Mirabox
   - GoFlex Net board file needs to set NAND chip delay
   - MSM timer restart race
   - ep93xx early debug code broke in 3.7
   - i.MX CPU hotplug race
   - Incorrect clock setup for OMAP1 USB
   - Workaround for bad clock setup by some old OMAP4 boot loaders
   - Static I/O mappings on cns3xxx since 3.2"

* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: cns3xxx: fix mapping of private memory region
  arm: mvebu: Fix pinctrl for Armada 370 Mirabox SDIO port.
  arm: orion5x: correct IRQ used in dtsi for mv_cesa
  arm: orion5x: fix orion5x.dtsi gpio parameters
  ARM: Kirkwood: fix unused mvsdio gpio pins
  arm: mvebu: Use local interrupt only for the timer 0
  ARM: kirkwood: Fix chip-delay for GoFlex Net
  ARM: ux500: Enable the clock controlling Ethernet on Snowball
  ARM: ux500: Stop passing ios_handler() as an MMCI power controlling call-back
  ARM: ux500: Apply the TCPM and TCDM locations and sizes to dbx5x0 DT
  fbdev: sh_mobile_lcdc: fixup B side hsync adjust settings
  ARM: OMAP: clocks: Delay clk inits atleast until slab is initialized
  ARM: imx: fix sync issue between imx_cpu_die and imx_cpu_kill
  ARM: msm: Stop counting before reprogramming clockevent
  ARM: ep93xx: Fix wait for UART FIFO to be empty
  ARM: OMAP4: PM: fix PM regression introduced by recent clock cleanup
  ARM: OMAP3: hwmod data: keep MIDLEMODE in force-standby for musb
  ARM: OMAP4: clock data: lock USB DPLL on boot
  ARM: OMAP1: fix USB host on 1710

Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfix from J Bruce Fields:
"An xdr decoding error--thanks, Toralf Förster, and Trinity!"

* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
nfsd4: reject "negative" acl lengths

Merge tag 'v3.9-rc1_cns3xxx_fixes' of git://git.infradead.org/users/cbou/linux-cns3xxx into fixes

From Anton Vorontsov <anton@enomsg.org>:

This tag includes Mac Lin's work to revive CNS3xxx booting:

"Since commit 0536bdf33faf (ARM: move iotable mappings within the vmalloc
region), [...] the pre-defined iotable mappings is not in the vmalloc
region. [...] move the iotable mappings into the vmalloc region, and
merge the MPCore private memory region (containing the SCU, the GIC and
the TWD) as a single region."

Plus there is a small cosmetic fix, also from Mac Lin.

* tag 'v3.9-rc1_cns3xxx_fixes' of git://git.infradead.org/users/cbou/linux-cns3xxx:
ARM: cns3xxx: fix mapping of private memory region

[arnd: dropped the cosmetic fix from the merge as it is not needed for 3.9]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

netfilter: fix struct ip6t_frag field description

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashing

Because rev1 and rev3 of the target share the same hashing
generalize it by introduing nfqueue_hash().

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_NFQUEUE: introduce CPU fanout

Current NFQUEUE target uses a hash, computed over source and
destination address (and other parameters), for steering the packet
to the actual NFQUEUE. This, however forgets about the fact that the
packet eventually is handled by a particular CPU on user request.

If E. g.

  1) IRQ affinity is used to handle packets on a particular CPU already
     (both single-queue or multi-queue case)

and/or

  2) RPS is used to steer packets to a specific softirq

the target easily chooses an NFQUEUE which is not handled by a process
pinned to the same CPU.

The idea is therefore to use the CPU index for determining the
NFQUEUE handling the packet.

E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it
looks like this:

+-----+  +-----+  +-----+  +-----+
|NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
+-----+  +-----+  +-----+  +-----+
    ^        ^        ^        ^
    |        |NFQUEUE |        |
    +        +        +        +
+-----+  +-----+  +-----+  +-----+
|rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
+-----+  +-----+  +-----+  +-----+

The NFQUEUEs not necessarily have to start with number 0, setups with
less NFQUEUEs than packet-handling CPUs are not a problem as well.

This patch extends the NFQUEUE target to accept a new
NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
CPU index for determining the NFQUEUE being used. I have to introduce
rev3 for this. The 'flags' are folded into _v2 'bypass'.

By changing the way which queue is assigned, I'm able to improve the
performance if the processes reading on the NFQUEUs are pinned
correctly.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/rusty/linux

Pull virtio fixes from Rusty Russell:
"One reversion, a tiny leak fix, and a cc:stable locking fix, in two
  parts"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio: console: add locking around c_ovq operations
  virtio: console: rename cvq_lock to c_ivq_lock
  hw_random: free rng_buffer at module exit
  Revert "virtio_console: Initialize guest_connected=true for rproc_serial"

netfilter: use IS_ENABLE to replace if defined in TRACE target

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

loop: prevent bdev freeing while device in use

struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reprodusible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 > /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device->lo_state is
  Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
  it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
  device we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device->lo_ctl_mutex. ]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipvs: do not disable bh for long time

We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert services to rcu

This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
the schedulers are prepared for such RCU readers
even after done_service is called but they need
to use synchronize_rcu because last ip_vs_scheduler_put
can happen while RCU read-side critical sections
use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
If dests were present, the services are freed from
the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert dests to rcu

In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert sched_lock to spin lock

As all read_locks are gone spin lock is preferred.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: do not expect result from done_service

This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: reorganize dest trash

All dests will go to trash, no exceptions.
But we have to use new list node t_list for this, due
to RCU changes in following patches. Dests will wait there
initial grace period and later all conns and schedulers to
put their reference. The dests don't get reference for
staying in dest trash as before.

As result, we do not load ip_vs_dest_put with
extra checks for last refcnt and the schedulers do not
need to play games with atomic_inc_not_zero while
selecting best destination.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert wrr scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the weight for some
dest can be reduced during dest selection, change the
algorithm to check weights by using minimum weights in the
1 .. max_weight-(di-1) range, with the same step (di). By this
way we ensure that there will be always a weight >= 1 check
before claiming that all destinations are overloaded.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert wlc scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert sh scheduler to rcu

Use the 3 new methods to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert sed scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert rr scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the previous entry
could be unlinked, limit the list traversals to 2 when
lookup started from previous entry.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert nq scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert lc scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert lblcr scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. The set.lock is removed because now it is used in
rare cases, mostly under sched_lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert lblc scheduler to rcu

The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. Use a dead flag to prevent new entries to be created
while scheduler is reclaimed. Use hlist for the hash table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert dh scheduler to rcu

Use the new add_dest and del_dest methods
to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: add ip_vs_dest_hold and ip_vs_dest_put

ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: preparations for using rcu in schedulers

Allow schedulers to use rcu_dereference when
returning destination on lookup. The RCU read-side critical
section will allow ip_vs_bind_dest to get dest refcnt as
preparation for the step where destinations will be
deleted without an IP_VS_WAIT_WHILE guard that holds the
packet processing during update.

Add new optional scheduler methods add_dest,
del_dest and upd_dest. For now the methods are called
together with update_service but update_service will be
removed in a following change.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: change ip_vs_sched_lock to mutex

The global list with schedulers ip_vs_schedulers
is accessed only from user context - configuration and
scheduler module [un]registration. Use ip_vs_sched_mutex
instead.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new

We have many fields to set and few to reset,
use kmem_cache_alloc instead to save some cycles.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: reorder keys in connection structure

__ip_vs_conn_in_get and ip_vs_conn_out_get are
hot places. Optimize them, so that ports are matched first.
By moving net and fwmark below, on 32-bit arch we can fit
caddr in 32-byte cache line and all addresses in 64-byte
cache line.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert connection locking

Convert __ip_vs_conntbl_lock_array as follows:

- readers that do not modify conn lists will use RCU lock
- updaters that modify lists will use spinlock_t

Now for conn lookups we will use RCU read-side
critical section. Without using __ip_vs_conn_get such
places have access to connection fields and can
dereference some pointers like pe and pe_data plus
the ability to update timer expiration. If full access
is required we contend for reference.

We add barrier in __ip_vs_conn_put, so that
other CPUs see the refcnt operation after other writes.

With the introduction of ip_vs_conn_unlink()
we try to reorganize ip_vs_conn_expire(), so that
unhashing of connections that should stay more time is
avoided, even if it is for very short time.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert locks used in persistence engines

Allow the readers to use RCU lock and for
PE module registrations use global mutex instead of
spinlock. All PE modules need to use synchronize_rcu
in their module exit handler.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: remove rs_lock by using RCU

rs_lock was used to protect rs_table (hash table)
from updaters (under global mutex) and readers (packet handlers).
We can remove rs_lock by using RCU lock for readers. Reclaiming
dest only with kfree_rcu is enough because the readers access
only fields from the ip_vs_dest structure.

Use hlist for rs_table.

As we are now using hlist_del_rcu, introduce in_rs_table
flag as replacement for the list_empty checks which do not
work with RCU. It is needed because only NAT dests are in
the rs_table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert app locks

We use locks like tcp_app_lock, udp_app_lock,
sctp_app_lock to protect access to the protocol hash tables
from readers in packet context while the application
instances (inc) are [un]registered under global mutex.

As the hash tables are mostly read when conns are
created and bound to app, use RCU for readers and reclaim
app instance after grace period.

Simplify ip_vs_app_inc_get because we use usecnt
only for statistics and rely on module refcounting.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: optimize dst usage for real server

Currently when forwarding requests to real servers
we use dst_lock and atomic operations when cloning the
dst_cache value. As the dst_cache value does not change
most of the time it is better to use RCU and to lock
dst_lock only when we need to replace the obsoleted dst.
For this to work we keep dst_cache in new structure protected
by RCU. For packets to remote real servers we will use noref
version of dst_cache, it will be valid while we are in RCU
read-side critical section because now dst_release for replaced
dsts will be invoked after the grace period. Packets to
local real servers that are passed to local stack with
NF_ACCEPT need a dst clone.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: consolidate all dst checks on transmit in one place

Consolidate the PMTU checks, ICMP sending and
skb_dst modification in __ip_vs_get_out_rt and
__ip_vs_get_out_rt_v6. Now skb_dst is changed early
to simplify the transmitters.

Make sure update_pmtu is called only for local clients.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: do not use skb_share_check

We run in contexts like ip_rcv, ipv6_rcv, br_handle_frame,
do not expect shared skbs.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: no need to reroute anymore on DNAT over loopback

After commit 70e7341673 (ipv4: Show that ip_send_reply()
is purely unicast routine.) we do not need to reroute DNAT-ed
traffic over loopback because reply uses iph daddr and not
rt_spec_dst.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: rename functions related to dst_cache reset

Move and give better names to two functions:

- ip_vs_dst_reset to __ip_vs_dst_cache_reset
- __ip_vs_dev_reset to ip_vs_forget_dev

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: convert the IP_VS_XMIT macros to functions

It was a bad idea to hide return statements in macros.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: prefer NETDEV_DOWN event to free cached dsts

The real server becomes unreachable on down event,
no need to wait device unregistration. Should help in
releasing dsts early before dst->dev is replaced with lo.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: avoid routing by TOS for real server

Avoid replacing the cached route for real server
on every packet with different TOS. I doubt that routing
by TOS for real server is used at all, so we should be
better with such optimization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

net: add skb_dst_set_noref_force

Rename skb_dst_set_noref to __skb_dst_set_noref and
add force flag as suggested by David Miller. The new wrapper
skb_dst_set_noref_force will force dst entries that are not
cached to be attached as skb dst without taking reference
as long as provided dst is reclaimed after RCU grace period.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Simon Horman <horms@verge.net.au>

Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux

Pull tegra clock driver fix from Mike Turquette:
"Missing base address in Tegra clock driver results in non-operational
  PCIe.  On some devices this means that Ethernet will go uninitialized
  and other devices will fail.  This pull request fixes it with a single
  patch to pass the proper base address in the Tegra clock driver."

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux:
  clk: tegra: Allow PLLE training to succeed

Merge tag 'for-3.9-rc' of git://git./linux/kernel/git/rwlove/fcoe

Pull FCoE fixes from Robert Love:
"Critical patches to fix FCoE VN2VN mode with new interfaces targeting
  3.9-rc"

* tag 'for-3.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rwlove/fcoe:
  libfcoe: Fix fcoe_sysfs VN2VN mode
  libfc, fcoe, bnx2fc: Split fc_disc_init into fc_disc_{init, config}
  libfc, fcoe, bnx2fc: Always use fcoe_disc_init for discovery layer initialization
  fcoe: Fix deadlock between create and destroy paths
  bnx2fc: Make the fcoe_cltr the SCSI host parent

clk: tegra: Allow PLLE training to succeed

Under some circumstances the PLLE needs to be retrained, in which case
access to the PMC registers is required. Fix this by passing a pointer
to the PMC registers instead of NULL when registering the PLLE clock.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-By: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
net/mac80211/sta_info.c
net/wireless/core.h

Two minor conflicts in wireless. Overlapping additions of extern
declarations in net/wireless/core.h and a bug fix overlapping with
the addition of a boolean parameter to __ieee80211_key_free().

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stable' of git://git./linux/kernel/git/cmetcalf/linux-tile

Pull arch/tile fix from Chris Metcalf:
"This change allows newer Tilera boot tools to work correctly with
  current (and stable) kernels by using the right filename to get the
  initramfs from the Tilera hypervisor filesystem."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: expect new initramfs name from hypervisor file system

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) sadb_msg prepared for IPSEC userspace forgets to initialize the
    satype field, fix from Nicolas Dichtel.

2) Fix mac80211 synchronization during station removal, from Johannes
    Berg.

3) Fix IPSEC sequence number notifications when they wrap, from Steffen
    Klassert.

4) Fix cfg80211 wdev tracing crashes when add_virtual_intf() returns an
    error pointer, from Johannes Berg.

5) In mac80211, don't call into the channel context code with the
    interface list mutex held.  From Johannes Berg.

6) In mac80211, if we don't actually associate, do not restart the STA
    timer, otherwise we can crash.  From Ben Greear.

7) Missing dma_mapping_error() check in e1000, ixgb, and e1000e.  From
    Christoph Paasch.

8) Fix sja1000 driver defines to not conflict with SH port, from Marc
    Kleine-Budde.

9) Don't call il4965_rs_use_green with a NULL station, from Colin Ian
    King.

10) Suspend/Resume in the FEC driver fail because the buffer descriptors
    are not initialized at all the moments in which they should.  Fix
    from Frank Li.

11) cpsw and davinci_emac drivers both use the wrong interface to
    restart a stopped TX queue.  Use netif_wake_queue not
    netif_start_queue, the latter is for initialization/bringup not
    active management of the queue.  From Mugunthan V N.

12) Fix regression in rate calculations done by
    psched_ratecfg_precompute(), missing u64 type promotion.  From
    Sergey Popovich.

13) Fix length overflow in tg3 VPD parsing, from Kees Cook.

14) AOE driver fails to allocate enough headroom, resulting in crashes.
    Fix from Eric Dumazet.

15) RX overflow happens too quickly in sky2 driver because pause packet
    thresholds are not programmed correctly.  From Mirko Lindner.

16) Bonding driver manages arp_interval and miimon settings incorrectly,
    disabling one unintentionally disables both.  Fix from Nikolay
    Aleksandrov.

17) smsc75xx drivers don't program the RX mac properly for jumbo frames.
    Fix from Steve Glendinning.

18) Fix off-by-one in Codel packet scheduler.  From Vijay Subramanian.

19) Fix packet corruption in atl1c by disabling MSI support, from Hannes
    Frederic Sowa.

20) netdev_rx_handler_unregister() needs a synchronize_net() to fix
    crashes in bonding driver unload stress tests.  From Eric Dumazet.

21) rxlen field of ks8851 RX packet descriptors not interpreted
    correctly (it is 12 bits not 16 bits, so needs to be masked after
    shifting the 32-bit value down 16 bits).  Fix from Max Nekludov.

22) Fix missed RX/TX enable in sh_eth driver due to mishandling of link
    change indications.  From Sergei Shtylyov.

23) Fix crashes during spurious ECI interrupts in sh_eth driver, also
    from Sergei Shtylyov.

24) dm9000 driver initialization is done wrong for revision B devices
    with DSP PHY, from Joseph CHANG.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  DM9000B: driver initialization upgrade
  sh_eth: make 'link' field of 'struct sh_eth_private' *int*
  sh_eth: workaround for spurious ECI interrupt
  sh_eth: fix handling of no LINK signal
  ks8851: Fix interpretation of rxlen field.
  net: add a synchronize_net() in netdev_rx_handler_unregister()
  MAINTAINERS: Update netxen_nic maintainers list
  atl1e: drop pci-msi support because of packet corruption
  net: fq_codel: Fix off-by-one error
  net: calxedaxgmac: Wake-on-LAN fixes
  net: calxedaxgmac: fix rx ring handling when OOM
  net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'
  smsc75xx: fix jumbo frame support
  net: fix the use of this_cpu_ptr
  bonding: fix disabling of arp_interval and miimon
  ipv6: don't accept node local multicast traffic from the wire
  sky2: Threshold for Pause Packet is set wrong
  sky2: Receive Overflows not counted
  aoe: reserve enough headroom on skbs
  line up comment for ndo_bridge_getlink
  ...

net: add option to enable error queue packets waking select

Currently, when a socket receives something on the error queue it only wakes up
the socket on select if it is in the "read" list, that is the socket has
something to read. It is useful also to wake the socket if it is in the error
list, which would enable software to wait on error queue packets without waking
up for regular data on the socket. The main use case is for receiving
timestamped transmit packets which return the timestamp to the socket via the
error queue. This enables an application to select on the socket for the error
queue only instead of for the regular traffic.

-v2-
* Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file
* Modified every socket poll function that checks error queue

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

DM9000B: driver initialization upgrade

Fix bug for DM9000 revision B which contain a DSP PHY

DM9000B use DSP PHY instead previouse DM9000 revisions' analog PHY,
So need extra change in initialization, For
explicity PHY Reset and PHY init parameter, and
first DM9000_NCR reset need NCR_MAC_LBK bit by dm9000_probe().

Following DM9000_NCR reset cause by dm9000_open() clear the
NCR_MAC_LBK bit.

Without this fix, Power-up FIFO pointers error happen around 2%
rate among Davicom's customers' boards. With this fix, All above
cases can be solved.

Signed-off-by: Joseph CHANG <josright123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>