platform/kernel/linux-exynos.git
12 years agonet/can: use module_pci_driver
Axel Lin [Sat, 14 Apr 2012 04:38:43 +0000 (12:38 +0800)]
net/can: use module_pci_driver

This patch converts the drivers in drivers/net/can/* to use
module_pci_driver() macro which makes the code smaller and a bit simpler.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-can@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
12 years agocan: fix sparse warning for cgw_list
Daniel Baluta [Sun, 25 Mar 2012 23:05:50 +0000 (02:05 +0300)]
can: fix sparse warning for cgw_list

Make cgw_list static to remove the following sparse warning:
net/can/gw.c:69:1: warning: symbol 'cgw_list' was not declared.
Should it be static?

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
12 years agotcp: restore formatting of macros for tcp_skb_cb sacked field
Neal Cardwell [Mon, 16 Apr 2012 07:08:06 +0000 (07:08 +0000)]
tcp: restore formatting of macros for tcp_skb_cb sacked field

Commit b82d1bb4 inadvertendly placed unrelated new code between
TCPCB_EVER_RETRANS and TCPCB_RETRANS and the other macros that refer
to the sacked field in the struct tcp_skb_cb (probably because there
was a misleading empty line there). This commit fixes up the
formatting so that all macros related to the sacked field are adjacent
again.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: remove unused member from atl1_adapter structure
Tony Zelenoff [Sun, 15 Apr 2012 21:27:29 +0000 (21:27 +0000)]
atl1: remove unused member from atl1_adapter structure

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodrivers/net: fix unresolved 64bit math in mellanox/mlx4/en_dcb_nl.c
Paul Gortmaker [Sun, 15 Apr 2012 18:17:34 +0000 (18:17 +0000)]
drivers/net: fix unresolved 64bit math in mellanox/mlx4/en_dcb_nl.c

Commit 109d2446052a484c58f07f71f9457bf7b71017f8

    "net/mlx4_en: Set max rate-limit for a TC"

introduced 64 bit math operations into mlx4_en_dcbnl_ieee_setmaxrate()

causing the following final link failure on an x86_32 allmodconfig

  ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko] undefined!

Convert it to use div_u64() instead.

Cc: Amir Vadai <amirv@mellanox.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 15 Apr 2012 17:19:04 +0000 (13:19 -0400)]
Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/atheros/atlx/atl1.c
drivers/net/ethernet/atheros/atlx/atl1.h

Resolved a conflict between a DMA error bug fix and NAPI
support changes in the atl1 driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agomacvlan: add FDB bridge ops and macvlan flags
John Fastabend [Sun, 15 Apr 2012 06:44:37 +0000 (06:44 +0000)]
macvlan: add FDB bridge ops and macvlan flags

This adds FDB bridge ops to the macvlan device passthru mode.
Additionally a flags field was added and a NOPROMISC bit to
allow users to use passthru mode without the driver calling
dev_set_promiscuity(). The flags field is a u16 placed in a
4 byte hole (consuming 2 bytes) of the macvlan_dev struct.

We want to do this so that the macvlan driver or stack
above the macvlan driver does not have to process every
packet. For the use case where we know all the MAC addresses
of the endstations above us this works well.

This patch is a result of Roopa Prabhu's work. Follow up
patches are needed for VEPA and VEB macvlan modes.

v2: Change from distinct nopromisc mode to a flags field to
    configure this. This avoids the tendency to add a new
    mode every time we need some slightly different behavior.
v3: fix error in dev_set_promiscuity and add change and get
    link attributes for flags.

CC: Roopa Prabhu <roprabhu@cisco.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbe: UTA table incorrectly programmed
Greg Rose [Sun, 15 Apr 2012 06:44:31 +0000 (06:44 +0000)]
ixgbe: UTA table incorrectly programmed

The UTA table was being set to the functional equivalent of promiscuous
mode.  This was resulting in traffic from the virtual function being
flooded onto the wire and the PF device. This resulted in additional
overhead for VF traffic sent to the network and in the case of traffic
sent to the PF or another VF resulted in unwanted packets on the wire.

This was actually not the intended behavior. Now that we can program
the embedded switch correctly we can remove this snippit of code. Users
who want to support this should configure the FDB correctly using the
FDB ops.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbe: allow RAR table to be updated in promisc mode
John Fastabend [Sun, 15 Apr 2012 06:44:25 +0000 (06:44 +0000)]
ixgbe: allow RAR table to be updated in promisc mode

This allows RAR table updates while in promiscuous. With
SR-IOV enabled it is valuable to allow the RAR table to
be updated even when in promisc mode to configure forwarding

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbe: enable FDB netdevice ops
John Fastabend [Sun, 15 Apr 2012 06:44:19 +0000 (06:44 +0000)]
ixgbe: enable FDB netdevice ops

Enable FDB ops on ixgbe when in SR-IOV mode.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: rtnetlink notify events for FDB NTF_SELF adds and deletes
John Fastabend [Sun, 15 Apr 2012 06:44:14 +0000 (06:44 +0000)]
net: rtnetlink notify events for FDB NTF_SELF adds and deletes

It is useful to be able to monitor for FDB events in user space.
This patch adds support to generate netlink events when a change
is made to a device supporting the FDB ops.

This brings embedded switches inline with the SW net/bridge which
triggers events on FDB updates as well.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: add fdb generic dump routine
John Fastabend [Sun, 15 Apr 2012 06:44:08 +0000 (06:44 +0000)]
net: add fdb generic dump routine

This adds a generic dump routine drivers can call. It
should be sufficient to handle any bridging model that
uses the unicast address list. This should be most SR-IOV
enabled NICs.

v2: return error on nlmsg_put and use -EMSGSIZE instead
    of -ENOMEM this is inline other usages

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: addr_list: add exclusive dev_uc_add and dev_mc_add
John Fastabend [Sun, 15 Apr 2012 06:44:02 +0000 (06:44 +0000)]
net: addr_list: add exclusive dev_uc_add and dev_mc_add

This adds a dev_uc_add_excl() and dev_mc_add_excl() calls
similar to the original dev_{uc|mc}_add() except it sets
the global bit and returns -EEXIST for duplicat entires.

This is useful for drivers that support SR-IOV, macvlan
devices and any other devices that need to manage the
unicast and multicast lists.

v2: fix typo UNICAST should be MULTICAST in dev_mc_add_excl()

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: add generic PF_BRIDGE:RTM_ FDB hooks
John Fastabend [Sun, 15 Apr 2012 06:43:56 +0000 (06:43 +0000)]
net: add generic PF_BRIDGE:RTM_ FDB hooks

This adds two new flags NTF_MASTER and NTF_SELF that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the linux net/bridge,
or open-vswitch devices. Also without any flags set the command
will be handled by the master device as well so that current user
space tools continue to work as expected.

The NTF_SELF flag will push the PF_BRIDGE commands to the
device. In the basic example below the commands are then parsed
and programmed in the embedded bridge.

Note if both NTF_SELF and NTF_MASTER bits are set then the
command will be sent to both 'dev->master' and 'dev' this allows
user space to easily keep the embedded bridge and software bridge
in sync.

There is a slight complication in the case with both flags set
when an error occurs. To resolve this the rtnl handler clears
the NTF_ flag in the netlink ack to indicate which sets completed
successfully. The add/del handlers will abort as soon as any
error occurs.

To support this new net device ops were added to call into
the device and the existing bridging code was refactored
to use these. There should be no required changes in user space
to support the current bridge behavior.

A basic setup with a SR-IOV enabled NIC looks like this,

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
  ethx.y      ethx
    VF         PF
     \         \          <---- propagate FDB entries to HW
     \         \
  --------------------
  |  Embedded Bridge |    <---- hardware offloaded switching
  --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present drivers managing
the embedded bridge either send frames onto the network which
then get dropped by the switch OR the embedded bridge will flood
these frames. With this patch we have a mechanism to manage the
embedded bridge correctly from user space. This example is specific
to SR-IOV but replacing the VF with another PF or dropping this
into the DSA framework generates similar management issues.

Examples session using the 'br'[1] tool to add, dump and then
delete a mac address with a new "embedded" option and enabled
ixgbe driver:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
#br fdb add 22:35:19:ac:60:59 embedded dev eth3
#br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
#br fdb del 22:35:19:ac:60:59 embedded dev eth3

I added a couple lines to 'br' to set the flags correctly is all. It
is my opinion that the merit of this patch is now embedded and SW
bridges can both be modeled correctly in user space using very nearly
the same message passing.

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
    http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

v2: fixed api descriptions and error case with both NTF_SELF and
    NTF_MASTER set plus updated patch description.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: do not drop rx/tx interrupts before they are scheduled
Tony Zelenoff [Fri, 13 Apr 2012 06:09:54 +0000 (06:09 +0000)]
atl1: do not drop rx/tx interrupts before they are scheduled

To prevent interrupts lost they should be dropped only if
they are scheduled via napi interfaces. In other case, there is
exists situation when napi handler process TX interrupt, stay in
RX processing and in that moment any other interrupt received.
Then before this patch TX bit in ISR will be cleaned, napi
schedule will not occur in case of currently processing event and
TX interrupt definitely will be lost.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: do not process interrupts in cycle in handler
Tony Zelenoff [Fri, 13 Apr 2012 06:09:53 +0000 (06:09 +0000)]
atl1: do not process interrupts in cycle in handler

As the rx/tx handled inside napi handler, the cycle is
not needed now, because only the rx/tx need such kind of
processing.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: enable errors and link ints when rx/tx scheduled
Tony Zelenoff [Fri, 13 Apr 2012 06:09:52 +0000 (06:09 +0000)]
atl1: enable errors and link ints when rx/tx scheduled

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: add value to check ability of reenabling IRQs
Tony Zelenoff [Fri, 13 Apr 2012 06:09:51 +0000 (06:09 +0000)]
atl1: add value to check ability of reenabling IRQs

Unfortunately it is not clear from code is usage of
IMR register possible or not. So, to prevent possible
side-effects of reading this register i prefer store
interrupts enable flag separately.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: make function to set imr of card
Tony Zelenoff [Fri, 13 Apr 2012 06:09:50 +0000 (06:09 +0000)]
atl1: make function to set imr of card

This function should be used later to set/remove proper
bits in imr to disable only rx ints.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: use defined functions to disable irq
Tony Zelenoff [Fri, 13 Apr 2012 06:09:49 +0000 (06:09 +0000)]
atl1: use defined functions to disable irq

Looks like direct writes to IMR register is not good idea,
because there are exist functions to make this work.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: add napi process of tx interrupts
Tony Zelenoff [Fri, 13 Apr 2012 06:09:48 +0000 (06:09 +0000)]
atl1: add napi process of tx interrupts

Make the tx ints processing same as rx ones via napi.
The idea got from e1000. The interrupt disabling is
still not fine grained.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: make driver napi compatible
Tony Zelenoff [Fri, 13 Apr 2012 06:09:47 +0000 (06:09 +0000)]
atl1: make driver napi compatible

This is first step, here there is no fine interrupt
disabling which cause TX/ERR interrupts stalling when
RX scheduled ints processed.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: handle rx in separate condition
Tony Zelenoff [Fri, 13 Apr 2012 06:09:46 +0000 (06:09 +0000)]
atl1: handle rx in separate condition

Remove rx from unlikely optimization in case of rx is very
likely thing for network card. This also reduce code a bit.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobridge: Add multicast_querier toggle and disable queries by default
Herbert Xu [Fri, 13 Apr 2012 02:37:42 +0000 (02:37 +0000)]
bridge: Add multicast_querier toggle and disable queries by default

Sending general queries was implemented as an optimisation to speed
up convergence on start-up.  In order to prevent interference with
multicast routers a zero source address has to be used.

Unfortunately these packets appear to cause some multicast-aware
switches to misbehave, e.g., by disrupting multicast packets to us.

Since the multicast snooping feature still functions without sending
our own queries, this patch will change the default to not send
queries.

For those that need queries in order to speed up convergence on start-up,
a toggle is provided to restore the previous behaviour.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobridge: Restart queries when last querier expires
Herbert Xu [Fri, 13 Apr 2012 02:37:42 +0000 (02:37 +0000)]
bridge: Restart queries when last querier expires

As it stands when we discover that a real querier (one that queries
with a non-zero source address) we stop querying.  However, even
after said querier has fallen off the edge of the earth, we will
never restart querying (unless the bridge itself is restarted).

This patch fixes this by kicking our own querier into gear when
the timer for other queriers expire.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobridge: Add br_multicast_start_querier
Herbert Xu [Fri, 13 Apr 2012 02:37:42 +0000 (02:37 +0000)]
bridge: Add br_multicast_start_querier

This patch adds the helper br_multicast_start_querier so that
the code which starts the queriers in br_multicast_toggle can
be reused elsewhere.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: cleanup unsigned to unsigned int
Eric Dumazet [Sun, 15 Apr 2012 05:58:06 +0000 (05:58 +0000)]
net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoipv4: fix checkpatch errors
Daniel Baluta [Sun, 15 Apr 2012 01:34:41 +0000 (01:34 +0000)]
ipv4: fix checkpatch errors

Fix checkpatch errors of the following type:
* ERROR: "foo * bar" should be "foo *bar"
* ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agovirtio-net: send gratuitous packets when needed
Jason Wang [Wed, 11 Apr 2012 20:43:52 +0000 (20:43 +0000)]
virtio-net: send gratuitous packets when needed

As hypervior does not have the knowledge of guest network configuration, it's
better to ask guest to send gratuitous packets when needed.

This patch implements VIRTIO_NET_F_GUEST_ANNOUNCE feature: hypervisor would
notice the guest when it thinks it's time for guest to announce the link
presnece. Guest tests VIRTIO_NET_S_ANNOUNCE bit during config change interrupt
and woule send gratuitous packets through netif_notify_peers() and ack the
notification through ctrl vq.

We need to make sure the atomicy of read and ack in guest otherwise we may ack
more times than being notified. This is done through handling the whole config
change interrupt in an non-reentrant workqueue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoisdn/hysdn: Convert to kstrtoul_from_user
Peter Hüwe [Sat, 14 Apr 2012 13:42:59 +0000 (13:42 +0000)]
isdn/hysdn: Convert to kstrtoul_from_user

This patch replaces the code for getting an number from a
userspace buffer by a simple call to kstroul_from_user.
This makes it easier to read and less error prone.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoipv6: Remove unused argument to addrconf_dad_start().
David S. Miller [Sun, 15 Apr 2012 01:37:40 +0000 (21:37 -0400)]
ipv6: Remove unused argument to addrconf_dad_start().

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: Fix spelling typo in net
Masanari Iida [Fri, 13 Apr 2012 04:33:20 +0000 (04:33 +0000)]
net: Fix spelling typo in net

Correct spelling typo within drivers/net.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotcp: Remove redundant code entering quickack mode
Vijay Subramanian [Fri, 13 Apr 2012 13:23:59 +0000 (13:23 +0000)]
tcp: Remove redundant code entering quickack mode

tcp_enter_quickack_mode() already calls tcp_incr_quickack() and sets
icsk->icsk_ack.ato  to TCP_ATO_MIN. This patch removes the duplication.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotcp: bind() use stronger condition for bind_conflict
Alex Copot [Thu, 12 Apr 2012 22:21:45 +0000 (22:21 +0000)]
tcp: bind() use stronger condition for bind_conflict

We must try harder to get unique (addr, port) pairs when
doing port autoselection for sockets with SO_REUSEADDR
option set.

We achieve this by adding a relaxation parameter to
inet_csk_bind_conflict. When 'relax' parameter is off
we return a conflict whenever the current searched
pair (addr, port) is not unique.

This tries to address the problems reported in patch:
8d238b25b1ec22a73b1c2206f111df2faaff8285
Revert "tcp: bind() fix when many ports are bound"

Tests where ran for creating and binding(0) many sockets
on 100 IPs. The results are, on average:

* 60000 sockets, 600 ports / IP:
* 0.210 s, 620 (IP, port) duplicates without patch
* 0.219 s, no duplicates with patch
* 100000 sockets, 1000 ports / IP:
* 0.371 s, 1720 duplicates without patch
* 0.373 s, no duplicates with patch
* 200000 sockets, 2000 ports / IP:
* 0.766 s, 6900 duplicates without patch
* 0.768 s, no duplicates with patch
* 500000 sockets, 5000 ports / IP:
* 2.227 s, 41500 duplicates without patch
* 2.284 s, no duplicates with patch

Signed-off-by: Alex Copot <alex.mihai.c@gmail.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoinet: makes syn_ack_timeout mandatory
Eric Dumazet [Thu, 12 Apr 2012 22:16:05 +0000 (22:16 +0000)]
inet: makes syn_ack_timeout mandatory

There are two struct request_sock_ops providers, tcp and dccp.

inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout being
NULL if we make it non NULL like syn_ack_timeout

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agotcp: RFC6298 supersedes RFC2988bis
Eric Dumazet [Thu, 12 Apr 2012 19:48:40 +0000 (19:48 +0000)]
tcp: RFC6298 supersedes RFC2988bis

Updates some comments to track RFC6298

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/ethernet: ks8851_mll fix rx frame buffer overflow
Davide Ciminaghi [Fri, 13 Apr 2012 04:48:25 +0000 (04:48 +0000)]
net/ethernet: ks8851_mll fix rx frame buffer overflow

At the beginning of ks_rcv(), a for loop retrieves the
header information relevant to all the frames stored
in the mac's internal buffers. The number of pending
frames is stored as an 8 bits field in KS_RXFCTR.
If interrupts are disabled long enough to allow for more than
32 frames to accumulate in the MAC's internal buffers, a buffer
overflow occurs.
This patch fixes the problem by making the
driver's frame_head_info buffer big enough.
Well actually, since the chip appears to have 12K of
internal rx buffers and the shortest ethernet frame should
be 64 bytes long, maybe the limit could be set to
12*1024/64 = 192 frames, but 255 should be safer.

Signed-off-by: Davide Ciminaghi <ciminaghi@gnudd.com>
Signed-off-by: Raffaele Recalcati <raffaele.recalcati@bticino.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/wan: use module_pci_driver
Axel Lin [Fri, 13 Apr 2012 18:41:21 +0000 (18:41 +0000)]
net/wan: use module_pci_driver

This patch converts the drivers in drivers/net/wan/* to use
module_pci_driver() macro which makes the code smaller and a bit simpler.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/tokenring: use module_pci_driver
Axel Lin [Fri, 13 Apr 2012 18:40:17 +0000 (18:40 +0000)]
net/tokenring: use module_pci_driver

This patch converts the drivers in drivers/net/tokenring/* to use
module_pci_driver() macro which makes the code smaller and a bit simpler.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Sat, 14 Apr 2012 19:17:54 +0000 (15:17 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

12 years agotunnel: implement 64 bits statistics
stephen hemminger [Thu, 12 Apr 2012 06:31:16 +0000 (06:31 +0000)]
tunnel: implement 64 bits statistics

Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels
to use 64 bit statistics.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoixgbe: add I2C clock stretching
Don Skidmore [Thu, 15 Mar 2012 07:36:37 +0000 (07:36 +0000)]
ixgbe: add I2C clock stretching

This patch adds support for I2C clock stretching which is required per
SFF-8636.  Customers with passive DA cables implement clock stretching
would fail without this patch.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoigb: Update version to 3.4.7.
Carolyn Wyborny [Mon, 9 Apr 2012 23:13:02 +0000 (23:13 +0000)]
igb: Update version to 3.4.7.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoe1000e: cleanup boolean logic
Bruce Allan [Fri, 13 Apr 2012 00:08:31 +0000 (00:08 +0000)]
e1000e: cleanup boolean logic

Replace occurrences of 'if (<bool expr> == <1|0>)' with
'if ([!]<bool expr>)'

Replace occurrences of '<bool var> = (<non-bool expr>) ? true : false'
with '<bool var> = <non-bool expr>'.

Replace occurrence of '<bool var> = <non-bool expr>' with
'<bool var> = !!<non-bool expr>'

While the latter replacement is not really necessary, it is done here for
consistency and clarity.  No functional changes.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoe1000e: cleanup remaining strings split across multiple lines
Bruce Allan [Thu, 12 Apr 2012 05:47:09 +0000 (05:47 +0000)]
e1000e: cleanup remaining strings split across multiple lines

Now that split strings generate checkpatch warnings (per Chapter 2 of
Documentation/CodingStyle to make it easier to grep the code for the
string) cleanup the remaining instances of them in the driver.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoe100: enable transmit time stamping.
Richard Cochran [Sun, 8 Apr 2012 14:38:10 +0000 (14:38 +0000)]
e100: enable transmit time stamping.

This patch enables software (and phy device) transmit time stamping.
Tested on an old PIII laptop with built in NIC.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoe100: Support the get_ts_info ethtool method.
Richard Cochran [Wed, 4 Apr 2012 17:43:31 +0000 (17:43 +0000)]
e100: Support the get_ts_info ethtool method.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoixgbe: fix WoL issue with fiber
Don Skidmore [Thu, 5 Apr 2012 08:12:05 +0000 (08:12 +0000)]
ixgbe: fix WoL issue with fiber

There are times we turn of the laser before shutdown.  This is a bad thing
if we want to wake on lan to work so now we make sure the laser is on
before shutdown if we support WoL.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agoe1000e: issues in Sx on 82577/8/9
Bruce Allan [Thu, 12 Apr 2012 06:27:03 +0000 (06:27 +0000)]
e1000e: issues in Sx on 82577/8/9

A workaround was previously put in the driver to reset the device when
transitioning to Sx in order to activate the changed settings of the PHY
OEM bits (Low Power Link Up, or LPLU, and GbE disable configuration) for
82577/8/9 devices.  After further review, it was found such a reset can
cause the 82579 to confuse which version of 82579 it actually is and broke
LPLU on all 82577/8/9 devices.  The workaround during an S0->Sx transition
on 82579 (instead of resetting the PHY) is to restart auto-negotiation
after the OEM bits are configured; the restart of auto-negotiation
activates the new OEM bits as does the reset.  With 82577/8, the reset is
changed to a generic reset which fixes the LPLU bits getting set wrong.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
12 years agobonding: Fixup get_tx_queue() op second arg type.
David S. Miller [Fri, 13 Apr 2012 19:02:35 +0000 (15:02 -0400)]
bonding: Fixup get_tx_queue() op second arg type.

I missed this when fixing up the warning in the previous commit.

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agortnetlink: ops->get_tx_queue() cannot take a const 'tb'.
David S. Miller [Fri, 13 Apr 2012 18:21:04 +0000 (14:21 -0400)]
rtnetlink: ops->get_tx_queue() cannot take a const 'tb'.

net/core/rtnetlink.c: In function ‘rtnl_create_link’:
net/core/rtnetlink.c:1645:3: warning: passing argument 2 of ‘ops->get_tx_queues’ from incompatible pointer type [enabled by default]
net/core/rtnetlink.c:1645:3: note: expected ‘const struct nlattr **’ but argument is of type ‘struct nlattr **’

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: smsc911x: fix skb handling in receive path
Will Deacon [Thu, 12 Apr 2012 05:54:09 +0000 (05:54 +0000)]
net: smsc911x: fix skb handling in receive path

The SMSC911x driver resets the ->head, ->data and ->tail pointers in the
skb on the reset path in order to avoid buffer overflow due to packet
padding performed by the hardware.

This patch fixes the receive path so that the skb pointers are fixed up
after the data has been read from the device, The error path is also
fixed to use number of words consistently and prevent erroneous FIFO
fastforwarding when skipping over bad data.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoks8851: Fix missing mutex_lock/unlock
Matt Renzelmann [Fri, 13 Apr 2012 07:59:40 +0000 (07:59 +0000)]
ks8851: Fix missing mutex_lock/unlock

Move the ks8851_rdreg16 call above the call to request_irq and cache
the result for subsequent repeated use.  A spurious interrupt may
otherwise cause a crash.  Thanks to Stephen Boyd, Flavio Leitner, and
Ben Hutchings for feedback.

Signed-off-by: Matt Renzelmann <mjr@cs.wisc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoPhonet: change maintainer address
Rémi Denis-Courmont [Thu, 12 Apr 2012 03:39:18 +0000 (03:39 +0000)]
Phonet: change maintainer address

nokia.com MX does not cope well with kernel.org.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoPhonet: missing headers (sparse)
Rémi Denis-Courmont [Thu, 12 Apr 2012 03:39:17 +0000 (03:39 +0000)]
Phonet: missing headers (sparse)

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoPhonet: phonet_net_id can be static (sparse)
Rémi Denis-Courmont [Thu, 12 Apr 2012 03:39:16 +0000 (03:39 +0000)]
Phonet: phonet_net_id can be static (sparse)

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoneighbour: Make neigh_table_init_no_netlink() static.
Hiroaki SHIMODA [Fri, 13 Apr 2012 07:34:44 +0000 (07:34 +0000)]
neighbour: Make neigh_table_init_no_netlink() static.

neigh_table_init_no_netlink() is only used in net/core/neighbour.c file.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agodrivers/net/ethernet/xilinx/axi ethernet: Correct Copyright
Michal Simek [Thu, 12 Apr 2012 01:11:12 +0000 (01:11 +0000)]
drivers/net/ethernet/xilinx/axi ethernet: Correct Copyright

Also fix MAINTAINERS file to reflect autorship.

Daniel and Ariane changed coding style but not any functional changes in the driver
itself.

Signed-off-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years ago8139cp: set intr mask after its handler is registered
Jason Wang [Wed, 11 Apr 2012 22:10:54 +0000 (22:10 +0000)]
8139cp: set intr mask after its handler is registered

We set intr mask before its handler is registered, this does not work well when
8139cp is sharing irq line with other devices. As the irq could be enabled by
the device before 8139cp's hander is registered which may lead unhandled
irq. Fix this by introducing an helper cp_irq_enable() and call it after
request_irq().

Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoudp: intoduce udp_encap_needed static_key
Eric Dumazet [Wed, 11 Apr 2012 23:05:28 +0000 (23:05 +0000)]
udp: intoduce udp_encap_needed static_key

Most machines dont use UDP encapsulation (L2TP)

Adds a static_key so that udp_queue_rcv_skb() doesnt have to perform a
test if L2TP never setup the encap_rcv on a socket.

Idea of this patch came after Simon Horman proposal to add a hook on TCP
as well.

If static_key is not yet enabled, the fast path does a single JMP .

When static_key is enabled, JMP destination is patched to reach the real
encap_type/encap_rcv logic, possibly adding cache misses.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoatl1: fix kernel panic in case of DMA errors
Tony Zelenoff [Wed, 11 Apr 2012 06:15:03 +0000 (06:15 +0000)]
atl1: fix kernel panic in case of DMA errors

Problem:
There was two separate work_struct structures which share one
handler. Unfortunately getting atl1_adapter structure from
work_struct in case of DMA error was done from incorrect
offset which cause kernel panics.

Solution:
The useless work_struct for DMA error removed and
handler name changed to more generic one.

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: WIZnet drivers: fix possible NULL dereference
Mike Sinkovsky [Tue, 10 Apr 2012 19:53:53 +0000 (19:53 +0000)]
net: WIZnet drivers: fix possible NULL dereference

This fixes possible null dereference in probe() function: when both
.mac_addr and .link_gpio are unknown, dev.platform_data may be NULL

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Sinkovsky <msink@permonline.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agortnetlink: fix comments
stephen hemminger [Tue, 10 Apr 2012 18:32:59 +0000 (18:32 +0000)]
rtnetlink: fix comments

Fix spelling and references in rtnetlink.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agortnetlink & bonding: change args got get_tx_queues
stephen hemminger [Tue, 10 Apr 2012 18:34:43 +0000 (18:34 +0000)]
rtnetlink & bonding: change args got get_tx_queues

Change get_tx_queues, drop unsused arg/return value real_tx_queues,
and use return by value (with error) rather than call by reference.

Probably bonding should just change to LLTX and the whole get_tx_queues
API could disappear!

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoethernet: replace open-coded ARRAY_SIZE with macro
Jim Cromie [Tue, 10 Apr 2012 14:56:22 +0000 (14:56 +0000)]
ethernet: replace open-coded ARRAY_SIZE with macro

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoenic: replace open-coded ARRAY_SIZE with macro
Jim Cromie [Tue, 10 Apr 2012 14:56:09 +0000 (14:56 +0000)]
enic: replace open-coded ARRAY_SIZE with macro

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agobroadcom: replace open-coded ARRAY_SIZE with macro
Jim Cromie [Tue, 10 Apr 2012 14:56:03 +0000 (14:56 +0000)]
broadcom: replace open-coded ARRAY_SIZE with macro

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: Remove redundant spi driver bus initialization
Lars-Peter Clausen [Tue, 10 Apr 2012 10:51:29 +0000 (10:51 +0000)]
net: Remove redundant spi driver bus initialization

In ancient times it was necessary to manually initialize the bus field of an
spi_driver to spi_bus_type. These days this is done in spi_driver_register() so
we can drop the manual assignment.

The patch was generated using the following coccinelle semantic patch:
// <smpl>
@@
identifier _driver;
@@
struct spi_driver _driver = {
.driver = {
- .bus = &spi_bus_type,
},
};
// </smpl>

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Gabor Juhos <juhosg@openwrt.org>
Cc: Frederic Lambert <frdrc66@gmail.com>
Cc: netdev@vger.kernel.org
Acked-by: Gabor Juhos <juhosg@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/garp: fix GID rbtree ordering
David Ward [Mon, 9 Apr 2012 04:13:53 +0000 (04:13 +0000)]
net/garp: fix GID rbtree ordering

The comparison operators were backwards in both garp_attr_lookup and
garp_attr_create, so the entire GID rbtree was in reverse order.
(There was no practical side effect to this though, except that PDUs
were sent with attributes listed in reverse order, which is still
valid by the protocol. This change is only for clarity.)

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoskbuff: struct ubuf_info callback type safety
Michael S. Tsirkin [Mon, 9 Apr 2012 00:24:02 +0000 (00:24 +0000)]
skbuff: struct ubuf_info callback type safety

The skb struct ubuf_info callback gets passed struct ubuf_info
itself, not the arg value as the field name and the function signature
seem to imply. Rename the arg field to ctx to match usage,
add documentation and change the callback argument type
to make usage clear and to have compiler check correctness.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoppp: Fix race condition with queue start/stop
David Woodhouse [Sun, 8 Apr 2012 10:01:44 +0000 (10:01 +0000)]
ppp: Fix race condition with queue start/stop

Commit e675f0cc9a872fd152edc0c77acfed19bf28b81e ("ppp: Don't stop and
restart queue on every TX packet") introduced a race condition which
could leave the net queue stopped even when the channel is no longer
busy. By calling netif_stop_queue() from ppp_start_xmit(), based on the
return value from ppp_xmit_process() but *after* all the locks have been
dropped, we could potentially do so *after* the channel has actually
finished transmitting and attempted to re-wake the queue.

Fix this by moving the netif_stop_queue() into ppp_xmit_process() under
the xmit lock. I hadn't done this previously, because it gets called
from other places than ppp_start_xmit(). But I now think it's the better
option. The net queue *should* be stopped if the channel becomes
congested due to writes from pppd, anyway.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agopppoatm: Fix excessive queue bloat
David Woodhouse [Sun, 8 Apr 2012 09:55:43 +0000 (09:55 +0000)]
pppoatm: Fix excessive queue bloat

We discovered that PPPoATM has an excessively deep transmit queue. A
queue the size of the default socket send buffer (wmem_default) is
maintained between the PPP generic core and the ATM device.

Fix it to queue a maximum of *two* packets. The one the ATM device is
currently working on, and one more for the ATM driver to process
immediately in its TX done interrupt handler. The PPP core is designed
to feed packets to the channel with minimal latency, so that really
ought to be enough to keep the ATM device busy.

While we're at it, fix the fact that we were triggering the wakeup
tasklet on *every* pppoatm_pop() call. The comment saying "this is
inefficient, but doing it right is too hard" turns out to be overly
pessimistic... I think :)

On machines like the Traverse Geos, with a slow Geode CPU and two
high-speed ADSL2+ interfaces, there were reports of extremely high CPU
usage which could partly be attributed to the extra wakeups.

(The wakeup handling could actually be made a whole lot easier if we
 stop checking sk->sk_sndbuf altogether. Given that we now only queue
 *two* packets ever, one wonders what the point is. As it is, you could
 already deadlock the thing by setting the sk_sndbuf to a value lower
 than the MTU of the device, and it'd just block for ever.)

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoipv6: fix problem with expired dst cache
Gao feng [Fri, 6 Apr 2012 00:13:10 +0000 (00:13 +0000)]
ipv6: fix problem with expired dst cache

If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet.
this dst cache will not check expire because it has no RTF_EXPIRES flag.
So this dst cache will always be used until the dst gc run.

Change the struct dst_entry,add a union contains new pointer from and expires.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use.
we can use this field to point to where the dst cache copy from.
The dst.from is only used in IPV6.

rt6_check_expired check if rt6_info.dst.from is expired.

ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.

ip6_dst_destroy release the ort.

Add some functions to operate the RTF_EXPIRES flag and expires(from) together.
and change the code to use these new adding functions.

Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use new adding functions.

Only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocaif-hsi: Postpone init of HSI until open()
sjur.brandeland@stericsson.com [Thu, 12 Apr 2012 08:27:27 +0000 (08:27 +0000)]
caif-hsi: Postpone init of HSI until open()

Do the initialization of the HSI interface when the
interface is opened, instead of upon registration.
When the interface is closed the HSI interface is
de-initialized, allowing other modules to use the
HSI interface.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocaif-hsi: Remove stop/start of queue.
sjur.brandeland@stericsson.com [Thu, 12 Apr 2012 08:27:26 +0000 (08:27 +0000)]
caif-hsi: Remove stop/start of queue.

CAIF HSI is currently a virtual device. Stopping/starting the
queues is wrong on a virtual device.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocaif-hsi: robust frame aggregation for HSI
Dmitry Tarnyagin [Thu, 12 Apr 2012 08:27:25 +0000 (08:27 +0000)]
caif-hsi: robust frame aggregation for HSI

Implement aggregation algorithm, combining more data into a single
HSI transfer. 4 different traffic categories are supported:
 1. TC_PRIO_CONTROL .. TC_PRIO_MAX (CTL)
 2. TC_PRIO_INTERACTIVE            (VO)
 3. TC_PRIO_INTERACTIVE_BULK       (VI)
 4. TC_PRIO_BESTEFFORT, TC_PRIO_BULK, TC_PRIO_FILLER (BEBK)

Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocaif: set traffic class for caif packets
Dmitry Tarnyagin [Thu, 12 Apr 2012 08:27:24 +0000 (08:27 +0000)]
caif: set traffic class for caif packets

Set traffic class for CAIF packets, based on socket
priority, CAIF protocol type, or type of message.

Traffic class mapping for different packet types:
 - control:       TC_PRIO_CONTROL;
 - flow control:  TC_PRIO_CONTROL;
 - at:            TC_PRIO_CONTROL;
 - rfm:           TC_PRIO_INTERACTIVE_BULK;
 - other sockets: equals to socket's TC;
 - network data:  no change.

Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Fri, 13 Apr 2012 15:30:55 +0000 (11:30 -0400)]
Merge git://git./linux/kernel/git/davem/net

Pull in the 'net' tree to get CAIF bug fixes upon which
the following set of CAIF feature patches depend.

Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocaif_hsi: use dev_dbg not dev_err for reporting
Kim Lilliestierna XX [Thu, 12 Apr 2012 08:18:09 +0000 (08:18 +0000)]
caif_hsi: use dev_dbg not dev_err for reporting

Use dev_dbg instead of dev_err for reporting in cfhsi_wakeup_cb.

Signed-off-by: Kim Lilliestierna <kim.xx.lilliestierna@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocaif-hsi: Free flip_buffer at shutdown
sjur.brandeland@stericsson.com [Thu, 12 Apr 2012 08:18:08 +0000 (08:18 +0000)]
caif-hsi: Free flip_buffer at shutdown

Fix memory leak of RX flip-buffer.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agocaif: Fix memory leakage in the chnl_net.c.
Tomasz Gregorek [Thu, 12 Apr 2012 08:18:07 +0000 (08:18 +0000)]
caif: Fix memory leakage in the chnl_net.c.

Added kfree_skb() calls in the chnk_net.c file on
the error paths.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agol2tp: don't overwrite source address in l2tp_ip_bind()
James Chapman [Tue, 10 Apr 2012 00:10:43 +0000 (00:10 +0000)]
l2tp: don't overwrite source address in l2tp_ip_bind()

Applications using L2TP/IP sockets want to be able to bind() an L2TP/IP
socket to set the local tunnel id while leaving the auto-assigned source
address alone. So if no source address is supplied, don't overwrite
the address already stored in the socket.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agol2tp: fix refcount leak in l2tp_ip sockets
James Chapman [Tue, 10 Apr 2012 00:10:42 +0000 (00:10 +0000)]
l2tp: fix refcount leak in l2tp_ip sockets

The l2tp_ip socket close handler does not update the module refcount
correctly which prevents module unload after the first bind() call on
an L2TPv3 IP encapulation socket.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: Fix misplaced parenthesis in virtio_net.c
Torsten Kaiser [Mon, 9 Apr 2012 05:14:15 +0000 (05:14 +0000)]
net: Fix misplaced parenthesis in virtio_net.c

Commit 2e57b79ccef1ff1422fdf45a9b28fe60f8f084f7 misplaced its
parenthesis and now tx_fifo_errors will only be incremented if an
ENOMEM error is not written to the syslog.

Correct the parenthesis and indentation to the original goal of
counting all non ENOMEM errors and ratelimiting only the messages.

Signed-of-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/key/af_key.c: add missing kfree_skb
Julia Lawall [Sun, 8 Apr 2012 22:41:10 +0000 (22:41 +0000)]
net/key/af_key.c: add missing kfree_skb

At the point of this error-handling code, alloc_skb has succeded, so free
the resulting skb by jumping to the err label.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agophonet: Sort out initiailziation and cleanup code.
Eric W. Biederman [Fri, 6 Apr 2012 15:35:39 +0000 (15:35 +0000)]
phonet: Sort out initiailziation and cleanup code.

Recently an oops was reported in phonet if there was a failure during
network namespace creation.

[  163.733755] ------------[ cut here ]------------
[  163.734501] kernel BUG at include/net/netns/generic.h:45!
[  163.734501] invalid opcode: 0000 [#1] PREEMPT SMP
[  163.734501] CPU 2
[  163.734501] Pid: 19145, comm: trinity Tainted: G        W 3.4.0-rc1-next-20120405-sasha-dirty #57
[  163.734501] RIP: 0010:[<ffffffff824d6062>]  [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
[  163.734501] RSP: 0018:ffff8800674d5ca8  EFLAGS: 00010246
[  163.734501] RAX: 000000003fffffff RBX: 0000000000000000 RCX: ffff8800678c88d8
[  163.734501] RDX: 00000000003f4000 RSI: ffff8800678c8910 RDI: 0000000000000282
[  163.734501] RBP: ffff8800674d5cc8 R08: 0000000000000000 R09: 0000000000000000
[  163.734501] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880068bec920
[  163.734501] R13: ffffffff836b90c0 R14: 0000000000000000 R15: 0000000000000000
[  163.734501] FS:  00007f055e8de700(0000) GS:ffff88007d000000(0000) knlGS:0000000000000000
[  163.734501] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  163.734501] CR2: 00007f055e6bb518 CR3: 0000000070c16000 CR4: 00000000000406e0
[  163.734501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  163.734501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  163.734501] Process trinity (pid: 19145, threadinfo ffff8800674d4000, task ffff8800678c8000)
[  163.734501] Stack:
[  163.734501]  ffffffff824d5f00 ffffffff810e2ec1 ffff880067ae0000 00000000ffffffd4
[  163.734501]  ffff8800674d5cf8 ffffffff824d667a ffff880067ae0000 00000000ffffffd4
[  163.734501]  ffffffff836b90c0 0000000000000000 ffff8800674d5d18 ffffffff824d707d
[  163.734501] Call Trace:
[  163.734501]  [<ffffffff824d5f00>] ? phonet_pernet+0x20/0x1a0
[  163.734501]  [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[  163.734501]  [<ffffffff824d667a>] phonet_device_destroy+0x1a/0x100
[  163.734501]  [<ffffffff824d707d>] phonet_device_notify+0x3d/0x50
[  163.734501]  [<ffffffff810dd96e>] notifier_call_chain+0xee/0x130
[  163.734501]  [<ffffffff810dd9d1>] raw_notifier_call_chain+0x11/0x20
[  163.734501]  [<ffffffff821cce12>] call_netdevice_notifiers+0x52/0x60
[  163.734501]  [<ffffffff821cd235>] rollback_registered_many+0x185/0x270
[  163.734501]  [<ffffffff821cd334>] unregister_netdevice_many+0x14/0x60
[  163.734501]  [<ffffffff823123e3>] ipip_exit_net+0x1b3/0x1d0
[  163.734501]  [<ffffffff82312230>] ? ipip_rcv+0x420/0x420
[  163.734501]  [<ffffffff821c8515>] ops_exit_list+0x35/0x70
[  163.734501]  [<ffffffff821c911b>] setup_net+0xab/0xe0
[  163.734501]  [<ffffffff821c9416>] copy_net_ns+0x76/0x100
[  163.734501]  [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
[  163.734501]  [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
[  163.734501]  [<ffffffff810afd1f>] sys_unshare+0xff/0x290
[  163.734501]  [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  163.734501]  [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
[  163.734501] Code: e0 c3 fe 66 0f 1f 44 00 00 48 c7 c2 40 60 4d 82 be 01 00 00 00 48 c7 c7 80 d1 23 83 e8 48 2a c4 fe e8 73 06 c8 fe 48 85 db 75 0e <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 83 c4 10 48 89 d8
[  163.734501] RIP  [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
[  163.734501]  RSP <ffff8800674d5ca8>
[  163.861289] ---[ end trace fb5615826c548066 ]---

After investigation it turns out there were two issues.
1) Phonet was not implementing network devices but was using register_pernet_device
   instead of register_pernet_subsys.

   This was allowing there to be cases when phonenet was not initialized and
   the phonet net_generic was not set for a network namespace when network
   device events were being reported on the netdevice_notifier for a network
   namespace leading to the oops above.

2) phonet_exit_net was implementing a confusing and special case of handling all
   network devices from going away that it was hard to see was correct, and would
   only occur when the phonet module was removed.

   Now that unregister_netdevice_notifier has been modified to synthesize unregistration
   events for the network devices that are extant when called this confusing special
   case in phonet_exit_net is no longer needed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet: In unregister_netdevice_notifier unregister the netdevices.
Eric W. Biederman [Fri, 6 Apr 2012 15:33:35 +0000 (15:33 +0000)]
net: In unregister_netdevice_notifier unregister the netdevices.

We already synthesize events in register_netdevice_notifier and synthesizing
events in unregister_netdevice_notifier allows to us remove the need for
special case cleanup code.

This change should be safe as it adds no new cases for existing callers
of unregiser_netdevice_notifier to handle.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Fri, 13 Apr 2012 00:12:31 +0000 (20:12 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next

12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Thu, 12 Apr 2012 23:41:23 +0000 (19:41 -0400)]
Merge git://git./linux/kernel/git/davem/net

12 years agonet/ipv6/exthdrs.c: Strict PadN option checking
Eldad Zack [Thu, 12 Apr 2012 21:36:17 +0000 (17:36 -0400)]
net/ipv6/exthdrs.c: Strict PadN option checking

Added strict checking of PadN, as PadN can be used to increase header
size and thus push the protocol header into the 2nd fragment.

PadN is used to align the options within the Hop-by-Hop or
Destination Options header to 64-bit boundaries. The maximum valid
size is thus 7 bytes.
RFC 4942 recommends to actively check the "payload" itself and
ensure that it contains only zeroes.

See also RFC 4942 section 2.1.9.5.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge branch 'akpm' (Andrew's patch-bomb)
Linus Torvalds [Thu, 12 Apr 2012 21:15:21 +0000 (14:15 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)

Merge fixes from Andrew Morton.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (14 patches)
  panic: fix stack dump print on direct call to panic()
  drivers/rtc/rtc-pl031.c: enable clock on all ST variants
  Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()"
  hugetlb: fix race condition in hugetlb_fault()
  drivers/rtc/rtc-twl.c: use static register while reading time
  drivers/rtc/rtc-s3c.c: add placeholder for driver private data
  drivers/rtc/rtc-s3c.c: fix compilation error
  MAINTAINERS: add PCDP console maintainer
  memcg: do not open code accesses to res_counter members
  drivers/rtc/rtc-efi.c: fix section mismatch warning
  drivers/rtc/rtc-r9701.c: reset registers if invalid values are detected
  drivers/char/random.c: fix boot id uniqueness race
  memcg: fix broken boolen expression
  memcg: fix up documentation on global LRU

12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 12 Apr 2012 21:04:33 +0000 (14:04 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix bluetooth userland regression reported by Keith Packard, from
    Gustavo Padovan.

 2) Revert ath9k PS idle change, from Sujith Manoharan.

 3) Correct default TCP memory limits (again), from Eric Dumazet.

 4) Fix tcp_rcv_rtt_update() accidental use of unscaled RTT, from Neal
    Cardwell.

 5) We made a facility for layers like wireless to say how much tailroom
    they need in the SKB for link layer stuff such as wireless
    encryption etc., but TCP works hard to fill every SKB out to the end
    defeating this specification.

    This leads to every TCP packet getting reallocated by the wireless
    code in order to have the right amount of tailroom available.

    Fix TCP to only fill SKBs out to the real amount of data area it
    asked for during the allocation, this way it won't eat into the
    slack added for the device's tailroom needs.

    Reported by Marc Merlin and fixed by Eric Dumazet.

 6) Leaks, endian bugs, and new device IDs in bluetooth from Santosh
    Nayak, João Paulo Rechi Vita, Cho, Yu-Chen, Andrei Emeltchenko,
    AceLan Kao, and Andrei Emeltchenko.

 7) OOPS on tty_close fix in bluetooth's hci_ldisc from Johan Hovold.

 8) netfilter erroneously scales TCP window twice, fix from Changli Gao.

 9) Memleak fix in wext-core from Julia Lawall.

10) Consistently handle invalid TCP packets in ipv4 vs.  ipv6 conntrack,
    from Jozsef Kadlecsik.

11) Validate IP header length properly in netfilter conntrack's
    ipv4_get_l4proto().

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
  NFC: Fix the LLCP Tx fragmentation loop
  rtlwifi: Add missing DMA buffer unmapping for PCI drivers
  rtlwifi: Preallocate USB read buffers and eliminate kalloc in read routine
  tcp: avoid order-1 allocations on wifi and tx path
  net: allow pskb_expand_head() to get maximum tailroom
  bridge: Do not send queries on multicast group leaves
  MAINTAINERS: Mark NATSEMI driver as orphan'd.
  tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample
  tcp: restore correct limit
  Revert "ath9k: fix going to full-sleep on PS idle"
  rt2x00: Fix rfkill_polling register function.
  bcma: fix build error on MIPS; implicit pcibios_enable_device
  netfilter: nf_conntrack: fix incorrect logic in nf_conntrack_init_net
  netfilter: nf_ct_ipv4: packets with wrong ihl are invalid
  netfilter: nf_ct_ipv4: handle invalid IPv4 and IPv6 packets consistently
  net/wireless/wext-core.c: add missing kfree
  rtlwifi: Fix oops on rate-control failure
  mac80211: Convert WARN_ON to WARN_ON_ONCE
  rtlwifi: rtl8192de: Fix firmware initialization
  nl80211: ensure interface is up in various APIs
  ...

12 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Thu, 12 Apr 2012 20:58:23 +0000 (13:58 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Mostly exynos and intel.

  Intel has 3 regression fixers (more info in intel merge commit), along
  with some other make hw work fixes, exynos has some cleanups and an
  ioctl fix.

  A couple of radeon fixes, couple of build fixes, and a savage
  userspace interface possible overflow fix."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (23 commits)
  drm/exynos: fixed exynos broken ioctl
  drm/i915: clear fencing tracking state when retiring requests
  drm/exynos: fix to pointer manager member of struct exynos_drm_subdrv
  drm/exynos: fix struct for operation callback functions to driver name
  drm/exynos: use define instead of default_win member in struct mixer_context
  drm/exynos: rename s/HDMI_OVERLAY_NUMBER/MIXER_WIN_NR
  drm/exynos: remove unused codes in hdmi and mixer
  drm/exynos: remove unnecessary type conversion of hdmi and mixer
  drm/i915: make rc6 module parameter read-only
  drm/i915: implement ColorBlt w/a
  drm/i915/ringbuffer: Exclude last 2 cachlines of ring on 845g
  Revert "drm/i915: reenable gmbus on gen3+ again"
  drm/radeon: only add the mm i2c bus if the hw_i2c module param is set
  vgaarb.h: fix build warnings
  drm/i915: properly compute dp dithering for user-created modes
  drm/radeon/kms: fix DVO setup on some r4xx chips
  drm/savage: fix integer overflows in savage_bci_cmdbuf()
  drm/radeon: replace udelay with mdelay for long timeouts
  drm/i915: Finish any pending operations on the framebuffer before disabling
  drm/i915: Removed IVB forced enable of sprite dest key.
  ...

12 years agonet: Fixed coding style issues relating to braces.
Jeffrin Jose [Sun, 8 Apr 2012 06:07:42 +0000 (06:07 +0000)]
net: Fixed coding style issues relating to braces.

Fixed coding style issues in net/core/utils.c
in relation with braces placement.

Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agonet/ipv6/ipv6_sockglue.c: Removed redundant extern
Eldad Zack [Tue, 10 Apr 2012 08:51:27 +0000 (08:51 +0000)]
net/ipv6/ipv6_sockglue.c: Removed redundant extern

extern int sysctl_mld_max_msf is already defined in linux/ipv6.h.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge tag 'md-3.4-fixes' of git://neil.brown.name/md
Linus Torvalds [Thu, 12 Apr 2012 20:12:56 +0000 (13:12 -0700)]
Merge tag 'md-3.4-fixes' of git://neil.brown.name/md

Pull a few more fixes for md from NeilBrown:
 "Two are tagged for -stable.  They can cause an oops, but very rarely."

* tag 'md-3.4-fixes' of git://neil.brown.name/md:
  md/bitmap: prevent bitmap_daemon_work running while initialising bitmap
  md/raid1,raid10: Fix calculation of 'vcnt' when processing error recovery.
  MD: Bitmap version cleanup.

12 years agopanic: fix stack dump print on direct call to panic()
Jason Wessel [Thu, 12 Apr 2012 19:49:17 +0000 (12:49 -0700)]
panic: fix stack dump print on direct call to panic()

Commit 6e6f0a1f0fa6 ("panic: don't print redundant backtraces on oops")
causes a regression where no stack trace will be printed at all for the
case where kernel code calls panic() directly while not processing an
oops, and of course there are 100's of instances of this type of call.

The original commit executed the check (!oops_in_progress), but this will
always be false because just before the dump_stack() there is a call to
bust_spinlocks(1), which does the following:

  void __attribute__((weak)) bust_spinlocks(int yes)
  {
if (yes) {
++oops_in_progress;

The proper way to resolve the problem that original commit tried to
solve is to avoid printing a stack dump from panic() when the either of
the following conditions is true:

  1) TAINT_DIE has been set (this is done by oops_end())
     This indicates and oops has already been printed.
  2) oops_in_progress > 1
     This guards against the rare case where panic() is invoked
     a second time, or in between oops_begin() and oops_end()

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agodrivers/rtc/rtc-pl031.c: enable clock on all ST variants
Linus Walleij [Thu, 12 Apr 2012 19:49:16 +0000 (12:49 -0700)]
drivers/rtc/rtc-pl031.c: enable clock on all ST variants

The ST variants of the PL031 all require bit 26 in the control register
to be set before they work properly.  Discovered this when testing on
the Nomadik board where it would suprisingly just stand still.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Mian Yousaf Kaukab <mian.yousaf.kaukab@stericsson.com>
Cc: Alessandro Rubini <rubini@unipv.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoRevert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()"
Ying Han [Thu, 12 Apr 2012 19:49:16 +0000 (12:49 -0700)]
Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()"

This reverts commit c38446cc65e1f2b3eb8630c53943b94c4f65f670.

Before the commit, the code makes senses to me but not after the commit.
The "nr_reclaimed" is the number of pages reclaimed by scanning through
the memcg's lru lists.  The "nr_to_reclaim" is the target value for the
whole function.  For example, we like to early break the reclaim if
reclaimed 32 pages under direct reclaim (not DEF_PRIORITY).

After the reverted commit, the target "nr_to_reclaim" is decremented each
time by "nr_reclaimed" but we still use it to compare the "nr_reclaimed".
It just doesn't make sense to me...

Signed-off-by: Ying Han <yinghan@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agohugetlb: fix race condition in hugetlb_fault()
Chris Metcalf [Thu, 12 Apr 2012 19:49:15 +0000 (12:49 -0700)]
hugetlb: fix race condition in hugetlb_fault()

The race is as follows:

Suppose a multi-threaded task forks a new process (on cpu A), thus
bumping up the ref count on all the pages.  While the fork is occurring
(and thus we have marked all the PTEs as read-only), another thread in
the original process (on cpu B) tries to write to a huge page, taking an
access violation from the write-protect and calling hugetlb_cow().  Now,
suppose the fork() fails.  It will undo the COW and decrement the ref
count on the pages, so the ref count on the huge page drops back to 1.
Meanwhile hugetlb_cow() also decrements the ref count by one on the
original page, since the original address space doesn't need it any
more, having copied a new page to replace the original page.  This
leaves the ref count at zero, and when we call unlock_page(), we panic.

fork on CPU A fault on CPU B
============= ==============
...
down_write(&parent->mmap_sem);
down_write_nested(&child->mmap_sem);
...
while duplicating vmas
if error
break;
...
up_write(&child->mmap_sem);
up_write(&parent->mmap_sem); ...
down_read(&parent->mmap_sem);
...
lock_page(page);
handle COW
page_mapcount(old_page) == 2
alloc and prepare new_page
...
handle error
page_remove_rmap(page);
put_page(page);
...
fold new_page into pte
page_remove_rmap(page);
put_page(page);
...
oops ==> unlock_page(page);
up_read(&parent->mmap_sem);

The solution is to take an extra reference to the page while we are
holding the lock on it.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>