review.tizen.org Git - platform/kernel/linux-stable.git/log

sctp: Fix list corruption resulting from freeing an association on a list

A few days ago Dave Jones reported this oops:

[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140]  ffff880147c03a10
[22766.387140]  ffffffffa169f2b6
[22766.387140]  ffff88013ed95728
[22766.387143]  0000000000000002
[22766.387143]  0000000000000000
[22766.387143]  ffff880003fad062
[22766.387144]  ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145]  <IRQ>
[22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283]  <EOI>
[22766.387284]
[22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311]  RSP <ffff880147c039b0>
[22766.387142]  ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt

It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable.  As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.

Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance.  What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table.  the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.

I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete.  That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.

I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.

I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop.  I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising).  Given the trace above
however, I think its likely that this is what we hit.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

net: respect GFP_DMA in __netdev_alloc_skb()

Few drivers use GFP_DMA allocations, and netdev_alloc_frag()
doesn't allocate pages in DMA zone.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_sfb: Fix missing NULL check

Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44461

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Fix bug in bnx2_free_tx_skbs().

In rare cases, bnx2x_free_tx_skbs() can unmap the wrong DMA address
when it gets to the last entry of the tx ring. We were not using
the proper macro to skip the last entry when advancing the tx index.

Reported-by: Zongyun Lai <zlai@vmware.com>
Reviewed-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IPoIB: fix skb truesize underestimatiom

Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
skb_try_coalesce()
This warning tracks drivers that incorrectly set skb->truesize

IPoIB indeed allocates a full page to store a fragment, but only
accounts in skb->truesize the used part of the page (frame length)

This patch fixes skb truesize underestimation, and
also fixes a performance issue, because RX skbs have not enough tailroom
to allow IP and TCP stacks to pull their header in skb linear part
without an expensive call to pskb_expand_head()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Shlomo Pongartz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix memory leak - vlan_info struct

In driver reload test there is a memory leak.
The structure vlan_info was not freed when the driver was removed.
It was not released since the nr_vids var is one after last vlan was removed.
The nr_vids is one, since vlan zero is added to the interface when the interface
is being set, but the vlan zero is not deleted at unregister.
Fix - delete vlan zero when we unregister the device.

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- fix a bug generated by the wrong interaction between the GW feature and the
Bridge Loop Avoidance

gianfar: fix potential sk_wmem_alloc imbalance

commit db83d136d7f753 (gianfar: Fix missing sock reference when
processing TX time stamps) added a potential sk_wmem_alloc imbalance

If the new skb has a different truesize than old one, we can get a
negative sk_wmem_alloc once new skb is orphaned at TX completion.

Now we no longer early orphan skbs in dev_hard_start_xmit(), this
probably can lead to fatal bugs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Manfred Rudigier <manfred.rudigier@omicron.at>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Jiajun Wu <b06378@freescale.com>
Cc: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/broadcom/cnic.c: remove invalid reference to list iterator variable

If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure.  Thus this value should not be used after
the end of the iterator.  There does not seem to be a meaningful value to
provide to netdev_warn.  Replace with pr_warn, since pr_err is used
elsewhere.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/rxrpc/ar-peer.c: remove invalid reference to list iterator variable

If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. This seems to be a copy-paste bug from a previous
debugging message, and so the meaningless value is just deleted.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/isdn/mISDN/stack.c: remove invalid reference to list iterator variable

If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. The dereferences are just deleted from the
debugging statement.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cgroup: fix out of bounds accesses

dev->priomap is allocated by extend_netdev_table() called from
update_netdev_tables().
And this is only called if write_priomap() is called.

But if write_priomap() is not called, it seems we can have out of bounds
accesses in cgrp_destroy(), read_priomap() & skb_update_prio()

With help from Gao Feng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: debugfs and network namespaces are incompatible

The bonding debugfs support has been broken in the presence of network
namespaces since it has been added. The debugfs support does not handle
multiple bonding devices with the same name in different network
namespaces.

I haven't had any bug reports, and I'm not interested in getting any.
Disable the debugfs support when network namespaces are enabled.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Manage /proc/net/bonding/ entries from the netdev events

It was recently reported that moving a bonding device between network
namespaces causes warnings from /proc. It turns out after the move we
were trying to add and to remove the /proc/net/bonding entries from the
wrong network namespace.

Move the bonding /proc registration code into the NETDEV_REGISTER and
NETDEV_UNREGISTER events where the proc registration and unregistration
will always happen at the right time.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: Fix for higher mtu size handling

For the higher mtu sizes requiring the buffer size greater than 8192,
the buffers are sent or received using multiple dma descriptors/ same
descriptor with option of multi buffer handling.
It was observed during tests that the driver was missing on data
packets during the normal ping operations if the data buffers being used
catered to jumbo frame handling.

The memory barrriers are added in between preparation of dma descriptors
in the jumbo frame handling path to ensure all instructions before
enabling the dma are complete.

Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: Fix for nfs hang on multiple reboot

It was observed that during multiple reboots nfs hangs. The status of
receive descriptors shows that all the descriptors were in control of
CPU, and none were assigned to DMA.
Also the DMA status register confirmed that the Rx buffer is
unavailable.

This patch adds the fix for the same by adding the memory barriers to
ascertain that the all instructions before enabling the Rx or Tx DMA are
completed which involves the proper setting of the ownership bit in DMA
descriptors.

Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem

iwlegacy: don't mess up the SCD when removing a key

When we remove a key, we put a key index which was supposed
to tell the fw that we are actually removing the key. But
instead the fw took that index as a valid index and messed
up the SRAM of the device.

This memory corruption on the device mangled the data of
the SCD. The impact on the user is that SCD queue 2 got
stuck after having removed keys.

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Cc: stable@vger.kernel.org
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

iwlegacy: always monitor for stuck queues

This is iwlegacy version of:

commit 342bbf3fee2fa9a18147e74b2e3c4229a4564912
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Sun Mar 4 08:50:46 2012 -0800

    iwlwifi: always monitor for stuck queues

    If we only monitor while associated, the following
    can happen:
     - we're associated, and the queue stuck check
       runs, setting the queue "touch" time to X
     - we disassociate, stopping the monitoring,
       which leaves the time set to X
     - almost 2s later, we associate, and enqueue
       a frame
     - before the frame is transmitted, we monitor
       for stuck queues, and find the time set to
       X, although it is now later than X + 2000ms,
       so we decide that the queue is stuck and
       erroneously restart the device

Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00usb: fix indexes ordering on RX queue kick

On rt2x00_dmastart() we increase index specified by Q_INDEX and on
rt2x00_dmadone() we increase index specified by Q_INDEX_DONE. So entries
between Q_INDEX_DONE and Q_INDEX are those we currently process in the
hardware. Entries between Q_INDEX and Q_INDEX_DONE are those we can
submit to the hardware.

According to that fix rt2x00usb_kick_queue(), as we need to submit RX
entries that are not processed by the hardware. It worked before only
for empty queue, otherwise was broken.

Note that for TX queues indexes ordering are ok. We need to kick entries
that have filled skb, but was not submitted to the hardware, i.e.
started from Q_INDEX_DONE and have ENTRY_DATA_PENDING bit set.

From practical standpoint this fixes RX queue stall, usually reproducible
in AP mode, like for example reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=828824

Reported-and-tested-by: Franco Miceli <fmiceli@plan.ceibal.edu.uy>
Reported-and-tested-by: Tom Horsley <horsley1953@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mwifiex: fix Coverity SCAN CID 709078: Resource leak (RESOURCE_LEAK)

> *. CID 709078: Resource leak (RESOURCE_LEAK)
> - drivers/net/wireless/mwifiex/cfg80211.c, line: 935
> Assigning: "bss_cfg" = storage returned from "kzalloc(132UL, 208U)"
> - but was not free
> drivers/net/wireless/mwifiex/cfg80211.c:935

Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: destroy assoc_data correctly if assoc fails

If association failed due to internal error (e.g. no
supported rates IE), we call ieee80211_destroy_assoc_data()
with assoc=true, while we actually reject the association.

This results in the BSSID not being zeroed out.

After passing assoc=false, we no longer have to call
sta_info_destroy_addr() explicitly. While on it, move
the "associated" message after the assoc_success check.

Cc: stable@vger.kernel.org [3.4+]
Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

NFC: Prevent NULL deref when getting socket name

llcp_sock_getname can be called without a device attached to the nfc_llcp_sock.

This would lead to the following BUG:

[  362.341807] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  362.341815] IP: [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
[  362.341818] PGD 31b35067 PUD 30631067 PMD 0
[  362.341821] Oops: 0000 [#627] PREEMPT SMP DEBUG_PAGEALLOC
[  362.341826] CPU 3
[  362.341827] Pid: 7816, comm: trinity-child55 Tainted: G      D W    3.5.0-rc4-next-20120628-sasha-00005-g9f23eb7 #479
[  362.341831] RIP: 0010:[<ffffffff836258e5>]  [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
[  362.341832] RSP: 0018:ffff8800304fde88  EFLAGS: 00010286
[  362.341834] RAX: 0000000000000000 RBX: ffff880033cb8000 RCX: 0000000000000001
[  362.341835] RDX: ffff8800304fdec4 RSI: ffff8800304fdec8 RDI: ffff8800304fdeda
[  362.341836] RBP: ffff8800304fdea8 R08: 7ebcebcb772b7ffb R09: 5fbfcb9c35bdfd53
[  362.341838] R10: 4220020c54326244 R11: 0000000000000246 R12: ffff8800304fdec8
[  362.341839] R13: ffff8800304fdec4 R14: ffff8800304fdec8 R15: 0000000000000044
[  362.341841] FS:  00007effa376e700(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000
[  362.341843] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  362.341844] CR2: 0000000000000000 CR3: 0000000030438000 CR4: 00000000000406e0
[  362.341851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  362.341856] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  362.341858] Process trinity-child55 (pid: 7816, threadinfo ffff8800304fc000, task ffff880031270000)
[  362.341858] Stack:
[  362.341862]  ffff8800304fdea8 ffff880035156780 0000000000000000 0000000000001000
[  362.341865]  ffff8800304fdf78 ffffffff83183b40 00000000304fdec8 0000006000000000
[  362.341868]  ffff8800304f0027 ffffffff83729649 ffff8800304fdee8 ffff8800304fdf48
[  362.341869] Call Trace:
[  362.341874]  [<ffffffff83183b40>] sys_getpeername+0xa0/0x110
[  362.341877]  [<ffffffff83729649>] ? _raw_spin_unlock_irq+0x59/0x80
[  362.341882]  [<ffffffff810f342b>] ? do_setitimer+0x23b/0x290
[  362.341886]  [<ffffffff81985ede>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  362.341889]  [<ffffffff8372a539>] system_call_fastpath+0x16/0x1b
[  362.341921] Code: 84 00 00 00 00 00 b8 b3 ff ff ff 48 85 db 74 54 66 41 c7 04 24 27 00 49 8d 7c 24 12 41 c7 45 00 60 00 00 00 48 8b 83 28 05 00 00 <8b> 00 41 89 44 24 04 0f b6 83 41 05 00 00 41 88 44 24 10 0f b6
[  362.341924] RIP  [<ffffffff836258e5>] llcp_sock_getname+0x75/0xc0
[  362.341925]  RSP <ffff8800304fde88>
[  362.341926] CR2: 0000000000000000
[  362.341928] ---[ end trace 6d450e935ee18bf3 ]---

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: correct size the argument to kzalloc in minstrel_ht

msp has type struct minstrel_ht_sta_priv not struct minstrel_ht_sta.

(This incorporates the fixup originally posted as "mac80211: fix kzalloc
memory corruption introduced in minstrel_ht". -- JWL)

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

Merge branch 'master' of git://1984.lsi.us.es/nf

Pablo Neira Ayuso says:

====================
* One to get the timeout special parameter for the SET target back working
  (this was introduced while trying to fix another bug in 3.4) from
  Jozsef Kadlecsik.

* One crash fix if containers and nf_conntrack are used reported by Hans
  Schillstrom by myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_ct_ecache: fix crash with multiple containers, one shutting down

Hans reports that he's still hitting:

BUG: unable to handle kernel NULL pointer dereference at 000000000000027c
IP: [<ffffffff813615db>] netlink_has_listeners+0xb/0x60
PGD 0
Oops: 0000 [#3] PREEMPT SMP
CPU 0

It happens when adding a number of containers with do:

nfct_query(h, NFCT_Q_CREATE, ct);

and most likely one namespace shuts down.

this problem was supposed to be fixed by:
70e9942 netfilter: nf_conntrack: make event callback registration per-netns

Still, it was missing one rcu_access_pointer to check if the callback
is set or not.

Reported-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: timeout fixing bug broke SET target special timeout value

The patch "127f559 netfilter: ipset: fix timeout value overflow bug"
broke the SET target when no timeout was specified.

Reported-by: Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cnic: Don't use netdev->base_addr

commit c0357e975afdbbedab5c662d19bef865f02adc17
bnx2: stop using net_device.{base_addr, irq}.

removed netdev->base_addr so we need to update cnic to get the MMIO
base address from pci_resource_start(). Otherwise, mmap of the uio
device will fail.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qmi_wwan: add ZTE MF60

Adding a device with limited QMI support. It does not support
normal QMI_WDS commands for connection management. Instead,
sending a QMI_CTL SET_INSTANCE_ID command is required to
enable the network interface:

01 0f 00 00 00 00 00 00 20 00 04 00 01 01 00 00

A number of QMI_DMS and QMI_NAS commands are also supported
for optional device management.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

cgroup: fix panic in netprio_cgroup

we set max_prioidx to the first zero bit index of prioidx_map in
function get_prioidx.

So when we delete the low index netprio cgroup and adding a new
netprio cgroup again,the max_prioidx will be set to the low index.

when we set the high index cgroup's net_prio.ifpriomap,the function
write_priomap will call update_netdev_tables to alloc memory which
size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
so the size of array that map->priomap point to is max_prioidx +1,
which is low than what we actually need.

fix this by adding check in get_prioidx,only set max_prioidx when
max_prioidx low than the new prioidx.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev/phy: Fixup lockdep warnings in mdio-mux.c

With lockdep enabled we get:

=============================================
[ INFO: possible recursive locking detected ]
3.4.4-Cavium-Octeon+ #313 Not tainted
---------------------------------------------
kworker/u:1/36 is trying to acquire lock:
(&bus->mdio_lock){+.+...}, at: [<ffffffff813da7e8>] mdio_mux_read+0x38/0xa0

but task is already holding lock:
(&bus->mdio_lock){+.+...}, at: [<ffffffff813d79e4>] mdiobus_read+0x44/0x88

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(&bus->mdio_lock);
  lock(&bus->mdio_lock);

*** DEADLOCK ***

May be due to missing lock nesting notation
.
.
.

This is a false positive, since we are indeed using 'nested' locking,
we need to use mutex_lock_nested().

Now in theory we can stack multiple MDIO multiplexers, but that would
require passing the nesting level (which is difficult to know) to
mutex_lock_nested().  Instead we assume the simple case of a single
level of nesting.  Since these are only warning messages, it isn't so
important to solve the general case.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: DCB and SR-IOV can not co-exist and will cause hangs

DCB and SR-IOV cannot currently be enabled at the same time as the queueing
schemes are incompatible. If they are both enabled it will result in Tx
hangs since only the first Tx queue will be able to transmit any traffic.

This simple fix for this is to block us from enabling TCs in ixgbe_setup_tc
if SR-IOV is enabled. This change will be reverted once we can support
SR-IOV and DCB coexistence.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netem: add limitation to reordered packets

Fix two netem bugs :

1) When a frame was dropped by tfifo_enqueue(), drop counter
   was incremented twice.

2) When reordering is triggered, we enqueue a packet without
   checking queue limit. This can OOM pretty fast when this
   is repeated enough, since skbs are orphaned, no socket limit
   can help in this situation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mark Gordon <msg@google.com>
Cc: Andreas Terzis <aterzis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: fix issue of transmit queue 0 timed out

some people report atl1c could cause system hang with following
kernel trace info:
---------------------------------------
WARNING: at.../net/sched/sch_generic.c:258 dev_watchdog+0x1db/0x1d0()
...
NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
...
---------------------------------------
This is caused by netif_stop_queue calling when cable Link is down.
So remove netif_stop_queue, because link_watch will take it over.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dont use __netdev_alloc_skb for bounce buffer

commit a1c7fff7e1 (net: netdev_alloc_skb() use build_skb()) broke b44 on
some 64bit machines.

It appears b44 and b43 use __netdev_alloc_skb() instead of alloc_skb()
for their bounce buffers.

There is no need to add an extra NET_SKB_PAD reservation for bounce
buffers :

- In TX path, NET_SKB_PAD is useless

- In RX path in b44, we force a copy of incoming frames if
GFP_DMA allocations were needed.

Reported-and-bisected-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: verify packet size before trying to allocate it

Currently when sending data over datagram, the send function will attempt to
allocate any size passed on from the userspace.

We should make sure that this size is checked and limited. We'll limit it
to the MTU of the device, which is checked later anyway.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: check incoming packet type for bla

If the gateway functionality is used, some broadcast packets (DHCP
requests) may be transmitted as unicast packets. As the bridge loop
avoidance code now only considers the payload Ethernet destination,
it may drop the DHCP request for clients which are claimed by other
backbone gateways, because it falsely infers from the broadcast address
that the right backbone gateway should havehandled the broadcast.

Fix this by checking and delegating the batman-adv packet type used
for transmission.

Reported-by: Guido Iribarren <guidoiribarren@buenosaireslibre.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>

Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux

Pull fix to common clk framework from Michael Turquette:
"The previous set of common clk fixes for -rc5 left an uninitialized
  int which could lead to bad array indexing when switching clock
  parents.  The issue is fixed with a trivial change to the code flow in
  __clk_set_parent."

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux:
  clk: fix parent validation in __clk_set_parent()

Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull raid10 build failure fix from NeilBrown:
"I really shouldn't do important things late in the day.  It seems that
  I get careless."

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md/raid10: fix careless build error

Merge git://git./linux/kernel/git/davem/net

Pull networking update from David Miller:

1) Fix RX sequence number handling in mwifiex, from Stone Piao.

2) Netfilter ipset mis-compares device names, fix from Florian
    Westphal.

3) Fix route leak in ipv6 IPVS, from Eric Dumazet.

4) NFS fixes.  Several buffer overflows in NCI layer from Dan
    Rosenberg, and release sock OOPS'er fix from Eric Dumazet.

5) Fix WEP handling ath9k, we started using a bit the chip provides to
    indicate undecrypted packets but that bit turns out to be unreliable
    in certain configurations.  Fix from Felix Fietkau.

6) Fix Kconfig dependency bug in wlcore, from Randy Dunlap.

7) New USB IDs for rtlwifi driver from Larry Finger.

8) Fix crashes in qmi_wwan usbnet driver when disconnecting, from Bjørn
    Mork.

9) Gianfar driver programs coalescing settings properly in single queue
    mode, but does not do so in multi-queue mode.  Fix from Claudiu
    Manoil.

10) Missing module.h include in davinci_cpdma.c, from Daniel Mack.

11) Need dummy handler for IPSET_CMD_NONE otherwise we crash in ipset if
    we get this via nfnetlink, fix from Tomasz Bursztyka.

12) Missing RCU unlock in nfnetlink error path, also from Tomasz.

13) Fix divide by zero in igbvf when the user tries to set an RX
    coalescing value of 0 usecs, from Mitch A Williams.

14) We can process SCTP sacks for the wrong transport, oops.  Fix from
    Neil Horman.

15) Remove hw IP payload checksumming from e1000e driver.  This has zery
    value in our stack, and turning it on creates a very unintuitive
    restriction for users when using jumbo MTUs.

    Specifically, when IP payload checksums are on you cannot use both
    receive hashing offload and jumbo MTU.  Fix from Bruce Allan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  e1000e: remove use of IP payload checksum
  sctp: be more restrictive in transport selection on bundled sacks
  igbvf: fix divide by zero
  netfilter: nfnetlink: fix missing rcu_read_unlock in nfnetlink_rcv_msg
  netfilter: ipset: fix crash if IPSET_CMD_NONE command is sent
  davinci_cpdma: include linux/module.h
  gianfar: Fix RXICr/TXICr programming for multi-queue mode
  net: Downgrade CAP_SYS_MODULE deprecated message from error to warning.
  net: qmi_wwan: fix Oops while disconnecting
  mwifiex: fix memory leak associated with IE manamgement
  ath9k: fix panic caused by returning a descriptor we have queued for reuse
  mac80211: correct behaviour on unrecognised action frames
  ath9k: enable serialize_regmode for non-PCIE AR9287
  rtlwifi: rtl8192cu: New USB IDs
  NFC: Return from rawsock_release when sk is NULL
  iwlwifi: fix activating inactive stations
  wlcore: drop INET dependency
  ath9k: fix dynamic WEP related regression
  NFC: Prevent multiple buffer overflows in NCI
  netfilter: update location of my trees
  ...

md/raid10: fix careless build error

build error introduced by commit b357f04a67c2aeee8

That function doesn't get extra args until a later patch. Bother.

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Reported-by: Simon Kirby <sim@hostway.ca>
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: NeilBrown <neilb@suse.de>

floppy: cancel any pending fd_timeouts before adding a new one

In commit 070ad7e793dc ("floppy: convert to delayed work and
single-thread wq") the 'fd_timeout' timer was converted to a delayed
work. However, the "del_timer(&fd_timeout)" was lost in the process,
and any previous pending timeouts would stay active when we then
re-queued the timeout.

This resulted in the floppy probe sequence having a (stale) 20s timeout
rather than the intended 3s timeout, and thus made booting with the
floppy driver (but no actual floppy controller) take much longer than it
should.

Of course, there's little reason for most people to compile the floppy
driver into the kernel at all, which is why most people never noticed.

Canceling the delayed work where we used to do the del_timer() fixes the
issue, and makes the floppy probing use the proper new timeout instead.
The three second timeout is still very wasteful, but better than the 20s
one.

Reported-and-tested-by: Andi Kleen <ak@linux.intel.com>
Reported-and-tested-by: Calvin Walton <calvin.walton@kepstin.ca>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
  changes in my for-linus branch for this iteration.  It contains:

   - Two patches for mtip32xx.  Killing a non-compliant sysfs interface
     and moving it to debugfs, where it belongs.

   - A few patches from Asias.  Two legit bug fixes, and one killing an
     interface that is no longer in use.

   - A patch from Jan, making the annoying partition ioctl warning a bit
     less annoying, by restricting it to !CAP_SYS_RAWIO only.

   - Three bug fixes for drbd from Lars Ellenberg.

   - A fix for an old regression for umem, it hasn't really worked since
     the plugging scheme was changed in 3.0.

   - A few fixes from Tejun.

   - A splice fix from Eric Dumazet, fixing an issue with pipe
     resizing."

* 'for-linus' of git://git.kernel.dk/linux-block:
  scsi: Silence unnecessary warnings about ioctl to partition
  block: Drop dead function blk_abort_queue()
  block: Mitigate lock unbalance caused by lock switching
  block: Avoid missed wakeup in request waitqueue
  umem: fix up unplugging
  splice: fix racy pipe->buffers uses
  drbd: fix null pointer dereference with on-congestion policy when diskless
  drbd: fix list corruption by failing but already aborted reads
  drbd: fix access of unallocated pages and kernel panic
  xen/blkfront: Add WARN to deal with misbehaving backends.
  blkcg: drop local variable @q from blkg_destroy()
  mtip32xx: Create debugfs entries for troubleshooting
  mtip32xx: Remove 'registers' and 'flags' from sysfs
  blkcg: fix blkg_alloc() failure path
  block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
  block: fix return value on cfq_init() failure
  mtip32xx: Remove version.h header file inclusion
  xen/blkback: Copy id field when doing BLKIF_DISCARD.

clk: fix parent validation in __clk_set_parent()

The below commit introduced a bug in __clk_set_parent()
which could cause it to *skip* the parent validation
which makes sure the parent passed to the api is a valid
one.

    commit 7975059db572eb47f0fb272a62afeae272a4b209
    Author: Rajendra Nayak <rnayak@ti.com>
    Date:   Wed Jun 6 14:41:31 2012 +0530

        clk: Allow late cache allocation for clk->parents

This was identified by the following compiler warning..

    drivers/clk/clk.c: In function '__clk_set_parent':
    drivers/clk/clk.c:1083:5: warning: 'i' may be used uninitialized in this function [-Wuninitialized]

.. as reported by Marc Kleine-Budde.

There were various options discussed on how to fix this, one
being initing 'i' to clk->num_parents, but the below approach
was found to be more appropriate as it also makes the 'parent
validation' code simpler to read.

Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
Cc: stable@kernel.org

Merge tag 'sound-3.5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Just a few driver-specific fixes for ASoC and HD-audio."

* tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix no sound from ALC662 after Windows reboot
  ASoC: tlv320aic3x: Fix codec pll configure bug
  ASoC: wm2200: Add missing BCLK rate

Merge tag 'dm-3.5-fixes' of git://git./linux/kernel/git/agk/linux-dm

Pull device-mapper fixes from Alasdair G Kergon:
"Four minor thin provisioning fixes and correct and update dm-verity
  documentation."

* tag 'dm-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm: verity fix documentation
  dm persistent data: fix allocation failure in space map checker init
  dm persistent data: handle space map checker creation failure
  dm persistent data: fix shadow_info_leak on dm_tm_destroy
  dm thin: commit metadata before creating metadata snapshot

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"One regression fix, two radeon fixes (one for an oops), and an i915
  fix to unload framebuffers earlier.

  We originally were going to leave the i915 fix until -next, but grub2
  in some situations causes vesafb/efifb to be loaded now, and this
  causes big slowdowns, and I have reports in rawhide I'd like to have
  fixed."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: kick any firmware framebuffers before claiming the gtt
  drm: edid: Don't add inferred modes with higher resolution
  drm/radeon: fix rare segfault
  drm/radeon: fix VM page table setup on SI

Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull md fixes from NeilBrown:
"md: collection of bug fixes for 3.5

  You go away for 2 weeks vacation and what do you get when you come
  back? Piles of bugs :-)

  Some found by inspection, some by testing, some during use in the
  field, and some while developing for the next window..."

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md: fix up plugging (again).
  md: support re-add of recovering devices.
  md/raid1: fix bug in read_balance introduced by hot-replace
  raid5: delayed stripe fix
  md/raid456: When read error cannot be recovered, record bad block
  md: make 'name' arg to md_register_thread non-optional.
  md/raid10: fix failure when trying to repair a read error.
  md/raid5: fix refcount problem when blocked_rdev is set.
  md:Add blk_plug in sync_thread.
  md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev
  md/raid5: Do not add data_offset before call to is_badblock
  md/raid5: prefer replacing failed devices over want-replacement devices.
  md/raid10: Don't try to recovery unmatched (and unused) chunks.

Merge branch 'for-linus2' of git://git./linux/kernel/git/jmorris/linux-security

Pull security layer fixes from James Morris.

A documentation update, and a nommu build fix.

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security: Fix nommu build.
security: document no_new_privs

dm: verity fix documentation

Veritysetup is now part of cryptsetup package.
Remove on-disk header description (which is not parsed in kernel)
and point users to cryptsetup where it the format is documented.
Mention units for block size paramaters.
Fix target line specification and dmsetup parameters.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm persistent data: fix allocation failure in space map checker init

If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and memory is fragmented and a
sufficiently-large metadata device is used in a thin pool then the space
map checker will fail to allocate the memory it requires.

Switch from kmalloc to vmalloc to allow larger virtually contiguous
allocations for the space map checker's internal count arrays.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm persistent data: handle space map checker creation failure

If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and dm_sm_checker_create()
fails, dm_tm_create_internal() would still return success even though it
cleaned up all resources it was supposed to have created.  This will
lead to a kernel crash:

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
...
RIP: 0010:[<ffffffff81593659>]  [<ffffffff81593659>] dm_bufio_get_block_size+0x9/0x20
Call Trace:
  [<ffffffff81599bae>] dm_bm_block_size+0xe/0x10
  [<ffffffff8159b8b8>] sm_ll_init+0x78/0xd0
  [<ffffffff8159c1a6>] sm_ll_new_disk+0x16/0xa0
  [<ffffffff8159c98e>] dm_sm_disk_create+0xfe/0x160
  [<ffffffff815abf6e>] dm_pool_metadata_open+0x16e/0x6a0
  [<ffffffff815aa010>] pool_ctr+0x3f0/0x900
  [<ffffffff8158d565>] dm_table_add_target+0x195/0x450
  [<ffffffff815904c4>] table_load+0xe4/0x330
  [<ffffffff815917ea>] ctl_ioctl+0x15a/0x2c0
  [<ffffffff81591963>] dm_ctl_ioctl+0x13/0x20
  [<ffffffff8116a4f8>] do_vfs_ioctl+0x98/0x560
  [<ffffffff8116aa51>] sys_ioctl+0x91/0xa0
  [<ffffffff81869f52>] system_call_fastpath+0x16/0x1b

Fix the space map checker code to return an appropriate ERR_PTR and have
dm_sm_disk_create() and dm_tm_create_internal() check for it with
IS_ERR.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm persistent data: fix shadow_info_leak on dm_tm_destroy

Cleanup the shadow table before destroying the transaction manager.

Reference: leak was identified with kmemleak when running
test_discard_random_sectors in the thinp-test-suite.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: commit metadata before creating metadata snapshot

Userland sometimes sees a corrupt metadata block if metadata is changing
rapidly when a metadata snapshot is reserved for userland,  To make the
problem go away, commit before we take the metadata snapshot (which is a
sensible thing to do anyway).

The checksums mean userland spots this corruption immediately so there's
no risk of acting on incorrect data.  No corruption exists from the
kernel's point of view, and thin_check passes after pool shutdown.

I believe this is to do with shared blocks at the first level of the
{device, mapping} btree.  Prior to the metadata-snap support no sharing
at this level was possible, so this patch is only required after commit
cc8394d86f045b86ff303d3c9e4ce47d97148951 ("dm thin: provide userspace
access to pool metadata").

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

security: Fix nommu build.

The security + nommu configuration presently blows up with an undefined
reference to BDI_CAP_EXEC_MAP:

security/security.c: In function 'mmap_prot':
security/security.c:687:36: error: dereferencing pointer to incomplete type
security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function)
security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in

include backing-dev.h directly to fix it up.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>

drm/i915: kick any firmware framebuffers before claiming the gtt

Especially vesafb likes to map everything as uc- (yikes), and if that
mapping hangs around still while we try to map the gtt as wc the
kernel will downgrade our request to uc-, resulting in abyssal
performance.

Unfortunately we can't do this as early as readon does (i.e. as the
first thing we do when initializing the hw) because our fb/mmio space
region moves around on a per-gen basis. So I've had to move it below
the gtt initialization, but that seems to work, too. The important
thing is that we do this before we set up the gtt wc mapping.

Now an altogether different question is why people compile their
kernels with vesafb enabled, but I guess making things just work isn't
bad per se ...

v2:
- s/radeondrmfb/inteldrmfb/
- fix up error handling

v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky.

v4: Jani Nikula complained about the pointless bool primary
initialization.

v5: Don't oops if we can't allocate, noticed by Chris Wilson.

v6: Resolve conflicts with agp rework and fixup whitespace.

This is commit e188719a2891f01b3100d in drm-next.

Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub
using vesa on fedora their initrd seems to load vesafb before loading
the real kms driver. So tons more people actually experience a
dead-slow gpu. Hence also the Cc: stable.

Cc: stable@vger.kernel.org
Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: edid: Don't add inferred modes with higher resolution

When a monitor EDID doesn't give the preferred bit, driver assumes
that the mode with the higest resolution and rate is the preferred
mode. Meanwhile the recent changes for allowing more modes in the
GFT/CVT ranges give actually more modes, and some modes may be over
the native size. Thus such a mode would be picked up as the preferred
mode although it's no native resolution.

For avoiding such a problem, this patch limits the addition of
inferred modes by checking not to be greater than other modes.
Also, it checks the duplicated mode entry at the same time.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: fix rare segfault

In gem idle/busy ioctl the radeon object was derefenced after
drm_gem_object_unreference_unlocked which in case the object
have been destroyed lead to use of a possibly free pointer with
possibly wrong data.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

md: fix up plugging (again).

The value returned by "mddev_check_plug" is only valid until the
next 'schedule' as that will unplug things.  This could happen at any
call to mempool_alloc.
So just calling mddev_check_plug at the start doesn't really make
sense.

So call it just before, or just after, queuing things for the thread.
As the action that happens at unplug is to wake the thread, this makes
lots of sense.
If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we
wake thread immediately.

RAID5 is a bit different.  Requests are queued for the thread and the
thread is woken by release_stripe.  So we don't need to wake the
thread on failure.
However the thread doesn't perform certain actions when there is any
active plug, so it is important to install a plug before waking the
thread.  So for RAID5 we install the plug *before* queuing the request
and waking the thread.

Without this patch it is possible for raid1 or raid10 to queue a
request without then waking the thread, resulting in the array locking
up.

Also change raid10 to only flush_pending_write when there are not
active plugs, just like raid1.

This patch is suitable for 3.0 or later.  I plan to submit it to
-stable, but I'll like to let it spend a few weeks in mainline
first to be sure it is completely safe.

Signed-off-by: NeilBrown <neilb@suse.de>

md: support re-add of recovering devices.

We currently only allow a device to be re-added if it appear to be
in-sync. This is overly restrictive as it may be desirable to re-add
a device that is in the middle of recovery.

So remove the test for "InSync" - the test on rdev->raid_disk is
sufficient to ensure that the re-add will succeed.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: fix bug in read_balance introduced by hot-replace

When we added hot_replace we doubled the number of devices
that could be in a RAID1 array. So we doubled how far read_balance
would search. Unfortunately we didn't double the point at which
it looped back to the beginning - so it effectively loops over
all non-replacement disks twice.
This doesn't cause bad behaviour, but it pointless and means we
never read from replacement devices.

Signed-off-by: NeilBrown <neilb@suse.de>

raid5: delayed stripe fix

There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but
the two bits have relationship. A delayed stripe can be moved to hold list only
when preread active stripe count is below IO_THRESHOLD. If a stripe has both
the bits set, such stripe will be in delayed list and preread count not 0,
which will make such stripe never leave delayed list.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid456: When read error cannot be recovered, record bad block

We may not be able to fix a bad block if:
- the array is degraded
- the over-write fails.

In these cases we currently eject the device, but we should
record a bad block if possible.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: make 'name' arg to md_register_thread non-optional.

Having the 'name' arg optional and defaulting to the current
personality name is no necessary and leads to errors, as when
changing the level of an array we can end up using the
name of the old level instead of the new one.

So make it non-optional and always explicitly pass the name
of the level that the array will be.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: fix failure when trying to repair a read error.

commit 58c54fcca3bac5bf9290cfed31c76e4c4bfbabaf
md/raid10: handle further errors during fix_read_error better.

in 3.1 added "r10_sync_page_io" which takes an IO size in sectors.
But we were passing the IO size in bytes!!!
This resulting in bio_add_page failing, and empty request being sent
down, and a consequent BUG_ON in scsi_lib.

[fix missing space in error message at same time]

This fix is suitable for 3.1.y and later.

Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull a couple more powerpc fixes from Benjamin Herrenschmidt:
"Here are two more fixes that I "missed" when scrubbing patchwork last
  week which are worth still having in 3.5."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/kvm: sldi should be sld
  powerpc/xmon: Use cpumask iterator to avoid warning

security: document no_new_privs

Document no_new_privs.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>

md/raid5: fix refcount problem when blocked_rdev is set.

commit 43220aa0f22cd3ce5b30246d50ccd696d119edea
md/raid5: fix a hang on device failure.

fixed a hang, but introduced a refcounting in-balance so
that if the presence of bad-blocks ever caused an rdev to
be 'blocked' we would increment the refcount on the rdev and
never decrement it.

So added the needed rdev_dec_pending when md_wait_for_blocked_rdev
is not called.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md:Add blk_plug in sync_thread.

Add blk_plug in sync_thread will increase the performance of sync.
Because sync_thread did not blk_plug,so when raid sync, the bio merge
not well.

Testing environment:
SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI
Controller.
OS:Linux xxx 3.5.0-rc2+ #340 SMP Tue Jun 12 09:00:25 CST 2012
x86_64 x86_64 x86_64 GNU/Linux.
RAID5: four ST31000524NS disk.

Without blk_plug:recovery speed about 63M/Sec;
Add blk_plug:recovery speed about 120M/Sec.

Using blktrace:
blktrace -d /dev/sdb -w 60  -o -|blkparse -i -

without blk_plug:
Total (8,16):
Reads Queued:      309811,     1239MiB Writes Queued:           0,        0KiB
Read Dispatches:   283583,     1189MiB Write Dispatches:        0,        0KiB
Reads Requeued:         0 Writes Requeued:         0
Reads Completed:   273351,     1149MiB Writes Completed:        0,        0KiB
Read Merges:        23533,    94132KiB Write Merges:            0,        0KiB
IO unplugs:             0         Timer unplugs:           0

add blk_plug:
Total (8,16):
Reads Queued:      428697,     1714MiB Writes Queued:           0,        0KiB
Read Dispatches:     3954,     1714MiB Write Dispatches:        0,        0KiB
Reads Requeued:         0 Writes Requeued:         0
Reads Completed:     3956,     1715MiB Writes Completed:        0,        0KiB
Read Merges:       424743,     1698MiB Write Merges:            0,        0KiB
IO unplugs:             0         Timer unplugs:        3384

The ratio of merge will be markedly increased.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev

In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement
nr_pending so we lose the reference we hold on the rdev.
So atomic_inc it first to maintain the reference.

This bug was introduced by commit 73e92e51b7969ef5477d
md/raid5. Don't write to known bad block on doubtful devices.

which appeared in 3.0, so patch is suitable for stable kernels since
then.

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Do not add data_offset before call to is_badblock

In chunk_aligned_read() we are adding data_offset before calling
is_badblock.  But is_badblock also adds data_offset, so that is bad.

So move the addition of data_offset to after the call to
is_badblock.

This bug was introduced by commit 31c176ecdf3563140e639
     md/raid5: avoid reading from known bad blocks.
which first appeared in 3.0.  So that patch is suitable for any
-stable kernel from 3.0.y onwards.  However it will need minor
revision for most of those (as the comment didn't appear until
recently).

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: prefer replacing failed devices over want-replacement devices.

If a RAID5 has both a failed device and a device marked as
'WantReplacement', then we should preferentially replace the failed
device.
However the current code replaces whichever is found first.
So split into 2 loops, check fail failed/missing first, and only check
for WantReplacement if nothing is failed or missing.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Don't try to recovery unmatched (and unused) chunks.

If a RAID10 has an odd number of chunks - as might happen when there
are an odd number of devices - the last chunk has no pair and so is
not mirrored. We don't store data there, but when recovering the last
device in an array we retry to recover that last chunk from a
non-existent location. This results in an error, and the recovery
aborts.

When we get to that last chunk we should just stop - there is nothing
more to do anyway.

This bug has been present since the introduction of RAID10, so the
patch is appropriate for any -stable kernel.

Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Tested-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>

powerpc/kvm: sldi should be sld

Since we are taking a registers, this should never have been an sldi.
Talking to paulus offline, this is the correct fix.

Was introduced by:
commit 19ccb76a1938ab364a412253daec64613acbf3df
Author: Paul Mackerras <paulus@samba.org>
Date: Sat Jul 23 17:42:46 2011 +1000

Talking to paulus, this shouldn't be a literal.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@kernel.org> [v3.2+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/xmon: Use cpumask iterator to avoid warning

We have a bug report where the kernel hits a warning in the cpumask
code:

WARNING: at include/linux/cpumask.h:107

Which is:
        WARN_ON_ONCE(cpu >= nr_cpumask_bits);

The backtrace is:
        cpu_cmd
        cmds
        xmon_core
        xmon
        die

xmon is iterating through 0 to NR_CPUS. I'm not sure why we are still
open coding this but iterating above nr_cpu_ids is definitely a bug.

This patch iterates through all possible cpus, in case we issue a
system reset and CPUs in an offline state call in.

Perhaps the old code was trying to handle CPUs that were in the
partition but were never started (eg kexec into a kernel with an
nr_cpus= boot option). They are going to die way before we get into
xmon since we haven't set any kernel state up for them.

Signed-off-by: Anton Blanchard <anton@samba.org>
CC: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull two ARM fixes from Russell King:
"It's been fairly quiet with the fixes.  Just two this time.  One fixes
  a long standing problem with KALLSYMS needing an additional pass, and
  the other sorts a problem with the vmalloc space interacting with
  static IO mappings."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7438/1: fill possible PMD empty section gaps
  ARM: 7428/1: Prevent KALLSYM size mismatch on ARM.

ARM: 7438/1: fill possible PMD empty section gaps

On ARM with the 2-level page table format, a PMD entry is represented by
two consecutive section entries covering 2MB of virtual space.

However, static mappings always were allowed to use separate 1MB section
entries. This means in practice that a static mapping may create half
populated PMDs via create_mapping().

Since commit 0536bdf33f (ARM: move iotable mappings within the vmalloc
region) those static mappings are located in the vmalloc area. We must
ensure no such half populated PMDs are accessible once vmalloc() or
ioremap() start looking at the vmalloc area for nearby free virtual
address ranges, or various things leading to a kernel crash will happen.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: "R, Sricharan" <r.sricharan@ti.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

e1000e: remove use of IP payload checksum

Currently only used when packet split mode is enabled with jumbo frames,
IP payload checksum (for fragmented UDP packets) is mutually exclusive with
receive hashing offload since the hardware uses the same space in the
receive descriptor for the hardware-provided packet checksum and the RSS
hash, respectively. Users currently must disable jumbos when receive
hashing offload is enabled, or vice versa, because of this incompatibility.
Since testing has shown that IP payload checksum does not provide any real
benefit, just remove it so that there is no longer a choice between jumbos
or receive hashing offload but not both as done in other Intel GbE drivers
(e.g. e1000, igb).

Also, add a missing check for IP checksum error reported by the hardware;
let the stack verify the checksum when this happens.

CC: stable <stable@vger.kernel.org> [3.4]
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: be more restrictive in transport selection on bundled sacks

It was noticed recently that when we send data on a transport, its possible that
we might bundle a sack that arrived on a different transport.  While this isn't
a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
2960:

An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
   etc.) to the same destination transport address from which it
   received the DATA or control chunk to which it is replying.  This
   rule should also be followed if the endpoint is bundling DATA chunks
   together with the reply chunk.

This patch seeks to correct that.  It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack.  By doing this we guarantee that we only bundle outbound
saks on a transport that has received a chunk since the last sack.  This brings
us into stricter compliance with the RFC.

Vlad had initially suggested that we strictly allow only sack bundling on the
transport that last moved the ctsn forward.  While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports.  In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on.  so I've modified the approach to allow for that, by adding a state
variable to each transport that tracks weather it has moved the ctsn since the
last sack.  This I think keeps our behavior (and performance), close enough to
our current profile that I think we can do this without a sysctl knob to
enable/disable it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yaseivch <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: sorin serban <sserban@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igbvf: fix divide by zero

Using ethtool -C ethX rx-usecs 0 crashes with a divide by zero.
Refactor this function to fix this issue and make it more clear
what the intent of each conditional is. Add comment regarding
using a setting of zero.

CC: stable <stable@vger.kernel.org> [3.3+]
CC: David Ahern <daahern@cisco.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 3.5-rc5

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Another week, another batch of fixes.

  All are small, contained, targeted fixes for explicit problems --
  mostly build and boot failures across i.MX, OMAP, Renesas/Shmobile and
  Samsung."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: imx6q: fix suspend regression caused by common clk migration
  ARM: OMAP4470: Fix OMAP4470 boot failure
  ARM: EXYNOS: Fix EXYNOS_DEV_DMA Kconfig entry
  ARM: OMAP2+: nand: fix build error when CONFIG_MTD_ONENAND_OMAP2=n
  ARM: shmobile: r8a7779: Route all interrupts to ARM
  ARM: shmobile: kzm9d: use late init machine hook
  ARM: shmobile: kzm9g: use late init machine hook
  ARM: mach-shmobile: armadillo800eva: Use late init machine hook
  ARM: SAMSUNG: Fix for S3C2412 EBI memory mapping
  ARM: mach-shmobile: add missing GPIO IRQ configuration on mackerel
  ARM: mach-shmobile: Fix build when SMP is enabled and EMEV2 is not enabled
  ARM: shmobile: sh7372: bugfix: chclr_offset base
  ARM: shmobile: sh73a0: bugfix: SY-DMAC number
  ARM: SAMSUNG: Should check for IS_ERR(clk) instead of NULL

printk.c: fix kernel-doc warnings

Fix kernel-doc warnings in printk.c: use correct parameter name.

Warning(kernel/printk.c:2429): No description found for parameter 'buf'
Warning(kernel/printk.c:2429): Excess function parameter 'line' description in 'kmsg_dump_get_buffer'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

linux/irq.h: fix kernel-doc warning

Fix kernel-doc warning. This struct member was removed in commit
875682648b89 ("irq: Remove irq_chip->release()") so remove its
associated kernel-doc entry also.

Warning(include/linux/irq.h:338): Excess struct/union/enum/typedef member 'release' description in 'irq_chip'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'v3.5-samsung-fixes-1' of git://git./linux/kernel/git/kgene/linux-samsung into fixes

* 'v3.5-samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
  ARM: EXYNOS: Fix EXYNOS_DEV_DMA Kconfig entry
  ARM: SAMSUNG: Fix for S3C2412 EBI memory mapping
  ARM: SAMSUNG: Should check for IS_ERR(clk) instead of NULL

ARM: imx6q: fix suspend regression caused by common clk migration

When moving to common clk framework, the imx6q clks rom and mmdc_ch1_axi
get different on/off states than old clk driver, which breaks suspend
function. There might be a better way to manage these clocks, but let's
takes the old clk driver approach to fix the regression first.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'omap-fixes-for-v3.5-rc4' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

From Tony Lindgren:
"Here's one more regression fix that I missed earlier, and a
trivial fix to get omap4470 booting."

* tag 'omap-fixes-for-v3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP4470: Fix OMAP4470 boot failure
ARM: OMAP2+: nand: fix build error when CONFIG_MTD_ONENAND_OMAP2=n

Merge branch 'release' of git://git./linux/kernel/git/lenb/linux

Pull ACPI & Power Management patches from Len Brown.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  acpi_pad: fix power_saving thread deadlock
  ACPI video: Still use ACPI backlight control if _DOS doesn't exist
  ACPI, APEI, Avoid too much error reporting in runtime
  ACPI: Add a quirk for "AMILO PRO V2030" to ignore the timer overriding
  ACPI: Remove one board specific WARN when ignoring timer overriding
  ACPI: Make acpi_skip_timer_override cover all source_irq==0 cases
  ACPI, x86: fix Dell M6600 ACPI reboot regression via DMI
  ACPI sysfs.c strlen fix

Merge tag 'driver-core-3.5-rc5' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver Core fixes from Greg Kroah-Hartman:
"Here is a number of printk() fixes, specifically a few reported by the
  crazy blog program that ships in SUSE releases (that's "boot log" and
  not "web log", it predates the general "blog" terminology by many
  years), and the restoration of the continuation line functionality
  reported by Stephen and others.  Yes, the changes seem a bit big this
  late in the cycle, but I've been beating on them for a while now, and
  Stephen has even optimized it a bit, so all looks good to me.

  The other change in here is a Documentation update for the stable
  kernel rules describing how some distro patches should be backported,
  to hopefully drive a bit more response from the distros to the stable
  kernel releases.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  printk: Optimize if statement logic where newline exists
  printk: flush continuation lines immediately to console
  syslog: fill buffer with more than a single message for SYSLOG_ACTION_READ
  Revert "printk: return -EINVAL if the message len is bigger than the buf size"
  printk: fix regression in SYSLOG_ACTION_CLEAR
  stable: Allow merging of backports for serious user-visible performance issues

Merge branches 'acpi_pad-bugzilla-42981', 'apei-bugzilla-43282', 'video-bugzilla-43168', 'bugzilla-40002' and 'bugfix-misc' into release

bug fixes

acpi_pad: fix power_saving thread deadlock

The acpi_pad driver can get stuck in destroy_power_saving_task()
waiting for kthread_stop() to stop a power_saving thread. The problem
is that the isolated_cpus_lock mutex is owned when
destroy_power_saving_task() calls kthread_stop(), which waits for a
power_saving thread to end, and the power_saving thread tries to
acquire the isolated_cpus_lock when it calls round_robin_cpu(). This
patch fixes the issue by making round_robin_cpu() use its own mutex.

https://bugzilla.kernel.org/show_bug.cgi?id=42981

Cc: stable@vger.kernel.org
Signed-off-by: Stuart Hayes <Stuart_Hayes@Dell.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI video: Still use ACPI backlight control if _DOS doesn't exist

This fixes a regression in 3.4-rc1 caused by commit
ea9f8856bd6d4ed45885b06a338f7362cd6c60e5
(ACPI video: Harden video bus adding.)

Some platforms don't have _DOS control method, but the ACPI
backlight still works.
We should not invoke _DOS for these platforms.

https://bugzilla.kernel.org/show_bug.cgi?id=43168

Cc: Igor Murzov <intergalactic.anonymous@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

Merge tag 'pm-for-3.5-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael J. Wysocki:

* Fix for a bug in async suspend error code path causing parents to
   wait forever for their children in case of a suspend error from
   Mandeep Singh Baines (-stable metarial).

* Fix for a suspend regression related to earlier changes in the ACPI
   cpuidle driver from Deepthi Dharwar.

* tag 'pm-for-3.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
  PM / Sleep: Prevent waiting forever on asynchronous suspend after abort

Merge branch 'master' of git://1984.lsi.us.es/nf

Pablo Neira Ayuso says:

====================
The following are 4 fixes and the update of the MAINTAINERS file
to point to my Netfilter trees.

They are:

* One refcount leak fix in IPVS IPv6 support from Eric Dumazet.

* One fix for interface comparison in ipset hash-netiface sets
  from Florian Westphal.

* One fix for a missing rcu_read_unlock in nfnetlink from
  Tomasz Bursztyka.

* One fix for a kernel crash if IPSET_CMD_NONE is set to ipset via
  nfnetlink, again from Tomasz Bursztyka.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

printk: Optimize if statement logic where newline exists

In reviewing Kay's fix up patch: "printk: Have printk() never buffer its
data", I found two if statements that could be combined and optimized.

Put together the two 'cont.len && cont.owner == current' if statements
into a single one, and check if we need to call cont_add(). This also
removes the unneeded double cont_flush() calls.

Link: http://lkml.kernel.org/r/1340869133.876.10.camel@mop
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull powerpc fixes from Benjamin Herrenschmidt:
"Here are a few powerpc fixes.  Arguably some of this should have come
  to you earlier but I'm only just catching up after my medical leave.

  Mostly these fixes regressions, a couple are long standing bugs."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pseries: Fix software invalidate TCE
  powerpc: check_and_cede_processor() never cedes
  powerpc/ftrace: Do not trace restore_interrupts()
  powerpc: Fix Section mismatch warnings in prom_init.c
  ppc64: fix missing to check all bits of _TIF_USER_WORK_MASK in preempt
  powerpc: Fix uninitialised error in numa.c
  powerpc: Fix BPF_JIT code to link with multiple TOCs

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpufeature: Remove stray %s, add -w to mkcapflags.pl
  x86, cpufeature: Catch duplicate CPU feature strings
  x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM
  x86: Fix kernel-doc warnings
  x86, compat: Use test_thread_flag(TIF_IA32) in compat signal delivery

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull oprofile fixlet from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
oprofile: perf: use NR_CPUS instead or nr_cpumask_bits for static array

Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull RCU fix from Ingo Molnar.

Fixes a bug introduced in this merge window by commit b1420f1c ("Make
rcu_barrier() less disruptive")

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Stop rcu_do_batch() from multiplexing the "count" variable

printk: flush continuation lines immediately to console

Continuation lines are buffered internally, intended to merge the
chunked printk()s into a single record, and to isolate potentially
racy continuation users from usual terminated line users.

This though, has the effect that partial lines are not printed to
the console in the moment they are emitted. In case the kernel
crashes in the meantime, the potentially interesting printed
information would never reach the consoles.

Here we share the continuation buffer with the console copy logic,
and partial lines are always immediately flushed to the available
consoles. They are still buffered internally to improve the
readability and integrity of the messages and minimize the amount
of needed record headers to store.

Signed-off-by: Kay Sievers <kay@vrfy.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>