sdk/emulator/emulator-kernel.git
11 years ago  Fix mountpoint reference leakage in linkat
Oleg Drokin [Fri, 31 Jan 2014 20:41:58 +0000 (15:41 -0500)]
Fix mountpoint reference leakage in linkat

commit d22e6338db7f613dd4f6095c190682fcc519e4b7 upstream.

Recent changes to retry on ESTALE in linkat
(commit 442e31ca5a49e398351b2954b51f578353fdf210)
introduced a mountpoint reference leak and a small memory
leak in case a filesystem link operation returns ESTALE,
which is pretty normal for distributed filesystems like
Lustre, NFS and so on.
Free old_path in such a case.

[AV: there was another missing path_put() nearby - on the previous
goto retry]

[js: the second path_put is not in 3.12 yet, hunk removed]

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable
Daniel Borkmann [Mon, 3 Mar 2014 16:23:04 +0000 (17:23 +0100)]
net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable

[ Upstream commit ec0223ec48a90cb605244b45f7c62de856403729 ]

RFC4895 introduced AUTH chunks for SCTP; during the SCTP
handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS
being optional though):

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------

A special case is when an endpoint requires COOKIE-ECHO
chunks to be authenticated:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  ------------------ AUTH; COOKIE-ECHO ---------------->
  <-------------------- COOKIE-ACK ---------------------

RFC4895, section 6.3. Receiving Authenticated Chunks says:

  The receiver MUST use the HMAC algorithm indicated in
  the HMAC Identifier field. If this algorithm was not
  specified by the receiver in the HMAC-ALGO parameter in
  the INIT or INIT-ACK chunk during association setup, the
  AUTH chunk and all the chunks after it MUST be discarded
  and an ERROR chunk SHOULD be sent with the error cause
  defined in Section 4.1. [...] If no endpoint pair shared
  key has been configured for that Shared Key Identifier,
  all authenticated chunks MUST be silently discarded. [...]

  When an endpoint requires COOKIE-ECHO chunks to be
  authenticated, some special procedures have to be followed
  because the reception of a COOKIE-ECHO chunk might result
  in the creation of an SCTP association. If a packet arrives
  containing an AUTH chunk as a first chunk, a COOKIE-ECHO
  chunk as the second chunk, and possibly more chunks after
  them, and the receiver does not have an STCB for that
  packet, then authentication is based on the contents of
  the COOKIE-ECHO chunk. In this situation, the receiver MUST
  authenticate the chunks in the packet by using the RANDOM
  parameters, CHUNKS parameters and HMAC_ALGO parameters
  obtained from the COOKIE-ECHO chunk, and possibly a local
  shared secret as inputs to the authentication procedure
  specified in Section 6.3. If authentication fails, then
  the packet is discarded. If the authentication is successful,
  the COOKIE-ECHO and all the chunks after the COOKIE-ECHO
  MUST be processed. If the receiver has an STCB, it MUST
  process the AUTH chunk as described above using the STCB
  from the existing association to authenticate the
  COOKIE-ECHO chunk and all the chunks after it. [...]

Commit bbd0d59809f9 introduced the reception and verification of
AUTH chunks, including the edge case of an authenticated
COOKIE-ECHO. On reception of a COOKIE-ECHO, the function
sctp_sf_do_5_1D_ce() handles processing: it unpacks and creates a
new association if it passes sanity checks, and also tests for
authentication chunks being present. After a new association has
been processed, it invokes sctp_process_init() on the new
association and walks through the parameter list it received from
the INIT chunk. It checks SCTP_PARAM_RANDOM, SCTP_PARAM_HMAC_ALGO
and SCTP_PARAM_CHUNKS, and copies them into asoc->peer metadata
(peer_random, peer_hmacs, peer_chunks) in case sysctl -w
net.sctp.auth_enable=1 is set. If SCTP_CID_AUTH is set in the
INIT's SCTP_PARAM_SUPPORTED_EXT parameter, peer_random != NULL and
peer_hmacs != NULL, the peer is assumed to be AUTH capable
(asoc->peer.auth_capable=1); in any other case
asoc->peer.auth_capable=0.

Now, if chunk->auth_chunk is available in sctp_sf_do_5_1D_ce(), we
set up a fake auth chunk and pass that on to sctp_sf_authenticate(),
which, at the latest in sctp_auth_calculate_hmac(), reliably
dereferences a NULL pointer at position 0..0008 when setting up the
crypto key in crypto_hash_setkey(): asoc->asoc_shared_key is NULL,
while the condition key_id == asoc->active_key_id is true if the
AUTH chunk was injected correctly from remote. This happens no
matter what the net.sctp.auth_enable sysctl says.

The fix is to check for net->sctp.auth_enable and for
asoc->peer.auth_capable before doing any operations like
sctp_sf_authenticate(), as no key is activated in
sctp_auth_asoc_init_active_key() in either case.

Now, RFC4895 section 6.3 states that if the HMAC-ALGO passed in the
INIT chunk was not used in the AUTH chunk, we SHOULD send an error;
however, in this case it is better to just silently discard such a
maliciously prepared handshake, as we did not even receive the
parameter at all. Also, as our endpoint has no shared key configured,
section 6.3 says we MUST silently discard, which we do from now
onwards.

Before calling sctp_sf_pdiscard(), we need to free not only the
association, but also the chunk->auth_chunk skb, as commit
bbd0d59809f9 created an skb clone in that case.
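
An illustration of the guard described above, as a small self-contained
C sketch (all types and helpers here are simplified stand-ins, not the
actual sctp_sf_do_5_1D_ce() code):

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Simplified stand-ins for the kernel objects mentioned above. */
  struct fake_assoc { bool peer_auth_capable; };
  struct fake_chunk { void *auth_chunk; /* cloned skb in the real code */ };

  enum verdict { DISCARD, CONTINUE };

  /* Only authenticate when both we (net.sctp.auth_enable) and the peer
   * (auth_capable) support AUTH; otherwise free the clone and the new
   * association and silently discard the packet. */
  static enum verdict handle_cookie_echo(bool net_auth_enable,
                                         struct fake_assoc *new_asoc,
                                         struct fake_chunk *chunk)
  {
          if (chunk->auth_chunk) {
                  if (!net_auth_enable || !new_asoc->peer_auth_capable) {
                          free(chunk->auth_chunk); /* kfree_skb() in the kernel */
                          chunk->auth_chunk = NULL;
                          free(new_asoc);          /* sctp_association_free() */
                          return DISCARD;          /* sctp_sf_pdiscard() */
                  }
                  /* ...set up the fake auth chunk, sctp_sf_authenticate()... */
          }
          return CONTINUE;
  }

  int main(void)
  {
          struct fake_assoc *asoc = calloc(1, sizeof(*asoc));
          struct fake_chunk chunk = { .auth_chunk = malloc(1) };

          /* AUTH chunk injected although AUTH was never negotiated: discard. */
          printf("%s\n", handle_cookie_echo(false, asoc, &chunk) == DISCARD
                 ? "discarded" : "continued");
          return 0;
  }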

I have tested this locally by using netfilter's nfqueue and
re-injecting packets into the local stack after maliciously
modifying the INIT chunk (removing RANDOM; HMAC-ALGO param)
and the SCTP packet containing the COOKIE_ECHO (injecting
AUTH chunk before COOKIE_ECHO). Fixed with this patch applied.

Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <yasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer
Xin Long [Mon, 3 Mar 2014 12:18:36 +0000 (20:18 +0800)]
ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer

[ Upstream commit 10ddceb22bab11dab10ba645c7df2e4a8e7a5db5 ]

When ip_tunnel processes multicast packets, it may check whether the packet
is a looped-back packet through 'rt_is_output_route(skb_rtable(skb))' in
ip_tunnel_rcv(). But before that, skb->_skb_refdst has already been dropped
in iptunnel_pull_header(), which leads to a panic.

This fixes the bug reported at https://bugzilla.kernel.org/show_bug.cgi?id=70681

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  tg3: Don't check undefined error bits in RXBD
Michael Chan [Fri, 28 Feb 2014 23:05:10 +0000 (15:05 -0800)]
tg3: Don't check undefined error bits in RXBD

[ Upstream commit d7b95315cc7f441418845a165ee56df723941487 ]

Redefine the RXD_ERR_MASK to include only relevant error bits. This fixes
a customer reported issue of randomly dropping packets on the 5719.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ipv6: ipv6_find_hdr restore prev functionality
Hans Schillstrom [Thu, 27 Feb 2014 11:57:58 +0000 (12:57 +0100)]
ipv6: ipv6_find_hdr restore prev functionality

[ Upstream commit accfe0e356327da5bd53da8852b93fc22de9b5fc ]

The commit 9195bb8e381d81d5a315f911904cdf0cfcc919b8 ("ipv6: improve
ipv6_find_hdr() to skip empty routing headers") broke ipv6_find_hdr().

When a target such as IPPROTO_ICMPV6 is specified, ipv6_find_hdr()
returns -ENOENT when the header is found, not the header as expected.

Part of IPVS is broken, and possibly also nft_exthdr_eval().
When the target is -1, which it is in most cases, it works.

This patch exits the do-while loop when the specified header is found
so that nexthdr can be returned as expected.
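
The restored behaviour, modelled as a small standalone program (this is
an illustrative toy walk over next-header values, not the actual
ipv6_find_hdr() implementation):

  #include <stdio.h>

  #define ENOENT 2
  #define IPPROTO_ROUTING 43
  #define IPPROTO_ICMPV6  58

  /* Toy model: the "packet" is just a sequence of next-header values
   * ending with the upper-layer protocol.  With a specific target, the
   * walk must stop and report that header once it is reached, instead
   * of walking past it and failing with -ENOENT. */
  static int find_hdr(const int *hdrs, int count, int target)
  {
          int i = 0, nexthdr;

          do {
                  nexthdr = hdrs[i];
                  if (target >= 0 && nexthdr == target)
                          return nexthdr;   /* the restored early exit */
                  i++;
          } while (i < count);

          return target >= 0 ? -ENOENT : nexthdr;
  }

  int main(void)
  {
          const int pkt[] = { IPPROTO_ROUTING, IPPROTO_ICMPV6 };

          /* Expect 58 (ICMPv6), not -ENOENT. */
          printf("%d\n", find_hdr(pkt, 2, IPPROTO_ICMPV6));
          return 0;
  }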

Reported-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sfc: check for NULL efx->ptp_data in efx_ptp_event
Edward Cree [Tue, 25 Feb 2014 13:17:59 +0000 (13:17 +0000)]
sfc: check for NULL efx->ptp_data in efx_ptp_event

[ Upstream commit 8f355e5cee63c2c0c145d8206c4245d0189f47ff ]

If we receive a PTP event from the NIC when we haven't set up PTP state
in the driver, we attempt to read through a NULL pointer efx->ptp_data,
triggering a panic.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ipv6: reuse ip6_frag_id from ip6_ufo_append_data
Hannes Frederic Sowa [Fri, 21 Feb 2014 01:55:35 +0000 (02:55 +0100)]
ipv6: reuse ip6_frag_id from ip6_ufo_append_data

[ Upstream commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 ]

Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation during segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  virtio-net: alloc big buffers also when guest can receive UFO
Jason Wang [Fri, 21 Feb 2014 05:08:04 +0000 (13:08 +0800)]
virtio-net: alloc big buffers also when guest can receive UFO

[ Upstream commit 0e7ede80d929ff0f830c44a543daa1acd590c749 ]

We should also allocate big buffers when the guest can receive UFO
packets, to let the big packets fit into the guest rx buffer.

Fixes 5c5167515d80f78f6bb538492c423adcae31ad65
(virtio-net: Allow UFO feature to be set and advertised.)

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  neigh: recompute reachabletime before returning from neigh_periodic_work()
Duan Jiong [Thu, 27 Feb 2014 09:14:41 +0000 (17:14 +0800)]
neigh: recompute reachabletime before returning from neigh_periodic_work()

[ Upstream commit feff9ab2e7fa773b6a3965f77375fe89f7fd85cf ]

If the number of entries in the neigh table is less than gc_thresh1, the
function returns directly and reachabletime is not recomputed, so
reachabletime can be guessed.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  net-tcp: fastopen: fix high order allocations
Eric Dumazet [Thu, 20 Feb 2014 18:09:18 +0000 (10:09 -0800)]
net-tcp: fastopen: fix high order allocations

[ Upstream commit f5ddcbbb40aa0ba7fbfe22355d287603dbeeaaac ]

This patch fixes two bugs in fastopen :

1) The tcp_sendmsg(...,  @size) argument was ignored.

   The code was relying on the user not fooling the kernel with iovec mismatches.

2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
allocations, which are likely to fail when memory gets fragmented.

Fixes: 783237e8daf13 ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  tun: remove bogus hardware vlan acceleration flags from vlan_features
Fernando Luis Vazquez Cao [Tue, 18 Feb 2014 12:20:09 +0000 (21:20 +0900)]
tun: remove bogus hardware vlan acceleration flags from vlan_features

[ Upstream commit 6671b2240c54585d4afb5286a29f1569fe5e40a8 ]

Even though only the outer vlan tag can be HW accelerated in the transmission
path, in the TUN/TAP driver vlan_features mirrors hw_features, which happens
to have the NETIF_F_HW_VLAN_?TAG_TX flags set. Because of this, during packet
transmission through a stacked vlan device, dev_hard_start_xmit, (incorrectly)
assuming that the vlan device supports hardware vlan acceleration, does not
add the vlan header to the skb payload, and the inner vlan tags are lost
(vlan_tci contains the outer vlan tag when userspace reads the packet from
the tap device).
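
The kind of change this implies, sketched with mock feature bits (the
flags below are stand-ins; in the kernel these are netdev_features_t
bits on dev->vlan_features).  The veth fix in the next entry follows
the same idea:

  #include <stdio.h>

  /* Mock stand-ins for the kernel's netdev feature bits. */
  #define F_SG              (1u << 0)
  #define F_HW_VLAN_CTAG_TX (1u << 1)
  #define F_HW_VLAN_STAG_TX (1u << 2)

  int main(void)
  {
          unsigned int hw_features = F_SG | F_HW_VLAN_CTAG_TX | F_HW_VLAN_STAG_TX;

          /* Buggy: vlan_features mirrors hw_features, so a stacked vlan
           * device believes its tag can be HW accelerated and never
           * writes it into the payload. */
          unsigned int vlan_features_buggy = hw_features;

          /* Fixed: strip the HW vlan TX acceleration bits, since only
           * the outermost tag can be accelerated on transmit. */
          unsigned int vlan_features_fixed =
                  hw_features & ~(F_HW_VLAN_CTAG_TX | F_HW_VLAN_STAG_TX);

          printf("buggy: %#x  fixed: %#x\n",
                 vlan_features_buggy, vlan_features_fixed);
          return 0;
  }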

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  veth: Fix vlan_features so as to be able to use stacked vlan interfaces
Toshiaki Makita [Tue, 18 Feb 2014 12:20:08 +0000 (21:20 +0900)]
veth: Fix vlan_features so as to be able to use stacked vlan interfaces

[ Upstream commit 8d0d21f4053c07714802cbe8b1fe26913ec296cc ]

Even if we create a stacked vlan interface such as veth0.10.20, it sends
single tagged frames (tagged with only vid 10).
Because vlan_features of a veth interface has the
NETIF_F_HW_VLAN_[CTAG/STAG]_TX bits, veth0.10 also has that feature, so
dev_hard_start_xmit(veth0.10) doesn't call __vlan_put_tag() and
vlan_dev_hard_start_xmit(veth0.10) overwrites vlan_tci.
This prevents us from using a combination of 802.1ad and 802.1Q
in containers, etc.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  SELinux: Increase ebitmap_node size for 64-bit configuration
Waiman Long [Tue, 23 Jul 2013 21:38:41 +0000 (17:38 -0400)]
SELinux: Increase ebitmap_node size for 64-bit configuration

commit a767f680e34bf14a36fefbbb6d85783eef99fd57 upstream.

Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.

On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.

This patch increases the size of the ebitmap_node structure to 64
bytes on 64-bit systems to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node-to-node traversal
less frequent (< 2), as more bits are available in each node.
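
The overhead arithmetic above, restated with a hypothetical layout
(field names and types are illustrative, not the kernel's struct
ebitmap_node):

  #include <stdio.h>
  #include <stdint.h>

  #define NODE_SIZE 64   /* was 32 before the change described above */

  /* A next pointer plus a start-bit field form the per-node overhead on
   * a 64-bit system (16 bytes); the rest of the node holds bitmap words. */
  struct node64 {
          struct node64 *next;
          uint64_t startbit;
          uint64_t maps[(NODE_SIZE - 16) / sizeof(uint64_t)];
  };

  int main(void)
  {
          size_t overhead = sizeof(struct node64)
                          - sizeof(((struct node64 *)0)->maps);

          /* 64-byte node, 16 bytes overhead -> ratio 1/4, with 48 bytes
           * (384 bits) of bitmap per node instead of 16 bytes (128 bits). */
          printf("size %zu, overhead %zu, ratio 1/%zu\n",
                 sizeof(struct node64), overhead,
                 sizeof(struct node64) / overhead);
          return 0;
  }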

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  SELinux: Reduce overhead of mls_level_isvalid() function call
Waiman Long [Tue, 23 Jul 2013 21:38:41 +0000 (17:38 -0400)]
SELinux: Reduce overhead of mls_level_isvalid() function call

commit fee7114298cf54bbd221cdb2ab49738be8b94f4c upstream.

While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:

  5.04%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
  1.96%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
  1.95%            ls  [kernel.kallsyms]     [k] find_next_bit

The ebitmap_get_bit() was the hottest function in the perf-report
output.  Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.

Looking at the mls_level_isvalid() function, it is checking that all
the bits set in one ebitmap structure are also set in another one, and
that the highest set bit is no bigger than the one specified by the
given policydb data structure. It is doing this in a bit-by-bit
manner. So if the ebitmap structure has many bits set, the iteration
loop will be executed many times.

The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().
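
The word-at-a-time subset test plus highest-bit check described above,
as a simplified standalone sketch over plain arrays (the kernel's
ebitmap is a linked list of nodes, so the real code differs):

  #include <stdbool.h>
  #include <stdio.h>

  #define WORDS 4
  #define BITS_PER_WORD (8 * (unsigned int)sizeof(unsigned long))

  /* True if every bit set in 'sub' is also set in 'super' and the
   * highest set bit of 'sub' does not exceed 'lastbit'. */
  static bool contains(const unsigned long *super, const unsigned long *sub,
                       unsigned int lastbit)
  {
          for (int w = WORDS - 1; w >= 0; w--) {
                  if (!sub[w])
                          continue;
                  /* Check the topmost set bit of this word against 'lastbit'. */
                  for (unsigned int b = BITS_PER_WORD; b-- > 0; ) {
                          if (sub[w] & (1UL << b)) {
                                  if (w * BITS_PER_WORD + b > lastbit)
                                          return false;
                                  break;
                          }
                  }
                  /* Word-wise subset test instead of bit-by-bit iteration. */
                  if ((sub[w] & super[w]) != sub[w])
                          return false;
          }
          return true;
  }

  int main(void)
  {
          unsigned long super[WORDS] = { 0x0f, 0xff, 0, 0 };
          unsigned long sub[WORDS]   = { 0x05, 0x0f, 0, 0 };

          printf("%s\n", contains(super, sub, 127) ? "valid" : "invalid");
          return 0;
  }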

With that change, the perf trace showed that the used CPU time dropped
to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total, which is about 100X less than before.

  0.07%            ls  [kernel.kallsyms]     [k] ebitmap_contains
  0.05%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
  0.01%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
  0.01%            ls  [kernel.kallsyms]     [k] find_next_bit

The remaining ebitmap_get_bit() and find_next_bit() function calls
are made by other kernel routines, as the new mls_level_isvalid()
function will not call them anymore.

This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime |     +0.1%     |     +0.9%      |     +2.6%       |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  mm: do not walk all of system memory during show_mem
Mel Gorman [Tue, 12 Nov 2013 23:08:15 +0000 (15:08 -0800)]
mm: do not walk all of system memory during show_mem

commit c78e93630d15b5f5774213aad9bdc9f52473a89b upstream.

It has been reported on very large machines that show_mem is taking almost
5 minutes to display information.  This is a serious problem if there is
an OOM storm.  The bulk of the cost is in show_mem doing a very expensive
PFN walk to give us the following information

  Total RAM:       Also available as totalram_pages
  Highmem pages:   Also available as totalhigh_pages
  Reserved pages:  Can be inferred from the zone structure
  Shared pages:    PFN walk required
  Unshared pages:  PFN walk required
  Quick pages:     Per-cpu walk required

Only the shared/unshared pages require a full PFN walk but that
information is useless.  It is also inaccurate as page pins of unshared
pages would be accounted for as shared.  Even if the information was
accurate, I'm struggling to think how the shared/unshared information
could be useful for debugging OOM conditions.  Maybe it was useful before
rmap existed when reclaiming shared pages was costly but it is less
relevant today.

The PFN walk could be optimised a bit but why bother as the information is
useless.  This patch deletes the PFN walker and infers the total RAM,
highmem and reserved pages count from struct zone.  It omits the
shared/unshared page usage on the grounds that it is useless.  It also
corrects the reporting of HighMem as HighMem/MovableOnly as ZONE_MOVABLE
has similar problems to HighMem with respect to lowmem/highmem exhaustion.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
Jason Baron [Thu, 2 Jan 2014 20:58:54 +0000 (12:58 -0800)]
epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL

commit 4ff36ee94d93ddb4b7846177f1118d9aa33408e2 upstream.

The EPOLL_CTL_DEL path of epoll contains a classic AB-BA deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x).  The deadlock was introduced with
commit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple
topologies").

The acquisition of the ep->mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (specifically, the check for '!list_empty(&f.file->f_ep_links)').

However, by simply not acquiring the lock, we do not serialize behind
the ep->mtx from the add path, and thus may perform a full path check
when, had we waited a little longer, it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.

The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
it's operating upon.  The reason we don't need to do lock ordering in
the add path is that we are already holding the global 'epmutex'
whenever we do the double lock.  Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  epoll: do not take global 'epmutex' for simple topologies
Jason Baron [Tue, 12 Nov 2013 23:10:18 +0000 (15:10 -0800)]
epoll: do not take global 'epmutex' for simple topologies

commit 67347fe4e6326338ee217d7eb826bedf30b2e155 upstream.

When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached
directly to a wakeup source, we do not need to take the global 'epmutex',
unless the epoll file descriptor is nested.  The purpose of taking the
'epmutex' on add is to prevent complex topologies such as loops and deep
wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD
operations.  However, for the simple case of an epoll file descriptor
attached directly to a wakeup source (with no nesting), we do not need to
hold the 'epmutex'.

This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves
scalability on larger systems.  Quoting Nathan Zimmer's mail on SPECjbb
performance:

"On the 16 socket run the performance went from 35k jOPS to 125k jOPS.  In
addition the benchmark went from scaling well on 10 sockets to scaling
well on just over 40 sockets.

...

Currently the benchmark stops scaling at around 40-44 sockets but it seems like
I found a second unrelated bottleneck."

[akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc]
Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  epoll: optimize EPOLL_CTL_DEL using rcu
Jason Baron [Tue, 12 Nov 2013 23:10:16 +0000 (15:10 -0800)]
epoll: optimize EPOLL_CTL_DEL using rcu

commit ae10b2b4eb01bedc91d29d5c5bb9e416fd806c40 upstream.

Nathan Zimmer found that once we get over 10+ cpus, the scalability of
SPECjbb falls over due to the contention on the global 'epmutex', which is
taken on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations.

Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path
by using rcu to guard against any concurrent traversals.

Patch #2 removes the 'epmutex' lock from EPOLL_CTL_ADD operations for
simple topologies, i.e. when adding a link from an epoll file descriptor
to a wakeup source, where the epoll file descriptor is not nested.

This patch (of 2):

Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by
converting the file->f_ep_links list into an rcu one.  In this way, we can
traverse the epoll network on the add path in parallel with deletes.
Since deletes can't create loops or worse wakeup paths, this is safe.
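
The general pattern being applied here, sketched with placeholder names
(generic RCU list usage, not the actual eventpoll.c changes):

  /* Reader (the traversal that previously needed the mutex): */
  rcu_read_lock();
  list_for_each_entry_rcu(it, &head, link) {
          /* read-only inspection of 'it' */
  }
  rcu_read_unlock();

  /* Delete path: unlink immediately, but free only after a grace
   * period so concurrent readers never see freed memory. */
  list_del_rcu(&it->link);
  call_rcu(&it->rcu, item_rcu_free);  /* item_rcu_free() kfree()s 'it' */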

This patch in combination with the patch "epoll: Do not take global 'epmutex'
for simple topologies", shows a dramatic performance improvement in
scalability for SPECjbb.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
CC: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  x86/dumpstack: Fix printk_address for direct addresses
Jiri Slaby [Fri, 25 Oct 2013 13:06:58 +0000 (15:06 +0200)]
x86/dumpstack: Fix printk_address for direct addresses

commit 5f01c98859073cb512b01d4fad74b5f4e047be0b upstream.

Consider a kernel crash in a module, simulated the following way:

 static int my_init(void)
 {
         char *map = (void *)0x5;
         *map = 3;
         return 0;
 }
 module_init(my_init);

When we turn off FRAME_POINTERs, the very first instruction in
that function causes a BUG. The problem is that we print IP in
the BUG report using %pB (from printk_address). And %pB
decrements the pointer by one to fix printing addresses of
functions with tail calls.

This was added in commit 71f9e59800e5ad4 ("x86, dumpstack: Use
%pB format specifier for stack trace") to fix the call stack
printouts.

So instead of correct output:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]

We get:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
  IP: [<ffffffffa0152000>] 0xffffffffa0151fff

To fix that, we use %pB only for stack address printouts (via the
newly added printk_stack_address) and %pS for regs->ip (via
printk_address). I.e. we revert to the old behaviour for everything
except call stacks. And since reliable is always 1 in the remaining
callers, we remove that parameter from printk_address.
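
Roughly, the split looks like this (a sketch of the two helpers as
described, not the literal patch):

  /* For stack traces: %pB compensates for tail calls by looking up the
   * symbol for address - 1. */
  static void printk_stack_address(unsigned long address, int reliable)
  {
          pr_cont(" [<%p>] %s%pB\n", (void *)address,
                  reliable ? "" : "? ", (void *)address);
  }

  /* For direct addresses such as regs->ip: %pS prints the symbol as-is. */
  void printk_address(unsigned long address)
  {
          pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
  }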

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: joe@perches.com
Cc: jirislaby@gmail.com
Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  SUNRPC: close a rare race in xs_tcp_setup_socket.
NeilBrown [Thu, 31 Oct 2013 05:14:36 +0000 (16:14 +1100)]
SUNRPC: close a rare race in xs_tcp_setup_socket.

commit 93dc41bdc5c853916610576c6b48a1704959c70d upstream.

We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:

  xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.

The 'sock' passed to that last function is NULL.

The only way I can see this happening is a concurrent call to
xs_close:

  xs_close -> xs_reset_transport -> sock_release -> inet_release

inet_release sets:
   sock->sk = NULL;
inet_stream_connect calls
   lock_sock(sock->sk);
which gets NULL.

All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.

So presumably the timeout queued by the latter fires exactly when some
other code runs xs_close().

To protect against this we can move the cancel_delayed_work_sync()
call from xs_destroy() to xs_close().

As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.

Signed-off-by: NeilBrown <neilb@suse.de>
[Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sched/rt: Remove redundant nr_cpus_allowed test
Shawn Bohrer [Fri, 4 Oct 2013 19:24:53 +0000 (14:24 -0500)]
sched/rt: Remove redundant nr_cpus_allowed test

commit 6bfa687c19b7ab8adee03f0d43c197c2945dd869 upstream.

In 76854c7e8f3f4172fef091e78d88b3b751463ac6 ("sched: Use
rt.nr_cpus_allowed to recover select_task_rq() cycles") an
optimization was added to select_task_rq_rt() that immediately
returns when p->nr_cpus_allowed == 1 at the beginning of the
function.

This makes the latter p->nr_cpus_allowed > 1 check redundant,
which can now be removed.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: tomk@rgmadvisors.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1380914693-24634-1-git-send-email-shawn.bohrer@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sched/rt: Add missing rmb()
Peter Zijlstra [Tue, 15 Oct 2013 10:35:07 +0000 (12:35 +0200)]
sched/rt: Add missing rmb()

commit 7c3f2ab7b844f1a859afbc3d41925e8a0faba5fa upstream.

While discussing the proposed SCHED_DEADLINE patches which in parts
mimic the existing FIFO code it was noticed that the wmb in
rt_set_overloaded() didn't have a matching barrier.

The only site using rt_overloaded() to test the rto_count is
pull_rt_task() and we should issue a matching rmb before then assuming
there's an rto_mask bit set.

Without that smp_rmb() in there we could actually miss seeing the
rto_mask bit.

Also, change to using smp_[wr]mb(), even though this is SMP only code;
memory barriers without smp_ always make me think they're against
hardware of some sort.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: vincent.guittot@linaro.org
Cc: luca.abeni@unitn.it
Cc: bruce.ashfield@windriver.com
Cc: dhaval.giani@gmail.com
Cc: rostedt@goodmis.org
Cc: hgu1972@gmail.com
Cc: oleg@redhat.com
Cc: fweisbec@gmail.com
Cc: darren@dvhart.com
Cc: johan.eker@ericsson.com
Cc: p.faure@akatech.ch
Cc: paulmck@linux.vnet.ibm.com
Cc: raistlin@linux.it
Cc: claudio@evidence.eu.com
Cc: insop.song@gmail.com
Cc: michael@amarulasolutions.com
Cc: liming.wang@windriver.com
Cc: fchecconi@gmail.com
Cc: jkacur@redhat.com
Cc: tommaso.cucinotta@sssup.it
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: harald.gustafsson@ericsson.com
Cc: nicola.manica@disi.unitn.it
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sched: Assign correct scheduling domain to 'sd_llc'
Mel Gorman [Tue, 17 Dec 2013 09:21:25 +0000 (09:21 +0000)]
sched: Assign correct scheduling domain to 'sd_llc'

commit 5d4cf996cf134e8ddb4f906b8197feb9267c2b77 upstream.

Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
dereference on sd_busy but the fix also altered what scheduling domain it
used for the 'sd_llc' percpu variable.

One impact of this is that a task selecting a runqueue may consider
idle CPUs that are not cache siblings as candidates for running.
Tasks are then running on CPUs that are not cache hot.

This was found through bisection where ebizzy threads were not seeing equal
performance and it looked like a scheduling fairness issue. This patch
mitigates but does not completely fix the problem on all machines tested
implying there may be an additional bug or a common root cause. Here are
the average range of performance seen by individual ebizzy threads. It
was tested on top of candidate patches related to x86 TLB range flushing.

4-core machine
    3.13.0-rc3            3.13.0-rc3
       vanilla            fixsd-v3r3
Mean   1        0.00 (  0.00%)        0.00 (  0.00%)
Mean   2        0.34 (  0.00%)        0.10 ( 70.59%)
Mean   3        1.29 (  0.00%)        0.93 ( 27.91%)
Mean   4        7.08 (  0.00%)        0.77 ( 89.12%)
Mean   5      193.54 (  0.00%)        2.14 ( 98.89%)
Mean   6      151.12 (  0.00%)        2.06 ( 98.64%)
Mean   7      115.38 (  0.00%)        2.04 ( 98.23%)
Mean   8      108.65 (  0.00%)        1.92 ( 98.23%)

8-core machine
Mean   1         0.00 (  0.00%)        0.00 (  0.00%)
Mean   2         0.40 (  0.00%)        0.21 ( 47.50%)
Mean   3        23.73 (  0.00%)        0.89 ( 96.25%)
Mean   4        12.79 (  0.00%)        1.04 ( 91.87%)
Mean   5        13.08 (  0.00%)        2.42 ( 81.50%)
Mean   6        23.21 (  0.00%)       69.46 (-199.27%)
Mean   7        15.85 (  0.00%)      101.72 (-541.77%)
Mean   8       109.37 (  0.00%)       19.13 ( 82.51%)
Mean   12      124.84 (  0.00%)       28.62 ( 77.07%)
Mean   16      113.50 (  0.00%)       24.16 ( 78.71%)

It's eliminated for one machine and reduced for another.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: H Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sched: Initialize power_orig for overlapping groups
Peter Zijlstra [Wed, 11 Dec 2013 10:09:53 +0000 (11:09 +0100)]
sched: Initialize power_orig for overlapping groups

commit 8e8339a3a1069141985daaa2521ba304509ddecd upstream.

Yinghai reported that he saw a /0 in sg_capacity on his EX parts.
Make sure to always initialize power_orig now that we actually use it.

Ideally build_sched_domains() -> init_sched_groups_power() would also
initialize this; but for some yet unexplained reason some setups seem
to miss updates there.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sched: Avoid NULL dereference on sd_busy
Peter Zijlstra [Tue, 19 Nov 2013 15:41:49 +0000 (16:41 +0100)]
sched: Avoid NULL dereference on sd_busy

commit 42eb088ed246a5a817bb45a8b32fe234cf1c0f8b upstream.

Commit 37dc6b50cee9 ("sched: Remove unnecessary iteration over sched
domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some
conditions leading to a possible NULL deref in set_cpu_sd_state_idle().

Reported-by: Anton Blanchard <anton@samba.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
Preeti U Murthy [Wed, 30 Oct 2013 03:12:52 +0000 (08:42 +0530)]
sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus

commit 37dc6b50cee97954c4e6edcd5b1fa614b76038ee upstream.

nr_busy_cpus parameter is used by nohz_kick_needed() to find out the
number of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES
flag set.  Therefore instead of updating nr_busy_cpus at every level
of sched domain, since it is irrelevant, we can update this parameter
only at the parent domain of the sd which has this flag set. Introduce
a per-cpu parameter sd_busy which represents this parent domain.

In nohz_kick_needed() we directly query the nr_busy_cpus parameter
associated with the groups of sd_busy.

By associating sd_busy with the highest domain which has
SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains
which could have this flag set and trigger nohz_idle_balancing if any
of the levels have more than one busy cpu.

sd_busy is irrelevant for asymmetric load balancing. However sd_asym
has been introduced to represent the highest sched domain which has
SD_ASYM_PACKING flag set so that it can be queried directly when
required.

While we are at it, we might as well change the nohz_idle parameter to
be updated at the sd_busy domain level alone and not the base domain
level of a CPU.  This will unify the concept of busy cpus at just one
level of sched domain where it is currently used.

Signed-off-by: Preeti U Murthy<preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: svaidy@linux.vnet.ibm.com
Cc: vincent.guittot@linaro.org
Cc: bitbucket@online.de
Cc: benh@kernel.crashing.org
Cc: anton@samba.org
Cc: Morten.Rasmussen@arm.com
Cc: pjt@google.com
Cc: peterz@infradead.org
Cc: mikey@neuling.org
Link: http://lkml.kernel.org/r/20131030031252.23426.4417.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  rcu: Throttle rcu_try_advance_all_cbs() execution
Paul E. McKenney [Mon, 26 Aug 2013 04:20:47 +0000 (21:20 -0700)]
rcu: Throttle rcu_try_advance_all_cbs() execution

commit c229828ca6bc62d6c654f64b1d1b8a9ebd8a56f3 upstream.

The rcu_try_advance_all_cbs() function is invoked on each attempted
entry to and every exit from idle.  If this function determines that
there are callbacks ready to invoke, the caller will invoke the RCU
core, which in turn will result in a pair of context switches.  If a
CPU enters and exits idle extremely frequently, this can result in
an excessive number of context switches and high CPU overhead.

This commit therefore causes rcu_try_advance_all_cbs() to throttle
itself, refusing to do work more than once per jiffy.

Reported-by: Tibor Billes <tbilles@gmx.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tibor Billes <tbilles@gmx.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  rcu: Throttle invoke_rcu_core() invocations due to non-lazy callbacks
Paul E. McKenney [Fri, 6 Sep 2013 00:02:11 +0000 (17:02 -0700)]
rcu: Throttle invoke_rcu_core() invocations due to non-lazy callbacks

commit c337f8f58ed7cf150651d232af8222421a71463d upstream.

If a non-lazy callback arrives on a CPU that has previously gone idle
with no non-lazy callbacks, invoke_rcu_core() forces the RCU core to
run.  However, it does not update the conditions, which could result
in several closely spaced invocations of the RCU core, which in turn
could result in an excessively high context-switch rate and resulting
high overhead.

This commit therefore updates the ->all_lazy and ->nonlazy_posted_snap
fields to prevent closely spaced invocations.

Reported-by: Tibor Billes <tbilles@gmx.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tibor Billes <tbilles@gmx.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  powerpc: Fix fatal SLB miss when restoring PPR
Benjamin Herrenschmidt [Tue, 5 Nov 2013 05:33:22 +0000 (16:33 +1100)]
powerpc: Fix fatal SLB miss when restoring PPR

commit 0c4888ef1d8a8b82c29075ce7e257ff795af15c7 upstream.

When restoring the PPR value, we incorrectly access the thread structure
at a time when MSR:RI is clear, which means we cannot recover from nested
faults. However the thread structure isn't covered by the "bolted" SLB
entries and thus accessing it can fault.

This fixes it by splitting the code so that the PPR value is loaded into
a GPR before MSR:RI is cleared.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  sched: Fix asymmetric scheduling for POWER7
Vaidyanathan Srinivasan [Wed, 30 Oct 2013 03:12:42 +0000 (08:42 +0530)]
sched: Fix asymmetric scheduling for POWER7

commit 2042abe7977222ef606306faa2dce8fd51e98e65 upstream.

Asymmetric scheduling within a core is a scheduler loadbalancing
feature that is triggered when SD_ASYM_PACKING flag is set.  The goal
for the load balancer is to move tasks to lower order idle SMT threads
within a core on a POWER7 system.

In nohz_kick_needed(), we intend to check whether our sched domain
(core) is completely busy or has an idle cpu.

The following check for SD_ASYM_PACKING:

    (cpumask_first_and(nohz.idle_cpus_mask, sched_domain_span(sd)) < cpu)

already covers the case of checking if the domain has an idle cpu,
because cpumask_first_and() will not yield any set bits if this domain
has no idle cpu.

Hence, nr_busy check against group weight can be removed.

Reported-by: Michael Neuling <michael.neuling@au1.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: vincent.guittot@linaro.org
Cc: bitbucket@online.de
Cc: benh@kernel.crashing.org
Cc: anton@samba.org
Cc: Morten.Rasmussen@arm.com
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131030031242.23426.13019.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  PCI: Drop warning about drivers that don't use pci_set_master()
Bjorn Helgaas [Tue, 5 Nov 2013 20:34:51 +0000 (13:34 -0700)]
PCI: Drop warning about drivers that don't use pci_set_master()

commit fbeeb822f6f45cadf154d7b7cff1c13537cd799d upstream.

f41f064cf4 ("PCI: Workaround missing pci_set_master in pci drivers") made
pci_enable_bridge() turn on bus mastering if the driver hadn't done so
already.  It also added a warning in this case.  But there's no reason to
warn about it unless it's actually a problem to enable bus mastering here.

This patch drops the warning because I'm not aware of any such problem.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  NOHZ: Check for nohz active instead of nohz enabled
Thomas Gleixner [Wed, 13 Nov 2013 20:01:57 +0000 (21:01 +0100)]
NOHZ: Check for nohz active instead of nohz enabled

commit d689fe222a858c767cb8594faf280048e532b53f upstream.

RCU and the fine-grained idle time accounting functions check
tick_nohz_enabled. But that variable merely tells that NOHZ has
been enabled in the config and not been disabled on the command line.

It does not tell anything about nohz being active. That's what all
this should check for.

Matthew reported that the idle accounting on his old P1 machine
showed bogus values when he enabled NOHZ in the config and did not
disable it on the kernel command line. The reason is that his machine
uses (refined) jiffies as a clocksource, which explains why the "fine"
grained accounting went into lala land: it depends on when the
system enters and leaves idle relative to the jiffies increment.

Provide a tick_nohz_active indicator and let RCU and the accounting
code use this instead of tick_nohz_enabled.

Reported-and-tested-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: john.stultz@linaro.org
Cc: mwhitehe@redhat.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311132052240.30673@ionos.tec.linutronix.de
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off
Thomas Gleixner [Fri, 29 Nov 2013 11:18:13 +0000 (12:18 +0100)]
nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off

commit 0e576acbc1d9600cf2d9b4a141a2554639959d50 upstream.

If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.

If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
command line option "nohz=off" or not enabled due to missing hardware
support, then tick_nohz_get_sleep_length() returns 0. That happens
because ts->sleep_length is never set in that case.

Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive.

Reported-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  futex: move user address verification up to common code
Linus Torvalds [Thu, 12 Dec 2013 17:53:51 +0000 (09:53 -0800)]
futex: move user address verification up to common code

commit 5cdec2d833748fbd27d3682f7209225c504c79c5 upstream.

When debugging the read-only hugepage case, I was confused by the fact
that get_futex_key() did an access_ok() only for the non-shared futex
case, since the user address checking really isn't in any way specific
to the private key handling.

Now, it turns out that the shared key handling does effectively do the
equivalent checks inside get_user_pages_fast() (it doesn't actually
check the address range on x86, but does check the page protections for
being a user page).  So it wasn't actually a bug, but the fact that we
treat the address differently for private and shared futexes threw me
for a loop.

Just move the check up, so that it gets done for both cases.  Also, use
the 'rw' parameter for the type, even if it doesn't actually matter any
more (it's a historical artifact of the old racy i386 "page faults from
kernel space don't check write protections").

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  efifb: prevent null-deref when iterating dmi_list
James Bates [Wed, 2 Oct 2013 16:43:39 +0000 (18:43 +0200)]
efifb: prevent null-deref when iterating dmi_list

commit 55aa42f2e690157e254a6a6989f5f4ac928b35c8 upstream.

The dmi_list array is initialized using GNU designated initializers, and
therefore may contain fewer explicitly defined entries than there are
elements in it. This is because the enum above with the M_xyz constants
contains more items than the designated initializer. Those elements not
explicitly initialized are implicitly set to 0.

Now efifb_setup() loops through all these array elements, and performs
a strcmp on each item. For elements that were not explicitly initialized
this will be a null pointer.

This patch swaps the check order in the if statement, so it first checks
whether dmi_list[i].base is null.
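
In other words, the NULL check moves in front of the strcmp (shape only;
the 'this_opt' and 'name' identifiers are used here purely for
illustration):

  /* before: strcmp() may be handed a NULL string for the implicitly
   * zero-initialized entries */
  if (!strcmp(this_opt, dmi_list[i].name) && dmi_list[i].base != 0)
          ...

  /* after: the base check short-circuits the strcmp for empty entries */
  if (dmi_list[i].base != 0 && !strcmp(this_opt, dmi_list[i].name))
          ...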

Signed-off-by: James Bates <james.h.bates@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  blktrace: Send BLK_TN_PROCESS events to all running traces
Jan Kara [Tue, 17 Sep 2013 20:30:31 +0000 (22:30 +0200)]
blktrace: Send BLK_TN_PROCESS events to all running traces

commit a404d5576bbe586a1097a8bc2f32c5f22651b0aa upstream.

Currently each task sends BLK_TN_PROCESS event to the first traced
device it interacts with after a new trace is started. When there are
several traced devices and the task accesses more devices, this logic
can result in BLK_TN_PROCESS being sent several times to some devices
while it is never sent to other devices. Thus blkparse doesn't display
command name when parsing some blktrace files.

Fix the problem by sending BLK_TN_PROCESS event to all traced devices
when a task interacts with any of them.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  usb: ohci: use amd_chipset_type to filter for SB800 prefetch
Huang Rui [Thu, 3 Oct 2013 15:37:13 +0000 (23:37 +0800)]
usb: ohci: use amd_chipset_type to filter for SB800 prefetch

commit 02c123ee99c793f65af2dbda17d5fe87d448f808 upstream.

Commit "usb: pci-quirks: refactor AMD quirk to abstract AMD chipset types"
introduced a new AMD chipset type to filter AMD platforms with different
chipsets.

According to a recent thread [1], this patch updates the SB800 prefetch
routine in the AMD PLL quirk, and makes it use the new chipset type to
represent the SB800 generation.

[1] http://marc.info/?l=linux-usb&m=138012321616452&w=2

Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  usb: ehci: use amd_chipset_type to filter for usb subsystem hang bug
Huang Rui [Thu, 3 Oct 2013 15:37:12 +0000 (23:37 +0800)]
usb: ehci: use amd_chipset_type to filter for usb subsystem hang bug

commit 3ad145b62a15c86150dd0cc229a39a3120d462f9 upstream.

Commit "usb: pci-quirks: refactor AMD quirk to abstract AMD chipset types"
introduced a new AMD chipset type to filter AMD platforms with different
chipsets.

According to a recent thread [1], this patch updates the USB subsystem
hang quirk, which is observed on all AMD SB600 and SB700 revision
0x3a/0x3b chipsets, and makes it use the new chipset type to represent
them.

[1] http://marc.info/?l=linux-usb&m=138012321616452&w=2

Signed-off-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  usb: pci-quirks: refactor AMD quirk to abstract AMD chipset types
Huang Rui [Mon, 16 Sep 2013 15:47:27 +0000 (23:47 +0800)]
usb: pci-quirks: refactor AMD quirk to abstract AMD chipset types

commit 22b4f0cd1d4d98f50213e9a37ead654e80b54b9d upstream.

This patch abstracts out an AMD chipset type which includes the
southbridge generation and its revision. When the OS executes the
usb_amd_find_chipset_info routine to initialize the AMD chipset type,
the driver will know which kind of chipset is used.

This update has the following benefits:
- The driver can confirm which southbridge generation and revision is
  used with a single chipset detection.
- Describing chipset generations with enumeration types brings better
  readability.
- It's flexible to filter AMD platforms to implement new quirks in the
  future.
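
A rough sketch of what such an abstraction can look like (the names
below are illustrative and may not match the pci-quirks.c definitions
exactly):

  enum amd_chipset_gen {
          NOT_AMD_CHIPSET = 0,
          AMD_CHIPSET_SB600,
          AMD_CHIPSET_SB700,
          AMD_CHIPSET_SB800,
          AMD_CHIPSET_UNKNOWN,
  };

  struct amd_chipset_type {
          enum amd_chipset_gen gen;
          u8 rev;
  };

  /* A quirk can then filter on generation plus revision after a single
   * detection pass, e.g. (hypothetical helper): */
  if (type.gen == AMD_CHIPSET_SB700 && type.rev >= 0x3a && type.rev <= 0x3b)
          apply_hang_symptom_quirk();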

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Andiry Xu <andiry.xu@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ALSA: hda - Explicitly keep codec powered up in hdmi_present_sense
David Henningsson [Wed, 18 Dec 2013 09:46:04 +0000 (10:46 +0100)]
ALSA: hda - Explicitly keep codec powered up in hdmi_present_sense

commit da4a7a3926d09c13ae052ede67feb7285e01e3f5 upstream.

This should help us avoid the following mutex deadlock:

[] mutex_lock+0x2a/0x50
[] hdmi_present_sense+0x53/0x3a0 [snd_hda_codec_hdmi]
[] generic_hdmi_resume+0x5a/0x70 [snd_hda_codec_hdmi]
[] hda_call_codec_resume+0xec/0x1d0 [snd_hda_codec]
[] snd_hda_power_save+0x1e4/0x280 [snd_hda_codec]
[] codec_exec_verb+0x5f/0x290 [snd_hda_codec]
[] snd_hda_codec_read+0x5b/0x90 [snd_hda_codec]
[] snd_hdmi_get_eld_size+0x1e/0x20 [snd_hda_codec_hdmi]
[] snd_hdmi_get_eld+0x2c/0xd0 [snd_hda_codec_hdmi]
[] hdmi_present_sense+0x9a/0x3a0 [snd_hda_codec_hdmi]
[] hdmi_repoll_eld+0x34/0x50 [snd_hda_codec_hdmi]

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ALSA: hda - Delay HDMI presence reports while waiting for ELD information
Takashi Iwai [Thu, 7 Nov 2013 12:38:23 +0000 (13:38 +0100)]
ALSA: hda - Delay HDMI presence reports while waiting for ELD information

commit efe4710860fa6ed10dd041f13902f0e06c86e8cc upstream.

There is a small gap between the jack detection unsolicited event and
the time the ELD is updated.  When user-space queries the HDMI ELD
immediately after receiving the notification, it might fail because of
this gap.

For avoiding such a problem, this patch tries to delay the HDMI jack
detect notification until ELD information is fully updated.  The
workaround is imperfect, but good enough as a starting point.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ALSA: hda - Name Haswell HDMI controllers better
Takashi Iwai [Tue, 5 Nov 2013 16:54:05 +0000 (17:54 +0100)]
ALSA: hda - Name Haswell HDMI controllers better

commit fab1285a51b7bf55adb4678d82e606829c9dab85 upstream.

"HDA Intel MID" is no correct name for Haswell HDMI controllers.
Give them a better name, "HDA Intel HDMI".

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ALSA: hda: add device IDs for AMD Evergreen/Northern Islands HDMI
Clemens Ladisch [Tue, 5 Nov 2013 08:27:10 +0000 (09:27 +0100)]
ALSA: hda: add device IDs for AMD Evergreen/Northern Islands HDMI

commit bbaa0d6665bc14133d7eb573d2b5ff898a06f365 upstream.

The device IDs of the AMD Cypress/Juniper/Redwood/Cedar/Cayman/Antilles/
Barts/Turks/Caicos HDMI HDA controllers weren't added explicitly
because the generic entry works, but it made the device appearing as
"Generic", and people are confused as if it's no proper HDMI
controller.  Add them so that the name shows up properly as "ATI HDMI"
instead of "Generic".

According to Takashi's tests and the lack of complaints, these devices
work fine without disabling snooping.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ALSA: hda - Add Device IDs for Intel Wildcat Point-LP PCH
James Ralston [Mon, 4 Nov 2013 17:27:45 +0000 (09:27 -0800)]
ALSA: hda - Add Device IDs for Intel Wildcat Point-LP PCH

commit 4eeca499be4ff4216b745e35ae8c8bffa6445eac upstream.

This patch adds the HD Audio Device IDs for the Intel Wildcat Point-LP PCH.

Signed-off-by: James Ralston <james.d.ralston@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ALSA: hda - rename function not_share_unassigned_cvt()
Mengdong Lin [Mon, 4 Nov 2013 06:13:13 +0000 (01:13 -0500)]
ALSA: hda - rename function not_share_unassigned_cvt()

commit 300016b960661b4df63690177b22ba5426ff5706 upstream.

The function name not_share_unassigned_cvt() is the opposite of what it does.
This patch renames it to intel_not_share_assigned_cvt(), and adds comments
to explain why some Intel display codecs need this workaround.

Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years ago  ALSA: hda - not choose assigned converters for unused pins of Valleyview
Mengdong Lin [Thu, 31 Oct 2013 22:31:51 +0000 (18:31 -0400)]
ALSA: hda - not choose assigned converters for unused pins of Valleyview

commit 023838542dc8a4eac9650f98942671078a4ce73d upstream.

For the Valleyview display codec, if an unused pin chooses an assigned converter
selected by a used pin, playback on the unused pin can also give sound to the
output device of the used pin. This is because data flows from the same converter
to the display port of the used pin. This issue is the same as on Haswell.

So this patch avoids using assigned converters for unused pins.
The related function haswell_config_cvts() is renamed for code reuse.

Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - Fix typos in patch_hdmi.c
Takashi Iwai [Mon, 21 Oct 2013 14:31:45 +0000 (16:31 +0200)]
ALSA: hda - Fix typos in patch_hdmi.c

commit b55447a7301b12d509df4b2909ed38d125ad83d4 upstream.

... which was introduced by the previous commit a4e9a38b, causing
build errors without CONFIG_PROC_FS.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - add codec ID for Valleyview2 display codec
Mengdong Lin [Mon, 21 Oct 2013 03:03:31 +0000 (23:03 -0400)]
ALSA: hda - add codec ID for Valleyview2 display codec

commit cc1a95d9f6423ced191b6f264e9657d98844ea0d upstream.

This patch adds codec ID (0x80862882) and module alias for
Valleyview2 display codec.

Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - Move mutex from hda_eld to per_pin in HDMI codec driver
Takashi Iwai [Thu, 17 Oct 2013 16:21:12 +0000 (18:21 +0200)]
ALSA: hda - Move mutex from hda_eld to per_pin in HDMI codec driver

commit a4e9a38b40a0e2f7dad1a0b355896d23fbdd16e0 upstream.

Since the lock is used primarily in patch_hdmi.c, it's better to move
it into a local struct there instead of exposing it in hda_eld.  The only
functions requiring the lock in hda_eld.c are the proc accessors.  So in
this patch, the proc entry and its creation/deletion/accessors are
moved into patch_hdmi.c, together with the mutex lock, which goes into
the per-pin struct.

The former proc info functions are exported so that they can be called
from patch_hdmi.c.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - Fix possible races in HDMI driver
Takashi Iwai [Thu, 17 Oct 2013 16:03:24 +0000 (18:03 +0200)]
ALSA: hda - Fix possible races in HDMI driver

commit cbbaa603a03cc46681e24d6b2804b62fde95a2af upstream.

Some per_pin fields and ELD contents might be changed dynamically in
multiple ways, and concurrent accesses are still possible in the
current code.  This patch fixes such possible races by using eld->lock
in appropriate places.

Reported-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - hdmi: Tweak debug messages to be more useful
Anssi Hannula [Fri, 4 Oct 2013 23:25:44 +0000 (02:25 +0300)]
ALSA: hda - hdmi: Tweak debug messages to be more useful

commit 980b24958f0c615fd003d37f0fce4ab1ecd01784 upstream.

Allow channel map debugging for both automatic and manual channel maps,
and always print the CA when updating the infoframe.

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - hdmi: Fix available channel maps missing from TLV
Anssi Hannula [Fri, 4 Oct 2013 23:25:43 +0000 (02:25 +0300)]
ALSA: hda - hdmi: Fix available channel maps missing from TLV

commit bb731f2100e614a8d7c5965d3663aed893859733 upstream.

Currently the available channel maps TLV only contains channel maps that
are limited to the traditional 7.1 speakers.

Since the other HDMI channel mapping functions have been fixed to
properly handle all CEA-861-E specified speakers, allow them to be
listed.

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - hdmi: Fix channel maps with less common speakers
Anssi Hannula [Fri, 4 Oct 2013 23:25:42 +0000 (02:25 +0300)]
ALSA: hda - hdmi: Fix channel maps with less common speakers

commit a5b7d510b2220cccbcaeb1b87a6d8c47efeb154c upstream.

For some speakers and slots the CEA slot <-> speaker assignment depends
on the used CEA Channel Allocation value.

Therefore the from_cea_slot() and to_cea_slot() helpers currently only
work correctly for the regular 7.1 speakers.

Fix them to work with all speakers, taking the re-ordered CA index as
input and adapting use sites accordingly.

This change allows manual channel mapping to actually work for all CEA
allocated speakers. Additionally, this fixes incorrect channel map
reporting in automatic channel mapping mode when an affected speaker
position is used (e.g. 6.1 map which contains an RC speaker).

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - hdmi: Fix unused slots being enabled in manual and non-PCM mappings
Anssi Hannula [Fri, 4 Oct 2013 23:25:41 +0000 (02:25 +0300)]
ALSA: hda - hdmi: Fix unused slots being enabled in manual and non-PCM mappings

commit 11f7c52d90b21a51b0bc6a8b642c6ed150bdc219 upstream.

hdmi_manual_setup_channel_mapping() and hdmi_std_setup_channel_mapping()
try to assign ALSA channels to HDMI channel slots and disable (i.e.
silence) other slots.

However, they try to disable a slot by using AC_VERB_SET_CHAN_SLOT with
parameter ((alsa_ch << 8) | 0xf), while the correct parameter is
((0xf << 8) | hdmi_slot), i.e. the slot should be unassigned, not the
ALSA channel.

Fix that by actually disabling the unused slots.
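
A minimal sketch of the value encoding in question (only the parameter
layout is shown; the surrounding verb write is omitted):

  /* upper byte: ALSA channel, lower nibble: HDMI slot;
   * 0xf in the channel byte means "slot unassigned" */
  int wrong = (alsa_ch << 8) | 0xf;       /* assigns the ALSA channel to slot 0xf, silences nothing */
  int right = (0xf << 8) | hdmi_slot;     /* unassigns, i.e. silences, the HDMI slot */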

Note that this bug did not cause any (reported) issues because slots
incorrectly having audio are normally ignored by a receiver if the CEA
channel allocation used does not map that slot to any speaker.
Additionally, the converter channel count configuration limits the
number of actually active channels in any case.

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - hdmi: Fix programmed active channel count
Anssi Hannula [Fri, 4 Oct 2013 23:25:40 +0000 (02:25 +0300)]
ALSA: hda - hdmi: Fix programmed active channel count

commit 1df5a06abbaa876ecc01ea84064cdffb4f52a1a1 upstream.

Currently the converter channel count is set to the number of actual
input channels. The audio infoframe channel count field is set
similarly.

However, sometimes the used channel map does not map all input channels
to outputs. Notably, 3 channel modes (e.g. 2.1) require a dummy input
channel so there are 4 input channels. According to the HDA
specification, converter channel count should be programmed according to
the number of _active_ channels.

On Intel HDMI codecs (but not on NVIDIA), setting the converter channel
count to a higher value than the number of channels actually mapped to
HDMI slots will cause no audio to be output at all.

Note that the effects of this issue are currently partially masked by
other bugs that prevent the driver from actually unmapping channels in
certain cases. For example, if a 4 channel stream is first created and
prepared, it gets a FL,FR,RL,RR mapping (ALSA->HDMI slot mapping 0->0,
1->1, 2->4, 3->5). If one thereafter assigns a FR,FL,FC mapping to it,
the driver will remap 2->3 but fail to unmap 2->4 and 3->5, so there are
still 4 active channels and the issue will not trigger in this case.
These bugs will be fixed separately.

Fix the channel counts in the converter channel count field and in the
audio infoframe channel count field to match the actual number of active
channels.

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - hdmi: Fix incorrect default channel mapping for unusual CAs
Anssi Hannula [Fri, 4 Oct 2013 23:25:39 +0000 (02:25 +0300)]
ALSA: hda - hdmi: Fix incorrect default channel mapping for unusual CAs

commit 90f28002110d783f49639f0db2ccdc0b58302cbd upstream.

hdmi_std_setup_channel_mapping() selects a Channel Allocation according
to the sink reported speaker mask, preferring the ALSA standard layouts.

If the channel allocation is not one of the ALSA standard layouts, the
ALSA channels are mapped directly to HDMI channels in order. However,
the function does not take into account that there are holes in the HDMI
channel map.

Additionally, the function tries to disable a slot by using
AC_VERB_SET_CHAN_SLOT with parameter ((alsa_ch << 8) | 0xf), while the
correct parameter is ((0xf << 8) | hdmi_slot), i.e. the slot should be
unassigned, not the ALSA channel.

Fix both of the issues for non-ALSA-default layouts.

Tested on Intel HDMI with a speaker mask of FL | FR | FC | RC, which
causes CA 0x06 to be selected for 4-channel audio, which causes
incorrect output (sound destined to RC goes to FC and FC goes nowhere)
without the patch.

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agos390/appldata: restore missing init_virt_timer()
Gerald Schaefer [Tue, 28 Jan 2014 17:40:12 +0000 (18:40 +0100)]
s390/appldata: restore missing init_virt_timer()

commit b7c5b1aa2836c933ab03f90391619ebdc9112e46 upstream.

Commit 27f6b416 "s390/vtimer: rework virtual timer interface" removed
the call to init_virt_timer() by mistake; this patch adds it back.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agos390,time: revert direct ktime path for s390 clockevent device
Martin Schwidefsky [Fri, 22 Nov 2013 11:04:18 +0000 (12:04 +0100)]
s390,time: revert direct ktime path for s390 clockevent device

commit 8adbf78ec4839c1dc4ff20c9a1f332a7bc99e6e6 upstream.

Git commit 4f37a68cdaf6dea833cfdded2a3e0c47c0f006da
"s390: Use direct ktime path for s390 clockevent device" makes use
of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta
calculation with ktime_get() in clockevents_program_event and the
get_tod_clock() in s390_next_event. This is based on the assumption
that the difference between the internal ktime and the hardware
clock is reflected in the wall_to_monotonic delta. But this is not
true, the ntp corrections are applied via changes to the tk->mult
multiplier and this is not reflected in wall_to_monotonic.

In theory this could be solved by using the raw monotonic clock
but it is simpler to switch back to the standard clock delta
calculation.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agos390/time,vdso: convert to the new update_vsyscall interface
Martin Schwidefsky [Fri, 22 Nov 2013 09:04:53 +0000 (10:04 +0100)]
s390/time,vdso: convert to the new update_vsyscall interface

commit 79c74ecbebf76732f91b82a62ce7fc8a88326962 upstream.

Switch to the improved update_vsyscall interface that provides
sub-nanosecond precision for gettimeofday and clock_gettime.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agos390/3270: fix missing device_destroy() call
Hendrik Brueckner [Tue, 12 Nov 2013 14:03:37 +0000 (15:03 +0100)]
s390/3270: fix missing device_destroy() call

commit d1e61fe49fd450be15d402ac353784f5ba8a624e upstream.

Unloading the fs3270 kernel module does not remove the created
"3270/tub" device.  Reloading the module then causes a sysfs warning:
"sysfs: cannot create duplicate filename '/devices/virtual/3270/3270!tub'".

Call device_destroy() in the module exit function to solve this issue.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoACPI: update win8 OSI blacklist
Felipe Contreras [Thu, 3 Oct 2013 17:13:03 +0000 (12:13 -0500)]
ACPI: update win8 OSI blacklist

commit b4cb9244a544a1623305eb58267a90418268d31e upstream.

More people have reported they need this for their machines to work
correctly.

References: https://bugzilla.kernel.org/show_bug.cgi?id=60682
Reported-by: Stefan Hellermann <bugzilla.kernel.org@the2masters.de>
Reported-by: Benedikt Sauer <filmor@gmail.com>
Reported-by: Erno Kuusela <erno@iki.fi>
Reported-by: Jonathan Doman <jonathan.doman@gmail.com>
Reported-by: Christoph Klaffl <christophklaffl@gmail.com>
Reported-by: Jan Hendrik Nielsen <jan.hendrik.nielsen@informatik.hu-berlin.de>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoLinux 3.12.14
Jiri Slaby [Mon, 10 Mar 2014 10:14:40 +0000 (11:14 +0100)]
Linux 3.12.14

11 years agodrm/i915/dp: add native aux defer retry limit
Jani Nikula [Tue, 11 Feb 2014 09:52:05 +0000 (11:52 +0200)]
drm/i915/dp: add native aux defer retry limit

commit f51a44b9a6c4982cc25bfb3727de9bb893621ebc upstream.

Retrying indefinitely places too much trust in the aux implementation of
the sink devices.

Reported-by: Daniel Martin <consume.noise@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71267
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Sree Harsha Totakura <freedesktop@h.totakura.in>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodrm/i915/dp: increase native aux defer retry timeout
Jani Nikula [Tue, 11 Feb 2014 09:52:04 +0000 (11:52 +0200)]
drm/i915/dp: increase native aux defer retry timeout

commit 04eada25d1f72efdecd32d702706594f81de65d5 upstream.

Give more slack to sink devices before retrying on native aux
defer. AFAICT the 100 us timeout was not based on the DP spec.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org (on Jani's request)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodrm/radeon: free uvd ring on unload
Jerome Glisse [Thu, 27 Feb 2014 00:22:47 +0000 (19:22 -0500)]
drm/radeon: free uvd ring on unload

commit d965441342f3b7d63db784cad852328d17d47942 upstream.

Need to free the uvd ring. Also reshuffle gart tear down to
happen after uvd tear down.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodrm/radeon: disable pll sharing for DP on DCE4.1
Alex Deucher [Tue, 25 Feb 2014 15:21:43 +0000 (10:21 -0500)]
drm/radeon: disable pll sharing for DP on DCE4.1

commit 9ef4e1d000a5b335fcebfcf8aef3405e59574c89 upstream.

Causes display problems.  We had already disabled
sharing for non-DP displays.

Based on a patch from:
Niels Ole Salscheider <niels_ole@salscheider-online.de>

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=58121

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodrm/radeon: fix missing bo reservation
Christian König [Thu, 20 Feb 2014 17:47:14 +0000 (18:47 +0100)]
drm/radeon: fix missing bo reservation

commit 5e386b574cf7e1593e1296e5b0feea4108ed6ad8 upstream.

Otherwise we might get a crash here.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodrm/radeon: print the supported atpx function mask
Alex Deucher [Thu, 20 Feb 2014 14:16:01 +0000 (09:16 -0500)]
drm/radeon: print the supported atpx function mask

commit 9f050c7f9738ffa746c63415136645ad231b1348 upstream.

Print the supported functions mask in addition to
the version.  This is useful in debugging PX
problems since we can see what functions are available.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodrm/radeon: fix audio disable on dce6+
Alex Deucher [Tue, 18 Feb 2014 15:25:39 +0000 (10:25 -0500)]
drm/radeon: fix audio disable on dce6+

commit d7eb0a0940618f36e5937d81c06ad7bf438a99e2 upstream.

Properly clear the enable bit when audio disable is requested.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodm thin: fix the error path for the thin device constructor
Mike Snitzer [Thu, 20 Feb 2014 01:32:33 +0000 (20:32 -0500)]
dm thin: fix the error path for the thin device constructor

commit 1acacc0784aab45627b6009e0e9224886279ac0b upstream.

dm_pool_close_thin_device() must be called if dm_set_target_max_io_len()
fails in thin_ctr().  Otherwise __pool_destroy() will fail because the
pool will still have an open thin device:

 device-mapper: thin metadata: attempt to close pmd when 1 device(s) are still open
 device-mapper: thin: __pool_destroy: dm_pool_metadata_close() failed.

Also, an error code must be set when failing thin_ctr() because the pool
is in fail_io mode.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodm thin: avoid metadata commit if a pool's thin devices haven't changed
Mike Snitzer [Thu, 6 Feb 2014 11:08:56 +0000 (06:08 -0500)]
dm thin: avoid metadata commit if a pool's thin devices haven't changed

commit 4d1662a30dde6e545086fe0e8fd7e474c4e0b639 upstream.

Commit 905e51b ("dm thin: commit outstanding data every second")
introduced a periodic commit.  This commit occurs regardless of whether
any thin devices have made changes.

Fix the periodic commit to check if any of a pool's thin devices have
changed using dm_pool_changed_this_transaction().
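
A minimal sketch of the guarded periodic commit, assuming the pool's
metadata handle is reachable as pool->pmd (the time-based helper and the
commit call are named illustratively):

  /* skip the metadata commit when no thin device changed in this transaction */
  if (dm_pool_changed_this_transaction(pool->pmd) &&
      need_commit_due_to_time(pool))
          commit_pool(pool);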

Reported-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodm mpath: fix stalls when handling invalid ioctls
Hannes Reinecke [Wed, 26 Feb 2014 09:07:04 +0000 (10:07 +0100)]
dm mpath: fix stalls when handling invalid ioctls

commit a1989b330093578ea5470bea0a00f940c444c466 upstream.

An invalid ioctl will never be valid, irrespective of whether multipath
has active paths or not.  So for invalid ioctls we do not have to wait
for multipath to activate any paths, but can rather return an error
code immediately.  This fix resolves numerous instances of:

 udevd[]: worker [] unexpectedly returned with status 0x0100

that have been seen during testing.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agodma: ste_dma40: don't dereference free:d descriptor
Linus Walleij [Thu, 13 Feb 2014 09:39:01 +0000 (10:39 +0100)]
dma: ste_dma40: don't dereference free:d descriptor

commit e9baa9d9d520fb0e24cca671e430689de2d4a4b2 upstream.

It appears that in the DMA40 driver the DMA tasklet will very
often dereference memory for a descriptor just free:d from the
DMA40 slab. Nothing happens because no other part of the driver
has yet had a chance to claim this memory, but it's really
nasty to dereference free:d memory, so let's check the flag
before the descriptor is freed, and store it in a bool variable.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoPM / hibernate: Fix restore hang in freeze_processes()
Sebastian Capella [Wed, 19 Feb 2014 01:52:08 +0000 (17:52 -0800)]
PM / hibernate: Fix restore hang in freeze_processes()

commit f8d5b9e9e5372f0deb7bc1ab1088a9b60b0a793d upstream.

During restore, the pm_notifier chain is called with
PM_RESTORE_PREPARE.  The firmware_class driver handler
fw_pm_notify does not handle this event.  As a result,
it keeps a reader on the kmod.c umhelper_sem.  During
freeze_processes, the call to __usermodehelper_disable tries to
take a write lock on this semaphore and hangs waiting.

Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
Acked-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoi7300_edac: Fix device reference count
Jean Delvare [Tue, 25 Feb 2014 08:43:13 +0000 (09:43 +0100)]
i7300_edac: Fix device reference count

commit 75135da0d68419ef8a925f4c1d5f63d8046e314d upstream.

pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.
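
A minimal sketch of the corrected walk (the device ID, match test and
storage array are illustrative):

  struct pci_dev *pdev = NULL;

  /* never break out early: pci_get_device() itself drops the reference
   * on 'from', so let the walk run to completion */
  while ((pdev = pci_get_device(PCI_VENDOR_ID_INTEL, dev_id, pdev))) {
          if (is_wanted_device(pdev))
                  priv->dev[i++] = pci_dev_get(pdev); /* explicit reference we keep */
  }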

This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.

Tested on an Intel S7000FC4UR system with a 7300 chipset.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoqla2xxx: Fix kernel panic on selective retransmission request
Dr. Greg Wettstein [Mon, 24 Feb 2014 19:59:53 +0000 (13:59 -0600)]
qla2xxx: Fix kernel panic on selective retransmission request

commit 6f58c780e5a5b43a6d2121e0d43cdcba1d3cc5fc upstream.

A selective retransmission request (SRR) is a fibre-channel
protocol control request which provides support for requesting
retransmission of a data sequence in response to an issue such as
frame loss or corruption.  These events are experienced
infrequently in fibre-channel based networks which makes
it difficult to test and assess codepaths which handle these
events.

We were fortunate enough, for some definition of fortunate, to
have a metro-area single-mode SAN link which, at 10 Gbps
sustained load levels, would consistently generate SRRs in
an SCST based target implementation using our SCST/in-kernel
Qlogic target interface driver.  In response to an SRR the
in-kernel Qlogic target driver immediately panics resulting
in a catastrophic storage failure for serviced initiators.

The culprit was a debug statement in the qla_target.c file which
does not verify that a pointer to the SCSI CDB is not null.
The unchecked pointer dereference results in the kernel panic
and resultant system failure.

The other two references to the SCSI CDB by the SRR handling code
use a ternary operator to verify a non-null pointer is being
acted on.  This patch simply adds a similar test to the implicated
debug statement.

This patch is a candidate for any stable kernel being maintained
since it addresses a potentially catastrophic event with
minimal downside.

Signed-off-by: Dr. Greg Wettstein <greg@enjellic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoARM64: unwind: Fix PC calculation
Olof Johansson [Fri, 14 Feb 2014 19:35:15 +0000 (19:35 +0000)]
ARM64: unwind: Fix PC calculation

commit e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63 upstream.

The frame PC value in the unwind code used to just take the saved LR
value and use that.  That's incorrect as a stack trace, since it shows
the return path stack, not the call path stack.

In particular, it shows faulty information in case the bl is done as
the very last instruction of one label, since the return point will be
in the next label. That can easily be seen with tail calls to panic(),
which is marked __noreturn and thus doesn't have anything useful after it.

The easiest fix here is to correct the unwind code and subtract 4, to get
the actual call site for the backtrace instead of the return site.
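
A minimal sketch of the adjustment, with the saved link register value
named illustratively:

  /* AArch64 instructions are 4 bytes wide: the saved LR points at the
   * instruction after the bl, so step back one instruction to report
   * the call site itself */
  frame->pc = saved_lr - 4;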

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoirq-metag*: stop set_affinity vectoring to offline cpus
James Hogan [Tue, 25 Feb 2014 22:05:35 +0000 (22:05 +0000)]
irq-metag*: stop set_affinity vectoring to offline cpus

commit f229006ec6beabf7b844653d92fa61f025fe3dcf upstream.

Fix irq_set_affinity callbacks in the Meta IRQ chip drivers to AND
cpu_online_mask into the cpumask when picking a CPU to vector the
interrupt to.

As Thomas pointed out, the /proc/irq/$N/smp_affinity interface doesn't
filter out offline CPUs, so without this patch if you offline CPU0 and
set an IRQ affinity to 0x3 it vectors the interrupt onto CPU0 even
though it is offline.
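
A minimal sketch of the fix inside an irq_set_affinity callback (only the
mask handling is shown):

  unsigned int cpu = cpumask_any_and(cpumask, cpu_online_mask);

  if (cpu >= nr_cpu_ids)
          return -EINVAL;         /* no online CPU in the requested mask */
  /* ... vector the interrupt to 'cpu' ... */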

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agomm, thp: fix infinite loop on memcg OOM
Kirill A. Shutemov [Tue, 25 Feb 2014 23:01:42 +0000 (15:01 -0800)]
mm, thp: fix infinite loop on memcg OOM

commit 9845cbbd113fbb5b769a45d8e88dc47bc12df4e0 upstream.

Masayoshi Mizuma reported a bug where an application hangs under the
memcg limit.  It happens on a write-protection fault to the huge zero page.

If we successfully allocate a huge page to replace zero page but hit the
memcg limit we need to split the zero page with split_huge_page_pmd()
and fallback to small pages.

The other part of the problem is that VM_FAULT_OOM has special meaning
in do_huge_pmd_wp_page() context.  __handle_mm_fault() expects the page
to be split if it sees VM_FAULT_OOM, and it will retry page fault
handling.  This causes an infinite loop if the page was not split.

do_huge_pmd_wp_zero_page_fallback() can return VM_FAULT_OOM if it failed
to allocate one small page, so fallback to small pages will not help.

The solution for this part is to replace VM_FAULT_OOM with
VM_FAULT_FALLBACK when a fallback is required.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoInput - arizona-haptics: Fix double lock of dapm_mutex
Charles Keepax [Tue, 18 Feb 2014 15:22:12 +0000 (15:22 +0000)]
Input - arizona-haptics: Fix double lock of dapm_mutex

commit c4204960e9d0ba99459dbf1db918f99a45e7a62a upstream.

snd_soc_dapm_sync takes the dapm_mutex internally, but we currently take
it externally as well. This patch fixes this.

Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoipc,mqueue: remove limits for the amount of system-wide queues
Davidlohr Bueso [Tue, 25 Feb 2014 23:01:45 +0000 (15:01 -0800)]
ipc,mqueue: remove limits for the amount of system-wide queues

commit f3713fd9cff733d9df83116422d8e4af6e86b2bb upstream.

Commit 93e6f119c0ce ("ipc/mqueue: cleanup definition names and
locations") added global hardcoded limits to the amount of message
queues that can be created.  While these limits are per-namespace,
reality is that it ends up breaking userspace applications.
Historically users have, at least in theory, been able to create up to
INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
for some workloads and use cases.  For instance, Madars reports:

 "This update imposes bad limits on our multi-process application.  As
  our app uses approaches that each process opens its own set of queues
  (usually something about 3-5 queues per process).  In some scenarios
  we might run up to 3000 processes or more (which of-course for linux
  is not a problem).  Thus we might need up to 9000 queues or more.  All
  processes run under one user."

Other affected users can be found in launchpad bug #1155695:
  https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

Instead of increasing this limit, revert it entirely and fall back to the
original way of dealing with queue limits -- where once a user's resource
limit is reached, and all memory is used, new queues cannot be created.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Madars Vitolins <m@silodev.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoquota: Fix race between dqput() and dquot_scan_active()
Jan Kara [Thu, 20 Feb 2014 16:02:27 +0000 (17:02 +0100)]
quota: Fix race between dqput() and dquot_scan_active()

commit 1362f4ea20fa63688ba6026e586d9746ff13a846 upstream.

Currently last dqput() can race with dquot_scan_active() causing it to
call callback for an already deactivated dquot. The race is as follows:

CPU1                                    CPU2
  dqput()
    spin_lock(&dq_list_lock);
    if (atomic_read(&dquot->dq_count) > 1) {
     - not taken
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
      spin_unlock(&dq_list_lock);
      ->release_dquot(dquot);
        if (atomic_read(&dquot->dq_count) > 1)
         - not taken
                                        dquot_scan_active()
                                          spin_lock(&dq_list_lock);
                                          if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
                                           - not taken
                                          atomic_inc(&dquot->dq_count);
                                          spin_unlock(&dq_list_lock);
        - proceeds to release dquot
                                          ret = fn(dquot, priv);
                                           - called for inactive dquot

Fix the problem by making sure any possible ->release_dquot() call has
finished by the time we invoke the callback, and that new calls to it
will notice the reference dquot_scan_active() has taken and bail out.
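
A rough sketch of the fixed dquot_scan_active() step under those rules,
assuming the existing wait_on_dquot() helper in fs/quota/dquot.c (loop
bookkeeping omitted):

  atomic_inc(&dquot->dq_count);
  spin_unlock(&dq_list_lock);
  wait_on_dquot(dquot);                   /* let any in-flight ->release_dquot() finish */
  if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
          ret = fn(dquot, priv);          /* only call back for still-active dquots */
  spin_lock(&dq_list_lock);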

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoioat: fix tasklet tear down
Dan Williams [Thu, 20 Feb 2014 00:19:35 +0000 (16:19 -0800)]
ioat: fix tasklet tear down

commit da87ca4d4ca101f177fffd84f1f0a5e4c0343557 upstream.

Since commit 77873803363c "net_dma: mark broken" we no longer pin dma
engines active for the network-receive-offload use case.  As a result
the ->free_chan_resources() that occurs after the driver self test no
longer has a NET_DMA induced ->alloc_chan_resources() to back it up.  A
late firing irq can lead to ksoftirqd spinning indefinitely due to the
tasklet_disable() performed by ->free_chan_resources().  Only
->alloc_chan_resources() can clear this condition in affected kernels.

This problem has been present since commit 3e037454bcfa "I/OAT: Add
support for MSI and MSI-X" in 2.6.24, but is now exposed. Given the
NET_DMA use case is deprecated we can revisit moving the driver to use
threaded irqs.  For now, just tear down the irq and tasklet properly by:

1/ Disable the irq from triggering the tasklet

2/ Disable the irq from re-arming

3/ Flush inflight interrupts

4/ Flush the timer

5/ Flush inflight tasklets
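
A generic sketch of that teardown order using core kernel primitives (the
ioat driver itself pokes device registers for the irq steps; field names
are illustrative):

  disable_irq(chan->irq);                 /* stop new interrupts, wait for in-flight handlers */
  del_timer_sync(&chan->timer);           /* the timer may also schedule the tasklet */
  tasklet_kill(&chan->tasklet);           /* wait for any queued or running tasklet to finish */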

References:
https://lkml.org/lkml/2014/1/27/282
https://lkml.org/lkml/2014/2/19/672

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Mike Galbraith <bitbucket@online.de>
Reported-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Tested-by: Mike Galbraith <bitbucket@online.de>
Tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoSELinux: bigendian problems with filename trans rules
Eric Paris [Thu, 20 Feb 2014 15:56:45 +0000 (10:56 -0500)]
SELinux: bigendian problems with filename trans rules

commit 9085a6422900092886da8c404e1c5340c4ff1cbf upstream.

When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian.  On
x86_64 this works just fine, but it means that on big endian arches like
ppc64 and s390, userspace reads the policy and converts it with
le32_to_cpu, so the values are all screwed up.  Write the values in LE
format, as should have been done from the start.
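
A minimal sketch of the corrected write path, modeled on the policydb
write helpers (variable names are illustrative):

  __le32 buf[2];

  buf[0] = cpu_to_le32(stype);            /* type written as little endian */
  buf[1] = cpu_to_le32(tclass);           /* class written as little endian */
  rc = put_entry(buf, sizeof(u32), 2, fp);
  if (rc)
          return rc;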

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoxtensa: introduce spill_registers_kernel macro
Max Filippov [Wed, 22 Jan 2014 04:04:43 +0000 (08:04 +0400)]
xtensa: introduce spill_registers_kernel macro

commit e2fd1374c705abe4661df3fb6fadb3879c7c1846 upstream.

Most in-kernel users want registers spilled on the kernel stack and
don't require PS.EXCM to be set. That means that they don't need a fixup
routine and can reuse the regular window overflow mechanism for that,
which makes the spill routine very simple.

Suggested-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoxtensa: save current register frame in fast_syscall_spill_registers_fixup
Max Filippov [Wed, 30 Oct 2013 12:18:25 +0000 (16:18 +0400)]
xtensa: save current register frame in fast_syscall_spill_registers_fixup

commit 3251f1e27a5a17f0efd436cfd1e7b9896cfab0a0 upstream.

We need it saved because it contains a3 where we track which register
windows we still need to spill, and the fixup handler may call C exception
handlers. Also fix comments.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoirqchip: orion: Fix getting generic chip pointer.
Andrew Lunn [Thu, 6 Feb 2014 23:41:58 +0000 (00:41 +0100)]
irqchip: orion: Fix getting generic chip pointer.

commit d86e9af6336c0ad586a5dbd70064253d40bbb5ff upstream.

Enabling SPARSE_IRQ exposes a bug in the irq-orion bridge interrupt
handler. The bridge interrupt is implemented using a single generic
chip. Thus the parameter passed to irq_get_domain_generic_chip()
should always be zero.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Fixes: 9dbd90f17e4f ("irqchip: Add support for Marvell Orion SoCs")
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoirqchip: orion: clear stale interrupts in irq_startup
Sebastian Hesselbarth [Thu, 23 Jan 2014 23:10:32 +0000 (00:10 +0100)]
irqchip: orion: clear stale interrupts in irq_startup

commit e0318ec3bf3f1502cd11b21b1eb00aa355b40b67 upstream.

Bridge IRQ_CAUSE bits are asserted regardless of the corresponding bit in
the IRQ_MASK register. To avoid interrupt events from stale irqs, we have to
clear them before unmasking. This installs an .irq_startup callback to ensure
stale irqs are cleared before the initial unmask.
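
A minimal sketch of such an .irq_startup callback for a generic-chip based
controller, assuming the chip's own ack clears the cause bit:

  static unsigned int orion_bridge_irq_startup(struct irq_data *d)
  {
          struct irq_chip_type *ct = irq_data_get_chip_type(d);

          ct->chip.irq_ack(d);            /* clear any stale cause bit */
          ct->chip.irq_unmask(d);         /* then unmask the interrupt */
          return 0;
  }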

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoirqchip: orion: use handle_edge_irq on bridge irqs
Sebastian Hesselbarth [Thu, 23 Jan 2014 22:38:05 +0000 (23:38 +0100)]
irqchip: orion: use handle_edge_irq on bridge irqs

commit 5f40067fc86f0e49329ad4a852c278998ff4394e upstream.

Bridge irqs are edge-triggered, i.e. they get asserted on low-to-high
transitions and not on the level of the downstream interrupt line.
This replaces handle_level_irq with the more appropriate handle_edge_irq.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoirqchip: orion: clear bridge cause register on init
Sebastian Hesselbarth [Thu, 23 Jan 2014 22:38:04 +0000 (23:38 +0100)]
irqchip: orion: clear bridge cause register on init

commit 7b119fd1bdc59a8060df5b659b9f7a70e0169fd6 upstream.

It is good practice to mask and clear pending irqs on init. We already
mask all irqs, so also clear the bridge irq cause register.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoALSA: hda - Add a fixup for HP Folio 13 mute LED
Takashi Iwai [Mon, 24 Feb 2014 14:23:10 +0000 (15:23 +0100)]
ALSA: hda - Add a fixup for HP Folio 13 mute LED

commit 37c367ecdb9a01c9acc980e6e17913570a1788a7 upstream.

HP Folio 13 may have a broken BIOS that doesn't set up the mute LED
GPIO properly, and the driver guesses it wrongly, too.  Add a new
fixup entry for setting the GPIO pin statically for this laptop.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70991
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoperf: Fix hotplug splat
Peter Zijlstra [Mon, 24 Feb 2014 11:06:12 +0000 (12:06 +0100)]
perf: Fix hotplug splat

commit e3703f8cdfcf39c25c4338c3ad8e68891cca3731 upstream.

Drew Richardson reported that he could make the kernel go *boom* when hotplugging
while having perf events active.

It turned out that when you have a group event, the code in
__perf_event_exit_context() fails to remove the group siblings from
the context.

We then proceed with destroying and freeing the event, and when you
re-plug the CPU and try and add another event to that CPU, things go
*boom* because you've still got dead entries there.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoiommu/arm-smmu: set CBARn.BPSHCFG to NSH for s1-s2-bypass contexts
Will Deacon [Thu, 6 Feb 2014 14:59:05 +0000 (14:59 +0000)]
iommu/arm-smmu: set CBARn.BPSHCFG to NSH for s1-s2-bypass contexts

commit 57ca90f6800987ac274d7ba065ae6692cdf9bcd7 upstream.

Whilst trying to bring up an SMMUv2 implementation with the table
walker plumbed into a coherent interconnect, I noticed that the memory
transactions targeting the CPU caches from the SMMU were marked as
outer-shareable instead of inner-shareable.

After a bunch of digging, it seems that we actually need to program
CBARn.BPSHCFG for s1-s2-bypass contexts to act as non-shareable in order
for the shareability configured in the corresponding TTBCR not to be
overridden with an outer-shareable attribute.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoiommu/arm-smmu: really fix page table locking
Will Deacon [Tue, 4 Feb 2014 22:12:42 +0000 (22:12 +0000)]
iommu/arm-smmu: really fix page table locking

commit c9d09e2748eaa55cac2af274574baa6368189bc1 upstream.

Commit a44a9791e778 ("iommu/arm-smmu: use mutex instead of spinlock for
locking page tables") replaced the page table spinlock with a mutex, to
allow blocking allocations to satisfy lazy mapping requests.

Unfortunately, it turns out that IOMMU mappings are created from atomic
context (e.g. spinlock held during a dma_map), so this change doesn't
really help us in practice.

This patch is a partial revert of the offending commit, bringing back
the original spinlock but replacing our page table allocations for any
levels below the pgd (which is allocated during domain init) with
GFP_ATOMIC instead of GFP_KERNEL.
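
A minimal sketch of the non-sleeping allocation under the spinlock (how the
table page is obtained is illustrative):

  /* page-table allocations may now happen from atomic context, e.g. with a
   * spinlock held during dma_map, so they must not sleep */
  pte = (void *)__get_free_page(GFP_ATOMIC | __GFP_ZERO);
  if (!pte)
          return -ENOMEM;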

Reported-by: Andreas Herrmann <andreas.herrmann@calxeda.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoiommu/arm-smmu: fix pud/pmd entry fill sequence
Yifan Zhang [Fri, 3 Jan 2014 12:01:26 +0000 (12:01 +0000)]
iommu/arm-smmu: fix pud/pmd entry fill sequence

commit 97a644208d1a08b7104d1fe2ace8cef011222711 upstream.

The ARM SMMU driver's population of puds and pmds is broken, since we
iterate over the next level of table repeatedly setting the current
level descriptor to point at the pmd being initialised. This is clearly
wrong when dealing with multiple pmds/puds.

This patch fixes the problem by moving the pud/pmd population out of the
loop and instead performing it when we allocate the next level (like we
correctly do for ptes already). The starting address for the next level
is then calculated prior to entering the loop.

Signed-off-by: Yifan Zhang <zhangyf@marvell.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoiio:gyro: bug on L3GD20H gyroscope support
Denis CIOCCA [Fri, 14 Feb 2014 14:15:00 +0000 (14:15 +0000)]
iio:gyro: bug on L3GD20H gyroscope support

commit a0657716416f834ef7710a9044614d50a36c3bdc upstream.

The driver was not able to manage the sensor: during the probe function
and WAI check, the driver stops and writes: "device name and WhoAmI mismatch."
The correct value of the L3GD20H WAI is 0xd7 instead of 0xd4.
Drop support for the sensor.

Signed-off-by: Denis Ciocca <denis.ciocca@st.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agostaging: r8188eu: Add new device ID
Manu Gupta [Mon, 24 Feb 2014 18:12:28 +0000 (12:12 -0600)]
staging: r8188eu: Add new device ID

commit 260ea9c2e2d330303163e286ab01b66dbcfe3a6f upstream.

The D-Link DWA-123 REV D1 with USB ID 2001:3310 uses this driver.

Signed-off-by: Manu Gupta <manugupt1@gmail.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agostaging: binder: Fix death notifications
Arve Hjønnevåg [Mon, 17 Feb 2014 21:58:29 +0000 (13:58 -0800)]
staging: binder: Fix death notifications

commit e194fd8a5d8e0a7eeed239a8534460724b62fe2d upstream.

The change (008fa749e0fe5b2fffd20b7fe4891bb80d072c6a) that moved the
node release code to a separate function broke death notifications in
some cases. When it encountered a reference without a death
notification request, it would skip looking at the remaining
references, and therefore fail to send death notifications for them.

Cc: Colin Cross <ccross@android.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoregulator: da9063: Bug fix when setting max voltage on LDOs 5-11
Steve Twiss [Wed, 12 Feb 2014 09:57:52 +0000 (09:57 +0000)]
regulator: da9063: Bug fix when setting max voltage on LDOs 5-11

commit ebf6dad0de89677aa58a4d8b009014ff88a23452 upstream.

Bug fix to allow the setting of maximum voltage for certain LDOs.

What the bug is:

There is a problem caused by an invalid calculation of n_voltages
in the driver. This n_voltages value has the potential to be
different for each regulator.

The value for linear_min_sel is set as DA9063_V##regl_name#
which can be different depending upon the regulator. This is
chosen according to the following definitions in the DA9063
registers.h file:

DA9063_VLDO1_BIAS 0
DA9063_VLDO2_BIAS 0
DA9063_VLDO3_BIAS 0
DA9063_VLDO4_BIAS 0
DA9063_VLDO5_BIAS 2
DA9063_VLDO6_BIAS 2
DA9063_VLDO7_BIAS 2
DA9063_VLDO8_BIAS 2
DA9063_VLDO9_BIAS 3
DA9063_VLDO10_BIAS 2
DA9063_VLDO11_BIAS 2

The calculation for n_voltages is valid for LDOs whose BIAS value
is zero but this is not correct for those LDOs which have a
non-zero value.

What the fix is:

In order to account for the non-zero linear_min_sel value which
is set for the regulators LDO5, LDO6, LDO7, LDO8, LDO9, LDO10 and
LDO11, the calculation for n_voltages must include the
missing term defined by DA9063_V##regl_name#.

This will in turn allow the core constraints calculation to set the
maximum voltage limits correctly and therefore allow users to apply
the maximum expected voltage to all of the LDOs.
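
A rough sketch of the corrected relationship in regulator-core terms (the
names stand in for the per-LDO minimum/maximum/step values and the *_BIAS
constant, and are illustrative):

  /* linear_min_sel (the *_BIAS value) shifts the selector range upward,
   * so the selector count must include those skipped low selectors */
  n_voltages = linear_min_sel + (max_uV - min_uV) / step_uV + 1;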

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
11 years agoworkqueue: ensure @task is valid across kthread_stop()
Lai Jiangshan [Sat, 15 Feb 2014 14:02:28 +0000 (22:02 +0800)]
workqueue: ensure @task is valid across kthread_stop()

commit 5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb upstream.

When a kworker should die, the kworker is notified through the WORKER_DIE
flag instead of kthread_should_stop().  This, IIRC, is primarily to
keep the test synchronized inside the worker_pool lock.  WORKER_DIE is
first set while holding pool->lock, the lock is dropped and
kthread_stop() is called.

Unfortunately, this means that there's a slight chance that the target
kworker may see WORKER_DIE before kthread_stop() finishes and exits
and frees the target task before or during kthread_stop().

Fix it by pinning the target task before setting WORKER_DIE and
putting it after kthread_stop() is done.
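
A minimal sketch of the pin-across-stop pattern described above (locking
reduced to the essentials):

  struct task_struct *task = worker->task;

  get_task_struct(task);                  /* pin before WORKER_DIE becomes visible */

  spin_lock_irq(&pool->lock);
  worker->flags |= WORKER_DIE;
  spin_unlock_irq(&pool->lock);

  kthread_stop(task);
  put_task_struct(task);                  /* drop the pin only after kthread_stop() returns */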

tj: Improved patch description and comment.  Moved pinning above
    WORKER_DIE to better signify what it's protecting.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>