Heiko Carstens [Tue, 28 Jan 2014 01:07:19 +0000 (17:07 -0800)]
compat: fix sys_fanotify_mark
commit
592f6b842f64e416c7598a1b97c649b34241e22d upstream.
Commit
91c2e0bcae72 ("unify compat fanotify_mark(2), switch to
COMPAT_SYSCALL_DEFINE") added a new unified compat fanotify_mark syscall
to be used by all architectures.
Unfortunately the unified version merges the split mask parameter in a
wrong way: the lower and higher word got swapped.
This was discovered with glibc's tst-fanotify test case.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Brown [Mon, 27 Jan 2014 00:32:14 +0000 (00:32 +0000)]
ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to regulator API
commit
49a12877d2777cadcb838981c3c4f5a424aef310 upstream.
There is currently no facility in ACPI to express the hookup of voltage
regulators, the expectation is that the regulators that exist in the
system will be handled transparently by firmware if they need software
control at all. This means that if for some reason the regulator API is
enabled on such a system it should assume that any supplies that devices
need are provided by the system at all relevant times without any software
intervention.
Tell the regulator core to make this assumption by calling
regulator_has_full_constraints(). Do this as soon as we know we are using
ACPI so that the information is available to the regulator core as early
as possible. This will cause the regulator core to pretend that there is
an always on regulator supplying any supply that is requested but that has
not otherwise been mapped which is the behaviour expected on a system with
ACPI.
Should the ability to specify regulators be added in future revisions of
ACPI then once we have support for ACPI mappings in the kernel the same
assumptions will apply. It is also likely that systems will default to a
mode of operation which does not require any interpretation of these
mappings in order to be compatible with existing operating system releases
so it should remain safe to make these assumptions even if the mappings
exist but are not supported by the kernel.
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Josh Triplett [Wed, 21 Aug 2013 00:20:14 +0000 (17:20 -0700)]
turbostat: Use GCC's CPUID functions to support PIC
commit
2b92865e648ce04a39fda4f903784a5d01ecb0dc upstream.
turbostat uses inline assembly to call cpuid. On 32-bit x86, on systems
that have certain security features enabled by default that make -fPIC
the default, this causes a build error:
turbostat.c: In function ‘check_cpuid’:
turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’
asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
^
GCC provides a header cpuid.h, containing a __get_cpuid function that
works with both PIC and non-PIC. (On PIC, it saves and restores ebx
around the cpuid instruction.) Use that instead.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Josh Triplett [Wed, 21 Aug 2013 00:20:12 +0000 (17:20 -0700)]
turbostat: Don't put unprocessed uapi headers in the include path
commit
b731f3119de57144e16c19fd593b8daeb637843e upstream.
turbostat's Makefile puts arch/x86/include/uapi/ in the include path, so
that it can include <asm/msr.h> from it. It isn't in general safe to
include even uapi headers directly from the kernel tree without
processing them through scripts/headers_install.sh, but asm/msr.h
happens to work.
However, that include path can break with some versions of system
headers, by overriding some system headers with the unprocessed versions
directly from the kernel source. For instance:
In file included from /build/x86-generic/usr/include/bits/sigcontext.h:28:0,
from /build/x86-generic/usr/include/signal.h:339,
from /build/x86-generic/usr/include/sys/wait.h:31,
from turbostat.c:27:
../../../../arch/x86/include/uapi/asm/sigcontext.h:4:28: fatal error: linux/compiler.h: No such file or directory
This occurs because the system bits/sigcontext.h on that build system
includes <asm/sigcontext.h>, and asm/sigcontext.h in the kernel source
includes <linux/compiler.h>, which scripts/headers_install.sh would have
filtered out.
Since turbostat really only wants a single header, just include that one
header rather than putting an entire directory of kernel headers on the
include path.
In the process, switch from msr.h to msr-index.h, since turbostat just
wants the MSR numbers.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Li Zefan [Tue, 10 Sep 2013 03:43:37 +0000 (11:43 +0800)]
slub: Fix calculation of cpu slabs
commit
8afb1474db4701d1ab80cd8251137a3260e6913e upstream.
/sys/kernel/slab/:t-0000048 # cat cpu_slabs
231 N0=16 N1=215
/sys/kernel/slab/:t-0000048 # cat slabs
145 N0=36 N1=109
See, the number of slabs is smaller than that of cpu slabs.
The bug was introduced by commit
49e2258586b423684f03c278149ab46d8f8b6700
("slub: per cpu cache for partial pages").
We should use page->pages instead of page->pobjects when calculating
the number of cpu partial slabs. This also fixes the mapping of slabs
and nodes.
As there's no variable storing the number of total/active objects in
cpu partial slabs, and we don't have user interfaces requiring those
statistics, I just add WARN_ON for those cases.
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gregory CLEMENT [Mon, 20 Jan 2014 14:59:50 +0000 (15:59 +0100)]
ARM: mvebu: Fix kernel hang in mvebu_soc_id_init() when of_iomap failed
commit
dc4910d9e93f8cc56b190dd8fc9e789135978216 upstream.
When pci_base is accessed whereas it has not been properly mapped by
of_iomap() the kernel hang. The check of this pointer made an improper
use of IS_ERR() instead of comparing to NULL. This patch fix this
issue.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reported-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Fixes:
930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support)
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sebastian Hesselbarth [Thu, 16 Jan 2014 08:10:31 +0000 (09:10 +0100)]
ARM: orion: provide C-style interrupt handler for MULTI_IRQ_HANDLER
commit
f28d7de6bd4d41774744e011141945affa127da4 upstream.
DT-enabled Marvell Kirkwood and Dove SoCs make use of an irqchip
driver. As expected for irqchip drivers, it uses a C-style
interrupt handler and therefore selects MULTI_IRQ_HANDLER.
Now, compiling a kernel with both non-DT and DT support enabled,
selecting MULTI_IRQ_HANDLER will break ASM irq handler used by
non-DT boards.
Therefore, we provide a C-style irq handler even for non-DT boards,
if MULTI_IRQ_HANDLER is set. By installing the C-style irq handler
in orion_irq_init this is transparent to all non-DT board files.
While the regression report was filed on Marvell Kirkwood, also
Marvell Dove non-DT boards are affected and fixed by this patch.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Reported-by: Ian Campbell <ijc@hellion.org.uk>
Fixes:
2326f04321a9 ("ARM: kirkwood: convert to DT irqchip and clocksource")
Fixes:
f07d73e33d0e ("ARM: dove: convert to DT irqchip and clocksource")
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wolfram Sang [Tue, 26 Nov 2013 01:16:25 +0000 (02:16 +0100)]
mmc: core: sd: implement proper support for sd3.0 au sizes
commit
9288cac05405a7da406097a44721aa4004609b4d upstream.
This reverts and updates commit
77776fd0a4cc541b9 ("mmc: sd: fix the
maximum au_size for SD3.0"). The au_size for SD3.0 cannot be achieved
by a simple bit shift, so this needs to be implemented differently.
Also, don't print the warning in case of 0 since 'not defined' is
different from 'invalid'.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Chris Ball <chris@printf.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ludovic Desroches [Wed, 20 Nov 2013 15:01:11 +0000 (16:01 +0100)]
mmc: atmel-mci: fix timeout errors in SDIO mode when using DMA
commit
66b512eda74d59b17eac04c4da1b38d82059e6c9 upstream.
With some SDIO devices, timeout errors can happen when reading data.
To solve this issue, the DMA transfer has to be activated before sending
the command to the device. This order is incorrect in PDC mode. So we
have to take care if we are using DMA or PDC to know when to send the
MMC command.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ray Jui [Sat, 26 Oct 2013 18:03:44 +0000 (11:03 -0700)]
mmc: fix host release issue after discard operation
commit
f662ae48ae67dfd42739e65750274fe8de46240a upstream.
Under function mmc_blk_issue_rq, after an MMC discard operation,
the MMC request data structure may be freed in memory. Later in
the same function, the check of req->cmd_flags & MMC_REQ_SPECIAL_MASK
is dangerous and invalid. It causes the MMC host not to be released
when it should.
This patch fixes the issue by marking the special request down before
the discard/flush operation.
Reported by: Harold (SoonYeal) Yang <haroldsy@broadcom.com>
Signed-off-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Vagin [Thu, 30 Jan 2014 23:46:10 +0000 (15:46 -0800)]
mm: don't lose the SOFT_DIRTY flag on mprotect
commit
24f91eba18bbfdb27e71a1aae5b3a61b67fcd091 upstream.
The SOFT_DIRTY bit shows that the content of memory was changed after a
defined point in the past. mprotect() doesn't change the content of
memory, so it must not change the SOFT_DIRTY bit.
This bug causes a malfunction: on the first iteration all pages are
dumped. On other iterations only pages with the SOFT_DIRTY bit are
dumped. So if the SOFT_DIRTY bit is cleared from a page by mistake, the
page is not dumped and its content will be restored incorrectly.
This patch does nothing with _PAGE_SWP_SOFT_DIRTY, becase pte_modify()
is called only for present pages.
Fixes commit
0f8975ec4db2 ("mm: soft-dirty bits for user memory changes
tracking").
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cyrill Gorcunov [Thu, 23 Jan 2014 23:53:42 +0000 (15:53 -0800)]
mm: ignore VM_SOFTDIRTY on VMA merging
commit
34228d473efe764d4db7c0536375f0c993e6e06a upstream.
The VM_SOFTDIRTY bit affects vma merge routine: if two VMAs has all bits
in vm_flags matched except dirty bit the kernel can't longer merge them
and this forces the kernel to generate new VMAs instead.
It finally may lead to the situation when userspace application reaches
vm.max_map_count limit and get crashed in worse case
| (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes
|
| (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load: gimp_wire_read(): error
| xinit: connection to X server lost
|
| waiting for X server to shut down
| /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup
| /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
| /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
https://bugzilla.kernel.org/show_bug.cgi?id=67651
https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0
Initial problem came from missed VM_SOFTDIRTY in do_brk() routine but
even if we would set up VM_SOFTDIRTY here, there is still a way to
prevent VMAs from merging: one can call
| echo 4 > /proc/$PID/clear_refs
and clear all VM_SOFTDIRTY over all VMAs presented in memory map, then
new do_brk() will try to extend old VMA and finds that dirty bit doesn't
match thus new VMA will be generated.
As discussed with Pavel, the right approach should be to ignore
VM_SOFTDIRTY bit when we're trying to merge VMAs and if merge successed
we mark extended VMA with dirty bit where needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Bastian Hougaard <gnome@rvzt.net>
Reported-by: Mel Gorman <mgorman@suse.de>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Hocko [Thu, 23 Jan 2014 23:53:37 +0000 (15:53 -0800)]
memcg: fix css reference leak and endless loop in mem_cgroup_iter
commit
0eef615665ede1e0d603ea9ecca88c1da6f02234 upstream.
Commit
19f39402864e ("memcg: simplify mem_cgroup_iter") has reorganized
mem_cgroup_iter code in order to simplify it. A part of that change was
dropping an optimization which didn't call css_tryget on the root of the
walked tree. The patch however didn't change the css_put part in
mem_cgroup_iter which excludes root.
This wasn't an issue at the time because __mem_cgroup_iter_next bailed
out for root early without taking a reference as cgroup iterators
(css_next_descendant_pre) didn't visit root themselves.
Nevertheless cgroup iterators have been reworked to visit root by commit
bd8815a6d802 ("cgroup: make css_for_each_descendant() and friends
include the origin css in the iteration") when the root bypass have been
dropped in __mem_cgroup_iter_next. This means that css_put is not
called for root and so css along with mem_cgroup and other cgroup
internal object tied by css lifetime are never freed.
Fix the issue by reintroducing root check in __mem_cgroup_iter_next and
do not take css reference for it.
This reference counting magic protects us also from another issue, an
endless loop reported by Hugh Dickins when reclaim races with root
removal and css_tryget called by iterator internally would fail. There
would be no other nodes to visit so __mem_cgroup_iter_next would return
NULL and mem_cgroup_iter would interpret it as "start looping from root
again" and so mem_cgroup_iter would loop forever internally.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Hocko [Thu, 23 Jan 2014 23:53:35 +0000 (15:53 -0800)]
memcg: fix endless loop caused by mem_cgroup_iter
commit
ecc736fc3c71c411a9d201d8588c9e7e049e5d8c upstream.
Hugh has reported an endless loop when the hardlimit reclaim sees the
same group all the time. This might happen when the reclaim races with
the memcg removal.
shrink_zone
[rmdir root]
mem_cgroup_iter(root, NULL, reclaim)
// prev = NULL
rcu_read_lock()
mem_cgroup_iter_load
last_visited = iter->last_visited // gets root || NULL
css_tryget(last_visited) // failed
last_visited = NULL [1]
memcg = root = __mem_cgroup_iter_next(root, NULL)
mem_cgroup_iter_update
iter->last_visited = root;
reclaim->generation = iter->generation
mem_cgroup_iter(root, root, reclaim)
// prev = root
rcu_read_lock
mem_cgroup_iter_load
last_visited = iter->last_visited // gets root
css_tryget(last_visited) // failed
[1]
The issue seemed to be introduced by commit
5f5781619718 ("memcg: relax
memcg iter caching") which has replaced unconditional css_get/css_put by
css_tryget/css_put for the cached iterator.
This patch fixes the issue by skipping css_tryget on the root of the
tree walk in mem_cgroup_iter_load and symmetrically doesn't release it
in mem_cgroup_iter_update.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Wed, 29 Jan 2014 22:05:41 +0000 (14:05 -0800)]
mm/page-writeback.c: do not count anon pages as dirtyable memory
commit
a1c3bfb2f67ef766de03f1f56bdfff9c8595ab14 upstream.
The VM is currently heavily tuned to avoid swapping. Whether that is
good or bad is a separate discussion, but as long as the VM won't swap
to make room for dirty cache, we can not consider anonymous pages when
calculating the amount of dirtyable memory, the baseline to which
dirty_background_ratio and dirty_ratio are applied.
A simple workload that occupies a significant size (40+%, depending on
memory layout, storage speeds etc.) of memory with anon/tmpfs pages and
uses the remainder for a streaming writer demonstrates this problem. In
that case, the actual cache pages are a small fraction of what is
considered dirtyable overall, which results in an relatively large
portion of the cache pages to be dirtied. As kswapd starts rotating
these, random tasks enter direct reclaim and stall on IO.
Only consider free pages and file pages dirtyable.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Wed, 29 Jan 2014 22:05:39 +0000 (14:05 -0800)]
mm/page-writeback.c: fix dirty_balance_reserve subtraction from dirtyable memory
commit
a804552b9a15c931cfc2a92a2e0aed1add8b580a upstream.
Tejun reported stuttering and latency spikes on a system where random
tasks would enter direct reclaim and get stuck on dirty pages. Around
50% of memory was occupied by tmpfs backed by an SSD, and another disk
(rotating) was reading and writing at max speed to shrink a partition.
: The problem was pretty ridiculous. It's a 8gig machine w/ one ssd and 10k
: rpm harddrive and I could reliably reproduce constant stuttering every
: several seconds for as long as buffered IO was going on on the hard drive
: either with tmpfs occupying somewhere above 4gig or a test program which
: allocates about the same amount of anon memory. Although swap usage was
: zero, turning off swap also made the problem go away too.
:
: The trigger conditions seem quite plausible - high anon memory usage w/
: heavy buffered IO and swap configured - and it's highly likely that this
: is happening in the wild too. (this can happen with copying large files
: to usb sticks too, right?)
This patch (of 2):
The dirty_balance_reserve is an approximation of the fraction of free
pages that the page allocator does not make available for page cache
allocations. As a result, it has to be taken into account when
calculating the amount of "dirtyable memory", the baseline to which
dirty_background_ratio and dirty_ratio are applied.
However, currently the reserve is subtracted from the sum of free and
reclaimable pages, which is non-sensical and leads to erroneous results
when the system is dominated by unreclaimable pages and the
dirty_balance_reserve is bigger than free+reclaimable. In that case, at
least the already allocated cache should be considered dirtyable.
Fix the calculation by subtracting the reserve from the amount of free
pages, then adding the reclaimable pages on top.
[akpm@linux-foundation.org: fix CONFIG_HIGHMEM build]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hugh Dickins [Thu, 23 Jan 2014 23:53:32 +0000 (15:53 -0800)]
mm/memcg: iteration skip memcgs not yet fully initialized
commit
d8ad30559715ce97afb7d1a93a12fd90e8fff312 upstream.
It is surprising that the mem_cgroup iterator can return memcgs which
have not yet been fully initialized. By accident (or trial and error?)
this appears not to present an actual problem; but it may be better to
prevent such surprises, by skipping memcgs not yet online.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Naoya Horiguchi [Thu, 23 Jan 2014 23:53:14 +0000 (15:53 -0800)]
mm/memory-failure.c: shift page lock from head page to tail page after thp split
commit
54b9dd14d09f24927285359a227aa363ce46089e upstream.
After thp split in hwpoison_user_mappings(), we hold page lock on the
raw error page only between try_to_unmap, hence we are in danger of race
condition.
I found in the RHEL7 MCE-relay testing that we have "bad page" error
when a memory error happens on a thp tail page used by qemu-kvm:
Triggering MCE exception on CPU 10
mce: [Hardware Error]: Machine check events logged
MCE exception done on CPU 10
MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption
MCE 0x38c535: dirty LRU page recovery: Recovered
qemu-kvm[8418]: segfault at 20 ip
00007ffb0f0f229a sp
00007fffd6bc5240 error 4 in qemu-kvm[
7ffb0ef14000+420000]
BUG: Bad page state in process qemu-kvm pfn:38c400
page:
ffffea000e310000 count:0 mapcount:0 mapping: (null) index:0x7ffae3c00
page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked)
Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ...
CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G M -------------- 3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1
Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011
Call Trace:
dump_stack+0x19/0x1b
bad_page.part.59+0xcf/0xe8
free_pages_prepare+0x148/0x160
free_hot_cold_page+0x31/0x140
free_hot_cold_page_list+0x46/0xa0
release_pages+0x1c1/0x200
free_pages_and_swap_cache+0xad/0xd0
tlb_flush_mmu.part.46+0x4c/0x90
tlb_finish_mmu+0x55/0x60
exit_mmap+0xcb/0x170
mmput+0x67/0xf0
vhost_dev_cleanup+0x231/0x260 [vhost_net]
vhost_net_release+0x3f/0x90 [vhost_net]
__fput+0xe9/0x270
____fput+0xe/0x10
task_work_run+0xc4/0xe0
do_exit+0x2bb/0xa40
do_group_exit+0x3f/0xa0
get_signal_to_deliver+0x1d0/0x6e0
do_signal+0x48/0x5e0
do_notify_resume+0x71/0xc0
retint_signal+0x48/0x8c
The reason of this bug is that a page fault happens before unlocking the
head page at the end of memory_failure(). This strange page fault is
trying to access to address 0x20 and I'm not sure why qemu-kvm does
this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the
way we catch the bad page bug/warning because we try to free a locked
page (which was the former head page.)
To fix this, this patch suggests to shift page lock from head page to
tail page just after thp split. SIGSEGV still happens, but it affects
only error affected VMs, not a whole system.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Konrad Rzeszutek Wilk [Tue, 26 Nov 2013 20:05:40 +0000 (15:05 -0500)]
xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).
commit
51c71a3bbaca868043cc45b3ad3786dd48a90235 upstream.
The user has the option of disabling the platform driver:
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
which is used to unplug the emulated drivers (IDE, Realtek 8169, etc)
and allow the PV drivers to take over. If the user wishes
to disable that they can set:
xen_platform_pci=0
(in the guest config file)
or
xen_emul_unplug=never
(on the Linux command line)
except it does not work properly. The PV drivers still try to
load and since the Xen platform driver is not run - and it
has not initialized the grant tables, most of the PV drivers
stumble upon:
input: Xen Virtual Keyboard as /devices/virtual/input/input5
input: Xen Virtual Pointer as /devices/virtual/input/input6M
------------[ cut here ]------------
kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206!
invalid opcode: 0000 [#1] SMP
Modules linked in: xen_kbdfront(+) xenfs xen_privcmd
CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1
Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013
RIP: 0010:[<
ffffffff813ddc40>] [<
ffffffff813ddc40>] get_free_entries+0x2e0/0x300
Call Trace:
[<
ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240
[<
ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70
[<
ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront]
[<
ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront]
[<
ffffffff813e5757>] xenbus_dev_probe+0x77/0x130
[<
ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50
[<
ffffffff8145e9a9>] driver_probe_device+0x89/0x230
[<
ffffffff8145ebeb>] __driver_attach+0x9b/0xa0
[<
ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
[<
ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
[<
ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0
[<
ffffffff8145e7d9>] driver_attach+0x19/0x20
[<
ffffffff8145e260>] bus_add_driver+0x1a0/0x220
[<
ffffffff8145f1ff>] driver_register+0x5f/0xf0
[<
ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20
[<
ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40
[<
ffffffffa0015000>] ? 0xffffffffa0014fff
[<
ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront]
[<
ffffffff81002049>] do_one_initcall+0x49/0x170
.. snip..
which is hardly nice. This patch fixes this by having each
PV driver check for:
- if running in PV, then it is fine to execute (as that is their
native environment).
- if running in HVM, check if user wanted 'xen_emul_unplug=never',
in which case bail out and don't load any PV drivers.
- if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
does not exist, then bail out and not load PV drivers.
- (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
then bail out for all PV devices _except_ the block one.
Ditto for the network one ('nics').
- (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
then load block PV driver, and also setup the legacy IDE paths.
In (v3) make it actually load PV drivers.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it
Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Add extra logic to handle the myrid ways 'xen_emul_unplug'
can be used per Ian and Stefano suggestion]
[v3: Make the unnecessary case work properly]
[v4: s/disks/ide-disks/ spotted by Fabio]
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AKASHI Takahiro [Mon, 13 Jan 2014 21:33:09 +0000 (13:33 -0800)]
audit: correct a type mismatch in audit_syscall_exit()
commit
06bdadd7634551cfe8ce071fe44d0311b3033d9e upstream.
audit_syscall_exit() saves a result of regs_return_value() in intermediate
"int" variable and passes it to __audit_syscall_exit(), which expects its
second argument as a "long" value. This will result in truncating the
value returned by a system call and making a wrong audit record.
I don't know why gcc compiler doesn't complain about this, but anyway it
causes a problem at runtime on arm64 (and probably most 64-bit archs).
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Richard Guy Briggs [Fri, 13 Sep 2013 03:03:51 +0000 (23:03 -0400)]
audit: reset audit backlog wait time after error recovery
commit
e789e561a50de0aaa8c695662d97aaa5eac9d55f upstream.
When the audit queue overflows and times out (audit_backlog_wait_time), the
audit queue overflow timeout is set to zero. Once the audit queue overflow
timeout condition recovers, the timeout should be reset to the original value.
See also:
https://lkml.org/lkml/2013/9/2/473
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miklos Szeredi [Wed, 22 Jan 2014 18:36:57 +0000 (19:36 +0100)]
fuse: fix pipe_buf_operations
commit
28a625cbc2a14f17b83e47ef907b2658576a32aa upstream.
Having this struct in module memory could Oops when if the module is
unloaded while the buffer still persists in a pipe.
Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal
merge them into nosteal_pipe_buf_ops (this is the same as
default_pipe_buf_ops except stealing the page from the buffer is not
allowed).
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Helgaas [Fri, 17 Jan 2014 21:57:29 +0000 (14:57 -0700)]
Revert "EISA: Initialize device before its resources"
commit
765ee51f9a3f652959b4c7297d198a28e37952b4 upstream.
This reverts commit
26abfeed4341872364386c6a52b9acef8c81a81a.
In the eisa_probe() force_probe path, if we were unable to request slot
resources (e.g., [io 0x800-0x8ff]), we skipped the slot with "Cannot
allocate resource for EISA slot %d" before reading the EISA signature in
eisa_init_device().
Commit
26abfeed4341 moved eisa_init_device() earlier, so we tried to read
the EISA signature before requesting the slot resources, and this caused
hangs during boot.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1251816
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alex Williamson [Tue, 21 Jan 2014 23:48:18 +0000 (15:48 -0800)]
intel-iommu: fix off-by-one in pagetable freeing
commit
08336fd218e087cc4fcc458e6b6dcafe8702b098 upstream.
dma_pte_free_level() has an off-by-one error when checking whether a pte
is completely covered by a range. Take for example the case of
attempting to free pfn 0x0 - 0x1ff, ie. 512 entries covering the first
2M superpage.
The level_size() is 0x200 and we test:
static void dma_pte_free_level(...
...
if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
...
}
Clearly the 2nd test is true, which means we fail to take the branch to
clear and free the pagetable entry. As a result, we're leaking
pagetables and failing to install new pages over the range.
This was found with a PCI device assigned to a QEMU guest using vfio-pci
without a VGA device present. The first 1M of guest address space is
mapped with various combinations of 4K pages, but eventually the range
is entirely freed and replaced with a 2M contiguous mapping.
intel-iommu errors out with something like:
ERROR: DMA PTE for vPFN 0x0 already set (to
5c2b8003 not
849c00083)
In this case
5c2b8003 is the pointer to the previous leaf page that was
neither freed nor cleared and
849c00083 is the superpage entry that
we're trying to replace it with.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wanlong Gao [Tue, 21 Jan 2014 23:48:41 +0000 (15:48 -0800)]
arch/sh/kernel/kgdb.c: add missing #include <linux/sched.h>
commit
53a52f17d96c8d47c79a7dafa81426317e89c7c1 upstream.
arch/sh/kernel/kgdb.c: In function 'sleeping_thread_to_gdb_regs':
arch/sh/kernel/kgdb.c:225:32: error: implicit declaration of function 'task_stack_page' [-Werror=implicit-function-declaration]
arch/sh/kernel/kgdb.c:242:23: error: dereferencing pointer to incomplete type
arch/sh/kernel/kgdb.c:243:22: error: dereferencing pointer to incomplete type
arch/sh/kernel/kgdb.c: In function 'singlestep_trap_handler':
arch/sh/kernel/kgdb.c:310:27: error: 'SIGTRAP' undeclared (first use in this function)
arch/sh/kernel/kgdb.c:310:27: note: each undeclared identifier is reported only once for each function it appears in
This was introduced by commit
16559ae48c76 ("kgdb: remove #include
<linux/serial_8250.h> from kgdb.h").
[geert@linux-m68k.org: reworded and reformatted]
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Rostedt (Red Hat) [Thu, 23 Jan 2014 17:27:59 +0000 (12:27 -0500)]
tracing: Check if tracing is enabled in trace_puts()
commit
3132e107d608f8753240d82d61303c500fd515b4 upstream.
If trace_puts() is used very early in boot up, it can crash the machine
if it is called before the ring buffer is allocated. If a trace_printk()
is used with no arguments, then it will be converted into a trace_puts()
and suffer the same fate.
Fixes:
09ae72348ecc "tracing: Add trace_puts() for even faster trace_printk() tracing"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Rostedt (Red Hat) [Tue, 14 Jan 2014 15:19:46 +0000 (10:19 -0500)]
tracing: Have trace buffer point back to trace_array
commit
dced341b2d4f06668efaab33f88de5d287c0f45b upstream.
The trace buffer has a descriptor pointer that goes back to the trace
array. But it was never assigned. Luckily, nothing uses it (yet), but
it will in the future.
Although nothing currently uses this, if any of the new features get
backported to older kernels, and because this is such a simple change,
I'm marking it for stable too.
Fixes:
12883efb670c "tracing: Consolidate max_tr into main trace_array structure"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tetsuo Handa [Mon, 6 Jan 2014 12:28:15 +0000 (21:28 +0900)]
SELinux: Fix memory leak upon loading policy
commit
8ed814602876bec9bad2649ca17f34b499357a1c upstream.
Hello.
I got below leak with linux-3.10.0-54.0.1.el7.x86_64 .
[ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Below is a patch, but I don't know whether we need special handing for undoing
ebitmap_set_bit() call.
----------
>>From
fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 6 Jan 2014 16:30:21 +0900
Subject: SELinux: Fix memory leak upon loading policy
Commit
2463c26d "SELinux: put name based create rules in a hashtable" did not
check return value from hashtab_insert() in filename_trans_read(). It leaks
memory if hashtab_insert() returns error.
unreferenced object 0xffff88005c9160d0 (size 8):
comm "systemd", pid 1, jiffies
4294688674 (age 235.265s)
hex dump (first 8 bytes):
57 0b 00 00 6b 6b 6b a5 W...kkk.
backtrace:
[<
ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
[<
ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
[<
ffffffff812aec5d>] policydb_read+0xd1d/0xf70
[<
ffffffff812b345c>] security_load_policy+0x6c/0x500
[<
ffffffff812a623c>] sel_write_load+0xac/0x750
[<
ffffffff811eb680>] vfs_write+0xc0/0x1f0
[<
ffffffff811ec08c>] SyS_write+0x4c/0xa0
[<
ffffffff81690419>] system_call_fastpath+0x16/0x1b
[<
ffffffffffffffff>] 0xffffffffffffffff
However, we should not return EEXIST error to the caller, or the systemd will
show below message and the boot sequence freezes.
systemd[1]: Failed to load SELinux policy. Freezing.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Thu, 6 Feb 2014 19:22:39 +0000 (11:22 -0800)]
Linux 3.12.10
Borislav Petkov [Tue, 14 Jan 2014 23:07:11 +0000 (00:07 +0100)]
x86, cpu, amd: Add workaround for family 16h, erratum 793
commit
3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream.
This adds the workaround for erratum 793 as a precaution in case not
every BIOS implements it. This addresses CVE-2013-6885.
Erratum text:
[Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
document 51810 Rev. 3.04 November 2013]
793 Specific Combination of Writes to Write Combined Memory Types and
Locked Instructions May Cause Core Hang
Description
Under a highly specific and detailed set of internal timing
conditions, a locked instruction may trigger a timing sequence whereby
the write to a write combined memory type is not flushed, causing the
locked instruction to stall indefinitely.
Potential Effect on System
Processor core hang.
Suggested Workaround
BIOS should set MSR
C001_1020[15] = 1b.
Fix Planned
No fix planned
[ hpa: updated description, fixed typo in MSR name ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Mackerras [Sat, 18 Jan 2014 10:14:47 +0000 (21:14 +1100)]
powerpc: Make sure "cache" directory is removed when offlining cpu
commit
91b973f90c1220d71923e7efe1e61f5329806380 upstream.
The code in remove_cache_dir() is supposed to remove the "cache"
subdirectory from the sysfs directory for a CPU when that CPU is
being offlined. It tries to do this by calling kobject_put() on
the kobject for the subdirectory. However, the subdirectory only
gets removed once the last reference goes away, and the reference
being put here may well not be the last reference. That means
that the "cache" subdirectory may still exist when the offlining
operation has finished. If the same CPU subsequently gets onlined,
the code tries to add a new "cache" subdirectory. If the old
subdirectory has not yet been removed, we get a WARN_ON in the
sysfs code, with stack trace, and an error message printed on the
console. Further, we ultimately end up with an online cpu with no
"cache" subdirectory.
This fixes it by doing an explicit kobject_del() at the point where
we want the subdirectory to go away. kobject_del() removes the sysfs
directory even though the object still exists in memory. The object
will get freed at some point in the future. A subsequent onlining
operation can create a new sysfs directory, even if the old object
still exists in memory, without causing any problems.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Srivatsa S. Bhat [Mon, 30 Dec 2013 11:35:34 +0000 (17:05 +0530)]
powerpc: Fix the setup of CPU-to-Node mappings during CPU online
commit
d4edc5b6c480a0917e61d93d55531d7efa6230be upstream.
On POWER platforms, the hypervisor can notify the guest kernel about dynamic
changes in the cpu-numa associativity (VPHN topology update). Hence the
cpu-to-node mappings that we got from the firmware during boot, may no longer
be valid after such updates. This is handled using the arch_update_cpu_topology()
hook in the scheduler, and the sched-domains are rebuilt according to the new
mappings.
But unfortunately, at the moment, CPU hotplug ignores these updated mappings
and instead queries the firmware for the cpu-to-numa relationships and uses
them during CPU online. So the kernel can end up assigning wrong NUMA nodes
to CPUs during subsequent CPU hotplug online operations (after booting).
Further, a particularly problematic scenario can result from this bug:
On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
threads per core. The switch to Single-Threaded (ST) mode is performed by
offlining all except the first CPU thread in each core. Switching back to
SMT mode involves onlining those other threads back, in each core.
Now consider this scenario:
1. During boot, the kernel gets the cpu-to-node mappings from the firmware
and assigns the CPUs to NUMA nodes appropriately, during CPU online.
2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
communicates this update to the kernel. The kernel in turn updates its
cpu-to-node associations and rebuilds its sched domains. Everything is
fine so far.
3. Now, the user switches the machine from SMT to ST mode (say, by running
ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
core.
4. The user then tries to switch back from ST to SMT mode (say, by running
ppc64_cpu --smt=4), and this involves onlining those threads back. Since
CPU hotplug ignores the new mappings, it queries the firmware and tries to
associate the newly onlined sibling threads to the old NUMA nodes. This
results in sibling threads within the same core getting associated with
different NUMA nodes, which is incorrect.
The scheduler's build-sched-domains code gets thoroughly confused with this
and enters an infinite loop and causes soft-lockups, as explained in detail
in commit
3be7db6ab (powerpc: VPHN topology change updates all siblings).
So to fix this, use the numa_cpu_lookup_table to remember the updated
cpu-to-node mappings, and use them during CPU hotplug online operations.
Further, we also need to ensure that all threads in a core are assigned to a
common NUMA node, irrespective of whether all those threads were online during
the topology update. To achieve this, we take care not to use cpu_sibling_mask()
since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread()
and set up the mappings manually using the 'threads_per_core' value for that
particular platform. This helps us ensure that we don't hit this bug with any
combination of CPU hotplug and SMT mode switching.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Sterba [Wed, 15 Jan 2014 17:15:52 +0000 (18:15 +0100)]
btrfs: restrict snapshotting to own subvolumes
commit
d024206133ce21936b3d5780359afc00247655b7 upstream.
Currently, any user can snapshot any subvolume if the path is accessible and
thus indirectly create and keep files he does not own under his direcotries.
This is not possible with traditional directories.
In security context, a user can snapshot root filesystem and pin any
potentially buggy binaries, even if the updates are applied.
All the snapshots are visible to the administrator, so it's possible to
verify if there are suspicious snapshots.
Another more practical problem is that any user can pin the space used
by eg. root and cause ENOSPC.
Original report:
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wang Shilong [Tue, 7 Jan 2014 09:26:58 +0000 (17:26 +0800)]
Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot()
commit
90515e7f5d7d24cbb2a4038a3f1b5cfa2921aa17 upstream.
We may return early in btrfs_drop_snapshot(), we shouldn't
call btrfs_std_err() for this case, fix it.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Grover [Sat, 25 Jan 2014 00:18:54 +0000 (16:18 -0800)]
target/iscsi: Fix network portal creation race
commit
ee291e63293146db64668e8d65eb35c97e8324f4 upstream.
When creating network portals rapidly, such as when restoring a
configuration, LIO's code to reuse existing portals can return a false
negative if the thread hasn't run yet and set np_thread_state to
ISCSI_NP_THREAD_ACTIVE. This causes an error in the network stack
when attempting to bind to the same address/port.
This patch sets NP_THREAD_ACTIVE before the np is placed on g_np_list,
so even if the thread hasn't run yet, iscsit_get_np will return the
existing np.
Also, convert np_lock -> np_mutex + hold across adding new net portal
to g_np_list to prevent a race where two threads may attempt to create
the same network portal, resulting in one of them failing.
(nab: Add missing mutex_unlocks in iscsit_add_np failure paths)
(DanC: Fix incorrect spin_unlock -> spin_unlock_bh)
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicholas Bellinger [Fri, 10 Jan 2014 02:06:59 +0000 (02:06 +0000)]
iscsi-target: Pre-allocate more tags to avoid ack starvation
commit
4a4caa29f1abcb14377e05d57c0793d338fb945d upstream.
This patch addresses an traditional iscsi-target fabric ack starvation
issue where iscsit_allocate_cmd() -> percpu_ida_alloc_state() ends up
hitting slow path percpu-ida code, because iscsit_ack_from_expstatsn()
is expected to free ack'ed tags after tag allocation.
This is done to take into account the tags waiting to be acknowledged
and released in iscsit_ack_from_expstatsn(), but who's number are not
directly limited by the CmdSN Window queue_depth being enforced by
the target.
So that said, this patch bumps up the pre-allocated number of
per session tags to:
(max(queue_depth, ISCSIT_MIN_TAGS) * 2) + ISCSIT_EXTRA_TAGS
for good measure to avoid the percpu_ida_alloc_state() slow path.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Asias He [Wed, 15 Jan 2014 23:48:48 +0000 (10:18 +1030)]
virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
commit
f466f75385369a181409e46da272db3de6f5c5cb upstream.
vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
unregistered. We will have a use-after-free usage when the notifier
callback is called after virtscsi_freeze.
Fixes:
285e71ea6f3583a85e27cb2b9a7d8c35d4c0d558
("virtio-scsi: reset virtqueue affinity when doing cpu hotplug")
Signed-off-by: Asias He <asias.hejun@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vijaya Mohan Guvva [Wed, 4 Dec 2013 13:43:58 +0000 (05:43 -0800)]
SCSI: bfa: Chinook quad port 16G FC HBA claim issue
commit
dcaf9aed995c2b2a49fb86bbbcfa2f92c797ab5d upstream.
Bfa driver crash is observed while pushing the firmware on to chinook
quad port card due to uninitialized bfi_image_ct2 access which gets
initialized only for CT2 ASIC based cards after request_firmware().
For quard port chinook (CT2 ASIC based), bfi_image_ct2 is not getting
initialized as there is no check for chinook PCI device ID before
request_firmware and instead bfi_image_cb is initialized as it is the
default case for card type check.
This patch includes changes to read the right firmware for quad port chinook.
Signed-off-by: Vijaya Mohan Guvva <vmohan@brocade.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Pugliese [Mon, 9 Dec 2013 19:40:29 +0000 (13:40 -0600)]
usb: core: get config and string descriptors for unauthorized devices
commit
83e83ecb79a8225e79bc8e54e9aff3e0e27658a2 upstream.
There is no need to skip querying the config and string descriptors for
unauthorized WUSB devices when usb_new_device is called. It is allowed
by WUSB spec. The only action that needs to be delayed until
authorization time is the set config. This change allows user mode
tools to see the config and string descriptors earlier in enumeration
which is needed for some WUSB devices to function properly on Android
systems. It also reduces the amount of divergent code paths needed
for WUSB devices.
Signed-off-by: Thomas Pugliese <thomas.pugliese@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mikulas Patocka [Tue, 28 Jan 2014 23:10:44 +0000 (00:10 +0100)]
hpfs: remember free space
commit
2cbe5c76fc5e38e9af4b709593146e4b8272b69e upstream.
Previously, hpfs scanned all bitmaps each time the user asked for free
space using statfs. This patch changes it so that hpfs scans the
bitmaps only once, remembes the free space and on next invocation of
statfs it returns the value instantly.
New versions of wine are hammering on the statfs syscall very heavily,
making some games unplayable when they're stored on hpfs, with load
times in minutes.
This should be backported to the stable kernels because it fixes
user-visible problem (excessive level load times in wine).
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Warren [Mon, 3 Feb 2014 23:54:04 +0000 (16:54 -0700)]
ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabled
(This is upstream
75fae117a5db "ALSA: hda/hdmi - allow PIN_OUT to be
dynamically enabled", backported to stable 3.10 through 3.12. 3.13 and
later can take the original patch.)
Commit
384a48d71520 "ALSA: hda: HDMI: Support codecs with fewer cvts
than pins" dynamically enabled each pin widget's PIN_OUT only when the
pin was actively in use. This was required on certain NVIDIA CODECs for
correct operation. Specifically, if multiple pin widgets each had their
mux input select the same audio converter widget and each pin widget had
PIN_OUT enabled, then only one of the pin widgets would actually receive
the audio, and often not the one the user wanted!
However, this apparently broke some Intel systems, and commit
6169b673618b "ALSA: hda - Always turn on pins for HDMI/DP" reverted the
dynamic setting of PIN_OUT. This in turn broke the afore-mentioned NVIDIA
CODECs.
This change supports either dynamic or static handling of PIN_OUT,
selected by a flag set up during CODEC initialization. This flag is
enabled for all recent NVIDIA GPUs.
Reported-by: Uosis <uosisl@gmail.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anssi Hannula [Mon, 3 Feb 2014 23:54:03 +0000 (16:54 -0700)]
ALSA: hda - hdmi: introduce patch_nvhdmi()
(This is a backport of *part* of upstream
611885bc963a "ALSA: hda -
hdmi: Disallow unsupported 2ch remapping on NVIDIA codecs" to stable
3.10 through 3.12. Later stable already contain all of the original
patch.)
Mainline commit
611885bc963a "ALSA: hda - hdmi: Disallow unsupported 2ch
remapping on NVIDIA codecs" introduces function patch_nvhdmi(). That
function is edited by
75fae117a5db "ALSA: hda/hdmi - allow PIN_OUT to be
dynamically enabled". In order to backport the PIN_OUT patch, I am first
back-porting just the addition of function patch_nvhdmi(), so that the
conflicts applying the PIN_OUT patch are simplified.
Ideally, one might backport all of
611885bc963a. However, that commit
doesn't apply to stable kernels, since it relies on a chain of other
patches which implement new features.
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[swarren, extracted just a small part of the original patch]
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Mon, 13 Jan 2014 11:40:07 +0000 (12:40 +0100)]
ALSA: hda - Don't set indep_hp flag for old AD codecs
commit
cbd209f41ea5f39394de5c1fe2dd9aa54a9c5744 upstream.
Some old AD codecs don't like the independent HP handling, either it
contains a single DAC (AD1981) or it mandates the mixer routing
(AD1986A). This patch removes the indep_hp flag for such codecs.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68081
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mihai Caraman [Thu, 9 Jan 2014 15:01:05 +0000 (17:01 +0200)]
KVM: PPC: e500: Fix bad address type in deliver_tlb_misss()
commit
70713fe315ed14cd1bb07d1a7f33e973d136ae3d upstream.
Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andreas Schwab [Mon, 30 Dec 2013 14:36:56 +0000 (15:36 +0100)]
KVM: PPC: Book3S HV: use xics_wake_cpu only when defined
commit
48eaef0518a565d3852e301c860e1af6a6db5a84 upstream.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Helge Deller [Fri, 31 Jan 2014 20:33:17 +0000 (21:33 +0100)]
parisc: fix cache-flushing
commit
57737c49dd72c96cfbcd4f66559f3ffc399aeb4f upstream.
This commit:
f8dae00684d678afa13041ef170cecfd1297ed40: parisc: Ensure full cache coherency for kmap/kunmap
caused negative caching side-effects, e.g. hanging processes with expect and
too many inequivalent alias messages from flush_dcache_page() on Debian 5 systems.
This patch now partly reverts it and has been in production use on our debian buildd
makeservers since a week without any major problems.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mikulas Patocka [Thu, 23 Jan 2014 04:04:33 +0000 (23:04 -0500)]
alpha: fix broken network checksum
commit
0ef38d70d4118b2ce1a538d14357be5ff9dc2bbd upstream.
The patch
3ddc5b46a8e90f3c9251338b60191d0a804b0d92 breaks networking on
alpha (there is a follow-up fix
5cfe8f1ba5eebe6f4b6e5858cdb1a5be4f3272a6,
but networking is still broken even with the second patch).
The patch
3ddc5b46a8e90f3c9251338b60191d0a804b0d92 makes
csum_partial_copy_from_user check the pointer with access_ok. However,
csum_partial_copy_from_user is called also from csum_partial_copy_nocheck
and csum_partial_copy_nocheck is called on kernel pointers and it is
supposed not to check pointer validity.
This bug results in ssh session hangs if the system is loaded and bulk
data are printed to ssh terminal.
This patch fixes csum_partial_copy_nocheck to call set_fs(KERNEL_DS), so
that access_ok in csum_partial_copy_from_user accepts kernel-space
addresses.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neal Cardwell [Mon, 3 Feb 2014 01:40:13 +0000 (20:40 -0500)]
inet_diag: fix inet_diag_dump_icsk() timewait socket state logic
[ Based upon upstream commit
70315d22d3c7383f9a508d0aab21e2eb35b2303a ]
Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and
FIN_WAIT2 connections are represented by inet_timewait_sock (not just
TIME_WAIT). Thus:
(a) We need to iterate through the time_wait buckets if the user wants
either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state
fin-wait-2" would not return any sockets, even if there were some in
FIN_WAIT2.)
(b) We need to check tw_substate to see if the user wants to dump
sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a
given connection is in. (Before fixing this, "ss -nemoi state
time-wait" would actually return sockets in state FIN_WAIT2.)
An analogous fix is in v3.13:
70315d22d3c7383f9a508d0aab21e2eb35b2303a
("inet_diag: fix inet_diag_dump_icsk() to use correct state for
timewait sockets") but that patch is quite different because 3.13 code
is very different in this area due to the unification of TCP hash
tables in 05dbc7b ("tcp/dccp: remove twchain") in v3.13-rc1.
I tested that this applies cleanly between v3.3 and v3.12, and tested
that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2
and earlier (though it makes semantic sense), and semantically is not
the right fix for 3.13 and beyond (as mentioned above).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Duan Jiong [Tue, 28 Jan 2014 03:49:43 +0000 (11:49 +0800)]
net: gre: use icmp_hdr() to get inner ip header
[ Upstream commit
c0c0c50ff7c3e331c90bab316d21f724fb9e1994 ]
When dealing with icmp messages, the skb->data points the
ip header that triggered the sending of the icmp message.
In gre_cisco_err(), the parse_gre_header() is called, and the
iptunnel_pull_header() is called to pull the skb at the end of
the parse_gre_header(), so the skb->data doesn't point the
inner ip header.
Unfortunately, the ipgre_err still needs those ip addresses in
inner ip header to look up tunnel by ip_tunnel_lookup().
So just use icmp_hdr() to get inner ip header instead of skb->data.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Annie Li [Tue, 28 Jan 2014 03:35:42 +0000 (11:35 +0800)]
xen-netfront: fix resource leak in netfront
[ Upstream commit
cefe0078eea52af17411eb1248946a94afb84ca5 ]
This patch removes grant transfer releasing code from netfront, and uses
gnttab_end_foreign_access to end grant access since
gnttab_end_foreign_access_ref may fail when the grant entry is
currently used for reading or writing.
* clean up grant transfer code kept from old netfront(2.6.18) which grants
pages for access/map and transfer. But grant transfer is deprecated in current
netfront, so remove corresponding release code for transfer.
* fix resource leak, release grant access (through gnttab_end_foreign_access)
and skb for tx/rx path, use get_page to ensure page is released when grant
access is completed successfully.
Xen-blkfront/xen-tpmfront/xen-pcifront also have similar issue, but patches
for them will be created separately.
V6: Correct subject line and commit message.
V5: Remove unecessary change in xennet_end_access.
V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
single patch.
V3: Changes as suggestion from David Vrabel, ensure pages are not freed untill
grant acess is ended.
V2: Improve patch comments.
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Holger Eitzenberger [Mon, 27 Jan 2014 09:33:18 +0000 (10:33 +0100)]
net: Fix memory leak if TPROXY used with TCP early demux
[ Upstream commit
a452ce345d63ddf92cd101e4196569f8718ad319 ]
I see a memory leak when using a transparent HTTP proxy using TPROXY
together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
unreferenced object 0xffff88008cba4a40 (size 1696):
comm "softirq", pid 0, jiffies
4294944115 (age 8907.520s)
hex dump (first 32 bytes):
0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2.....
02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
[<
ffffffff81270185>] sk_prot_alloc+0x29/0xc5
[<
ffffffff812702cf>] sk_clone_lock+0x14/0x283
[<
ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
[<
ffffffff8129a893>] netlink_broadcast+0x14/0x16
[<
ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
[<
ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
[<
ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
[<
ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
[<
ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
[<
ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
[<
ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
[<
ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
[<
ffffffff8127aa77>] process_backlog+0xee/0x1c5
[<
ffffffff8127c949>] net_rx_action+0xa7/0x200
[<
ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
But there are many more, resulting in the machine going OOM after some
days.
From looking at the TPROXY code, and with help from Florian, I see
that the memory leak is introduced in tcp_v4_early_demux():
void tcp_v4_early_demux(struct sk_buff *skb)
{
/* ... */
iph = ip_hdr(skb);
th = tcp_hdr(skb);
if (th->doff < sizeof(struct tcphdr) / 4)
return;
sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
iph->saddr, th->source,
iph->daddr, ntohs(th->dest),
skb->skb_iif);
if (sk) {
skb->sk = sk;
where the socket is assigned unconditionally to skb->sk, also bumping
the refcnt on it. This is problematic, because in our case the skb
has already a socket assigned in the TPROXY target. This then results
in the leak I see.
The very same issue seems to be with IPv6, but haven't tested.
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Hartkopp [Thu, 23 Jan 2014 09:19:34 +0000 (10:19 +0100)]
fib_frontend: fix possible NULL pointer dereference
[ Upstream commit
a0065f266a9b5d51575535a25c15ccbeed9a9966 ]
The two commits
0115e8e30d (net: remove delay at device dismantle) and
748e2d9396a (net: reinstate rtnl in call_netdevice_notifiers()) silently
removed a NULL pointer check for in_dev since Linux 3.7.
This patch re-introduces this check as it causes crashing the kernel when
setting small mtu values on non-ip capable netdevices.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Or Gerlitz [Thu, 23 Jan 2014 09:28:13 +0000 (11:28 +0200)]
net/vxlan: Share RX skb de-marking and checksum checks with ovs
[ Upstream commit
d0bc65557ad09a57b4db176e9e3ccddb26971453 ]
Make sure the practice set by commit 0afb166 "vxlan: Add capability
of Rx checksum offload for inner packet" is applied when the skb
goes through the portion of the RX code which is shared between
vxlan netdevices and ovs vxlan port instances.
Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Duan Jiong [Thu, 23 Jan 2014 06:00:25 +0000 (14:00 +0800)]
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
[ Upstream commit
11c21a307d79ea5f6b6fc0d3dfdeda271e5e65f6 ]
commit
a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach")
clear IPCB in ip_tunnel_xmit() , or else skb->cb[] may contain garbage from
GSO segmentation layer.
But commit
0e6fbc5b6c621("ip_tunnels: extend iptunnel_xmit()") refactor codes,
and it clear IPCB behind the dst_link_failure().
So clear IPCB in ip_tunnel_xmit() just like commti
a622260254ee48("ip_tunnel:
fix kernel panic with icmp_dest_unreach").
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiko Carstens [Fri, 17 Jan 2014 08:37:15 +0000 (09:37 +0100)]
s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions
[ Upstream commit
3af57f78c38131b7a66e2b01e06fdacae01992a3 ]
The s390 bpf jit compiler emits the signed divide instructions "dr" and "d"
for unsigned divisions.
This can cause problems: the dividend will be zero extended to a 64 bit value
and the divisor is the 32 bit signed value as specified A or X accumulator,
even though A and X are supposed to be treated as unsigned values.
The divide instrunctions will generate an exception if the result cannot be
expressed with a 32 bit signed value.
This is the case if e.g. the dividend is 0xffffffff and the divisor either 1
or also 0xffffffff (signed: -1).
To avoid all these issues simply use unsigned divide instructions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Wed, 15 Jan 2014 14:50:07 +0000 (06:50 -0800)]
bpf: do not use reciprocal divide
[ Upstream commit
aee636c4809fa54848ff07a899b326eb1f9987a2 ]
At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c
He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christoph Paasch [Thu, 16 Jan 2014 19:01:21 +0000 (20:01 +0100)]
tcp: metrics: Avoid duplicate entries with the same destination-IP
[ Upstream commit
77f99ad16a07aa062c2d30fae57b1fee456f6ef6 ]
Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.
This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.
Fixes:
51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gerald Schaefer [Thu, 16 Jan 2014 15:54:48 +0000 (16:54 +0100)]
net: rds: fix per-cpu helper usage
[ Upstream commit
c196403b79aa241c3fefb3ee5bb328aa7c5cc860 ]
commit
ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.
Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Richard Weinberger [Tue, 14 Jan 2014 21:46:36 +0000 (22:46 +0100)]
net,via-rhine: Fix tx_timeout handling
[ Upstream commit
a926592f5e4e900f3fa903298c4619a131e60963 ]
rhine_reset_task() misses to disable the tx scheduler upon reset,
this can lead to a crash if work is still scheduled while we're resetting
the tx queue.
Fixes:
[ 93.591707] BUG: unable to handle kernel NULL pointer dereference at
0000004c
[ 93.595514] IP: [<
c119d10d>] rhine_napipoll+0x491/0x6
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hannes Frederic Sowa [Mon, 13 Jan 2014 01:45:22 +0000 (02:45 +0100)]
net: avoid reference counter overflows on fib_rules in multicast forwarding
[ Upstream commit
95f4a45de1a0f172b35451fc52283290adb21f6e ]
Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.
This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.
Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.
Fixes:
f0ad0860d01e47 ("ipv4: ipmr: support multiple tables")
Fixes:
d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christian Engelmayer [Sat, 11 Jan 2014 21:19:30 +0000 (22:19 +0100)]
ieee802154: Fix memory leak in ieee802154_add_iface()
[ Upstream commit
267d29a69c6af39445f36102a832b25ed483f299 ]
Fix a memory leak in the ieee802154_add_iface() error handling path.
Detected by Coverity: CID 710490.
Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjørn Mork [Fri, 10 Jan 2014 22:10:17 +0000 (23:10 +0100)]
net: usbnet: fix SG initialisation
[ Upstream commit
fdc3452cd2c7b2bfe0f378f92123f4f9a98fa2bd ]
Commit
60e453a940ac ("USBNET: fix handling padding packet")
added an extra SG entry in case padding is necessary, but
failed to update the initialisation of the list. This can
cause list traversal to fall off the end of the list,
resulting in an oops.
Fixes:
60e453a940ac ("USBNET: fix handling padding packet")
Reported-by: Thomas Kear <thomas@kear.co.nz>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Schmidt [Thu, 9 Jan 2014 13:36:27 +0000 (14:36 +0100)]
bnx2x: fix DMA unmapping of TSO split BDs
[ Upstream commit
95e92fd40c967c363ad66b2fd1ce4dcd68132e54 ]
bnx2x triggers warnings with CONFIG_DMA_API_DEBUG=y:
WARNING: CPU: 0 PID: 2253 at lib/dma-debug.c:887 check_unmap+0xf8/0x920()
bnx2x 0000:28:00.0: DMA-API: device driver frees DMA memory with
different size [device address=0x00000000da2b389e] [map size=1490 bytes]
[unmap size=66 bytes]
The reason is that bnx2x splits a TSO BD into two BDs (headers + data)
using one DMA mapping for both, but it uses only the length of the first
BD when unmapping.
This patch fixes the bug by unmapping the whole length of the two BDs.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shane Huang [Wed, 22 Jan 2014 22:05:46 +0000 (14:05 -0800)]
i2c: piix4: Add support for AMD ML and CZ SMBus changes
commit
032f708bc4f6da868ec49dac48ddf3670d8035d3 upstream.
The locations of SMBus register base address and enablement bit are changed
from AMD ML, which need this patch to be supported.
Signed-off-by: Shane Huang <shane.huang@amd.com>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gregory CLEMENT [Tue, 31 Dec 2013 16:07:35 +0000 (17:07 +0100)]
i2c: mv64xxx: Document the newly introduced Armada XP A0 compatible
commit
f8b94beb7e6a374cb0de531b72377c49857b35ca upstream.
The first variants of Armada XP SoCs (A0 stepping) have issues related
to the i2c controller which prevent to use the offload mechanism and
lead to a kernel hang during boot.
The commit introduces a new the compatible string
marvell,mv78230-a0-i2c for the i2c controller.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: devicetree@vger.kernel.org
Fixes:
930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support)
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gregory CLEMENT [Tue, 31 Dec 2013 15:59:33 +0000 (16:59 +0100)]
i2c: mv64xxx: Fix bus hang on A0 version of the Armada XP SoCs
commit
6cf70ae928bae17077efc0d528dec49bc380438b upstream.
The first variants of Armada XP SoCs (A0 stepping) have issues related
to the i2c controller which prevent to use the offload mechanism and
lead to a kernel hang during boot.
The commit introduces a new the compatible string
marvell,mv78230-a0-i2c for the i2c controller. When this compatible
string is used the driver disables the offload mechanism and the
kernel no more hangs on these SoCs.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reported-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Fixes:
930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support)
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gregory CLEMENT [Tue, 7 Jan 2014 15:26:01 +0000 (16:26 +0100)]
ARM: mvebu: Add quirk for i2c for the OpenBlocks AX3-4 board
commit
85e618a1be2b2092318178d1d66bdad49cbbeeeb upstream.
The first variants of Armada XP SoCs (A0 stepping) have issues related
to the i2c controller which prevent to use the offload mechanism and
lead to a kernel hang during boot.
This commit add quirk in the mvebu platform code to check the SoC
version and then update the compatible string for the i2c controller
according to the revision of the SoC. Currently only some OpenBlocks
AX3-4 boards are known to use an A0 revision so the check is done only
for these boards.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Fixes:
930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support)
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gregory CLEMENT [Thu, 2 Jan 2014 14:08:59 +0000 (15:08 +0100)]
ARM: mvebu: Add support to get the ID and the revision of a SoC
commit
af8d1c63afcbf36eea06789c92e22d4af118d2fb upstream.
All the mvebu SoCs have information related to their variant and
revision that can be read from the PCI control register.
This patch adds support for Armada XP and Armada 370. This reading of
the revision and the ID are done before the PCI initialization to
avoid any conflicts. Once these data are retrieved, the resources are
freed to let the PCI subsystem use it.
Fixes:
930ab3d403ae (i2c: mv64xxx: Add I2C Transaction Generator support)
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Mon, 13 Jan 2014 11:32:44 +0000 (12:32 +0100)]
hp_accel: Add a new PnP ID HPQ6007 for new HP laptops
commit
b0ad4ff35d479a46a3b995a299db9aeb097acfce upstream.
The DriveGuard chips on the new HP laptops are with a new PnP ID
"HPQ6007". It should be compatible with older chips.
Acked-by: Éric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Minchan Kim [Thu, 30 Jan 2014 23:45:58 +0000 (15:45 -0800)]
zram: fix race between reset and flushing pending work
commit
da4a04126baa3be03bc566d4a2ee0944c5e783d0 upstream.
Dan and Sergey reported that there is a racy between reset and flushing
of pending work so that it could make oops by freeing zram->meta in
reset while zram_slot_free can access zram->meta if new request is
adding during the race window.
This patch moves flush after taking init_lock so it prevents new request
so that it closes the race.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kent Overstreet [Wed, 18 Dec 2013 01:51:02 +0000 (17:51 -0800)]
bcache: Data corruption fix
commit
ef71ec00002d92a08eb27e9d036e3d48835b6597 upstream.
The code that handles overlapping extents that we've just read back in from disk
was depending on the behaviour of the code that handles overlapping extents as
we're inserting into a btree node in the case of an insert that forced an
existing extent to be split: on insert, if we had to split we'd also insert a
new extent to represent the top part of the old extent - and then that new
extent would get written out.
The code that read the extents back in thus not bother with splitting extents -
if it saw an extent that ovelapped in the middle of an older extent, it would
trim the old extent to only represent the bottom part, assuming that the
original insert would've inserted a new extent to represent the top part.
I still haven't figured out _how_ it can happen, but I'm now pretty convinced
(and testing has confirmed) that there's some kind of an obscure corner case
(probably involving extent merging, and multiple overwrites in different sets)
that breaks this. The fix is to change the mergesort fixup code to split extents
itself when required.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric W. Biederman [Mon, 20 Jan 2014 23:26:15 +0000 (15:26 -0800)]
vfs: Is mounted should be testing mnt_ns for NULL or error.
commit
260a459d2e39761fbd39803497205ce1690bc7b1 upstream.
A bug was introduced with the is_mounted helper function in
commit
f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat Jun 9 00:59:08 2012 -0400
get rid of ->mnt_longterm
it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The intent was to test if the real_mount(vfsmount)->mnt_ns was
NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
and always returning true.
The result is d_absolute_path returning paths it should be hiding.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric W. Biederman [Mon, 20 Jan 2014 23:43:25 +0000 (15:43 -0800)]
vfs: Remove second variable named error in __dentry_path
commit
a8323da0366d3398eda62741d2ac1130c8a172ed upstream.
In commit
232d2d60aa5469bb097f55728f65146bd49c1d25
Author: Waiman Long <Waiman.Long@hp.com>
Date: Mon Sep 9 12:18:13 2013 -0400
dcache: Translating dentry into pathname without taking rename_lock
The __dentry_path locking was changed and the variable error was
intended to be moved outside of the loop. Unfortunately the inner
declaration of error was not removed. Resulting in a version of
__dentry_path that will never return an error.
Remove the problematic inner declaration of error and allow
__dentry_path to return errors once again.
Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Theodore Ts'o [Tue, 7 Jan 2014 17:58:19 +0000 (12:58 -0500)]
ext4: avoid clearing beyond i_blocks when truncating an inline data file
commit
09c455aaa8f47a94d5bafaa23d58365768210507 upstream.
A missing cast means that when we are truncating a file which is less
than 60 bytes, we don't clear the correct area of memory, and in fact
we can end up truncating the next inode in the inode table, or worse
yet, some other kernel data structure.
Addresses-Coverity-Id: #751987
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 16 Jan 2014 14:47:17 +0000 (09:47 -0500)]
libata: disable LPM for some WD SATA-I devices
commit
ecd75ad514d73efc1bbcc5f10a13566c3ace5f53 upstream.
For some reason, some early WD drives spin up and down drives
erratically when the link is put into slumber mode which can reduce
the life expectancy of the device significantly. Unfortunately, we
don't have full list of devices and given the nature of the issue it'd
be better to err on the side of false positives than the other way
around. Let's disable LPM on all WD devices which match one of the
known problematic model prefixes and are SATA-I.
As horkage list doesn't support matching SATA capabilities, this is
implemented as two horkages - WD_BROKEN_LPM and NOLPM. The former is
set for the known prefixes and sets the latter if the matched device
is SATA-I.
Note that this isn't optimal as this disables all LPM operations and
partial link power state reportedly works fine on these; however, the
way LPM is implemented in libata makes it difficult to precisely map
libata LPM setting to specific link power state. Well, these devices
are already fairly outdated. Let's just disable whole LPM for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Nikos Barkas <levelwol@gmail.com>
Reported-and-tested-by: Ioannis Barkas <risc4all@yahoo.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=57211
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Simon Guinot [Tue, 14 Jan 2014 19:10:27 +0000 (20:10 +0100)]
ARM: mvebu: update the SATA compatible string for Armada 370/XP
commit
a96cc303e42ad7830dde929aad0046e448a05505 upstream.
This patch updates the Armada 370/XP SATA node with the new compatible
string "marvell,armada-370-sata".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Lior Amsalem <alior@marvell.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lior Amsalem [Tue, 14 Jan 2014 19:09:57 +0000 (20:09 +0100)]
ata: sata_mv: fix disk hotplug for Armada 370/XP SoCs
commit
9013d64e661fc2a37a1742670202171c27fef4b5 upstream.
On Armada 370/XP SoCs, once a disk is removed from a SATA port, then the
re-plug events are not detected by the sata_mv driver. This patch fixes
the issue by updating the PHY speed in the LP_PHY_CTL register (0x58)
according to the SControl speed.
Note that this fix is only applied if the compatible string
"marvell,armada-370-sata" is found in the SATA DT node.
Fixes:
9ae6f740b49f ("arm: mach-mvebu: add support for Armada 370 and Armada XP with DT")
Signed-off-by: Lior Amsalem <alior@marvell.com>
Signed-off-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Simon Guinot [Tue, 14 Jan 2014 19:04:39 +0000 (20:04 +0100)]
ata: sata_mv: introduce compatible string "marvell, armada-370-sata"
commit
b1f5c73bd5a4752efb7d7af019034044b08aafe9 upstream.
The sata_mv driver supports the SATA IP found in several Marvell SoCs.
As some new SATA registers have been introduced with the Armada 370/XP
SoCs, a way to identify them is needed.
This patch introduces a new compatible string for the SATA IP found in
Armada 370/XP SoCs.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Lior Amsalem <alior@marvell.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Huewe [Wed, 30 Oct 2013 19:46:55 +0000 (20:46 +0100)]
tpm/tpm_ppi: Do not compare strcmp(a,b) == -1
commit
747d35bd9bb4ae6bd74b19baa5bbe32f3e0cee11 upstream.
Depending on the implementation strcmp might return the difference between
two strings not only -1,0,1 consequently
if (strcmp (a,b) == -1)
might lead to taking the wrong branch
-> compare with < 0 instead,
which in any case is more canonical.
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Huewe [Tue, 29 Oct 2013 23:54:20 +0000 (00:54 +0100)]
tpm/tpm_i2c_stm_st33: Check return code of get_burstcount
commit
85c5e0d451125c6ddb78663972e40af810b83644 upstream.
The 'get_burstcount' function can in some circumstances 'return -EBUSY' which
in tpm_stm_i2c_send is stored in an 'u32 burstcnt'
thus converting the signed value into an unsigned value, resulting
in 'burstcnt' being huge.
Changing the type to u32 only does not solve the problem as the signed
value is converted to an unsigned in I2C_WRITE_DATA, resulting in the
same effect.
Thus
-> Change type of burstcnt to u32 (the return type of get_burstcount)
-> Add a check for the return value of 'get_burstcount' and propagate a
potential error.
This makes also sense in the 'I2C_READ_DATA' case, where the there is no
signed/unsigned conversion.
found by coverity
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adrien Vergé [Fri, 24 Jan 2014 19:56:14 +0000 (14:56 -0500)]
ALSA: hda - Fix silent output on MacBook Air 1,1
commit
e7729a415315fcd9516912050d85d5aaebcededc upstream.
Similarly to other Apple products, MBA 1,1 needs a specific quirk.
Pin 0x18 must be set to VREF_50 to have sound output. This was no
longer done since commit 1a97b7f, resulting in a mute built-in speaker.
This patch corrects the regression by creating a fixup for the MBA 1,1.
Fixes:
1a97b7f22774 ("ALSA: hda/realtek - Remove the last static quirks for ALC882")
Tested-by: Adrien Vergé <adrienverge@gmail.com>
Signed-off-by: Adrien Vergé <adrienverge@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Thu, 23 Jan 2014 08:21:28 +0000 (11:21 +0300)]
ALSA: bits vs bytes bug in snd_card_create()
commit
4c3773eda49c872a3034382f8ec3080002e715bf upstream.
The test here is intended intended to prevent shift wrapping bugs when
we do "1U << idx2". We should consider the number of bits in a u32
instead of the number of bytes.
[fix another chunk similarly by tiwai]
Fixes:
7bb2491b35a2 ('ALSA: Add kconfig to specify the max card numbers')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Fri, 10 Jan 2014 13:20:42 +0000 (14:20 +0100)]
ALSA: Enable CONFIG_ZONE_DMA for smaller PCI DMA masks
commit
80ab8eae70e51d578ebbeb228e0f7a562471b8b7 upstream.
The PCI devices with DMA masks smaller than 32bit should enable
CONFIG_ZONE_DMA. Since the recent change of page allocator, page
allocations via dma_alloc_coherent() with the limited DMA mask bits
may fail more frequently, ended up with no available buffers, when
CONFIG_ZONE_DMA isn't enabled. With CONFIG_ZONE_DMA, the system has
much more chance to obtain such pages.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68221
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Tue, 7 Jan 2014 17:11:44 +0000 (18:11 +0100)]
ALSA: hda - Don't create duplicated ctls for loopback paths
commit
43a8e50a46a4e1dd1451e4a4ffa1f7695fb7d287 upstream.
AD1986A mic pins (0x1d and 0x1f) share the same widget for controlling
the loopback volume/mute, but the generic parser didn't check it.
This ended up with the duplicated controls for the same effect.
This patch adds the check of the duplication for avoiding it.
After this fix, there will be only one control although it affects
both paths; this remaining issue should be fixed later in a different
patch.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66621
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Tue, 7 Jan 2014 16:48:11 +0000 (17:48 +0100)]
ALSA: hda - Correct AD1986A 3stack pin configs
commit
ed0e0d0617a8dc3d8b82c6e54827f269f2247b07 upstream.
The 3stack pin configs for AD1986A codec had incorrect values that
resulted in broken mic and line-in.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66621
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Thu, 26 Dec 2013 22:13:08 +0000 (00:13 +0200)]
ALSA: rme9652: fix a missing comma in channel_map_9636_ds[]
commit
770bd4bf2e664939a9dacd3d26ec9ff7a3933210 upstream.
The lack of comma leads to the wrong channel for an SPDIF channel.
Unfortunately this wasn't caught by compiler because it's still a
valid expression.
Reported-by: Alexander Aristov <aristov.alexander@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Charles Keepax [Tue, 21 Jan 2014 16:27:51 +0000 (16:27 +0000)]
ASoC: wm5110: Extend SYSCLK patch file for rev D
commit
34354792432b6e0a3b156819a1a19716c50d3473 upstream.
Latest evaluation of the the device has given some patch file additions
for improved performance.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lars-Peter Clausen [Wed, 8 Jan 2014 10:22:25 +0000 (11:22 +0100)]
ASoC: adau1701: Fix ADAU1701_SEROCTL_WORD_LEN_16 constant
commit
e20970ada3f699c113fe64b04492af083d11a7d8 upstream.
The driver defines ADAU1701_SEROCTL_WORD_LEN_16 as 0x10 while it should be b10,
so 0x2. This patch fixes it.
Reported-by: Magnus Reftel <magnus.reftel@lockless.no>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Krzysztof Kozlowski [Fri, 20 Dec 2013 09:35:07 +0000 (10:35 +0100)]
mfd: max77686: Fix regmap resource leak on driver remove
commit
74142ffc0b52cfe6f9d2f6f34a5f3eedbfe3ce51 upstream.
The regmap used by max77686 MFD driver was not freed with regmap_exit()
on driver exit. This lead to leak of resources.
Replace regmap_init_i2c() call in driver probe with initialization of
managed register map so the regmap will be properly freed by the device
management code.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dongsheng Yang [Fri, 20 Dec 2013 18:41:47 +0000 (13:41 -0500)]
perf kvm: Fix kvm report without guestmount.
commit
ad85ace07a05062ef6b59c35a5e80b6eaee1eee6 upstream.
Currently, if we use perf kvm --guestkallsyms --guestmodules report, we
can not get the perf information from perf data file. All sample are
shown as unknown.
Reproducing steps:
# perf kvm --guestkallsyms /tmp/kallsyms --guestmodules /tmp/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.624 MB perf.data.guest (~27260 samples) ]
# perf kvm --guestkallsyms /tmp/kallsyms --guestmodules /tmp/modules report |grep %
100.00% [guest/6471] [unknown] [g] 0xffffffff8164f330
This bug was introduced by
207b57926 (perf kvm: Fix regression with guest machine creation).
In original code, it uses perf_session__find_machine(), it means we deliver symbol to machine
which has the same pid, if no machine found, deliver it to *default* guest. But if we use
perf_session__findnew_machine() here, if no machine was found, new machine with pid will be built
and added. Then the default guest which with pid == 0 will never get a symbol.
And because the new machine initialized here has no kernel map created, the symbol delivered to
it will be marked as "unknown".
This patch here is to revert commit
207b57926 and fix the SEGFAULT bug in another way.
Verification steps:
# ./perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.651 MB perf.data.guest (~28437 samples) ]
# ./perf kvm --guestkallsyms /home/kallsyms --guestmodules /home/modules report |grep %
22.64% :6471 [guest.kernel.kallsyms] [g] update_rq_clock.part.70
19.99% :6471 [guest.kernel.kallsyms] [g] d_free
18.46% :6471 [guest.kernel.kallsyms] [g] bio_phys_segments
16.25% :6471 [guest.kernel.kallsyms] [g] dequeue_task
12.78% :6471 [guest.kernel.kallsyms] [g] __switch_to
7.91% :6471 [guest.kernel.kallsyms] [g] scheduler_tick
1.75% :6471 [guest.kernel.kallsyms] [g] native_apic_mem_write
0.21% :6471 [guest.kernel.kallsyms] [g] apic_timer_interrupt
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1387564907-3045-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chen-Yu Tsai [Thu, 16 Jan 2014 06:34:23 +0000 (14:34 +0800)]
pinctrl: sunxi: Honor GPIO output initial vaules
commit
fa8cf57c923e86a693a85aff1df579245a27cbb3 upstream.
Some GPIO users, such as fixed-regulator, request GPIO output with
initial value of 1. This was ignored by sunxi driver.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Warren [Thu, 23 Jan 2014 23:55:19 +0000 (15:55 -0800)]
rtc: max8907: weekday encoding fixes
commit
75ea799df4cb07e505c91b4abaa87bc28aad3e66 upstream.
The current MAX8907 driver has two issues related to weekday value
handling:
1)
The HW WEEKDAY register has range 0..6 rather than 1..7 as documented.
Note that I validated the actual HW range by observing the HW register
roll from 6->0 rather than 6->7->1 as would otherwise be expected.
This matches Linux's tm_wday range of 0..6.
When the CMOS RAM content is lost, the date returned from the device is
2007-01-01 00:00:00, which is a Monday. The WEEKDAY register reads 1 in
this case. This matches the numbering in Linux's tm_wday field.
Hence we should write Linux's tm_wday value to the register without
modifying it. Hence, remove the +1/-1 calculations for WEEKDAY/tm_wday.
2)
There's no need to make alarms match on the WEEKDAY register, since the
other fields together uniquely define the alarm date/time. Ignoring the
WEEKDAY value in the match isolates the driver from any incorrect value in
the current time copy of the WEEKDAY register.
Each change individually, or both together, solves an issue that I
observed; "hwclock -r" would time out waiting for its alarm to fire if the
CMOS RAM content had been lost, and hence the WEEKDAY register value
mismatched what the driver expected it to be. "hwclock -w" would solve
this by over-writing the HW default WEEKDAY register value with what the
driver expected.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sebastian Andrzej Siewior [Wed, 27 Nov 2013 16:43:43 +0000 (17:43 +0100)]
parport: parport_pc: remove double PCI ID for NetMos
commit
d6a484520c5572a4170fa915109ccfc0c38f5008 upstream.
In commit 85747f ("PATCH] parport: add NetMOS 9805 support") Max added
the PCI ID for NetMOS 9805 based on a Debian bug report from 2k4 which
was at the v2.4.26 time frame. The patch made into 2.6.14.
Shortly before that patch akpm merged commit
296d3c783b ("[PATCH] Support
NetMOS based PCI cards providing serial and parallel ports") which made
into v2.6.9-rc1.
Now we have two different entries for the same PCI id.
I have here the NetMos 9805 which claims to support SPP/EPP/ECP mode.
This patch takes Max's entry for titan_1284p1 (base != -1 specifies the
ioport for ECP mode) and replaces akpm's entry for netmos_9805 which
specified -1 (=none). Both share the same PCI-ID (my card has subsystem
0x1000 / 0x0020 so it should match PCI_ANY).
While here I also drop the entry for titan_1284p2 which is the same as
netmos_9815.
Cc: Maximilian Attems <maks@stro.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiko Carstens [Tue, 21 Jan 2014 16:31:10 +0000 (17:31 +0100)]
s390/uapi: fix struct statfs64 definition
commit
4e078146dff728f4865270a47710d57797e81eb6 upstream.
With
b8668fd0a7e1b59f "s390/uapi: change struct statfs[64] member types
to unsigned values" the size of a couple of struct statfs64 member got
incorrectly changed from 64 to 32 bit for 32 bit builds.
Fix this by changing the type of couple of struct statfs64 members from
unsigned long to unsigned long long.
The definition of struct compat_statfs64 was correct however.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dominik Dingel [Mon, 9 Dec 2013 17:30:01 +0000 (18:30 +0100)]
KVM: s390: ioeventfd: ignore leftmost bits
commit
ff1f3cb4b3ac5d039f02679f34cb1498d110d241 upstream.
The diagnose 500 subcode 3 contains the 32 bit subchannel id in bits 32-63
(counting from the left). As for other I/O instructions, bits 0-31 should be
ignored and thus not be passed to kvm_io_bus_write_cookie().
This fixes a bug where the guest passed non-zero bits 0-31 which the
host tried to interpret, leading to ioeventfd notification failures.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiko Carstens [Mon, 11 Nov 2013 12:56:47 +0000 (13:56 +0100)]
KVM: s390: fix diagnose code extraction
commit
743db27c526e0f31cc507959d662e97e2048a86f upstream.
The diagnose code to be used is the contents of the base register (if not
zero), plus the displacement. The current code ignores the base register
contents. So let's fix that...
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Warren [Tue, 7 Jan 2014 22:00:12 +0000 (15:00 -0700)]
serial: 8250: enable UART_BUG_NOMSR for Tegra
commit
3685f19e07802ec4207b52465c408f185b66490e upstream.
Tegra chips have 4 or 5 identical UART modules embedded. UARTs C..E have
their MODEM-control signals tied off to a static state. However UARTs A
and B can optionally route those signals to/from package pins, depending
on the exact pinmux configuration.
When these signals are not routed to package pins, false interrupts may
trigger either temporarily, or permanently, all while not showing up in
the IIR; it will read as NO_INT. This will eventually lead to the UART
IRQ being disabled due to unhandled interrupts. When this happens, the
kernel may print e.g.:
irq 68: nobody cared (try booting with the "irqpoll" option)
In order to prevent this, enable UART_BUG_NOMSR. This prevents
UART_IER_MSI from being enabled, which prevents the false interrupts
from triggering.
In practice, this is not needed under any of the following conditions:
* On Tegra chips after Tegra30, since the HW bug has apparently been
fixed.
* On UARTs C..E since their MODEM control signals are tied to the correct
static state which doesn't trigger the issue.
* On UARTs A..B if the MODEM control signals are routed out to package
pins, since they will then carry valid signals.
However, we ignore these exceptions for now, since they are only relevant
if a board actually hooks up more than a 4-wire UART, and no currently
supported board does this. If we ever support a board that does, we can
refine the algorithm that enables UART_BUG_NOMSR to take those exceptions
into account, and/or read a flag from DT/... that indicates that the
board has hooked up and pinmux'd more than a 4-wire UART.
Reported-by: Olof Johansson <olof@lixom.net> # autotester
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan Woithe [Mon, 9 Dec 2013 06:03:08 +0000 (16:33 +1030)]
serial: 8250: Fix initialisation of Quatech cards with the AMCC PCI chip
commit
9c5320f8d7d9a2cf623e65d50e1113f34d9b9eb1 upstream.
Fix the initialisation of older Quatech serial cards which are fitted with
the AMCC PCI Matchmaker interface chip.
Signed-off-by: Jonathan Woithe (jwoithe@just42.net)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yegor Yefremov [Mon, 9 Dec 2013 11:11:15 +0000 (12:11 +0100)]
serial: add support for 200 v3 series Titan card
commit
48c0247d7b7bf58abb85a39021099529df365c4d upstream.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sarah Sharp [Mon, 6 Jan 2014 21:07:03 +0000 (13:07 -0800)]
xhci: Set scatter-gather limit to avoid failed block writes.
commit
f2d9b991c549f159dc9ae81f77d8206c790cbfee upstream.
Commit
35773dac5f862cb1c82ea151eba3e2f6de51ec3e "usb: xhci: Link TRB
must not occur within a USB payload burst" attempted to fix an issue
found with USB ethernet adapters, and inadvertently broke USB storage
devices. The patch attempts to ensure that transfers never span a
segment, and rejects transfers that have more than 63 entries (or
possibly less, if some entries cross 64KB boundaries).
usb-storage limits the maximum transfer size to 120K, and we had assumed
the block layer would pass a scatter-gather list of 4K entries,
resulting in no more than 31 sglist entries:
http://marc.info/?l=linux-usb&m=
138498190419312&w=2
That assumption was wrong, since we've seen the driver reject a write
that was 218 sectors long (of probably 512 bytes each):
Jan 1 07:04:49 jidanni5 kernel: [ 559.624704] xhci_hcd 0000:00:14.0: Too many fragments 79, max 63
...
Jan 1 07:04:58 jidanni5 kernel: [ 568.622583] Write(10): 2a 00 00 06 85 0e 00 00 da 00
Limit the number of scatter-gather entries to half a ring segment. That
should be margin enough in case some entries cross 64KB boundaries.
Increase the number of TRBs per segment from 64 to 256, which should
result in ring segments fitting on a 4K page.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: jidanni@jidanni.org
References: http://bugs.debian.org/733907
Fixes:
35773dac5f86 ('usb: xhci: Link TRB must not occur within a USB payload burst')
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>