platform/kernel/linux-exynos.git
10 years agoshmem: support RENAME_EXCHANGE
Miklos Szeredi [Wed, 23 Jul 2014 13:15:34 +0000 (15:15 +0200)]
shmem: support RENAME_EXCHANGE

This is really simple in tmpfs since the VFS already takes care of
shuffling the dentries.  Just adjust nlink on parent directories and touch
c & mtimes.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoshmem: support RENAME_NOREPLACE
Miklos Szeredi [Wed, 23 Jul 2014 13:15:33 +0000 (15:15 +0200)]
shmem: support RENAME_NOREPLACE

Implement ->rename2 instead of ->rename.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agobtrfs: add RENAME_NOREPLACE
Miklos Szeredi [Wed, 23 Jul 2014 13:15:32 +0000 (15:15 +0200)]
btrfs: add RENAME_NOREPLACE

RENAME_NOREPLACE is trivial to implement for most filesystems: switch over
to ->rename2() and check for the supported flags.  The rest is done by the
VFS.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Chris Mason <clm@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agobad_inode: add ->rename2()
Miklos Szeredi [Wed, 23 Jul 2014 13:15:31 +0000 (15:15 +0200)]
bad_inode: add ->rename2()

so we return -EIO instead of -EINVAL.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agofs: call rename2 if exists
Miklos Szeredi [Wed, 23 Jul 2014 13:15:30 +0000 (15:15 +0200)]
fs: call rename2 if exists

Christoph Hellwig suggests:

1) make vfs_rename call ->rename2 if it exists instead of ->rename
2) switch all filesystems that you're adding NOREPLACE support for to
   use ->rename2
3) see how many ->rename instances we'll have left after a few
   iterations of 2.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agokernel/acct.c: fix coding style warnings and errors
Ionut Alexa [Wed, 30 Jul 2014 23:28:36 +0000 (09:28 +1000)]
kernel/acct.c: fix coding style warnings and errors

Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agodeath to mnt_pinned
Al Viro [Thu, 7 Aug 2014 13:12:31 +0000 (09:12 -0400)]
death to mnt_pinned

Rather than playing silly buggers with vfsmount refcounts, just have
acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt
and replace it with said clone.  Then attach the pin to original
vfsmount.  Voila - the clone will be alive until the file gets closed,
making sure that underlying superblock remains active, etc., and
we can drop the original vfsmount, so that it's not kept busy.
If the file lives until the final mntput of the original vfsmount,
we'll notice that there's an fs_pin (one in bsd_acct_struct that
holds that file) and mnt_pin_kill() will take it out.  Since
->kill() is synchronous, we won't proceed past that point until
these files are closed (and private clones of our vfsmount are
gone), so we get the same ordering warranties we used to get.

mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance -
it never became usable outside of kernel/acct.c (and racy wrt
umount even there).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agomake fs/{namespace,super}.c forget about acct.h
Al Viro [Wed, 21 May 2014 22:22:52 +0000 (18:22 -0400)]
make fs/{namespace,super}.c forget about acct.h

These externs belong in fs/internal.h.  Rename (they are not acct-specific
anymore) and move them over there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agotake fs_pin stuff to fs/*
Al Viro [Thu, 7 Aug 2014 12:39:04 +0000 (08:39 -0400)]
take fs_pin stuff to fs/*

Add a new field to fs_pin - kill(pin).  That's what umount and r/o remount
will be calling for all pins attached to vfsmount and superblock resp.
Called after bumping the refcount, so it won't go away under us.  Dropping
the refcount is responsibility of the instance.  All generic stuff moved to
fs/fs_pin.c; the next step will rip all the knowledge of kernel/acct.c from
fs/super.c and fs/namespace.c.  After that - death to mnt_pin(); it was
intended to be usable as generic mechanism for code that wants to attach
objects to vfsmount, so that they would not make the sucker busy and
would get killed on umount.  Never got it right; it remained acct.c-specific
all along.  Now it's very close to being killable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agostart carving bsd_acct_struct up
Al Viro [Thu, 7 Aug 2014 12:00:52 +0000 (08:00 -0400)]
start carving bsd_acct_struct up

pull generic parts into struct fs_pin.  Eventually we want those
to replace mnt_pin()/mnt_unpin() mess; that stuff will move to
fs/*.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: move mnt_pin() upwards.
Al Viro [Thu, 7 Aug 2014 11:51:29 +0000 (07:51 -0400)]
acct: move mnt_pin() upwards.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agomake acct_kill() wait for file closing.
Al Viro [Thu, 7 Aug 2014 11:35:19 +0000 (07:35 -0400)]
make acct_kill() wait for file closing.

Do actual closing of file via schedule_work().  And use
__fput_sync() there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agodrop ->s_umount around acct_auto_close()
Al Viro [Thu, 7 Aug 2014 11:32:06 +0000 (07:32 -0400)]
drop ->s_umount around acct_auto_close()

just repeat the frozen check after regaining it, and check that sb
is still alive.  If several threads hit acct_auto_close() at the
same time, acct_auto_close() will survive that just fine.  And we
really don't want to play with writes and closing the file with
->s_umount held exclusive - it's a deadlock country.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: get rid of acct_lock for acct->count
Al Viro [Thu, 7 Aug 2014 11:04:28 +0000 (07:04 -0400)]
acct: get rid of acct_lock for acct->count

* make acct->count atomic and acct freeing - rcu-delayed.
* instead of grabbing acct_lock around the places where we take a reference,
do that under rcu_read_lock() with atomic_long_inc_not_zero().
* have the new acct locked before making ns->bacct point to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: get rid of acct_list
Al Viro [Thu, 7 Aug 2014 10:23:41 +0000 (06:23 -0400)]
acct: get rid of acct_list

Put these suckers on per-vfsmount and per-superblock lists instead.
Note: right now it's still acct_lock for everything, but that's
going to change.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: simplify check_free_space()
Al Viro [Sat, 19 Apr 2014 18:24:18 +0000 (14:24 -0400)]
acct: simplify check_free_space()

a) file can't be NULL
b) file can't be changed under us
c) all writes are serialized by acct->lock; no need to mess with
spinlock there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: new lifetime rules
Al Viro [Thu, 7 Aug 2014 11:51:03 +0000 (07:51 -0400)]
acct: new lifetime rules

Do not reuse bsd_acct_struct after closing the damn thing.
Structure lifetime is controlled by refcount now.  We also
have a mutex in there, held over closing and writing (the
file is O_APPEND, so we are not losing any concurrency).

As the result, we do not need to bother with get_file()/fput()
on log write anymore.  Moreover, do_acct_process() only needs
acct itself; file and pidns are picked from it.

Killed instances are distinguished by having NULL ->ns.
Refcount is protected by acct_lock; anybody taking the
mutex needs to grab a reference first.

The things will get a lot simpler in the next commits - this
is just the minimal chunk switching to the new lifetime rules.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: serialize acct_on()
Al Viro [Thu, 15 May 2014 10:49:45 +0000 (06:49 -0400)]
acct: serialize acct_on()

brute-force - on a global mutex that isn't nested into anything.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct() should honour the limits from the very beginning
Al Viro [Wed, 7 May 2014 09:23:41 +0000 (05:23 -0400)]
acct() should honour the limits from the very beginning

We need to check free space on the first write to freshly opened log.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agosplit the slow path in acct_process() off
Al Viro [Wed, 7 May 2014 09:12:09 +0000 (05:12 -0400)]
split the slow path in acct_process() off

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoseparate namespace-independent parts of filling acct_t
Al Viro [Sun, 27 Apr 2014 03:45:53 +0000 (23:45 -0400)]
separate namespace-independent parts of filling acct_t

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: switch to __kernel_write()
Al Viro [Sat, 19 Apr 2014 18:37:20 +0000 (14:37 -0400)]
acct: switch to __kernel_write()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoacct: encode_comp_t(0) is 0, fortunately...
Al Viro [Sat, 19 Apr 2014 18:44:49 +0000 (14:44 -0400)]
acct: encode_comp_t(0) is 0, fortunately...

There was an amusing bogosity in ac_rw calculation - it tried to
do encode_comp_t(encode_comp_t(0) / 1024).  Seeing that comp_t is
a 3-bit exponent + 13-bit mantissa... it's a good thing that 0 is
represented by all-bits-clear.

The history of that one is interesting - it was introduced in
2.1.68pre1, when acct.c had been reworked and moved to separate
file.  Two months later (2.1.86) somebody has noticed that the
sucker won't compile - there was no task_struct::io_usage.
At which point the ac_io calculation had changed from
encode_comp_t(current->io_usage) to encode_comp_t(0) and the
bug in the next line (absolutely real back then, had it ever
managed to compile) become a harmless bogosity.  Looks like
nobody has ever noticed until now.

Anyway, let's bury that idiocy now that it got noticed.  17 years
is long enough...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agoMerge commit 'ccbf62d8a284cf181ac28c8e8407dd077d90dd4b' into for-next
Al Viro [Thu, 7 Aug 2014 18:07:57 +0000 (14:07 -0400)]
Merge commit 'ccbf62d8a284cf181ac28c8e8407dd077d90dd4b' into for-next

backmerge to avoid kernel/acct.c conflict

10 years agoLinux 3.16 v3.16
Linus Torvalds [Sun, 3 Aug 2014 22:25:02 +0000 (15:25 -0700)]
Linux 3.16

10 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 3 Aug 2014 16:58:20 +0000 (09:58 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two fixes in the timer area:
   - a long-standing lock inversion due to a printk
   - suspend-related hrtimer corruption in sched_clock"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
  sched_clock: Avoid corrupting hrtimer tree during suspend

10 years agoMerge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds [Sat, 2 Aug 2014 17:57:39 +0000 (10:57 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "A few fixes for ARM.  Some of these are correctness issues:
   - TLBs must be flushed after the old mappings are removed by the DMA
     mapping code, but before the new mappings are established.
   - An off-by-one entry error in the Keystone LPAE setup code.

  Fixes include:
   - ensuring that the identity mapping for LPAE does not remove the
     kernel image from the identity map.
   - preventing userspace from trapping into kgdb.
   - fixing a preemption issue in the Intel iwmmxt code.
   - fixing a build error with nommu.

  Other changes include:
   - Adding a note about which areas of memory are expected to be
     accessible while the identity mapping tables are in place"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction
  ARM: idmap: add identity mapping usage note
  ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
  ARM: fix alignment of keystone page table fixup
  ARM: 8112/1: only select ARM_PATCH_PHYS_VIRT if MMU is enabled
  ARM: 8100/1: Fix preemption disable in iwmmxt_task_enable()
  ARM: DMA: ensure that old section mappings are flushed from the TLB

10 years agoARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction
Omar Sandoval [Fri, 1 Aug 2014 17:14:06 +0000 (18:14 +0100)]
ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction

The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn)
should only be entered when a kgdb break instruction is executed
from the kernel. Otherwise, if kgdb is enabled, a userspace program
can cause the kernel to drop into the debugger by executing either
KGDB_BREAKINST or KGDB_COMPILED_BREAK.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoARM: idmap: add identity mapping usage note
Russell King [Tue, 29 Jul 2014 11:18:34 +0000 (12:18 +0100)]
ARM: idmap: add identity mapping usage note

Add a note about the usage of the identity mapping; we do not support
accesses outside of the identity map region and kernel image while a
CPU is using the identity map.  This is because the identity mapping
may overwrite vmalloc space, IO mappings, the vectors pages, etc.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 2 Aug 2014 01:01:41 +0000 (18:01 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "This contains a couple of fixes - one is the aio fix from Christoph,
  the other a fallocate() one from Eric"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: fix check for fallocate on active swapfile
  direct-io: fix AIO regression

10 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 2 Aug 2014 00:37:01 +0000 (17:37 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Peter Anvin:
 "A single fix to not invoke the espfix code on Xen PV, as it turns out
  to oops the guest when invoked after all.  This patch leaves some
  amount of dead code, in particular unnecessary initialization of the
  espfix stacks when they won't be used, but in the interest of keeping
  the patch minimal that cleanup can wait for the next cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64/entry/xen: Do not invoke espfix64 on Xen

10 years agoMerge tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 2 Aug 2014 00:16:05 +0000 (17:16 -0700)]
Merge tag 'staging-3.16-rc8' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver bugfixes from Greg KH:
 "Here are some tiny staging driver bugfixes that I've had in my tree
  for the past week that resolve some reported issues.  Nothing major at
  all, but it would be good to get them merged for 3.16-rc8 or -final"

* tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vt6655: Fix disassociated messages every 10 seconds
  staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
  staging: rtl8723au: rtw_resume(): release semaphore before exit on error
  iio:bma180: Missing check for frequency fractional part
  iio:bma180: Fix scale factors to report correct acceleration units
  iio: buffer: Fix demux table creation

10 years agoMerge tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Fri, 1 Aug 2014 19:50:05 +0000 (12:50 -0700)]
Merge tag 'dm-3.16-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
 "Fix dm bufio shrinker to properly zero-fill all fields.

  Fix race in dm cache that caused improper reporting of the number of
  dirty blocks in the cache"

* tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix race affecting dirty block count
  dm bufio: fully initialize shrinker

10 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Fri, 1 Aug 2014 19:49:02 +0000 (12:49 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM straggler SoC fix from Olof Johansson:
 "A DT bugfix for Nomadik that had an ambigouos double-inversion of a
  gpio line, and one MAINTAINER URL update that might as well go in now.

  We could hold off until the merge window, but then we'll just have to
  mark the DT fix for stable and it just seems like in total causing
  more work"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Update Tegra Git URL
  ARM: nomadik: fix up double inversion in DT

10 years agodm cache: fix race affecting dirty block count
Anssi Hannula [Fri, 1 Aug 2014 15:55:47 +0000 (11:55 -0400)]
dm cache: fix race affecting dirty block count

nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks.  This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.

People were seeing under runs due to a race on increment/decrement of
nr_dirty, see: https://lkml.org/lkml/2014/6/3/648

Fix this by using an atomic_t for nr_dirty.

Reported-by: roma1390@gmail.com
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
10 years agodm bufio: fully initialize shrinker
Greg Thelen [Thu, 31 Jul 2014 16:07:19 +0000 (09:07 -0700)]
dm bufio: fully initialize shrinker

1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to
struct shrinker assuming that all shrinkers were zero filled.  The dm
bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
in flags.  So far the only defined flags bit is SHRINKER_NUMA_AWARE.
But there are proposed patches which add other bits to shrinker.flags
(e.g. memcg awareness).

Rather than simply initializing the shrinker, this patch uses kzalloc()
when allocating the dm_bufio_client to ensure that the embedded shrinker
and any other similar structures are zeroed.

This fixes theoretical over aggressive shrinking of dm bufio objects.
If the uninitialized dm_bufio_client.shrinker.flags contains
SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
each numa node rather than just once.  This has been broken since 3.12.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.12+
10 years agotimer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
Jan Kara [Fri, 1 Aug 2014 10:20:02 +0000 (12:20 +0200)]
timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks

clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
       [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [<81080792>] task_clock_event_start+0x3a/0x3f
       [<810807a4>] task_clock_event_add+0xd/0x14
       [<8108259a>] event_sched_in+0xb6/0x17a
       [<810826a2>] group_sched_in+0x44/0x122
       [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
       [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
       [<81082bf6>] __perf_install_in_context+0x8b/0xa3
       [<8107eb8e>] remote_function+0x12/0x2a
       [<8105f5af>] smp_call_function_single+0x2d/0x53
       [<8107e17d>] task_function_call+0x30/0x36
       [<8107fb82>] perf_install_in_context+0x87/0xbb
       [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
       [<810856f9>] SyS_perf_event_open+0x17/0x19
       [<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81040873>] __task_rq_lock+0x33/0x3a
       [<8104184c>] wake_up_new_task+0x25/0xc2
       [<8102474b>] do_fork+0x15c/0x2a0
       [<810248a9>] kernel_thread+0x1a/0x1f
       [<814232a2>] rest_init+0x1a/0x10e
       [<817af949>] start_kernel+0x303/0x308
       [<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<810413dd>] try_to_wake_up+0x1d/0xd6
       [<810414cd>] default_wake_function+0xb/0xd
       [<810461f3>] __wake_up_common+0x39/0x59
       [<81046346>] __wake_up+0x29/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<81046332>] __wake_up+0x15/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105c548>] clockevents_program_event+0xe7/0xf3
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&ctx->lock);
                               lock(hrtimer_bases.lock);
  lock(&port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
 #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [<81426f69>] dump_stack+0x16/0x18
 [<81425a99>] print_circular_bug+0x18f/0x19c
 [<8104a62d>] __lock_acquire+0x9ea/0xc6d
 [<8104a942>] lock_acquire+0x92/0x101
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c60be>] serial8250_console_write+0x8c/0x10c
 [<8104af87>] ? lock_release+0x191/0x223
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
 [<8104f5d5>] console_unlock+0x1d7/0x398
 [<8104fb70>] vprintk_emit+0x3da/0x3e4
 [<81425f76>] printk+0x17/0x19
 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
 [<8105cc1c>] tick_program_event+0x1e/0x23
 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
 [<8103c49e>] __remove_hrtimer+0x5b/0x79
 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
 [<8103cb4b>] hrtimer_cancel+0xd/0x18
 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [<81080705>] task_clock_event_stop+0x20/0x64
 [<81080756>] task_clock_event_del+0xd/0xf
 [<81081350>] event_sched_out+0xab/0x11e
 [<810813e0>] group_sched_out+0x1d/0x66
 [<81081682>] ctx_sched_out+0xaf/0xbf
 [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
 [<8104416d>] ? __dequeue_entity+0x23/0x27
 [<81044505>] ? pick_next_task_fair+0xb1/0x120
 [<8142cacc>] __schedule+0x4c6/0x4cb
 [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
 [<810475b0>] ? trace_hardirqs_off+0xb/0xd
 [<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the
scheduler.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
10 years agovfs: fix check for fallocate on active swapfile
Eric Biggers [Wed, 25 Jun 2014 04:45:08 +0000 (23:45 -0500)]
vfs: fix check for fallocate on active swapfile

Fix the broken check for calling sys_fallocate() on an active swapfile,
introduced by commit 0790b31b69374ddadefe ("fs: disallow all fallocate
operation on active swapfile").

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
10 years agodirect-io: fix AIO regression
Christoph Hellwig [Wed, 30 Jul 2014 11:18:48 +0000 (07:18 -0400)]
direct-io: fix AIO regression

The direct-io.c rewrite to use the iov_iter infrastructure stopped updating
the size field in struct dio_submit, and thus rendered the check for
allowing asynchronous completions to always return false.  Fix this by
comparing it to the count of bytes in the iov_iter instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
10 years agoMerge tag 'pm+acpi-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Thu, 31 Jul 2014 23:42:10 +0000 (16:42 -0700)]
Merge tag 'pm+acpi-3.16-rc8' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "One commit that fixes a problem causing PNP devices to be associated
  with wrong ACPI device objects sometimes during device enumeration due
  to an incorrect check in a matching function.

  That problem was uncovered by the ACPI device enumeration rework in
  3.14"

* tag 'pm+acpi-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / PNP: Fix acpi_pnp_match()

10 years agoMerge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux
Linus Torvalds [Thu, 31 Jul 2014 17:02:15 +0000 (10:02 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux

Pull clock driver fix from Mike Turquette:
 "A single patch to re-enable audio which is broken on all DRA7
  SoC-based platforms.  Missed this one from the last set of fixes"

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
  clk: ti: clk-7xx: Correct ABE DPLL configuration

10 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 31 Jul 2014 17:01:34 +0000 (10:01 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This adds missing SELinux labeling to AF_ALG sockets which apparently
  causes SELinux (or at least the SELinux people) to misbehave :)"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: af_alg - properly label AF_ALG socket

10 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Thu, 31 Jul 2014 17:00:42 +0000 (10:00 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI barrier fix from James Bottomley:
 "This is a potential data corruption fix: If we get an error sending
  down a barrier, we simply ignore it meaning the barrier semantics get
  violated without anyone being any the wiser.  If the system crashes at
  this point, the filesystem potentially becomes corrupt.  Fix is to
  report errors on failed barriers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: handle flush errors properly

10 years agoclk: ti: clk-7xx: Correct ABE DPLL configuration
Peter Ujfalusi [Wed, 2 Apr 2014 13:48:45 +0000 (16:48 +0300)]
clk: ti: clk-7xx: Correct ABE DPLL configuration

ABE DPLL frequency need to be lowered from 361267200
to 180633600 to facilitate the ATL requironments.
The dpll_abe_m2x2_ck clock need to be set to double
of ABE DPLL rate in order to have correct clocks
for audio.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
10 years agocrypto: af_alg - properly label AF_ALG socket
Milan Broz [Tue, 29 Jul 2014 18:41:09 +0000 (18:41 +0000)]
crypto: af_alg - properly label AF_ALG socket

Th AF_ALG socket was missing a security label (e.g. SELinux)
which means that socket was in "unlabeled" state.

This was recently demonstrated in the cryptsetup package
(cryptsetup v1.6.5 and later.)
See https://bugzilla.redhat.com/show_bug.cgi?id=1115120

This patch clones the sock's label from the parent sock
and resolves the issue (similar to AF_BLUETOOTH protocol family).

Cc: stable@vger.kernel.org
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
10 years agokexec: fix build error when hugetlbfs is disabled
David Rientjes [Thu, 31 Jul 2014 02:05:55 +0000 (19:05 -0700)]
kexec: fix build error when hugetlbfs is disabled

free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no
need to filter PageHuge() page is such a configuration either, so avoid
exporting the symbol to fix a build error:

   In file included from kernel/kexec.c:14:0:
   kernel/kexec.c: In function 'crash_save_vmcoreinfo_init':
   kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function)
     VMCOREINFO_SYMBOL(free_huge_page);
                       ^

Introduced by commit 8f1d26d0e59b ("kexec: export free_huge_page to
VMCOREINFO")

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoMerge branch 'akpm' (patches from Andrew Morton)
Linus Torvalds [Thu, 31 Jul 2014 00:16:36 +0000 (17:16 -0700)]
Merge branch 'akpm' (patches from Andrew Morton)

Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  Josh has moved
  kexec: export free_huge_page to VMCOREINFO
  mm: fix filemap.c pagecache_get_page() kernel-doc warnings
  mm: debugfs: move rounddown_pow_of_two() out from do_fault path
  memcg: oom_notify use-after-free fix
  hwpoison: call action_result() in failure path of hwpoison_user_mappings()
  hwpoison: fix hugetlbfs/thp precheck in hwpoison_user_mappings()
  rapidio/tsi721_dma: fix failure to obtain transaction descriptor
  mm, thp: do not allow thp faults to avoid cpuset restrictions
  mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()

10 years agoJosh has moved
Josh Triplett [Wed, 30 Jul 2014 23:08:42 +0000 (16:08 -0700)]
Josh has moved

My IBM email addresses haven't worked for years; also map some
old-but-functional forwarding addresses to my canonical address.

Update my GPG key fingerprint; I moved to 4096R a long time ago.

Update description.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agokexec: export free_huge_page to VMCOREINFO
Atsushi Kumagai [Wed, 30 Jul 2014 23:08:39 +0000 (16:08 -0700)]
kexec: export free_huge_page to VMCOREINFO

PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe1
("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need
another symbol to filter *hugetlbfs* pages.

If a user hope to filter user pages, makedumpfile tries to exclude them by
checking the condition whether the page is anonymous, but hugetlbfs pages
aren't anonymous while they also be user pages.

We know it's possible to detect them in the same way as PageHuge(),
so we need the start address of free_huge_page():

    int PageHuge(struct page *page)
    {
            if (!PageCompound(page))
                    return 0;

            page = compound_head(page);
            return get_compound_page_dtor(page) == free_huge_page;
    }

For that reason, this patch changes free_huge_page() into public
to export it to VMCOREINFO.

Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agomm: fix filemap.c pagecache_get_page() kernel-doc warnings
Randy Dunlap [Wed, 30 Jul 2014 23:08:37 +0000 (16:08 -0700)]
mm: fix filemap.c pagecache_get_page() kernel-doc warnings

Fix kernel-doc warnings in mm/filemap.c: pagecache_get_page():

  Warning(..//mm/filemap.c:1054): No description found for parameter 'cache_gfp_mask'
  Warning(..//mm/filemap.c:1054): No description found for parameter 'radix_gfp_mask'
  Warning(..//mm/filemap.c:1054): Excess function parameter 'gfp_mask' description in 'pagecache_get_page'

Fixes: 2457aec63745 ("mm: non-atomically mark page accessed during page cache allocation where possible")

[mgorman@suse.de: change everything]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agomm: debugfs: move rounddown_pow_of_two() out from do_fault path
Andrey Ryabinin [Wed, 30 Jul 2014 23:08:35 +0000 (16:08 -0700)]
mm: debugfs: move rounddown_pow_of_two() out from do_fault path

do_fault_around() expects fault_around_bytes rounded down to nearest page
order.  Instead of calling rounddown_pow_of_two every time in
fault_around_pages()/fault_around_mask() we could do round down when user
changes fault_around_bytes via debugfs interface.

This also fixes bug when user set fault_around_bytes to 0.  Result of
rounddown_pow_of_two(0) is not defined, therefore fault_around_bytes == 0
doesn't work without this patch.

Let's set fault_around_bytes to PAGE_SIZE if user sets to something less
than PAGE_SIZE

[akpm@linux-foundation.org: tweak code layout]
Fixes: a9b0f861("mm: nominate faultaround area in bytes rather than page order")
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agomemcg: oom_notify use-after-free fix
Michal Hocko [Wed, 30 Jul 2014 23:08:33 +0000 (16:08 -0700)]
memcg: oom_notify use-after-free fix

Paul Furtado has reported the following GPF:

  general protection fault: 0000 [#1] SMP
  Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront
  CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1
  task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000
  RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240
  RSP: e02b:ffff8801d2ec7d48  EFLAGS: 00010283
  RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e
  RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200
  RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800
  R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30
  FS:  00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000
  CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660
  Call Trace:
    pagefault_out_of_memory+0x18/0x90
    mm_fault_error+0xa9/0x1a0
    __do_page_fault+0x478/0x4c0
    do_page_fault+0x2c/0x40
    page_fault+0x28/0x30
  Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75
  RIP  mem_cgroup_oom_synchronize+0x140/0x240

Commit fb2a6fc56be6 ("mm: memcg: rework and document OOM waiting and
wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock
assuming it is protected by the hierarchical OOM-lock.

Although this is true for the notification part the protection doesn't
cover unregistration of event which can happen in parallel now so
mem_cgroup_oom_notify can see already unlinked and/or freed
mem_cgroup_eventfd_list.

Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=80881

Fixes: fb2a6fc56be6 (mm: memcg: rework and document OOM waiting and wakeup)
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Paul Furtado <paulfurtado91@gmail.com>
Tested-by: Paul Furtado <paulfurtado91@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agohwpoison: call action_result() in failure path of hwpoison_user_mappings()
Naoya Horiguchi [Wed, 30 Jul 2014 23:08:30 +0000 (16:08 -0700)]
hwpoison: call action_result() in failure path of hwpoison_user_mappings()

hwpoison_user_mappings() could fail for various reasons, so printk()s to
print out the reasons should be done in each failure check inside
hwpoison_user_mappings().

And currently we don't call action_result() when hwpoison_user_mappings()
fails, which is not consistent with other exit points of memory error
handler.  So this patch fixes these messaging problems.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agohwpoison: fix hugetlbfs/thp precheck in hwpoison_user_mappings()
Naoya Horiguchi [Wed, 30 Jul 2014 23:08:28 +0000 (16:08 -0700)]
hwpoison: fix hugetlbfs/thp precheck in hwpoison_user_mappings()

A recent fix from Chen Yucong, commit 0bc1f8b0682c ("hwpoison: fix the
handling path of the victimized page frame that belong to non-LRU")
rejects going into unmapping operation for hugetlbfs/thp pages, which
results in failing error containing on such pages.  This patch fixes it.

With this patch, hwpoison functional tests in mce-test testsuite pass.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agorapidio/tsi721_dma: fix failure to obtain transaction descriptor
Alexandre Bounine [Wed, 30 Jul 2014 23:08:26 +0000 (16:08 -0700)]
rapidio/tsi721_dma: fix failure to obtain transaction descriptor

This is a bug fix for the situation when function tsi721_desc_get() fails
to obtain a free transaction descriptor.

The bug usually results in a memory access crash dump when data transfer
scatter-gather list has more entries than size of hardware buffer
descriptors ring.  This fix ensures that error is properly returned to a
caller instead of an invalid entry.

This patch is applicable to kernel versions starting from v3.5.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agomm, thp: do not allow thp faults to avoid cpuset restrictions
David Rientjes [Wed, 30 Jul 2014 23:08:24 +0000 (16:08 -0700)]
mm, thp: do not allow thp faults to avoid cpuset restrictions

The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET
should be set in allocflags.  ALLOC_CPUSET controls if a page allocation
should be restricted only to the set of allowed cpuset mems.

Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent
the fault path from using memory compaction or direct reclaim.  Thus, it
is unfairly able to allocate outside of its cpuset mems restriction as a
side-effect.

This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is
truly GFP_ATOMIC by verifying it is also not a thp allocation.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agomm/page-writeback.c: fix divide by zero in bdi_dirty_limits()
Maxim Patlasov [Wed, 30 Jul 2014 23:08:21 +0000 (16:08 -0700)]
mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()

Under memory pressure, it is possible for dirty_thresh, calculated by
global_dirty_limits() in balance_dirty_pages(), to equal zero.  Then, if
strictlimit is true, bdi_dirty_limits() tries to resolve the proportion:

  bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh

by dividing by zero.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoMAINTAINERS: Update Tegra Git URL
Andreas Färber [Mon, 28 Jul 2014 18:06:26 +0000 (12:06 -0600)]
MAINTAINERS: Update Tegra Git URL

swarren/linux-tegra.git is a stale location; it has moved to
tegra/linux.git.

While the git protocol re-directs to the new location, HTTP does not.
Besides, MAINTAINERS should contain the canonical URL.

Signed-off-by: Andreas Färber <afaerber@suse.de>
[swarren, updated commit message]
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
10 years agoARM: nomadik: fix up double inversion in DT
Linus Walleij [Fri, 25 Jul 2014 10:18:42 +0000 (12:18 +0200)]
ARM: nomadik: fix up double inversion in DT

The GPIO pin connected to card detect was inverted twice: once by
the argument to the GPIO line itself where it was magically marked
as active low by the flag GPIO_ACTIVE_LOW (0x01) in the third cell,
and also marked active low AGAIN by explicitly stating
"cd-inverted" (a deprecated method).

After commit 78f87df2b4f8760954d7d80603d0cfcbd4759683
"mmc: mmci: Use the common mmc DT parser" this results in the
line being inverted twice so it was effectively uninverted, while
the old code would not have this effect, instead disregarding the
flag on the GPIO line altogether, which is a bug. I admit the
semantics may be unclear but inverting twice is as good a
definition as any on how this should work.

So fix up the buggy device tree. Use proper #includes so the DTS
is clear and readable.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
10 years agoMerge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Linus Torvalds [Wed, 30 Jul 2014 16:01:04 +0000 (09:01 -0700)]
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux

Pull Exynos platform DT fix from Grant Likely:
 "Device tree Exynos bug fix for v3.16-rc7

  This bug fix has been brewing for a while.  I hate sending it to you
  so late, but I only got confirmation that it solves the problem this
  past weekend.  The diff looks big for a bug fix, but the majority of
  it is only executed in the Exynos quirk case.  Unfortunately it
  required splitting early_init_dt_scan() in two and adding quirk
  handling in the middle of it on ARM.

  Exynos has buggy firmware that puts bad data into the memory node.
  Commit 1c2f87c22566 ("ARM: Get rid of meminfo") exposed the bug by
  dropping the artificial upper bound on the number of memory banks that
  can be added.  Exynos fails to boot after that commit.  This branch
  fixes it by splitting the early DT parse function and inserting a
  fixup hook.  Exynos uses the hook to correct the DT before parsing
  memory regions"

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  arm: Add devicetree fixup machine function
  of: Add memory limiting function for flattened devicetrees
  of: Split early_init_dt_scan into two parts

10 years agoMerge tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Wed, 30 Jul 2014 16:00:20 +0000 (09:00 -0700)]
Merge tag 'stable/for-linus-3.16-rc7-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen fix from David Vrabel:
 "Fix BUG when trying to expand the grant table.  This seems to occur
  often during boot with Ubuntu 14.04 PV guests"

* tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: safely map and unmap grant frames when in atomic context

10 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Wed, 30 Jul 2014 15:59:15 +0000 (08:59 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fix from Paolo Bonzini:
 "Fix a bug which allows KVM guests to bring down the entire system on
  some 64K enabled ARM64 hosts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

10 years agoRevert "cdc_subset: deal with a device that needs reset for timeout"
Linus Torvalds [Wed, 30 Jul 2014 15:56:23 +0000 (08:56 -0700)]
Revert "cdc_subset: deal with a device that needs reset for timeout"

This reverts commit 20fbe3ae990fd54fc7d1f889d61958bc8b38f254.

As reported by Stephen Rothwell, it causes compile failures in certain
configurations:

  drivers/net/usb/cdc_subset.c:360:15: error: 'dummy_prereset' undeclared here (not in a function)
    .pre_reset = dummy_prereset,
                 ^
  drivers/net/usb/cdc_subset.c:361:16: error: 'dummy_postreset' undeclared here (not in a function)
    .post_reset = dummy_postreset,
                  ^

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David Miller <davem@davemloft.net>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 30 Jul 2014 15:54:17 +0000 (08:54 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Make fragmentation IDs less predictable, from Eric Dumazet.

 2) TSO tunneling can crash in bnx2x driver, fix from Dmitry Kravkov.

 3) Don't allow NULL msg->msg_name just because msg->msg_namelen is
    non-zero, from Andrey Ryabinin.

 4) ndm->ndm_type set using wrong macros, from Jun Zhao.

 5) cdc-ether devices can come up with entries in their address filter,
    so explicitly clear the filter after the device initializes.  From
    Oliver Neukum.

 6) Forgotten refcount bump in xfrm_lookup(), from Steffen Klassert.

 7) Short packets not padded properly, exposing random data, in bcmgenet
    driver.  Fix from Florian Fainelli.

 8) xgbe_probe() doesn't return an error code, but rather zero, when
    netif_set_real_num_tx_queues() fails.  Fix from Wei Yongjun.

 9) USB speed not probed properly in r8152 driver, from Hayes Wang.

10) Transmit logic choosing the outgoing port in the sunvnet driver
    needs to consider a) is the port actually up and b) whether it is a
    switch port.  Fix from David L Stevens.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  net: phy: re-apply PHY fixups during phy_register_device
  cdc-ether: clean packet filter upon probe
  cdc_subset: deal with a device that needs reset for timeout
  net: sendmsg: fix NULL pointer dereference
  isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
  ip: make IP identifiers less predictable
  neighbour : fix ndm_type type error issue
  sunvnet: only use connected ports when sending
  can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource()
  bnx2x: fix crash during TSO tunneling
  r8152: fix the checking of the usb speed
  net: phy: Ensure the MDIO bus module is held
  net: phy: Set the driver when registering an MDIO bus device
  bnx2x: fix set_setting for some PHYs
  hyperv: Fix error return code in netvsc_init_buf()
  amd-xgbe: Fix error return code in xgbe_probe()
  ath9k: fix aggregation session lockup
  net: bcmgenet: correctly pad short packets
  net: sctp: inherit auth_capable on INIT collisions
  mac80211: fix crash on getting sta info with uninitialized rate control
  ...

10 years agox86/xen: safely map and unmap grant frames when in atomic context
David Vrabel [Fri, 11 Jul 2014 15:42:34 +0000 (16:42 +0100)]
x86/xen: safely map and unmap grant frames when in atomic context

arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
atomic context but were calling alloc_vm_area() which might sleep.

Also, if a driver attempts to allocate a grant ref from an interrupt
and the table needs expanding, then the CPU may already by in lazy MMU
mode and apply_to_page_range() will BUG when it tries to re-enable
lazy MMU mode.

These two functions are only used in PV guests.

Introduce arch_gnttab_init() to allocates the virtual address space in
advance.

Avoid the use of apply_to_page_range() by using saving and using the
array of PTE addresses from the alloc_vm_area() call (which ensures
that the required page tables are pre-allocated).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agokvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
Will Deacon [Fri, 25 Jul 2014 15:29:12 +0000 (16:29 +0100)]
kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.

As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.

SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c020000 in
the above scenario.

This patch fixes the problem by failing the vgic probe if the physical
base address or the size of GICV aren't page-aligned. Note that this
generated a warning in dmesg about freeing enabled IRQs, so I had to
move the IRQ enabling later in the probe.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joel Schopp <joel.schopp@amd.com>
Cc: Don Dutile <ddutile@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
10 years agoarm: Add devicetree fixup machine function
Laura Abbott [Tue, 15 Jul 2014 17:03:36 +0000 (10:03 -0700)]
arm: Add devicetree fixup machine function

Commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76
(ARM: 8025/1: Get rid of meminfo) dropped the upper bound on
the number of memory banks that can be added as there was no
technical need in the kernel. It turns out though, some bootloaders
(specifically the arndale-octa exynos boards) may pass invalid memory
information and rely on the kernel to not parse this data. This is a
bug in the bootloader but we still need to work around this.
Work around this by introducing a dt_fixup function. This function
gets called before the flattened devicetree is scanned for memory
and the like. In this fixup function for exynos, limit the maximum
number of memory regions in the devicetree.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: Added a comment and fixed up function name]
Signed-off-by: Grant Likely <grant.likely@linaro.org>
10 years agoof: Add memory limiting function for flattened devicetrees
Laura Abbott [Tue, 15 Jul 2014 17:03:35 +0000 (10:03 -0700)]
of: Add memory limiting function for flattened devicetrees

Buggy bootloaders may pass bogus memory entries in the devicetree.
Add of_fdt_limit_memory to add an upper bound on the number of
entries that can be present in the devicetree.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
10 years agoof: Split early_init_dt_scan into two parts
Laura Abbott [Tue, 15 Jul 2014 17:03:34 +0000 (10:03 -0700)]
of: Split early_init_dt_scan into two parts

Currently, early_init_dt_scan validates the header, sets the
boot params, and scans for chosen/memory all in one function.
Split this up into two separate functions (validation/setting
boot params in one, scanning in another) to allow for
additional setup between boot params and scanning the memory.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: s/early_init_dt_scan_all/early_init_dt_scan_nodes/]
Signed-off-by: Grant Likely <grant.likely@linaro.org>
10 years agoACPI / PNP: Fix acpi_pnp_match()
Rafael J. Wysocki [Tue, 29 Jul 2014 22:23:09 +0000 (00:23 +0200)]
ACPI / PNP: Fix acpi_pnp_match()

The acpi_pnp_match() function is used for finding the ACPI device
object that should be associated with the given PNP device.
Unfortunately, the check used by that function is not strict enough
and may cause success to be returned for a wrong ACPI device object.

To fix that, use the observation that the pointer to the ACPI
device object in question is already stored in the data field
in struct pnp_dev, so acpi_pnp_match() can simply use that
field to do its job.

This problem was uncovered in 3.14 by commit 202317a573b2 (ACPI / scan:
Add acpi_device objects for all device nodes in the namespace).

Fixes: 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace)
Reported-and-tested-by: Vinson Lee <vlee@twopensource.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
10 years agonet: phy: re-apply PHY fixups during phy_register_device
Florian Fainelli [Mon, 28 Jul 2014 23:28:07 +0000 (16:28 -0700)]
net: phy: re-apply PHY fixups during phy_register_device

Commit 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()")
moved the call to phy_scan_fixups() in phy_init_hw() after a software
reset is performed.

By the time phy_init_hw() is called in phy_device_register(), no driver
has been bound to this PHY yet, so all the checks in phy_init_hw()
against the PHY driver and the PHY driver's config_init function will
return 0. We will therefore never call phy_scan_fixups() as we should.

Fix this by calling phy_scan_fixups() and check for its return value to
restore the intended functionality.

This broke PHY drivers which do register an early PHY fixup callback to
intercept the PHY probing and do things like changing the 32-bits unique
PHY identifier when a pseudo-PHY address has been used, as well as
board-specific PHY fixups that need to be applied during driver probe
time.

Reported-by: Hauke Merthens <hauke-m@hauke-m.de>
Reported-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agocdc-ether: clean packet filter upon probe
Oliver Neukum [Mon, 28 Jul 2014 08:56:36 +0000 (10:56 +0200)]
cdc-ether: clean packet filter upon probe

There are devices that don't do reset all the way. So the packet filter should
be set to a sane initial value. Failure to do so leads to intermittent failures
of DHCP on some systems under some conditions.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agocdc_subset: deal with a device that needs reset for timeout
Oliver Neukum [Mon, 28 Jul 2014 08:12:34 +0000 (10:12 +0200)]
cdc_subset: deal with a device that needs reset for timeout

This device needs to be reset to recover from a timeout.
Unfortunately this can be handled only at the level of
the subdrivers.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: sendmsg: fix NULL pointer dereference
Andrey Ryabinin [Sat, 26 Jul 2014 17:26:58 +0000 (21:26 +0400)]
net: sendmsg: fix NULL pointer dereference

Sasha's report:
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel with the KASAN patchset, I've stumbled on the following spew:
>
> [ 4448.949424] ==================================================================
> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> [ 4448.952988] Read of size 2 by thread T19638:
> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> [ 4448.961266] Call Trace:
> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 4448.988929] ==================================================================

This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.

After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.

This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
 affect sendto as it would bail out earlier while trying to copy-in the
 address."
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.

This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoisdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
Alexey Khoroshilov [Fri, 25 Jul 2014 22:34:31 +0000 (02:34 +0400)]
isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()

There is a lack of usb_put_dev(udev) on failure path in gigaset_probe().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Tue, 29 Jul 2014 17:28:38 +0000 (10:28 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "A nice small set of bug fixes for arm-soc:

   - two incorrect register addresses in DT files on shmobile and hisilicon
   - one revert for a regression on omap
   - one bug fix for a newly introduced pin controller binding
   - one regression fix for the memory controller on omap
   - one patch to avoid a harmless WARN_ON"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: Revert enabling of twl configuration for n900
  ARM: dts: fix L2 address in Hi3620
  ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable()
  pinctrl: dra: dt-bindings: Fix pull enable/disable
  ARM: shmobile: r8a7791: Fix SD2CKCR register address
  ARM: OMAP2+: l2c: squelch warning dump on power control setting

10 years agoAFS: Correctly assemble the client UUID
David Howells [Tue, 29 Jul 2014 16:53:23 +0000 (17:53 +0100)]
AFS: Correctly assemble the client UUID

Correctly assemble the client UUID by OR'ing in the flags rather than
assigning them over the other components.

Reported-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agomm: fix page_alloc.c kernel-doc warnings
Randy Dunlap [Sun, 27 Jul 2014 21:15:33 +0000 (14:15 -0700)]
mm: fix page_alloc.c kernel-doc warnings

Fix kernel-doc warnings and function name in mm/page_alloc.c:

  Warning(..//mm/page_alloc.c:6074): No description found for parameter 'pfn'
  Warning(..//mm/page_alloc.c:6074): No description found for parameter 'mask'
  Warning(..//mm/page_alloc.c:6074): Excess function parameter 'start_bitidx' description in 'get_pfnblock_flags_mask'
  Warning(..//mm/page_alloc.c:6102): No description found for parameter 'pfn'
  Warning(..//mm/page_alloc.c:6102): No description found for parameter 'mask'
  Warning(..//mm/page_alloc.c:6102): Excess function parameter 'start_bitidx' description in 'set_pfnblock_flags_mask'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
Konstantin Khlebnikov [Fri, 25 Jul 2014 08:17:12 +0000 (09:17 +0100)]
ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout

On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2
(pmd) entries map 2MiB.

When the identity mapping is created on LPAE, the pgd pointers are copied
from the swapper_pg_dir.  If we find that we need to modify the contents
of a pmd, we allocate a new empty pmd table and insert it into the
appropriate 1GB slot, before then filling it with the identity mapping.

However, if the 1GB slot covers the kernel lowmem mappings, we obliterate
those mappings.

When replacing a PMD, first copy the old PMD contents to the new PMD, so
that we preserve the existing mappings, particularly the mappings of the
kernel itself.

[rewrote commit message and added code comment -- rmk]

Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format")
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoMerge tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux...
Arnd Bergmann [Tue, 29 Jul 2014 11:04:27 +0000 (13:04 +0200)]
Merge tag 'omap-for-v3.16/n900-regression' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

Merge "omap n900 regression fix for v3.16 rc series" from Tony Lindgren:

Minimal regression fix for n900 display that got broken with
enabling of twl4030 PM features. Turns out more work is needed
before we can enable twl4030 PM on n900.

I did not notice this earlier as I have my n900 in a rack
and the display did not get enabled for device tree based booting
until for v3.16.

* tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Revert enabling of twl configuration for n900

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
10 years agoARM: fix alignment of keystone page table fixup
Russell King [Tue, 29 Jul 2014 08:24:47 +0000 (09:24 +0100)]
ARM: fix alignment of keystone page table fixup

If init_mm.brk is not section aligned, the LPAE fixup code will miss
updating the final PMD.  Fix this by aligning map_end.

Fixes: a77e0c7b2774 ("ARM: mm: Recreate kernel mappings in early_paging_init()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoARM: dts: Revert enabling of twl configuration for n900
Tony Lindgren [Fri, 25 Jul 2014 11:41:25 +0000 (04:41 -0700)]
ARM: dts: Revert enabling of twl configuration for n900

Commit 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration
for selected omaps) allowed n900 to cut off core voltages during
off-idle. This however caused a regression where twl regulator
vaux1 was not getting enabled for the LCD panel as we are not
requesting it for the panel.

Turns out quite a few devices on n900 are using vaux1, and we need
to either stop idling it, or add proper regulator_get calls for all
users. But until we have a proper solution implemented and tested,
let's just disable the twl off-idle configuration for now for n900.

Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration for selected omaps)
Signed-off-by: Tony Lindgren <tony@atomide.com>
10 years agoip: make IP identifiers less predictable
Eric Dumazet [Sat, 26 Jul 2014 06:58:10 +0000 (08:58 +0200)]
ip: make IP identifiers less predictable

In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target > A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target > A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target > A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoneighbour : fix ndm_type type error issue
Jun Zhao [Fri, 25 Jul 2014 16:38:59 +0000 (00:38 +0800)]
neighbour : fix ndm_type type error issue

ndm_type means L3 address type, in neighbour proxy and vxlan, it's RTN_UNICAST.
NDA_DST is for netlink TLV type, hence it's not right value in this context.

Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosunvnet: only use connected ports when sending
David L Stevens [Fri, 25 Jul 2014 14:30:11 +0000 (10:30 -0400)]
sunvnet: only use connected ports when sending

The sunvnet driver doesn't check whether or not a port is connected when
transmitting packets, which results in failures if a port fails to connect
(e.g., due to a version mismatch). The original code also assumes
unnecessarily that the first port is up and a switch, even though there is
a flag for switch ports.

This patch only matches a port if it is connected, and otherwise uses the
switch_port flag to send the packet to a switch port that is up.

Signed-off-by: David L Stevens <david.stevens@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge tag 'linux-can-fixes-for-3.16-20140725' of git://gitorious.org/linux-can/linux-can
David S. Miller [Tue, 29 Jul 2014 00:01:01 +0000 (17:01 -0700)]
Merge tag 'linux-can-fixes-for-3.16-20140725' of git://gitorious.org/linux-can/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2014-07-25

this is a pull request of one patch for the net tree, hoping to get into the
3.16 release.

The patch by George Cherian fixes a regression in the c_can platform driver.
When using two interfaces the regression leads to a non function second
interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agox86_64/entry/xen: Do not invoke espfix64 on Xen
Andy Lutomirski [Wed, 23 Jul 2014 15:34:11 +0000 (08:34 -0700)]
x86_64/entry/xen: Do not invoke espfix64 on Xen

This moves the espfix64 logic into native_iret.  To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.

This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).

[ hpa: this is a nonzero cost on native, but probably not enough to
  measure. Xen needs to fix this in their own code, probably doing
  something equivalent to espfix64. ]

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
10 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Mon, 28 Jul 2014 18:35:30 +0000 (11:35 -0700)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull ARM AES crypto fixes from Herbert Xu:
 "This push fixes a regression on ARM where odd-sized blocks supplied to
  AES may cause crashes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arm-aes - fix encryption of unaligned data
  crypto: arm64-aes - fix encryption of unaligned data

10 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Linus Torvalds [Mon, 28 Jul 2014 18:34:31 +0000 (11:34 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull powerpc fixes from Ben Herrenschmidt:
 "Here are 3 more small powerpc fixes that should still go into .16.

  One is a recent regression (MMCR2 business), the other is a trivial
  endian fix without which FW updates won't work on LE in IBM machines,
  and the 3rd one turns a BUG_ON into a WARN_ON which is definitely a
  LOT more friendly especially when the whole thing is about retrieving
  error logs ..."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix endianness of flash_block_list in rtas_flash
  powerpc/powernv: Change BUG_ON to WARN_ON in elog code
  powerpc/perf: Fix MMCR2 handling for EBB

10 years agocrypto: arm-aes - fix encryption of unaligned data
Mikulas Patocka [Fri, 25 Jul 2014 23:42:30 +0000 (19:42 -0400)]
crypto: arm-aes - fix encryption of unaligned data

Fix the same alignment bug as in arm64 - we need to pass residue
unprocessed bytes as the last argument to blkcipher_walk_done.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 3.13+
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
10 years agocrypto: arm64-aes - fix encryption of unaligned data
Mikulas Patocka [Fri, 25 Jul 2014 23:40:20 +0000 (19:40 -0400)]
crypto: arm64-aes - fix encryption of unaligned data

cryptsetup fails on arm64 when using kernel encryption via AF_ALG socket.
See https://bugzilla.redhat.com/show_bug.cgi?id=1122937

The bug is caused by incorrect handling of unaligned data in
arch/arm64/crypto/aes-glue.c. Cryptsetup creates a buffer that is aligned
on 8 bytes, but not on 16 bytes. It opens AF_ALG socket and uses the
socket to encrypt data in the buffer. The arm64 crypto accelerator causes
data corruption or crashes in the scatterwalk_pagedone.

This patch fixes the bug by passing the residue bytes that were not
processed as the last parameter to blkcipher_walk_done.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
10 years agopowerpc: Fix endianness of flash_block_list in rtas_flash
Thomas Falcon [Fri, 25 Jul 2014 17:47:42 +0000 (12:47 -0500)]
powerpc: Fix endianness of flash_block_list in rtas_flash

The function rtas_flash_firmware passes the address of a data structure,
flash_block_list, when making the update-flash-64-and-reboot rtas call.
While the endianness of the address is handled correctly, the endianness
of the data is not.  This patch ensures that the data in flash_block_list
is big endian when passed to rtas on little endian hosts.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agopowerpc/powernv: Change BUG_ON to WARN_ON in elog code
Vasant Hegde [Wed, 23 Jul 2014 09:22:39 +0000 (14:52 +0530)]
powerpc/powernv: Change BUG_ON to WARN_ON in elog code

We can continue to read the error log (up to MAX size) even if
we get the elog size more than MAX size. Hence change BUG_ON to
WARN_ON.

Also updated error message.

Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
10 years agoLinux 3.16-rc7 v3.16-rc7
Linus Torvalds [Sun, 27 Jul 2014 19:41:55 +0000 (12:41 -0700)]
Linux 3.16-rc7

10 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jul 2014 16:57:16 +0000 (09:57 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A bunch of fixes for perf and kprobes:
   - revert a commit that caused a perf group regression
   - silence dmesg spam
   - fix kprobe probing errors on ia64 and ppc64
   - filter kprobe faults from userspace
   - lockdep fix for perf exit path
   - prevent perf #GP in KVM guest
   - correct perf event and filters"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64
  kprobes/x86: Don't try to resolve kprobe faults from userspace
  perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
  perf/x86/intel: Protect LBR and extra_regs against KVM lying
  perf: Fix lockdep warning on process exit
  perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings
  perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  perf: Revert ("perf: Always destroy groups on exit")

10 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 27 Jul 2014 16:53:01 +0000 (09:53 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
 "A couple of crash fixes, plus a fix that on 32 bits would cause a
  missing -ENOSYS for nonexistent system calls"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Fix cache topology for early P4-SMT
  x86_32, entry: Store badsys error code in %eax
  x86, MCE: Robustify mcheck_init_device

10 years agoMerge branch 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs
Linus Torvalds [Sun, 27 Jul 2014 16:43:41 +0000 (09:43 -0700)]
Merge branch 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs

Pull vfs fixes from Christoph Hellwig:
 "A vfsmount leak fix, and a compile warning fix"

* 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs:
  fs: umount on symlink leaks mnt count
  direct-io: fix uninitialized warning in do_direct_IO()

10 years agoMerge tag 'firewire-fix-vt6315' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 27 Jul 2014 16:42:06 +0000 (09:42 -0700)]
Merge tag 'firewire-fix-vt6315' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire regression fix from Stefan Richter:
 "IEEE 1394 (FireWire) subsystem fix: MSI don't work on VIA PCIe
  controllers with some isochronous workloads (regression since
  v3.16-rc1)"

* tag 'firewire-fix-vt6315' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: ohci: disable MSI for VIA VT6315 again

10 years agoFix gcc-4.9.0 miscompilation of load_balance() in scheduler
Linus Torvalds [Sat, 26 Jul 2014 21:52:01 +0000 (14:52 -0700)]
Fix gcc-4.9.0 miscompilation of load_balance()  in scheduler

Michel Dänzer and a couple of other people reported inexplicable random
oopses in the scheduler, and the cause turns out to be gcc mis-compiling
the load_balance() function when debugging is enabled.  The gcc bug
apparently goes back to gcc-4.5, but slight optimization changes means
that it now showed up as a problem in 4.9.0 and 4.9.1.

The instruction scheduling problem causes gcc to schedule a spill
operation to before the stack frame has been created, which in turn can
corrupt the spilled value if an interrupt comes in.  There may be other
effects of this bug too, but that's the code generation problem seen in
Michel's case.

This is fixed in current gcc HEAD, but the workaround as suggested by
Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments
when compiling the kernel, which disables the gcc code that causes the
problem.  This can result in slightly worse debug information for
variable accesses, but that is infinitely preferable to actual code
generation problems.

Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows
non-debug builds to verify that the debug build would be identical: we
can do

    export GCC_COMPARE_DEBUG=1

to make gcc internally verify that the result of the build is
independent of the "-g" flag (it will make the compiler build everything
twice, toggling the debug flag, and compare the results).

Without the "-fno-var-tracking-assignments" option, the build would fail
(even with 4.8.3 that didn't show the actual stack frame bug) with a gcc
compare failure.

See also gcc bugzilla:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801

Reported-by: Michel Dänzer <michel@daenzer.net>
Suggested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agomm: fix direct reclaim writeback regression
Hugh Dickins [Sat, 26 Jul 2014 19:58:23 +0000 (12:58 -0700)]
mm: fix direct reclaim writeback regression

Shortly before 3.16-rc1, Dave Jones reported:

  WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
           xfs_vm_writepage+0x5ce/0x630 [xfs]()
  CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
  Call Trace:
    xfs_vm_writepage+0x5ce/0x630 [xfs]
    shrink_page_list+0x8f9/0xb90
    shrink_inactive_list+0x253/0x510
    shrink_lruvec+0x563/0x6c0
    shrink_zone+0x3b/0x100
    shrink_zones+0x1f1/0x3c0
    try_to_free_pages+0x164/0x380
    __alloc_pages_nodemask+0x822/0xc90
    alloc_pages_vma+0xaf/0x1c0
    handle_mm_fault+0xa31/0xc50
  etc.

 970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
 971                   PF_MEMALLOC))

I did not respond at the time, because a glance at the PageDirty block
in shrink_page_list() quickly shows that this is impossible: we don't do
writeback on file pages (other than tmpfs) from direct reclaim nowadays.
Dave was hallucinating, but it would have been disrespectful to say so.

However, my own /var/log/messages now shows similar complaints

  WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
  WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()

from stressing some mmotm trees during July.

Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
to mapping->a_ops->writepage()?

Yes, 3.16-rc1's commit 68711a746345 ("mm, migration: add destination
page freeing callback") has provided such a way to compaction: if
migrating a SwapBacked page fails, its newpage may be put back on the
list for later use with PageSwapBacked still set, and nothing will clear
it.

Whether that can do anything worse than issue WARN_ON_ONCEs, and get
some statistics wrong, is unclear: easier to fix than to think through
the consequences.

Fixing it here, before the put_new_page(), addresses the bug directly,
but is probably the worst place to fix it.  Page migration is doing too
many parts of the job on too many levels: fixing it in
move_to_new_page() to complement its SetPageSwapBacked would be
preferable, except why is it (and newpage->mapping and newpage->index)
done there, rather than down in migrate_page_move_mapping(), once we are
sure of success? Not a cleanup to get into right now, especially not
with memcg cleanups coming in 3.17.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>