review.tizen.org Git - platform/kernel/linux-rpi.git/log

Merge branch 'for-linus' of git://git./linux/kernel/git/mpe/linux

Pull powerpc fix from Michael Ellerman:
"One fix from Scott, he says:

  This patch fixes a crash (introduced in v3.18-rc1) in the FSL MSI driver
  when threaded IRQs are enabled"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/fsl_msi: mark the msi cascade handler IRQF_NO_THREAD

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"Misc fixes:
   - gold linker build fix
   - noxsave command line parsing fix
   - bugfix for NX setup
   - microcode resume path bug fix
   - _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious
     lockup thread"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  x86, kaslr: Handle Gold linker for finding bss/brk
  x86, mm: Set NX across entire PMD at boot
  x86, microcode: Update BSPs microcode on resume
  x86: Require exact match for 'noxsave' command line option

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Misc fixes: two NUMA fixes, two cputime fixes and an RCU/lockdep fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
  sched/cputime: Fix cpu_timer_sample_group() double accounting
  sched/numa: Avoid selecting oneself as swap target
  sched/numa: Fix out of bounds read in sched_init_numa()
  sched: Remove lockdep check in sched_move_task()

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a
  build dependencies fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
  perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
  perf: Fix corruption of sibling list with hotplug
  perf/x86: Fix embarrasing typo

Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull core fix from Ingo Molnar:
"Fix GENMASK macro shift overflow"

Nobody seems to currently use GENMASK() to fill every single last bit
(which is what overflows) in-tree, and gcc would warn about it, so we
have that going for us. But apparently there are pending changes that
want this.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bitops: Fix shift overflow in GENMASK macros

x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1

TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
_TIF_SINGLESTEP), not (1<<19).

This code is involved in Dave's trinity lockup, but I don't see why
it would cause any of the problems he's seeing, except inadvertently
by causing a different path through entry_64.S's syscall handling.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86, kaslr: Handle Gold linker for finding bss/brk

When building with the Gold linker, the .bss and .brk areas of vmlinux
are shown as consecutive instead of having the same file offset. Allow
for either state, as long as things add up correctly.

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Link: http://lkml.kernel.org/r/20141118001604.GA25045@www.outflux.net
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86, mm: Set NX across entire PMD at boot

When setting up permissions on kernel memory at boot, the end of the
PMD that was split from bss remained executable. It should be NX like
the rest. This performs a PMD alignment instead of a PAGE alignment to
get the correct span of memory.

Before:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
0xffffffff82200000-0xffffffff82c00000    10M     RW   PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000  2004K     RW       GLB NX pte
0xffffffff82df5000-0xffffffff82e00000    44K     RW       GLB x  pte
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

After:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000  1868K     RW       GLB NX pte
0xffffffff82200000-0xffffffff82e00000    12M     RW   PSE GLB NX pmd
0xffffffff82e00000-0xffffffffc0000000   978M                     pmd

[ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment.
        We really should unmap the reminder along with the holes
        caused by init,initdata etc. but thats a different issue ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86, microcode: Update BSPs microcode on resume

In the situation when we apply early microcode but do *not* apply late
microcode, we fail to update the BSP's microcode on resume because we
haven't initialized the uci->mc microcode pointer. So, in order to
alleviate that, we go and dig out the stashed microcode patch during
early boot. It is basically the same thing that is done on the APs early
during boot so do that too here.

Tested-by: alex.schnaidt@gmail.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88001
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141118094657.GA6635@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

powerpc/fsl_msi: mark the msi cascade handler IRQF_NO_THREAD

The commit 543c043cbae7 ("powerpc/fsl_msi: change the irq handler from
chained to normal") changes the msi cascade handler from chained to
normal. Since cascade handler must run in hard interrupt context, this
will cause kernel panic if we force threading of all the interrupt
handler via kernel command parameter 'threadirqs'. So mark the irq
handler IRQF_NO_THREAD explicitly.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>

Linux 3.18-rc5

Merge tag 'armsoc-for-rc5' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Another small set of fixes:

   - some DT compatible typo fixes
   - irq setup fix dealing with irq storms on orion
   - i2c quirk generalization for mvebu
   - a handful of smaller fixes for OMAP
   - a couple of added file patterns for OMAP entries in MAINTAINERS"

* tag 'armsoc-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: at91/dt: Fix sama5d3x typos
  pinctrl: dra: dt-bindings: Fix output pull up/down
  MAINTAINERS: Update entry for omap related .dts files to cover new SoCs
  MAINTAINERS: add more files under OMAP SUPPORT
  ARM: dts: AM437x-SK-EVM: Fix DCDC3 voltage
  ARM: dts: AM437x-GP-EVM: Fix DCDC3 voltage
  ARM: dts: AM43x-EPOS-EVM: Fix DCDC3 voltage
  ARM: dts: am335x-evm: Fix 5th NAND partition's name
  ARM: orion: Fix for certain sequence of request_irq can cause irq storm
  ARM: mvebu: armada xp: Generalize use of i2c quirk

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

1) Fix NULL oops in Schizo PCI controller error handler.

2) Fix race between xchg and other operations on 32-bit sparc, from
    Andreas Larsson.

3) swab*() helpers need a dummy memory input operand to show data flow
    on 64-bit sparc.

4) Fix RCU warnings due to missing irq_{enter,exit}() around
    generic_smp_call_function*() calls.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix constraints on swab helpers.
  sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
  sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
  sparc64: Fix crashes in schizo_pcierr_intr_other().

Merge tag 'md/3.18-fix' of git://neil.brown.name/md

Pull md bugfix from Neil Brown:
"One fix for md for 3.18.

This fixes a regression introduced in 3.13"

* tag 'md/3.18-fix' of git://neil.brown.name/md:
md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN

ARM: at91/dt: Fix sama5d3x typos

Some DT files had a typo with a missing "5" in sama5d3x first compatible string.

Signed-off-by: Peter Rosin <peda@axentia.se>
[nicolas.ferre@atmel.com: modify commit log]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'omap-fixes-against-v3.18-rc4' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

Merge "omap fixes against v3.18-rc4" from Tony Lindgren:

Few omap fixes for hangs and wrong pinctrl defines, and update
MAINTAINERS file to avoid missing PMIC and SoC related patches:

- Fix random hangs on am437x because of incorrect default
  value for the DDR regulator

- Fix wrong partition name for NAND on am335x-evm

- Fix wrong pinctrl defines for dra7xx

- Update maintainers entries for PMICs and SoCs

* tag 'omap-fixes-against-v3.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  pinctrl: dra: dt-bindings: Fix output pull up/down
  MAINTAINERS: Update entry for omap related .dts files to cover new SoCs
  MAINTAINERS: add more files under OMAP SUPPORT
  ARM: dts: AM437x-SK-EVM: Fix DCDC3 voltage
  ARM: dts: AM437x-GP-EVM: Fix DCDC3 voltage
  ARM: dts: AM43x-EPOS-EVM: Fix DCDC3 voltage
  ARM: dts: am335x-evm: Fix 5th NAND partition's name

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'mvebu-fixes-3.18' of git://git.infradead.org/linux-mvebu into fixes

Merge "mvebu fixes for v3.18" from Jason Cooper:

- Armada XP
    - Generalize i2c quirk

- orion
    - Fix irq storm caused by specific sequence of request_irq

* tag 'mvebu-fixes-3.18' of git://git.infradead.org/linux-mvebu:
  ARM: orion: Fix for certain sequence of request_irq can cause irq storm
  ARM: mvebu: armada xp: Generalize use of i2c quirk

md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN

md_check_recovery will skip any recovery and also clear
MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set.
So when we clear _FROZEN, we must set _NEEDED and ensure that
md_check_recovery gets run.
Otherwise we could miss out on something that is needed.

In particular, this can make it impossible to remove a
failed device from an array is the 'recovery-needed' processing
didn't happen.
Suitable for stable kernels since 3.13.

Cc: stable@vger.kernel.org (3.13+)
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Fixes: 30b8feb730f9b9b3c5de02580897da03f59b6b16
Signed-off-by: NeilBrown <neilb@suse.de>

sparc64: Fix constraints on swab helpers.

We are reading the memory location, so we have to have a memory
constraint in there purely for the sake of showing the data flow
to the compiler.

Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of six fixes and a MAINTAINER update.

  The fixes are two multipath (one in Test Unit Ready handling for the
  path checkers and one in the section of code that sends a start unit
  after failover; both of these were perturbed by the scsi-mq update), a
  CD-ROM door locking fix that was likewise introduced by scsi-mq and
  three driver fixes for a previous code update in cxgb4i, megaraid_sas
  and bnx2fc"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  bnx2fc: fix tgt spinlock locking
  megaraid_sas: fix bug in handling return value of pci_enable_msix_range()
  cxgb4i: send abort_rpl correctly
  cxgbi: add maintainer for cxgb3i/cxgb4i
  scsi: TUR path is down after adapter gets reset with multipath
  scsi: call device handler for failed TUR command
  scsi: only re-lock door after EH on devices that were reset

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Microcode fixes, a Xen fix and a KASLR boot loading fix with certain
  memory layouts"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Fix ucode patch stashing on 32-bit
  x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU down
  x86, microcode: Fix accessing dis_ucode_ldr on 32-bit
  x86, kaslr: Prevent .bss from overlaping initrd
  x86, microcode, AMD: Fix early ucode loading on 32-bit

x86-64: make csum_partial_copy_from_user() error handling consistent

Al Viro pointed out that the x86-64 csum_partial_copy_from_user() is
somewhat confused about what it should do on errors, notably it mostly
clears the uncopied end result buffer, but misses that for the initial
alignment case.

All users should check for errors, so it's dubious whether the clearing
is even necessary, and Al also points out that we should probably clean
up the calling conventions, but regardless of any future changes to this
function, the fact that it is inconsistent is just annoying.

So make the __get_user() failure path use the same error exit as all the
other errors do.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: Require exact match for 'noxsave' command line option

We have some very similarly named command-line options:

arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);

__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:

__setup("foo", x86_foo_func...);

The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar". If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.

This makes the "noxsave" handler only return success when it finds
an *exact* match.

[ tglx: We really need to make __setup() more robust. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency

Commit d670ec13178d0 "posix-cpu-timers: Cure SMP wobbles" fixes one glibc
test case in cost of breaking another one. After that commit, calling
clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result
of Y time being smaller than X time.

Reproducer/tester can be found further below, it can be compiled and ran by:

gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread
while ./tst-cpuclock2 ; do : ; done

This reproducer, when running on a buggy kernel, will complain
about "clock_gettime difference too small".

Issue happens because on start in thread_group_cputimer() we initialize
sum_exec_runtime of cputimer with threads runtime not yet accounted and
then add the threads runtime to running cputimer again on scheduler
tick, making it's sum_exec_runtime bigger than actual threads runtime.

KOSAKI Motohiro posted a fix for this problem, but that patch was never
applied: https://lkml.org/lkml/2013/5/26/191 .

This patch takes different approach to cure the problem. It calls
update_curr() when cputimer starts, that assure we will have updated
stats of running threads and on the next schedule tick we will account
only the runtime that elapsed from cputimer start. That also assure we
have consistent state between cpu times of individual threads and cpu
time of the process consisted by those threads.

Full reproducer (tst-cpuclock2.c):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>
#include <stdint.h>
#include <inttypes.h>

/* Parameters for the Linux kernel ABI for CPU clocks.  */
#define CPUCLOCK_SCHED          2
#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
((~(clockid_t) (pid) << 3) | (clockid_t) (clock))

static pthread_barrier_t barrier;

/* Help advance the clock.  */
static void *chew_cpu(void *arg)
{
pthread_barrier_wait(&barrier);
while (1) ;

return NULL;
}

/* Don't use the glibc wrapper.  */
static int do_nanosleep(int flags, const struct timespec *req)
{
clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED);

return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL);
}

static int64_t tsdiff(const struct timespec *before, const struct timespec *after)
{
int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec;
int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec;

return after_i - before_i;
}

int main(void)
{
int result = 0;
pthread_t th;

pthread_barrier_init(&barrier, NULL, 2);

if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) {
perror("pthread_create");
return 1;
}

pthread_barrier_wait(&barrier);

/* The test.  */
struct timespec before, after, sleeptimeabs;
int64_t sleepdiff, diffabs;
const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 };

/* The relative nanosleep.  Not sure why this is needed, but its presence
   seems to make it easier to reproduce the problem.  */
if (do_nanosleep(0, &sleeptime) != 0) {
perror("clock_nanosleep");
return 1;
}

/* Get the current time.  */
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) {
perror("clock_gettime[2]");
return 1;
}

/* Compute the absolute sleep time based on the current time.  */
uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec;
sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000;
sleeptimeabs.tv_nsec = nsec % 1000000000;

/* Sleep for the computed time.  */
if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) {
perror("absolute clock_nanosleep");
return 1;
}

/* Get the time after the sleep.  */
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) {
perror("clock_gettime[3]");
return 1;
}

/* The time after sleep should always be equal to or after the absolute sleep
   time passed to clock_nanosleep.  */
sleepdiff = tsdiff(&sleeptimeabs, &after);
if (sleepdiff < 0) {
printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff);
result = 1;

printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec);
printf("After  %llu.%09llu\n", after.tv_sec, after.tv_nsec);
printf("Sleep  %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec);
}

/* The difference between the timestamps taken before and after the
   clock_nanosleep call should be equal to or more than the duration of the
   sleep.  */
diffabs = tsdiff(&before, &after);
if (diffabs < sleeptime.tv_nsec) {
printf("clock_gettime difference too small: %" PRId64 "\n", diffabs);
result = 1;
}

pthread_cancel(th);

return result;
}

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/cputime: Fix cpu_timer_sample_group() double accounting

While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:

  cpu_timer_sample_group()
    thread_group_cputimer()
      thread_group_cputime()
        times->sum_exec_runtime += task_sched_runtime();

    *sample = cputime.sum_exec_runtime + task_delta_exec();

Which would make the sample run ahead, making the sleep short.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/numa: Avoid selecting oneself as swap target

Because the whole numa task selection stuff runs with preemption
enabled (its long and expensive) we can end up migrating and selecting
oneself as a swap target. This doesn't really work out well -- we end
up trying to acquire the same lock twice for the swap migrate -- so
avoid this.

Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141110100328.GF29390@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

bitops: Fix shift overflow in GENMASK macros

On some 32 bits architectures, including x86, GENMASK(31, 0) returns 0
instead of the expected ~0UL.

This is the same on some 64 bits architectures with GENMASK_ULL(63, 0).

This is due to an overflow in the shift operand, 1 << 32 for GENMASK,
1 << 64 for GENMASK_ULL.

Reported-by: Eric Paire <eric.paire@st.com>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Maxime Coquelin <maxime.coquelin@st.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.13+
Cc: linux@rasmusvillemoes.dk
Cc: gong.chen@linux.intel.com
Cc: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Fixes: 10ef6b0dffe4 ("bitops: Introduce a more generic BITMASK macro")
Link: http://lkml.kernel.org/r/1415267659-10563-1-git-send-email-maxime.coquelin@st.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP

There were several reports that on some systems writing the SBOX0 PMU
initialization MSR would #GP at boot. This did not happen on all
systems -- my two test systems booted fine.

Writing the three initialization bits bit-by-bit seems to avoid the
problem. So add a special callback to do just that.

This replaces an earlier patch that disabled the SBOX.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-and-Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-4-git-send-email-andi@firstfloor.org
[ Fixed a whitespace error and added attribution tags that were left out inexplicably. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP

The counter register offsets for the IRP box PMU for Haswell-EP
were incorrect. The offsets actually changed over IvyBridge EP.

Fix them to the correct values. For this we need to fork the read
function from the IVB and use an own counter array.

Tested-by: patrick.lu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf: Fix corruption of sibling list with hotplug

When a CPU hotplugged out, we call perf_remove_from_context() (via
perf_event_exit_cpu()) to rip each CPU-bound event out of its PMU's cpu
context, but leave siblings grouped together. Freeing of these events is
left to the mercy of the usual refcounting.

When a CPU-bound event's refcount drops to zero we cross-call to
__perf_remove_from_context() to clean it up, detaching grouped siblings.

This works when the relevant CPU is online, but will fail if the CPU is
currently offline, and we won't detach the event from its siblings
before freeing the event, leaving the sibling list corrupt. If the
sibling list is later walked (e.g. because the CPU cam online again
before a remaining sibling's refcount drops to zero), we will walk the
now corrupted siblings list, potentially dereferencing garbage values.

Given that the events should never be scheduled again (as we removed
them from their context), we can simply detatch siblings when the CPU
goes down in the first place. If the CPU comes back online, the
redundant call to __perf_remove_from_context() is safe.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: vincent.weaver@maine.edu
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415203904-25308-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Two fixes this time, one to ensure that the kuser helper option
  depends on MMU as they aren't available for noMMU targets (and if the
  option is selected, we end up oopsing.)

  The second fix plugs a corner case with the decompressor, ensuring
  that the instruction stream can see the relocated code in every case
  on ARMv7 CPUs"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8198/1: make kuser helpers depend on MMU
  ARM: 8191/1: decompressor: ensure I-side picks up relocated code

Merge branch 'parisc-3.18-2' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
"Changes include:
   - wire up the bpf syscall
   - remove CONFIG_64BIT usage from some userspace-exported header files
   - use compat functions for msgctl, shmat, shmctl and semtimedop
     syscalls"

* 'parisc-3.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Avoid using CONFIG_64BIT in userspace exported headers
  parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
  parisc: Use BUILD_BUG() instead of undefined functions
  parisc: Wire up bpf syscall

Merge tag 'for-v3.18-rc' of git://git.infradead.org/battery-2.6

Pull power supply updates from Sebastian Reichel:
"Power supply and reset changes for the v3.18-rc:

   - misc. charger-manager fixes
   - year 2038 fix in ab8500_fg
   - fix error handling of bq2415x_charger"

* tag 'for-v3.18-rc' of git://git.infradead.org/battery-2.6:
  power: charger-manager: Fix accessing invalidated power supply after charger unbind
  power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind
  power: charger-manager: Avoid recursive thermal get_temp call
  power_supply: Add no_thermal property to prevent recursive get_temp calls
  power: bq2415x_charger: Fix memory leak on DTS parsing error
  power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle
  power: ab8500_fg.c: use 64-bit time types

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm gixes from Dave Airlie:
- exynos: infinite loop regressions fixed
- i915: one regression
- radeon: one race condition on monitor probing
- noveau: two regressions
- tegra: one vblank regression fix

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/tegra: dc: Add missing call to drm_vblank_on()
  drm/nouveau/nv50/disp: Fix modeset on G94
  drm/gk20a/fb: fix setting of large page size bit
  drm/radeon: add locking around atombios scratch space usage
  drm/i915: Fix obj->map_and_fenceable across tiling changes
  drm/exynos: fix possible infinite loop issue
  drm/exynos: g2d: fix null pointer dereference
  drm/exynos: resolve infinite loop issue on non multi-platform
  drm/exynos: resolve infinite loop issue on multi-platform

kernel: use the gnu89 standard explicitly

Sasha Levin reports:
"gcc5 changes the default standard to c11, which makes kernel build
  unhappy

  Explicitly define the kernel standard to be gnu89 which should keep
  everything working exactly like it was before gcc5"

There are multiple small issues with the new default, but the biggest
issue seems to be that the old - and very useful - GNU extension to
allow a cast in front of an initializer has gone away.

Patch updated by Kirill:
"I'm pretty sure all gcc versions you can build kernel with supports
  -std=gnu89.  cc-option is redunrant.

  We also need to adjust HOSTCFLAGS otherwise allmodconfig fails for me"

Note by Andrew Pinski:
"Yes it was reported and both problems relating to this extension has
  been added to gnu99 and gnu11.  Though there are other issues with the
  kernel dealing with extern inline have different semantics between
  gnu89 and gnu99/11"

End result: we may be able to move up to a newer stdc model eventually,
but right now the newer models have some annoying deficiencies, so the
traditional "gnu89" model ends up being the preferred one.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Singed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:

   - stable patches to fix NFSv4.x delegation reclaim error paths
   - fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
     features
   - fix a use-after-free problem with pNFS block layouts
   - fix a memory leak in the pNFS files O_DIRECT code
   - replace an intrusive and Oops-prone performance fix in the NFSv4
     atomic open code with a safer one-line version and revert the two
     original patches"

* tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  NFS: SEEK is an NFS v4.2 feature
  nfs: Fix use of uninitialized variable in nfs_getattr()
  nfs: Remove bogus assignment
  nfs: remove spurious WARN_ON_ONCE in write path
  pnfs/blocklayout: serialize GETDEVICEINFO calls
  nfs: fix pnfs direct write memory leak
  Revert "NFS: nfs4_do_open should add negative results to the dcache."
  Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
  NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input subsystem updates from Dmitry Torokhov:
"Mostly small fixups to PS/2 tochpad drivers (ALPS, Elantech,
  Synaptics) to better deal with specific hardware"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elantech - update the documentation
  Input: elantech - provide a sysfs knob for crc_enabled
  Input: elantech - report the middle button of the touchpad
  Input: alps - ignore bad data on Dell Latitudes E6440 and E7440
  Input: alps - allow up to 2 invalid packets without resetting device
  Input: alps - ignore potential bare packets when device is out of sync
  Input: elantech - fix crc_enabled for Fujitsu H730
  Input: elantech - use elantech_report_trackpoint for hardware v4 too
  Input: twl4030-pwrbutton - ensure a wakeup event is recorded.
  Input: synaptics - add min/max quirk for Lenovo T440s

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- fix EFI stub cache maintenance causing aborts during boot on certain
   platforms

- handle byte stores in __clear_user without panicking

- fix race condition in aarch64_insn_patch_text_sync() (instruction
   patching)

- Couple of type fixes

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: ARCH_PFN_OFFSET should be unsigned long
  Correct the race condition in aarch64_insn_patch_text_sync()
  arm64: __clear_user: handle exceptions on strb
  arm64: Fix data type for physical address
  arm64: efi: Fix stub cache maintenance

Merge tag 'platform-drivers-x86-v3.18-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform drivers fixlets from Darren Hart:
"Just two patches to remove hp_accel events from the keyboard bus
  stream via an i8042 filter"

* tag 'platform-drivers-x86-v3.18-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  platform: hp_accel: Add SERIO_I8042 as a dependency since it now includes i8042.h/serio.h
  platform: hp_accel: add a i8042 filter to remove HPQ6000 data from kb bus stream

Merge branch 'for-3.18-fixes' of git://git./linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
"The most notable is the revert of lock splitting optimization in ahci.
  This also made the IRQ handling threaded even when there's only one
  IRQ in use.  The conversion missed IRFQ_SHARED leading to screaming
  IRQs problem in some cases and the threaded IRQ handling showed
  performance regression in some LKP test cases.  The changes are
  reverted for now.  It'll probably be retried once threaded IRQ
  handling is removed from ahci.

  Other than that, there's one fix for ahci and several patches adding
  device IDs"

* 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ahci: fix AHCI parameters not taken into account
  ata: sata_rcar: Add r8a7793 device support
  ahci: Add Device IDs for Intel Sunrise Point PCH
  ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
  Revert "AHCI: Optimize single IRQ interrupt processing"
  Revert "AHCI: Do not acquire ata_host::lock from single IRQ handler"
  ata: sata_rcar: Disable DIPM mode for r8a7790 ES1

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:
"Four small fixes that should be merged for the current 3.18-rc series.
  This pull request contains:

   - a minor bugfix for computation of best IO priority given two
     merging requests.  From Jan Kara.

   - the final (final) merge count issue that has been plaguing
     virtio-blk.  From Ming Lei.

   - enable parallel reinit notify for blk-mq queues, to combine the
     cost of an RCU grace period across lots of devices.  From Tejun
     Heo.

   - an error handling fix for the SCSI_IOCTL_SEND_COMMAND ioctl.  From
     Tony Battersby"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: blk-merge: fix blk_recount_segments()
  scsi: Fix more error handling in SCSI_IOCTL_SEND_COMMAND
  blk-mq: make mq_queue_reinit_notify() freeze queues in parallel
  block: Fix computation of merged request priority

Merge tag 'pm+acpi-3.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"These are three regression fixes, two recent (generic power domains,
  suspend-to-idle) and one older (cpufreq), an ACPI blacklist entry for
  one more machine having problems with Windows 8 compatibility, a minor
  cpufreq driver fix (cpufreq-dt) and a fixup for new callback
  definitions (generic power domains).

  Specifics:

   - Fix a crash in the suspend-to-idle code path introduced by a recent
     commit that forgot to check a pointer against NULL before
     dereferencing it (Dmitry Eremin-Solenikov).

   - Fix a boot crash on Exynos5 introduced by a recent commit making
     that platform use generic Device Tree bindings for power domains
     which exposed a weakness in the generic power domains framework
     leading to that crash (Ulf Hansson).

   - Fix a crash during system resume on systems where cpufreq depends
     on Operation Performance Points (OPP) for functionality, but
     CONFIG_OPP is not set.  This leads the cpufreq driver registration
     to fail, but the resume code attempts to restore the pre-suspend
     cpufreq configuration (which does not exist) nevertheless and
     crashes.  From Geert Uytterhoeven.

   - Add a new ACPI blacklist entry for Dell Vostro 3546 that has
     problems if it is reported as Windows 8 compatible to the BIOS
     (Adam Lee).

   - Fix swapped arguments in an error message in the cpufreq-dt driver
     (Abhilash Kesavan).

   - Fix up the prototypes of new callbacks in struct generic_pm_domain
     to make them more useful.  Users of those callbacks will be added
     in 3.19 and it's better for them to be based on the correct struct
     definition in mainline from the start.  From Ulf Hansson and Kevin
     Hilman"

* tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Domains: Fix initial default state of the need_restore flag
  PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set
  PM / Domains: Change prototype for the attach and detach callbacks
  cpufreq: Avoid crash in resume on SMP without OPP
  cpufreq: cpufreq-dt: Fix arguments in clock failure error message
  ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546

Merge tag 'firewire-fix' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Stefan Richter:
"IEEE 1394 (FireWire) subsystem fix: The character device file
  interface for raw 1394 I/O took uninitialized kernel stack as
  substitute for missing ioctl() argument data.  This could partially
  show up in subsequent read() output"

* tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: cdev: prevent kernel stack leaking into ioctl arguments

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
"Fix for a really embarrassing braino in iov_iter. Kudos to paulus..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Fix thinko in iov_iter_single_seg_count

Merge branches 'pm-domains', 'pm-sleep' and 'pm-cpufreq'

* pm-domains:
  PM / Domains: Fix initial default state of the need_restore flag
  PM / Domains: Change prototype for the attach and detach callbacks

* pm-sleep:
  PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set

* pm-cpufreq:
  cpufreq: Avoid crash in resume on SMP without OPP
  cpufreq: cpufreq-dt: Fix arguments in clock failure error message

Merge branch 'acpi-blacklist'

* acpi-blacklist:
ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546

firewire: cdev: prevent kernel stack leaking into ioctl arguments

Found by the UC-KLEE tool:  A user could supply less input to
firewire-cdev ioctls than write- or write/read-type ioctl handlers
expect.  The handlers used data from uninitialized kernel stack then.

This could partially leak back to the user if the kernel subsequently
generated fw_cdev_event_'s (to be read from the firewire-cdev fd)
which notably would contain the _u64 closure field which many of the
ioctl argument structures contain.

The fact that the handlers would act on random garbage input is a
lesser issue since all handlers must check their input anyway.

The fix simply always null-initializes the entire ioctl argument buffer
regardless of the actual length of expected user input.  That is, a
runtime overhead of memset(..., 40) is added to each firewirew-cdev
ioctl() call.  [Comment from Clemens Ladisch:  This part of the stack is
most likely to be already in the cache.]

Remarks:
  - There was never any leak from kernel stack to the ioctl output
    buffer itself.  IOW, it was not possible to read kernel stack by a
    read-type or write/read-type ioctl alone; the leak could at most
    happen in combination with read()ing subsequent event data.
  - The actual expected minimum user input of each ioctl from
    include/uapi/linux/firewire-cdev.h is, in bytes:
    [0x00] = 32, [0x05] =  4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
    [0x01] = 36, [0x06] = 20, [0x0b] =  4, [0x10] = 20, [0x15] = 20,
    [0x02] = 20, [0x07] =  4, [0x0c] =  0, [0x11] =  0, [0x16] =  8,
    [0x03] =  4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
    [0x04] = 20, [0x09] = 24, [0x0e] =  4, [0x13] = 40, [0x18] =  4.

Reported-by: David Ramos <daramos@stanford.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio bugfix from Michael S Tsirkin:
"This fixes a crash in virtio console multi-channel mode that got
introduced in -rc1"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_console: move early VQ enablement

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) sunhme driver lacks DMA mapping error checks, based upon a report by
    Meelis Roos.

2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.

3) DMA memory allocation sizes are wrong in systemport ethernet driver,
    fix from Florian Fainelli.

4) Fix use after free in mac80211 defragmentation code, from Johannes
    Berg.

5) Some networking uapi headers missing from Kbuild file, from Stephen
    Hemminger.

6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
    and macvtap has a similar bug, from Herbert Xu.

7) Adjust several tunneling drivers to set dev->iflink after registry,
    because registry sets that to -1 overwriting whatever we did.  From
    Steffen Klassert.

8) Geneve forgets to set inner tunneling type, causing GSO segmentation
    to fail on some NICs.  From Jesse Gross.

9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
    Giuseppe CAVALLARO.

10) Fix spurious timeouts with NewReno on low traffic connections, from
    Marcelo Leitner.

11) Fix descriptor updates in enic driver, from Govindarajulu
    Varadarajan.

12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
    Fix from Takashi Iwai.

13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
    Borkmann.

14) psock_fanout selftest accesses past the end of the mmap ring, fix
    from Shuah Khan.

15) Fix PTP timestamping for VLAN packets, from Richard Cochran.

16) netlink_unbind() calls in netlink pass wrong initial argument, from
    Hiroaki SHIMODA.

17) vxlan socket reuse accidently reuses a socket when the address
    family is different, so we have to explicitly check this, from
    Marcelo Lietner.

18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
    and other architectures, from Guenter Roeck.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
  vxlan: Do not reuse sockets for a different address family
  smsc911x: power-up phydev before doing a software reset.
  lib: rhashtable - Remove weird non-ASCII characters from comments
  net/smsc911x: Fix delays in the PHY enable/disable routines
  net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
  netlink: Properly unbind in error conditions.
  net: ptp: fix time stamp matching logic for VLAN packets.
  cxgb4 : dcb open-lldp interop fixes
  selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
  net: bcmgenet: apply MII configuration in bcmgenet_open()
  net: bcmgenet: connect and disconnect from the PHY state machine
  net: qualcomm: Fix dependency
  ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
  net: phy: Correctly handle MII ioctl which changes autonegotiation.
  ipv6: fix IPV6_PKTINFO with v4 mapped
  net: sctp: fix memory leak in auth key management
  net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  net: ppp: Don't call bpf_prog_create() in ppp_lock
  net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
  cxgb4 : Fix bug in DCB app deletion
  ...

Input: elantech - update the documentation

A chapter is added to describe the trackpoint packets.

A section is added to describe the behaviour of the knob crc_enabled in
sysfs.

The introduction of the documentation only mentioned v1/v2, but in the
last part it already contains explanation of v3 and v4. The introduction
is updated.

Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: elantech - provide a sysfs knob for crc_enabled

The detection of crc_enabled is known to fail for Fujitsu H730. A DMI
blacklist is added for that, but it can be expected that other laptops will
pop up with this.

Here a sysfs knob is provided to alter the behaviour of crc_enabled.
Writing 0 or 1 to it sets the variable to 0 or 1. Reading it will show the
crc_enabled variable (0 or 1).

Reported-by: Stefan Valouch <stefan@valouch.com>
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: elantech - report the middle button of the touchpad

In the past, no elantech was known with 3 touchpad mouse buttons.
Fujitsu H730 is the first known elantech with a middle button. This commit
enables this middle button. For backwards compatibility, the Fujitsu is
detected via DMI, and only for this one 3 buttons will be announced.

Reported-by: Stefan Valouch <stefan@valouch.com>
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: alps - ignore bad data on Dell Latitudes E6440 and E7440

Sometimes on Dell Latitude laptops psmouse/alps driver receive invalid ALPS
protocol V3 packets with bit7 set in last byte. More often it can be
reproduced on Dell Latitude E6440 or E7440 with closed lid and pushing
cover above touchpad.

If bit7 in last packet byte is set then it is not valid ALPS packet. I was
told that ALPS devices never send these packets. It is not know yet who
send those packets, it could be Dell EC, bug in BIOS and also bug in
touchpad firmware...

With this patch alps driver does not process those invalid packets, but
instead of reporting PSMOUSE_BAD_DATA, getting into out of sync state,
getting back in sync with the next byte and spam dmesg we return
PSMOUSE_FULL_PACKET. If driver is truly out of sync we'll fail the checks
on the next byte and report PSMOUSE_BAD_DATA then.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'akpm' (fixes from Andrew Morton)

Merge misc fixes from Andrew Morton:
"15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  MAINTAINERS: add IIO include files
  kernel/panic.c: update comments for print_tainted
  mem-hotplug: reset node present pages when hot-adding a new pgdat
  mem-hotplug: reset node managed pages when hot-adding a new pgdat
  mm/debug-pagealloc: correct freepage accounting and order resetting
  fanotify: fix notification of groups with inode & mount marks
  mm, compaction: prevent infinite loop in compact_zone
  mm: alloc_contig_range: demote pages busy message from warn to info
  mm/slab: fix unalignment problem on Malta with EVA due to slab merge
  mm/page_alloc: restrict max order of merging on isolated pageblock
  mm/page_alloc: move freepage counting logic to __free_one_page()
  mm/page_alloc: add freepage on isolate pageblock to correct buddy list
  mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  mm/compaction: skip the range until proper target pageblock is met
  zram: avoid kunmap_atomic() of a NULL pointer

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull Ceph fixes from Sage Weil:
"There is an overflow bug fix for cephfs from Zheng, a fix for handling
  large authentication ticket buffers in libceph from Ilya, and a few
  fixes for the request handling code from Ilya that affect RBD volumes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: change from BUG to WARN for __remove_osd() asserts
  libceph: clear r_req_lru_item in __unregister_linger_request()
  libceph: unlink from o_linger_requests when clearing r_osd
  libceph: do not crash on large auth tickets
  ceph: fix flush tid comparision

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

- fix for an oops in HID core upon repeated subdriver insertion/removal
   under certain circumstances, by Benjamin Tissoires

- quirk for another Elan Touchscreen device, by Adel Gadllah

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: core: cleanup .claimed field on disconnect
  HID: usbhid: enable always-poll quirk for Elan Touchscreen 0103

MAINTAINERS: add IIO include files

Files under include/linux/iio were not reported as part of the IIO
subsystem.

Signed-off-by: Daniel Baluta <daniel.baluta@intel.com>
Reported-by: Cristina Ciocan <cristina.ciocan@intel.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/panic.c: update comments for print_tainted

Commit 69361eef9056 ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag,
but failed to update the comments for print_tainted(). So, update the
comments.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mem-hotplug: reset node present pages when hot-adding a new pgdat

When memory is hot-added, all the memory is in offline state.  So clear
all zones' present_pages because they will be updated in online_pages()
and offline_pages().  Otherwise, /proc/zoneinfo will corrupt:

When the memory of node2 is offline:

  # cat /proc/zoneinfo
  ......
  Node 2, zone   Movable
  ......
        spanned  8388608
        present  8388608
        managed  0

When we online memory on node2:

  # cat /proc/zoneinfo
  ......
  Node 2, zone   Movable
  ......
        spanned  8388608
        present  16777216
        managed  8388608

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mem-hotplug: reset node managed pages when hot-adding a new pgdat

In free_area_init_core(), zone->managed_pages is set to an approximate
value for lowmem, and will be adjusted when the bootmem allocator frees
pages into the buddy system.

But free_area_init_core() is also called by hotadd_new_pgdat() when
hot-adding memory.  As a result, zone->managed_pages of the newly added
node's pgdat is set to an approximate value in the very beginning.

Even if the memory on that node has node been onlined,
/sys/device/system/node/nodeXXX/meminfo has wrong value:

  hot-add node2 (memory not onlined)
  cat /sys/device/system/node/node2/meminfo
  Node 2 MemTotal:       33554432 kB
  Node 2 MemFree:               0 kB
  Node 2 MemUsed:        33554432 kB
  Node 2 Active:                0 kB

This patch fixes this problem by reset node managed pages to 0 after
hot-adding a new node.

1. Move reset_managed_pages_done from reset_node_managed_pages() to
   reset_all_zones_managed_pages()
2. Make reset_node_managed_pages() non-static
3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
   is initialized

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/debug-pagealloc: correct freepage accounting and order resetting

One thing I did in this patch is fixing freepage accounting.  If we
clear guard page and link it onto isolate buddy list, we should not
increase freepage count.  This patch adds conditional branch to skip
counting in this case.  Without this patch, this overcounting happens
frequently if guard order is set and CMA is used.

Another thing fixed in this patch is the target to reset order.  In
__free_one_page(), we check the buddy page whether it is a guard page or
not.  And, if so, we should clear guard attribute on the buddy page and
reset order of it to 0.  But, current code resets original page's order
rather than buddy one's.  Maybe, this doesn't have any problem, because
whole merged page's order will be re-assigned soon.  But, it is better
to correct code.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Gioh Kim <gioh.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fanotify: fix notification of groups with inode & mount marks

fsnotify() needs to merge inode and mount marks lists when notifying
groups about events so that ignore masks from inode marks are reflected
in mount mark notifications and groups are notified in proper order
(according to priorities).

Currently the sorting of the lists done by fsnotify_add_inode_mark() /
fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted
ignore masks not being used in some cases.

Fix the problem by always using the same comparison function when
sorting / merging the mark lists.

Thanks to Heinrich Schuchardt for improvements of my patch.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, compaction: prevent infinite loop in compact_zone

Several people have reported occasionally seeing processes stuck in
compact_zone(), even triggering soft lockups, in 3.18-rc2+.

Testing a revert of commit e14c720efdd7 ("mm, compaction: remember
position within pageblock in free pages scanner") fixed the issue,
although the stuck processes do not appear to involve the free scanner.

Finally, by code inspection, the bug was found in isolate_migratepages()
which uses a slightly different condition to detect if the migration and
free scanners have met, than compact_finished().  That has not been a
problem until commit e14c720efdd7 allowed the free scanner position
between individual invocations to be in the middle of a pageblock.

In a relatively rare case, the migration scanner position can end up at
the beginning of a pageblock, with the free scanner position in the
middle of the same pageblock.  If it's the migration scanner's turn,
isolate_migratepages() exits immediately (without updating the
position), while compact_finished() decides to continue compaction,
resulting in a potentially infinite loop.  The system can recover only
if another process creates enough high-order pages to make the watermark
checks in compact_finished() pass.

This patch fixes the immediate problem by bumping the migration
scanner's position to meet the free scanner in isolate_migratepages(),
when both are within the same pageblock.  This causes compact_finished()
to terminate properly.  A more robust check in compact_finished() is
planned as a cleanup for better future maintainability.

Fixes: e14c720efdd73 ("mm, compaction: remember position within pageblock in free pages scanner)
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: P. Christeas <xrg@linux.gr>
Tested-by: P. Christeas <xrg@linux.gr>
Link: http://marc.info/?l=linux-mm&m=141508604232522&w=2
Reported-by: Norbert Preining <preining@logic.at>
Tested-by: Norbert Preining <preining@logic.at>
Link: https://lkml.org/lkml/2014/11/4/904
Reported-by: Pavel Machek <pavel@ucw.cz>
Link: https://lkml.org/lkml/2014/11/7/164
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: alloc_contig_range: demote pages busy message from warn to info

Having test_pages_isolated failure message as a warning confuses users
into thinking that it is more serious than it really is. In reality, if
called via CMA, allocation will be retried so a single
test_pages_isolated failure does not prevent allocation from succeeding.

Demote the warning message to an info message and reformat it such that
the text "failed" does not appear and instead a less worrying "PFNS
busy" is used.

This message is trivially reproducible on a 10GB x86 machine on 3.16.y
kernels configured with CONFIG_DMA_CMA.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/slab: fix unalignment problem on Malta with EVA due to slab merge

Unlike SLUB, sometimes, object isn't started at the beginning of the
slab in SLAB.  This causes the unalignment problem after slab merging is
supported by commit 12220dea07f1 ("mm/slab: support slab merge").

Following is the report from Markos that fail to boot on Malta with EVA.

    Calibrating delay loop... 19.86 BogoMIPS (lpj=99328)
    pid_max: default: 32768 minimum: 301
    Mount-cache hash table entries: 4096 (order: 0, 16384 bytes)
    Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes)
    Kernel bug detected[#1]:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-05639-g12220dea07f1 #1631
    task: 1f04f5d8 ti: 1f050000 task.ti: 1f050000
    epc   : 80141190 alloc_unbound_pwq+0x234/0x304
        Not tainted
    ra    : 80141184 alloc_unbound_pwq+0x228/0x304
    Process swapper/0 (pid: 1, threadinfo=1f050000, task=1f04f5d8, tls=00000000)
    Call Trace:
      alloc_unbound_pwq+0x234/0x304
      apply_workqueue_attrs+0x11c/0x294
      __alloc_workqueue_key+0x23c/0x470
      init_workqueues+0x320/0x400
      do_one_initcall+0xe8/0x23c
      kernel_init_freeable+0x9c/0x224
      kernel_init+0x10/0x100
      ret_from_kernel_thread+0x14/0x1c
    [ end trace cb88537fdc8fa200 ]
    Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

alloc_unbound_pwq() allocates slab object from pool_workqueue.  This
kmem_cache requires 256 bytes alignment, but, current merging code
doesn't honor that, and merge it with kmalloc-256.  kmalloc-256 requires
only cacheline size alignment so that above failure occurs.  However, in
x86, kmalloc-256 is luckily aligned in 256 bytes, so the problem didn't
happen on it.

To fix this problem, this patch introduces alignment mismatch check in
find_mergeable().  This will fix the problem.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Markos Chandras <Markos.Chandras@imgtec.com>
Tested-by: Markos Chandras <Markos.Chandras@imgtec.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_alloc: restrict max order of merging on isolated pageblock

Current pageblock isolation logic could isolate each pageblock
individually.  This causes freepage accounting problem if freepage with
pageblock order on isolate pageblock is merged with other freepage on
normal pageblock.  We can prevent merging by restricting max order of
merging to pageblock order if freepage is on isolate pageblock.

A side-effect of this change is that there could be non-merged buddy
freepage even if finishing pageblock isolation, because undoing
pageblock isolation is just to move freepage from isolate buddy list to
normal buddy list rather than to consider merging.  So, the patch also
makes undoing pageblock isolation consider freepage merge.  When
un-isolation, freepage with more than pageblock order and it's buddy are
checked.  If they are on normal pageblock, instead of just moving, we
isolate the freepage and free it in order to get merged.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Heesub Shin <heesub.shin@samsung.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_alloc: move freepage counting logic to __free_one_page()

All the caller of __free_one_page() has similar freepage counting logic,
so we can move it to __free_one_page(). This reduce line of code and
help future maintenance.

This is also preparation step for "mm/page_alloc: restrict max order of
merging on isolated pageblock" which fix the freepage counting problem
on freepage with more than pageblock order.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Heesub Shin <heesub.shin@samsung.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_alloc: add freepage on isolate pageblock to correct buddy list

In free_pcppages_bulk(), we use cached migratetype of freepage to
determine type of buddy list where freepage will be added.  This
information is stored when freepage is added to pcp list, so if
isolation of pageblock of this freepage begins after storing, this
cached information could be stale.  In other words, it has original
migratetype rather than MIGRATE_ISOLATE.

There are two problems caused by this stale information.

One is that we can't keep these freepages from being allocated.
Although this pageblock is isolated, freepage will be added to normal
buddy list so that it could be allocated without any restriction.  And
the other problem is incorrect freepage accounting.  Freepages on
isolate pageblock should not be counted for number of freepage.

Following is the code snippet in free_pcppages_bulk().

    /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
    __free_one_page(page, page_to_pfn(page), zone, 0, mt);
    trace_mm_page_pcpu_drain(page, 0, mt);
    if (likely(!is_migrate_isolate_page(page))) {
        __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
        if (is_migrate_cma(mt))
            __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
    }

As you can see above snippet, current code already handle second
problem, incorrect freepage accounting, by re-fetching pageblock
migratetype through is_migrate_isolate_page(page).

But, because this re-fetched information isn't used for
__free_one_page(), first problem would not be solved.  This patch try to
solve this situation to re-fetch pageblock migratetype before
__free_one_page() and to use it for __free_one_page().

In addition to move up position of this re-fetch, this patch use
optimization technique, re-fetching migratetype only if there is isolate
pageblock.  Pageblock isolation is rare event, so we can avoid
re-fetching in common case with this optimization.

This patch also correct migratetype of the tracepoint output.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Heesub Shin <heesub.shin@samsung.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype

Before describing bugs itself, I first explain definition of freepage.

1. pages on buddy list are counted as freepage.
2. pages on isolate migratetype buddy list are *not* counted as freepage.
3. pages on cma buddy list are counted as CMA freepage, too.

Now, I describe problems and related patch.

Patch 1: There is race conditions on getting pageblock migratetype that
it results in misplacement of freepages on buddy list, incorrect
freepage count and un-availability of freepage.

Patch 2: Freepages on pcp list could have stale cached information to
determine migratetype of buddy list to go.  This causes misplacement of
freepages on buddy list and incorrect freepage count.

Patch 4: Merging between freepages on different migratetype of
pageblocks will cause freepages accouting problem.  This patch fixes it.

Without patchset [3], above problem doesn't happens on my CMA allocation
test, because CMA reserved pages aren't used at all.  So there is no
chance for above race.

With patchset [3], I did simple CMA allocation test and get below
result:

- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- run kernel build (make -j16) on background
- 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
- Result: more than 5000 freepage count are missed

With patchset [3] and this patchset, I found that no freepage count are
missed so that I conclude that problems are solved.

On my simple memory offlining test, these problems also occur on that
environment, too.

This patch (of 4):

There are two paths to reach core free function of buddy allocator,
__free_one_page(), one is free_one_page()->__free_one_page() and the
other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
Each paths has race condition causing serious problems.  At first, this
patch is focused on first type of freepath.  And then, following patch
will solve the problem in second type of freepath.

In the first type of freepath, we got migratetype of freeing page
without holding the zone lock, so it could be racy.  There are two cases
of this race.

1. pages are added to isolate buddy list after restoring orignal
    migratetype

    CPU1                                   CPU2

    get migratetype => return MIGRATE_ISOLATE
    call free_one_page() with MIGRATE_ISOLATE

                                grab the zone lock
                                unisolate pageblock
                                release the zone lock

    grab the zone lock
    call __free_one_page() with MIGRATE_ISOLATE
    freepage go into isolate buddy list,
    although pageblock is already unisolated

This may cause two problems.  One is that we can't use this page anymore
until next isolation attempt of this pageblock, because freepage is on
isolate buddy list.  The other is that freepage accouting could be wrong
due to merging between different buddy list.  Freepages on isolate buddy
list aren't counted as freepage, but ones on normal buddy list are
counted as freepage.  If merge happens, buddy freepage on normal buddy
list is inevitably moved to isolate buddy list without any consideration
of freepage accouting so it could be incorrect.

2. pages are added to normal buddy list while pageblock is isolated.
    It is similar with above case.

This also may cause two problems.  One is that we can't keep these
freepages from being allocated.  Although this pageblock is isolated,
freepage would be added to normal buddy list so that it could be
allocated without any restriction.  And the other problem is same as
case 1, that it, incorrect freepage accouting.

This race condition would be prevented by checking migratetype again
with holding the zone lock.  Because it is somewhat heavy operation and
it isn't needed in common case, we want to avoid rechecking as much as
possible.  So this patch introduce new variable, nr_isolate_pageblock in
struct zone to check if there is isolated pageblock.  With this, we can
avoid to re-check migratetype in common case and do it only if there is
isolated pageblock or migratetype is MIGRATE_ISOLATE.  This solve above
mentioned problems.

Changes from v3:
Add one more check in free_one_page() that checks whether migratetype is
MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Heesub Shin <heesub.shin@samsung.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/compaction: skip the range until proper target pageblock is met

Commit 7d49d8868336 ("mm, compaction: reduce zone checking frequency in
the migration scanner") has a side-effect that changes the iteration
range calculation. Before the change, block_end_pfn is calculated using
start_pfn, but now it blindly adds pageblock_nr_pages to the previous
value.

This causes the problem that isolation_start_pfn is larger than
block_end_pfn when we isolate the page with more than pageblock order.
In this case, isolation would fail due to an invalid range parameter.

To prevent this, this patch implements skipping the range until a proper
target pageblock is met. Without this patch, CMA with more than
pageblock order always fails but with this patch it will succeed.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

zram: avoid kunmap_atomic() of a NULL pointer

zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io. The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.

This patch fixes this issue.

Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang.kh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ARM: 8198/1: make kuser helpers depend on MMU

The kuser helpers page is not set up on non-MMU systems, so it does
not make sense to allow CONFIG_KUSER_HELPERS to be enabled when
CONFIG_MMU=n.  Allowing it to be set on !MMU results in an oops in
set_tls (used in execve and the arm_syscall trap handler):

Unhandled exception: IPSR = 00000005 LR = fffffff1
CPU: 0 PID: 1 Comm: swapper Not tainted 3.18.0-rc1-00041-ga30465a #216
task: 8b838000 ti: 8b82a000 task.ti: 8b82a000
PC is at flush_thread+0x32/0x40
LR is at flush_thread+0x21/0x40
pc : [<8f00157a>]    lr : [<8f001569>]    psr: 4100000b
sp : 8b82be20  ip : 00000000  fp : 8b83c000
r10: 00000001  r9 : 88018c84  r8 : 8bb85000
r7 : 8b838000  r6 : 00000000  r5 : 8bb77400  r4 : 8b82a000
r3 : ffff0ff0  r2 : 8b82a000  r1 : 00000000  r0 : 88020354
xPSR: 4100000b
CPU: 0 PID: 1 Comm: swapper Not tainted 3.18.0-rc1-00041-ga30465a #216
[<8f002bc1>] (unwind_backtrace) from [<8f002033>] (show_stack+0xb/0xc)
[<8f002033>] (show_stack) from [<8f00265b>] (__invalid_entry+0x4b/0x4c)

As best I can tell this issue existed for the set_tls ARM syscall
before commit fbfb872f5f41 "ARM: 8148/1: flush TLS and thumbee
register state during exec" consolidated the TLS manipulation code
into the set_tls helper function, but now that we're using it to flush
register state during execve, !MMU users encounter the oops at the
first exec.

Prevent CONFIG_MMU=n configurations from enabling
CONFIG_KUSER_HELPERS.

Fixes: fbfb872f5f41 (ARM: 8148/1: flush TLS and thumbee register state during exec)

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Reported-by: Stefan Agner <stefan@agner.ch>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 8191/1: decompressor: ensure I-side picks up relocated code

To speed up decompression, the decompressor sets up a flat, cacheable
mapping of memory. However, when there is insufficient space to hold
the page tables for this mapping, we don't bother to enable the caches
and subsequently skip all the cache maintenance hooks.

Skipping the cache maintenance before jumping to the relocated code
allows the processor to predict the branch and populate the I-cache
with stale data before the relocation loop has completed (since a
bootloader may have SCTLR.I set, which permits normal, cacheable
instruction fetches regardless of SCTLR.M).

This patch moves the cache maintenance check into the maintenance
routines themselves, allowing the v6/v7 versions to invalidate the
I-cache regardless of the MMU state.

Cc: <stable@vger.kernel.org>
Reported-by: Marc Carino <marc.ceeeee@gmail.com>
Tested-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge tag 'drm/tegra/for-3.18-rc5' of git://people.freedesktop.org/~tagr/linux into drm-fixes

drm/tegra: Fixes for v3.18-rc5

This is a single patch that fixes the VBLANK machinery after:

7ffd7a68511c drm: Always reject drm_vblank_get() after drm_vblank_off()

* tag 'drm/tegra/for-3.18-rc5' of git://people.freedesktop.org/~tagr/linux:
drm/tegra: dc: Add missing call to drm_vblank_on()

Merge branch 'linux-3.18' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

One modesetting, one gk20a fix.

* 'linux-3.18' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau/nv50/disp: Fix modeset on G94
drm/gk20a/fb: fix setting of large page size bit

Merge tag 'drm-intel-fixes-2014-11-13' of git://anongit.freedesktop.org/drm-intel into drm-fixes

one regression fix.

* tag 'drm-intel-fixes-2014-11-13' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix obj->map_and_fenceable across tiling changes

vxlan: Do not reuse sockets for a different address family

Currently, we only match against local port number in order to reuse
socket. But if this new vxlan wants an IPv6 socket and a IPv4 one bound
to that port, vxlan will reuse an IPv4 socket as IPv6 and a panic will
follow. The following steps reproduce it:

   # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \
       srcport 5000 6000 dev eth0
   # ip link add vxlan7 type vxlan id 43 group ff0e::110 \
       srcport 5000 6000 dev eth0
   # ip link set vxlan6 up
   # ip link set vxlan7 up
   <panic>

[    4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
...
[    4.188076] Call Trace:
[    4.188085]  [<ffffffff81667c4a>] ? ipv6_sock_mc_join+0x3a/0x630
[    4.188098]  [<ffffffffa05a6ad6>] vxlan_igmp_join+0x66/0xd0 [vxlan]
[    4.188113]  [<ffffffff810a3430>] process_one_work+0x220/0x710
[    4.188125]  [<ffffffff810a33c4>] ? process_one_work+0x1b4/0x710
[    4.188138]  [<ffffffff810a3a3b>] worker_thread+0x11b/0x3a0
[    4.188149]  [<ffffffff810a3920>] ? process_one_work+0x710/0x710

So address family must also match in order to reuse a socket.

Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc911x: power-up phydev before doing a software reset.

With commit be9dad1f9f26604fb ("net: phy: suspend phydev when going
to HALTED"), the PHY device will be put in a low-power mode using
BMCR_PDOWN if the the interface is set down. The smsc911x driver does
a software_reset opening the device driver (ndo_open). In such case,
the PHY must be powered-up before access to any register and before
calling the software_reset function. Otherwise, as the PHY is powered
down the software reset fails and the interface can not be enabled
again.

This patch fixes this scenario that is easy to reproduce setting down
the network interface and setting up again.

    $ ifconfig eth0 down
    $ ifconfig eth0 up
    ifconfig: SIOCSIFFLAGS: Input/output error

Signed-off-by: Enric Balletbo i Serra <eballetbo@iseebcn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lib: rhashtable - Remove weird non-ASCII characters from comments

My editor spewed garbage that looked like memory corruption on
my screen. It turns out that a number of occurences of "fi" got
turned into a ligature.

This patch replaces these ligatures with the ASCII letters "fi".

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smsc911x: Fix delays in the PHY enable/disable routines

Increased delay in the smsc911x_phy_disable_energy_detect (from 1ms to 2ms).
Dropped delays in the smsc911x_phy_enable_energy_detect (100ms and 1ms).

The patch affect SMSC LAN generation 4 chips with integrated PHY (LAN9221).

I saw problems with soft reset due to wrong udelay timings.
After I fixed udelay, I measured the time needed to bring integrated PHY
from power-down to operational mode (the time beetween clearing EDPWRDOWN
bit and soft reset complete event). I got 1ms (measured using ktime_get).
The value is equal to the current value (1ms) used in the
smsc911x_phy_disable_energy_detect. It is near the upper bound and in order
to avoid rare soft reset faults it is doubled (2ms).

I don't know official timing for bringing up integrated PHY as specs doesn't
clarify this (or may be I didn't found).

It looks safe to drop delays before and after setting EDPWRDOWN bit
(enable PHY power-down mode). I didn't saw any regressions with the patch.

The patch was reviewed by Steve Glendinning and Microchip Team.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Acked-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode

The patch affect SMSC LAN generation 4 chips with integrated PHY (LAN9221).

It is possible that PHY could enter power-down mode (ENERGYON clear),
between ENERGYON bit check in smsc911x_phy_disable_energy_detect and SRST
bit set in smsc911x_soft_reset. This could happen, for example, if someone
disconnect ethernet cable between the checks. The PHY in a power-down mode
would prevent the MAC portion of chip to be software reseted.

Initially found by code review, confirmed later using test case.

This is low probability issue, and in order to reproduce it you have to
run the script:

while true; do
ifconfig eth0 down
ifconfig eth0 up || break
done

While the script is running you have to plug/unplug ethernet cable many
times (using gpio controlled ethernet switch, for example) until get:

[ 4516.477783] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 4516.512207] smsc911x smsc911x.0: eth0: SMSC911x/921x identified at 0xce006000, IRQ: 336
[ 4516.524658] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 4516.559082] smsc911x smsc911x.0: eth0: SMSC911x/921x identified at 0xce006000, IRQ: 336
[ 4516.571990] ADDRCONF(NETDEV_UP): eth0: link is not ready
ifconfig: SIOCSIFFLAGS: Input/output error

The patch was reviewed by Steve Glendinning and Microchip Team.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Acked-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

libceph: change from BUG to WARN for __remove_osd() asserts

No reason to use BUG_ON for osd request list assertions.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: clear r_req_lru_item in __unregister_linger_request()

kick_requests() can put linger requests on the notarget list. This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: unlink from o_linger_requests when clearing r_osd

Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd.  Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().

MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: do not crash on large auth tickets

Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:

[   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
[   28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.686032] PGD 0
[   28.688088] Oops: 0000 [#1] PREEMPT SMP
[   28.688088] Modules linked in:
[   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
[   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   28.688088] Workqueue: ceph-msgr con_work
[   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
[   28.688088] RIP: 0010:[<ffffffff81392b42>]  [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
[   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
[   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
[   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
[   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
[   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
[   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
[   28.688088] Stack:
[   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
[   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
[   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
[   28.688088] Call Trace:
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220
[   28.688088]  [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180
[   28.688088]  [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30
[   28.688088]  [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0
[   28.688088]  [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0
[   28.688088]  [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60
[   28.688088]  [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0
[   28.688088]  [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360
[   28.688088]  [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0
[   28.688088]  [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80
[   28.688088]  [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0
[   28.688088]  [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80
[   28.688088]  [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0
[   28.688088]  [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0
[   28.688088]  [<ffffffff81559289>] try_read+0x1e59/0x1f10

This is because we set up crypto scatterlists as if all buffers were
kmalloc'ed.  Fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ceph: fix flush tid comparision

TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bits one
to 16 bits.

Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Fix thinko in iov_iter_single_seg_count

The branches of the if (i->type & ITER_BVEC) statement in
iov_iter_single_seg_count() are the wrong way around; if ITER_BVEC is
clear then we use i->bvec, when we should be using i->iov. This fixes
it.

In my case, the symptom that this caused was that a KVM guest doing
filesystem operations on a virtual disk would result in one of qemu's
threads on the host going into an infinite loop in
generic_perform_write(). The loop would hit the copied == 0 case and
call iov_iter_single_seg_count() to reduce the number of bytes to try
to process, but because of the error, iov_iter_single_seg_count()
would just return i->count and the loop made no progress and continued
forever.

Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor

Bruce reported that he was seeing the following BUG pop:

    BUG: sleeping function called from invalid context at mm/slab.c:2846
    in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
    2 locks held by mount.nfs/4539:
    #0:  (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
    #1:  (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f

    CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
    0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
    0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
    Call Trace:
    [<ffffffff81a534cf>] dump_stack+0x4f/0x7c
    [<ffffffff81097854>] __might_sleep+0x114/0x180
    [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280
    [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
    [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
    [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
    [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
    [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
    [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
    [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
    [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4]
    [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs]
    [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs]
    [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4]
    [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
    [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4]
    [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs]
    [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs]
    [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10
    [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs]
    [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffff811b5830>] do_mount+0x210/0xbe0
    [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160
    [<ffffffff811b651f>] SyS_mount+0x6f/0xb0
    [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17

Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
the rcu_read_lock before doing the allocation and then reacquiring it
and redoing the dereference before doing the copy. If we find that the
string has somehow grown in the meantime, we'll reallocate and try again.

Cc: <stable@vger.kernel.org> # v3.17+
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

Merge tag 'sound-3.18-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Things get calming down, now we have only a few fix patches: a trivial
  fix for memory leak in usb-audio, a patch for the new HD-audio PCI id,
  a device-specific mute-LED fix, and a slightly big patch to cover the
  missing COEF inits of various Realtek codecs"

* tag 'sound-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add mute LED control for Lenovo Ideapad Z560
  ALSA: hda/realtek - Change EAPD to verb control
  ALSA: usb-audio: Fix memory leak in FTU quirk
  ALSA: hda_intel: Add DeviceIDs for Sunrise Point-LP

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull SELinux fixlet from James Morris:
"WARN_ONCE() here will unnecessarily terrify users"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: convert WARN_ONCE() to printk() in selinux_nlmsg_perm()

Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit

Pull audit fixes from Paul Moore:
"After he sent the initial audit pull request for 3.18, Eric asked me
  to take over the management of the audit tree, hence this pull request
  to fix a couple of problems with audit.

  As you can see below, the changes are minimal: adding some whitespace
  to a string so userspace parses it correctly, and fixing a problem
  with audit's usage of fsnotify that was causing audit watch rules to
  be lost.  Neither of these patches were very controversial on the
  mailing lists and they fix real problems, getting them into 3.18 would
  be a good thing"

* 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit:
  audit: keep inode pinned
  audit: AUDIT_FEATURE_CHANGE message format missing delimiting space

Merge tag 'dm-3.18-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- stable fix for dm-thin that avoids normal IO racing with discard

- stable fix for a dm-cache related bug in dm-btree walking code that
   results from using very large fast device (eg 4T) with a very small
   cache blocksize (eg 32K) -- this is a very uncommon configuration

- a couple fixes for dm-raid (one for stable and the other addresses a
   crash in 3.18-rc1 code)

- stable fix for dm-thinp that addresses a very rare dm-bufio bug
   having to do with memory reclaimation (via shrinker) when using
   dm-thinp ontop of loopback devices

- fix a leak in dm-stripe target constructor's error path

* tag 'dm-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm btree: fix a recursion depth bug in btree walking code
  dm thin: grab a virtual cell before looking up the mapping
  dm raid: fix inaccessible superblocks causing oops in configure_discard_support
  dm raid: ensure superblock's size matches device's logical block size
  dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks
  dm stripe: fix potential for leak in stripe_ctr error path

arm64: ARCH_PFN_OFFSET should be unsigned long

pfns are unsigned long, but PHYS_PFN_OFFSET is phys_addr_t. This leads
to page_to_pfn() returning phys_addr_t which cause type mismatches in
some print statements.

Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Correct the race condition in aarch64_insn_patch_text_sync()

When experimenting with patches to provide kprobes support for aarch64
smp machines would hang when inserting breakpoints into kernel code.
The hangs were caused by a race condition in the code called by
aarch64_insn_patch_text_sync().  The first processor in the
aarch64_insn_patch_text_cb() function would patch the code while other
processors were still entering the function and incrementing the
cpu_count field.  This resulted in some processors never observing the
exit condition and exiting the function.  Thus, processors in the
system hung.

The first processor to enter the patching function performs the
patching and signals that the patching is complete with an increment
of the cpu_count field. When all the processors have incremented the
cpu_count field the cpu_count will be num_cpus_online()+1 and they
will return to normal execution.

Fixes: ae16480785de arm64: introduce interfaces to hotpatch kernel and module code
Signed-off-by: William Cohen <wcohen@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: __clear_user: handle exceptions on strb

ARM64 currently doesn't fix up faults on the single-byte (strb) case of
__clear_user... which means that we can cause a nasty kernel panic as an
ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero.
i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.)

This is a pretty obscure bug in the general case since we'll only
__do_kernel_fault (since there's no extable entry for pc) if the
mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll
always fault.

if (!down_read_trylock(&mm->mmap_sem)) {
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
retry:
down_read(&mm->mmap_sem);
} else {
/*
* The above down_read_trylock() might have succeeded in
* which
* case, we'll have missed the might_sleep() from
* down_read().
*/
might_sleep();
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
}

Fix that by adding an extable entry for the strb instruction, since it
touches user memory, similar to the other stores in __clear_user.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Reported-by: Miloš Prchlík <mprchlik@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: Fix data type for physical address

Use phys_addr_t for physical address in alloc_init_pud. Although
phys_addr_t and unsigned long are 64 bit in arm64, it is better
to use phys_addr_t to describe physical addresses.

Signed-off-by: Min-Hua Chen <orca.chen@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: efi: Fix stub cache maintenance

While efi-entry.S mentions that efi_entry() will have relocated the
kernel image, it actually means that efi_entry will have placed a copy
of the kernel in the appropriate location, and until this is branched to
at the end of efi_entry.S, all instructions are executed from the
original image.

Thus while the flush in efi_entry.S does ensure that the copy is visible
to noncacheable accesses, it does not guarantee that this is true for
the image instructions are being executed from. This could have
disasterous effects when the MMU and caches are disabled if the image
has not been naturally evicted to the PoC.

Additionally, due to a missing dsb following the ic ialluis, the new
kernel image is not necessarily clean in the I-cache when it is branched
to, with similar potentially disasterous effects.

This patch adds additional flushing to ensure that the currently
executing stub text is flushed to the PoC and is thus visible to
noncacheable accesses. As it is placed after the instructions cache
maintenance for the new image and __flush_dcache_area already contains a
dsb, we do not need to add a separate barrier to ensure completion of
the icache maintenance.

Comments are updated to clarify the situation with regard to the two
images and the maintenance required for both.

Fixes: 3c7f255039a2ad6ee1e3890505caf0d029b22e29
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Joel Schopp <joel.schopp@amd.com>
Reviewed-by: Roy Franz <roy.franz@linaro.org>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ian Campbell <ijc@hellion.org.uk>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

drm/tegra: dc: Add missing call to drm_vblank_on()

When the CRTC is enabled, make sure the VBLANK machinery is enabled.
Failure to do so will cause drm_vblank_get() to not enable the VBLANK on
the CRTC and VBLANK-synchronized page-flips won't work.

While at it, get rid of the legacy drm_vblank_pre_modeset() and
drm_vblank_post_modeset() calls that are replaced by drm_vblank_on()
and drm_vblank_off().

Reported-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>

Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/selinux into for-linus

ALSA: hda - Add mute LED control for Lenovo Ideapad Z560

Lenovo Ideapad Z560 has a mute LED that is controlled via EAPD pin
0x1b on CX20585 codec. (EAPD bit on corresponds to mute LED on.)
The machine doesn't need other EAPD, so the fixup concentrates on
controlling EAPD 0x1b following the vmaster state (but inversely).

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=665315
Reported-by: Szymon Kowalczyk <fazerxlo@o2.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>