platform/kernel/linux-rpi.git
8 years agosched/fair: Fix fixed point arithmetic width for shares and effective load
Dietmar Eggemann [Mon, 22 Aug 2016 14:00:41 +0000 (15:00 +0100)]
sched/fair: Fix fixed point arithmetic width for shares and effective load

Since commit:

  2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")

we now have two different fixed point units for load:

- 'shares' in calc_cfs_shares() has 20 bit fixed point unit on 64-bit
  kernels. Therefore use scale_load() on MIN_SHARES.

- 'wl' in effective_load() has 10 bit fixed point unit. Therefore use
  scale_load_down() on tg->shares which has 20 bit fixed point unit on
  64-bit kernels.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471874441-24701-1-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/core, x86/topology: Fix NUMA in package topology bug
Tim Chen [Wed, 21 Sep 2016 19:19:03 +0000 (12:19 -0700)]
sched/core, x86/topology: Fix NUMA in package topology bug

Current code can call set_cpu_sibling_map() and invoke sched_set_topology()
more than once (e.g. on CPU hot plug).  When this happens after
sched_init_smp() has been called, we lose the NUMA topology extension to
sched_domain_topology in sched_init_numa().  This results in incorrect
topology when the sched domain is rebuilt.

This patch fixes the bug and issues warning if we call sched_set_topology()
after sched_init_smp().

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1474485552-141429-2-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge branch 'linus' into sched/core, to pick up fixes
Ingo Molnar [Fri, 30 Sep 2016 08:44:27 +0000 (10:44 +0200)]
Merge branch 'linus' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 28 Sep 2016 23:20:24 +0000 (16:20 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "4 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mem-hotplug: use nodes that contain memory as mask in new_node_page()
  scripts/recordmcount.c: account for .softirqentry.text
  dma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG
  mm,ksm: fix endless looping in allocating memory when ksm enable

8 years agomem-hotplug: use nodes that contain memory as mask in new_node_page()
Li Zhong [Wed, 28 Sep 2016 22:22:38 +0000 (15:22 -0700)]
mem-hotplug: use nodes that contain memory as mask in new_node_page()

9bb627be47a5 ("mem-hotplug: don't clear the only node in new_node_page()")
prevents allocating from an empty nodemask, but as David points out, it is
still wrong.  As node_online_map may include memoryless nodes, only
allocating from these nodes is meaningless.

This patch uses node_states[N_MEMORY] mask to prevent the above case.

Fixes: 9bb627be47a5 ("mem-hotplug: don't clear the only node in new_node_page()")
Fixes: 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest neighbor node when mem-offline")
Link: http://lkml.kernel.org/r/1474447117.28370.6.camel@TP420
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoscripts/recordmcount.c: account for .softirqentry.text
Dmitry Vyukov [Wed, 28 Sep 2016 22:22:36 +0000 (15:22 -0700)]
scripts/recordmcount.c: account for .softirqentry.text

be7635e7287e ("arch, ftrace: for KASAN put hard/soft IRQ entries into
separate sections") added .softirqentry.text section, but it was not added
to recordmcount.  So functions in the section are untracable.  Add the
section to scripts/recordmcount.c and scripts/recordmcount.pl.

Fixes: be7635e7287e ("arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections")
Link: http://lkml.kernel.org/r/1474902626-73468-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Steve Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agodma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG
Andrey Smirnov [Wed, 28 Sep 2016 22:22:33 +0000 (15:22 -0700)]
dma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG

When CONFIG_DMA_API_DEBUG is enabled we need to preserve unmapping address
even if "unmap" is a no-op for our architecutre because we need
debug_dma_unmap_page() to correctly cleanup all of the debug bookkeeping.
Failing to do so results in a false positive warnings about previously
mapped areas never being unmapped.

Link: http://lkml.kernel.org/r/1474387125-3713-1-git-send-email-andrew.smirnov@gmail.com
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Geliang Tang <geliangtang@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm,ksm: fix endless looping in allocating memory when ksm enable
zhong jiang [Wed, 28 Sep 2016 22:22:30 +0000 (15:22 -0700)]
mm,ksm: fix endless looping in allocating memory when ksm enable

I hit the following hung task when runing a OOM LTP test case with 4.1
kernel.

Call trace:
[<ffffffc000086a88>] __switch_to+0x74/0x8c
[<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[<ffffffc000a1c09c>] schedule+0x3c/0x94
[<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[<ffffffc000a1e32c>] down_write+0x64/0x80
[<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[<ffffffc0000be650>] mmput+0x118/0x11c
[<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[<ffffffc000089fcc>] do_signal+0x1d8/0x450
[<ffffffc00008a35c>] do_notify_resume+0x70/0x78

The oom victim cannot terminate because it needs to take mmap_sem for
write while the lock is held by ksmd for read which loops in the page
allocator

ksm_do_scan
scan_get_next_rmap_item
down_read
get_next_rmap_item
alloc_rmap_item   #ksmd will loop permanently.

There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.

Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.

While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
by the allocation failure.

Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge tag 'for-linus-20160928' of git://git.infradead.org/linux-mtd
Linus Torvalds [Wed, 28 Sep 2016 19:53:08 +0000 (12:53 -0700)]
Merge tag 'for-linus-20160928' of git://git.infradead.org/linux-mtd

Pull late MTD fixes from Brian Norris:
 "Another round of MTD fixes for v4.8

  My apologies for sending this so late.  I've been fairly absent as a
  maintainer this cycle, but I did queue these up weeks ago.  In the
  meantime, Richard was able to handle some other fixes (thanks!) but
  didn't pick these up.

  On the bright side, these are very simple changes that should carry
  little risk.

  Summary:

   - Davinci NAND: fix a long-standing bug in how we clear/prep 4-bit ECC

   - OMAP NAND: an error-handling fix that made it into v4.8-rc1 caused
     error-handling cases in other configurations/code-paths; this fixes
     the fix"

* tag 'for-linus-20160928' of git://git.infradead.org/linux-mtd:
  mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
  mtd: nand: omap2: Don't call dma_release_channel() if dma_request_chan() failed

8 years agoMAINTAINERS: Update my e-mail
Mark Fasheh [Wed, 28 Sep 2016 19:51:04 +0000 (12:51 -0700)]
MAINTAINERS: Update my e-mail

I will be starting employment at Versity next week and would like to update
my MAINTAINERS e-mail to reflect that change. My versity e-mail is already
activated so I shouldn't get any bounces on the new one. My ability to help
with Ocfs2 kernel maintenance won't change as a result of the new job.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Tue, 27 Sep 2016 23:43:11 +0000 (16:43 -0700)]
Merge branch 'for-4.8-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "Three late fixes for cgroup: Two cpuset ones, one trivial and the
  other pretty obscure, and a cgroup core fix for a bug which impacts
  cgroup v2 namespace users"

* 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix invalid controller enable rejections with cgroup namespace
  cpuset: fix non static symbol warning
  cpuset: handle race between CPU hotplug and cpuset_hotplug_work

8 years agoLinux 4.8-rc8
Linus Torvalds [Mon, 26 Sep 2016 01:47:13 +0000 (18:47 -0700)]
Linux 4.8-rc8

8 years agoMerge tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Mon, 26 Sep 2016 01:40:13 +0000 (18:40 -0700)]
Merge tag 'trace-v4.8-rc7' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracefs fixes from Steven Rostedt:
 "Al Viro has been looking at the tracefs code, and has pointed out some
  issues.  This contains one fix by me and one by Al.  I'm sure that
  he'll come up with more but for now I tested these patches and they
  don't appear to have any negative impact on tracing"

* tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  fix memory leaks in tracing_buffers_splice_read()
  tracing: Move mutex to protect against resetting of seq data

8 years agofault_in_multipages_readable() throws set-but-unused error
Dave Chinner [Sun, 25 Sep 2016 23:57:33 +0000 (09:57 +1000)]
fault_in_multipages_readable() throws set-but-unused error

When building XFS with -Werror, it now fails with:

  include/linux/pagemap.h: In function 'fault_in_multipages_readable':
  include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable]
    volatile char c;
                  ^

This is a regression caused by commit e23d4159b109 ("fix
fault_in_multipages_...() on architectures with no-op access_ok()").
Fix it by re-adding the "(void)c" trick taht was previously used to make
the compiler think the variable is used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
Lorenzo Stoakes [Sun, 11 Sep 2016 22:54:25 +0000 (23:54 +0100)]
mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing

The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
PMDs respectively as requiring balancing upon a subsequent page fault.
User-defined PROT_NONE memory regions which also have this flag set will
not normally invoke the NUMA balancing code as do_page_fault() will send
a segfault to the process before handle_mm_fault() is even called.

However if access_remote_vm() is invoked to access a PROT_NONE region of
memory, handle_mm_fault() is called via faultin_page() and
__get_user_pages() without any access checks being performed, meaning
the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
region.

A simple means of triggering this problem is to access PROT_NONE mmap'd
memory using /proc/self/mem which reliably results in the NUMA handling
functions being invoked when CONFIG_NUMA_BALANCING is set.

This issue was reported in bugzilla (issue 99101) which includes some
simple repro code.

There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
added at commit c0e7cad to avoid accidentally provoking strange
behaviour by attempting to apply NUMA balancing to pages that are in
fact PROT_NONE.  The BUG_ON()'s are consistently triggered by the repro.

This patch moves the PROT_NONE check into mm/memory.c rather than
invoking BUG_ON() as faulting in these pages via faultin_page() is a
valid reason for reaching the NUMA check with the PROT_NONE page table
flag set and is therefore not always a bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 25 Sep 2016 20:59:52 +0000 (13:59 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "A round of 4.8 fixes:

  MIPS generic code:
   - Add a missing ".set pop" in an early commit
   - Fix memory regions reaching top of physical
   - MAAR: Fix address alignment
   - vDSO: Fix Malta EVA mapping to vDSO page structs
   - uprobes: fix incorrect uprobe brk handling
   - uprobes: select HAVE_REGS_AND_STACK_ACCESS_API
   - Avoid a BUG warning during PR_SET_FP_MODE prctl
   - SMP: Fix possibility of deadlock when bringing CPUs online
   - R6: Remove compact branch policy Kconfig entries
   - Fix size calc when avoiding IPIs for small icache flushes
   - Fix pre-r6 emulation FPU initialisation
   - Fix delay slot emulation count in debugfs

  ATH79:
   - Fix test for error return of clk_register_fixed_factor.

  Octeon:
   - Fix kernel header to work for VDSO build.
   - Fix initialization of platform device probing.

  paravirt:
   - Fix undefined reference to smp_bootstrap"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Fix delay slot emulation count in debugfs
  MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
  MIPS: Fix pre-r6 emulation FPU initialisation
  MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
  MIPS: Select HAVE_REGS_AND_STACK_ACCESS_API
  MIPS: Octeon: Fix platform bus probing
  MIPS: Octeon: mangle-port: fix build failure with VDSO code
  MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
  MIPS: c-r4k: Fix size calc when avoiding IPIs for small icache flushes
  MIPS: Add a missing ".set pop" in an early commit
  MIPS: paravirt: Fix undefined reference to smp_bootstrap
  MIPS: Remove compact branch policy Kconfig entries
  MIPS: MAAR: Fix address alignment
  MIPS: Fix memory regions reaching top of physical
  MIPS: uprobes: fix incorrect uprobe brk handling
  MIPS: ath79: Fix test for error return of clk_register_fixed_factor().

8 years agoMerge tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 25 Sep 2016 20:52:59 +0000 (13:52 -0700)]
Merge tag 'powerpc-4.8-7' of git://git./linux/kernel/git/powerpc/linux

Pull one more powerpc fix from Michael Ellerman:
 "powernv/pci: Fix m64 checks for SR-IOV and window alignment from
  Russell Currey"

* tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment

8 years agoradix tree: fix sibling entry handling in radix_tree_descend()
Linus Torvalds [Sun, 25 Sep 2016 20:32:46 +0000 (13:32 -0700)]
radix tree: fix sibling entry handling in radix_tree_descend()

The fixes to the radix tree test suite show that the multi-order case is
broken.  The basic reason is that the radix tree code uses tagged
pointers with the "internal" bit in the low bits, and calculating the
pointer indices was supposed to mask off those bits.  But gcc will
notice that we then use the index to re-create the pointer, and will
avoid doing the arithmetic and use the tagged pointer directly.

This cleans the code up, using the existing is_sibling_entry() helper to
validate the sibling pointer range (instead of open-coding it), and
using entry_to_node() to mask off the low tag bit from the pointer.  And
once you do that, you might as well just use the now cleaned-up pointer
directly.

[ Side note: the multi-order code isn't actually ever used in the kernel
  right now, and the only reason I didn't just delete all that code is
  that Kirill Shutemov piped up and said:

    "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
     It also converts shmem-with-huge-pages and hugetlb to them.

     I'm okay with converting it to other mechanism, but I need
     something.  (I looked into Konstantin's RFC patchset[2].  It looks
     okay, but I don't feel myself qualified to review it as I don't
     know much about radix-tree internals.)"

  [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
  [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ]

Reported-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Cedric Blancher <cedric.blancher@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoradix tree test suite: Test radix_tree_replace_slot() for multiorder entries
Matthew Wilcox [Thu, 22 Sep 2016 18:53:34 +0000 (11:53 -0700)]
radix tree test suite: Test radix_tree_replace_slot() for multiorder entries

When we replace a multiorder entry, check that all indices reflect the
new value.

Also, compile the test suite with -O2, which shows other problems with
the code due to some dodgy pointer operations in the radix tree code.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agofix memory leaks in tracing_buffers_splice_read()
Al Viro [Sat, 17 Sep 2016 22:31:46 +0000 (18:31 -0400)]
fix memory leaks in tracing_buffers_splice_read()

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agotracing: Move mutex to protect against resetting of seq data
Steven Rostedt (Red Hat) [Sat, 24 Sep 2016 02:57:13 +0000 (22:57 -0400)]
tracing: Move mutex to protect against resetting of seq data

The iter->seq can be reset outside the protection of the mutex. So can
reading of user data. Move the mutex up to the beginning of the function.

Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants")
Cc: stable@vger.kernel.org # 2.6.30+
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8 years agoMIPS: Fix delay slot emulation count in debugfs
Paul Burton [Thu, 22 Sep 2016 14:47:40 +0000 (15:47 +0100)]
MIPS: Fix delay slot emulation count in debugfs

Commit 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot
instructions") accidentally removed use of the MIPS_FPU_EMU_INC_STATS
macro from do_dsemulret, leading to the ds_emul file in debugfs always
returning zero even though we perform delay slot emulations.

Fix this by re-adding the use of the MIPS_FPU_EMU_INC_STATS macro.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14301/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
8 years agoMIPS: SMP: Fix possibility of deadlock when bringing CPUs online
Matt Redfearn [Thu, 22 Sep 2016 16:15:47 +0000 (17:15 +0100)]
MIPS: SMP: Fix possibility of deadlock when bringing CPUs online

This patch fixes the possibility of a deadlock when bringing up
secondary CPUs.
The deadlock occurs because the set_cpu_online() is called before
synchronise_count_slave(). This can cause a deadlock if the boot CPU,
having scheduled another thread, attempts to send an IPI to the
secondary CPU, which it sees has been marked online. The secondary is
blocked in synchronise_count_slave() waiting for the boot CPU to enter
synchronise_count_master(), but the boot cpu is blocked in
smp_call_function_many() waiting for the secondary to respond to it's
IPI request.

Fix this by marking the CPU online in cpu_callin_map and synchronising
counters before declaring the CPU online and calculating the maps for
IPIs.

Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Reported-by: Justin Chen <justinpopo6@gmail.com>
Tested-by: Justin Chen <justinpopo6@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: stable@vger.kernel.org # v4.1+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14302/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
8 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:44:28 +0000 (12:44 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Three fixlets for perf:

   - add a missing NULL pointer check in the intel BTS driver

   - make BTS an exclusive PMU because BTS can only handle one event at
     a time

   - ensure that exclusive events are limited to one PMU so that several
     exclusive events can be scheduled on different PMU instances"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Limit matching exclusive events to one PMU
  perf/x86/intel/bts: Make it an exclusive PMU
  perf/x86/intel/bts: Make sure debug store is valid

8 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:41:19 +0000 (12:41 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two smallish fixes:

   - use the proper asm constraint in the Super-H atomic_fetch_ops

   - a trivial typo fix in the Kconfig help text"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
  locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()

8 years agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:35:26 +0000 (12:35 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Thomas Gleixner:
 "Two fixes for EFI/PAT:

   - a 32bit overflow bug in the PAT code which was unearthed by the
     large EFI mappings

   - prevent a boot hang on large systems when EFI mixed mode is enabled
     but not used"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Only map RAM into EFI page tables if in mixed-mode
  x86/mm/pat: Prevent hang during boot when mapping pages

8 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:30:12 +0000 (12:30 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "Three fixes for irq core and irq chip drivers:

   - Do not set the irq type if type is NONE.  Fixes a boot regression
     on various SoCs

   - Use the proper cpu for setting up the GIC target list.  Discovered
     by the cpumask debugging code.

   - A rather large fix for the MIPS-GIC so per cpu local interrupts
     work again.  This was discovered late because the code falls back
     to slower timers which use normal device interrupts"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mips-gic: Fix local interrupts
  irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
  genirq: Skip chained interrupt trigger setup if type is IRQ_TYPE_NONE

8 years agoMerge branch 'hughd-fixes' (patches from Hugh Dickins)
Linus Torvalds [Sat, 24 Sep 2016 18:31:45 +0000 (11:31 -0700)]
Merge branch 'hughd-fixes' (patches from Hugh Dickins)

Merge VM fixes from High Dickins:
 "I get the impression that Andrew is away or busy at the moment, so I'm
  going to send you three independent uncontroversial little mm fixes
  directly - though none is strictly a 4.8 regression fix.

   - shmem: fix tmpfs to handle the huge= option properly from Toshi
     Kani is a one-liner to fix a major embarrassment in 4.8's hugepages
     on tmpfs feature: although Hillf pointed it out in June, somehow
     both Kirill and I repeatedly dropped the ball on this one.  You
     might wonder if the feature got tested at all with that bug in:
     yes, it did, but for wider testing coverage, Kirill and I had each
     relied too much on an override which bypasses that condition.

   - huge tmpfs: fix Committed_AS leak just a run-of-the-mill accounting
     fix in the same feature.

   - mm: delete unnecessary and unsafe init_tlb_ubc() is an unrelated
     fix to 4.3's TLB flush batching in reclaim: the bug would be rare,
     and none of us will be shamed if this one misses 4.8; but it got
     such a quick ack from Mel today that I'm inclined to offer it along
     with the first two"

* emailed patches from Hugh Dickins <hughd@google.com>:
  mm: delete unnecessary and unsafe init_tlb_ubc()
  huge tmpfs: fix Committed_AS leak
  shmem: fix tmpfs to handle the huge= option properly

8 years agomm: delete unnecessary and unsafe init_tlb_ubc()
Hugh Dickins [Sat, 24 Sep 2016 03:27:04 +0000 (20:27 -0700)]
mm: delete unnecessary and unsafe init_tlb_ubc()

init_tlb_ubc() looked unnecessary to me: tlb_ubc is statically
initialized with zeroes in the init_task, and copied from parent to
child while it is quiescent in arch_dup_task_struct(); so I went to
delete it.

But inserted temporary debug WARN_ONs in place of init_tlb_ubc() to
check that it was always empty at that point, and found them firing:
because memcg reclaim can recurse into global reclaim (when allocating
biosets for swapout in my case), and arrive back at the init_tlb_ubc()
in shrink_node_memcg().

Resetting tlb_ubc.flush_required at that point is wrong: if the upper
level needs a deferred TLB flush, but the lower level turns out not to,
we miss a TLB flush.  But fortunately, that's the only part of the
protocol that does not nest: with the initialization removed, cpumask
collects bits from upper and lower levels, and flushes TLB when needed.

Fixes: 72b252aed506 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agohuge tmpfs: fix Committed_AS leak
Hugh Dickins [Sat, 24 Sep 2016 03:24:23 +0000 (20:24 -0700)]
huge tmpfs: fix Committed_AS leak

Under swapping load on huge tmpfs, /proc/meminfo's Committed_AS grows
bigger and bigger: just a cosmetic issue for most users, but disabling
for those who run without overcommit (/proc/sys/vm/overcommit_memory 2).

shmem_uncharge() was forgetting to unaccount __vm_enough_memory's
charge, and shmem_charge() was forgetting it on the filesystem-full
error path.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoshmem: fix tmpfs to handle the huge= option properly
Toshi Kani [Sat, 24 Sep 2016 03:21:56 +0000 (20:21 -0700)]
shmem: fix tmpfs to handle the huge= option properly

shmem_get_unmapped_area() checks SHMEM_SB(sb)->huge incorrectly, which
leads to a reversed effect of "huge=" mount option.

Fix the check in shmem_get_unmapped_area().

Note, the default value of SHMEM_SB(sb)->huge remains as
SHMEM_HUGE_NEVER.  User will need to specify "huge=" option to enable
huge page mappings.

Reported-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Fri, 23 Sep 2016 23:44:12 +0000 (16:44 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Three driver bugfixes: fixing uninitialized memory pointers (eg20t),
  pm/clock imbalance (qup), and a wrongly set cached variable (pc954x)"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
  i2c: mux: pca954x: retry updating the mux selection on failure
  i2c-eg20t: fix race between i2c init and interrupt enable

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Fri, 23 Sep 2016 23:34:24 +0000 (16:34 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:
 "Just a fix up for the firmware handling to the Silead driver (which is
  a new driver in this release)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: silead_gsl1680 - use "silead/" prefix for firmware loading
  Input: silead_gsl1680 - document firmware-name, fix implementation

8 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 23 Sep 2016 23:24:36 +0000 (16:24 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Three fixes, two regressions and one that poses a problem in blk-mq
  with the new nvmef code"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx
  nvme-rdma: only clear queue flags after successful connect
  blk-throttle: Extend slice if throttle group is not empty

8 years agocgroup: fix invalid controller enable rejections with cgroup namespace
Tejun Heo [Fri, 23 Sep 2016 20:55:49 +0000 (16:55 -0400)]
cgroup: fix invalid controller enable rejections with cgroup namespace

On the v2 hierarchy, "cgroup.subtree_control" rejects controller
enables if the cgroup has processes in it.  The enforcement of this
logic assumes that the cgroup wouldn't have any css_sets associated
with it if there are no tasks in the cgroup, which is no longer true
since a79a908fd2b0 ("cgroup: introduce cgroup namespaces").

When a cgroup namespace is created, it pins the css_set of the
creating task to use it as the root css_set of the namespace.  This
extra reference stays as long as the namespace is around and makes
"cgroup.subtree_control" think that the namespace root cgroup is not
empty even when it is and thus reject controller enables.

Fix it by making cgroup_subtree_control() walk and test emptiness of
each css_set instead of testing whether the list_head is empty.

While at it, update the comment of cgroup_task_count() to indicate
that the returned value may be higher than the number of tasks, which
has always been true due to temporary references and doesn't break
anything.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Aditya Kali <adityakali@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: stable@vger.kernel.org # v4.6+
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Link: https://github.com/systemd/systemd/pull/3589#issuecomment-249089541
8 years agoMerge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason...
Linus Torvalds [Fri, 23 Sep 2016 20:39:37 +0000 (13:39 -0700)]
Merge branch 'for-linus-4.8' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
 "Josef fixed a problem when quotas are enabled with his latest ENOSPC
  rework, and Jeff added more checks into the subvol ioctls to avoid
  tripping up lookup_one_len"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: ensure that file descriptor used with subvol ioctls is a dir
  Btrfs: handle quota reserve failure properly

8 years agoMerge tag 'regmap-fix-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 23 Sep 2016 18:50:49 +0000 (11:50 -0700)]
Merge tag 'regmap-fix-v4.8-rc7' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "A fix for an issue with double locking that was introduced earlier
  this release.  I'd missed in review that we were already in a locked
  region when trying to drop part of the cache"

* tag 'regmap-fix-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix deadlock on _regmap_raw_write() error path

8 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 23 Sep 2016 18:28:04 +0000 (11:28 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a regression in RSA that was only half-fixed earlier in the
  cycle.  It also fixes an older regression that breaks the keyring
  subsystem"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: rsa-pkcs1pad - Handle leading zero for decryption
  KEYS: Fix skcipher IV clobbering

8 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 23 Sep 2016 18:24:42 +0000 (11:24 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "A couple of last-minute arm64 fixes for 4.8:

   - Fix secondary CPU to NUMA node assignment

   - Fix kgdb breakpoint insertion in read-only text sections (when
     CONFIG_DEBUG_RODATA or CONFIG_DEBUG_SET_MODULE_RONX are enabled)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kgdb: handle read-only text / modules
  arm64: Call numa_store_cpu_info() earlier.

8 years agoMerge tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs
Linus Torvalds [Fri, 23 Sep 2016 18:15:00 +0000 (11:15 -0700)]
Merge tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs

Pull MTD fixes from Richard Weinberger:
 "NAND Fixes for 4.8-rc8.

  This contains fixes for bugs which got introduced in -rc1.  Usually
  Brian takes NAND patches from Boris, but since Brian is very busy
  these days with other stuff and Boris is not yet member of the
  kernel.org web of trust I stepped in.

  Boris will be in Berlin at ELCE, I'll sign his key and hopefully other
  Kernel developers too such that he can issue his own pull requests
  soon.

  Summary:

   - Fix a wrong OOB layout definition in the mxc driver
   - Fix incorrect ECC handling in the mtk driver"

* tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs:
  mtd: nand: mxc: fix obiwan error in mxc_nand_v[12]_ooblayout_free() functions
  mtd: nand: fix chances to create incomplete ECC data when writing
  mtd: nand: fix generating over-boundary ECC data when writing

8 years agoMerge tag 'mmc-v4.8-rc7' of git://git.linaro.org/people/ulf.hansson/mmc
Linus Torvalds [Fri, 23 Sep 2016 18:10:53 +0000 (11:10 -0700)]
Merge tag 'mmc-v4.8-rc7' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC fix from Ulf Hansson:
 "MMC host:

   - dw_mmc: fix the spamming log message"

* tag 'mmc-v4.8-rc7' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: dw_mmc: fix the spamming log message

8 years agoMerge tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs
Linus Torvalds [Fri, 23 Sep 2016 16:45:15 +0000 (09:45 -0700)]
Merge tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs

Pull configfs fix from Christoph Hellwig:
 "One more trivial fix for the binary attribute code from Phil Turnbull"

* tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs:
  configfs: Return -EFBIG from configfs_write_bin_file.

8 years agoblk-mq: skip unmapped queues in blk_mq_alloc_request_hctx
Christoph Hellwig [Fri, 23 Sep 2016 16:25:48 +0000 (10:25 -0600)]
blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx

This provides the caller a feedback that a given hctx is not mapped and thus
no command can be sent on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMIPS: Fix pre-r6 emulation FPU initialisation
Paul Burton [Fri, 23 Sep 2016 14:13:53 +0000 (15:13 +0100)]
MIPS: Fix pre-r6 emulation FPU initialisation

In the mipsr2_decoder() function, used to emulate pre-MIPSr6
instructions that were removed in MIPSr6, the init_fpu() function is
called if a removed pre-MIPSr6 floating point instruction is the first
floating point instruction used by the task. However, init_fpu()
performs varous actions that rely upon not being migrated. For example
in the most basic case it sets the coprocessor 0 Status.CU1 bit to
enable the FPU & then loads FP register context into the FPU registers.
If the task were to migrate during this time, it may end up attempting
to load FP register context on a different CPU where it hasn't set the
CU1 bit, leading to errors such as:

    do_cpu invoked from kernel context![#2]:
    CPU: 2 PID: 7338 Comm: fp-prctl Tainted: G      D         4.7.0-00424-g49b0c82 #2
    task: 838e4000 ti: 88d38000 task.ti: 88d38000
    $ 0   : 00000000 00000001 ffffffff 88d3fef8
    $ 4   : 838e4000 88d38004 00000000 00000001
    $ 8   : 3400fc01 801f8020 808e9100 24000000
    $12   : dbffffff 807b69d8 807b0000 00000000
    $16   : 00000000 80786150 00400fc4 809c0398
    $20   : 809c0338 0040273c 88d3ff28 808e9d30
    $24   : 808e9d30 00400fb4
    $28   : 88d38000 88d3fe88 00000000 8011a2ac
    Hi    : 0040273c
    Lo    : 88d3ff28
    epc   : 80114178 _restore_fp+0x10/0xa0
    ra    : 8011a2ac mipsr2_decoder+0xd5c/0x1660
    Status: 1400fc03 KERNEL EXL IE
    Cause : 1080002c (ExcCode 0b)
    PrId  : 0001a920 (MIPS I6400)
    Modules linked in:
    Process fp-prctl (pid: 7338, threadinfo=88d38000, task=838e4000, tls=766527d0)
    Stack : 00000000 00000000 00000000 88d3fe98 00000000 00000000 809c0398 809c0338
       808e9100 00000000 88d3ff28 00400fc4 00400fc4 0040273c 7fb69e18 004a0000
       004a0000 004a0000 7664add0 8010de18 00000000 00000000 88d3fef8 88d3ff28
       808e9100 00000000 766527d0 8010e534 000c0000 85755000 8181d580 00000000
       00000000 00000000 004a0000 00000000 766527d0 7fb69e18 004a0000 80105c20
       ...
    Call Trace:
    [<80114178>] _restore_fp+0x10/0xa0
    [<8011a2ac>] mipsr2_decoder+0xd5c/0x1660
    [<8010de18>] do_ri+0x90/0x6b8
    [<80105c20>] ret_from_exception+0x0/0x10

Fix this by disabling preemption around the call to init_fpu(), ensuring
that it starts & completes on one CPU.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b0a668fb2038 ("MIPS: kernel: mips-r2-to-r6-emul: Add R2 emulator for MIPS R6")
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.0+
Patchwork: https://patchwork.linux-mips.org/patch/14305/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
8 years agoarm64: kgdb: handle read-only text / modules
AKASHI Takahiro [Fri, 23 Sep 2016 07:42:08 +0000 (16:42 +0900)]
arm64: kgdb: handle read-only text / modules

Handle read-only cases when CONFIG_DEBUG_RODATA (4.0) or
CONFIG_DEBUG_SET_MODULE_RONX (3.18) are enabled by using
aarch64_insn_write() instead of probe_kernel_write() as introduced by
commit 2f896d586610 ("arm64: use fixmap for text patching") in 4.0.

Fixes: 11d91a770f1f ("arm64: Add CONFIG_DEBUG_SET_MODULE_RONX support")
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
8 years agoarm64: Call numa_store_cpu_info() earlier.
David Daney [Tue, 20 Sep 2016 18:46:35 +0000 (11:46 -0700)]
arm64: Call numa_store_cpu_info() earlier.

The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online.  Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.

When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes.  The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array.  This results in:

[    0.881970] Unable to handle kernel paging request at virtual address fffffb1008b926a4
[    1.970095] pgd = fffffc00094b0000
[    1.973530] [fffffb1008b926a4] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[    1.982610] Internal error: Oops: 96000004 [#1] SMP
[    1.987541] Modules linked in:
[    1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: G        W       4.8.0-rc6-preempt-vol+ #9
[    1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
[    2.005159] task: fffffe0fe89cc300 task.stack: fffffe0fe8b8c000
[    2.011158] PC is at try_to_wake_up+0x194/0x34c
[    2.015737] LR is at try_to_wake_up+0x150/0x34c
[    2.020318] pc : [<fffffc00080e7468>] lr : [<fffffc00080e7424>] pstate: 600000c5
[    2.027803] sp : fffffe0fe8b8fb10
[    2.031149] x29: fffffe0fe8b8fb10 x28: 0000000000000000
[    2.036522] x27: fffffc0008c63bc8 x26: 0000000000001000
[    2.041896] x25: fffffc0008c63c80 x24: fffffc0008bfb200
[    2.047270] x23: 00000000000000c0 x22: 0000000000000004
[    2.052642] x21: fffffe0fe89d25bc x20: 0000000000001000
[    2.058014] x19: fffffe0fe89d1d00 x18: 0000000000000000
[    2.063386] x17: 0000000000000000 x16: 0000000000000000
[    2.068760] x15: 0000000000000018 x14: 0000000000000000
[    2.074133] x13: 0000000000000000 x12: 0000000000000000
[    2.079505] x11: 0000000000000000 x10: 0000000000000000
[    2.084879] x9 : 0000000000000000 x8 : 0000000000000000
[    2.090251] x7 : 0000000000000040 x6 : 0000000000000000
[    2.095621] x5 : ffffffffffffffff x4 : 0000000000000000
[    2.100991] x3 : 0000000000000000 x2 : 0000000000000000
[    2.106364] x1 : fffffc0008be4c24 x0 : ffffff0ffffada80
[    2.111737]
[    2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfffffe0fe8b8c020)
[    2.120102] Stack: (0xfffffe0fe8b8fb10 to 0xfffffe0fe8b90000)
[    2.125914] fb00:                                   fffffe0fe8b8fb80 fffffc00080e7648
.
.
.
[    2.442859] Call trace:
[    2.445327] Exception stack(0xfffffe0fe8b8f940 to 0xfffffe0fe8b8fa70)
[    2.451843] f940: fffffe0fe89d1d00 0000040000000000 fffffe0fe8b8fb10 fffffc00080e7468
[    2.459767] f960: fffffe0fe8b8f980 fffffc00080e4958 ffffff0ff91ab200 fffffc00080e4b64
[    2.467690] f980: fffffe0fe8b8f9d0 fffffc00080e515c fffffe0fe8b8fa80 0000000000000000
[    2.475614] f9a0: fffffe0fe8b8f9d0 fffffc00080e58e4 fffffe0fe8b8fa80 0000000000000000
[    2.483540] f9c0: fffffe0fe8d10000 0000000000000040 fffffe0fe8b8fa50 fffffc00080e5ac4
[    2.491465] f9e0: ffffff0ffffada80 fffffc0008be4c24 0000000000000000 0000000000000000
[    2.499387] fa00: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000040
[    2.507309] fa20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.515233] fa40: 0000000000000000 0000000000000000 0000000000000000 0000000000000018
[    2.523156] fa60: 0000000000000000 0000000000000000
[    2.528089] [<fffffc00080e7468>] try_to_wake_up+0x194/0x34c
[    2.533723] [<fffffc00080e7648>] wake_up_process+0x28/0x34
[    2.539275] [<fffffc00080d3764>] create_worker+0x110/0x19c
[    2.544824] [<fffffc00080d69dc>] alloc_unbound_pwq+0x3cc/0x4b0
[    2.550724] [<fffffc00080d6bcc>] wq_update_unbound_numa+0x10c/0x1e4
[    2.557066] [<fffffc00080d7d78>] workqueue_online_cpu+0x220/0x28c
[    2.563234] [<fffffc00080bd288>] cpuhp_invoke_callback+0x6c/0x168
[    2.569398] [<fffffc00080bdf74>] cpuhp_up_callbacks+0x44/0xe4
[    2.575210] [<fffffc00080be194>] cpuhp_thread_fun+0x13c/0x148
[    2.581027] [<fffffc00080dfbac>] smpboot_thread_fn+0x19c/0x1a8
[    2.586929] [<fffffc00080dbd64>] kthread+0xdc/0xf0
[    2.591776] [<fffffc0008083380>] ret_from_fork+0x10/0x50
[    2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
[    2.603464] ---[ end trace 58c0cd36b88802bc ]---
[    2.608138] Kernel panic - not syncing: Fatal exception

Fix by moving call to numa_store_cpu_info() for all CPUs into
smp_prepare_cpus(), which happens before wq_numa_init().  Since
smp_store_cpu_info() now contains only a single function call,
simplify by removing the function and out-lining its contents.

Suggested-by: Robert Richter <rric@kernel.org>
Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
Cc: <stable@vger.kernel.org> # 4.7.x-
Signed-off-by: David Daney <david.daney@cavium.com>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Tested-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
8 years agolocking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
Vivien Didelot [Thu, 22 Sep 2016 20:55:13 +0000 (16:55 -0400)]
locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text

Fix the indefinitiley -> indefinitely typo in Kconfig.debug.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160922205513.17821-1-vivien.didelot@savoirfairelinux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoobjtool: Add do_task_dead() to global noreturn list
Josh Poimboeuf [Thu, 22 Sep 2016 21:21:25 +0000 (16:21 -0500)]
objtool: Add do_task_dead() to global noreturn list

objtool reports the following new warning:

  kernel/exit.o: warning: objtool: do_exit() falls through to next function complete_and_exit()

The warning is caused by do_exit()'s new call to do_task_dead(), which
is a new "noreturn" function which objtool doesn't know about yet,
introduced by:

  9af6528ee9b6 ("sched/core: Optimize __schedule()")

( objtool has to know all the global noreturn functions so it can follow
  the control flow of any functions which call them.  Unfortunately they
  need to be hard-coded because there's no automated way to detect them. )

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Link: http://lkml.kernel.org/r/20160922212125.zbuewckqll4yur25@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agonvme-rdma: only clear queue flags after successful connect
Sagi Grimberg [Fri, 23 Sep 2016 01:58:17 +0000 (19:58 -0600)]
nvme-rdma: only clear queue flags after successful connect

Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly
try to stop/free rdma qps/cm_ids that are already freed.

Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag")
Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoi2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
Sudeep Holla [Thu, 25 Aug 2016 11:23:39 +0000 (12:23 +0100)]
i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended

If the i2c device is already runtime suspended, if qup_i2c_suspend is
executed during suspend-to-idle or suspend-to-ram it will result in the
following splat:

WARNING: CPU: 3 PID: 1593 at drivers/clk/clk.c:476 clk_core_unprepare+0x80/0x90
Modules linked in:

CPU: 3 PID: 1593 Comm: bash Tainted: G        W       4.8.0-rc3 #14
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
PC is at clk_core_unprepare+0x80/0x90
LR is at clk_unprepare+0x28/0x40
pc : [<ffff0000086eecf0>] lr : [<ffff0000086f0c58>] pstate: 60000145
Call trace:
 clk_core_unprepare+0x80/0x90
 qup_i2c_disable_clocks+0x2c/0x68
 qup_i2c_suspend+0x10/0x20
 platform_pm_suspend+0x24/0x68
 ...

This patch fixes the issue by executing qup_i2c_pm_suspend_runtime
conditionally in qup_i2c_suspend.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
8 years agoMerge tag 'media/v4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Thu, 22 Sep 2016 16:04:49 +0000 (09:04 -0700)]
Merge tag 'media/v4.8-7' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - several fixes for new drivers added for Kernel 4.8 addition (cec
   core, pulse8 cec driver and Mediatek vcodec)

 - a regression fix for cx23885 and saa7134 drivers

 - an important fix for rcar-fcp, making rcar_fcp_enable() return 0 on
   success

* tag 'media/v4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (25 commits)
  [media] cx23885/saa7134: assign q->dev to the PCI device
  [media] rcar-fcp: Make sure rcar_fcp_enable() returns 0 on success
  [media] cec: fix ioctl return code when not registered
  [media] cec: don't Feature Abort broadcast msgs when unregistered
  [media] vcodec:mediatek: Refine VP8 encoder driver
  [media] vcodec:mediatek: Refine H264 encoder driver
  [media] vcodec:mediatek: change H264 profile default to profile high
  [media] vcodec:mediatek: Add timestamp and timecode copy for V4L2 Encoder
  [media] vcodec:mediatek: Fix visible_height larger than coded_height issue in s_fmt_out
  [media] vcodec:mediatek: Fix fops_vcodec_release flow for V4L2 Encoder
  [media] vcodec:mediatek:code refine for v4l2 Encoder driver
  [media] cec-funcs.h: add missing vendor-specific messages
  [media] cec-edid: check for IEEE identifier
  [media] pulse8-cec: fix error handling
  [media] pulse8-cec: set correct Signal Free Time
  [media] mtk-vcodec: add HAS_DMA dependency
  [media] cec: ignore messages when log_addr_mask == 0
  [media] cec: add item to TODO
  [media] cec: set unclaimed addresses to CEC_LOG_ADDR_INVALID
  [media] cec: add CEC_LOG_ADDRS_FL_ALLOW_UNREG_FALLBACK flag
  ...

8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 22 Sep 2016 15:49:25 +0000 (08:49 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "Mostly small bits scattered all over the place, which is usually how
  things go this late in the -rc series.

   1) Proper driver init device resets in bnx2, from Baoquan He.

   2) Fix accounting overflow in __tcp_retransmit_skb(),
      sk_forward_alloc, and ip_idents_reserve, from Eric Dumazet.

   3) Fix crash in bna driver ethtool stats handling, from Ivan Vecera.

   4) Missing check of skb_linearize() return value in mac80211, from
      Johannes Berg.

   5) Endianness fix in nf_table_trace dumps, from Liping Zhang.

   6) SSN comparison fix in SCTP, from Marcelo Ricardo Leitner.

   7) Update DSA and b44 MAINTAINERS entries.

   8) Make input path of vti6 driver work again, from Nicolas Dichtel.

   9) Off-by-one in mlx4, from Sebastian Ott.

  10) Fix fallback route lookup handling in ipv6, from Vincent Bernat.

  11) Fix stack corruption on probe in qed driver, from Yuval Mintz.

  12) PHY init fixes in r8152 from Hayes Wang.

  13) Missing SKB free in irda_accept error path, from Phil Turnbull"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: properly account Fast Open SYN-ACK retrans
  tcp: fix under-accounting retransmit SNMP counters
  MAINTAINERS: Update b44 maintainer.
  net: get rid of an signed integer overflow in ip_idents_reserve()
  net/mlx4_core: Fix to clean devlink resources
  net: can: ifi: Configure transmitter delay
  vti6: fix input path
  ipmr, ip6mr: return lastuse relative to now
  r8152: disable ALDPS and EEE before setting PHY
  r8152: remove r8153_enable_eee
  r8152: move PHY settings to hw_phy_cfg
  r8152: move enabling PHY
  r8152: move some functions
  cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter
  qed: Fix stack corruption on probe
  MAINTAINERS: Add an entry for the core network DSA code
  net: ipv6: fallback to full lookup if table lookup is unsuitable
  net/mlx5: E-Switch, Handle mode change failures
  net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code
  net/mlx5: Fix flow counter bulk command out mailbox allocation
  ...

8 years agosched/debug: Hide printk() by default
Peter Zijlstra [Tue, 20 Sep 2016 09:05:31 +0000 (11:05 +0200)]
sched/debug: Hide printk() by default

Dietmar accidentally added an unconditional sched domain printk. Hide
it behind the normal sched_debug flag.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: cd92bfd3b8cb ("sched/core: Store maximum per-CPU capacity in root domain")
[ Fixed !SCHED_DEBUG build failure. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks
Srivatsa Vaddagiri [Sat, 17 Sep 2016 01:28:51 +0000 (18:28 -0700)]
sched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks

SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime).

It makes use of a per-CPU hrtimer resource and hence arming that
hrtimer should be based on total SCHED_FAIR tasks a CPU has across its
various cfs_rqs, rather than being based on number of tasks in a
particular cfs_rq (as implemented currently).

As a result, with current code, its possible for a running task (which
is the sole task in its cfs_rq) to be preempted much after its
ideal_runtime has elapsed, resulting in increased latency for tasks in
other cfs_rq on same CPU.

Fix this by arming sched hrtimer based on total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.

Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1474075731-11550-1-git-send-email-joonwoop@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/core: Limit matching exclusive events to one PMU
Alexander Shishkin [Tue, 20 Sep 2016 15:48:11 +0000 (18:48 +0300)]
perf/core: Limit matching exclusive events to one PMU

An "exclusive" PMU is the one that can only have one event scheduled in
at any given time. There may be more than one of such PMUs in a system,
though, like Intel PT and BTS. It should be allowed to have one event
for either of those inside the same context (there may be other constraints
that may prevent this, but those would be hardware-specific). However,
the exclusivity code is written so that only one event from any of the
"exclusive" PMUs is allowed in a context.

Fix this by making the exclusive event filter explicitly match two events'
PMUs.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160920154811.3255-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/x86/intel/bts: Make it an exclusive PMU
Alexander Shishkin [Tue, 20 Sep 2016 15:48:10 +0000 (18:48 +0300)]
perf/x86/intel/bts: Make it an exclusive PMU

Just like intel_pt, intel_bts can only handle one event at a time,
which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first
place. However, at the moment one can have as many intel_bts events
within the same context at the same time as one pleases. Only one of
them, however, will get scheduled and receive the actual trace data.

Fix this by making intel_bts an "exclusive" PMU.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/core: Avoid _cond_resched() for PREEMPT=y
Peter Zijlstra [Mon, 19 Sep 2016 10:57:53 +0000 (12:57 +0200)]
sched/core: Avoid _cond_resched() for PREEMPT=y

On fully preemptible kernels _cond_resched() is pointless, so avoid
emitting any code for it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/core: Optimize __schedule()
Peter Zijlstra [Tue, 13 Sep 2016 16:37:29 +0000 (18:37 +0200)]
sched/core: Optimize __schedule()

Oleg noted that by making do_exit() use __schedule() for the TASK_DEAD
context switch, we can avoid the TASK_DEAD special case currently in
__schedule() because that avoids the extra preempt_disable() from
schedule().

In order to facilitate this, create a do_task_dead() helper which we
place in the scheduler code, such that it can access __schedule().

Also add some __noreturn annotations to the functions, there's no
coming back from do_exit().

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Cheng Chao <cs.os.kernel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20160913163729.GB5012@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agostop_machine: Avoid a sleep and wakeup in stop_one_cpu()
Cheng Chao [Wed, 14 Sep 2016 02:01:50 +0000 (10:01 +0800)]
stop_machine: Avoid a sleep and wakeup in stop_one_cpu()

In case @cpu == smp_proccessor_id(), we can avoid a sleep+wakeup
cycle by doing a preemption.

Callers such as sched_exec() can benefit from this change.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1473818510-6779-1-git-send-email-cs.os.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/core: Remove unnecessary initialization in sched_init()
Cheng Chao [Wed, 14 Sep 2016 02:18:56 +0000 (10:18 +0800)]
sched/core: Remove unnecessary initialization in sched_init()

init_idle() is called immediately after:

  current->sched_class = &fair_sched_class;

init_idle() sets:

  current->sched_class = &idle_sched_class;

First assignment is superfluous.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473819536-7398-1-git-send-email-cs.os.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge branch 'linus' into sched/core, to pick up fixes
Ingo Molnar [Thu, 22 Sep 2016 12:49:40 +0000 (14:49 +0200)]
Merge branch 'linus' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agolocking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()
Peter Zijlstra [Fri, 26 Aug 2016 13:06:04 +0000 (15:06 +0200)]
locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()

We cannot use the "z" constraint twice, since its a single register
(r0). Change the one not used by movli.l/movco.l to "r".

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agosched/core: Do not use smp_processor_id() with preempt enabled in smpboot_thread_fn()
Con Kolivas [Tue, 13 Sep 2016 06:27:05 +0000 (16:27 +1000)]
sched/core: Do not use smp_processor_id() with preempt enabled in smpboot_thread_fn()

We should not be using smp_processor_id() with preempt enabled.

Bug identified and fix provided by Alfred Chen.

Reported-by: Alfred Chen <cchalpha@gmail.com>
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Cc: Alfred Chen <cchalpha@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/2042051.3vvUWIM0vs@hex
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoregmap: fix deadlock on _regmap_raw_write() error path
Nikita Yushchenko [Thu, 22 Sep 2016 09:02:25 +0000 (12:02 +0300)]
regmap: fix deadlock on _regmap_raw_write() error path

Commit 815806e39bf6 ("regmap: drop cache if the bus transfer error")
added a call to regcache_drop_region() to error path in
_regmap_raw_write(). However that path runs with regmap lock taken,
and regcache_drop_region() tries to re-take it, causing a deadlock.
Fix that by calling map->cache_ops->drop() directly.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
8 years agocrypto: rsa-pkcs1pad - Handle leading zero for decryption
Herbert Xu [Thu, 22 Sep 2016 09:04:57 +0000 (17:04 +0800)]
crypto: rsa-pkcs1pad - Handle leading zero for decryption

As the software RSA implementation now produces fixed-length
output, we need to eliminate leading zeros in the calling code
instead.

This patch does just that for pkcs1pad decryption while signature
verification was fixed in an earlier patch.

Fixes: 9b45b7bba3d2 ("crypto: rsa - Generate fixed-length output")
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
8 years agoKEYS: Fix skcipher IV clobbering
Herbert Xu [Tue, 20 Sep 2016 12:35:55 +0000 (20:35 +0800)]
KEYS: Fix skcipher IV clobbering

The IV must not be modified by the skcipher operation so we need
to duplicate it.

Fixes: c3917fd9dfbc ("KEYS: Use skcipher")
Cc: stable@vger.kernel.org
Reported-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
8 years agommc: dw_mmc: fix the spamming log message
Jaehoon Chung [Thu, 22 Sep 2016 05:12:00 +0000 (14:12 +0900)]
mmc: dw_mmc: fix the spamming log message

When there is no Card which is set to "broken-cd", it's displayed a clock
information continuously. Because it's polling for detecting card.
This patch is fixed this problem.

Fixes: 65257a0deed5 ("mmc: dw_mmc: remove UBSAN warning in dw_mci_setup_bus()")
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
8 years agotcp: properly account Fast Open SYN-ACK retrans
Yuchung Cheng [Wed, 21 Sep 2016 23:16:15 +0000 (16:16 -0700)]
tcp: properly account Fast Open SYN-ACK retrans

Since the TFO socket is accepted right off SYN-data, the socket
owner can call getsockopt(TCP_INFO) to collect ongoing SYN-ACK
retransmission or timeout stats (i.e., tcpi_total_retrans,
tcpi_retransmits). Currently those stats are only updated
upon handshake completes. This patch fixes it.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: fix under-accounting retransmit SNMP counters
Yuchung Cheng [Wed, 21 Sep 2016 23:16:14 +0000 (16:16 -0700)]
tcp: fix under-accounting retransmit SNMP counters

This patch fixes these under-accounting SNMP rtx stats
LINUX_MIB_TCPFORWARDRETRANS
LINUX_MIB_TCPFASTRETRANS
LINUX_MIB_TCPSLOWSTARTRETRANS
when retransmitting TSO packets

Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Thu, 22 Sep 2016 06:56:23 +0000 (02:56 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2016-09-21

1) Propagate errors on security context allocation.
   From Mathias Krause.

2) Fix inbound policy checks for inter address family tunnels.
   From Thomas Zeitlhofer.

3) Fix an old memory leak on aead algorithm usage.
   From Ilan Tayari.

4) A recent patch fixed a possible NULL pointer dereference
   but broke the vti6 input path.
   Fix from Nicolas Dichtel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'linux-can-fixes-for-4.8-20160921' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 22 Sep 2016 06:47:46 +0000 (02:47 -0400)]
Merge tag 'linux-can-fixes-for-4.8-20160921' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-09-21

this is another pull request of one patch for the upcoming linux-4.8 release.

Marek Vasut fixes the CAN-FD bit rate switch in the ifi driver by configuring
the transmitter delay.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMAINTAINERS: Update b44 maintainer.
Michael Chan [Wed, 21 Sep 2016 03:33:15 +0000 (23:33 -0400)]
MAINTAINERS: Update b44 maintainer.

Taking over as maintainer since Gary Zambrano is no longer working
for Broadcom.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: get rid of an signed integer overflow in ip_idents_reserve()
Eric Dumazet [Wed, 21 Sep 2016 01:06:17 +0000 (18:06 -0700)]
net: get rid of an signed integer overflow in ip_idents_reserve()

Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()

[] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
[] signed integer overflow:
[] -2117905507 + -695755206 cannot be represented in type 'int'

Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
so that the arithmetics can be done using unsigned int.

Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx4_core: Fix to clean devlink resources
Kamal Heib [Tue, 20 Sep 2016 11:55:31 +0000 (14:55 +0300)]
net/mlx4_core: Fix to clean devlink resources

This patch cleans devlink resources by calling devlink_port_unregister()
to avoid the following issues:

- Kernel panic when triggering reset flow.
- Memory leak due to unfreed resources in mlx4_init_port_info().

Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'wireless-drivers-for-davem-2016-09-20' of git://git.kernel.org/pub/scm...
David S. Miller [Thu, 22 Sep 2016 01:45:19 +0000 (21:45 -0400)]
Merge tag 'wireless-drivers-for-davem-2016-09-20' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.8

iwlwifi

* fix to prevent firmware crash when sending off-channel frames
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobtrfs: ensure that file descriptor used with subvol ioctls is a dir
Jeff Mahoney [Wed, 21 Sep 2016 12:31:29 +0000 (08:31 -0400)]
btrfs: ensure that file descriptor used with subvol ioctls is a dir

If the subvol/snapshot create/destroy ioctls are passed a regular file
with execute permissions set, we'll eventually Oops while trying to do
inode->i_op->lookup via lookup_one_len.

This patch ensures that the file descriptor refers to a directory.

Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules)
Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl)
Cc: <stable@vger.kernel.org> #v2.6.29+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
8 years agoBtrfs: handle quota reserve failure properly
Josef Bacik [Thu, 15 Sep 2016 18:57:48 +0000 (14:57 -0400)]
Btrfs: handle quota reserve failure properly

btrfs/022 was spitting a warning for the case that we exceed the quota.  If we
fail to make our quota reservation we need to clean up our data space
reservation.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Tested-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
8 years agoi2c: mux: pca954x: retry updating the mux selection on failure
Peter Rosin [Wed, 14 Sep 2016 13:24:12 +0000 (15:24 +0200)]
i2c: mux: pca954x: retry updating the mux selection on failure

The cached value of the last selected channel prevents retries on the
next call, even on failure to update the selected channel. Fix that.

Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
8 years agoi2c-eg20t: fix race between i2c init and interrupt enable
Yadi.hu [Sun, 18 Sep 2016 10:52:31 +0000 (18:52 +0800)]
i2c-eg20t: fix race between i2c init and interrupt enable

the eg20t driver call request_irq() function before the pch_base_address,
base address of i2c controller's register, is assigned an effective value.

there is one possible scenario that an interrupt which isn't inside eg20t
arrives immediately after request_irq() is executed when i2c controller
shares an interrupt number with others. since the interrupt handler
pch_i2c_handler() has already active as shared action, it will be called
and read its own register to determine if this interrupt is from itself.

At that moment, since base address of i2c registers is not remapped
in kernel space yet,so the INT handler will access an illegal address
and then a error occurs.

Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
8 years agoMIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
James Hogan [Wed, 7 Sep 2016 12:37:01 +0000 (13:37 +0100)]
MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs

The page structures associated with the vDSO pages in the kernel image
are calculated using virt_to_page(), which uses __pa() under the hood to
find the pfn associated with the virtual address. The vDSO data pointers
however point to kernel symbols, so __pa_symbol() should really be used
instead.

Since there is no equivalent to virt_to_page() which uses __pa_symbol(),
fix init_vdso_image() to work directly with pfns, calculated with
__phys_to_pfn(__pa_symbol(...)).

This issue broke the Malta Enhanced Virtual Addressing (EVA)
configuration which has a non-default implementation of __pa_symbol().
This is because it uses a physical alias so that the kernel executes
from KSeg0 (VA 0x80000000 -> PA 0x00000000), while RAM is provided to
the kernel in the KUSeg range (VA 0x00000000 -> PA 0x80000000) which
uses the same underlying RAM.

Since there are no page structures associated with the low physical
address region, some arbitrary kernel memory would be interpreted as a
page structure for the vDSO pages and badness ensues.

Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/14229/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
8 years agonet: can: ifi: Configure transmitter delay
Marek Vasut [Mon, 19 Sep 2016 19:34:01 +0000 (21:34 +0200)]
net: can: ifi: Configure transmitter delay

Configure the transmitter delay register at +0x1c to correctly handle
the CAN FD bitrate switch (BRS). This moves the SSP (secondary sample
point) to a proper offset, so that the TDC mechanism works and won't
generate error frames on the CAN link.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
8 years agovti6: fix input path
Nicolas Dichtel [Mon, 19 Sep 2016 14:17:57 +0000 (16:17 +0200)]
vti6: fix input path

Since commit 1625f4529957, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).

XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().

A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).

CC: Alexey Kodanev <alexey.kodanev@oracle.com>
Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
8 years agoipmr, ip6mr: return lastuse relative to now
Nikolay Aleksandrov [Tue, 20 Sep 2016 14:17:22 +0000 (16:17 +0200)]
ipmr, ip6mr: return lastuse relative to now

When I introduced the lastuse member I made a subtle error because it was
returned as an absolute value but that is meaningless to user-space as it
doesn't allow to see how old exactly an entry is. Let's make it similar to
how the bridge returns such values and make it relative to "now" (jiffies).
This allows us to show the actual age of the entries and is much more
useful (e.g. user-space daemons can age out entries, iproute2 can display
the lastuse properly).

Fixes: 43b9e1274060 ("net: ipmr/ip6mr: add support for keeping an entry age")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'r8152-phy-fixes'
David S. Miller [Wed, 21 Sep 2016 04:53:53 +0000 (00:53 -0400)]
Merge branch 'r8152-phy-fixes'

Hayes Wang says:

====================
r8152: correct the flow of PHY

First, to enable the PHY as early as possible. Some settings may fail if the
PHY is power down.

Move the other PHY settings to hw_phy_cfg() to make sure the order is correct.

Finally, disable ALDPS and EEE before updating the PHY for RTL8153.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agor8152: disable ALDPS and EEE before setting PHY
hayeswang [Tue, 20 Sep 2016 08:22:09 +0000 (16:22 +0800)]
r8152: disable ALDPS and EEE before setting PHY

Disable ALDPS and EEE to avoid the possible failure when setting the PHY.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agor8152: remove r8153_enable_eee
hayeswang [Tue, 20 Sep 2016 08:22:08 +0000 (16:22 +0800)]
r8152: remove r8153_enable_eee

Remove r8153_enable_eee().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agor8152: move PHY settings to hw_phy_cfg
hayeswang [Tue, 20 Sep 2016 08:22:07 +0000 (16:22 +0800)]
r8152: move PHY settings to hw_phy_cfg

Move the PHY relative settings together to hw_phy_cfg().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agor8152: move enabling PHY
hayeswang [Tue, 20 Sep 2016 08:22:06 +0000 (16:22 +0800)]
r8152: move enabling PHY

Move enabling PHY to init(), otherwise some other settings may fail.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agor8152: move some functions
hayeswang [Tue, 20 Sep 2016 08:22:05 +0000 (16:22 +0800)]
r8152: move some functions

Move the following functions forward.

r8152_mmd_indirect()
r8152_mmd_read()
r8152_mmd_write()
r8152_eee_en()
r8152b_enable_eee()
r8153_eee_en()
r8153_enable_eee()
r8152b_enable_fc()
r8153_aldps_en()

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter
Hariprasad Shenai [Tue, 20 Sep 2016 06:30:52 +0000 (12:00 +0530)]
cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter

We were missing check for 25G and 100G while checking port speed,
which lead to less number of queues getting allocated for 25G & 100G
adapters and leading to low throughput. Adding the missing check for
both NIC and vNIC driver.

Also fixes port advertisement for 25G and 100G in ethtool output.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agopowerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
Russell Currey [Wed, 14 Sep 2016 06:37:17 +0000 (16:37 +1000)]
powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment

Commit 5958d19a143e checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags.  This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.

The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.

Revert these cases to the previous behaviour, adding a new helper function
to do so.  This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.

Fixes: 5958d19a143e ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoMerge tag 'linux-can-fixes-for-4.8-20160919' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 21 Sep 2016 02:46:14 +0000 (22:46 -0400)]
Merge tag 'linux-can-fixes-for-4.8-20160919' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-09-19

this is a pull request of one patch for the upcoming linux-4.8 release.

The patch by Fabio Estevam fixes the pm handling in the flexcan driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Wed, 21 Sep 2016 00:11:19 +0000 (17:11 -0700)]
Merge tag 'usercopy-v4.8-rc8' of git://git./linux/kernel/git/kees/linux

Pull usercopy hardening fix from Kees Cook:
 "Expand the arm64 vmalloc check to include skipping the module space
  too"

* tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: usercopy: Check for module addresses

8 years agofix fault_in_multipages_...() on architectures with no-op access_ok()
Al Viro [Tue, 20 Sep 2016 19:07:42 +0000 (20:07 +0100)]
fix fault_in_multipages_...() on architectures with no-op access_ok()

Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around.  Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).

However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().

Since any wraparound means EFAULT there, the fix is trivial - turn
those

    while (uaddr <= end)
    ...
into

    if (unlikely(uaddr > end))
    return -EFAULT;
    do
    ...
    while (uaddr <= end);

Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: usercopy: Check for module addresses
Laura Abbott [Tue, 20 Sep 2016 15:56:36 +0000 (08:56 -0700)]
mm: usercopy: Check for module addresses

While running a compile on arm64, I hit a memory exposure

usercopy: kernel memory exposure attempt detected from fffffc0000f3b1a8 (buffer_head) (1 bytes)
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:75!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in: ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp
llc ebtable_nat ip6table_security ip6table_raw ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
iptable_security iptable_raw iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
ebtable_filter ebtables ip6table_filter ip6_tables vfat fat xgene_edac
xgene_enet edac_core i2c_xgene_slimpro i2c_core at803x realtek xgene_dma
mdio_xgene gpio_dwapb gpio_xgene_sb xgene_rng mailbox_xgene_slimpro nfsd
auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sdhci_of_arasan
sdhci_pltfm sdhci mmc_core xhci_plat_hcd gpio_keys
CPU: 0 PID: 19744 Comm: updatedb Tainted: G        W 4.8.0-rc3-threadinfo+ #1
Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.12 Aug 12 2016
task: fffffe03df944c00 task.stack: fffffe00d128c000
PC is at __check_object_size+0x70/0x3f0
LR is at __check_object_size+0x70/0x3f0
...
[<fffffc00082b4280>] __check_object_size+0x70/0x3f0
[<fffffc00082cdc30>] filldir64+0x158/0x1a0
[<fffffc0000f327e8>] __fat_readdir+0x4a0/0x558 [fat]
[<fffffc0000f328d4>] fat_readdir+0x34/0x40 [fat]
[<fffffc00082cd8f8>] iterate_dir+0x190/0x1e0
[<fffffc00082cde58>] SyS_getdents64+0x88/0x120
[<fffffc0008082c70>] el0_svc_naked+0x24/0x28

fffffc0000f3b1a8 is a module address. Modules may have compiled in
strings which could get copied to userspace. In this instance, it
looks like "." which matches with a size of 1 byte. Extend the
is_vmalloc_addr check to be is_vmalloc_or_module_addr to cover
all possible cases.

Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
8 years agoirqchip/mips-gic: Fix local interrupts
Paul Burton [Tue, 13 Sep 2016 16:53:35 +0000 (17:53 +0100)]
irqchip/mips-gic: Fix local interrupts

Since the device hierarchy domain was added by commit c98c1822ee13
("irqchip/mips-gic: Add device hierarchy domain"), GIC local interrupts
have been broken.

Users attempting to setup a per-cpu local IRQ, for example the GIC timer
clock events code in drivers/clocksource/mips-gic-timer.c, the
setup_percpu_irq function would refuse with -EINVAL because the GIC
irqchip driver never called irq_set_percpu_devid so the
IRQ_PER_CPU_DEVID flag was never set for the IRQ. This happens because
irq_set_percpu_devid was being called from the gic_irq_domain_map
function which is no longer called.

Doing only that runs into further problems because gic_dev_domain_alloc
set the struct irq_chip for all interrupts, local or shared, to
gic_level_irq_controller despite that only being suitable for shared
interrupts. The typical outcome of this is that gic_level_irq_controller
callback functions are called for local interrupts, and then hwirq
number calculations overflow & the driver ends up attempting to access
some invalid register with an address calculated from an invalid hwirq
number. Best case scenario is that this then leads to a bus error. This
is fixed by abstracting the setup of the hwirq & chip to a new function
gic_setup_dev_chip which is used by both the root GIC IRQ domain & the
device domain.

Finally, decoding local interrupts failed because gic_dev_domain_alloc
only called irq_domain_alloc_irqs_parent for shared interrupts. Local
ones were therefore never associated with hwirqs in the root GIC IRQ
domain and the virq in gic_handle_local_int would always be 0. This is
fixed by calling irq_domain_alloc_irqs_parent unconditionally & having
gic_irq_domain_alloc handle both local & shared interrupts, which is
easy due to the aforementioned abstraction of chip setup into
gic_setup_dev_chip.

This fixes use of the MIPS GIC timer for clock events, which has been
broken since c98c1822ee13 ("irqchip/mips-gic: Add device hierarchy
domain") but hadn't been noticed due to a silent fallback to the MIPS
coprocessor 0 count/compare clock events device.

Fixes: c98c1822ee13 ("irqchip/mips-gic: Add device hierarchy domain")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Qais Yousef <qsyousef@gmail.com>
Cc: stable@vger.kernel.org
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/20160913165335.31389-1-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 years agofs/proc/kcore.c: Add bounce buffer for ktext data
Jiri Olsa [Thu, 8 Sep 2016 07:57:08 +0000 (09:57 +0200)]
fs/proc/kcore.c: Add bounce buffer for ktext data

We hit hardened usercopy feature check for kernel text access by reading
kcore file:

  usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes)
  kernel BUG at mm/usercopy.c:75!

Bypassing this check for kcore by adding bounce buffer for ktext data.

Reported-by: Steve Best <sbest@redhat.com>
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agofs/proc/kcore.c: Make bounce buffer global for read
Jiri Olsa [Thu, 8 Sep 2016 07:57:07 +0000 (09:57 +0200)]
fs/proc/kcore.c: Make bounce buffer global for read

Next patch adds bounce buffer for ktext area, so it's
convenient to have single bounce buffer for both
vmalloc/module and ktext cases.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming...
Ingo Molnar [Tue, 20 Sep 2016 14:56:56 +0000 (16:56 +0200)]
Merge tag 'efi-urgent' of git://git./linux/kernel/git/mfleming/efi into efi/urgent

Pull EFI fixes from Matt Fleming:

 * Fix a boot hang on large memory machines (multiple terabyte) caused
   by type conversion errors in the x86 PAT code (Matt Fleming)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/x86/intel/bts: Make sure debug store is valid
Sebastian Andrzej Siewior [Tue, 20 Sep 2016 13:12:21 +0000 (15:12 +0200)]
perf/x86/intel/bts: Make sure debug store is valid

Since commit 4d4c47412464 ("perf/x86/intel/bts: Fix BTS PMI detection")
my box goes boom on boot:

| .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
| BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
| IP: [<ffffffff8100c463>] intel_bts_interrupt+0x43/0x130
| Call Trace:
|  <NMI> d [<ffffffff8100b341>] intel_pmu_handle_irq+0x51/0x4b0
|  [<ffffffff81004d47>] perf_event_nmi_handler+0x27/0x40

This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.

Fixes: 4d4c47412464 ("perf/x86/intel/bts: Fix BTS PMI detection")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vince@deater.net
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20160920131220.xg5pbdjtznszuyzb@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>