Linus Torvalds [Fri, 4 Aug 2017 16:59:24 +0000 (09:59 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Either my email ate everything or everyone is on holidays, either way
all I can find is some lonely AMD fixes"
[ Europe might be on vacation, and the Pacific NW is too hot for work. ]
* tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: Use list_del_init in amdgpu_mn_unregister
drm/amdgpu: Fix undue fallthroughs in golden registers initialization
drm/amdgpu: fix header on gfx9 clear state
Linus Torvalds [Fri, 4 Aug 2017 16:56:54 +0000 (09:56 -0700)]
Merge tag 'powerpc-4.13-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes for recently merged code:
- a fix for the _PAGE_DEVMAP support, which was breaking KVM on
Power9 radix
- avoid a (harmless) lockdep warning in the early SMP code
- return failure for some uses of dma_set_mask() rather than falling
back to 32-bits
- fix stack setup in watchdog soft_nmi_common() to use emergency
stack
- fix of_irq_to_resource() error check in of_fsl_spi_probe()
Two fixes going to stable:
- fix saving of Transactional Memory SPRs in core dump
- fix __check_irq_replay missing decrementer interrupt
And two misc:
- fix 64-bit boot wrapper build with non-biarch compiler
- work around a POWER9 PMU hang after state-loss idle
Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo
Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver
O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner"
* tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix __check_irq_replay missing decrementer interrupt
powerpc/perf: POWER9 PMU stops after idle workaround
powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
powerpc/tm: Fix saving of TM SPRs in core dump
powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
Nicholas Piggin [Tue, 1 Aug 2017 13:59:28 +0000 (23:59 +1000)]
powerpc/64: Fix __check_irq_replay missing decrementer interrupt
If the decrementer wraps again and de-asserts the decrementer
exception while hard-disabled, __check_irq_replay() has a test to
notice the wrap when interrupts are re-enabled.
The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
because the decrementer interrupt was always the first one checked
after clearing the hard disable flag, but HMI check was moved ahead of
that, which introduced this bug.
This can cause a missed decrementer interrupt if we soft-disable
interrupts then take an HMI which is recorded in irq_happened, then
hard-disable interrupts for > 4s to wrap the decrementer.
Fixes:
e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 20 Jul 2017 01:53:22 +0000 (11:53 +1000)]
powerpc/perf: POWER9 PMU stops after idle workaround
POWER9 DD2 PMU can stop after a state-loss idle in some conditions.
A solution is to set then clear MMCRA[60] after wake from state-loss
idle. MMCRA[60] is a non-architected bit, see the user manual for
details.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dave Airlie [Fri, 4 Aug 2017 01:43:14 +0000 (11:43 +1000)]
Merge branch 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a few small fixes for 4.13.
* 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: Use list_del_init in amdgpu_mn_unregister
drm/amdgpu: Fix undue fallthroughs in golden registers initialization
drm/amdgpu: fix header on gfx9 clear state
Linus Torvalds [Thu, 3 Aug 2017 22:25:14 +0000 (15:25 -0700)]
Merge tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- SPAPR/EEH config build fix (Murilo Opsfelder Araujo)
- Fix possible device lock deadlock (Alex Williamson)
- Correctly size integrated endpoint PCIe capabilities (Alex
Williamson)
* tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio:
vfio/pci: Fix handling of RC integrated endpoint PCIe capability size
vfio/pci: Use pci_try_reset_function() on initial open
include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH
Linus Torvalds [Thu, 3 Aug 2017 21:58:13 +0000 (14:58 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"15 fixes"
[ This does not merge the "fortify: use WARN instead of BUG for now"
patch, which needs a bit of extra work to build cleanly with all
configurations. Arnd is on it. - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
ocfs2: don't clear SGID when inheriting ACLs
mm: allow page_cache_get_speculative in interrupt context
userfaultfd: non-cooperative: flush event_wqh at release time
ipc: add missing container_of()s for randstruct
cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
userfaultfd_zeropage: return -ENOSPC in case mm has gone
mm: take memory hotplug lock within numa_zonelist_order_handler()
mm/page_io.c: fix oops during block io poll in swapin path
zram: do not free pool->size_class
kthread: fix documentation build warning
kasan: avoid -Wmaybe-uninitialized warning
userfaultfd: non-cooperative: notify about unmap of destination during mremap
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
pid: kill pidhash_size in pidhash_init()
mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
Linus Torvalds [Thu, 3 Aug 2017 19:37:12 +0000 (12:37 -0700)]
Merge tag 'acpi-4.13-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two issues in the ACPI SoC drivers (Intel LPSS and AMD APD),
a crash in the PCC mailbox initialization code and a WDAT watchdog
initialization failure.
Specifics:
- Fix a device ID of Hisilicon Hip07/08 in the ACPI APD (AMD SoC)
driver (Hanjun Guo).
- Fix list corruption (introduced during the 4.11 cycle) in the ACPI
LPSS (Intel SoC) driver (Hans de Goede).
- Fix PCC mailbox handling code crash during initialization when PCCT
is not present and PCC channel 0 is requested (Hoan Tran).
- Fix a WDAT watchdog initialization issue causing platform device
creation to fail due to partially overlapping address ranges in
resources (Ryan Kennedy)"
* tag 'acpi-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: APD: Fix HID for Hisilicon Hip07/08
mailbox: pcc: Fix crash when request PCC channel 0
ACPI / watchdog: Fix init failure with overlapping register regions
ACPI / LPSS: Only call pwm_add_table() for the first PWM controller
Linus Torvalds [Thu, 3 Aug 2017 19:32:49 +0000 (12:32 -0700)]
Merge tag 'pm-4.13-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one introduced recently and one related
to recent changes, fix cpufreq documentation, fix up recently added
code in the Thunderbolt driver and update runtime PM framework
documentation.
Specifics:
- Fix the handling of the scaling_cur_freq cpufreq policy attribute
on x86 systems with the MPERF/APERF registers present to make it
behave more as expected after recent changes (Rafael Wysocki).
- Drop a leftover callback from the intel_pstate driver which also
prevents the cpuinfo_cur_freq cpufreq policy attribute from being
incorrectly exposed when intel_pstate works in the active mode
(Rafael Wysocki).
- Add a missing piece describing the cpuinfo_cur_freq policy
attribute to cpufreq documentation (Rafael Wysocki).
- Fix up a recently added part of the Thunderbolt driver to avoid
aborting system suspends if its mailbox commands time out (Rafael
Wysocki).
- Update device runtime PM framework documentation to reflect the
current behavior of the code (Johan Hovold)"
* tag 'pm-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thunderbolt: icm: Ignore mailbox errors in icm_suspend()
cpufreq: x86: Make scaling_cur_freq behave more as expected
PM / runtime: Document new pm_runtime_set_suspended() constraint
cpufreq: docs: Add missing cpuinfo_cur_freq description
cpufreq: intel_pstate: Drop ->get from intel_pstate structure
Rafael J. Wysocki [Thu, 3 Aug 2017 18:30:18 +0000 (20:30 +0200)]
Merge branches 'acpi-soc', 'acpi-wdat' and 'acpi-cppc'
* acpi-soc:
ACPI: APD: Fix HID for Hisilicon Hip07/08
ACPI / LPSS: Only call pwm_add_table() for the first PWM controller
* acpi-wdat:
ACPI / watchdog: Fix init failure with overlapping register regions
* acpi-cppc:
mailbox: pcc: Fix crash when request PCC channel 0
Rafael J. Wysocki [Thu, 3 Aug 2017 18:29:45 +0000 (20:29 +0200)]
Merge branches 'pm-core' and 'pm-misc'
* pm-core:
PM / runtime: Document new pm_runtime_set_suspended() constraint
* pm-misc:
thunderbolt: icm: Ignore mailbox errors in icm_suspend()
Rafael J. Wysocki [Thu, 3 Aug 2017 18:29:24 +0000 (20:29 +0200)]
Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate'
* pm-cpufreq-x86:
cpufreq: x86: Make scaling_cur_freq behave more as expected
* pm-cpufreq-docs:
cpufreq: docs: Add missing cpuinfo_cur_freq description
* intel_pstate:
cpufreq: intel_pstate: Drop ->get from intel_pstate structure
Linus Torvalds [Thu, 3 Aug 2017 03:56:44 +0000 (20:56 -0700)]
Merge tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"Two fixes from Trond this time, now that he's back from his vacation.
The first is a stable fix for the EXCHANGE_ID issue on the mailing
list, and the other fixes a double-free situation that he found at the
same time.
Stable fix:
- Fix EXCHANGE_ID corrupt verifier issue
Other fix:
- Fix double frees in nfs4_test_session_trunk()"
* tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Fix double frees in nfs4_test_session_trunk()
NFSv4: Fix EXCHANGE_ID corrupt verifier issue
Annie Cherkaev [Sat, 15 Jul 2017 21:08:58 +0000 (15:08 -0600)]
isdn/i4l: fix buffer overflow
This fixes a potential buffer overflow in isdn_net.c caused by an
unbounded strcpy.
[ ISDN seems to be effectively unmaintained, and the I4L driver in
particular is long deprecated, but in case somebody uses this..
- Linus ]
Signed-off-by: Jiten Thakkar <jitenmt@gmail.com>
Signed-off-by: Annie Cherkaev <annie.cherk@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 2 Aug 2017 20:32:30 +0000 (13:32 -0700)]
ocfs2: don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.
Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
into ocfs2_iop_set_acl(). That way the function will not be called when
inheriting ACLs which is what we want as it prevents SGID bit clearing
and the mode has been properly set by posix_acl_create() anyway. Also
posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
mode itself.
Fixes:
073931017b4 ("posix_acl: Clear SGID bit when setting file permissions")
Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kan Liang [Wed, 2 Aug 2017 20:32:27 +0000 (13:32 -0700)]
mm: allow page_cache_get_speculative in interrupt context
Kernel panic when calling the IRQ-safe __get_user_pages_fast in NMI
handler.
The bug was introduced by commit
2947ba054a4d ("x86/mm/gup: Switch GUP
to the generic get_user_page_fast() implementation").
The original x86 __get_user_page_fast used plain get_page() or
page_ref_add(). However, the generic __get_user_page_fast uses
page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).
There is no reason to prevent page_cache_get_speculative from using in
interrupt context. According to the author, putting a BUG_ON there is
just because the code is not verifying correctness of interrupt races.
I did some tests in interrupt context. There is no issue found.
Removing VM_BUG_ON(in_interrupt()) for page_cache_get_speculative().
Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
Fixes:
2947ba054a4d ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Wed, 2 Aug 2017 20:32:24 +0000 (13:32 -0700)]
userfaultfd: non-cooperative: flush event_wqh at release time
There may still be threads waiting on event_wqh at the time the
userfault file descriptor is closed. Flush the events wait-queue to
prevent waiting threads from hanging.
Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes:
9cd75c3cd4c3d ("userfaultfd: non-cooperative: add ability to report
non-PF events from uffd descriptor")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Wed, 2 Aug 2017 20:32:21 +0000 (13:32 -0700)]
ipc: add missing container_of()s for randstruct
When building with the randstruct gcc plugin, the layout of the IPC
structs will be randomized, which requires any sub-structure accesses to
use container_of(). The proc display handlers were missing the needed
container_of()s since the iterator is passing in the top-level struct
kern_ipc_perm.
This would lead to crashes when running the "lsipc" program after the
system had IPC registered (e.g. after starting up Gnome):
general protection fault: 0000 [#1] PREEMPT SMP
...
RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
...
Call Trace:
sysvipc_shm_proc_show+0x5e/0x150
sysvipc_proc_show+0x1a/0x30
seq_read+0x2e9/0x3f0
...
Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
Fixes:
3859a271a003 ("randstruct: Mark various structs for randomization")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dima Zavin [Wed, 2 Aug 2017 20:32:18 +0000 (13:32 -0700)]
cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
In codepaths that use the begin/retry interface for reading
mems_allowed_seq with irqs disabled, there exists a race condition that
stalls the patch process after only modifying a subset of the
static_branch call sites.
This problem manifested itself as a deadlock in the slub allocator,
inside get_any_partial. The loop reads mems_allowed_seq value (via
read_mems_allowed_begin), performs the defrag operation, and then
verifies the consistency of mem_allowed via the read_mems_allowed_retry
and the cookie returned by xxx_begin.
The issue here is that both begin and retry first check if cpusets are
enabled via cpusets_enabled() static branch. This branch can be
rewritted dynamically (via cpuset_inc) if a new cpuset is created. The
x86 jump label code fully synchronizes across all CPUs for every entry
it rewrites. If it rewrites only one of the callsites (specifically the
one in read_mems_allowed_retry) and then waits for the
smp_call_function(do_sync_core) to complete while a CPU is inside the
begin/retry section with IRQs off and the mems_allowed value is changed,
we can hang.
This is because begin() will always return 0 (since it wasn't patched
yet) while retry() will test the 0 against the actual value of the seq
counter.
The fix is to use two different static keys: one for begin
(pre_enable_key) and one for retry (enable_key). In cpuset_inc(), we
first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
always return a valid seqcount if are enabling cpusets. Similarly, when
disabling cpusets via cpuset_dec(), we first ensure that callers of
cpuset_mems_allowed_retry() will start ignoring the seqcount value
before we let cpuset_mems_allowed_begin() return 0.
The relevant stack traces of the two stuck threads:
CPU: 1 PID: 1415 Comm: mkdir Tainted: G L 4.9.36-00104-g540c51286237 #4
Hardware name: Default string Default string/Hardware, BIOS 4.29.1-
20170526215256 05/26/2017
task:
ffff8817f9c28000 task.stack:
ffffc9000ffa4000
RIP: smp_call_function_many+0x1f9/0x260
Call Trace:
smp_call_function+0x3b/0x70
on_each_cpu+0x2f/0x90
text_poke_bp+0x87/0xd0
arch_jump_label_transform+0x93/0x100
__jump_label_update+0x77/0x90
jump_label_update+0xaa/0xc0
static_key_slow_inc+0x9e/0xb0
cpuset_css_online+0x70/0x2e0
online_css+0x2c/0xa0
cgroup_apply_control_enable+0x27f/0x3d0
cgroup_mkdir+0x2b7/0x420
kernfs_iop_mkdir+0x5a/0x80
vfs_mkdir+0xf6/0x1a0
SyS_mkdir+0xb7/0xe0
entry_SYSCALL_64_fastpath+0x18/0xad
...
CPU: 2 PID: 1 Comm: init Tainted: G L 4.9.36-00104-g540c51286237 #4
Hardware name: Default string Default string/Hardware, BIOS 4.29.1-
20170526215256 05/26/2017
task:
ffff8818087c0000 task.stack:
ffffc90000030000
RIP: int3+0x39/0x70
Call Trace:
<#DB> ? ___slab_alloc+0x28b/0x5a0
<EOE> ? copy_process.part.40+0xf7/0x1de0
__slab_alloc.isra.80+0x54/0x90
copy_process.part.40+0xf7/0x1de0
copy_process.part.40+0xf7/0x1de0
kmem_cache_alloc_node+0x8a/0x280
copy_process.part.40+0xf7/0x1de0
_do_fork+0xe7/0x6c0
_raw_spin_unlock_irq+0x2d/0x60
trace_hardirqs_on_caller+0x136/0x1d0
entry_SYSCALL_64_fastpath+0x5/0xad
do_syscall_64+0x27/0x350
SyS_clone+0x19/0x20
do_syscall_64+0x60/0x350
entry_SYSCALL64_slow_path+0x25/0x25
Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
Fixes:
46e700abc44c ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
Signed-off-by: Dima Zavin <dmitriyz@waymo.com>
Reported-by: Cliff Spradlin <cspradlin@waymo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Wed, 2 Aug 2017 20:32:15 +0000 (13:32 -0700)]
userfaultfd_zeropage: return -ENOSPC in case mm has gone
In the non-cooperative userfaultfd case, the process exit may race with
outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
instead of -EINVAL when mm is already gone will allow uffd monitor to
distinguish this case from other error conditions.
Unfortunately I overlooked userfaultfd_zeropage when updating
userfaultd_copy().
Link: http://lkml.kernel.org/r/1501136819-21857-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes:
96333187ab162 ("userfaultfd_copy: return -ENOSPC in case mm has gone")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Carstens [Wed, 2 Aug 2017 20:32:12 +0000 (13:32 -0700)]
mm: take memory hotplug lock within numa_zonelist_order_handler()
Andre Wild reported the following warning:
WARNING: CPU: 2 PID: 1205 at kernel/cpu.c:240 lockdep_assert_cpus_held+0x4c/0x60
Modules linked in:
CPU: 2 PID: 1205 Comm: bash Not tainted 4.13.0-rc2-00022-gfd2b2c57ec20 #10
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task:
00000000701d8100 task.stack:
0000000073594000
Krnl PSW :
0704f00180000000 0000000000145e24 (lockdep_assert_cpus_held+0x4c/0x60)
...
Call Trace:
lockdep_assert_cpus_held+0x42/0x60)
stop_machine_cpuslocked+0x62/0xf0
build_all_zonelists+0x92/0x150
numa_zonelist_order_handler+0x102/0x150
proc_sys_call_handler.isra.12+0xda/0x118
proc_sys_write+0x34/0x48
__vfs_write+0x3c/0x178
vfs_write+0xbc/0x1a0
SyS_write+0x66/0xc0
system_call+0xc4/0x2b0
locks held by bash/1205:
#0: (sb_writers#4){.+.+.+}, at: vfs_write+0xa6/0x1a0
#1: (zl_order_mutex){+.+...}, at: numa_zonelist_order_handler+0x44/0x150
#2: (zonelists_mutex){+.+...}, at: numa_zonelist_order_handler+0xf4/0x150
Last Breaking-Event-Address:
lockdep_assert_cpus_held+0x48/0x60
This can be easily triggered with e.g.
echo n > /proc/sys/vm/numa_zonelist_order
In commit
3f906ba23689a ("mm/memory-hotplug: switch locking to a percpu
rwsem") memory hotplug locking was changed to fix a potential deadlock.
This also switched the stop_machine() invocation within
build_all_zonelists() to stop_machine_cpuslocked() which now expects
that online cpus are locked when being called.
This assumption is not true if build_all_zonelists() is being called
from numa_zonelist_order_handler().
In order to fix this simply add a mem_hotplug_begin()/mem_hotplug_done()
pair to numa_zonelist_order_handler().
Link: http://lkml.kernel.org/r/20170726111738.38768-1-heiko.carstens@de.ibm.com
Fixes:
3f906ba23689a ("mm/memory-hotplug: switch locking to a percpu rwsem")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Andre Wild <wild@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Wed, 2 Aug 2017 20:32:09 +0000 (13:32 -0700)]
mm/page_io.c: fix oops during block io poll in swapin path
When a thread is OOM-killed during swap_readpage() operation, an oops
occurs because end_swap_bio_read() is calling wake_up_process() based on
an assumption that the thread which called swap_readpage() is still
alive.
Out of memory: Kill process 525 (polkitd) score 0 or sacrifice child
Killed process 525 (polkitd) total-vm:528128kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
oom_reaper: reaped process 525 (polkitd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp ppdev pcspkr vmw_balloon sg shpchp vmw_vmci parport_pc parport i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi vmwgfx ahci libahci drm_kms_helper ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi scsi_transport_spi ttm e1000 mptscsih drm mptbase i2c_core libata serio_raw
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc2-next-
20170725 #129
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task:
ffffffffb7c16500 task.stack:
ffffffffb7c00000
RIP: 0010:__lock_acquire+0x151/0x12f0
Call Trace:
<IRQ>
lock_acquire+0x59/0x80
_raw_spin_lock_irqsave+0x3b/0x4f
try_to_wake_up+0x3b/0x410
wake_up_process+0x10/0x20
end_swap_bio_read+0x6f/0xf0
bio_endio+0x92/0xb0
blk_update_request+0x88/0x270
scsi_end_request+0x32/0x1c0
scsi_io_completion+0x209/0x680
scsi_finish_command+0xd4/0x120
scsi_softirq_done+0x120/0x140
__blk_mq_complete_request_remote+0xe/0x10
flush_smp_call_function_queue+0x51/0x120
generic_smp_call_function_single_interrupt+0xe/0x20
smp_trace_call_function_single_interrupt+0x22/0x30
smp_call_function_single_interrupt+0x9/0x10
call_function_single_interrupt+0xa7/0xb0
</IRQ>
RIP: 0010:native_safe_halt+0x6/0x10
default_idle+0xe/0x20
arch_cpu_idle+0xa/0x10
default_idle_call+0x1e/0x30
do_idle+0x187/0x200
cpu_startup_entry+0x6e/0x70
rest_init+0xd0/0xe0
start_kernel+0x456/0x477
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0xf7/0x11a
secondary_startup_64+0xa5/0xa5
Code: c3 49 81 3f 20 9e 0b b8 41 bc 00 00 00 00 44 0f 45 e2 83 fe 01 0f 87 62 ff ff ff 89 f0 49 8b 44 c7 08 48 85 c0 0f 84 52 ff ff ff <f0> ff 80 98 01 00 00 8b 3d 5a 49 c4 01 45 8b b3 18 0c 00 00 85
RIP: __lock_acquire+0x151/0x12f0 RSP:
ffffa01f39e03c50
---[ end trace
6c441db499169b1e ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Fix it by holding a reference to the thread.
[akpm@linux-foundation.org: add comment]
Fixes:
23955622ff8d231b ("swap: add block io poll in swapin path")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Shaohua Li <shli@fb.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Wed, 2 Aug 2017 20:32:03 +0000 (13:32 -0700)]
zram: do not free pool->size_class
Mike reported kernel goes oops with ltp:zram03 testcase.
zram: Added device: zram0
zram0: detected capacity change from 0 to
107374182400
BUG: unable to handle kernel paging request at
0000306d61727a77
IP: zs_map_object+0xb9/0x260
PGD 0
P4D 0
Oops: 0000 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
CPU: 6 PID: 12356 Comm: swapon Tainted: G E 4.13.0.g87b2c3f-default #194
Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010
task:
ffff880158d2c4c0 task.stack:
ffffc90001680000
RIP: 0010:zs_map_object+0xb9/0x260
Call Trace:
zram_bvec_rw.isra.26+0xe8/0x780 [zram]
zram_rw_page+0x6e/0xa0 [zram]
bdev_read_page+0x81/0xb0
do_mpage_readpage+0x51a/0x710
mpage_readpages+0x122/0x1a0
blkdev_readpages+0x1d/0x20
__do_page_cache_readahead+0x1b2/0x270
ondemand_readahead+0x180/0x2c0
page_cache_sync_readahead+0x31/0x50
generic_file_read_iter+0x7e7/0xaf0
blkdev_read_iter+0x37/0x40
__vfs_read+0xce/0x140
vfs_read+0x9e/0x150
SyS_read+0x46/0xa0
entry_SYSCALL_64_fastpath+0x1a/0xa5
Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 <8b> 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
RIP: zs_map_object+0xb9/0x260 RSP:
ffffc90001683988
CR2:
0000306d61727a77
He bisected the problem is [1].
After commit
cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size
handling"), zram doesn't use double pointer for pool->size_class any
more in zs_create_pool so counter function zs_destroy_pool don't need to
free it, either.
Otherwise, it does kfree wrong address and then, kernel goes Oops.
Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
Fixes:
cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size handling")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Corbet [Wed, 2 Aug 2017 20:32:01 +0000 (13:32 -0700)]
kthread: fix documentation build warning
The kerneldoc comment for kthread_create() had an incorrect argument
name, leading to a warning in the docs build.
Correct it, and make one more small step toward a warning-free build.
Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Wed, 2 Aug 2017 20:31:58 +0000 (13:31 -0700)]
kasan: avoid -Wmaybe-uninitialized warning
gcc-7 produces this warning:
mm/kasan/report.c: In function 'kasan_report':
mm/kasan/report.c:351:3: error: 'info.first_bad_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
print_shadow_for_address(info->first_bad_addr);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/kasan/report.c:360:27: note: 'info.first_bad_addr' was declared here
The code seems fine as we only print info.first_bad_addr when there is a
shadow, and we always initialize it in that case, but this is relatively
hard for gcc to figure out after the latest rework.
Adding an intialization to the most likely value together with the other
struct members shuts up that warning.
Fixes:
b235b9808664 ("kasan: unify report headers")
Link: https://patchwork.kernel.org/patch/9641417/
Link: http://lkml.kernel.org/r/20170725152739.4176967-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Alexander Potapenko <glider@google.com>
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Wed, 2 Aug 2017 20:31:55 +0000 (13:31 -0700)]
userfaultfd: non-cooperative: notify about unmap of destination during mremap
When mremap is called with MREMAP_FIXED it unmaps memory at the
destination address without notifying userfaultfd monitor.
If the destination were registered with userfaultfd, the monitor has no
way to distinguish between the old and new ranges and to properly relate
the page faults that would occur in the destination region.
Fixes:
897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps")
Link: http://lkml.kernel.org/r/1500276876-3350-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Wed, 2 Aug 2017 20:31:52 +0000 (13:31 -0700)]
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.
He described the race as follows:
CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]
user writes using cached RW PTE
...
try_to_unmap_flush()
The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.
For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access. For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB. However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.
Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption. In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch. Other users such as gup as a race with
reclaim looks just at PTEs. huge page variants should be ok as they
don't race with reclaim. mincore only looks at PTEs. userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.
Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected. His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.
[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org> [v4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Wed, 2 Aug 2017 20:31:50 +0000 (13:31 -0700)]
pid: kill pidhash_size in pidhash_init()
After commit
3d375d78593c ("mm: update callers to use HASH_ZERO flag"),
drop unused pidhash_size in pidhash_init().
Link: http://lkml.kernel.org/r/1500389267-49222-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Pavel Tatashin <Pasha.Tatashin@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Jordan [Wed, 2 Aug 2017 20:31:47 +0000 (13:31 -0700)]
mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
Commit
9a291a7c9428 ("mm/hugetlb: report -EHWPOISON not -EFAULT when
FOLL_HWPOISON is specified") causes __get_user_pages to ignore certain
errors from follow_hugetlb_page. After such error, __get_user_pages
subsequently calls faultin_page on the same VMA and start address that
follow_hugetlb_page failed on instead of returning the error immediately
as it should.
In follow_hugetlb_page, when hugetlb_fault returns a value covered under
VM_FAULT_ERROR, follow_hugetlb_page returns it without setting nr_pages
to 0 as __get_user_pages expects in this case, which causes the
following to happen in __get_user_pages: the "while (nr_pages)" check
succeeds, we skip the "if (!vma..." check because we got a VMA the last
time around, we find no page with follow_page_mask, and we call
faultin_page, which calls hugetlb_fault for the second time.
This issue also slightly changes how __get_user_pages works. Before, it
only returned error if it had made no progress (i = 0). But now,
follow_hugetlb_page can clobber "i" with an error code since its new
return path doesn't check for progress. So if "i" is nonzero before a
failing call to follow_hugetlb_page, that indication of progress is lost
and __get_user_pages can return error even if some pages were
successfully pinned.
To fix this, change follow_hugetlb_page so that it updates nr_pages,
allowing __get_user_pages to fail immediately and restoring the "error
only if no progress" behavior to __get_user_pages.
Tested that __get_user_pages returns when expected on error from
hugetlb_fault in follow_hugetlb_page.
Fixes:
9a291a7c9428 ("mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified")
Link: http://lkml.kernel.org/r/1500406795-58462-1-git-send-email-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: James Morse <james.morse@arm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: <stable@vger.kernel.org> [4.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Felix Kuehling [Wed, 2 Aug 2017 02:34:55 +0000 (22:34 -0400)]
drm/amdgpu: Use list_del_init in amdgpu_mn_unregister
Otherwise bo->shadow_list (which is aliased by bo->mn_list) will not
appear empty in amdgpu_ttm_bo_destroy and cause an oops when freeing
former userptr BOs.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jean Delvare [Sun, 30 Jul 2017 08:18:25 +0000 (10:18 +0200)]
drm/amdgpu: Fix undue fallthroughs in golden registers initialization
As I was staring at the si_init_golden_registers code, I noticed that
the Pitcairn initialization silently falls through the Cape Verde
initialization, and the Oland initialization falls through the Hainan
initialization. However there is no comment stating that this is
intentional, and the radeon driver doesn't have any such fallthrough,
so I suspect this is not supposed to happen.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes:
62a37553414a ("drm/amdgpu: add si implementation v10")
Cc: Ken Wang <Qingqing.Wang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Marek Olšák" <maraeo@gmail.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Linus Torvalds [Wed, 2 Aug 2017 16:43:28 +0000 (09:43 -0700)]
Merge tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
"Fix two bugs under error or abnormal usage conditions. Correct a
config dependency:
dell-wmi:
- Fix driver interface version query
wmi:
- Fix error handling in acpi_wmi_init()
peaq-wmi:
- select INPUT_POLLDEV"
* tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-wmi: Fix driver interface version query
platform/x86: wmi: Fix error handling in acpi_wmi_init()
platform/x86: peaq-wmi: select INPUT_POLLDEV
Linus Torvalds [Wed, 2 Aug 2017 15:43:19 +0000 (08:43 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"These seven patches are mostly minor build, Kconfig and error leg
fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedi: Fix return code in qedi_ep_connect()
scsi: lpfc: fix linking against modular NVMe support
scsi: scsi_transport_fc: return -EBUSY for deleted vport
scsi: libcxgbi: add check for valid cxgbi_task_data
scsi: aic7xxx: fix firmware build with O=path
scsi: megaraid_sas: fix memleak in megasas_alloc_cmdlist_fusion
scsi: qedi: Add ISCSI_BOOT_SYSFS to Kconfig
Trond Myklebust [Tue, 1 Aug 2017 20:02:48 +0000 (16:02 -0400)]
NFSv4: Fix double frees in nfs4_test_session_trunk()
rpc_clnt_add_xprt() expects the callback function to be synchronous, and
expects to release the transport and switch references itself.
Fixes:
04fa2c6bb51b1 ("NFS pnfs data server multipath session trunking")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Sergei Shtylyov [Sat, 29 Jul 2017 19:52:09 +0000 (22:52 +0300)]
powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Freescale MPC832x RDB board
code still only regards 0 as a failure indication -- fix it up.
Fixes:
7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Andy Lutomirski [Tue, 1 Aug 2017 15:37:26 +0000 (08:37 -0700)]
platform/x86: dell-wmi: Fix driver interface version query
When I converted dell-wmi to the new bus infrastructure, I left the
call to dell_wmi_check_descriptor_buffer() in dell_wmi_init(). This
could cause two problems:
- An error message when loading the driver on a system without
dell-wmi. We'd try to read the event descriptor even if the WMI
GUID wasn't there.
- A possible race if dell-wmi was loaded manually before wmi was
fully initialized.
Fix it by moving the call to the probe function where it belongs.
Fixes:
bff589be59c5 ("platform/x86: dell-wmi: Convert to the WMI bus infrastructure")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Trond Myklebust [Tue, 1 Aug 2017 20:02:47 +0000 (16:02 -0400)]
NFSv4: Fix EXCHANGE_ID corrupt verifier issue
The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit
8d89bd70bc939. If we interrrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.
Fixes:
8d89bd70bc939 ("NFS setup async exchange_id")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Tue, 1 Aug 2017 20:20:24 +0000 (13:20 -0700)]
Merge branch 'parisc-4.13-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parsic fixes from Helge Deller:
- Our cache flushing code ran into a BUG in case context is not
current. Fix it by flushing the whole cache in such rare situations
(by Dave Anglin).
- Fix a "sleeping function called from invalid context BUG" in our
pdc_stable driver by rearranging our locks (by James Bottomley)
- The thread and irq stacks require more than 16 KB since kernel 4.11.
Increase both to 32 KB.
- Define CONFIG_CPU_BIG_ENDIAN unconditionally on parisc to avoid wrong
behaviour in qrwlock functions (by Babu Moger).
* 'parisc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Define CONFIG_CPU_BIG_ENDIAN
parisc: pdc_stable: Fix locking when creating sysfs links
parisc: Increase thread and stack size to 32kb
parisc: Handle vma's whose context is not current in flush_cache_range
Linus Torvalds [Tue, 1 Aug 2017 05:36:42 +0000 (22:36 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Handle notifier registry failures properly in tun/tap driver, from
Tonghao Zhang.
2) Fix bpf verifier handling of subtraction bounds and add a testcase
for this, from Edward Cree.
3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.
4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
from Cong Wang.
5) Fix SElinux regression due to recent UDP optimizations, from Paolo
Abeni.
6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
paths, fix from Stefano Brivio.
7) Fix some mem leaks in dccp, from Xin Long.
8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
Bergmann.
9) Mac address length check and buffer size fixes from Cong Wang.
10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.
11) Fix return value when copy_from_user() fails in
bpf_prog_get_info_by_fd(), from Daniel Borkmann.
12) Handle PHY_HALTED properly in phy library state machine, from
Florian Fainelli.
13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.
14) Fix truesize calculation in virtio_net which led to performance
regressions, from Michael S Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
samples/bpf: fix bpf tunnel cleanup
udp6: fix jumbogram reception
ppp: Fix a scheduling-while-atomic bug in del_chan
Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
virtio_net: fix truesize for mergeable buffers
mv643xx_eth: fix of_irq_to_resource() error check
MAINTAINERS: Add more files to the PHY LIBRARY section
ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
sunhme: fix up GREG_STAT and GREG_IMASK register offsets
bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
tcp: avoid bogus gcc-7 array-bounds warning
net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
bpf: don't indicate success when copy_from_user fails
udp6: fix socket leak on early demux
net: thunderx: Fix BGX transmit stall due to underflow
Revert "vhost: cache used event for better performance"
team: use a larger struct for mac address
net: check dev->addr_len for dev_set_mac_address()
phy: bcm-ns-usb3: fix MDIO_BUS dependency
...
William Tu [Mon, 31 Jul 2017 21:40:50 +0000 (14:40 -0700)]
samples/bpf: fix bpf tunnel cleanup
test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails. In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 31 Jul 2017 14:52:36 +0000 (16:52 +0200)]
udp6: fix jumbogram reception
Since commit
67a51780aebb ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)
This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.
Fixes:
67a51780aebb ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Mon, 31 Jul 2017 10:07:38 +0000 (18:07 +0800)]
ppp: Fix a scheduling-while-atomic bug in del_chan
The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would
trigger this bug when __sk_free is invoked in atomic context, because of
the call path pptp_sock_destruct->del_chan->synchronize_rcu.
Now move the synchronize_rcu to pptp_release from del_chan. This is the
only one case which would free the sock and need the synchronize_rcu.
The following is the panic I met with kernel 3.3.8, but this issue should
exist in current kernel too according to the codes.
BUG: scheduling while atomic
__schedule_bug+0x5e/0x64
__schedule+0x55/0x580
? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic]
? dev_hard_start_xmit+0x423/0x530
? sch_direct_xmit+0x73/0x170
__cond_resched+0x16/0x30
_cond_resched+0x22/0x30
wait_for_common+0x18/0x110
? call_rcu_bh+0x10/0x10
wait_for_completion+0x12/0x20
wait_rcu_gp+0x34/0x40
? wait_rcu_gp+0x40/0x40
synchronize_sched+0x1e/0x20
0xf8417298
0xf8417484
? sock_queue_rcv_skb+0x109/0x130
__sk_free+0x16/0x110
? udp_queue_rcv_skb+0x1f2/0x290
sk_free+0x16/0x20
__udp4_lib_rcv+0x3b8/0x650
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 18:05:32 +0000 (11:05 -0700)]
Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
This reverts commit
28b45910ccda ("net: bcmgenet: Remove init parameter
from bcmgenet_mii_config") because in the process of moving from
dev_info() to dev_info_once() we essentially lost the helpful printed
messages once the second instance of the driver is loaded.
dev_info_once() does not actually print the message once per device
instance, but once period.
Fixes:
28b45910ccda ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Mon, 31 Jul 2017 18:49:49 +0000 (21:49 +0300)]
virtio_net: fix truesize for mergeable buffers
Seth Forshee noticed a performance degradation with some workloads.
This turns out to be due to packet drops. Euan Kemp noticed that this
is because we drop all packets where length exceeds the truesize, but
for some packets we add in extra memory without updating the truesize.
This in turn was kept around unchanged from
ab7db91705e95 ("virtio-net:
auto-tune mergeable rx buffer size for improved performance"). That
commit had an internal reason not to account for the extra space: not
enough bits to do it. No longer true so let's account for the allocated
length exactly.
Many thanks to Seth Forshee for the report and bisecting and Euan Kemp
for debugging the issue.
Fixes:
680557cf79f8 ("virtio_net: rework mergeable buffer handling")
Reported-by: Euan Kemp <euan.kemp@coreos.com>
Tested-by: Euan Kemp <euan.kemp@coreos.com>
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 29 Jul 2017 19:18:41 +0000 (22:18 +0300)]
mv643xx_eth: fix of_irq_to_resource() error check
of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Marvell MV643xx Ethernet
driver still only regards 0 as invalid IRQ -- fix it up.
Fixes:
7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 16:47:50 +0000 (09:47 -0700)]
MAINTAINERS: Add more files to the PHY LIBRARY section
Include missing files that are provided by, used, or directly maintained
within the PHY LIBRARY, this include uapi header, header files used by
Device Tree code etc.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 28 Jul 2017 20:27:44 +0000 (23:27 +0300)]
ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
Michał reported a NULL pointer deref during fib_sync_down_dev() when
unregistering a netdevice. The problem is that we don't check for
'in_dev' being NULL, which can happen in very specific cases.
Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or
the inetaddr notification chains. However, if an interface isn't
configured with any IP address, then it's possible for host routes to be
flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in
inetdev_destroy().
To reproduce:
$ ip link add type dummy
$ ip route add local 1.1.1.0/24 dev dummy0
$ ip link del dev dummy0
Fix this by checking for the presence of 'in_dev' before referencing it.
Fixes:
982acb97560c ("ipv4: fib: Notify about nexthop status changes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 28 Jul 2017 18:58:36 +0000 (11:58 -0700)]
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.
Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.
Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Fixes:
a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Cave-Ayland [Thu, 27 Jul 2017 16:26:00 +0000 (17:26 +0100)]
sunhme: fix up GREG_STAT and GREG_IMASK register offsets
Update the values to match those from the STP2002QFP documentation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 31 Jul 2017 21:03:05 +0000 (14:03 -0700)]
Merge branch 'for-4.13-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Several cgroup bug fixes.
- cgroup core was calling a migration callback on empty migrations,
which could make cpuset crash.
- There was a very subtle bug where the controller interface files
aren't created directly when cgroup2 is mounted. Because later
operations create them, this bug didn't get noticed earlier.
- Failed writes to cgroup.subtree_control were incorrectly returning
zero"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix error return value from cgroup_subtree_control()
cgroup: create dfl_root files on subsys registration
cgroup: don't call migration methods if there are no tasks to migrate
Linus Torvalds [Mon, 31 Jul 2017 20:37:28 +0000 (13:37 -0700)]
Merge branch 'for-4.13-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Two notable fixes.
- While adding NUMA affinity support to unbound workqueues, the
assumption that an unbound workqueue with max_active == 1 is
ordered was broken.
The plan was to use explicit alloc_ordered_workqueue() for those
cases. Unfortunately, I forgot to update the documentation properly
and we grew a handful of use cases which depend on that assumption.
While we want to convert them to alloc_ordered_workqueue(), we
don't really lose anything by enforcing ordered execution on
unbound max_active == 1 workqueues and it doesn't make sense to
risk subtle bugs. Restore the assumption.
- Workqueue assumes that CPU <-> NUMA node mapping remains static.
This is a general assumption - we don't have any synchronization
mechanism around CPU <-> node mapping. Unfortunately, powerpc may
change the mapping dynamically leading to crashes. Michael added a
workaround so that we at least don't crash while powerpc hotplug
code gets updated"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Work around edge cases for calc of pool's cpumask
workqueue: implicit ordered attribute should be overridable
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
Linus Torvalds [Mon, 31 Jul 2017 20:33:21 +0000 (13:33 -0700)]
Merge branch 'for-4.13-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Dan found a really old bug where libata hotplug code wasn't sanitizing
index value from userland and may end up indexing with a negative
number. It is scary but fortunately can only be triggered by root.
Other than that, minor fixes"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: fix a couple of doc build warnings
libata: array underflow in ata_find_dev()
ata: sata_rcar: add gen[23] fallback compatibility strings
libata: remove unused rc in ata_eh_handle_port_resume
libata: Cleanup ata_read_log_page()
ata: fix gemini Kconfig dependencies
Babu Moger [Thu, 6 Jul 2017 16:34:19 +0000 (09:34 -0700)]
parisc: Define CONFIG_CPU_BIG_ENDIAN
While working on enabling queued rwlock on SPARC, found this following
code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN
to clear a byte.
static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}
Problem is many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN and clears the wrong byte.
Define CPU_BIG_ENDIAN for parisc architecture to fix it.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Jonathan Corbet [Sun, 30 Jul 2017 22:16:04 +0000 (16:16 -0600)]
libata: fix a couple of doc build warnings
The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:
./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
Update the comments and make the warnings go away.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
James Bottomley [Mon, 31 Jul 2017 13:49:54 +0000 (15:49 +0200)]
parisc: pdc_stable: Fix locking when creating sysfs links
There's no need to take the write lock when creating sysfs links.
This patch fixes the following BUG:
BUG: sleeping function called from invalid context at mm/slab.h:416
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2-00110-g0b5477d9dabd #111
Backtrace:
[<
0000000040217ac8>] show_stack+0x20/0x38
[<
00000000406fbbb0>] dump_stack+0xb0/0x128
[<
0000000040274090>] ___might_sleep+0x180/0x1b8
[<
0000000040274144>] __might_sleep+0x7c/0xe8
[<
0000000040373874>] kmem_cache_alloc+0x14c/0x1e0
[<
0000000040419514>] __kernfs_new_node+0x84/0x1b8
[<
000000004041b09c>] kernfs_new_node+0x3c/0x78
[<
000000004041e040>] kernfs_create_link+0x40/0xd8
[<
000000004041f320>] sysfs_do_create_link_sd.isra.0+0xb0/0x130
[<
000000004041f3d4>] sysfs_create_link+0x34/0x58
[<
000000004011b4a4>] pdc_stable_init+0x2c4/0x458
[<
0000000040200250>] do_one_initcall+0x70/0x1d8
[<
0000000040101644>] kernel_init_freeable+0x27c/0x390
[<
000000004020be44>] kernel_init+0x24/0x1c0
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Helge Deller <deller@gmx.de>
Rafael J. Wysocki [Mon, 24 Jul 2017 23:31:00 +0000 (01:31 +0200)]
thunderbolt: icm: Ignore mailbox errors in icm_suspend()
On one of my test machines nhi_mailbox_cmd() called from icm_suspend()
times out and returnes an error which then is propagated to the
caller and causes the entire system suspend to be aborted which isn't
very useful.
Instead of aborting system suspend, print the error into the log
and continue.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Michael Jamet <michael.jamet@intel.com>
Nicholas Piggin [Sat, 29 Jul 2017 12:50:27 +0000 (22:50 +1000)]
powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
The watchdog soft-NMI exception stack setup loads a stack pointer
twice, which is an obvious error. It ends up using the system reset
interrupt (true-NMI) stack, which is also a bug because the watchdog
could be preempted by a system reset interrupt that overwrites the
NMI stack.
Change the soft-NMI to use the "emergency stack". The current kernel
stack is not used, because of the longer-term goal to prevent
asynchronous stack access using soft-disable.
Fixes:
2104180a5369 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 31 Jul 2017 10:20:29 +0000 (20:20 +1000)]
Merge tag 'v4.13-rc1' into fixes
The fixes branch is based off a random pre-rc1 commit, because we had
some fixes that needed to go in before rc1 was released.
However we now need to fix some code that went in after that point, but
before rc1, so merge rc1 to get that code into fixes so we can fix it!
Helge Deller [Mon, 31 Jul 2017 06:38:27 +0000 (08:38 +0200)]
parisc: Increase thread and stack size to 32kb
Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
the default size of 16k. The reason why stack usage suddenly grew is yet
unknown.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Helge Deller <deller@gmx.de>
John David Anglin [Sun, 30 Jul 2017 20:20:19 +0000 (16:20 -0400)]
parisc: Handle vma's whose context is not current in flush_cache_range
In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
statement in flush_cache_range() during a system shutdown:
kernel BUG at arch/parisc/kernel/cache.c:595!
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
IAOQ[0]: flush_cache_range+0x144/0x148
IAOQ[1]: flush_cache_page+0x0/0x1a8
RP(r2): flush_cache_range+0xec/0x148
Backtrace:
[<
00000000402910ac>] unmap_page_range+0x84/0x880
[<
00000000402918f4>] unmap_single_vma+0x4c/0x60
[<
0000000040291a18>] zap_page_range_single+0x110/0x160
[<
0000000040291c34>] unmap_mapping_range+0x174/0x1a8
[<
000000004026ccd8>] truncate_pagecache+0x50/0xa8
[<
000000004026cd84>] truncate_setsize+0x54/0x70
[<
000000004033d534>] put_aio_ring_file+0x44/0xb0
[<
000000004033d5d8>] aio_free_ring+0x38/0x140
[<
000000004033d714>] free_ioctx+0x34/0xa8
[<
00000000401b0028>] process_one_work+0x1b8/0x4d0
[<
00000000401b04f4>] worker_thread+0x1b4/0x648
[<
00000000401b9128>] kthread+0x1b0/0x208
[<
0000000040150020>] end_fault_vector+0x20/0x28
[<
0000000040639518>] nf_ip_reroute+0x50/0xa8
[<
0000000040638ed0>] nf_ip_route+0x10/0x78
[<
0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
Backtrace:
[<
0000000040163bf0>] show_stack+0x20/0x38
[<
0000000040688480>] dump_stack+0xa8/0x120
[<
0000000040163dc4>] die_if_kernel+0x19c/0x2b0
[<
0000000040164d0c>] handle_interruption+0xa24/0xa48
This patch modifies flush_cache_range() to handle non current contexts.
In as much as this occurs infrequently, the simplest approach is to
flush the entire cache when this happens.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Sun, 30 Jul 2017 19:40:36 +0000 (12:40 -0700)]
Linux 4.13-rc3
Linus Torvalds [Sun, 30 Jul 2017 19:19:35 +0000 (12:19 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- prevent the kernel from using the EFI reboot method when EFI is
disabled.
- two patches addressing clang issues"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Disable the address-of-packed-member compiler warning
x86/efi: Fix reboot_mode when EFI runtime services are disabled
x86/boot: #undef memcpy() et al in string.c
Linus Torvalds [Sun, 30 Jul 2017 18:54:08 +0000 (11:54 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Two patches addressing build warnings caused by inconsistent kernel
doc comments"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/wait: Clean up some documentation warnings
sched/core: Fix some documentation build warnings
Linus Torvalds [Sun, 30 Jul 2017 18:52:15 +0000 (11:52 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A couple of fixes for performance counters and kprobes:
- a series of small patches which make the uncore performance
counters on Skylake server systems work correctly
- add a missing instruction slot release to the failure path of
kprobes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Release insn_slot in failure path
perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs
perf/x86/intel/uncore: Fix SKX CHA event extra regs
perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field
perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask
perf/x86/intel/uncore: Fix Skylake server PCU PMU event format
perf/x86/intel/uncore: Fix Skylake UPI PMU event masks
Linus Torvalds [Sun, 30 Jul 2017 18:27:33 +0000 (11:27 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"Fix for a regression caused by the conversion of x86 to the generic
hotplug code.
Instead of doing a plain single line revert, this adds a pile of
comments so the semantics of the force argument are clear"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
Hanjun Guo [Fri, 28 Jul 2017 09:42:35 +0000 (17:42 +0800)]
ACPI: APD: Fix HID for Hisilicon Hip07/08
ACPI HID for Hisilicon Hip07/08 should be HISI02A1/2,
not HISI0A21/2, HISI02A1/2 was tested ok but was modified
by the stupid typo when upstream the patches (by me),
correct them to the right IDs (matching the IDs in
drivers/i2c/busses/i2c-designware-platdrv.c).
Fixes:
6e14cf361a0c (ACPI / APD: Add clock frequency for Hisilicon Hip07/08 I2C controller)
Reported-by: Tao Tian <tiantao6@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Fri, 28 Jul 2017 12:45:03 +0000 (14:45 +0200)]
cpufreq: x86: Make scaling_cur_freq behave more as expected
After commit
f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to
calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute
in sysfs only behaves as expected on x86 with APERF/MPERF registers
available when it is read from at least twice in a row. The value
returned by the first read may not be meaningful, because the
computations in there use cached values from the previous iteration
of aperfmperf_snapshot_khz() which may be stale.
To prevent that from happening, modify arch_freq_get_on_cpu() to
call aperfmperf_snapshot_khz() twice, with a short delay between
these calls, if the previous invocation of aperfmperf_snapshot_khz()
was too far back in the past (specifically, more that 1s ago).
Also, as pointed out by Doug Smythies, aperf_delta is limited now
and the multiplication of it by cpu_khz won't overflow, so simplify
the s->khz computations too.
Fixes:
f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF"
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Daniel Borkmann [Fri, 28 Jul 2017 15:05:25 +0000 (17:05 +0200)]
bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
bpf_prog_size(prog->len) is not the correct length we want to dump
back to user space. The code in bpf_prog_get_info_by_fd() uses this
to copy prog->insnsi to user space, but bpf_prog_size(prog->len) also
includes the size of struct bpf_prog itself plus program instructions
and is usually used either in context of accounting or for bpf_prog_alloc()
et al, thus we copy out of bounds in bpf_prog_get_info_by_fd()
potentially. Use the correct bpf_prog_insn_size() instead.
Fixes:
1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 28 Jul 2017 14:41:37 +0000 (16:41 +0200)]
tcp: avoid bogus gcc-7 array-bounds warning
When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a
false-positive warning:
net/ipv4/tcp_output.c: In function 'tcp_connect':
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
^~
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
I have opened a gcc bug for this, but distros have already shipped
compilers with this problem, and it's not clear yet whether there is
a way for gcc to avoid the warning. As the problem is related to the
bitfield access, this introduces a temporary variable to store the old
enum value.
I did not notice this warning earlier, since UBSAN is disabled when
building with COMPILE_TEST, and that was always turned on in both
allmodconfig and randconfig tests.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 22:30:08 +0000 (15:30 -0700)]
Merge tag 'wireless-drivers-for-davem-2017-07-28' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.13
Two fixes for for brcmfmac, the crash was reported by two people
already so it's a high priority fix.
brcmfmac
* fix a crash in skb headroom handling in v4.13-rc1
* fix a memory leak due to a merge error in v4.6
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 27 Jul 2017 22:15:09 +0000 (23:15 +0100)]
net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
Trivial fix to spelling mistake in printk message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 27 Jul 2017 19:02:46 +0000 (21:02 +0200)]
bpf: don't indicate success when copy_from_user fails
err in bpf_prog_get_info_by_fd() still holds 0 at that time from prior
check_uarg_tail_zero() check. Explicitly return -EFAULT instead, so
user space can be notified of buggy behavior.
Fixes:
1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 27 Jul 2017 12:45:09 +0000 (14:45 +0200)]
udp6: fix socket leak on early demux
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
sk reference is retrieved and used, but the relevant reference
count is leaked and the socket destructor is never called.
Beyond leaking the sk memory, if there are pending UDP packets
in the receive queue, even the related accounted memory is leaked.
In the long run, this will cause persistent forward allocation errors
and no UDP skbs (both ipv4 and ipv6) will be able to reach the
user-space.
Fix this by explicitly accessing the early demux reference before
the lookup, and properly decreasing the socket reference count
after usage.
Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
the now obsoleted comment about "socket cache".
The newly added code is derived from the current ipv4 code for the
similar path.
v1 -> v2:
fixed the __udp6_lib_rcv() return code for resubmission,
as suggested by Eric
Reported-by: Sam Edwards <CFSworks@gmail.com>
Reported-by: Marc Haber <mh+netdev@zugschlus.de>
Fixes:
5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Thu, 27 Jul 2017 07:23:04 +0000 (12:53 +0530)]
net: thunderx: Fix BGX transmit stall due to underflow
For SGMII/RGMII/QSGMII interfaces when physical link goes down
while traffic is high is resulting in underflow condition being set
on that specific BGX's LMAC. Which assets a backpresure and VNIC stops
transmitting packets.
This is due to BGX being disabled in link status change callback while
packet is in transit. This patch fixes this issue by not disabling BGX
but instead just disables packet Rx and Tx.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 27 Jul 2017 03:22:05 +0000 (11:22 +0800)]
Revert "vhost: cache used event for better performance"
This reverts commit
809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. The assumption was that guest won't move
the event idx back, but this could happen in fact when 16 bit index
wraps around after 64K entries.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 18:26:45 +0000 (11:26 -0700)]
Merge tag 'mlx5-fixes-2017-07-27-V2' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2017-07-27
This series contains some misc fixes to the mlx5 driver.
Please pull and let me know if there's any problem.
V1->V2:
- removed redundant braces
for -stable:
4.7
net/mlx5: Fix command bad flow on command entry allocation failure
4.9
net/mlx5: Consider tx_enabled in all modes on remap
net/mlx5e: Fix outer_header_zero() check size
4.10
net/mlx5: Fix mlx5_add_flow_rules call with correct num of dests
4.11
net/mlx5: Fix mlx5_ifc_mtpps_reg_bits structure size
net/mlx5e: Add field select to MTPPS register
net/mlx5e: Fix broken disable 1PPS flow
net/mlx5e: Change 1PPS out scheme
net/mlx5e: Add missing support for PTP_CLK_REQ_PPS request
net/mlx5e: Fix wrong delay calculation for overflow check scheduling
net/mlx5e: Schedule overflow check work to mlx5e workqueue
4.12
net/mlx5: Fix command completion after timeout access invalid structure
net/mlx5e: IPoIB, Modify add/remove underlay QPN flows
I hope this is not too much, but most of the patches do apply cleanly on -stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 26 Jul 2017 22:22:07 +0000 (15:22 -0700)]
team: use a larger struct for mac address
IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places especially bonding, we use struct sockaddr
to copy and set mac addr, this could lead to stack out-of-bounds
access.
Fix it by using a larger address storage like bonding.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 26 Jul 2017 22:22:06 +0000 (15:22 -0700)]
net: check dev->addr_len for dev_set_mac_address()
Historically, dev_ifsioc() uses struct sockaddr as mac
address definition, this is why dev_set_mac_address()
accepts a struct sockaddr pointer as input but now we
have various types of mac addresse whose lengths
are up to MAX_ADDR_LEN, longer than struct sockaddr,
and saved in dev->addr_len.
It is too late to fix dev_ifsioc() due to API
compatibility, so just reject those larger than
sizeof(struct sockaddr), otherwise we would read
and use some random bytes from kernel stack.
Fortunately, only a few IPv6 tunnel devices have addr_len
larger than sizeof(struct sockaddr) and they don't support
ndo_set_mac_addr(). But with team driver, in lb mode, they
can still be enslaved to a team master and make its mac addr
length as the same.
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 29 Jul 2017 00:21:41 +0000 (17:21 -0700)]
Merge tag 'devicetree-fixes-for-4.13' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
"Two small DT fixes:
- Fix error handling in of_irq_to_resource_table() due to
of_irq_to_resource() error return changes.
- Fix dtx_diff script due to dts include path changes"
* tag 'devicetree-fixes-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: irq: fix of_irq_to_resource() error check
scripts/dtc: dtx_diff - update include dts paths to match build
Linus Torvalds [Fri, 28 Jul 2017 21:44:56 +0000 (14:44 -0700)]
Merge tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"More NFS client bugfixes for 4.13.
Most of these fix locking bugs that Ben and Neil noticed, but I also
have a patch to fix one more access bug that was reported after last
week.
Stable fixes:
- Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
- Invalidate file size when taking a lock to prevent corruption
Other fixes:
- Don't excessively generate tiny writes with fallocate
- Use the raw NFS access mask in nfs4_opendata_access()"
* tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
NFS: Optimize fallocate by refreshing mapping when needed.
NFS: invalidate file size when taking a lock.
NFS: Use raw NFS access mask in nfs4_opendata_access()
Linus Torvalds [Fri, 28 Jul 2017 21:29:48 +0000 (14:29 -0700)]
Merge tag 'xfs-4.13-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- fix firstfsb variables that we left uninitialized, which could lead
to locking problems.
- check for NULL metadata buffer pointers before using them.
- don't allow btree cursor manipulation if the btree block is corrupt.
Better to just shut down.
- fix infinite loop problems in quotacheck.
- fix buffer overrun when validating directory blocks.
- fix deadlock problem in bunmapi.
* tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix multi-AG deadlock in xfs_bunmapi
xfs: check that dir block entries don't off the end of the buffer
xfs: fix quotacheck dquot id overflow infinite loop
xfs: check _alloc_read_agf buffer pointer before using
xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write
xfs: check _btree_check_block value
Linus Torvalds [Fri, 28 Jul 2017 20:36:56 +0000 (13:36 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"s390:
- SRCU fix
PPC:
- host crash fixes
x86:
- bugfixes, including making nested posted interrupts really work
Generic:
- tweaks to kvm_stat and to uevents"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Fix reentrancy issues with preempt notifiers
tools/kvm_stat: add '-f help' to get the available event list
tools/kvm_stat: use variables instead of hard paths in help output
KVM: nVMX: Fix loss of L2's NMI blocking state
KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
x86: irq: Define a global vector for nested posted interrupts
KVM: x86: do mask out upper bits of PAE CR3
KVM: make pid available for uevents without debugfs
KVM: s390: take srcu lock when getting/setting storage keys
KVM: VMX: remove unused field
KVM: PPC: Book3S HV: Fix host crash on changing HPT size
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
Linus Torvalds [Fri, 28 Jul 2017 20:35:12 +0000 (13:35 -0700)]
Merge tag 'for-linus-4.13b-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Three minor cleanups for xen related drivers"
* tag 'for-linus-4.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: dont fiddle with event channel masking in suspend/resume
xen: selfballoon: remove unnecessary static in frontswap_selfshrink()
xen: Drop un-informative message during boot
Linus Torvalds [Fri, 28 Jul 2017 20:29:36 +0000 (13:29 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'd been collecting these whilst we debugged a CPU hotplug failure,
but we ended up diagnosing that one to tglx, who has taken a fix via
the -tip tree separately.
We're seeing some NFS issues that we haven't gotten to the bottom of
yet, and we've uncovered some issues with our backtracing too so there
might be another fixes pull before we're done.
Summary:
- Ensure we have a guard page after the kernel image in vmalloc
- Fix incorrect prefetch stride in copy_page
- Ensure irqs are disabled in die()
- Fix for event group validation in QCOM L2 PMU driver
- Fix requesting of PMU IRQs on AMD Seattle
- Minor cleanups and fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mmu: Place guard page after mapping of kernel image
drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
arm64: sysreg: Fix unprotected macro argmuent in write_sysreg
perf: qcom_l2: fix column exclusion check
arm64/lib: copy_page: use consistent prefetch stride
arm64/numa: Drop duplicate message
perf: Convert to using %pOF instead of full_name
arm64: Convert to using %pOF instead of full_name
arm64: traps: disable irq in die()
arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
arm64: uaccess: Remove redundant __force from addr cast in __range_ok
Linus Torvalds [Fri, 28 Jul 2017 20:25:15 +0000 (13:25 -0700)]
Merge tag 'powerpc-4.13-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The highlight is Ben's patch to work around a host killing bug when
running KVM guests with the Radix MMU on Power9. See the long change
log of that commit for more detail.
And then three fairly minor fixes:
- fix of_node_put() underflow during reconfig remove, using old DLPAR
tools.
- fix recently introduced ld version check with 64-bit LE-only
toolchain.
- free the subpage_prot_table correctly, avoiding a memory leak.
Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Laurent Vivier"
* tag 'powerpc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/hash: Free the subpage_prot_table correctly
powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
powerpc/mm/radix: Workaround prefetch issue with KVM
Benjamin Coddington [Fri, 28 Jul 2017 16:33:54 +0000 (12:33 -0400)]
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
same region protected by the wait_queue's lock after checking for a
notification from CB_NOTIFY_LOCK callback. However, after releasing that
lock, a wakeup for that task may race in before the call to
freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
freezable_schedule_timeout_interruptible() will set the state back to
TASK_INTERRUPTIBLE before the task will sleep. The result is that the task
will sleep for the entire duration of the timeout.
Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
freezable_schedule_timout() instead.
Fixes:
a1d617d8f134 ("nfs: allow blocking locks to be awoken by lock callbacks")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Fri, 28 Jul 2017 19:31:49 +0000 (12:31 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- remove broken dt bindings in inside-secure
- fix authencesn crash when used with digest_null
- fix cavium/nitrox firmware path
- fix SHA3 failure in brcm
- fix Kconfig dependency for brcm
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: authencesn - Fix digest_null crash
crypto: brcm - remove BCM_PDC_MBOX dependency in Kconfig
Documentation/bindings: crypto: remove the dma-mask property
crypto: inside-secure - do not parse the dma mask from dt
crypto: cavium/nitrox - Change in firmware path.
crypto: brcm - Fix SHA3-512 algorithm failure
Linus Torvalds [Fri, 28 Jul 2017 19:26:59 +0000 (12:26 -0700)]
Merge branch 'for-4.13-part3' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Fixes addressing problems reported by users, and there's one more
regression fix"
* 'for-4.13-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: round down size diff when shrinking/growing device
Btrfs: fix early ENOSPC due to delalloc
btrfs: fix lockup in find_free_extent with read-only block groups
Btrfs: fix dir item validation when replaying xattr deletes
Linus Torvalds [Fri, 28 Jul 2017 19:24:21 +0000 (12:24 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This fixes several bugs, three of them are marked for stable:
- an initialization issue fixed by Ming
- a bio clone race issue fixed by me
- an async tx flush issue fixed by Ofer
- other cleanups"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
MD: fix warnning for UP case
md/raid5: add thread_group worker async_tx_issue_pending_all
md: simplify code with bio_io_error
md/raid1: fix writebehind bio clone
md: raid1-10: move raid1/raid10 common code into raid1-10.c
md: raid1/raid10: initialize bvec table via bio_add_page()
md: remove 'idx' from 'struct resync_pages'
Linus Torvalds [Fri, 28 Jul 2017 19:17:17 +0000 (12:17 -0700)]
Merge tag 'for-4.13/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a few DM integrity fixes that improve performance. One that address
inefficiencies in the on-disk journal device layout. Another that
makes use of the block layer's on-stack plugging when writing the
journal.
- a dm-bufio fix for the blk_status_t conversion that went in during
the merge window.
- a few DM raid fixes that address correctness when suspending the
device and a validation fix for validation that occurs during device
activation.
- a couple DM zoned target fixes. Important one being the fix to not
use GFP_KERNEL in the IO path due to concerns about deadlock in
low-memory conditions (e.g. swap over a DM zoned device, etc).
- a DM DAX device fix to make sure dm_dax_flush() is called if the
underlying DAX device is operating as a write cache.
* tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm, dax: Make sure dm_dax_flush() is called if device supports it
dm verity fec: fix GFP flags used with mempool_alloc()
dm zoned: use GFP_NOIO in I/O path
dm zoned: remove test for impossible REQ_OP_FLUSH conditions
dm raid: bump target version
dm raid: avoid mddev->suspended access
dm raid: fix activation check in validate_raid_redundancy()
dm raid: remove WARN_ON() in raid10_md_layout_to_format()
dm bufio: fix error code in dm_bufio_write_dirty_buffers()
dm integrity: test for corrupted disk format during table load
dm integrity: WARN_ON if variables representing journal usage get out of sync
dm integrity: use plugging when writing the journal
dm integrity: fix inefficient allocation of journal space
Linus Torvalds [Fri, 28 Jul 2017 19:13:34 +0000 (12:13 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A small collection of fixes that should go into this series. This
contains:
- NVMe pull request from Christoph, with various fixes for nvme
proper and nvme-fc.
- disable runtime PM for blk-mq for now.
With scsi now defaulting to using blk-mq, this reared its head as
an issue. Longer term we'll fix up runtime PM for blk-mq, for now
just disable it to prevent a hang on laptop resume for some folks.
- blk-mq CPU <-> hw queue map fix from Christoph.
- xen/blkfront pull request from Konrad, with two small fixes for the
blkfront driver.
- a few fixups for nbd from Joseph.
- a stable fix for pblk from Javier"
* 'for-linus' of git://git.kernel.dk/linux-block:
lightnvm: pblk: advance bio according to lba index
nvme: validate admin queue before unquiesce
nbd: clear disconnected on reconnect
nvme-pci: fix HMB size calculation
nvme-fc: revise TRADDR parsing
nvme-fc: address target disconnect race conditions in fcp io submit
nvme: fabrics commands should use the fctype field for data direction
nvme: also provide a UUID in the WWID sysfs attribute
xen/blkfront: always allocate grants first from per-queue persistent grants
xen-blkfront: fix mq start/stop race
blk-mq: map queues to all present CPUs
block: disable runtime-pm for blk-mq
xen-blkfront: Fix handling of non-supported operations
nbd: only set sndtimeo if we have a timeout set
nbd: take tx_lock before disconnecting
nbd: allow multiple disconnects to be sent
Linus Torvalds [Fri, 28 Jul 2017 19:04:36 +0000 (12:04 -0700)]
Merge tag 'mmc-v4.13-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.13-rc1.
I have also included a couple of cleanup patches in this pull request
for OMAP2+, related to the omap_hsmmc driver. The reason is because of
the changes are also depending on OMAP SoC specific code, so this
simplifies how to deal with this.
Summary:
MMC host:
- sunxi: Correct time phase settings
- omap_hsmmc: Clean up some dead code
- dw_mmc: Fix message printed for deprecated num-slots DT binding
- dw_mmc: Fix DT documentation"
* tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Documentation: dw-mshc: deprecate num-slots
mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
mmc: host: omap_hsmmc: remove unused platform callbacks
ARM: OMAP2+: hsmmc.c: Remove dead code
mmc: sunxi: Keep default timing phase settings for new timing mode
Michael Bringmann [Thu, 27 Jul 2017 21:27:14 +0000 (16:27 -0500)]
workqueue: Work around edge cases for calc of pool's cpumask
There is an underlying assumption/trade-off in many layers of the Linux
system that CPU <-> node mapping is static. This is despite the presence
of features like NUMA and 'hotplug' that support the dynamic addition/
removal of fundamental system resources like CPUs and memory. PowerPC
systems, however, do provide extensive features for the dynamic change
of resources available to a system.
Currently, there is little or no synchronization protection around the
updating of the CPU <-> node mapping, and the export/update of this
information for other layers / modules. In systems which can change
this mapping during 'hotplug', like PowerPC, the information is changing
underneath all layers that might reference it.
This patch attempts to ensure that a valid, usable cpumask attribute
is used by the workqueue infrastructure when setting up new resource
pools. It prevents a crash that has been observed when an 'empty'
cpumask is passed along to the worker/task scheduling code. It is
intended as a temporary workaround until a more fundamental review and
correction of the issue can be done.
[With additions to the patch provided by Tejun Hao <tj@kernel.org>]
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Javier González [Fri, 28 Jul 2017 13:13:16 +0000 (15:13 +0200)]
lightnvm: pblk: advance bio according to lba index
When a lba either hits the cache or corresponds to an empty entry in the
L2P table, we need to advance the bio according to the position in which
the lba is located. Otherwise, we will copy data in the wrong page, thus
causing data corruption for the application.
In case of a cache hit, we assumed that bio->bi_iter.bi_idx would
contain the correct index, but this is no necessarily true. Instead, use
the local bio advance counter and iterator. This guarantees that lbas
hitting the cache are copied into the right bv_page.
In case of an empty L2P entry, we omitted to advance the bio. In the
cases when the same I/O also contains a cache hit, data corresponding
to this lba will be copied to the wrong bv_page. Fix this by advancing
the bio as we do in the case of a cache hit.
Fixes:
a4bd217b4326 lightnvm: physical block device (pblk) target
Signed-off-by: Javier González <javier@javigon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Alistair Popple [Wed, 26 Jul 2017 05:26:40 +0000 (15:26 +1000)]
powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
Commit
8e3f1b1d8255 ("powerpc/powernv/pci: Enable 64-bit devices to access
>4GB DMA space") introduced the ability for PCI device drivers to request a
DMA mask between 64 and 32 bits and actually get a mask greater than
32-bits. However currently if certain machine configuration dependent
conditions are not meet the code silently falls back to a 32-bit mask.
This makes it hard for device drivers to detect which mask they actually
got. Instead we should return an error when the request could not be
fulfilled which allows drivers to either fallback or implement other
workarounds as documented in DMA-API-HOWTO.txt.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 26 Jul 2017 13:19:04 +0000 (23:19 +1000)]
powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
Historically the boot wrapper was always built 32-bit big endian, even
for 64-bit kernels. That was because old firmwares didn't necessarily
support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
uses CROSS32CC for compilation.
However when we added 64-bit little endian support, we also added
support for building the boot wrapper 64-bit. However we kept using
CROSS32CC, because in most cases it is just CC and everything works.
However if the user doesn't specify CROSS32_COMPILE (which no one ever
does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
becomes just "gcc". On native systems that is probably OK, but if we're
cross building it definitely isn't, leading to eg:
gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
gcc: error: unrecognized argument in option ‘-mabi=elfv2’
gcc: error: unrecognized command line option ‘-mlittle-endian’
make: *** [zImage] Error 2
To fix it, stop using CROSS32CC, because we may or may not be building
32-bit. Instead setup a BOOTCC, which defaults to CC, and only use
CROSS32_COMPILE if it's set and we're building for 32-bit.
Fixes:
147c05168fc8 ("powerpc/boot: Add support for 64bit little endian wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Michael Ellerman [Thu, 27 Jul 2017 13:23:37 +0000 (23:23 +1000)]
powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot
CPU, which means it has to run *on* the boot CPU.
In the past we ensured it ran on the boot CPU by changing the CPU
affinity mask of current directly. That was removed in commit
6d11b87d55eb ("powerpc/smp: Replace open coded task affinity logic"),
and replaced with a work queue call.
Unfortunately using a work queue leads to a lockdep warning, now that
the CPU hotplug lock is a regular semaphore:
======================================================
WARNING: possible circular locking dependency detected
...
kworker/0:1/971 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){++++++}, at: [<
c000000000100974>] apply_workqueue_attrs+0x34/0xa0
but task is already holding lock:
((&wfc.work)){+.+.+.}, at: [<
c0000000000fdb2c>] process_one_work+0x25c/0x800
...
CPU0 CPU1
---- ----
lock((&wfc.work));
lock(cpu_hotplug_lock.rw_sem);
lock((&wfc.work));
lock(cpu_hotplug_lock.rw_sem);
Although the deadlock can't happen in practice, because
smp_cpus_done() only runs in early boot before CPU hotplug is allowed,
lockdep can't tell that.
Luckily in commit
8fb12156b8db ("init: Pin init task to the boot CPU,
initially") tglx changed the generic code to pin init to the boot CPU
to begin with. The unpinning of init from the boot CPU happens in
sched_init_smp(), which is called after smp_cpus_done().
So smp_cpus_done() is always called on the boot CPU, which means we
don't need the work queue call at all - and the lockdep warning goes
away.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Will Deacon [Mon, 24 Jul 2017 10:46:09 +0000 (11:46 +0100)]
arm64: mmu: Place guard page after mapping of kernel image
The vast majority of virtual allocations in the vmalloc region are followed
by a guard page, which can help to avoid overruning on vma into another,
which may map a read-sensitive device.
This patch adds a guard page to the end of the kernel image mapping (i.e.
following the data/bss segments).
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Matthias Kaehlcke [Tue, 25 Jul 2017 21:50:53 +0000 (14:50 -0700)]
x86/boot: Disable the address-of-packed-member compiler warning
The clang warning 'address-of-packed-member' is disabled for the general
kernel code, also disable it for the x86 boot code.
This suppresses a bunch of warnings like this when building with clang:
./arch/x86/include/asm/processor.h:535:30: warning: taking address of
packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
unaligned pointer value [-Waddress-of-packed-member]
return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
^~~~~~~~~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
'this_cpu_read_stable'
#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
^~~
./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
'percpu_stable_op'
: "p" (&(var)));
^~~
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Gustavo Romero [Wed, 19 Jul 2017 05:44:13 +0000 (01:44 -0400)]
powerpc/tm: Fix saving of TM SPRs in core dump
Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR,
TFIAR, TEXASR) to the thread struct, unless the process is currently
inside a suspended transaction.
If the process is core dumping, and the TM SPRs have changed since the
last time the process was context switched, then we will save stale
values of the TM SPRs to the core dump.
Fix it by saving the live register state to the thread struct in that
case.
Fixes:
08e1c01d6aed ("powerpc/ptrace: Enable support for TM SPR state")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>