Dan Williams [Sat, 17 Mar 2018 00:36:44 +0000 (17:36 -0700)]
dax: Report bytes remaining in dax_iomap_actor()
In preparation for protecting the dax read(2) path from media errors
with copy_to_iter_mcsafe() (via dax_copy_to_iter()), convert the
implementation to report the bytes successfully transferred.
Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 2 May 2018 13:46:33 +0000 (06:46 -0700)]
dax: Introduce a ->copy_to_iter dax operation
Similar to the ->copy_from_iter() operation, a platform may want to
deploy an architecture or device specific routine for handling reads
from a dax_device like /dev/pmemX. On x86 this routine will point to a
machine check safe version of copy_to_iter(). For now, add the plumbing
to device-mapper and the dax core.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 23 May 2018 06:17:03 +0000 (23:17 -0700)]
uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
Add a common Kconfig CONFIG_ARCH_HAS_UACCESS_MCSAFE that archs can
optionally select, and fixup the declaration of _copy_to_iter_mcsafe().
Fixes:
8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 4 May 2018 00:06:31 +0000 (17:06 -0700)]
x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()
Use the updated memcpy_mcsafe() implementation to define
copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant
difference from typical copy_to_iter() is that the ITER_KVEC and
ITER_BVEC iterator types can fail to complete a full transfer.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dan Williams [Fri, 4 May 2018 00:06:26 +0000 (17:06 -0700)]
x86/asm/memcpy_mcsafe: Add write-protection-fault handling
In preparation for using memcpy_mcsafe() to handle user copies it needs
to be to handle write-protection faults while writing user pages. Add
MMU-fault handlers alongside the machine-check exception handlers.
Note that the machine check fault exception handling makes assumptions
about source buffer alignment and poison alignment. In the write fault
case, given the destination buffer is arbitrarily aligned, it needs a
separate / additional fault handling approach. The mcsafe_handle_tail()
helper is reused. The @limit argument is set to @len since there is no
safety concern about retriggering an MMU fault, and this simplifies the
assembly.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Reported-by: Mika Penttilä <mika.penttila@nextfour.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539238635.31796.14056325365122961778.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dan Williams [Fri, 4 May 2018 00:06:21 +0000 (17:06 -0700)]
x86/asm/memcpy_mcsafe: Return bytes remaining
Machine check safe memory copies are currently deployed in the pmem
driver whenever reading from persistent memory media, so that -EIO is
returned rather than triggering a kernel panic. While this protects most
pmem accesses, it is not complete in the filesystem-dax case. When
filesystem-dax is enabled reads may bypass the block layer and the
driver via dax_iomap_actor() and its usage of copy_to_iter().
In preparation for creating a copy_to_iter() variant that can handle
machine checks, teach memcpy_mcsafe() to return the number of bytes
remaining rather than -EFAULT when an exception occurs.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539238119.31796.14318473522414462886.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dan Williams [Fri, 4 May 2018 00:06:16 +0000 (17:06 -0700)]
x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling
The memcpy_mcsafe() implementation handles CPU exceptions when reading
from the source address. Before it can be used for user copies it needs
to grow support for handling write faults. In preparation for adding
that exception handling update the labels for the read cache word X case
(.L_cache_rX) and write cache word X case (.L_cache_wX).
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539237606.31796.6719743548991782264.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dan Williams [Fri, 4 May 2018 00:06:11 +0000 (17:06 -0700)]
x86/asm/memcpy_mcsafe: Remove loop unrolling
In preparation for teaching memcpy_mcsafe() to return 'bytes remaining'
rather than pass / fail, simplify the implementation to remove loop
unrolling. The unrolling complicates the fault handling for negligible
benefit given modern CPUs perform loop stream detection.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539237092.31796.9115692316555638048.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 13 May 2018 23:15:17 +0000 (16:15 -0700)]
Linux 4.17-rc5
Linus Torvalds [Sun, 13 May 2018 17:53:08 +0000 (10:53 -0700)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
"A mixed bag of fixes and updates for the ghosts which are hunting us.
The scheduler fixes have been pulled into that branch to avoid
conflicts.
- A set of fixes to address a khread_parkme() race which caused lost
wakeups and loss of state.
- A deadlock fix for stop_machine() solved by moving the wakeups
outside of the stopper_lock held region.
- A set of Spectre V1 array access restrictions. The possible
problematic spots were discuvered by Dan Carpenters new checks in
smatch.
- Removal of an unused file which was forgotten when the rest of that
functionality was removed"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Remove unused file
perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
sched/core: Introduce set_special_state()
kthread, sched/wait: Fix kthread_parkme() completion issue
kthread, sched/wait: Fix kthread_parkme() wait-loop
sched/fair: Fix the update of blocked load when newly idle
stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
Linus Torvalds [Sun, 13 May 2018 17:46:53 +0000 (10:46 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"Revert the new NUMA aware placement approach which turned out to
create more problems than it solved"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"
Linus Torvalds [Sun, 13 May 2018 17:44:32 +0000 (10:44 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
"Another small set of perf tooling fixes and updates:
- Revert "perf pmu: Fix pmu events parsing rule", as it broke Intel
PT event description parsing (Arnaldo Carvalho de Melo)
- Sync x86's cpufeatures.h and kvm UAPI headers with the kernel
sources, suppressing the ABI drift warnings (Arnaldo Carvalho de
Melo)
- Remove duplicated entry for westmereep-dp in Intel's mapfile.csv
(William Cohen)
- Fix typo in 'perf bench numa' options description (Yisheng Xie)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "perf pmu: Fix pmu events parsing rule"
tools headers kvm: Sync ARM UAPI headers with the kernel sources
tools headers kvm: Sync uapi/linux/kvm.h with the kernel sources
tools headers: Sync x86 cpufeatures.h with the kernel sources
perf vendor events intel: Remove duplicated entry for westmereep-dp in mapfile.csv
perf bench numa: Fix typo in options
Linus Torvalds [Sun, 13 May 2018 17:28:53 +0000 (10:28 -0700)]
Merge tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
"Just one little fix from Jean to avoid a harmless but very annoying
warning, especially for the drm code"
* tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: silent unwanted warning "buffer is full"
Linus Torvalds [Sun, 13 May 2018 01:49:53 +0000 (18:49 -0700)]
Merge tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Some small SMB3 fixes for 4.17-rc5, some for stable"
* tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: directory sync should not return an error
cifs: smb2ops: Fix listxattr() when there are no EAs
cifs: smbd: Enable signing with smbdirect
cifs: Allocate validate negotiation request through kmalloc
Linus Torvalds [Sat, 12 May 2018 17:58:57 +0000 (10:58 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux
Pull thermal fixes from Zhang Rui:
- fix NULL pointer dereference on module load/probe for int3403_thermal
driver
- fix an emergency shutdown issue on exynos thermal driver
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: exynos: Propagate error value from tmu_read()
thermal: exynos: Reading temperature makes sense only when TMU is turned on
thermal: int3403_thermal: Fix NULL pointer deref on module load / probe
Linus Torvalds [Sat, 12 May 2018 17:55:48 +0000 (10:55 -0700)]
Merge tag 'for-linus-
20180511' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just a few NVMe fixes this round - one fixing a use-after-free, one
fixes the return value after controller reset, and the last one fixes
an issue where some drives will spuriously EIO. We should get these
into 4.17"
* tag 'for-linus-
20180511' of git://git.kernel.dk/linux-block:
nvme: add quirk to force medium priority for SQ creation
nvme: Fix sync controller reset return
nvme: fix use-after-free in nvme_free_ns_head
Jean Delvare [Sat, 12 May 2018 09:57:37 +0000 (11:57 +0200)]
swiotlb: silent unwanted warning "buffer is full"
If DMA_ATTR_NO_WARN is passed to swiotlb_alloc_buffer(), it should be
passed further down to swiotlb_tbl_map_single(). Otherwise we escape
half of the warnings but still log the other half.
This is one of the multiple causes of spurious warnings reported at:
https://bugs.freedesktop.org/show_bug.cgi?id=104082
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes:
0176adb00406 ("swiotlb: refactor coherent buffer allocation")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: stable@vger.kernel.org # v4.16
Mel Gorman [Wed, 9 May 2018 16:31:15 +0000 (17:31 +0100)]
Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"
This reverts commit
7347fc87dfe6b7315e74310ee1243dc222c68086.
Srikar Dronamra pointed out that while the commit in question did show
a performance improvement on ppc64, it did so at the cost of disabling
active CPU migration by automatic NUMA balancing which was not the intent.
The issue was that a serious flaw in the logic failed to ever active balance
if SD_WAKE_AFFINE was disabled on scheduler domains. Even when it's enabled,
the logic is still bizarre and against the original intent.
Investigation showed that fixing the patch in either the way he suggested,
using the correct comparison for jiffies values or introducing a new
numa_migrate_deferred variable in task_struct all perform similarly to a
revert with a mix of gains and losses depending on the workload, machine
and socket count.
The original intent of the commit was to handle a problem whereby
wake_affine, idle balancing and automatic NUMA balancing disagree on the
appropriate placement for a task. This was particularly true for cases where
a single task was a massive waker of tasks but where wake_wide logic did
not apply. This was particularly noticeable when a futex (a barrier) woke
all worker threads and tried pulling the wakees to the waker nodes. In that
specific case, it could be handled by tuning MPI or openMP appropriately,
but the behavior is not illogical and was worth attempting to fix. However,
the approach was wrong. Given that we're at rc4 and a fix is not obvious,
it's better to play safe, revert this commit and retry later.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: ggherdovich@suse.cz
Cc: hpa@zytor.com
Cc: matt@codeblueprint.co.uk
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/20180509163115.6fnnyeg4vdm2ct4v@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 12 May 2018 01:04:12 +0000 (18:04 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
rbtree: include rcu.h
scripts/faddr2line: fix error when addr2line output contains discriminator
ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
mm, oom: fix concurrent munlock and oom reaper unmap, v3
mm: migrate: fix double call of radix_tree_replace_slot()
proc/kcore: don't bounds check against address 0
mm: don't show nr_indirectly_reclaimable in /proc/vmstat
mm: sections are not offlined during memory hotremove
z3fold: fix reclaim lock-ups
init: fix false positives in W+X checking
lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
KASAN: prohibit KASAN+STRUCTLEAK combination
MAINTAINERS: update Shuah's email address
Sebastian Andrzej Siewior [Fri, 11 May 2018 23:02:14 +0000 (16:02 -0700)]
rbtree: include rcu.h
Since commit
c1adf20052d8 ("Introduce rb_replace_node_rcu()")
rbtree_augmented.h uses RCU related data structures but does not include
the header file. It works as long as it gets somehow included before
that and fails otherwise.
Link: http://lkml.kernel.org/r/20180504103159.19938-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changbin Du [Fri, 11 May 2018 23:02:11 +0000 (16:02 -0700)]
scripts/faddr2line: fix error when addr2line output contains discriminator
When addr2line output contains discriminator, the current awk script
cannot parse it. This patch fixes it by extracting key words using
regex which is more reliable.
$ scripts/faddr2line vmlinux tlb_flush_mmu_free+0x26
tlb_flush_mmu_free+0x26/0x50:
tlb_flush_mmu_free at mm/memory.c:258 (discriminator 3)
scripts/faddr2line: eval: line 173: unexpected EOF while looking for matching `)'
Link: http://lkml.kernel.org/r/1525323379-25193-1-git-send-email-changbin.du@intel.com
Fixes:
6870c0165feaa5 ("scripts/faddr2line: show the code context")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ashish Samant [Fri, 11 May 2018 23:02:07 +0000 (16:02 -0700)]
ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
While reflinking an inode, we create a new inode in orphan directory,
then take EX lock on it, reflink the original inode to orphan inode and
release EX lock. Once the lock is released another node could request
it in EX mode from ocfs2_recover_orphans() which causes downconvert of
the lock, on this node, to NL mode.
Later we attempt to initialize security acl for the orphan inode and
move it to the reflink destination. However, while doing this we dont
take EX lock on the inode. This could potentially cause problems
because we could be starting transaction, accessing journal and
modifying metadata of the inode while holding NL lock and with another
node holding EX lock on the inode.
Fix this by taking orphan inode cluster lock in EX mode before
initializing security and moving orphan inode to reflink destination.
Use the __tracker variant while taking inode lock to avoid recursive
locking in the ocfs2_init_security_and_acl() call chain.
Link: http://lkml.kernel.org/r/1523475107-7639-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Fri, 11 May 2018 23:02:04 +0000 (16:02 -0700)]
mm, oom: fix concurrent munlock and oom reaper unmap, v3
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a
kernel oops.
Fix this by manually freeing all possible memory from the mm before
doing the munlock and then setting MMF_OOM_SKIP. The oom reaper can not
run on the mm anymore so the munlock is safe to do in exit_mmap(). It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.
This issue fixes CVE-2018-1000200.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes:
212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 11 May 2018 23:02:00 +0000 (16:02 -0700)]
mm: migrate: fix double call of radix_tree_replace_slot()
radix_tree_replace_slot() is called twice for head page, it's obviously
a bug. Let's fix it.
Link: http://lkml.kernel.org/r/20180423072101.GA12157@hori1.linux.bs1.fc.nec.co.jp
Fixes:
e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <zi.yan@sent.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laura Abbott [Fri, 11 May 2018 23:01:57 +0000 (16:01 -0700)]
proc/kcore: don't bounds check against address 0
The existing kcore code checks for bad addresses against __va(0) with
the assumption that this is the lowest address on the system. This may
not hold true on some systems (e.g. arm64) and produce overflows and
crashes. Switch to using other functions to validate the address range.
It's currently only seen on arm64 and it's not clear if anyone wants to
use that particular combination on a stable release. So this is not
urgent for stable.
Link: http://lkml.kernel.org/r/20180501201143.15121-1-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Tested-by: Dave Anderson <anderson@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Fri, 11 May 2018 23:01:53 +0000 (16:01 -0700)]
mm: don't show nr_indirectly_reclaimable in /proc/vmstat
Don't show nr_indirectly_reclaimable in /proc/vmstat, because there is
no need to export this vm counter to userspace, and some changes are
expected in reclaimable object accounting, which can alter this counter.
Link: http://lkml.kernel.org/r/20180425191422.9159-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Tatashin [Fri, 11 May 2018 23:01:50 +0000 (16:01 -0700)]
mm: sections are not offlined during memory hotremove
Memory hotplug and hotremove operate with per-block granularity. If the
machine has a large amount of memory (more than 64G), the size of a
memory block can span multiple sections. By mistake, during hotremove
we set only the first section to offline state.
The bug was discovered because kernel selftest started to fail:
https://lkml.kernel.org/r/
20180423011247.GK5563@yexl-desktop
After commit, "mm/memory_hotplug: optimize probe routine". But, the bug
is older than this commit. In this optimization we also added a check
for sections to be in a proper state during hotplug operation.
Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
Fixes:
2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Fri, 11 May 2018 23:01:46 +0000 (16:01 -0700)]
z3fold: fix reclaim lock-ups
Do not try to optimize in-page object layout while the page is under
reclaim. This fixes lock-ups on reclaim and improves reclaim
performance at the same time.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: <Oleksiy.Avramchenko@sony.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeffrey Hugo [Fri, 11 May 2018 23:01:42 +0000 (16:01 -0700)]
init: fix false positives in W+X checking
load_module() creates W+X mappings via __vmalloc_node_range() (from
layout_and_allocate()->move_module()->module_alloc()) by using
PAGE_KERNEL_EXEC. These mappings are later cleaned up via
"call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().
This is a problem because call_rcu_sched() queues work, which can be run
after debug_checkwx() is run, resulting in a race condition. If hit,
the race results in a nasty splat about insecure W+X mappings, which
results in a poor user experience as these are not the mappings that
debug_checkwx() is intended to catch.
This issue is observed on multiple arm64 platforms, and has been
artificially triggered on an x86 platform.
Address the race by flushing the queued work before running the
arch-defined mark_rodata_ro() which then calls debug_checkwx().
Link: http://lkml.kernel.org/r/1525103946-29526-1-git-send-email-jhugo@codeaurora.org
Fixes:
e1a58320a38d ("x86/mm: Warn on W^X mappings")
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reported-by: Timur Tabi <timur@codeaurora.org>
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yury Norov [Fri, 11 May 2018 23:01:39 +0000 (16:01 -0700)]
lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
test_find_first_bit() is intentionally sub-optimal, and may cause soft
lockup due to long time of run on some systems. So decrease length of
bitmap to traverse to avoid lockup.
With the change below, time of test execution doesn't exceed 0.2 seconds
on my testing system.
Link: http://lkml.kernel.org/r/20180420171949.15710-1-ynorov@caviumnetworks.com
Fixes:
4441fca0a27f5 ("lib: test module for find_*_bit() functions")
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov [Fri, 11 May 2018 23:01:35 +0000 (16:01 -0700)]
KASAN: prohibit KASAN+STRUCTLEAK combination
Currently STRUCTLEAK inserts initialization out of live scope of variables
from KASAN point of view. This leads to KASAN false positive reports.
Prohibit this combination for now.
Link: http://lkml.kernel.org/r/20180419172451.104700-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shuah Khan (Samsung OSG) [Fri, 11 May 2018 23:01:32 +0000 (16:01 -0700)]
MAINTAINERS: update Shuah's email address
Update email address in MAINTAINERS file due to IT infrastructure changes
at Samsung.
Link: http://lkml.kernel.org/r/20180501212815.25911-1-shuah@kernel.org
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 11 May 2018 21:14:46 +0000 (14:14 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Verify lengths of keys provided by the user is AF_KEY, from Kevin
Easton.
2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.
3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R.
Silva.
4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most
grateful for this fix.
5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we
do appreciate.
6) Use after free in TLS code. Once again we are blessed by the
honorable Eric Dumazet with this fix.
7) Fix regression in TLS code causing stalls on partial TLS records.
This fix is bestowed upon us by Andrew Tomt.
8) Deal with too small MTUs properly in LLC code, another great gift
from Eric Dumazet.
9) Handle cached route flushing properly wrt. MTU locking in ipv4, to
Hangbin Liu we give thanks for this.
10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux.
Paolo Abeni, he gave us this.
11) Range check coalescing parameters in mlx4 driver, thank you Moshe
Shemesh.
12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother
David Howells.
13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens,
you're the best!
14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata
Benerjee saved us!
15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship
is now water tight, thanks to Andrey Ignatov.
16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes
everywhere!
17) Fix error path in tcf_proto_create, Jiri Pirko what would we do
without you!
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
net sched actions: fix refcnt leak in skbmod
net: sched: fix error path in tcf_proto_create() when modules are not configured
net sched actions: fix invalid pointer dereferencing if skbedit flags missing
ixgbe: fix memory leak on ipsec allocation
ixgbevf: fix ixgbevf_xmit_frame()'s return type
ixgbe: return error on unsupported SFP module when resetting
ice: Set rq_last_status when cleaning rq
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
bonding: send learning packets for vlans on slave
bonding: do not allow rlb updates to invalid mac
net/mlx5e: Err if asked to offload TC match on frag being first
net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
net/mlx5: Free IRQs in shutdown path
rxrpc: Trace UDP transmission failure
rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
rxrpc: Fix the min security level for kernel calls
rxrpc: Fix error reception on AF_INET6 sockets
rxrpc: Fix missing start of call timeout
qed: fix spelling mistake: "taskelt" -> "tasklet"
...
Linus Torvalds [Fri, 11 May 2018 20:56:43 +0000 (13:56 -0700)]
Merge tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These patches fix both a possible corruption during NFSoRDMA MR
recovery, and a sunrpc tracepoint crash.
Additionally, Trond has a new email address to put in the MAINTAINERS
file"
* tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
Change Trond's email address in MAINTAINERS
sunrpc: Fix latency trace point crashes
xprtrdma: Fix list corruption / DMAR errors during MR recovery
Roman Mashak [Fri, 11 May 2018 18:35:33 +0000 (14:35 -0400)]
net sched actions: fix refcnt leak in skbmod
When application fails to pass flags in netlink TLV when replacing
existing skbmod action, the kernel will leak refcnt:
$ tc actions get action skbmod index 1
total acts 0
action order 0: skbmod pipe set smac 00:11:22:33:44:55
index 1 ref 1 bind 0
For example, at this point a buggy application replaces the action with
index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags,
however refcnt gets bumped:
$ tc actions get actions skbmod index 1
total acts 0
action order 0: skbmod pipe set smac 00:11:22:33:44:55
index 1 ref 2 bind 0
$
Tha patch fixes this by calling tcf_idr_release() on existing actions.
Fixes:
86da71b57383d ("net_sched: Introduce skbmod action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 20:36:06 +0000 (13:36 -0700)]
Merge tag 'ceph-for-4.17-rc5' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"These patches fix two long-standing bugs in the DIO code path, one of
which is a crash trivially triggerable with splice()"
* tag 'ceph-for-4.17-rc5' of git://github.com/ceph/ceph-client:
ceph: fix iov_iter issues in ceph_direct_read_write()
libceph: add osd_req_op_extent_osd_data_bvecs()
ceph: fix rsize/wsize capping in ceph_direct_read_write()
Jiri Pirko [Fri, 11 May 2018 15:45:32 +0000 (17:45 +0200)]
net: sched: fix error path in tcf_proto_create() when modules are not configured
In case modules are not configured, error out when tp->ops is null
and prevent later null pointer dereference.
Fixes:
33a48927c193 ("sched: push TC filter protocol creation into a separate function")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 20:14:24 +0000 (13:14 -0700)]
Merge tag 'sh-for-4.17-fixes' of git://git.libc.org/linux-sh
Pull arch/sh fixes from Rich Felker:
"Fixes for critical regressions and a build failure.
The regressions were introduced in 4.15 and 4.17-rc1 and prevented
booting on affected systems"
* tag 'sh-for-4.17-fixes' of git://git.libc.org/linux-sh:
sh: switch to NO_BOOTMEM
sh: mm: Fix unprotected access to struct device
sh: fix build failure for J2 cpu with SMP disabled
Linus Torvalds [Fri, 11 May 2018 20:09:04 +0000 (13:09 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's a small memblock accounting problem when freeing the initrd
and a Spectre-v2 mitigation for NVIDIA Denver CPUs which just requires
a match on the CPU ID register.
Summary:
- Mitigate Spectre-v2 for NVIDIA Denver CPUs
- Free memblocks corresponding to freed initrd area"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: capabilities: Add NVIDIA Denver CPU to bp_harden list
arm64: Add MIDR encoding for NVIDIA CPUs
arm64: To remove initrd reserved area entry from memblock
Linus Torvalds [Fri, 11 May 2018 20:07:22 +0000 (13:07 -0700)]
Merge tag 'powerpc-4.17-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for an actual regression, the change to the SYSCALL_DEFINE
wrapper broke FTRACE_SYSCALLS for us due to a name mismatch. There's
also another commit to the same code to make sure we match all our
syscalls with various prefixes.
And then just one minor build fix, and the removal of an unused
variable that was removed and then snuck back in due to some rebasing.
Thanks to: Naveen N. Rao"
* tag 'powerpc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix CONFIG_NUMA=n build
powerpc/trace/syscalls: Update syscall name matching logic to account for ppc_ prefix
powerpc/trace/syscalls: Update syscall name matching logic
powerpc/64: Remove unused paca->soft_enabled
Linus Torvalds [Fri, 11 May 2018 20:04:35 +0000 (13:04 -0700)]
Merge tag 'trace-v4.17-rc4' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Working on some new updates to trace filtering, I noticed that the
regex_match_front() test was updated to be limited to the size of the
pattern instead of the full test string.
But as the test string is not guaranteed to be nul terminated, it
still needs to consider the size of the test string"
* tag 'trace-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix regex_match_front() to not over compare the test string
David S. Miller [Fri, 11 May 2018 19:57:23 +0000 (15:57 -0400)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-05-11
This series contains fixes to the ice, ixgbe and ixgbevf drivers.
Jeff Shaw provides a fix to ensure rq_last_status gets set, whether or
not the hardware responds with an error in the ice driver.
Emil adds a check for unsupported module during the reset routine for
ixgbe.
Luc Van Oostenryck fixes ixgbevf_xmit_frame() where it was not using the
correct return value (int).
Colin Ian King fixes a potential resource leak in ixgbe, where we were
not freeing ipsec in our cleanup path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2018 19:55:57 +0000 (15:55 -0400)]
Merge tag 'rxrpc-fixes-
20180510' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fixes
Here are three fixes for AF_RXRPC and two tracepoints that were useful for
finding them:
(1) Fix missing start of expect-Rx-by timeout on initial packet
transmission so that calls will time out if the peer doesn't respond.
(2) Fix error reception on AF_INET6 sockets by using the correct family of
sockopts on the UDP transport socket.
(3) Fix setting the minimum security level on kernel calls so that they
can be encrypted.
(4) Add a tracepoint to log ICMP/ICMP6 and other error reports from the
transport socket.
(5) Add a tracepoint to log UDP sendmsg failure so that we can find out if
transmission failure occurred on the UDP socket.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Fri, 11 May 2018 14:55:09 +0000 (10:55 -0400)]
net sched actions: fix invalid pointer dereferencing if skbedit flags missing
When application fails to pass flags in netlink TLV for a new skbedit action,
the kernel results in the following oops:
[ 8.307732] BUG: unable to handle kernel paging request at
0000000000021130
[ 8.309167] PGD
80000000193d1067 P4D
80000000193d1067 PUD
180e0067 PMD 0
[ 8.310595] Oops: 0000 [#1] SMP PTI
[ 8.311334] Modules linked in: kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper serio_raw
[ 8.314190] CPU: 1 PID: 397 Comm: tc Not tainted 4.17.0-rc3+ #357
[ 8.315252] RIP: 0010:__tcf_idr_release+0x33/0x140
[ 8.316203] RSP: 0018:
ffffa0718038f840 EFLAGS:
00010246
[ 8.317123] RAX:
0000000000000001 RBX:
0000000000021100 RCX:
0000000000000000
[ 8.319831] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000021100
[ 8.321181] RBP:
0000000000000000 R08:
000000000004adf8 R09:
0000000000000122
[ 8.322645] R10:
0000000000000000 R11:
ffffffff9e5b01ed R12:
0000000000000000
[ 8.324157] R13:
ffffffff9e0d3cc0 R14:
0000000000000000 R15:
0000000000000000
[ 8.325590] FS:
00007f591292e700(0000) GS:
ffff8fcf5bc40000(0000) knlGS:
0000000000000000
[ 8.327001] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8.327987] CR2:
0000000000021130 CR3:
00000000180e6004 CR4:
00000000001606a0
[ 8.329289] Call Trace:
[ 8.329735] tcf_skbedit_init+0xa7/0xb0
[ 8.330423] tcf_action_init_1+0x362/0x410
[ 8.331139] ? try_to_wake_up+0x44/0x430
[ 8.331817] tcf_action_init+0x103/0x190
[ 8.332511] tc_ctl_action+0x11a/0x220
[ 8.333174] rtnetlink_rcv_msg+0x23d/0x2e0
[ 8.333902] ? _cond_resched+0x16/0x40
[ 8.334569] ? __kmalloc_node_track_caller+0x5b/0x2c0
[ 8.335440] ? rtnl_calcit.isra.31+0xf0/0xf0
[ 8.336178] netlink_rcv_skb+0xdb/0x110
[ 8.336855] netlink_unicast+0x167/0x220
[ 8.337550] netlink_sendmsg+0x2a7/0x390
[ 8.338258] sock_sendmsg+0x30/0x40
[ 8.338865] ___sys_sendmsg+0x2c5/0x2e0
[ 8.339531] ? pagecache_get_page+0x27/0x210
[ 8.340271] ? filemap_fault+0xa2/0x630
[ 8.340943] ? page_add_file_rmap+0x108/0x200
[ 8.341732] ? alloc_set_pte+0x2aa/0x530
[ 8.342573] ? finish_fault+0x4e/0x70
[ 8.343332] ? __handle_mm_fault+0xbc1/0x10d0
[ 8.344337] ? __sys_sendmsg+0x53/0x80
[ 8.345040] __sys_sendmsg+0x53/0x80
[ 8.345678] do_syscall_64+0x4f/0x100
[ 8.346339] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.347206] RIP: 0033:0x7f591191da67
[ 8.347831] RSP: 002b:
00007fff745abd48 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
[ 8.349179] RAX:
ffffffffffffffda RBX:
00007fff745abe70 RCX:
00007f591191da67
[ 8.350431] RDX:
0000000000000000 RSI:
00007fff745abdc0 RDI:
0000000000000003
[ 8.351659] RBP:
000000005af35251 R08:
0000000000000001 R09:
0000000000000000
[ 8.352922] R10:
00000000000005f1 R11:
0000000000000246 R12:
0000000000000000
[ 8.354183] R13:
00007fff745afed0 R14:
0000000000000001 R15:
00000000006767c0
[ 8.355400] Code: 41 89 d4 53 89 f5 48 89 fb e8 aa 20 fd ff 85 c0 0f 84 ed 00
00 00 48 85 db 0f 84 cf 00 00 00 40 84 ed 0f 85 cd 00 00 00 45 84 e4 <8b> 53 30
74 0d 85 d2 b8 ff ff ff ff 0f 8f b3 00 00 00 8b 43 2c
[ 8.358699] RIP: __tcf_idr_release+0x33/0x140 RSP:
ffffa0718038f840
[ 8.359770] CR2:
0000000000021130
[ 8.360438] ---[ end trace
60c66be45dfc14f0 ]---
The caller calls action's ->init() and passes pointer to "struct tc_action *a",
which later may be initialized to point at the existing action, otherwise
"struct tc_action *a" is still invalid, and therefore dereferencing it is an
error as happens in tcf_idr_release, where refcnt is decremented.
So in case of missing flags tcf_idr_release must be called only for
existing actions.
v2:
- prepare patch for net tree
Fixes:
5e1567aeb7fe ("net sched: skbedit action fix late binding")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe [Tue, 8 May 2018 16:25:15 +0000 (10:25 -0600)]
nvme: add quirk to force medium priority for SQ creation
Some P3100 drives have a bug where they think WRRU (weighted round robin)
is always enabled, even though the host doesn't set it. Since they think
it's enabled, they also look at the submission queue creation priority. We
used to set that to MEDIUM by default, but that was removed in commit
81c1cd98351b. This causes various issues on that drive. Add a quirk to
still set MEDIUM priority for that controller.
Fixes:
81c1cd98351b ("nvme/pci: Don't set reserved SQ create flags")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Linus Torvalds [Fri, 11 May 2018 19:30:34 +0000 (12:30 -0700)]
Merge tag 'for-linus-4.17-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"One fix for the kernel running as a fully virtualized guest using PV
drivers on old Xen hypervisor versions"
* tag 'for-linus-4.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: Reset VCPU0 info pointer after shared_info remap
Colin Ian King [Wed, 9 May 2018 13:58:48 +0000 (14:58 +0100)]
ixgbe: fix memory leak on ipsec allocation
The error clean up path kfree's adapter->ipsec and should be
instead kfree'ing ipsec. Fix this. Also, the err1 error exit path
does not need to kfree ipsec because this failure path was for
the failed allocation of ipsec.
Detected by CoverityScan, CID#146424 ("Resource Leak")
Fixes:
63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Luc Van Oostenryck [Tue, 24 Apr 2018 13:16:48 +0000 (15:16 +0200)]
ixgbevf: fix ixgbevf_xmit_frame()'s return type
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
Fix this by returning 'netdev_tx_t' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Fri, 20 Apr 2018 00:06:57 +0000 (17:06 -0700)]
ixgbe: return error on unsupported SFP module when resetting
Add check for unsupported module and return the error code.
This fixes a Coverity hit due to unused return status from setup_sfp.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Shaw [Wed, 18 Apr 2018 18:23:27 +0000 (11:23 -0700)]
ice: Set rq_last_status when cleaning rq
Prior to this commit, the rq_last_status was only set when hardware
responded with an error. This leads to rq_last_status being invalid
in the future when hardware eventually responds without error. This
commit resolves the issue by unconditionally setting rq_last_status
with the value returned in the descriptor.
Fixes:
940b61af02f4 ("ice: Initialize PF and setup miscellaneous
interrupt")
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Trond Myklebust [Fri, 11 May 2018 18:13:57 +0000 (14:13 -0400)]
Change Trond's email address in MAINTAINERS
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Rob Herring [Fri, 11 May 2018 13:45:59 +0000 (08:45 -0500)]
sh: switch to NO_BOOTMEM
Commit
0fa1c579349f ("of/fdt: use memblock_virt_alloc for early alloc")
inadvertently switched the DT unflattening allocations from memblock to
bootmem which doesn't work because the unflattening happens before
bootmem is initialized. Swapping the order of bootmem init and
unflattening could also fix this, but removing bootmem is desired. So
enable NO_BOOTMEM on SH like other architectures have done.
Fixes:
0fa1c579349f ("of/fdt: use memblock_virt_alloc for early alloc")
Reported-by: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rich Felker <dalias@libc.org>
Linus Torvalds [Fri, 11 May 2018 16:52:01 +0000 (09:52 -0700)]
mmap: introduce sane default mmap limits
The internal VM "mmap()" interfaces are based on the mmap target doing
everything using page indexes rather than byte offsets, because
traditionally (ie 32-bit) we had the situation that the byte offset
didn't fit in a register. So while the mmap virtual address was limited
by the word size of the architecture, the backing store was not.
So we're basically passing "pgoff" around as a page index, in order to
be able to describe backing store locations that are much bigger than
the word size (think files larger than 4GB etc).
But while this all makes a ton of sense conceptually, we've been dogged
by various drivers that don't really understand this, and internally
work with byte offsets, and then try to work with the page index by
turning it into a byte offset with "pgoff << PAGE_SHIFT".
Which obviously can overflow.
Adding the size of the mapping to it to get the byte offset of the end
of the backing store just exacerbates the problem, and if you then use
this overflow-prone value to check various limits of your device driver
mmap capability, you're just setting yourself up for problems.
The correct thing for drivers to do is to do their limit math in page
indices, the way the interface is designed. Because the generic mmap
code _does_ test that the index doesn't overflow, since that's what the
mmap code really cares about.
HOWEVER.
Finding and fixing various random drivers is a sisyphean task, so let's
just see if we can just make the core mmap() code do the limiting for
us. Realistically, the only "big" backing stores we need to care about
are regular files and block devices, both of which are known to do this
properly, and which have nice well-defined limits for how much data they
can access.
So let's special-case just those two known cases, and then limit other
random mmap users to a backing store that still fits in "unsigned long".
Realistically, that's not much of a limit at all on 64-bit, and on
32-bit architectures the only worry might be the GPU drivers, which can
have big physical address spaces.
To make it possible for drivers like that to say that they are 64-bit
clean, this patch does repurpose the "FMODE_UNSIGNED_OFFSET" bit in the
file flags to allow drivers to mark their file descriptors as safe in
the full 64-bit mmap address space.
[ The timing for doing this is less than optimal, and this should really
go in a merge window. But realistically, this needs wide testing more
than it needs anything else, and being main-line is the only way to do
that.
So the earlier the better, even if it's outside the proper development
cycle - Linus ]
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Charles Machalow [Thu, 10 May 2018 23:01:38 +0000 (16:01 -0700)]
nvme: Fix sync controller reset return
If a controller reset is requested while the device has no namespaces,
we were incorrectly returning ENETRESET. This patch adds the check for
ADMIN_ONLY controller state to indicate a successful reset.
Fixes:
8000d1fdb0 ("nvme-rdma: fix sysfs invoked reset_ctrl error flow ")
Cc: <stable@vger.kernel.org>
Signed-off-by: Charles Machalow <charles.machalow@intel.com>
[changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Linus Torvalds [Fri, 11 May 2018 16:49:02 +0000 (09:49 -0700)]
Merge tag 'pm-4.17-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two PCI power management regressions from the 4.13 cycle and
one cpufreq schedutil governor bug introduced during the 4.12 cycle,
drop a stale comment from the schedutil code and fix two mistakes in
docs.
Specifics:
- Restore device_may_wakeup() check in pci_enable_wake() removed
inadvertently during the 4.13 cycle to prevent systems from drawing
excessive power when suspended or off, among other things (Rafael
Wysocki).
- Fix pci_dev_run_wake() to properly handle devices that only can
signal PME# when in the D3cold power state (Kai Heng Feng).
- Fix the schedutil cpufreq governor to avoid using UINT_MAX as the
new CPU frequency in some cases due to a missing check (Rafael
Wysocki).
- Remove a stale comment regarding worker kthreads from the schedutil
cpufreq governor (Juri Lelli).
- Fix a copy-paste mistake in the intel_pstate driver documentation
(Juri Lelli).
- Fix a typo in the system sleep states documentation (Jonathan
Neuschäfer)"
* tag 'pm-4.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / PM: Check device_may_wakeup() in pci_enable_wake()
PCI / PM: Always check PME wakeup capability for runtime wakeup support
cpufreq: schedutil: Avoid using invalid next_freq
cpufreq: schedutil: remove stale comment
PM: docs: intel_pstate: fix Active Mode w/o HWP paragraph
PM: docs: sleep-states: Fix a typo ("includig")
Linus Torvalds [Fri, 11 May 2018 16:46:14 +0000 (09:46 -0700)]
Merge tag 'mtd/fixes-for-4.17-rc5' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
- make nand_soft_waitrdy() wait tWB before polling the status REG
- fix BCH write in the the Marvell NAND controller driver
- fix wrong picosec to msec conversion in the Marvell NAND controller
driver
- fix DMA handling in the TI OneNAND controllre driver
* tag 'mtd/fixes-for-4.17-rc5' of git://git.infradead.org/linux-mtd:
mtd: rawnand: Make sure we wait tWB before polling the STATUS reg
mtd: rawnand: marvell: fix command xtype in BCH write hook
mtd: rawnand: marvell: pass ms delay to wait_op
mtd: onenand: omap2: Disable DMA for HIGHMEM buffers
David S. Miller [Fri, 11 May 2018 16:26:29 +0000 (12:26 -0400)]
Merge tag 'mlx5-fixes-2018-05-10' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-05-10
the following series includes some fixes for mlx5 core driver.
Please pull and let me know if there's any problem.
For -stable v4.5
("net/mlx5: E-Switch, Include VF RDMA stats in vport statistics")
For -stable v4.10
("net/mlx5e: Err if asked to offload TC match on frag being first")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 11 May 2018 16:18:02 +0000 (09:18 -0700)]
Merge tag 'drm-fixes-for-v4.17-rc5' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"nouveau, amdgpu, i915, vc4, omap, exynos and atomic fixes.
As last week seemed a bit slow, we got a few more fixes this week.
The main stuff is two weeks of fixes for amdgpu, some missing bits of
vega12 atom firmware support were added, and some power management
fixes.
Nouveau got two regression fixes for an DP MST deadlock and a random
oops fix.
i915 got an LVDS panel timeout fix 2 WARN fixes.
exynos fixed a pagefault issue in the mixer driver.
vc4 has an oops fix.
omap had a bunch of uninit var and error-checking fixes. Two atomic
modesetting state fixes.
One minor agp cleanup patch"
* tag 'drm-fixes-for-v4.17-rc5' of git://people.freedesktop.org/~airlied/linux: (30 commits)
drm/amd/pp: Fix performance drop on Fiji
drm/nouveau: Fix deadlock in nv50_mstm_register_connector()
drm/nouveau/ttm: don't dereference nvbo::cli, it can outlive client
agp: uninorth: make two functions static
drm/amd/pp: Refine the output of pp_power_profile_mode on VI
drm/amdgpu: Switch to interruptable wait to recover from ring hang.
drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages
drm/amd/display: Use kvzalloc for potentially large allocations
drm/amd/display: Don't return ddc result and read_bytes in same return value
drm/amd/display: Add get_firmware_info_v3_2 for VG12
drm/amd: Add BIOS smu_info v3_3 required struct def.
drm/amd/display: Add VG12 ASIC IDs
drm/vc4: Fix scaling of uni-planar formats
drm/exynos: hdmi: avoid duplicating drm_bridge_attach
drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log
drm/i915: Correctly populate user mode h/vdisplay with pipe src size during readout
drm/i915: Adjust eDP's logical vco in a reliable place.
drm/bridge/sii8620: add Kconfig dependency on extcon
drm/omap: handle alloc failures in omap_connector
drm/omap: add missing linefeeds to prints
...
Andrey Ignatov [Thu, 10 May 2018 17:59:34 +0000 (10:59 -0700)]
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
Fix more memory leaks in ip_cmsg_send() callers. Part of them were fixed
earlier in
919483096bfe.
* udp_sendmsg one was there since the beginning when linux sources were
first added to git;
* ping_v4_sendmsg one was copy/pasted in
c319b4d76b9e.
Whenever return happens in udp_sendmsg() or ping_v4_sendmsg() IP options
have to be freed if they were allocated previously.
Add label so that future callers (if any) can use it instead of kfree()
before return that is easy to forget.
Fixes:
c319b4d76b9e (net: ipv4: add IPPROTO_ICMP socket kind)
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Thu, 10 May 2018 11:26:16 +0000 (13:26 +0200)]
mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
Resources are not freed in the reverse order of the allocation.
Labels are also mixed-up.
Fix it and reorder code and labels in the error handling path of
'mlxsw_core_bus_device_register()'
Fixes:
ef3116e5403e ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2018 15:50:41 +0000 (11:50 -0400)]
Merge branch 'bonding-bug-fixes-and-regressions'
Debabrata Banerjee says:
====================
bonding: bug fixes and regressions
Fixes to bonding driver for balance-alb mode, suitable for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Debabrata Banerjee [Wed, 9 May 2018 23:32:11 +0000 (19:32 -0400)]
bonding: send learning packets for vlans on slave
There was a regression at some point from the intended functionality of
commit
f60c3704e87d ("bonding: Fix alb mode to only use first level
vlans.")
Given the return value vlan_get_encap_level() we need to store the nest
level of the bond device, and then compare the vlan's encap level to
this. Without this, this check always fails and learning packets are
never sent.
In addition, this same commit caused a regression in the behavior of
balance_alb, which requires learning packets be sent for all interfaces
using the slave's mac in order to load balance properly. For vlan's
that have not set a user mac, we can send after checking one bit.
Otherwise we need send the set mac, albeit defeating rx load balancing
for that vlan.
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Debabrata Banerjee [Wed, 9 May 2018 23:32:10 +0000 (19:32 -0400)]
bonding: do not allow rlb updates to invalid mac
Make sure multicast, broadcast, and zero mac's cannot be the output of rlb
updates, which should all be directed arps. Receive load balancing will be
collapsed if any of these happen, as the switch will broadcast.
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Rostedt (VMware) [Wed, 9 May 2018 15:59:32 +0000 (11:59 -0400)]
tracing: Fix regex_match_front() to not over compare the test string
The regex match function regex_match_front() in the tracing filter logic,
was fixed to test just the pattern length from testing the entire test
string. That is, it went from strncmp(str, r->pattern, len) to
strcmp(str, r->pattern, r->len).
The issue is that str is not guaranteed to be nul terminated, and if r->len
is greater than the length of str, it can access more memory than is
allocated.
The solution is to add a simple test if (len < r->len) return 0.
Cc: stable@vger.kernel.org
Fixes:
285caad415f45 ("tracing/filters: Fix MATCH_FRONT_ONLY filter matching")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Rafael J. Wysocki [Fri, 11 May 2018 13:17:18 +0000 (15:17 +0200)]
Merge branches 'pm-pci' and 'pm-docs'
* pm-pci:
PCI / PM: Check device_may_wakeup() in pci_enable_wake()
PCI / PM: Always check PME wakeup capability for runtime wakeup support
* pm-docs:
PM: docs: intel_pstate: fix Active Mode w/o HWP paragraph
PM: docs: sleep-states: Fix a typo ("includig")
Zhang Rui [Fri, 11 May 2018 01:37:21 +0000 (09:37 +0800)]
Merge branch 'thermal-soc' into next
Jann Horn [Fri, 11 May 2018 00:19:01 +0000 (02:19 +0200)]
compat: fix 4-byte infoleak via uninitialized struct field
Commit
3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to
native counterparts") removed the memset() in compat_get_timex(). Since
then, the compat adjtimex syscall can invoke do_adjtimex() with an
uninitialized ->tai.
If do_adjtimex() doesn't write to ->tai (e.g. because the arguments are
invalid), compat_put_timex() then copies the uninitialized ->tai field
to userspace.
Fix it by adding the memset() back.
Fixes:
3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Fri, 11 May 2018 00:37:07 +0000 (10:37 +1000)]
Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Single amdgpu regression fix
* 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/pp: Fix performance drop on Fiji
Steve French [Thu, 10 May 2018 15:59:37 +0000 (10:59 -0500)]
smb3: directory sync should not return an error
As with NFS, which ignores sync on directory handles,
fsync on a directory handle is a noop for CIFS/SMB3.
Do not return an error on it. It breaks some database
apps otherwise.
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Roi Dayan [Thu, 22 Mar 2018 16:51:37 +0000 (18:51 +0200)]
net/mlx5e: Err if asked to offload TC match on frag being first
The HW doesn't support matching on frag first/later, return error if we are
asked to offload that.
Fixes:
3f7d0eb42d59 ("net/mlx5e: Offload TC matching on packets being IP fragments")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Adi Nissim [Wed, 25 Apr 2018 08:21:32 +0000 (11:21 +0300)]
net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
The host side reporting of VF vport statistics didn't include the VF
RDMA traffic.
Fixes:
3b751a2a418a ("net/mlx5: E-Switch, Introduce get vf statistics")
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reported-by: Ariel Almog <ariela@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Daniel Jurgens [Mon, 26 Mar 2018 18:35:29 +0000 (13:35 -0500)]
net/mlx5: Free IRQs in shutdown path
Some platforms require IRQs to be free'd in the shutdown path. Otherwise
they will fail to be reallocated after a kexec.
Fixes:
8812c24d28f4 ("net/mlx5: Add fast unload support in shutdown flow")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David Howells [Thu, 10 May 2018 22:26:01 +0000 (23:26 +0100)]
rxrpc: Trace UDP transmission failure
Add a tracepoint to log transmission failure from the UDP transport socket
being used by AF_RXRPC.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 10 May 2018 22:26:01 +0000 (23:26 +0100)]
rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
Add a tracepoint to log received ICMP/ICMP6 events and other error
messages.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 10 May 2018 22:26:01 +0000 (23:26 +0100)]
rxrpc: Fix the min security level for kernel calls
Fix the kernel call initiation to set the minimum security level for kernel
initiated calls (such as from kAFS) from the sockopt value.
Fixes:
19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 10 May 2018 22:26:00 +0000 (23:26 +0100)]
rxrpc: Fix error reception on AF_INET6 sockets
AF_RXRPC tries to turn on IP_RECVERR and IP_MTU_DISCOVER on the UDP socket
it just opened for communications with the outside world, regardless of the
type of socket. Unfortunately, this doesn't work with an AF_INET6 socket.
Fix this by turning on IPV6_RECVERR and IPV6_MTU_DISCOVER instead if the
socket is of the AF_INET6 family.
Without this, kAFS server and address rotation doesn't work correctly
because the algorithm doesn't detect received network errors.
Fixes:
75b54cb57ca3 ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 10 May 2018 22:26:00 +0000 (23:26 +0100)]
rxrpc: Fix missing start of call timeout
The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point. This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet. Without this, the call may wait forever if
the server doesn't deign to reply.
Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet. The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.
Fixes:
a158bdd3247b ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David S. Miller [Thu, 10 May 2018 21:57:11 +0000 (17:57 -0400)]
Merge tag 'linux-can-fixes-for-4.17-
20180510' of ssh://gitolite./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
this is a pull request for net/master consisting of 2 patches.
Both patches are from Lukas Wunner and fix two problems found in the
hi311x CAN driver under high load situations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 10 May 2018 14:03:27 +0000 (15:03 +0100)]
qed: fix spelling mistake: "taskelt" -> "tasklet"
Trivial fix to spelling mistake in DP_VERBOSE message text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 10 May 2018 09:34:13 +0000 (17:34 +0800)]
sctp: remove sctp_chunk_put from fail_mark err path in sctp_ulpevent_make_rcvmsg
In Commit
1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too"),
it held the chunk in sctp_ulpevent_make_rcvmsg to access it safely later
in recvmsg. However, it also added sctp_chunk_put in fail_mark err path,
which is only triggered before holding the chunk.
syzbot reported a use-after-free crash happened on this err path, where
it shouldn't call sctp_chunk_put.
This patch simply removes this call.
Fixes:
1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too")
Reported-by: syzbot+141d898c5f24489db4aa@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Thu, 10 May 2018 07:06:04 +0000 (09:06 +0200)]
net/mlx4_en: Fix an error handling path in 'mlx4_en_init_netdev()'
If an error occurs, 'mlx4_en_destroy_netdev()' is called.
It then calls 'mlx4_en_free_resources()' which does the needed resources
cleanup.
So, doing some explicit kfree in the error handling path would lead to
some double kfree.
Simplify code to avoid such a case.
Fixes:
67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 9 May 2018 21:09:04 +0000 (14:09 -0700)]
hv_netvsc: set master device
The hyper-v transparent bonding should have used master_dev_link.
The netvsc device should look like a master bond device not
like the upper side of a tunnel.
This makes the semantics the same so that userspace applications
looking at network devices see the correct master relationshipship.
Fixes:
0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 May 2018 21:34:50 +0000 (17:34 -0400)]
Merge tag 'mac80211-for-davem-2018-05-09' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
We only have a few fixes this time:
* WMM element validation
* SAE timeout
* add-BA timeout
* docbook parsing
* a few memory leaks in error paths
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 9 May 2018 16:50:22 +0000 (09:50 -0700)]
tipc: fix one byte leak in tipc_sk_set_orig_addr()
sysbot/KMSAN reported an uninit-value in recvmsg() that
I tracked down to tipc_sk_set_orig_addr(), missing
srcaddr->member.scope initialization.
This patches moves srcaddr->sock.scope init to follow
fields order and ease future verifications.
BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline]
BUG: KMSAN: uninit-value in move_addr_to_user+0x32e/0x530 net/socket.c:226
CPU: 0 PID: 4549 Comm: syz-executor287 Not tainted 4.17.0-rc3+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157
kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
copy_to_user include/linux/uaccess.h:184 [inline]
move_addr_to_user+0x32e/0x530 net/socket.c:226
___sys_recvmsg+0x4e2/0x810 net/socket.c:2285
__sys_recvmsg net/socket.c:2328 [inline]
__do_sys_recvmsg net/socket.c:2338 [inline]
__se_sys_recvmsg net/socket.c:2335 [inline]
__x64_sys_recvmsg+0x325/0x460 net/socket.c:2335
do_syscall_64+0x154/0x220 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4455e9
RSP: 002b:
00007fe3bd36ddb8 EFLAGS:
00000246 ORIG_RAX:
000000000000002f
RAX:
ffffffffffffffda RBX:
00000000006dac24 RCX:
00000000004455e9
RDX:
0000000000002002 RSI:
0000000020000400 RDI:
0000000000000003
RBP:
00000000006dac20 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000000
R13:
00007fff98ce4b6f R14:
00007fe3bd36e9c0 R15:
0000000000000003
Local variable description: ----addr@___sys_recvmsg
Variable was created at:
___sys_recvmsg+0xd5/0x810 net/socket.c:2246
__sys_recvmsg net/socket.c:2328 [inline]
__do_sys_recvmsg net/socket.c:2338 [inline]
__se_sys_recvmsg net/socket.c:2335 [inline]
__x64_sys_recvmsg+0x325/0x460 net/socket.c:2335
Byte 19 of 32 is uninitialized
Fixes:
31c82a2d9d51 ("tipc: add second source address to recvmsg()/recvfrom()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Wed, 9 May 2018 16:45:42 +0000 (18:45 +0200)]
tc-testing: fix tdc tests for 'bpf' action
- correct a typo in the value of 'matchPattern' of test 282d, potentially
causing false negative
- allow errors when 'teardown' executes '$TC action flush action bpf' in
test 282d, to fix false positive when it is run with act_bpf unloaded
- correct the value of 'matchPattern' in test e939, causing false positive
in case the BPF JIT is enabled
Fixes:
440ea4ae1828 ("tc-testing: add selftests for 'bpf' action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moshe Shemesh [Wed, 9 May 2018 15:35:13 +0000 (18:35 +0300)]
net/mlx4_en: Verify coalescing parameters are in range
Add check of coalescing parameters received through ethtool are within
range of values supported by the HW.
Driver gets the coalescing rx/tx-usecs and rx/tx-frames as set by the
users through ethtool. The ethtool support up to 32 bit value for each.
However, mlx4 modify cq limits the coalescing time parameter and
coalescing frames parameters to 16 bits.
Return out of range error if user tries to set these parameters to
higher values.
Change type of sample-interval and adaptive_rx_coal parameters in mlx4
driver to u32 as the ethtool holds them as u32 and these parameters are
not limited due to mlx4 HW.
Fixes:
c27a02cd94d6 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Wed, 9 May 2018 13:30:35 +0000 (19:00 +0530)]
cxgb4: copy mbox log size to PF0-3 adap instances
copy mbox size to adapter instances of PF0-3 to avoid
mbox log overflow. This fixes the possible protection
fault.
Fixes:
baf5086840ab ("cxgb4: restructure VF mgmt code")
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Wed, 9 May 2018 13:10:09 +0000 (18:40 +0530)]
cxgb4: zero the HMA memory
firmware expects HMA memory to be zeroed, use __GFP_ZERO
for HMA memory allocation.
Fixes:
8b4e6b3ca2ed ("cxgb4: Add HMA support")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 9 May 2018 10:42:34 +0000 (12:42 +0200)]
udp: fix SO_BINDTODEVICE
Damir reported a breakage of SO_BINDTODEVICE for UDP sockets.
In absence of VRF devices, after commit
fb74c27735f0 ("net:
ipv4: add second dif to udp socket lookups") the dif mismatch
isn't fatal anymore for UDP socket lookup with non null
sk_bound_dev_if, breaking SO_BINDTODEVICE semantics.
This changeset addresses the issue making the dif match mandatory
again in the above scenario.
Reported-by: Damir Mansurov <dnman@oktetlabs.ru>
Fixes:
fb74c27735f0 ("net: ipv4: add second dif to udp socket lookups")
Fixes:
1801b570dd2a ("net: ipv6: add second dif to udp socket lookups")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 9 May 2018 10:06:44 +0000 (18:06 +0800)]
ipv4: reset fnhe_mtu_locked after cache route flushed
After route cache is flushed via ipv4_sysctl_rtcache_flush(), we forget
to reset fnhe_mtu_locked in rt_bind_exception(). When pmtu is updated
in __ip_rt_update_pmtu(), it will return directly since the pmtu is
still locked. e.g.
+ ip netns exec client ping 10.10.1.1 -c 1 -s 1400 -M do
PING 10.10.1.1 (10.10.1.1) 1400(1428) bytes of data.
>From 10.10.0.254 icmp_seq=1 Frag needed and DF set (mtu = 0)
Signed-off-by: David S. Miller <davem@davemloft.net>
Mohammed Gamal [Wed, 9 May 2018 08:17:34 +0000 (10:17 +0200)]
hv_netvsc: Fix net device attach on older Windows hosts
On older windows hosts the net_device instance is returned to
the caller of rndis_filter_device_add() without having the presence
bit set first. This would cause any subsequent calls to network device
operations (e.g. MTU change, channel change) to fail after the device
is detached once, returning -ENODEV.
Instead of returning the device instabce, we take the exit path where
we call netif_device_attach()
Fixes:
7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Wed, 9 May 2018 07:18:58 +0000 (00:18 -0700)]
nfp: flower: remove headroom from max MTU calculation
Since commit
29a5dcae2790 ("nfp: flower: offload phys port MTU change") we
take encapsulation headroom into account when calculating the max allowed
MTU. This is unnecessary as the max MTU advertised by firmware should have
already accounted for encap headroom.
Subtracting headroom twice brings the max MTU below what's necessary for
some deployments.
Fixes:
29a5dcae2790 ("nfp: flower: offload phys port MTU change")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 9 May 2018 09:48:33 +0000 (10:48 +0100)]
net/9p: fix spelling mistake: "suspsend" -> "suspend"
Trivial fix to spelling mistake in dev_warn message text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 8 May 2018 22:24:28 +0000 (23:24 +0100)]
sctp: fix spelling mistake: "max_retans" -> "max_retrans"
Trivial fix to spelling mistake in error string
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 8 May 2018 22:01:51 +0000 (23:01 +0100)]
firestream: fix spelling mistake: "reseverd" -> "reserved"
Trivial fix to spelling mistake in res_strings string array
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 May 2018 19:22:36 +0000 (15:22 -0400)]
Merge branch 'qed-rdma-fixes'
Michal Kalderon says:
====================
qed*: Rdma fixes
This patch series include two fixes for bugs related to rdma.
The first has to do with loading the driver over an iWARP
device.
The second fixes a previous commit that added proper link
indication for iWARP / RoCE.
====================
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Michal Kalderon [Tue, 8 May 2018 18:29:19 +0000 (21:29 +0300)]
qede: Fix gfp flags sent to rdma event node allocation
A previous commit
4609adc27175 ("qede: Fix qedr link update")
added a flow that could allocate rdma event objects from an
interrupt path (link notification). Therefore the kzalloc call
should be done with GFP_ATOMIC.
fixes:
4609adc27175 ("qede: Fix qedr link update")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kalderon [Tue, 8 May 2018 18:29:18 +0000 (21:29 +0300)]
qed: Fix l2 initializations over iWARP personality
If qede driver was loaded on a device configured for iWARP
the l2 mutex wouldn't be allocated, and some l2 related
resources wouldn't be freed.
fixes:
c851a9dc4359 ("qed: Introduce iWARP personality")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 10 May 2018 18:42:01 +0000 (11:42 -0700)]
Merge tag 'for-4.17/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a stable fix for DM integrity to use kvfree
- fix for a 4.17-rc1 change to dm-bufio's buffer alignment
- fixes for a few sparse warnings
- remove VLA usage in DM mirror target
- improve DM thinp Documentation for the "read_only" feature
* tag 'for-4.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: update Documentation to clarify when "read_only" is valid
dm mirror: remove VLA usage
dm: fix some sparse warnings and whitespace in dax methods
dm cache background tracker: fix sparse warning
dm bufio: fix buffer alignment
dm integrity: use kvfree for kvmalloc'd memory
Ingo Molnar [Thu, 10 May 2018 18:09:00 +0000 (20:09 +0200)]
Merge tag 'perf-urgent-for-mingo-4.17-
20180507' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
. Revert "perf pmu: Fix pmu events parsing rule", as it broke Intel PT
event description parsing (Arnaldo Carvalho de Melo)
. Sync x86's cpufeatures.h and kvm UAPI headers with the kernel sources,
suppressing the ABI drift warnings (Arnaldo Carvalho de Melo)
- Remove duplicated entry for westmereep-dp in Intel's mapfile.csv (William Cohen)
- Fix typo in 'perf bench numa' options description (Yisheng Xie)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>