platform/kernel/linux-rpi.git
5 years agoscripts/gdb: fix debugging modules on s390
Ilya Leoshkevich [Sat, 19 Oct 2019 03:20:43 +0000 (20:20 -0700)]
scripts/gdb: fix debugging modules on s390

Currently lx-symbols assumes that module text is always located at
module->core_layout->base, but s390 uses the following layout:

  +------+  <- module->core_layout->base
  | GOT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset
  | PLT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset +
  | TEXT |     module->arch->plt_size
  +------+

Therefore, when trying to debug modules on s390, all the symbol
addresses are skewed by plt_offset + plt_size.

Fix by adding plt_offset + plt_size to module_addr in
load_module_symbols().

Link: http://lkml.kernel.org/r/20191017085917.81791-1-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agokernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
Song Liu [Sat, 19 Oct 2019 03:20:40 +0000 (20:20 -0700)]
kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register

Attaching uprobe to text section in THP splits the PMD mapped page table
into PTE mapped entries.  On uprobe detach, we would like to regroup PMD
mapped page table entry to regain performance benefit of THP.

However, the regroup is broken For perf_event based trace_uprobe.  This
is because perf_event based trace_uprobe calls uprobe_unregister twice
on close: first in TRACE_REG_PERF_CLOSE, then in
TRACE_REG_PERF_UNREGISTER.  The second call will split the PMD mapped
page table entry, which is not the desired behavior.

Fix this by only use FOLL_SPLIT_PMD for uprobe register case.

Add a WARN() to confirm uprobe unregister never work on huge pages, and
abort the operation when this WARN() triggers.

Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com
Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/thp: allow dropping THP from page cache
Kirill A. Shutemov [Sat, 19 Oct 2019 03:20:36 +0000 (20:20 -0700)]
mm/thp: allow dropping THP from page cache

Once a THP is added to the page cache, it cannot be dropped via
/proc/sys/vm/drop_caches.  Fix this issue with proper handling in
invalidate_mapping_pages().

Link: http://lkml.kernel.org/r/20191017164223.2762148-5-songliubraving@fb.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/vmscan.c: support removing arbitrary sized pages from mapping
William Kucharski [Sat, 19 Oct 2019 03:20:33 +0000 (20:20 -0700)]
mm/vmscan.c: support removing arbitrary sized pages from mapping

__remove_mapping() assumes that pages can only be either base pages or
HPAGE_PMD_SIZE.  Ask the page what size it is.

Link: http://lkml.kernel.org/r/20191017164223.2762148-4-songliubraving@fb.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/thp: fix node page state in split_huge_page_to_list()
Kirill A. Shutemov [Sat, 19 Oct 2019 03:20:30 +0000 (20:20 -0700)]
mm/thp: fix node page state in split_huge_page_to_list()

Make sure split_huge_page_to_list() handles the state of shmem THP and
file THP properly.

Link: http://lkml.kernel.org/r/20191017164223.2762148-3-songliubraving@fb.com
Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoproc/meminfo: fix output alignment
Kirill A. Shutemov [Sat, 19 Oct 2019 03:20:27 +0000 (20:20 -0700)]
proc/meminfo: fix output alignment

Patch series "Fixes for THP in page cache", v2.

This patch (of 5):

Add extra space for FileHugePages and FilePmdMapped, so the output is
aligned with other rows.

Link: http://lkml.kernel.org/r/20191017164223.2762148-2-songliubraving@fb.com
Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
Ben Dooks (Codethink) [Sat, 19 Oct 2019 03:20:24 +0000 (20:20 -0700)]
mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch

mm_init.c needs to include <linux/mman.h> for the definition of
vm_committed_as_batch.  Fixes the following sparse warning:

  mm/mm_init.c:141:5: warning: symbol 'vm_committed_as_batch' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191016091509.26708-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
Ben Dooks [Sat, 19 Oct 2019 03:20:20 +0000 (20:20 -0700)]
mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition

The generic_file_vm_ops is defined in <linux/ramfs.h> so include it to
fix the following warning:

  mm/filemap.c:2717:35: warning: symbol 'generic_file_vm_ops' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191008102311.25432-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: include <linux/huge_mm.h> for is_vma_temporary_stack
Ben Dooks [Sat, 19 Oct 2019 03:20:17 +0000 (20:20 -0700)]
mm: include <linux/huge_mm.h> for is_vma_temporary_stack

Include <linux/huge_mm.h> for the definition of is_vma_temporary_stack
to fix the following sparse warning:

  mm/rmap.c:1673:6: warning: symbol 'is_vma_temporary_stack' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191009151155.27763-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agozram: fix race between backing_dev_show and backing_dev_store
Chenwandun [Sat, 19 Oct 2019 03:20:14 +0000 (20:20 -0700)]
zram: fix race between backing_dev_show and backing_dev_store

CPU0:        CPU1:
backing_dev_show        backing_dev_store
    ......    ......
    file = zram->backing_dev;
    down_read(&zram->init_lock);    down_read(&zram->init_init_lock)
    file_path(file, ...);    zram->backing_dev = backing_dev;
    up_read(&zram->init_lock);    up_read(&zram->init_lock);

gets the value of zram->backing_dev too early in backing_dev_show, which
resultin the value being NULL at the beginning, and not NULL later.

backtrace:
  d_path+0xcc/0x174
  file_path+0x10/0x18
  backing_dev_show+0x40/0xb4
  dev_attr_show+0x20/0x54
  sysfs_kf_seq_show+0x9c/0x10c
  kernfs_seq_show+0x28/0x30
  seq_read+0x184/0x488
  kernfs_fop_read+0x5c/0x1a4
  __vfs_read+0x44/0x128
  vfs_read+0xa0/0x138
  SyS_read+0x54/0xb4

Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com
Signed-off-by: Chenwandun <chenwandun@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/memcontrol: update lruvec counters in mem_cgroup_move_account
Konstantin Khlebnikov [Sat, 19 Oct 2019 03:20:11 +0000 (20:20 -0700)]
mm/memcontrol: update lruvec counters in mem_cgroup_move_account

Mapped, dirty and writeback pages are also counted in per-lruvec stats.
These counters needs update when page is moved between cgroups.

Currently is nobody *consuming* the lruvec versions of these counters and
that there is no user-visible effect.

Link: http://lkml.kernel.org/r/157112699975.7360.1062614888388489788.stgit@buzz
Fixes: 00f3ca2c2d66 ("mm: memcontrol: per-lruvec stats infrastructure")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoocfs2: fix panic due to ocfs2_wq is null
Yi Li [Sat, 19 Oct 2019 03:20:08 +0000 (20:20 -0700)]
ocfs2: fix panic due to ocfs2_wq is null

mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error.  ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.

  Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
  ------------[ cut here ]------------
  Oops: 0002 [#1] SMP NOPTI
  CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G  E
        4.14.148-200.ckv.x86_64 #1
  Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
  task: ffff967af0520000 task.stack: ffffa5f05484000
  RIP: 0010:mutex_lock+0x19/0x20
  Call Trace:
    flush_workqueue+0x81/0x460
    ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
    ocfs2_dismount_volume+0x84/0x400 [ocfs2]
    ocfs2_fill_super+0xa4/0x1270 [ocfs2]
    ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
    mount_bdev+0x17f/0x1c0
    mount_fs+0x3a/0x160

Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com
Signed-off-by: Yi Li <yilikernel@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agohugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
David Hildenbrand [Sat, 19 Oct 2019 03:20:05 +0000 (20:20 -0700)]
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

Let's make sure that we only consider online memory (managed by the
buddy) that has initialized memmaps.  ZONE_DEVICE is not applicable.

page_zone() will call page_to_nid(), which will trigger
VM_BUG_ON_PGFLAGS(PagePoisoned(page), page) with CONFIG_PAGE_POISONING
and CONFIG_DEBUG_VM_PGFLAGS when called on uninitialized memmaps.  This
can be the case when an offline memory block (e.g., never onlined) is
spanned by a zone.

Note: As explained by Michal in [1], alloc_contig_range() will verify
the range.  So it boils down to the wrong access in this function.

[1] http://lkml.kernel.org/r/20180423000943.GO17484@dhcp22.suse.cz

Link: http://lkml.kernel.org/r/20191015120717.4858-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: memblock: do not enforce current limit for memblock_phys* family
Mike Rapoport [Sat, 19 Oct 2019 03:20:01 +0000 (20:20 -0700)]
mm: memblock: do not enforce current limit for memblock_phys* family

Until commit 92d12f9544b7 ("memblock: refactor internal allocation
functions") the maximal address for memblock allocations was forced to
memblock.current_limit only for the allocation functions returning
virtual address.  The changes introduced by that commit moved the limit
enforcement into the allocation core and as a result the allocation
functions returning physical address also started to limit allocations
to memblock.current_limit.

This caused breakage of etnaviv GPU driver:

  etnaviv etnaviv: bound 130000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
  etnaviv-gpu 130000.gpu: model: GC2000, revision: 5108
  etnaviv-gpu 130000.gpu: command buffer outside valid memory window
  etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
  etnaviv-gpu 134000.gpu: command buffer outside valid memory window
  etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
  etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0

Restore the behaviour of memblock_phys* family so that these functions
will not enforce memblock.current_limit.

Link: http://lkml.kernel.org/r/1570915861-17633-1-git-send-email-rppt@kernel.org
Fixes: 92d12f9544b7 ("memblock: refactor internal allocation functions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Adam Ford <aford173@gmail.com>
Tested-by: Adam Ford <aford173@gmail.com> [imx6q-logicpd]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
Honglei Wang [Sat, 19 Oct 2019 03:19:58 +0000 (20:19 -0700)]
mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size

Commit 1a61ab8038e72 ("mm: memcontrol: replace zone summing with
lruvec_page_state()") has made lruvec_page_state to use per-cpu counters
instead of calculating it directly from lru_zone_size with an idea that
this would be more effective.

Tim has reported that this is not really the case for their database
benchmark which is showing an opposite results where lruvec_page_state
is taking up a huge chunk of CPU cycles (about 25% of the system time
which is roughly 7% of total cpu cycles) on 5.3 kernels.  The workload
is running on a larger machine (96cpus), it has many cgroups (500) and
it is heavily direct reclaim bound.

Tim Chen said:

: The problem can also be reproduced by running simple multi-threaded
: pmbench benchmark with a fast Optane SSD swap (see profile below).
:
:
: 6.15%     3.08%  pmbench          [kernel.vmlinux]            [k] lruvec_lru_size
:             |
:             |--3.07%--lruvec_lru_size
:             |          |
:             |          |--2.11%--cpumask_next
:             |          |          |
:             |          |           --1.66%--find_next_bit
:             |          |
:             |           --0.57%--call_function_interrupt
:             |                     |
:             |                      --0.55%--smp_call_function_interrupt
:             |
:             |--1.59%--0x441f0fc3d009
:             |          _ops_rdtsc_init_base_freq
:             |          access_histogram
:             |          page_fault
:             |          __do_page_fault
:             |          handle_mm_fault
:             |          __handle_mm_fault
:             |          |
:             |           --1.54%--do_swap_page
:             |                     swapin_readahead
:             |                     swap_cluster_readahead
:             |                     |
:             |                      --1.53%--read_swap_cache_async
:             |                                __read_swap_cache_async
:             |                                alloc_pages_vma
:             |                                __alloc_pages_nodemask
:             |                                __alloc_pages_slowpath
:             |                                try_to_free_pages
:             |                                do_try_to_free_pages
:             |                                shrink_node
:             |                                shrink_node_memcg
:             |                                |
:             |                                |--0.77%--lruvec_lru_size
:             |                                |
:             |                                 --0.76%--inactive_list_is_low
:             |                                           |
:             |                                            --0.76%--lruvec_lru_size
:             |
:              --1.50%--measure_read
:                        page_fault
:                        __do_page_fault
:                        handle_mm_fault
:                        __handle_mm_fault
:                        do_swap_page
:                        swapin_readahead
:                        swap_cluster_readahead
:                        |
:                         --1.48%--read_swap_cache_async
:                                   __read_swap_cache_async
:                                   alloc_pages_vma
:                                   __alloc_pages_nodemask
:                                   __alloc_pages_slowpath
:                                   try_to_free_pages
:                                   do_try_to_free_pages
:                                   shrink_node
:                                   shrink_node_memcg
:                                   |
:                                   |--0.75%--inactive_list_is_low
:                                   |          |
:                                   |           --0.75%--lruvec_lru_size
:                                   |
:                                    --0.73%--lruvec_lru_size

The likely culprit is the cache traffic the lruvec_page_state_local
generates.  Dave Hansen says:

: I was thinking purely of the cache footprint.  If it's reading
: pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192
: bytes of cache *96 CPUs = 18k of data, mostly read-only.  1 cgroup would
: be 18k of data for the whole system and the caching would be pretty
: efficient and all 18k would probably survive a tight page fault loop in
: the L1.  500 cgroups would be ~90k of data per CPU thread which doesn't
: fit in the L1 and probably wouldn't survive a tight page fault loop if
: both logical threads were banging on different cgroups.
:
: It's just a theory, but it's why I noted the number of cgroups when I
: initially saw this show up in profiles

Fix the regression by partially reverting the said commit and calculate
the lru size explicitly.

Link: http://lkml.kernel.org/r/20190905071034.16822-1-honglei.wang@oracle.com
Fixes: 1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/gup: fix a misnamed "write" argument, and a related bug
John Hubbard [Sat, 19 Oct 2019 03:19:53 +0000 (20:19 -0700)]
mm/gup: fix a misnamed "write" argument, and a related bug

In several routines, the "flags" argument is incorrectly named "write".
Change it to "flags".

Also, in one place, the misnaming led to an actual bug:
"flags & FOLL_WRITE" is required, rather than just "flags".
(That problem was flagged by krobot, in v1 of this patch.)

Also, change the flags argument from int, to unsigned int.

You can see that this was a simple oversight, because the
calling code passes "flags" to the fifth argument:

gup_pgd_range():
    ...
    if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
    PGDIR_SHIFT, next, flags, pages, nr))

...which, until this patch, the callees referred to as "write".

Also, change two lines to avoid checkpatch line length
complaints, and another line to fix another oversight
that checkpatch called out: missing "int" on pdshift.

Link: http://lkml.kernel.org/r/20191014184639.1512873-3-jhubbard@nvidia.com
Fixes: b798bec4741b ("mm/gup: change write parameter to flags in fast walk")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/gup_benchmark: add a missing "w" to getopt string
John Hubbard [Sat, 19 Oct 2019 03:19:50 +0000 (20:19 -0700)]
mm/gup_benchmark: add a missing "w" to getopt string

Even though gup_benchmark.c has code to handle the -w command-line option,
the "w" is not part of the getopt string.  It looks as if it has been
missing the whole time.

On my machine, this leads naturally to the following predictable result:

  $ sudo ./gup_benchmark -w
  ./gup_benchmark: invalid option -- 'w'

...which is fixed with this commit.

Link: http://lkml.kernel.org/r/20191014184639.1512873-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: kbuild test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoocfs2: fix error handling in ocfs2_setattr()
Chengguang Xu [Sat, 19 Oct 2019 03:19:47 +0000 (20:19 -0700)]
ocfs2: fix error handling in ocfs2_setattr()

Should set transfer_to[USRQUOTA/GRPQUOTA] to NULL on error case before
jumping to do dqput().

Link: http://lkml.kernel.org/r/20191010082349.1134-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
Roman Gushchin [Sat, 19 Oct 2019 03:19:44 +0000 (20:19 -0700)]
mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release

Karsten reported the following panic in __free_slab() happening on a s390x
machine:

  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
  Oops: 0004 ilc:3 Ý#1¨ PREEMPT SMP
  Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
  Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
  Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
  Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
             0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
  Krnl Code: 00000000003cada6e310a1400004        lg      %r1,320(%r10)
             00000000003cadacc0e50046c286        brasl   %r14,ca32b8
            #00000000003cadb2a7f4fe36            brc     15,3caa1e
            >00000000003cadb6e32060800024        stg     %r2,128(%r6)
             00000000003cadbca7f4fd9e            brc     15,3ca8f8
             00000000003cadc0c0e50046790c        brasl   %r14,c99fd8
             00000000003cadc6a7f4fe2c            brc     15,3caa
             00000000003cadc6a7f4fe2c            brc     15,3caa1e
             00000000003cadcaecb1ffff00d9        aghik   %r11,%r1,-1
  Call Trace:
  (<00000000003cabcc> __free_slab+0x49c/0x6b0)
   <00000000001f5886> rcu_core+0x5a6/0x7e0
   <0000000000ca2dea> __do_softirq+0xf2/0x5c0
   <0000000000152644> irq_exit+0x104/0x130
   <000000000010d222> do_IRQ+0x9a/0xf0
   <0000000000ca2344> ext_int_handler+0x130/0x134
   <0000000000103648> enabled_wait+0x58/0x128
  (<0000000000103634> enabled_wait+0x44/0x128)
   <0000000000103b00> arch_cpu_idle+0x40/0x58
   <0000000000ca0544> default_idle_call+0x3c/0x68
   <000000000018eaa4> do_idle+0xec/0x1c0
   <000000000018ee0e> cpu_startup_entry+0x36/0x40
   <000000000122df34> arch_call_rest_init+0x5c/0x88
   <0000000000000000> 0x0
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   <00000000003ca8f4> __free_slab+0x1c4/0x6b0
  Kernel panic - not syncing: Fatal exception in interrupt

The kernel panics on an attempt to dereference the NULL memcg pointer.
When shutdown_cache() is called from the kmem_cache_destroy() context, a
memcg kmem_cache might have empty slab pages in a partial list, which are
still charged to the memory cgroup.

These pages are released by free_partial() at the beginning of
shutdown_cache(): either directly or by scheduling a RCU-delayed work
(if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag).  The latter case
is when the reported panic can happen: memcg_unlink_cache() is called
immediately after shrinking partial lists, without waiting for scheduled
RCU works.  It sets the kmem_cache->memcg_params.memcg pointer to NULL,
and the following attempt to dereference it by __free_slab() from the
RCU work context causes the panic.

To fix the issue, let's postpone the release of the memcg pointer to
destroy_memcg_params().  It's called from a separate work context by
slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
This guarantees that all scheduled page release RCU works will complete
before the memcg pointer will be zeroed.

Big thanks for Karsten for the perfect report containing all necessary
information, his help with the analysis of the problem and testing of the
fix.

Link: http://lkml.kernel.org/r/20191010160549.1584316-1-guro@fb.com
Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Karsten Graul <kgraul@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/memunmap: don't access uninitialized memmap in memunmap_pages()
Aneesh Kumar K.V [Sat, 19 Oct 2019 03:19:39 +0000 (20:19 -0700)]
mm/memunmap: don't access uninitialized memmap in memunmap_pages()

Patch series "mm/memory_hotplug: Shrink zones before removing memory",
v6.

This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory.  Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.

We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect
the ZONE_DEVICE as contiguous).

We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount
of code to a minimum.  Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of
DIMMs at zone boundaries.

--------------------------------------------------------------------------

Zones are now properly shrunk when offlining memory blocks or when
onlining failed.  This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.

Example:

  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0
  :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
  :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  98304
          present  65536
          managed  65536
  :/# echo 0 > /sys/devices/system/memory/memory43/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  32768
          present  32768
          managed  32768
  :/# echo 0 > /sys/devices/system/memory/memory41/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0

This patch (of 10):

With an altmap, the memmap falling into the reserved altmap space are not
initialized and, therefore, contain a garbage NID and a garbage zone.
Make sure to read the NID/zone from a memmap that was initialized.

This fixes a kernel crash that is observed when destroying a namespace:

  kernel BUG at include/linux/mm.h:1107!
  cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
      pc: c0000000004b9728: memunmap_pages+0x238/0x340
      lr: c0000000004b9724: memunmap_pages+0x234/0x340
  ...
      pid   = 3669, comm = ndctl
  kernel BUG at include/linux/mm.h:1107!
    devm_action_release+0x30/0x50
    release_nodes+0x268/0x2d0
    device_release_driver_internal+0x174/0x240
    unbind_store+0x13c/0x190
    drv_attr_store+0x44/0x60
    sysfs_kf_write+0x70/0xa0
    kernfs_fop_write+0x1ac/0x290
    __vfs_write+0x3c/0x70
    vfs_write+0xe4/0x200
    ksys_write+0x7c/0x140
    system_call+0x5c/0x68

The "page_zone(pfn_to_page(pfn)" was introduced by 69324b8f4833 ("mm,
devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I
think we will never have driver reserved memory with
MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).

[david@redhat.com: minimze code changes, rephrase description]
Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com
Fixes: 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to arch_remove_memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()
David Hildenbrand [Sat, 19 Oct 2019 03:19:33 +0000 (20:19 -0700)]
mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()

We might use the nid of memmaps that were never initialized.  For
example, if the memmap was poisoned, we will crash the kernel in
pfn_to_nid() right now.  Let's use the calculated boundaries of the
separate zones instead.  This now also avoids having to iterate over a
whole bunch of subsections again, after shrinking one zone.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized to 0 and the node was set to the
right value.  After that commit, the node might be garbage.

We'll have to fix shrink_zone_span() next.

Link: http://lkml.kernel.org/r/20191006085646.5768-4-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo
Qian Cai [Sat, 19 Oct 2019 03:19:29 +0000 (20:19 -0700)]
mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

For example, when not onlining a memory block that is spanned by a zone
and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING, we can trigger a kernel BUG:

  :/# echo 1 > /sys/devices/system/memory/memory40/online
  :/# echo 1 > /sys/devices/system/memory/memory42/online
  :/# cat /proc/pagetypeinfo > test.file
   page:fffff2c585200000 is uninitialized and poisoned
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
   There is not page extension available.
   ------------[ cut here ]------------
   kernel BUG at include/linux/mm.h:1107!
   invalid opcode: 0000 [#1] SMP NOPTI

Please note that this change does not affect ZONE_DEVICE, because
pagetypeinfo_showmixedcount_print() is called from
mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
ZONE_DEVICE is never populated (zone->present_pages always 0).

[david@redhat.com: move check to outer loop, add comment, rephrase description]
Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoscripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set
Joel Colledge [Sat, 19 Oct 2019 03:19:26 +0000 (20:19 -0700)]
scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set

When CONFIG_PRINTK_CALLER is set, struct printk_log contains an
additional member caller_id.  This affects the offset of the log text.
Account for this by using the type information from gdb to determine all
the offsets instead of using hardcoded values.

This fixes following error:

  (gdb) lx-dmesg
  Python Exception <class 'ValueError'> embedded null character:
  Error occurred in Python command: embedded null character

The read_u* utility functions now take an offset argument to make them
easier to use.

Link: http://lkml.kernel.org/r/20191011142500.2339-1-joel.colledge@linbit.com
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/memory-failure.c: don't access uninitialized memmaps in memory_failure()
David Hildenbrand [Sat, 19 Oct 2019 03:19:23 +0000 (20:19 -0700)]
mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()

We should check for pfn_to_online_page() to not access uninitialized
memmaps.  Reshuffle the code so we don't have to duplicate the error
message.

Link: http://lkml.kernel.org/r/20191009142435.3975-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c
David Hildenbrand [Sat, 19 Oct 2019 03:19:20 +0000 (20:19 -0700)]
fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c

There are three places where we access uninitialized memmaps, namely:
- /proc/kpagecount
- /proc/kpageflags
- /proc/kpagecgroup

We have initialized memmaps either when the section is online or when the
page was initialized to the ZONE_DEVICE.  Uninitialized memmaps contain
garbage and in the worst case trigger kernel BUGs, especially with
CONFIG_PAGE_POISONING.

For example, not onlining a DIMM during boot and calling /proc/kpagecount
with CONFIG_PAGE_POISONING:

  :/# cat /proc/kpagecount > tmp.test
  BUG: unable to handle page fault for address: fffffffffffffffe
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
  RIP: 0010:kpagecount_read+0xce/0x1e0
  Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
  RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
  RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
  R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
  FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
  Call Trace:
   proc_reg_read+0x3c/0x60
   vfs_read+0xc5/0x180
   ksys_read+0x68/0xe0
   do_syscall_64+0x5c/0xa0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

For now, let's drop support for ZONE_DEVICE from the three pseudo files
in order to fix this.  To distinguish offline memory (with garbage
memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
would have to check get_dev_pagemap() and pfn_zone_device_reserved()
right now.  The usage of both (especially, special casing devmem) is
frowned upon and needs to be reworked.

The fundamental issue we have is:

if (pfn_to_online_page(pfn)) {
/* memmap initialized */
} else if (pfn_valid(pfn)) {
/*
 * ???
 * a) offline memory. memmap garbage.
 * b) devmem: memmap initialized to ZONE_DEVICE.
 * c) devmem: reserved for driver. memmap garbage.
 * (d) devmem: memmap currently initializing - garbage)
 */
}

We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
in place as that function is also used from memory failure.  We now no
longer dump information about pages that are not in use anymore -
offline.

Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agodrivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()
David Hildenbrand [Sat, 19 Oct 2019 03:19:16 +0000 (20:19 -0700)]
drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()

Uninitialized memmaps contain garbage and in the worst case trigger kernel
BUGs, especially with CONFIG_PAGE_POISONING.  They should not get touched.

Right now, when trying to soft-offline a PFN that resides on a memory
block that was never onlined, one gets a misleading error with
CONFIG_PAGE_POISONING:

  :/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page
  [   23.097167] soft offline: 0x150000 page already poisoned

But the actual result depends on the garbage in the memmap.

soft_offline_page() can only work with online pages, it returns -EIO in
case of ZONE_DEVICE.  Make sure to only forward pages that are online
(iow, managed by the buddy) and, therefore, have an initialized memmap.

Add a check against pfn_to_online_page() and similarly return -EIO.

Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: <stable@vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMerge tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 19 Oct 2019 02:29:36 +0000 (22:29 -0400)]
Merge tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Keith that address deadlocks, double resets,
   memory leaks, and other regression.

 - Fixup elv_support_iosched() for bio based devices (Damien)

 - Fixup for the ahci PCS quirk (Dan)

 - Socket O_NONBLOCK handling fix for io_uring (me)

 - Timeout sequence io_uring fixes (yangerkun)

 - MD warning fix for parameter default_layout (Song)

 - blkcg activation fixes (Tejun)

 - blk-rq-qos node deletion fix (Tejun)

* tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block:
  nvme-pci: Set the prp2 correctly when using more than 4k page
  io_uring: fix logic error in io_timeout
  io_uring: fix up O_NONBLOCK handling for sockets
  md/raid0: fix warning message for parameter default_layout
  libata/ahci: Fix PCS quirk application
  blk-rq-qos: fix first node deletion of rq_qos_del()
  blkcg: Fix multiple bugs in blkcg_activate_policy()
  io_uring: consider the overflow of sequence for timeout req
  nvme-tcp: fix possible leakage during error flow
  nvmet-loop: fix possible leakage during error flow
  block: Fix elv_support_iosched()
  nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
  nvme: Wait for reset state when required
  nvme: Prevent resets during paused controller state
  nvme: Restart request timers in resetting state
  nvme: Remove ADMIN_ONLY state
  nvme-pci: Free tagset if no IO queues
  nvme: retain split access workaround for capability reads
  nvme: fix possible deadlock when nvme_update_formats fails

5 years agoMerge tag 'riscv/for-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv...
Linus Torvalds [Sat, 19 Oct 2019 02:26:18 +0000 (22:26 -0400)]
Merge tag 'riscv/for-v5.4-rc4' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:
 "Some RISC-V fixes:

   - Fix the virtual memory layout so the fixaddr region doesn't overlap
     with other regions. (This was originally intended to go in as part
     of an earlier patch, but I inadvertently dropped it during a
     rebase)

   - Add the DT chosen/stdout-path property to the HiFive Unleashed DT
     file. This is so "earlycon" can be specified with no arguments on
     the kernel command line, and the correct UART will be automatically
     selected.

  And two cleanup patches:

   - Simplify the code in our breakpoint trap handler.

   - Drop a comment in our TLB flush code that has caused some
     confusion"

* tag 'riscv/for-v5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: fix virtual address overlapped in FIXADDR_START and VMEMMAP_START
  riscv: tlbflush: remove confusing comment on local_flush_tlb_all()
  riscv: dts: HiFive Unleashed: add default chosen/stdout-path
  riscv: remove the switch statement in do_trap_break()

5 years agofilldir[64]: remove WARN_ON_ONCE() for bad directory entries
Linus Torvalds [Fri, 18 Oct 2019 22:41:16 +0000 (18:41 -0400)]
filldir[64]: remove WARN_ON_ONCE() for bad directory entries

This was always meant to be a temporary thing, just for testing and to
see if it actually ever triggered.

The only thing that reported it was syzbot doing disk image fuzzing, and
then that warning is expected.  So let's just remove it before -rc4,
because the extra sanity testing should probably go to -stable, but we
don't want the warning to do so.

Reported-by: syzbot+3031f712c7ad5dd4d926@syzkaller.appspotmail.com
Fixes: 8a23eb804ca4 ("Make filldir[64]() verify the directory entry filename is valid")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMerge tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client
Linus Torvalds [Fri, 18 Oct 2019 22:30:09 +0000 (18:30 -0400)]
Merge tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A future-proofing decoding fix from Jeff intended for stable and a
  patch for a mostly benign race from Dongsheng"

* tag 'ceph-for-5.4-rc4' of git://github.com/ceph/ceph-client:
  rbd: cancel lock_dwork if the wait is interrupted
  ceph: just skip unrecognized info in ceph_reply_info_extra

5 years agoMerge tag 'for-5.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Fri, 18 Oct 2019 22:26:07 +0000 (18:26 -0400)]
Merge tag 'for-5.4/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM snapshot deadlock that can occur due to COW throttling
   preventing locks from being released.

 - Fix DM cache's GFP_NOWAIT allocation failure error paths by switching
   to GFP_NOIO.

 - Make __hash_find() static in the DM clone target.

* tag 'for-5.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix bugs when a GFP_NOWAIT allocation fails
  dm snapshot: rework COW throttling to fix deadlock
  dm snapshot: introduce account_start_copy() and account_end_copy()
  dm clone: Make __hash_find static

5 years agoMerge tag 'iommu-fixes-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Oct 2019 22:23:16 +0000 (18:23 -0400)]
Merge tag 'iommu-fixes-v5.4-rc3' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Fixes for page-table issues on Mali GPUs

 - Missing free in an error path for ARM-SMMU

 - PASID decoding in the AMD IOMMU Event log code

 - Another update for the locking fixes in the AMD IOMMU driver

 - Reduce the calls to platform_get_irq() in the IPMMU-VMSA and Rockchip
   IOMMUs to get rid of the warning message added to this function
   recently

* tag 'iommu-fixes-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Check PM_LEVEL_SIZE() condition in locked section
  iommu/amd: Fix incorrect PASID decoding from event log
  iommu/ipmmu-vmsa: Only call platform_get_irq() when interrupt is mandatory
  iommu/rockchip: Don't use platform_get_irq to implicitly count irqs
  iommu/io-pgtable-arm: Support all Mali configurations
  iommu/io-pgtable-arm: Correct Mali attributes
  iommu/arm-smmu: Free context bitmap in the err path of arm_smmu_init_domain_context

5 years agoMerge tag 'copy-struct-from-user-v5.4-rc4' of gitolite.kernel.org:pub/scm/linux/kerne...
Linus Torvalds [Fri, 18 Oct 2019 22:19:04 +0000 (18:19 -0400)]
Merge tag 'copy-struct-from-user-v5.4-rc4' of gitolite.pub/scm/linux/kernel/git/brauner/linux

Pull usercopy test fixlets from Christian Brauner:
 "This contains two improvements for the copy_struct_from_user() tests:

   - a coding style change to get rid of the ugly "if ((ret |= test()))"
     pointed out when pulling the original patchset.

   - avoid a soft lockups when running the usercopy tests on machines
     with large page sizes by scanning only a 1024 byte region"

* tag 'copy-struct-from-user-v5.4-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
  usercopy: Avoid soft lockups in test_check_nonzero_user()
  lib: test_user_copy: style cleanup

5 years agoARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue
Stefan Wahren [Sun, 13 Oct 2019 10:53:23 +0000 (12:53 +0200)]
ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue

bcm2835-rpi.dtsi defines the behavior of the ACT LED, which is available
on all Raspberry Pi boards. But there is no driver for this particual
GPIO on CM3 in mainline yet, so this node was left incomplete without
the actual GPIO definition. Since commit 025bf37725f1 ("gpio: Fix return
value mismatch of function gpiod_get_from_of_node()") this causing probe
issues of the leds-gpio driver for users of the CM3 dtsi file.

  leds-gpio: probe of leds failed with error -2

Until we have the necessary GPIO driver hide the ACT node for CM3
to avoid this.

Reported-by: Fredrik Yhlen <fredrik.yhlen@endian.se>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Fixes: a54fe8a6cf66 ("ARM: dts: add Raspberry Pi Compute Module 3 and IO board")
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
5 years agotracing: Fix "gfp_t" format for synthetic events
Zhengjun Xing [Fri, 18 Oct 2019 01:20:34 +0000 (09:20 +0800)]
tracing: Fix "gfp_t" format for synthetic events

In the format of synthetic events, the "gfp_t" is shown as "signed:1",
but in fact the "gfp_t" is "unsigned", should be shown as "signed:0".

The issue can be reproduced by the following commands:

echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' > /sys/kernel/debug/tracing/synthetic_events
cat  /sys/kernel/debug/tracing/events/synthetic/memlatency/format

name: memlatency
ID: 2233
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:u64 lat;  offset:8;       size:8; signed:0;
        field:unsigned int order;       offset:16;      size:4; signed:0;
        field:gfp_t gfp_flags;  offset:24;      size:4; signed:1;
        field:int migratetype;  offset:32;      size:4; signed:1;

print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC->lat, REC->order, REC->gfp_flags, REC->migratetype

Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
5 years agonet: usb: lan78xx: Connect PHY before registering MAC
Andrew Lunn [Thu, 17 Oct 2019 19:29:26 +0000 (21:29 +0200)]
net: usb: lan78xx: Connect PHY before registering MAC

As soon as the netdev is registers, the kernel can start using the
interface. If the driver connects the MAC to the PHY after the netdev
is registered, there is a race condition where the interface can be
opened without having the PHY connected.

Change the order to close this race condition.

Fixes: 92571a1aae40 ("lan78xx: Connect phy early")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'vsock-virtio-make-the-credit-mechanism-more-robust'
David S. Miller [Fri, 18 Oct 2019 17:19:43 +0000 (10:19 -0700)]
Merge branch 'vsock-virtio-make-the-credit-mechanism-more-robust'

Stefano Garzarella says:

====================
vsock/virtio: make the credit mechanism more robust

This series makes the credit mechanism implemented in the
virtio-vsock devices more robust.
Patch 1 sends an update to the remote peer when the buf_alloc
change.
Patch 2 prevents a malicious peer (especially the guest) can
consume all the memory of the other peer, discarding packets
when the credit available is not respected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovsock/virtio: discard packets if credit is not respected
Stefano Garzarella [Thu, 17 Oct 2019 12:44:03 +0000 (14:44 +0200)]
vsock/virtio: discard packets if credit is not respected

If the remote peer doesn't respect the credit information
(buf_alloc, fwd_cnt), sending more data than it can send,
we should drop the packets to prevent a malicious peer
from using all of our memory.

This is patch follows the VIRTIO spec: "VIRTIO_VSOCK_OP_RW data
packets MUST only be transmitted when the peer has sufficient
free buffer space for the payload"

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovsock/virtio: send a credit update when buffer size is changed
Stefano Garzarella [Thu, 17 Oct 2019 12:44:02 +0000 (14:44 +0200)]
vsock/virtio: send a credit update when buffer size is changed

When the user application set a new buffer size value, we should
update the remote peer about this change, since it uses this
information to calculate the credit available.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_trap: Push Ethernet header before reporting trap
Ido Schimmel [Thu, 17 Oct 2019 07:11:03 +0000 (10:11 +0300)]
mlxsw: spectrum_trap: Push Ethernet header before reporting trap

devlink maintains packets and bytes statistics for each trap. Since
eth_type_trans() was called to set the skb's protocol, the data pointer
no longer points to the start of the packet and the bytes accounting is
off by 14 bytes.

Fix this by pushing the skb's data pointer to the start of the packet.

Fixes: b5ce611fd96e ("mlxsw: spectrum: Add devlink-trap support")
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoASoC: SOF: control: return true when kcontrol values change
Dragos Tarcatu [Fri, 18 Oct 2019 12:38:06 +0000 (07:38 -0500)]
ASoC: SOF: control: return true when kcontrol values change

All the kcontrol put() functions are currently returning 0 when
successful. This does not go well with alsamixer as it does
not seem to get notified on SND_CTL_EVENT_MASK_VALUE callbacks
when values change for (some of) the sof kcontrols.
This patch fixes that by returning true for volume, switch
and enum type kcontrols when values do change in put().

Signed-off-by: Dragos Tarcatu <dragos_tarcatu@mentor.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20191018123806.18063-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoASoC: stm32: sai: fix sysclk management on shutdown
Olivier Moysan [Fri, 18 Oct 2019 08:20:40 +0000 (10:20 +0200)]
ASoC: stm32: sai: fix sysclk management on shutdown

The commit below, adds a call to sysclk callback on shutdown.
This introduces a regression in stm32 SAI driver, as some clock
services are called twice, leading to unbalanced calls.
Move processing related to mclk from shutdown to sysclk callback.
When requested frequency is 0, assume shutdown and release mclk.

Fixes: 2458adb8f92a ("SoC: simple-card-utils: set 0Hz to sysclk when shutdown")

Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Link: https://lore.kernel.org/r/20191018082040.31022-1-olivier.moysan@st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoASoC: Intel: sof-rt5682: add a check for devm_clk_get
Chuhong Yuan [Thu, 17 Oct 2019 02:50:44 +0000 (10:50 +0800)]
ASoC: Intel: sof-rt5682: add a check for devm_clk_get

sof_audio_probe misses a check for devm_clk_get and may cause problems.
Add a check for it to fix the bug.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20191017025044.31474-1-hslester96@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agoASoC: rsnd: Reinitialize bit clock inversion flag for every format setting
Junya Monden [Wed, 16 Oct 2019 12:42:55 +0000 (14:42 +0200)]
ASoC: rsnd: Reinitialize bit clock inversion flag for every format setting

Unlike other format-related DAI parameters, rdai->bit_clk_inv flag
is not properly re-initialized when setting format for new stream
processing. The inversion, if requested, is then applied not to default,
but to a previous value, which leads to SCKP bit in SSICR register being
set incorrectly.
Fix this by re-setting the flag to its initial value, determined by format.

Fixes: 1a7889ca8aba3 ("ASoC: rsnd: fixup SND_SOC_DAIFMT_xB_xF behavior")
Cc: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Cc: Jiada Wang <jiada_wang@mentor.com>
Cc: Timo Wischer <twischer@de.adit-jv.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Junya Monden <jmonden@jp.adit-jv.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/20191016124255.7442-1-erosca@de.adit-jv.com
Signed-off-by: Mark Brown <broonie@kernel.org>
5 years agonet: ensure correct skb->tstamp in various fragmenters
Eric Dumazet [Thu, 17 Oct 2019 01:00:56 +0000 (18:00 -0700)]
net: ensure correct skb->tstamp in various fragmenters

Thomas found that some forwarded packets would be stuck
in FQ packet scheduler because their skb->tstamp contained
timestamps far in the future.

We thought we addressed this point in commit 8203e2d844d3
("net: clear skb->tstamp in forwarding paths") but there
is still an issue when/if a packet needs to be fragmented.

In order to meet EDT requirements, we have to make sure all
fragments get the original skb->tstamp.

Note that this original skb->tstamp should be zero in
forwarding path, but might have a non zero value in
output path if user decided so.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thomas Bartschies <Thomas.Bartschies@cvk.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mmc-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 18 Oct 2019 17:00:46 +0000 (10:00 -0700)]
Merge tag 'mmc-v5.4-rc1' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC host:
   - sdhci-iproc: Prevent some spurious interrupts
   - renesas_sdhi/sh_mmcif: Avoid false warnings about IRQs not found

  MEMSTICK host:
   - jmb38x_ms: Fix an error handling path at ->probe()"

* tag 'mmc-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
  mmc: sdhci-iproc: fix spurious interrupts on Multiblock reads with bcm2711
  mmc: sh_mmcif: Use platform_get_irq_optional() for optional interrupt
  mmc: renesas_sdhi: Do not use platform_get_irq() to count interrupts

5 years agoMerge branch 'net-bcmgenet-restore-internal-EPHY-support'
David S. Miller [Fri, 18 Oct 2019 17:00:07 +0000 (10:00 -0700)]
Merge branch 'net-bcmgenet-restore-internal-EPHY-support'

Doug Berger says:

====================
net: bcmgenet: restore internal EPHY support

I managed to get my hands on an old BCM97435SVMB board to do some
testing with the latest kernel and uncovered a number of things
that managed to get broken over the years (some by me ;).

This commit set attempts to correct the errors I observed in my
testing.

The first commit applies to all internal PHYs to restore proper
reporting of link status when a link comes up.

The second commit restores the soft reset to the initialization of
the older internal EPHYs used by 40nm Set-Top Box devices.

The third corrects a bug I introduced when removing excessive soft
resets by altering the initialization sequence in a way that keeps
the GENETv3 MAC interface happy.

Finally, I observed a number of issues when manually configuring
the network interface of the older EPHYs that appear to be resolved
by the fourth commit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bcmgenet: reset 40nm EPHY on energy detect
Doug Berger [Wed, 16 Oct 2019 23:06:32 +0000 (16:06 -0700)]
net: bcmgenet: reset 40nm EPHY on energy detect

The EPHY integrated into the 40nm Set-Top Box devices can falsely
detect energy when connected to a disabled peer interface. When the
peer interface is enabled the EPHY will detect and report the link
as active, but on occasion may get into a state where it is not
able to exchange data with the connected GENET MAC. This issue has
not been observed when the link parameters are auto-negotiated;
however, it has been observed with a manually configured link.

It has been empirically determined that issuing a soft reset to the
EPHY when energy is detected prevents it from getting into this bad
state.

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bcmgenet: soft reset 40nm EPHYs before MAC init
Doug Berger [Wed, 16 Oct 2019 23:06:31 +0000 (16:06 -0700)]
net: bcmgenet: soft reset 40nm EPHYs before MAC init

It turns out that the "Workaround for putting the PHY in IDDQ mode"
used by the internal EPHYs on 40nm Set-Top Box chips when powering
down puts the interface to the GENET MAC in a state that can cause
subsequent MAC resets to be incomplete.

Rather than restore the forced soft reset when powering up internal
PHYs, this commit moves the invocation of phy_init_hw earlier in
the MAC initialization sequence to just before the MAC reset in the
open and resume functions. This allows the interface to be stable
and allows the MAC resets to be successful.

The bcmgenet_mii_probe() function is split in two to accommodate
this. The new function bcmgenet_mii_connect() handles the first
half of the functionality before the MAC initialization, and the
bcmgenet_mii_config() function is extended to provide the remaining
PHY configuration following the MAC initialization.

Fixes: 484bfa1507bf ("Revert "net: bcmgenet: Software reset EPHY after power on"")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: bcm7xxx: define soft_reset for 40nm EPHY
Doug Berger [Wed, 16 Oct 2019 23:06:30 +0000 (16:06 -0700)]
net: phy: bcm7xxx: define soft_reset for 40nm EPHY

The internal 40nm EPHYs use a "Workaround for putting the PHY in
IDDQ mode." These PHYs require a soft reset to restore functionality
after they are powered back up.

This commit defines the soft_reset function to use genphy_soft_reset
during phy_init_hw to accommodate this.

Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bcmgenet: don't set phydev->link from MAC
Doug Berger [Wed, 16 Oct 2019 23:06:29 +0000 (16:06 -0700)]
net: bcmgenet: don't set phydev->link from MAC

When commit 28b2e0d2cd13 ("net: phy: remove parameter new_link from
phy_mac_interrupt()") removed the new_link parameter it set the
phydev->link state from the MAC before invoking phy_mac_interrupt().

However, once commit 88d6272acaaa ("net: phy: avoid unneeded MDIO
reads in genphy_read_status") was added this initialization prevents
the proper determination of the connection parameters by the function
genphy_read_status().

This commit removes that initialization to restore the proper
functionality.

Fixes: 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in genphy_read_status")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 18 Oct 2019 16:21:13 +0000 (09:21 -0700)]
Merge tag 'sound-5.4-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Just a few small fixes for the usual suspect, HD- and USB-audio:
  enablement of runtime PM for Nvidia due to the recent PCI changes, a
  fix for potential hangs with recent HD-audio platforms, and the rest
  device-specific quirks"

* tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Force runtime PM on Nvidia HDMI codecs
  ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
  ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
  ALSA: hdac: clear link output stream mapping
  ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360

5 years agoMerge branch 'watchdog-fix' into fixes
Tony Lindgren [Fri, 18 Oct 2019 15:47:39 +0000 (08:47 -0700)]
Merge branch 'watchdog-fix' into fixes

5 years agobus: ti-sysc: Fix watchdog quirk handling
Tony Lindgren [Thu, 17 Oct 2019 18:21:44 +0000 (11:21 -0700)]
bus: ti-sysc: Fix watchdog quirk handling

I noticed that when probed with ti-sysc, watchdog can trigger on am3, am4
and dra7 causing a device reset.

Turns out I made several mistakes implementing the watchdog quirk handling:

1. We must do both writes to spr register

2. We must also call the reset quirk on disable

3. On am3 and am4 we need to also set swsup quirk flag

I probably only tested this earlier with watchdog service running when the
watchdog never gets disabled.

Fixes: 4e23be473e30 ("bus: ti-sysc: Add support for module specific reset quirks")
Signed-off-by: Tony Lindgren <tony@atomide.com>
5 years agoARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU
Suman Anna [Tue, 27 Aug 2019 00:14:52 +0000 (19:14 -0500)]
ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU

The OMAP3 ISP IOMMU does not have any reset lines, so it didn't
need any pdata previously. The OMAP IOMMU driver now requires the
platform data ops for device_enable/idle on all the IOMMU devices
after commit db8918f61d51 ("iommu/omap: streamline enable/disable
through runtime pm callbacks") to enable/disable the clocks properly
and maintain the reference count and the omap_hwmod state machine.
So, add these callbacks through iommu pdata quirks for the OMAP3
ISP IOMMU.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
5 years agoARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs
Suman Anna [Tue, 27 Aug 2019 00:14:51 +0000 (19:14 -0500)]
ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs

The OMAP IOMMU driver requires the device_enable/idle platform
data ops on all the IOMMU devices to be able to enable and disable
the clocks after commit db8918f61d51 ("iommu/omap: streamline
enable/disable through runtime pm callbacks"). Plug in these
pdata ops for all the existing IOMMUs through pdata quirks to
maintain functionality.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
5 years agoMerge tag 'acpi-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 18 Oct 2019 15:38:26 +0000 (08:38 -0700)]
Merge tag 'acpi-5.4-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "Fix possible use-after-free in the ACPI CPPC support code (John Garry)
  and prevent the ACPI HMAT parsing code from using possibly incorrect
  data coming from the platform firmware (Daniel Black)"

* tag 'acpi-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
  ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3

5 years agoMerge tag 'pm-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 18 Oct 2019 15:34:04 +0000 (08:34 -0700)]
Merge tag 'pm-5.4-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These include a fix for a recent regression in the ACPI CPU
performance scaling code, a PCI device power management fix,
a system shutdown fix related to cpufreq, a removal of an ACPI
suspend-to-idle blacklist entry and a build warning fix.

Specifics:

   - Fix possible NULL pointer dereference in the ACPI processor scaling
     initialization code introduced by a recent cpufreq update (Rafael
     Wysocki).

   - Fix possible deadlock due to suspending cpufreq too late during
     system shutdown (Rafael Wysocki).

   - Make the PCI device system resume code path be more consistent with
     its PM-runtime counterpart to fix an issue with missing delay on
     transitions from D3cold to D0 during system resume from
     suspend-to-idle on some systems (Rafael Wysocki).

   - Drop Dell XPS13 9360 from the LPS0 Idle _DSM blacklist to make it
     use suspend-to-idle by default (Mario Limonciello).

   - Fix build warning in the core system suspend support code (Ben
     Dooks)"

* tag 'pm-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Avoid NULL pointer dereferences at init time
  PCI: PM: Fix pci_power_up()
  PM: sleep: include <linux/pm_runtime.h> for pm_wq
  cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
  ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist

5 years agoMerge tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp...
Linus Torvalds [Fri, 18 Oct 2019 15:08:53 +0000 (08:08 -0700)]
Merge tag 'mkp-scsi-postmerge' of git://git./linux/kernel/git/mkp/scsi

Pull scsi fixes from Martin Petersen:
 "These two commits were in a separate postmerge branch due to a
  dependency on changes merged for 5.4 in the block tree.

  They fix two issues in the intersection of the request cleanup changes
  from block (b7e9e1fb7a92) and the request batching changes
  (8930a6c20791) that were made to SCSI during the 5.4 cycle"

* tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
  scsi: core: fix dh and multipathing for SCSI hosts without request batching
  scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching

5 years agoiommu/amd: Check PM_LEVEL_SIZE() condition in locked section
Joerg Roedel [Fri, 18 Oct 2019 09:34:22 +0000 (11:34 +0200)]
iommu/amd: Check PM_LEVEL_SIZE() condition in locked section

The increase_address_space() function has to check the PM_LEVEL_SIZE()
condition again under the domain->lock to avoid a false trigger of the
WARN_ON_ONCE() and to avoid that the address space is increase more
often than necessary.

Reported-by: Qian Cai <cai@lca.pw>
Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 years agoMerge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linus
Jens Axboe [Fri, 18 Oct 2019 14:49:25 +0000 (08:49 -0600)]
Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linus

Pull NVMe updates from Keith:

"This is a collection of bug fixes committed since the previous pull
 request that address deadlocks, double resets, memory leaks, and other
 regression."

* 'nvme-5.4' of git://git.infradead.org/nvme:
  nvme-pci: Set the prp2 correctly when using more than 4k page
  nvme-tcp: fix possible leakage during error flow
  nvmet-loop: fix possible leakage during error flow
  nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
  nvme: Wait for reset state when required
  nvme: Prevent resets during paused controller state
  nvme: Restart request timers in resetting state
  nvme: Remove ADMIN_ONLY state
  nvme-pci: Free tagset if no IO queues
  nvme: retain split access workaround for capability reads
  nvme: fix possible deadlock when nvme_update_formats fails

5 years agonvme-pci: Set the prp2 correctly when using more than 4k page
Kevin Hao [Fri, 18 Oct 2019 02:53:14 +0000 (10:53 +0800)]
nvme-pci: Set the prp2 correctly when using more than 4k page

In the current code, the nvme is using a fixed 4k PRP entry size,
but if the kernel use a page size which is more than 4k, we should
consider the situation that the bv_offset may be larger than the
dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
cause the command can't be executed correctly.

Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
5 years agosymbol namespaces: revert to previous __ksymtab name scheme
Matthias Maennich [Fri, 18 Oct 2019 09:31:42 +0000 (10:31 +0100)]
symbol namespaces: revert to previous __ksymtab name scheme

The introduction of Symbol Namespaces changed the naming schema of the
__ksymtab entries from __kysmtab__symbol to __ksymtab_NAMESPACE.symbol.

That caused some breakages in tools that depend on the name layout in
either the binaries(vmlinux,*.ko) or in System.map. E.g. kmod's depmod
would not be able to read System.map without a patch to support symbol
namespaces. A warning reported by depmod for namespaced symbols would
look like

  depmod: WARNING: [...]/uas.ko needs unknown symbol usb_stor_adjust_quirks

In order to address this issue, revert to the original naming scheme and
rather read the __kstrtabns_<symbol> entries and their corresponding
values from __ksymtab_strings to update the namespace values for
symbols. After having read all symbols and handled them in
handle_modversions(), the symbols are created. In a second pass, read
the __kstrtabns_ entries and update the namespaces accordingly.

Fixes: 8651ec01daed ("module: add support for symbol namespaces.")
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
5 years agomodpost: make updating the symbol namespace explicit
Matthias Maennich [Fri, 18 Oct 2019 09:31:41 +0000 (10:31 +0100)]
modpost: make updating the symbol namespace explicit

Setting the symbol namespace of a symbol within sym_add_exported feels
displaced and lead to issues in the current implementation of symbol
namespaces. This patch makes updating the namespace an explicit call to
decouple it from adding a symbol to the export list.

Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
5 years agomodpost: delegate updating namespaces to separate function
Matthias Maennich [Fri, 18 Oct 2019 09:31:40 +0000 (10:31 +0100)]
modpost: delegate updating namespaces to separate function

Let the function 'sym_update_namespace' take care of updating the
namespace for a symbol. While this currently only replaces one single
location where namespaces are updated, in a following patch, this
function will get more call sites.

The function signature is intentionally close to sym_update_crc and
taking the name by char* seems like unnecessary work as the symbol has
to be looked up again. In a later patch of this series, this concern
will be addressed.

This function ensures that symbol::namespace is either NULL or has a
valid non-empty value. Previously, the empty string was considered 'no
namespace' as well and this lead to confusion.

Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
5 years agox86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
Zhenzhong Duan [Sun, 29 Sep 2019 01:13:52 +0000 (09:13 +0800)]
x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard

When building with "EXTRA_CFLAGS=-Wall" gcc warns:

arch/x86/boot/compressed/acpi.c:29:30: warning: get_cmdline_acpi_rsdp defined but not used [-Wunused-function]

get_cmdline_acpi_rsdp() is only used when CONFIG_RANDOMIZE_BASE and
CONFIG_MEMORY_HOTREMOVE are both enabled, so any build where one of these
config options is disabled has this issue.

Move the function under the same ifdef guard as the call site.

[ tglx: Add context to the changelog so it becomes useful ]

Fixes: 41fa1ee9c6d6 ("acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1569719633-32164-1-git-send-email-zhenzhong.duan@oracle.com
5 years agox86/hyperv: Set pv_info.name to "Hyper-V"
Andrea Parri [Tue, 15 Oct 2019 10:35:02 +0000 (12:35 +0200)]
x86/hyperv: Set pv_info.name to "Hyper-V"

Michael reported that the x86/hyperv initialization code prints the
following dmesg when running in a VM on Hyper-V:

  [    0.000738] Booting paravirtualized kernel on bare hardware

Let the x86/hyperv initialization code set pv_info.name to "Hyper-V" so
dmesg reports correctly:

  [    0.000172] Booting paravirtualized kernel on Hyper-V

[ tglx: Folded build fix provided by Yue ]

Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Link: https://lkml.kernel.org/r/20191015103502.13156-1-parri.andrea@gmail.com
5 years agoMerge branch 'acpi-tables'
Rafael J. Wysocki [Fri, 18 Oct 2019 08:39:21 +0000 (10:39 +0200)]
Merge branch 'acpi-tables'

* acpi-tables:
  ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3

5 years agoACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
John Garry [Tue, 15 Oct 2019 14:07:31 +0000 (22:07 +0800)]
ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()

When enabling KASAN and DEBUG_TEST_DRIVER_REMOVE, I find this KASAN
warning:

[   20.872057] BUG: KASAN: use-after-free in pcc_data_alloc+0x40/0xb8
[   20.878226] Read of size 4 at addr ffff00236cdeb684 by task swapper/0/1
[   20.884826]
[   20.886309] CPU: 19 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00009-ge7f7df3db5bf-dirty #289
[   20.894994] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[   20.903505] Call trace:
[   20.905942]  dump_backtrace+0x0/0x200
[   20.909593]  show_stack+0x14/0x20
[   20.912899]  dump_stack+0xd4/0x130
[   20.916291]  print_address_description.isra.9+0x6c/0x3b8
[   20.921592]  __kasan_report+0x12c/0x23c
[   20.925417]  kasan_report+0xc/0x18
[   20.928808]  __asan_load4+0x94/0xb8
[   20.932286]  pcc_data_alloc+0x40/0xb8
[   20.935938]  acpi_cppc_processor_probe+0x4e8/0xb08
[   20.940717]  __acpi_processor_start+0x48/0xb0
[   20.945062]  acpi_processor_start+0x40/0x60
[   20.949235]  really_probe+0x118/0x548
[   20.952887]  driver_probe_device+0x7c/0x148
[   20.957059]  device_driver_attach+0x94/0xa0
[   20.961231]  __driver_attach+0xa4/0x110
[   20.965055]  bus_for_each_dev+0xe8/0x158
[   20.968966]  driver_attach+0x30/0x40
[   20.972531]  bus_add_driver+0x234/0x2f0
[   20.976356]  driver_register+0xbc/0x1d0
[   20.980182]  acpi_processor_driver_init+0x40/0xe4
[   20.984875]  do_one_initcall+0xb4/0x254
[   20.988700]  kernel_init_freeable+0x24c/0x2f8
[   20.993047]  kernel_init+0x10/0x118
[   20.996524]  ret_from_fork+0x10/0x18
[   21.000087]
[   21.001567] Allocated by task 1:
[   21.004785]  save_stack+0x28/0xc8
[   21.008089]  __kasan_kmalloc.isra.9+0xbc/0xd8
[   21.012435]  kasan_kmalloc+0xc/0x18
[   21.015913]  pcc_data_alloc+0x94/0xb8
[   21.019564]  acpi_cppc_processor_probe+0x4e8/0xb08
[   21.024343]  __acpi_processor_start+0x48/0xb0
[   21.028689]  acpi_processor_start+0x40/0x60
[   21.032860]  really_probe+0x118/0x548
[   21.036512]  driver_probe_device+0x7c/0x148
[   21.040684]  device_driver_attach+0x94/0xa0
[   21.044855]  __driver_attach+0xa4/0x110
[   21.048680]  bus_for_each_dev+0xe8/0x158
[   21.052591]  driver_attach+0x30/0x40
[   21.056155]  bus_add_driver+0x234/0x2f0
[   21.059980]  driver_register+0xbc/0x1d0
[   21.063805]  acpi_processor_driver_init+0x40/0xe4
[   21.068497]  do_one_initcall+0xb4/0x254
[   21.072322]  kernel_init_freeable+0x24c/0x2f8
[   21.076667]  kernel_init+0x10/0x118
[   21.080144]  ret_from_fork+0x10/0x18
[   21.083707]
[   21.085186] Freed by task 1:
[   21.088056]  save_stack+0x28/0xc8
[   21.091360]  __kasan_slab_free+0x118/0x180
[   21.095445]  kasan_slab_free+0x10/0x18
[   21.099183]  kfree+0x80/0x268
[   21.102139]  acpi_cppc_processor_exit+0x1a8/0x1b8
[   21.106832]  acpi_processor_stop+0x70/0x80
[   21.110917]  really_probe+0x174/0x548
[   21.114568]  driver_probe_device+0x7c/0x148
[   21.118740]  device_driver_attach+0x94/0xa0
[   21.122912]  __driver_attach+0xa4/0x110
[   21.126736]  bus_for_each_dev+0xe8/0x158
[   21.130648]  driver_attach+0x30/0x40
[   21.134212]  bus_add_driver+0x234/0x2f0
[   21.0x10/0x18
[   21.161764]
[   21.163244] The buggy address belongs to the object at ffff00236cdeb600
[   21.163244]  which belongs to the cache kmalloc-256 of size 256
[   21.175750] The buggy address is located 132 bytes inside of
[   21.175750]  256-byte region [ffff00236cdeb600ffff00236cdeb700)
[   21.187473] The buggy address belongs to the page:
[   21.192254] page:fffffe008d937a00 refcount:1 mapcount:0 mapping:ffff002370c0fa00 index:0x0 compound_mapcount: 0
[   21.202331] flags: 0x1ffff00000010200(slab|head)
[   21.206940] raw: 1ffff00000010200 dead000000000100 dead000000000122 ffff002370c0fa00
[   21.214671] raw: 0000000000000000 00000000802a002a 00000001ffffffff 0000000000000000
[   21.222400] page dumped because: kasan: bad access detected
[   21.227959]
[   21.229438] Memory state around the buggy address:
[   21.234218]  ffff00236cdeb580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   21.241427]  ffff00236cdeb600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.248637] >ffff00236cdeb680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.255845]                    ^
[   21.259062]  ffff00236cdeb700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   21.266272]  ffff00236cdeb780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.273480] ==================================================================

It seems that global pcc_data[pcc_ss_id] can be freed in
acpi_cppc_processor_exit(), but we may later reference this value, so
NULLify it when freed.

Also remove the useless setting of data "pcc_channel_acquired", which
we're about to free.

Fixes: 85b1407bf6d2 ("ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs")
Signed-off-by: John Garry <john.garry@huawei.com>
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoMerge branches 'pm-cpufreq' and 'pm-sleep'
Rafael J. Wysocki [Fri, 18 Oct 2019 08:27:55 +0000 (10:27 +0200)]
Merge branches 'pm-cpufreq' and 'pm-sleep'

* pm-cpufreq:
  ACPI: processor: Avoid NULL pointer dereferences at init time
  cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown

* pm-sleep:
  PM: sleep: include <linux/pm_runtime.h> for pm_wq
  ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist

5 years agoscsi: lpfc: remove left-over BUILD_NVME defines
Hannes Reinecke [Thu, 17 Oct 2019 15:00:19 +0000 (17:00 +0200)]
scsi: lpfc: remove left-over BUILD_NVME defines

The BUILD_NVME define never got defined anywhere, causing NVMe commands to
be treated as SCSI commands when freeing the buffers.  This was causing a
stuck discovery and a horrible crash in lpfc_set_rrq_active() later on.

Link: https://lore.kernel.org/r/20191017150019.75769-1-hare@suse.de
Fixes: c00f62e6c546 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
5 years agoscsi: core: try to get module before removing device
Yufen Yu [Tue, 15 Oct 2019 13:05:56 +0000 (21:05 +0800)]
scsi: core: try to get module before removing device

We have a test case like block/001 in blktests, which will create a scsi
device by loading scsi_debug module and then try to delete the device by
sysfs interface. At the same time, it may remove the scsi_debug module.

And getting a invalid paging request BUG_ON as following:

[   34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8
[   34.629189] Oops: 0000 [#1] SMP PTI
[   34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G        W         5.4.0-rc3+ #473
[   34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0
[   34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0
[   34.644545] Call Trace:
[   34.644907]  scsi_host_dev_release+0x6b/0x1f0
[   34.645511]  device_release+0x74/0x110
[   34.646046]  kobject_put+0x116/0x390
[   34.646559]  put_device+0x17/0x30
[   34.647041]  scsi_target_dev_release+0x2b/0x40
[   34.647652]  device_release+0x74/0x110
[   34.648186]  kobject_put+0x116/0x390
[   34.648691]  put_device+0x17/0x30
[   34.649157]  scsi_device_dev_release_usercontext+0x2e8/0x360
[   34.649953]  execute_in_process_context+0x29/0x80
[   34.650603]  scsi_device_dev_release+0x20/0x30
[   34.651221]  device_release+0x74/0x110
[   34.651732]  kobject_put+0x116/0x390
[   34.652230]  sysfs_unbreak_active_protection+0x3f/0x50
[   34.652935]  sdev_store_delete.cold.4+0x71/0x8f
[   34.653579]  dev_attr_store+0x1b/0x40
[   34.654103]  sysfs_kf_write+0x3d/0x60
[   34.654603]  kernfs_fop_write+0x174/0x250
[   34.655165]  __vfs_write+0x1f/0x60
[   34.655639]  vfs_write+0xc7/0x280
[   34.656117]  ksys_write+0x6d/0x140
[   34.656591]  __x64_sys_write+0x1e/0x30
[   34.657114]  do_syscall_64+0xb1/0x400
[   34.657627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   34.658335] RIP: 0033:0x7f156f337130

During deleting scsi target, the scsi_debug module have been removed. Then,
sdebug_driver_template belonged to the module cannot be accessd, resulting
in scsi_proc_hostdir_rm() BUG_ON.

To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to
increase refcount of module, avoiding the module been removed.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
5 years agoscsi: hpsa: add missing hunks in reset-patch
Don Brace [Mon, 14 Oct 2019 18:03:58 +0000 (13:03 -0500)]
scsi: hpsa: add missing hunks in reset-patch

Correct returning from reset before outstanding commands are completed
for the device.

Link: https://lore.kernel.org/r/157107623870.17997.11208813089704833029.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
5 years agoscsi: target: core: Do not overwrite CDB byte 1
Bodo Stroesser [Mon, 14 Oct 2019 18:29:04 +0000 (20:29 +0200)]
scsi: target: core: Do not overwrite CDB byte 1

passthrough_parse_cdb() - used by TCMU and PSCSI - attepts to reset the LUN
field of SCSI-2 CDBs (bits 5,6,7 of byte 1).  The current code is wrong as
for newer commands not having the LUN field it overwrites relevant command
bits (e.g. for SECURITY PROTOCOL IN / OUT). We think this code was
unnecessary from the beginning or at least it is no longer useful. So we
remove it entirely.

Link: https://lore.kernel.org/r/12498eab-76fd-eaad-1316-c2827badb76a@ts.fujitsu.com
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
5 years agonet: Update address for MediaTek ethernet driver in MAINTAINERS
Sean Wang [Wed, 16 Oct 2019 21:14:08 +0000 (05:14 +0800)]
net: Update address for MediaTek ethernet driver in MAINTAINERS

Update maintainers for MediaTek ethernet driver with Mark Lee.
He is familiar with MediaTek mt762x series ethernet devices and
will keep following maintenance from the vendor side.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Mark Lee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 18 Oct 2019 00:00:14 +0000 (17:00 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The main thing here is a long-awaited workaround for a CPU erratum on
  ThunderX2 which we have developed in conjunction with engineers from
  Cavium/Marvell.

  At the moment, the workaround is unconditionally enabled for affected
  CPUs at runtime but we may add a command-line option to disable it in
  future if performance numbers show up indicating a significant cost
  for real workloads.

  Summary:

   - Work around Cavium/Marvell ThunderX2 erratum #219

   - Fix regression in mlock() ABI caused by sign-extension of TTBR1 addresses

   - More fixes to the spurious kernel fault detection logic

   - Fix pathological preemption race when enabling some CPU features at boot

   - Drop broken kcore macros in favour of generic implementations

   - Fix userspace view of ID_AA64ZFR0_EL1 when SVE is disabled

   - Avoid NULL dereference on allocation failure during hibernation"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: tags: Preserve tags for addresses translated via TTBR1
  arm64: mm: fix inverted PAR_EL1.F check
  arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F
  arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
  arm64: hibernate: check pgd table allocation
  arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
  arm64: Fix kcore macros after 52-bit virtual addressing fallout
  arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
  arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
  arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
  arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set

5 years agoipv4: fix race condition between route lookup and invalidation
Wei Wang [Wed, 16 Oct 2019 19:03:15 +0000 (12:03 -0700)]
ipv4: fix race condition between route lookup and invalidation

Jesse and Ido reported the following race condition:
<CPU A, t0> - Received packet A is forwarded and cached dst entry is
taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()

<t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
from multiple ISPs"), route is added / deleted and rt_cache_flush() is
called

<CPU B, t2> - Received packet B tries to use the same cached dst entry
from t0, but rt_cache_valid() is no longer true and it is replaced in
rt_cache_route() by the newer one. This calls dst_dev_put() on the
original dst entry which assigns the blackhole netdev to 'dst->dev'

<CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
to 'dst->dev' being the blackhole netdev

There are 2 issues in the v4 routing code:
1. A per-netns counter is used to do the validation of the route. That
means whenever a route is changed in the netns, users of all routes in
the netns needs to redo lookup. v6 has an implementation of only
updating fn_sernum for routes that are affected.
2. When rt_cache_valid() returns false, rt_cache_route() is called to
throw away the current cache, and create a new one. This seems
unnecessary because as long as this route does not change, the route
cache does not need to be recreated.

To fully solve the above 2 issues, it probably needs quite some code
changes and requires careful testing, and does not suite for net branch.

So this patch only tries to add the deleted cached rt into the uncached
list, so user could still be able to use it to receive packets until
it's done.

Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly")
Signed-off-by: Wei Wang <weiwan@google.com>
Reported-by: Ido Schimmel <idosch@idosch.org>
Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'xtensa-20191017' of git://github.com/jcmvbkbc/linux-xtensa
Linus Torvalds [Thu, 17 Oct 2019 23:42:50 +0000 (16:42 -0700)]
Merge tag 'xtensa-20191017' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes from Max Filippov:

 - fix {get,put}_user() for 64bit values

 - fix warning about static EXPORT_SYMBOL from modpost

 - fix PCI IO ports mapping for the virt board

 - fix pasto in change_bit for exclusive access option

* tag 'xtensa-20191017' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: fix change_bit in exclusive access option
  xtensa: virt: fix PCI IO ports mapping
  xtensa: drop EXPORT_SYMBOL for outs*/ins*
  xtensa: fix type conversion in __get_user_[no]check
  xtensa: clean up assembly arguments in uaccess macros
  xtensa: fix {get,put}_user() for 64bit values

5 years agoipv4: Return -ENETUNREACH if we can't create route but saddr is valid
Stefano Brivio [Wed, 16 Oct 2019 18:52:09 +0000 (20:52 +0200)]
ipv4: Return -ENETUNREACH if we can't create route but saddr is valid

...instead of -EINVAL. An issue was found with older kernel versions
while unplugging a NFS client with pending RPCs, and the wrong error
code here prevented it from recovering once link is back up with a
configured address.

Incidentally, this is not an issue anymore since commit 4f8943f80883
("SUNRPC: Replace direct task wakeups from softirq context"), included
in 5.2-rc7, had the effect of decoupling the forwarding of this error
by using SO_ERROR in xs_wake_error(), as pointed out by Benjamin
Coddington.

To the best of my knowledge, this isn't currently causing any further
issue, but the error code doesn't look appropriate anyway, and we
might hit this in other paths as well.

In detail, as analysed by Gonzalo Siero, once the route is deleted
because the interface is down, and can't be resolved and we return
-EINVAL here, this ends up, courtesy of inet_sk_rebuild_header(),
as the socket error seen by tcp_write_err(), called by
tcp_retransmit_timer().

In turn, tcp_write_err() indirectly calls xs_error_report(), which
wakes up the RPC pending tasks with a status of -EINVAL. This is then
seen by call_status() in the SUN RPC implementation, which aborts the
RPC call calling rpc_exit(), instead of handling this as a
potentially temporary condition, i.e. as a timeout.

Return -EINVAL only if the input parameters passed to
ip_route_output_key_hash_rcu() are actually invalid (this is the case
if the specified source address is multicast, limited broadcast or
all zeroes), but return -ENETUNREACH in all cases where, at the given
moment, the given source address doesn't allow resolving the route.

While at it, drop the initialisation of err to -ENETUNREACH, which
was added to __ip_route_output_key() back then by commit
0315e3827048 ("net: Fix behaviour of unreachable, blackhole and
prohibit routes"), but actually had no effect, as it was, and is,
overwritten by the fib_lookup() return code assignment, and anyway
ignored in all other branches, including the if (fl4->saddr) one:
I find this rather confusing, as it would look like -ENETUNREACH is
the "default" error, while that statement has no effect.

Also note that after commit fc75fc8339e7 ("ipv4: dont create routes
on down devices"), we would get -ENETUNREACH if the device is down,
but -EINVAL if the source address is specified and we can't resolve
the route, and this appears to be rather inconsistent.

Reported-by: Stefan Walter <walteste@inf.ethz.ch>
Analysed-by: Benjamin Coddington <bcodding@redhat.com>
Analysed-by: Gonzalo Siero <gsierohu@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: micrel: Update KSZ87xx PHY name
Marek Vasut [Wed, 16 Oct 2019 13:35:07 +0000 (15:35 +0200)]
net: phy: micrel: Update KSZ87xx PHY name

The KSZ8795 PHY ID is in fact used by KSZ8794/KSZ8795/KSZ8765 switches.
Update the PHY ID and name to reflect that, as this family of switches
is commonly refered to as KSZ87xx

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: micrel: Discern KSZ8051 and KSZ8795 PHYs
Marek Vasut [Wed, 16 Oct 2019 13:35:06 +0000 (15:35 +0200)]
net: phy: micrel: Discern KSZ8051 and KSZ8795 PHYs

The KSZ8051 PHY and the KSZ8794/KSZ8795/KSZ8765 switch share exactly the
same PHY ID. Since KSZ8051 is higher in the ksphy_driver[] list of PHYs
in the micrel PHY driver, it is used even with the KSZ87xx switch. This
is wrong, since the KSZ8051 configures registers of the PHY which are
not present on the simplified KSZ87xx switch PHYs and misconfigures
other registers of the KSZ87xx switch PHYs.

Fortunatelly, it is possible to tell apart the KSZ8051 PHY from the
KSZ87xx switch by checking the Basic Status register Bit 0, which is
read-only and indicates presence of the Extended Capability Registers.
The KSZ8051 PHY has those registers while the KSZ87xx switch does not.

This patch implements simple check for the presence of this bit for
both the KSZ8051 PHY and KSZ87xx switch, to let both use the correct
PHY driver instance.

Fixes: 9d162ed69f51 ("net: phy: micrel: add support for KSZ8795")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoio_uring: fix logic error in io_timeout
yangerkun [Thu, 17 Oct 2019 04:12:35 +0000 (12:12 +0800)]
io_uring: fix logic error in io_timeout

If ctx->cached_sq_head < nxt_sq_head, we should add UINT_MAX to tmp, not
tmp_nxt.

Fixes: 5da0fb1ab34c ("io_uring: consider the overflow of sequence for timeout req")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoio_uring: fix up O_NONBLOCK handling for sockets
Jens Axboe [Thu, 17 Oct 2019 15:20:46 +0000 (09:20 -0600)]
io_uring: fix up O_NONBLOCK handling for sockets

We've got two issues with the non-regular file handling for non-blocking
IO:

1) We don't want to re-do a short read in full for a non-regular file,
   as we can't just read the data again.
2) For non-regular files that don't support non-blocking IO attempts,
   we need to punt to async context even if the file is opened as
   non-blocking. Otherwise the caller always gets -EAGAIN.

Add two new request flags to handle these cases. One is just a cache
of the inode S_ISREG() status, the other tells io_uring that we always
need to punt this request to async context, even if REQ_F_NOWAIT is set.

Cc: stable@vger.kernel.org
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge tag 'xfs-5.4-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 17 Oct 2019 21:19:52 +0000 (14:19 -0700)]
Merge tag 'xfs-5.4-fixes-4' of git://git./fs/xfs/xfs-linux

Pull xfs fix from Darrick Wong:
 "The single fix converts the seconds field in the recently added XFS
  bulkstat structure to a signed 64-bit quantity.

  The structure layout doesn't change and so far there are no users of
  the ioctl to break because we only publish xfs ioctl interfaces
  through the XFS userspace development libraries, and we're still
  working on a 5.3 release"

* tag 'xfs-5.4-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: change the seconds fields in xfs_bulkstat to signed

5 years agodrm/amdgpu/vce: fix allocation size in enc ring test
Alex Deucher [Thu, 17 Oct 2019 15:36:47 +0000 (11:36 -0400)]
drm/amdgpu/vce: fix allocation size in enc ring test

We need to allocate a large enough buffer for the
feedback buffer, otherwise the IB test can overwrite
other memory.

Reviewed-by: James Zhu <James.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 years agodrm/amdgpu: fix error handling in amdgpu_bo_list_create
Christian König [Wed, 18 Sep 2019 17:42:14 +0000 (19:42 +0200)]
drm/amdgpu: fix error handling in amdgpu_bo_list_create

We need to drop normal and userptr BOs separately.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: fix potential VM faults
Christian König [Thu, 19 Sep 2019 08:38:57 +0000 (10:38 +0200)]
drm/amdgpu: fix potential VM faults

When we allocate new page tables under memory
pressure we should not evict old ones.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 years agodrm/amdgpu: user pages array memory leak fix
Philip Yang [Thu, 3 Oct 2019 18:18:25 +0000 (14:18 -0400)]
drm/amdgpu: user pages array memory leak fix

user_pages array should always be freed after validation regardless if
user pages are changed after bo is created because with HMM change parse
bo always allocate user pages array to get user pages for userptr bo.

v2: remove unused local variable and amend commit

v3: add back get user pages in gem_userptr_ioctl, to detect application
bug where an userptr VMA is not ananymous memory and reject it.

Bugzilla: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844962

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Joe Barnett <thejoe@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.3
5 years agodrm/amdgpu/vcn: fix allocation size in enc ring test
Alex Deucher [Tue, 15 Oct 2019 22:09:41 +0000 (18:09 -0400)]
drm/amdgpu/vcn: fix allocation size in enc ring test

We need to allocate a large enough buffer for the
session info, otherwise the IB test can overwrite
other memory.

- Session info is 128K according to mesa
- Use the same session info for create and destroy

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 years agodrm/amdgpu/uvd7: fix allocation size in enc ring test (v2)
Alex Deucher [Tue, 15 Oct 2019 22:08:59 +0000 (18:08 -0400)]
drm/amdgpu/uvd7: fix allocation size in enc ring test (v2)

We need to allocate a large enough buffer for the
session info, otherwise the IB test can overwrite
other memory.

v2: - session info is 128K according to mesa
    - use the same session info for create and destroy

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 years agodrm/amdgpu/uvd6: fix allocation size in enc ring test (v2)
Alex Deucher [Tue, 15 Oct 2019 22:07:19 +0000 (18:07 -0400)]
drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)

We need to allocate a large enough buffer for the
session info, otherwise the IB test can overwrite
other memory.

v2: - session info is 128K according to mesa
    - use the same session info for create and destroy

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
5 years agoMerge tag 'drm-fixes-2019-10-18' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Thu, 17 Oct 2019 21:04:53 +0000 (14:04 -0700)]
Merge tag 'drm-fixes-2019-10-18' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This is this weeks fixes for drm.

  The dma-resv one is probably the more important one a fair few people
  have reported it, besides that it's a couple of panfrost, a few i915
  and a few amdgpu fixes.

  One radeon patch to fix some ppc64 related issues caused an x86
  regression so is getting reverted for now.

  Summary:

  dma-resv:
   - shared fences for lima/panfrost

  ttm:
   - prefault regression fix
   - lifetime fix

  panfrost:
   - stopped job timeout fix
   - missing register values

  amdgpu:
   - smu7 powerplay fix
   - bail earlier for cik/si detection
   - navi SDMA fix

  radeon:
   - revert a ppc64 shutdown fix that broke x86

  i915:
   - VBT information handling fix
   - Circular locking fix
   - preemption vs resubmission virtual requests fix"

* tag 'drm-fixes-2019-10-18' of git://anongit.freedesktop.org/drm/drm:
  drm/i915: Fixup preempt-to-busy vs resubmission of a virtual request
  drm/i915/userptr: Never allow userptr into the mappable GGTT
  drm/i915: Favor last VBT child device with conflicting AUX ch/DDC pin
  drm/i915/execlists: Refactor -EIO markup of hung requests
  drm/panfrost: Handle resetting on timeout better
  drm/panfrost: Add missing GPU feature registers
  drm/ttm: fix handling in ttm_bo_add_mem_to_lru
  drm/ttm: Restore ttm prefaulting
  drm/ttm: fix busy reference in ttm_mem_evict_first
  drm/amdgpu/sdma5: fix mask value of POLL_REGMEM packet for pipe sync
  drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1
  Revert "drm/radeon: Fix EEH during kexec"
  drm/msm/dsi: Implement reset correctly
  dma-buf/resv: fix exclusive fence get
  drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50
  drm/tiny: Kconfig: Remove always-y THERMAL dep. from TINYDRM_REPAPER
  drm/amdgpu/powerplay: fix typo in mvdd table setup

5 years agoMerge branch 'errata/tx2-219' into for-next/fixes
Will Deacon [Thu, 17 Oct 2019 20:42:42 +0000 (13:42 -0700)]
Merge branch 'errata/tx2-219' into for-next/fixes

Workaround for Cavium/Marvell ThunderX2 erratum #219.

* errata/tx2-219:
  arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
  arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
  arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
  arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set

5 years agoMerge tag 'drm-misc-fixes-2019-10-17' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Thu, 17 Oct 2019 20:40:05 +0000 (06:40 +1000)]
Merge tag 'drm-misc-fixes-2019-10-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

-dma-resv: Change shared_count to post-increment to fix lima crash (Qiang)
-ttm: A couple fixes related to lifetime and restore prefault behavior
 (Christian & Thomas)
-panfrost: Fill in missing feature reg values and fix stoppedjob timeouts
 (Steven)

Cc: Qiang Yu <yuq825@gmail.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20191017203419.GA142909@art_vandelay
5 years agoMerge tag 'drm-fixes-5.4-2019-10-16' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Thu, 17 Oct 2019 20:12:05 +0000 (06:12 +1000)]
Merge tag 'drm-fixes-5.4-2019-10-16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

drm-fixes-5.4-2019-10-16:

amdgpu:
- Powerplay fix for SMU7 parts
- Bail earlier when cik/si support is not set to 1
- Fix an SDMA issue on navi

radeon:
- revert a PPC fix which broken x86

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191017022443.3853-1-alexander.deucher@amd.com
5 years agoMerge tag 'drm-intel-fixes-2019-10-17' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 17 Oct 2019 20:10:25 +0000 (06:10 +1000)]
Merge tag 'drm-intel-fixes-2019-10-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Display fix on handling VBT information.
- Important circular locking fix
- Fix for preemption vs resubmission on virtual requests
  - and a prep patch to make this last one to apply cleanly

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191017135444.GA12255@intel.com
5 years agonet: dsa: microchip: Add shared regmap mutex
Marek Vasut [Wed, 16 Oct 2019 13:33:24 +0000 (15:33 +0200)]
net: dsa: microchip: Add shared regmap mutex

The KSZ driver uses one regmap per register width (8/16/32), each with
it's own lock, but accessing the same set of registers. In theory, it
is possible to create a race condition between these regmaps, although
the underlying bus (SPI or I2C) locking should assure nothing bad will
really happen and the accesses would be correct.

To make the driver do the right thing, add one single shared mutex for
all the regmaps used by the driver instead. This assures that even if
some future hardware is on a bus which does not serialize the accesses
the same way SPI or I2C does, nothing bad will happen.

Note that the status_mutex was unused and only initied, hence it was
renamed and repurposed as the regmap mutex.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: microchip: Do not reinit mutexes on KSZ87xx
Marek Vasut [Wed, 16 Oct 2019 13:33:23 +0000 (15:33 +0200)]
net: dsa: microchip: Do not reinit mutexes on KSZ87xx

The KSZ87xx driver calls mutex_init() on mutexes already inited in
ksz_common.c ksz_switch_register(). Do not do it twice, drop the
reinitialization.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: fix argument to stmmac_pcs_ctrl_ane()
Ben Dooks (Codethink) [Wed, 16 Oct 2019 08:22:05 +0000 (09:22 +0100)]
net: stmmac: fix argument to stmmac_pcs_ctrl_ane()

The stmmac_pcs_ctrl_ane() expects a register address as
argument 1, but for some reason the mac_device_info is
being passed.

Fix the warning (and possible bug) from sparse:

drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2613:17: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2613:17:    expected void [noderef] <asn:2> *ioaddr
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2613:17:    got struct mac_device_info *hw

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'dpaa2-eth-misc-fixes'
David S. Miller [Thu, 17 Oct 2019 19:27:29 +0000 (15:27 -0400)]
Merge branch 'dpaa2-eth-misc-fixes'

Ioana Ciornei says:

====================
dpaa2-eth: misc fixes

This patch set adds a couple of fixes around updating configuration on MAC
change.  Depending on when MC connects the DPNI to a MAC, both the MAC
address and TX FQIDs should be updated everytime there is a change in
configuration.

Changes in v2:
 - used reverse christmas tree ordering in patch 2/2
Changes in v3:
 - add a missing new line
 - go back to FQ based enqueueing after a transient error
====================

Signed-off-by: David S. Miller <davem@davemloft.net>