Jeff Layton [Thu, 2 Jan 2020 12:11:38 +0000 (07:11 -0500)]
ceph: cache layout in parent dir on first sync create
If a create is done, then typically we'll end up writing to the file
soon afterward. We don't want to wait for the reply before doing that
when doing an async create, so that means we need the layout for the
new file before we've gotten the response from the MDS.
All files created in a directory will initially inherit the same layout,
so copy off the requisite info from the first synchronous create in the
directory, and save it in a new i_cached_layout field. Zero out the
layout when we lose Dc caps in the dir.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Mon, 13 Jan 2020 18:04:08 +0000 (13:04 -0500)]
ceph: add new MDS req field to hold delegated inode number
Add new request field to hold the delegated inode number. Encode that
into the message when it's set.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Fri, 15 Nov 2019 16:51:55 +0000 (11:51 -0500)]
ceph: decode interval_sets for delegated inos
Starting in Octopus, the MDS will hand out caps that allow the client
to do asynchronous file creates under certain conditions. As part of
that, the MDS will delegate ranges of inode numbers to the client.
Add the infrastructure to decode these ranges, and stuff them into an
xarray for later consumption by the async creation code.
Because the xarray code currently only handles unsigned long indexes,
and those are 32 bits on 32-bit arches, we only enable the decoding when
running on a 64-bit arch.
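As a rough userspace illustration of the index-width constraint above (not
code from the patch), the check below asks whether a 64-bit inode number
fits in an unsigned long on the current build. The helper name
can_index_ino() is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a 64-bit inode number can be used as an
 * unsigned long index without truncation on this build.
 */
static bool can_index_ino(uint64_t ino)
{
	return ino <= (uint64_t)ULONG_MAX;
}
```

On a 64-bit build every inode number passes; on a 32-bit build anything
above ULONG_MAX would be rejected, which is the case the patch sidesteps by
disabling the decoding there.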
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Thu, 5 Dec 2019 14:09:25 +0000 (09:09 -0500)]
ceph: make ceph_fill_inode non-static
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 2 Apr 2019 19:35:56 +0000 (15:35 -0400)]
ceph: perform asynchronous unlink if we have sufficient caps
The MDS is getting a new lock-caching facility that will allow it
to cache the necessary locks to allow asynchronous directory operations.
Since the CEPH_CAP_FILE_* caps are currently unused on directories,
we can repurpose those bits for this purpose.
When performing an unlink, if we have Fx on the parent directory,
and CEPH_CAP_DIR_UNLINK (aka Fr), and we know that the dentry being
removed is the primary link, then we can fire off an unlink
request immediately and don't need to wait on reply before returning.
In that situation, just fix up the dcache and link count and return
immediately after issuing the call to the MDS. This does mean that we
need to hold an extra reference to the inode being unlinked, and extra
references to the caps to avoid races. Those references are put and
error handling is done in the r_callback routine.
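The precondition for the async unlink described above can be sketched as a
simple predicate. This is only a model: the bit values are placeholders, and
can_unlink_async() is a hypothetical name rather than a function from the
patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values are placeholders for the sketch, not the kernel's encoding. */
#define CAP_FX         0x1 /* Fx on the parent directory */
#define CAP_DIR_UNLINK 0x2 /* CEPH_CAP_DIR_UNLINK (aka Fr) */

/* Async unlink is only attempted when every condition holds. */
static bool can_unlink_async(int dir_caps, bool dentry_is_primary)
{
	int needed = CAP_FX | CAP_DIR_UNLINK;

	return (dir_caps & needed) == needed && dentry_is_primary;
}
```

If any condition fails, the caller would fall back to the ordinary
synchronous unlink path.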
If the operation ends up failing, then set a writeback error on both the
directory inode and the inode itself, which can be fetched later by
an fsync on the dir.
The behavior of dir caps is slightly different from caps on normal
files. Because these are just considered an optimization, if the
session is reconnected, we will not automatically reclaim them. They
are instead considered lost until we do another synchronous op in the
parent directory.
Async dirops are enabled via the "nowsync" mount option, which is
patterned after the xfs "wsync" mount option. For now, the default
is "wsync", but eventually we may flip that.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Yan, Zheng [Tue, 18 Feb 2020 13:17:08 +0000 (08:17 -0500)]
ceph: don't take refs to want mask unless we have all bits
If we don't have all of the cap bits for the want mask in
try_get_cap_refs, then just take refs on the need bits.
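A minimal sketch of that rule, treating caps as plain bitmasks. The function
name caps_to_ref() is hypothetical, and the caller is assumed to have
already verified that all of the need bits are held.

```c
#include <assert.h>

/*
 * Hypothetical model of the decision: take references on the want bits
 * only when every one of them is currently held, otherwise fall back to
 * the need bits alone.
 */
static int caps_to_ref(int have, int need, int want)
{
	if ((have & want) == want)
		return need | want;
	return need;
}
```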
Signed-off-by: "Yan, Zheng" <ukernel@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 18 Feb 2020 19:12:45 +0000 (14:12 -0500)]
ceph: cap tracking for async directory operations
Track and correctly handle directory caps for asynchronous operations.
Add aliases for Frc caps that we now designate as Dcu caps (when dealing
with directories).
Unlike file caps, we don't reclaim these when the session goes away, and
instead preemptively release them. In-flight async dirops are instead
handled during reconnect phase. The client needs to re-do a synchronous
operation in order to re-get directory caps.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 14 Jan 2020 14:23:49 +0000 (09:23 -0500)]
ceph: make __take_cap_refs non-static
Rename it to ceph_take_cap_refs and make it available to other files.
Also replace a comment with a lockdep assertion.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 14 Jan 2020 20:06:40 +0000 (15:06 -0500)]
ceph: add infrastructure for waiting for async create to complete
When we issue an async create, we must ensure that any later on-the-wire
requests involving it wait for the create reply.
Expand i_ceph_flags to be an unsigned long, and add a new bit that
MDS requests can wait on. If the bit is set in the inode when sending
caps, then don't send it and just return that it has been delayed.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 18 Feb 2020 19:12:32 +0000 (14:12 -0500)]
ceph: track primary dentry link
Newer versions of the MDS will flag a dentry as "primary". In later
patches, we'll need to consult this info, so track it in di->flags.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Mon, 2 Dec 2019 18:47:57 +0000 (13:47 -0500)]
ceph: add flag to designate that a request is asynchronous
...and ensure that such requests are never queued. The MDS needs to
know that a request is asynchronous, so add flags and proper
infrastructure for that.
Also, delegated inode numbers and directory caps are associated with the
session, so ensure that async requests are always transmitted on the
first attempt and are never queued to wait for session reestablishment.
If it does end up looking like we'll need to queue the request, then
have it return -EJUKEBOX so the caller can reattempt with a synchronous
request.
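The fallback described above might look roughly like this from the caller's
side. Everything here is a userspace stand-in: submit_async(), submit_sync()
and do_dirop() are hypothetical names, and EJUKEBOX is defined locally
because it is a kernel-internal errno.

```c
#include <assert.h>
#include <stdbool.h>

#define EJUKEBOX 528 /* kernel-internal errno, defined here for the sketch */

/* Stand-in: an async request is never queued, it fails fast instead. */
static int submit_async(bool session_ready)
{
	return session_ready ? 0 : -EJUKEBOX;
}

/* Stand-in: a sync request may be queued and waited on. */
static int submit_sync(void)
{
	return 0;
}

/* Try the async path first and reattempt synchronously on -EJUKEBOX. */
static int do_dirop(bool session_ready, bool *went_async)
{
	int err = submit_async(session_ready);

	*went_async = (err == 0);
	if (err == -EJUKEBOX)
		err = submit_sync();
	return err;
}
```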
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 25 Feb 2020 19:08:33 +0000 (11:08 -0800)]
ceph: more caps.c lockdep assertions
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 25 Feb 2020 19:49:53 +0000 (11:49 -0800)]
ceph: clean up kick_flushing_inode_caps()
The last thing that this function does is release i_ceph_lock, so
have the caller do that instead. Add a lockdep assertion to
ensure that the function is always called with i_ceph_lock held.
Change the prototype to take a ceph_inode_info pointer and drop
the separate mdsc argument as we can get that from the session.
While at it, make it non-static. We'll need this to kick any
flushing caps once the create reply comes in.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Fri, 30 Aug 2019 15:38:31 +0000 (17:38 +0200)]
libceph: directly skip to the end of redirect reply
Coverity complains about a double write to *p. Don't bother with
osd_instructions and directly skip to the end of redirect reply.
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 26 Feb 2020 15:37:55 +0000 (16:37 +0100)]
libceph: simplify ceph_monc_handle_map()
ceph_monc_handle_map() confuses static checkers which report a
false use-after-free on monc->monmap, missing that monc->monmap and
client->monc.monmap are the same pointer.
Use monc->monmap consistently and get rid of "old", which is redundant.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Xiubo Li [Mon, 24 Feb 2020 03:23:11 +0000 (22:23 -0500)]
ceph: return ETIMEDOUT errno to userland when request timed out
req->r_timeout is only used during mounting, so this error will
be more accurate.
URL: https://tracker.ceph.com/issues/44215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Luis Henriques [Mon, 24 Feb 2020 13:44:32 +0000 (13:44 +0000)]
ceph: re-org copy_file_range and fix some error paths
This patch re-organizes copy_file_range, trying to fix a few issues in the
error handling. Here's the summary:
- Abort copy if initial do_splice_direct() returns fewer bytes than
requested.
- Move the 'size' initialization (with i_size_read()) further down in the
code, after the initial call to do_splice_direct(). This avoids issues
with a possibly stale value if a manual copy is done.
- Move the object copy loop into a separate function. This makes it
easier to handle errors (e.g., dirtying caps and updating the MDS
metadata if only some objects have been copied before an error has
occurred).
- Added calls to ceph_oloc_destroy() to avoid leaking memory with src_oloc
and dst_oloc.
- After the object copy loop, the new file size to be reported to the MDS
(if there's file size change) is now the actual file size, and not the
size after an eventual extra manual copy.
- Added a few dout() to show the number of bytes copied in the two manual
copies and in the object copy loop.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Mon, 17 Feb 2020 23:38:37 +0000 (18:38 -0500)]
ceph: move to a dedicated slabcache for mds requests
On my machine (x86_64) this struct is 952 bytes, which gets rounded up
to 1024 by kmalloc. Move this to a dedicated slabcache, so we can
allocate them without the extra 72 bytes of overhead per request.
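The arithmetic behind those numbers can be checked with a simplified model
that rounds a size up to the next power of two. This is only an
approximation of kmalloc's size classes (which also include a few
non-power-of-two sizes), and round_up_pow2() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next power of two (simplified size-class model). */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

A 952-byte object lands in the 1024-byte class under this model, leaving
1024 - 952 = 72 bytes unused per allocation.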
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Mon, 17 Feb 2020 15:19:14 +0000 (10:19 -0500)]
ceph: reorganize fields in ceph_mds_request
This shrinks the struct size by 16 bytes.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Andreas Gruenbacher [Thu, 13 Feb 2020 20:24:22 +0000 (21:24 +0100)]
ceph: switch to page_mkwrite_check_truncate in ceph_page_mkwrite
Use the "page has been truncated" logic in page_mkwrite_check_truncate
instead of reimplementing it here. Unlike the existing code,
fail with -EFAULT / VM_FAULT_NOPAGE when page_offset(page) == size here
as well, as should be expected.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Gustavo A. R. Silva [Thu, 13 Feb 2020 16:00:04 +0000 (10:00 -0600)]
ceph: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
	int stuff;
	struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
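For illustration, here is a small userspace sketch of how a structure with a
flexible array member is typically allocated. The element type's fields and
the foo_alloc() helper are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[]; /* C99 flexible array member, must be last */
};

/*
 * Allocate a foo with room for n trailing elements. sizeof(*f) covers
 * only the fixed part, so the array storage is added explicitly.
 */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}
```

The allocation expression is the same one would write with a zero-length
array, which is why existing allocations are unaffected by the conversion.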
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Hannes Reinecke [Fri, 31 Jan 2020 10:37:39 +0000 (11:37 +0100)]
rbd: enable multiple blk-mq queues
Allocate one queue per CPU and get a performance boost from
higher parallelism.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 12 Feb 2020 14:23:58 +0000 (15:23 +0100)]
rbd: embed image request in blk-mq pdu
Avoid making allocations for !IMG_REQ_CHILD image requests. Only
IMG_REQ_CHILD image requests need to be freed now.
Move the initial request checks to rbd_queue_rq(). Unfortunately we
can't fill the image request and kick the state machine directly from
rbd_queue_rq() because ->queue_rq() isn't allowed to block.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 12 Feb 2020 14:08:39 +0000 (15:08 +0100)]
rbd: acquire header_rwsem just once in rbd_queue_workfn()
Currently header_rwsem is acquired twice: once in rbd_dev_parent_get()
when the image request is being created and then in rbd_queue_workfn()
to capture mapping_size and snapc. Introduce rbd_img_capture_header()
and move image request allocation so that header_rwsem can be acquired
just once.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Wed, 12 Feb 2020 13:34:03 +0000 (14:34 +0100)]
rbd: get rid of img_request_layered_clear()
No need to clear IMG_REQ_LAYERED before destroying the request.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Hannes Reinecke [Fri, 31 Jan 2020 10:37:36 +0000 (11:37 +0100)]
rbd: kill img_request kref
The reference counter is never increased, so we may as well call
rbd_img_request_destroy() directly and drop the kref.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Yan, Zheng [Tue, 11 Feb 2020 14:54:43 +0000 (22:54 +0800)]
ceph: check if file lock exists before sending unlock request
When a process exits, kernel closes its files. locks_remove_file()
is called to remove file locks on these files. locks_remove_file()
tries unlocking files even if there is no file lock.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Xiubo Li [Tue, 11 Feb 2020 15:31:20 +0000 (10:31 -0500)]
ceph: fix description of some mount options
Based on the latest code, the default value for wsize/rsize is
64MB and the default value for the mount_timeout is 60 seconds.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Xiubo Li [Wed, 29 Jan 2020 08:27:07 +0000 (03:27 -0500)]
ceph: move ceph_osdc_{read,write}pages to ceph.ko
Since these helpers are only used by ceph.ko, move them there and
rename them with _sync_ qualifiers.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 4 Feb 2020 14:37:48 +0000 (09:37 -0500)]
ceph: don't ClearPageChecked in ceph_invalidatepage()
CephFS doesn't set this bit to begin with, so there should be no need
to clear it.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Thu, 30 Jan 2020 12:54:59 +0000 (13:54 +0100)]
rbd: remove barriers from img_request_layered_{set,clear,test}()
IMG_REQ_LAYERED is set in rbd_img_request_create(), and tested and
cleared in rbd_img_request_destroy() when the image request is about to
be destroyed. The barriers are unnecessary.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ilya Dryomov [Tue, 28 Jan 2020 19:12:22 +0000 (20:12 +0100)]
libceph: drop CEPH_DEFINE_SHOW_FUNC
Although CEPH_DEFINE_SHOW_FUNC is much older, it now duplicates
DEFINE_SHOW_ATTRIBUTE from linux/seq_file.h.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Yan, Zheng [Sat, 11 May 2019 09:27:59 +0000 (17:27 +0800)]
ceph: check inode type for CEPH_CAP_FILE_{CACHE,RD,REXTEND,LAZYIO}
These bits will have new meaning for directory inodes.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Tue, 2 Apr 2019 12:04:30 +0000 (08:04 -0400)]
ceph: add refcounting for Fx caps
In future patches we'll be taking and relying on Fx caps. Add proper
refcounting for them.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Thu, 4 Apr 2019 12:05:38 +0000 (08:05 -0400)]
ceph: register MDS request with dir inode from the start
When the unsafe reply to a request comes in, the request is put on the
r_unsafe_dir inode's list. In future patches, we're going to need to
wait on requests that may not have gotten an unsafe reply yet.
Change __register_request to put the entry on the dir inode's list when
the pointer is set in the request, and don't check the
CEPH_MDS_R_GOT_UNSAFE flag when unregistering it.
The only place that uses this list today is fsync codepath, and with
the coming changes, we'll want to wait on all operations whether they have
gotten an unsafe reply or not.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Linus Torvalds [Sun, 29 Mar 2020 22:25:41 +0000 (15:25 -0700)]
Linux 5.6
Linus Torvalds [Sun, 29 Mar 2020 17:40:31 +0000 (10:40 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge vm fixes from Andrew Morton:
"5 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/sparse: fix kernel crash with pfn_section_valid check
mm: fork: fix kernel_stack memcg stats for various stack implementations
hugetlb_cgroup: fix illegal access to memory
drivers/base/memory.c: indicate all memory blocks as removable
mm/swapfile.c: move inode_lock out of claim_swapfile
Linus Torvalds [Sun, 29 Mar 2020 17:36:29 +0000 (10:36 -0700)]
Merge tag 'timers-urgent-2020-03-29' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for the Hyper-V clocksource driver to make sched clock
actually return nanoseconds and not the virtual clock value which
increments at 10e7 HZ (100ns)"
* tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/hyper-v: Make sched clock return nanoseconds correctly
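The unit conversion at issue in the timer fix can be sketched as follows,
assuming a reference counter that advances once every 100ns. The macro and
function names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for the sketch: one counter tick represents 100 ns. */
#define NS_PER_TICK 100ULL

/* Convert raw counter ticks to nanoseconds. */
static uint64_t ticks_to_ns(uint64_t ticks)
{
	return ticks * NS_PER_TICK;
}
```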
Linus Torvalds [Sun, 29 Mar 2020 17:07:00 +0000 (10:07 -0700)]
Merge tag 'irq-urgent-2020-03-29' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single bugfix to prevent reference leaks in irq affinity notifiers"
* tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Fix reference leaks on irq affinity notifiers
Aneesh Kumar K.V [Sun, 29 Mar 2020 02:17:29 +0000 (19:17 -0700)]
mm/sparse: fix kernel crash with pfn_section_valid check
Fix the crash like this:
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
section_deactivate+0x220/0x240
__remove_pages+0x118/0x170
arch_remove_memory+0x3c/0x150
memunmap_pages+0x1cc/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x2f8/0x3e0
device_release_driver_internal+0x168/0x270
unbind_store+0x130/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x68/0x80
kernfs_fop_write+0x100/0x290
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x240
ksys_write+0x7c/0x140
system_call+0x5c/0x68
The crash is due to NULL dereference at
test_bit(idx, ms->usage->subsection_map);
due to ms->usage = NULL in pfn_section_valid()
With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in
SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
depopulate_section_memmap(). This was done so that pfn_to_page() can work
correctly with a kernel config that disables SPARSEMEM_VMEMMAP. With that
config pfn_to_page does
__section_mem_map_addr(__sec) + __pfn;
where
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
	unsigned long map = section->section_mem_map;
	map &= SECTION_MAP_MASK;
	return (struct page *)map;
}
Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is
used to check the pfn validity (pfn_valid()). Since section_deactivate
releases mem_section->usage if a section is fully deactivated, a
pfn_valid() check after a subsection_deactivate causes a kernel crash.
static inline int pfn_valid(unsigned long pfn)
{
	...
	return early_section(ms) || pfn_section_valid(ms, pfn);
}
where
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
	int idx = subsection_map_index(pfn);
	return test_bit(idx, ms->usage->subsection_map);
}
Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
freed. For architectures like ppc64 where large pages are used for
vmemmap mapping (16MB), a specific vmemmap mapping can cover multiple
sections. Hence before a vmemmap mapping page can be freed, the kernel
needs to make sure there are no valid sections within that mapping.
Clearing the section valid bit before depopulate_section_memmap enables
this.
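A simplified userspace model of the ordering this relies on: the "valid"
flag is cleared before the usage pointer is dropped, so a later validity
check never dereferences NULL. The struct layouts, the flag value and all of
the *_model names are assumptions made for the sketch, not the kernel's
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTION_HAS_MEM_MAP 0x2UL /* flag value chosen for the sketch */

struct usage_model {
	unsigned long subsection_map; /* one bit per subsection */
};

struct section_model {
	unsigned long section_mem_map; /* flags live in the low bits */
	struct usage_model *usage;
};

static bool section_valid(const struct section_model *ms)
{
	return ms && (ms->section_mem_map & SECTION_HAS_MEM_MAP);
}

/* Only look at usage once the section is known to be valid. */
static bool pfn_valid_model(const struct section_model *ms, unsigned int idx)
{
	if (!section_valid(ms))
		return false;
	return (ms->usage->subsection_map >> idx) & 1UL;
}

/* Clear the valid flag first, then drop usage. */
static void deactivate_model(struct section_model *ms)
{
	ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
	ms->usage = NULL;
}
```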
[aneesh.kumar@linux.ibm.com: add comment]
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Sun, 29 Mar 2020 02:17:25 +0000 (19:17 -0700)]
mm: fork: fix kernel_stack memcg stats for various stack implementations
Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
space for task stacks can be allocated using __vmalloc_node_range(),
alloc_pages_node() and kmem_cache_alloc_node().
In the first and the second cases page->mem_cgroup pointer is set, but
in the third it's not: memcg membership of a slab page should be
determined using the memcg_from_slab_page() function, which looks at
page->slab_cache->memcg_params.memcg . In this case, using
mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
page->mem_cgroup pointer is NULL even for pages charged to a non-root
memory cgroup.
It can lead to kernel_stack per-memcg counters permanently showing 0 on
some architectures (depending on the configuration).
In order to fix it, let's introduce a mod_memcg_obj_state() helper,
which takes a pointer to a kernel object as a first argument, uses
mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls
mod_memcg_state(). This makes it possible to handle all possible configurations
(CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
spilling any memcg/kmem specifics into fork.c .
Note: This is a special version of the patch created for stable
backports. It contains code from the following two patches:
- mm: memcg/slab: introduce mem_cgroup_from_obj()
- mm: fork: fix kernel_stack memcg stats for various stack implementations
[guro@fb.com: introduce mem_cgroup_from_obj()]
Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mina Almasry [Sun, 29 Mar 2020 02:17:22 +0000 (19:17 -0700)]
hugetlb_cgroup: fix illegal access to memory
This appears to be a mistake in commit faced7e0806cf ("mm: hugetlb
controller for cgroups v2").
Essentially that commit does a hugetlb_cgroup_from_counter assuming that
page_counter_try_charge has initialized counter.
But if that has failed, then it seems it will not initialize counter, so
hugetlb_cgroup_from_counter(counter) ends up pointing to random memory,
causing kasan to complain.
The solution is to simply use 'h_cg', instead of
hugetlb_cgroup_from_counter(counter), since that is a reference to the
hugetlb_cgroup anyway. After this change kasan ceases to complain.
Fixes: faced7e0806cf ("mm: hugetlb controller for cgroups v2")
Reported-by: syzbot+cac0c4e204952cf449b1@syzkaller.appspotmail.com
Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Giuseppe Scrivano <gscrivan@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/20200313223920.124230-1-almasrymina@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Sun, 29 Mar 2020 02:17:19 +0000 (19:17 -0700)]
drivers/base/memory.c: indicate all memory blocks as removable
We see multiple issues with the implementation/interface to compute
whether a memory block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) and would like to simplify
it (remove the implementation).
1. It runs basically lockless. While this might be good for performance,
we see possible races with memory offlining that will require at
least some sort of locking to fix.
2. Nowadays, more false positives are possible. No arch-specific checks
are performed that validate if memory offlining will not be denied
right away (and such a check will require locking). For example, arm64
won't allow offlining any memory block that was added during boot -
which will imply a very high error rate. Other archs have other
constraints.
3. The interface is inherently racy. E.g., if a memory block is detected
to be removable (and was not a false positive at that time), there is
still no guarantee that offlining will actually succeed. So any
caller already has to deal with false positives.
4. It is unclear which performance benefit this interface actually
provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add
sysfs removable attribute for hotplug memory remove") mentioned
"A user-level agent must be able to identify which sections
of memory are likely to be removable before attempting the
potentially expensive operation."
However, no actual performance comparison was included.
Known users:
- lsmem: Will group memory blocks based on the "removable" property. [1]
- chmem: Indirect user. It has a RANGE mode where one can specify
removable ranges identified via lsmem to be offlined. However,
it also has a "SIZE" mode, which allows a sysadmin to skip the
manual "identify removable blocks" step. [2]
- powerpc-utils: Uses the "removable" attribute to skip some memory
blocks right away when trying to find some to offline+remove.
However, with ballooning enabled, it already skips this
information completely (because it once resulted in many false
negatives). Therefore, the implementation can deal with false
positives properly already. [3]
According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
driven from userspace via the drmgr command (powerpc-utils). Nowadays
it's managed in the kernel - including onlining/offlining of memory
blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the
affected legacy userspace handling is only active on old kernels. Only
very old versions of drmgr on a new kernel (unlikely) might execute
slower - totally acceptable.
With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
break any user space tool. We implement a very bad heuristic now.
Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
"not removable" as before.
Original discussion can be found in [4] ("[PATCH RFC v1] mm:
is_mem_section_removable() overhaul").
Other users of is_mem_section_removable() will be removed next, so that
we can remove is_mem_section_removable() completely.
[1] http://man7.org/linux/man-pages/man1/lsmem.1.html
[2] http://man7.org/linux/man-pages/man8/chmem.8.html
[3] https://github.com/ibm-power-utilities/powerpc-utils
[4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com
Also, this patch probably fixes a crash reported by Steve.
http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com
Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Nathan Fontenot <ndfont@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naohiro Aota [Sun, 29 Mar 2020 02:17:15 +0000 (19:17 -0700)]
mm/swapfile.c: move inode_lock out of claim_swapfile
claim_swapfile() currently keeps the inode locked when it succeeds or
when the file is already a swapfile (returning -EBUSY). In the other error
cases, it does not lock the inode.
This inconsistency between the lock state and the return value is quite
confusing, and it actually causes a bad unlock balance, as shown below, in
the "bad_swap" section of __do_sys_swapon().
This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile(). The inode is unlocked in
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap". Thus, error-handling code after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".
=====================================
WARNING: bad unlock balance detected!
5.5.0-rc7+ #176 Not tainted
-------------------------------------
swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550
but there are no more locks to release!
other info that might help us debug this:
no locks held by swapon/4294.
stack backtrace:
CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
Call Trace:
dump_stack+0xa1/0xea
print_unlock_imbalance_bug.cold+0x114/0x123
lock_release+0x562/0xed0
up_write+0x2d/0x490
__do_sys_swapon+0x94b/0x3550
__x64_sys_swapon+0x54/0x80
do_syscall_64+0xa4/0x4b0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f15da0a0dc7
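The lock/unlock pairing described above can be modelled in userspace. The following Python sketch is an illustration only and not the kernel code: InodeLock, claim_swapfile() and swapon() are hypothetical stand-ins, the return strings merely name the label each path would reach, and the goto labels are mimicked with try/finally.

```python
class InodeLock:
    """Toy lock that raises on an unbalanced unlock, much like lockdep warns."""

    def __init__(self):
        self.held = 0

    def lock(self):
        self.held += 1

    def unlock(self):
        if self.held == 0:
            raise RuntimeError("bad unlock balance detected")
        self.held -= 1


def claim_swapfile(fail):
    # In this model the helper never touches the inode lock,
    # so its return value says nothing about the lock state.
    if fail:
        raise OSError("claim_swapfile failed")


def swapon(lock, fail_before_lock=False, is_swapfile=False,
           fail_claim=False, fail_late=False):
    if fail_before_lock:
        return "bad_swap"              # lock never taken, nothing to unlock
    lock.lock()                        # the caller owns the lock from here on
    try:
        if is_swapfile:
            return "EBUSY"
        claim_swapfile(fail_claim)
        if fail_late:
            return "bad_swap_unlock_inode"
        return "ok"
    except OSError:
        return "bad_swap_unlock_inode"
    finally:
        lock.unlock()                  # every path after lock() unlocks exactly once
```

Because the lock is taken and released in the same function, no error path can reach the unlock without holding the lock.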
Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Qais Yousef <qais.yousef@arm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 29 Mar 2020 01:55:15 +0000 (18:55 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix memory leak in vti6, from Torsten Hilbrich.
2) Fix double free in xfrm_policy_timer, from YueHaibing.
3) NL80211_ATTR_CHANNEL_WIDTH attribute is put with wrong type, from
Johannes Berg.
4) Wrong allocation failure check in qlcnic driver, from Xu Wang.
5) Get ks8851-ml IO operations right, for real this time, from Marek
Vasut.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
r8169: fix PHY driver check on platforms w/o module softdeps
net: ks8851-ml: Fix IO operations, again
mlxsw: spectrum_mr: Fix list iteration in error path
qlcnic: Fix bad kzalloc null test
mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX
mac80211: mark station unauthorized before key removal
mac80211: Check port authorization in the ieee80211_tx_dequeue() case
cfg80211: Do not warn on same channel at the end of CSA
mac80211: drop data frames without key on encrypted links
ieee80211: fix HE SPR size calculation
nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type
xfrm: policy: Fix doulbe free in xfrm_policy_timer
bpf: Explicitly memset some bpf info structures declared on the stack
bpf: Explicitly memset the bpf_attr structure
bpf: Sanitize the bpf_struct_ops tcp-cc name
vti6: Fix memory leak of skb if input policy check fails
esp: remove the skb from the chain when it's enqueued in cryptd_wq
ipv6: xfrm6_tunnel.c: Use built-in RCU list checking
xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
xfrm: fix uctx len check in verify_sec_ctx_len
...
Linus Torvalds [Sat, 28 Mar 2020 20:11:26 +0000 (13:11 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Three more driver bugfixes, and two doc improvements fixing build
warnings while we are here"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: pca-platform: Use platform_irq_get_optional
i2c: st: fix missing struct parameter description
i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status()
i2c: fix a doc warning
i2c: hix5hd2: add missed clk_disable_unprepare in remove
Linus Torvalds [Sat, 28 Mar 2020 16:14:16 +0000 (09:14 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes: one in drivers (qla2xxx), and one in the core (sd) to
try to cope with USB enclosures that silently change reported
parameters"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Fix optimal I/O size for devices that change reported values
scsi: qla2xxx: Fix I/Os being passed down when FC device is being deleted
Chris Packham [Thu, 26 Mar 2020 22:44:22 +0000 (11:44 +1300)]
i2c: pca-platform: Use platform_irq_get_optional
The interrupt is not required, so use platform_get_irq_optional() to
avoid error messages like
i2c-pca-platform 22080000.i2c: IRQ index 0 not found
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Alain Volmat [Thu, 26 Mar 2020 21:22:43 +0000 (22:22 +0100)]
i2c: st: fix missing struct parameter description
Fix a missing struct parameter description to allow
warning free W=1 compilation.
Signed-off-by: Alain Volmat <avolmat@me.com>
Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
David S. Miller [Fri, 27 Mar 2020 23:18:51 +0000 (16:18 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2020-03-27
The following pull-request contains BPF updates for your *net* tree.
We've added 3 non-merge commits during the last 4 day(s) which contain
a total of 4 files changed, 25 insertions(+), 20 deletions(-).
The main changes are:
1) Explicitly memset the bpf_attr structure on bpf() syscall to avoid
having to rely on compiler to do so. Issues have been noticed on
some compilers with padding and other oddities where the request was
then unexpectedly rejected, from Greg Kroah-Hartman.
2) Sanitize the bpf_struct_ops TCP congestion control name in order to
avoid problematic characters such as whitespaces, from Martin KaFai Lau.
====================
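The padding issue behind the first item can be illustrated in userspace with Python's ctypes. This is a sketch under assumptions: Attr is a hypothetical two-field structure, not the real bpf_attr, and 0xaa stands in for whatever garbage happens to be on the stack.

```python
import ctypes


class Attr(ctypes.Structure):
    # Hypothetical layout: a 1-byte field followed by an 8-byte field,
    # so the natural alignment leaves padding bytes after `flag`.
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint64)]


# The bytes between the end of `flag` and the start of `value`.
PAD = slice(1, Attr.value.offset)


def fill_fields_only():
    """Assign each field on top of stale memory; padding keeps the garbage."""
    attr = Attr.from_buffer_copy(b"\xaa" * ctypes.sizeof(Attr))
    attr.flag, attr.value = 1, 2
    return bytes(attr)


def memset_then_fill():
    """Zero the whole structure first, then assign the fields."""
    attr = Attr.from_buffer_copy(b"\xaa" * ctypes.sizeof(Attr))
    ctypes.memset(ctypes.addressof(attr), 0, ctypes.sizeof(attr))
    attr.flag, attr.value = 1, 2
    return bytes(attr)
```

A consumer that insists on unused bytes being zero would reject the first variant and accept the second, even though both set the same field values.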
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 27 Mar 2020 16:33:32 +0000 (17:33 +0100)]
r8169: fix PHY driver check on platforms w/o module softdeps
On Android/x86 the module loading infrastructure can't deal with
softdeps. Therefore the check for presence of the Realtek PHY driver
module fails. mdiobus_register() will try to load the PHY driver
module, therefore move the check to after this call and explicitly
check that a dedicated PHY driver is bound to the PHY device.
Fixes: f32593773549 ("r8169: check that Realtek PHY driver module is loaded")
Reported-by: Chih-Wei Huang <cwhuang@android-x86.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Mar 2020 21:56:55 +0000 (14:56 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2020-03-27
1) Handle NETDEV_UNREGISTER for xfrm device to handle asynchronous
unregister events cleanly. From Raed Salem.
2) Fix vti6 tunnel inter address family TX through bpf_redirect().
From Nicolas Dichtel.
3) Fix length check in verify_sec_ctx_len() to avoid a
slab-out-of-bounds. From Xin Long.
4) Add a missing verify_sec_ctx_len check in xfrm_add_acquire
to avoid a possible out-of-bounds access. From Xin Long.
5) Use built-in RCU list checking of hlist_for_each_entry_rcu
to silence false lockdep warning in __xfrm6_tunnel_spi_lookup
when CONFIG_PROVE_RCU_LIST is enabled. From Madhuparna Bhowmik.
6) Fix a panic on esp offload when crypto is done asynchronously.
From Xin Long.
7) Fix a skb memory leak in an error path of vti6_rcv.
From Torsten Hilbrich.
8) Fix a race that can lead to a double free in xfrm_policy_timer.
From Xin Long.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 27 Mar 2020 21:34:03 +0000 (14:34 -0700)]
Merge branch 'parisc-5.6-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"Fix a recursive loop when running 'make ARCH=parisc defconfig'"
* 'parisc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix defconfig selection
Linus Torvalds [Fri, 27 Mar 2020 20:52:32 +0000 (13:52 -0700)]
Merge tag 'arm-soc-fixes-5.6' of git://git./linux/kernel/git/soc/soc
Pull ARM DT and driver fixes from Arnd Bergmann:
"For the devicetree files, there are a total of 20 patches, almost
entirely for 32-bit machines:
- The Allwinner/sun8i r40 SoC dtsi file contains a number of issues,
both for correctness and for style that are addressed in separate
patches. This causes most of the changed lines of the DT updates
this time.
- More Allwinner updates fixing the identification of the security
system on sun8i/A33, a recent regression of the A83t ethernet, and
a few board specific issues on the TBS-A711 machine.
- Several bug fixes for OMAP dts files, most notably fixing the
timings for the NAND flash on the Nokia N900 that regressed a while
ago after the move to configuring them from DT. Some other OMAPs
now set the correct dma limits on the L3 bus, and a regression fix
addresses lost Ethernet on dm814x
- One incorrect setting in the newly added Raspberry Pi Zero W that
may cause issues with the SD card controller.
- A missing property on the bcm2835 firmware node caused incorrect
DMA settings.
- An old bug on the oxnas platform causing spurious interrupts is
finally addressed.
- A regression on the Exynos Midas board broke the OLED panel power
supply.
- The i.MX6 phycore SoM specified the wrong voltage for the SoC, this
is now set to the values from the datasheet.
- Some 64-bit machines use a deprecated string to identify the PSCI
firmware.
There are also several small code fixes addressing mostly serious
issues:
- Fix the sunxi rsb bus access to no longer return incorrect data
when mixing 8 and 16 bit I/O.
- Fix a suspend/resume regression on the OMAP2+ lcdc from a missing
quirk in the ti-sysc driver
- Fix a NULL pointer access from a race in the fsl dpio driver
- Fix a v5.5 regression in the exynos-chipid driver that caused an
invalid error code probing the device on non-exynos platforms
- Fix an out-of-bounds access in the AMD TEE driver"
* tag 'arm-soc-fixes-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
soc: samsung: chipid: Fix return value on non-Exynos platforms
arm64: dts: Fix leftover entry-methods for PSCI
ARM: dts: exynos: Fix regulator node aliasing on Midas-based boards
ARM: dts: oxnas: Fix clear-mask property
ARM: dts: bcm283x: Fix vc4's firmware bus DMA limitations
ARM: dts: omap5: Add bus_dma_limit for L3 bus
ARM: dts: omap4-droid4: Fix lost touchscreen interrupts
ARM: dts: dra7: Add bus_dma_limit for L3 bus
ARM: bcm2835-rpi-zero-w: Add missing pinctrl name
ARM: dts: sun8i: a33: add the new SS compatible
dt-bindings: crypto: add new compatible for A33 SS
ARM: dts: sun8i: r40: Move SPI device nodes based on address order
ARM: dts: sun8i: r40: Fix register base address for SPI2 and SPI3
ARM: dts: sun8i: r40: Move AHCI device node based on address order
ARM: dts: imx6: phycore-som: fix arm and soc minimum voltage
soc: fsl: dpio: register dpio irq handlers after dpio create
tee: amdtee: out of bounds read in find_session()
ARM: dts: N900: fix onenand timings
bus: ti-sysc: Fix quirk flags for lcdc on am335x
ARM: dts: Fix dm814x Ethernet by changing to use rgmii-id mode
...
Linus Torvalds [Fri, 27 Mar 2020 18:06:10 +0000 (11:06 -0700)]
Merge tag 'riscv-for-linus-5.6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"Sorry for the last minute patches, but a few things fell through the
cracks recently. I was on the fence about sending a late pull request
just for the M-mode fixes, as we don't really have any users, but the
last patch fixes the build for Fedora which I consider pretty
important.
Given that the M-mode fixes should be very low risk, I figured it's
worth sending them along as well.
This passes my standard 'boot in QEMU' test"
* tag 'riscv-for-linus-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Move all address space definition macros to one place
RISC-V: Only select essential drivers for SOC_VIRT config
riscv: fix the IPI missing issue in nommu mode
riscv: uaccess should be used in nommu mode
Linus Torvalds [Fri, 27 Mar 2020 18:02:52 +0000 (11:02 -0700)]
Merge tag 'devicetree-fixes-for-5.6-4' of git://git./linux/kernel/git/robh/linux
Pull Devicetree fix from Rob Herring:
"A single fix for building dtc with GCC 10"
* tag 'devicetree-fixes-for-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
scripts/dtc: Remove redundant YYLOC global declaration
Linus Torvalds [Fri, 27 Mar 2020 17:50:31 +0000 (10:50 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"Fix defconfig build when using Clang's integrated assembler"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: alternative: fix build with clang integrated assembler
Linus Torvalds [Fri, 27 Mar 2020 16:33:48 +0000 (09:33 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of clk driver fixes.
Mostly they're around the i.MX drivers fixing the parents of a few
clks and making KASAN happy with how the message passing code works.
Besides that we have a TI driver fix for the RTC parent and a fix for
the basic gate type registration functions introduced this release
where they didn't actually pass the arguments in the right places to
the multiplexer function down below"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: imx: Align imx sc clock parent msg structs to 4
clk: imx: Align imx sc clock msg structs to 4
clk: Pass correct arguments to __clk_hw_register_gate()
clk: ti: am43xx: Fix clock parent for RTC clock
clk: imx8mp: Correct the enet_qos parent clock
clk: imx8mp: Correct IMX8MP_CLK_HDMI_AXI clock parent
Linus Torvalds [Fri, 27 Mar 2020 16:21:52 +0000 (09:21 -0700)]
Merge tag 'drm-fixes-2020-03-27' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Pretty quiet: some minor sg mapping fixes for 3 drivers, and a single
oops fix for the scheduler. I'm hoping nobody tries to send me a fixes
pull today but I'll keep an eye out over the weekend.
radeon/amdgpu/dma-buf:
- sg list fixes
scheduler:
- oops fix"
* tag 'drm-fixes-2020-03-27' of git://anongit.freedesktop.org/drm/drm:
drm/scheduler: fix rare NULL ptr race
drm/radeon: fix scatter-gather mapping with user pages
drm/amdgpu: fix scatter-gather mapping with user pages
drm/prime: use dma length macro when mapping sg
Helge Deller [Thu, 26 Mar 2020 22:31:43 +0000 (23:31 +0100)]
parisc: Fix defconfig selection
Fix the recursive loop when running "make ARCH=parisc defconfig".
Fixes: 84669923e1ed ("parisc: Regenerate parisc defconfigs")
Noticed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Dirk Mueller [Tue, 14 Jan 2020 17:53:41 +0000 (18:53 +0100)]
scripts/dtc: Remove redundant YYLOC global declaration
gcc 10 will default to -fno-common, which causes this error at link
time:
(.text+0x0): multiple definition of `yylloc'; dtc-lexer.lex.o (symbol from plugin):(.text+0x0): first defined here
This is because both dtc-lexer as well as dtc-parser define the same
global symbol yylloc. Before with -fcommon those were merged into one
definition. The proper solution would be to mark this as "extern",
however that leads to:
dtc-lexer.l:26:16: error: redundant redeclaration of 'yylloc' [-Werror=redundant-decls]
26 | extern YYLTYPE yylloc;
| ^~~~~~
In file included from dtc-lexer.l:24:
dtc-parser.tab.h:127:16: note: previous declaration of 'yylloc' was here
127 | extern YYLTYPE yylloc;
| ^~~~~~
cc1: all warnings being treated as errors
which means the declaration is completely redundant and can just be
dropped.
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[robh: cherry-pick from upstream]
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Yubo Xie [Fri, 27 Mar 2020 02:11:59 +0000 (19:11 -0700)]
clocksource/drivers/hyper-v: Make sched clock return nanoseconds correctly
The sched clock read functions return the HV clock (100ns granularity)
without converting it to nanoseconds.
Add the missing conversion.
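The missing conversion is a plain unit change. As a minimal sketch, assuming only what the text above states (the HV clock counts in 100ns units); hv_ticks_to_ns() is a hypothetical helper name, not a kernel function:

```python
HV_TICK_NS = 100  # one HV clock tick is 100 ns, per the description above


def hv_ticks_to_ns(ticks):
    """Convert a raw HV clock reading (100 ns units) to nanoseconds."""
    return ticks * HV_TICK_NS
```

For example, a raw reading of 10,000,000 ticks corresponds to one second, i.e. 1,000,000,000 ns; returning the raw value instead would make the sched clock run 100 times too slowly.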
Fixes: bd00cd52d5be ("clocksource/drivers/hyperv: Add Hyper-V specific sched clock function")
Signed-off-by: Yubo Xie <yuboxie@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200327021159.31429-1-Tianyu.Lan@microsoft.com
Linus Torvalds [Fri, 27 Mar 2020 03:49:44 +0000 (20:49 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix to generate proper timestamps on key autorepeat events that
were broken recently
- a fix for Synaptics driver to only activate reduced reporting mode
when explicitly requested
- a new keycode for "selective screenshot" function
- other assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: fix stale timestamp on key autorepeat events
Input: move the new KEY_SELECTIVE_SCREENSHOT keycode
Input: avoid BIT() macro usage in the serio.h UAPI header
Input: synaptics-rmi4 - set reduced reporting mode only when requested
Input: synaptics - enable RMI on HP Envy 13-ad105ng
Input: allocate keycode for "Selective Screenshot" key
Input: tm2-touchkey - add support for Coreriver TC360 variant
dt-bindings: input: add Coreriver TC360 binding
dt-bindings: vendor-prefixes: Add Coreriver vendor prefix
Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
Marek Vasut [Wed, 25 Mar 2020 14:25:47 +0000 (15:25 +0100)]
net: ks8851-ml: Fix IO operations, again
This patch reverts 58292104832f ("net: ks8851-ml: Fix 16-bit IO operation")
and edacb098ea9c ("net: ks8851-ml: Fix 16-bit data access"), because it
turns out these were only necessary due to buggy hardware. This patch adds
a check for such buggy hardware to prevent any such mistakes again.
While working further on the KS8851 driver, it came to light that the
KS8851-16MLL is capable of switching bus endianness by a hardware strap,
EESK pin. If this strap is incorrect, the IO accesses require such endian
swapping as is being reverted by this patch. Such swapping also impacts
the performance significantly.
Hence, in addition to removing it, detect that the hardware is broken,
report it to the user, and fail to bind with such hardware.
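The effect of a wrong endianness strap can be sketched in userspace. The Python below is an illustration under assumptions, not the driver's logic: EXPECTED_ID is a made-up register value and probe() is a hypothetical helper; how the real driver detects the strap setting is not described in this message.

```python
def swab16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


EXPECTED_ID = 0x1234  # hypothetical known register value, for illustration only


def probe(read_id):
    """Refuse to bind when a known register reads back byte-swapped."""
    if read_id == EXPECTED_ID:
        return "bound"
    if swab16(read_id) == EXPECTED_ID:
        return "rejected: bus endianness strap is wrong"
    return "rejected: unknown device"
```

Rejecting the device at probe time avoids paying for a swab16() on every 16-bit access, which is the per-access swapping cost the message refers to.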
Fixes: 58292104832f ("net: ks8851-ml: Fix 16-bit IO operation")
Fixes: edacb098ea9c ("net: ks8851-ml: Fix 16-bit data access")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Stetiar <ynezz@true.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Fri, 27 Mar 2020 03:03:17 +0000 (13:03 +1000)]
Merge tag 'amd-drm-fixes-5.6-2020-03-26' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.6-2020-03-26:
Scheduler:
- Fix a race condition that could result in a segfault
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200326144538.3937-1-alexander.deucher@amd.com
Dave Airlie [Fri, 27 Mar 2020 02:33:13 +0000 (12:33 +1000)]
Merge tag 'drm-misc-fixes-2020-03-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.6:
- SG fixes for prime, radeon and amdgpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ef10e822-76dd-125d-ec1f-9a78c5f76bc3@linux.intel.com
Atish Patra [Thu, 26 Mar 2020 22:55:46 +0000 (15:55 -0700)]
RISC-V: Move all address space definition macros to one place
We get the following compilation error if CONFIG_SPARSEMEM_VMEMMAP is set.
---------------------------------------------------------------
./arch/riscv/include/asm/pgtable-64.h: In function ‘pud_page’:
./include/asm-generic/memory_model.h:54:29: error: ‘vmemmap’ undeclared
(first use in this function); did you mean ‘mem_map’?
#define __pfn_to_page(pfn) (vmemmap + (pfn))
^~~~~~~
./include/asm-generic/memory_model.h:82:21: note: in expansion of
macro ‘__pfn_to_page’
#define pfn_to_page __pfn_to_page
^~~~~~~~~~~~~
./arch/riscv/include/asm/pgtable-64.h:70:9: note: in expansion of macro
‘pfn_to_page’
return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
---------------------------------------------------------------
Fix the compilation errors by moving all the address space definition
macros before including pgtable-64.h.
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Dmitry Torokhov [Wed, 25 Mar 2020 17:57:54 +0000 (10:57 -0700)]
Input: fix stale timestamp on key autorepeat events
We need to refresh the timestamp when emitting key autorepeat events,
otherwise they will carry the timestamp of the original key press event.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206929
Fixes: 3b51c44bd693 ("Input: allow drivers specify timestamp for input events")
Cc: stable@vger.kernel.org
Reported-by: teika kazura <teika@gmx.com>
Tested-by: teika kazura <teika@gmx.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
David Howells [Thu, 26 Mar 2020 15:24:07 +0000 (15:24 +0000)]
afs: Fix unpinned address list during probing
When it's probing all of a fileserver's interfaces to find which one is
best to use, afs_do_probe_fileserver() takes a lock on the server record
and notes the pointer to the address list.
It doesn't, however, pin the address list, so as soon as it drops the
lock, there's nothing to stop the address list from being freed under
us.
Fix this by taking a ref on the address list inside the locked section
and dropping it at the end of the function.
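The pinning pattern can be modelled in userspace as follows. This Python sketch is an illustration only: AddrList, Server and probe_fileserver() are hypothetical stand-ins for the afs structures, and the while_unlocked callback simulates a concurrent update that happens after the lock is dropped.

```python
import threading


class AddrList:
    """Toy refcounted address list; 'freed' once the last ref is dropped."""

    def __init__(self, addrs):
        self.addrs = list(addrs)
        self.refs = 1              # reference owned by the server record
        self.freed = False

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True
            self.addrs = None      # any later use would now fail


class Server:
    def __init__(self, alist):
        self.lock = threading.Lock()
        self.alist = alist

    def replace_addresses(self, new):
        with self.lock:
            old, self.alist = self.alist, new
        old.put()                  # drop the server's ref on the old list


def probe_fileserver(server, while_unlocked=lambda: None):
    with server.lock:
        alist = server.alist.get() # pin the list inside the locked section
    try:
        while_unlocked()           # e.g. another thread swaps the list
        return list(alist.addrs)   # still valid because of our ref
    finally:
        alist.put()                # drop the pin at the end of the function
```

Without the get() inside the locked section, the replacement in the callback would drop the last reference and the subsequent access would touch a freed list.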
Fixes: 3bf0fb6f33dd ("afs: Probe multiple fileservers simultaneously")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 26 Mar 2020 22:44:41 +0000 (15:44 -0700)]
Merge tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch for a rather old regression in fullness handling and two
memory leak fixes, marked for stable"
* tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-client:
ceph: fix memory leak in ceph_cleanup_snapid_map()
libceph: fix alloc_msg_with_page_vector() memory leaks
ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
Linus Torvalds [Thu, 26 Mar 2020 22:30:49 +0000 (15:30 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"x86 bug fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: X86: Narrow down the IPI fastpath to single target IPI
KVM: LAPIC: Also cancel preemption timer when disarm LAPIC timer
KVM: VMX: don't allow memory operands for inline asm that modifies SP
KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context
KVM: SVM: Issue WBINVD after deactivating an SEV guest
KVM: SVM: document KVM_MEM_ENCRYPT_OP, let userspace detect if SEV is available
KVM: x86: remove bogus user-triggerable WARN_ON
Linus Torvalds [Thu, 26 Mar 2020 22:12:19 +0000 (15:12 -0700)]
MAINTAINERS: fix bad file pattern
Testing 'parse-maintainers' due to the previous commit shows a bad file
pattern for the "TI VPE/CAL DRIVERS" entry in the MAINTAINERS file.
There's also a lot of mis-ordered entries, but I'm still a bit nervous
about the inevitable and annoying merge problems it would probably cause
to fix them up.
The MAINTAINERS file is one of my least favorite files due to being huge
and centralized, but fixing it is also horribly painful for that reason.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Sun, 8 Mar 2020 02:59:05 +0000 (18:59 -0800)]
parse-maintainers: Do not sort section content by default
Add an --order switch to control section reordering.
Default for --order is off.
Change the default ordering to a slightly more sensible:
M: Person acting as a maintainer
R: Person acting as a patch reviewer
L: Mailing list where patches should be sent
S: Maintenance status
W: URI for general information
Q: URI for patchwork tracking
B: URI for bug tracking/submission
C: URI for chat
P: URI or file for subsystem specific coding styles
T: SCM tree type and location
F: File and directory pattern
X: File and directory exclusion pattern
N: File glob
K: Keyword - patch content regex
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Torokhov [Thu, 26 Mar 2020 19:54:02 +0000 (12:54 -0700)]
Input: move the new KEY_SELECTIVE_SCREENSHOT keycode
We should try to keep keycodes sequential unless there is a reason to leave
a gap in numbering, so let's move it from 0x280 to 0x27a while we still
can.
Fixes: 3b059da9835c ("Input: allocate keycode for Selective Screenshot key")
Acked-by: Rajat Jain <rajatja@google.com>
Link: https://lore.kernel.org/r/20200326182711.GA259753@dtor-ws
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Anup Patel [Tue, 10 Mar 2020 11:59:25 +0000 (17:29 +0530)]
RISC-V: Only select essential drivers for SOC_VIRT config
The kconfig select causes build failures for SOC_VIRT config because
we are selecting a lot of VIRTIO drivers without selecting all required
dependencies.
A better approach is to only select essential drivers from SOC_VIRT
config option and enable required VIRTIO drivers using defconfigs.
Fixes: 759bdc168181 ("RISC-V: Add kconfig option for QEMU virt machine")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
David S. Miller [Thu, 26 Mar 2020 19:03:02 +0000 (12:03 -0700)]
Merge tag 'mac80211-for-net-2020-03-26' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
We have the following fixes:
* drop data packets if there's no key for them anymore, after
there had been one, to avoid sending them in clear when
hostapd removes the key before it removes the station and
the packets are still queued
* check port authorization again after dequeue, to avoid
sending packets if the station is no longer authorized
* actually remove the authorization flag before the key so
packets are also dropped properly because of this
* fix nl80211 control port packet tagging to handle them as
packets allowed to go out without encryption
* fix NL80211_ATTR_CHANNEL_WIDTH outgoing netlink attribute
width (should be 32 bits, not 8)
* don't WARN in a CSA scenario that happens on some APs
* fix HE spatial reuse element size calculation
====================
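Why the attribute width matters (the NL80211_ATTR_CHANNEL_WIDTH item above) can be shown with a small userspace model of the netlink attribute layout: a 4-byte header holding length and type, followed by the payload, padded to a 4-byte boundary. The Python helpers below only mimic the kernel's nla_put_u8()/nla_put_u32() naming for readability; they are not the kernel functions, and the attribute type number used in the example is arbitrary.

```python
import struct

NLA_HDRLEN = 4  # u16 nla_len + u16 nla_type


def nla_align(length):
    """Round up to the 4-byte netlink attribute alignment."""
    return (length + 3) & ~3


def nla_put(attr_type, payload):
    nla_len = NLA_HDRLEN + len(payload)      # length excludes the padding
    attr = struct.pack("=HH", nla_len, attr_type) + payload
    return attr.ljust(nla_align(nla_len), b"\0")


def nla_put_u8(attr_type, value):
    return nla_put(attr_type, struct.pack("=B", value))


def nla_put_u32(attr_type, value):
    return nla_put(attr_type, struct.pack("=I", value))


def nla_get_u32(attr):
    """A strict receiver: only accept a payload of exactly 4 bytes."""
    nla_len, _ = struct.unpack_from("=HH", attr)
    if nla_len - NLA_HDRLEN != 4:
        raise ValueError("attribute payload is not 32 bits wide")
    return struct.unpack_from("=I", attr, NLA_HDRLEN)[0]
```

Both encodings occupy 8 bytes on the wire after padding, but the length field differs (5 versus 8), so a receiver that validates the attribute as a 32-bit value rejects the 8-bit encoding.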
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 26 Mar 2020 14:17:33 +0000 (16:17 +0200)]
mlxsw: spectrum_mr: Fix list iteration in error path
list_for_each_entry_from_reverse() iterates backwards over the list from
the current position, but in the error path we should start from the
previous position.
Fix this by using list_for_each_entry_continue_reverse() instead.
This suppresses the following error from coccinelle:
drivers/net/ethernet/mellanox/mlxsw//spectrum_mr.c:655:34-38: ERROR:
invalid reference to the index variable of the iterator on line 636
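The difference between the two iterators can be modelled on a plain Python list. This is a sketch, not the kernel macros: from_reverse() and continue_reverse() are hypothetical helpers that only mirror where each walk starts, and add_all() is a made-up setup loop with a rollback path.

```python
def from_reverse(items, pos):
    """Walk backwards starting AT items[pos]."""
    return [items[i] for i in range(pos, -1, -1)]


def continue_reverse(items, pos):
    """Walk backwards starting at the entry BEFORE items[pos]."""
    return [items[i] for i in range(pos - 1, -1, -1)]


def add_all(items, fail_at=None):
    """Set up every item; on failure undo only what was actually set up."""
    done = []
    for pos, item in enumerate(items):
        if pos == fail_at:
            # items[pos] itself was never set up, so the rollback must
            # start from the previous entry.
            for undo in continue_reverse(items, pos):
                done.remove(undo)
            return done
        done.append(item)
    return done
```

With items a, b, c, d and a failure at index 2, from_reverse() would also visit c, which was never set up, while continue_reverse() visits only b and a.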
Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xu Wang [Thu, 26 Mar 2020 10:14:29 +0000 (18:14 +0800)]
qlcnic: Fix bad kzalloc null test
In qlcnic_83xx_get_reset_instruction_template, the NULL test after
kzalloc checks the wrong variable, so correct it.
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 26 Mar 2020 17:39:36 +0000 (10:39 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A small set of late-rc patches, mostly fixes for various crashers,
some syzkaller fixes and a mlx5 HW limitation:
- Several MAINTAINERS updates
- Memory leak regression in ODP
- Several fixes for syzkaller related crashes. Google recently taught
syzkaller to create the software RDMA devices
- Crash fixes for HFI1
- Several fixes for mlx5 crashes
- Prevent unprivileged access to an unsafe mlx5 HW resource"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Block delay drop to unprivileged users
RDMA/mlx5: Fix access to wrong pointer while performing flush due to error
RDMA/core: Ensure security pkey modify is not lost
MAINTAINERS: Clean RXE section and add Zhu as RXE maintainer
IB/hfi1: Ensure pq is not left on waitlist
IB/rdmavt: Free kernel completion queue when done
RDMA/mad: Do not crash if the rdma device does not have a umad interface
RDMA/core: Fix missing error check on dev_set_name()
RDMA/nl: Do not permit empty devices names during RDMA_NLDEV_CMD_NEWLINK/SET
RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
MAINTAINERS: Update maintainers for HISILICON ROCE DRIVER
RDMA/odp: Fix leaking the tgid for implicit ODP
Johannes Berg [Thu, 26 Mar 2020 14:53:34 +0000 (15:53 +0100)]
mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX
When a frame is transmitted via the nl80211 TX rather than as a
normal frame, IEEE80211_TX_CTRL_PORT_CTRL_PROTO wasn't set and
this will lead to wrong decisions (rate control etc.) being made
about the frame; fix this.
Fixes: 911806491425 ("mac80211: Add support for tx_control_port")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200326155333.f183f52b02f0.I4054e2a8c11c2ddcb795a0103c87be3538690243@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Thu, 26 Mar 2020 14:51:35 +0000 (15:51 +0100)]
mac80211: mark station unauthorized before key removal
If a station is still marked as authorized, mark it as no longer
so before removing its keys. This allows frames transmitted to it
to be rejected, providing additional protection against leaking
plain text data during the disconnection flow.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200326155133.ccb4fb0bb356.If48f0f0504efdcf16b8921f48c6d3bb2cb763c99@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jouni Malinen [Thu, 26 Mar 2020 14:51:34 +0000 (15:51 +0100)]
mac80211: Check port authorization in the ieee80211_tx_dequeue() case
mac80211 used to check port authorization in the Data frame enqueue case
when going through start_xmit(). However, that authorization status may
change while the frame is waiting in a queue. Add a similar check in the
dequeue case to avoid sending previously accepted frames after
authorization change. This provides additional protection against
potential leaking of frames after a station has been disconnected and
the keys for it are being removed.
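The idea of re-checking at dequeue time can be sketched with a toy queue. The Python below is a userspace model only: Station and TxQueue are hypothetical classes and do not correspond to mac80211 structures.

```python
from collections import deque


class Station:
    def __init__(self):
        self.authorized = True


class TxQueue:
    def __init__(self):
        self.frames = deque()

    def enqueue(self, sta, frame):
        # Check at enqueue time, as before.
        if not sta.authorized:
            return False
        self.frames.append((sta, frame))
        return True

    def dequeue(self):
        # Check again at dequeue time: the authorization state may have
        # changed while the frame was waiting in the queue.
        while self.frames:
            sta, frame = self.frames.popleft()
            if sta.authorized:
                return frame
            # otherwise drop the frame and look at the next one
        return None
```

A frame accepted while the station was authorized is dropped rather than transmitted if the station loses authorization before the frame leaves the queue.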
Cc: stable@vger.kernel.org
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200326155133.ced84317ea29.I34d4c47cd8cc8a4042b38a76f16a601fbcbfd9b3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ilan Peer [Thu, 26 Mar 2020 13:09:43 +0000 (15:09 +0200)]
cfg80211: Do not warn on same channel at the end of CSA
When cfg80211_update_assoc_bss_entry() is called, there is a
verification that the BSS channel actually changed. As some APs use
CSA also for bandwidth changes, this would result in a kernel
warning.
Fix this by removing the WARN_ON().
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200326150855.96316ada0e8d.I6710376b1b4257e5f4712fc7ab16e2b638d512aa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Thu, 26 Mar 2020 13:09:42 +0000 (15:09 +0200)]
mac80211: drop data frames without key on encrypted links
If we know that we have an encrypted link (based on having had
a key configured for TX in the past) then drop all data frames
in the key selection handler if there's no key anymore.
This fixes an issue with mac80211 internal TXQs - there we can
buffer frames for an encrypted link, but then if the key is no
longer there when they're dequeued, the frames are sent without
encryption. This happens if a station is disconnected while the
frames are still on the TXQ.
Detecting that a link should be encrypted based on a first key
having been configured for TX is fine as there are no use cases
for a connection going from encrypted to unencrypted.
With extended key IDs, however, there is a case of having a key
configured for only decryption, so we can't just trigger this
behaviour on a key being configured.
Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200326150855.6865c7f28a14.I9fb1d911b064262d33e33dfba730cdeef83926ca@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
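The rule in this commit (latch "this link is encrypted" on the first TX key, then drop data frames if no key is present later) can be sketched as follows. All `sketch_*` names and fields are hypothetical and chosen for illustration; the real mac80211 key selection handler is more involved.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_tx_result { SKETCH_TX_CONTINUE, SKETCH_TX_DROP };

/* Hypothetical per-link state; not the mac80211 layout. */
struct sketch_link {
	bool tx_key_was_set;	/* latched on the first TX key, never cleared */
	const void *tx_key;	/* current TX key, NULL once removed */
};

/*
 * A key configured only for decryption (possible with extended key IDs)
 * must not mark the link as encrypted for TX, so only TX keys latch.
 */
void sketch_install_key(struct sketch_link *link, const void *key, bool for_tx)
{
	if (!for_tx)
		return;
	link->tx_key = key;
	link->tx_key_was_set = true;
}

void sketch_remove_tx_key(struct sketch_link *link)
{
	link->tx_key = NULL;	/* the latch stays set on purpose */
}

/* Key selection: a data frame on a once-encrypted link needs a key. */
enum sketch_tx_result sketch_select_key(const struct sketch_link *link,
					bool is_data)
{
	if (link->tx_key)
		return SKETCH_TX_CONTINUE;	/* frame will be encrypted */
	if (is_data && link->tx_key_was_set)
		return SKETCH_TX_DROP;		/* would leak plaintext */
	return SKETCH_TX_CONTINUE;		/* open link or non-data frame */
}
```

The latch is deliberately one-way in this sketch, matching the commit's argument that a connection never legitimately goes from encrypted to unencrypted.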
Yintian Tao [Mon, 23 Mar 2020 11:19:37 +0000 (19:19 +0800)]
drm/scheduler: fix rare NULL ptr race
There is one corner case in dma_fence_signal_locked() that can trigger
a NULL pointer dereference, as shown below.
->dma_fence_signal
    ->dma_fence_signal_locked
        ->test_and_set_bit
At this point dma_fence_release can be triggered because the fence
refcount drops to zero.
->dma_fence_put
    ->dma_fence_release
        ->drm_sched_fence_release_scheduled
            ->call_rcu
This sets the union field “cb_list” of the finished fence to NULL,
because struct rcu_head contains two pointers and therefore overlays
struct list_head cb_list.
Therefore, hold a reference to the finished fence in drm_sched_process_job
to prevent the NULL pointer dereference while the finished fence is
being signaled.
[ 732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 732.914815] #PF: supervisor write access in kernel mode
[ 732.915731] #PF: error_code(0x0002) - not-present page
[ 732.916621] PGD 0 P4D 0
[ 732.917072] Oops: 0002 [#1] SMP PTI
[ 732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.0-rc7 #1
[ 732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
[ 732.938569] Call Trace:
[ 732.939003] <IRQ>
[ 732.939364] dma_fence_signal+0x29/0x50
[ 732.940036] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[ 732.940996] drm_sched_process_job+0x34/0xa0 [gpu_sched]
[ 732.941910] dma_fence_signal_locked+0x85/0x100
[ 732.942692] dma_fence_signal+0x29/0x50
[ 732.943457] amdgpu_fence_process+0x99/0x120 [amdgpu]
[ 732.944393] sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]
v2: hold the finished fence at drm_sched_process_job instead of
amdgpu_fence_process
v3: restore the blank line
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
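The fix pattern (take a reference before signaling, drop it afterwards) can be modelled in a single-threaded user-space sketch. The real race involves another context dropping the last reference while dma_fence_signal_locked() is running; here that is simulated by a callback invoked from inside the signal function. All `sketch_*` names are hypothetical and this is not the DRM scheduler code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence model; not struct dma_fence. */
struct sketch_fence {
	int refcount;
	bool released;			/* set when the last reference is dropped */
	bool used_after_release;	/* records the bug this sketch models */
	void (*on_signal)(struct sketch_fence *fence);
};

void sketch_fence_get(struct sketch_fence *fence)
{
	fence->refcount++;
}

void sketch_fence_put(struct sketch_fence *fence)
{
	if (--fence->refcount == 0)
		fence->released = true;	/* stands in for the release path */
}

/* Simulates the other party dropping its reference mid-signal. */
void sketch_drop_ref_during_signal(struct sketch_fence *fence)
{
	sketch_fence_put(fence);
}

void sketch_fence_signal(struct sketch_fence *fence)
{
	if (fence->on_signal)
		fence->on_signal(fence);
	/* The signal path still touches the fence after this point. */
	if (fence->released)
		fence->used_after_release = true;
}

/* The fix: pin the finished fence for the duration of the signal. */
void sketch_process_job(struct sketch_fence *finished)
{
	sketch_fence_get(finished);
	sketch_fence_signal(finished);
	sketch_fence_put(finished);
}
```

Calling `sketch_fence_signal()` directly on a fence with one reference and the dropping callback sets `used_after_release`; going through `sketch_process_job()` delays the release until the signal has finished.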
Wanpeng Li [Thu, 26 Mar 2020 02:20:01 +0000 (10:20 +0800)]
KVM: X86: Narrow down the IPI fastpath to single target IPI
The original single target IPI fastpath patch forgot to filter the
ICR destination shorthand field. Multicast IPIs are not suitable for
this feature, since waking up multiple sleeping vCPUs extends the
interrupt-disabled time. This is especially bad in over-subscribed
scenarios and when the VM has a larger number of vCPUs. Let's narrow it
down to single target IPIs.
Two VMs, each with 76 vCPUs, one running 'ebizzy -M' and the other
running cyclictest on all vCPUs: with this patch, the average cyclictest
score improves by more than 5%. (pv tlb, pv ipi and pv sched yield are
disabled during testing to avoid interference.)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585189202-1708-3-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
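A filter of the kind described above might look like the following user-space sketch. The bit positions follow the architectural layout of the low 32 bits of the ICR (delivery mode in bits 10:8, destination mode in bit 11, destination shorthand in bits 19:18); the `SKETCH_*` macro names and the function itself are hypothetical and not taken from the KVM source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local, illustrative names for fields in the low 32 bits of the ICR. */
#define SKETCH_ICR_MODE_MASK	0x00000700u	/* bits 10:8, delivery mode */
#define SKETCH_ICR_MODE_FIXED	0x00000000u
#define SKETCH_ICR_DEST_MASK	0x00000800u	/* bit 11, destination mode */
#define SKETCH_ICR_DEST_PHYS	0x00000000u
#define SKETCH_ICR_SHORT_MASK	0x000c0000u	/* bits 19:18, shorthand */
#define SKETCH_ICR_SHORT_NONE	0x00000000u

/*
 * Returns true only for a fixed-mode IPI aimed at one physical
 * destination with no shorthand. Any shorthand (self, all including
 * self, all excluding self) is rejected so that multicast IPIs fall
 * back to the slow path.
 */
bool sketch_is_single_target_ipi(uint64_t icr)
{
	uint32_t lo = (uint32_t)icr;

	return (lo & SKETCH_ICR_MODE_MASK) == SKETCH_ICR_MODE_FIXED &&
	       (lo & SKETCH_ICR_DEST_MASK) == SKETCH_ICR_DEST_PHYS &&
	       (lo & SKETCH_ICR_SHORT_MASK) == SKETCH_ICR_SHORT_NONE;
}
```

The shorthand test is the part the commit says was missing; the other two conditions are included here only to make the sketch a complete predicate.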
Leonard Crestez [Thu, 20 Feb 2020 16:29:33 +0000 (18:29 +0200)]
clk: imx: Align imx sc clock parent msg structs to 4
The imx SC API strongly assumes that messages are composed of
4-byte words, but some of our message structs have odd sizeofs.
This produces many oopses with CONFIG_KASAN=y.
Fix by marking with __aligned(4).
Fixes: 666aed2d13ee ("clk: imx: scu: add set parent support")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Link: https://lkml.kernel.org/r/aad021e432b3062c142973d09b766656eec18fde.1582216144.git.leonard.crestez@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Leonard Crestez [Thu, 20 Feb 2020 16:29:32 +0000 (18:29 +0200)]
clk: imx: Align imx sc clock msg structs to 4
The imx SC API strongly assumes that messages are composed of
4-byte words, but some of our message structs have odd sizeofs.
This produces many oopses with CONFIG_KASAN=y.
Fix by marking with __aligned(4).
Fixes: fe37b4820417 ("clk: imx: add scu clock common part")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Link: https://lkml.kernel.org/r/10e97a04980d933b2cfecb6b124bf9046b6e4f16.1582216144.git.leonard.crestez@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
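The effect of the alignment attribute in the two clk: imx commits above can be reproduced with a small GCC/Clang example. The struct layout below is hypothetical (it is not one of the real imx SCU message structs) and was chosen only because it packs to an odd size; the kernel's `__aligned(4)` is assumed to correspond to the `aligned(4)` attribute used here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte message header. */
struct sketch_hdr {
	uint8_t ver, size, svc, func;
};

/* Packed message with an odd size: 4 + 2 + 1 = 7 bytes. */
struct sketch_msg_packed {
	struct sketch_hdr hdr;
	uint16_t resource;
	uint8_t clk;
} __attribute__((packed));

/* Same members, but sizeof is rounded up to a multiple of 4. */
struct sketch_msg_aligned {
	struct sketch_hdr hdr;
	uint16_t resource;
	uint8_t clk;
} __attribute__((packed, aligned(4)));

/* Number of 4-byte words a word-based transport would copy. */
size_t sketch_words(size_t nbytes)
{
	return (nbytes + 3) / 4;
}
```

A transport that copies `sketch_words(7)` words touches 8 bytes, one past the end of the 7-byte packed struct, which is the kind of out-of-bounds access KASAN reports. With the aligned variant, `sizeof` is already 8 and the word copy stays inside the object.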
Stephen Boyd [Wed, 25 Mar 2020 02:22:57 +0000 (19:22 -0700)]
clk: Pass correct arguments to __clk_hw_register_gate()
I copy/pasted these macros and forgot to update the argument
names and where they're passed to. Fix it so that these macros make
sense.
Reported-by: Maxime Ripard <maxime@cerno.tech>
Fixes: 194efb6e2667 ("clk: gate: Add support for specifying parents via DT/pointers")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20200325022257.148244-1-sboyd@kernel.org
Tested-by: Maxime Ripard <mripard@kernel.org>
Linus Torvalds [Wed, 25 Mar 2020 20:58:05 +0000 (13:58 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix deadlock in bpf_send_signal() from Yonghong Song.
2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.
3) Add missing locking in iwlwifi mvm code, from Avraham Stern.
4) Fix MSG_WAITALL handling in rxrpc, from David Howells.
5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
Wang.
6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.
7) cls_route removes the wrong filter during change operations, from
Cong Wang.
8) Reject unrecognized request flags in ethtool netlink code, from
Michal Kubecek.
9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
Doug Berger.
10) Don't leak ct zone template in act_ct during replace, from Paul
Blakey.
11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
Blakey.
12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
Lakkireddy.
13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric
Dumazet.
14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well,
also from Eric Dumazet.
15) Restrict macsec to ethernet devices, from Willem de Bruijn.
16) Fix reference leak in some ethtool *_SET handlers, from Michal
Kubecek.
17) Fix accidental disabling of MSI for some r8169 chips, from Heiner
Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
net: ena: Add PCI shutdown handler to allow safe kexec
selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
selftests/net: add missing tests to Makefile
r8169: re-enable MSI on RTL8168c
net: phy: mdio-bcm-unimac: Fix clock handling
cxgb4/ptp: pass the sign of offset delta in FW CMD
net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
net: cbs: Fix software cbs to consider packet sending time
net/mlx5e: Do not recover from a non-fatal syndrome
net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
net/mlx5e: Enhance ICOSQ WQE info fields
net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
selftests: netfilter: add nfqueue test case
netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
netfilter: nft_fwd_netdev: validate family and chain type
netfilter: nft_set_rbtree: Detect partial overlaps on insertion
netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion
...
Linus Torvalds [Wed, 25 Mar 2020 20:52:36 +0000 (13:52 -0700)]
Merge tag 'gpio-v5.6-3' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
- One core quirk by myself to fix the .irq_disable() semantics when the
gpiolib core takes over this callback.
- The rest is an elaborate series of four patches fixing Intel laptop
ACPI wakeup quirks.
* tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
gpiolib: Fix irq_disable() semantics
David S. Miller [Wed, 25 Mar 2020 20:12:26 +0000 (13:12 -0700)]
Merge tag 'wireless-drivers-2020-03-25' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.6
Fourth, and last, set of fixes for v5.6. Just two important fixes to
iwlwifi regressions.
iwlwifi
* fix GEO_TX_POWER_LIMIT command on certain devices which caused
firmware to crash during initialisation
* add back device ids for three devices which were accidentally
removed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 25 Mar 2020 12:47:18 +0000 (13:47 +0100)]
net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
pkt->skb->tc_redirected = 1;
^~
net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
pkt->skb->tc_from_ingress = 1;
^~
To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.
This patch adds skb_set_redirected(), which sets the redirected bit
on the skbuff, specifies whether the packet was redirected from ingress,
and resets the timestamp (the timestamp reset was originally missing in
the netfilter bugfix).
Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
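The approach of guarding the redirect bits with a config symbol and exposing a single helper can be sketched as below. This is a user-space mock: `struct sketch_skb`, its field names and `SKETCH_NET_REDIRECT` are hypothetical stand-ins for the sk_buff fields and CONFIG_NET_REDIRECT. Resetting the timestamp only on the ingress path is an assumption of this sketch, not something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the config toggle; remove the define to compile the stub. */
#define SKETCH_NET_REDIRECT 1

struct sketch_skb {
	int64_t tstamp;
#ifdef SKETCH_NET_REDIRECT
	unsigned int redirected:1;	/* packet was redirected */
	unsigned int from_ingress:1;	/* redirect happened at ingress */
#endif
};

#ifdef SKETCH_NET_REDIRECT
/* One helper sets every redirect-related bit, so callers cannot miss one. */
void sketch_skb_set_redirected(struct sketch_skb *skb, bool from_ingress)
{
	skb->redirected = 1;
	skb->from_ingress = from_ingress;
	if (skb->from_ingress)
		skb->tstamp = 0;	/* assumption: reset only for ingress */
}
#else
/* Without the toggle the fields do not exist and the helper is a no-op. */
void sketch_skb_set_redirected(struct sketch_skb *skb, bool from_ingress)
{
	(void)skb;
	(void)from_ingress;
}
#endif
```

Because callers only use the helper, code such as a forwarding expression never names the bitfields directly and still builds when the toggle is off.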
Guilherme G. Piccoli [Fri, 20 Mar 2020 12:55:34 +0000 (09:55 -0300)]
net: ena: Add PCI shutdown handler to allow safe kexec
Currently ENA only provides the PCI remove() handler, used during rmmod
for example. This is not called on shutdown/kexec path; we are potentially
creating a failure scenario on kexec:
(a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
instead pci_device_shutdown() clears the master bit of the PCI device,
stopping all DMA transactions;
(b) Kexec reboot happens and the device gets enabled again, likely having
its FW with that DMA transaction buffered; then it may trigger the (now
invalid) memory operation in the new kernel, corrupting kernel memory area.
This patch aims to prevent this, by implementing a shutdown() handler
quite similar to the remove() one - the difference being the handling
of the netdev, which is unregistered on remove(), but following the
convention observed in other drivers, it's only detached on shutdown().
This prevents an odd issue in AWS Nitro instances, in which after the 2nd
kexec the next one will fail with an initrd corruption, caused by a wild
DMA write to invalid kernel memory. The lspci output for the adapter
present in my instance is:
00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
Adapter (ENA) [1d0f:ec20]
Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Acked-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 25 Mar 2020 08:41:01 +0000 (16:41 +0800)]
selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
The lib files should not be defined as TEST_PROGS, or we will run them
in run_kselftest.sh.
Also remove ethtool_lib.sh exec permission.
Fixes: 81573b18f26d ("selftests/net/forwarding: add Makefile to install tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 25 Mar 2020 08:07:01 +0000 (16:07 +0800)]
selftests/net: add missing tests to Makefile
Some tests were found to be missing from the Makefile by running:
for file in $(ls *.sh); do grep -q $file Makefile || echo $file; done
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 25 Mar 2020 17:34:02 +0000 (10:34 -0700)]
Merge tag 'zonefs-5.6-rc7' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
"A single fix from me to correctly handle the size of read-only zone
files"
* tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonfs: Fix handling of read-only zones
Shane Francis [Wed, 25 Mar 2020 09:07:41 +0000 (09:07 +0000)]
drm/radeon: fix scatter-gather mapping with user pages
Calls to dma_map_sg may return fewer segments / entries than requested
if they fall on page boundaries. The old implementation did not
support this use case.
Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206461
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206895
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1056
Signed-off-by: Shane Francis <bigbeeshane@gmail.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325090741.21957-4-bigbeeshane@gmail.com
Cc: stable@vger.kernel.org
Shane Francis [Wed, 25 Mar 2020 09:07:40 +0000 (09:07 +0000)]
drm/amdgpu: fix scatter-gather mapping with user pages
Calls to dma_map_sg may return fewer segments / entries than requested
if they fall on page boundaries. The old implementation did not
support this use case.
Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206461
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206895
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1056
Signed-off-by: Shane Francis <bigbeeshane@gmail.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325090741.21957-3-bigbeeshane@gmail.com
Cc: stable@vger.kernel.org
Shane Francis [Wed, 25 Mar 2020 09:07:39 +0000 (09:07 +0000)]
drm/prime: use dma length macro when mapping sg
As dma_map_sg can reorganize scatter-gather lists in a
way that can cause some later segments to be empty, we should
always use the sg_dma_len macro to fetch the actual length.
This could now be 0, in which case the segment does not need to be
mapped to a page or address array.
Fixes: be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206461
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206895
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1056
Signed-off-by: Shane Francis <bigbeeshane@gmail.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325090741.21957-2-bigbeeshane@gmail.com
Cc: stable@vger.kernel.org
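The point of the three scatter-gather fixes above is that, after mapping, the DMA-side length of an entry can differ from its original length and can be zero. The user-space sketch below mocks a mapped list to show the idea; `struct sketch_sg`, its fields and `sketch_sg_to_addrs()` are hypothetical and do not reproduce struct scatterlist or the DRM prime helper.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical mapped scatter-gather entry; not struct scatterlist. */
struct sketch_sg {
	uint64_t dma_addr;
	uint32_t cpu_len;	/* length before mapping */
	uint32_t dma_len;	/* length after mapping, may be 0 if merged away */
};

/*
 * Fills addrs[] with one DMA address per page and returns the number
 * written. Uses the DMA length, so entries that were merged into an
 * earlier segment (dma_len == 0) contribute nothing.
 */
size_t sketch_sg_to_addrs(const struct sketch_sg *sg, size_t nents,
			  uint64_t *addrs, size_t max_addrs)
{
	size_t n = 0;

	for (size_t i = 0; i < nents; i++) {
		uint32_t len = sg[i].dma_len;	/* not cpu_len */
		uint64_t addr = sg[i].dma_addr;

		while (len > 0 && n < max_addrs) {
			addrs[n++] = addr;
			addr += SKETCH_PAGE_SIZE;
			len = len > SKETCH_PAGE_SIZE ? len - SKETCH_PAGE_SIZE : 0;
		}
	}
	return n;
}
```

For a three-entry list in which the mapping merged the first two pages into one 8192-byte segment and left the last entry with a DMA length of 0, the function emits three addresses: two from the merged segment and one from the second entry. Using the pre-mapping lengths instead would emit an address for the empty entry as well.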