Miklos Szeredi [Wed, 7 Apr 2021 12:36:42 +0000 (14:36 +0200)]
ecryptfs: stack fileattr ops
Add stacking for the fileattr operations.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Tyler Hicks <code@tyhicks.com>
Miklos Szeredi [Wed, 7 Apr 2021 12:36:42 +0000 (14:36 +0200)]
vfs: add fileattr ops
There's a substantial amount of boilerplate in filesystems handling
FS_IOC_[GS]ETFLAGS/ FS_IOC_FS[GS]ETXATTR ioctls.
Also due to userspace buffers being involved in the ioctl API this is
difficult to stack, as shown by overlayfs issues related to these ioctls.
Introduce a new internal API named "fileattr" (fsxattr can be confused with
xattr, xflags is inappropriate, since this is more than just flags).
There's significant overlap between flags and xflags and this API handles
the conversions automatically, so filesystems may choose which one to use.
In ->fileattr_get() a hint is provided to the filesystem whether flags or
xattr are being requested by userspace, but in this series this hint is
ignored by all filesystems, since generating all the attributes is cheap.
If a filesystem doesn't implemement the fileattr API, just fall back to
f_op->ioctl(). When all filesystems are converted, the fallback can be
removed.
32bit compat ioctls are now handled by the generic code as well.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Linus Torvalds [Sun, 4 Apr 2021 21:15:36 +0000 (14:15 -0700)]
Linux 5.12-rc6
Zheyu Ma [Sat, 3 Apr 2021 06:58:36 +0000 (06:58 +0000)]
firewire: nosy: Fix a use-after-free bug in nosy_ioctl()
For each device, the nosy driver allocates a pcilynx structure.
A use-after-free might happen in the following scenario:
1. Open nosy device for the first time and call ioctl with command
NOSY_IOC_START, then a new client A will be malloced and added to
doubly linked list.
2. Open nosy device for the second time and call ioctl with command
NOSY_IOC_START, then a new client B will be malloced and added to
doubly linked list.
3. Call ioctl with command NOSY_IOC_START for client A, then client A
will be readded to the doubly linked list. Now the doubly linked
list is messed up.
4. Close the first nosy device and nosy_release will be called. In
nosy_release, client A will be unlinked and freed.
5. Close the second nosy device, and client A will be referenced,
resulting in UAF.
The root cause of this bug is that the element in the doubly linked list
is reentered into the list.
Fix this bug by adding a check before inserting a client. If a client
is already in the linked list, don't insert it.
The following KASAN report reveals it:
BUG: KASAN: use-after-free in nosy_release+0x1ea/0x210
Write of size 8 at addr
ffff888102ad7360 by task poc
CPU: 3 PID: 337 Comm: poc Not tainted 5.12.0-rc5+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
nosy_release+0x1ea/0x210
__fput+0x1e2/0x840
task_work_run+0xe8/0x180
exit_to_user_mode_prepare+0x114/0x120
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Allocated by task 337:
nosy_open+0x154/0x4d0
misc_open+0x2ec/0x410
chrdev_open+0x20d/0x5a0
do_dentry_open+0x40f/0xe80
path_openat+0x1cf9/0x37b0
do_filp_open+0x16d/0x390
do_sys_openat2+0x11d/0x360
__x64_sys_open+0xfd/0x1a0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 337:
kfree+0x8f/0x210
nosy_release+0x158/0x210
__fput+0x1e2/0x840
task_work_run+0xe8/0x180
exit_to_user_mode_prepare+0x114/0x120
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at
ffff888102ad7300 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 96 bytes inside of 128-byte region [
ffff888102ad7300,
ffff888102ad7380)
[ Modified to use 'list_empty()' inside proper lock - Linus ]
Link: https://lore.kernel.org/lkml/1617433116-5930-1-git-send-email-zheyuma97@gmail.com/
Reported-and-tested-by: 马哲宇 (Zheyu Ma) <zheyuma97@gmail.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 3 Apr 2021 22:42:45 +0000 (15:42 -0700)]
Merge tag 'for-linus' of git://github.com/openrisc/linux
Pull OpenRISC fix from Stafford Horne:
"Fix duplicate header include in Litex SOC driver"
* tag 'for-linus' of git://github.com/openrisc/linux:
soc: litex: Remove duplicated header file inclusion
Linus Torvalds [Sat, 3 Apr 2021 21:26:47 +0000 (14:26 -0700)]
Merge tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block
POull io_uring fix from Jens Axboe:
"Just fixing a silly braino in a previous patch, where we'd end up
failing to compile if CONFIG_BLOCK isn't enabled.
Not that a lot of people do that, but kernel bot spotted it and it's
probably prudent to just flush this out now before -rc6.
Sorry about that, none of my test compile configs have !CONFIG_BLOCK"
* tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block:
io_uring: fix !CONFIG_BLOCK compilation failure
Zhen Lei [Wed, 31 Mar 2021 13:06:43 +0000 (15:06 +0200)]
soc: litex: Remove duplicated header file inclusion
The header file <linux/errno.h> is already included above and can be
removed here.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Mateusz Holenko <mholenko@antmicro.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Linus Torvalds [Sat, 3 Apr 2021 19:15:01 +0000 (12:15 -0700)]
Merge tag 'gfs2-v5.12-rc2-fixes2' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"Two more gfs2 fixes"
* tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: report "already frozen/thawed" errors
gfs2: Flag a withdraw if init_threads() fails
Linus Torvalds [Sat, 3 Apr 2021 18:52:18 +0000 (11:52 -0700)]
Merge tag 'riscv-for-linus-5.12-rc6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"A handful of fixes for 5.12:
- fix a stack tracing regression related to "const register asm"
variables, which have unexpected behavior.
- ensure the value to be written by put_user() is evaluated before
enabling access to userspace memory..
- align the exception vector table correctly, so we don't rely on the
firmware's handling of unaligned accesses.
- build fix to make NUMA depend on MMU, which triggered on some
randconfigs"
* tag 'riscv-for-linus-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Make NUMA depend on MMU
riscv: remove unneeded semicolon
riscv,entry: fix misaligned base for excp_vect_table
riscv: evaluate put_user() arg before enabling user access
riscv: Drop const annotation for sp
Linus Torvalds [Sat, 3 Apr 2021 17:49:38 +0000 (10:49 -0700)]
Merge tag 'powerpc-5.12-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix a bug on pseries where spurious wakeups from H_PROD would prevent
partition migration from succeeding.
Fix oopses seen in pcpu_alloc(), caused by parallel faults of the
percpu mapping causing us to corrupt the protection key used for the
mapping, and cause a fatal key fault.
Thanks to Aneesh Kumar K.V, Murilo Opsfelder Araujo, and Nathan Lynch"
* tag 'powerpc-5.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/book3s64: Use the correct storage key value when calling H_PROTECT
powerpc/pseries/mobility: handle premature return from H_JOIN
powerpc/pseries/mobility: use struct for shared state
Linus Torvalds [Sat, 3 Apr 2021 17:42:20 +0000 (10:42 -0700)]
Merge tag 'hyperv-fixes-signed-
20210402' of git://git./linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Wei Liu:
"One fix from Lu Yunlong for a double free in hvfb_probe"
* tag 'hyperv-fixes-signed-
20210402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
video: hyperv_fb: Fix a double free in hvfb_probe
Linus Torvalds [Sat, 3 Apr 2021 17:14:47 +0000 (10:14 -0700)]
Merge tag 'driver-core-5.12-rc6' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core fix for a reported problem with differed
probing. It has been in linux-next for a while with no reported
problems"
* tag 'driver-core-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: clear deferred probe reason on probe retry
Linus Torvalds [Sat, 3 Apr 2021 17:05:16 +0000 (10:05 -0700)]
Merge tag 'char-misc-5.12-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a few small driver char/misc changes for 5.12-rc6.
Nothing major here, a few fixes for reported issues:
- interconnect fixes for problems found
- fbcon syzbot-found fix
- extcon fixes
- firmware stratix10 bugfix
- MAINTAINERS file update.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
drivers: video: fbcon: fix NULL dereference in fbcon_cursor()
mei: allow map and unmap of client dma buffer only for disconnected client
MAINTAINERS: Add linux-phy list and patchwork
interconnect: Fix kerneldoc warning
firmware: stratix10-svc: reset COMMAND_RECONFIG_FLAG_PARTIAL to 0
extcon: Fix error handling in extcon_dev_register
extcon: Add stubs for extcon_register_notifier_all() functions
interconnect: core: fix error return code of icc_link_destroy()
interconnect: qcom: msm8939: remove rpm-ids from non-RPM nodes
Linus Torvalds [Sat, 3 Apr 2021 17:03:51 +0000 (10:03 -0700)]
Merge tag 'staging-5.12-rc6' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two rtl8192e staging driver fixes for reported problems.
Both of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8192e: Change state information from u16 to u8
staging: rtl8192e: Fix incorrect source in memcpy()
Linus Torvalds [Sat, 3 Apr 2021 17:00:53 +0000 (10:00 -0700)]
Merge tag 'tty-5.12-rc6' of git://git./linux/kernel/git/gregkh/tty
Pull serial driver fix from Greg KH:
"Here is a single serial driver fix for 5.12-rc6. Is is a revert of a
change that showed up in 5.9 that has been reported to cause problems.
It has been in linux-next for a while with no reported issues"
* tag 'tty-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
soc: qcom-geni-se: Cleanup the code to remove proxy votes
Linus Torvalds [Sat, 3 Apr 2021 16:56:22 +0000 (09:56 -0700)]
Merge tag 'usb-5.12-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few small USB driver fixes for 5.12-rc6 to resolve reported
problems.
They include:
- a number of cdc-acm fixes for reported problems. It seems more
people are using this driver lately...
- dwc3 driver fixes for reported problems, and fixes for the fixes :)
- dwc2 driver fixes for reported issues.
- musb driver fix.
- new USB quirk additions.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (23 commits)
usb: dwc2: Prevent core suspend when port connection flag is 0
usb: dwc2: Fix HPRT0.PrtSusp bit setting for HiKey 960 board.
usb: musb: Fix suspend with devices connected for a64
usb: xhci-mtk: fix broken streams issue on 0.96 xHCI
usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable
usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control()
USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem
USB: cdc-acm: do not log successful probe on later errors
USB: cdc-acm: always claim data interface
USB: cdc-acm: use negation for NULL checks
USB: cdc-acm: clean up probe error labels
USB: cdc-acm: drop redundant driver-data reset
USB: cdc-acm: drop redundant driver-data assignment
USB: cdc-acm: fix use-after-free after probe failure
USB: cdc-acm: fix double free on probe failure
USB: cdc-acm: downgrade message to debug
USB: cdc-acm: untangle a circular dependency between callback and softint
cdc-acm: fix BREAK rx code path adding necessary calls
usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference
usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield
...
Linus Torvalds [Sat, 3 Apr 2021 16:07:35 +0000 (09:07 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"A single fix to iscsi for a rare race condition which can cause a
kernel panic"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: Fix race condition between login and sync thread
Jens Axboe [Sat, 3 Apr 2021 01:45:34 +0000 (19:45 -0600)]
io_uring: fix !CONFIG_BLOCK compilation failure
kernel test robot correctly pinpoints a compilation failure if
CONFIG_BLOCK isn't set:
fs/io_uring.c: In function '__io_complete_rw':
>> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration]
2509 | if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) {
| ^~~~~~~~~~~~~~~~~~~~
| io_rw_reissue
cc1: some warnings being treated as errors
Ensure that we have a stub declaration of io_rw_should_reissue() for
!CONFIG_BLOCK.
Fixes:
230d50d448ac ("io_uring: move reissue into regular IO path")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 2 Apr 2021 23:13:13 +0000 (16:13 -0700)]
Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Remove comment that never came to fruition in 22 years of development
(Christoph)
- Remove unused request flag (Christoph)
- Fix for null_blk fake timeout handling (Damien)
- Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel)
- Error propagation fix for multiple split bios (Yufen)
* tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
block: remove the unused RQF_ALLOCED flag
block: update a few comments in uapi/linux/blkpg.h
block: don't ignore REQ_NOWAIT for direct IO
null_blk: fix command timeout completion handling
block: only update parent bi_status when bio fail
Linus Torvalds [Fri, 2 Apr 2021 23:08:19 +0000 (16:08 -0700)]
Merge tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Nothing really major in here, and finally nothing really related to
signals. A few minor fixups related to the threading changes, and some
general fixes, that's it.
There's the pending gdb-get-confused-about-arch, but that's more of a
cosmetic issue, nothing that hinder use of it. And given that other
archs will likely be affected by that oddity too, better to postpone
any changes there until 5.13 imho"
* tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
io_uring: move reissue into regular IO path
io_uring: fix EIOCBQUEUED iter revert
io_uring/io-wq: protect against sprintf overflow
io_uring: don't mark S_ISBLK async work as unbounded
io_uring: drop sqd lock before handling signals for SQPOLL
io_uring: handle setup-failed ctx in kill_timeouts
io_uring: always go for cancellation spin on exec
Linus Torvalds [Fri, 2 Apr 2021 22:34:17 +0000 (15:34 -0700)]
Merge tag 'acpi-5.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI tables management issue, an issue related to the
ACPI enumeration of devices and CPU wakeup in the ACPI processor
driver.
Specifics:
- Ensure that the memory occupied by ACPI tables on x86 will always
be reserved to prevent it from being allocated for other purposes
which was possible in some cases (Rafael Wysocki).
- Fix the ACPI device enumeration code to prevent it from attempting
to evaluate the _STA control method for devices with unmet
dependencies which is likely to fail (Hans de Goede).
- Fix the handling of CPU0 wakeup in the ACPI processor driver to
prevent CPU0 online failures from occurring (Vitaly Kuznetsov)"
* tag 'acpi-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
ACPI: scan: Fix _STA getting called on devices with unmet dependencies
ACPI: tables: x86: Reserve memory occupied by ACPI tables
Linus Torvalds [Fri, 2 Apr 2021 22:17:08 +0000 (15:17 -0700)]
Merge tag 'pm-5.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a race condition and an ordering issue related to using
device links in the runtime PM framework and two kerneldoc comments in
cpufreq.
Specifics:
- Fix race condition related to the handling of supplier devices
during consumer device probe and fix the order of decrementation of
two related reference counters in the runtime PM core code handling
supplier devices (Adrian Hunter).
- Fix kerneldoc comments in cpufreq that have not been updated along
with the functions documented by them (Geert Uytterhoeven)"
* tag 'pm-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Fix race getting/putting suppliers at probe
PM: runtime: Fix ordering in pm_runtime_get_suppliers()
cpufreq: Fix scaling_{available,boost}_frequencies_show() comments
Christoph Hellwig [Fri, 2 Apr 2021 17:17:46 +0000 (19:17 +0200)]
block: remove the unused RQF_ALLOCED flag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 2 Apr 2021 17:17:31 +0000 (19:17 +0200)]
block: update a few comments in uapi/linux/blkpg.h
The big top of the file comment talk about grand plans that never
happened, so remove them to not confuse the readers. Also mark the
devname and volname fields as ignored as they were never used by the
kernel.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 2 Apr 2021 15:39:00 +0000 (08:39 -0700)]
Merge tag 'trace-v5.12-rc5-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix stack trace entry size to stop showing garbage
The macro that creates both the structure and the format displayed to
user space for the stack trace event was changed a while ago to fix
the parsing by user space tooling. But this change also modified the
structure used to store the stack trace event. It changed the caller
array field from [0] to [8].
Even though the size in the ring buffer is dynamic and can be
something other than 8 (user space knows how to handle this), the 8
extra words was not accounted for when reserving the event on the ring
buffer, and added 8 more entries, due to the calculation of
"sizeof(*entry) + nr_entries * sizeof(long)", as the sizeof(*entry)
now contains 8 entries.
The size of the caller field needs to be subtracted from the size of
the entry to create the correct allocation size"
* tag 'trace-v5.12-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix stack trace event size
Jens Axboe [Fri, 2 Apr 2021 02:41:15 +0000 (20:41 -0600)]
io_uring: move reissue into regular IO path
It's non-obvious how retry is done for block backed files, when it happens
off the kiocb done path. It also makes it tricky to deal with the iov_iter
handling.
Just mark the req as needing a reissue, and handling it from the
submission path instead. This makes it directly obvious that we're not
re-importing the iovec from userspace past the submit point, and it means
that we can just reuse our usual -EAGAIN retry path from the read/write
handling.
At some point in the future, we'll gain the ability to always reliably
return -EAGAIN through the stack. A previous attempt on the block side
didn't pan out and got reverted, hence the need to check for this
information out-of-band right now.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rafael J. Wysocki [Fri, 2 Apr 2021 14:57:56 +0000 (16:57 +0200)]
Merge branches 'acpi-tables' and 'acpi-scan'
* acpi-tables:
ACPI: tables: x86: Reserve memory occupied by ACPI tables
* acpi-scan:
ACPI: scan: Fix _STA getting called on devices with unmet dependencies
Rafael J. Wysocki [Fri, 2 Apr 2021 14:45:58 +0000 (16:45 +0200)]
Merge branch 'pm-cpufreq'
* pm-cpufreq:
cpufreq: Fix scaling_{available,boost}_frequencies_show() comments
Pavel Begunkov [Fri, 20 Nov 2020 17:10:28 +0000 (17:10 +0000)]
block: don't ignore REQ_NOWAIT for direct IO
If IOCB_NOWAIT is set on submission, then that needs to get propagated to
REQ_NOWAIT on the block side. Otherwise we completely lose this
information, and any issuer of IOCB_NOWAIT IO will potentially end up
blocking on eg request allocation on the storage side.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kefeng Wang [Tue, 30 Mar 2021 13:25:31 +0000 (21:25 +0800)]
riscv: Make NUMA depend on MMU
NUMA is useless when NOMMU, and it leads some build error,
make it depend on MMU.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Yang Li [Mon, 22 Mar 2021 08:38:36 +0000 (16:38 +0800)]
riscv: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/riscv/mm/kasan_init.c:219:2-3: Unneeded semicolon
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Zihao Yu [Wed, 17 Mar 2021 08:17:25 +0000 (16:17 +0800)]
riscv,entry: fix misaligned base for excp_vect_table
In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
base of the table is not 8-byte aligned, loading an entry in the table
will raise a misaligned exception. Although such exception will be
handled by opensbi/bbl, this still causes performance degradation.
Signed-off-by: Zihao Yu <yuzihao@ict.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Ben Dooks [Mon, 29 Mar 2021 09:57:49 +0000 (10:57 +0100)]
riscv: evaluate put_user() arg before enabling user access
The <asm/uaccess.h> header has a problem with put_user(a, ptr) if
the 'a' is not a simple variable, such as a function. This can lead
to the compiler producing code as so:
1: enable_user_access()
2: evaluate 'a' into register 'r'
3: put 'r' to 'ptr'
4: disable_user_acess()
The issue is that 'a' is now being evaluated with the user memory
protections disabled. So we try and force the evaulation by assigning
'x' to __val at the start, and hoping the compiler barriers in
enable_user_access() do the job of ordering step 2 before step 1.
This has shown up in a bug where 'a' sleeps and thus schedules out
and loses the SR_SUM flag. This isn't sufficient to fully fix, but
should reduce the window of opportunity. The first instance of this
we found is in scheudle_tail() where the code does:
$ less -N kernel/sched/core.c
4263 if (current->set_child_tid)
4264 put_user(task_pid_vnr(current), current->set_child_tid);
Here, the task_pid_vnr(current) is called within the block that has
enabled the user memory access. This can be made worse with KASAN
which makes task_pid_vnr() a rather large call with plenty of
opportunity to sleep.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
Suggested-by: Arnd Bergman <arnd@arndb.de>
--
Changes since v1:
- fixed formatting and updated the patch description with more info
Changes since v2:
- fixed commenting on __put_user() (schwab@linux-m68k.org)
Change since v3:
- fixed RFC in patch title. Should be ready to merge.
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Kefeng Wang [Wed, 17 Mar 2021 15:08:38 +0000 (23:08 +0800)]
riscv: Drop const annotation for sp
The const annotation should not be used for 'sp', or it will
become read only and lead to bad stack output.
Fixes:
dec822771b01 ("riscv: stacktrace: Move register keyword to beginning of declaration")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Linus Torvalds [Fri, 2 Apr 2021 00:57:43 +0000 (17:57 -0700)]
Merge tag 'lto-v5.12-rc6' of git://git./linux/kernel/git/kees/linux
Pull LTO fix from Kees Cook:
"It seems that there is a bug in ld.bfd when doing module section
merging.
As explicit merging is only needed for LTO, the work-around is to only
do it under LTO, leaving the original section layout choices alone
under normal builds:
- Only perform explicit module section merges under LTO (Sean
Christopherson)"
* tag 'lto-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled
Sean Christopherson [Mon, 22 Mar 2021 23:44:38 +0000 (16:44 -0700)]
kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled
Merge module sections only when using Clang LTO. With ld.bfd, merging
sections does not appear to update the symbol tables for the module,
e.g. 'readelf -s' shows the value that a symbol would have had, if
sections were not merged. ld.lld does not show this problem.
The stale symbol table breaks gdb's function disassembler, and presumably
other things, e.g.
gdb -batch -ex "file arch/x86/kvm/kvm.ko" -ex "disassemble kvm_init"
reads the wrong bytes and dumps garbage.
Fixes:
dd2776222abb ("kbuild: lto: merge module sections")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210322234438.502582-1-seanjc@google.com
Linus Torvalds [Thu, 1 Apr 2021 19:42:55 +0000 (12:42 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"It's a bit larger than I (and probably you) would like by the time we
get to -rc6, but perhaps not entirely unexpected since the changes in
the last merge window were larger than usual.
x86:
- Fixes for missing TLB flushes with TDP MMU
- Fixes for race conditions in nested SVM
- Fixes for lockdep splat with Xen emulation
- Fix for kvmclock underflow
- Fix srcdir != builddir builds
- Other small cleanups
ARM:
- Fix GICv3 MMIO compatibility probing
- Prevent guests from using the ARMv8.4 self-hosted tracing
extension"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)
KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()
KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken
KVM: x86: reduce pvclock_gtod_sync_lock critical sections
KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
KVM: SVM: load control fields from VMCB12 before checking them
KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
KVM: make: Fix out-of-source module builds
selftests: kvm: make hardware_disable_test less verbose
KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE
KVM: x86: remove unused declaration of kvm_write_tsc()
KVM: clean up the unused argument
tools/kvm_stat: Add restart delay
KVM: arm64: Fix CPU interface MMIO compatibility detection
KVM: arm64: Disable guest access to trace filter controls
KVM: arm64: Hide system instruction access to Trace registers
Linus Torvalds [Thu, 1 Apr 2021 19:19:03 +0000 (12:19 -0700)]
Merge tag 'drm-fixes-2021-04-02' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Things have settled down in time for Easter, a random smattering of
small fixes across a few drivers.
I'm guessing though there might be some i915 and misc fixes out there
I haven't gotten yet, but since today is a public holiday here, I'm
sending this early so I can have the day off, I'll see if more
requests come in and decide what to do with them later.
amdgpu:
- Polaris idle power fix
- VM fix
- Vangogh S3 fix
- Fixes for non-4K page sizes
amdkfd:
- dqm fence memory corruption fix
tegra:
- lockdep warning fix
- runtine PM reference fix
- display controller fix
- PLL Fix
imx:
- memory leak in error path fix
- LDB driver channel registration fix
- oob array warning in LDB driver
exynos
- unused header file removal"
* tag 'drm-fixes-2021-04-02' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: check alignment on CPU page for bo map
drm/amdgpu: Set a suitable dev_info.gart_page_size
drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend
drm/amdkfd: dqm fence memory corruption
drm/tegra: sor: Grab runtime PM reference across reset
drm/tegra: dc: Restore coupling of display controllers
gpu: host1x: Use different lock classes for each client
drm/tegra: dc: Don't set PLL clock to 0Hz
drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
drm/amd/pm: no need to force MCLK to highest when no display connected
drm/exynos/decon5433: Remove the unused include statements
drm/imx: imx-ldb: fix out of bounds array access warning
drm/imx: imx-ldb: Register LDB channel1 when it is the only channel to be used
drm/imx: fix memory leak when fails to init
Dave Airlie [Thu, 1 Apr 2021 18:52:45 +0000 (04:52 +1000)]
Merge tag 'imx-drm-fixes-2021-04-01' of git://git.pengutronix.de/git/pza/linux into drm-fixes
drm/imx: imx-drm-core and imx-ldb fixes
Fix a memory leak in an error path during DRM device initialization,
fix the LDB driver to register channel 1 even if channel 0 is unused,
and fix an out of bounds array access warning in the LDB driver.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401092235.GA13586@pengutronix.de
Dave Airlie [Thu, 1 Apr 2021 18:44:28 +0000 (04:44 +1000)]
Merge tag 'drm/tegra/for-5.12-rc6' of ssh://git.freedesktop.org/git/tegra/linux into drm-fixes
drm/tegra: Fixes for v5.12-rc6
This contains a couple of fixes for various issues such as lockdep
warnings, runtime PM references, coupled display controllers and
misconfigured PLLs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401163352.3348296-1-thierry.reding@gmail.com
Steven Rostedt (VMware) [Thu, 1 Apr 2021 17:54:40 +0000 (13:54 -0400)]
tracing: Fix stack trace event size
Commit
cbc3b92ce037 fixed an issue to modify the macros of the stack trace
event so that user space could parse it properly. Originally the stack
trace format to user space showed that the called stack was a dynamic
array. But it is not actually a dynamic array, in the way that other
dynamic event arrays worked, and this broke user space parsing for it. The
update was to make the array look to have 8 entries in it. Helper
functions were added to make it parse it correctly, as the stack was
dynamic, but was determined by the size of the event stored.
Although this fixed user space on how it read the event, it changed the
internal structure used for the stack trace event. It changed the array
size from [0] to [8] (added 8 entries). This increased the size of the
stack trace event by 8 words. The size reserved on the ring buffer was the
size of the stack trace event plus the number of stack entries found in
the stack trace. That commit caused the amount to be 8 more than what was
needed because it did not expect the caller field to have any size. This
produced 8 entries of garbage (and reading random data) from the stack
trace event:
<idle>-0 [002] d... 1976396.837549: <stack trace>
=> trace_event_raw_event_sched_switch
=> __traceiter_sched_switch
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> secondary_startup_64_no_verify
=> 0xc8c5e150ffff93de
=> 0xffff93de
=> 0
=> 0
=> 0xc8c5e17800000000
=> 0x1f30affff93de
=> 0x00000004
=> 0x200000000
Instead, subtract the size of the caller field from the size of the event
to make sure that only the amount needed to store the stack trace is
reserved.
Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hours/
Cc: stable@vger.kernel.org
Fixes:
cbc3b92ce037 ("tracing: Set kernel_stack's caller size properly")
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Thu, 1 Apr 2021 17:09:31 +0000 (10:09 -0700)]
Merge tag 'sound-5.12-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Things seem calming down, only usual device-specific fixes for
HD-audio and USB-audio at this time"
* tag 'sound-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8
ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks
ALSA: hda: Re-add dropped snd_poewr_change_state() calls
ALSA: usb-audio: Apply sample rate quirk to Logitech Connect
ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook
ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO
Linus Torvalds [Thu, 1 Apr 2021 16:39:51 +0000 (09:39 -0700)]
Merge tag 'tomoyo-pr-
20210401' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1
Pull tomory fix from Tetsuo Handa:
"An update on 'tomoyo: recognize kernel threads correctly' from Jens
Axboe to not special case PF_IO_WORKER for PF_KTHREAD"
* tag 'tomoyo-pr-
20210401' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: don't special case PF_IO_WORKER for PF_KTHREAD
Linus Torvalds [Thu, 1 Apr 2021 16:32:18 +0000 (09:32 -0700)]
Merge tag 'xarray-5.12' of git://git.infradead.org/users/willy/xarray
Pull XArray fixes from Matthew Wilcox:
"My apologies for the lateness of this. I had a bug reported in the
test suite, and when I started working on it, I realised I had two
fixes sitting in the xarray tree since last November. Anyway,
everything here is fixes, apart from adding xa_limit_16b. The test
suite passes.
Summary:
- Fix a bug when splitting to a non-zero order
- Documentation fix
- Add a predefined 16-bit allocation limit
- Various test suite fixes"
* tag 'xarray-5.12' of git://git.infradead.org/users/willy/xarray:
idr test suite: Improve reporting from idr_find_test_1
idr test suite: Create anchor before launching throbber
idr test suite: Take RCU read lock in idr_find_test_1
radix tree test suite: Register the main thread with the RCU library
radix tree test suite: Fix compilation
XArray: Add xa_limit_16b
XArray: Fix splitting to non-zero orders
XArray: Fix split documentation
Pavel Begunkov [Thu, 1 Apr 2021 11:18:48 +0000 (12:18 +0100)]
io_uring: fix EIOCBQUEUED iter revert
iov_iter_revert() is done in completion handlers that happensf before
read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards.
Moreover, even though it may appear being just a no-op, it's actually
races with 1) user forging a new iovec of a different size 2) reissue,
that is done via io-wq continues completely asynchronously.
Fixes:
3e6a0d3c7571c ("io_uring: fix -EAGAIN retry with IOPOLL")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 1 Apr 2021 08:55:04 +0000 (09:55 +0100)]
io_uring/io-wq: protect against sprintf overflow
task_pid may be large enough to not fit into the left space of
TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care
about uniqueness, so replace it with safer snprintf().
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 1 Apr 2021 14:38:34 +0000 (08:38 -0600)]
io_uring: don't mark S_ISBLK async work as unbounded
S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Wed, 31 Mar 2021 22:52:44 +0000 (07:52 +0900)]
null_blk: fix command timeout completion handling
Memory backed or zoned null block devices may generate actual request
timeout errors due to the submission path being blocked on memory
allocation or zone locking. Unlike fake timeouts or injected timeouts,
the request submission path will call blk_mq_complete_request() or
blk_mq_end_request() for these real timeout errors, causing a double
completion and use after free situation as the block layer timeout
handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in
blk_mq_check_expired(). This problem often triggers a NULL pointer
dereference such as:
BUG: kernel NULL pointer dereference, address:
0000000000000050
RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20
...
Call Trace:
dd_finish_request+0x56/0x80
blk_mq_free_request+0x37/0x130
null_handle_cmd+0xbf/0x250 [null_blk]
? null_queue_rq+0x67/0xd0 [null_blk]
blk_mq_dispatch_rq_list+0x122/0x850
__blk_mq_do_dispatch_sched+0xbb/0x2c0
__blk_mq_sched_dispatch_requests+0x13d/0x190
blk_mq_sched_dispatch_requests+0x30/0x60
__blk_mq_run_hw_queue+0x49/0x90
process_one_work+0x26c/0x580
worker_thread+0x55/0x3c0
? process_one_work+0x580/0x580
kthread+0x134/0x150
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x1f/0x30
This problem very often triggers when running the full btrfs xfstests
on a memory-backed zoned null block device in a VM with limited amount
of memory.
Avoid this by executing blk_mq_complete_request() in null_timeout_rq()
only for commands that are marked for a fake timeout completion using
the fake_timeout boolean in struct null_cmd. For timeout errors injected
through debugfs, the timeout handler will execute
blk_mq_complete_request()i as before. This is safe as the submission
path does not execute complete requests in this case.
In null_timeout_rq(), also make sure to set the command error field to
BLK_STS_TIMEOUT and to propagate this error through to the request
completion.
Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Matthew Wilcox (Oracle) [Thu, 1 Apr 2021 11:50:42 +0000 (07:50 -0400)]
idr test suite: Improve reporting from idr_find_test_1
Instead of just reporting an assertion failure, report enough information
that we can start diagnosing exactly went wrong.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 1 Apr 2021 11:46:49 +0000 (07:46 -0400)]
idr test suite: Create anchor before launching throbber
The throbber could race with creation of the anchor entry and cause the
IDR to have zero entries in it, which would cause the test to fail.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 1 Apr 2021 11:44:48 +0000 (07:44 -0400)]
idr test suite: Take RCU read lock in idr_find_test_1
When run on a single CPU, this test would frequently access already-freed
memory. Due to timing, this bug never showed up on multi-CPU tests.
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Wed, 31 Mar 2021 18:59:19 +0000 (14:59 -0400)]
radix tree test suite: Register the main thread with the RCU library
Several test runners register individual worker threads with the
RCU library, but neglect to register the main thread, which can lead
to objects being freed while the main thread is in what appears to be
an RCU critical section.
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Vitaly Kuznetsov [Wed, 24 Mar 2021 15:22:19 +0000 (16:22 +0100)]
ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
Commit
496121c02127 ("ACPI: processor: idle: Allow probing on platforms
with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g.
I'm observing the following on AWS Nitro (e.g r5b.xlarge but other
instance types are affected as well):
# echo 0 > /sys/devices/system/cpu/cpu0/online
# echo 1 > /sys/devices/system/cpu/cpu0/online
<10 seconds delay>
-bash: echo: write error: Input/output error
In fact, the above mentioned commit only revealed the problem and did
not introduce it. On x86, to wakeup CPU an NMI is being used and
hlt_play_dead()/mwait_play_dead() loops are prepared to handle it:
/*
* If NMI wants to wake up CPU0, start CPU0.
*/
if (wakeup_cpu0())
start_cpu0();
cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on
systems where it wasn't called before the above mentioned commit) serves
the same purpose but it doesn't have a path for CPU0. What happens now on
wakeup is:
- NMI is sent to CPU0
- wakeup_cpu0_nmi() works as expected
- we get back to while (1) loop in acpi_idle_play_dead()
- safe_halt() puts CPU0 to sleep again.
The straightforward/minimal fix is add the special handling for CPU0 on x86
and that's what the patch is doing.
Fixes:
496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Vitaly Kuznetsov [Fri, 26 Mar 2021 15:55:51 +0000 (16:55 +0100)]
selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)
Add a test for the issue when KVM_SET_CLOCK(0) call could cause
TSC page value to go very big because of a signedness issue around
hv_clock->system_time.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20210326155551.17446-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Wed, 31 Mar 2021 12:41:29 +0000 (14:41 +0200)]
KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()
When guest time is reset with KVM_SET_CLOCK(0), it is possible for
'hv_clock->system_time' to become a small negative number. This happens
because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based
on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
kvm_guest_time_update() does (masterclock in use case):
hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;
And 'master_kernel_ns' represents the last time when masterclock
got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a
problem, the difference is very small, e.g. I'm observing
hv_clock.system_time = -70 ns. The issue comes from the fact that
'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in
compute_tsc_page_parameters() becomes a very big number.
Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in
use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from
going negative.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20210331124130.337992-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 25 Mar 2021 18:11:14 +0000 (14:11 -0400)]
KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken
pvclock_gtod_sync_lock can be taken with interrupts disabled if the
preempt notifier calls get_kvmclock_ns to update the Xen
runstate information:
spin_lock include/linux/spinlock.h:354 [inline]
get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587
kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69
kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100
kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline]
kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062
So change the users of the spinlock to spin_lock_irqsave and
spin_unlock_irqrestore.
Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes:
30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information")
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 25 Mar 2021 18:05:11 +0000 (14:05 -0400)]
KVM: x86: reduce pvclock_gtod_sync_lock critical sections
There is no need to include changes to vcpu->requests into
the pvclock_gtod_sync_lock critical section. The changes to
the shared data structures (in pvclock_update_vm_gtod_copy)
already occur under the lock.
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 31 Mar 2021 11:50:54 +0000 (07:50 -0400)]
Merge branch 'kvm-fix-svm-races' into kvm-master
Paolo Bonzini [Wed, 31 Mar 2021 10:28:01 +0000 (06:28 -0400)]
KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by avoiding that EFER.SVME is cleared. This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.
Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).
Cc: stable@vger.kernel.org
Fixes:
2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 31 Mar 2021 10:24:43 +0000 (06:24 -0400)]
KVM: SVM: load control fields from VMCB12 before checking them
Avoid races between check and use of the nested VMCB controls. This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host. Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.
This bug is CVE-2021-29657.
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@vger.kernel.org
Fixes:
2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dave Airlie [Thu, 1 Apr 2021 05:04:58 +0000 (15:04 +1000)]
Merge tag 'amd-drm-fixes-5.12-2021-03-31' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.12-2021-03-31:
amdgpu:
- Polaris idle power fix
- VM fix
- Vangogh S3 fix
- Fixes for non-4K page sizes
amdkfd:
- dqm fence memory corruption fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210401020057.17831-1-alexander.deucher@amd.com
Dave Airlie [Thu, 1 Apr 2021 03:30:49 +0000 (13:30 +1000)]
Merge tag 'exynos-drm-fixes-for-v5.12-rc6' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
Just one cleanup which drops of_gpio.h inclusion.
- This header file isn't used anymore so drop it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1617016858-14081-1-git-send-email-inki.dae@samsung.com
Xℹ Ruoyao [Tue, 30 Mar 2021 15:33:34 +0000 (23:33 +0800)]
drm/amdgpu: check alignment on CPU page for bo map
The page table of AMDGPU requires an alignment to CPU page so we should
check ioctl parameters for it. Return -EINVAL if some parameter is
unaligned to CPU page, instead of corrupt the page table sliently.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Huacai Chen [Tue, 30 Mar 2021 15:33:33 +0000 (23:33 +0800)]
drm/amdgpu: Set a suitable dev_info.gart_page_size
In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU
driver requires an alignment on CPU pages. So, for non-4KB page system,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).
Signed-off-by: Rui Wang <wangr@lemote.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1
[Xi: rebased for drm-next, use max_t for checkpatch,
and reworded commit message.]
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Alex Deucher [Fri, 26 Mar 2021 20:56:07 +0000 (16:56 -0400)]
drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend
Do the same thing we do for Renoir. We can check, but since
the sbios has started DPM, it will always return true which
causes the driver to skip some of the SMU init when it shouldn't.
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Qu Huang [Thu, 28 Jan 2021 12:14:25 +0000 (20:14 +0800)]
drm/amdkfd: dqm fence memory corruption
Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.
Changes since v1:
* Change dqm->fence_addr as a u64 pointer to fix this issue,
also fix up query_status and amdkfd_fence_wait_timeout function
uses 64 bit fence value to make them consistent.
Signed-off-by: Qu Huang <jinsdb@126.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Yufen Yu [Wed, 31 Mar 2021 11:53:59 +0000 (07:53 -0400)]
block: only update parent bi_status when bio fail
For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:
split bio(READ) kworker
nvme_complete_rq
blk_update_request //split error=0
bio_endio
bio_integrity_endio
queue_work(kintegrityd_wq, &bip->bip_work);
bio_integrity_verify_fn
bio_endio //split bio
__bio_chain_endio
if (!parent->bi_status)
<interrupt entry>
nvme_irq
blk_update_request //parent error=7
req_bio_endio
bio->bi_status = 7 //parent bio
<interrupt exit>
parent->bi_status = 0
parent->bi_end_io() // return bi_status=0
The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.
In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Wed, 31 Mar 2021 17:14:55 +0000 (10:14 -0700)]
Merge tag 'trace-v5.12-rc5' of git://git./linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"Add check of order < 0 before calling free_pages()
The function addresses that are traced by ftrace are stored in pages,
and the size is held in a variable. If there's some error in creating
them, the allocate ones will be freed. In this case, it is possible
that the order of pages to be freed may end up being negative due to a
size of zero passed to get_count_order(), and then that negative
number will cause free_pages() to free a very large section.
Make sure that does not happen"
* tag 'trace-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Check if pages were allocated before calling free_pages()
Linus Torvalds [Wed, 31 Mar 2021 17:09:44 +0000 (10:09 -0700)]
Merge tag 'pinctrl-v5.12-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some overly ripe fixes for the v5.12 kernel. I should have sent
earlier but had my head stuck in GDB.
All are driver fixes:
- Fix up some Intel GPIO base calculations.
- Fix a register offset in the Microchip driver.
- Fix suspend/resume bug in the Rockchip driver.
- Default pull up strength in the Qualcomm LPASS driver.
- Fix two pingroup offsets in the Qualcomm SC7280 driver.
- Fix SDC1 register offset in the Qualcomm SC7280 driver.
- Fix a nasty string concatenation in the Qualcomm SDX55 driver.
- Check the REVID register to see if the device is real or
virtualized during virtualization in the Intel driver"
* tag 'pinctrl-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: intel: check REVID register value for device presence
pinctrl: qcom: fix unintentional string concatenation
pinctrl: qcom: sc7280: Fix SDC1_RCLK configurations
pinctrl: qcom: sc7280: Fix SDC_QDSD_PINGROUP and UFS_RESET offsets
pinctrl: qcom: lpass lpi: use default pullup/strength values
pinctrl: rockchip: fix restore error in resume
pinctrl: microchip-sgpio: Fix wrong register offset for IRQ trigger
pinctrl: intel: Show the GPIO base calculation explicitly
Paolo Bonzini [Wed, 31 Mar 2021 11:45:19 +0000 (07:45 -0400)]
Merge commit 'kvm-tdp-fix-flushes' into kvm-master
Tetsuo Handa [Sun, 21 Mar 2021 14:37:49 +0000 (23:37 +0900)]
reiserfs: update reiserfs_xattrs_initialized() condition
syzbot is reporting NULL pointer dereference at reiserfs_security_init()
[1], for commit
ab17c4f02156c4f7 ("reiserfs: fixup xattr_root caching")
is assuming that REISERFS_SB(s)->xattr_root != NULL in
reiserfs_xattr_jcreate_nblocks() despite that commit made
REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL
case possible.
I guess that commit
6cb4aff0a77cc0e6 ("reiserfs: fix oops while creating
privroot with selinux enabled") wanted to check xattr_root != NULL
before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking
about the xattr root.
The issue is that while creating the privroot during mount
reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
dereferences the xattr root. The xattr root doesn't exist, so we get
an oops.
Therefore, update reiserfs_xattrs_initialized() to check both the
privroot and the xattr root.
Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde
Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes:
6cb4aff0a77c ("reiserfs: fix oops while creating privroot with selinux enabled")
Acked-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Mon, 29 Mar 2021 12:52:44 +0000 (06:52 -0600)]
io_uring: drop sqd lock before handling signals for SQPOLL
Don't call into get_signal() with the sqd mutex held, it'll fail if we're
freezing the task and we'll get complaints on locks still being held:
====================================
WARNING: iou-sqp-8386/8387 still has locks held!
5.12.0-rc4-syzkaller #0 Not tainted
------------------------------------
1 lock held by iou-sqp-8386/8387:
#0:
ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731
stack backtrace:
CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
try_to_freeze include/linux/freezer.h:66 [inline]
get_signal+0x171a/0x2150 kernel/signal.c:2576
io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748
Fold the get_signal() case in with the parking checks, as we need to drop
the lock in both cases, and since we need to be checking for parking when
juggling the lock anyway.
Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com
Fixes:
dbe1bdbb39db ("io_uring: handle signals for IO threads like a normal thread")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hans de Goede [Tue, 30 Mar 2021 18:49:32 +0000 (20:49 +0200)]
ACPI: scan: Fix _STA getting called on devices with unmet dependencies
Commit
71da201f38df ("ACPI: scan: Defer enumeration of devices with
_DEP lists") dropped the following 2 lines from acpi_init_device_object():
/* Assume there are unmet deps until acpi_device_dep_initialize() runs */
device->dep_unmet = 1;
Leaving the initial value of dep_unmet at the 0 from the kzalloc(). This
causes the acpi_bus_get_status() call in acpi_add_single_object() to
actually call _STA, even though there maybe unmet deps, leading to errors
like these:
[ 0.123579] ACPI Error: No handler for Region [ECRM] (
00000000ba9edc4c)
[GenericSerialBus] (
20170831/evregion-166)
[ 0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler
(
20170831/exfldio-299)
[ 0.123618] ACPI Error: Method parse/execution failed
\_SB.I2C1.BAT1._STA, AE_NOT_EXIST (
20170831/psparse-550)
Fix this by re-adding the dep_unmet = 1 initialization to
acpi_init_device_object() and modifying acpi_bus_check_add() to make sure
that dep_unmet always gets setup there, overriding the initial 1 value.
This re-fixes the issue initially fixed by
commit
63347db0affa ("ACPI / scan: Use acpi_bus_get_status() to initialize
ACPI_TYPE_DEVICE devs"), which introduced the removed
"device->dep_unmet = 1;" statement.
This issue was noticed; and the fix tested on a Dell Venue 10 Pro 5055.
Fixes:
71da201f38df ("ACPI: scan: Defer enumeration of devices with _DEP lists")
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: 5.11+ <stable@vger.kernel.org> # 5.11+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Tue, 30 Mar 2021 17:54:22 +0000 (10:54 -0700)]
Merge tag 's390-5.12-5' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- fix incorrect initialization and update of vdso data pages, which
results in incorrect tod clock steering, and that
clock_gettime(CLOCK_MONOTONIC_RAW, ...) returns incorrect values.
- update MAINTAINERS for s390 vfio drivers
* tag 's390-5.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
MAINTAINERS: add backups for s390 vfio drivers
s390/vdso: fix initializing and updating of vdso_data
s390/vdso: fix tod_steering_delta type
s390/vdso: copy tod_steering_delta value to vdso_data page
Thierry Reding [Fri, 19 Mar 2021 13:17:22 +0000 (14:17 +0100)]
drm/tegra: sor: Grab runtime PM reference across reset
The SOR resets are exclusively shared with the SOR power domain. This
means that exclusive access can only be granted temporarily and in order
for that to work, a rigorous sequence must be observed. To ensure that a
single consumer gets exclusive access to a reset, each consumer must
implement a rigorous protocol using the reset_control_acquire() and
reset_control_release() functions.
However, these functions alone don't provide any guarantees at the
system level. Drivers need to ensure that the only a single consumer has
access to the reset at the same time. In order for the SOR to be able to
exclusively access its reset, it must therefore ensure that the SOR
power domain is not powered off by holding on to a runtime PM reference
to that power domain across the reset assert/deassert operation.
This used to work fine by accident, but was revealed when recently more
devices started to rely on the SOR power domain.
Fixes:
11c632e1cfd3 ("drm/tegra: sor: Implement acquire/release for reset")
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Matthew Wilcox (Oracle) [Tue, 30 Mar 2021 17:44:35 +0000 (13:44 -0400)]
radix tree test suite: Fix compilation
Commit
4bba4c4bb09a added tools/include/linux/compiler_types.h which
includes linux/compiler-gcc.h. Unfortunately, we had our own (empty)
compiler_types.h which overrode the one added by that commit, and
so we lost the definition of __must_be_array(). Removing our empty
compiler_types.h fixes the problem and reduces our divergence from the
rest of the tools.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Tue, 30 Mar 2021 17:40:27 +0000 (13:40 -0400)]
XArray: Add xa_limit_16b
A 16-bit limit is a more common limit than I had realised. Make it
generally available.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Thu, 19 Nov 2020 13:32:31 +0000 (08:32 -0500)]
XArray: Fix splitting to non-zero orders
Splitting an order-4 entry into order-2 entries would leave the array
containing pointers to
000040008000c000 instead of
000044448888cccc.
This is a one-character fix, but enhance the test suite to check this
case.
Reported-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Sat, 10 Oct 2020 15:17:44 +0000 (11:17 -0400)]
XArray: Fix split documentation
I wrote the documentation backwards; the new order of the entry is stored
in the xas and the caller passes the old entry.
Reported-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Thierry Reding [Fri, 19 Mar 2021 07:06:37 +0000 (08:06 +0100)]
drm/tegra: dc: Restore coupling of display controllers
Coupling of display controllers used to rely on runtime PM to take the
companion controller out of reset. Commit
fd67e9c6ed5a ("drm/tegra: Do
not implement runtime PM") accidentally broke this when runtime PM was
removed.
Restore this functionality by reusing the hierarchical host1x client
suspend/resume infrastructure that's similar to runtime PM and which
perfectly fits this use-case.
Fixes:
fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Paul Fertser <fercerpav@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Mikko Perttunen [Mon, 29 Mar 2021 13:38:27 +0000 (16:38 +0300)]
gpu: host1x: Use different lock classes for each client
To avoid false lockdep warnings, give each client lock a different
lock class, passed from the initialization site by macro.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Dmitry Osipenko [Tue, 2 Mar 2021 13:15:06 +0000 (16:15 +0300)]
drm/tegra: dc: Don't set PLL clock to 0Hz
RGB output doesn't allow to change parent clock rate of the display and
PCLK rate is set to 0Hz in this case. The tegra_dc_commit_state() shall
not set the display clock to 0Hz since this change propagates to the
parent clock. The DISP clock is defined as a NODIV clock by the tegra-clk
driver and all NODIV clocks use the CLK_SET_RATE_PARENT flag.
This bug stayed unnoticed because by default PLLP is used as the parent
clock for the display controller and PLLP silently skips the erroneous 0Hz
rate changes because it always has active child clocks that don't permit
rate changes. The PLLP isn't acceptable for some devices that we want to
upstream (like Samsung Galaxy Tab and ASUS TF700T) due to a display panel
clock rate requirements that can't be fulfilled by using PLLP and then the
bug pops up in this case since parent clock is set to 0Hz, killing the
display output.
Don't touch DC clock if pclk=0 in order to fix the problem.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Sean Christopherson [Thu, 25 Mar 2021 20:01:19 +0000 (13:01 -0700)]
KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
Prevent the TDP MMU from yielding when zapping a gfn range during NX
page recovery. If a flush is pending from a previous invocation of the
zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
has not accumulated a flush for the current invocation, then yielding
will release mmu_lock with stale TLB entries.
That being said, this isn't technically a bug fix in the current code, as
the TDP MMU will never yield in this case. tdp_mmu_iter_cond_resched()
will yield if and only if it has made forward progress, as defined by the
current gfn vs. the last yielded (or starting) gfn. Because zapping a
single shadow page is guaranteed to (a) find that page and (b) step
sideways at the level of the shadow page, the TDP iter will break its loop
before getting a chance to yield.
But that is all very, very subtle, and will break at the slightest sneeze,
e.g. zapping while holding mmu_lock for read would break as the TDP MMU
wouldn't be guaranteed to see the present shadow page, and thus could step
sideways at a lower level.
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20210325200119.1359384-4-seanjc@google.com>
[Add lockdep assertion. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Thu, 25 Mar 2021 20:01:18 +0000 (13:01 -0700)]
KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
does the flush itself if and only if it yields (which it will never do in
this particular scenario), and otherwise expects the caller to do the
flush. If pages are zapped from the TDP MMU but not the legacy MMU, then
no flush will occur.
Fixes:
29cf0f5007a2 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20210325200119.1359384-3-seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Thu, 25 Mar 2021 20:01:17 +0000 (13:01 -0700)]
KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
When flushing a range of GFNs across multiple roots, ensure any pending
flush from a previous root is honored before yielding while walking the
tables of the current root.
Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local
"flush" with the result to avoid redundant flushes. zap_gfn_range()
preserves and return the incoming "flush", unless of course the flush was
performed prior to yielding and no new flush was triggered.
Fixes:
1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
Cc: stable@vger.kernel.org
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20210325200119.1359384-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Siddharth Chandrasekaran [Wed, 24 Mar 2021 12:43:47 +0000 (13:43 +0100)]
KVM: make: Fix out-of-source module builds
Building kvm module out-of-source with,
make -C $SRC O=$BIN M=arch/x86/kvm
fails to find "irq.h" as the include dir passed to cflags-y does not
prefix the source dir. Fix this by prefixing $(srctree) to the include
dir path.
Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Message-Id: <
20210324124347.18336-1-sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 23 Mar 2021 10:43:31 +0000 (11:43 +0100)]
selftests: kvm: make hardware_disable_test less verbose
hardware_disable_test produces 512 snippets like
...
main: [511] waiting semaphore
run_test: [511] start vcpus
run_test: [511] all threads launched
main: [511] waiting 368us
main: [511] killing child
and this doesn't have much value, let's print this info with pr_debug().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20210323104331.1354800-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Tue, 23 Mar 2021 08:45:15 +0000 (09:45 +0100)]
KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE
MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs are only available when
X86_FEATURE_PERFCTR_CORE CPUID bit was exposed to the guest. KVM, however,
allows these MSRs unconditionally because kvm_pmu_is_valid_msr() ->
amd_msr_idx_to_pmc() check always passes and because kvm_pmu_set_msr() ->
amd_pmu_set_msr() doesn't fail.
In case of a counter (CTRn), no big harm is done as we only increase
internal PMC's value but in case of an eventsel (CTLn), we go deep into
perf internals with a non-existing counter.
Note, kvm_get_msr_common() just returns '0' when these MSRs don't exist
and this also seems to contradict architectural behavior which is #GP
(I did check one old Opteron host) but changing this status quo is a bit
scarier.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20210323084515.1346540-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dongli Zhang [Fri, 26 Mar 2021 07:03:34 +0000 (00:03 -0700)]
KVM: x86: remove unused declaration of kvm_write_tsc()
kvm_write_tsc() was renamed and made static since commit
0c899c25d754
("KVM: x86: do not attempt TSC synchronization on guest writes"). Remove
its unused declaration.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <
20210326070334.12310-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Haiwei Li [Sat, 13 Mar 2021 05:10:32 +0000 (13:10 +0800)]
KVM: clean up the unused argument
kvm_msr_ignored_check function never uses vcpu argument. Clean up the
function and invokers.
Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <
20210313051032.4171-1-lihaiwei.kernel@gmail.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stefan Raspl [Thu, 25 Mar 2021 12:29:49 +0000 (13:29 +0100)]
tools/kvm_stat: Add restart delay
If this service is enabled and the system rebooted, Systemd's initial
attempt to start this unit file may fail in case the kvm module is not
loaded. Since we did not specify a delay for the retries, Systemd
restarts with a minimum delay a number of times before giving up and
disabling the service. Which means a subsequent kvm module load will
have kvm running without monitoring.
Adding a delay to fix this.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <
20210325122949.1433271-1-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 30 Mar 2021 17:06:42 +0000 (13:06 -0400)]
Merge tag 'kvmarm-fixes-5.12-3' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.12, take #3
- Fix GICv3 MMIO compatibility probing
- Prevent guests from using the ARMv8.4 self-hosted tracing extension
Linus Torvalds [Tue, 30 Mar 2021 16:49:36 +0000 (09:49 -0700)]
Merge tag 'vfio-v5.12-rc6' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Fix pfnmap batch carryover (Daniel Jordan)
- Fix nvlink Kconfig dependency (Jason Gunthorpe)
* tag 'vfio-v5.12-rc6' of git://github.com/awilliam/linux-vfio:
vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends
vfio/type1: Empty batch for pfnmap pages
Ilya Lipnitskiy [Tue, 30 Mar 2021 04:42:08 +0000 (21:42 -0700)]
mm: fix race by making init_zero_pfn() early_initcall
There are code paths that rely on zero_pfn to be fully initialized
before core_initcall. For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve, which
causes a page fault with a subsequent mmput. If zero_pfn is not
initialized by then it may not get cleaned up properly and result in an
error:
BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:
1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
kobject_uevent_env+0x7e4/0x7ec
kset_register+0x68/0x88
bus_register+0xdc/0x34c
subsys_virtual_register+0x34/0x78
wq_sysfs_init+0x1c/0x4c
do_one_initcall+0x50/0x1a8
kernel_init_freeable+0x230/0x2c8
kernel_init+0x10/0x100
ret_from_kernel_thread+0x14/0x1c
2. kobject_uevent_env() calls call_usermodehelper_exec() which executes
kernel_execve asynchronously.
3. Memory allocations in kernel_execve cause a page fault, bumping the
MM reference counter:
add_mm_counter_fast+0xb4/0xc0
handle_mm_fault+0x6e4/0xea0
__get_user_pages.part.78+0x190/0x37c
__get_user_pages_remote+0x128/0x360
get_arg_page+0x34/0xa0
copy_string_kernel+0x194/0x2a4
kernel_execve+0x11c/0x298
call_usermodehelper_exec_async+0x114/0x194
4. In case zero_pfn has not been initialized yet, zap_pte_range does
not decrement the MM_ANONPAGES RSS counter and the BUG message is
triggered shortly afterwards when __mmdrop checks the ref counters:
__mmdrop+0x98/0x1d0
free_bprm+0x44/0x118
kernel_execve+0x160/0x1d8
call_usermodehelper_exec_async+0x114/0x194
ret_from_kernel_thread+0x14/0x1c
To avoid races such as described above, initialize init_zero_pfn at
early_initcall level. Depending on the architecture, ZERO_PAGE is
either constant or gets initialized even earlier, at paging_init, so
there is no issue with initializing zero_pfn earlier.
Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 30 Mar 2021 15:57:50 +0000 (08:57 -0700)]
Merge tag 'mips-fixes_5.12_3' of git://git./linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
- Fix compile error with option MIPS_ELF_APPENDED_DTB
* tag 'mips-fixes_5.12_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kernel: setup.c: fix compilation error
Linus Torvalds [Tue, 30 Mar 2021 15:55:47 +0000 (08:55 -0700)]
Merge tag 'for-linus-5.12b-rc6-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"One Xen related security fix (XSA-371)"
* tag 'for-linus-5.12b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-blkback: don't leak persistent grants from xen_blkbk_map()
Steven Rostedt (VMware) [Tue, 30 Mar 2021 13:58:38 +0000 (09:58 -0400)]
ftrace: Check if pages were allocated before calling free_pages()
It is possible that on error pg->size can be zero when getting its order,
which would return a -1 value. It is dangerous to pass in an order of -1
to free_pages(). Check if order is greater than or equal to zero before
calling free_pages().
Link: https://lore.kernel.org/lkml/20210330093916.432697c7@gandalf.local.home/
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Mauri Sandberg [Mon, 29 Mar 2021 12:31:36 +0000 (15:31 +0300)]
MIPS: kernel: setup.c: fix compilation error
With ath79_defconfig enabling CONFIG_MIPS_ELF_APPENDED_DTB gives a
compilation error. This patch fixes it.
Build log:
...
CC kernel/locking/percpu-rwsem.o
../arch/mips/kernel/setup.c:46:39: error: conflicting types for
'__appended_dtb'
const char __section(".appended_dtb") __appended_dtb[0x100000];
^~~~~~~~~~~~~~
In file included from ../arch/mips/kernel/setup.c:34:
../arch/mips/include/asm/bootinfo.h:118:13: note: previous declaration
of '__appended_dtb' was here
extern char __appended_dtb[];
^~~~~~~~~~~~~~
CC fs/attr.o
make[4]: *** [../scripts/Makefile.build:271: arch/mips/kernel/setup.o]
Error 1
...
Root cause seems to be:
Fixes:
b83ba0b9df56 ("MIPS: of: Introduce helper function to get DTB")
Signed-off-by: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: trivial@kernel.org
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Jeremy Szu [Tue, 30 Mar 2021 11:44:27 +0000 (19:44 +0800)]
ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8
The HP EliteBook 640 G8 Notebook PC is using ALC236 codec which is
using 0x02 to control mute LED and 0x01 to control micmute LED.
Therefore, add a quirk to make it works.
Signed-off-by: Jeremy Szu <jeremy.szu@canonical.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210330114428.40490-1-jeremy.szu@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Mon, 29 Mar 2021 11:30:59 +0000 (13:30 +0200)]
ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks
The recently added PM prepare and complete callbacks don't have the
sanity check whether the card instance has been properly initialized,
which may potentially lead to Oops.
This patch adds the azx_is_pm_ready() call in each place
appropriately like other PM callbacks.
Fixes:
f5dac54d9d93 ("ALSA: hda: Separate runtime and system suspend")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210329113059.25035-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>