Matthew Wilcox (Oracle) [Mon, 21 Aug 2023 14:13:22 +0000 (15:13 +0100)]
libfs: Convert simple_write_begin and simple_write_end to use a folio
Remove a number of implicit calls to compound_head() and various calls
to compatibility functions. This is not sufficient to enable support
for large folios; generic_perform_write() must be converted first.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Message-Id: <
20230821141322.
2535459-1-willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Anh Tuan Phan [Thu, 17 Aug 2023 16:31:42 +0000 (23:31 +0700)]
fs/dcache: Replace printk and WARN_ON by WARN
Use WARN instead of printk + WARN_ON as reported from coccinelle:
./fs/dcache.c:1667:1-7: SUGGESTION: printk + WARN_ON can be just WARN
Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com>
Message-Id: <
20230817163142.117706-1-tuananhlfc@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Colin Ian King [Fri, 18 Aug 2023 14:45:56 +0000 (15:45 +0100)]
fs/pipe: remove redundant initialization of pointer buf
The pointer buf is being initializated with a value that is never read,
it is being re-assigned later on at the pointer where it is being used.
The initialization is redundant and can be removed. Cleans up clang scan
build warning:
fs/pipe.c:492:24: warning: Value stored to 'buf' during its
initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-Id: <
20230818144556.
1208082-1-colin.i.king@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Matthew Wilcox (Oracle) [Fri, 18 Aug 2023 20:08:24 +0000 (21:08 +0100)]
fs: Fix kernel-doc warnings
These have a variety of causes and a corresponding variety of solutions.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Message-Id: <
20230818200824.
2720007-1-willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Matthew Wilcox (Oracle) [Fri, 18 Aug 2023 20:10:03 +0000 (21:10 +0100)]
devpts: Fix kernel-doc warnings
This documentation has bit-rotted over time.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Message-Id: <
20230818201003.
2720257-1-willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
GONG, Ruiqi [Wed, 16 Aug 2023 03:32:10 +0000 (11:32 +0800)]
doc: idmappings: fix an error and rephrase a paragraph
If the modified paragraph is referring to the idmapping mentioned in the
previous paragraph (i.e. `u0:k10000:r10000`), then it is `u0` that the
upper idmapset starts with, not `u1000`.
Fix this error and rephrase this paragraph a bit to make this reference
more explicit.
Reported-by: Wang Lei <wanglei249@huawei.com>
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Message-Id: <
20230816033210.914262-1-gongruiqi@huaweicloud.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Loic Poulain [Sun, 13 Aug 2023 08:23:49 +0000 (10:23 +0200)]
init: Add support for rootwait timeout parameter
Add an optional timeout arg to 'rootwait' as the maximum time in
seconds to wait for the root device to show up before attempting
forced mount of the root filesystem.
Use case:
In case of device mapper usage for the rootfs (e.g. root=/dev/dm-0),
if the mapper is not able to create the virtual block for any reason
(wrong arguments, bad dm-verity signature, etc), the `rootwait` param
causes the kernel to wait forever. It may however be desirable to only
wait for a given time and then panic (force mount) to cause device reset.
This gives the bootloader a chance to detect the problem and to take some
measures, such as marking the booted partition as bad (for A/B case) or
entering a recovery mode.
In success case, mounting happens as soon as the root device is ready,
unlike the existing 'rootdelay' parameter which performs an unconditional
pause.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Message-Id: <
20230813082349.513386-1-loic.poulain@linaro.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Mateusz Guzik [Fri, 11 Aug 2023 19:48:14 +0000 (21:48 +0200)]
vfs: fix up the assert in i_readcount_dec
Drops a race where 2 threads could spot a positive value and both
proceed to dec to -1, without reporting anything.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Message-Id: <
20230811194814.
1612336-1-mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Yang Li [Fri, 11 Aug 2023 01:43:59 +0000 (09:43 +0800)]
fs: Fix one kernel-doc comment
Fix one kernel-doc comment to silence the warning:
fs/read_write.c:88: warning: Function parameter or member 'maxsize' not described in 'generic_file_llseek_size'
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Message-Id: <
20230811014359.4960-1-yang.lee@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Alexander Mikhalitsyn [Sun, 25 Jun 2023 18:20:47 +0000 (20:20 +0200)]
docs: filesystems: idmappings: clarify from where idmappings are taken
Let's clarify from where we take idmapping of each type:
- caller
- filesystem
- mount
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Message-Id: <
20230625182047.26854-1-aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Marcelo Tosatti [Tue, 27 Jun 2023 20:08:15 +0000 (17:08 -0300)]
fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
For certain types of applications (for example PLC software or
RAN processing), upon occurrence of an event, it is necessary to
complete a certain task in a maximum amount of time (deadline).
One way to express this requirement is with a pair of numbers,
deadline time and execution time, where:
* deadline time: length of time between event and deadline.
* execution time: length of time it takes for processing of event
to occur on a particular hardware platform
(uninterrupted).
The particular values depend on use-case. For the case
where the realtime application executes in a virtualized
guest, an IPI which must be serviced in the host will cause
the following sequence of events:
1) VM-exit
2) execution of IPI (and function call)
3) VM-entry
Which causes an excess of 50us latency as observed by cyclictest
(this violates the latency requirement of vRAN application with 1ms TTI,
for example).
invalidate_bh_lrus calls an IPI on each CPU that has non empty
per-CPU cache:
on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
The performance when using the per-CPU LRU cache is as follows:
42 ns per __find_get_block
68 ns per __find_get_block_slow
Given that the main use cases for latency sensitive applications
do not involve block I/O (data necessary for program operation is
locked in RAM), disable per-CPU buffer_head caches for isolated CPUs.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Message-Id: <ZJtBrybavtb1x45V@tpad>
Signed-off-by: Christian Brauner <brauner@kernel.org>
David Howells [Tue, 8 Aug 2023 11:34:20 +0000 (07:34 -0400)]
vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
When NFS superblocks are created by automounting, their LSM parameters
aren't set in the fs_context struct prior to sget_fc() being called,
leading to failure to match existing superblocks.
This bug leads to messages like the following appearing in dmesg when
fscache is enabled:
NFS: Cache volume key already in use (nfs,4.2,2,108,
106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
Fix this by adding a new LSM hook to load fc->security for submount
creation.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/
Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode")
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <
20230808-master-v9-1-
e0ecde888221@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Christoph Hellwig [Tue, 8 Aug 2023 16:17:04 +0000 (09:17 -0700)]
fs: unexport d_genocide
d_genocide is only used by built-in code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <
20230808161704.
1099680-1-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Linus Torvalds [Tue, 8 Aug 2023 17:26:35 +0000 (19:26 +0200)]
fs: use __fput_sync in close(2)
close(2) is a special case which guarantees a shallow kernel stack,
making delegation to task_work machinery unnecessary. Said delegation is
problematic as it involves atomic ops and interrupt masking trips, none
of which are cheap on x86-64. Forcing close(2) to do it looks like an
oversight in the original work.
Moreover presence of CONFIG_RSEQ adds an additional overhead as fput()
-> task_work_add(..., TWA_RESUME) -> set_notify_resume() makes the
thread returning to userspace land in resume_user_mode_work(), where
rseq_handle_notify_resume takes a SMAP round-trip if rseq is enabled for
the thread (and it is by default with contemporary glibc).
Sample result when benchmarking open1_processes -t 1 from will-it-scale
(that's an open + close loop) + tmpfs on /tmp, running on the Sapphire
Rapid CPU (ops/s):
stock+RSEQ:
1329857
stock-RSEQ:
1421667 (+7%)
patched:
1523521 (+14.5% / +7%) (with / without rseq)
Patched result is the same regardless of rseq as the codepath is avoided.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Mateusz Guzik [Thu, 27 Jul 2023 11:38:09 +0000 (13:38 +0200)]
file: mostly eliminate spurious relocking in __range_close
Stock code takes a lock trip for every fd in range, but this can be
trivially avoided and real-world consumers do have plenty of already
closed cases.
Just booting Debian 12 with a debug printk shows:
(sh) min 3 max 17 closed 15 empty 0
(sh) min 19 max 63 closed 31 empty 14
(sh) min 4 max 63 closed 0 empty 60
(spawn) min 3 max 63 closed 13 empty 48
(spawn) min 3 max 63 closed 13 empty 48
(mount) min 3 max 17 closed 15 empty 0
(mount) min 19 max 63 closed 32 empty 13
and so on.
While here use more idiomatic naming.
An avoidable relock is left in place to avoid uglifying the code.
The code was not switched to bitmap traversal for the same reason.
Tested with ltp kernel/syscalls/close_range
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Message-Id: <
20230727113809.800067-1-mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Zhu Wang [Mon, 31 Jul 2023 11:25:33 +0000 (19:25 +0800)]
fs/ecryptfs: remove kernel-doc warnings
Remove kernel-doc warnings:
fs/ecryptfs/mmap.c:270: warning: Excess function parameter 'flags'
description in 'ecryptfs_write_begin'
Message-ID: <
20230731112533.214216-1-wangzhu9@huawei.com>
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Zhen Lei [Wed, 26 Jul 2023 03:21:35 +0000 (11:21 +0800)]
epoll: simplify ep_alloc()
The get_current_user() does not fail, and moving it after kzalloc() can
simplify the code a bit.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Message-Id: <
20230726032135.933-1-thunder.leizhen@huaweicloud.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Wang Ming [Thu, 13 Jul 2023 12:05:42 +0000 (20:05 +0800)]
fs: Fix error checking for d_hash_and_lookup()
The d_hash_and_lookup() function returns error pointers or NULL.
Most incorrect error checks were fixed, but the one in int path_pts()
was forgotten.
Fixes: eedf265aa003 ("devpts: Make each mount of devpts an independent filesystem.")
Signed-off-by: Wang Ming <machel@vivo.com>
Message-Id: <
20230713120555.7025-1-machel@vivo.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Christian Brauner [Wed, 12 Jul 2023 18:58:49 +0000 (20:58 +0200)]
attr: block mode changes of symlinks
Changing the mode of symlinks is meaningless as the vfs doesn't take the
mode of a symlink into account during path lookup permission checking.
However, the vfs doesn't block mode changes on symlinks. This however,
has lead to an untenable mess roughly classifiable into the following
two categories:
(1) Filesystems that don't implement a i_op->setattr() for symlinks.
Such filesystems may or may not know that without i_op->setattr()
defined, notify_change() falls back to simple_setattr() causing the
inode's mode in the inode cache to be changed.
That's a generic issue as this will affect all non-size changing
inode attributes including ownership changes.
Example: afs
(2) Filesystems that fail with EOPNOTSUPP but change the mode of the
symlink nonetheless.
Some filesystems will happily update the mode of a symlink but still
return EOPNOTSUPP. This is the biggest source of confusion for
userspace.
The EOPNOTSUPP in this case comes from POSIX ACLs. Specifically it
comes from filesystems that call posix_acl_chmod(), e.g., btrfs via
if (!err && attr->ia_valid & ATTR_MODE)
err = posix_acl_chmod(idmap, dentry, inode->i_mode);
Filesystems including btrfs don't implement i_op->set_acl() so
posix_acl_chmod() will report EOPNOTSUPP.
When posix_acl_chmod() is called, most filesystems will have
finished updating the inode.
Perversely, this has the consequences that this behavior may depend
on two kconfig options and mount options:
* CONFIG_POSIX_ACL={y,n}
* CONFIG_${FSTYPE}_POSIX_ACL={y,n}
* Opt_acl, Opt_noacl
Example: btrfs, ext4, xfs
The only way to change the mode on a symlink currently involves abusing
an O_PATH file descriptor in the following manner:
fd = openat(-1, "/path/to/link", O_CLOEXEC | O_PATH | O_NOFOLLOW);
char path[PATH_MAX];
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
chmod(path, 0000);
But for most major filesystems with POSIX ACL support such as btrfs,
ext4, ceph, tmpfs, xfs and others this will fail with EOPNOTSUPP with
the mode still updated due to the aforementioned posix_acl_chmod()
nonsense.
So, given that for all major filesystems this would fail with EOPNOTSUPP
and that both glibc (cf. [1]) and musl (cf. [2]) outright block mode
changes on symlinks we should just try and block mode changes on
symlinks directly in the vfs and have a clean break with this nonsense.
If this causes any regressions, we do the next best thing and fix up all
filesystems that do return EOPNOTSUPP with the mode updated to not call
posix_acl_chmod() on symlinks.
But as usual, let's try the clean cut solution first. It's a simple
patch that can be easily reverted. Not marking this for backport as I'll
do that manually if we're reasonably sure that this works and there are
no strong objections.
We could block this in chmod_common() but it's more appropriate to do it
notify_change() as it will also mean that we catch filesystems that
change symlink permissions explicitly or accidently.
Similar proposals were floated in the past as in [3] and [4] and again
recently in [5]. There's also a couple of bugs about this inconsistency
as in [6] and [7].
Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=99527a3727e44cb8661ee1f743068f108ec93979;hb=HEAD
Link: https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c
Link: https://lore.kernel.org/all/20200911065733.GA31579@infradead.org
Link: https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00518.html
Link: https://lore.kernel.org/lkml/87lefmbppo.fsf@oldenburg.str.redhat.com
Link: https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00467.html
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=14578#c17
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # please backport to all LTSes but not before v6.6-rc2 is tagged
Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Message-Id: <
20230712-vfs-chmod-symlinks-v2-1-
08cfb92b61dd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Wen Yang [Sun, 9 Jul 2023 06:54:51 +0000 (14:54 +0800)]
eventfd: prevent underflow for eventfd semaphores
For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
An underflow can happen with EFD_SEMAPHORE eventfds in at least the
following three subsystems:
(1) virt/kvm/eventfd.c
(2) drivers/vfio/virqfd.c
(3) drivers/virt/acrn/irqfd.c
where (2) and (3) are just modeled after (1). An eventfd must be
specified for use with the KVM_IRQFD ioctl(). This can also be an
EFD_SEMAPHORE eventfd. When the eventfd count is zero or has been
decremented to zero an underflow can be triggered when the irqfd is shut
down by raising the KVM_IRQFD_FLAG_DEASSIGN flag in the KVM_IRQFD
ioctl():
// ctx->count == 0
kvm_vm_ioctl()
-> kvm_irqfd()
-> kvm_irqfd_deassign()
-> irqfd_deactivate()
-> irqfd_shutdown()
-> eventfd_ctx_remove_wait_queue(&cnt)
-> eventfd_ctx_do_read(&cnt)
Userspace polling on the eventfd wouldn't notice the underflow because 1
is always returned as the value from eventfd_read() while ctx->count
would've underflowed. It's not a huge deal because this should only be
happening when the irqfd is shutdown but we should still fix it and
avoid the spurious wakeup.
Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <tencent_7588DFD1F365950A757310D764517A14B306@qq.com>
[brauner: rewrite commit message and add explanation how this underflow can happen]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Luca Vizzarro [Wed, 15 Feb 2023 15:50:05 +0000 (15:50 +0000)]
dnotify: Pass argument of fcntl_dirnotify as int
The interface for fcntl expects the argument passed for the command
F_DIRNOTIFY to be of type int. The current code wrongly treats it as
a long. In order to avoid access to undefined bits, we should explicitly
cast the argument to int.
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Laight <David.Laight@ACULAB.com>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-morello@op-lists.linaro.org
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com>
Message-Id: <
20230414152459.816046-6-Luca.Vizzarro@arm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Luca Vizzarro [Wed, 1 Feb 2023 16:18:53 +0000 (16:18 +0000)]
pipe: Pass argument of pipe_fcntl as int
The interface for fcntl expects the argument passed for the command
F_SETPIPE_SZ to be of type int. The current code wrongly treats it as
a long. In order to avoid access to undefined bits, we should explicitly
cast the argument to int.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Laight <David.Laight@ACULAB.com>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-morello@op-lists.linaro.org
Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com>
Message-Id: <
20230414152459.816046-4-Luca.Vizzarro@arm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Luca Vizzarro [Wed, 1 Feb 2023 15:05:33 +0000 (15:05 +0000)]
fs: Pass argument to fcntl_setlease as int
The interface for fcntl expects the argument passed for the command
F_SETLEASE to be of type int. The current code wrongly treats it as
a long. In order to avoid access to undefined bits, we should explicitly
cast the argument to int.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Laight <David.Laight@ACULAB.com>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Cc: linux-morello@op-lists.linaro.org
Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com>
Message-Id: <
20230414152459.816046-3-Luca.Vizzarro@arm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Luca Vizzarro [Thu, 2 Feb 2023 11:58:31 +0000 (11:58 +0000)]
fcntl: Cast commands with int args explicitly
According to the fcntl API specification commands that expect an
integer, hence not a pointer, always take an int and not long. In
order to avoid access to undefined bits, we should explicitly cast
the argument to int.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Laight <David.Laight@ACULAB.com>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-morello@op-lists.linaro.org
Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com>
Message-Id: <
20230414152459.816046-2-Luca.Vizzarro@arm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Ahelenia Ziemiańska [Mon, 3 Jul 2023 14:42:21 +0000 (16:42 +0200)]
splice: fsnotify_access(in), fsnotify_modify(out) on success in tee
Same logic applies here: this can fill up the pipe, and pollers that rely
on getting IN_MODIFY notifications never wake up.
Fixes: 983652c69199 ("splice: report related fsnotify events")
Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u
Link: https://bugs.debian.org/1039488
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <
10d76dd8c85017ae3cd047c9b9a32e26daefdaa2.
1688393619.git.nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Ahelenia Ziemiańska [Mon, 3 Jul 2023 14:42:17 +0000 (16:42 +0200)]
splice: fsnotify_access(fd)/fsnotify_modify(fd) in vmsplice
Same logic applies here: this can fill up the pipe and pollers that rely
on getting IN_MODIFY notifications never wake up.
Fixes: 983652c69199 ("splice: report related fsnotify events")
Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u
Link: https://bugs.debian.org/1039488
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <
8d9ad5acb9c5c1dd2376a2ff5da6ac3183115389.
1688393619.git.nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Ahelenia Ziemiańska [Mon, 3 Jul 2023 14:42:13 +0000 (16:42 +0200)]
splice: always fsnotify_access(in), fsnotify_modify(out) on success
The current behaviour caused an asymmetry where some write APIs
(write, sendfile) would notify the written-to/read-from objects,
but splice wouldn't.
This affected userspace which uses inotify, most notably coreutils
tail -f, to monitor pipes.
If the pipe buffer had been filled by a splice-family function:
* tail wouldn't know and thus wouldn't service the pipe, and
* all writes to the pipe would block because it's full,
thus service was denied.
(For the particular case of tail -f this could be worked around
with ---disable-inotify.)
Fixes: 983652c69199 ("splice: report related fsnotify events")
Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u
Link: https://bugs.debian.org/1039488
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <
604ec704d933e0e0121d9e107ce914512e045fad.
1688393619.git.nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Matthew Wilcox [Sun, 4 Jun 2023 11:16:06 +0000 (12:16 +0100)]
reiserfs: Check the return value from __getblk()
__getblk() can return a NULL pointer if we run out of memory or if we
try to access beyond the end of the device; check it and handle it
appropriately.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/lkml/CAFcO6XOacq3hscbXevPQP7sXRoYFz34ZdKPYjmd6k5sZuhGFDw@mail.gmail.com/
Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") # probably introduced in 2002
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fabio M. De Francesco [Wed, 26 Apr 2023 17:22:23 +0000 (19:22 +0200)]
fs/ecryptfs: Use kmap_local_page() in copy_up_encrypted_with_header()
kmap_atomic() has been deprecated in favor of kmap_local_page().
Therefore, replace kmap_atomic() with kmap_local_page() in
ecryptfs_copy_up_encrypted_with_header().
kmap_atomic() is implemented like a kmap_local_page() which also
disables page-faults and preemption (the latter only in !PREEMPT_RT
kernels). The kernel virtual addresses returned by these two API are
only valid in the context of the callers (i.e., they cannot be handed to
other threads).
With kmap_local_page() the mappings are per thread and CPU local like
in kmap_atomic(); however, they can handle page-faults and can be called
from any context (including interrupts). The tasks that call
kmap_local_page() can be preempted and, when they are scheduled to run
again, the kernel virtual addresses are restored and are still valid.
In ecryptfs_copy_up_encrypted_with_header(), the block of code between
the mapping and un-mapping does not depend on the above-mentioned side
effects of kmap_aatomic(), so that the mere replacements of the old API
with the new one is all that is required (i.e., there is no need to
explicitly call pagefault_disable() and/or preempt_disable()).
Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
HIGHMEM64GB enabled.
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Message-Id: <
20230426172223.8896-4-fmdefrancesco@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fabio M. De Francesco [Wed, 26 Apr 2023 17:22:22 +0000 (19:22 +0200)]
fs/ecryptfs: Use kmap_local_page() in ecryptfs_write()
kmap_atomic() is deprecated in favor of kmap_local_page().
Therefore, replace kmap_atomic() with kmap_local_page() in
ecryptfs_write().
kmap_atomic() is implemented like kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels).
The code within the mapping/un-mapping in ecryptfs_write() does not
depend on the above-mentioned side effects so that a mere replacement of
the old API with the new one is all that is required (i.e., there is no
need to explicitly call pagefault_disable() and/or preempt_disable()).
Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
HIGHMEM64GB enabled.
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Message-Id: <
20230426172223.8896-3-fmdefrancesco@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fabio M. De Francesco [Wed, 26 Apr 2023 17:22:21 +0000 (19:22 +0200)]
fs/ecryptfs: Replace kmap() with kmap_local_page()
kmap() has been deprecated in favor of kmap_local_page().
Therefore, replace kmap() with kmap_local_page() in fs/ecryptfs.
There are two main problems with kmap(): (1) It comes with an overhead as
the mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.
With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. The tasks can
be preempted and, when they are scheduled to run again, the kernel
virtual addresses are restored and still valid.
Obviously, thread locality implies that the kernel virtual addresses
returned by kmap_local_page() are only valid in the context of the
callers (i.e., they cannot be handed to other threads).
The use of kmap_local_page() in fs/ecryptfs does not break the
above-mentioned assumption, so it is allowed and preferred.
Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
HIGHMEM64GB enabled.
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Message-Id: <
20230426172223.8896-2-fmdefrancesco@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Linus Torvalds [Sun, 9 Jul 2023 20:53:13 +0000 (13:53 -0700)]
Linux 6.5-rc1
Linus Torvalds [Sun, 9 Jul 2023 17:29:53 +0000 (10:29 -0700)]
MAINTAINERS 2: Electric Boogaloo
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats:
80e62bc8487b ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 9 Jul 2023 17:24:22 +0000 (10:24 -0700)]
Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
Linus Torvalds [Sun, 9 Jul 2023 17:16:04 +0000 (10:16 -0700)]
Merge tag 'irq_urgent_for_v6.5_rc1' of git://git./linux/kernel/git/tip/tip
Pull irq update from Borislav Petkov:
- Optimize IRQ domain's name assignment
* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Use return value of strreplace()
Linus Torvalds [Sun, 9 Jul 2023 17:13:32 +0000 (10:13 -0700)]
Merge tag 'x86_urgent_for_v6.5_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
Linus Torvalds [Sun, 9 Jul 2023 17:08:38 +0000 (10:08 -0700)]
Merge tag 'x86-core-2023-07-09' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
Linus Torvalds [Sun, 9 Jul 2023 17:02:49 +0000 (10:02 -0700)]
Merge tag 'mips_6.5_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
Linus Torvalds [Sun, 9 Jul 2023 16:50:42 +0000 (09:50 -0700)]
Merge tag 'xfs-6.5-merge-6' of git://git./fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Nothing exciting here, just getting rid of a gcc warning that I got
tired of seeing when I turn on gcov"
* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix uninit warning in xfs_growfs_data
Linus Torvalds [Sun, 9 Jul 2023 16:45:32 +0000 (09:45 -0700)]
Merge tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Linus Torvalds [Sun, 9 Jul 2023 16:35:51 +0000 (09:35 -0700)]
Merge tag 'ntb-6.5' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Fixes for pci_clean_master, error handling in driver inits, and
various other issues/bugs"
* tag 'ntb-6.5' of https://github.com/jonmason/ntb:
ntb: hw: amd: Fix debugfs_create_dir error checking
ntb.rst: Fix copy and paste error
ntb_netdev: Fix module_init problem
ntb: intel: Remove redundant pci_clear_master
ntb: epf: Remove redundant pci_clear_master
ntb_hw_amd: Remove redundant pci_clear_master
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
MAINTAINERS: git://github -> https://github.com for jonmason
NTB: EPF: fix possible memory leak in pci_vntb_probe()
NTB: ntb_tool: Add check for devm_kcalloc
NTB: ntb_transport: fix possible memory leak while device_register() fails
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
ntb: idt: Fix error handling in idt_pci_driver_init()
Hugh Dickins [Sat, 8 Jul 2023 23:04:00 +0000 (16:04 -0700)]
mm: lock newly mapped VMA with corrected ordering
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 8 Jul 2023 21:30:25 +0000 (14:30 -0700)]
Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"16 hotfixes. Six are cc:stable and the remainder address post-6.4
issues"
The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.
* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: dhry: fix sleeping allocations inside non-preemptable section
kasan, slub: fix HW_TAGS zeroing with slub_debug
kasan: fix type cast in memory_is_poisoned_n
mailmap: add entries for Heiko Stuebner
mailmap: update manpage link
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
MAINTAINERS: add linux-next info
mailmap: add Markus Schneider-Pargmann
writeback: account the number of pages written back
mm: call arch_swap_restore() from do_swap_page()
squashfs: fix cache race with migration
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
docs: update ocfs2-devel mailing list address
MAINTAINERS: update ocfs2-devel mailing list address
mm: disable CONFIG_PER_VMA_LOCK until its fixed
fork: lock VMAs of the parent process when forking
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:12 +0000 (12:12 -0700)]
fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:11 +0000 (12:12 -0700)]
mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:10 +0000 (12:12 -0700)]
mm: lock a vma before stack expansion
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 8 Jul 2023 19:35:18 +0000 (12:35 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"A few late arriving patches that missed the initial pull request. It's
mostly bug fixes (the dt-bindings is a fix for the initial pull)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove unused function declaration
scsi: target: docs: Remove tcm_mod_builder.py
scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
scsi: dt-bindings: ufs: qcom: Fix ICE phandle
scsi: core: Simplify scsi_cdl_check_cmd()
scsi: isci: Fix comment typo
scsi: smartpqi: Replace one-element arrays with flexible-array members
scsi: target: tcmu: Replace strlcpy() with strscpy()
scsi: ncr53c8xx: Replace strlcpy() with strscpy()
scsi: lpfc: Fix lpfc_name struct packing
Linus Torvalds [Sat, 8 Jul 2023 19:28:00 +0000 (12:28 -0700)]
Merge tag 'i2c-for-6.5-rc1-part2' of git://git./linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- xiic patch should have been in the original pull but slipped through
- mpc patch fixes a build regression
- nomadik cleanup
* tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mpc: Drop unused variable
i2c: nomadik: Remove a useless call in the remove function
i2c: xiic: Don't try to handle more interrupt events after error
Linus Torvalds [Sat, 8 Jul 2023 19:08:39 +0000 (12:08 -0700)]
Merge tag 'hardening-v6.5-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Check for NULL bdev in LoadPin (Matthias Kaehlcke)
- Revert unwanted KUnit FORTIFY build default
- Fix 1-element array causing boot warnings with xhci-hub
* tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
Revert "fortify: Allow KUnit test to build without FORTIFY"
dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
Anup Sharma [Fri, 12 May 2023 20:24:34 +0000 (01:54 +0530)]
ntb: hw: amd: Fix debugfs_create_dir error checking
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Linus Torvalds [Sat, 8 Jul 2023 17:21:51 +0000 (10:21 -0700)]
Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git./linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
Linus Torvalds [Sat, 8 Jul 2023 17:02:24 +0000 (10:02 -0700)]
Merge tag 'bitmap-6.5-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
"Fixes for different bitmap pieces:
- lib/test_bitmap: increment failure counter properly
The tests that don't use expect_eq() macro to determine that a test
is failured must increment failed_tests explicitly.
- lib/bitmap: drop optimization of bitmap_{from,to}_arr64
bitmap_{from,to}_arr64() optimization is overly optimistic
on 32-bit LE architectures when it's wired to
bitmap_copy_clear_tail().
- nodemask: Drop duplicate check in for_each_node_mask()
As the return value type of first_node() became unsigned, the node
>= 0 became unnecessary.
- cpumask: fix function description kernel-doc notation
- MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record
Add linux/bits.h and linux/bitfield.h for visibility"
* tag 'bitmap-6.5-rc1' of https://github.com/norov/linux:
MAINTAINERS: Add bitfield.h to the BITMAP API record
MAINTAINERS: Add bits.h to the BITMAP API record
cpumask: fix function description kernel-doc notation
nodemask: Drop duplicate check in for_each_node_mask()
lib/bitmap: drop optimization of bitmap_{from,to}_arr64
lib/test_bitmap: increment failure counter properly
Geert Uytterhoeven [Wed, 5 Jul 2023 14:54:04 +0000 (16:54 +0200)]
lib: dhry: fix sleeping allocations inside non-preemptable section
The Smatch static checker reports the following warnings:
lib/dhry_run.c:38 dhry_benchmark() warn: sleeping in atomic context
lib/dhry_run.c:43 dhry_benchmark() warn: sleeping in atomic context
Indeed, dhry() does sleeping allocations inside the non-preemptable
section delimited by get_cpu()/put_cpu().
Fix this by using atomic allocations instead.
Add error handling, as atomic these allocations may fail.
Link: https://lkml.kernel.org/r/bac6d517818a7cd8efe217c1ad649fffab9cc371.1688568764.git.geert+renesas@glider.be
Fixes: 13684e966d46283e ("lib: dhry: fix unstable smp_processor_id(_) usage")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/0469eb3a-02eb-4b41-b189-de20b931fa56@moroto.mountain
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Wed, 5 Jul 2023 12:44:02 +0000 (14:44 +0200)]
kasan, slub: fix HW_TAGS zeroing with slub_debug
Commit
946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated
kmalloc space than requested") added precise kmalloc redzone poisoning to
the slub_debug functionality.
However, this commit didn't account for HW_TAGS KASAN fully initializing
the object via its built-in memory initialization feature. Even though
HW_TAGS KASAN memory initialization contains special memory initialization
handling for when slub_debug is enabled, it does not account for in-object
slub_debug redzones. As a result, HW_TAGS KASAN can overwrite these
redzones and cause false-positive slub_debug reports.
To fix the issue, avoid HW_TAGS KASAN memory initialization when
slub_debug is enabled altogether. Implement this by moving the
__slub_debug_enabled check to slab_post_alloc_hook. Common slab code
seems like a more appropriate place for a slub_debug check anyway.
Link: https://lkml.kernel.org/r/678ac92ab790dba9198f9ca14f405651b97c8502.1688561016.git.andreyknvl@google.com
Fixes: 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated kmalloc space than requested")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: kasan-dev@googlegroups.com
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Tue, 4 Jul 2023 00:52:05 +0000 (02:52 +0200)]
kasan: fix type cast in memory_is_poisoned_n
Commit
bb6e04a173f0 ("kasan: use internal prototypes matching gcc-13
builtins") introduced a bug into the memory_is_poisoned_n implementation:
it effectively removed the cast to a signed integer type after applying
KASAN_GRANULE_MASK.
As a result, KASAN started failing to properly check memset, memcpy, and
other similar functions.
Fix the bug by adding the cast back (through an additional signed integer
variable to make the code more readable).
Link: https://lkml.kernel.org/r/8c9e0251c2b8b81016255709d4ec42942dcaf018.1688431866.git.andreyknvl@google.com
Fixes: bb6e04a173f0 ("kasan: use internal prototypes matching gcc-13 builtins")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Stuebner [Tue, 4 Jul 2023 16:39:19 +0000 (18:39 +0200)]
mailmap: add entries for Heiko Stuebner
I am going to lose my vrull.eu address at the end of july, and while
adding it to mailmap I also realised that there are more old addresses
from me dangling, so update .mailmap for all of them.
Link: https://lkml.kernel.org/r/20230704163919.1136784-3-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Stuebner [Tue, 4 Jul 2023 16:39:18 +0000 (18:39 +0200)]
mailmap: update manpage link
Patch series "Update .mailmap for my work address and fix manpage".
While updating mailmap for the going-away address, I also found that on
current systems the manpage linked from the header comment changed.
And in fact it looks like the git mailmap feature got its own manpage.
This patch (of 2):
On recent systems the git-shortlog manpage only tells people to
See gitmailmap(5)
So instead of sending people on a scavenger hunt, put that info into the
header directly. Though keep the old reference around for older systems.
Link: https://lkml.kernel.org/r/20230704163919.1136784-1-heiko@sntech.de
Link: https://lkml.kernel.org/r/20230704163919.1136784-2-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liu Shixin [Tue, 4 Jul 2023 10:19:42 +0000 (18:19 +0800)]
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
commit
dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in
put_page_bootmem") fix an overlaps existing problem of kmemleak. But the
problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in
this case, free_bootmem_page() will call free_reserved_page() directly.
Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when
HAVE_BOOTMEM_INFO_NODE is disabled.
Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com
Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Tue, 4 Jul 2023 05:44:10 +0000 (22:44 -0700)]
MAINTAINERS: add linux-next info
Add linux-next info to MAINTAINERS for ease of finding this data.
Link: https://lkml.kernel.org/r/20230704054410.12527-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Schneider-Pargmann [Wed, 28 Jun 2023 08:13:41 +0000 (10:13 +0200)]
mailmap: add Markus Schneider-Pargmann
Add my old mail address and update my name.
Link: https://lkml.kernel.org/r/20230628081341.3470229-1-msp@baylibre.com
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 28 Jun 2023 18:55:48 +0000 (19:55 +0100)]
writeback: account the number of pages written back
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.
Dave added:
: XFS is the only filesystem this would affect, right? AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
:
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.
Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Collingbourne [Tue, 23 May 2023 00:43:08 +0000 (17:43 -0700)]
mm: call arch_swap_restore() from do_swap_page()
Commit
c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") moved
the call to swap_free() before the call to set_pte_at(), which meant that
the MTE tags could end up being freed before set_pte_at() had a chance to
restore them. Fix it by adding a call to the arch_swap_restore() hook
before the call to swap_free().
Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965
Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vincent Whitchurch [Thu, 29 Jun 2023 14:17:57 +0000 (16:17 +0200)]
squashfs: fix cache race with migration
Migration replaces the page in the mapping before copying the contents and
the flags over from the old page, so check that the page in the page cache
is really up to date before using it. Without this, stressing squashfs
reads with parallel compaction sometimes results in squashfs reporting
data corruption.
Link: https://lkml.kernel.org/r/20230629-squashfs-cache-migration-v1-1-d50ebe55099d@axis.com
Fixes: e994f5b677ee ("squashfs: cache partial compressed blocks")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
John Hubbard [Sat, 1 Jul 2023 01:04:42 +0000 (18:04 -0700)]
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
The following crash happens for me when running the -mm selftests (below).
Specifically, it happens while running the uffd-stress subtests:
kernel BUG at mm/hugetlb.c:7249!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
RIP: 0010:huge_pte_alloc+0x12c/0x1a0
...
Call Trace:
<TASK>
? __die_body+0x63/0xb0
? die+0x9f/0xc0
? do_trap+0xab/0x180
? huge_pte_alloc+0x12c/0x1a0
? do_error_trap+0xc6/0x110
? huge_pte_alloc+0x12c/0x1a0
? handle_invalid_op+0x2c/0x40
? huge_pte_alloc+0x12c/0x1a0
? exc_invalid_op+0x33/0x50
? asm_exc_invalid_op+0x16/0x20
? __pfx_put_prev_task_idle+0x10/0x10
? huge_pte_alloc+0x12c/0x1a0
hugetlb_fault+0x1a3/0x1120
? finish_task_switch+0xb3/0x2a0
? lock_is_held_type+0xdb/0x150
handle_mm_fault+0xb8a/0xd40
? find_vma+0x5d/0xa0
do_user_addr_fault+0x257/0x5d0
exc_page_fault+0x7b/0x1f0
asm_exc_page_fault+0x22/0x30
That happens because a BUG() statement in huge_pte_alloc() attempts to
check that a pte, if present, is a hugetlb pte, but it does so in a
non-lockless-safe manner that leads to a false BUG() report.
We got here due to a couple of bugs, each of which by itself was not quite
enough to cause a problem:
First of all, before commit
c33c794828f2("mm: ptep_get() conversion"), the
BUG() statement in huge_pte_alloc() was itself fragile: it relied upon
compiler behavior to only read the pte once, despite using it twice in the
same conditional.
Next, commit
c33c794828f2 ("mm: ptep_get() conversion") broke that
delicate situation, by causing all direct pte reads to be done via
READ_ONCE(). And so READ_ONCE() got called twice within the same BUG()
conditional, leading to comparing (potentially, occasionally) different
versions of the pte, and thus to false BUG() reports.
Fix this by taking a single snapshot of the pte before using it in the
BUG conditional.
Now, that commit is only partially to blame here but, people doing
bisections will invariably land there, so this will help them find a fix
for a real crash. And also, the previous behavior was unlikely to ever
expose this bug--it was fragile, yet not actually broken.
So that's why I chose this commit for the Fixes tag, rather than the
commit that created the original BUG() statement.
Link: https://lkml.kernel.org/r/20230701010442.2041858-1-jhubbard@nvidia.com
Fixes: c33c794828f2 ("mm: ptep_get() conversion")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: James Houghton <jthoughton@google.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Anthony Iliopoulos [Wed, 28 Jun 2023 01:34:37 +0000 (03:34 +0200)]
docs: update ocfs2-devel mailing list address
The ocfs2-devel mailing list has been migrated to the kernel.org
infrastructure, update all related documentation pointers to reflect the
change.
Link: https://lkml.kernel.org/r/20230628013437.47030-3-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Anthony Iliopoulos [Wed, 28 Jun 2023 01:34:36 +0000 (03:34 +0200)]
MAINTAINERS: update ocfs2-devel mailing list address
The ocfs2-devel mailing list has been migrated to the kernel.org
infrastructure, update the related entry to reflect the change.
Link: https://lkml.kernel.org/r/20230628013437.47030-2-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Thu, 6 Jul 2023 01:14:00 +0000 (18:14 -0700)]
mm: disable CONFIG_PER_VMA_LOCK until its fixed
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Disable per-VMA locks config to
prevent this issue until the fix is confirmed. This is expected to be a
temporary measure.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/
20230227173632.
3292573-30-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-3-surenb@google.com
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Holger Hoffstätte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Thu, 6 Jul 2023 01:13:59 +0000 (18:13 -0700)]
fork: lock VMAs of the parent process when forking
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking while
forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork. I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Patch 2/2 disables per-VMA locks until the fix is tested and verified.
This patch (of 2):
When forking a child process, parent write-protects an anonymous page and
COW-shares it with the child being forked using copy_present_pte().
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to some
stale writable TLB entries targeting the wrong (old) page. Similar issue
happened in the past with userfaultfd (see flush_tlb_page() call inside
do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue. This
fix can potentially regress some fork-heavy workloads. Kernel build time
did not show noticeable regression on a 56-core machine while a stress
test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7%
regression. If such fork time regression is unacceptable, disabling
CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations
are possible if this regression proves to be problematic.
Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=3D217624
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Geoff Levand [Thu, 29 Jun 2023 23:32:44 +0000 (23:32 +0000)]
ntb.rst: Fix copy and paste error
It seems the text for the NTB MSI Test Client section was copied from the
NTB Tool Test Client, but was not updated for the new section. Corrects
the NTB MSI Test Client section text.
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Geoff Levand [Fri, 30 Jun 2023 21:58:46 +0000 (21:58 +0000)]
ntb_netdev: Fix module_init problem
With both the ntb_transport_init and the ntb_netdev_init_module routines in the
module_init init group, the ntb_netdev_init_module routine can be called before
the ntb_transport_init routine that it depends on is called. To assure the
proper initialization order put ntb_netdev_init_module in the late_initcall
group.
Fixes runtime errors where the ntb_netdev_init_module call fails with ENODEV.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cai Huoqing [Fri, 24 Mar 2023 01:32:20 +0000 (09:32 +0800)]
ntb: intel: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cai Huoqing [Fri, 24 Mar 2023 01:32:19 +0000 (09:32 +0800)]
ntb: epf: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cai Huoqing [Fri, 24 Mar 2023 01:32:18 +0000 (09:32 +0800)]
ntb_hw_amd: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Bjorn Helgaas [Tue, 7 Mar 2023 20:30:21 +0000 (14:30 -0600)]
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since
f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Palmer Dabbelt [Thu, 13 Oct 2022 21:46:38 +0000 (14:46 -0700)]
MAINTAINERS: git://github -> https://github.com for jonmason
Github deprecated the git:// links about a year ago, so let's move to
the https:// URLs instead.
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
ruanjinjie [Wed, 9 Nov 2022 09:28:52 +0000 (17:28 +0800)]
NTB: EPF: fix possible memory leak in pci_vntb_probe()
As ntb_register_device() don't handle error of device_register(),
if ntb_register_device() returns error in pci_vntb_probe(), name of kobject
which is allocated in dev_set_name() called in device_add() is leaked.
As comment of device_add() says, it should call put_device() to drop the
reference count that was set in device_initialize()
when it fails, so the name can be freed in kobject_cleanup().
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Jiasheng Jiang [Tue, 22 Nov 2022 03:32:44 +0000 (11:32 +0800)]
NTB: ntb_tool: Add check for devm_kcalloc
As the devm_kcalloc may return NULL pointer,
it should be better to add check for the return
value, as same as the others.
Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yang Yingliang [Thu, 10 Nov 2022 15:19:17 +0000 (23:19 +0800)]
NTB: ntb_transport: fix possible memory leak while device_register() fails
If device_register() returns error, the name allocated by
dev_set_name() need be freed. As comment of device_register()
says, it should use put_device() to give up the reference in
the error path. So fix this by calling put_device(), then the
name can be freed in kobject_cleanup(), and client_dev is freed
in ntb_transport_client_release().
Fixes: fce8a7bb5b4b ("PCI-Express Non-Transparent Bridge Support")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yuan Can [Sat, 5 Nov 2022 09:43:22 +0000 (09:43 +0000)]
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
A problem about ntb_hw_intel create debugfs failed is triggered with the
following log given:
[ 273.112733] Intel(R) PCI-E Non-Transparent Bridge Driver 2.0
[ 273.115342] debugfs: Directory 'ntb_hw_intel' with parent '/' already present!
The reason is that intel_ntb_pci_driver_init() returns
pci_register_driver() directly without checking its return value, if
pci_register_driver() failed, it returns without destroy the newly created
debugfs, resulting the debugfs of ntb_hw_intel can never be created later.
intel_ntb_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yuan Can [Sat, 5 Nov 2022 09:43:09 +0000 (09:43 +0000)]
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
A problem about ntb_hw_amd create debugfs failed is triggered with the
following log given:
[ 618.431232] AMD(R) PCI-E Non-Transparent Bridge Driver 1.0
[ 618.433284] debugfs: Directory 'ntb_hw_amd' with parent '/' already present!
The reason is that amd_ntb_pci_driver_init() returns pci_register_driver()
directly without checking its return value, if pci_register_driver()
failed, it returns without destroy the newly created debugfs, resulting
the debugfs of ntb_hw_amd can never be created later.
amd_ntb_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes: a1b3695820aa ("NTB: Add support for AMD PCI-Express Non-Transparent Bridge")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yuan Can [Sat, 5 Nov 2022 09:43:01 +0000 (09:43 +0000)]
ntb: idt: Fix error handling in idt_pci_driver_init()
A problem about ntb_hw_idt create debugfs failed is triggered with the
following log given:
[ 1236.637636] IDT PCI-E Non-Transparent Bridge Driver 2.0
[ 1236.639292] debugfs: Directory 'ntb_hw_idt' with parent '/' already present!
The reason is that idt_pci_driver_init() returns pci_register_driver()
directly without checking its return value, if pci_register_driver()
failed, it returns without destroy the newly created debugfs, resulting
the debugfs of ntb_hw_idt can never be created later.
idt_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes: bf2a952d31d2 ("NTB: Add IDT 89HPESxNTx PCIe-switches support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Darrick J. Wong [Fri, 7 Jul 2023 01:00:59 +0000 (18:00 -0700)]
xfs: fix uninit warning in xfs_growfs_data
Quiet down this gcc warning:
fs/xfs/xfs_fsops.c: In function ‘xfs_growfs_data’:
fs/xfs/xfs_fsops.c:219:21: error: ‘lastag_extended’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
219 | if (lastag_extended) {
| ^~~~~~~~~~~~~~~
fs/xfs/xfs_fsops.c:100:33: note: ‘lastag_extended’ was declared here
100 | bool lastag_extended;
| ^~~~~~~~~~~~~~~
By setting its value explicitly. From code analysis I don't think this
is a real problem, but I have better things to do than analyse this
closely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Linus Torvalds [Fri, 7 Jul 2023 22:59:33 +0000 (15:59 -0700)]
Merge tag 'mmc-v6.5-2' of git://git./linux/kernel/git/ulfh/mmc
Pull mmc fix from Ulf Hansson:
- Fix regression of detection of eMMC/SD/SDIO cards
* tag 'mmc-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: Revert "mmc: core: Allow mmc_start_host() synchronously detect a card"
Linus Torvalds [Fri, 7 Jul 2023 22:40:17 +0000 (15:40 -0700)]
Merge tag 'sound-fix-6.5-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes that have been gathered recently:
- Two code-typo fixes in the new UMP core
- A fix in jack reporting to avoid the usage of mutex
- A potential data race fix in HD-audio core regmap code
- A potential data race fix in PCM allocation helper code
- HD-audio quirks for ASUS, Clevo and Unis machines
- Constifications in FireWire drivers"
* tag 'sound-fix-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for ASUS ROG GZ301V
ALSA: jack: Fix mutex call in snd_jack_report()
ALSA: seq: ump: fix typo in system_2p_ev_to_ump_midi1()
ALSA: hda/realtek: Whitespace fix
ALSA: hda/realtek: Add quirk for ASUS ROG G614Jx
ALSA: hda/realtek: Amend G634 quirk to enable rear speakers
ALSA: hda/realtek: Add quirk for ASUS ROG GA402X
ALSA: hda/realtek: Add quirk for ASUS ROG GX650P
ALSA: pcm: Fix potential data race at PCM memory allocation helpers
ALSA: hda: fix a possible null-pointer dereference due to data race in snd_hdac_regmap_sync()
ALSA: hda/realtek: Add quirks for Unis H3C Desktop B760 & Q760
ALSA: hda/realtek: Add quirk for Clevo NPx0SNx
ALSA: ump: Correct wrong byte size at converting a UMP System message
ALSA: fireface: make read-only const array for model names static
ALSA: oxfw: make read-only const array models static
Linus Torvalds [Fri, 7 Jul 2023 22:07:20 +0000 (15:07 -0700)]
Merge tag 'ceph-for-6.5-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A bunch of CephFS fixups from Xiubo, mostly around dropping caps,
along with a fix for a regression in the readahead handling code which
sneaked in with the switch to netfs helpers"
* tag 'ceph-for-6.5-rc1' of https://github.com/ceph/ceph-client:
ceph: don't let check_caps skip sending responses for revoke msgs
ceph: issue a cap release immediately if no cap exists
ceph: trigger to flush the buffer when making snapshot
ceph: fix blindly expanding the readahead windows
ceph: add a dedicated private data for netfs rreq
ceph: voluntarily drop Xx caps for requests those touch parent mtime
ceph: try to dump the msgs when decoding fails
ceph: only send metrics when the MDS rank is ready
Linus Torvalds [Fri, 7 Jul 2023 21:59:38 +0000 (14:59 -0700)]
Merge tag 'ntfs3_for_6.5' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Updates:
- support /proc/fs/ntfs3/<dev>/volinfo and label
- alternative boot if primary boot is corrupted
- small optimizations
Fixes:
- fix endian problems
- fix logic errors
- code refactoring and reformatting"
* tag 'ntfs3_for_6.5' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Correct mode for label entry inside /proc/fs/ntfs3/
fs/ntfs3: Add support /proc/fs/ntfs3/<dev>/volinfo and /proc/fs/ntfs3/<dev>/label
fs/ntfs3: Fix endian problem
fs/ntfs3: Add ability to format new mft records with bigger/smaller header
fs/ntfs3: Code refactoring
fs/ntfs3: Code formatting
fs/ntfs3: Do not update primary boot in ntfs_init_from_boot()
fs/ntfs3: Alternative boot if primary boot is corrupted
fs/ntfs3: Mark ntfs dirty when on-disk struct is corrupted
fs/ntfs3: Fix ntfs_atomic_open
fs/ntfs3: Correct checking while generating attr_list
fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_load_attr_list()
fs: ntfs3: Fix possible null-pointer dereferences in mi_read()
fs/ntfs3: Return error for inconsistent extended attributes
fs/ntfs3: Enhance sanity check while generating attr_list
fs/ntfs3: Use wrapper i_blocksize() in ntfs_zero_range()
ntfs: Fix panic about slab-out-of-bounds caused by ntfs_listxattr()
Linus Torvalds [Fri, 7 Jul 2023 21:51:37 +0000 (14:51 -0700)]
Merge tag 'fsnotify_for_v6.5-rc2' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify fix from Jan Kara:
"A fix for fanotify to disallow creating of mount or superblock marks
for kernel internal pseudo filesystems"
* tag 'fsnotify_for_v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: disallow mount/sb marks on kernel internal pseudo fs
Linus Torvalds [Fri, 7 Jul 2023 17:07:19 +0000 (10:07 -0700)]
Merge tag 'riscv-for-linus-6.5-mw2' of git://git./linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- A bunch of fixes/cleanups from the first part of the merge window,
mostly related to ACPI and vector as those were large
- Some documentation improvements, mostly related to the new code
- The "riscv,isa" DT key is deprecated
- Support for link-time dead code elimination
- Support for minor fault registration in userfaultd
- A handful of cleanups around CMO alternatives
* tag 'riscv-for-linus-6.5-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (23 commits)
riscv: mm: mark noncoherent_supported as __ro_after_init
riscv: mm: mark CBO relate initialization funcs as __init
riscv: errata: thead: only set cbom size & noncoherent during boot
riscv: Select HAVE_ARCH_USERFAULTFD_MINOR
RISC-V: Document the ISA string parsing rules for ACPI
risc-v: Fix order of IPI enablement vs RCU startup
mm: riscv: fix an unsafe pte read in huge_pte_alloc()
dt-bindings: riscv: deprecate riscv,isa
RISC-V: drop error print from riscv_hartid_to_cpuid()
riscv: Discard vector state on syscalls
riscv: move memblock_allow_resize() after linear mapping is ready
riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle
riscv: vdso: include vdso/vsyscall.h for vdso_data
selftests: Test RISC-V Vector's first-use handler
riscv: vector: clear V-reg in the first-use trap
riscv: vector: only enable interrupts in the first-use trap
RISC-V: Fix up some vector state related build failures
RISC-V: Document that V registers are clobbered on syscalls
riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
...
Linus Torvalds [Fri, 7 Jul 2023 17:00:30 +0000 (10:00 -0700)]
Merge tag 'powerpc-6.5-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix PCIe MEM size for pci2 node on Turris 1.x boards
- Two minor build fixes
Thanks to Christophe Leroy, Douglas Anderson, Pali Rohár, Petr Mladek,
and Randy Dunlap.
* tag 'powerpc-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: dts: turris1x.dts: Fix PCIe MEM size for pci2 node
powerpc: Include asm/nmi.c in mobility.c for watchdog_hardlockup_set_timeout_pct()
powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
Linus Torvalds [Fri, 7 Jul 2023 16:55:31 +0000 (09:55 -0700)]
Merge tag 'apparmor-pr-2023-07-06' of git://git./linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
- fix missing error check for rhashtable_insert_fast
- add missing failure check in compute_xmatch_perms
- fix policy_compat permission remap with extended permissions
- fix profile verification and enable it
- fix kzalloc perms tables for shared dfas
- Fix kernel-doc header for verify_dfa_accept_index
- aa_buffer: Convert 1-element array to flexible array
- Return directly after a failed kzalloc() in two functions
- fix use of strcpy in policy_unpack_test
- fix kernel-doc complaints
- Fix some kernel-doc comments
* tag 'apparmor-pr-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix kernel-doc header for verify_dfa_accept_index
apparmor: fix: kzalloc perms tables for shared dfas
apparmor: fix profile verification and enable it
apparmor: fix policy_compat permission remap with extended permissions
apparmor: aa_buffer: Convert 1-element array to flexible array
apparmor: add missing failure check in compute_xmatch_perms
apparmor: fix missing error check for rhashtable_insert_fast
apparmor: Return directly after a failed kzalloc() in two functions
AppArmor: Fix some kernel-doc comments
apparmor: fix use of strcpy in policy_unpack_test
apparmor: fix kernel-doc complaints
Thomas Gleixner [Wed, 5 Jul 2023 08:59:23 +0000 (10:59 +0200)]
x86/smp: Don't send INIT to boot CPU
Parking CPUs in INIT works well, except for the crash case when the CPU
which invokes smp_park_other_cpus_in_init() is not the boot CPU. Sending
INIT to the boot CPU resets the whole machine.
Prevent this by validating that this runs on the boot CPU. If not fall back
and let CPUs hang in HLT.
Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/87ttui91jo.ffs@tglx
Thomas Bogendoerfer [Thu, 6 Jul 2023 16:36:10 +0000 (18:36 +0200)]
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
Commit
e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference") missed
converting one place accessing cop0 registers, which results in a build
error, if KVM_MIPS_DEBUG_COP0_COUNTERS is enabled.
Fixes: e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Luke D. Jones [Thu, 6 Jul 2023 22:33:23 +0000 (10:33 +1200)]
ALSA: hda/realtek: Add quirk for ASUS ROG GZ301V
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GZ301V series which uses an SPI connected Cirrus amp.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230706223323.30871-2-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Fri, 7 Jul 2023 05:42:54 +0000 (22:42 -0700)]
Merge tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Lots of fixes, mostly i915 and amdgpu. It's two weeks of i915, and I
think three weeks of amdgpu.
fbdev:
- Fix module infos on sparc
panel:
- Fix mode on Starry-ili9882t
i915:
- Allow DC states along with PW2 only for PWB functionality [adlp+]
- Fix SSC selection for MPLLA [mtl]
- Use hw.adjusted mode when calculating io/fast wake times [psr]
- Apply min softlimit correctly [guc/slpc]
- Assign correct hdcp content type [hdcp]
- Add missing forward declarations/includes to display power headers
- Fix BDW PSR AUX CH data register offsets [psr]
- Use mock device info for creating mock device
amdgpu:
- Misc cleanups
- GFX 9.4.3 fixes
- DEBUGFS build fix
- Fix LPDDR5 reporting
- ASPM fixes
- DCN 3.1.4 fixes
- DP MST fixes
- DCN 3.2.x fixes
- Display PSR TCON fixes
- SMU 13.x fixes
- RAS fixes
- Vega12/20 SMU fixes
- PSP flashing cleanup
- GFX9 MCBP fixes
- SR-IOV fixes
- GPUVM clear mappings fix for always valid BOs
- Add FAMS quirk for problematic monitor
- Fix possible UAF
- Better handle monentary temperature fluctuations
- SDMA 4.4.2 fixes
- Fencing fix"
* tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drm: (83 commits)
drm/i915: use mock device info for creating mock device
drm/i915/psr: Fix BDW PSR AUX CH data register offsets
drm/amdgpu: Fix potential fence use-after-free v2
drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation
drm/amd/pm: expose swctf threshold setting for legacy powerplay
drm/amd/display: 3.2.241
drm/amd/display: Take full update path if number of planes changed
drm/amd/display: Create debugging mechanism for Gaming FAMS
drm/amd/display: Add monitor specific edid quirk
drm/amd/display: For new fast update path, loop through each surface
drm/amd/display: Remove Phantom Pipe Check When Calculating K1 and K2
drm/amd/display: Limit new fast update path to addr and gamma / color
drm/amd/display: Fix the delta clamping for shaper LUT
drm/amdgpu: Keep non-psp path for partition switch
drm/amd/display: program DPP shaper and 3D LUT if updated
Revert "drm/amd/display: edp do not add non-edid timings"
drm/amdgpu: share drm device for pci amdgpu device with 1st partition device
drm/amd/pm: Add GFX v9.4.3 unique id to sysfs
drm/amd/pm: Enable pp_feature attribute
drm/amdgpu/vcn: Need to unpause dpg before stop dpg
...
Linus Torvalds [Fri, 7 Jul 2023 05:25:06 +0000 (22:25 -0700)]
Merge tag 'acpi-6.5-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These fix a couple of compiler warnings, refine an ACPI device
enumeration quirk to address a driver regression and clean up code.
Specifics:
- Make acpi_companion_match() return a const pointer and update its
callers accordingly (Andy Shevchenko)
- Move the extern declaration of the acpi_root variable to a header
file so as to address a compiler warning (Andy Shevchenko)
- Address compiler warnings in the ACPI device enumeration code by
adding a missing header file include to it (Ben Dooks)
- Refine the SMB0001 quirk in the ACPI device enumeration code so as
to address an i2c-scmi driver regression (Andy Shevchenko)
- Clean up two pieces of the ACPI device enumeration code (Andy
Shevchenko)"
* tag 'acpi-6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Use the acpi_match_acpi_device() helper
ACPI: platform: Move SMB0001 HID to the header and reuse
ACPI: platform: Ignore SMB0001 only when it has resources
ACPI: bus: Introduce acpi_match_acpi_device() helper
ACPI: scan: fix undeclared variable warnings by including sleep.h
ACPI: bus: Constify acpi_companion_match() returned value
ACPI: scan: Move acpi_root to internal header
Linus Torvalds [Fri, 7 Jul 2023 05:15:38 +0000 (22:15 -0700)]
Merge tag 'docs-6.5-2' of git://git.lwn.net/linux
Pull mode documentation updates from Jonathan Corbet:
"A half-dozen late arriving docs patches. They are mostly fixes, but we
also have a kernel-doc tweak for enums and the long-overdue removal of
the outdated and redundant patch-submission comments at the top of the
MAINTAINERS file"
* tag 'docs-6.5-2' of git://git.lwn.net/linux:
scripts: kernel-doc: support private / public marking for enums
Documentation: KVM: SEV: add a missing backtick
Documentation: ACPI: fix typo in ssdt-overlays.rst
Fix documentation of panic_on_warn
docs: remove the tips on how to submit patches from MAINTAINERS
docs: fix typo in zh_TW and zh_CN translation
Linus Torvalds [Fri, 7 Jul 2023 02:24:11 +0000 (19:24 -0700)]
Merge tag 'spi-fix-v6.5-merge-window' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few mostly minor fixes that came in during the merge window, plus
one administrative update for Jonas' e-mail address.
The spi-geni-qcom fix is more major than the others, fixing the newly
added DMA support for large reads which trigger DMA"
* tag 'spi-fix-v6.5-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: bcm{63xx,bca}-hsspi: update my email address
spi: rzv2m-csi: Fix SoC product name
spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
spi: spi-geni-qcom: enable SPI_CONTROLLER_MUST_TX for GPI DMA mode
Linus Torvalds [Fri, 7 Jul 2023 02:20:23 +0000 (19:20 -0700)]
Merge tag 'regulator-fix-v6.5-merge-window' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A simple dependency fix for a newly added driver"
* tag 'regulator-fix-v6.5-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: raa215300: Add build dependency with COMMON_CLK
Linus Torvalds [Fri, 7 Jul 2023 02:07:15 +0000 (19:07 -0700)]
Merge tag 'trace-v6.5-2' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix bad git merge of #endif in arm64 code
A merge of the arm64 tree caused #endif to go into the wrong place
- Fix crash on lseek of write access to tracefs/error_log
Opening error_log as write only, and then doing an lseek() causes a
kernel panic, because the lseek() handle expects a "seq_file" to
exist (which is not done on write only opens). Use tracing_lseek()
that tests for this instead of calling the default seq lseek handler.
- Check for negative instead of -E2BIG for error on strscpy() returns
Instead of testing for -E2BIG from strscpy(), to be more robust,
check for less than zero, which will make sure it catches any error
that strscpy() may someday return.
* tag 'trace-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/boot: Test strscpy() against less than zero for error
arm64: ftrace: fix build error with CONFIG_FUNCTION_GRAPH_TRACER=n
tracing: Fix null pointer dereference in tracing_err_log_open()
Linus Torvalds [Fri, 7 Jul 2023 02:01:38 +0000 (19:01 -0700)]
Merge tag 'v6.5/vfs.fixes.2' of git://git./linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains two minor fixes for Jan's rename locking work:
- Unlocking the source inode was guarded by a check whether source
was non-NULL. This doesn't make sense because source must be
non-NULL and the commit message explains in detail why
- The lock_two_nondirectories() helper called WARN_ON_ONCE() and
dereferenced the inodes unconditionally but the underlying
lock_two_inodes() helper and the kernel documentation for that
function are clear that it is valid to pass NULL arguments, so a
non-NULL check is needed. No callers currently pass NULL arguments
but let's not knowingly leave landmines around"
* tag 'v6.5/vfs.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: don't assume arguments are non-NULL
fs: no need to check source