Chao Yu [Mon, 21 Aug 2023 15:22:24 +0000 (23:22 +0800)]
f2fs: fix error path of f2fs_submit_page_read()
In error path of f2fs_submit_page_read(), it missed to call
iostat_update_and_unbind_ctx() and free bio_post_read_ctx, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 21 Aug 2023 15:22:23 +0000 (23:22 +0800)]
f2fs: clean up error handling in sanity_check_{compress_,}inode()
In sanity_check_{compress_,}inode(), it doesn't need to set SBI_NEED_FSCK
in each error case, instead, we can set the flag in do_read_inode() only
once when sanity_check_inode() fails.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 18 Aug 2023 18:34:32 +0000 (11:34 -0700)]
f2fs: avoid false alarm of circular locking
======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted
------------------------------------------------------
syz-executor273/5027 is trying to acquire lock:
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
but task is already holding lock:
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fi->i_xattr_sem){.+.+}-{3:3}:
down_read+0x9c/0x470 kernel/locking/rwsem.c:1520
f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532
__f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179
f2fs_acl_create fs/f2fs/acl.c:377 [inline]
f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420
f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558
f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740
f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
do_mkdirat+0x2a9/0x330 fs/namei.c:4140
__do_sys_mkdir fs/namei.c:4160 [inline]
__se_sys_mkdir fs/namei.c:4158 [inline]
__x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (&fi->i_sem){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
lock_acquire kernel/locking/lockdep.c:5761 [inline]
lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
down_write+0x93/0x200 kernel/locking/rwsem.c:1573
f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline]
ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146
ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309
ovl_make_workdir fs/overlayfs/super.c:711 [inline]
ovl_get_workdir fs/overlayfs/super.c:864 [inline]
ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400
vfs_get_super+0xf9/0x290 fs/super.c:1152
vfs_get_tree+0x88/0x350 fs/super.c:1519
do_new_mount fs/namespace.c:3335 [inline]
path_mount+0x1492/0x1ed0 fs/namespace.c:3662
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount fs/namespace.c:3861 [inline]
__x64_sys_mount+0x293/0x310 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(&fi->i_xattr_sem);
lock(&fi->i_sem);
lock(&fi->i_xattr_sem);
lock(&fi->i_sem);
Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com
Fixes:
5eda1ad1aaff "f2fs: fix deadlock in i_xattr_sem and inode page lock"
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sun, 30 Jul 2023 14:25:52 +0000 (22:25 +0800)]
Revert "f2fs: do not issue small discard commands during checkpoint"
Previously, we have two mechanisms to cache & submit small discards:
a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards,
and checkpoint will cache small discard candidates w/ configured maximum
number.
b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small
discard candidates w/ configured discard granularity, but w/o limitation
of number. FSTRIM interface is asynchronized, so it won't submit discard
directly.
Finally, discard thread will submit them in background periodically.
However, after commit
9ac00e7cef10 ("f2fs: do not issue small discard
commands during checkpoint"), the mechanism a) is broken, since no matter
how we configure the sysfs entry /sys/fs/f2fs/vdb/max_small_discards,
checkpoint will not cache small discard candidates any more.
echo 0 > /sys/fs/f2fs/vdb/max_small_discards
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
echo 100 > /sys/fs/f2fs/vdb/max_small_discards
rm /mnt/f2fs/file
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
Before the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 3 1 . . . . . .
After the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
This patch reverts commit
9ac00e7cef10 ("f2fs: do not issue small discard
commands during checkpoint") in order to fix this issue.
Fixes:
9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sun, 30 Jul 2023 14:25:51 +0000 (22:25 +0800)]
f2fs: doc: fix description of max_small_discards
The description of max_small_discards is out-of-update in below two
aspects, fix it.
- it is disabled by default
- small discards will be issued during checkpoint
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Zhiguo Niu [Thu, 10 Aug 2023 08:40:00 +0000 (16:40 +0800)]
f2fs: should update REQ_TIME for direct write
The sending interval of discard and GC should also
consider direct write requests; filesystem is not
idle if there is direct write.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 8 Aug 2023 00:59:49 +0000 (08:59 +0800)]
f2fs: fix to account cp stats correctly
cp_foreground_calls sysfs entry shows total CP call count rather than
foreground CP call count, fix it.
Fixes:
fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 8 Aug 2023 00:59:48 +0000 (08:59 +0800)]
f2fs: fix to account gc stats correctly
As reported, status debugfs entry shows inconsistent GC stats as below:
GC calls: 6008 (BG: 6161)
- data segments : 3053 (BG: 3053)
- node segments : 2955 (BG: 2955)
Total GC calls is larger than BGGC calls, the reason is:
- f2fs_stat_info.call_count accounts total migrated section count
by f2fs_gc()
- f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from
background gc_thread
Another issue is gc_foreground_calls sysfs entry shows total GC call
count rather than FGGC call count.
This patch changes as below for fix:
- account GC calls and migrated segment count separately
- support to account migrated section count if it enables large section
mode
- fix to show correct value in gc_foreground_calls sysfs entry
Fixes:
fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Jul 2023 13:50:46 +0000 (21:50 +0800)]
f2fs: remove unneeded check condition in __f2fs_setxattr()
It has checked return value of write_all_xattrs(), remove unneeded
following check condition.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 19 Jul 2023 13:50:45 +0000 (21:50 +0800)]
f2fs: fix to update i_ctime in __f2fs_setxattr()
generic/728 - output mismatch (see /media/fstests/results//generic/728.out.bad)
--- tests/generic/728.out 2023-07-19 07:10:48.
362711407 +0000
+++ /media/fstests/results//generic/728.out.bad 2023-07-19 08:39:57.
000000000 +0000
QA output created by 728
+Expected ctime to change after setxattr.
+Expected ctime to change after removexattr.
Silence is golden
...
(Run 'diff -u /media/fstests/tests/generic/728.out /media/fstests/results//generic/728.out.bad' to see the entire diff)
generic/729 1s
It needs to update i_ctime after {set,remove}xattr, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 20 Jul 2023 11:29:53 +0000 (19:29 +0800)]
Revert "f2fs: fix to do sanity check on extent cache correctly"
syzbot reports a f2fs bug as below:
UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3275:19
index 1409 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
inline_data_addr fs/f2fs/f2fs.h:3275 [inline]
__recover_inline_status fs/f2fs/inode.c:113 [inline]
do_read_inode fs/f2fs/inode.c:480 [inline]
f2fs_iget+0x4730/0x48b0 fs/f2fs/inode.c:604
f2fs_fill_super+0x640e/0x80c0 fs/f2fs/super.c:4601
mount_bdev+0x276/0x3b0 fs/super.c:1391
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue was bisected to:
commit
d48a7b3a72f121655d95b5157c32c7d555e44c05
Author: Chao Yu <chao@kernel.org>
Date: Mon Jan 9 03:49:20 2023 +0000
f2fs: fix to do sanity check on extent cache correctly
The root cause is we applied both v1 and v2 of the patch, v2 is the right
fix, so it needs to revert v1 in order to fix reported issue.
v1:
commit
d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/
20230109034920.492914-1-chao@kernel.org/
v2:
commit
269d11948100 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/
20230207134808.1827869-1-chao@kernel.org/
Reported-by: syzbot+601018296973a481f302@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/
000000000000fcf0690600e4d04d@google.com/
Fixes:
d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Minjie Du [Mon, 17 Jul 2023 07:11:09 +0000 (15:11 +0800)]
f2fs: increase usage of folio_next_index() helper
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
Signed-off-by: Minjie Du <duminjie@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chunhai Guo [Thu, 3 Aug 2023 14:28:42 +0000 (22:28 +0800)]
f2fs: Only lfs mode is allowed with zoned block device feature
Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.
Fixes:
6691d940b0e0 ("f2fs: introduce fragment allocation mode mount option")
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Shin'ichiro Kawasaki [Fri, 4 Aug 2023 09:15:56 +0000 (18:15 +0900)]
f2fs: check zone type before sending async reset zone command
The commit
25f9080576b9 ("f2fs: add async reset zone command support")
introduced "async reset zone commands" by calling
__submit_zone_reset_cmd() in async discard operations. However,
__submit_zone_reset_cmd() is called regardless of zone type of discard
target zone. When devices have conventional zones, zone reset commands
are sent to the conventional zones and cause I/O errors.
Avoid the I/O errors by checking that the discard target zone type is
sequential write required. If not, handle the discard operation in same
manner as non-zoned, regular block devices. For that purpose, add a new
helper function f2fs_bdev_index() which gets index of the zone reset
target device.
Fixes:
25f9080576b9 ("f2fs: add async reset zone command support")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 11 Jul 2023 20:08:06 +0000 (04:08 +0800)]
f2fs: compress: don't {,de}compress non-full cluster
f2fs won't compress non-full cluster in tail of file, let's skip
dirtying and rewrite such cluster during f2fs_ioc_{,de}compress_file.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 11 Jul 2023 20:08:05 +0000 (04:08 +0800)]
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
This patch allows f2fs_ioc_{,de}compress_file() to be interrupted, so that,
userspace won't be blocked when manual {,de}compression on large file is
interrupted by signal.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Christoph Hellwig [Fri, 7 Jul 2023 08:31:49 +0000 (10:31 +0200)]
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs_scan_devices reopens the main device since the very beginning, which
has always been useless, and also means that we don't pass the right
holder for the reopen, which now leads to a warning as the core super.c
holder ops aren't passed in for the reopen.
Fixes:
3c62be17d4f5 ("f2fs: support multiple devices")
Fixes:
0718afd47f70 ("block: introduce holder ops")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 6 Jul 2023 02:06:14 +0000 (10:06 +0800)]
f2fs: fix to avoid mmap vs set_compress_option case
Compression option in inode should not be changed after they have
been used, however, it may happen in below race case:
Thread A Thread B
- f2fs_ioc_set_compress_option
- check f2fs_is_mmap_file()
- check get_dirty_pages()
- check F2FS_HAS_BLOCKS()
- f2fs_file_mmap
- set_inode_flag(FI_MMAP_FILE)
- fault
- do_page_mkwrite
- f2fs_vm_page_mkwrite
- f2fs_get_block_locked
- fault_dirty_shared_page
- set_page_dirty
- update i_compress_algorithm
- update i_log_cluster_size
- update i_cluster_size
Avoid such race condition by covering f2fs_file_mmap() w/ i_sem lock,
meanwhile add mmap file check condition in f2fs_may_compress() as well.
Fixes:
e1e8debec656 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Randy Dunlap [Mon, 10 Jul 2023 05:23:24 +0000 (22:23 -0700)]
f2fs: fix spelling in ABI documentation
Correct spelling problems as identified by codespell.
Fixes:
9e615dbba41e ("f2fs: add missing description for ipu_policy node")
Fixes:
b2e4a2b300e5 ("f2fs: expose discard related parameters in sysfs")
Fixes:
846ae671ad36 ("f2fs: expose extension_list sysfs entry")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: Yangtao Li <frank.li@vivo.com>
Cc: Konstantin Vyshetsky <vkon@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 19 Jan 2023 18:47:00 +0000 (10:47 -0800)]
f2fs: get out of a repeat loop when getting a locked data page
https://bugzilla.kernel.org/show_bug.cgi?id=216050
Somehow we're getting a page which has a different mapping.
Let's avoid the infinite loop.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 7 Jul 2023 14:03:13 +0000 (07:03 -0700)]
f2fs: flush inode if atomic file is aborted
Let's flush the inode being aborted atomic operation to avoid stale dirty
inode during eviction in this call stack:
f2fs_mark_inode_dirty_sync+0x22/0x40 [f2fs]
f2fs_abort_atomic_write+0xc4/0xf0 [f2fs]
f2fs_evict_inode+0x3f/0x690 [f2fs]
? sugov_start+0x140/0x140
evict+0xc3/0x1c0
evict_inodes+0x17b/0x210
generic_shutdown_super+0x32/0x120
kill_block_super+0x21/0x50
deactivate_locked_super+0x31/0x90
cleanup_mnt+0x100/0x160
task_work_run+0x59/0x90
do_exit+0x33b/0xa50
do_group_exit+0x2d/0x80
__x64_sys_exit_group+0x14/0x20
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
This triggers f2fs_bug_on() in f2fs_evict_inode:
f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
This fixes the syzbot report:
loop0: detected capacity change from 0 to 131072
F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): Found nat_bits in checkpoint
F2FS-fs (loop0): Mounted with checkpoint version =
48b305e4
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:869!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 5014 Comm: syz-executor220 Not tainted 6.4.0-syzkaller-11479-g6cd06ab12d1a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869
Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc
RSP: 0018:
ffffc90003a6fa00 EFLAGS:
00010293
RAX:
0000000000000000 RBX:
0000000000000001 RCX:
0000000000000000
RDX:
ffff8880273b8000 RSI:
ffffffff83a2bd0d RDI:
0000000000000007
RBP:
ffff888077db91b0 R08:
0000000000000007 R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000001 R12:
ffff888029a3c000
R13:
ffff888077db9660 R14:
ffff888029a3c0b8 R15:
ffff888077db9c50
FS:
0000000000000000(0000) GS:
ffff8880b9800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f1909bb9000 CR3:
00000000276a9000 CR4:
0000000000350ef0
Call Trace:
<TASK>
evict+0x2ed/0x6b0 fs/inode.c:665
dispose_list+0x117/0x1e0 fs/inode.c:698
evict_inodes+0x345/0x440 fs/inode.c:748
generic_shutdown_super+0xaf/0x480 fs/super.c:478
kill_block_super+0x64/0xb0 fs/super.c:1417
kill_f2fs_super+0x2af/0x3c0 fs/f2fs/super.c:4704
deactivate_locked_super+0x98/0x160 fs/super.c:330
deactivate_super+0xb1/0xd0 fs/super.c:361
cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1254
task_work_run+0x16f/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xa9a/0x29a0 kernel/exit.c:874
do_group_exit+0xd4/0x2a0 kernel/exit.c:1024
__do_sys_exit_group kernel/exit.c:1035 [inline]
__se_sys_exit_group kernel/exit.c:1033 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f309be71a09
Code: Unable to access opcode bytes at 0x7f309be719df.
RSP: 002b:
00007fff171df518 EFLAGS:
00000246 ORIG_RAX:
00000000000000e7
RAX:
ffffffffffffffda RBX:
00007f309bef7330 RCX:
00007f309be71a09
RDX:
000000000000003c RSI:
00000000000000e7 RDI:
0000000000000001
RBP:
0000000000000001 R08:
ffffffffffffffc0 R09:
00007f309bef1e40
R10:
0000000000010600 R11:
0000000000000246 R12:
00007f309bef7330
R13:
0000000000000001 R14:
0000000000000000 R15:
0000000000000001
</TASK>
Modules linked in:
---[ end trace
0000000000000000 ]---
RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869
Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc
RSP: 0018:
ffffc90003a6fa00 EFLAGS:
00010293
RAX:
0000000000000000 RBX:
0000000000000001 RCX:
0000000000000000
RDX:
ffff8880273b8000 RSI:
ffffffff83a2bd0d RDI:
0000000000000007
RBP:
ffff888077db91b0 R08:
0000000000000007 R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000001 R12:
ffff888029a3c000
R13:
ffff888077db9660 R14:
ffff888029a3c0b8 R15:
ffff888077db9c50
FS:
0000000000000000(0000) GS:
ffff8880b9800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f1909bb9000 CR3:
00000000276a9000 CR4:
0000000000350ef0
Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e1246909d526a9d470fa@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 10 Jul 2023 06:10:58 +0000 (14:10 +0800)]
f2fs: don't handle error case of f2fs_compress_alloc_page()
f2fs_compress_alloc_page() uses mempool to allocate memory, it never
fail, don't handle error case in its callers.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 4 Aug 2023 19:15:34 +0000 (12:15 -0700)]
Revert "f2fs: clean up w/ sbi->log_sectors_per_block"
This reverts commit
bfd476623999118d9c509cb0fa9380f2912bc225.
Shinichiro Kawasaki reported:
When I ran workloads on f2fs using v6.5-rcX with fixes [1][2] and a zoned block
devices with 4kb logical block size, I observe mount failure as follows. When
I revert this commit, the failure goes away.
[ 167.781975][ T1555] F2FS-fs (dm-0): IO Block Size: 4 KB
[ 167.890728][ T1555] F2FS-fs (dm-0): Found nat_bits in checkpoint
[ 171.482588][ T1555] F2FS-fs (dm-0): Zone without valid block has non-zero write pointer. Reset the write pointer: wp[0x1300,0x8]
[ 171.496000][ T1555] F2FS-fs (dm-0): (0) : Unaligned zone reset attempted (block 280000 + 80000)
[ 171.505037][ T1555] F2FS-fs (dm-0): Discard zone failed: (errno=-5)
The patch replaced "sbi->log_blocksize - SECTOR_SHIFT" with
"sbi->log_sectors_per_block". However, I think these two are not equal when the
device has 4k logical block size. The former uses Linux kernel sector size 512
byte. The latter use 512b sector size or 4kb sector size depending on the
device. mkfs.f2fs obtains logical block size via BLKSSZGET ioctl from the device
and reflects it to the value sbi->log_sector_size_per_block. This causes
unexpected write pointer calculations in check_zone_write_pointer(). This
resulted in unexpected zone reset and the mount failure.
[1] https://lkml.kernel.org/linux-f2fs-devel/
20230711050101.GA19128@lst.de/
[2] https://lore.kernel.org/linux-f2fs-devel/
20230804091556.2372567-1-shinichiro.kawasaki@wdc.com/
Cc: stable@vger.kernel.org
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes:
bfd476623999 ("f2fs: clean up w/ sbi->log_sectors_per_block")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Sun, 9 Jul 2023 20:53:13 +0000 (13:53 -0700)]
Linux 6.5-rc1
Linus Torvalds [Sun, 9 Jul 2023 17:29:53 +0000 (10:29 -0700)]
MAINTAINERS 2: Electric Boogaloo
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats:
80e62bc8487b ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 9 Jul 2023 17:24:22 +0000 (10:24 -0700)]
Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
Linus Torvalds [Sun, 9 Jul 2023 17:16:04 +0000 (10:16 -0700)]
Merge tag 'irq_urgent_for_v6.5_rc1' of git://git./linux/kernel/git/tip/tip
Pull irq update from Borislav Petkov:
- Optimize IRQ domain's name assignment
* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Use return value of strreplace()
Linus Torvalds [Sun, 9 Jul 2023 17:13:32 +0000 (10:13 -0700)]
Merge tag 'x86_urgent_for_v6.5_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
Linus Torvalds [Sun, 9 Jul 2023 17:08:38 +0000 (10:08 -0700)]
Merge tag 'x86-core-2023-07-09' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
Linus Torvalds [Sun, 9 Jul 2023 17:02:49 +0000 (10:02 -0700)]
Merge tag 'mips_6.5_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
Linus Torvalds [Sun, 9 Jul 2023 16:50:42 +0000 (09:50 -0700)]
Merge tag 'xfs-6.5-merge-6' of git://git./fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Nothing exciting here, just getting rid of a gcc warning that I got
tired of seeing when I turn on gcov"
* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix uninit warning in xfs_growfs_data
Linus Torvalds [Sun, 9 Jul 2023 16:45:32 +0000 (09:45 -0700)]
Merge tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Linus Torvalds [Sun, 9 Jul 2023 16:35:51 +0000 (09:35 -0700)]
Merge tag 'ntb-6.5' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Fixes for pci_clean_master, error handling in driver inits, and
various other issues/bugs"
* tag 'ntb-6.5' of https://github.com/jonmason/ntb:
ntb: hw: amd: Fix debugfs_create_dir error checking
ntb.rst: Fix copy and paste error
ntb_netdev: Fix module_init problem
ntb: intel: Remove redundant pci_clear_master
ntb: epf: Remove redundant pci_clear_master
ntb_hw_amd: Remove redundant pci_clear_master
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
MAINTAINERS: git://github -> https://github.com for jonmason
NTB: EPF: fix possible memory leak in pci_vntb_probe()
NTB: ntb_tool: Add check for devm_kcalloc
NTB: ntb_transport: fix possible memory leak while device_register() fails
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
ntb: idt: Fix error handling in idt_pci_driver_init()
Hugh Dickins [Sat, 8 Jul 2023 23:04:00 +0000 (16:04 -0700)]
mm: lock newly mapped VMA with corrected ordering
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes:
33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 8 Jul 2023 21:30:25 +0000 (14:30 -0700)]
Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"16 hotfixes. Six are cc:stable and the remainder address post-6.4
issues"
The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.
* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: dhry: fix sleeping allocations inside non-preemptable section
kasan, slub: fix HW_TAGS zeroing with slub_debug
kasan: fix type cast in memory_is_poisoned_n
mailmap: add entries for Heiko Stuebner
mailmap: update manpage link
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
MAINTAINERS: add linux-next info
mailmap: add Markus Schneider-Pargmann
writeback: account the number of pages written back
mm: call arch_swap_restore() from do_swap_page()
squashfs: fix cache race with migration
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
docs: update ocfs2-devel mailing list address
MAINTAINERS: update ocfs2-devel mailing list address
mm: disable CONFIG_PER_VMA_LOCK until its fixed
fork: lock VMAs of the parent process when forking
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:12 +0000 (12:12 -0700)]
fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/
dbdef34c-3a07-5951-e1ae-
e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/
b198d649-f4bf-b971-31d0-
e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes:
0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:11 +0000 (12:12 -0700)]
mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suren Baghdasaryan [Sat, 8 Jul 2023 19:12:10 +0000 (12:12 -0700)]
mm: lock a vma before stack expansion
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 8 Jul 2023 19:35:18 +0000 (12:35 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"A few late arriving patches that missed the initial pull request. It's
mostly bug fixes (the dt-bindings is a fix for the initial pull)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove unused function declaration
scsi: target: docs: Remove tcm_mod_builder.py
scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
scsi: dt-bindings: ufs: qcom: Fix ICE phandle
scsi: core: Simplify scsi_cdl_check_cmd()
scsi: isci: Fix comment typo
scsi: smartpqi: Replace one-element arrays with flexible-array members
scsi: target: tcmu: Replace strlcpy() with strscpy()
scsi: ncr53c8xx: Replace strlcpy() with strscpy()
scsi: lpfc: Fix lpfc_name struct packing
Linus Torvalds [Sat, 8 Jul 2023 19:28:00 +0000 (12:28 -0700)]
Merge tag 'i2c-for-6.5-rc1-part2' of git://git./linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- xiic patch should have been in the original pull but slipped through
- mpc patch fixes a build regression
- nomadik cleanup
* tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mpc: Drop unused variable
i2c: nomadik: Remove a useless call in the remove function
i2c: xiic: Don't try to handle more interrupt events after error
Linus Torvalds [Sat, 8 Jul 2023 19:08:39 +0000 (12:08 -0700)]
Merge tag 'hardening-v6.5-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Check for NULL bdev in LoadPin (Matthias Kaehlcke)
- Revert unwanted KUnit FORTIFY build default
- Fix 1-element array causing boot warnings with xhci-hub
* tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
Revert "fortify: Allow KUnit test to build without FORTIFY"
dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
Anup Sharma [Fri, 12 May 2023 20:24:34 +0000 (01:54 +0530)]
ntb: hw: amd: Fix debugfs_create_dir error checking
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Linus Torvalds [Sat, 8 Jul 2023 17:21:51 +0000 (10:21 -0700)]
Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git./linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
Linus Torvalds [Sat, 8 Jul 2023 17:02:24 +0000 (10:02 -0700)]
Merge tag 'bitmap-6.5-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
"Fixes for different bitmap pieces:
- lib/test_bitmap: increment failure counter properly
The tests that don't use expect_eq() macro to determine that a test
is failured must increment failed_tests explicitly.
- lib/bitmap: drop optimization of bitmap_{from,to}_arr64
bitmap_{from,to}_arr64() optimization is overly optimistic
on 32-bit LE architectures when it's wired to
bitmap_copy_clear_tail().
- nodemask: Drop duplicate check in for_each_node_mask()
As the return value type of first_node() became unsigned, the node
>= 0 became unnecessary.
- cpumask: fix function description kernel-doc notation
- MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record
Add linux/bits.h and linux/bitfield.h for visibility"
* tag 'bitmap-6.5-rc1' of https://github.com/norov/linux:
MAINTAINERS: Add bitfield.h to the BITMAP API record
MAINTAINERS: Add bits.h to the BITMAP API record
cpumask: fix function description kernel-doc notation
nodemask: Drop duplicate check in for_each_node_mask()
lib/bitmap: drop optimization of bitmap_{from,to}_arr64
lib/test_bitmap: increment failure counter properly
Geert Uytterhoeven [Wed, 5 Jul 2023 14:54:04 +0000 (16:54 +0200)]
lib: dhry: fix sleeping allocations inside non-preemptable section
The Smatch static checker reports the following warnings:
lib/dhry_run.c:38 dhry_benchmark() warn: sleeping in atomic context
lib/dhry_run.c:43 dhry_benchmark() warn: sleeping in atomic context
Indeed, dhry() does sleeping allocations inside the non-preemptable
section delimited by get_cpu()/put_cpu().
Fix this by using atomic allocations instead.
Add error handling, as atomic these allocations may fail.
Link: https://lkml.kernel.org/r/bac6d517818a7cd8efe217c1ad649fffab9cc371.1688568764.git.geert+renesas@glider.be
Fixes:
13684e966d46283e ("lib: dhry: fix unstable smp_processor_id(_) usage")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/
0469eb3a-02eb-4b41-b189-
de20b931fa56@moroto.mountain
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Wed, 5 Jul 2023 12:44:02 +0000 (14:44 +0200)]
kasan, slub: fix HW_TAGS zeroing with slub_debug
Commit
946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated
kmalloc space than requested") added precise kmalloc redzone poisoning to
the slub_debug functionality.
However, this commit didn't account for HW_TAGS KASAN fully initializing
the object via its built-in memory initialization feature. Even though
HW_TAGS KASAN memory initialization contains special memory initialization
handling for when slub_debug is enabled, it does not account for in-object
slub_debug redzones. As a result, HW_TAGS KASAN can overwrite these
redzones and cause false-positive slub_debug reports.
To fix the issue, avoid HW_TAGS KASAN memory initialization when
slub_debug is enabled altogether. Implement this by moving the
__slub_debug_enabled check to slab_post_alloc_hook. Common slab code
seems like a more appropriate place for a slub_debug check anyway.
Link: https://lkml.kernel.org/r/678ac92ab790dba9198f9ca14f405651b97c8502.1688561016.git.andreyknvl@google.com
Fixes:
946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated kmalloc space than requested")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: kasan-dev@googlegroups.com
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Tue, 4 Jul 2023 00:52:05 +0000 (02:52 +0200)]
kasan: fix type cast in memory_is_poisoned_n
Commit
bb6e04a173f0 ("kasan: use internal prototypes matching gcc-13
builtins") introduced a bug into the memory_is_poisoned_n implementation:
it effectively removed the cast to a signed integer type after applying
KASAN_GRANULE_MASK.
As a result, KASAN started failing to properly check memset, memcpy, and
other similar functions.
Fix the bug by adding the cast back (through an additional signed integer
variable to make the code more readable).
Link: https://lkml.kernel.org/r/8c9e0251c2b8b81016255709d4ec42942dcaf018.1688431866.git.andreyknvl@google.com
Fixes:
bb6e04a173f0 ("kasan: use internal prototypes matching gcc-13 builtins")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Stuebner [Tue, 4 Jul 2023 16:39:19 +0000 (18:39 +0200)]
mailmap: add entries for Heiko Stuebner
I am going to lose my vrull.eu address at the end of july, and while
adding it to mailmap I also realised that there are more old addresses
from me dangling, so update .mailmap for all of them.
Link: https://lkml.kernel.org/r/20230704163919.1136784-3-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Stuebner [Tue, 4 Jul 2023 16:39:18 +0000 (18:39 +0200)]
mailmap: update manpage link
Patch series "Update .mailmap for my work address and fix manpage".
While updating mailmap for the going-away address, I also found that on
current systems the manpage linked from the header comment changed.
And in fact it looks like the git mailmap feature got its own manpage.
This patch (of 2):
On recent systems the git-shortlog manpage only tells people to
See gitmailmap(5)
So instead of sending people on a scavenger hunt, put that info into the
header directly. Though keep the old reference around for older systems.
Link: https://lkml.kernel.org/r/20230704163919.1136784-1-heiko@sntech.de
Link: https://lkml.kernel.org/r/20230704163919.1136784-2-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liu Shixin [Tue, 4 Jul 2023 10:19:42 +0000 (18:19 +0800)]
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
commit
dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in
put_page_bootmem") fix an overlaps existing problem of kmemleak. But the
problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in
this case, free_bootmem_page() will call free_reserved_page() directly.
Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when
HAVE_BOOTMEM_INFO_NODE is disabled.
Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com
Fixes:
f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Tue, 4 Jul 2023 05:44:10 +0000 (22:44 -0700)]
MAINTAINERS: add linux-next info
Add linux-next info to MAINTAINERS for ease of finding this data.
Link: https://lkml.kernel.org/r/20230704054410.12527-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Markus Schneider-Pargmann [Wed, 28 Jun 2023 08:13:41 +0000 (10:13 +0200)]
mailmap: add Markus Schneider-Pargmann
Add my old mail address and update my name.
Link: https://lkml.kernel.org/r/20230628081341.3470229-1-msp@baylibre.com
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 28 Jun 2023 18:55:48 +0000 (19:55 +0100)]
writeback: account the number of pages written back
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.
Dave added:
: XFS is the only filesystem this would affect, right? AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
:
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.
Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
Fixes:
793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Collingbourne [Tue, 23 May 2023 00:43:08 +0000 (17:43 -0700)]
mm: call arch_swap_restore() from do_swap_page()
Commit
c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") moved
the call to swap_free() before the call to set_pte_at(), which meant that
the MTE tags could end up being freed before set_pte_at() had a chance to
restore them. Fix it by adding a call to the arch_swap_restore() hook
before the call to swap_free().
Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965
Fixes:
c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
Closes: https://lore.kernel.org/all/
5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vincent Whitchurch [Thu, 29 Jun 2023 14:17:57 +0000 (16:17 +0200)]
squashfs: fix cache race with migration
Migration replaces the page in the mapping before copying the contents and
the flags over from the old page, so check that the page in the page cache
is really up to date before using it. Without this, stressing squashfs
reads with parallel compaction sometimes results in squashfs reporting
data corruption.
Link: https://lkml.kernel.org/r/20230629-squashfs-cache-migration-v1-1-d50ebe55099d@axis.com
Fixes:
e994f5b677ee ("squashfs: cache partial compressed blocks")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
John Hubbard [Sat, 1 Jul 2023 01:04:42 +0000 (18:04 -0700)]
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
The following crash happens for me when running the -mm selftests (below).
Specifically, it happens while running the uffd-stress subtests:
kernel BUG at mm/hugetlb.c:7249!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
RIP: 0010:huge_pte_alloc+0x12c/0x1a0
...
Call Trace:
<TASK>
? __die_body+0x63/0xb0
? die+0x9f/0xc0
? do_trap+0xab/0x180
? huge_pte_alloc+0x12c/0x1a0
? do_error_trap+0xc6/0x110
? huge_pte_alloc+0x12c/0x1a0
? handle_invalid_op+0x2c/0x40
? huge_pte_alloc+0x12c/0x1a0
? exc_invalid_op+0x33/0x50
? asm_exc_invalid_op+0x16/0x20
? __pfx_put_prev_task_idle+0x10/0x10
? huge_pte_alloc+0x12c/0x1a0
hugetlb_fault+0x1a3/0x1120
? finish_task_switch+0xb3/0x2a0
? lock_is_held_type+0xdb/0x150
handle_mm_fault+0xb8a/0xd40
? find_vma+0x5d/0xa0
do_user_addr_fault+0x257/0x5d0
exc_page_fault+0x7b/0x1f0
asm_exc_page_fault+0x22/0x30
That happens because a BUG() statement in huge_pte_alloc() attempts to
check that a pte, if present, is a hugetlb pte, but it does so in a
non-lockless-safe manner that leads to a false BUG() report.
We got here due to a couple of bugs, each of which by itself was not quite
enough to cause a problem:
First of all, before commit
c33c794828f2("mm: ptep_get() conversion"), the
BUG() statement in huge_pte_alloc() was itself fragile: it relied upon
compiler behavior to only read the pte once, despite using it twice in the
same conditional.
Next, commit
c33c794828f2 ("mm: ptep_get() conversion") broke that
delicate situation, by causing all direct pte reads to be done via
READ_ONCE(). And so READ_ONCE() got called twice within the same BUG()
conditional, leading to comparing (potentially, occasionally) different
versions of the pte, and thus to false BUG() reports.
Fix this by taking a single snapshot of the pte before using it in the
BUG conditional.
Now, that commit is only partially to blame here but, people doing
bisections will invariably land there, so this will help them find a fix
for a real crash. And also, the previous behavior was unlikely to ever
expose this bug--it was fragile, yet not actually broken.
So that's why I chose this commit for the Fixes tag, rather than the
commit that created the original BUG() statement.
Link: https://lkml.kernel.org/r/20230701010442.2041858-1-jhubbard@nvidia.com
Fixes:
c33c794828f2 ("mm: ptep_get() conversion")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: James Houghton <jthoughton@google.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Anthony Iliopoulos [Wed, 28 Jun 2023 01:34:37 +0000 (03:34 +0200)]
docs: update ocfs2-devel mailing list address
The ocfs2-devel mailing list has been migrated to the kernel.org
infrastructure, update all related documentation pointers to reflect the
change.
Link: https://lkml.kernel.org/r/20230628013437.47030-3-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Anthony Iliopoulos [Wed, 28 Jun 2023 01:34:36 +0000 (03:34 +0200)]
MAINTAINERS: update ocfs2-devel mailing list address
The ocfs2-devel mailing list has been migrated to the kernel.org
infrastructure, update the related entry to reflect the change.
Link: https://lkml.kernel.org/r/20230628013437.47030-2-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Thu, 6 Jul 2023 01:14:00 +0000 (18:14 -0700)]
mm: disable CONFIG_PER_VMA_LOCK until its fixed
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Disable per-VMA locks config to
prevent this issue until the fix is confirmed. This is expected to be a
temporary measure.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/
20230227173632.3292573-30-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-3-surenb@google.com
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/
dbdef34c-3a07-5951-e1ae-
e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes:
0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Holger Hoffstätte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Suren Baghdasaryan [Thu, 6 Jul 2023 01:13:59 +0000 (18:13 -0700)]
fork: lock VMAs of the parent process when forking
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking while
forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork. I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Patch 2/2 disables per-VMA locks until the fix is tested and verified.
This patch (of 2):
When forking a child process, parent write-protects an anonymous page and
COW-shares it with the child being forked using copy_present_pte().
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to some
stale writable TLB entries targeting the wrong (old) page. Similar issue
happened in the past with userfaultfd (see flush_tlb_page() call inside
do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue. This
fix can potentially regress some fork-heavy workloads. Kernel build time
did not show noticeable regression on a 56-core machine while a stress
test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7%
regression. If such fork time regression is unacceptable, disabling
CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations
are possible if this regression proves to be problematic.
Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com
Fixes:
0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/
dbdef34c-3a07-5951-e1ae-
e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/
b198d649-f4bf-b971-31d0-
e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=
3D217624
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Geoff Levand [Thu, 29 Jun 2023 23:32:44 +0000 (23:32 +0000)]
ntb.rst: Fix copy and paste error
It seems the text for the NTB MSI Test Client section was copied from the
NTB Tool Test Client, but was not updated for the new section. Corrects
the NTB MSI Test Client section text.
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Geoff Levand [Fri, 30 Jun 2023 21:58:46 +0000 (21:58 +0000)]
ntb_netdev: Fix module_init problem
With both the ntb_transport_init and the ntb_netdev_init_module routines in the
module_init init group, the ntb_netdev_init_module routine can be called before
the ntb_transport_init routine that it depends on is called. To assure the
proper initialization order put ntb_netdev_init_module in the late_initcall
group.
Fixes runtime errors where the ntb_netdev_init_module call fails with ENODEV.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cai Huoqing [Fri, 24 Mar 2023 01:32:20 +0000 (09:32 +0800)]
ntb: intel: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cai Huoqing [Fri, 24 Mar 2023 01:32:19 +0000 (09:32 +0800)]
ntb: epf: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cai Huoqing [Fri, 24 Mar 2023 01:32:18 +0000 (09:32 +0800)]
ntb_hw_amd: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Bjorn Helgaas [Tue, 7 Mar 2023 20:30:21 +0000 (14:30 -0600)]
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since
f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Palmer Dabbelt [Thu, 13 Oct 2022 21:46:38 +0000 (14:46 -0700)]
MAINTAINERS: git://github -> https://github.com for jonmason
Github deprecated the git:// links about a year ago, so let's move to
the https:// URLs instead.
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
ruanjinjie [Wed, 9 Nov 2022 09:28:52 +0000 (17:28 +0800)]
NTB: EPF: fix possible memory leak in pci_vntb_probe()
As ntb_register_device() don't handle error of device_register(),
if ntb_register_device() returns error in pci_vntb_probe(), name of kobject
which is allocated in dev_set_name() called in device_add() is leaked.
As comment of device_add() says, it should call put_device() to drop the
reference count that was set in device_initialize()
when it fails, so the name can be freed in kobject_cleanup().
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Jiasheng Jiang [Tue, 22 Nov 2022 03:32:44 +0000 (11:32 +0800)]
NTB: ntb_tool: Add check for devm_kcalloc
As the devm_kcalloc may return NULL pointer,
it should be better to add check for the return
value, as same as the others.
Fixes:
7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yang Yingliang [Thu, 10 Nov 2022 15:19:17 +0000 (23:19 +0800)]
NTB: ntb_transport: fix possible memory leak while device_register() fails
If device_register() returns error, the name allocated by
dev_set_name() need be freed. As comment of device_register()
says, it should use put_device() to give up the reference in
the error path. So fix this by calling put_device(), then the
name can be freed in kobject_cleanup(), and client_dev is freed
in ntb_transport_client_release().
Fixes:
fce8a7bb5b4b ("PCI-Express Non-Transparent Bridge Support")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yuan Can [Sat, 5 Nov 2022 09:43:22 +0000 (09:43 +0000)]
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
A problem about ntb_hw_intel create debugfs failed is triggered with the
following log given:
[ 273.112733] Intel(R) PCI-E Non-Transparent Bridge Driver 2.0
[ 273.115342] debugfs: Directory 'ntb_hw_intel' with parent '/' already present!
The reason is that intel_ntb_pci_driver_init() returns
pci_register_driver() directly without checking its return value, if
pci_register_driver() failed, it returns without destroy the newly created
debugfs, resulting the debugfs of ntb_hw_intel can never be created later.
intel_ntb_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes:
e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yuan Can [Sat, 5 Nov 2022 09:43:09 +0000 (09:43 +0000)]
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
A problem about ntb_hw_amd create debugfs failed is triggered with the
following log given:
[ 618.431232] AMD(R) PCI-E Non-Transparent Bridge Driver 1.0
[ 618.433284] debugfs: Directory 'ntb_hw_amd' with parent '/' already present!
The reason is that amd_ntb_pci_driver_init() returns pci_register_driver()
directly without checking its return value, if pci_register_driver()
failed, it returns without destroy the newly created debugfs, resulting
the debugfs of ntb_hw_amd can never be created later.
amd_ntb_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes:
a1b3695820aa ("NTB: Add support for AMD PCI-Express Non-Transparent Bridge")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yuan Can [Sat, 5 Nov 2022 09:43:01 +0000 (09:43 +0000)]
ntb: idt: Fix error handling in idt_pci_driver_init()
A problem about ntb_hw_idt create debugfs failed is triggered with the
following log given:
[ 1236.637636] IDT PCI-E Non-Transparent Bridge Driver 2.0
[ 1236.639292] debugfs: Directory 'ntb_hw_idt' with parent '/' already present!
The reason is that idt_pci_driver_init() returns pci_register_driver()
directly without checking its return value, if pci_register_driver()
failed, it returns without destroy the newly created debugfs, resulting
the debugfs of ntb_hw_idt can never be created later.
idt_pci_driver_init()
debugfs_create_dir() # create debugfs directory
pci_register_driver()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without destroy debugfs directory
Fix by removing debugfs when pci_register_driver() returns error.
Fixes:
bf2a952d31d2 ("NTB: Add IDT 89HPESxNTx PCIe-switches support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Darrick J. Wong [Fri, 7 Jul 2023 01:00:59 +0000 (18:00 -0700)]
xfs: fix uninit warning in xfs_growfs_data
Quiet down this gcc warning:
fs/xfs/xfs_fsops.c: In function ‘xfs_growfs_data’:
fs/xfs/xfs_fsops.c:219:21: error: ‘lastag_extended’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
219 | if (lastag_extended) {
| ^~~~~~~~~~~~~~~
fs/xfs/xfs_fsops.c:100:33: note: ‘lastag_extended’ was declared here
100 | bool lastag_extended;
| ^~~~~~~~~~~~~~~
By setting its value explicitly. From code analysis I don't think this
is a real problem, but I have better things to do than analyse this
closely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Linus Torvalds [Fri, 7 Jul 2023 22:59:33 +0000 (15:59 -0700)]
Merge tag 'mmc-v6.5-2' of git://git./linux/kernel/git/ulfh/mmc
Pull mmc fix from Ulf Hansson:
- Fix regression of detection of eMMC/SD/SDIO cards
* tag 'mmc-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: Revert "mmc: core: Allow mmc_start_host() synchronously detect a card"
Linus Torvalds [Fri, 7 Jul 2023 22:40:17 +0000 (15:40 -0700)]
Merge tag 'sound-fix-6.5-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes that have been gathered recently:
- Two code-typo fixes in the new UMP core
- A fix in jack reporting to avoid the usage of mutex
- A potential data race fix in HD-audio core regmap code
- A potential data race fix in PCM allocation helper code
- HD-audio quirks for ASUS, Clevo and Unis machines
- Constifications in FireWire drivers"
* tag 'sound-fix-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for ASUS ROG GZ301V
ALSA: jack: Fix mutex call in snd_jack_report()
ALSA: seq: ump: fix typo in system_2p_ev_to_ump_midi1()
ALSA: hda/realtek: Whitespace fix
ALSA: hda/realtek: Add quirk for ASUS ROG G614Jx
ALSA: hda/realtek: Amend G634 quirk to enable rear speakers
ALSA: hda/realtek: Add quirk for ASUS ROG GA402X
ALSA: hda/realtek: Add quirk for ASUS ROG GX650P
ALSA: pcm: Fix potential data race at PCM memory allocation helpers
ALSA: hda: fix a possible null-pointer dereference due to data race in snd_hdac_regmap_sync()
ALSA: hda/realtek: Add quirks for Unis H3C Desktop B760 & Q760
ALSA: hda/realtek: Add quirk for Clevo NPx0SNx
ALSA: ump: Correct wrong byte size at converting a UMP System message
ALSA: fireface: make read-only const array for model names static
ALSA: oxfw: make read-only const array models static
Linus Torvalds [Fri, 7 Jul 2023 22:07:20 +0000 (15:07 -0700)]
Merge tag 'ceph-for-6.5-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A bunch of CephFS fixups from Xiubo, mostly around dropping caps,
along with a fix for a regression in the readahead handling code which
sneaked in with the switch to netfs helpers"
* tag 'ceph-for-6.5-rc1' of https://github.com/ceph/ceph-client:
ceph: don't let check_caps skip sending responses for revoke msgs
ceph: issue a cap release immediately if no cap exists
ceph: trigger to flush the buffer when making snapshot
ceph: fix blindly expanding the readahead windows
ceph: add a dedicated private data for netfs rreq
ceph: voluntarily drop Xx caps for requests those touch parent mtime
ceph: try to dump the msgs when decoding fails
ceph: only send metrics when the MDS rank is ready
Linus Torvalds [Fri, 7 Jul 2023 21:59:38 +0000 (14:59 -0700)]
Merge tag 'ntfs3_for_6.5' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Updates:
- support /proc/fs/ntfs3/<dev>/volinfo and label
- alternative boot if primary boot is corrupted
- small optimizations
Fixes:
- fix endian problems
- fix logic errors
- code refactoring and reformatting"
* tag 'ntfs3_for_6.5' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Correct mode for label entry inside /proc/fs/ntfs3/
fs/ntfs3: Add support /proc/fs/ntfs3/<dev>/volinfo and /proc/fs/ntfs3/<dev>/label
fs/ntfs3: Fix endian problem
fs/ntfs3: Add ability to format new mft records with bigger/smaller header
fs/ntfs3: Code refactoring
fs/ntfs3: Code formatting
fs/ntfs3: Do not update primary boot in ntfs_init_from_boot()
fs/ntfs3: Alternative boot if primary boot is corrupted
fs/ntfs3: Mark ntfs dirty when on-disk struct is corrupted
fs/ntfs3: Fix ntfs_atomic_open
fs/ntfs3: Correct checking while generating attr_list
fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_load_attr_list()
fs: ntfs3: Fix possible null-pointer dereferences in mi_read()
fs/ntfs3: Return error for inconsistent extended attributes
fs/ntfs3: Enhance sanity check while generating attr_list
fs/ntfs3: Use wrapper i_blocksize() in ntfs_zero_range()
ntfs: Fix panic about slab-out-of-bounds caused by ntfs_listxattr()
Linus Torvalds [Fri, 7 Jul 2023 21:51:37 +0000 (14:51 -0700)]
Merge tag 'fsnotify_for_v6.5-rc2' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify fix from Jan Kara:
"A fix for fanotify to disallow creating of mount or superblock marks
for kernel internal pseudo filesystems"
* tag 'fsnotify_for_v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: disallow mount/sb marks on kernel internal pseudo fs
Linus Torvalds [Fri, 7 Jul 2023 17:07:19 +0000 (10:07 -0700)]
Merge tag 'riscv-for-linus-6.5-mw2' of git://git./linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- A bunch of fixes/cleanups from the first part of the merge window,
mostly related to ACPI and vector as those were large
- Some documentation improvements, mostly related to the new code
- The "riscv,isa" DT key is deprecated
- Support for link-time dead code elimination
- Support for minor fault registration in userfaultd
- A handful of cleanups around CMO alternatives
* tag 'riscv-for-linus-6.5-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (23 commits)
riscv: mm: mark noncoherent_supported as __ro_after_init
riscv: mm: mark CBO relate initialization funcs as __init
riscv: errata: thead: only set cbom size & noncoherent during boot
riscv: Select HAVE_ARCH_USERFAULTFD_MINOR
RISC-V: Document the ISA string parsing rules for ACPI
risc-v: Fix order of IPI enablement vs RCU startup
mm: riscv: fix an unsafe pte read in huge_pte_alloc()
dt-bindings: riscv: deprecate riscv,isa
RISC-V: drop error print from riscv_hartid_to_cpuid()
riscv: Discard vector state on syscalls
riscv: move memblock_allow_resize() after linear mapping is ready
riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle
riscv: vdso: include vdso/vsyscall.h for vdso_data
selftests: Test RISC-V Vector's first-use handler
riscv: vector: clear V-reg in the first-use trap
riscv: vector: only enable interrupts in the first-use trap
RISC-V: Fix up some vector state related build failures
RISC-V: Document that V registers are clobbered on syscalls
riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
...
Linus Torvalds [Fri, 7 Jul 2023 17:00:30 +0000 (10:00 -0700)]
Merge tag 'powerpc-6.5-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix PCIe MEM size for pci2 node on Turris 1.x boards
- Two minor build fixes
Thanks to Christophe Leroy, Douglas Anderson, Pali Rohár, Petr Mladek,
and Randy Dunlap.
* tag 'powerpc-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: dts: turris1x.dts: Fix PCIe MEM size for pci2 node
powerpc: Include asm/nmi.c in mobility.c for watchdog_hardlockup_set_timeout_pct()
powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
Linus Torvalds [Fri, 7 Jul 2023 16:55:31 +0000 (09:55 -0700)]
Merge tag 'apparmor-pr-2023-07-06' of git://git./linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
- fix missing error check for rhashtable_insert_fast
- add missing failure check in compute_xmatch_perms
- fix policy_compat permission remap with extended permissions
- fix profile verification and enable it
- fix kzalloc perms tables for shared dfas
- Fix kernel-doc header for verify_dfa_accept_index
- aa_buffer: Convert 1-element array to flexible array
- Return directly after a failed kzalloc() in two functions
- fix use of strcpy in policy_unpack_test
- fix kernel-doc complaints
- Fix some kernel-doc comments
* tag 'apparmor-pr-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix kernel-doc header for verify_dfa_accept_index
apparmor: fix: kzalloc perms tables for shared dfas
apparmor: fix profile verification and enable it
apparmor: fix policy_compat permission remap with extended permissions
apparmor: aa_buffer: Convert 1-element array to flexible array
apparmor: add missing failure check in compute_xmatch_perms
apparmor: fix missing error check for rhashtable_insert_fast
apparmor: Return directly after a failed kzalloc() in two functions
AppArmor: Fix some kernel-doc comments
apparmor: fix use of strcpy in policy_unpack_test
apparmor: fix kernel-doc complaints
Thomas Gleixner [Wed, 5 Jul 2023 08:59:23 +0000 (10:59 +0200)]
x86/smp: Don't send INIT to boot CPU
Parking CPUs in INIT works well, except for the crash case when the CPU
which invokes smp_park_other_cpus_in_init() is not the boot CPU. Sending
INIT to the boot CPU resets the whole machine.
Prevent this by validating that this runs on the boot CPU. If not fall back
and let CPUs hang in HLT.
Fixes:
45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/87ttui91jo.ffs@tglx
Thomas Bogendoerfer [Thu, 6 Jul 2023 16:36:10 +0000 (18:36 +0200)]
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
Commit
e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference") missed
converting one place accessing cop0 registers, which results in a build
error, if KVM_MIPS_DEBUG_COP0_COUNTERS is enabled.
Fixes:
e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Luke D. Jones [Thu, 6 Jul 2023 22:33:23 +0000 (10:33 +1200)]
ALSA: hda/realtek: Add quirk for ASUS ROG GZ301V
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GZ301V series which uses an SPI connected Cirrus amp.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230706223323.30871-2-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Fri, 7 Jul 2023 05:42:54 +0000 (22:42 -0700)]
Merge tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Lots of fixes, mostly i915 and amdgpu. It's two weeks of i915, and I
think three weeks of amdgpu.
fbdev:
- Fix module infos on sparc
panel:
- Fix mode on Starry-ili9882t
i915:
- Allow DC states along with PW2 only for PWB functionality [adlp+]
- Fix SSC selection for MPLLA [mtl]
- Use hw.adjusted mode when calculating io/fast wake times [psr]
- Apply min softlimit correctly [guc/slpc]
- Assign correct hdcp content type [hdcp]
- Add missing forward declarations/includes to display power headers
- Fix BDW PSR AUX CH data register offsets [psr]
- Use mock device info for creating mock device
amdgpu:
- Misc cleanups
- GFX 9.4.3 fixes
- DEBUGFS build fix
- Fix LPDDR5 reporting
- ASPM fixes
- DCN 3.1.4 fixes
- DP MST fixes
- DCN 3.2.x fixes
- Display PSR TCON fixes
- SMU 13.x fixes
- RAS fixes
- Vega12/20 SMU fixes
- PSP flashing cleanup
- GFX9 MCBP fixes
- SR-IOV fixes
- GPUVM clear mappings fix for always valid BOs
- Add FAMS quirk for problematic monitor
- Fix possible UAF
- Better handle monentary temperature fluctuations
- SDMA 4.4.2 fixes
- Fencing fix"
* tag 'drm-next-2023-07-07' of git://anongit.freedesktop.org/drm/drm: (83 commits)
drm/i915: use mock device info for creating mock device
drm/i915/psr: Fix BDW PSR AUX CH data register offsets
drm/amdgpu: Fix potential fence use-after-free v2
drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation
drm/amd/pm: expose swctf threshold setting for legacy powerplay
drm/amd/display: 3.2.241
drm/amd/display: Take full update path if number of planes changed
drm/amd/display: Create debugging mechanism for Gaming FAMS
drm/amd/display: Add monitor specific edid quirk
drm/amd/display: For new fast update path, loop through each surface
drm/amd/display: Remove Phantom Pipe Check When Calculating K1 and K2
drm/amd/display: Limit new fast update path to addr and gamma / color
drm/amd/display: Fix the delta clamping for shaper LUT
drm/amdgpu: Keep non-psp path for partition switch
drm/amd/display: program DPP shaper and 3D LUT if updated
Revert "drm/amd/display: edp do not add non-edid timings"
drm/amdgpu: share drm device for pci amdgpu device with 1st partition device
drm/amd/pm: Add GFX v9.4.3 unique id to sysfs
drm/amd/pm: Enable pp_feature attribute
drm/amdgpu/vcn: Need to unpause dpg before stop dpg
...
Linus Torvalds [Fri, 7 Jul 2023 05:25:06 +0000 (22:25 -0700)]
Merge tag 'acpi-6.5-rc1-3' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These fix a couple of compiler warnings, refine an ACPI device
enumeration quirk to address a driver regression and clean up code.
Specifics:
- Make acpi_companion_match() return a const pointer and update its
callers accordingly (Andy Shevchenko)
- Move the extern declaration of the acpi_root variable to a header
file so as to address a compiler warning (Andy Shevchenko)
- Address compiler warnings in the ACPI device enumeration code by
adding a missing header file include to it (Ben Dooks)
- Refine the SMB0001 quirk in the ACPI device enumeration code so as
to address an i2c-scmi driver regression (Andy Shevchenko)
- Clean up two pieces of the ACPI device enumeration code (Andy
Shevchenko)"
* tag 'acpi-6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Use the acpi_match_acpi_device() helper
ACPI: platform: Move SMB0001 HID to the header and reuse
ACPI: platform: Ignore SMB0001 only when it has resources
ACPI: bus: Introduce acpi_match_acpi_device() helper
ACPI: scan: fix undeclared variable warnings by including sleep.h
ACPI: bus: Constify acpi_companion_match() returned value
ACPI: scan: Move acpi_root to internal header
Linus Torvalds [Fri, 7 Jul 2023 05:15:38 +0000 (22:15 -0700)]
Merge tag 'docs-6.5-2' of git://git.lwn.net/linux
Pull mode documentation updates from Jonathan Corbet:
"A half-dozen late arriving docs patches. They are mostly fixes, but we
also have a kernel-doc tweak for enums and the long-overdue removal of
the outdated and redundant patch-submission comments at the top of the
MAINTAINERS file"
* tag 'docs-6.5-2' of git://git.lwn.net/linux:
scripts: kernel-doc: support private / public marking for enums
Documentation: KVM: SEV: add a missing backtick
Documentation: ACPI: fix typo in ssdt-overlays.rst
Fix documentation of panic_on_warn
docs: remove the tips on how to submit patches from MAINTAINERS
docs: fix typo in zh_TW and zh_CN translation
Linus Torvalds [Fri, 7 Jul 2023 02:24:11 +0000 (19:24 -0700)]
Merge tag 'spi-fix-v6.5-merge-window' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few mostly minor fixes that came in during the merge window, plus
one administrative update for Jonas' e-mail address.
The spi-geni-qcom fix is more major than the others, fixing the newly
added DMA support for large reads which trigger DMA"
* tag 'spi-fix-v6.5-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: bcm{63xx,bca}-hsspi: update my email address
spi: rzv2m-csi: Fix SoC product name
spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
spi: spi-geni-qcom: enable SPI_CONTROLLER_MUST_TX for GPI DMA mode
Linus Torvalds [Fri, 7 Jul 2023 02:20:23 +0000 (19:20 -0700)]
Merge tag 'regulator-fix-v6.5-merge-window' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A simple dependency fix for a newly added driver"
* tag 'regulator-fix-v6.5-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: raa215300: Add build dependency with COMMON_CLK
Linus Torvalds [Fri, 7 Jul 2023 02:07:15 +0000 (19:07 -0700)]
Merge tag 'trace-v6.5-2' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix bad git merge of #endif in arm64 code
A merge of the arm64 tree caused #endif to go into the wrong place
- Fix crash on lseek of write access to tracefs/error_log
Opening error_log as write only, and then doing an lseek() causes a
kernel panic, because the lseek() handle expects a "seq_file" to
exist (which is not done on write only opens). Use tracing_lseek()
that tests for this instead of calling the default seq lseek handler.
- Check for negative instead of -E2BIG for error on strscpy() returns
Instead of testing for -E2BIG from strscpy(), to be more robust,
check for less than zero, which will make sure it catches any error
that strscpy() may someday return.
* tag 'trace-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/boot: Test strscpy() against less than zero for error
arm64: ftrace: fix build error with CONFIG_FUNCTION_GRAPH_TRACER=n
tracing: Fix null pointer dereference in tracing_err_log_open()
Linus Torvalds [Fri, 7 Jul 2023 02:01:38 +0000 (19:01 -0700)]
Merge tag 'v6.5/vfs.fixes.2' of git://git./linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains two minor fixes for Jan's rename locking work:
- Unlocking the source inode was guarded by a check whether source
was non-NULL. This doesn't make sense because source must be
non-NULL and the commit message explains in detail why
- The lock_two_nondirectories() helper called WARN_ON_ONCE() and
dereferenced the inodes unconditionally but the underlying
lock_two_inodes() helper and the kernel documentation for that
function are clear that it is valid to pass NULL arguments, so a
non-NULL check is needed. No callers currently pass NULL arguments
but let's not knowingly leave landmines around"
* tag 'v6.5/vfs.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: don't assume arguments are non-NULL
fs: no need to check source
Dave Airlie [Fri, 7 Jul 2023 01:05:09 +0000 (11:05 +1000)]
Merge tag 'drm-misc-next-fixes-2023-07-06' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
Short summary of fixes pull:
* panel: Fix mode on Starry-ili9882t
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230706112203.GA30555@linux-uq9g
Dave Airlie [Fri, 7 Jul 2023 00:52:23 +0000 (10:52 +1000)]
Merge tag 'drm-intel-next-fixes-2023-07-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Fix BDW PSR AUX CH data register offsets [psr] (Ville Syrjälä)
- Use mock device info for creating mock device (Jani Nikula)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZKZ6VIeInBYrBuph@tursulin-desk
Dave Airlie [Fri, 7 Jul 2023 00:14:26 +0000 (10:14 +1000)]
Merge tag 'amd-drm-fixes-6.5-2023-06-30-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.5-2023-06-30-1:
amdgpu:
- Misc cleanups
- GFX 9.4.3 fixes
- DEBUGFS build fix
- Fix LPDDR5 reporting
- ASPM fixes
- DCN 3.1.4 fixes
- DP MST fixes
- DCN 3.2.x fixes
- Display PSR TCON fixes
- SMU 13.x fixes
- RAS fixes
- Vega12/20 SMU fixes
- PSP flashing cleanup
- GFX9 MCBP fixes
- SR-IOV fixes
- GPUVM clear mappings fix for always valid BOs
- Add FAMS quirk for problematic monitor
- Fix possible UAF
- Better handle monentary temperature fluctuations
- SDMA 4.4.2 fixes
- Fencing fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230630175757.8128-1-alexander.deucher@amd.com
Dave Airlie [Thu, 6 Jul 2023 23:53:01 +0000 (09:53 +1000)]
Merge tag 'drm-intel-next-fixes-2023-06-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Allow DC states along with PW2 only for PWB functionality [adlp+] (Imre Deak)
- Fix SSC selection for MPLLA [mtl] (Radhakrishna Sripada)
- Use hw.adjusted mode when calculating io/fast wake times [psr] (Jouni Högander)
- Apply min softlimit correctly [guc/slpc] (Vinay Belgaumkar)
- Assign correct hdcp content type [hdcp] (Suraj Kandpal)
- Add missing forward declarations/includes to display power headers (Imre Deak)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZJ1WpY+GF9NcsWXp@tursulin-desk
Linus Torvalds [Thu, 6 Jul 2023 20:18:30 +0000 (13:18 -0700)]
Merge tag 's390-6.5-2' of git://git./linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- Fix virtual vs physical address confusion in vmem_add_range() and
vmem_remove_range() functions
- Include <linux/io.h> instead of <asm/io.h> and <asm-generic/io.h>
throughout s390 code
- Make all PSW related defines also available for assembler files.
Remove PSW_DEFAULT_KEY define from uapi for that
- When adding an undefined symbol the build still succeeds, but
userspace crashes trying to execute VDSO, because the symbol is not
resolved. Add undefined symbols check to prevent that
- Use kvmalloc_array() instead of kzalloc() for allocaton of 256k
memory when executing s390 crypto adapter IOCTL
- Add -fPIE flag to prevent decompressor misaligned symbol build error
with clang
- Use .balign instead of .align everywhere. This is a no-op for s390,
but with this there no mix in using .align and .balign anymore
- Filter out -mno-pic-data-is-text-relative flag when compiling kernel
to prevent VDSO build error
- Rework entering of DAT-on mode on CPU restart to use PSW_KERNEL_BITS
mask directly
- Do not retry administrative requests to some s390 crypto cards, since
the firmware assumes replay attacks
- Remove most of the debug code, which is build in when kernel config
option CONFIG_ZCRYPT_DEBUG is enabled
- Remove CONFIG_ZCRYPT_MULTIDEVNODES kernel config option and switch
off the multiple devices support for the s390 zcrypt device driver
- With the conversion to generic entry machine checks are accounted to
the current context instead of irq time. As result, the STCKF
instruction at the beginning of the machine check handler and the
lowcore member are no longer required, therefore remove it
- Fix various typos found with codespell
- Minor cleanups to CPU-measurement Counter and Sampling Facilities
code
- Revert patch that removes VMEM_MAX_PHYS macro, since it causes a
regression
* tag 's390-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (25 commits)
Revert "s390/mm: get rid of VMEM_MAX_PHYS macro"
s390/cpum_sf: remove check on CPU being online
s390/cpum_sf: handle casts consistently
s390/cpum_sf: remove unnecessary debug statement
s390/cpum_sf: remove parameter in call to pr_err
s390/cpum_sf: simplify function setup_pmu_cpu
s390/cpum_cf: remove unneeded debug statements
s390/entry: remove mcck clock
s390: fix various typos
s390/zcrypt: remove ZCRYPT_MULTIDEVNODES kernel config option
s390/zcrypt: do not retry administrative requests
s390/zcrypt: cleanup some debug code
s390/entry: rework entering DAT-on mode on CPU restart
s390/mm: fence off VM macros from asm and linker
s390: include linux/io.h instead of asm/io.h
s390/ptrace: make all psw related defines also available for asm
s390/ptrace: remove PSW_DEFAULT_KEY from uapi
s390/vdso: filter out mno-pic-data-is-text-relative cflag
s390: consistently use .balign instead of .align
s390/decompressor: fix misaligned symbol build error
...
Guenter Roeck [Tue, 4 Jul 2023 15:00:31 +0000 (08:00 -0700)]
i2c: mpc: Drop unused variable
Fix the following build error.
Error log:
drivers/i2c/busses/i2c-mpc.c: In function 'mpc_i2c_setup_512x':
drivers/i2c/busses/i2c-mpc.c:310:20: error: unused variable 'pval'
Fixes:
9d178e00583e ("i2c: mpc: Use of_property_read_reg() to parse "reg"")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Christophe JAILLET [Tue, 4 Jul 2023 19:50:28 +0000 (21:50 +0200)]
i2c: nomadik: Remove a useless call in the remove function
Since commit
235602146ec9 ("i2c-nomadik: turn the platform driver to an amba
driver"), there is no more request_mem_region() call in this driver.
So remove the release_mem_region() call from the remove function which is
likely a left over.
Fixes:
235602146ec9 ("i2c-nomadik: turn the platform driver to an amba driver")
Cc: <stable@vger.kernel.org> # v3.6+
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Robert Hancock [Tue, 6 Jun 2023 18:25:58 +0000 (12:25 -0600)]
i2c: xiic: Don't try to handle more interrupt events after error
In xiic_process, it is possible that error events such as arbitration
lost or TX error can be raised in conjunction with other interrupt flags
such as TX FIFO empty or bus not busy. Error events result in the
controller being reset and the error returned to the calling request,
but the function could potentially try to keep handling the other
events, such as by writing more messages into the TX FIFO. Since the
transaction has already failed, this is not helpful and will just cause
issues.
This problem has been present ever since:
commit
7f9906bd7f72 ("i2c: xiic: Service all interrupts in isr")
which allowed non-error events to be handled after errors, but became
more obvious after:
commit
743e227a8959 ("i2c: xiic: Defer xiic_wakeup() and
__xiic_start_xfer() in xiic_process()")
which reworked the code to add a WARN_ON which triggers if both the
xfer_more and wakeup_req flags were set, since this combination is
not supposed to happen, but was occurring in this scenario.
Skip further interrupt handling after error flags are detected to avoid
this problem.
Fixes:
7f9906bd7f72 ("i2c: xiic: Service all interrupts in isr")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>