Jan Kara [Tue, 12 Jul 2022 10:54:25 +0000 (12:54 +0200)]
ext2: factor our freeing of xattr block reference
Free of xattr block reference is opencode in two places. Factor it out
into a separate function and use it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-6-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Tue, 12 Jul 2022 10:54:24 +0000 (12:54 +0200)]
ext4: fix race when reusing xattr blocks
When ext4_xattr_block_set() decides to remove xattr block the following
race can happen:
CPU1 CPU2
ext4_xattr_block_set() ext4_xattr_release_block()
new_bh = ext4_xattr_block_cache_find()
lock_buffer(bh);
ref = le32_to_cpu(BHDR(bh)->h_refcount);
if (ref == 1) {
...
mb_cache_entry_delete();
unlock_buffer(bh);
ext4_free_blocks();
...
ext4_forget(..., bh, ...);
jbd2_journal_revoke(..., bh);
ext4_journal_get_write_access(..., new_bh, ...)
do_get_write_access()
jbd2_journal_cancel_revoke(..., new_bh);
Later the code in ext4_xattr_block_set() finds out the block got freed
and cancels reusal of the block but the revoke stays canceled and so in
case of block reuse and journal replay the filesystem can get corrupted.
If the race works out slightly differently, we can also hit assertions
in the jbd2 code.
Fix the problem by making sure that once matching mbcache entry is
found, code dropping the last xattr block reference (or trying to modify
xattr block in place) waits until the mbcache entry reference is
dropped. This way code trying to reuse xattr block is protected from
someone trying to drop the last reference to xattr block.
Reported-and-tested-by: Ritesh Harjani <ritesh.list@gmail.com>
CC: stable@vger.kernel.org
Fixes:
82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Tue, 12 Jul 2022 10:54:23 +0000 (12:54 +0200)]
ext4: unindent codeblock in ext4_xattr_block_set()
Remove unnecessary else (and thus indentation level) from a code block
in ext4_xattr_block_set(). It will also make following code changes
easier. No functional changes.
CC: stable@vger.kernel.org
Fixes:
82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Tue, 12 Jul 2022 10:54:22 +0000 (12:54 +0200)]
ext4: remove EA inode entry from mbcache on inode eviction
Currently we remove EA inode from mbcache as soon as its xattr refcount
drops to zero. However there can be pending attempts to reuse the inode
and thus refcount handling code has to handle the situation when
refcount increases from zero anyway. So save some work and just keep EA
inode in mbcache until it is getting evicted. At that moment we are sure
following iget() of EA inode will fail anyway (or wait for eviction to
finish and load things from the disk again) and so removing mbcache
entry at that moment is fine and simplifies the code a bit.
CC: stable@vger.kernel.org
Fixes:
82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Tue, 12 Jul 2022 10:54:21 +0000 (12:54 +0200)]
mbcache: add functions to delete entry if unused
Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
it is unused and also add a function to wait for entry to become unused
- mb_cache_entry_wait_unused(). We do not share code between the two
deleting function as one of them will go away soon.
CC: stable@vger.kernel.org
Fixes:
82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Tue, 12 Jul 2022 10:54:20 +0000 (12:54 +0200)]
mbcache: don't reclaim used entries
Do not reclaim entries that are currently used by somebody from a
shrinker. Firstly, these entries are likely useful. Secondly, we will
need to keep such entries to protect pending increment of xattr block
refcount.
CC: stable@vger.kernel.org
Fixes:
82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Lukas Czerner [Mon, 4 Jul 2022 14:27:21 +0000 (16:27 +0200)]
ext4: make sure ext4_append() always allocates new block
ext4_append() must always allocate a new block, otherwise we run the
risk of overwriting existing directory block corrupting the directory
tree in the process resulting in all manner of problems later on.
Add a sanity check to see if the logical block is already allocated and
error out if it is.
Cc: stable@kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220704142721.157985-2-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Lukas Czerner [Mon, 4 Jul 2022 14:27:20 +0000 (16:27 +0200)]
ext4: check if directory block is within i_size
Currently ext4 directory handling code implicitly assumes that the
directory blocks are always within the i_size. In fact ext4_append()
will attempt to allocate next directory block based solely on i_size and
the i_size is then appropriately increased after a successful
allocation.
However, for this to work it requires i_size to be correct. If, for any
reason, the directory inode i_size is corrupted in a way that the
directory tree refers to a valid directory block past i_size, we could
end up corrupting parts of the directory tree structure by overwriting
already used directory blocks when modifying the directory.
Fix it by catching the corruption early in __ext4_read_dirblock().
Addresses Red-Hat-Bugzilla: #2070205
CVE: CVE-2022-1184
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220704142721.157985-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ojaswin Mujoo [Mon, 4 Jul 2022 05:46:03 +0000 (11:16 +0530)]
ext4: reflect mb_optimize_scan value in options file
Add support to display the mb_optimize_scan value in
/proc/fs/ext4/<dev>/options file. The option is only
displayed when the value is non default.
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20220704054603.21462-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ye Bin [Wed, 22 Jun 2022 09:02:23 +0000 (17:02 +0800)]
ext4: avoid remove directory when directory is corrupted
Now if check directoy entry is corrupted, ext4_empty_dir may return true
then directory will be removed when file system mounted with "errors=continue".
In order not to make things worse just return false when directory is corrupted.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220622090223.682234-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jiang Jian [Tue, 21 Jun 2022 06:15:31 +0000 (14:15 +0800)]
ext4: aligned '*' in comments
The '*' in the comment is not aligned.
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Link: https://lore.kernel.org/r/20220621061531.19669-1-jiangjian@cdjrlc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Bagas Sanjaya [Sun, 19 Jun 2022 07:29:39 +0000 (14:29 +0700)]
Documentation: ext4: fix cell spacing of table heading on blockmap table
Commit
3103084afcf234 ("ext4, doc: remove unnecessary escaping") removes
redundant underscore escaping, however the cell spacing in heading row of
blockmap table became not aligned anymore, hence triggers malformed table
warning:
Documentation/filesystems/ext4/blockmap.rst:3: WARNING: Malformed table.
+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| i.i_block Offset | Where It Points |
<snipped>...
The warning caused the table not being loaded.
Realign the heading row cell by adding missing space at the first cell
to fix the warning.
Fixes:
3103084afcf234 ("ext4, doc: remove unnecessary escaping")
Cc: stable@kernel.org
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Wang Jianjian <wangjianjian3@huawei.com>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220619072938.7334-1-bagasdotme@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Li Lingfeng [Fri, 17 Jun 2022 06:25:15 +0000 (14:25 +0800)]
ext4: recover csum seed of tmp_inode after migrating to extents
When migrating to extents, the checksum seed of temporary inode
need to be replaced by inode's, otherwise the inode checksums
will be incorrect when swapping the inodes data.
However, the temporary inode can not match it's checksum to
itself since it has lost it's own checksum seed.
mkfs.ext4 -F /dev/sdc
mount /dev/sdc /mnt/sdc
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/sdc/testfile
chattr -e /mnt/sdc/testfile
chattr +e /mnt/sdc/testfile
umount /dev/sdc
fsck -fn /dev/sdc
========
...
Pass 1: Checking inodes, blocks, and sizes
Inode 13 passes checks, but checksum does not match inode. Fix? no
...
========
The fix is simple, save the checksum seed of temporary inode, and
recover it after migrating to extents.
Fixes:
e81c9302a6c3 ("ext4: set csum seed in tmp inode while migrating to extents")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220617062515.2113438-1-lilingfeng3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ye Bin [Fri, 17 Jun 2022 01:39:35 +0000 (09:39 +0800)]
ext4: fix warning in ext4_iomap_begin as race between bmap and write
We got issue as follows:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0
RIP: 0010:ext4_iomap_begin+0x182/0x5d0
RSP: 0018:
ffff88812460fa08 EFLAGS:
00010293
RAX:
ffff88811f168000 RBX:
0000000000000000 RCX:
ffffffff97793c12
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000003
RBP:
ffff88812c669160 R08:
ffff88811f168000 R09:
ffffed10258cd20f
R10:
ffff88812c669077 R11:
ffffed10258cd20e R12:
0000000000000001
R13:
00000000000000a4 R14:
000000000000000c R15:
ffff88812c6691ee
FS:
00007fd0d6ff3740(0000) GS:
ffff8883af180000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fd0d6dda290 CR3:
0000000104a62000 CR4:
00000000000006e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
iomap_apply+0x119/0x570
iomap_bmap+0x124/0x150
ext4_bmap+0x14f/0x250
bmap+0x55/0x80
do_vfs_ioctl+0x952/0xbd0
__x64_sys_ioctl+0xc6/0x170
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Above issue may happen as follows:
bmap write
bmap
ext4_bmap
iomap_bmap
ext4_iomap_begin
ext4_file_write_iter
ext4_buffered_write_iter
generic_perform_write
ext4_da_write_begin
ext4_da_write_inline_data_begin
ext4_prepare_inline_data
ext4_create_inline_data
ext4_set_inode_flag(inode,
EXT4_INODE_INLINE_DATA);
if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on
To solved above issue hold inode lock in ext4_bamp.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220617013935.397596-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Baokun Li [Thu, 16 Jun 2022 02:13:58 +0000 (10:13 +0800)]
ext4: correct the misjudgment in ext4_iget_extra_inode
Use the EXT4_INODE_HAS_XATTR_SPACE macro to more accurately
determine whether the inode have xattr space.
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220616021358.2504451-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Thu, 16 Jun 2022 02:13:57 +0000 (10:13 +0800)]
ext4: correct max_inline_xattr_value_size computing
If the ext4 inode does not have xattr space, 0 is returned in the
get_max_inline_xattr_value_size function. Otherwise, the function returns
a negative value when the inode does not contain EXT4_STATE_XATTR.
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220616021358.2504451-4-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Thu, 16 Jun 2022 02:13:56 +0000 (10:13 +0800)]
ext4: fix use-after-free in ext4_xattr_set_entry
Hulk Robot reported a issue:
==================================================================
BUG: KASAN: use-after-free in ext4_xattr_set_entry+0x18ab/0x3500
Write of size 4105 at addr
ffff8881675ef5f4 by task syz-executor.0/7092
CPU: 1 PID: 7092 Comm: syz-executor.0 Not tainted 4.19.90-dirty #17
Call Trace:
[...]
memcpy+0x34/0x50 mm/kasan/kasan.c:303
ext4_xattr_set_entry+0x18ab/0x3500 fs/ext4/xattr.c:1747
ext4_xattr_ibody_inline_set+0x86/0x2a0 fs/ext4/xattr.c:2205
ext4_xattr_set_handle+0x940/0x1300 fs/ext4/xattr.c:2386
ext4_xattr_set+0x1da/0x300 fs/ext4/xattr.c:2498
__vfs_setxattr+0x112/0x170 fs/xattr.c:149
__vfs_setxattr_noperm+0x11b/0x2a0 fs/xattr.c:180
__vfs_setxattr_locked+0x17b/0x250 fs/xattr.c:238
vfs_setxattr+0xed/0x270 fs/xattr.c:255
setxattr+0x235/0x330 fs/xattr.c:520
path_setxattr+0x176/0x190 fs/xattr.c:539
__do_sys_lsetxattr fs/xattr.c:561 [inline]
__se_sys_lsetxattr fs/xattr.c:557 [inline]
__x64_sys_lsetxattr+0xc2/0x160 fs/xattr.c:557
do_syscall_64+0xdf/0x530 arch/x86/entry/common.c:298
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x459fe9
RSP: 002b:
00007fa5e54b4c08 EFLAGS:
00000246 ORIG_RAX:
00000000000000bd
RAX:
ffffffffffffffda RBX:
000000000051bf60 RCX:
0000000000459fe9
RDX:
00000000200003c0 RSI:
0000000020000180 RDI:
0000000020000140
RBP:
000000000051bf60 R08:
0000000000000001 R09:
0000000000000000
R10:
0000000000001009 R11:
0000000000000246 R12:
0000000000000000
R13:
00007ffc73c93fc0 R14:
000000000051bf60 R15:
00007fa5e54b4d80
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_xattr_set
ext4_xattr_set_handle
ext4_xattr_ibody_find
>> s->end < s->base
>> no EXT4_STATE_XATTR
>> xattr_check_inode is not executed
ext4_xattr_ibody_set
ext4_xattr_set_entry
>> size_t min_offs = s->end - s->base
>> UAF in memcpy
we can easily reproduce this problem with the following commands:
mkfs.ext4 -F /dev/sda
mount -o debug_want_extra_isize=128 /dev/sda /mnt
touch /mnt/file
setfattr -n user.cat -v `seq -s z 4096|tr -d '[:digit:]'` /mnt/file
In ext4_xattr_ibody_find, we have the following assignment logic:
header = IHDR(inode, raw_inode)
= raw_inode + EXT4_GOOD_OLD_INODE_SIZE + i_extra_isize
is->s.base = IFIRST(header)
= header + sizeof(struct ext4_xattr_ibody_header)
is->s.end = raw_inode + s_inode_size
In ext4_xattr_set_entry
min_offs = s->end - s->base
= s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize -
sizeof(struct ext4_xattr_ibody_header)
last = s->first
free = min_offs - ((void *)last - s->base) - sizeof(__u32)
= s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize -
sizeof(struct ext4_xattr_ibody_header) - sizeof(__u32)
In the calculation formula, all values except s_inode_size and
i_extra_size are fixed values. When i_extra_size is the maximum value
s_inode_size - EXT4_GOOD_OLD_INODE_SIZE, min_offs is -4 and free is -8.
The value overflows. As a result, the preceding issue is triggered when
memcpy is executed.
Therefore, when finding xattr or setting xattr, check whether
there is space for storing xattr in the inode to resolve this issue.
Cc: stable@kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220616021358.2504451-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Thu, 16 Jun 2022 02:13:55 +0000 (10:13 +0800)]
ext4: add EXT4_INODE_HAS_XATTR_SPACE macro in xattr.h
When adding an xattr to an inode, we must ensure that the inode_size is
not less than EXT4_GOOD_OLD_INODE_SIZE + extra_isize + pad. Otherwise,
the end position may be greater than the start position, resulting in UAF.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220616021358.2504451-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Whitney [Wed, 15 Jun 2022 16:05:30 +0000 (12:05 -0400)]
ext4: fix extent status tree race in writeback error recovery path
A race can occur in the unlikely event ext4 is unable to allocate a
physical cluster for a delayed allocation in a bigalloc file system
during writeback. Failure to allocate a cluster forces error recovery
that includes a call to mpage_release_unused_pages(). That function
removes any corresponding delayed allocated blocks from the extent
status tree. If a new delayed write is in progress on the same cluster
simultaneously, resulting in the addition of an new extent containing
one or more blocks in that cluster to the extent status tree, delayed
block accounting can be thrown off if that delayed write then encounters
a similar cluster allocation failure during future writeback.
Write lock the i_data_sem in mpage_release_unused_pages() to fix this
problem. Ext4's block/cluster accounting code for bigalloc relies on
i_data_sem for mutual exclusion, as is found in the delayed write path,
and the locking in mpage_release_unused_pages() is missing.
Cc: stable@kernel.org
Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Sat, 11 Jun 2022 13:04:26 +0000 (21:04 +0800)]
jbd2: fix outstanding credits assert in jbd2_journal_commit_transaction()
We catch an assert problem in jbd2_journal_commit_transaction() when
doing fsstress and request falut injection tests. The problem is
happened in a race condition between jbd2_journal_commit_transaction()
and ext4_end_io_end(). Firstly, ext4_writepages() writeback dirty pages
and start reserved handle, and then the journal was aborted due to some
previous metadata IO error, jbd2_journal_abort() start to commit current
running transaction, the committing procedure could be raced by
ext4_end_io_end() and lead to subtract j_reserved_credits twice from
commit_transaction->t_outstanding_credits, finally the
t_outstanding_credits is mistakenly smaller than t_nr_buffers and
trigger assert.
kjournald2 kworker
jbd2_journal_commit_transaction()
write_unlock(&journal->j_state_lock);
atomic_sub(j_reserved_credits, t_outstanding_credits); //sub once
jbd2_journal_start_reserved()
start_this_handle() //detect aborted journal
jbd2_journal_free_reserved() //get running transaction
read_lock(&journal->j_state_lock)
__jbd2_journal_unreserve_handle()
atomic_sub(j_reserved_credits, t_outstanding_credits);
//sub again
read_unlock(&journal->j_state_lock);
journal->j_running_transaction = NULL;
J_ASSERT(t_nr_buffers <= t_outstanding_credits) //bomb!!!
Fix this issue by using journal->j_state_lock to protect the subtraction
in jbd2_journal_commit_transaction().
Fixes:
96f1e0974575 ("jbd2: avoid long hold times of j_state_lock while committing a transaction")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220611130426.2013258-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Wed, 8 Jun 2022 11:23:50 +0000 (13:23 +0200)]
jbd2: unexport jbd2_log_start_commit()
jbd2_log_start_commit() is not used outside of jbd2 so unexport it. Also
make __jbd2_log_start_commit() static when we are at it.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Wed, 8 Jun 2022 11:23:49 +0000 (13:23 +0200)]
jbd2: remove unused exports for jbd2 debugging
Jbd2 exports jbd2_journal_enable_debug and __jbd2_debug() depite the
first is used only in fs/jbd2/journal.c and the second only within jbd2
code. Remove the pointless exports make jbd2_journal_enable_debug
static.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Wed, 8 Jun 2022 11:23:48 +0000 (13:23 +0200)]
jbd2: rename jbd_debug() to jbd2_debug()
The name of jbd_debug() is confusing as all functions inside jbd2 have
jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Wed, 8 Jun 2022 11:23:47 +0000 (13:23 +0200)]
ext4: use ext4_debug() instead of jbd_debug()
We use jbd_debug() in some places in ext4. It seems a bit strange to use
jbd2 debugging output function for ext4 code. Also these days
ext4_debug() uses dynamic printk so each debug message can be enabled /
disabled on its own so the time when it made some sense to have these
combined (to allow easier common selecting of messages to report) has
passed. Just convert all jbd_debug() uses in ext4 to ext4_debug().
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
hanjinke [Mon, 6 Jun 2022 15:53:05 +0000 (23:53 +0800)]
ext4: reuse order and buddy in mb_mark_used when buddy split
After each buddy split, mb_mark_used will search the proper order
for the block which may consume some loop in mb_find_order_for_block.
In fact, we can reuse the order and buddy generated by the buddy split.
Reviewed by: lei.rao@intel.com
Signed-off-by: hanjinke <hanjinke.666@bytedance.com>
Link: https://lore.kernel.org/r/20220606155305.74146-1-hanjinke.666@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Wed, 29 Jun 2022 04:00:26 +0000 (00:00 -0400)]
ext4: update the s_overhead_clusters in the backup sb's when resizing
When the EXT4_IOC_RESIZE_FS ioctl is complete, update the backup
superblocks. We don't do this for the old-style resize ioctls since
they are quite ancient, and only used by very old versions of
resize2fs --- and we don't want to update the backup superblocks every
time EXT4_IOC_GROUP_ADD is called, since it might get called a lot.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220629040026.112371-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Wed, 29 Jun 2022 04:00:25 +0000 (00:00 -0400)]
ext4: update s_overhead_clusters in the superblock during an on-line resize
When doing an online resize, the on-disk superblock on-disk wasn't
updated. This means that when the file system is unmounted and
remounted, and the on-disk overhead value is non-zero, this would
result in the results of statfs(2) to be incorrect.
This was partially fixed by Commits
10b01ee92df5 ("ext4: fix overhead
calculation to account for the reserved gdt blocks"),
85d825dbf489
("ext4: force overhead calculation if the s_overhead_cluster makes no
sense"), and
eb7054212eac ("ext4: update the cached overhead value in
the superblock").
However, since it was too expensive to forcibly recalculate the
overhead for bigalloc file systems at every mount, this didn't fix the
problem for bigalloc file systems. This commit should address the
problem when resizing file systems with the bigalloc feature enabled.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220629040026.112371-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Thu, 30 Jun 2022 09:01:00 +0000 (17:01 +0800)]
ext4: fix reading leftover inlined symlinks
Since commit
6493792d3299 ("ext4: convert symlink external data block
mapping to bdev"), create new symlink with inline_data is not supported,
but it missing to handle the leftover inlined symlinks, which could
cause below error message and fail to read symlink.
ls: cannot read symbolic link 'foo': Structure needs cleaning
EXT4-fs error (device sda): ext4_map_blocks:605: inode #12: block
2021161080: comm ls: lblock 0 mapped to illegal pblock
2021161080
(length 1)
Fix this regression by adding ext4_read_inline_link(), which read the
inline data directly and convert it through a kmalloced buffer.
Fixes:
6493792d3299 ("ext4: convert symlink external data block mapping to bdev")
Cc: stable@kernel.org
Reported-by: Torge Matthies <openglfreak@googlemail.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Torge Matthies <openglfreak@googlemail.com>
Link: https://lore.kernel.org/r/20220630090100.2769490-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sun, 3 Jul 2022 22:39:28 +0000 (15:39 -0700)]
Linux 5.19-rc5
Linus Torvalds [Sun, 3 Jul 2022 21:40:28 +0000 (14:40 -0700)]
lockref: remove unused 'lockref_get_or_lock()' function
Looking at the conditional lock acquire functions in the kernel due to
the new sparse support (see commit
4a557a5d1a61 "sparse: introduce
conditional lock acquire function attribute"), it became obvious that
the lockref code has a couple of them, but they don't match the usual
naming convention for the other ones, and their return value logic is
also reversed.
In the other very similar places, the naming pattern is '*_and_lock()'
(eg 'atomic_put_and_lock()' and 'refcount_dec_and_lock()'), and the
function returns true when the lock is taken.
The lockref code is superficially very similar to the refcount code,
only with the special "atomic wrt the embedded lock" semantics. But
instead of the '*_and_lock()' naming it uses '*_or_lock()'.
And instead of returning true in case it took the lock, it returns true
if it *didn't* take the lock.
Now, arguably the reflock code is quite logical: it really is a "either
decrement _or_ lock" kind of situation - and the return value is about
whether the operation succeeded without any special care needed.
So despite the similarities, the differences do make some sense, and
maybe it's not worth trying to unify the different conditional locking
primitives in this area.
But while looking at this all, it did become obvious that the
'lockref_get_or_lock()' function hasn't actually had any users for
almost a decade.
The only user it ever had was the shortlived 'd_rcu_to_refcount()'
function, and it got removed and replaced with 'lockref_get_not_dead()'
back in 2013 in commits
0d98439ea3c6 ("vfs: use lockred 'dead' flag to
mark unrecoverably dead dentries") and
e5c832d55588 ("vfs: fix dentry
RCU to refcounting possibly sleeping dput()")
In fact, that single use was removed less than a week after the whole
function was introduced in commit
b3abd80250c1 ("lockref: add
'lockref_get_or_lock() helper") so this function has been around for a
decade, but only had a user for six days.
Let's just put this mis-designed and unused function out of its misery.
We can think about the naming and semantic oddities of the remaining
'lockref_put_or_lock()' later, but at least that function has users.
And while the naming is different and the return value doesn't match,
that function matches the whole '{atomic,refcount}_dec_and_test()'
pattern much better (ie the magic happens when the count goes down to
zero, not when it is incremented from zero).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 30 Jun 2022 16:34:10 +0000 (09:34 -0700)]
sparse: introduce conditional lock acquire function attribute
The kernel tends to try to avoid conditional locking semantics because
it makes it harder to think about and statically check locking rules,
but we do have a few fundamental locking primitives that take locks
conditionally - most obviously the 'trylock' functions.
That has always been a problem for 'sparse' checking for locking
imbalance, and we've had a special '__cond_lock()' macro that we've used
to let sparse know how the locking works:
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
so that you can then use this to tell sparse that (for example) the
spinlock trylock macro ends up acquiring the lock when it succeeds, but
not when it fails:
#define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock))
and then sparse can follow along the locking rules when you have code like
if (!spin_trylock(&dentry->d_lock))
return LRU_SKIP;
.. sparse sees that the lock is held here..
spin_unlock(&dentry->d_lock);
and sparse ends up happy about the lock contexts.
However, this '__cond_lock()' use does result in very ugly header files,
and requires you to basically wrap the real function with that macro
that uses '__cond_lock'. Which has made PeterZ NAK things that try to
fix sparse warnings over the years [1].
To solve this, there is now a very experimental patch to sparse that
basically does the exact same thing as '__cond_lock()' did, but using a
function attribute instead. That seems to make PeterZ happy [2].
Note that this does not replace existing use of '__cond_lock()', but
only exposes the new proposed attribute and uses it for the previously
unannotated 'refcount_dec_and_lock()' family of functions.
For existing sparse installations, this will make no difference (a
negative output context was ignored), but if you have the experimental
sparse patch it will make sparse now understand code that uses those
functions, the same way '__cond_lock()' makes sparse understand the very
similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()'
annotations.
Note that in some cases this will silence existing context imbalance
warnings. But in other cases it may end up exposing new sparse warnings
for code that sparse just didn't see the locking for at all before.
This is a trial, in other words. I'd expect that if it ends up being
successful, and new sparse releases end up having this new attribute,
we'll migrate the old-style '__cond_lock()' users to use the new-style
'__cond_acquires' function attribute.
The actual experimental sparse patch was posted in [3].
Link: https://lore.kernel.org/all/20130930134434.GC12926@twins.programming.kicks-ass.net/
Link: https://lore.kernel.org/all/Yr60tWxN4P568x3W@worktop.programming.kicks-ass.net/
Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Jul 2022 16:42:17 +0000 (09:42 -0700)]
Merge tag 'xfs-5.19-fixes-4' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"This fixes some stalling problems and corrects the last of the
problems (I hope) observed during testing of the new atomic xattr
update feature.
- Fix statfs blocking on background inode gc workers
- Fix some broken inode lock assertion code
- Fix xattr leaf buffer leaks when cancelling a deferred xattr update
operation
- Clean up xattr recovery to make it easier to understand.
- Fix xattr leaf block verifiers tripping over empty blocks.
- Remove complicated and error prone xattr leaf block bholding mess.
- Fix a bug where an rt extent crossing EOF was treated as "posteof"
blocks and cleaned unnecessarily.
- Fix a UAF when log shutdown races with unmount"
* tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: prevent a UAF when log IO errors race with unmount
xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
xfs: don't hold xattr leaf buffers across transaction rolls
xfs: empty xattr leaf header blocks are not corruption
xfs: clean up the end of xfs_attri_item_recover
xfs: always free xattri_leaf_bp when cancelling a deferred op
xfs: use invalidate_lock to check the state of mmap_lock
xfs: factor out the common lock flags assert
xfs: introduce xfs_inodegc_push()
xfs: bound maximum wait time for inodegc work
Linus Torvalds [Sat, 2 Jul 2022 18:20:56 +0000 (11:20 -0700)]
Merge tag 'nfsd-5.19-2' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Notable regression fixes:
- Fix NFSD crash during NFSv4.2 READ_PLUS operation
- Fix incorrect status code returned by COMMIT operation"
* tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix READ_PLUS crasher
NFSD: restore EINVAL error translation in nfsd_commit()
Linus Torvalds [Sat, 2 Jul 2022 17:23:36 +0000 (10:23 -0700)]
Merge tag 'for-5.19/parisc-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"Two important fixes for bugs in code which was added in 5.18:
- Fix userspace signal failures on 32-bit kernel due to a bug in vDSO
- Fix 32-bit load-word unalignment exception handler which returned
wrong values"
* tag 'for-5.19/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix vDSO signal breakage on 32-bit kernel
parisc/unaligned: Fix emulate_ldw() breakage
Helge Deller [Fri, 1 Jul 2022 07:00:41 +0000 (09:00 +0200)]
parisc: Fix vDSO signal breakage on 32-bit kernel
Addition of vDSO support for parisc in kernel v5.18 suddenly broke glibc
signal testcases on a 32-bit kernel.
The trampoline code (sigtramp.S) which is mapped into userspace includes
an offset to the context data on the stack, which is used by gdb and
glibc to get access to registers.
In a 32-bit kernel we used by mistake the offset into the compat context
(which is valid on a 64-bit kernel only) instead of the offset into the
"native" 32-bit context.
Reported-by: John David Anglin <dave.anglin@bell.net>
Tested-by: John David Anglin <dave.anglin@bell.net>
Fixes:
df24e1783e6e ("parisc: Add vDSO support")
CC: stable@vger.kernel.org # 5.18
Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Sat, 2 Jul 2022 16:28:36 +0000 (09:28 -0700)]
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-02' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- BPF program info linear (BPIL) data is accessed assuming 64-bit
alignment resulting in undefined behavior as the data is just byte
aligned. Fix it, Found using -fsanitize=undefined.
- Fix 'perf offcpu' build on old kernels wrt task_struct's
state/__state field.
- Fix perf_event_attr.sample_type setting on the 'offcpu-time' event
synthesized by the 'perf offcpu' tool.
- Don't bail out when synthesizing PERF_RECORD_ events for pre-existing
threads when one goes away while parsing its procfs entries.
- Don't sort the task scan result from /proc, its not needed and
introduces bugs when the main thread isn't the first one to be
processed.
- Fix uninitialized 'offset' variable on aarch64 in the unwind code.
- Sync KVM headers with the kernel sources.
* tag 'perf-tools-fixes-for-v5.19-2022-07-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf synthetic-events: Ignore dead threads during event synthesis
perf synthetic-events: Don't sort the task scan result from /proc
perf unwind: Fix unitialized 'offset' variable on aarch64
tools headers UAPI: Sync linux/kvm.h with the kernel sources
perf bpf: 8 byte align bpil data
tools kvm headers arm64: Update KVM headers from the kernel sources
perf offcpu: Accept allowed sample types only
perf offcpu: Fix build failure on old kernels
Linus Torvalds [Sat, 2 Jul 2022 16:11:44 +0000 (09:11 -0700)]
Merge tag 'powerpc-5.19-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix BPF uapi confusion about the correct type of bpf_user_pt_regs_t.
- Fix virt_addr_valid() when memory is hotplugged above the boot-time
high_memory value.
- Fix a bug in 64-bit Book3E map_kernel_page() which would incorrectly
allocate a PMD page at PUD level.
- Fix a couple of minor issues found since we enabled KASAN for 64-bit
Book3S.
Thanks to Aneesh Kumar K.V, Cédric Le Goater, Christophe Leroy, Kefeng
Wang, Liam Howlett, Nathan Lynch, and Naveen N. Rao.
* tag 'powerpc-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/memhotplug: Add add_pages override for PPC
powerpc/bpf: Fix use of user_pt_regs in uapi
powerpc/prom_init: Fix kernel config grep
powerpc/book3e: Fix PUD allocation size in map_kernel_page()
powerpc/xive/spapr: correct bitmap allocation size
Namhyung Kim [Fri, 1 Jul 2022 20:54:58 +0000 (13:54 -0700)]
perf synthetic-events: Ignore dead threads during event synthesis
When it synthesize various task events, it scans the list of task
first and then accesses later. There's a window threads can die
between the two and proc entries may not be available.
Instead of bailing out, we can ignore that thread and move on.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701205458.985106-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 1 Jul 2022 20:54:57 +0000 (13:54 -0700)]
perf synthetic-events: Don't sort the task scan result from /proc
It should not sort the result as procfs already returns a proper
ordering of tasks. Actually sorting the order caused problems that it
doesn't guararantee to process the main thread first.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701205458.985106-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ivan Babrou [Fri, 1 Jul 2022 18:20:46 +0000 (11:20 -0700)]
perf unwind: Fix unitialized 'offset' variable on aarch64
Commit
dc2cf4ca866f5715 ("perf unwind: Fix segbase for ld.lld linked
objects") uncovered the following issue on aarch64:
util/unwind-libunwind-local.c: In function 'find_proc_info':
util/unwind-libunwind-local.c:386:28: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
386 | if (ofs > 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:371:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
371 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:363:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
363 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
In file included from util/libunwind/arm64.c:37:
Fixes:
dc2cf4ca866f5715 ("perf unwind: Fix segbase for ld.lld linked objects")
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: kernel-team@cloudflare.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701182046.12589-1-ivan@cloudflare.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Fri, 1 Jul 2022 23:58:19 +0000 (16:58 -0700)]
Merge tag 'libnvdimm-fixes-5.19-rc5' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Vishal Verma:
- Fix a bug in the libnvdimm 'BTT' (Block Translation Table) driver
where accounting for poison blocks to be cleared was off by one,
causing a failure to clear the the last badblock in an nvdimm region.
* tag 'libnvdimm-fixes-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: Fix badblocks clear off-by-one error
Linus Torvalds [Fri, 1 Jul 2022 20:00:47 +0000 (13:00 -0700)]
Merge tag 'thermal-5.19-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Add a new CPU ID to the list of supported processors in the
intel_tcc_cooling driver (Sumeet Pawnikar)"
* tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
Linus Torvalds [Fri, 1 Jul 2022 19:55:28 +0000 (12:55 -0700)]
Merge tag 'pm-5.19-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix some issues in cpufreq drivers and some issues in devfreq:
- Fix error code path issues related PROBE_DEFER handling in devfreq
(Christian Marangi)
- Revert an editing accident in SPDX-License line in the devfreq
passive governor (Lukas Bulwahn)
- Fix refcount leak in of_get_devfreq_events() in the exynos-ppmu
devfreq driver (Miaoqian Lin)
- Use HZ_PER_KHZ macro in the passive devfreq governor (Yicong Yang)
- Fix missing of_node_put for qoriq and pmac32 driver (Liang He)
- Fix issues around throttle interrupt for qcom driver (Stephen Boyd)
- Add MT8186 to cpufreq-dt-platdev blocklist (AngeloGioacchino Del
Regno)
- Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su)"
* tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: passive: revert an editing accident in SPDX-License line
PM / devfreq: Fix kernel warning with cpufreq passive register fail
PM / devfreq: Rework freq_table to be local to devfreq struct
PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h
PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER
PM / devfreq: Mute warning on governor PROBE_DEFER
PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist
cpufreq: pmac32-cpufreq: Fix refcount leak bug
cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt
drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c
cpufreq: amd-pstate: Add resume and suspend callbacks
Rafael J. Wysocki [Fri, 1 Jul 2022 19:43:08 +0000 (21:43 +0200)]
Merge branch 'pm-cpufreq'
Merge cpufreq fixes for 5.19-rc5, including ARM cpufreq fixes and the
following one:
- Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su).
* pm-cpufreq:
cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist
cpufreq: pmac32-cpufreq: Fix refcount leak bug
cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt
drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c
cpufreq: amd-pstate: Add resume and suspend callbacks
Linus Torvalds [Fri, 1 Jul 2022 19:05:27 +0000 (12:05 -0700)]
Merge tag 'hwmon-for-v5.19-rc5' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix error handling in ibmaem driver initialization
- Fix bad data reported by occ driver after setting power cap
- Fix typos in pmbus/ucd9200 driver comments
* tag 'hwmon-for-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ibmaem) don't call platform_device_del() if platform_device_add() fails
hwmon: (pmbus/ucd9200) fix typos in comments
hwmon: (occ) Prevent power cap command overwriting poll response
Yang Yingliang [Fri, 1 Jul 2022 07:41:53 +0000 (15:41 +0800)]
hwmon: (ibmaem) don't call platform_device_del() if platform_device_add() fails
If platform_device_add() fails, it no need to call platform_device_del(), split
platform_device_unregister() into platform_device_del/put(), so platform_device_put()
can be called separately.
Fixes:
8808a793f052 ("ibmaem: new driver for power/energy/temp meters in IBM System X hardware")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220701074153.4021556-1-yangyingliang@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Fri, 1 Jul 2022 18:23:21 +0000 (11:23 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Restore TLB invalidation for the 'break-before-make' rule on
contiguous ptes (missed in a recent clean-up)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptes
Linus Torvalds [Fri, 1 Jul 2022 18:19:14 +0000 (11:19 -0700)]
Merge tag 's390-5.19-5' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix purgatory build process so bin2c tool does not get built
unnecessarily and the Makefile is more consistent with other
architectures.
- Return earlier simple design of arch_get_random_seed_long|int() and
arch_get_random_long|int() callbacks as result of changes in generic
RNG code.
- Fix minor comment typos and spelling mistakes.
* tag 's390-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/qdio: Fix spelling mistake
s390/sclp: Fix typo in comments
s390/archrandom: simplify back to earlier design and initialize earlier
s390/purgatory: remove duplicated build rule of kexec-purgatory.o
s390/purgatory: hard-code obj-y in Makefile
s390: remove unneeded 'select BUILD_BIN2C'
Linus Torvalds [Fri, 1 Jul 2022 18:11:32 +0000 (11:11 -0700)]
Merge tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Allocate a fattr for _nfs4_discover_trunking()
- Fix module reference count leak in nfs4_run_state_manager()
* tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Add an fattr allocation to _nfs4_discover_trunking()
NFS: restore module put when manager exits.
Linus Torvalds [Fri, 1 Jul 2022 18:06:21 +0000 (11:06 -0700)]
Merge tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A ceph filesystem fix, marked for stable.
There appears to be a deeper issue on the MDS side, but for now we are
going with this one-liner to avoid busy looping and potential soft
lockups"
* tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client:
ceph: wait on async create before checking caps for syncfs
Linus Torvalds [Fri, 1 Jul 2022 17:58:39 +0000 (10:58 -0700)]
Merge tag 'for-5.19/dm-fixes-5' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Three fixes for invalid memory accesses discovered by using KASAN
while running the lvm2 testsuite's dm-raid tests. Includes changes to
MD's raid5.c given the dependency dm-raid has on the MD code"
* tag 'for-5.19/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm raid: fix KASAN warning in raid5_add_disks
dm raid: fix KASAN warning in raid5_remove_disk
dm raid: fix accesses beyond end of raid member array
Linus Torvalds [Fri, 1 Jul 2022 17:52:01 +0000 (10:52 -0700)]
Merge tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two minor tweaks:
- While we still can, adjust the send/recv based flags to be in
->ioprio rather than in ->addr2. This is consistent with eg accept,
and also doesn't waste a full 64-bit field for flags (Pavel)
- 5.18-stable fix for re-importing provided buffers. Not much real
world relevance here as it'll only impact non-pollable files gone
async, which is more of a practical test case rather than something
that is used in the wild (Dylan)"
* tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block:
io_uring: fix provided buffer import
io_uring: keep sendrecv flags in ioprio
Linus Torvalds [Fri, 1 Jul 2022 17:42:10 +0000 (10:42 -0700)]
Merge tag 'block-5.19-2022-07-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix for batch getting of tags in sbitmap (wuchi)
- NVMe pull request via Christoph:
- More quirks (Lamarque Vieira Souza, Pablo Greco)
- Fix a fabrics disconnect regression (Ruozhu Li)
- Fix a nvmet-tcp data_digest calculation regression (Sagi
Grimberg)
- Fix nvme-tcp send failure handling (Sagi Grimberg)
- Fix a regression with nvmet-loop and passthrough controllers
(Alan Adamson)
* tag 'block-5.19-2022-07-01' of git://git.kernel.dk/linux-block:
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
nvmet: add a clear_ids attribute for passthru targets
nvme: fix regression when disconnect a recovering ctrl
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)
nvme-tcp: always fail a request when sending it failed
nvmet-tcp: fix regression in data_digest calculation
lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()
Linus Torvalds [Fri, 1 Jul 2022 17:38:17 +0000 (10:38 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One simple driver fix for a dma overrun"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Limit max hw sectors for v3 HW
Linus Torvalds [Fri, 1 Jul 2022 17:31:44 +0000 (10:31 -0700)]
Merge tag 'ata-5.19-rc5' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
- Fix a compilation warning with some versions of gcc/sparse when
compiling the pata_cs5535 driver, from John.
* tag 'ata-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_cs5535: Fix W=1 warnings
Will Deacon [Wed, 29 Jun 2022 09:53:49 +0000 (10:53 +0100)]
arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptes
Commit
fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
removed TLB invalidation from get_clear_flush() [now get_clear_contig()]
on the basis that the core TLB invalidation code is aware of hugetlb
mappings backed by contiguous page-table entries and will cover the
correct virtual address range.
However, this change also resulted in the TLB invalidation being removed
from the "break" step in the break-before-make (BBM) sequence used
internally by huge_ptep_set_{access_flags,wrprotect}(), therefore
making the BBM sequence unsafe irrespective of later invalidation.
Although the architecture is desperately unclear about how exactly
contiguous ptes should be updated in a live page-table, restore TLB
invalidation to our BBM sequence under the assumption that BBM is the
right thing to be doing in the first place.
Fixes:
fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220629095349.25748-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Fri, 1 Jul 2022 17:01:32 +0000 (10:01 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two small fixes
- Initialize a spinlock in the stm32 reset code
- Add dt bindings to the clk maintainer filepattern"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
MAINTAINERS: add include/dt-bindings/clock to COMMON CLK FRAMEWORK
clk: stm32: rcc_reset: Fix missing spin_lock_init()
Darrick J. Wong [Fri, 1 Jul 2022 16:08:33 +0000 (09:08 -0700)]
xfs: prevent a UAF when log IO errors race with unmount
KASAN reported the following use after free bug when running
generic/475:
XFS (dm-0): Mounting V5 Filesystem
XFS (dm-0): Starting recovery (logdev: internal)
XFS (dm-0): Ending recovery (logdev: internal)
Buffer I/O error on dev dm-0, logical block
20639616, async page read
Buffer I/O error on dev dm-0, logical block
20639617, async page read
XFS (dm-0): log I/O error -5
XFS (dm-0): Filesystem has been shut down due to log error (0x2).
XFS (dm-0): Unmounting Filesystem
XFS (dm-0): Please unmount the filesystem and rectify the problem(s).
==================================================================
BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270
Read of size 4 at addr
ffff888109dd84c4 by task 3:1H/136
CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4
8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs]
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x2b8/0x661
? do_raw_spin_lock+0x246/0x270
kasan_report+0xab/0x120
? do_raw_spin_lock+0x246/0x270
do_raw_spin_lock+0x246/0x270
? rwlock_bug.part.0+0x90/0x90
xlog_force_shutdown+0xf6/0x370 [xfs
4ad76ae0d6add7e8183a553e624c31e9ed567318]
xlog_ioend_work+0x100/0x190 [xfs
4ad76ae0d6add7e8183a553e624c31e9ed567318]
process_one_work+0x672/0x1040
worker_thread+0x59b/0xec0
? __kthread_parkme+0xc6/0x1f0
? process_one_work+0x1040/0x1040
? process_one_work+0x1040/0x1040
kthread+0x29e/0x340
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 154099:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
kmem_alloc+0x8d/0x2e0 [xfs]
xlog_cil_init+0x1f/0x540 [xfs]
xlog_alloc_log+0xd1e/0x1260 [xfs]
xfs_log_mount+0xba/0x640 [xfs]
xfs_mountfs+0xf2b/0x1d00 [xfs]
xfs_fs_fill_super+0x10af/0x1910 [xfs]
get_tree_bdev+0x383/0x670
vfs_get_tree+0x7d/0x240
path_mount+0xdb7/0x1890
__x64_sys_mount+0x1fa/0x270
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 154151:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
____kasan_slab_free+0x110/0x190
slab_free_freelist_hook+0xab/0x180
kfree+0xbc/0x310
xlog_dealloc_log+0x1b/0x2b0 [xfs]
xfs_unmountfs+0x119/0x200 [xfs]
xfs_fs_put_super+0x6e/0x2e0 [xfs]
generic_shutdown_super+0x12b/0x3a0
kill_block_super+0x95/0xd0
deactivate_locked_super+0x80/0x130
cleanup_mnt+0x329/0x4d0
task_work_run+0xc5/0x160
exit_to_user_mode_prepare+0xd4/0xe0
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x46/0xb0
This appears to be a race between the unmount process, which frees the
CIL and waits for in-flight iclog IO; and the iclog IO completion. When
generic/475 runs, it starts fsstress in the background, waits a few
seconds, and substitutes a dm-error device to simulate a disk falling
out of a machine. If the fsstress encounters EIO on a pure data write,
it will exit but the filesystem will still be online.
The next thing the test does is unmount the filesystem, which tries to
clean the log, free the CIL, and wait for iclog IO completion. If an
iclog was being written when the dm-error switch occurred, it can race
with log unmounting as follows:
Thread 1 Thread 2
xfs_log_unmount
xfs_log_clean
xfs_log_quiesce
xlog_ioend_work
<observe error>
xlog_force_shutdown
test_and_set_bit(XLOG_IOERROR)
xfs_log_force
<log is shut down, nop>
xfs_log_umount_write
<log is shut down, nop>
xlog_dealloc_log
xlog_cil_destroy
<wait for iclogs>
spin_lock(&log->l_cilp->xc_push_lock)
<KABOOM>
Therefore, free the CIL after waiting for the iclogs to complete. I
/think/ this race has existed for quite a few years now, though I don't
remember the ~2014 era logging code well enough to know if it was a real
threat then or if the actual race was exposed only more recently.
Fixes:
ac983517ec59 ("xfs: don't sleep in xlog_cil_force_lsn on shutdown")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Linus Torvalds [Fri, 1 Jul 2022 00:19:19 +0000 (17:19 -0700)]
Merge tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Bit quieter this week, the main thing is it pulls in the fixes for the
sysfb resource issue you were seeing. these had been queued for next
so should have had some decent testing.
Otherwise amdgpu, i915 and msm each have a few fixes, and vc4 has one.
fbdev:
- sysfb fixes/conflicting fb fixes
amdgpu:
- GPU recovery fix
- Fix integer type usage in fourcc header for AMD modifiers
- KFD TLB flush fix for gfx9 APUs
- Display fix
i915:
- Fix ioctl argument error return
- Fix d3cold disable to allow PCI upstream bridge D3 transition
- Fix setting cache_dirty for dma-buf objects on discrete
msm:
- Fix to increment vsync_cnt before calling drm_crtc_handle_vblank so
that userspace sees the value *after* it is incremented if waiting
for vblank events
- Fix to reset drm_dev to NULL in dp_display_unbind to avoid a crash
in probe/bind error paths
- Fix to resolve the smatch error of de-referencing before NULL check
in dpu_encoder_phys_wb.c
- Fix error return to userspace if fence-id allocation fails in
submit ioctl
vc4:
- NULL ptr dereference fix"
* tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"
drm/amdgpu: To flush tlb for MMHUB of RAVEN series
drm/fourcc: fix integer type usage in uapi header
drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover()
fbdev: Disable sysfb device registration when removing conflicting FBs
firmware: sysfb: Add sysfb_disable() helper function
firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
drm/msm/gem: Fix error return on fence id alloc fail
drm/i915: tweak the ordering in cpu_write_needs_clflush
drm/i915/dgfx: Disable d3cold at gfx root port
drm/i915/gem: add missing else
drm/vc4: perfmon: Fix variable dereferenced before check
drm/msm/dpu: Fix variable dereferenced before check
drm/msm/dp: reset drm_dev to NULL at dp_display_unbind()
drm/msm/dpu: Increment vsync_cnt before waking up userspace
Dave Airlie [Thu, 30 Jun 2022 23:27:28 +0000 (09:27 +1000)]
Merge tag 'drm-misc-fixes-2022-06-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A NULL pointer dereference fix for vc4, and 3 patches to improve the
sysfb device behaviour when removing conflicting framebuffers
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220630072404.2fa4z3nk5h5q34ci@houat
Linus Torvalds [Thu, 30 Jun 2022 22:26:55 +0000 (15:26 -0700)]
Merge tag 'net-5.19-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - new code bugs:
- clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()
- mptcp:
- invoke MP_FAIL response only when needed
- fix shutdown vs fallback race
- consistent map handling on failure
- octeon_ep: use bitwise AND
Previous releases - regressions:
- tipc: move bc link creation back to tipc_node_create, fix NPD
Previous releases - always broken:
- tcp: add a missing nf_reset_ct() in 3WHS handling to prevent socket
buffered skbs from keeping refcount on the conntrack module
- ipv6: take care of disable_policy when restoring routes
- tun: make sure to always disable and unlink NAPI instances
- phy: don't trigger state machine while in suspend
- netfilter: nf_tables: avoid skb access on nf_stolen
- asix: fix "can't send until first packet is send" issue
- usb: asix: do not force pause frames support
- nxp-nci: don't issue a zero length i2c_master_read()
Misc:
- ncsi: allow use of proper "mellanox" DT vendor prefix
- act_api: add a message for user space if any actions were already
flushed before the error was hit"
* tag 'net-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
net: dsa: felix: fix race between reading PSFP stats and port stats
selftest: tun: add test for NAPI dismantle
net: tun: avoid disabling NAPI twice
net: sparx5: mdb add/del handle non-sparx5 devices
net: sfp: fix memory leak in sfp_probe()
mlxsw: spectrum_router: Fix rollback in tunnel next hop init
net: rose: fix UAF bugs caused by timer handler
net: usb: ax88179_178a: Fix packet receiving
net: bonding: fix use-after-free after 802.3ad slave unbind
ipv6: fix lockdep splat in in6_dump_addrs()
net: phy: ax88772a: fix lost pause advertisement configuration
net: phy: Don't trigger state machine while in suspend
usbnet: fix memory allocation in helpers
selftests net: fix kselftest net fatal error
NFC: nxp-nci: don't print header length mismatch on i2c error
NFC: nxp-nci: Don't issue a zero length i2c_master_read()
net: tipc: fix possible refcount leak in tipc_sk_create()
nfc: nfcmrvl: Fix irq_of_parse_and_map() return value
net: ipv6: unexport __init-annotated seg6_hmac_net_init()
ipv6/sit: fix ipip6_tunnel_get_prl return value
...
Amir Goldstein [Thu, 30 Jun 2022 19:58:49 +0000 (22:58 +0300)]
vfs: fix copy_file_range() regression in cross-fs copies
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.
Before commit
5dae222a5ff0 ("vfs: allow copy_file_range to copy across
devices") the kernel would return -EXDEV to userspace when trying to
copy a file across different filesystems. After this commit, the
syscall doesn't fail anymore and instead returns zero (zero bytes
copied), as this file's content is generated on-the-fly and thus reports
a size of zero.
Another regression has been reported by He Zhe - the assertion of
WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
copying from a sysfs file whose read operation may return -EOPNOTSUPP.
Since we do not have test coverage for copy_file_range() between any two
types of filesystems, the best way to avoid these sort of issues in the
future is for the kernel to be more picky about filesystems that are
allowed to do copy_file_range().
This patch restores some cross-filesystem copy restrictions that existed
prior to commit
5dae222a5ff0 ("vfs: allow copy_file_range to copy across
devices"), namely, cross-sb copy is not allowed for filesystems that do
not implement ->copy_file_range().
Filesystems that do implement ->copy_file_range() have full control of
the result - if this method returns an error, the error is returned to
the user. Before this change this was only true for fs that did not
implement the ->remap_file_range() operation (i.e. nfsv3).
Filesystems that do not implement ->copy_file_range() still fall-back to
the generic_copy_file_range() implementation when the copy is within the
same sb. This helps the kernel can maintain a more consistent story
about which filesystems support copy_file_range().
nfsd and ksmbd servers are modified to fall-back to the
generic_copy_file_range() implementation in case vfs_copy_file_range()
fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
server-side-copy.
fall-back to generic_copy_file_range() is not implemented for the smb
operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
change of behavior.
Fixes:
5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/
Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes:
64bf5ff58dff ("vfs: no fallback for ->copy_file_range")
Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/
Reported-by: He Zhe <zhe.he@windriver.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chuck Lever [Thu, 30 Jun 2022 20:48:18 +0000 (16:48 -0400)]
SUNRPC: Fix READ_PLUS crasher
Looks like there are still cases when "space_left - frag1bytes" can
legitimately exceed PAGE_SIZE. Ensure that xdr->end always remains
within the current encode buffer.
Reported-by: Bruce Fields <bfields@fieldses.org>
Reported-by: Zorro Lang <zlang@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216151
Fixes:
6c254bf3b637 ("SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Scott Mayhew [Mon, 27 Jun 2022 21:31:29 +0000 (17:31 -0400)]
NFSv4: Add an fattr allocation to _nfs4_discover_trunking()
This was missed in
c3ed222745d9 ("NFSv4: Fix free of uninitialized
nfs4_label on referral lookup.") and causes a panic when mounting
with '-o trunkdiscovery':
PID: 1604 TASK:
ffff93dac3520000 CPU: 3 COMMAND: "mount.nfs"
#0 [
ffffb79140f738f8] machine_kexec at
ffffffffaec64bee
#1 [
ffffb79140f73950] __crash_kexec at
ffffffffaeda67fd
#2 [
ffffb79140f73a18] crash_kexec at
ffffffffaeda76ed
#3 [
ffffb79140f73a30] oops_end at
ffffffffaec2658d
#4 [
ffffb79140f73a50] general_protection at
ffffffffaf60111e
[exception RIP: nfs_fattr_init+0x5]
RIP:
ffffffffc0c18265 RSP:
ffffb79140f73b08 RFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff93dac304a800 RCX:
0000000000000000
RDX:
ffffb79140f73bb0 RSI:
ffff93dadc8cbb40 RDI:
d03ee11cfaf6bd50
RBP:
ffffb79140f73be8 R8:
ffffffffc0691560 R9:
0000000000000006
R10:
ffff93db3ffd3df8 R11:
0000000000000000 R12:
ffff93dac4040000
R13:
ffff93dac2848e00 R14:
ffffb79140f73b60 R15:
ffffb79140f73b30
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
#5 [
ffffb79140f73b08] _nfs41_proc_get_locations at
ffffffffc0c73d53 [nfsv4]
#6 [
ffffb79140f73bf0] nfs4_proc_get_locations at
ffffffffc0c83e90 [nfsv4]
#7 [
ffffb79140f73c60] nfs4_discover_trunking at
ffffffffc0c83fb7 [nfsv4]
#8 [
ffffb79140f73cd8] nfs_probe_fsinfo at
ffffffffc0c0f95f [nfs]
#9 [
ffffb79140f73da0] nfs_probe_server at
ffffffffc0c1026a [nfs]
RIP:
00007f6254fce26e RSP:
00007ffc69496ac8 RFLAGS:
00000246
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f6254fce26e
RDX:
00005600220a82a0 RSI:
00005600220a64d0 RDI:
00005600220a6520
RBP:
00007ffc69496c50 R8:
00005600220a8710 R9:
003035322e323231
R10:
0000000000000000 R11:
0000000000000246 R12:
00007ffc69496c50
R13:
00005600220a8440 R14:
0000000000000010 R15:
0000560020650ef9
ORIG_RAX:
00000000000000a5 CS: 0033 SS: 002b
Fixes:
c3ed222745d9 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Thu, 23 Jun 2022 04:47:34 +0000 (14:47 +1000)]
NFS: restore module put when manager exits.
Commit
f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") removed
calls to module_put_and_kthread_exit() from threads that acted as SUNRPC
servers and had a related svc_serv_ops structure. This was correct.
It ALSO removed the module_put_and_kthread_exit() call from
nfs4_run_state_manager() which is NOT a SUNRPC service.
Consequently every time the NFSv4 state manager runs the module count
increments and won't be decremented. So the nfsv4 module cannot be
unloaded.
So restore the module_put_and_kthread_exit() call.
Fixes:
f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jens Axboe [Thu, 30 Jun 2022 20:00:11 +0000 (14:00 -0600)]
Merge tag 'nvme-5.19-2022-06-30' of git://git.infradead.org/nvme into block-5.19
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.19
- more quirks (Lamarque Vieira Souza, Pablo Greco)
- fix a fabrics disconnect regression (Ruozhu Li)
- fix a nvmet-tcp data_digest calculation regression (Sagi Grimberg)
- fix nvme-tcp send failure handling (Sagi Grimberg)
- fix a regression with nvmet-loop and passthrough controllers
(Alan Adamson)"
* tag 'nvme-5.19-2022-06-30' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
nvmet: add a clear_ids attribute for passthru targets
nvme: fix regression when disconnect a recovering ctrl
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)
nvme-tcp: always fail a request when sending it failed
nvmet-tcp: fix regression in data_digest calculation
Vladimir Oltean [Wed, 29 Jun 2022 18:30:07 +0000 (21:30 +0300)]
net: dsa: felix: fix race between reading PSFP stats and port stats
Both PSFP stats and the port stats read by ocelot_check_stats_work() are
indirectly read through the same mechanism - write to STAT_CFG:STAT_VIEW,
read from SYS:STAT:CNT[n].
It's just that for port stats, we write STAT_VIEW with the index of the
port, and for PSFP stats, we write STAT_VIEW with the filter index.
So if we allow them to run concurrently, ocelot_check_stats_work() may
change the view from vsc9959_psfp_counters_get(), and vice versa.
Fixes:
7d4b564d6add ("net: dsa: felix: support psfp filter on vsc9959")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220629183007.3808130-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 29 Jun 2022 18:19:11 +0000 (11:19 -0700)]
selftest: tun: add test for NAPI dismantle
Being lazy does not pay, add the test for various
ordering of tun queue close / detach / destroy.
Link: https://lore.kernel.org/r/20220629181911.372047-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 29 Jun 2022 18:19:10 +0000 (11:19 -0700)]
net: tun: avoid disabling NAPI twice
Eric reports that syzbot made short work out of my speculative
fix. Indeed when queue gets detached its tfile->tun remains,
so we would try to stop NAPI twice with a detach(), close()
sequence.
Alternative fix would be to move tun_napi_disable() to
tun_detach_all() and let the NAPI run after the queue
has been detached.
Fixes:
a8fc8cb5692a ("net: tun: stop NAPI when detaching queues")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220629181911.372047-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Casper Andersson [Thu, 30 Jun 2022 12:22:26 +0000 (14:22 +0200)]
net: sparx5: mdb add/del handle non-sparx5 devices
When adding/deleting mdb entries on other net_devices, eg., tap
interfaces, it should not crash.
Fixes:
3bacfccdcb2d ("net: sparx5: Add mdb handlers")
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20220630122226.316812-1-casper.casan@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sumeet Pawnikar [Fri, 6 May 2022 13:50:09 +0000 (19:20 +0530)]
thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
Add RaptorLake to the list of processor models supported by the Intel
TCC cooling driver.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
[ rjw: Subject edits, new changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Zhang Jiaming [Thu, 23 Jun 2022 06:05:43 +0000 (14:05 +0800)]
s390/qdio: Fix spelling mistake
Change 'defineable' to 'definable'.
Change 'paramater' to 'parameter'.
Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Link: https://lore.kernel.org/r/20220623060543.12870-1-jiaming@nfschina.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jiang Jian [Wed, 22 Jun 2022 14:27:13 +0000 (22:27 +0800)]
s390/sclp: Fix typo in comments
Remove the repeated word 'and' from comments
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220622142713.14187-1-jiangjian@cdjrlc.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jason A. Donenfeld [Fri, 10 Jun 2022 22:20:23 +0000 (00:20 +0200)]
s390/archrandom: simplify back to earlier design and initialize earlier
s390x appears to present two RNG interfaces:
- a "TRNG" that gathers entropy using some hardware function; and
- a "DRBG" that takes in a seed and expands it.
Previously, the TRNG was wired up to arch_get_random_{long,int}(), but
it was observed that this was being called really frequently, resulting
in high overhead. So it was changed to be wired up to arch_get_random_
seed_{long,int}(), which was a reasonable decision. Later on, the DRBG
was then wired up to arch_get_random_{long,int}(), with a complicated
buffer filling thread, to control overhead and rate.
Fortunately, none of the performance issues matter much now. The RNG
always attempts to use arch_get_random_seed_{long,int}() first, which
means a complicated implementation of arch_get_random_{long,int}() isn't
really valuable or useful to have around. And it's only used when
reseeding, which means it won't hit the high throughput complications
that were faced before.
So this commit returns to an earlier design of just calling the TRNG in
arch_get_random_seed_{long,int}(), and returning false in arch_get_
random_{long,int}().
Part of what makes the simplification possible is that the RNG now seeds
itself using the TRNG at bootup. But this only works if the TRNG is
detected early in boot, before random_init() is called. So this commit
also causes that check to happen in setup_arch().
Cc: stable@vger.kernel.org
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Ingo Franzki <ifranzki@linux.ibm.com>
Cc: Juergen Christ <jchrist@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20220610222023.378448-1-Jason@zx2c4.com
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Dylan Yudaken [Thu, 30 Jun 2022 13:20:06 +0000 (06:20 -0700)]
io_uring: fix provided buffer import
io_import_iovec uses the s pointer, but this was changed immediately
after the iovec was re-imported and so it was imported into the wrong
place.
Change the ordering.
Fixes:
2be2eb02e2f5 ("io_uring: ensure reads re-import for selected buffers")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630132006.2825668-1-dylany@fb.com
[axboe: ensure we don't half-import as well]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 30 Jun 2022 17:03:22 +0000 (10:03 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Three minor bug fixes:
- qedr not setting the QP timeout properly toward userspace
- Memory leak on error path in ib_cm
- Divide by 0 in RDMA interrupt moderation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
linux/dim: Fix divide by 0 in RDMA DIM
RDMA/cm: Fix memory leak in ib_cm_insert_listen
RDMA/qedr: Fix reporting QP timeout attribute
Linus Torvalds [Thu, 30 Jun 2022 16:57:18 +0000 (09:57 -0700)]
Merge tag 'fsnotify_for_v5.19-rc5' of git://git./linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"A fix for recently added fanotify API to have stricter checks and
refuse some invalid flag combinations to make our life easier in the
future"
* tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: refine the validation checks on non-dir inode mask
Linus Torvalds [Thu, 30 Jun 2022 16:45:42 +0000 (09:45 -0700)]
Merge tag 'v5.19-p3' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression that breaks the ccp driver"
* tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Fix device IRQ counting by using platform_irq_count()
Rafael J. Wysocki [Thu, 30 Jun 2022 13:30:30 +0000 (15:30 +0200)]
Merge tag 'devfreq-fixes-for-5.19-rc5' of git://git./linux/kernel/git/chanwoo/linux
Pull devfreq fixes for 5.19-rc5 from Chanwoo Choi:
"1. Fix devfreq passive governor issue when cpufreq policies are not
ready during kernel boot because some CPUs turn on after kernel
booting or others.
- Re-initialize the vairables of struct devfreq_passive_data when
PROBE_DEFER happens when cpufreq_get() returns NULL.
- Use dev_err_probe to mute warning when PROBE_DEFER.
- Fix cpufreq passive unregister erroring on PROBE_DEFER
by using the allocated parent_cpu_data list to free resouce
instead of for_each_possible_cpu().
- Remove duplicate cpufreq passive unregister and warning when
PROBE_DEFER.
- Use HZ_PER_KZH macro in units.h.
- Fix wrong indentation in SPDX-License line.
2. Fix reference count leak in exynos-ppmu.c by using of_node_put().
3. Rework freq_table to be local to devfreq struct
- struct devfreq_dev_profile includes freq_table array to store
the supported frequencies. If devfreq driver doesn't initialize
the freq_table, devfreq core allocates the memory and initializes
the freq_table.
On a devfreq PROBE_DEFER, the freq_table in the driver profile
struct is never reset and may be left in an undefined state. To fix
this and correctly handle PROBE_DEFER, use a local freq_table and
max_state in the devfreq struct."
* tag 'devfreq-fixes-for-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: passive: revert an editing accident in SPDX-License line
PM / devfreq: Fix kernel warning with cpufreq passive register fail
PM / devfreq: Rework freq_table to be local to devfreq struct
PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h
PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER
PM / devfreq: Mute warning on governor PROBE_DEFER
PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
Pavel Begunkov [Thu, 30 Jun 2022 12:25:57 +0000 (13:25 +0100)]
io_uring: keep sendrecv flags in ioprio
We waste a u64 SQE field for flags even though we don't need as many
bits and it can be used for something more useful later. Store io_uring
specific send/recv flags in sqe->ioprio instead of ->addr2.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes:
0455d4ccec54 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg")
[axboe: change comment in io_uring.h as well]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Masahiro Yamada [Mon, 13 Jun 2022 17:09:02 +0000 (02:09 +0900)]
s390/purgatory: remove duplicated build rule of kexec-purgatory.o
This is equivalent to the pattern rule in scripts/Makefile.build.
Having the dependency on $(obj)/purgatory.ro is enough.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20220613170902.1775211-3-masahiroy@kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Masahiro Yamada [Mon, 13 Jun 2022 17:09:01 +0000 (02:09 +0900)]
s390/purgatory: hard-code obj-y in Makefile
The purgatory/ directory is entirely guarded in arch/s390/Kbuild.
CONFIG_ARCH_HAS_KEXEC_PURGATORY is bool type.
$(CONFIG_ARCH_HAS_KEXEC_PURGATORY) is always 'y' when Kbuild visits
this Makefile for building.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20220613170902.1775211-2-masahiroy@kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Masahiro Yamada [Mon, 13 Jun 2022 17:09:00 +0000 (02:09 +0900)]
s390: remove unneeded 'select BUILD_BIN2C'
Since commit
4c0f032d4963 ("s390/purgatory: Omit use of bin2c"),
s390 builds the purgatory without using bin2c.
Remove 'select BUILD_BIN2C' to avoid the unneeded build of bin2c.
Fixes:
4c0f032d4963 ("s390/purgatory: Omit use of bin2c")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20220613170902.1775211-1-masahiroy@kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jianglei Nie [Wed, 29 Jun 2022 07:55:50 +0000 (15:55 +0800)]
net: sfp: fix memory leak in sfp_probe()
sfp_probe() allocates a memory chunk from sfp with sfp_alloc(). When
devm_add_action() fails, sfp is not freed, which leads to a memory leak.
We should use devm_add_action_or_reset() instead of devm_add_action().
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220629075550.2152003-1-niejianglei2021@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Wed, 29 Jun 2022 07:02:05 +0000 (10:02 +0300)]
mlxsw: spectrum_router: Fix rollback in tunnel next hop init
In mlxsw_sp_nexthop6_init(), a next hop is always added to the router
linked list, and mlxsw_sp_nexthop_type_init() is invoked afterwards. When
that function results in an error, the next hop will not have been removed
from the linked list. As the error is propagated upwards and the caller
frees the next hop object, the linked list ends up holding an invalid
object.
A similar issue comes up with mlxsw_sp_nexthop4_init(), where rollback
block does exist, however does not include the linked list removal.
Both IPv6 and IPv4 next hops have a similar issue with next-hop counter
rollbacks. As these were introduced in the same patchset as the next hop
linked list, include the cleanup in this patch.
Fixes:
dbe4598c1e92 ("mlxsw: spectrum_router: Keep nexthops in a linked list")
Fixes:
a5390278a5eb ("mlxsw: spectrum: Add support for setting counters on nexthops")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220629070205.803952-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Duoming Zhou [Wed, 29 Jun 2022 00:26:40 +0000 (08:26 +0800)]
net: rose: fix UAF bugs caused by timer handler
There are UAF bugs in rose_heartbeat_expiry(), rose_timer_expiry()
and rose_idletimer_expiry(). The root cause is that del_timer()
could not stop the timer handler that is running and the refcount
of sock is not managed properly.
One of the UAF bugs is shown below:
(thread 1) | (thread 2)
| rose_bind
| rose_connect
| rose_start_heartbeat
rose_release | (wait a time)
case ROSE_STATE_0 |
rose_destroy_socket | rose_heartbeat_expiry
rose_stop_heartbeat |
sock_put(sk) | ...
sock_put(sk) // FREE |
| bh_lock_sock(sk) // USE
The sock is deallocated by sock_put() in rose_release() and
then used by bh_lock_sock() in rose_heartbeat_expiry().
Although rose_destroy_socket() calls rose_stop_heartbeat(),
it could not stop the timer that is running.
The KASAN report triggered by POC is shown below:
BUG: KASAN: use-after-free in _raw_spin_lock+0x5a/0x110
Write of size 4 at addr
ffff88800ae59098 by task swapper/3/0
...
Call Trace:
<IRQ>
dump_stack_lvl+0xbf/0xee
print_address_description+0x7b/0x440
print_report+0x101/0x230
? irq_work_single+0xbb/0x140
? _raw_spin_lock+0x5a/0x110
kasan_report+0xed/0x120
? _raw_spin_lock+0x5a/0x110
kasan_check_range+0x2bd/0x2e0
_raw_spin_lock+0x5a/0x110
rose_heartbeat_expiry+0x39/0x370
? rose_start_heartbeat+0xb0/0xb0
call_timer_fn+0x2d/0x1c0
? rose_start_heartbeat+0xb0/0xb0
expire_timers+0x1f3/0x320
__run_timers+0x3ff/0x4d0
run_timer_softirq+0x41/0x80
__do_softirq+0x233/0x544
irq_exit_rcu+0x41/0xa0
sysvec_apic_timer_interrupt+0x8c/0xb0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1b/0x20
RIP: 0010:default_idle+0xb/0x10
RSP: 0018:
ffffc9000012fea0 EFLAGS:
00000202
RAX:
000000000000bcae RBX:
ffff888006660f00 RCX:
000000000000bcae
RDX:
0000000000000001 RSI:
ffffffff843a11c0 RDI:
ffffffff843a1180
RBP:
dffffc0000000000 R08:
dffffc0000000000 R09:
ffffed100da36d46
R10:
dfffe9100da36d47 R11:
ffffffff83cf0950 R12:
0000000000000000
R13:
1ffff11000ccc1e0 R14:
ffffffff8542af28 R15:
dffffc0000000000
...
Allocated by task 146:
__kasan_kmalloc+0xc4/0xf0
sk_prot_alloc+0xdd/0x1a0
sk_alloc+0x2d/0x4e0
rose_create+0x7b/0x330
__sock_create+0x2dd/0x640
__sys_socket+0xc7/0x270
__x64_sys_socket+0x71/0x80
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 152:
kasan_set_track+0x4c/0x70
kasan_set_free_info+0x1f/0x40
____kasan_slab_free+0x124/0x190
kfree+0xd3/0x270
__sk_destruct+0x314/0x460
rose_release+0x2fa/0x3b0
sock_close+0xcb/0x230
__fput+0x2d9/0x650
task_work_run+0xd6/0x160
exit_to_user_mode_loop+0xc7/0xd0
exit_to_user_mode_prepare+0x4e/0x80
syscall_exit_to_user_mode+0x20/0x40
do_syscall_64+0x4f/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
This patch adds refcount of sock when we use functions
such as rose_start_heartbeat() and so on to start timer,
and decreases the refcount of sock when timer is finished
or deleted by functions such as rose_stop_heartbeat()
and so on. As a result, the UAF bugs could be mitigated.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Tested-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220629002640.5693-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jose Alonso [Tue, 28 Jun 2022 15:13:02 +0000 (12:13 -0300)]
net: usb: ax88179_178a: Fix packet receiving
This patch corrects packet receiving in ax88179_rx_fixup.
- problem observed:
ifconfig shows allways a lot of 'RX Errors' while packets
are received normally.
This occurs because ax88179_rx_fixup does not recognise properly
the usb urb received.
The packets are normally processed and at the end, the code exits
with 'return 0', generating RX Errors.
(pkt_cnt==-2 and ptk_hdr over field rx_hdr trying to identify
another packet there)
This is a usb urb received by "tcpdump -i usbmon2 -X" on a
little-endian CPU:
0x0000: eeee f8e3 3b19 87a0 94de 80e3 daac 0800
^ packet 1 start (pkt_len = 0x05ec)
^^^^ IP alignment pseudo header
^ ethernet packet start
last byte ethernet packet v
padding (8-bytes aligned) vvvv vvvv
0x05e0: c92d d444 1420 8a69 83dd 272f e82b 9811
0x05f0: eeee f8e3 3b19 87a0 94de 80e3 daac 0800
... ^ packet 2
0x0be0: eeee f8e3 3b19 87a0 94de 80e3 daac 0800
...
0x1130: 9d41 9171 8a38 0ec5 eeee f8e3 3b19 87a0
...
0x1720: 8cfc 15ff 5e4c e85c eeee f8e3 3b19 87a0
...
0x1d10: ecfa 2a3a 19ab c78c eeee f8e3 3b19 87a0
...
0x2070: eeee f8e3 3b19 87a0 94de 80e3 daac 0800
... ^ packet 7
0x2120: 7c88 4ca5 5c57 7dcc 0d34 7577 f778 7e0a
0x2130: f032 e093 7489 0740 3008 ec05 0000 0080
====1==== ====2====
hdr_off ^
pkt_len = 0x05ec ^^^^
AX_RXHDR_*=0x00830 ^^^^ ^
pkt_len = 0 ^^^^
AX_RXHDR_DROP_ERR=0x80000000 ^^^^ ^
0x2140: 3008 ec05 0000 0080 3008 5805 0000 0080
0x2150: 3008 ec05 0000 0080 3008 ec05 0000 0080
0x2160: 3008 5803 0000 0080 3008 c800 0000 0080
===11==== ===12==== ===13==== ===14====
0x2170: 0000 0000 0e00 3821
^^^^ ^^^^ rx_hdr
^^^^ pkt_cnt=14
^^^^ hdr_off=0x2138
^^^^ ^^^^ padding
The dump shows that pkt_cnt is the number of entrys in the
per-packet metadata. It is "2 * packet count".
Each packet have two entrys. The first have a valid
value (pkt_len and AX_RXHDR_*) and the second have a
dummy-header 0x80000000 (pkt_len=0 with AX_RXHDR_DROP_ERR).
Why exists dummy-header for each packet?!?
My guess is that this was done probably to align the
entry for each packet to 64-bits and maintain compatibility
with old firmware.
There is also a padding (0x00000000) before the rx_hdr to
align the end of rx_hdr to 64-bit.
Note that packets have a alignment of 64-bits (8-bytes).
This patch assumes that the dummy-header and the last
padding are optional. So it preserves semantics and
recognises the same valid packets as the current code.
This patch was made using only the dumpfile information and
tested with only one device:
0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
Fixes:
57bc3d3ae8c1 ("net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup")
Fixes:
e2ca90c276e1 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver")
Signed-off-by: Jose Alonso <joalonsof@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/d6970bb04bf67598af4d316eaeb1792040b18cfd.camel@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lamarque Vieira Souza [Thu, 30 Jun 2022 00:30:53 +0000 (21:30 -0300)]
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
ADATA IM2P33F8ABR1 reports bogus eui64 values that appear to be the same
across all drives. Quirk them out so they are not marked as "non globally
unique" duplicates.
Co-developed-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com>
Signed-off-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com>
Signed-off-by: Lamarque V. Souza <lamarque.souza@petrosoftdesign.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Alan Adamson [Mon, 27 Jun 2022 23:25:43 +0000 (16:25 -0700)]
nvmet: add a clear_ids attribute for passthru targets
If the clear_ids attribute is set to true, the EUI/GUID/UUID is cleared
for the passthru target. By default, loop targets will set clear_ids to
true.
This resolves an issue where a connect to a passthru target fails when
using a trtype of 'loop' because EUI/GUID/UUID is not unique.
Fixes:
2079f41ec6ff ("nvme: check that EUI/GUID/UUID are globally unique")
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Yevhen Orlov [Wed, 29 Jun 2022 01:29:14 +0000 (04:29 +0300)]
net: bonding: fix use-after-free after 802.3ad slave unbind
commit
0622cab0341c ("bonding: fix 802.3ad aggregator reselection"),
resolve case, when there is several aggregation groups in the same bond.
bond_3ad_unbind_slave will invalidate (clear) aggregator when
__agg_active_ports return zero. So, ad_clear_agg can be executed even, when
num_of_ports!=0. Than bond_3ad_unbind_slave can be executed again for,
previously cleared aggregator. NOTE: at this time bond_3ad_unbind_slave
will not update slave ports list, because lag_ports==NULL. So, here we
got slave ports, pointing to freed aggregator memory.
Fix with checking actual number of ports in group (as was before
commit
0622cab0341c ("bonding: fix 802.3ad aggregator reselection") ),
before ad_clear_agg().
The KASAN logs are as follows:
[ 767.617392] ==================================================================
[ 767.630776] BUG: KASAN: use-after-free in bond_3ad_state_machine_handler+0x13dc/0x1470
[ 767.638764] Read of size 2 at addr
ffff00011ba9d430 by task kworker/u8:7/767
[ 767.647361] CPU: 3 PID: 767 Comm: kworker/u8:7 Tainted: G O 5.15.11 #15
[ 767.655329] Hardware name: DNI AmazonGo1 A7040 board (DT)
[ 767.660760] Workqueue: lacp_1 bond_3ad_state_machine_handler
[ 767.666468] Call trace:
[ 767.668930] dump_backtrace+0x0/0x2d0
[ 767.672625] show_stack+0x24/0x30
[ 767.675965] dump_stack_lvl+0x68/0x84
[ 767.679659] print_address_description.constprop.0+0x74/0x2b8
[ 767.685451] kasan_report+0x1f0/0x260
[ 767.689148] __asan_load2+0x94/0xd0
[ 767.692667] bond_3ad_state_machine_handler+0x13dc/0x1470
Fixes:
0622cab0341c ("bonding: fix 802.3ad aggregator reselection")
Co-developed-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20220629012914.361-1-yevhen.orlov@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 28 Jun 2022 12:12:48 +0000 (12:12 +0000)]
ipv6: fix lockdep splat in in6_dump_addrs()
As reported by syzbot, we should not use rcu_dereference()
when rcu_read_lock() is not held.
WARNING: suspicious RCU usage
5.19.0-rc2-syzkaller #0 Not tainted
net/ipv6/addrconf.c:5175 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by syz-executor326/3617:
#0:
ffffffff8d5848e8 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xae/0xc20 net/netlink/af_netlink.c:2223
stack backtrace:
CPU: 0 PID: 3617 Comm: syz-executor326 Not tainted 5.19.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
in6_dump_addrs+0x12d1/0x1790 net/ipv6/addrconf.c:5175
inet6_dump_addr+0x9c1/0xb50 net/ipv6/addrconf.c:5300
netlink_dump+0x541/0xc20 net/netlink/af_netlink.c:2275
__netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2380
netlink_dump_start include/linux/netlink.h:245 [inline]
rtnetlink_rcv_msg+0x73e/0xc90 net/core/rtnetlink.c:6046
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2492
___sys_sendmsg+0xf3/0x170 net/socket.c:2546
__sys_sendmsg net/socket.c:2575 [inline]
__do_sys_sendmsg net/socket.c:2584 [inline]
__se_sys_sendmsg net/socket.c:2582 [inline]
__x64_sys_sendmsg+0x132/0x220 net/socket.c:2582
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes:
88e2ca308094 ("mld: convert ifmcaddr6 to RCU")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20220628121248.858695-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Tue, 28 Jun 2022 11:43:49 +0000 (13:43 +0200)]
net: phy: ax88772a: fix lost pause advertisement configuration
In case of asix_ax88772a_link_change_notify() workaround, we run soft
reset which will automatically clear MII_ADVERTISE configuration. The
PHYlib framework do not know about changed configuration state of the
PHY, so we need use phy_init_hw() to reinit PHY configuration.
Fixes:
dde258469257 ("net: usb/phy: asix: add support for ax88772A/C PHYs")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220628114349.3929928-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Wunner [Tue, 28 Jun 2022 10:15:08 +0000 (12:15 +0200)]
net: phy: Don't trigger state machine while in suspend
Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(),
but subsequent interrupts may retrigger it:
They may have been left enabled to facilitate wakeup and are not
quiesced until the ->suspend_noirq() phase. Unwanted interrupts may
hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(),
as well as between dpm_resume_noirq() and mdio_bus_phy_resume().
Retriggering the phy_state_machine() through an interrupt is not only
undesirable for the reason given in mdio_bus_phy_suspend() (freezing it
midway with phydev->lock held), but also because the PHY may be
inaccessible after it's suspended: Accesses to USB-attached PHYs are
blocked once usb_suspend_both() clears the can_submit flag and PHYs on
PCI network cards may become inaccessible upon suspend as well.
Amend phy_interrupt() to avoid triggering the state machine if the PHY
is suspended. Signal wakeup instead if the attached net_device or its
parent has been configured as a wakeup source. (Those conditions are
identical to mdio_bus_phy_may_suspend().) Postpone handling of the
interrupt until the PHY has resumed.
Before stopping the phy_state_machine() in mdio_bus_phy_suspend(),
wait for a concurrent phy_interrupt() to run to completion. That is
necessary because phy_interrupt() may have checked the PHY's suspend
status before the system sleep transition commenced and it may thus
retrigger the state machine after it was stopped.
Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(),
wait for a concurrent phy_interrupt() to complete to ensure that
interrupts which it postponed are properly rerun.
The issue was exposed by commit
1ce8b37241ed ("usbnet: smsc95xx: Forward
PHY interrupts to PHY driver to avoid polling"), but has existed since
forever.
Fixes:
541cd3ee00a4 ("phylib: Fix deadlock on resume")
Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@samsung.com/
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org # v2.6.33+
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/b7f386d04e9b5b0e2738f0125743e30676f309ef.1656410895.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oliver Neukum [Tue, 28 Jun 2022 09:35:17 +0000 (11:35 +0200)]
usbnet: fix memory allocation in helpers
usbnet provides some helper functions that are also used in
the context of reset() operations. During a reset the other
drivers on a device are unable to operate. As that can be block
drivers, a driver for another interface cannot use paging
in its memory allocations without risking a deadlock.
Use GFP_NOIO in the helpers.
Fixes:
877bd862f32b8 ("usbnet: introduce usbnet 3 command helpers")
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/20220628093517.7469-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Coleman Dietsch [Tue, 28 Jun 2022 17:47:44 +0000 (12:47 -0500)]
selftests net: fix kselftest net fatal error
The incorrect path is causing the following error when trying to run net
kselftests:
In file included from bpf/nat6to4.c:43:
../../../lib/bpf/bpf_helpers.h:11:10: fatal error: 'bpf_helper_defs.h' file not found
^~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes:
cf67838c4422 ("selftests net: fix bpf build error")
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Link: https://lore.kernel.org/r/20220628174744.7908-1-dietschc@csp.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 30 Jun 2022 03:09:32 +0000 (20:09 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Restore set counter when one of the CPU loses race to add elements
to sets.
2) After NF_STOLEN, skb might be there no more, update nftables trace
infra to avoid access to skb in this case. From Florian Westphal.
3) nftables bridge might register a prerouting hook with zero priority,
br_netfilter incorrectly skips it. Also from Florian.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: br_netfilter: do not skip all hooks with 0 priority
netfilter: nf_tables: avoid skb access on nf_stolen
netfilter: nft_dynset: restore set element counter when failing to update
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Airlie [Thu, 30 Jun 2022 00:48:54 +0000 (10:48 +1000)]
Merge tag 'amd-drm-fixes-5.19-2022-06-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.19-2022-06-29:
amdgpu:
- GPU recovery fix
- Fix integer type usage in fourcc header for AMD modifiers
- KFD TLB flush fix for gfx9 APUs
- Display fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220629192220.5870-1-alexander.deucher@amd.com
Dave Airlie [Thu, 30 Jun 2022 00:21:14 +0000 (10:21 +1000)]
Merge tag 'drm-intel-fixes-2022-06-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.19-rc5:
- Fix ioctl argument error return
- Fix d3cold disable to allow PCI upstream bridge D3 transition
- Fix setting cache_dirty for dma-buf objects on discrete
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/871qv7rblv.fsf@intel.com
Mikulas Patocka [Wed, 29 Jun 2022 17:40:57 +0000 (13:40 -0400)]
dm raid: fix KASAN warning in raid5_add_disks
There's a KASAN warning in raid5_add_disk when running the LVM testsuite.
The warning happens in the test
lvconvert-raid-reshape-linear_to_raid6-single-type.sh. We fix the warning
by verifying that rdev->saved_raid_disk is within limits.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Wed, 29 Jun 2022 17:40:01 +0000 (13:40 -0400)]
dm raid: fix KASAN warning in raid5_remove_disk
There's a KASAN warning in raid5_remove_disk when running the LVM
testsuite. We fix this warning by verifying that the "number" variable is
within limits.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>