platform/kernel/linux-amlogic.git
9 months agoext4: fix i_data_sem unlock order in ext4_ind_migrate()
Artem Sadovnikov [Thu, 29 Aug 2024 15:22:09 +0000 (15:22 +0000)]
ext4: fix i_data_sem unlock order in ext4_ind_migrate()

Fuzzing reports a possible deadlock in jbd2_log_wait_commit.

This issue is triggered when an EXT4_IOC_MIGRATE ioctl is set to require
synchronous updates because the file descriptor is opened with O_SYNC.
This can lead to the jbd2_journal_stop() function calling
jbd2_might_wait_for_commit(), potentially causing a deadlock if the
EXT4_IOC_MIGRATE call races with a write(2) system call.

This problem only arises when CONFIG_PROVE_LOCKING is enabled. In this
case, the jbd2_might_wait_for_commit macro locks jbd2_handle in the
jbd2_journal_stop function while i_data_sem is locked. This triggers
lockdep because the jbd2_journal_start function might also lock the same
jbd2_handle simultaneously.

Found by Linux Verification Center (linuxtesting.org) with syzkaller.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Co-developed-by: Mikhail Ukhin <mish.uxin2012@yandex.ru>
Signed-off-by: Mikhail Ukhin <mish.uxin2012@yandex.ru>
Signed-off-by: Artem Sadovnikov <ancowi69@gmail.com>
Rule: add
Link: https://lore.kernel.org/stable/20240404095000.5872-1-mish.uxin2012%40yandex.ru
Link: https://patch.msgid.link/20240829152210.2754-1-ancowi69@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: remove the special buffer dirty handling in do_journal_get_write_access
Shida Zhang [Fri, 30 Aug 2024 05:37:39 +0000 (13:37 +0800)]
ext4: remove the special buffer dirty handling in do_journal_get_write_access

This kinda revert the commit 56d35a4cd13e("ext4: Fix dirtying of
journalled buffers in data=journal mode") made by Jan 14 years ago,
since the do_get_write_access() itself can deal with the extra
unexpected buf dirting things in a proper way now.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240830053739.3588573-5-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: fix a potential assertion failure due to improperly dirtied buffer
Shida Zhang [Fri, 30 Aug 2024 05:37:38 +0000 (13:37 +0800)]
ext4: fix a potential assertion failure due to improperly dirtied buffer

On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
-----------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
-----------

The same condition may also be applied to the lattest kernel version.

When blocksize < pagesize and we truncate a file, there can be buffers in
the mapping tail page beyond i_size. These buffers will be filed to
transaction's BJ_Forget list by ext4_journalled_invalidatepage() during
truncation. When the transaction doing truncate starts committing, we can
grow the file again. This calls __block_write_begin() which allocates new
blocks under these buffers in the tail page we go through the branch:

                        if (buffer_new(bh)) {
                                clean_bdev_bh_alias(bh);
                                if (folio_test_uptodate(folio)) {
                                        clear_buffer_new(bh);
                                        set_buffer_uptodate(bh);
                                        mark_buffer_dirty(bh);
                                        continue;
                                }
                                ...
                        }

Hence buffers on BJ_Forget list of the committing transaction get marked
dirty and this triggers the jbd2 assertion.

Teach ext4_block_write_begin() to properly handle files with data
journalling by avoiding dirtying them directly. Instead of
folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which
takes care of handling journalling. We also don't need to mark new uptodate
buffers as dirty in ext4_block_write_begin(). That will be either done
either by block_commit_write() in case of success or by
folio_zero_new_buffers() in case of failure.

Reported-by: Baolin Liu <liubaolin@kylinos.cn>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240830053739.3588573-4-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: hoist ext4_block_write_begin and replace the __block_write_begin
Shida Zhang [Fri, 30 Aug 2024 05:37:37 +0000 (13:37 +0800)]
ext4: hoist ext4_block_write_begin and replace the __block_write_begin

Using __block_write_begin() make it inconvenient to journal the
user data dirty process. We can't tell the block layer maintainer,
‘Hey, we want to trace the dirty user data in ext4, can we add some
special code for ext4 in __block_write_begin?’:P

So use ext4_block_write_begin() instead.

The two functions are basically doing the same thing except for the
fscrypt related code. Remove the unnecessary #ifdef since
fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at
compile time) when !CONFIG_FS_ENCRYPTION.

And hoist the ext4_block_write_begin so that it can be used in other
files.

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240830053739.3588573-3-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers
Shida Zhang [Fri, 30 Aug 2024 05:37:36 +0000 (13:37 +0800)]
ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers

For new uptodate buffers we also need to call write_end_fn() to persist the
uptodate content, similarly as folio_zero_new_buffers() does it.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240830053739.3588573-2-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: dax: keep orphan list before truncate overflow allocated blocks
yangerkun [Thu, 29 Aug 2024 11:02:22 +0000 (19:02 +0800)]
ext4: dax: keep orphan list before truncate overflow allocated blocks

Any extending write for ext4 requires the inode to be placed on the
orphan list before the actual write. In addition, the inode can be
actually removed from the orphan list only after all writes are
completed. Otherwise we'd leave allocated blocks beyond i_disksize if we
could not copy all the data into allocated block and e2fsck would
complain.

Currently, direct IO and buffered IO comply with this logic(buffered
IO will truncate all overflow allocated blocks that has not been
written successfully, and direct IO will truncate all allocated blocks
when error occurs). However, dax write break this since dax write will
remove the inode from the orphan list by calling
ext4_handle_inode_extension unconditionally during extending write.

We add a argument to help determine does we do a fully write, and for
the case not fully write, we leave the inode on the orphan list, and the
latter ext4_inode_extension_cleanup will help us truncate the overflow
allocated blocks, and then remove the inode from the orphan list.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240829110222.126685-1-yangerkun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: fix error message when rejecting the default hash
Gabriel Krisman Bertazi [Tue, 27 Aug 2024 20:16:36 +0000 (16:16 -0400)]
ext4: fix error message when rejecting the default hash

Commit 985b67cd8639 ("ext4: filesystems without casefold feature cannot
be mounted with siphash") properly rejects volumes where
s_def_hash_version is set to DX_HASH_SIPHASH, but the check and the
error message should not look into casefold setup - a filesystem should
never have DX_HASH_SIPHASH as the default hash.  Fix it and, since we
are there, move the check to ext4_hash_info_init.

Fixes:985b67cd8639 ("ext4: filesystems without casefold feature cannot
be mounted with siphash")

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/87jzg1en6j.fsf_-_@mailhost.krisman.be
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: save unnecessary indentation in ext4_ext_create_new_leaf()
Baokun Li [Thu, 22 Aug 2024 02:35:45 +0000 (10:35 +0800)]
ext4: save unnecessary indentation in ext4_ext_create_new_leaf()

Save an indentation level in ext4_ext_create_new_leaf() by removing
unnecessary 'else'. Besides, the variable 'ee_block' is declared to
avoid line breaks. No functional changes.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240822023545.1994557-26-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: make some fast commit functions reuse extents path
Baokun Li [Thu, 22 Aug 2024 02:35:44 +0000 (10:35 +0800)]
ext4: make some fast commit functions reuse extents path

The ext4_find_extent() can update the extent path so that it does not have
to allocate and free the path repeatedly, thus reducing the consumption of
memory allocation and freeing in the following functions:

    ext4_ext_clear_bb
    ext4_ext_replay_set_iblocks
    ext4_fc_replay_add_range
    ext4_fc_set_bitmaps_and_counters

No functional changes. Note that ext4_find_extent() does not support error
pointers, so in this case set path to NULL first.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-25-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: refactor ext4_swap_extents() to reuse extents path
Baokun Li [Thu, 22 Aug 2024 02:35:43 +0000 (10:35 +0800)]
ext4: refactor ext4_swap_extents() to reuse extents path

The ext4_find_extent() can update the extent path so it doesn't have to
allocate and free path repeatedly, thus reducing the consumption of memory
allocation and freeing in ext4_swap_extents().

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-24-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in convert_initialized_extent()
Baokun Li [Thu, 22 Aug 2024 02:35:42 +0000 (10:35 +0800)]
ext4: get rid of ppath in convert_initialized_extent()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in convert_initialized_extent(), the following is
done here:

 * Free the extents path when an error is encountered.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-23-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_ext_handle_unwritten_extents()
Baokun Li [Thu, 22 Aug 2024 02:35:41 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_ext_handle_unwritten_extents()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_ext_handle_unwritten_extents(), the
following is done here:

 * Free the extents path when an error is encountered.
 * The 'allocated' is changed from passing a value to passing an address.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-22-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_ext_convert_to_initialized()
Baokun Li [Thu, 22 Aug 2024 02:35:40 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_ext_convert_to_initialized()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_ext_convert_to_initialized(), the following
is done here:

 * Free the extents path when an error is encountered.
 * Its caller needs to update ppath if it uses ppath.
 * The 'allocated' is changed from passing a value to passing an address.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-21-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_convert_unwritten_extents_endio()
Baokun Li [Thu, 22 Aug 2024 02:35:39 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_convert_unwritten_extents_endio()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_convert_unwritten_extents_endio(), the
following is done here:

 * Free the extents path when an error is encountered.
 * Its caller needs to update ppath if it uses ppath.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-20-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_split_convert_extents()
Baokun Li [Thu, 22 Aug 2024 02:35:38 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_split_convert_extents()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_split_convert_extents(), the following is
done here:

 * Its caller needs to update ppath if it uses ppath.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-19-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_split_extent()
Baokun Li [Thu, 22 Aug 2024 02:35:37 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_split_extent()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_split_extent(), the following is done here:

 * The 'allocated' is changed from passing a value to passing an address.
 * Its caller needs to update ppath if it uses ppath.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-18-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_force_split_extent_at()
Baokun Li [Thu, 22 Aug 2024 02:35:36 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_force_split_extent_at()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_force_split_extent_at(), the following is
done here:

 * Free the extents path when an error is encountered.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-17-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_split_extent_at()
Baokun Li [Thu, 22 Aug 2024 02:35:35 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_split_extent_at()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_split_extent_at(), the following is done
here:

 * Free the extents path when an error is encountered.
 * Its caller needs to update ppath if it uses ppath.
 * Teach ext4_ext_show_leaf() to skip error pointer.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-16-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_ext_insert_extent()
Baokun Li [Thu, 22 Aug 2024 02:35:34 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_ext_insert_extent()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_ext_insert_extent(), the following is done
here:

 * Free the extents path when an error is encountered.
 * Its caller needs to update ppath if it uses ppath.
 * Free path when npath is used, free npath when it is not used.
 * The got_allocated_blocks label in ext4_ext_map_blocks() does not
   update err now, so err is updated to 0 if the err returned by
   ext4_ext_search_right() is greater than 0 and is about to enter
   got_allocated_blocks.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-15-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_ext_create_new_leaf()
Baokun Li [Thu, 22 Aug 2024 02:35:33 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_ext_create_new_leaf()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_ext_create_new_leaf(), the following is
done here:

 * Free the extents path when an error is encountered.
 * Its caller needs to update ppath if it uses ppath.

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-14-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in get_ext_path()
Baokun Li [Thu, 22 Aug 2024 02:35:32 +0000 (10:35 +0800)]
ext4: get rid of ppath in get_ext_path()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

After getting rid of ppath in get_ext_path(), its caller may pass an error
pointer to ext4_free_ext_path(), so it needs to teach ext4_free_ext_path()
and ext4_ext_drop_refs() to skip the error pointer. No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-13-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: get rid of ppath in ext4_find_extent()
Baokun Li [Thu, 22 Aug 2024 02:35:31 +0000 (10:35 +0800)]
ext4: get rid of ppath in ext4_find_extent()

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

Getting rid of ppath in ext4_find_extent() requires its caller to update
ppath. These ppaths will also be dropped later. No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-12-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: propagate errors from ext4_find_extent() in ext4_insert_range()
Baokun Li [Thu, 22 Aug 2024 02:35:30 +0000 (10:35 +0800)]
ext4: propagate errors from ext4_find_extent() in ext4_insert_range()

Even though ext4_find_extent() returns an error, ext4_insert_range() still
returns 0. This may confuse the user as to why fallocate returns success,
but the contents of the file are not as expected. So propagate the error
returned by ext4_find_extent() to avoid inconsistencies.

Fixes: 331573febb6a ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-11-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: add new ext4_ext_path_brelse() helper
Baokun Li [Thu, 22 Aug 2024 02:35:29 +0000 (10:35 +0800)]
ext4: add new ext4_ext_path_brelse() helper

Add ext4_ext_path_brelse() helper function to reduce duplicate code
and ensure that path->p_bh is set to NULL after it is released.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-10-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: fix double brelse() the buffer of the extents path
Baokun Li [Thu, 22 Aug 2024 02:35:28 +0000 (10:35 +0800)]
ext4: fix double brelse() the buffer of the extents path

In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been
released, otherwise it may be released twice. An example of what triggers
this is as follows:

  split2    map    split1
|--------|-------|--------|

ext4_ext_map_blocks
 ext4_ext_handle_unwritten_extents
  ext4_split_convert_extents
   // path->p_depth == 0
   ext4_split_extent
     // 1. do split1
     ext4_split_extent_at
       |ext4_ext_insert_extent
       |  ext4_ext_create_new_leaf
       |    ext4_ext_grow_indepth
       |      le16_add_cpu(&neh->eh_depth, 1)
       |    ext4_find_extent
       |      // return -ENOMEM
       |// get error and try zeroout
       |path = ext4_find_extent
       |  path->p_depth = 1
       |ext4_ext_try_to_merge
       |  ext4_ext_try_to_merge_up
       |    path->p_depth = 0
       |    brelse(path[1].p_bh)  ---> not set to NULL here
       |// zeroout success
     // 2. update path
     ext4_find_extent
     // 3. do split2
     ext4_split_extent_at
       ext4_ext_insert_extent
         ext4_ext_create_new_leaf
           ext4_ext_grow_indepth
             le16_add_cpu(&neh->eh_depth, 1)
           ext4_find_extent
             path[0].p_bh = NULL;
             path->p_depth = 1
             read_extent_tree_block  ---> return err
             // path[1].p_bh is still the old value
             ext4_free_ext_path
               ext4_ext_drop_refs
                 // path->p_depth == 1
                 brelse(path[1].p_bh)  ---> brelse a buffer twice

Finally got the following WARRNING when removing the buffer from lru:

============================================
VFS: brelse: Trying to free free buffer
WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90
CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716
RIP: 0010:__brelse+0x58/0x90
Call Trace:
 <TASK>
 __find_get_block+0x6e7/0x810
 bdev_getblk+0x2b/0x480
 __ext4_get_inode_loc+0x48a/0x1240
 ext4_get_inode_loc+0xb2/0x150
 ext4_reserve_inode_write+0xb7/0x230
 __ext4_mark_inode_dirty+0x144/0x6a0
 ext4_ext_insert_extent+0x9c8/0x3230
 ext4_ext_map_blocks+0xf45/0x2dc0
 ext4_map_blocks+0x724/0x1700
 ext4_do_writepages+0x12d6/0x2a70
[...]
============================================

Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-9-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: drop ppath from ext4_ext_replay_update_ex() to avoid double-free
Baokun Li [Thu, 22 Aug 2024 02:35:27 +0000 (10:35 +0800)]
ext4: drop ppath from ext4_ext_replay_update_ex() to avoid double-free

When calling ext4_force_split_extent_at() in ext4_ext_replay_update_ex(),
the 'ppath' is updated but it is the 'path' that is freed, thus potentially
triggering a double-free in the following process:

ext4_ext_replay_update_ex
  ppath = path
  ext4_force_split_extent_at(&ppath)
    ext4_split_extent_at
      ext4_ext_insert_extent
        ext4_ext_create_new_leaf
          ext4_ext_grow_indepth
            ext4_find_extent
              if (depth > path[0].p_maxdepth)
                kfree(path)                 ---> path First freed
                *orig_path = path = NULL    ---> null ppath
  kfree(path)                               ---> path double-free !!!

So drop the unnecessary ppath and use path directly to avoid this problem.
And use ext4_find_extent() directly to update path, avoiding unnecessary
memory allocation and freeing. Also, propagate the error returned by
ext4_find_extent() instead of using strange error codes.

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-8-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: aovid use-after-free in ext4_ext_insert_extent()
Baokun Li [Thu, 22 Aug 2024 02:35:26 +0000 (10:35 +0800)]
ext4: aovid use-after-free in ext4_ext_insert_extent()

As Ojaswin mentioned in Link, in ext4_ext_insert_extent(), if the path is
reallocated in ext4_ext_create_new_leaf(), we'll use the stale path and
cause UAF. Below is a sample trace with dummy values:

ext4_ext_insert_extent
  path = *ppath = 2000
  ext4_ext_create_new_leaf(ppath)
    ext4_find_extent(ppath)
      path = *ppath = 2000
      if (depth > path[0].p_maxdepth)
            kfree(path = 2000);
            *ppath = path = NULL;
      path = kcalloc() = 3000
      *ppath = 3000;
      return path;
  /* here path is still 2000, UAF! */
  eh = path[depth].p_hdr

==================================================================
BUG: KASAN: slab-use-after-free in ext4_ext_insert_extent+0x26d4/0x3330
Read of size 8 at addr ffff8881027bf7d0 by task kworker/u36:1/179
CPU: 3 UID: 0 PID: 179 Comm: kworker/u6:1 Not tainted 6.11.0-rc2-dirty #866
Call Trace:
 <TASK>
 ext4_ext_insert_extent+0x26d4/0x3330
 ext4_ext_map_blocks+0xe22/0x2d40
 ext4_map_blocks+0x71e/0x1700
 ext4_do_writepages+0x1290/0x2800
[...]

Allocated by task 179:
 ext4_find_extent+0x81c/0x1f70
 ext4_ext_map_blocks+0x146/0x2d40
 ext4_map_blocks+0x71e/0x1700
 ext4_do_writepages+0x1290/0x2800
 ext4_writepages+0x26d/0x4e0
 do_writepages+0x175/0x700
[...]

Freed by task 179:
 kfree+0xcb/0x240
 ext4_find_extent+0x7c0/0x1f70
 ext4_ext_insert_extent+0xa26/0x3330
 ext4_ext_map_blocks+0xe22/0x2d40
 ext4_map_blocks+0x71e/0x1700
 ext4_do_writepages+0x1290/0x2800
 ext4_writepages+0x26d/0x4e0
 do_writepages+0x175/0x700
[...]
==================================================================

So use *ppath to update the path to avoid the above problem.

Reported-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Closes: https://lore.kernel.org/r/ZqyL6rmtwl6N4MWR@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com
Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240822023545.1994557-7-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: update orig_path in ext4_find_extent()
Baokun Li [Thu, 22 Aug 2024 02:35:25 +0000 (10:35 +0800)]
ext4: update orig_path in ext4_find_extent()

In ext4_find_extent(), if the path is not big enough, we free it and set
*orig_path to NULL. But after reallocating and successfully initializing
the path, we don't update *orig_path, in which case the caller gets a
valid path but a NULL ppath, and this may cause a NULL pointer dereference
or a path memory leak. For example:

ext4_split_extent
  path = *ppath = 2000
  ext4_find_extent
    if (depth > path[0].p_maxdepth)
      kfree(path = 2000);
      *orig_path = path = NULL;
      path = kcalloc() = 3000
  ext4_split_extent_at(*ppath = NULL)
    path = *ppath;
    ex = path[depth].p_ext;
    // NULL pointer dereference!

==================================================================
BUG: kernel NULL pointer dereference, address: 0000000000000010
CPU: 6 UID: 0 PID: 576 Comm: fsstress Not tainted 6.11.0-rc2-dirty #847
RIP: 0010:ext4_split_extent_at+0x6d/0x560
Call Trace:
 <TASK>
 ext4_split_extent.isra.0+0xcb/0x1b0
 ext4_ext_convert_to_initialized+0x168/0x6c0
 ext4_ext_handle_unwritten_extents+0x325/0x4d0
 ext4_ext_map_blocks+0x520/0xdb0
 ext4_map_blocks+0x2b0/0x690
 ext4_iomap_begin+0x20e/0x2c0
[...]
==================================================================

Therefore, *orig_path is updated when the extent lookup succeeds, so that
the caller can safely use path or *ppath.

Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240822023545.1994557-6-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: avoid use-after-free in ext4_ext_show_leaf()
Baokun Li [Thu, 22 Aug 2024 02:35:24 +0000 (10:35 +0800)]
ext4: avoid use-after-free in ext4_ext_show_leaf()

In ext4_find_extent(), path may be freed by error or be reallocated, so
using a previously saved *ppath may have been freed and thus may trigger
use-after-free, as follows:

ext4_split_extent
  path = *ppath;
  ext4_split_extent_at(ppath)
  path = ext4_find_extent(ppath)
  ext4_split_extent_at(ppath)
    // ext4_find_extent fails to free path
    // but zeroout succeeds
  ext4_ext_show_leaf(inode, path)
    eh = path[depth].p_hdr
    // path use-after-free !!!

Similar to ext4_split_extent_at(), we use *ppath directly as an input to
ext4_ext_show_leaf(). Fix a spelling error by the way.

Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only
used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly.

This issue is triggered only when EXT_DEBUG is defined and therefore does
not affect functionality.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-5-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: fix slab-use-after-free in ext4_split_extent_at()
Baokun Li [Thu, 22 Aug 2024 02:35:23 +0000 (10:35 +0800)]
ext4: fix slab-use-after-free in ext4_split_extent_at()

We hit the following use-after-free:

==================================================================
BUG: KASAN: slab-use-after-free in ext4_split_extent_at+0xba8/0xcc0
Read of size 2 at addr ffff88810548ed08 by task kworker/u20:0/40
CPU: 0 PID: 40 Comm: kworker/u20:0 Not tainted 6.9.0-dirty #724
Call Trace:
 <TASK>
 kasan_report+0x93/0xc0
 ext4_split_extent_at+0xba8/0xcc0
 ext4_split_extent.isra.0+0x18f/0x500
 ext4_split_convert_extents+0x275/0x750
 ext4_ext_handle_unwritten_extents+0x73e/0x1580
 ext4_ext_map_blocks+0xe20/0x2dc0
 ext4_map_blocks+0x724/0x1700
 ext4_do_writepages+0x12d6/0x2a70
[...]

Allocated by task 40:
 __kmalloc_noprof+0x1ac/0x480
 ext4_find_extent+0xf3b/0x1e70
 ext4_ext_map_blocks+0x188/0x2dc0
 ext4_map_blocks+0x724/0x1700
 ext4_do_writepages+0x12d6/0x2a70
[...]

Freed by task 40:
 kfree+0xf1/0x2b0
 ext4_find_extent+0xa71/0x1e70
 ext4_ext_insert_extent+0xa22/0x3260
 ext4_split_extent_at+0x3ef/0xcc0
 ext4_split_extent.isra.0+0x18f/0x500
 ext4_split_convert_extents+0x275/0x750
 ext4_ext_handle_unwritten_extents+0x73e/0x1580
 ext4_ext_map_blocks+0xe20/0x2dc0
 ext4_map_blocks+0x724/0x1700
 ext4_do_writepages+0x12d6/0x2a70
[...]
==================================================================

The flow of issue triggering is as follows:

ext4_split_extent_at
  path = *ppath
  ext4_ext_insert_extent(ppath)
    ext4_ext_create_new_leaf(ppath)
      ext4_find_extent(orig_path)
        path = *orig_path
        read_extent_tree_block
          // return -ENOMEM or -EIO
        ext4_free_ext_path(path)
          kfree(path)
        *orig_path = NULL
  a. If err is -ENOMEM:
  ext4_ext_dirty(path + path->p_depth)
  // path use-after-free !!!
  b. If err is -EIO and we have EXT_DEBUG defined:
  ext4_ext_show_leaf(path)
    eh = path[depth].p_hdr
    // path also use-after-free !!!

So when trying to zeroout or fix the extent length, call ext4_find_extent()
to update the path.

In addition we use *ppath directly as an ext4_ext_show_leaf() input to
avoid possible use-after-free when EXT_DEBUG is defined, and to avoid
unnecessary path updates.

Fixes: dfe5080939ea ("ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-4-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: prevent partial update of the extents path
Baokun Li [Thu, 22 Aug 2024 02:35:22 +0000 (10:35 +0800)]
ext4: prevent partial update of the extents path

In ext4_ext_rm_idx() and ext4_ext_correct_indexes(), there is no proper
rollback of already executed updates when updating a level of the extents
path fails, so we may get an inconsistent extents tree, which may trigger
some bad things in errors=continue mode.

Hence clear the verified bit of modified extents buffers if the tree fails
to be updated in ext4_ext_rm_idx() or ext4_ext_correct_indexes(), which
forces the extents buffers to be checked in ext4_valid_extent_entries(),
ensuring that the extents tree is consistent.

Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com>
Link: https://lore.kernel.org/r/20230213080514.535568-3-zhanchengbin1@huawei.com/
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-3-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: refactor ext4_ext_rm_idx() to index 'path'
Baokun Li [Thu, 22 Aug 2024 02:35:21 +0000 (10:35 +0800)]
ext4: refactor ext4_ext_rm_idx() to index 'path'

As suggested by Honza in Link,modify ext4_ext_rm_idx() to leave 'path'
alone and just index it like ext4_ext_correct_indexes() does it. This
facilitates adding error handling later. No functional changes.

Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/all/20230216130305.nrbtd42tppxhbynn@quack3/
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-2-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: avoid OOB when system.data xattr changes underneath the filesystem
Thadeu Lima de Souza Cascardo [Wed, 21 Aug 2024 15:23:24 +0000 (12:23 -0300)]
ext4: avoid OOB when system.data xattr changes underneath the filesystem

When looking up for an entry in an inlined directory, if e_value_offs is
changed underneath the filesystem by some change in the block device, it
will lead to an out-of-bounds access that KASAN detects as an UAF.

EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
loop0: detected capacity change from 2048 to 2047
==================================================================
BUG: KASAN: use-after-free in ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500
Read of size 1 at addr ffff88803e91130f by task syz-executor269/5103

CPU: 0 UID: 0 PID: 5103 Comm: syz-executor269 Not tainted 6.11.0-rc4-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500
 ext4_find_inline_entry+0x4be/0x5e0 fs/ext4/inline.c:1697
 __ext4_find_entry+0x2b4/0x1b30 fs/ext4/namei.c:1573
 ext4_lookup_entry fs/ext4/namei.c:1727 [inline]
 ext4_lookup+0x15f/0x750 fs/ext4/namei.c:1795
 lookup_one_qstr_excl+0x11f/0x260 fs/namei.c:1633
 filename_create+0x297/0x540 fs/namei.c:3980
 do_symlinkat+0xf9/0x3a0 fs/namei.c:4587
 __do_sys_symlinkat fs/namei.c:4610 [inline]
 __se_sys_symlinkat fs/namei.c:4607 [inline]
 __x64_sys_symlinkat+0x95/0xb0 fs/namei.c:4607
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3e73ced469
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff4d40c258 EFLAGS: 00000246 ORIG_RAX: 000000000000010a
RAX: ffffffffffffffda RBX: 0032656c69662f2e RCX: 00007f3e73ced469
RDX: 0000000020000200 RSI: 00000000ffffff9c RDI: 00000000200001c0
RBP: 0000000000000000 R08: 00007fff4d40c290 R09: 00007fff4d40c290
R10: 0023706f6f6c2f76 R11: 0000000000000246 R12: 00007fff4d40c27c
R13: 0000000000000003 R14: 431bde82d7b634db R15: 00007fff4d40c2b0
 </TASK>

Calling ext4_xattr_ibody_find right after reading the inode with
ext4_get_inode_loc will lead to a check of the validity of the xattrs,
avoiding this problem.

Reported-by: syzbot+0c2508114d912a54ee79@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0c2508114d912a54ee79
Fixes: e8e948e7802a ("ext4: let ext4_find_entry handle inline data")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://patch.msgid.link/20240821152324.3621860-5-cascardo@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: explicitly exit when ext4_find_inline_entry returns an error
Thadeu Lima de Souza Cascardo [Wed, 21 Aug 2024 15:23:23 +0000 (12:23 -0300)]
ext4: explicitly exit when ext4_find_inline_entry returns an error

__ext4_find_entry currently ignores the return of ext4_find_inline_entry,
except for returning the bh or NULL when has_inline_data is 1.

Even though has_inline_data is set to 1 before calling
ext4_find_inline_entry and would only be set to 0 when that function
returns NULL, check for an encoded error return explicitly in order to
exit.

That makes the code more readable, not requiring that one assumes the cases
when has_inline_data is 1.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://patch.msgid.link/20240821152324.3621860-4-cascardo@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: return error on ext4_find_inline_entry
Thadeu Lima de Souza Cascardo [Wed, 21 Aug 2024 15:23:22 +0000 (12:23 -0300)]
ext4: return error on ext4_find_inline_entry

In case of errors when reading an inode from disk or traversing inline
directory entries, return an error-encoded ERR_PTR instead of returning
NULL. ext4_find_inline_entry only caller, __ext4_find_entry already returns
such encoded errors.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://patch.msgid.link/20240821152324.3621860-3-cascardo@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: ext4_search_dir should return a proper error
Thadeu Lima de Souza Cascardo [Wed, 21 Aug 2024 15:23:21 +0000 (12:23 -0300)]
ext4: ext4_search_dir should return a proper error

ext4_search_dir currently returns -1 in case of a failure, while it returns
0 when the name is not found. In such failure cases, it should return an
error code instead.

This becomes even more important when ext4_find_inline_entry returns an
error code as well in the next commit.

-EFSCORRUPTED seems appropriate as such error code as these failures would
be caused by unexpected record lengths and is in line with other instances
of ext4_check_dir_entry failures.

In the case of ext4_dx_find_entry, the current use of ERR_BAD_DX_DIR was
left as is to reduce the risk of regressions.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://patch.msgid.link/20240821152324.3621860-2-cascardo@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: check buffer_verified in advance to avoid unneeded ext4_get_group_info()
Kemeng Shi [Tue, 20 Aug 2024 13:22:34 +0000 (21:22 +0800)]
ext4: check buffer_verified in advance to avoid unneeded ext4_get_group_info()

Check buffer_verified in advance to avoid unneeded ext4_get_group_info().
This could be a simple cleanup as compiler may handle this.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-8-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: remove unneeded NULL check of buffer_head in ext4_mark_inode_used()
Kemeng Shi [Tue, 20 Aug 2024 13:22:33 +0000 (21:22 +0800)]
ext4: remove unneeded NULL check of buffer_head in ext4_mark_inode_used()

If gdp from ext4_get_group_desc() is not NULL, then returned group_desc_bh
won't be NULL either. Remove check of group_desc_bh and only check
returned gdp from ext4_get_group_desc() like how other callers do.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-7-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: move checksum length calculation of inode bitmap into ext4_inode_bitmap_csum_...
Kemeng Shi [Tue, 20 Aug 2024 13:22:32 +0000 (21:22 +0800)]
ext4: move checksum length calculation of inode bitmap into ext4_inode_bitmap_csum_[verify/set]() functions

There are some little improve:
1. remove repeat code to calculate checksum length of inode bitmap
2. remove unnecessary checksum length calculation if checksum is not
enabled.
3. use more efficient bit shift operation instead of div opreation.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: remove dead check in __ext4_new_inode()
Kemeng Shi [Tue, 20 Aug 2024 13:22:31 +0000 (21:22 +0800)]
ext4: remove dead check in __ext4_new_inode()

If we can't grab any inode, the prvious find_inode_bit() will set ino
to be >= EXT4_INODES_PER_GROUP(sb). So the check of need to repeat
in the same group is not needed.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: avoid negative min_clusters in find_group_orlov()
Kemeng Shi [Tue, 20 Aug 2024 13:22:30 +0000 (21:22 +0800)]
ext4: avoid negative min_clusters in find_group_orlov()

min_clusters is signed integer and will be converted to unsigned
integer when compared with unsigned number stats.free_clusters.
If min_clusters is negative, it will be converted to a huge unsigned
value in which case all groups may not meet the actual desired free
clusters.
Set negative min_clusters to 0 to avoid unexpected behavior.

Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: avoid potential buffer_head leak in __ext4_new_inode()
Kemeng Shi [Tue, 20 Aug 2024 13:22:29 +0000 (21:22 +0800)]
ext4: avoid potential buffer_head leak in __ext4_new_inode()

If a group is marked EXT4_GROUP_INFO_IBITMAP_CORRUPT after it's inode
bitmap buffer_head was successfully verified, then __ext4_new_inode()
will get a valid inode_bitmap_bh of a corrupted group from
ext4_read_inode_bitmap() in which case inode_bitmap_bh misses a release.
Hnadle "IS_ERR(inode_bitmap_bh)" and group corruption separately like
how ext4_free_inode() does to avoid buffer_head leak.

Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: avoid buffer_head leak in ext4_mark_inode_used()
Kemeng Shi [Tue, 20 Aug 2024 13:22:28 +0000 (21:22 +0800)]
ext4: avoid buffer_head leak in ext4_mark_inode_used()

Release inode_bitmap_bh from ext4_read_inode_bitmap() in
ext4_mark_inode_used() to avoid buffer_head leak.
By the way, remove unneeded goto for invalid ino when inode_bitmap_bh
is NULL.

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard
yangerkun [Sat, 17 Aug 2024 08:55:10 +0000 (16:55 +0800)]
ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard

Commit 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in
ext4_group_info") speed up fstrim by skipping trim trimmed group. We
also has the chance to clear trimmed once there exists some block free
for this group(mount without discard), and the next trim for this group
will work well too.

For mount with discard, we will issue dicard when we free blocks, so
leave trimmed flag keep alive to skip useless trim trigger from
userspace seems reasonable. But for some case like ext4 build on
dm-thinpool(ext4 blocksize 4K, pool blocksize 128K), discard from ext4
maybe unaligned for dm thinpool, and thinpool will just finish this
discard(see process_discard_bio when begein equals to end) without
actually process discard. For this case, trim from userspace can really
help us to free some thinpool block.

So convert to clear trimmed flag for all case no matter mounted with
discard or not.

Fixes: 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240817085510.2084444-1-yangerkun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: drop all delonly descriptions
Zhang Yi [Tue, 13 Aug 2024 12:34:52 +0000 (20:34 +0800)]
ext4: drop all delonly descriptions

When counting reserved clusters, delayed type is always equal to delonly
type now, hence drop all delonly descriptions in parameters and
comments.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-13-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: drop ext4_es_is_delonly()
Zhang Yi [Tue, 13 Aug 2024 12:34:51 +0000 (20:34 +0800)]
ext4: drop ext4_es_is_delonly()

Since we don't add delayed flag in unwritten extents, so there is no
difference between ext4_es_is_delayed() and ext4_es_is_delonly(),
just drop ext4_es_is_delonly().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-12-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: make extent status types exclusive
Zhang Yi [Tue, 13 Aug 2024 12:34:50 +0000 (20:34 +0800)]
ext4: make extent status types exclusive

Since we don't add delayed flag in unwritten extents, all of the four
extent status types EXTENT_STATUS_WRITTEN, EXTENT_STATUS_UNWRITTEN,
EXTENT_STATUS_DELAYED and EXTENT_STATUS_HOLE are exclusive now, add
assertion when storing pblock before inserting extent into status tree
and add comment to the status definition.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-11-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: drop unused ext4_es_store_status()
Zhang Yi [Tue, 13 Aug 2024 12:34:49 +0000 (20:34 +0800)]
ext4: drop unused ext4_es_store_status()

The helper ext4_es_store_status() is unused now, just drop it.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-10-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: use ext4_map_query_blocks() in ext4_map_blocks()
Zhang Yi [Tue, 13 Aug 2024 12:34:48 +0000 (20:34 +0800)]
ext4: use ext4_map_query_blocks() in ext4_map_blocks()

The blocks map querying logic in ext4_map_blocks() are the same as
ext4_map_query_blocks(), so switch to directly use it.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240813123452.2824659-9-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: drop ext4_es_delayed_clu()
Zhang Yi [Tue, 13 Aug 2024 12:34:47 +0000 (20:34 +0800)]
ext4: drop ext4_es_delayed_clu()

Since we move ext4_da_update_reserve_space() to ext4_es_insert_extent(),
no one uses ext4_es_delayed_clu() and __es_delayed_clu(), just drop
them.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-8-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: update delalloc data reserve spcae in ext4_es_insert_extent()
Zhang Yi [Tue, 13 Aug 2024 12:34:46 +0000 (20:34 +0800)]
ext4: update delalloc data reserve spcae in ext4_es_insert_extent()

Now that we update data reserved space for delalloc after allocating
new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
enabled, we also need to query the extents_status tree to calculate the
exact reserved clusters. This is complicated now and it appears that
it's better to do this job in ext4_es_insert_extent(), because
__es_remove_extent() have already count delalloc blocks when removing
delalloc extents and __revise_pending() return new adding pending count,
we could update the reserved blocks easily in ext4_es_insert_extent().

We direct reduce the reserved cluster count when replacing a delalloc
extent. However, thers are two special cases need to concern about the
quota claiming when doing direct block allocation (e.g. from fallocate).

A),
fallocate a range that covers a delalloc extent but start with
non-delayed allocated blocks, e.g. a hole.

  hhhhhhh+ddddddd+ddddddd
  ^^^^^^^^^^^^^^^^^^^^^^^  fallocate this range

Current ext4_map_blocks() can't always trim the extent since it may
release i_data_sem before calling ext4_map_create_blocks() and raced by
another delayed allocation. Hence the EXT4_GET_BLOCKS_DELALLOC_RESERVE
may not set even when we are replacing a delalloc extent, without this
flag set, the quota has already been claimed by ext4_mb_new_blocks(), so
we should release the quota reservations instead of claim them again.

B),
bigalloc feature is enabled, fallocate a range that contains non-delayed
allocated blocks.

  |<         one cluster       >|
  hhhhhhh+hhhhhhh+hhhhhhh+ddddddd
  ^^^^^^^  fallocate this range

This case is similar to above case, the EXT4_GET_BLOCKS_DELALLOC_RESERVE
flag is also not set.

Hence we should release the quota reservations if we replace a delalloc
extent but without EXT4_GET_BLOCKS_DELALLOC_RESERVE set.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: passing block allocation information to ext4_es_insert_extent()
Zhang Yi [Tue, 13 Aug 2024 12:34:45 +0000 (20:34 +0800)]
ext4: passing block allocation information to ext4_es_insert_extent()

Just pass the block allocation flag to ext4_es_insert_extent() when we
replacing a current extent after an actually block allocation or extent
status conversion, this flag will be used by later changes.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: let __revise_pending() return newly inserted pendings
Zhang Yi [Tue, 13 Aug 2024 12:34:44 +0000 (20:34 +0800)]
ext4: let __revise_pending() return newly inserted pendings

Let __insert_pending() return 1 after successfully inserting a new
pending cluster, and also let __revise_pending() to return the number of
of newly inserted pendings.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240813123452.2824659-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: don't set EXTENT_STATUS_DELAYED on allocated blocks
Zhang Yi [Tue, 13 Aug 2024 12:34:43 +0000 (20:34 +0800)]
ext4: don't set EXTENT_STATUS_DELAYED on allocated blocks

Currently, we release delayed allocation reservation when removing
delayed extent from extent status tree (which also happens when
overwriting one extent with another one). When we allocated unwritten
extent under some delayed allocated extent, we don't need the
reservation anymore and hence we don't need to preserve the
EXT4_MAP_DELAYED status bit. Allocating the new extent blocks will
properly release the reservation.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240813123452.2824659-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set
Zhang Yi [Tue, 13 Aug 2024 12:34:42 +0000 (20:34 +0800)]
ext4: optimize the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag set

When doing block allocation, magic EXT4_GET_BLOCKS_DELALLOC_RESERVE
means the allocating range covers a range of delayed allocated clusters,
the blocks and quotas have already been reserved in ext4_da_map_blocks(),
we should update the reserved space and don't need to claim them again.

At the moment, we only set this magic in mpage_map_one_extent() when
allocating a range of delayed allocated clusters in the write back path,
it makes things complicated since we have to notice and deal with the
case of allocating non-delayed allocated clusters separately in
ext4_ext_map_blocks(). For example, it we fallocate some blocks that
have been delayed allocated, free space would be claimed again in
ext4_mb_new_blocks() (this is wrong exactily), and we can't claim quota
space again, we have to release the quota reservations made for that
previously delayed allocated clusters.

Move the position thats set the EXT4_GET_BLOCKS_DELALLOC_RESERVE to
where we actually do block allocation, it could simplify above handling
a lot, it means that we always set this magic once the allocation range
covers delalloc blocks, no need to take care of the allocation path.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240813123452.2824659-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: factor out ext4_map_create_blocks() to allocate new blocks
Zhang Yi [Tue, 13 Aug 2024 12:34:41 +0000 (20:34 +0800)]
ext4: factor out ext4_map_create_blocks() to allocate new blocks

Factor out a common helper ext4_map_create_blocks() from
ext4_map_blocks() to do a real blocks allocation, no logic changes.

[ Note: this first patch of a ten patch series named "v3: simplify the
  counting and management of delalloc reserved blocks".  The link to
  the v1 and v2 patch series are below. -- TYT ]

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240802115120.362902-1-yi.zhang@huaweicloud.com
Link: https://patch.msgid.link/20240601034149.2169771-1-yi.zhang@huaweicloud.com
Link: https://patch.msgid.link/20240813123452.2824659-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: dax: fix overflowing extents beyond inode size when partially writing
Zhihao Cheng [Fri, 9 Aug 2024 12:15:32 +0000 (20:15 +0800)]
ext4: dax: fix overflowing extents beyond inode size when partially writing

The dax_iomap_rw() does two things in each iteration: map written blocks
and copy user data to blocks. If the process is killed by user(See signal
handling in dax_iomap_iter()), the copied data will be returned and added
on inode size, which means that the length of written extents may exceed
the inode size, then fsck will fail. An example is given as:

dd if=/dev/urandom of=file bs=4M count=1
 dax_iomap_rw
  iomap_iter // round 1
   ext4_iomap_begin
    ext4_iomap_alloc // allocate 0~2M extents(written flag)
  dax_iomap_iter // copy 2M data
  iomap_iter // round 2
   iomap_iter_advance
    iter->pos += iter->processed // iter->pos = 2M
   ext4_iomap_begin
    ext4_iomap_alloc // allocate 2~4M extents(written flag)
  dax_iomap_iter
   fatal_signal_pending
  done = iter->pos - iocb->ki_pos // done = 2M
 ext4_handle_inode_extension
  ext4_update_inode_size // inode size = 2M

fsck reports: Inode 13, i_size is 2097152, should be 4194304.  Fix?

Fix the problem by truncating extents if the written length is smaller
than expected.

Fixes: 776722e85d3b ("ext4: DAX iomap write support")
CC: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219136
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://patch.msgid.link/20240809121532.2105494-1-chengzhihao@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: don't set SB_RDONLY after filesystem errors
Jan Kara [Mon, 5 Aug 2024 20:12:41 +0000 (22:12 +0200)]
ext4: don't set SB_RDONLY after filesystem errors

When the filesystem is mounted with errors=remount-ro, we were setting
SB_RDONLY flag to stop all filesystem modifications. We knew this misses
proper locking (sb->s_umount) and does not go through proper filesystem
remount procedure but it has been the way this worked since early ext2
days and it was good enough for catastrophic situation damage
mitigation. Recently, syzbot has found a way (see link) to trigger
warnings in filesystem freezing because the code got confused by
SB_RDONLY changing under its hands. Since these days we set
EXT4_FLAGS_SHUTDOWN on the superblock which is enough to stop all
filesystem modifications, modifying SB_RDONLY shouldn't be needed. So
stop doing that.

Link: https://lore.kernel.org/all/000000000000b90a8e061e21d12f@google.com
Reported-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://patch.msgid.link/20240805201241.27286-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: nested locking for xattr inode
Wojciech Gładysz [Thu, 1 Aug 2024 14:38:27 +0000 (16:38 +0200)]
ext4: nested locking for xattr inode

Add nested locking with I_MUTEX_XATTR subclass to avoid lockdep warning
while handling xattr inode on file open syscall at ext4_xattr_inode_iget.

Backtrace
EXT4-fs (loop0): Ignoring removed oldalloc option
======================================================
WARNING: possible circular locking dependency detected
5.10.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor543/2794 is trying to acquire lock:
ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline]
ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425

but task is already holding lock:
ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&ei->i_data_sem/3){++++}-{3:3}:
       lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
       down_write+0x93/0x180 kernel/locking/rwsem.c:1564
       ext4_update_i_disksize fs/ext4/ext4.h:3267 [inline]
       ext4_xattr_inode_write fs/ext4/xattr.c:1390 [inline]
       ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1538 [inline]
       ext4_xattr_set_entry+0x331a/0x3d80 fs/ext4/xattr.c:1662
       ext4_xattr_ibody_set+0x124/0x390 fs/ext4/xattr.c:2228
       ext4_xattr_set_handle+0xc27/0x14e0 fs/ext4/xattr.c:2385
       ext4_xattr_set+0x219/0x390 fs/ext4/xattr.c:2498
       ext4_xattr_user_set+0xc9/0xf0 fs/ext4/xattr_user.c:40
       __vfs_setxattr+0x404/0x450 fs/xattr.c:177
       __vfs_setxattr_noperm+0x11d/0x4f0 fs/xattr.c:208
       __vfs_setxattr_locked+0x1f9/0x210 fs/xattr.c:266
       vfs_setxattr+0x112/0x2c0 fs/xattr.c:283
       setxattr+0x1db/0x3e0 fs/xattr.c:548
       path_setxattr+0x15a/0x240 fs/xattr.c:567
       __do_sys_setxattr fs/xattr.c:582 [inline]
       __se_sys_setxattr fs/xattr.c:578 [inline]
       __x64_sys_setxattr+0xc5/0xe0 fs/xattr.c:578
       do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
       entry_SYSCALL_64_after_hwframe+0x61/0xcb

-> #0 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:2988 [inline]
       check_prevs_add kernel/locking/lockdep.c:3113 [inline]
       validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729
       __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955
       lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
       down_write+0x93/0x180 kernel/locking/rwsem.c:1564
       inode_lock include/linux/fs.h:782 [inline]
       ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425
       ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485
       ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline]
       ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline]
       ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774
       __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898
       ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline]
       __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018
       ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562
       notify_change+0xbb6/0xe60 fs/attr.c:435
       do_truncate+0x1de/0x2c0 fs/open.c:64
       handle_truncate fs/namei.c:2970 [inline]
       do_open fs/namei.c:3311 [inline]
       path_openat+0x29f3/0x3290 fs/namei.c:3425
       do_filp_open+0x20b/0x450 fs/namei.c:3452
       do_sys_openat2+0x124/0x460 fs/open.c:1207
       do_sys_open fs/open.c:1223 [inline]
       __do_sys_open fs/open.c:1231 [inline]
       __se_sys_open fs/open.c:1227 [inline]
       __x64_sys_open+0x221/0x270 fs/open.c:1227
       do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
       entry_SYSCALL_64_after_hwframe+0x61/0xcb

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_data_sem/3);
                               lock(&ea_inode->i_rwsem#7/1);
                               lock(&ei->i_data_sem/3);
  lock(&ea_inode->i_rwsem#7/1);

 *** DEADLOCK ***

5 locks held by syz-executor543/2794:
 #0: ffff888026fbc448 (sb_writers#4){.+.+}-{0:0}, at: mnt_want_write+0x4a/0x2a0 fs/namespace.c:365
 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline]
 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: do_truncate+0x1cf/0x2c0 fs/open.c:62
 #2: ffff8880215e3310 (&ei->i_mmap_sem){++++}-{3:3}, at: ext4_setattr+0xec4/0x19c0 fs/ext4/inode.c:5519
 #3: ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_write_trylock_xattr fs/ext4/xattr.h:162 [inline]
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_try_to_expand_extra_isize fs/ext4/inode.c:5938 [inline]
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: __ext4_mark_inode_dirty+0x4fb/0x810 fs/ext4/inode.c:6018

stack backtrace:
CPU: 1 PID: 2794 Comm: syz-executor543 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x177/0x211 lib/dump_stack.c:118
 print_circular_bug+0x146/0x1b0 kernel/locking/lockdep.c:2002
 check_noncircular+0x2cc/0x390 kernel/locking/lockdep.c:2123
 check_prev_add kernel/locking/lockdep.c:2988 [inline]
 check_prevs_add kernel/locking/lockdep.c:3113 [inline]
 validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729
 __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955
 lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
 down_write+0x93/0x180 kernel/locking/rwsem.c:1564
 inode_lock include/linux/fs.h:782 [inline]
 ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425
 ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485
 ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline]
 ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline]
 ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774
 __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898
 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline]
 __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018
 ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562
 notify_change+0xbb6/0xe60 fs/attr.c:435
 do_truncate+0x1de/0x2c0 fs/open.c:64
 handle_truncate fs/namei.c:2970 [inline]
 do_open fs/namei.c:3311 [inline]
 path_openat+0x29f3/0x3290 fs/namei.c:3425
 do_filp_open+0x20b/0x450 fs/namei.c:3452
 do_sys_openat2+0x124/0x460 fs/open.c:1207
 do_sys_open fs/open.c:1223 [inline]
 __do_sys_open fs/open.c:1231 [inline]
 __se_sys_open fs/open.c:1227 [inline]
 __x64_sys_open+0x221/0x270 fs/open.c:1227
 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f0cde4ea229
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd81d1c978 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 0030656c69662f30 RCX: 00007f0cde4ea229
RDX: 0000000000000089 RSI: 00000000000a0a00 RDI: 00000000200001c0
RBP: 2f30656c69662f2e R08: 0000000000208000 R09: 0000000000208000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd81d1c9c0
R13: 00007ffd81d1ca00 R14: 0000000000080000 R15: 0000000000000003
EXT4-fs error (device loop0): ext4_expand_extra_isize_ea:2730: inode #13: comm syz-executor543: corrupted in-inode xattr

Signed-off-by: Wojciech Gładysz <wojciech.gladysz@infogain.com>
Link: https://patch.msgid.link/20240801143827.19135-1-wojciech.gladysz@infogain.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: remove unneeded check of ret in jbd2_fc_get_buf
Kemeng Shi [Thu, 1 Aug 2024 01:38:15 +0000 (09:38 +0800)]
jbd2: remove unneeded check of ret in jbd2_fc_get_buf

Simply return -EINVAL if j_fc_off is invalid to avoid repeated check of
ret.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240801013815.2393869-9-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: correct comment jbd2_mark_journal_empty
Kemeng Shi [Thu, 1 Aug 2024 01:38:14 +0000 (09:38 +0800)]
jbd2: correct comment jbd2_mark_journal_empty

After jbd2_mark_journal_empty, journal log is supposed to be empty.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240801013815.2393869-8-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: move escape handle to futher improve jbd2_journal_write_metadata_buffer
Kemeng Shi [Thu, 1 Aug 2024 01:38:13 +0000 (09:38 +0800)]
jbd2: move escape handle to futher improve jbd2_journal_write_metadata_buffer

Move escape handle to futher improve code readability and remove some
repeat check.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240801013815.2393869-7-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: remove unneeded done_copy_out variable in jbd2_journal_write_metadata_buffer
Kemeng Shi [Thu, 1 Aug 2024 01:38:12 +0000 (09:38 +0800)]
jbd2: remove unneeded done_copy_out variable in jbd2_journal_write_metadata_buffer

It's more intuitive to use jh_in->b_frozen_data directly instead of
done_copy_out variable. Simply remove unneeded done_copy_out variable
and use b_frozen_data instead.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240801013815.2393869-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: remove unneeded kmap for jh_in->b_frozen_data in jbd2_journal_write_metadata_buffer
Kemeng Shi [Thu, 1 Aug 2024 01:38:11 +0000 (09:38 +0800)]
jbd2: remove unneeded kmap for jh_in->b_frozen_data in jbd2_journal_write_metadata_buffer

Remove kmap for page of b_frozen_data from jbd2_alloc() which always
provides an address from the direct kernel mapping.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240801013815.2393869-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: remove unused return value of jbd2_fc_release_bufs
Kemeng Shi [Thu, 1 Aug 2024 01:38:10 +0000 (09:38 +0800)]
jbd2: remove unused return value of jbd2_fc_release_bufs

Remove unused return value of jbd2_fc_release_bufs.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240801013815.2393869-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: remove dead check in journal_alloc_journal_head
Kemeng Shi [Thu, 1 Aug 2024 01:38:09 +0000 (09:38 +0800)]
jbd2: remove dead check in journal_alloc_journal_head

We will alloc journal_head with __GFP_NOFAIL anyway, test for failure
is pointless.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240801013815.2393869-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: correctly compare tids with tid_geq function in jbd2_fc_begin_commit
Kemeng Shi [Thu, 1 Aug 2024 01:38:08 +0000 (09:38 +0800)]
jbd2: correctly compare tids with tid_geq function in jbd2_fc_begin_commit

Use tid_geq to compare tids to work over sequence number wraps.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Cc: stable@kernel.org
Link: https://patch.msgid.link/20240801013815.2393869-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: annotate struct ext4_xattr_inode_array with __counted_by()
Thorsten Blum [Tue, 30 Jul 2024 22:02:02 +0000 (00:02 +0200)]
ext4: annotate struct ext4_xattr_inode_array with __counted_by()

Add the __counted_by compiler attribute to the flexible array member
inodes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Remove the now obsolete comment on the count field.

In ext4_expand_inode_array(), use struct_size() instead of offsetof()
and remove the local variable count. Increment the count field before
adding a new inode to the inodes array.

Compile-tested only.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Link: https://patch.msgid.link/20240730220200.410939-3-thorsten.blum@toblux.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoDocumentation: ext4.rst: remove obsolete descriptions of noacl/nouser_xattr options
Stefan Tauner [Sun, 28 Jul 2024 00:34:33 +0000 (02:34 +0200)]
Documentation: ext4.rst: remove obsolete descriptions of noacl/nouser_xattr options

These have been deprecated for a decade[1] and removed two years ago[2].
1: f70486055ee351158bd6999f3965ad378b52c694
2: 2d544ec923dbe5fbed64a7f43dccf527218380bc

Signed-off-by: Stefan Tauner <stefan.tauner@gmx.at>
Link: https://patch.msgid.link/20240728003433.2566649-1-stefan.tauner@gmx.at
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: fix incorrect tid assumption in ext4_fc_mark_ineligible()
Luis Henriques (SUSE) [Wed, 24 Jul 2024 16:11:18 +0000 (17:11 +0100)]
ext4: fix incorrect tid assumption in ext4_fc_mark_ineligible()

Function jbd2_journal_shrink_checkpoint_list() assumes that '0' is not a
valid value for transaction IDs, which is incorrect.

Furthermore, the sbi->s_fc_ineligible_tid handling also makes the same
assumption by being initialised to '0'.  Fortunately, the sb flag
EXT4_MF_FC_INELIGIBLE can be used to check whether sbi->s_fc_ineligible_tid
has been previously set instead of comparing it with '0'.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240724161119.13448-5-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agoext4: fix incorrect tid assumption in jbd2_journal_shrink_checkpoint_list()
Luis Henriques (SUSE) [Wed, 24 Jul 2024 16:11:17 +0000 (17:11 +0100)]
ext4: fix incorrect tid assumption in jbd2_journal_shrink_checkpoint_list()

Function jbd2_journal_shrink_checkpoint_list() assumes that '0' is not a
valid value for transaction IDs, which is incorrect.  Don't assume that and
use two extra boolean variables to control the loop iterations and keep
track of the first and last tid.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240724161119.13448-4-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agoext4: fix incorrect tid assumption in __jbd2_log_wait_for_space()
Luis Henriques (SUSE) [Wed, 24 Jul 2024 16:11:16 +0000 (17:11 +0100)]
ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space()

Function __jbd2_log_wait_for_space() assumes that '0' is not a valid value
for transaction IDs, which is incorrect.  Don't assume that and invoke
jbd2_log_wait_commit() if the journal had a committing transaction instead.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240724161119.13448-3-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agoext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit()
Luis Henriques (SUSE) [Wed, 24 Jul 2024 16:11:15 +0000 (17:11 +0100)]
ext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit()

Function ext4_wait_for_tail_page_commit() assumes that '0' is not a valid
value for transaction IDs, which is incorrect.  Don't assume that and invoke
jbd2_log_wait_commit() if the journal had a committing transaction instead.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240724161119.13448-2-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agojbd2: fix kernel-doc for j_transaction_overhead_buffers
Randy Dunlap [Tue, 23 Jul 2024 05:16:47 +0000 (22:16 -0700)]
jbd2: fix kernel-doc for j_transaction_overhead_buffers

Use the correct struct member name in the kernel-doc notation
to prevent a kernel-doc build warning.

include/linux/jbd2.h:1303: warning: Function parameter or struct member 'j_transaction_overhead_buffers' not described in 'journal_s'
include/linux/jbd2.h:1303: warning: Excess struct member 'j_transaction_overhead' description in 'journal_s'

Fixes: e3a00a23781c ("jbd2: precompute number of transaction descriptor blocks")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20240710182252.4c281445@canb.auug.org.au/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240723051647.3053491-1-rdunlap@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: tidy the BH loop in mext_page_mkuptodate()
Matthew Wilcox (Oracle) [Thu, 18 Jul 2024 22:30:02 +0000 (23:30 +0100)]
ext4: tidy the BH loop in mext_page_mkuptodate()

This for loop is somewhat hard to read; turn it into a normal BH
do-while loop.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20240718223005.568869-4-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: remove array of buffer_heads from mext_page_mkuptodate()
Matthew Wilcox (Oracle) [Thu, 18 Jul 2024 22:30:01 +0000 (23:30 +0100)]
ext4: remove array of buffer_heads from mext_page_mkuptodate()

Iterate the folio's list of buffer_heads twice instead of keeping
an array of pointers.  This solves a too-large-array-for-stack problem
on architectures with a ridiculoously large PAGE_SIZE and prepares
ext4 to support larger folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20240718223005.568869-3-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: pipeline buffer reads in mext_page_mkuptodate()
Matthew Wilcox (Oracle) [Thu, 18 Jul 2024 22:30:00 +0000 (23:30 +0100)]
ext4: pipeline buffer reads in mext_page_mkuptodate()

Instead of synchronously reading one buffer at a time, submit reads
as we walk the buffers in the first loop, then wait for them in the
second loop.  This should be significantly more efficient, particularly
on HDDs, but I have not measured.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20240718223005.568869-2-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: reduce stack usage in ext4_mpage_readpages()
Matthew Wilcox (Oracle) [Thu, 18 Jul 2024 22:29:59 +0000 (23:29 +0100)]
ext4: reduce stack usage in ext4_mpage_readpages()

This function is very similar to do_mpage_readpage() and a similar
approach to that taken in commit 12ac5a65cb56 will work.  As in
do_mpage_readpage(), we only use this array for checking block contiguity
and we can do that more efficiently with a little arithmetic.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20240718223005.568869-1-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agojbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error
Baokun Li [Thu, 18 Jul 2024 11:53:36 +0000 (19:53 +0800)]
jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error

In __jbd2_log_wait_for_space(), we might call jbd2_cleanup_journal_tail()
to recover some journal space. But if an error occurs while executing
jbd2_cleanup_journal_tail() (e.g., an EIO), we don't stop waiting for free
space right away, we try other branches, and if j_committing_transaction
is NULL (i.e., the tid is 0), we will get the following complain:

============================================
JBD2: I/O error when updating journal superblock for sdd-8.
__jbd2_log_wait_for_space: needed 256 blocks and only had 217 space available
__jbd2_log_wait_for_space: no way to get more journal space in sdd-8
------------[ cut here ]------------
WARNING: CPU: 2 PID: 139804 at fs/jbd2/checkpoint.c:109 __jbd2_log_wait_for_space+0x251/0x2e0
Modules linked in:
CPU: 2 PID: 139804 Comm: kworker/u8:3 Not tainted 6.6.0+ #1
RIP: 0010:__jbd2_log_wait_for_space+0x251/0x2e0
Call Trace:
 <TASK>
 add_transaction_credits+0x5d1/0x5e0
 start_this_handle+0x1ef/0x6a0
 jbd2__journal_start+0x18b/0x340
 ext4_dirty_inode+0x5d/0xb0
 __mark_inode_dirty+0xe4/0x5d0
 generic_update_time+0x60/0x70
[...]
============================================

So only if jbd2_cleanup_journal_tail() returns 1, i.e., there is nothing to
clean up at the moment, continue to try to reclaim free space in other ways.

Note that this fix relies on commit 6f6a6fda2945 ("jbd2: fix ocfs2 corrupt
when updating journal superblock fails") to make jbd2_cleanup_journal_tail
return the correct error code.

Fixes: 8c3f25d8950c ("jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240718115336.2554501-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: fix access to uninitialised lock in fc replay path
Luis Henriques (SUSE) [Thu, 18 Jul 2024 09:43:56 +0000 (10:43 +0100)]
ext4: fix access to uninitialised lock in fc replay path

The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:

INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x66/0x90
 register_lock_class+0x759/0x7d0
 __lock_acquire+0x85/0x2630
 ? __find_get_block+0xb4/0x380
 lock_acquire+0xd1/0x2d0
 ? __ext4_journal_get_write_access+0xd5/0x160
 _raw_spin_lock+0x33/0x40
 ? __ext4_journal_get_write_access+0xd5/0x160
 __ext4_journal_get_write_access+0xd5/0x160
 ext4_reserve_inode_write+0x61/0xb0
 __ext4_mark_inode_dirty+0x79/0x270
 ? ext4_ext_replay_set_iblocks+0x2f8/0x450
 ext4_ext_replay_set_iblocks+0x330/0x450
 ext4_fc_replay+0x14c8/0x1540
 ? jread+0x88/0x2e0
 ? rcu_is_watching+0x11/0x40
 do_one_pass+0x447/0xd00
 jbd2_journal_recover+0x139/0x1b0
 jbd2_journal_load+0x96/0x390
 ext4_load_and_init_journal+0x253/0xd40
 ext4_fill_super+0x2cc6/0x3180
...

In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error().  Unfortunately, at this point this
spinlock has not been initialized yet.  Moving it's initialization to an
earlier point in __ext4_fill_super() fixes this splat.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agoext4: fix fast commit inode enqueueing during a full journal commit
Luis Henriques (SUSE) [Wed, 17 Jul 2024 17:22:20 +0000 (18:22 +0100)]
ext4: fix fast commit inode enqueueing during a full journal commit

When a full journal commit is on-going, any fast commit has to be enqueued
into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
is done only once, i.e. if an inode is already queued in a previous fast
commit entry it won't be enqueued again.  However, if a full commit starts
_after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
be done into FC_Q_STAGING.  And this is not being done in function
ext4_fc_track_template().

This patch fixes the issue by re-enqueuing an inode into the STAGING queue
during the fast commit clean-up callback when doing a full commit.  However,
to prevent a race with a fast-commit, the clean-up callback has to be called
with the journal locked.

This bug was found using fstest generic/047.  This test creates several 32k
bytes files, sync'ing each of them after it's creation, and then shutting
down the filesystem.  Some data may be loss in this operation; for example a
file may have it's size truncated to zero.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240717172220.14201-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agoext4: fix timer use-after-free on failed mount
Xiaxi Shen [Mon, 15 Jul 2024 04:33:36 +0000 (21:33 -0700)]
ext4: fix timer use-after-free on failed mount

Syzbot has found an ODEBUG bug in ext4_fill_super

The del_timer_sync function cancels the s_err_report timer,
which reminds about filesystem errors daily. We should
guarantee the timer is no longer active before kfree(sbi).

When filesystem mounting fails, the flow goes to failed_mount3,
where an error occurs when ext4_stop_mmpd is called, causing
a read I/O failure. This triggers the ext4_handle_error function
that ultimately re-arms the timer,
leaving the s_err_report timer active before kfree(sbi) is called.

Fix the issue by canceling the s_err_report timer after calling ext4_stop_mmpd.

Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Reported-and-tested-by: syzbot+59e0101c430934bc9a36@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=59e0101c430934bc9a36
Link: https://patch.msgid.link/20240715043336.98097-1-shenxiaxi26@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agoext4: use seq_putc() in two functions
Markus Elfring [Sat, 13 Jul 2024 15:10:24 +0000 (17:10 +0200)]
ext4: use seq_putc() in two functions

Single characters (line breaks) should be put into a sequence.
Thus use the corresponding function “seq_putc”.

This issue was transformed by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://patch.msgid.link/076974ab-4da3-4176-89dc-0514e020c276@web.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
9 months agoext4: no need to continue when the number of entries is 1
Edward Adam Davis [Mon, 1 Jul 2024 14:25:03 +0000 (22:25 +0800)]
ext4: no need to continue when the number of entries is 1

Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3")
Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reported-and-tested-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Link: https://patch.msgid.link/tencent_BE7AEE6C7C2D216CB8949CE8E6EE7ECC2C0A@qq.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
9 months agoext4: correct encrypted dentry name hash when not casefolded
yao.ly [Mon, 1 Jul 2024 06:43:39 +0000 (14:43 +0800)]
ext4: correct encrypted dentry name hash when not casefolded

EXT4_DIRENT_HASH and EXT4_DIRENT_MINOR_HASH will access struct
ext4_dir_entry_hash followed ext4_dir_entry. But there is no ext4_dir_entry_hash
followed when inode is encrypted and not casefolded

Signed-off-by: yao.ly <yao.ly@linux.alibaba.com>
Link: https://patch.msgid.link/1719816219-128287-1-git-send-email-yao.ly@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
10 months agoext4: correct comment of h_checksum
Kemeng Shi [Thu, 6 Jun 2024 12:55:08 +0000 (20:55 +0800)]
ext4: correct comment of h_checksum

Checksum of xattr block is always crc32c(uuid+blknum+xattrblock), see
ext4_xattr_block_csum_set for detail. Remove incorrect comment that
"id = inum if refcount=1".

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240606125508.1459893-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 months agoext4: correct comment of ext4_xattr_block_cache_insert
Kemeng Shi [Thu, 6 Jun 2024 12:55:07 +0000 (20:55 +0800)]
ext4: correct comment of ext4_xattr_block_cache_insert

There is no return value from ext4_xattr_block_cache_insert, just correct
it's comment about return value.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240606125508.1459893-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 months agoext4: correct comment of ext4_xattr_cmp
Kemeng Shi [Thu, 6 Jun 2024 12:55:06 +0000 (20:55 +0800)]
ext4: correct comment of ext4_xattr_cmp

The ext4_xattr_cmp never returns negative error number. Correct possible
return value in ext4_xattr_cmp's comment.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240606125508.1459893-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 months agoext4: fix macro definition error of EXT4_DIRENT_HASH and EXT4_DIRENT_MINOR_HASH
carrion bent [Thu, 6 Jun 2024 05:43:16 +0000 (13:43 +0800)]
ext4: fix macro definition error of EXT4_DIRENT_HASH and EXT4_DIRENT_MINOR_HASH

The macro parameter 'entry' of EXT4_DIRENT_HASH and
EXT4_DIRENT_MINOR_HASH was not used, but rather the variable 'de' was
directly used, which may be a local variable inside a function that
calls the macros.  Fortunately, all callers have passed in 'de' so
far, so this bug didn't have an effect.

Signed-off-by: carrion bent <carrionbent@linux.alibaba.com>
Link: https://patch.msgid.link/1717652596-58760-1-git-send-email-carrionbent@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 months agoext4: filesystems without casefold feature cannot be mounted with siphash
Lizhi Xu [Wed, 5 Jun 2024 01:23:35 +0000 (09:23 +0800)]
ext4: filesystems without casefold feature cannot be mounted with siphash

When mounting the ext4 filesystem, if the default hash version is set to
DX_HASH_SIPHASH but the casefold feature is not set, exit the mounting.

Reported-by: syzbot+340581ba9dceb7e06fb3@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Link: https://patch.msgid.link/20240605012335.44086-1-lizhi.xu@windriver.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 months agoext4: adjust the layout of the ext4_inode_info structure to save memory
Junchao Sun [Mon, 3 Jun 2024 13:15:24 +0000 (21:15 +0800)]
ext4: adjust the layout of the ext4_inode_info structure to save memory

Using pahole, we can see that there are some padding holes
in the current ext4_inode_info structure. Adjusting the
layout of ext4_inode_info can reduce these holes,
resulting in the size of the structure decreasing
from 2424 bytes to 2408 bytes.

Signed-off-by: Junchao Sun <sunjunchao2870@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240603131524.324224-1-sunjunchao2870@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 months agoLinux 6.11-rc4
Linus Torvalds [Sun, 18 Aug 2024 20:17:27 +0000 (13:17 -0700)]
Linux 6.11-rc4

10 months agoMerge tag 'driver-core-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 18 Aug 2024 17:19:49 +0000 (10:19 -0700)]
Merge tag 'driver-core-6.11-rc4' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are two driver fixes for regressions from 6.11-rc1 due to the
  driver core change making a structure in a driver core callback const.

  These were missed by all testing EXCEPT for what Bart happened to be
  running, so I appreciate the fixes provided here for some
  odd/not-often-used driver subsystems that nothing else happened to
  catch.

  Both of these fixes have been in linux-next all week with no reported
  issues"

* tag 'driver-core-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  mips: sgi-ip22: Fix the build
  ARM: riscpc: ecard: Fix the build

10 months agoMerge tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 18 Aug 2024 17:16:34 +0000 (10:16 -0700)]
Merge tag 'char-misc-6.11-rc4' of git://git./linux/kernel/git/gregkh/char-misc

Pull char / misc fixes from Greg KH:
 "Here are some small char/misc fixes for 6.11-rc4 to resolve reported
  problems. Included in here are:

   - fastrpc revert of a change that broke userspace

   - xillybus fixes for reported issues

  Half of these have been in linux-next this week with no reported
  problems, I don't know if the last bit of xillybus driver changes made
  it in, but they are 'obviously correct' so will be safe :)"

* tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  char: xillybus: Check USB endpoints when probing device
  char: xillybus: Refine workqueue handling
  Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
  char: xillybus: Don't destroy workqueue from work item running on it

10 months agoMerge tag 'tty-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 18 Aug 2024 17:10:48 +0000 (10:10 -0700)]
Merge tag 'tty-6.11-rc4' of git://git./linux/kernel/git/gregkh/tty

Pull tty / serial fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 6.11-rc4 to
  resolve some reported problems. Included in here are:

   - conmakehash.c userspace build issues

   - fsl_lpuart driver fix

   - 8250_omap revert for reported regression

   - atmel_serial rts flag fix

  All of these have been in linux-next this week with no reported
  issues"

* tag 'tty-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: 8250_omap: Set the console genpd always on if no console suspend"
  tty: atmel_serial: use the correct RTS flag.
  tty: vt: conmakehash: remove non-portable code printing comment header
  tty: serial: fsl_lpuart: mark last busy before uart_add_one_port

10 months agoMerge tag 'usb-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 18 Aug 2024 16:59:06 +0000 (09:59 -0700)]
Merge tag 'usb-6.11-rc4' of git://git./linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt driver fixes from Greg KH:
 "Here are some small USB and Thunderbolt driver fixes for 6.11-rc4 to
  resolve some reported issues. Included in here are:

   - thunderbolt driver fixes for reported problems

   - typec driver fixes

   - xhci fixes

   - new device id for ljca usb driver

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
  usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[]
  Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET"
  usb: typec: ucsi: Fix the return value of ucsi_run_command()
  usb: xhci: fix duplicate stall handling in handle_tx_event()
  usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup()
  thunderbolt: Mark XDomain as unplugged when router is removed
  thunderbolt: Fix memory leaks in {port|retimer}_sb_regs_write()

10 months agoMerge tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Sun, 18 Aug 2024 15:50:36 +0000 (08:50 -0700)]
Merge tag 'for-6.11-rc3-tag' of git://git./linux/kernel/git/kdave/linux

Pull more btrfs fixes from David Sterba:
 "A more fixes. We got reports that shrinker added in 6.10 still causes
  latency spikes and the fixes don't handle all corner cases. Due to
  summer holidays we're taking a shortcut to disable it for release
  builds and will fix it in the near future.

   - only enable extent map shrinker for DEBUG builds, temporary quick
     fix to avoid latency spikes for regular builds

   - update target inode's ctime on unlink, mandated by POSIX

   - properly take lock to read/update block group's zoned variables

   - add counted_by() annotations"

* tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: only enable extent map shrinker for DEBUG builds
  btrfs: zoned: properly take lock to read/update block group's zoned variables
  btrfs: tree-checker: add dev extent item checks
  btrfs: update target inode's ctime on unlink
  btrfs: send: annotate struct name_cache_entry with __counted_by()

10 months agofuse: Initialize beyond-EOF page contents before setting uptodate
Jann Horn [Tue, 6 Aug 2024 19:51:42 +0000 (21:51 +0200)]
fuse: Initialize beyond-EOF page contents before setting uptodate

fuse_notify_store(), unlike fuse_do_readpage(), does not enable page
zeroing (because it can be used to change partial page contents).

So fuse_notify_store() must be more careful to fully initialize page
contents (including parts of the page that are beyond end-of-file)
before marking the page uptodate.

The current code can leave beyond-EOF page contents uninitialized, which
makes these uninitialized page contents visible to userspace via mmap().

This is an information leak, but only affects systems which do not
enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the
corresponding kernel command line parameter).

Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574
Cc: stable@kernel.org
Fixes: a1d75f258230 ("fuse: add store request")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 months agoMerge tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 18 Aug 2024 02:50:16 +0000 (19:50 -0700)]
Merge tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "16 hotfixes. All except one are for MM. 10 of these are cc:stable and
  the others pertain to post-6.10 issues.

  As usual with these merges, singletons and doubletons all over the
  place, no identifiable-by-me theme. Please see the lovingly curated
  changelogs to get the skinny"

* tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/migrate: fix deadlock in migrate_pages_batch() on large folios
  alloc_tag: mark pages reserved during CMA activation as not tagged
  alloc_tag: introduce clear_page_tag_ref() helper function
  crash: fix riscv64 crash memory reserve dead loop
  selftests: memfd_secret: don't build memfd_secret test on unsupported arches
  mm: fix endless reclaim on machines with unaccepted memory
  selftests/mm: compaction_test: fix off by one in check_compaction()
  mm/numa: no task_numa_fault() call if PMD is changed
  mm/numa: no task_numa_fault() call if PTE is changed
  mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
  mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
  mm: don't account memmap per-node
  mm: add system wide stats items category
  mm: don't account memmap on failure
  mm/hugetlb: fix hugetlb vs. core-mm PT locking
  mseal: fix is_madv_discard()

10 months agoMerge tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 18 Aug 2024 02:23:02 +0000 (19:23 -0700)]
Merge tag 'powerpc-6.11-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes on 85xx with some configs since the recent hugepd rework.

 - Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL on some
   platforms.

 - Don't enable offline cores when changing SMT modes, to match existing
   userspace behaviour.

Thanks to Christophe Leroy, Dr. David Alan Gilbert, Guenter Roeck, Nysal
Jan K.A, Shrikanth Hegde, Thomas Gleixner, and Tyrel Datwyler.

* tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/topology: Check if a core is online
  cpu/SMT: Enable SMT only if a core is online
  powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL
  powerpc/mm: Fix size of allocated PGDIR
  soc: fsl: qbman: remove unused struct 'cgr_comp'