platform/kernel/linux-rpi.git
David Sterba [Thu, 1 Mar 2018 17:20:27 +0000 (18:20 +0100)]
btrfs: switch types to int when counting eb pages

The loops iterating eb pages use unsigned long, which is overkill as
we know that there are at most 16 pages (64k / 4k), and 4 by default
(with nodesize 16k).
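
To make the bound above concrete, here is a small userspace sketch (not the kernel code). `struct eb_model`, `round_up_pow2` and `num_eb_pages` are hypothetical names, and 4 KiB pages are assumed. It shows that the page count of a tree block comfortably fits in an int:

```c
#include <assert.h>
#include <stdint.h>

#define EB_PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define EB_PAGE_SIZE  (1ULL << EB_PAGE_SHIFT)

/* Hypothetical stand-in for the start/len part of an extent buffer. */
struct eb_model {
	uint64_t start;   /* logical byte offset of the buffer */
	uint32_t len;     /* nodesize in bytes, at most 64 KiB */
};

/* Round x up to the next multiple of y, where y is a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t y)
{
	return (x + y - 1) & ~(y - 1);
}

/* Number of pages the buffer touches; always small enough for an int. */
static int num_eb_pages(const struct eb_model *eb)
{
	uint64_t end = round_up_pow2(eb->start + eb->len, EB_PAGE_SIZE);

	return (int)((end >> EB_PAGE_SHIFT) - (eb->start >> EB_PAGE_SHIFT));
}
```

With a 16k nodesize this yields 4, and with 64k it yields 16, matching the numbers quoted above.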

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 4 Jul 2018 15:49:31 +0000 (17:49 +0200)]
btrfs: use round_up wrapper in num_extent_pages

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 29 Jun 2018 08:56:49 +0000 (10:56 +0200)]
btrfs: pass only eb to num_extent_pages

Almost all callers pass the start and len as 2 arguments but this is not
necessary, all the information is provided by the eb. By reordering the
calls to num_extent_pages, we don't need the local variables with
start/len.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 29 Jun 2018 08:56:47 +0000 (10:56 +0200)]
btrfs: prune unused includes

Remove includes if none of the interfaces and exports is used in the
given source file.

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 29 Jun 2018 08:56:44 +0000 (10:56 +0200)]
btrfs: use copy_page for copying pages instead of memcpy

Use the helper that's possibly optimized for full page copies.

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 29 Jun 2018 08:56:42 +0000 (10:56 +0200)]
btrfs: simplify pointer chasing of local fs_info variables

Functions that get btrfs inode can simply reach the fs_info by
dereferencing the root and this looks a bit more straightforward
compared to the btrfs_sb(...) indirection.

If the transaction handle is available and not NULL it's used instead.

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 29 Jun 2018 08:56:40 +0000 (10:56 +0200)]
btrfs: simplify some assignments of inode numbers

There are several places where the btrfs inode is converted to the
generic inode, back to btrfs and then passed to btrfs_ino. We can remove
the extra back and forth conversions.

Signed-off-by: David Sterba <dsterba@suse.com>
Zhihui Zhang [Tue, 3 Jul 2018 00:00:54 +0000 (20:00 -0400)]
Btrfs: free space cache: make sure there is always room for generation number

io_ctl_set_generation() assumes that the generation number shares
the same page with inline CRCs. Let's make sure this is always true.

Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Tue, 3 Jul 2018 05:14:51 +0000 (13:14 +0800)]
btrfs: drop unnecessary variable in btrfs_init_new_device

There is only one use of the declared devices variable; use its
value directly instead.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Tue, 3 Jul 2018 05:14:50 +0000 (13:14 +0800)]
btrfs: use a temporary variable for fs_devices in btrfs_init_new_device

There are many instances of the %fs_info->fs_devices pointer
dereferences, use a temporary variable instead.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 3 Jul 2018 09:10:07 +0000 (17:10 +0800)]
btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized

An invalid reloc tree can cause a kernel NULL pointer dereference when btrfs
does some cleanup of the reloc roots.

It turns out that fs_info::reloc_ctl can be NULL in
btrfs_recover_relocation() as we allocate relocation control after all
reloc roots have been verified.
So if we hit an error while the reloc roots are still being verified,
note that we haven't called set_reloc_control() yet, thus
fs_info::reloc_ctl is still NULL.
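
The guard this implies can be sketched in userspace as follows. This is only a model under stated assumptions: `struct reloc_control_model`, `struct fs_info_model` and `del_reloc_root_model` are hypothetical names, and a counter stands in for the rb-tree of reloc roots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct reloc_control_model {
	int nr_tracked_roots;                    /* stands in for the reloc root rb-tree */
};

struct fs_info_model {
	struct reloc_control_model *reloc_ctl;   /* NULL until the control is set up */
};

/*
 * Drop one root from the tracking structure, but only when the reloc
 * control exists. Returns 1 if the structure was touched, 0 if skipped.
 */
static int del_reloc_root_model(struct fs_info_model *fs_info)
{
	struct reloc_control_model *rc = fs_info->reloc_ctl;

	if (!rc)                                 /* recovery failed before setup */
		return 0;
	if (rc->nr_tracked_roots > 0)
		rc->nr_tracked_roots--;
	return 1;
}
```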

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 3 Jul 2018 09:10:06 +0000 (17:10 +0800)]
btrfs: tree-checker: Detect invalid and empty essential trees

A crafted image has empty root tree block, which will later cause NULL
pointer dereference.

The following trees should never be empty:
1) Tree root
   Must contain at least root items for extent tree, device tree and fs
   tree

2) Chunk tree
   Or we can't even bootstrap as it contains the mapping.

3) Fs tree
   At least inode item for top level inode (.).

4) Device tree
   Dev extents for chunks

5) Extent tree
   Must have corresponding extent for each chunk.

If any of them is empty, we are sure the fs is corrupted and no need to
mount it.
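
A minimal sketch of such a check, written as a userspace model rather than the actual tree-checker code. The function name is hypothetical, the objectid values are the ones I understand the on-disk format to use, and returning -EUCLEAN is an assumption of this sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117   /* Linux value; fallback for hosts that lack it */
#endif

/* Tree objectids (assumed to match the btrfs on-disk format). */
#define ROOT_TREE_OBJECTID   1ULL
#define EXTENT_TREE_OBJECTID 2ULL
#define CHUNK_TREE_OBJECTID  3ULL
#define DEV_TREE_OBJECTID    4ULL
#define FS_TREE_OBJECTID     5ULL

/* Reject an empty leaf when it belongs to a tree that must never be empty. */
static int check_empty_leaf_model(uint64_t owner, uint32_t nritems)
{
	if (nritems > 0)
		return 0;

	switch (owner) {
	case ROOT_TREE_OBJECTID:
	case EXTENT_TREE_OBJECTID:
	case CHUNK_TREE_OBJECTID:
	case DEV_TREE_OBJECTID:
	case FS_TREE_OBJECTID:
		return -EUCLEAN;   /* corrupted: essential tree has no items */
	default:
		return 0;          /* other trees may legitimately be empty */
	}
}
```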

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 3 Jul 2018 09:10:05 +0000 (17:10 +0800)]
btrfs: tree-checker: Verify block_group_item

A crafted image with invalid block group items could cause the free space
cache code to panic.

We could detect such invalid block group item by checking:
1) Item size
   Known fixed value.
2) Block group size (key.offset)
   We have an upper limit on block group size (10G)
3) Chunk objectid
   Known fixed value.
4) Type
   Only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA.
   No more than 1 bit set for profile type.
5) Used space
   No more than the block group size.

This should allow btrfs to detect and refuse to mount the crafted image.
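
The five checks can be sketched as a userspace model. All names here are hypothetical, and the flag bits, the item size of 24 bytes and the chunk objectid of 256 are what I understand the on-disk format to be, so treat them as assumptions rather than the patch itself:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117   /* Linux value; fallback for hosts that lack it */
#endif

#define BG_DATA          (1ULL << 0)
#define BG_SYSTEM        (1ULL << 1)
#define BG_METADATA      (1ULL << 2)
#define BG_TYPE_MASK     (BG_DATA | BG_SYSTEM | BG_METADATA)
/* Assumed profile bits 3..8: RAID0, RAID1, DUP, RAID10, RAID5, RAID6. */
#define BG_PROFILE_MASK  (0x3FULL << 3)

#define FIRST_CHUNK_TREE_OBJECTID 256ULL
#define BG_ITEM_SIZE              24U            /* three 64-bit fields */
#define MAX_BG_SIZE               (10ULL << 30)  /* the 10G limit above */

struct bg_item_model {
	uint64_t used;
	uint64_t chunk_objectid;
	uint64_t flags;
};

static int check_block_group_item_model(const struct bg_item_model *bg,
					uint64_t key_offset,
					uint32_t item_size)
{
	uint64_t type = bg->flags & BG_TYPE_MASK;
	uint64_t profile = bg->flags & BG_PROFILE_MASK;

	if (item_size != BG_ITEM_SIZE)                        /* 1) item size */
		return -EUCLEAN;
	if (key_offset > MAX_BG_SIZE)                         /* 2) block group size */
		return -EUCLEAN;
	if (bg->chunk_objectid != FIRST_CHUNK_TREE_OBJECTID)  /* 3) chunk objectid */
		return -EUCLEAN;
	if (type != BG_DATA && type != BG_METADATA &&         /* 4) type */
	    type != BG_SYSTEM && type != (BG_DATA | BG_METADATA))
		return -EUCLEAN;
	if (profile & (profile - 1))                          /* more than one profile bit */
		return -EUCLEAN;
	if (bg->used > key_offset)                            /* 5) used space */
		return -EUCLEAN;
	return 0;
}
```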

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 26 Jun 2018 14:20:59 +0000 (16:20 +0200)]
btrfs: annotate unlikely branches after V0 extent type removal

The v0 extent type checks are the right case for the unlikely
annotations as we don't expect to ever see them, so let's give the
compiler some hint.
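
For readers unfamiliar with the annotation, a userspace sketch of the idea follows. `unlikely_hint` mirrors the shape of the kernel macro and relies on a GCC/Clang builtin; `is_v0_extent_model` and the 24-byte current item size are assumptions made for illustration:

```c
#include <assert.h>

/* Tell the compiler the condition is expected to be false. */
#define unlikely_hint(x) __builtin_expect(!!(x), 0)

/* Assumed size of the current extent item; v0 items are smaller. */
#define EXTENT_ITEM_SIZE_MODEL 24U

/* Returns 1 for an item that looks like the removed v0 format. */
static int is_v0_extent_model(unsigned int item_size)
{
	if (unlikely_hint(item_size < EXTENT_ITEM_SIZE_MODEL))
		return 1;   /* cold path: should never happen in practice */
	return 0;
}
```

The hint changes only code layout and branch prediction, not the result of the test.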

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 26 Jun 2018 13:57:36 +0000 (16:57 +0300)]
btrfs: Add graceful handling of V0 extents

Following the removal of the v0 handling code let's be courteous and
print an error message when such extents are handled. In the cases
where we have a transaction just abort it, otherwise just call
btrfs_handle_fs_error. Both cases result in the FS being re-mounted RO.

In case the error handling would be too intrusive, leave the BUG_ON in
place, like extent_data_ref_count, other proper handling would catch
that earlier.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Thu, 21 Jun 2018 06:45:00 +0000 (09:45 +0300)]
btrfs: Remove V0 extent support

The v0 compat code was introduced in commit 5d4f98a28c7d
("Btrfs: Mixed back reference  (FORWARD ROLLING FORMAT CHANGE)") 9
years ago, which was merged in 2.6.31. This means that the code is
there to support filesystems which are _VERY_ old and if you are using
btrfs on such an old kernel, you have much bigger problems. This coupled
with the fact that no one is likely testing/maintaining this code likely
means it has bugs lurking. All things considered I think 43 kernel
releases later it's high time this remnant of the past got removed.

This patch removes all code wrapped in #ifdefs but leaves the BUG_ONs in case
we have a v0 with no support intact as a sort of safety-net.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chengguang Xu [Wed, 27 Jun 2018 04:16:38 +0000 (12:16 +0800)]
btrfs: remove unnecessary curly braces in btrfs_get_acl

This is only a coding style fix, not a functional change.  When an if/else
has only one statement, the braces are not needed.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chengguang Xu [Wed, 27 Jun 2018 04:16:37 +0000 (12:16 +0800)]
btrfs: avoid error code override in btrfs_get_acl

It's not good to override the error code when failing from
btrfs_getxattr() in btrfs_get_acl() because it hides the real reason of
the failure.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chengguang Xu [Wed, 27 Jun 2018 04:16:36 +0000 (12:16 +0800)]
btrfs: remove unnecessary -ERANGE check in btrfs_get_acl

There is no chance to get into -ERANGE error condition because we first
call btrfs_getxattr to get the length of the attribute, then we do a
subsequent call with the size from the first call.  Between the 2 calls
the size shouldn't change. So remove the unnecessary -ERANGE error
check.
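
The two-call pattern described here can be modelled in userspace as below. This is a sketch, not btrfs code: `struct xattr_model`, `model_getxattr` and `read_attr_model` are hypothetical, and the model assumes the attribute cannot change between the two calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory attribute standing in for the stored xattr. */
struct xattr_model {
	const char *value;
	size_t len;
};

/*
 * Mimics the usual getxattr contract: size 0 only reports the length,
 * a too-small buffer yields -ERANGE, otherwise the value is copied.
 */
static long model_getxattr(const struct xattr_model *x, void *buf, size_t size)
{
	if (!x)
		return -ENODATA;
	if (size == 0)
		return (long)x->len;
	if (size < x->len)
		return -ERANGE;
	memcpy(buf, x->value, x->len);
	return (long)x->len;
}

/* Query the length with a NULL buffer, then read with exactly that size. */
static long read_attr_model(const struct xattr_model *x, char **out)
{
	long size = model_getxattr(x, NULL, 0);
	char *buf;

	*out = NULL;
	if (size <= 0)
		return size;              /* pass the real error code through */

	buf = malloc((size_t)size);
	if (!buf)
		return -ENOMEM;

	/* The buffer has the size just reported, so -ERANGE cannot occur here. */
	size = model_getxattr(x, buf, (size_t)size);
	if (size < 0) {
		free(buf);
		return size;
	}
	*out = buf;
	return size;
}
```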

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chengguang Xu [Wed, 27 Jun 2018 04:16:35 +0000 (12:16 +0800)]
btrfs: replace empty string with NULL when getting attribute length in btrfs_get_acl

In btrfs_get_acl() the first call of btrfs_getxattr() is for getting the
length of attribute, the value buffer is never used in this case. So
it's better to replace empty string with NULL.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chengguang Xu [Wed, 27 Jun 2018 04:16:34 +0000 (12:16 +0800)]
btrfs: return error instead of crash when detecting unexpected type in btrfs_get_acl

The caller of btrfs_get_acl() checks error condition so there is no
impact from this change. In practice there is no chance to get into
default case of switch statement because VFS has already checked the
type.
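
A sketch of what returning an error from the default case looks like, as a userspace model. The function name is hypothetical, the constants mirror the VFS ACL type values as I understand them, and the use of -EINVAL is an assumption of this sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed to match the VFS ACL type constants. */
#define ACL_TYPE_ACCESS_MODEL  0x8000
#define ACL_TYPE_DEFAULT_MODEL 0x4000

/* Map an ACL type to its xattr name; unknown types yield an error, not a crash. */
static int acl_xattr_name_model(int type, const char **name)
{
	switch (type) {
	case ACL_TYPE_ACCESS_MODEL:
		*name = "system.posix_acl_access";
		return 0;
	case ACL_TYPE_DEFAULT_MODEL:
		*name = "system.posix_acl_default";
		return 0;
	default:
		*name = NULL;
		return -EINVAL;   /* previously a hard crash in the default case */
	}
}
```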

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Su Yue [Fri, 22 Jun 2018 08:18:01 +0000 (16:18 +0800)]
btrfs: return EUCLEAN if extent_inline_ref type is invalid

If the type of the extent_inline_ref found is not the expected one, the
filesystem may have been corrupted, so return EUCLEAN instead of EINVAL.

Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Goldwyn Rodrigues [Sun, 17 Jun 2018 17:39:47 +0000 (12:39 -0500)]
btrfs: Use iocb to derive pos instead of passing a separate parameter

struct kiocb carries the ki_pos, so there is no need to pass it as
a separate function parameter.

generic_file_direct_write() increments ki_pos, so we now assign pos
after the function.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
[ rename to btrfs_buffered_write ]
Signed-off-by: David Sterba <dsterba@suse.com>
Su Yue [Fri, 22 Jun 2018 01:52:15 +0000 (09:52 +0800)]
btrfs: print more details when checking tree block finds a problem

For easier debugging, print eb->start if level is invalid.  Also make
clear if bytenr found is not expected.

Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 15:43:12 +0000 (18:43 +0300)]
btrfs: Streamline memory allocation failure handling in btrfs_add_delayed_tree_ref

Currently the function uses 2 goto labels to properly handle allocation
failures. This could be simplified by simply re-arranging the code so
that allocations are at the beginning of the function. This allows the
use of simple return statements. No functional changes.
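
The shape of that rearrangement can be sketched in userspace. The structures and `add_tree_ref_model` are hypothetical, and `alloc_fn` is a hook added only so the failure path can be exercised; the model assumes memory from `alloc_fn` can be released with free():

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ref_model  { int level; };
struct head_model { int is_data; };

/* Allocator that always fails, used to exercise the error path. */
static void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/*
 * Do all allocations first, so every failure is a plain return and no
 * goto labels are needed to unwind partially completed work.
 */
static int add_tree_ref_model(void *(*alloc_fn)(size_t),
			      struct ref_model **ref_out,
			      struct head_model **head_out)
{
	struct ref_model *ref;
	struct head_model *head;

	ref = alloc_fn(sizeof(*ref));
	if (!ref)
		return -ENOMEM;

	head = alloc_fn(sizeof(*head));
	if (!head) {
		free(ref);
		return -ENOMEM;
	}

	/* From here on nothing can fail. */
	ref->level = 0;
	head->is_data = 0;
	*ref_out = ref;
	*head_out = head;
	return 0;
}
```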

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 22 Jun 2018 04:35:00 +0000 (12:35 +0800)]
btrfs: Don't remove block group that still has pinned down bytes

[BUG]
Under certain KVM load and LTP tests, it is possible to hit the
following calltrace if quota is enabled:

BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
BTRFS critical (device vda2): unable to find logical 8820195328 length 4096

WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30
CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000
RIP: 0010:blk_status_to_errno+0x1a/0x30
Call Trace:
 submit_extent_page+0x191/0x270 [btrfs]
 ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
 __do_readpage+0x2d2/0x810 [btrfs]
 ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
 ? run_one_async_done+0xc0/0xc0 [btrfs]
 __extent_read_full_page+0xe7/0x100 [btrfs]
 ? run_one_async_done+0xc0/0xc0 [btrfs]
 read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
 ? run_one_async_done+0xc0/0xc0 [btrfs]
 btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
 read_tree_block+0x31/0x60 [btrfs]
 read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
 btrfs_search_slot+0x46b/0xa00 [btrfs]
 ? kmem_cache_alloc+0x1a8/0x510
 ? btrfs_get_token_32+0x5b/0x120 [btrfs]
 find_parent_nodes+0x11d/0xeb0 [btrfs]
 ? leaf_space_used+0xb8/0xd0 [btrfs]
 ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
 ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
 btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
 btrfs_find_all_roots+0x45/0x60 [btrfs]
 btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
 btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
 btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
 insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
 btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
 ? pick_next_task_fair+0x2cd/0x530
 ? __switch_to+0x92/0x4b0
 btrfs_worker_helper+0x81/0x300 [btrfs]
 process_one_work+0x1da/0x3f0
 worker_thread+0x2b/0x3f0
 ? process_one_work+0x3f0/0x3f0
 kthread+0x11a/0x130
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x35/0x40

BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure
BTRFS info (device vda2): forced readonly
BTRFS error (device vda2): pending csums is 2887680

[CAUSE]
It's caused by race with block group auto removal:

- There is a meta block group X, which has only one tree block
  The tree block belongs to fs tree 257.
- In current transaction, some operation modified fs tree 257
  The tree block gets COWed, so the block group X is empty, and marked
  as unused, queued to be deleted.
- Some workload (like fsync) wakes up cleaner_kthread()
  Which will call btrfs_delete_unused_bgs() to remove unused block
  groups.
  So block group X, along with its chunk map, gets removed.
- Some delalloc work finished for fs tree 257
  Quota needs to get the original reference of the extent, which will
  read tree blocks of commit root of 257.
  Then since the chunk map gets removed, the above warning gets
  triggered.

[FIX]
Just let btrfs_delete_unused_bgs() skip block group which still has
pinned bytes.

However, there is a minor side effect: currently we only queue empty
block groups at update_block_group(), and an empty block group with pinned
bytes won't go through update_block_group() again, so such a block group
won't be removed until it gets a new extent allocated and removed.
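
The skip condition can be pictured with a small userspace model. `struct block_group_model` and `can_delete_unused_bg` are hypothetical names, and the exact set of fields checked in the kernel may differ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the per-block-group accounting. */
struct block_group_model {
	uint64_t used;       /* bytes referenced by live extents */
	uint64_t reserved;   /* bytes reserved for pending allocations */
	uint64_t pinned;     /* bytes freed in this transaction, not yet reusable */
	bool ro;
};

/* An "unused" group is only safe to delete once nothing is pinned in it. */
static bool can_delete_unused_bg(const struct block_group_model *bg)
{
	return bg->used == 0 && bg->reserved == 0 &&
	       bg->pinned == 0 && !bg->ro;
}
```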

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Geert Uytterhoeven [Fri, 22 Jun 2018 07:18:29 +0000 (09:18 +0200)]
btrfs: Refactor count handling in btrfs_unpin_free_ino

With gcc 4.1.2:

    fs/btrfs/inode-map.c: In function ‘btrfs_unpin_free_ino’:
    fs/btrfs/inode-map.c:241: warning: ‘count’ may be used uninitialized in this function

While this warning is a false-positive, it can easily be killed by
refactoring the code.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Arnd Bergmann [Thu, 21 Jun 2018 16:04:06 +0000 (18:04 +0200)]
btrfs: use timespec64 for i_otime

While the regular inode timestamps all use timespec64 now, the i_otime
field is btrfs specific and still needs to be converted to correctly
represent times beyond 2038.
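
As a reminder of where the 2038 limit comes from, a tiny userspace sketch follows. `fits_in_time32` is a hypothetical helper; the limit itself is simply the range of a signed 32-bit seconds counter, which ends at 2038-01-19 03:14:07 UTC:

```c
#include <assert.h>
#include <stdint.h>

/* True if the seconds value can be stored in a signed 32-bit field. */
static int fits_in_time32(int64_t sec)
{
	return sec >= INT32_MIN && sec <= INT32_MAX;
}
```

A 64-bit seconds field, as in timespec64, has no such limit for any practical date.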

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Arnd Bergmann [Thu, 21 Jun 2018 16:04:05 +0000 (18:04 +0200)]
btrfs: use monotonic time for transaction handling

The transaction times were changed to ktime_get_real_seconds to avoid
the y2038 overflow, but they still have a minor problem when they go
backwards or jump due to settimeofday() or leap seconds.

This changes the transaction handling to instead use ktime_get_seconds(),
which returns a CLOCK_MONOTONIC timestamp that has neither of those
problems.
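
The same distinction exists in userspace, which allows a small illustration. This sketch uses the POSIX clock_gettime() call with CLOCK_MONOTONIC rather than the kernel helper, and `monotonic_seconds` and `transaction_is_old` are hypothetical names:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <time.h>

/* Seconds from a clock that never jumps backwards; -1 on failure. */
static time_t monotonic_seconds(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return (time_t)-1;
	return ts.tv_sec;
}

/* Age check that stays correct even if the wall clock is changed. */
static int transaction_is_old(time_t start, time_t now, time_t max_age)
{
	return now - start >= max_age;
}
```

Because both timestamps come from the monotonic clock, the difference cannot go negative when the wall clock is set backwards.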

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 6 Jun 2018 07:41:49 +0000 (15:41 +0800)]
btrfs: Get rid of the confusing btrfs_file_extent_inline_len

We used to call btrfs_file_extent_inline_len() to get the uncompressed
data size of an inlined extent.

However, this function hides some evil: for a compressed extent, it has
no choice but to read ram_bytes directly from btrfs_file_extent_item,
while for an uncompressed extent it uses the item size to calculate the
real data size and ignores ram_bytes completely.

In fact, for corrupted ram_bytes, due to above behavior kernel
btrfs_print_leaf() can't even print correct ram_bytes to expose the bug.

Since we have the tree-checker to verify all EXTENT_DATA, such mismatch
can be detected pretty easily, thus we can trust ram_bytes without the
evil btrfs_file_extent_inline_len().
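
The mismatch the tree-checker can catch may be sketched as follows. This is a userspace model, not the checker itself: the function name is hypothetical, and the 21-byte header size is my understanding of how many bytes of btrfs_file_extent_item precede the inline data.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117   /* Linux value; fallback for hosts that lack it */
#endif

/* Assumed number of header bytes before the inline payload. */
#define INLINE_DATA_START 21U

/*
 * For an uncompressed inline extent the payload length derived from the
 * item size must equal ram_bytes; for a compressed one the on-disk
 * payload is smaller, so the size alone cannot confirm ram_bytes.
 */
static int check_inline_extent_model(uint32_t item_size, uint64_t ram_bytes,
				     int compressed)
{
	if (item_size < INLINE_DATA_START)
		return -EUCLEAN;
	if (compressed)
		return 0;
	if ((uint64_t)item_size - INLINE_DATA_START != ram_bytes)
		return -EUCLEAN;
	return 0;
}
```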

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Mon, 18 Jun 2018 11:13:19 +0000 (14:13 +0300)]
btrfs: Deduplicate extent_buffer init code

When a new extent buffer is allocated there are a few mandatory fields
which need to be set in order for the buffer to be sane: level,
generation, bytenr, backref_rev, owner and FSID/UUID. Currently this
is open coded in the callers of btrfs_alloc_tree_block, meaning it's
fairly high in the abstraction hierarchy of operations. This patch
solves this by simply moving this init code in btrfs_init_new_buffer,
since this is the function which initializes a newly allocated
extent buffer. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 20 Jun 2018 07:38:58 +0000 (15:38 +0800)]
btrfs: check-integrity: Fix NULL pointer dereference for degraded mount

Commit f8f84b2dfda5 ("btrfs: index check-integrity state hash by a dev_t")
changed how btrfsic indexes device state.

Now we need to access device->bdev->bd_dev, while for degraded mount
it's completely possible to have device->bdev as NULL, thus it will
trigger a NULL pointer dereference at mount time.

Fix it by checking if the device is degraded before accessing
device->bdev->bd_dev.

There are a lot of other places accessing device->bdev->bd_dev, however
the other call sites have either checked device->bdev, or the
device->bdev is passed from btrfsic_map_block(), so it won't cause harm.

Fixes: f8f84b2dfda5 ("btrfs: index check-integrity state hash by a dev_t")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:15 +0000 (15:49 +0300)]
btrfs: Remove fs_info from btrfs_force_chunk_alloc

It can be referenced from the passed transaction handle.
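
The pattern behind this and the similar cleanups in this series can be shown with a small userspace model. All names are hypothetical; the only assumption is that the handle carries a pointer to the filesystem info, as the message states:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct fs_info_model {
	unsigned int nodesize;
};

struct trans_handle_model {
	struct fs_info_model *fs_info;   /* the handle already knows its fs */
};

/* Before: fs_info is passed next to a handle that already points at it. */
static unsigned int chunk_alloc_old(struct trans_handle_model *trans,
				    struct fs_info_model *fs_info)
{
	(void)trans;
	return fs_info->nodesize;
}

/* After: one argument fewer, fs_info is derived from the handle. */
static unsigned int chunk_alloc_new(struct trans_handle_model *trans)
{
	struct fs_info_model *fs_info = trans->fs_info;

	return fs_info->nodesize;
}
```

Dropping the redundant argument also removes the possibility of passing an fs_info that does not match the handle.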

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:14 +0000 (15:49 +0300)]
btrfs: Remove fs_info from btrfs_inc_block_group_ro

It can be referenced from the passed bg cache.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:13 +0000 (15:49 +0300)]
btrfs: Remove fs_info from btrfs_alloc_logged_file_extent

It can be referenced from trans since the function is always called
within a valid transaction.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:12 +0000 (15:49 +0300)]
btrfs: Remove fs_info from remove_extent_backref

It can be referenced directly from the transaction handle since it's
always valid.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:11 +0000 (15:49 +0300)]
btrfs: Remove fs_info from run_one_delayed_ref

It can be referenced from the passed transaction handle, since it's
always valid.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:10 +0000 (15:49 +0300)]
btrfs: Remove fs_info from insert_inline_extent_backref

It can be referenced from the passed transaction handle, since it's
always valid.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:09 +0000 (15:49 +0300)]
btrfs: Remove fs_info from exclude_super_stripes

It can be referenced from the passed block group.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:08 +0000 (15:49 +0300)]
btrfs: Remove fs_info from free_excluded_extents

It can be referenced from the passed block group.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:07 +0000 (15:49 +0300)]
btrfs: Remove fs_info from check_system_chunk

It can be referenced from trans since the function is always called
within a transaction.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:06 +0000 (15:49 +0300)]
btrfs: Remove fs_info from btrfs_alloc_chunk

It can be referenced from trans since the function is always called
within a transaction.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:05 +0000 (15:49 +0300)]
btrfs: Remove fs_info from do_chunk_alloc

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:04 +0000 (15:49 +0300)]
btrfs: Remove fs_info from run_delayed_tree_ref

It can always be referenced from the passed transaction handle since
it's always valid. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:03 +0000 (15:49 +0300)]
btrfs: Remove fs_info from cleanup_ref_head

fs_info can be referenced from the transaction handle, since it's always
valid. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:02 +0000 (15:49 +0300)]
btrfs: Remove unused fs_info from cleanup_extent_op

The argument is no longer used so remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:01 +0000 (15:49 +0300)]
btrfs: Remove fs_info from run_delayed_extent_op

This function is always called with a valid transaction handle so
fs_info can be referenced from there. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:49:00 +0000 (15:49 +0300)]
btrfs: Remove fs_info from run_delayed_data_ref

This function is always called with a valid transaction from where
fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:48:59 +0000 (15:48 +0300)]
btrfs: Remove fs_info argument from __btrfs_inc_extent_ref

This function already takes a transaction which holds a reference to
the fs_info struct. Use that reference and remove the extra arg. No
functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:48:58 +0000 (15:48 +0300)]
btrfs: Remove fs_info from alloc_reserved_file_extent

fs_info can be referenced from the transaction handle, which is always
valid. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:48:57 +0000 (15:48 +0300)]
btrfs: Remove fs_info from __btrfs_free_extent

This function is always called with a valid transaction handle so we
can reference the fs_info from there. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:48:56 +0000 (15:48 +0300)]
btrfs: Remove fs_info from btrfs_remove_block_group

This function is always called with a valid transaction handle from
where we can reference fs_info. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:48:55 +0000 (15:48 +0300)]
btrfs: Remove fs_info from btrfs_make_block_group

This function is always called with a valid transaction handle from
where we can reference the fs_info. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:48:54 +0000 (15:48 +0300)]
btrfs: Remove fs_info from btrfs_add_delayed_data_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 20 Jun 2018 12:48:53 +0000 (15:48 +0300)]
btrfs: Remove fs_info from btrfs_add_delayed_tree_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info from lookup_extent_backref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:52 +0000 (15:48 +0300)]
btrfs: Remove fs_info from lookup_extent_backref

This argument is unused. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info argument from lookup_extent_data_ref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:51 +0000 (15:48 +0300)]
btrfs: Remove fs_info argument from lookup_extent_data_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info argument from lookup_tree_block_ref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:50 +0000 (15:48 +0300)]
btrfs: Remove fs_info argument from lookup_tree_block_ref

This function is always called with a valid transaction handle from
where the fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info argument from update_inline_extent_backref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:49 +0000 (15:48 +0300)]
btrfs: Remove fs_info argument from update_inline_extent_backref

This function always uses the leaf's extent_buffer which already
contains a reference to the fs_info. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info from lookup_inline_extent_backref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:48 +0000 (15:48 +0300)]
btrfs: Remove fs_info from lookup_inline_extent_backref

This function is always called with a valid transaction handle from
where the fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info from fixup_low_keys
Nikolay Borisov [Wed, 20 Jun 2018 12:48:47 +0000 (15:48 +0300)]
btrfs: Remove fs_info from fixup_low_keys

This argument is unused. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info from remove_extent_data_ref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:46 +0000 (15:48 +0300)]
btrfs: Remove fs_info from remove_extent_data_ref

This function is always called with a valid transaction from where the
fs_info can be referenced. No functional change.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info argument from insert_extent_backref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:45 +0000 (15:48 +0300)]
btrfs: Remove fs_info argument from insert_extent_backref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info from insert_extent_data_ref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:44 +0000 (15:48 +0300)]
btrfs: Remove fs_info from insert_extent_data_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. So remove the redundant argument.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove fs_info from insert_tree_block_ref
Nikolay Borisov [Wed, 20 Jun 2018 12:48:43 +0000 (15:48 +0300)]
btrfs: Remove fs_info from insert_tree_block_ref

This function is always called with a valid transaction so there is no
need to duplicate the fs_info; we can reference it directly from the
trans handle. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
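
The pattern shared by this series of "Remove fs_info" patches can be sketched as follows. This is a minimal illustration, not the btrfs code: the struct and function names (all suffixed `_sketch`, plus `ref_size_before`/`ref_size_after`) are hypothetical stand-ins, and only the idea that the transaction handle already carries an `fs_info` pointer is taken from the changelogs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct fs_info_sketch {
	unsigned int nodesize;
};

struct trans_handle_sketch {
	struct fs_info_sketch *fs_info;	/* the handle already points at fs_info */
};

/* Before: fs_info is passed next to a handle that already carries it. */
static unsigned int ref_size_before(struct trans_handle_sketch *trans,
				    struct fs_info_sketch *fs_info)
{
	(void)trans;
	return fs_info->nodesize;
}

/* After: the redundant argument is gone, fs_info comes from the handle. */
static unsigned int ref_size_after(struct trans_handle_sketch *trans)
{
	assert(trans != NULL && trans->fs_info != NULL);
	return trans->fs_info->nodesize;
}
```

Both variants return the same value for the same handle, which is why such a change can be described as having no functional effect.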
6 years agobtrfs: Fix a C compliance issue
Bart Van Assche [Wed, 20 Jun 2018 17:03:33 +0000 (10:03 -0700)]
btrfs: Fix a C compliance issue

The C programming language does not allow preprocessor directives
inside macro arguments (pr_info() is defined as a macro). Hence rework
the pr_info() statement in btrfs_print_mod_info() such that it becomes
compliant. This patch allows tools like sparse to analyze the BTRFS
source code.

Fixes: 62e855771dac ("btrfs: convert printk(KERN_* to use pr_* calls")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
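
A minimal userspace sketch of the kind of rework described above, under stated assumptions: `LOG_INFO`, `FEATURE_DEBUG` and `format_mod_info` are hypothetical names standing in for pr_info(), a kernel config option and btrfs_print_mod_info(). The idea is to move the conditional pieces into a string built outside the macro call, so that no directive appears inside the argument list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro standing in for a printk-style logging macro. */
#define LOG_INFO(buf, len, ...) snprintf((buf), (len), __VA_ARGS__)
#define FEATURE_DEBUG 1	/* hypothetical config switch */

/*
 * Non-compliant form: a directive inside the macro's argument list is
 * undefined behavior (C11 6.10.3):
 *
 *   LOG_INFO(buf, len, "module loaded"
 *   #if FEATURE_DEBUG
 *            ", debug=on"
 *   #endif
 *            "\n");
 */

/* Compliant form: conditionals select string pieces outside the call. */
static int format_mod_info(char *buf, size_t len)
{
	static const char options[] = ""
#if FEATURE_DEBUG
		", debug=on"
#endif
		;

	assert(buf != NULL && len > 0);
	return LOG_INFO(buf, len, "module loaded%s\n", options);
}
```

Adjacent string literals are concatenated by the compiler, so the conditional fragments cost nothing at run time.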
6 years agobtrfs: Annotate fall-through when parsing mount option
Bart Van Assche [Wed, 20 Jun 2018 17:03:32 +0000 (10:03 -0700)]
btrfs: Annotate fall-through when parsing mount option

This patch keeps the compiler from complaining that a fall-through
annotation is missing when building with W=1.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
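
For illustration, a fall-through annotation in a switch statement might look like the sketch below. The option names and flags are hypothetical and not taken from the btrfs mount option parser; the only assumption about the toolchain is that GCC's -Wimplicit-fallthrough accepts a comment of this form as an annotation.

```c
#include <assert.h>

/* Hypothetical option tokens and flags, for illustration only. */
enum { OPT_COMPRESS_FORCE_SKETCH, OPT_COMPRESS_SKETCH, OPT_OTHER_SKETCH };
#define FLAG_FORCE_SKETCH    0x1
#define FLAG_COMPRESS_SKETCH 0x2

static int parse_opt_sketch(int token)
{
	int flags = 0;

	assert(token >= 0);
	switch (token) {
	case OPT_COMPRESS_FORCE_SKETCH:
		flags |= FLAG_FORCE_SKETCH;
		/* fall through */
	case OPT_COMPRESS_SKETCH:
		flags |= FLAG_COMPRESS_SKETCH;
		break;
	default:
		break;
	}
	return flags;
}
```

The comment documents that the missing break is intentional: the "force" variant sets its own flag and then shares the handling of the plain variant.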
6 years agobtrfs: Fix misleading indentation reported by smatch
Bart Van Assche [Wed, 20 Jun 2018 17:03:31 +0000 (10:03 -0700)]
btrfs: Fix misleading indentation reported by smatch

This patch prevents smatch from complaining about inconsistent
indenting when building the BTRFS source code.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Streamline log_extent_csums a bit
Nikolay Borisov [Wed, 20 Jun 2018 14:26:42 +0000 (17:26 +0300)]
btrfs: Streamline log_extent_csums a bit

Currently this function takes the root as an argument only to get the
log_root from it. Simplify this by directly passing the log root from
the caller. Also eliminate the fs_info local variable, since it's used
only once, so directly reference it from the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: remove remaing full_sync logic from btrfs_sync_file
David Sterba [Thu, 19 Jul 2018 13:27:46 +0000 (15:27 +0200)]
btrfs: remove remaing full_sync logic from btrfs_sync_file

The logic to check if the inode is already in the log can now be
simplified since we always wait for the ordered extents to complete
before deciding whether the inode needs to be logged. The big comment
about it can go away too.

CC: Filipe Manana <fdmanana@suse.com>
Suggested-by: Filipe Manana <fdmanana@suse.com>
[ code and changelog copied from mail discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: remove the logged extents infrastructure
Josef Bacik [Wed, 23 May 2018 15:58:36 +0000 (11:58 -0400)]
btrfs: remove the logged extents infrastructure

This is no longer used anywhere, remove all of it.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: clean up the left over logged_list usage
Josef Bacik [Wed, 23 May 2018 15:58:35 +0000 (11:58 -0400)]
btrfs: clean up the left over logged_list usage

We no longer use this list we've passed around so remove it everywhere.
Also remove the extra checks for ordered/filemap errors as this is
handled higher up now that we're waiting on ordered_extents before
getting to the tree log code.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: remove the wait ordered logic in the log_one_extent path
Josef Bacik [Wed, 23 May 2018 15:58:34 +0000 (11:58 -0400)]
btrfs: remove the wait ordered logic in the log_one_extent path

Since we are waiting on all ordered extents at the start of the fsync()
path we don't need to wait on any logged ordered extents, and we don't
need to look up the checksums on the ordered extents as they will
already be on disk prior to getting here.  Rework this so we're only
looking up and copying the on-disk checksums for the extent range we
care about.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: always wait on ordered extents at fsync time
Josef Bacik [Wed, 23 May 2018 15:58:33 +0000 (11:58 -0400)]
btrfs: always wait on ordered extents at fsync time

There's a priority inversion that exists currently with btrfs fsync.  In
some cases we will collect outstanding ordered extents onto a list and
only wait on them at the very last second.  However this "very last
second" falls inside of a transaction handle.  If we are in a lower
priority cgroup we can end up holding the transaction open for longer
than needed, and a high priority cgroup that is also trying to fsync()
will see latency.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Fix comment in lookup_inline_extent_backref
Nikolay Borisov [Mon, 18 Jun 2018 11:59:26 +0000 (14:59 +0300)]
btrfs: Fix comment in lookup_inline_extent_backref

The comment wrongly states that the owner parameter is the level of
the parent block. In fact owner is the level of the current block and
by adding 1 to it we can eventually get to the parent/root.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Document __btrfs_inc_extent_ref
Nikolay Borisov [Mon, 18 Jun 2018 11:59:25 +0000 (14:59 +0300)]
btrfs: Document __btrfs_inc_extent_ref

Here is a doc-only patch which tries to deobfuscate the terra incognita
that the arguments for delayed refs are.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: scrub: Remove unused copy_nocow_pages and its callchain
Qu Wenruo [Wed, 6 Jun 2018 05:13:12 +0000 (13:13 +0800)]
btrfs: scrub: Remove unused copy_nocow_pages and its callchain

Since commit ac0b4145d662a3b9e340 ("btrfs: scrub: Don't use inode pages
for device replace") the function is not used and we can remove all
functions down the call chain.

There was an optimization that reused inode pages to speed up device
replace, but it broke when there was a nodatasum and compressed page.
The potential performance gain is small so we don't lose much by
removing it and using scrub_pages same as the other pages.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: replace get_seconds with new 64bit time API
Allen Pais [Tue, 12 Jun 2018 11:48:25 +0000 (17:18 +0530)]
btrfs: replace get_seconds with new 64bit time API

The get_seconds() function is deprecated as it truncates the timestamp
to 32 bits. Change it to ktime_get_real_seconds().

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
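
The truncation mentioned above can be modelled in userspace as follows. This is a sketch under assumptions: the two helper functions are hypothetical and simply model a seconds counter stored in a 32-bit versus a 64-bit type; they are not the kernel time API.

```c
#include <assert.h>
#include <stdint.h>

/* Models a seconds counter kept in a 32-bit type (hypothetical helper). */
static uint32_t seconds_32bit_sketch(int64_t now)
{
	assert(now >= 0);
	return (uint32_t)now;	/* bits above 2^32 are silently dropped */
}

/* Models a 64-bit seconds counter (hypothetical helper). */
static int64_t seconds_64bit_sketch(int64_t now)
{
	assert(now >= 0);
	return now;
}
```

For a timestamp of 2^32 + 5 seconds, the 32-bit variant wraps around to 5 while the 64-bit variant keeps the full value.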
6 years agoLinux 4.18-rc8
Linus Torvalds [Sun, 5 Aug 2018 19:37:41 +0000 (12:37 -0700)]
Linux 4.18-rc8

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 16:39:30 +0000 (09:39 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
 "A single fix, which addresses boot failures on machines which do not
  report EBDA correctly, which can place the trampoline into reserved
  memory regions. Validating against E820 prevents that"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/compressed/64: Validate trampoline placement against E820

6 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 16:25:29 +0000 (09:25 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two oneliners addressing NOHZ failures:

   - Use a bitmask to check for the pending timer softirq and not the
     bit number. The existing code using the bit number checked for
     the wrong bit, which caused timers to either expire late or stop
     completely.

   - Make the nohz evaluation on interrupt exit more robust. The
     existing code did not re-arm the hardware when interrupting a
     running softirq in task context (ksoftirqd or tail of
     local_bh_enable()), which caused timers to either expire late
     or stop completely"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Fix missing tick reprogram when interrupting an inline softirq
  nohz: Fix local_timer_softirq_pending()
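
The first timer fix, testing a bitmask rather than a bit number, can be illustrated with the sketch below. The names are hypothetical (suffixed `_sketch`); the assumption is only that the timer softirq is identified by bit number 1 in a pending word, so ANDing with the raw number actually tests bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers, not masks (hypothetical mirror of a softirq enum). */
enum { HI_SOFTIRQ_SKETCH = 0, TIMER_SOFTIRQ_SKETCH = 1 };

static uint32_t bit_mask_sketch(unsigned int nr)
{
	assert(nr < 32);
	return UINT32_C(1) << nr;
}

/* Buggy: ANDs with the bit number (1), which tests bit 0, not bit 1. */
static int timer_pending_buggy(uint32_t pending)
{
	return (pending & TIMER_SOFTIRQ_SKETCH) != 0;
}

/* Fixed: ANDs with the mask derived from the bit number. */
static int timer_pending_fixed(uint32_t pending)
{
	return (pending & bit_mask_sketch(TIMER_SOFTIRQ_SKETCH)) != 0;
}
```

With only the timer bit set, the buggy check reports nothing pending; with only bit 0 set, it reports a pending timer that does not exist.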

6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 16:13:07 +0000 (09:13 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A set of fixes for perf:

  Kernel side:

   - Fix the hardcoded index of extra PCI devices on Broadwell which
     caused a resource conflict and triggered warnings on CPU hotplug.

  Tooling:

   - Update the tools copy of several files, including perf_event.h,
     powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and x86's
     memcpy_64.s (used in 'perf bench mem'), silencing the respective
     warnings during the perf tools build.

   - Fix the build on the alpine:edge distro"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
  perf tools: Fix the build on the alpine:edge distro
  tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
  tools headers uapi: Refresh linux/bpf.h copy
  tools headers powerpc: Update asm/unistd.h copy to pick new
  tools headers uapi: Update tools's copy of linux/perf_event.h

6 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 15:55:26 +0000 (08:55 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single bugfix for the irq core to prevent silent data corruption and
  malfunction of threaded interrupts under certain conditions"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Make force irq threading setup more robust

6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sun, 5 Aug 2018 15:20:39 +0000 (08:20 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Handle frames in error situations properly in AF_XDP, from Jakub
    Kicinski.

 2) tcp_mmap test case only tests ipv6 due to a thinko, fix from
    Maninder Singh.

 3) Session refcnt fix in l2tp_ppp, from Guillaume Nault.

 4) Fix regression in netlink bind handling of multicast groups, from
    Dmitry Safonov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netlink: Don't shift on 64 for ngroups
  net/smc: no cursor update send in state SMC_INIT
  l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
  mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
  mlxsw: core_acl_flex_actions: Remove redundant counter destruction
  mlxsw: core_acl_flex_actions: Remove redundant resource destruction
  mlxsw: core_acl_flex_actions: Return error for conflicting actions
  selftests/bpf: update test_lwt_seg6local.sh according to iproute2
  drivers: net: lmc: fix case value for target abort error
  selftest/net: fix protocol family to work for IPv4.
  net: xsk: don't return frames via the allocator on error
  tools/bpftool: fix a percpu_array map dump problem

6 years agoMerge tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Aug 2018 01:34:55 +0000 (18:34 -0700)]
Merge tag 'usercopy-fix-v4.18-rc8' of git://git./linux/kernel/git/kees/linux

Pull usercopy whitelisting fix from Kees Cook:
 "Bart Massey discovered that the usercopy whitelist for JFS was
  incomplete: the inline inode data may intentionally "overflow" into
  the neighboring "extended area", so the size of the whitelist needed
  to be raised to include the neighboring field"

* tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  jfs: Fix usercopy whitelist for inline inode data

6 years agoMerge tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sun, 5 Aug 2018 01:30:58 +0000 (18:30 -0700)]
Merge tag 'xfs-4.18-fixes-5' of git://git./fs/xfs/xfs-linux

Pull xfs bugfix from Darrick Wong:
 "One more patch for 4.18 to fix a coding error in the iomap_bmap()
  function introduced in -rc1: fix incorrect shifting"

* tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fs: fix iomap_bmap position calculation

6 years agoPartially revert "block: fail op_is_write() requests to read-only partitions"
Linus Torvalds [Fri, 3 Aug 2018 19:22:09 +0000 (12:22 -0700)]
Partially revert "block: fail op_is_write() requests to read-only partitions"

It turns out that commit 721c7fc701c7 ("block: fail op_is_write()
requests to read-only partitions"), while obviously correct, causes
problems for some older lvm2 installations.

The reason is that the lvm snapshotting will continue to write to the
snapshot COW volume, even after the volume has been marked read-only.
End result: snapshot failure.

This has actually been fixed in newer versions of the lvm2 tool, but the
old tools still exist, and the breakage was reported both in the kernel
bugzilla and in the Debian bugzilla:

  https://bugzilla.kernel.org/show_bug.cgi?id=200439
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442

The lvm2 fix is here

  https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3

but until everybody has updated to recent versions, we'll have to weaken
the "never write to read-only partitions" check.  It now allows the
write to happen, but causes a warning, something like this:

  generic_make_request: Trying to write to read-only block-device dm-3 (partno X)
  Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi
  CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3
  Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016
  Workqueue: ksnaphd do_metadata
  RIP: 0010:generic_make_request_checks+0x4ac/0x600
  ...
  Call Trace:
   generic_make_request+0x64/0x400
   submit_bio+0x6c/0x140
   dispatch_io+0x287/0x430
   sync_io+0xc3/0x120
   dm_io+0x1f8/0x220
   do_metadata+0x1d/0x30
   process_one_work+0x1b9/0x3e0
   worker_thread+0x2b/0x3c0
   kthread+0x113/0x130
   ret_from_fork+0x35/0x40

Note that this is a "revert" in behavior only.  I'm leaving alone the
actual code cleanups in commit 721c7fc701c7, but letting the previously
uncaught request go through with a warning instead of stopping it.

Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions")
Reported-and-tested-by: WGH <wgh@torlan.ru>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agonetlink: Don't shift on 64 for ngroups
Dmitry Safonov [Sun, 5 Aug 2018 00:35:53 +0000 (01:35 +0100)]
netlink: Don't shift on 64 for ngroups

It's legal to have 64 groups for netlink_sock.

As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to the first 32 groups.

The check for correctness of the userspace-supplied .bind() parameter
is done by applying a mask made from an ngroups shift. This broke
Android, which has 64 groups, as the shift for the mask overflowed.

Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
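
The overflow described above comes from shifting a 64-bit value by its full width, which is undefined in C. A minimal sketch of a mask helper that avoids the shift in that case is shown below; `groups_mask_sketch` is a hypothetical name and this is not the netlink code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a mask with the low 'ngroups' bits set (hypothetical helper). */
static uint64_t groups_mask_sketch(unsigned int ngroups)
{
	assert(ngroups <= 64);
	if (ngroups == 64)
		return ~UINT64_C(0);	/* 1 << 64 would be undefined */
	return (UINT64_C(1) << ngroups) - 1;
}
```

For 64 groups every bit is valid, so the mask is all ones and no shift is needed.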
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Sun, 5 Aug 2018 00:51:55 +0000 (17:51 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-08-05

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix bpftool percpu_array dump by using correct roundup to next
   multiple of 8 for the value size, from Yonghong.

2) Fix AF_XDP's __xsk_rcv_zc() to not return frames back to the
   allocator, since the driver will recycle the frame anyway in case
   of an error, from Jakub.

3) Fix up BPF test_lwt_seg6local test cases to final iproute2
   syntax, from Mathieu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
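
Rounding a value size up to the next multiple of 8, as mentioned in item 1 of the pull request above, can be sketched like this. `round_up_8_sketch` is a hypothetical helper, not the bpftool code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds n up to the next multiple of 8 (hypothetical helper). */
static size_t round_up_8_sketch(size_t n)
{
	assert(n <= SIZE_MAX - 7);	/* guard against wrap-around */
	return (n + 7) & ~(size_t)7;
}
```

A 4-byte value therefore occupies an 8-byte slot, and a 9-byte value a 16-byte slot; using the unrounded size as the stride between slots would read misaligned data.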
6 years agonet/smc: no cursor update send in state SMC_INIT
Ursula Braun [Fri, 3 Aug 2018 08:38:33 +0000 (10:38 +0200)]
net/smc: no cursor update send in state SMC_INIT

If a writer blocked condition is received without data, the current
consumer cursor is immediately sent. Servers could already receive this
condition in state SMC_INIT without finished tx-setup. This patch
avoids sending a consumer cursor update in this case.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agojfs: Fix usercopy whitelist for inline inode data
Kees Cook [Fri, 3 Aug 2018 19:52:58 +0000 (12:52 -0700)]
jfs: Fix usercopy whitelist for inline inode data

Bart Massey reported what turned out to be a usercopy whitelist false
positive in JFS when symlink contents exceeded 128 bytes. The inline
inode data (i_inline) is actually designed to overflow into the "extended
area" following it (i_inline_ea) when needed. So the whitelist needed to
be expanded to include both i_inline and i_inline_ea (the whole size
of which is calculated internally using IDATASIZE, 256, instead of
sizeof(i_inline), 128).

$ cd /mnt/jfs
$ touch $(perl -e 'print "B" x 250')
$ ln -s B* b
$ ls -l >/dev/null

[  249.436410] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'jfs_ip' (offset 616, size 250)!

Reported-by: Bart Massey <bart.massey@gmail.com>
Fixes: 8d2704d382a9 ("jfs: Define usercopy region in jfs_ip slab cache")
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: jfs-discussion@lists.sourceforge.net
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
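
The false positive can be modelled with a simple range check. This is a sketch under assumptions: the struct layout, the `_SKETCH` constants and `copy_allowed_sketch` are hypothetical; only the sizes (128 for the inline data alone, 256 for the inline data plus the extended area) come from the changelog.

```c
#include <assert.h>
#include <stddef.h>

#define INLINE_SIZE_SKETCH 128	/* size of the inline data alone */
#define IDATASIZE_SKETCH   256	/* inline data plus extended area */

/* Hypothetical layout: the extended area directly follows i_inline. */
struct inode_sketch {
	long other_fields[4];
	unsigned char i_inline[INLINE_SIZE_SKETCH];
	unsigned char i_inline_ea[IDATASIZE_SKETCH - INLINE_SIZE_SKETCH];
};

/* Returns 1 if [off, off + len) lies inside [start, start + size). */
static int copy_allowed_sketch(size_t start, size_t size,
			       size_t off, size_t len)
{
	assert(start + size >= start);
	return off >= start && len <= size && off - start <= size - len;
}
```

A 250-byte copy starting at `i_inline` is rejected when the allowed region is 128 bytes and accepted once the region spans both fields.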
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 3 Aug 2018 20:43:59 +0000 (13:43 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Two vmx bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: vmx: fix vpid leak
  KVM: vmx: use local variable for current_vmptr when emulating VMPTRST

6 years agol2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
Guillaume Nault [Fri, 3 Aug 2018 15:00:11 +0000 (17:00 +0200)]
l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()

If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
drop the reference taken by l2tp_session_get().

Fixes: ecd012e45ab5 ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
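
The leak pattern described above can be sketched with a toy reference count. All names are hypothetical (`session_sketch`, `session_get_sketch`, `session_put_sketch`, `tunnel_ioctl_sketch`) and the error value is a placeholder; the point is only that every early return after a successful get needs a matching put.

```c
#include <assert.h>
#include <stddef.h>

struct session_sketch {
	int refcount;
	int is_ppp;
};

static struct session_sketch *session_get_sketch(struct session_sketch *s)
{
	if (s)
		s->refcount++;
	return s;
}

static void session_put_sketch(struct session_sketch *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

static int tunnel_ioctl_sketch(struct session_sketch *candidate)
{
	struct session_sketch *session = session_get_sketch(candidate);

	if (session && !session->is_ppp) {
		session_put_sketch(session);	/* the drop that was missing */
		return -1;			/* placeholder error value */
	}
	if (session) {
		/* ... handle the request on the PPP session ... */
		session_put_sketch(session);
	}
	return 0;
}
```

Without the put on the rejection path, each call on a non-PPP session would leave the count one higher than before.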
6 years agoMerge branch 'mlxsw-Fix-ACL-actions-error-condition-handling'
David S. Miller [Fri, 3 Aug 2018 19:28:02 +0000 (12:28 -0700)]
Merge branch 'mlxsw-Fix-ACL-actions-error-condition-handling'

Ido Schimmel says:

====================
mlxsw: Fix ACL actions error condition handling

Nir says:

Two issues were lately noticed within mlxsw ACL actions error condition
handling. The first patch deals with conflicting actions such as:

 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 \
   action goto chain 100 \
   action mirred egress redirect dev swp4

The second action will never execute. The SW model allows this
configuration, but the mlxsw driver cannot, as it implements actions in
sets of up to three actions per set with a single termination marking.
Conflicting actions create a contradiction over this single marking and
thus cannot be configured. The fix replaces a misplaced warning with an
error code to be returned.

Patches 2-4 fix a condition of duplicate destruction of resources. Some
actions require allocation of a specific resource prior to setting the
action itself. On an error condition this resource was destroyed twice,
leading to a crash when using the mirror action, and to a redundant
destruction in other cases, since on an error condition rule destruction
also takes care of resource destruction. To fix this, symmetry in
behavior is added: resource destruction also takes care of removing the
resource from the rule's resource list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
Nir Dotan [Fri, 3 Aug 2018 12:57:44 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction

In the previous patch, mlxsw_afa_resource_del() was added to avoid a
duplicate resource destruction scenario.
For mirror actions, such duplicate destruction leads to a crash as in:

 # tc qdisc add dev swp49 ingress
 # tc filter add dev swp49 parent ffff: \
   protocol ip chain 100 pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action drop
 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
   action mirred egress mirror dev swp4

Therefore add a call to mlxsw_afa_resource_del() in
mlxsw_afa_mirror_destroy() in order to clear that resource
from rule's resources.

Fixes: d0d13c1858a1 ("mlxsw: spectrum_acl: Add support for mirror action")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core_acl_flex_actions: Remove redundant counter destruction
Nir Dotan [Fri, 3 Aug 2018 12:57:43 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Remove redundant counter destruction

Each tc flower rule uses a hidden count action. As a counter resource
may not be available due to limited HW resources, update the
_counter_create() and _counter_destroy() pair to follow the previously
introduced symmetric error condition handling: add a call to
mlxsw_afa_resource_del() as part of the counter resource destruction.

Fixes: c18c1e186ba8 ("mlxsw: core: Make counter index allocated inside the action append")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core_acl_flex_actions: Remove redundant resource destruction
Nir Dotan [Fri, 3 Aug 2018 12:57:42 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Remove redundant resource destruction

Some ACL actions require the allocation of a separate resource
prior to applying the action itself. When facing an error condition
during the setup phase of the action, the resource should be destroyed.
For such actions the destruction was done twice, which is dangerous
and could lead to a crash.
The destruction took place first upon error in the action setup phase
and then again as the rule was destroyed.

The following sequence generated a crash:

 # tc qdisc add dev swp49 ingress
 # tc filter add dev swp49 parent ffff: \
   protocol ip chain 100 pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action drop
 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
   action mirred egress mirror dev swp4

Therefore add mlxsw_afa_resource_del() as a complement of
mlxsw_afa_resource_add() to add symmetry to resource_list membership
handling. Call this from mlxsw_afa_fwd_entry_ref_destroy() to make the
_fwd_entry_ref_create() and _fwd_entry_ref_destroy() pair of calls a
NOP.

Fixes: 140ce421217e ("mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
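
The symmetry described above, where destroying a resource also removes it from the owner's resource list, can be sketched as follows. All types and functions are hypothetical (suffixed `_sketch`) and use a plain singly linked list rather than the kernel list API.

```c
#include <assert.h>
#include <stddef.h>

struct resource_sketch {
	struct resource_sketch *next;
	int destroyed;	/* counts destructor runs; 2 would be a double destroy */
};

struct block_sketch {
	struct resource_sketch *resources;
};

static void resource_add_sketch(struct block_sketch *b,
				struct resource_sketch *r)
{
	r->next = b->resources;
	b->resources = r;
}

/* Complement of resource_add_sketch(): unlink r if it is on the list. */
static void resource_del_sketch(struct block_sketch *b,
				struct resource_sketch *r)
{
	struct resource_sketch **pp = &b->resources;

	while (*pp && *pp != r)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = r->next;
		r->next = NULL;
	}
}

static void resource_destroy_sketch(struct block_sketch *b,
				    struct resource_sketch *r)
{
	resource_del_sketch(b, r);	/* keeps list membership symmetric */
	assert(r->destroyed == 0);	/* would trip on a double destroy */
	r->destroyed++;
}

/* Rule teardown: destroys whatever is still on the list. */
static void block_destroy_sketch(struct block_sketch *b)
{
	while (b->resources)
		resource_destroy_sketch(b, b->resources);
}
```

Because the error path unlinks the resource as it destroys it, the later teardown no longer finds it on the list and cannot destroy it a second time.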
6 years agomlxsw: core_acl_flex_actions: Return error for conflicting actions
Nir Dotan [Fri, 3 Aug 2018 12:57:41 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Return error for conflicting actions

Spectrum switch ACL action set is built in groups of three actions
which may point to additional actions. A group holds a single record
which can be set as goto record for pointing at a following group
or can be set to mark the termination of the lookup. This is perfectly
adequate for handling a series of actions to be executed on a packet.
While the SW model allows configuration of conflicting actions
where it is clear that some actions will never execute, the mlxsw
driver must block such configurations as they create a conflict
over the single terminate/goto record value.

For a conflicting actions configuration such as:

 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 \
   action goto chain 100 \
   action mirred egress mirror dev swp4

Where it is clear that the last action will never execute, the
mlxsw driver was issuing a warning instead of returning an error.
Therefore replace that warning with an error for this specific
case.

Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Fri, 3 Aug 2018 17:49:47 +0000 (10:49 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fix from Jason Gunthorpe:
 "One bug for missing user input validation: refuse invalid port numbers
  in the modify_qp system call"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/uverbs: Expand primary and alt AV port checks

6 years agoMerge tag 'for-linus-20180803' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 3 Aug 2018 17:43:56 +0000 (10:43 -0700)]
Merge tag 'for-linus-20180803' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
 "Just a single fix, from Ming, fixing a regression in this cycle where
  the busy tag iteration was changed to only call the callback
  function for requests that are started. We really want all non-free
  requests.

  This fixes a boot regression on certain VM setups"

* tag 'for-linus-20180803' of git://git.kernel.dk/linux-block:
  blk-mq: fix blk_mq_tagset_busy_iter