Liu Bo [Thu, 26 Jan 2017 01:15:54 +0000 (17:15 -0800)]
Btrfs: cleanup unused cached_state in __extent_writepage_io
@cached_state is no more required in __extent_writepage_io, also remove
the goto label.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 25 Jan 2017 14:50:33 +0000 (09:50 -0500)]
btrfs: allow unlink to exceed subvolume quota
Once a qgroup limit is exceeded, it's impossible to restore normal
operation to the subvolume without modifying the limit or removing
the subvolume. This is a surprising situation for many users used
to the typical workflow with quotas on other file systems where it's
possible to remove files until the used space is back under the limit.
When we go to unlink a file and start the transaction, we'll hit
the qgroup limit while trying to reserve space for the items we'll
modify while removing the file. We discussed last month how best
to handle this situation and agreed that there is no perfect solution.
The best principle-of-least-surprise solution is to handle it similarly
to how we already handle ENOSPC when unlinking, which is to allow
the operation to succeed with the expectation that it will ultimately
release space under most circumstances.
This patch modifies the transaction start path to select whether to
honor the qgroups limits. btrfs_start_transaction_fallback_global_rsv
is the only caller that skips enforcement. The reservation and tracking
still happens normally -- it just skips the enforcement step.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 24 Jan 2017 23:58:51 +0000 (15:58 -0800)]
Btrfs: fix wrong argument for btrfs_lookup_ordered_range
Commit Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units"
(
d0b7da88) did this, but btrfs_lookup_ordered_range expects a 'length'
rather than a 'page_end'.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 16 Jan 2017 02:23:06 +0000 (10:23 +0800)]
btrfs: raid56: Remove unused variable in lock_stripe_add
Variable 'walk' in lock_stripe_add() is not used. Remove it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Omar Sandoval [Wed, 18 Jan 2017 07:37:38 +0000 (23:37 -0800)]
Btrfs: refactor btrfs_extent_same() slightly
This was originally a prep patch for changing the behavior on len=0, but
we went another direction with that. This still makes the function
slightly easier to follow.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Omar Sandoval [Wed, 18 Jan 2017 07:24:37 +0000 (23:24 -0800)]
Btrfs: constify struct btrfs_{,disk_}key wherever possible
In a lot of places, it's unclear when it's safe to reuse a struct
btrfs_key after it has been passed to a helper function. Constify these
arguments wherever possible to make it obvious.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Thu, 15 Dec 2016 06:36:05 +0000 (22:36 -0800)]
Btrfs: fix another race between truncate and lockless dio write
Dio writes can update i_size in btrfs_get_blocks_direct when it
writes to offset beyond EOF so that endio can update disk_i_size
correctly (because we don't udpate disk_i_size beyond i_size).
However, when truncating down a file, we firstly update i_size
and then wait for in-flight lockless dio reads/writes, according
to the above, i_size may have been changed in dio writes, and
file extents don't get truncated.
For lockless dio writes are always overwrites, i_size is not
supposed to be changed, so this adds a check to filter out this
case.
The race could be reproduced by fstests/generic/299 with patch
"Btrfs: fix btrfs_ordered_update_i_size to update disk_i_size properly"
applied.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 13 Dec 2016 20:51:51 +0000 (12:51 -0800)]
Btrfs: clean up btrfs_ordered_update_i_size
Since we have a good helper entry_end, use it for ordered extent.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ whitespace reformatting ]
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 13 Dec 2016 20:15:19 +0000 (12:15 -0800)]
Btrfs: fix comment in btrfs_page_mkwrite
The comment about "page_mkwrite gets called every time the page is
dirtied" in btrfs_page_mkwrite is not correct, it only gets called the
first time the page gets dirtied after the page faults in.
However, we don't need to touch the code because it works well, although
the proper logic is to check if delalloc bits has been set and if so, go
free reserved space, if not, set the delalloc bits for dirty page range.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Thu, 1 Dec 2016 21:01:02 +0000 (13:01 -0800)]
Btrfs: fix btrfs_ordered_update_i_size to update disk_i_size properly
btrfs_ordered_update_i_size can be called by truncate and endio, but
only endio takes ordered_extent which contains the completed IO.
while truncating down a file, if there are some in-flight IOs,
btrfs_ordered_update_i_size in endio will set disk_i_size to
@orig_offset that is zero. If truncating-down fails somehow, we try to
recover in memory isize with this zero'd disk_i_size.
Fix it by only updating disk_i_size with @orig_offset when
btrfs_ordered_update_i_size is not called from endio while truncating
down and waiting for in-flight IOs completing their work before recover
in-memory size.
Besides fixing the above issue, add an assertion for last_size to double
check we truncate down to the desired size.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 20 Jan 2017 13:54:07 +0000 (14:54 +0100)]
btrfs: fix over-80 lines introduced by previous cleanups
This goes as a separate patch because fixing that inside the patches
caused too many many conflicts.
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:50 +0000 (00:31 +0200)]
btrfs: Make count_inode_refs take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:49 +0000 (00:31 +0200)]
btrfs: Make count_inode_extrefs take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:48 +0000 (00:31 +0200)]
btrfs: Make btrfs_log_inode take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:47 +0000 (00:31 +0200)]
btrfs: Make log_inode_item take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:46 +0000 (00:31 +0200)]
btrfs: Make __add_inode_ref take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:45 +0000 (00:31 +0200)]
btrfs: Make drop_one_dir_item take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:44 +0000 (00:31 +0200)]
btrfs: Make btrfs_unlink_inode take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:43 +0000 (00:31 +0200)]
btrfs: Make log_new_dir_dentries take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:42 +0000 (00:31 +0200)]
btrfs: Make log_directory_changes take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:41 +0000 (00:31 +0200)]
btrfs: Make log_dir_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:40 +0000 (00:31 +0200)]
btrfs: Make btrfs_log_changed_extents take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:39 +0000 (00:31 +0200)]
btrfs: Make btrfs_get_logged_extents take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:38 +0000 (00:31 +0200)]
btrfs: Make btrfs_log_trailing_hole take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:37 +0000 (00:31 +0200)]
btrfs: Make btrfs_log_all_xattrs take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:36 +0000 (00:31 +0200)]
btrfs: Make copy_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:35 +0000 (00:31 +0200)]
btrfs: Make btrfs_check_ref_name_override take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:34 +0000 (00:31 +0200)]
btrfs: Make logged_inode_size take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:33 +0000 (00:31 +0200)]
btrfs: Make btrfs_del_inode_ref take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:32 +0000 (00:31 +0200)]
btrfs: Make btrfs_del_dir_entries_in_log take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:31 +0000 (00:31 +0200)]
btrfs: Make btrfs_log_new_name take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:30 +0000 (00:31 +0200)]
btrfs: Make btrfs_inode_in_log take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:29 +0000 (00:31 +0200)]
btrfs: Make btrfs_record_snapshot_destroy take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:28 +0000 (00:31 +0200)]
btrfs: Make btrfs_record_unlink_dir take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 17 Jan 2017 22:31:27 +0000 (00:31 +0200)]
btrfs: Make btrfs_must_commit_transaction take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:42 +0000 (20:35 +0200)]
btrfs: Make btrfs_inode_delayed_dir_index_count take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:41 +0000 (20:35 +0200)]
btrfs: Make btrfs_commit_inode_delayed_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:40 +0000 (20:35 +0200)]
btrfs: Make btrfs_commit_inode_delayed_inode take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:39 +0000 (20:35 +0200)]
btrfs: Make btrfs_remove_delayed_node take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:38 +0000 (20:35 +0200)]
btrfs: Make btrfs_kill_delayed_inode_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:37 +0000 (20:35 +0200)]
btrfs: Make btrfs_delayed_delete_inode_ref take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:36 +0000 (20:35 +0200)]
btrfs: Make btrfs_delete_delayed_dir_index take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:35 +0000 (20:35 +0200)]
btrfs: Make btrfs_insert_delayed_dir_index take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:34 +0000 (20:35 +0200)]
btrfs: Make btrfs_delayed_inode_reserve_metadata take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:33 +0000 (20:35 +0200)]
btrfs: Make btrfs_get_or_create_delayed_node take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:32 +0000 (20:35 +0200)]
btrfs: Make btrfs_get_delayed_node take btrfs_inode
This function is internal to btrfs and doesn't really deal with any
VFS members, as such it needn't take a struct inode refrence but
btrfs_inode.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 10 Jan 2017 18:35:31 +0000 (20:35 +0200)]
btrfs: Make btrfs_ino take a struct btrfs_inode
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 4 Jan 2017 10:09:51 +0000 (11:09 +0100)]
btrfs: add wrapper for counting BTRFS_MAX_EXTENT_SIZE
The expression is open-coded in several places, this asks for a wrapper.
As we know the MAX_EXTENT fits to u32, we can use the appropirate
division helper. This cascades to the result type updates.
Compiler is clever enough to use shift instead of integer division, so
there's no change in the generated assembly.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 6 Jan 2017 12:27:00 +0000 (13:27 +0100)]
btrfs: remove unused logic of limiting async delalloc pages
A proposed patch in https://marc.info/?l=linux-btrfs&m=
147859791003837
pointed out bad limit threshold in cow_file_range_async, but it turned
out that the whole logic is not necessary and is done by writeback. We
agreed to remove it.
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 19 Dec 2016 11:09:06 +0000 (19:09 +0800)]
btrfs: consolidate auto defrag kick off policies
As of now writes smaller than 64k for non compressed extents and 16k
for compressed extents inside eof are considered as candidate
for auto defrag, put them together at a place.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Wed, 21 Dec 2016 07:42:08 +0000 (15:42 +0800)]
btrfs: btrfs_defrag_root() doesn't defrag extent root tree
Since btrfs_defrag_leaves() does not support extent_root, remove its
corresponding call. The user can use the file based defrag to defrag
extents as of now.
No change in behaviour as extent_root is explicitly skipped in
btrfs_defrag_leaves and this has never worked as expected.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ ehnance changelong ]
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Tue, 13 Dec 2016 19:39:34 +0000 (14:39 -0500)]
btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref
btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Colin Ian King [Tue, 20 Dec 2016 16:18:37 +0000 (16:18 +0000)]
btrfs: remove redundant inode null check
The check for a null inode is redundant since the function
is a callback for exportfs, which will itself crash if
dentry->d_inode or parent->d_inode is NULL. Removing the
null check makes this consistent with other file systems.
Also remove the redundant null dir check too.
Found with static analysis by CoverityScan, CID 1389472
Kudos to Jeff Mahoney for reviewing and explaining the error in
my original patch (most of this explanation went into the above
commit message) and David Sterba for pointing out that the dir
check is also redundant.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Seraphime Kirkovski [Thu, 15 Dec 2016 13:38:16 +0000 (14:38 +0100)]
Btrfs: ACCESS_ONCE cleanup
This replaces ACCESS_ONCE macro with the corresponding
READ|WRITE macros
Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Seraphime Kirkovski [Thu, 15 Dec 2016 13:38:28 +0000 (14:38 +0100)]
Btrfs: code cleanup min/max -> min_t/max_t
This cleans up the cases where the min/max macros were used with a cast
rather than using directly min_t/max_t.
Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Geliang Tang [Mon, 19 Dec 2016 14:53:41 +0000 (22:53 +0800)]
btrfs: use rb_entry() instead of container_of
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Tue, 6 Dec 2016 04:43:07 +0000 (12:43 +0800)]
btrfs: use BTRFS_COMPRESS_NONE to specify no compression
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Michal Hocko [Mon, 9 Jan 2017 14:39:03 +0000 (15:39 +0100)]
btrfs: drop gfp mask tweaking in try_release_extent_state
try_release_extent_state reduces the gfp mask to GFP_NOFS if it is
compatible. This is true for GFP_KERNEL as well. There is no real
reason to do that though. There is no new lock taken down the
the only consumer of the gfp mask which is
try_release_extent_state
clear_extent_bit
__clear_extent_bit
alloc_extent_state
So this seems just unnecessary and confusing.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Michal Hocko [Mon, 9 Jan 2017 14:39:02 +0000 (15:39 +0100)]
btrfs: fix up misleading GFP_NOFS usage in btrfs_releasepage
b335b0034e25 ("Btrfs: Avoid using __GFP_HIGHMEM with slab allocator")
has reduced the allocation mask in btrfs_releasepage to GFP_NOFS just
to prevent from giving an unappropriate gfp mask to the slab allocator
deeper down the callchain (in alloc_extent_state). This is wrong for
two reasons a) GFP_NOFS might be just too restrictive for the calling
context b) it is better to tweak the gfp mask down when it needs that.
So just remove the mask tweaking from btrfs_releasepage and move it
down to alloc_extent_state where it is needed.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Thu, 20 Oct 2016 02:28:41 +0000 (10:28 +0800)]
btrfs: Add WARN_ON for qgroup reserved underflow
Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs
qgroup reserved space, and leads to non-writable fs.
This reminds us that we don't have enough underflow check for qgroup
reserved space.
For underflow case, we should not really underflow the numbers but warn
and keeps qgroup still work.
So add more check on qgroup reserved space and add WARN_ON() and
btrfs_warn() for any underflow case.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 12 Feb 2017 21:03:20 +0000 (13:03 -0800)]
Linux 4.10-rc8
Linus Torvalds [Sat, 11 Feb 2017 18:31:46 +0000 (10:31 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Last minute x86 fixes:
- Fix a softlockup detector warning and long delays if using ptdump
with KASAN enabled.
- Two more TSC-adjust fixes for interesting firmware interactions.
- Two commits to fix an AMD CPU topology enumeration bug that caused
a measurable gaming performance regression"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/ptdump: Fix soft lockup in page table walker
x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable
x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST
x86/CPU/AMD: Fix Zen SMT topology
x86/CPU/AMD: Bring back Compute Unit ID
Linus Torvalds [Sat, 11 Feb 2017 18:24:16 +0000 (10:24 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix a sporadic missed timer hw reprogramming bug that can result in
random delays"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz: Fix possible missing clock reprog after tick soft restart
Linus Torvalds [Sat, 11 Feb 2017 18:20:06 +0000 (10:20 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A kernel crash fix plus three tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix crash in perf_event_read()
perf callchain: Reference count maps
perf diff: Fix -o/--order option behavior (again)
perf diff: Fix segfault on 'perf diff -o N' option
Linus Torvalds [Sat, 11 Feb 2017 18:16:05 +0000 (10:16 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull lockdep fix from Ingo Molnar:
"This fixes an ugly lockdep stack trace output regression. (But also
affects other stacktrace users such as kmemleak, KASAN, etc)"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stacktrace, lockdep: Fix address, newline ugliness
Linus Torvalds [Sat, 11 Feb 2017 18:14:24 +0000 (10:14 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Two last minute ARM irqchip driver fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/keystone: Fix "scheduling while atomic" on rt
Linus Torvalds [Sat, 11 Feb 2017 17:15:58 +0000 (09:15 -0800)]
Merge branch 'for-linus-4.10' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This has two last minute fixes. The highest priority here is a
regression fix for the decompression code, but we also fixed up a
problem with the 32-bit compat ioctls.
The decompression bug could hand back the wrong data on big reads when
zlib was used. I have a larger cleanup to make the math here less
error prone, but at this stage in the release Omar's patch is the best
choice"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix btrfs_decompress_buf2page()
btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
Linus Torvalds [Sat, 11 Feb 2017 17:01:03 +0000 (09:01 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fairly small fixes. None is a real show stopper, two automation
detected problems: one memory leak, one use after free and four others
each of which fixes something that has been a significant source of
annoyance to someone"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send
scsi: aacraid: Fix INTx/MSI-x issue with older controllers
scsi: mpt3sas: disable ASPM for MPI2 controllers
scsi: mpt3sas: Force request partial completion alignment
scsi: qla2xxx: Avoid that issuing a LIP triggers a kernel crash
scsi: qla2xxx: Fix a recently introduced memory leak
Omar Sandoval [Fri, 10 Feb 2017 23:03:35 +0000 (15:03 -0800)]
Btrfs: fix btrfs_decompress_buf2page()
If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.
Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.
A case I saw when testing with zlib was:
buf_start = 42769
total_out = 46865
working_bytes = total_out - buf_start = 4096
start_byte = 45056
The condition (total_out > start_byte && buf_start < start_byte) is
true, so we adjust the offset:
buf_offset = start_byte - buf_start = 2287
working_bytes -= buf_offset = 1809
current_buf_start = buf_start = 42769
Then, we copy
bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
buf_offset += bytes = 4096
working_bytes -= bytes = 0
current_buf_start += bytes = 44578
After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
which is true! So, we adjust the values again:
buf_offset = start_byte - buf_start = 2287
working_bytes = total_out - start_byte = 1809
current_buf_start = buf_start + buf_offset = 45056
But note that working_bytes was already zero before this, so we should
have stopped copying.
Fixes:
974b1adc3b10 ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley <pat-lkml@erley.org>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>
Linus Torvalds [Fri, 10 Feb 2017 22:44:49 +0000 (14:44 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) If the timing is wrong we can indefinitely stop generating new ipv6
temporary addresses, from Marcus Huewe.
2) Don't double free per-cpu stats in ipv6 SIT tunnel driver, from Cong
Wang.
3) Put protections in place so that AF_PACKET is not able to submit
packets which don't even have a link level header to drivers. From
Willem de Bruijn.
4) Fix memory leaks in ipv4 and ipv6 multicast code, from Hangbin Liu.
5) Don't use udp_ioctl() in l2tp code, UDP version expects a UDP socket
and that doesn't go over very well when it is passed an L2TP one.
Fix from Eric Dumazet.
6) Don't crash on NULL pointer in phy_attach_direct(), from Florian
Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
l2tp: do not use udp_ioctl()
xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
NET: mkiss: Fix panic
net: hns: Fix the device being used for dma mapping during TX
net: phy: Initialize mdio clock at probe function
igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()
xen-netfront: Improve error handling during initialization
sierra_net: Skip validating irrelevant fields for IDLE LSIs
sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
kcm: fix 0-length case for kcm_sendmsg()
xen-netfront: Rework the fix for Rx stall during OOM and network stress
net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
net: thunderx: Fix PHY autoneg for SGMII QLM mode
net: dsa: Do not destroy invalid network devices
ping: fix a null pointer dereference
packet: round up linear to header len
net: introduce device min_header_len
sit: fix a double free on error path
lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled
ipv6: addrconf: fix generation of new temporary addresses
Linus Torvalds [Fri, 10 Feb 2017 22:41:16 +0000 (14:41 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Third round of -rc fixes for 4.10 kernel:
- two security related issues in the rxe driver
- one compile issue in the RDMA uapi header"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA: Don't reference kernel private header from UAPI header
IB/rxe: Fix mem_check_range integer overflow
IB/rxe: Fix resid update
Linus Torvalds [Fri, 10 Feb 2017 22:39:08 +0000 (14:39 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c bugfixes from Wolfram Sang:
"Two bugfixes (proper IO mapping and use of mutex) for a driver feature
we introduced in this cycle"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: piix4: Request the SMBUS semaphore inside the mutex
i2c: piix4: Fix request_region size
Linus Torvalds [Fri, 10 Feb 2017 22:35:22 +0000 (14:35 -0800)]
Merge tag 'mmc-v4.10-rc7' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fix from Ulf Hansson:
"mmci: Fix hang while waiting for busy-end interrupt"
* tag 'mmc-v4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: mmci: avoid clearing ST Micro busy end interrupt mistakenly
Linus Torvalds [Fri, 10 Feb 2017 22:29:30 +0000 (14:29 -0800)]
Merge tag 'sound-4.10' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here are some last-minute fixes: two fixes for races in ALSA sequencer
queue spotted by syzkaller, a revert for a regression of LINE6 driver
(since 4.9), and a trivial new codec ID addition for Nvidia HDMI"
* tag 'sound-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - adding a new NV HDMI/DP codec ID in the driver
ALSA: seq: Fix race at creating a queue
Revert "ALSA: line6: Only determine control port properties if needed"
ALSA: seq: Don't handle loop timeout at snd_seq_pool_done()
Linus Torvalds [Fri, 10 Feb 2017 22:23:45 +0000 (14:23 -0800)]
Merge tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux
Pull nfsd revert from Bruce Fields:
"This patch turned out to have a couple problems. The problems are
fixable, but at least one of the fixes is a little ugly. The original
bug has always been there, so we can wait another week or two to get
this right"
* tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux:
nfsd: Revert "nfsd: special case truncates some more"
Linus Torvalds [Fri, 10 Feb 2017 22:10:35 +0000 (14:10 -0800)]
Merge tag 'powerpc-4.10-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes friom Michael Ellerman:
"Apologies for the late pull request, but Ben has been busy finding bugs.
- Userspace was semi-randomly segfaulting on radix due to us
incorrectly handling a fault triggered by autonuma, caused by a
patch we merged earlier in v4.10 to prevent the kernel executing
userspace.
- We weren't marking host IPIs properly for KVM in the OPAL ICP
backend.
- The ERAT flushing on radix was missing an isync and was incorrectly
marked as DD1 only.
- The powernv CPU hotplug code was missing a wakeup type and failing
to flush the interrupt correctly when using OPAL ICP
Thanks to Benjamin Herrenschmidt"
* tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Properly set "host-ipi" on IPIs
powerpc/powernv: Fix CPU hotplug to handle waking on HVI
powerpc/mm/radix: Update ERAT flushes when invalidating TLB
powerpc/mm: Fix spurrious segfaults on radix with autonuma
Eric Dumazet [Fri, 10 Feb 2017 00:15:52 +0000 (16:15 -0800)]
l2tp: do not use udp_ioctl()
udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(
L2TP should use its own handler, because it really does not
look the same.
SIOCINQ for instance should not assume UDP checksum or headers.
Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.
While crashes only happen on recent kernels (after commit
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.
Fixes:
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
Fixes:
85584672012e ("udp: Fix udp_poll() and ioctl()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Mason [Fri, 10 Feb 2017 20:53:18 +0000 (12:53 -0800)]
Merge branch 'for-chris' of git://git./linux/kernel/git/kdave/linux into for-linus-4.10
Boris Ostrovsky [Mon, 30 Jan 2017 17:45:46 +0000 (12:45 -0500)]
xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
rx_refill_timer should be deleted as soon as we disconnect from the
backend since otherwise it is possible for the timer to go off before
we get to xennet_destroy_queues(). If this happens we may dereference
queue->rx.sring which is set to NULL in xennet_disconnect_backend().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Thu, 9 Feb 2017 13:12:11 +0000 (14:12 +0100)]
NET: mkiss: Fix panic
If a USB-to-serial adapter is unplugged, the driver re-initializes, with
dev->hard_header_len and dev->addr_len set to zero, instead of the correct
values. If then a packet is sent through the half-dead interface, the
kernel will panic due to running out of headroom in the skb when pushing
for the AX.25 headers resulting in this panic:
[<
c0595468>] (skb_panic) from [<
c0401f70>] (skb_push+0x4c/0x50)
[<
c0401f70>] (skb_push) from [<
bf0bdad4>] (ax25_hard_header+0x34/0xf4 [ax25])
[<
bf0bdad4>] (ax25_hard_header [ax25]) from [<
bf0d05d4>] (ax_header+0x38/0x40 [mkiss])
[<
bf0d05d4>] (ax_header [mkiss]) from [<
c041b584>] (neigh_compat_output+0x8c/0xd8)
[<
c041b584>] (neigh_compat_output) from [<
c043e7a8>] (ip_finish_output+0x2a0/0x914)
[<
c043e7a8>] (ip_finish_output) from [<
c043f948>] (ip_output+0xd8/0xf0)
[<
c043f948>] (ip_output) from [<
c043f04c>] (ip_local_out_sk+0x44/0x48)
This patch makes mkiss behave like the 6pack driver. 6pack does not
panic. In 6pack.c sp_setup() (same function name here) the values for
dev->hard_header_len and dev->addr_len are set to the same values as in
my mkiss patch.
[ralf@linux-mips.org: Massages original submission to conform to the usual
standards for patch submissions.]
Signed-off-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kejian Yan [Thu, 9 Feb 2017 11:46:15 +0000 (11:46 +0000)]
net: hns: Fix the device being used for dma mapping during TX
This patch fixes the device being used to DMA map skb->data.
Erroneous device assignment causes the crash when SMMU is enabled.
This happens during TX since buffer gets DMA mapped with device
correspondign to net_device and gets unmapped using the device
related to DSAF.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Fri, 10 Feb 2017 13:44:01 +0000 (14:44 +0100)]
Merge tag 'irqchip-fixes-4.10' of git://git.infradead.org/users/jcooper/linux into irq/urgent
Pull irqchip fixes for v4.10 from Jason Cooper
- keystone: Fix scheduling while atomic for realtime
- mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
Andrey Ryabinin [Fri, 10 Feb 2017 09:54:05 +0000 (12:54 +0300)]
x86/mm/ptdump: Fix soft lockup in page table walker
CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow.
In that case ptdump_walk_pgd_level_core() takes a lot of time to
walk across all page tables and doing this without
a rescheduling causes soft lockups:
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1]
...
Call Trace:
ptdump_walk_pgd_level_core+0x40c/0x550
ptdump_walk_pgd_level_checkwx+0x17/0x20
mark_rodata_ro+0x13b/0x150
kernel_init+0x2f/0x120
ret_from_fork+0x2c/0x40
I guess that this issue might arise even without KASAN on huge machines
with several terabytes of RAM.
Stick cond_resched() in pgd loop to fix this.
Reported-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Thu, 9 Feb 2017 15:08:42 +0000 (16:08 +0100)]
x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable
When the TSC is marked reliable then the synchronization check is skipped,
but that also skips the TSC ADJUST sanitizing code. So on a machine with a
wreckaged BIOS the TSC deviation between CPUs might go unnoticed.
Let the TSC adjust sanitizing code run unconditionally and just skip the
expensive synchronization checks when TSC is marked reliable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Olof Johansson <olof@lixom.net>
Link: http://lkml.kernel.org/r/20170209151231.491189912@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Thu, 9 Feb 2017 15:08:41 +0000 (16:08 +0100)]
x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST
Olof reported that on a machine which has a BIOS wreckaged TSC the
timestamps in dmesg are making a large jump because the TSC value is
jumping forward after resetting the TSC ADJUST register to a sane value.
This can be avoided by calling the TSC ADJUST saniziting function before
initializing the per cpu sched clock machinery. That takes the offset into
account and avoid the time jump.
What cannot be avoided is that the 'Firmware Bug' warnings on the secondary
CPUs are printed with the large time offsets because it would be too much
effort and ugly hackery to print those warnings into a buffer and emit them
after the adjustemt on the starting CPUs. It's a firmware bug and should be
fixed in firmware. The weird timestamps are collateral damage and just
illustrate the sillyness of the BIOS folks:
[ 0.397445] smp: Bringing up secondary CPUs ...
[ 0.402100] x86: Booting SMP configuration:
[ 0.406343] .... node #0, CPUs: #1
[
1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -
2978888639075328 CPU1: -
2978888639183101
[
1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -
2978888639183101
[ 0.508119] #2
[
1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -
2978888639075328 CPU2: -
2978888639183677
[
1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -
2978888639183677
[ 0.607643] #3
[
1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -
2978888639075328 CPU3: -
2978888639184530
[
1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -
2978888639184530
[ 0.707108] smp: Brought up 1 node, 4 CPUs
[ 0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS)
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170209151231.411460506@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Frederic Weisbecker [Tue, 7 Feb 2017 16:44:54 +0000 (17:44 +0100)]
tick/nohz: Fix possible missing clock reprog after tick soft restart
ts->next_tick keeps track of the next tick deadline in order to optimize
clock programmation on irq exit and avoid redundant clock device writes.
Now if ts->next_tick missed an update, we may spuriously miss a clock
reprog later as the nohz code is fooled by an obsolete next_tick value.
This is what happens here on a specific path: when we observe an
expired timer from the nohz update code on irq exit, we perform a soft
tick restart which simply fires the closest possible tick without
actually exiting the nohz mode and restoring a periodic state. But we
forget to update ts->next_tick accordingly.
As a result, after the next tick resulting from such soft tick restart,
the nohz code sees a stale value on ts->next_tick which doesn't match
the clock deadline that just expired. If that obsolete ts->next_tick
value happens to collide with the actual next tick deadline to be
scheduled, we may spuriously bypass the clock reprogramming. In the
worst case, the tick may never fire again.
Fix this with a ts->next_tick reset on soft tick restart.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1486485894-29173-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Tue, 31 Jan 2017 10:27:10 +0000 (11:27 +0100)]
perf/core: Fix crash in perf_event_read()
Alexei had his box explode because doing read() on a package
(rapl/uncore) event that isn't currently scheduled in ends up doing an
out-of-bounds load.
Rework the code to more explicitly deal with event->oncpu being -1.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Fixes:
d6a2f9035bfc ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG")
Link: http://lkml.kernel.org/r/20170131102710.GL6515@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
James Bottomley [Fri, 10 Feb 2017 05:00:46 +0000 (21:00 -0800)]
Merge remote-tracking branch 'mkp-scsi/4.10/scsi-fixes' into fixes
J. Bruce Fields [Thu, 9 Feb 2017 19:20:42 +0000 (14:20 -0500)]
nfsd: Revert "nfsd: special case truncates some more"
This patch incorrectly attempted nested mnt_want_write, and incorrectly
disabled nfsd's owner override for truncate. We'll fix those problems
and make another attempt soon, for the moment I think the safest is to
revert.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus Torvalds [Fri, 10 Feb 2017 01:46:30 +0000 (17:46 -0800)]
Merge tag 'drm-fixes-for-v4.10-rc8' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This should be the final set of drm fixes for 4.10: one vmwgfx boot
fix, one vc4 fix, and a few i915 fixes:
* tag 'drm-fixes-for-v4.10-rc8' of git://people.freedesktop.org/~airlied/linux:
drm: vc4: adapt to new behaviour of drm_crtc.c
drm/i915: Always convert incoming exec offsets to non-canonical
drm/i915: Remove overzealous fence warn on runtime suspend
drm/i915/bxt: Add MST support when do DPLL calculation
drm/i915: don't warn about Skylake CPU - KabyPoint PCH combo
drm/i915: fix i915 running as dom0 under Xen
drm/i915: Flush untouched framebuffers before display on !llc
drm/i915: fix use-after-free in page_flip_completed()
drm/vmwgfx: Fix depth input into drm_mode_legacy_fb_format
Steffen Maier [Wed, 8 Feb 2017 14:34:22 +0000 (15:34 +0100)]
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send
Dan Carpenter kindly reported:
<quote>
The patch
d27a7cb91960: "zfcp: trace on request for open and close of
WKA port" from Aug 10, 2016, leads to the following static checker
warning:
drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port()
warn: 'req' was already freed.
drivers/s390/scsi/zfcp_fsf.c
1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
1610 retval = zfcp_fsf_req_send(req);
1611 if (retval)
1612 zfcp_fsf_req_free(req);
^^^
Freed.
1613 out:
1614 spin_unlock_irq(&qdio->req_q_lock);
1615 if (req && !IS_ERR(req))
1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
^^^^^^^^^^^
Use after free.
1617 return retval;
1618 }
Same thing for zfcp_fsf_close_wka_port() as well.
</quote>
Rather than relying on req being NULL (or ERR_PTR) for all cases where
we don't want to trace or should not trace,
simply check retval which is unconditionally initialized with -EIO != 0
and it can only become 0 on successful retval = zfcp_fsf_req_send(req).
With that we can also remove the then again unnecessary unconditional
initialization of req which was introduced with that earlier commit.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes:
d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dave Carroll [Thu, 9 Feb 2017 18:04:47 +0000 (11:04 -0700)]
scsi: aacraid: Fix INTx/MSI-x issue with older controllers
commit
78cbccd3bd68 ("aacraid: Fix for KDUMP driver hang")
caused a problem on older controllers which do not support MSI-x (namely
ASR3405,ASR3805). This patch conditionalizes the previous patch to
controllers which support MSI-x
Cc: <stable@vger.kernel.org> # v4.7+
Fixes:
78cbccd3bd68 ("aacraid: Fix for KDUMP driver hang")
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dave Airlie [Fri, 10 Feb 2017 00:14:24 +0000 (10:14 +1000)]
Merge tag 'drm-intel-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
Hopefully final fixes for v4.10, about half of them stable material.
* tag 'drm-intel-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Always convert incoming exec offsets to non-canonical
drm/i915: Remove overzealous fence warn on runtime suspend
drm/i915/bxt: Add MST support when do DPLL calculation
drm/i915: don't warn about Skylake CPU - KabyPoint PCH combo
drm/i915: fix i915 running as dom0 under Xen
drm/i915: Flush untouched framebuffers before display on !llc
drm/i915: fix use-after-free in page_flip_completed()
Dave Airlie [Fri, 10 Feb 2017 00:14:01 +0000 (10:14 +1000)]
Merge tag 'drm-misc-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
Last-minute vc4 fix for 4.10.
* tag 'drm-misc-fixes-2017-02-09' of git://anongit.freedesktop.org/git/drm-misc:
drm: vc4: adapt to new behaviour of drm_crtc.c
ojab [Wed, 28 Dec 2016 11:05:24 +0000 (11:05 +0000)]
scsi: mpt3sas: disable ASPM for MPI2 controllers
MPI2 controllers sometimes got lost (i.e. disappear from
/sys/bus/pci/devices) if ASMP is enabled.
Signed-off-by: Slava Kardakov <ojab@ojab.ru>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=60644
Cc: <stable@vger.kernel.org>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Yendapally Reddy Dhananjaya Reddy [Wed, 8 Feb 2017 22:14:26 +0000 (17:14 -0500)]
net: phy: Initialize mdio clock at probe function
USB PHYs need the MDIO clock divisor enabled earlier to work.
Initialize mdio clock divisor in probe function. The ext bus
bit available in the same register will be used by mdio mux
to enable external mdio.
Signed-off-by: Yendapally Reddy Dhananjaya Reddy <yendapally.reddy@broadcom.com>
Fixes:
ddc24ae1 ("net: phy: Broadcom iProc MDIO bus driver")
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 8 Feb 2017 13:16:45 +0000 (21:16 +0800)]
igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()
In function igmpv3/mld_add_delrec() we allocate pmc and put it in
idev->mc_tomb, so we should free it when we don't need it in del_delrec().
But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.
Fixes:
24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
Fixes:
1666d49e1d41 ("mld: do not remove mld souce list info when ...")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ross Lagerwall [Wed, 8 Feb 2017 10:57:37 +0000 (10:57 +0000)]
xen-netfront: Improve error handling during initialization
This fixes a crash when running out of grant refs when creating many
queues across many netdevs.
* If creating queues fails (i.e. there are no grant refs available),
call xenbus_dev_fatal() to ensure that the xenbus device is set to the
closed state.
* If no queues are created, don't call xennet_disconnect_backend as
netdev->real_num_tx_queues will not have been set correctly.
* If setup_netfront() fails, ensure that all the queues created are
cleaned up, not just those that have been set up.
* If any queues were set up and an error occurs, call
xennet_destroy_queues() to clean up the napi context.
* If any fatal error occurs, unregister and destroy the netdev to avoid
leaving around a half setup network device.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Feb 2017 21:41:43 +0000 (16:41 -0500)]
Merge branch 'sierra_net-fixes'
Stefan Brüns says:
====================
Fixes for sierra_net driver
When trying to initiate a dual-stack (ipv4v6) connection, a MC7710, FW
version SWI9200X_03.05.24.00ap answers with an unsupported LSI. Add support
for this LSI.
Also the link_type should be ignored when going idle, otherwise the modem
is stuck in a bad link state.
Tested on MC7710, T-Mobile DE, APN internet.telekom, IPv4v6 PDP type. Both
IPv4 and IPv6 connections work.
v2: Do not overwrite protocol field in rx_fixup
v3: Remove leftover struct ethhdr *eth declaration
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Brüns [Wed, 8 Feb 2017 01:46:33 +0000 (02:46 +0100)]
sierra_net: Skip validating irrelevant fields for IDLE LSIs
When the context is deactivated, the link_type is set to 0xff, which
triggers a warning message, and results in a wrong link status, as
the LSI is ignored.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>