review.tizen.org Git - platform/kernel/linux-exynos.git/log

projects / platform / kernel / linux-exynos.git / log

Qu Wenruo [Tue, 15 Dec 2015 01:14:36 +0000 (09:14 +0800)]

btrfs: Enhance super validation check

Enhance btrfs_check_super_valid() function by the following points:
1) Restrict sector/node size check
   Not the old max/min valid check, but also check if it's a power of 2.
   So some bogus number like 12K node size won't pass now.

2) Super flag check
   For now, there is still some inconsistency between kernel and
   btrfs-progs super flags.
   And considering btrfs-progs may add new flags for super block, this
   check will only output warning.

3) Better root alignment check
   Now root bytenr is checked against sector size.

4) Move some check into btrfs_check_super_valid().
   Like node size vs leaf size check, and PAGESIZE vs sectorsize check.
   And magic number check.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Filipe Manana [Fri, 15 Jan 2016 11:05:12 +0000 (11:05 +0000)]

Btrfs: fix deadlock running delayed iputs at transaction commit time

While running a stress test I ran into a deadlock when running the delayed
iputs at transaction time, which produced the following report and trace:

[  886.399989] =============================================
[  886.400871] [ INFO: possible recursive locking detected ]
[  886.401663] 4.4.0-rc6-btrfs-next-18+ #1 Not tainted
[  886.402384] ---------------------------------------------
[  886.403182] fio/8277 is trying to acquire lock:
[  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.403568]
[  886.403568] but task is already holding lock:
[  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.403568]
[  886.403568] other info that might help us debug this:
[  886.403568]  Possible unsafe locking scenario:
[  886.403568]
[  886.403568]        CPU0
[  886.403568]        ----
[  886.403568]   lock(&fs_info->delayed_iput_sem);
[  886.403568]   lock(&fs_info->delayed_iput_sem);
[  886.403568]
[  886.403568]  *** DEADLOCK ***
[  886.403568]
[  886.403568]  May be due to missing lock nesting notation
[  886.403568]
[  886.403568] 3 locks held by fio/8277:
[  886.403568]  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff81174c4c>] __sb_start_write+0x5f/0xb0
[  886.403568]  #1:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa054620d>] btrfs_file_write_iter+0x73/0x408 [btrfs]
[  886.403568]  #2:  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.403568]
[  886.403568] stack backtrace:
[  886.403568] CPU: 6 PID: 8277 Comm: fio Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[  886.403568] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[  886.403568]  0000000000000000 ffff88009f80f770 ffffffff8125d4fd ffffffff82af1fc0
[  886.403568]  ffff88009f80f830 ffffffff8108e5f9 0000000200000000 ffff88009fd92290
[  886.403568]  0000000000000000 ffffffff82af1fc0 ffffffff829cfb01 00042b216d008804
[  886.403568] Call Trace:
[  886.403568]  [<ffffffff8125d4fd>] dump_stack+0x4e/0x79
[  886.403568]  [<ffffffff8108e5f9>] __lock_acquire+0xd42/0xf0b
[  886.403568]  [<ffffffff810c22db>] ? __module_address+0xdf/0x108
[  886.403568]  [<ffffffff8108eb77>] lock_acquire+0x10d/0x194
[  886.403568]  [<ffffffff8108eb77>] ? lock_acquire+0x10d/0x194
[  886.403568]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.489542]  [<ffffffff8148556b>] down_read+0x3e/0x4d
[  886.489542]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.489542]  [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
[  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
[  886.489542]  [<ffffffffa0521d7a>] flush_space+0x435/0x44a [btrfs]
[  886.489542]  [<ffffffffa052218b>] ? reserve_metadata_bytes+0x26a/0x384 [btrfs]
[  886.489542]  [<ffffffffa05221ae>] reserve_metadata_bytes+0x28d/0x384 [btrfs]
[  886.489542]  [<ffffffffa052256c>] ? btrfs_block_rsv_refill+0x58/0x96 [btrfs]
[  886.489542]  [<ffffffffa0522584>] btrfs_block_rsv_refill+0x70/0x96 [btrfs]
[  886.489542]  [<ffffffffa053d747>] btrfs_evict_inode+0x394/0x55a [btrfs]
[  886.489542]  [<ffffffff81188e31>] evict+0xa7/0x15c
[  886.489542]  [<ffffffff81189878>] iput+0x1d3/0x266
[  886.489542]  [<ffffffffa053887c>] btrfs_run_delayed_iputs+0x8f/0xbf [btrfs]
[  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
[  886.489542]  [<ffffffff81085096>] ? signal_pending_state+0x31/0x31
[  886.489542]  [<ffffffffa0521191>] btrfs_alloc_data_chunk_ondemand+0x1d7/0x288 [btrfs]
[  886.489542]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
[  886.489542]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
[  886.489542]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
[  886.489542]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
[  886.489542]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
[  886.489542]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
[  886.489542]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
[  886.489542]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
[  886.489542]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
[  886.489542]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[ 1081.852335] INFO: task fio:8244 blocked for more than 120 seconds.
[ 1081.854348]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[ 1081.857560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.863227] fio        D ffff880213f9bb28     0  8244   8240 0x00000000
[ 1081.868719]  ffff880213f9bb28 00ffffff810fc6b0 ffffffff0000000a ffff88023ed55240
[ 1081.872499]  ffff880206b5d400 ffff880213f9c000 ffff88020a4d5318 ffff880206b5d400
[ 1081.876834]  ffffffff00000001 ffff880206b5d400 ffff880213f9bb40 ffffffff81482ba4
[ 1081.880782] Call Trace:
[ 1081.881793]  [<ffffffff81482ba4>] schedule+0x7f/0x97
[ 1081.883340]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
[ 1081.895525]  [<ffffffff8108d48d>] ? trace_hardirqs_on_caller+0x16/0x1ab
[ 1081.897419]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
[ 1081.899251]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
[ 1081.901063]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
[ 1081.902365]  [<ffffffff814855bd>] down_write+0x43/0x57
[ 1081.903846]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1081.906078]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1081.908846]  [<ffffffff8108d461>] ? mark_held_locks+0x56/0x6c
[ 1081.910409]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
[ 1081.912482]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
[ 1081.914597]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
[ 1081.919037]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
[ 1081.920754]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
[ 1081.922496]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
[ 1081.923922]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
[ 1081.925275]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
[ 1081.926584]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
[ 1081.927968]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[ 1081.985293] INFO: lockdep is turned off.
[ 1081.986132] INFO: task fio:8249 blocked for more than 120 seconds.
[ 1081.987434]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[ 1081.988534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.990147] fio        D ffff880218febbb8     0  8249   8240 0x00000000
[ 1081.991626]  ffff880218febbb8 00ffffff81486b8e ffff88020000000b ffff88023ed75240
[ 1081.993258]  ffff8802120a9a00 ffff880218fec000 ffff88020a4d5318 ffff8802120a9a00
[ 1081.994850]  ffffffff00000001 ffff8802120a9a00 ffff880218febbd0 ffffffff81482ba4
[ 1081.996485] Call Trace:
[ 1081.997037]  [<ffffffff81482ba4>] schedule+0x7f/0x97
[ 1081.998017]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
[ 1081.999241]  [<ffffffff810852a5>] ? finish_wait+0x6d/0x76
[ 1082.000306]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
[ 1082.001533]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
[ 1082.002776]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
[ 1082.003995]  [<ffffffff814855bd>] down_write+0x43/0x57
[ 1082.005000]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1082.007403]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
[ 1082.008988]  [<ffffffffa0545064>] btrfs_fallocate+0x7c1/0xc2f [btrfs]
[ 1082.010193]  [<ffffffff8108a1ba>] ? percpu_down_read+0x4e/0x77
[ 1082.011280]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
[ 1082.012265]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
[ 1082.013021]  [<ffffffff811712e4>] vfs_fallocate+0x170/0x1ff
[ 1082.013738]  [<ffffffff81181ebb>] ioctl_preallocate+0x89/0x9b
[ 1082.014778]  [<ffffffff811822d7>] do_vfs_ioctl+0x40a/0x4ea
[ 1082.015778]  [<ffffffff81176ea7>] ? SYSC_newfstat+0x25/0x2e
[ 1082.016806]  [<ffffffff8118b4de>] ? __fget_light+0x4d/0x71
[ 1082.017789]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[ 1082.018706]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f

This happens because we can recursively acquire the semaphore
fs_info->delayed_iput_sem when attempting to allocate space to satisfy
a file write request as shown in the first trace above - when committing
a transaction we acquire (down_read) the semaphore before running the
delayed iputs, and when running a delayed iput() we can end up calling
an inode's eviction handler, which in turn commits another transaction
and attempts to acquire (down_read) again the semaphore to run more
delayed iput operations.
This results in a deadlock because if a task acquires multiple times a
semaphore it should invoke down_read_nested() with a different lockdep
class for each level of recursion.

Fix this by simplifying the implementation and use a mutex instead that
is acquired by the cleaner kthread before it runs the delayed iputs
instead of always acquiring a semaphore before delayed references are
run from anywhere.

Fixes: d7c151717a1e (btrfs: Fix NO_SPACE bug caused by delayed-iput)
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Filipe Manana [Fri, 15 Jan 2016 10:56:15 +0000 (10:56 +0000)]

Btrfs: fix typo in log message when starting a balance

The recent change titled "Btrfs: Check metadata redundancy on balance"
(already in linux-next) left a typo in a message for users:
metatdata -> metadata.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Chris Mason [Wed, 20 Jan 2016 02:21:30 +0000 (18:21 -0800)]

Merge branch 'misc-for-4.5' of git://git./linux/kernel/git/kdave/linux into for-linus-4.5

commit | commitdiff | tree

Chris Mason [Wed, 20 Jan 2016 02:21:00 +0000 (18:21 -0800)]

Merge branch 'misc-cleanups-4.5' of git://git./linux/kernel/git/kdave/linux into for-linus-4.5

commit | commitdiff | tree

Colin Ian King [Tue, 19 Jan 2016 00:05:28 +0000 (00:05 +0000)]

btrfs: remove duplicate const specifier

duplicate const is redundant so remove it

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Sebastian Andrzej Siewior [Fri, 15 Jan 2016 13:37:15 +0000 (14:37 +0100)]

btrfs: initialize the seq counter in struct btrfs_device

I managed to trigger this:
| INFO: trying to register non-static key.
| the code is fine but needs lockdep annotation.
| turning off the locking correctness validator.
| CPU: 1 PID: 781 Comm: systemd-gpt-aut Not tainted 4.4.0-rt2+ #14
| Hardware name: ARM-Versatile Express
| [<80307cec>] (dump_stack)
| [<80070e98>] (__lock_acquire)
| [<8007184c>] (lock_acquire)
| [<80287800>] (btrfs_ioctl)
| [<8012a8d4>] (do_vfs_ioctl)
| [<8012ac14>] (SyS_ioctl)

so I think that btrfs_device_data_ordered_init() is not invoked behind
a macro somewhere.

Fixes: 7cc8e58d53cd ("Btrfs: fix unprotected device's variants on 32bits machine")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Dan Carpenter [Wed, 13 Jan 2016 12:21:17 +0000 (15:21 +0300)]

Btrfs: clean up an error code in btrfs_init_space_info()

If we return 1 here, then the caller treats it as an error and returns
-EINVAL. It causes a static checker warning to treat positive returns
as an error.

Fixes: 1aba86d67f34 ('Btrfs: fix easily get into ENOSPC in mixed case')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Geliang Tang [Wed, 13 Jan 2016 14:08:01 +0000 (22:08 +0800)]

btrfs: fix iterator with update error in backref.c

Fix the following error:

fs/btrfs/backref.c:565:1-20: iterator with update on line 577

Fixes: a7ca422('btrfs: use list_for_each_entry* in backref.c')
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Tsutomu Itoh [Wed, 6 Jan 2016 08:03:40 +0000 (17:03 +0900)]

Btrfs: fix output of compression message in btrfs_parse_options()

The compression message might not be correctly output.
Fix it.

[[before fix]]

# mount -o compress /dev/sdb3 /test3
[  996.874264] BTRFS info (device sdb3): disk space caching is enabled
[  996.874268] BTRFS: has skinny extents
# mount | grep /test3
/dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

# mount -o remount,compress-force /dev/sdb3 /test3
[ 1035.075017] BTRFS info (device sdb3): force zlib compression
[ 1035.075021] BTRFS info (device sdb3): disk space caching is enabled
# mount | grep /test3
/dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)

# mount -o remount,compress /dev/sdb3 /test3
[ 1053.679092] BTRFS info (device sdb3): disk space caching is enabled
# mount | grep /test3
/dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

[[after fix]]

# mount -o compress /dev/sdb3 /test3
[  401.021753] BTRFS info (device sdb3): use zlib compression
[  401.021758] BTRFS info (device sdb3): disk space caching is enabled
[  401.021760] BTRFS: has skinny extents
# mount | grep /test3
/dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

# mount -o remount,compress-force /dev/sdb3 /test3
[  439.824624] BTRFS info (device sdb3): force zlib compression
[  439.824629] BTRFS info (device sdb3): disk space caching is enabled
# mount | grep /test3
/dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)

# mount -o remount,compress /dev/sdb3 /test3
[  459.918430] BTRFS info (device sdb3): use zlib compression
[  459.918434] BTRFS info (device sdb3): disk space caching is enabled
# mount | grep /test3
/dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Chandan Rajendra [Thu, 7 Jan 2016 13:26:59 +0000 (18:56 +0530)]

Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots

The following call trace is seen when btrfs/031 test is executed in a loop,

[  158.661848] ------------[ cut here ]------------
[  158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
[  158.664102] BTRFS: Transaction aborted (error -2)
[  158.664774] Modules linked in:
[  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711a #2
[  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  158.667392]  ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
[  158.668515]  ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
[  158.669647]  ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
[  158.670769] Call Trace:
[  158.671153]  [<ffffffff81431fc8>] dump_stack+0x44/0x5c
[  158.671884]  [<ffffffff81051aa1>] warn_slowpath_common+0x81/0xc0
[  158.672769]  [<ffffffff81051b27>] warn_slowpath_fmt+0x47/0x50
[  158.673620]  [<ffffffff813bc98d>] create_subvol+0x3d1/0x6ea
[  158.674440]  [<ffffffff813777c9>] btrfs_mksubvol.isra.30+0x369/0x520
[  158.675376]  [<ffffffff8108a4aa>] ? percpu_down_read+0x1a/0x50
[  158.676235]  [<ffffffff81377a81>] btrfs_ioctl_snap_create_transid+0x101/0x180
[  158.677268]  [<ffffffff81377b52>] btrfs_ioctl_snap_create+0x52/0x70
[  158.678183]  [<ffffffff8137afb4>] btrfs_ioctl+0x474/0x2f90
[  158.678975]  [<ffffffff81144b8e>] ? vma_merge+0xee/0x300
[  158.679751]  [<ffffffff8115be31>] ? alloc_pages_vma+0x91/0x170
[  158.680599]  [<ffffffff81123f62>] ? lru_cache_add_active_or_unevictable+0x22/0x70
[  158.681686]  [<ffffffff813d99cf>] ? selinux_file_ioctl+0xff/0x1d0
[  158.682581]  [<ffffffff8117b791>] do_vfs_ioctl+0x2c1/0x490
[  158.683399]  [<ffffffff813d3cde>] ? security_file_ioctl+0x3e/0x60
[  158.684297]  [<ffffffff8117b9d4>] SyS_ioctl+0x74/0x80
[  158.685051]  [<ffffffff819b2bd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
[  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
[  158.709508] BTRFS info (device loop0): forced readonly
[  158.737113] BTRFS info (device loop0): disk space caching is enabled
[  158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
[  158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30

This occurs because,

Mount filesystem
Create subvol with ID 257
Unmount filesystem
Mount filesystem
Delete subvol with ID 257
  btrfs_drop_snapshot()
    Add root corresponding to subvol 257 into
    btrfs_transaction->dropped_roots list
Create new subvol (i.e. create_subvol())
  257 is returned as the next free objectid
  btrfs_read_fs_root_no_name()
    Finds the btrfs_root instance corresponding to the old subvol with ID 257
    in btrfs_fs_info->fs_roots_radix.
    Returns error since btrfs_root_item->refs has the value of 0.

To fix the issue the commit initializes tree root's and subvolume root's
highest_objectid when loading the roots from disk.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Jeff Mahoney [Wed, 3 Jun 2015 14:55:48 +0000 (10:55 -0400)]

btrfs: cleanup, stop casting for extent_map->lookup everywhere

Overloading extent_map->bdev to struct map_lookup * might have started out
as a means to an end, but it's a pattern that's used all over the place
now. Let's get rid of the casting and just add a union instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Chris Mason [Mon, 11 Jan 2016 16:39:28 +0000 (08:39 -0800)]

Merge branch 'for-chris-4.5' of git://git./linux/kernel/git/fdmanana/linux into for-linus-4.5

Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Chris Mason [Mon, 11 Jan 2016 14:08:37 +0000 (06:08 -0800)]

Merge branch 'misc-cleanups-4.5' of git://git./linux/kernel/git/kdave/linux into for-linus-4.5

Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Chris Mason [Mon, 11 Jan 2016 13:59:32 +0000 (05:59 -0800)]

Merge branch 'misc-for-4.5' of git://git./linux/kernel/git/kdave/linux into for-linus-4.5

commit | commitdiff | tree

Filipe Manana [Wed, 6 Jan 2016 22:42:35 +0000 (22:42 +0000)]

Btrfs: fix fitrim discarding device area reserved for boot loader's use

As of the 4.3 kernel release, the fitrim ioctl can now discard any region
of a disk that is not allocated to any chunk/block group, including the
first megabyte which is used for our primary superblock and by the boot
loader (grub for example).

Fix this by not allowing to trim/discard any region in the device starting
with an offset not greater than min(alloc_start_mount_option, 1Mb), just
as it was not possible before 4.3.

A reproducer test case for xfstests follows.

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1 # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      cd /
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1

  # Write to the [0, 64Kb[ and [68Kb, 1Mb[ ranges of the device. These ranges are
  # reserved for a boot loader to use (GRUB for example) and btrfs should never
  # use them - neither for allocating metadata/data nor should trim/discard them.
  # The range [64Kb, 68Kb[ is used for the primary superblock of the filesystem.
  $XFS_IO_PROG -c "pwrite -S 0xfd 0 64K" $SCRATCH_DEV | _filter_xfs_io
  $XFS_IO_PROG -c "pwrite -S 0xfd 68K 956K" $SCRATCH_DEV | _filter_xfs_io

  # Now mount the filesystem and perform a fitrim against it.
  _scratch_mount
  _require_batched_discard $SCRATCH_MNT
  $FSTRIM_PROG $SCRATCH_MNT

  # Now unmount the filesystem and verify the content of the ranges was not
  # modified (no trim/discard happened on them).
  _scratch_unmount
  echo "Content of the ranges [0, 64Kb] and [68Kb, 1Mb[ after fitrim:"
  od -t x1 -N $((64 * 1024)) $SCRATCH_DEV
  od -t x1 -j $((68 * 1024)) -N $((956 * 1024)) $SCRATCH_DEV

  status=0
  exit

Reported-by: Vincent Petry <PVince81@yahoo.fr>
Reported-by: Andrei Borzenkov <arvidjaar@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109341
Fixes: 499f377f49f0 (btrfs: iterate over unused chunk space in FITRIM)
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Filipe Manana <fdmanana@suse.com>

commit | commitdiff | tree

Sam Tygier [Wed, 6 Jan 2016 08:46:12 +0000 (08:46 +0000)]

Btrfs: Check metadata redundancy on balance

When converting a filesystem via balance check that metadata mode
is at least as redundant as the data mode. For example give warning
when:
-dconvert=raid1 -mconvert=single

Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk>
[ minor message reformatting ]
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Sat, 10 Oct 2015 15:59:53 +0000 (17:59 +0200)]

btrfs: statfs: report zero available if metadata are exhausted

There is one ENOSPC case that's very confusing. There's Available
greater than zero but no file operation succeds (besides removing
files). This happens when the metadata are exhausted and there's no
possibility to allocate another chunk.

In this scenario it's normal that there's still some space in the data
chunk and the calculation in df reflects that in the Avail value.

To at least give some clue about the ENOSPC situation, let statfs report
zero value in Avail, even if there's still data space available.

Current:
/dev/sdb1 4.0G 3.3G 719M 83% /mnt/test

New:
/dev/sdb1 4.0G 3.3G 0 100% /mnt/test

We calculate the remaining metadata space minus global reserve. If this
is (supposedly) smaller than zero, there's no space. But this does not
hold in practice, the exhausted state happens where's still some
positive delta. So we apply some guesswork and compare the delta to a 4M
threshold. (Practically observed delta was 2M.)

We probably cannot calculate the exact threshold value because this
depends on the internal reservations requested by various operations, so
some operations that consume a few metadata will succeed even if the
Avail is zero. But this is better than the other way around.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Tue, 10 Nov 2015 17:54:03 +0000 (18:54 +0100)]

btrfs: preallocate path for snapshot creation at ioctl time

We can also preallocate btrfs_path that's used during pending snapshot
creation and avoid another late ENOMEM failure.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Tue, 10 Nov 2015 17:54:00 +0000 (18:54 +0100)]

btrfs: allocate root item at snapshot ioctl time

The actual snapshot creation is delayed until transaction commit. If we
cannot get enough memory for the root item there, we have to fail the
whole transaction commit which is bad. So we'll allocate the memory at
the ioctl call and pass it along with the pending_snapshot struct. The
potential ENOMEM will be returned to the caller of snapshot ioctl.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Tue, 10 Nov 2015 17:53:56 +0000 (18:53 +0100)]

btrfs: do an allocation earlier during snapshot creation

We can allocate pending_snapshot earlier and do not have to do cleanup
in case of failure.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Fri, 27 Nov 2015 15:31:45 +0000 (16:31 +0100)]

btrfs: use smaller type for btrfs_path locks

The values of btrfs_path::locks are 0 to 4, fit into a u8. Let's see:

* overall size of btrfs_path drops down from 136 to 112 (-24 bytes),
* better packing in a slab page +6 objects
* the whole structure now fits to 2 cachelines
* slight decrease in code size:

   text    data     bss     dec     hex filename
938731   43670   23144 1005545   f57e9 fs/btrfs/btrfs.ko.before
938203   43670   23144 1005017   f55d9 fs/btrfs/btrfs.ko.after

(and the generated assembly does not change much)

The main purpose is to decrease the size of the structure without
affecting performance. The byte access is usually well behaving accross
arches, the locks are not accessed frequently and sometimes just
compared to zero.

Note for further size reduction attempts: the slots could be made u16
but this might generate worse code on some arches (non-byte and non-int
access). Also the range of operations on slots is wider compared to
locks and the potential performance drop should be evaluated first.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Fri, 27 Nov 2015 15:31:42 +0000 (16:31 +0100)]

btrfs: use smaller type for btrfs_path lowest_level

The level is 0..7, we can use smaller type. The size of btrfs_path is now
136 bytes from 144, which is +2 objects that fit into a 4k slab.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Fri, 27 Nov 2015 15:31:38 +0000 (16:31 +0100)]

btrfs: use smaller type for btrfs_path reada

The possible values for reada are all positive and bounded, we can later
save some bytes by storing it in u8.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Fri, 27 Nov 2015 15:31:35 +0000 (16:31 +0100)]

btrfs: cleanup, use enum values for btrfs_path reada

Replace the integers by enums for better readability. The value 2 does
not have any meaning since a717531942f488209dded30f6bc648167bcefa72
"Btrfs: do less aggressive btree readahead" (2009-01-22).

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Thu, 19 Nov 2015 10:42:31 +0000 (11:42 +0100)]

btrfs: constify static arrays

There are a few statically initialized arrays that can be made const.
The remaining (like file_system_type, sysfs attributes or prop handlers)
do not allow that due to type mismatch when passed to the APIs or
because the structures are modified through other members.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Thu, 19 Nov 2015 10:42:28 +0000 (11:42 +0100)]

btrfs: constify remaining structs with function pointers

* struct extent_io_ops
* struct btrfs_free_space_op

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Thu, 19 Nov 2015 10:42:24 +0000 (11:42 +0100)]

btrfs tests: replace whole ops structure for free space tests

Preparatory work for making btrfs_free_space_op constant. In
test_steal_space_from_bitmap_to_extent, we substitute use_bitmap with
own version thus preventing constification. We can rework it so we
replace the whole structure with the correct function pointers.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Geliang Tang [Mon, 21 Dec 2015 15:50:23 +0000 (23:50 +0800)]

btrfs: use list_for_each_entry* in backref.c

Use list_for_each_entry*() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Geliang Tang [Fri, 18 Dec 2015 14:17:00 +0000 (22:17 +0800)]

btrfs: use list_for_each_entry_safe in free-space-cache.c

Use list_for_each_entry_safe() instead of list_for_each_safe() to
simplify the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Geliang Tang [Fri, 18 Dec 2015 14:16:59 +0000 (22:16 +0800)]

btrfs: use list_for_each_entry* in check-integrity.c

Use list_for_each_entry*() instead of list_for_each*() to simplify
the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Byongho Lee [Mon, 14 Dec 2015 16:42:10 +0000 (01:42 +0900)]

Btrfs: use linux/sizes.h to represent constants

We use many constants to represent size and offset value. And to make
code readable we use '256 * 1024 * 1024' instead of '268435456' to
represent '256MB'. However we can make far more readable with 'SZ_256MB'
which is defined in the 'linux/sizes.h'.

So this patch replaces 'xxx * 1024 * 1024' kind of expression with
single 'SZ_xxxMB' if 'xxx' is a power of 2 then 'xxx * SZ_1M' if 'xxx' is
not a power of 2. And I haven't touched to '4096' & '8192' because it's
more intuitive than 'SZ_4KB' & 'SZ_8KB'.

Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Mon, 30 Nov 2015 10:02:31 +0000 (11:02 +0100)]

btrfs: cleanup, remove stray return statements

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Alexandru Moise [Sun, 25 Oct 2015 20:15:06 +0000 (20:15 +0000)]

btrfs: zero out delayed node upon allocation

It's slightly cleaner to zero-out the delayed node upon allocation
than to do it by hand in btrfs_init_delayed_node() for a few members

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Alexandru Moise [Sun, 25 Oct 2015 19:35:44 +0000 (19:35 +0000)]

btrfs: pass proper enum type to start_transaction()

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Alexandru Moise [Sun, 18 Oct 2015 21:35:41 +0000 (21:35 +0000)]

btrfs: switch __btrfs_fs_incompat return type from int to bool

Conform to __btrfs_fs_incompat() cast-to-bool (!!) by explicitly
returning boolean not int.

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Byongho Lee [Tue, 19 May 2015 14:46:45 +0000 (23:46 +0900)]

btrfs: remove unused inode argument from uncompress_inline()

The inode argument is never used from the beginning, so remove it.

Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Tue, 8 Dec 2015 13:39:32 +0000 (14:39 +0100)]

btrfs: don't use slab cache for struct btrfs_delalloc_work

Although we prefer to use separate caches for various structs, it seems
better not to do that for struct btrfs_delalloc_work. Objects of this
type are allocated rarely, when transaction commit calls
btrfs_start_delalloc_roots, requesting delayed iputs.

The objects are temporary (with some IO involved) but still allocated
and freed within __start_delalloc_inodes. Memory allocation failure is
handled.

The slab cache is empty most of the time (observed on several systems),
so if we need to allocate a new slab object, the first one has to
allocate a full page. In a potential case of low memory conditions this
might fail with higher probability compared to using the generic slab
caches.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Tue, 1 Dec 2015 17:09:12 +0000 (18:09 +0100)]

btrfs: drop duplicate prefix from scrub workqueues

The helper btrfs_alloc_workqueue will add the "btrfs-" prefix.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Mon, 30 Nov 2015 16:27:09 +0000 (17:27 +0100)]

btrfs: verbose error when we find an unexpected item in sys_array

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Mon, 30 Nov 2015 16:27:06 +0000 (17:27 +0100)]

btrfs: handle invalid num_stripes in sys_array

We can handle the special case of num_stripes == 0 directly inside
btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to
catch other unhandled cases where we fail to validate external data.

A crafted or corrupted image crashes at mount time:

BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0
BTRFS info (device loop0): disk space caching is enabled
BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25
Stack:
637af890 60062489 602aeb2e 604192ba
60387961 00000011 637af8a0 6038a835
637af9c0 6038776b 634ef32b 00000000
Call Trace:
[<6001c86d>] show_stack+0xfe/0x15b
[<6038a835>] dump_stack+0x2a/0x2c
[<6038776b>] panic+0x13e/0x2b3
[<6020f099>] btrfs_read_sys_array+0x25d/0x2ff
[<601cfbbe>] open_ctree+0x192d/0x27af
[<6019c2c1>] btrfs_mount+0x8f5/0xb9a
[<600bc9a7>] mount_fs+0x11/0xf3
[<600d5167>] vfs_kern_mount+0x75/0x11a
[<6019bcb0>] btrfs_mount+0x2e4/0xb9a
[<600bc9a7>] mount_fs+0x11/0xf3
[<600d5167>] vfs_kern_mount+0x75/0x11a
[<600d710b>] do_mount+0xa35/0xbc9
[<600d7557>] SyS_mount+0x95/0xc8
[<6001e884>] handle_syscall+0x6b/0x8e

Reported-by: Jiri Slaby <jslaby@suse.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
CC: stable@vger.kernel.org # 3.19+
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Mon, 30 Nov 2015 15:51:29 +0000 (16:51 +0100)]

btrfs: better packing of btrfs_delayed_extent_op

btrfs_delayed_extent_op can be packed in a better way, it's 40 bytes now
and has 8 unused bytes. Reducing the level type to u8 makes it possible
to squeeze it to the padding byte after key. The bitfields were switched
to bool as there's space to store the full byte without increasing the
whole structure, besides that the generated assembly is smaller.

struct btrfs_delayed_extent_op {
struct btrfs_disk_key      key;                  /*     0    17 */
u8                         level;                /*    17     1 */
bool                       update_key;           /*    18     1 */
bool                       update_flags;         /*    19     1 */
bool                       is_data;              /*    20     1 */

/* XXX 3 bytes hole, try to pack */

u64                        flags_to_set;         /*    24     8 */

/* size: 32, cachelines: 1, members: 6 */
/* sum members: 29, holes: 1, sum holes: 3 */
/* last cacheline: 32 bytes */
};

The final size is 32 bytes which gives +26 object per slab page.

   text    data     bss     dec     hex filename
938811   43670   23144 1005625   f5839 fs/btrfs/btrfs.ko.before
938747   43670   23144 1005561   f57f9 fs/btrfs/btrfs.ko.after

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

David Sterba [Thu, 19 Nov 2015 13:15:51 +0000 (14:15 +0100)]

btrfs: put delayed item hook into inode

Inodes for delayed iput allocate a trivial helper structure, let's place
the list hook directly into the inode and save a kmalloc (killing a
__GFP_NOFAIL as a bonus) at the cost of increasing size of btrfs_inode.

The inode can be put into the delayed_iputs list more than once and we
have to keep the count. This means we can't use the list_splice to
process a bunch of inodes because we'd lost track of the count if the
inode is put into the delayed iputs again while it's processed.

Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Zhao Lei [Thu, 19 Nov 2015 09:26:22 +0000 (17:26 +0800)]

btrfs: Support convert to -d dup for btrfs-convert

Since we will add support for -d dup for non-mixed filesystem,
kernel need to support converting to this raid-type.

This patch remove limitation of above case.

Tested by following script:
(combination of dup conversion with fsck):

export TEST_DEV='/dev/vdc'
export TEST_DIR='/var/ltf/tester/mnt'

do_dup_test()
{
    local m_from="$1"
    local d_from="$2"
    local m_to="$3"
    local d_to="$4"

    echo "Convert from -m $m_from -d $d_from to -m $m_to -d $d_to"

    umount "$TEST_DIR" &>/dev/null
    ./mkfs.btrfs -f -m "$m_from" -d "$d_from" "$TEST_DEV" >/dev/null || return 1
    mount "$TEST_DEV" "$TEST_DIR" || return 1

    cp -a /sbin/* "$TEST_DIR"

    [[ "$m_from" != "$m_to" ]] && {
        ./btrfs balance start -f -mconvert="$m_to" "$TEST_DIR" || return 1
    }

    [[ "$d_from" != "$d_to" ]] && {
local opt=()
[[ "$d_to" == single ]] && opt+=("-f")
        ./btrfs balance start "${opt[@]}" -dconvert="$d_to" "$TEST_DIR" || return 1
    }

    umount "$TEST_DIR" || return 1
    ./btrfsck "$TEST_DEV" || return 1
    echo

    return 0
}

test_all()
{
    for m_from in single dup; do
    for d_from in single dup; do
    for m_to in single dup; do
    for d_to in single dup; do
    do_dup_test "$m_from" "$d_from" "$m_to" "$d_to" || return 1
    done
    done
    done
    done
}

test_all

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Josef Bacik [Thu, 22 Oct 2015 19:05:09 +0000 (15:05 -0400)]

Btrfs: igrab inode in writepage

We hit this panic on a few of our boxes this week where we have an
ordered_extent with an NULL inode.  We do an igrab() of the inode in writepages,
but weren't doing it in writepage which can be called directly from the VM on
dirty pages.  If the inode has been unlinked then we could have I_FREEING set
which means igrab() would return NULL and we get this panic.  Fix this by trying
to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
and return AOP_WRITEPAGE_ACTIVATE; so the VM knows it wasn't successful.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Anand Jain [Wed, 7 Oct 2015 09:23:23 +0000 (17:23 +0800)]

Btrfs: add missing brelse when superblock checksum fails

Looks like oversight, call brelse() when checksum fails. Further down the
code, in the non error path, we do call brelse() and so we don't see
brelse() in the goto error paths.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

commit | commitdiff | tree

Filipe Manana [Tue, 5 Jan 2016 16:24:05 +0000 (16:24 +0000)]

Btrfs: fix transaction handle leak on failure to create hard link

If we failed to create a hard link we were not always releasing the
the transaction handle we got before, resulting in a memory leak and
preventing any other tasks from being able to commit the current
transaction.
Fix this by always releasing our transaction handle.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

commit | commitdiff | tree

Filipe Manana [Thu, 31 Dec 2015 18:16:29 +0000 (18:16 +0000)]

Btrfs: fix number of transaction units required to create symlink

We weren't accounting for the insertion of an inline extent item for the
symlink inode nor that we need to update the parent inode item (through
the call to btrfs_add_nondir()). So fix this by including two more
transaction units.

Signed-off-by: Filipe Manana <fdmanana@suse.com>

commit | commitdiff | tree

Filipe Manana [Thu, 31 Dec 2015 18:08:24 +0000 (18:08 +0000)]

Btrfs: don't leave dangling dentry if symlink creation failed

When we are creating a symlink we might fail with an error after we
created its inode and added the corresponding directory indexes to its
parent inode. In this case we end up never removing the directory indexes
because the inode eviction handler, called for our symlink inode on the
final iput(), only removes items associated with the symlink inode and
not with the parent inode.

Example:

  $ mkfs.btrfs -f /dev/sdi
  $ mount /dev/sdi /mnt
  $ touch /mnt/foo
  $ ln -s /mnt/foo /mnt/bar
  ln: failed to create symbolic link ‘bar’: Cannot allocate memory
  $ umount /mnt
  $ btrfsck /dev/sdi
  Checking filesystem on /dev/sdi
  UUID: d5acb5ba-31bd-42da-b456-89dca2e716e1
  checking extents
  checking free space cache
  checking fs roots
  root 5 inode 258 errors 2001, no inode item, link count wrong
unresolved ref dir 256 index 3 namelen 3 name bar filetype 7 errors 4, no inode ref
  found 131073 bytes used err is 1
  total csum bytes: 0
  total tree bytes: 131072
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 124305
  file data blocks allocated: 262144
   referenced 262144
  btrfs-progs v4.2.3

So fix this by adding the directory index entries as the very last
step of symlink creation.

Signed-off-by: Filipe Manana <fdmanana@suse.com>

commit | commitdiff | tree

Filipe Manana [Thu, 31 Dec 2015 18:07:59 +0000 (18:07 +0000)]

Btrfs: send, don't BUG_ON() when an empty symlink is found

When a symlink is successfully created it always has an inline extent
containing the source path. However if an error happens when creating
the symlink, we can leave in the subvolume's tree a symlink inode without
any such inline extent item - this happens if after btrfs_symlink() calls
btrfs_end_transaction() and before it calls the inode eviction handler
(through the final iput() call), the transaction gets committed and a
crash happens before the eviction handler gets called, or if a snapshot
of the subvolume is made before the eviction handler gets called. Sadly
we can't just avoid this by making btrfs_symlink() call
btrfs_end_transaction() after it calls the eviction handler, because the
later can commit the current transaction before it removes any items from
the subvolume tree (if it encounters ENOSPC errors while reserving space
for removing all the items).

So make send fail more gracefully, with an -EIO error, and print a
message to dmesg/syslog informing that there's an empty symlink inode,
so that the user can delete the empty symlink or do something else
about it.

Reported-by: Stephen R. van den Berg <srb@cuci.nl>
Signed-off-by: Filipe Manana <fdmanana@suse.com>

commit | commitdiff | tree

Filipe Manana [Wed, 30 Dec 2015 02:42:30 +0000 (02:42 +0000)]

Btrfs: fix race between free space endio workers and space cache writeout

While running a stress test I ran into the following trace/transaction
abort:

[471626.672243] ------------[ cut here ]------------
[471626.673322] WARNING: CPU: 9 PID: 19107 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
[471626.675492] BTRFS: Transaction aborted (error -2)
[471626.676748] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix
[471626.688802] CPU: 14 PID: 19107 Comm: fsstress Tainted: G        W       4.3.0-rc5-btrfs-next-17+ #1
[471626.690148] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[471626.691901]  0000000000000000 ffff880016037cf0 ffffffff812566f4 ffff880016037d38
[471626.695009]  ffff880016037d28 ffffffff8104d0a6 ffffffffa040c84e 00000000fffffffe
[471626.697490]  ffff88011fe855f8 ffff88000c484cb0 ffff88000d195000 ffff880016037d90
[471626.699201] Call Trace:
[471626.699804]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[471626.701049]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[471626.702542]  [<ffffffffa040c84e>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.704326]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
[471626.705636]  [<ffffffffa0403717>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs]
[471626.707048]  [<ffffffffa040c84e>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
[471626.708616]  [<ffffffffa048a50a>] commit_cowonly_roots+0x1d7/0x25a [btrfs]
[471626.709950]  [<ffffffffa041e34a>] btrfs_commit_transaction+0x4c4/0x991 [btrfs]
[471626.711286]  [<ffffffff81081c61>] ? signal_pending_state+0x31/0x31
[471626.712611]  [<ffffffffa03f6df4>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[471626.715610]  [<ffffffff811962a2>] ? SyS_tee+0x226/0x226
[471626.716718]  [<ffffffff811962c2>] sync_fs_one_sb+0x20/0x22
[471626.717672]  [<ffffffff8116fc01>] iterate_supers+0x75/0xc2
[471626.718800]  [<ffffffff8119669a>] sys_sync+0x52/0x80
[471626.719990]  [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[471626.721835] ---[ end trace baf57f43d76693f4 ]---
[471626.722954] BTRFS: error (device sdc) in btrfs_write_dirty_block_groups:3740: errno=-2 No such entry

This is a very rare situation and it happened due to a race between a free
space endio worker and writing the space caches for dirty block groups at
a transaction's commit critical section. The steps leading to this are:

1) A task calls btrfs_commit_transaction() and starts the writeout of the
   space caches for all currently dirty block groups (i.e. it calls
   btrfs_start_dirty_block_groups());

2) The previous step starts writeback for space caches;

3) When the writeback finishes it queues jobs for free space endio work
   queue (fs_info->endio_freespace_worker) that execute
   btrfs_finish_ordered_io();

4) The task committing the transaction sets the transaction's state
   to TRANS_STATE_COMMIT_DOING and shortly after calls
   btrfs_write_dirty_block_groups();

5) A free space endio job joins the transaction, through
   btrfs_join_transaction_nolock(), and updates a free space inode item
   in the root tree through btrfs_update_inode_fallback();

6) Updating the free space inode item resulted in COWing one or more
   nodes/leaves of the root tree, and that resulted in creating a new
   metadata block group, which gets added to the transaction's list
   of dirty block groups (this is a very rare case);

7) The free space endio job has not released yet its transaction handle
   at this point, so the new metadata block group was not yet fully
   created (didn't go through btrfs_create_pending_block_groups() yet);

8) The transaction commit task sees the new metadata block group in
   the transaction's list of dirty block groups and processes it.
   When it attempts to update the block group's block group item in
   the extent tree, through write_one_cache_group(), it isn't able
   to find it and aborts the transaction with error -ENOENT - this
   is because the free space endio job hasn't yet released its
   transaction handle (which calls btrfs_create_pending_block_groups())
   and therefore the block group item was not yet added to the extent
   tree.

Fix this waiting for free space endio jobs if we fail to find a block
group item in the extent tree and then retry once updating the block
group item.

Signed-off-by: Filipe Manana <fdmanana@suse.com>

commit | commitdiff | tree

Chris Mason [Wed, 30 Dec 2015 15:52:35 +0000 (07:52 -0800)]

btrfs: don't run delayed references while we are creating the free space tree

This is a short term solution to make sure btrfs_run_delayed_refs()
doesn't change the extent tree while we are scanning it to create the
free space tree.

Longer term we need to synchronize scanning the block groups one by one,
similar to what happens during a balance.

Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Chris Mason [Wed, 30 Dec 2015 15:37:26 +0000 (07:37 -0800)]

btrfs: fix compiling with CONFIG_BTRFS_DEBUG enabled.

Merging in the free space tree deleted a variable needed when
CONFIG_BTRFS_DEBUG=y

Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Chris Mason [Wed, 23 Dec 2015 21:30:51 +0000 (13:30 -0800)]

btrfs: fix warning on uninit variable in btrfs_finish_chunk_alloc

map->num_stripes really can't be zero, but just in case.

Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Chris Mason [Wed, 23 Dec 2015 21:29:09 +0000 (13:29 -0800)]

Merge branch 'freespace-4.5' into for-linus-4.5

commit | commitdiff | tree

Chris Mason [Wed, 23 Dec 2015 21:28:35 +0000 (13:28 -0800)]

Merge branch 'for-chris-4.5' of git://git./linux/kernel/git/fdmanana/linux into for-linus-4.5

commit | commitdiff | tree

Chris Mason [Wed, 23 Dec 2015 21:17:42 +0000 (13:17 -0800)]

Merge branch 'dev/simplify-set-bit' of git://git./linux/kernel/git/kdave/linux into for-linus-4.5

Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Chris Mason [Wed, 23 Dec 2015 21:11:27 +0000 (13:11 -0800)]

Merge branch 'dev/gfp-flags' of git://git./linux/kernel/git/kdave/linux into for-linus-4.5

commit | commitdiff | tree

Chris Mason [Wed, 23 Dec 2015 21:10:26 +0000 (13:10 -0800)]

Merge branch 'cleanup/misc-simplify' of git://git./linux/kernel/git/kdave/linux into for-linus-4.5

commit | commitdiff | tree

Filipe Manana [Fri, 18 Dec 2015 03:02:48 +0000 (03:02 +0000)]

Btrfs: fix unprotected list operations at btrfs_write_dirty_block_groups

We call btrfs_write_dirty_block_groups() in the critical section of a
transaction's commit, when no other tasks can join the transaction and
add more block groups to the transaction's list of dirty block groups,
so we not taking the dirty block groups spinlock when checking for the
list's emptyness, grabbing its first element or deleting elements from
it.

However there's a special and rare case where we can have a concurrent
task adding elements to this list. We trigger writeback for space
caches before at btrfs_start_dirty_block_groups() and in past iterations
of the loop at btrfs_write_dirty_block_groups(), this means that when
the writeback finishes (which happens asynchronously) it creates a
task for the endio free space work queue that executes
btrfs_finish_ordered_io() - this function is able to join the transaction,
through btrfs_join_transaction_nolock(), and update the free space cache's
inode item in the root tree, which can result in COWing nodes of this tree
and therefore allocation of a new block group can happen, which gets added
to the transaction's list of dirty block groups while the transaction
commit task is operating on it concurrently.

So fix this by taking the dirty block groups spinlock before doing
operations on the dirty block groups list at
btrfs_write_dirty_block_groups().

Signed-off-by: Filipe Manana <fdmanana@suse.com>

commit | commitdiff | tree

Linus Torvalds [Mon, 21 Dec 2015 00:06:09 +0000 (16:06 -0800)]

Linux 4.4-rc6

commit | commitdiff | tree

Linus Torvalds [Sun, 20 Dec 2015 18:01:11 +0000 (10:01 -0800)]

Merge tag 'rtc-4.4-3' of git://git./linux/kernel/git/abelloni/linux

Pull RTC fixes from Alexandre Belloni:
"Late fixes for the RTC subsystem for 4.4:

  A fix for a nasty hardware bug in rk808 and an initialization
  reordering in da9063 to fix a possible crash"

* tag 'rtc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: da9063: fix access ordering error during RTC interrupt at system power on
  rtc: rk808: Compensate for Rockchip calendar deviation on November 31st

commit | commitdiff | tree

Steve Twiss [Tue, 8 Dec 2015 16:28:39 +0000 (16:28 +0000)]

rtc: da9063: fix access ordering error during RTC interrupt at system power on

This fix alters the ordering of the IRQ and device registrations in the RTC
driver probe function. This change will apply to the RTC driver that supports
both DA9063 and DA9062 PMICs.

A problem could occur with the existing RTC driver if:

A system is started from a cold boot using the PMIC RTC IRQ to initiate a
power on operation. For instance, if an RTC alarm is used to start a
platform from power off.
The existing driver IRQ is requested before the device has been properly
registered.
i.e.
    ret = devm_request_threaded_irq()
comes before
    rtc->rtc_dev = devm_rtc_device_register();

In this case, the interrupt can be called before the device has been
registered and the handler can be called immediately. The IRQ handler
da9063_alarm_event() contains the function call

    rtc_update_irq(rtc->rtc_dev, 1, RTC_IRQF | RTC_AF);

which in turn tries to access the unavailable rtc->rtc_dev.

The fix is to reorder the functions inside the RTC probe. The IRQ is
requested after the RTC device resource has been registered so that
get_irq_byname is the last thing to happen.

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>

commit | commitdiff | tree

Julius Werner [Tue, 15 Dec 2015 23:02:49 +0000 (15:02 -0800)]

rtc: rk808: Compensate for Rockchip calendar deviation on November 31st

In A.D. 1582 Pope Gregory XIII found that the existing Julian calendar
insufficiently represented reality, and changed the rules about
calculating leap years to account for this. Similarly, in A.D. 2013
Rockchip hardware engineers found that the new Gregorian calendar still
contained flaws, and that the month of November should be counted up to
31 days instead. Unfortunately it takes a long time for calendar changes
to gain widespread adoption, and just like more than 300 years went by
before the last Protestant nation implemented Greg's proposal, we will
have to wait a while until all religions and operating system kernels
acknowledge the inherent advantages of the Rockchip system. Until then
we need to translate dates read from (and written to) Rockchip hardware
back to the Gregorian format.

This patch works by defining Jan 1st, 2016 as the arbitrary anchor date
on which Rockchip and Gregorian calendars are in sync. From that we can
translate arbitrary later dates back and forth by counting the number
of November/December transitons since the anchor date to determine the
offset between the calendars. We choose this method (rather than trying
to regularly "correct" the date stored in hardware) since it's the only
way to ensure perfect time-keeping even if the system may be shut down
for an unknown number of years. The drawback is that other software
reading the same hardware (e.g. mainboard firmware) must use the same
translation convention (including the same anchor date) to be able to
read and write correct timestamps from/to the RTC.

Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>

commit | commitdiff | tree

Linus Torvalds [Sun, 20 Dec 2015 01:44:19 +0000 (17:44 -0800)]

Merge tag 'tty-4.4-rc6' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are some tty/serial driver fixes for 4.4-rc6 that resolve some
  reported problems.  All of these have been in linux-next.  The details
  are in the shortlog"

* tag 'tty-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: Fix GPF in flush_to_ldisc()
  serial: earlycon: Add missing spinlock initialization
  serial: sh-sci: Fix length of scatterlist
  n_tty: Fix poll() after buffer-limited eof push read
  serial: 8250_uniphier: fix dl_read and dl_write functions

commit | commitdiff | tree

Linus Torvalds [Sun, 20 Dec 2015 01:33:58 +0000 (17:33 -0800)]

Merge tag 'usb-4.4-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some USB and PHY fixes for 4.4-rc6.  All of them resolve some
  reported problems.  Full details in the shortlog"

* tag 'usb-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: fix invalid memory access in hub_activate()
  USB: ipaq.c: fix a timeout loop
  phy: core: Get a refcount to phy in devm_of_phy_get_by_index()
  phy: cygnus: pcie: add missing of_node_put
  phy: miphy365x: add missing of_node_put
  phy: miphy28lp: add missing of_node_put
  phy: rockchip-usb: add missing of_node_put
  phy: berlin-sata: add missing of_node_put
  phy: mt65xx-usb3: add missing of_node_put
  phy: brcmstb-sata: add missing of_node_put
  phy: sun9i-usb: add USB dependency

commit | commitdiff | tree

Linus Torvalds [Sun, 20 Dec 2015 00:46:46 +0000 (16:46 -0800)]

Merge tag 'md/4.4-rc5-fixes' of git://neil.brown.name/md

Pull md fixes from Neil Brown:
"Four fixes for md:

   - two recently introduced regressions fixed.
   - one older bug in RAID10 - tagged for -stable since 4.2
   - one minor sysfs api improvement"

* tag 'md/4.4-rc5-fixes' of git://neil.brown.name/md:
  Fix remove_and_add_spares removes drive added as spare in slot_store
  md: fix bug due to nested suspend
  MD: change journal disk role to disk 0
  md/raid10: fix data corruption and crash during resync

commit | commitdiff | tree

Linus Torvalds [Sun, 20 Dec 2015 00:40:48 +0000 (16:40 -0800)]

Merge tag 'powerpc-4.4-5' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
- Partial revert of "powerpc: Individual System V IPC system calls"
- pr_warn_once on unsupported OPAL_MSG type from Stewart
- Fix deadlock in opal-irqchip introduced by "Fix double endian
   conversion" from Alistair

* tag 'powerpc-4.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"
  powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type
  Partial revert of "powerpc: Individual System V IPC system calls"

commit | commitdiff | tree

Linus Torvalds [Sat, 19 Dec 2015 18:10:43 +0000 (10:10 -0800)]

Merge tag 'spi-fix-v4.4-rc5' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A couple of reference counting bugs here, one in spidev and one with
  holding an extra reference in the core that we never freed if we
  removed a device, plus a driver specific fix.  Both of the refcounting
  bugs are very old but they've only been found by observation so
  hopefully their impact has been low"

* tag 'spi-fix-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: fix parent-device reference leak
  spi: spidev: Hold spi_lock over all defererences of spi in release()
  spi-fsl-dspi: Fix CTAR Register access

commit | commitdiff | tree

Linus Torvalds [Sat, 19 Dec 2015 18:05:00 +0000 (10:05 -0800)]

Merge tag 'gpio-v4.4-3' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Some GPIO fixes for the v4.4 series.  Most prominent: I revert the
  error propagation from the .get() function until we can fix up all the
  drivers properly for v4.5.

   - Revert the error number propagation from the .get() vtable entry
     temporarily, until we make the proper fixes to all drivers.
   - Fix the clamping behaviour in the generic GPIO driver.
   - Driver fix for the ath79 driver"

* tag 'gpio-v4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: revert get() to non-errorprogating behaviour
  gpio: generic: clamp values from bgpio_get_set()
  gpio: ath79: Fix the logic to clear offset bit of AR71XX_GPIO_REG_OE register

commit | commitdiff | tree

Linus Torvalds [Sat, 19 Dec 2015 18:01:03 +0000 (10:01 -0800)]

Merge tag 'pinctrl-v4.4-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
- Driver fixes for Freescale i.MX7D, Intel, Broadcom 2835
- One MAINTAINERS entry

* tag 'pinctrl-v4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  MAINTAINERS: pinctrl: Add maintainers for pinctrl-single
  pinctrl: bcm2835: Fix initial value for direction_output
  pinctrl: intel: fix offset calculation issue of register PAD_OWN
  pinctrl: intel: fix bug of register offset calculation
  pinctrl: freescale: add ZERO_OFFSET_VALID flag for vf610 pinctrl

commit | commitdiff | tree

Linus Torvalds [Sat, 19 Dec 2015 17:52:44 +0000 (09:52 -0800)]

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"A set of 'usual' driver bugfixes for the I2C subsystem"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rcar: disable runtime PM correctly in slave mode
  i2c: designware: Keep pm_runtime_enable/_disable calls in sync
  i2c: designware: fix IO timeout issue for AMD controller
  i2c: imx: init bus recovery info before adding i2c adapter
  i2c: do not use 0x in front of %pa
  i2c: davinci: Increase module clock frequency
  i2c: mv64xxx: The n clockdiv factor is 0 based on sunxi SoCs
  i2c: rk3x: populate correct variable for sda_falling_time

commit | commitdiff | tree

Linus Torvalds [Sat, 19 Dec 2015 17:51:11 +0000 (09:51 -0800)]

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"Just a few assorted driver fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elants_i2c - fix wake-on-touch
  Input: elan_i2c - set input device's vendor and product IDs
  Input: sun4i-lradc-keys - fix typo in binding documentation
  Input: atmel_mxt_ts - add maxtouch to I2C table for module autoload
  Input: arizona-haptic - fix disabling of haptics device
  Input: aiptek - fix crash on detecting device without endpoints
  Input: atmel_mxt_ts - add generic platform data for Chromebooks
  Input: parkbd - clear unused function pointers
  Input: walkera0701 - clear unused function pointers
  Input: turbografx - clear unused function pointers
  Input: gamecon - clear unused function pointers
  Input: db9 - clear unused function pointers

commit | commitdiff | tree

Wolfram Sang [Wed, 16 Dec 2015 19:05:18 +0000 (20:05 +0100)]

i2c: rcar: disable runtime PM correctly in slave mode

When we also are I2C slave, we need to disable runtime PM because the
address detection mechanism needs to be active all the time. However, we
can reenable runtime PM once the slave instance was unregistered. So,
use pm_runtime_get_sync/put to achieve this, since it has proper
refcounting. pm_runtime_allow/forbid is like a global knob controllable
from userspace which is unsuitable here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org

commit | commitdiff | tree

Linus Torvalds [Sat, 19 Dec 2015 05:01:35 +0000 (21:01 -0800)]

Merge tag 'pm+acpi-4.4-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix a potential regression introduced during the 4.3 cycle
  (generic power domains framework), a nasty bug that has been present
  forever (power capping RAPL driver), a build issue (Tegra cpufreq
  driver) and a minor ugliness introduced recently (intel_pstate).

  Specifics:

   - Fix a potential regression in the generic power domains framework
     introduced during the 4.3 development cycle that may lead to
     spurious failures of system suspend in certain situations (Ulf
     Hansson).

   - Fix a problem in the power capping RAPL (Running Average Power
     Limits) driver that causes it to initialize successfully on some
     systems where it is not supposed to do that which is due to an
     incorrect check in an initialization routine (Prarit Bhargava).

   - Fix a build problem in the cpufreq Tegra driver that depends on the
     regulator framework, but that dependency is not reflected in
     Kconfig (Arnd Bergmann).

   - Fix a recent mistake in the intel_pstate driver where a numeric
     constant is used directly instead of a symbol defined specifically
     for the case in question (Prarit Bhargava)"

* tag 'pm+acpi-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  powercap / RAPL: fix BIOS lock check
  cpufreq: intel_pstate: Minor cleanup for FRAC_BITS
  cpufreq: tegra: add regulator dependency for T124
  PM / Domains: Allow runtime PM callbacks to be re-used during system PM

commit | commitdiff | tree

Linus Torvalds [Sat, 19 Dec 2015 04:35:35 +0000 (20:35 -0800)]

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Three fixes this time, two in SES picked up by KASAN for various types
  of buffer overrun.  The first is a USB array which returns page 8
  whatever is asked for and causes us to overrun with incorrect data
  format assumptions and the second is an invalid iteration of page 10
  (the additional information page).

  The final fix is a reversion of a NULL deref fix which caused
  suspend/resume not to be called in pairs leading to incorrect device
  operation (Jens has queued a more proper fix for the problem in
  block)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  ses: fix additional element traversal bug
  Revert "SCSI: Fix NULL pointer dereference in runtime PM"
  ses: Fix problems with simple enclosures

commit | commitdiff | tree

James Chen [Fri, 18 Dec 2015 23:51:48 +0000 (15:51 -0800)]

Input: elants_i2c - fix wake-on-touch

When sending "SLEEP" command to the controller it ceases scanning
completely and is unable to wake the system up from sleep, so if it is
configured as a wakeup source we should simply configure interrupt for
wakeup and rely on idle logic within the controller to reduce power
consumption while it is not used.

Signed-off-by: James Chen <james.chen@emc.com.tw>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 23:41:35 +0000 (15:41 -0800)]

Merge tag 'media/v4.4-3' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab.

* tag 'media/v4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] airspy: increase USB control message buffer size
  [media] hackrf: move RF gain ctrl enable behind module parameter
  [media] hackrf: fix possible null ptr on debug printing
  [media] Revert "[media] ivtv: avoid going past input/audio array"

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 23:35:08 +0000 (15:35 -0800)]

Merge branch 'for-linus-4.4' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"A couple of small fixes"

* 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: check prepare_uptodate_page() error code earlier
  Btrfs: check for empty bitmap list in setup_cluster_bitmaps
  btrfs: fix misleading warning when space cache failed to load
  Btrfs: fix transaction handle leak in balance
  Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 22:25:57 +0000 (14:25 -0800)]

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"Three patches"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/linux/mmdebug.h: should include linux/bug.h
  mm/zswap: change incorrect strncmp use to strcmp
  proc: fix -ESRCH error when writing to /proc/$pid/coredump_filter

commit | commitdiff | tree

James Morse [Fri, 18 Dec 2015 22:22:07 +0000 (14:22 -0800)]

include/linux/mmdebug.h: should include linux/bug.h

mmdebug.h uses BUILD_BUG_ON_INVALID(), assuming someone else included
linux/bug.h.  Include it ourselves.

This saves build-failures such as:

  arch/arm64/include/asm/pgtable.h: In function 'set_pte_at':
  arch/arm64/include/asm/pgtable.h:281:3: error: implicit declaration of function 'BUILD_BUG_ON_INVALID' [-Werror=implicit-function-declaration]
   VM_WARN_ONCE(!pte_young(pte),

Fixes: 02602a18c32d7 ("bug: completely remove code generated by disabled VM_BUG_ON()")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Dan Streetman [Fri, 18 Dec 2015 22:22:04 +0000 (14:22 -0800)]

mm/zswap: change incorrect strncmp use to strcmp

Change the use of strncmp in zswap_pool_find_get() to strcmp.

The use of strncmp is no longer correct, now that zswap_zpool_type is
not an array; sizeof() will return the size of a pointer, which isn't
the right length to compare. We don't need to use strncmp anyway,
because the existing params and the passed in params are all guaranteed
to be null terminated, so strcmp should be used.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Colin Ian King [Fri, 18 Dec 2015 22:22:01 +0000 (14:22 -0800)]

proc: fix -ESRCH error when writing to /proc/$pid/coredump_filter

Writing to /proc/$pid/coredump_filter always returns -ESRCH because commit
774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()") removed
the setting of ret after the get_proc_task call and incorrectly left it as
-ESRCH.  Instead, return 0 when successful.

Example breakage:

  echo 0 > /proc/self/coredump_filter
  bash: echo: write error: No such process

Fixes: 774636e19ed51 ("proc: convert to kstrto*()/kstrto*_from_user()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 20:51:52 +0000 (12:51 -0800)]

Merge tag 'hwmon-for-linus-v4.4-rc6' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- Select CONFIG_BITREVERSE for sht15 driver to avoid build failure if
   it is not configured.

- Force wait for conversion time for the first valid data in tmp102
   driver to avoid reporting erroneous data to the thermal subsystem.

* tag 'hwmon-for-linus-v4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (sht15) Select CONFIG_BITREVERSE
  hwmon: (tmp102) Force wait for conversion time for the first valid data

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 20:38:35 +0000 (12:38 -0800)]

Merge tag 'iommu-fixes-v4.4-rc5' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
"Two similar fixes for the Intel and AMD IOMMU drivers to add proper
  access checks before calling handle_mm_fault"

* tag 'iommu-fixes-v4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Do access checks before calling handle_mm_fault()
  iommu/amd: Do proper access checking before calling handle_mm_fault()

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 20:24:52 +0000 (12:24 -0800)]

Merge tag 'for-linus-4.4-rc5-tag' of git://git./linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:
- XSA-155 security fixes to backend drivers.
- XSA-157 security fixes to pciback.

* tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen-pciback: fix up cleanup path when alloc fails
  xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
  xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
  xen/pciback: Do not install an IRQ handler for MSI interrupts.
  xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
  xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
  xen/pciback: Save xen_pci_op commands before processing it
  xen-scsiback: safely copy requests
  xen-blkback: read from indirect descriptors only once
  xen-blkback: only read request operation from shared ring once
  xen-netback: use RING_COPY_REQUEST() throughout
  xen-netback: don't use last request to determine minimum Tx credit
  xen: Add RING_COPY_REQUEST()
  xen/x86/pvh: Use HVM's flush_tlb_others op
  xen: Resume PMU from non-atomic context
  xen/events/fifo: Consume unprocessed events when a CPU dies

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 20:19:01 +0000 (12:19 -0800)]

Merge tag 'arc-fixes-for-4.4-rc6' of git://git./linux/kernel/git/vgupta/arc

Pull ARC architecture fixes from Vineet Gupta:
"Fixes for:

- perf interrupts on SMP: Not enabled (at boot) and disabled (at runtime)
- stack unwinder regression (for modules, ignoring dwarf3)
- nsim hosed for non default kernel link base builds"

* tag 'arc-fixes-for-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: smp: Rename platform hook @init_cpu_smp -> @init_per_cpu
  ARC: rename smp operation init_irq_cpu() to init_per_cpu()
  ARC: dw2 unwind: Ignore CIE version !=1 gracefully instead of bailing
  ARC: dw2 unwind: Reinstante unwinding out of modules
  ARC: [plat-sim] unbork non default CONFIG_LINUX_LINK_BASE
  ARC: intc: Document arc_request_percpu_irq() better
  ARCv2: perf: Ensure perf intr gets enabled on all cores
  ARC: intc: No need to clear IRQ_NOAUTOEN
  ARCv2: intc: Fix random perf irq disabling in SMP setup
  ARC: [axs10x] cap ethernet phy to 100 Mbit/sec

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 19:47:06 +0000 (11:47 -0800)]

Merge tag 'sound-4.4-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"As usual in rc6, this update contains only a few HD-audio and
  USB-audio device-specific quirks: yet another Thinkpad noise fixes,
  Dell headphone mic fixes, and AudioQuest DragonFly fixes"

* tag 'sound-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add a fixup for Thinkpad X1 Carbon 2nd
  ALSA: hda - Set codec to D3 at reboot/shutdown on Thinkpads
  ALSA: hda - Apply click noise workaround for Thinkpads generically
  ALSA: hda - Fix headphone mic input on a few Dell ALC293 machines
  ALSA: usb-audio: Add sample rate inquiry quirk for AudioQuest DragonFly
  ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly

commit | commitdiff | tree

Linus Torvalds [Fri, 18 Dec 2015 19:19:16 +0000 (11:19 -0800)]

Merge tag 'for-linus-20151217' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:
"I was holding out on this pull request for a bit, since there are a
  few other small issues being discussed that look like 4.4-rc
  regressions.  Hopefully I can get those stabilized soon, but these are
  ready at any rate:

   - A little bit of a last-minute change for the device tree "fixed
     partition" binding.  This is needed because we might want to reuse
     the 'partitions' subnode for other sorts of partitioning
     descriptions -- e.g., for describing which on-flash partition
     format(s) might be used on the system.

   - Also tone down a warning message, since it is probably going to
     show up on a lot of systems where it should just be ignored"

* tag 'for-linus-20151217' of git://git.infradead.org/linux-mtd:
  doc: dt: mtd: partitions: add compatible property to "partitions" node
  mtd: ofpart: don't complain about missing 'partitions' node too loudly

commit | commitdiff | tree

Chris Mason [Fri, 18 Dec 2015 19:11:10 +0000 (11:11 -0800)]

Merge branch 'freespace-tree' into for-linus-4.5

Signed-off-by: Chris Mason <clm@fb.com>

commit | commitdiff | tree

Alan Stern [Wed, 16 Dec 2015 18:32:38 +0000 (13:32 -0500)]

USB: fix invalid memory access in hub_activate()

Commit 8520f38099cc ("USB: change hub initialization sleeps to
delayed_work") changed the hub_activate() routine to make part of it
run in a workqueue.  However, the commit failed to take a reference to
the usb_hub structure or to lock the hub interface while doing so.  As
a result, if a hub is plugged in and quickly unplugged before the work
routine can run, the routine will try to access memory that has been
deallocated.  Or, if the hub is unplugged while the routine is
running, the memory may be deallocated while it is in active use.

This patch fixes the problem by taking a reference to the usb_hub at
the start of hub_activate() and releasing it at the end (when the work
is finished), and by locking the hub interface while the work routine
is running.  It also adds a check at the start of the routine to see
if the hub has already been disconnected, in which nothing should be
done.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Alexandru Cornea <alexandru.cornea@intel.com>
Tested-by: Alexandru Cornea <alexandru.cornea@intel.com>
Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work")
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dan Carpenter [Wed, 16 Dec 2015 11:06:37 +0000 (14:06 +0300)]

USB: ipaq.c: fix a timeout loop

The code expects the loop to end with "retries" set to zero but, because
it is a post-op, it will end set to -1. I have fixed this by moving the
decrement inside the loop.

Fixes: 014aa2a3c32e ('USB: ipaq: minor ipaq_open() cleanup.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Antti Palosaari [Mon, 26 Oct 2015 20:58:14 +0000 (18:58 -0200)]

[media] airspy: increase USB control message buffer size

Driver requested device firmware version string during probe using
only 24 byte long buffer. That buffer is too small for newer firmware
versions, which causes device firmware hang - device stops responding
to any commands after that. Increase buffer size to 128 which should
be enough for any current and future version strings.

Link: https://github.com/airspy/host/issues/27
Cc: <stable@vger.kernel.org> # 3.17+
Reported-by: Benjamin Vernoux <bvernoux@gmail.com>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit | commitdiff | tree

Antti Palosaari [Fri, 23 Oct 2015 22:01:31 +0000 (20:01 -0200)]

[media] hackrf: move RF gain ctrl enable behind module parameter

Used Avago MGA-81563 RF amplifier could be destroyed pretty easily
with too strong signal or transmitting to bad antenna.
Add module parameter 'enable_rf_gain_ctrl' which allows enabling
RF gain control - otherwise, default without the module parameter,
RF gain control is set to 'grabbed' state which prevents setting
value to the control.

Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit | commitdiff | tree

Antti Palosaari [Wed, 21 Oct 2015 21:02:41 +0000 (19:02 -0200)]

[media] hackrf: fix possible null ptr on debug printing

drivers/media/usb/hackrf/hackrf.c:1533 hackrf_probe()
error: we previously assumed 'dev' could be null (see line 1366)

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit | commitdiff | tree

Mauro Carvalho Chehab [Wed, 11 Nov 2015 11:22:36 +0000 (09:22 -0200)]

[media] Revert "[media] ivtv: avoid going past input/audio array"

This patch broke ivtv logic, as reported at
https://bugzilla.redhat.com/show_bug.cgi?id=1278942

This reverts commit 09290cc885937cab3b2d60a6d48fe3d2d3e04061.

Cc: stable@vger.kernel.org # for v4.1 and upper
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit | commitdiff | tree

Greg Kroah-Hartman [Fri, 18 Dec 2015 17:24:50 +0000 (09:24 -0800)]

Merge tag 'phy-for-4.4-rc' of git://git./linux/kernel/git/kishon/linux-phy into usb-linus

Kishon writes:

phy: for 4.4 -rc

*) Add missing of_node_put in a bunch of PHY drivers
*) Add get_device in devm_of_phy_get_by_index()
*) Fix randconfig build error in sun9i usb driver

commit | commitdiff | tree

Doug Goldstein [Thu, 26 Nov 2015 20:32:39 +0000 (14:32 -0600)]

xen-pciback: fix up cleanup path when alloc fails

When allocating a pciback device fails, clear the private
field. This could lead to an use-after free, however
the 'really_probe' takes care of setting
dev_set_drvdata(dev, NULL) in its failure path (which we would
exercise if the ->probe function failed), so we we
are OK. However lets be defensive as the code can change.

Going forward we should clean up the pci_set_drvdata(dev, NULL)
in the various code-base. That will be for another day.

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit | commitdiff | tree

Arnd Bergmann [Fri, 18 Dec 2015 14:52:28 +0000 (15:52 +0100)]

hwmon: (sht15) Select CONFIG_BITREVERSE

If CONFIG_BITREVERSE is not built-in, the sht15 driver fails to link:

drivers/built-in.o: In function `sht15_crc8':
drivers/hwmon/sht15.c:195: undefined reference to `byte_rev_table'

This adds a Kconfig 'select' statement, like all other users of
bitrev.h have it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 33836ee98533 ("hwmon:change sht15_reverse()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

commit | commitdiff | tree

Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 23:13:27 +0000 (18:13 -0500)]

xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.

commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
teaches us that dealing with MSI-X can be troublesome.

Further checks in the MSI-X architecture shows that if the
PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
may not be able to access the BAR (since they are memory regions).

Since the MSI-X tables are located in there.. that can lead
to us causing PCIe errors. Inhibit us performing any
operation on the MSI-X unless the MEMORY bit is set.

Note that Xen hypervisor with:
"x86/MSI-X: access MSI-X table only after having enabled MSI-X"
will return:
xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!

When the generic MSI code tries to setup the PIRQ without
MEMORY bit set. Which means with later versions of Xen
(4.6) this patch is not neccessary.

This is part of XSA-157

CC: stable@vger.kernel.org
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Domain: System / Kernel;

RSS Atom