Linus Torvalds [Sat, 28 May 2016 00:14:05 +0000 (17:14 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Followups to the parallel lookup work:
- update docs
- restore killability of the places that used to take ->i_mutex
killably now that we have down_write_killable() merged
- Additionally, it turns out that I missed a prerequisite for
security_d_instantiate() stuff - ->getxattr() wasn't the only thing
that could be called before dentry is attached to inode; with smack
we needed the same treatment applied to ->setxattr() as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch ->setxattr() to passing dentry and inode separately
switch xattr_handler->set() to passing dentry and inode separately
restore killability of old mutex_lock_killable(&inode->i_mutex) users
add down_write_killable_nested()
update D/f/directory-locking
Al Viro [Fri, 27 May 2016 15:06:05 +0000 (11:06 -0400)]
switch ->setxattr() to passing dentry and inode separately
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.
Similar change for ->getxattr() had been done in commit ce23e64. Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.
Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Fri, 27 May 2016 23:44:39 +0000 (16:44 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
"The meat of this is a change to use the mounter's credentials for
operations that require elevated privileges (such as whiteout
creation). This fixes behavior under user namespaces as well as being
a nice cleanup"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: Do d_type check only if work dir creation was successful
ovl: update documentation
ovl: override creds with the ones from the superblock mounter
Linus Torvalds [Fri, 27 May 2016 23:37:36 +0000 (16:37 -0700)]
Merge branch 'for-linus-4.7' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs cleanups and fixes from Chris Mason:
"We have another round of fixes and a few cleanups.
I have a fix for short returns from btrfs_copy_from_user, which
finally nails down a very hard to find regression we added in v4.6.
Dave is pushing around gfp parameters, mostly to cleanup internal apis
and make it a little more consistent.
The rest are smaller fixes, and one speelling fixup patch"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (22 commits)
Btrfs: fix handling of faults from btrfs_copy_from_user
btrfs: fix string and comment grammatical issues and typos
btrfs: scrub: Set bbio to NULL before calling btrfs_map_block
Btrfs: fix unexpected return value of fiemap
Btrfs: free sys_array eb as soon as possible
btrfs: sink gfp parameter to convert_extent_bit
btrfs: make state preallocation more speculative in __set_extent_bit
btrfs: untangle gotos a bit in convert_extent_bit
btrfs: untangle gotos a bit in __clear_extent_bit
btrfs: untangle gotos a bit in __set_extent_bit
btrfs: sink gfp parameter to set_record_extent_bits
btrfs: sink gfp parameter to set_extent_new
btrfs: sink gfp parameter to set_extent_defrag
btrfs: sink gfp parameter to set_extent_delalloc
btrfs: sink gfp parameter to clear_extent_dirty
btrfs: sink gfp parameter to clear_record_extent_bits
btrfs: sink gfp parameter to clear_extent_bits
btrfs: sink gfp parameter to set_extent_bits
btrfs: make find_workspace warn if there are no workspaces
btrfs: make find_workspace always succeed
...
Linus Torvalds [Fri, 27 May 2016 23:03:22 +0000 (16:03 -0700)]
make IS_ERR_VALUE() complain about non-pointer-sized arguments
Now that the allmodconfig x86-64 build is clean wrt IS_ERR_VALUE() uses
on integers, add a cast to a pointer and back to the argument, so that
any new mis-uses of IS_ERR_VALUE() will cause warnings like
warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
so that we don't re-introduce any bogus uses.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 27 May 2016 22:57:31 +0000 (15:57 -0700)]
mm: remove more IS_ERR_VALUE abuses
The do_brk() and vm_brk() return value was "unsigned long" and returned
the starting address on success, and an error value on failure. The
reasons are entirely historical, and go back to it basically behaving
like the mmap() interface does.
However, nobody actually wanted that interface, and it causes totally
pointless IS_ERR_VALUE() confusion.
What every single caller actually wants is just the simpler integer
return of zero for success and negative error number on failure.
So just convert to that much clearer and more common calling convention,
and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 27 May 2016 21:23:25 +0000 (23:23 +0200)]
remove lots of IS_ERR_VALUE abuses
Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.
However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.
Andrzej Hajda has already fixed a lot of the worst abusers that
were causing actual bugs, but it would be nice to prevent any
users that are not passing 'unsigned long' arguments.
This patch changes all users of IS_ERR_VALUE() that I could find
on 32-bit ARM randconfig builds and x86 allmodconfig. For the
moment, this doesn't change the definition of IS_ERR_VALUE()
because there are probably still architecture specific users
elsewhere.
Almost all the warnings I got are for files that are better off
using 'if (err)' or 'if (err < 0)'.
The only legitimate user I could find that we get a warning for
is the (32-bit only) freescale fman driver, so I did not remove
the IS_ERR_VALUE() there but changed the type to 'unsigned long'.
For 9pfs, I just worked around one user whose calling conventions
are so obscure that I did not dare change the behavior.
I was using this definition for testing:
#define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))
which ends up making all 16-bit or wider types work correctly with
the most plausible interpretation of what IS_ERR_VALUE() was supposed
to return according to its users, but also causes a compile-time
warning for any users that do not pass an 'unsigned long' argument.
I suggested this approach earlier this year, but back then we ended
up deciding to just fix the users that are obviously broken. After
the initial warning that caused me to get involved in the discussion
(fs/gfs2/dir.c) showed up again in the mainline kernel, Linus
asked me to send the whole thing again.
[ Updated the 9p parts as per Al Viro - Linus ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.org/lkml/2016/1/7/363
Link: https://lkml.org/lkml/2016/5/27/486
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 27 May 2016 22:23:32 +0000 (15:23 -0700)]
mm: fix section mismatch warning
The register_page_bootmem_info_node() function needs to be marked __init
in order to avoid a new warning introduced by commit
f65e91df25aa ("mm:
use early_pfn_to_nid in register_page_bootmem_info_node").
Otherwise you'll get a warning about how a non-init function calls
early_pfn_to_nid (which is __meminit)
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 27 May 2016 21:56:59 +0000 (14:56 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc updates and fixes from Andrew Morton:
- late-breaking ocfs2 updates
- random bunch of fixes
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM
mm/memcontrol.c: move comments for get_mctgt_type() to proper position
mm/memcontrol.c: fix the margin computation in mem_cgroup_margin()
mm/cma: silence warnings due to max() usage
mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()
oom_reaper: close race with exiting task
mm: use early_pfn_to_nid in register_page_bootmem_info_node
mm: use early_pfn_to_nid in page_ext_init
MAINTAINERS: Kdump maintainers update
MAINTAINERS: add kexec_core.c and kexec_file.c
mm: oom: do not reap task if there are live threads in threadgroup
direct-io: fix direct write stale data exposure from concurrent buffered read
ocfs2: bump up o2cb network protocol version
ocfs2: o2hb: fix hb hung time
ocfs2: o2hb: don't negotiate if last hb fail
ocfs2: o2hb: add some user/debug log
ocfs2: o2hb: add NEGOTIATE_APPROVE message
ocfs2: o2hb: add NEGO_TIMEOUT message
ocfs2: o2hb: add negotiate timer
Gavin Shan [Fri, 27 May 2016 21:27:49 +0000 (14:27 -0700)]
mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM
When we have !NO_BOOTMEM, the deferred page struct initialization
doesn't work well because the pages reserved in bootmem are released to
the page allocator uncoditionally. It causes memory corruption and
system crash eventually.
As Mel suggested, the bootmem is retiring slowly. We fix the issue by
simply hiding DEFERRED_STRUCT_PAGE_INIT when bootmem is enabled.
Link: http://lkml.kernel.org/r/1460602170-5821-1-git-send-email-gwshan@linux.vnet.ibm.com
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li RongQing [Fri, 27 May 2016 21:27:46 +0000 (14:27 -0700)]
mm/memcontrol.c: move comments for get_mctgt_type() to proper position
Move the comments for get_mctgt_type() to be before get_mctgt_type()
implementation.
Link: http://lkml.kernel.org/r/1463644638-7446-1-git-send-email-roy.qing.li@gmail.com
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li RongQing [Fri, 27 May 2016 21:27:43 +0000 (14:27 -0700)]
mm/memcontrol.c: fix the margin computation in mem_cgroup_margin()
mem_cgroup_margin() might return (memory.limit - memory_count) when the
memsw.limit is in excess. This doesn't happen usually because we do not
allow excess on hard limits and (memory.limit <= memsw.limit), but
__GFP_NOFAIL charges can force the charge and cause the excess when no
memory is really swappable (swap is full or no anonymous memory is
left).
[mhocko@suse.com: rewrote changelog]
Link: http://lkml.kernel.org/r/20160525155122.GK20132@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1464068266-27736-1-git-send-email-roy.qing.li@gmail.com
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell [Fri, 27 May 2016 21:27:41 +0000 (14:27 -0700)]
mm/cma: silence warnings due to max() usage
pageblock_order can be (at least) an unsigned int or an unsigned long
depending on the kernel config and architecture, so use max_t(unsigned
long, ...) when comparing it.
fixes these warnings:
In file included from include/asm-generic/bug.h:13:0,
from arch/powerpc/include/asm/bug.h:127,
from include/linux/bug.h:4,
from include/linux/mmdebug.h:4,
from include/linux/mm.h:8,
from include/linux/memblock.h:18,
from mm/cma.c:28:
mm/cma.c: In function 'cma_init_reserved_mem':
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
(void) (&_max1 == &_max2); ^
mm/cma.c:186:27: note: in expansion of macro 'max'
alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
^
mm/cma.c: In function 'cma_declare_contiguous':
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
(void) (&_max1 == &_max2); ^
include/linux/kernel.h:747:9: note: in definition of macro 'max'
typeof(y) _max2 = (y); ^
mm/cma.c:270:29: note: in expansion of macro 'max'
(phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
^
include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast
(void) (&_max1 == &_max2); ^
include/linux/kernel.h:747:21: note: in definition of macro 'max'
typeof(y) _max2 = (y); ^
mm/cma.c:270:29: note: in expansion of macro 'max'
(phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
^
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20160526150748.5be38a4f@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 27 May 2016 21:27:38 +0000 (14:27 -0700)]
mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()
If page_move_anon_rmap() is refiling a pmd-splitted THP mapped in a tail
page from a pte, the "address" must be THP aligned in order for the
page->index bugcheck to pass in the CONFIG_DEBUG_VM=y builds.
Link: http://lkml.kernel.org/r/1464253620-106404-1-git-send-email-kirill.shutemov@linux.intel.com
Fixes:
6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 27 May 2016 21:27:35 +0000 (14:27 -0700)]
oom_reaper: close race with exiting task
Tetsuo has reported:
Out of memory: Kill process 443 (oleg's-test) score 855 or sacrifice child
Killed process 443 (oleg's-test) total-vm:493248kB, anon-rss:423880kB, file-rss:4kB, shmem-rss:0kB
sh invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0
sh cpuset=/ mems_allowed=0
CPU: 2 PID: 1 Comm: sh Not tainted 4.6.0-rc7+ #51
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Call Trace:
dump_stack+0x85/0xc8
dump_header+0x5b/0x394
oom_reaper: reaped process 443 (oleg's-test), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
In other words:
__oom_reap_task exit_mm
atomic_inc_not_zero
tsk->mm = NULL
mmput
atomic_dec_and_test # > 0
exit_oom_victim # New victim will be
# selected
<OOM killer invoked>
# no TIF_MEMDIE task so we can select a new one
unmap_page_range # to release the memory
The race exists even without the oom_reaper because anybody who pins the
address space and gets preempted might race with exit_mm but oom_reaper
made this race more probable.
We can address the oom_reaper part by using oom_lock for __oom_reap_task
because this would guarantee that a new oom victim will not be selected
if the oom reaper might race with the exit path. This doesn't solve the
original issue, though, because somebody else still might be pinning
mm_users and so __mmput won't be called to release the memory but that
is not really realiably solvable because the task will get away from the
oom sight as soon as it is unhashed from the task_list and so we cannot
guarantee a new victim won't be selected.
[akpm@linux-foundation.org: fix use of unused `mm', Per Stephen]
[akpm@linux-foundation.org: coding-style fixes]
Fixes:
aac453635549 ("mm, oom: introduce oom reaper")
Link: http://lkml.kernel.org/r/1464271493-20008-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 27 May 2016 21:27:32 +0000 (14:27 -0700)]
mm: use early_pfn_to_nid in register_page_bootmem_info_node
register_page_bootmem_info_node() is invoked in mem_init(), so it will
be called before page_alloc_init_late() if DEFERRED_STRUCT_PAGE_INIT is
enabled. But, pfn_to_nid() depends on memmap which won't be fully setup
until page_alloc_init_late() is done, so replace pfn_to_nid() by
early_pfn_to_nid().
Link: http://lkml.kernel.org/r/1464210007-30930-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 27 May 2016 21:27:30 +0000 (14:27 -0700)]
mm: use early_pfn_to_nid in page_ext_init
page_ext_init() checks suitable pages with pfn_to_nid(), but
pfn_to_nid() depends on memmap which will not be setup fully until
page_alloc_init_late() is done. Use early_pfn_to_nid() instead of
pfn_to_nid() so that page extension could be still used early even
though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page
allocation call sites.
Suggested by Joonsoo Kim [1], this fix basically undoes the change
introduced by commit
b8f1a75d61d840 ("mm: call page_ext_init() after all
struct pages are initialized") and fixes the same problem with a better
approach.
[1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com
Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vivek Goyal [Fri, 27 May 2016 21:27:27 +0000 (14:27 -0700)]
MAINTAINERS: Kdump maintainers update
I am proposing following updates to kdump maintainership. I have got
busy in other things and not getting time to spend on kdump.
Remove Haren Myneni as he has not participated in kdump development for
a long time now.
Add the names of Dave and Baoquan as kdump maintainers as they have been
contributing to kdump for a long time now and they are in a much better
position to spend time on this than me.
Mark myself as a reviewer.
Link: http://lkml.kernel.org/r/20160525131616.GB27291@redhat.com
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minfei Huang [Fri, 27 May 2016 21:27:24 +0000 (14:27 -0700)]
MAINTAINERS: add kexec_core.c and kexec_file.c
In the below commits kexec.c was split to kexec.c, kexec_file.c and
kexec_core.c.
commit
a43cac0d9dc2 ("kexec: split kexec_file syscall code to kexec_file.c")
commit
2965faa5e03d ("kexec: split kexec_load syscall from kexec core code")
Both kexec_file.c and kexec_core.c still belong to the kexec component.
In order to get correct mail lists by using the script get_maintainer.pl,
add these files to MAINTAINERS.
Link: http://lkml.kernel.org/r/1464189735-59113-1-git-send-email-mnghuan@gmail.com
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 27 May 2016 21:27:21 +0000 (14:27 -0700)]
mm: oom: do not reap task if there are live threads in threadgroup
If the current process is exiting, we don't invoke oom killer, instead
we give it access to memory reserves and try to reap its mm in case
nobody is going to use it. There's a mistake in the code performing
this check - we just ignore any process of the same thread group no
matter if it is exiting or not - see try_oom_reaper. Fix it.
Link: http://lkml.kernel.org/r/1464087628-7318-1-git-send-email-vdavydov@virtuozzo.com
Fixes:
3ef22dfff239 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path")Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eryu Guan [Fri, 27 May 2016 21:27:18 +0000 (14:27 -0700)]
direct-io: fix direct write stale data exposure from concurrent buffered read
Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are
not allowed to allocate blocks(get_more_blocks() sets 'create' to 0
before calling get_block() callback), if it's a sparse file, direct
writes fall back to buffered writes to avoid stale data exposure from
concurrent buffered read. But there're two cases that can result in
stale data exposure are not correctly detected.
1. The detection for "writing inside i_size" is not sufficient,
writes can be treated as "extending writes" wrongly. For example,
direct write 1FSB (file system block) to a 1FSB sparse file on
ext2/3/4, starting from offset 0, in this case it's writing inside
i_size, but 'create' is non-zero, because 'block_in_file' and
'(i_size_read(inode) >> blkbits' are both zero.
2. Direct writes starting from or beyong i_size (not inside i_size)
also could trigger block allocation and expose stale data. For
example, consider a sparse file with i_size of 2k, and a write to
offset 2k or 3k into the file, with a filesystem block size of 4k.
(Thanks to Jeff Moyer for pointing this case out in his review.)
The first problem can be demostrated by running ltp-aiodio test ADSP045
many times. When testing on extN filesystems, I see test failures
occasionally, buffered read could read non-zero (stale) data.
ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1
dio_sparse 0 TINFO : Dirtying free blocks
dio_sparse 0 TINFO : Starting I/O tests
non zero buffer at buf[0] => 0xffffffaa,
ffffffaa,
ffffffaa,
ffffffaa
non-zero read at offset 0
dio_sparse 0 TINFO : Killing childrens(s)
dio_sparse 1 TFAIL : dio_sparse.c:191: 1 children(s) exited abnormally
The second problem can also be reproduced easily by a hacked dio_sparse
program, which accepts an option to specify the write offset.
What we should really do is to disable block allocation for writes that
could result in filling holes inside i_size.
Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 27 May 2016 21:27:16 +0000 (14:27 -0700)]
ocfs2: bump up o2cb network protocol version
Two new messages are added to support negotiating hb timeout. Stop
nodes frmo talking an old version to mount as they will cause the
negotiation to fail.
Link: http://lkml.kernel.org/r/1464231615-27939-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 27 May 2016 21:27:13 +0000 (14:27 -0700)]
ocfs2: o2hb: fix hb hung time
hr_last_timeout_start should be set as the last time where hb is
still OK. When hb write timeout, hung time will be (jiffies -
hr_last_timeout_start).
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 27 May 2016 21:27:10 +0000 (14:27 -0700)]
ocfs2: o2hb: don't negotiate if last hb fail
Sometimes io error is returned when storage is down for a while. Like
for iscsi device, stroage is made offline when session timeout, and this
will make all io return -EIO. For this case, nodes shouldn't do
negotiate timeout but should fence self. So let nodes fence self when
o2hb_do_disk_heartbeat return an error, this is the same behavior with
o2hb without negotiate timer.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 27 May 2016 21:27:07 +0000 (14:27 -0700)]
ocfs2: o2hb: add some user/debug log
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 27 May 2016 21:27:04 +0000 (14:27 -0700)]
ocfs2: o2hb: add NEGOTIATE_APPROVE message
This message is used to re-queue write timeout timer and negotiate timer
when all nodes suffer a write hung to storage, this makes node not fence
self if storage down.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 27 May 2016 21:27:01 +0000 (14:27 -0700)]
ocfs2: o2hb: add NEGO_TIMEOUT message
This message is sent to master node when non-master nodes's negotiate
timer expired. Master node records these nodes in a bitmap which is
used to do write timeout timer re-queue decision.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Fri, 27 May 2016 21:26:58 +0000 (14:26 -0700)]
ocfs2: o2hb: add negotiate timer
This series of patches is to fix the issue that when storage down, all
nodes will fence self due to write timeout.
With this patch set, all nodes will keep going until storage back
online, except if the following issue happens, then all nodes will do as
before to fence self.
1. io error got
2. network between nodes down
3. nodes panic
This patch (of 6):
When storage down, all nodes will fence self due to write timeout. The
negotiate timer is designed to avoid this, with it node will wait until
storage up again.
Negotiate timer working in the following way:
1. The timer expires before write timeout timer, its timeout is half
of write timeout now. It is re-queued along with write timeout timer.
If expires, it will send NEGO_TIMEOUT message to master node(node with
lowest node number). This message does nothing but marks a bit in a
bitmap recording which nodes are negotiating timeout on master node.
2. If storage down, nodes will send this message to master node, then
when master node finds its bitmap including all online nodes, it sends
NEGO_APPROVL message to all nodes one by one, this message will
re-queue write timeout timer and negotiate timer. For any node doesn't
receive this message or meets some issue when handling this message, it
will be fenced. If storage up at any time, o2hb_thread will run and
re-queue all the timer, nothing will be affected by these two steps.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 27 May 2016 21:28:09 +0000 (14:28 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes that wasn't included in the first merge window pull
request. This pull request contains:
- A set of NVMe fixes from Keith, and one from Nic for the integrity
side of it.
- Fix from Ming, clearing ->mq_ops if we don't successfully setup a
queue for multiqueue.
- A set of stability fixes for bcache from Jiri, and also marking
bcache as orphaned as it's no longer actively maintained (in
mainline, at least)"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: clear q->mq_ops if init fail
MAINTAINERS: mark bcache as orphan
bcache: bch_gc_thread() is not freezable
bcache: bch_allocator_thread() is not freezable
bcache: bch_writeback_thread() is not freezable
nvme/host: Add missing blk_integrity tag_size + flags assignments
NVMe: Add device ID's with stripe quirk
NVMe: Short-cut removal on surprise hot-unplug
NVMe: Allow user initiated rescan
NVMe: Reduce driver log spamming
NVMe: Unbind driver on failure
NVMe: Delete only created queues
NVMe: Allocate queues only for online cpus
Linus Torvalds [Fri, 27 May 2016 21:17:15 +0000 (14:17 -0700)]
Merge tag 'for-linus-
20160527' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"We've already noticed a few flaws in the MTD work for v4.7-rc1:
- The Atmel folks got ahead of themselves on trying to support their
latest hardware and were working off incorrect documentation. Fix
up the NAND driver to get this correct.
- Fix up device tree example documentation to use the latest
recommendations for describing NAND ECC algorithms"
* tag 'for-linus-
20160527' of git://git.infradead.org/linux-mtd:
Documentation: dt: mtd: drop "soft_bch" from example
Revert "mtd: atmel_nand: Support variable RB_EDGE interrupts"
Linus Torvalds [Fri, 27 May 2016 21:08:56 +0000 (14:08 -0700)]
Merge tag 'drm-fixes-v4.7-rc1' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
- one IMX built-in regression fix
- a set of amdgpu fixes, mostly powerplay and polaris GPU stuff
- a set of i915 fixes all over, many cc'ed to stable.
The i915 batch contain support for DP++ dongle detection, which is
used to fix some regressions in the HDMI color depth area
* tag 'drm-fixes-v4.7-rc1' of git://people.freedesktop.org/~airlied/linux: (44 commits)
drm/amd: add Kconfig dependency for ACP on DRM_AMDGPU
drm/amdgpu: Fix hdmi deep color support.
drm/amdgpu: fix bug in fence driver fini
drm/i915: Stop automatically retiring requests after a GPU hang
drm/i915: Unify intel_ring_begin()
drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
drm/i915/psr: Try to program link training times correctly
drm/imx: Match imx-ipuv3-crtc components using device node in platform data
drm/i915/bxt: Adjusting the error in horizontal timings retrieval
drm/i915: Don't leave old junk in ilk active watermarks on readout
drm/i915: s/DPPL/DPLL/ for SKL DPLLs
drm/i915: Fix gen8 semaphores id for legacy mode
drm/i915: Set crtc_state->lane_count for HDMI
drm/i915/BXT: Retrieving the horizontal timing for DSI
drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
drm/i915: Respect DP++ adaptor TMDS clock limit
drm: Add helper for DP++ adaptors
...
Linus Torvalds [Fri, 27 May 2016 20:56:02 +0000 (13:56 -0700)]
Merge tag 'platform-drivers-x86-v4.7-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver updates from Darren Hart:
"Mostly minor updates and cleanups. One new power management
controller driver for Intel Core SoCs.
platform/x86:
- Add PMC Driver for Intel Core SoC
dell-rbtn:
- Ignore ACPI notifications if device is suspended
thinkpad_acpi:
- save kbdlight state on suspend and restore it on resume
intel_menlow:
- reduce code duplication
asus-wmi:
- provide access to ALS control
ideapad-laptop:
- add a new WMI string for ESC key
surfacepro3_button:
- Add a warning when switching to tablet mode
sony-laptop:
- Avoid oops on module unload for older laptops
intel_telemetry:
- Constify telemetry_core_ops structures
fujitsu-laptop:
- Use IS_ENABLED() instead of checking for built-in or module
asus-laptop:
- correct error handling in sysfs_acpi_set
- remove redundant initializers
- correct error handling in asus_read_brightness()
fujitsu-laptop:
- Support radio LED"
* tag 'platform-drivers-x86-v4.7-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
platform/x86: Add PMC Driver for Intel Core SoC
dell-rbtn: Ignore ACPI notifications if device is suspended
thinkpad_acpi: save kbdlight state on suspend and restore it on resume
intel_menlow: reduce code duplication
asus-wmi: provide access to ALS control
ideapad-laptop: add a new WMI string for ESC key
surfacepro3_button: Add a warning when switching to tablet mode
sony-laptop: Avoid oops on module unload for older laptops
intel_telemetry: Constify telemetry_core_ops structures
fujitsu-laptop: Use IS_ENABLED() instead of checking for built-in or module
asus-laptop: correct error handling in sysfs_acpi_set
asus-laptop: remove redundant initializers
asus-laptop: correct error handling in asus_read_brightness()
fujitsu-laptop: Support radio LED
Linus Torvalds [Fri, 27 May 2016 20:49:24 +0000 (13:49 -0700)]
Merge git://git.infradead.org/intel-iommu
Pull intel IOMMU updates from David Woodhouse:
"This patchset improves the scalability of the Intel IOMMU code by
resolving two spinlock bottlenecks and eliminating the linearity of
the IOVA allocator, yielding up to ~5x performance improvement and
approaching 'iommu=off' performance"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: Use per-cpu IOVA caching
iommu/iova: introduce per-cpu caching to iova allocation
iommu/vt-d: change intel-iommu to use IOVA frame numbers
iommu/vt-d: avoid dev iotlb logic for domains with no dev iotlbs
iommu/vt-d: only unmap mapped entries
iommu/vt-d: correct flush_unmaps pfn usage
iommu/vt-d: per-cpu deferred invalidation queues
iommu/vt-d: refactoring of deferred flush entries
Linus Torvalds [Fri, 27 May 2016 20:41:54 +0000 (13:41 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull second batch of KVM updates from Radim Krčmář:
"General:
- move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat
had nothing to do with QEMU in the first place -- the tool only
interprets debugfs)
- expose per-vm statistics in debugfs and support them in kvm_stat
(KVM always collected per-vm statistics, but they were summarised
into global statistics)
x86:
- fix dynamic APICv (VMX was improperly configured and a guest could
access host's APIC MSRs, CVE-2016-4440)
- minor fixes
ARM changes from Christoffer Dall:
- new vgic reimplementation of our horribly broken legacy vgic
implementation. The two implementations will live side-by-side
(with the new being the configured default) for one kernel release
and then we'll remove the legacy one.
- fix for a non-critical issue with virtual abort injection to guests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits)
tools: kvm_stat: Add comments
tools: kvm_stat: Introduce pid monitoring
KVM: Create debugfs dir and stat files for each VM
MAINTAINERS: Add kvm tools
tools: kvm_stat: Powerpc related fixes
tools: Add kvm_stat man page
tools: Add kvm_stat vm monitor script
kvm:vmx: more complete state update on APICv on/off
KVM: SVM: Add more SVM_EXIT_REASONS
KVM: Unify traced vector format
svm: bitwise vs logical op typo
KVM: arm/arm64: vgic-new: Synchronize changes to active state
KVM: arm/arm64: vgic-new: enable build
KVM: arm/arm64: vgic-new: implement mapped IRQ handling
KVM: arm/arm64: vgic-new: Wire up irqfd injection
KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable
KVM: arm/arm64: vgic-new: vgic_init: implement map_resources
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create
KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init
...
Al Viro [Fri, 27 May 2016 14:19:30 +0000 (10:19 -0400)]
switch xattr_handler->set() to passing dentry and inode separately
preparation for similar switch in ->setxattr() (see the next commit for
rationale).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rajneesh Bhardwaj [Thu, 26 May 2016 09:11:19 +0000 (14:41 +0530)]
platform/x86: Add PMC Driver for Intel Core SoC
This patch adds the Power Management Controller driver as a PCI driver
for Intel Core SoC architecture.
This driver can utilize debugging capabilities and supported features
as exposed by the Power Management Controller.
Please refer to the below specification for more details on PMC features.
http://www.intel.in/content/www/in/en/chipsets/100-series-chipset-datasheet-vol-2.html
The current version of this driver exposes SLP_S0_RESIDENCY counter.
This counter can be used for detecting fragile SLP_S0 signal related
failures and take corrective actions when PCH SLP_S0 signal is not
asserted after kernel freeze as part of suspend to idle flow
(echo freeze > /sys/power/state).
Intel Platform Controller Hub (PCH) asserts SLP_S0 signal when it
detects favorable conditions to enter its low power mode. As a
pre-requisite the SoC should be in deepest possible Package C-State
and devices should be in low power mode. For example, on Skylake SoC
the deepest Package C-State is Package C10 or PC10. Suspend to idle
flow generally leads to PC10 state but PC10 state may not be sufficient
for realizing the platform wide power potential which SLP_S0 signal
assertion can provide.
SLP_S0 signal is often connected to the Embedded Controller (EC) and the
Power Management IC (PMIC) for other platform power management related
optimizations.
In general, SLP_S0 assertion == PC10 + PCH low power mode + ModPhy Lanes
power gated + PLL Idle.
As part of this driver, a mechanism to read the SLP_S0_RESIDENCY is exposed
as an API and also debugfs features are added to indicate SLP_S0 signal
assertion residency in microseconds.
echo freeze > /sys/power/state
wake the system
cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com>
Signed-off-by: Vishwanath Somayaji <vishwanath.somayaji@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Gabriele Mazzotta [Tue, 24 May 2016 20:53:08 +0000 (22:53 +0200)]
dell-rbtn: Ignore ACPI notifications if device is suspended
Some BIOSes unconditionally send an ACPI notification to RBTN when the
system is resuming from suspend. This makes dell-rbtn send an input
event to userspace as if a function key was pressed. Prevent this by
ignoring all the notifications received while the device is suspended.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=106031
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Alex Hung <alex.hung@canonical.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Marco Trevisan (Treviño) [Mon, 23 May 2016 22:39:51 +0000 (00:39 +0200)]
thinkpad_acpi: save kbdlight state on suspend and restore it on resume
Override default LED class suspend/resume handles, by keeping track of
the brightness level before suspending so that it can be automatically
restored on resume by calling default resume handler.
Signed-off-by: Marco Trevisan (Treviño) <mail@3v1n0.net>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Rasmus Villemoes [Sat, 26 Mar 2016 21:40:26 +0000 (22:40 +0100)]
intel_menlow: reduce code duplication
aux0_show and aux1_show consists of almost identical code. Pull that
into a common helper and make them thin wrappers. Similarly for
_store.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Oleksij Rempel [Fri, 1 Apr 2016 11:35:21 +0000 (13:35 +0200)]
asus-wmi: provide access to ALS control
Asus Zenbook ux31a is providing ACPI0008 interface for ALS
(Ambient Light Sensor), which is accessible for OS => Win 7.
This sensor can be used with iio/acpi-als driver.
Since it is disabled by default, we should use asus-wmi
interface to enable it.
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Arnd Bergmann [Mon, 9 May 2016 21:49:21 +0000 (23:49 +0200)]
ideapad-laptop: add a new WMI string for ESC key
My patch to the ideapad-laptop driver to get the ESC key working on the
Yoga 1170 (Yoga 3) failed to do the same for the following model, the
Lenovo Yoga 700.
Denis Gordienko managed to get it working by adding another GUID for the
new WMI interface. I have adapted his patch to normal coding style
and simplified it a bit for inclusion, but this patch is currently
untested.
Link: https://forums.lenovo.com/t5/Lenovo-Yoga-Series-Notebooks/YOGA-3-14-How-to-reclaim-my-Esc-key-and-permanently-disable/m-p/3317499
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Denis Gordienko <denis.gordienko.mail@gmail.com>
[dvhart: Whitespace cleanup, static const char *const array declaration]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Andy Shevchenko [Tue, 10 May 2016 11:49:55 +0000 (14:49 +0300)]
surfacepro3_button: Add a warning when switching to tablet mode
Microsoft Surface Book has a tablet mode button. Print another message
once on this event instead of repeating "Unknown event...".
Unfortunately, proper support involves the _DSM method, which is not a
discoverable interface. Just print a warning for now.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Baruch Siach [Wed, 25 May 2016 03:45:10 +0000 (06:45 +0300)]
Documentation: dt: mtd: drop "soft_bch" from example
Commit
32698aafc9f0 (Documentation: devicetree: deprecate "soft_bch"
nand-ecc-mode value, 2016-04-22) deprecated "soft_bch". Update the example to
match.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Vivek Goyal [Fri, 20 May 2016 13:04:26 +0000 (09:04 -0400)]
ovl: Do d_type check only if work dir creation was successful
d_type check requires successful creation of workdir as iterates
through work dir and expects work dir to be present in it. If that's
not the case, this check will always return d_type not supported even
if underlying filesystem might be supporting it.
So don't do this check if work dir creation failed in previous step.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 27 May 2016 06:55:26 +0000 (08:55 +0200)]
ovl: update documentation
Two "fixme" items are actually fixed now.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Antonio Murdaca [Thu, 7 Apr 2016 13:48:25 +0000 (15:48 +0200)]
ovl: override creds with the ones from the superblock mounter
In user namespace the whiteout creation fails with -EPERM because the
current process isn't capable(CAP_SYS_ADMIN) when setting xattr.
A simple reproducer:
$ mkdir upper lower work merged lower/dir
$ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
$ unshare -m -p -f -U -r bash
Now as root in the user namespace:
\# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir
\# rm -fR merged/*
This ends up failing with -EPERM after the files in dir has been
correctly deleted:
unlinkat(4, "2", 0) = 0
unlinkat(4, "1", 0) = 0
unlinkat(4, "3", 0) = 0
close(4) = 0
unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not
permitted)
Interestingly, if you don't place files in merged/dir you can remove it,
meaning if upper/dir does not exist, creating the char device file works
properly in that same location.
This patch uses ovl_sb_creator_cred() to get the cred struct from the
superblock mounter and override the old cred with these new ones so that
the whiteout creation is possible because overlay is wrong in assuming that
the creds it will get with prepare_creds will be in the initial user
namespace. The old cap_raise game is removed in favor of just overriding
the old cred struct.
This patch also drops from ovl_copy_up_one() the following two lines:
override_cred->fsuid = stat->uid;
override_cred->fsgid = stat->gid;
This is because the correct uid and gid are taken directly with the stat
struct and correctly set with ovl_set_attr().
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Dave Airlie [Fri, 27 May 2016 06:08:38 +0000 (16:08 +1000)]
Merge tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel into drm-next
I see the main drm pull got merged, here's the first batch of fixes for
4.7 already. Fixes all around, a large portion cc: stable stuff.
[airlied: the DP++ stuff is a regression fix].
* tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Stop automatically retiring requests after a GPU hang
drm/i915: Unify intel_ring_begin()
drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
drm/i915/psr: Try to program link training times correctly
drm/i915/bxt: Adjusting the error in horizontal timings retrieval
drm/i915: Don't leave old junk in ilk active watermarks on readout
drm/i915: s/DPPL/DPLL/ for SKL DPLLs
drm/i915: Fix gen8 semaphores id for legacy mode
drm/i915: Set crtc_state->lane_count for HDMI
drm/i915/BXT: Retrieving the horizontal timing for DSI
drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
drm/i915: Respect DP++ adaptor TMDS clock limit
drm: Add helper for DP++ adaptors
Dave Airlie [Fri, 27 May 2016 06:03:48 +0000 (16:03 +1000)]
Merge branch 'drm-next-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-next
AMD GPU bugfixes:
- Various powerplay bug fixes
- Add some new polaris pci ids
- misc bug fixes and code cleanups
* 'drm-next-4.7' of git://people.freedesktop.org/~agd5f/linux: (27 commits)
drm/amd: add Kconfig dependency for ACP on DRM_AMDGPU
drm/amdgpu: Fix hdmi deep color support.
drm/amdgpu: fix bug in fence driver fini
drm/amd/powerplay/hwmgr: use kmemdup
drm/amd/powerplay/hwmgr: use kmemdup
drm/amd/powerplay/hwmgr: use kmemdup
drm/amd/powerplay: fix bugs of checking if dpm is running on Tonga
drm/amdgpu: update Polaris11 golden setting
drm/amdgpu: Add more Polaris 11 PCI IDs
drm/amdgpu: update Polaris10 golden setting
drm/amdgpu: add more Polaris10 DID
drm/amd/amdgpu : Remove unused variable
drm/amd/amdgpu : Remove unused variable
drm/amd/amdgpu : Remove unused variable
drm/amd/amdgpu/cz_dpm: Remove unused variable
drm/amd/amdgpu : Remove unused variable
drm/amd/powerplay: use ARRAY_SIZE() to calculate array size.
drm/amdgpu: fix array out of bounds
drm/radeon: fix array out of bounds
drm/amd/powerplay: fix a bug on updating sclk for Tonga
...
Linus Torvalds [Fri, 27 May 2016 05:32:05 +0000 (22:32 -0700)]
Merge branch 'misc' of git://git./linux/kernel/git/mmarek/kbuild
Pull misc kbuild updates from Michal Marek:
"This is the non-critical part of kbuild:
- Coccinelle fixes, one semantic patch less in this round [Vaishali
Thakkar, Wolfram Sang, Kees Cook]
- rpm-pkg support for (open)SUSE's update-bootloader [Jiří Kosian]
- rpm-pkg restored support for $RPMOPTS [Srinivas Pandruvada]
- deb-pkg fixes for the linux-headers package [Bjørn Mork, Azriel
Samson]"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
coccicheck: Fix missing 0 index in kill loop
scripts/package/Makefile: rpmbuild add support of RPMOPTS
builddeb: fix missing headers in linux-headers package
builddeb: include objtool binary in headers package
kbuild/mkspec: support 'update-bootloader'-based systems
scripts: coccinelle: remove check to move constants to right
Coccinelle: setup_timer: Add space in front of parentheses
Linus Torvalds [Fri, 27 May 2016 05:27:09 +0000 (22:27 -0700)]
Merge branch 'kconfig' of git://git./linux/kernel/git/mmarek/kbuild
Pull kconfig update from Michal Marek:
- fix for behavior of tristate choice items and fix for documentation
of existing kconfig behavior [Dirk Gouders]
- more helpful "unexpected data" kconfig warning [Paul Bolle]
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kconfig/symbol.c: handle choice_values that depend on 'm' symbols
kconfig-language: elaborate on the type of a choice
kconfig-language: fix comment on dependency-generated menu structures.
kconfig: add unexpected data itself to warning
Linus Torvalds [Fri, 27 May 2016 05:01:22 +0000 (22:01 -0700)]
Merge branch 'kbuild' of git://git./linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:
- new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
unexports symbols which are not used in the current config [Nicolas
Pitre]
- several kbuild rule cleanups [Masahiro Yamada]
- warning option adjustments for gcov etc [Arnd Bergmann]
- a few more small fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
kbuild: move -Wunused-const-variable to W=1 warning level
kbuild: fix if_change and friends to consider argument order
kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
gcov: disable -Wmaybe-uninitialized warning
gcov: disable tree-loop-im to reduce stack usage
gcov: disable for COMPILE_TEST
Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
kbuild: forbid kernel directory to contain spaces and colons
kbuild: adjust ksym_dep_filter for some cmd_* renames
kbuild: Fix dependencies for final vmlinux link
kbuild: better abstract vmlinux sequential prerequisites
kbuild: fix call to adjust_autoksyms.sh when output directory specified
kbuild: Get rid of KBUILD_STR
kbuild: rename cmd_as_s_S to cmd_cpp_s_S
kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
kbuild: drop redundant "PHONY += FORCE"
kbuild: delete unnecessary "@:"
kbuild: mark help target as PHONY
...
Linus Torvalds [Fri, 27 May 2016 04:32:40 +0000 (21:32 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4
update "mm/zsmalloc: don't fail if can't create debugfs info"
dma-debug: avoid spinlock recursion when disabling dma-debug
mm: oom_reaper: remove some bloat
memcg: fix mem_cgroup_out_of_memory() return value.
ocfs2: fix improper handling of return errno
mm: slub: remove unused virt_to_obj()
mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly
seqlock: fix raw_read_seqcount_latch()
Linus Torvalds [Fri, 27 May 2016 03:00:28 +0000 (20:00 -0700)]
Merge tag 'dax-locking-for-4.7' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull DAX locking updates from Ross Zwisler:
"Filesystem DAX locking for 4.7
- We use a bit in an exceptional radix tree entry as a lock bit and
use it similarly to how page lock is used for normal faults. This
fixes races between hole instantiation and read faults of the same
index.
- Filesystem DAX PMD faults are disabled, and will be re-enabled when
PMD locking is implemented"
* tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Remove i_mmap_lock protection
dax: Use radix tree entry lock to protect cow faults
dax: New fault locking
dax: Allow DAX code to replace exceptional entries
dax: Define DAX lock bit for radix tree exceptional entry
dax: Make huge page handling depend of CONFIG_BROKEN
dax: Fix condition for filling of PMD holes
Linus Torvalds [Fri, 27 May 2016 02:34:26 +0000 (19:34 -0700)]
Merge tag 'dax-misc-for-4.7' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull misc DAX updates from Vishal Verma:
"DAX error handling for 4.7
- Until now, dax has been disabled if media errors were found on any
device. This enables the use of DAX in the presence of these
errors by making all sector-aligned zeroing go through the driver.
- The driver (already) has the ability to clear errors on writes that
are sent through the block layer using 'DSMs' defined in ACPI 6.1.
Other misc changes:
- When mounting DAX filesystems, check to make sure the partition is
page aligned. This is a requirement for DAX, and previously, we
allowed such unaligned mounts to succeed, but subsequent
reads/writes would fail.
- Misc/cleanup fixes from Jan that remove unused code from DAX
related to zeroing, writeback, and some size checks"
* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix a comment in dax_zero_page_range and dax_truncate_page
dax: for truncate/hole-punch, do zeroing through the driver if possible
dax: export a low-level __dax_zero_page_range helper
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: enable dax in the presence of known media errors (badblocks)
dax: fallback from pmd to pte on error
block: Update blkdev_dax_capable() for consistency
xfs: Add alignment check for DAX mount
ext2: Add alignment check for DAX mount
ext4: Add alignment check for DAX mount
block: Add bdev_dax_supported() for dax mount checks
block: Add vfs_msg() interface
dax: Remove redundant inode size checks
dax: Remove pointless writeback from dax_do_io()
dax: Remove zeroing from dax_io()
dax: Remove dead zeroing code from fault handlers
ext2: Avoid DAX zeroing to corrupt data
ext2: Fix block zeroing in ext2_get_blocks() for DAX
dax: Remove complete_unwritten argument
DAX: move RADIX_DAX_ definitions to dax.c
Andrew Morton [Thu, 26 May 2016 22:16:30 +0000 (15:16 -0700)]
drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4
gcc-4.4 and thereabouts has issues with initializers of anonymous
unions, and it generates the following warnings:
drivers/pinctrl/intel/pinctrl-baytrail.c:413: error: unknown field 'simple_funcs' specified in initializer
drivers/pinctrl/intel/pinctrl-baytrail.c:413: warning: missing braces around initializer
drivers/pinctrl/intel/pinctrl-baytrail.c:413: warning: (near initialization for 'byt_score_groups[0].<anonymous>')
drivers/pinctrl/intel/pinctrl-baytrail.c:415: error: unknown field 'simple_funcs' specified in initializer
drivers/pinctrl/intel/pinctrl-baytrail.c:417: error: unknown field 'simple_funcs' specified in initializer
...
Work around this.
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Streetman [Thu, 26 May 2016 22:16:27 +0000 (15:16 -0700)]
update "mm/zsmalloc: don't fail if can't create debugfs info"
Some updates to commit
d34f615720d1 ("mm/zsmalloc: don't fail if can't
create debugfs info"):
- add pr_warn to all stat failure cases
- do not prevent module loading on stat failure
Link: http://lkml.kernel.org/r/1463671123-5479-1-git-send-email-ddstreet@ieee.org
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reviewed-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjälä [Thu, 26 May 2016 22:16:25 +0000 (15:16 -0700)]
dma-debug: avoid spinlock recursion when disabling dma-debug
With netconsole (at least) the pr_err("... disablingn") call can
recurse back into the dma-debug code, where it'll try to grab
free_entries_lock again. Avoid the problem by doing the printk after
dropping the lock.
Link: http://lkml.kernel.org/r/1463678421-18683-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 26 May 2016 22:16:22 +0000 (15:16 -0700)]
mm: oom_reaper: remove some bloat
mmput_async is currently used only from the oom_reaper which is defined
only for CONFIG_MMU. We can save work_struct in mm_struct for
!CONFIG_MMU.
[akpm@linux-foundation.org: fix typo, per Minchan]
Link: http://lkml.kernel.org/r/20160520061658.GB19172@dhcp22.suse.cz
Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Thu, 26 May 2016 22:16:19 +0000 (15:16 -0700)]
memcg: fix mem_cgroup_out_of_memory() return value.
mem_cgroup_out_of_memory() is returning "true" if it finds a TIF_MEMDIE
task after an eligible task was found, "false" if it found a TIF_MEMDIE
task before an eligible task is found.
This difference confuses memory_max_write() which checks the return
value of mem_cgroup_out_of_memory(). Since memory_max_write() wants to
continue looping, mem_cgroup_out_of_memory() should return "true" in
this case.
This patch sets a dummy pointer in order to return "true".
Link: http://lkml.kernel.org/r/1463753327-5170-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Ren [Thu, 26 May 2016 22:16:16 +0000 (15:16 -0700)]
ocfs2: fix improper handling of return errno
Previously, if a bad inode was found in ocfs2_iget(), -ESTALE was
returned back to the caller anyway. Since commit
d2b9d71a2da7 ("ocfs2:
check/fix inode block for online file check") can handle with return
value from ocfs2_read_locked_inode() now, we know the exact errno
returned for us.
Link: http://lkml.kernel.org/r/1463970656-18413-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Thu, 26 May 2016 22:16:14 +0000 (15:16 -0700)]
mm: slub: remove unused virt_to_obj()
It's unused since commit
7ed2f9e66385 ("mm, kasan: SLAB support")
Link: http://lkml.kernel.org/r/1464020961-2242-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Thu, 26 May 2016 22:16:11 +0000 (15:16 -0700)]
mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
Commit
cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable
stackdepot for SLAB") added 'reserved' field, but never used it.
Link: http://lkml.kernel.org/r/1464021054-2307-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Thu, 26 May 2016 22:16:08 +0000 (15:16 -0700)]
mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly
Per the suggestion from Michal Hocko [1], DEFERRED_STRUCT_PAGE_INIT
requires some ordering wrt other initialization operations, e.g.
page_ext_init has to happen after the whole memmap is initialized
properly.
For SPARSEMEM this requires to wait for page_alloc_init_late. Other
memory models (e.g. flatmem) might have different initialization
layouts (page_ext_init_flatmem). Currently DEFERRED_STRUCT_PAGE_INIT
depends on MEMORY_HOTPLUG which in turn
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on ARCH_ENABLE_MEMORY_HOTPLUG
and X86_64_ACPI_NUMA depends on NUMA which in turn disable FLATMEM
memory model:
config ARCH_FLATMEM_ENABLE
def_bool y
depends on X86_32 && !NUMA
so FLATMEM is ruled out via dependency maze. Be explicit and disable
FLATMEM for DEFERRED_STRUCT_PAGE_INIT so that we do not reintroduce
subtle initialization bugs
[1] http://lkml.kernel.org/r/
20160523073157.GD2278@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1464027356-32282-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 26 May 2016 22:16:06 +0000 (15:16 -0700)]
seqlock: fix raw_read_seqcount_latch()
lockless_dereference() is supposed to take pointer not integer.
Link: http://lkml.kernel.org/r/20160521201448.GA7429@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 26 May 2016 21:10:32 +0000 (14:10 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
"This changeset has a few main parts:
- Ilya has finished a huge refactoring effort to sync up the
client-side logic in libceph with the user-space client code, which
has evolved significantly over the last couple years, with lots of
additional behaviors (e.g., how requests are handled when cluster
is full and transitions from full to non-full).
This structure of the code is more closely aligned with userspace
now such that it will be much easier to maintain going forward when
behavior changes take place. There are some locking improvements
bundled in as well.
- Zheng adds multi-filesystem support (multiple namespaces within the
same Ceph cluster)
- Zheng has changed the readdir offsets and directory enumeration so
that dentry offsets are hash-based and therefore stable across
directory fragmentation events on the MDS.
- Zheng has a smorgasbord of bug fixes across fs/ceph"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
ceph: fix wake_up_session_cb()
ceph: don't use truncate_pagecache() to invalidate read cache
ceph: SetPageError() for writeback pages if writepages fails
ceph: handle interrupted ceph_writepage()
ceph: make ceph_update_writeable_page() uninterruptible
libceph: make ceph_osdc_wait_request() uninterruptible
ceph: handle -EAGAIN returned by ceph_update_writeable_page()
ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
ceph: block non-fatal signals for fault/page_mkwrite
ceph: make logical calculation functions return bool
ceph: tolerate bad i_size for symlink inode
ceph: improve fragtree change detection
ceph: keep leaf frag when updating fragtree
ceph: fix dir_auth check in ceph_fill_dirfrag()
ceph: don't assume frag tree splits in mds reply are sorted
ceph: fix inode reference leak
ceph: using hash value to compose dentry offset
ceph: don't forbid marking directory complete after forward seek
ceph: record 'offset' for each entry of readdir result
ceph: define 'end/complete' in readdir reply as bit flags
...
Chris Mason [Mon, 16 May 2016 16:21:01 +0000 (09:21 -0700)]
Btrfs: fix handling of faults from btrfs_copy_from_user
When btrfs_copy_from_user isn't able to copy all of the pages, we need
to adjust our accounting to reflect the work that was actually done.
Commit
2e78c927d79 changed around the decisions a little and we ended up
skipping the accounting adjustments some of the time. This commit makes
sure that when we don't copy anything at all, we still hop into
the adjustments, and switches to release_bytes instead of write_bytes,
since write_bytes isn't aligned.
The accounting errors led to warnings during btrfs_destroy_inode:
[ 70.847532] WARNING: CPU: 10 PID: 514 at fs/btrfs/inode.c:9350 btrfs_destroy_inode+0x2b3/0x2c0
[ 70.847536] Modules linked in: i2c_piix4 virtio_net i2c_core input_leds button led_class serio_raw acpi_cpufreq sch_fq_codel autofs4 virtio_blk
[ 70.847538] CPU: 10 PID: 514 Comm: umount Tainted: G W 4.6.0-rc6_00062_g2997da1-dirty #23
[ 70.847539] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
[ 70.847542]
0000000000000000 ffff880ff5cafab8 ffffffff8149d5e9 0000000000000202
[ 70.847543]
0000000000000000 0000000000000000 0000000000000000 ffff880ff5cafb08
[ 70.847547]
ffffffff8107bdfd ffff880ff5cafaf8 000024868120013d ffff880ff5cafb28
[ 70.847547] Call Trace:
[ 70.847550] [<
ffffffff8149d5e9>] dump_stack+0x51/0x78
[ 70.847551] [<
ffffffff8107bdfd>] __warn+0xfd/0x120
[ 70.847553] [<
ffffffff8107be3d>] warn_slowpath_null+0x1d/0x20
[ 70.847555] [<
ffffffff8139c9e3>] btrfs_destroy_inode+0x2b3/0x2c0
[ 70.847556] [<
ffffffff812003a1>] ? __destroy_inode+0x71/0x140
[ 70.847558] [<
ffffffff812004b3>] destroy_inode+0x43/0x70
[ 70.847559] [<
ffffffff810b7b5f>] ? wake_up_bit+0x2f/0x40
[ 70.847560] [<
ffffffff81200c68>] evict+0x148/0x1d0
[ 70.847562] [<
ffffffff81398ade>] ? start_transaction+0x3de/0x460
[ 70.847564] [<
ffffffff81200d49>] dispose_list+0x59/0x80
[ 70.847565] [<
ffffffff81201ba0>] evict_inodes+0x180/0x190
[ 70.847566] [<
ffffffff812191ff>] ? __sync_filesystem+0x3f/0x50
[ 70.847568] [<
ffffffff811e95f8>] generic_shutdown_super+0x48/0x100
[ 70.847569] [<
ffffffff810b75c0>] ? woken_wake_function+0x20/0x20
[ 70.847571] [<
ffffffff811e9796>] kill_anon_super+0x16/0x30
[ 70.847573] [<
ffffffff81365cde>] btrfs_kill_super+0x1e/0x130
[ 70.847574] [<
ffffffff811e99be>] deactivate_locked_super+0x4e/0x90
[ 70.847576] [<
ffffffff811e9e61>] deactivate_super+0x51/0x70
[ 70.847577] [<
ffffffff8120536f>] cleanup_mnt+0x3f/0x80
[ 70.847579] [<
ffffffff81205402>] __cleanup_mnt+0x12/0x20
[ 70.847581] [<
ffffffff81098358>] task_work_run+0x68/0xa0
[ 70.847582] [<
ffffffff810022b6>] exit_to_usermode_loop+0xd6/0xe0
[ 70.847583] [<
ffffffff81002e1d>] do_syscall_64+0xbd/0x170
[ 70.847586] [<
ffffffff817d4dbc>] entry_SYSCALL64_slow_path+0x25/0x25
This is the test program I used to force short returns from
btrfs_copy_from_user
void *dontneed(void *arg)
{
char *p = arg;
int ret;
while(1) {
ret = madvise(p, BUFSIZE/4, MADV_DONTNEED);
if (ret) {
perror("madvise");
exit(1);
}
}
}
int main(int ac, char **av) {
int ret;
int fd;
char *filename;
unsigned long offset;
char *buf;
int i;
pthread_t tid;
if (ac != 2) {
fprintf(stderr, "usage: dammitdave filename\n");
exit(1);
}
buf = mmap(NULL, BUFSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (buf == MAP_FAILED) {
perror("mmap");
exit(1);
}
memset(buf, 'a', BUFSIZE);
filename = av[1];
ret = pthread_create(&tid, NULL, dontneed, buf);
if (ret) {
fprintf(stderr, "error %d from pthread_create\n", ret);
exit(1);
}
ret = pthread_detach(tid);
if (ret) {
fprintf(stderr, "pthread detach failed %d\n", ret);
exit(1);
}
while (1) {
fd = open(filename, O_RDWR | O_CREAT, 0600);
if (fd < 0) {
perror("open");
exit(1);
}
for (i = 0; i < ROUNDS; i++) {
int this_write = BUFSIZE;
offset = rand() % MAXSIZE;
ret = pwrite(fd, buf, this_write, offset);
if (ret < 0) {
perror("pwrite");
exit(1);
} else if (ret != this_write) {
fprintf(stderr, "short write to %s offset %lu ret %d\n",
filename, offset, ret);
exit(1);
}
if (i == ROUNDS - 1) {
ret = sync_file_range(fd, offset, 4096,
SYNC_FILE_RANGE_WRITE);
if (ret < 0) {
perror("sync_file_range");
exit(1);
}
}
}
ret = ftruncate(fd, 0);
if (ret < 0) {
perror("ftruncate");
exit(1);
}
ret = close(fd);
if (ret) {
perror("close");
exit(1);
}
ret = unlink(filename);
if (ret) {
perror("unlink");
exit(1);
}
}
return 0;
}
Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Dave Jones <dsj@fb.com>
Fixes:
2e78c927d79333f299a8ac81c2fd2952caeef335
cc: stable@vger.kernel.org # v4.6
Signed-off-by: Chris Mason <clm@fb.com>
Chris Mason [Thu, 26 May 2016 19:49:21 +0000 (12:49 -0700)]
Merge branch 'for-chris-4.7' of git://git./linux/kernel/git/kdave/linux into for-linus-4.7
Linus Torvalds [Thu, 26 May 2016 17:33:33 +0000 (10:33 -0700)]
Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Features:
- Add support for the NFS v4.2 COPY operation
- Add support for NFS/RDMA over IPv6
Bugfixes and cleanups:
- Avoid race that crashes nfs_init_commit()
- Fix oops in callback path
- Fix LOCK/OPEN race when unlinking an open file
- Choose correct stateids when using delegations in setattr, read and
write
- Don't send empty SETATTR after OPEN_CREATE
- xprtrdma: Prevent server from writing a reply into memory client
has released
- xprtrdma: Support using Read list and Reply chunk in one RPC call"
* tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits)
pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
nfs: avoid race that crashes nfs_init_commit
NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
pnfs: make pnfs_layout_process more robust
pnfs: rework LAYOUTGET retry handling
pnfs: lift retry logic from send_layoutget to pnfs_update_layout
pnfs: fix bad error handling in send_layoutget
flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
pnfs: keep track of the return sequence number in pnfs_layout_hdr
pnfs: record sequence in pnfs_layout_segment when it's created
pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes
pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io
pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
NFS: Reclaim writes via writepage are opportunistic
NFSv4: Use the right stateid for delegations in setattr, read and write
...
Linus Torvalds [Thu, 26 May 2016 17:13:40 +0000 (10:13 -0700)]
Merge tag 'xfs-for-linus-4.7-rc1' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs updates from Dave Chinner:
"A pretty average collection of fixes, cleanups and improvements in
this request.
Summary:
- fixes for mount line parsing, sparse warnings, read-only compat
feature remount behaviour
- allow fast path symlink lookups for inline symlinks.
- attribute listing cleanups
- writeback goes direct to bios rather than indirecting through
bufferheads
- transaction allocation cleanup
- optimised kmem_realloc
- added configurable error handling for metadata write errors,
changed default error handling behaviour from "retry forever" to
"retry until unmount then fail"
- fixed several inode cluster writeback lookup vs reclaim race
conditions
- fixed inode cluster writeback checking wrong inode after lookup
- fixed bugs where struct xfs_inode freeing wasn't actually RCU safe
- cleaned up inode reclaim tagging"
* tag 'xfs-for-linus-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits)
xfs: fix warning in xfs_finish_page_writeback for non-debug builds
xfs: move reclaim tagging functions
xfs: simplify inode reclaim tagging interfaces
xfs: rename variables in xfs_iflush_cluster for clarity
xfs: xfs_iflush_cluster has range issues
xfs: mark reclaimed inodes invalid earlier
xfs: xfs_inode_free() isn't RCU safe
xfs: optimise xfs_iext_destroy
xfs: skip stale inodes in xfs_iflush_cluster
xfs: fix inode validity check in xfs_iflush_cluster
xfs: xfs_iflush_cluster fails to abort on error
xfs: remove xfs_fs_evict_inode()
xfs: add "fail at unmount" error handling configuration
xfs: add configuration handlers for specific errors
xfs: add configuration of error failure speed
xfs: introduce table-based init for error behaviors
xfs: add configurable error support to metadata buffers
xfs: introduce metadata IO error class
xfs: configurable error behavior via sysfs
xfs: buffer ->bi_end_io function requires irq-safe lock
...
Linus Torvalds [Thu, 26 May 2016 16:48:23 +0000 (09:48 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull hwmon fixlets from Jean Delvare.
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
Documentation/hwmon: Update links in max34440
hwmon: (emc2103) Fix typo in MODULE_PARM_DESC
Linus Torvalds [Thu, 26 May 2016 16:36:10 +0000 (09:36 -0700)]
Merge tag 'mmc-v4.7-rc1' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are some mmc fixes intended for v4.7 rc1. They are based on a
commit earlier in the merge window and have been tested in linux-next
for a while.
MMC core:
- Prevent re-tuning while serving requests for RPMB partitions
- Extend timeout for long read time quirk to support more eMMCs
MMC host:
- sdhci-acpi: Ensure connected devices are powered when probing
- sdhci-pci|acpi: Remove unreliable MMC_CAP_BUS_WIDTH_TEST for Intel HWs
- dw_mmc: Correct the assigning of max_blk_size
- dw_mmc-rockchip: Allow RPMB partitions to be created
- dw_mmc-rockchip: Set the drive phase properly"
* tag 'mmc-v4.7-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: sdhci-acpi: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
mmc: sdhci-pci: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
mmc: longer timeout for long read time quirk
mmc: dw_mmc: rockchip: Set the drive phase properly
mmc: dw_mmc: fix the wrong max_blk_size
mmc: dw_mmc-rockchip: add MMC_CAP_CMD23 capabilities
mmc: sdhci-acpi: Ensure connected devices are powered when probing
ACPI / PM: Export acpi_device_fix_up_power()
mmc: block: Pause re-tuning while switched to the RPMB partition
mmc: block: Always switch back to main area after RPMB access
mmc: core: Add a facility to "pause" re-tuning
Linus Torvalds [Thu, 26 May 2016 16:23:43 +0000 (09:23 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:
- Introduce generic ADC thermal driver, based on OF thermal (Laxman
Dewangan)
- Introduce new thermal driver for Tango chips (Marc Gonzalez)
- Rockchip driver support for RK3399, RK3366, and some fixes (Caesar
Wang, Elaine Zhang and Shawn Lin)
- Add CPU power cooling model to Mediatek thermal driver (Dawei Chien)
- Wider usage of dev_thermal_zone_of_sensor_register (Eduardo Valentin)
- TI thermal driver gained a new maintainer (Keerthy).
- Enabled powerclamp driver by checking CPU feature and package cstate
counter instead of CPU whitelist (Jacob Pan)
- Various fixes on thermal governor, OF thermal, Tegra, and RCAR
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (50 commits)
thermal: tango: initialize TEMPSI_CFG
thermal: rockchip: use the usleep_range instead of udelay
thermal: rockchip: add the notes for better reading
thermal: rockchip: Support RK3366 SoCs in the thermal driver
thermal: rockchip: handle the power sequence for tsadc controller
thermal: rockchip: update the tsadc table for rk3399
thermal: rockchip: fixes the code_to_temp for tsadc driver
thermal: rockchip: disable thermal->clk in err case
thermal: tegra: add Tegra132 specific SOC_THERM driver
thermal: fix ptr_ret.cocci warnings
thermal: mediatek: Add cpu dynamic power cooling model.
thermal: generic-adc: Add ADC based thermal sensor driver
thermal: generic-adc: Add DT binding for ADC based thermal sensor
thermal: tegra: fix static checker warning
thermal: tegra: mark PM functions __maybe_unused
thermal: add temperature sensor support for tango SoC
thermal: hisilicon: fix IRQ imbalance enabling
thermal: hisilicon: support to use any sensor
thermal: rcar: Remove binding docs for r8a7794
thermal: tegra: add PM support
...
Linus Torvalds [Thu, 26 May 2016 16:15:19 +0000 (09:15 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull Yama locking fix from James Morris:
"Fix for the Yama LSM"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Yama: fix double-spinlock and user access in atomic context
Ming Lin [Thu, 26 May 2016 06:23:27 +0000 (23:23 -0700)]
blk-mq: clear q->mq_ops if init fail
blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
was not cleared when blk_mq_init_allocated_queue() fails.
Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because:
- q->all_q_node is not added to all_q_list yet
- q->tag_set is NULL
- hctx was not setup yet or already freed
Fixed it by clearing q->mq_ops on error path.
Signed-off-by: Ming Lin <ming.l@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tom Haynes [Wed, 25 May 2016 14:31:14 +0000 (07:31 -0700)]
pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically
support enforcing that a IOMODE_RW segment will not allow READ I/O.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Tom Haynes [Wed, 25 May 2016 14:31:13 +0000 (07:31 -0700)]
nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Glenn Dayton [Thu, 26 May 2016 09:06:53 +0000 (11:06 +0200)]
Documentation/hwmon: Update links in max34440
It appears the website for maxim-ic.com changed to
maximintegrated.com.
Signed-off-by: Glenn Dayton <glenn.dayton24@gmail.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Dan Carpenter [Thu, 26 May 2016 09:06:53 +0000 (11:06 +0200)]
hwmon: (emc2103) Fix typo in MODULE_PARM_DESC
"apd" was intended here instead of "init".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Al Viro [Thu, 26 May 2016 04:05:12 +0000 (00:05 -0400)]
restore killability of old mutex_lock_killable(&inode->i_mutex) users
The ones that are taking it exclusive, that is...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 26 May 2016 04:04:58 +0000 (00:04 -0400)]
add down_write_killable_nested()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 26 May 2016 04:04:18 +0000 (00:04 -0400)]
update D/f/directory-locking
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Wenyou Yang [Mon, 9 May 2016 06:51:18 +0000 (14:51 +0800)]
Revert "mtd: atmel_nand: Support variable RB_EDGE interrupts"
This reverts commit
5ddc7bd43ccc ("mtd: atmel_nand: Support variable
RB_EDGE interrupts")
Because for current SoCs, the RB_EDGE3(i.e. bit 27) of HSMC_SR
register does not exist, the RB_EDGE0 (i.e. bit 24) is the ready/busy
line edge status bit. It is a datasheet bug.
Cc: <stable@vger.kernel.org>
Fixes: commit
5ddc7bd43ccc ("mtd: atmel_nand: Support variable RB_EDGE interrupts")
Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Linus Torvalds [Thu, 26 May 2016 00:37:33 +0000 (17:37 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: EFI, entry code, pkeys and MPX fixes, TASK_SIZE cleanups
and a tsc frequency table fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Switch from TASK_SIZE to TASK_SIZE_MAX in the page fault code
x86/fsgsbase/64: Use TASK_SIZE_MAX for FSBASE/GSBASE upper limits
x86/mm/mpx: Work around MPX erratum SKD046
x86/entry/64: Fix stack return address retrieval in thunk
x86/efi: Fix 7-parameter efi_call()s
x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
x86/tsc: Add missing Cherrytrail frequency to the table
Linus Torvalds [Thu, 26 May 2016 00:11:43 +0000 (17:11 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes: one for a lost wakeup, the other to fix the compiler
optimizing out preempt operations on ARM64 (and possibly other non-x86
architectures)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix remote wakeups
sched/preempt: Fix preempt_count manipulations
Linus Torvalds [Thu, 26 May 2016 00:05:40 +0000 (17:05 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Mostly tooling and PMU driver fixes, but also a number of late updates
such as the reworking of the call-chain size limiting logic to make
call-graph recording more robust, plus tooling side changes for the
new 'backwards ring-buffer' extension to the perf ring-buffer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
perf record: Read from backward ring buffer
perf record: Rename variable to make code clear
perf record: Prevent reading invalid data in record__mmap_read
perf evlist: Add API to pause/resume
perf trace: Use the ptr->name beautifier as default for "filename" args
perf trace: Use the fd->name beautifier as default for "fd" args
perf report: Add srcline_from/to branch sort keys
perf evsel: Record fd into perf_mmap
perf evsel: Add overwrite attribute and check write_backward
perf tools: Set buildid dir under symfs when --symfs is provided
perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
perf annotate: Sort list of recognised instructions
perf annotate: Fix identification of ARM blt and bls instructions
perf tools: Fix usage of max_stack sysctl
perf callchain: Stop validating callchains by the max_stack sysctl
perf trace: Fix exit_group() formatting
perf top: Use machine->kptr_restrict_warned
perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
perf machine: Do not bail out if not managing to read ref reloc symbol
perf/x86/intel/p4: Trival indentation fix, remove space
...
Jann Horn [Sun, 22 May 2016 04:01:34 +0000 (06:01 +0200)]
Yama: fix double-spinlock and user access in atomic context
Commit
8a56038c2aef ("Yama: consolidate error reporting") causes lockups
when someone hits a Yama denial. Call chain:
process_vm_readv -> process_vm_rw -> process_vm_rw_core -> mm_access
-> ptrace_may_access
task_lock(...) is taken
__ptrace_may_access -> security_ptrace_access_check
-> yama_ptrace_access_check -> report_access -> kstrdup_quotable_cmdline
-> get_cmdline -> access_process_vm -> get_task_mm
task_lock(...) is taken again
task_lock(p) just calls spin_lock(&p->alloc_lock), so at this point,
spin_lock() is called on a lock that is already held by the current
process.
Also: Since the alloc_lock is a spinlock, sleeping inside
security_ptrace_access_check hooks is probably not allowed at all? So it's
not even possible to print the cmdline from in there because that might
involve paging in userspace memory.
It would be tempting to rewrite ptrace_may_access() to drop the alloc_lock
before calling the LSM, but even then, ptrace_may_access() itself might be
called from various contexts in which you're not allowed to sleep; for
example, as far as I understand, to be able to hold a reference to another
task, usually an RCU read lock will be taken (see e.g. kcmp() and
get_robust_list()), so that also prohibits sleeping. (And using e.g. FUSE,
a user can cause pagefault handling to take arbitrary amounts of time -
see https://bugs.chromium.org/p/project-zero/issues/detail?id=808.)
Therefore, AFAIK, in order to print the name of a process below
security_ptrace_access_check(), you'd have to either grab a reference to
the mm_struct and defer the access violation reporting or just use the
"comm" value that's stored in kernelspace and accessible without big
complications. (Or you could try to use some kind of atomic remote VM
access that fails if the memory isn't paged in, similar to
copy_from_user_inatomic(), and if necessary fall back to comm, but
that'd be kind of ugly because the comm/cmdline choice would look
pretty random to the user.)
Fix it by deferring reporting of the access violation until current
exits kernelspace the next time.
v2: Don't oops on PTRACE_TRACEME, call report_access under
task_lock(current). Also fix nonsensical comment. And don't use
GPF_ATOMIC for memory allocation with no locks held.
This patch is tested both for ptrace attach and ptrace traceme.
Fixes:
8a56038c2aef ("Yama: consolidate error reporting")
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Wed, 25 May 2016 23:52:19 +0000 (16:52 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull objtool build fix from Ingo Molnar:
"An libtool fix for older libelf versions"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Allow building with older libelf
Yan, Zheng [Thu, 19 May 2016 11:15:19 +0000 (19:15 +0800)]
ceph: fix wake_up_session_cb()
We should reset i_requested_max_size before waking the waiters.
(zero i_requested_max_size make waiter re-request the max size)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Wed, 18 May 2016 12:58:26 +0000 (20:58 +0800)]
ceph: don't use truncate_pagecache() to invalidate read cache
truncate_pagecache() drops dirty pages, it's dangerous to use it
to invalidate read cache. Besides, we shouldn't start invalidating
read cache while there are buffer writers. Because buffer writers
may add dirty pages later.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Fri, 13 May 2016 09:54:17 +0000 (17:54 +0800)]
ceph: SetPageError() for writeback pages if writepages fails
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Fri, 13 May 2016 09:29:51 +0000 (17:29 +0800)]
ceph: handle interrupted ceph_writepage()
writepage() can be interrupted when it's called by direct memory
reclaimer (the direct memory relaimer is killed). To avoid lossing
data, we redirty the page.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Fri, 13 May 2016 03:30:24 +0000 (11:30 +0800)]
ceph: make ceph_update_writeable_page() uninterruptible
ceph_update_writeable_page() is used by ceph_write_begin(). It beaks
atomicity of write operation if it's interruptible.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Fri, 13 May 2016 03:04:33 +0000 (11:04 +0800)]
libceph: make ceph_osdc_wait_request() uninterruptible
Ceph_osdc_wait_request() is used when cephfs issues sync IO. In most
cases, the sync IO should be uninterruptible. The fix is use killale
wait function in ceph_osdc_wait_request().
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Tue, 10 May 2016 11:09:06 +0000 (19:09 +0800)]
ceph: handle -EAGAIN returned by ceph_update_writeable_page()
when ceph_update_writeable_page() return -EAGAIN, caller should
lock the page and call ceph_update_writeable_page() again.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Tue, 10 May 2016 10:59:13 +0000 (18:59 +0800)]
ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Tue, 10 May 2016 10:40:28 +0000 (18:40 +0800)]
ceph: block non-fatal signals for fault/page_mkwrite
Fault and page_mkwrite are supposed to be uninterruptable. But they
call ceph functions that are interruptible. So they should block
signals before calling functions that are interruptible
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Zhang Zhuoyu [Fri, 25 Mar 2016 09:18:39 +0000 (05:18 -0400)]
ceph: make logical calculation functions return bool
This patch makes serverl logical caculation functions return bool to
improve readability due to these particular functions only using 0/1
as their return value.
No functional change.
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
Yan, Zheng [Thu, 5 May 2016 08:40:17 +0000 (16:40 +0800)]
ceph: tolerate bad i_size for symlink inode
A mds bug can cause symlink's size to be truncated to zero.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Wed, 4 May 2016 03:40:30 +0000 (11:40 +0800)]
ceph: improve fragtree change detection
check if number of splits in i_fragtree is equal to number of splits
in mds reply
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Yan, Zheng [Wed, 4 May 2016 03:05:10 +0000 (11:05 +0800)]
ceph: keep leaf frag when updating fragtree
Nodes in i_fragtree are sorted according to ceph_compare_frag().
It means frag node in i_fragtree always follow its direct parent
node. To check if a leaf node is valid, we just need to check if
it's child of previous split node.
Signed-off-by: Yan, Zheng <zyan@redhat.com>