Jeff Layton [Tue, 11 Dec 2012 17:10:13 +0000 (12:10 -0500)]
vfs: make fchmodat retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 22:08:32 +0000 (17:08 -0500)]
vfs: have chroot retry once on ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:12 +0000 (12:10 -0500)]
vfs: have chdir retry lookup and call once on ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:11 +0000 (12:10 -0500)]
vfs: have faccessat retry once on an ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:11 +0000 (12:10 -0500)]
vfs: have do_sys_truncate retry once on an ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:10 +0000 (12:10 -0500)]
vfs: fix renameat to retry on ESTALE errors
...as always, rename is the messiest of the bunch. We have to track
whether to retry or not via a separate flag since the error handling
is already quite complex.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:38:04 +0000 (16:38 -0500)]
vfs: make do_unlinkat retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:28:33 +0000 (16:28 -0500)]
vfs: make do_rmdir retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:09 +0000 (12:10 -0500)]
vfs: add a flags argument to user_path_parent
...so we can pass in LOOKUP_REVAL. For now, nothing does.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:15:38 +0000 (16:15 -0500)]
vfs: fix linkat to retry once on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:08 +0000 (12:10 -0500)]
vfs: fix symlinkat to retry on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:04:09 +0000 (16:04 -0500)]
vfs: fix mkdirat to retry once on an ESTALE error
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 21:00:10 +0000 (16:00 -0500)]
vfs: fix mknodat to retry on ESTALE errors
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:06 +0000 (12:10 -0500)]
vfs: turn is_dir argument to kern_path_create into a lookup_flags arg
Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags
passed in here are currently ignored.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:06 +0000 (12:10 -0500)]
vfs: fix readlinkat to retry on ESTALE
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 17:10:05 +0000 (12:10 -0500)]
vfs: make fstatat retry on ESTALE errors from getattr call
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Thu, 20 Dec 2012 19:59:40 +0000 (14:59 -0500)]
vfs: add a retry_estale helper function to handle retries on ESTALE
This function is expected to be called from path-based syscalls to help
them decide whether to try the lookup and call again in the event that
they got an -ESTALE return back on an earlier try.
Currently, we only retry the call once on an ESTALE error, but in the
event that we decide that that's not enough in the future, we should be
able to change the logic in this helper without too much effort.
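A minimal userspace sketch of the retry-once idea described above, not the code from this patch. The names prefixed with sketch_ and the flag value are assumptions made for illustration; the only thing taken from the text is that a call is repeated once, with revalidation forced, after an -ESTALE return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's LOOKUP_REVAL lookup flag. */
#define SKETCH_LOOKUP_REVAL 0x20u

/* Retry only if the call failed with -ESTALE and the previous attempt
 * was not already a forced-revalidation lookup. */
static bool sketch_retry_estale(long error, unsigned int flags)
{
	return error == -ESTALE && !(flags & SKETCH_LOOKUP_REVAL);
}

/* Hypothetical path-based operation: takes lookup flags, returns 0 or -errno. */
typedef long (*sketch_path_op)(unsigned int lookup_flags, void *ctx);

/* Run op once; on -ESTALE, repeat it exactly once with revalidation forced. */
static long sketch_call_with_retry(sketch_path_op op, void *ctx)
{
	unsigned int lookup_flags = 0;
	long error;

	assert(op != NULL);
retry:
	error = op(lookup_flags, ctx);
	if (sketch_retry_estale(error, lookup_flags)) {
		lookup_flags |= SKETCH_LOOKUP_REVAL;
		goto retry;
	}
	return error;
}

/* Test double: stale until the caller asks for revalidation. ctx counts calls. */
static long sketch_stale_unless_reval(unsigned int lookup_flags, void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return (lookup_flags & SKETCH_LOOKUP_REVAL) ? 0 : -ESTALE;
}

/* Test double: always stale, to show the retry happens only once. */
static long sketch_always_stale(unsigned int lookup_flags, void *ctx)
{
	int *calls = ctx;

	(void)lookup_flags;
	(*calls)++;
	return -ESTALE;
}
```

Keeping the decision in one small predicate means a later change of policy (more than one retry, say) would touch only that function and not every caller.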
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 20 Dec 2012 23:49:14 +0000 (18:49 -0500)]
Merge branch 'fscache' of git://git./linux/kernel/git/dhowells/linux-fs into for-linus
NeilBrown [Fri, 9 Nov 2012 00:09:37 +0000 (16:09 -0800)]
vfs: d_obtain_alias() needs to use "/" as default name.
NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root. This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted. e.g. if
"/mnt" is an NFS mount then
{ cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }
will cause a WARN message like
WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
...
Root dentry has weird name <>
to appear in kernel logs.
So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Alessio Igor Bogani [Thu, 13 Dec 2012 11:22:39 +0000 (12:22 +0100)]
vfs: Remove useless function prototypes
Commit 8e22cc88d68ca1a46d7d582938f979eb640ed30f removes the (un)lock_super
function definitions but forgets to remove their prototypes.
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 11:00:38 +0000 (12:00 +0100)]
documentation: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 11:00:02 +0000 (12:00 +0100)]
mm: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:59:20 +0000 (11:59 +0100)]
vfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:58:36 +0000 (11:58 +0100)]
ntfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:57:37 +0000 (11:57 +0100)]
nilfs2: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:57:03 +0000 (11:57 +0100)]
ncpfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:56:25 +0000 (11:56 +0100)]
minix: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:55:42 +0000 (11:55 +0100)]
logfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:55:07 +0000 (11:55 +0100)]
hfsplus: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:54:25 +0000 (11:54 +0100)]
jfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:53:50 +0000 (11:53 +0100)]
hpfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 14 Dec 2012 11:02:22 +0000 (11:02 +0000)]
FS-Cache: Clear remaining page count on retrieval cancellation
Provide fscache_cancel_op() with a pointer to a function it should invoke under
lock if it cancels an operation.
Use this to clear the remaining page count upon cancellation of a pending
retrieval operation so that fscache_release_retrieval_op() doesn't get an
assertion failure (see below). This can happen when a signal occurs, say from
CTRL-C being pressed during data retrieval.
FS-Cache: Assertion failed
3 == 0 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:237!
invalid opcode: 0000 [#641] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 6075, comm: slurp-q Tainted: GF D 3.7.0-rc8-fsdevel+ #411 /DG965RY
RIP: 0010:[<ffffffffa007f328>] [<ffffffffa007f328>] fscache_release_retrieval_op+0x75/0xff [fscache]
RSP: 0000:ffff88001c6d7988 EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff880014cdfe00 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88001c6d7998 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffe00
R13: ffff88001c6d7ab4 R14: ffff88001a8638a0 R15: ffff88001552b190
FS: 00007f877aaf0700(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff11378fd2 CR3: 000000001c6c6000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 6075, threadinfo ffff88001c6d6000, task ffff88001c6c4080)
Stack:
ffffffffa007ec07 ffff880014cdfe00 ffff88001c6d79c8 ffffffffa007db4d
ffffffffa007ec07 ffff880014cdfe00 00000000fffffe00 ffff88001c6d7ab4
ffff88001c6d7a38 ffffffffa008116d 0000000000000000 ffff88001c6c4080
Call Trace:
[<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<ffffffffa007db4d>] fscache_put_operation+0x135/0x2ed [fscache]
[<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<ffffffffa008116d>] __fscache_read_or_alloc_pages+0x413/0x4bc [fscache]
[<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afe3e>] ra_submit+0x1c/0x20
[<ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<ffffffff810dbc78>] sys_read+0x44/0x75
[<ffffffff81422892>] system_call_fastpath+0x16/0x1b
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 13 Dec 2012 20:03:13 +0000 (20:03 +0000)]
FS-Cache: Mark cancellation of in-progress operation
Mark as cancelled an operation that is in progress rather than pending at the
time it is cancelled, and call fscache_complete_op() to cancel an operation so
that blocked ops can be started.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 7 Dec 2012 10:41:26 +0000 (10:41 +0000)]
FS-Cache: One of the write operation paths doesn't set the object state
In fscache_write_op(), if the object is determined to have become inactive or
to have lost its cookie, we don't move the operation state from in-progress,
and so an assertion in fscache_put_operation() fails (see below).
Instrumenting fscache_op_work_func() indicates that it called
fscache_write_op() before calling fscache_put_operation() - where the assertion
failed. The assertion at line 433 indicates that the operation state is
IN_PROGRESS rather than being COMPLETE or CANCELLED.
Instrumenting fscache_write_op() showed that it was being called on an object
that had had its cookie removed and that this was due to relinquishment of the
cookie by the netfs. At this point fscache no longer has access to the pages
of netfs data that were requested to be written, and so simply cancelling the
operation is the thing to do.
FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:433!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 1035, comm: kworker/u:3 Tainted: GF 3.7.0-rc8-fsdevel+ #411 /DG965RY
RIP: 0010:[<ffffffffa007db22>] [<ffffffffa007db22>] fscache_put_operation+0x11a/0x2ed [fscache]
RSP: 0018:ffff88003e32bcf8 EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff88001818eb78 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88003e32bd18 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00811da
R13: 0000000000000001 R14: 0000000100625d26 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff7dd31c68 CR3: 000000003d730000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:3 (pid: 1035, threadinfo ffff88003e32a000, task ffff88003bb38080)
Stack:
ffffffff8102d1ad ffff88001818eb78 ffffffffa00811da 0000000000000001
ffff88003e32bd48 ffffffffa007f0ad ffff88001818eb78 ffffffff819583c0
ffff88003df24e00 ffff88003882c3e0 ffff88003e32bde8 ffffffff81042de0
Call Trace:
[<ffffffff8102d1ad>] ? vprintk_emit+0x3c6/0x41a
[<ffffffffa00811da>] ? __fscache_read_or_alloc_pages+0x4bc/0x4bc [fscache]
[<ffffffffa007f0ad>] fscache_op_work_func+0xec/0x123 [fscache]
[<ffffffff81042de0>] process_one_work+0x21c/0x3b0
[<ffffffff81042d82>] ? process_one_work+0x1be/0x3b0
[<ffffffffa007efc1>] ? fscache_operation_gc+0x23e/0x23e [fscache]
[<ffffffff8104332e>] worker_thread+0x202/0x2df
[<ffffffff8104312c>] ? rescuer_thread+0x18e/0x18e
[<ffffffff81047c1c>] kthread+0xd0/0xd8
[<ffffffff81421bfa>] ? _raw_spin_unlock_irq+0x29/0x3e
[<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55
[<ffffffff814227ec>] ret_from_fork+0x7c/0xb0
[<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 7 Dec 2012 18:08:02 +0000 (18:08 +0000)]
FS-Cache: Fix signal handling during waits
wait_on_bit() with TASK_INTERRUPTIBLE returns 1 rather than a negative error
code, so change what we check for. This means that the signal handling in
fscache_wait_for_retrieval_activation() should now work properly.
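The difference between the two checks can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patched kernel code: sketch_wait_on_bit() is a hypothetical model of an interruptible wait that returns 1 on a signal, and -EINTR stands in for whatever error the real caller propagates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an interruptible bit wait:
 * 0 when the bit cleared, 1 when a signal cut the wait short. */
static int sketch_wait_on_bit(bool signalled)
{
	return signalled ? 1 : 0;
}

/* Old-style check: only a negative value counts as an interruption,
 * so the 1 returned on a signal is mistaken for success. */
static int sketch_wait_checking_negative(bool signalled)
{
	int ret = sketch_wait_on_bit(signalled);

	return ret < 0 ? -EINTR : 0;
}

/* Corrected check: any nonzero value means the wait was interrupted. */
static int sketch_wait_checking_nonzero(bool signalled)
{
	int ret = sketch_wait_on_bit(signalled);

	assert(ret == 0 || ret == 1);	/* the modelled contract */
	return ret != 0 ? -EINTR : 0;
}
```

With the negative-only check, an interrupted wait falls through as if the operation had become active, which is the kind of state mismatch an assertion later in the path would catch.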
Without this, the following bug can be seen if CTRL-C is pressed during an
fscache read operation:
FS-Cache: Assertion failed
2 == 3 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:347!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 1
Pid: 15006, comm: slurp-q Tainted: GF 3.7.0-rc8-fsdevel+ #411 /DG965RY
RIP: 0010:[<ffffffffa007fcb4>] [<ffffffffa007fcb4>] fscache_wait_for_retrieval_activation+0x167/0x177 [fscache]
RSP: 0018:ffff88002a4c39a8 EFLAGS: 00010292
RAX: 000000000000001a RBX: ffff88002d3dc158 RCX: 0000000000008685
RDX: ffffffff8102ccd6 RSI: 0000000000000001 RDI: ffffffff8102d1d6
RBP: ffff88002a4c39c8 R08: 0000000000000002 R09: 0000000000000000
R10: ffffffff8163afa0 R11: ffff88003bd11900 R12: ffffffffa00868c8
R13: ffff880028306458 R14: ffff88002d3dc1b0 R15: ffff88001372e538
FS: 00007f17426a0700(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f1742494a44 CR3: 0000000031bd7000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 15006, threadinfo ffff88002a4c2000, task ffff880023de3040)
Stack:
ffff88002d3dc158 ffff88001372e538 ffff88002a4c3ab4 ffff8800283064e0
ffff88002a4c3a38 ffffffffa0080f6d 0000000000000000 ffff880023de3040
ffff88002a4c3ac8 ffffffff810ac8ae ffff880028306458 ffff88002a4c3bc8
Call Trace:
[<ffffffffa0080f6d>] __fscache_read_or_alloc_pages+0x24f/0x4bc [fscache]
[<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afe3e>] ra_submit+0x1c/0x20
[<ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<ffffffff810dbc78>] sys_read+0x44/0x75
[<ffffffff81422892>] system_call_fastpath+0x16/0x1b
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 16:31:49 +0000 (16:31 +0000)]
NFS4: Open files for fscaching
nfs4_file_open() should open files for fscaching.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:49 +0000 (13:34 +0000)]
FS-Cache: Add transition to handle invalidate immediately after lookup
Add a missing transition to the FS-Cache object state machine to handle an
invalidation event occurring between the back end completing the object lookup
by calling fscache_obtained_object() (which moves to state OBJECT_AVAILABLE)
and the backend returning to fscache_lookup_object() and thence to
fscache_object_state_machine() which then does a goto lookup_transit to handle
the transition - but lookup_transit doesn't handle EV_INVALIDATE.
Without this, the following BUG can be logged:
FS-Cache: Unsupported event 2 [5/f7] in state OBJECT_AVAILABLE
------------[ cut here ]------------
kernel BUG at fs/fscache/object.c:357!
Where event 2 is EV_INVALIDATE.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:49 +0000 (13:34 +0000)]
NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:
BUG: Bad page state in process python-bin pfn:17d39b
page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G B ---------------- )
Pid: 31053, comm: python-bin Tainted: G B ----------------
2.6.32-71.24.1.el6.x86_64 #1
Call Trace:
[<ffffffff8111bfe7>] bad_page+0x107/0x160
[<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220
[<ffffffff8111ef19>] __pagevec_free+0x59/0xb0
[<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130
[<ffffffff8112230c>] release_pages+0x21c/0x250
[<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0
[<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
[<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180
[<ffffffff811226f8>] __lru_cache_add+0x58/0x70
[<ffffffff81122731>] lru_cache_add_lru+0x21/0x40
[<ffffffff81123f49>] putback_lru_page+0x69/0x100
[<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0
[<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180
[<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370
[<ffffffff8115255c>] compact_zone+0x4cc/0x600
[<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820
[<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0
[<ffffffff8115290e>] compact_zone_order+0x7e/0xb0
[<ffffffff81152a49>] try_to_compact_pages+0x109/0x170
[<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850
[<ffffffff814c9136>] ? thread_return+0x4e/0x778
[<ffffffff81150d43>] alloc_pages_vma+0x93/0x150
[<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340
[<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30
[<ffffffff81136755>] handle_mm_fault+0x245/0x2b0
[<ffffffff814ce383>] do_page_fault+0x123/0x3a0
[<ffffffff814cbdf5>] page_fault+0x25/0x30
nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set. The reason it doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.
However, I wonder if that is actually a problem. There are a number of things
I can do to deal with this:
(1) Make nfs_migrate_page() wait.
(2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.
(3) Set a timeout around the wait.
(4) Make nfs_migrate_page() return an error if the page is still busy.
For the moment, I'll select (2) and (4).
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:48 +0000 (13:34 +0000)]
FS-Cache: Exclusive op submission can BUG if there's been an I/O error
The function to submit an exclusive op (fscache_submit_exclusive_op()) can BUG
if there's been an I/O error because it may see the parent cache object in an
unexpected state. It should only BUG if there hasn't been an I/O error.
In this case the problem was produced by remounting the cache partition to be
R/O. The EROFS state was detected and the cache was aborted, but not
everything handled the aborting correctly.
SysRq : Emergency Remount R/O
EXT4-fs (sda6): re-mounted. Opts: (null)
Emergency Remount complete
CacheFiles: I/O Error: Failed to update xattr with error -30
FS-Cache: Cache cachefiles stopped due to I/O error
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:128!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
Pid: 6612, comm: kworker/u:2 Not tainted 3.1.0-rc8-fsdevel+ #1093 /DG965RY
RIP: 0010:[<ffffffffa00739c0>] [<ffffffffa00739c0>] fscache_submit_exclusive_op+0x2ad/0x2c2 [fscache]
RSP: 0018:ffff880000853d40 EFLAGS: 00010206
RAX: ffff880038ac72a8 RBX: ffff8800181f2260 RCX: ffffffff81f2b2b0
RDX: 0000000000000001 RSI: ffffffff8179a478 RDI: ffff8800181f2280
RBP: ffff880000853d60 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880038ac7268
R13: ffff8800181f2280 R14: ffff88003a359190 R15: 000000010122b162
FS: 0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000034cc4a77f0 CR3: 0000000010e96000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:2 (pid: 6612, threadinfo ffff880000852000, task ffff880014c3c040)
Stack:
ffff8800181f2260 ffff8800181f2310 ffff880038ac7268 ffff8800181f2260
ffff880000853dc0 ffffffffa0072375 ffff880037ecfe00 ffff88003a359198
ffff880000853dc0 0000000000000246 0000000000000000 ffff88000a91d308
Call Trace:
[<ffffffffa0072375>] fscache_object_work_func+0x792/0xe65 [fscache]
[<ffffffff81047e44>] process_one_work+0x1eb/0x37f
[<ffffffff81047de6>] ? process_one_work+0x18d/0x37f
[<ffffffffa0071be3>] ? fscache_enqueue_dependents+0xd8/0xd8 [fscache]
[<ffffffff810482e4>] worker_thread+0x15a/0x21a
[<ffffffff8104818a>] ? rescuer_thread+0x188/0x188
[<ffffffff8104bf96>] kthread+0x7f/0x87
[<ffffffff813ad6f4>] kernel_thread_helper+0x4/0x10
[<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<ffffffff813abd1d>] ? retint_restore_args+0xe/0xe
[<ffffffff8104bf17>] ? __init_kthread_worker+0x53/0x53
[<ffffffff813ad6f0>] ? gs_change+0xb/0xb
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:48 +0000 (13:34 +0000)]
FS-Cache: Limit the number of I/O error reports for a cache
Limit the number of I/O error reports for a cache to 1 to prevent massive
amounts of noise. After the first I/O error the cache is taken off line
automatically, so must be restarted to resume caching.
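One common way to report only the first error is a test-and-set on a per-cache flag, sketched below with C11 atomics. This is an assumption about the general technique, not a copy of the patch; struct sketch_cache and sketch_io_error() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical cache descriptor with a one-shot "I/O error seen" flag. */
struct sketch_cache {
	const char *name;
	atomic_flag io_error_seen;
};

/* Record an I/O error. Returns true only for the call that printed the
 * report; later calls find the flag already set and stay silent. */
static bool sketch_io_error(struct sketch_cache *cache)
{
	assert(cache != NULL);

	if (atomic_flag_test_and_set(&cache->io_error_seen))
		return false;

	fprintf(stderr, "cache %s stopped due to I/O error\n", cache->name);
	return true;
}
```

Because the flag is set and tested in one atomic step, concurrent callers that hit errors at the same time still produce a single report.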
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:47 +0000 (13:34 +0000)]
FS-Cache: Don't mask off the object event mask when printing it
Don't mask off the object event mask when printing it. That way it can be seen
if there are bits set that shouldn't be.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:46 +0000 (13:34 +0000)]
FS-Cache: Initialise the object event mask with the calculated mask
Initialise the object event mask with the calculated mask rather than unmasking
undefined events also.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:46 +0000 (13:34 +0000)]
FS-Cache: Convert the object event ID #defines into an enum
Convert the fscache_object event IDs from #defines into an enum. Also add an
extra label to the enum to carry the event count and redefine the event mask
in terms of that.
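The pattern of carrying the event count in a trailing enum label and deriving the mask from it can be sketched like this. The event names and their order are hypothetical and chosen only for illustration; they are not the identifiers used by this patch.

```c
#include <assert.h>

/* Hypothetical event IDs. The last label is not an event: it equals the
 * number of events defined before it. */
enum sketch_event {
	SKETCH_EV_REQUEUE,
	SKETCH_EV_UPDATE,
	SKETCH_EV_INVALIDATE,
	SKETCH_EV_ERROR,
	SKETCH_NR_EVENTS
};

/* Mask with one bit per defined event, derived from the count. */
#define SKETCH_EVENTS_MASK ((1UL << SKETCH_NR_EVENTS) - 1)

static_assert(SKETCH_NR_EVENTS < 8 * sizeof(unsigned long),
	      "all events must fit in one unsigned long");

/* Return the bits of pending that do not correspond to a defined event. */
static unsigned long sketch_undefined_events(unsigned long pending)
{
	return pending & ~SKETCH_EVENTS_MASK;
}
```

Adding an event before the count label widens the mask automatically, so the mask cannot drift out of step with the list of events.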
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 5 Dec 2012 13:34:45 +0000 (13:34 +0000)]
CacheFiles: Add missing retrieval completions
CacheFiles is missing some calls to fscache_retrieval_complete() in the error
handling/collision paths of its reader functions.
This can be seen by the following assertion tripping in fscache_put_operation()
whereby the operation being destroyed is still in the in-progress state and has
not been cancelled or completed:
FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:408!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: xfs ioatdma dca loop joydev evdev
psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
pci_hotplug sg sr_mod]
Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
RIP: 0010:[<ffffffff81197b24>] [<ffffffff81197b24>] fscache_put_operation+0x304/0x330
RSP: 0018:ffff880062f739d8 EFLAGS: 00010296
RAX: 0000000000000025 RBX: ffff8800c5122e84 RCX: ffffffff81ddf040
RDX: 00000000ffffffff RSI: 0000000000000082 RDI: ffffffff81ddef30
RBP: ffff880062f739f8 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000003 R12: ffff8800c5122e40
R13: ffff880037a2cd20 R14: ffff880087c7a058 R15: ffff880087c7a000
FS: 00007f63dcf636e0(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0c0a91f000 CR3: 0000000062ec2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process httpd (pid: 8062, threadinfo ffff880062f72000, task ffff880087e58000)
Stack:
ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
Call Trace:
[<ffffffff8119aa7e>] __fscache_read_or_alloc_pages+0x1fe/0x530
[<ffffffff81250780>] __nfs_readpages_from_fscache+0x70/0x1c0
[<ffffffff8123142a>] nfs_readpages+0xca/0x1e0
[<ffffffff815f3c06>] ? rpc_do_put_task+0x36/0x50
[<ffffffff8122755b>] ? alloc_nfs_open_context+0x4b/0x110
[<ffffffff815ecd1a>] ? rpc_call_sync+0x5a/0x70
[<ffffffff810e7e9a>] __do_page_cache_readahead+0x1ca/0x270
[<ffffffff810e7f61>] ra_submit+0x21/0x30
[<ffffffff810e818d>] ondemand_readahead+0x11d/0x250
[<ffffffff810e83b6>] page_cache_sync_readahead+0x36/0x60
[<ffffffff810dffa4>] generic_file_aio_read+0x454/0x770
[<ffffffff81224ce1>] nfs_file_read+0xe1/0x130
[<ffffffff81121bd9>] do_sync_read+0xd9/0x120
[<ffffffff8114088f>] ? mntput+0x1f/0x40
[<ffffffff811238cb>] ? fput+0x1cb/0x260
[<ffffffff81122938>] vfs_read+0xc8/0x180
[<ffffffff81122af5>] sys_read+0x55/0x90
Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:38 +0000 (21:52 +0000)]
NFS: Use FS-Cache invalidation
Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.
The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete. There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).
This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD 15889067 PUD 15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY
RIP: 0010:[<ffffffffa0075118>] [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:ffff8800158799e8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
FS: 00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
Stack:
ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
[<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
[<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
[<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
[<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
[<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b
Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]
CacheFiles: Implement invalidation
Implement invalidation for CacheFiles. This is in two parts:
(1) Provide an invalidation method (which just truncates the backing file).
(2) Abort attempts to copy anything read from the backing file whilst
invalidation is in progress.
Question: CacheFiles uses truncation in a couple of places. It has been using
notify_change() rather than sys_truncate() or something similar. This means
it bypasses a bunch of checks and suchlike that it possibly should be making
(security, file locking, lease breaking, vfsmount write). Should it be using
vfs_truncate() as added by a preceding patch or should it use notify_change()
and assume that anyone poking around in the cache files on disk gets
everything they deserve?
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]
VFS: Make more complete truncate operation available to CacheFiles
Make a more complete truncate operation available to CacheFiles (including
security checks and suchlike) so that it can use this to clear invalidated
cache files.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]
FS-Cache: Provide proper invalidation
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one. The problem with this is that it isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.
Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that. Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:35 +0000 (21:52 +0000)]
FS-Cache: Fix operation state management and accounting
Fix the state management of internal fscache operations and the accounting of
what operations are in what states.
This is done by:
(1) Give struct fscache_operation an enum variable that directly represents the
state it's currently in, rather than spreading this knowledge over a bunch
of flags, who's processing the operation at the moment and whether it is
queued or not.
This makes it easier to write assertions to check the state at various
points and to prevent invalid state transitions.
(2) Add an 'operation complete' state and supply a function to indicate the
completion of an operation (fscache_op_complete()) and make things call
it. The final call to fscache_put_operation() can then check that an op
is in the appropriate state (complete or cancelled).
(3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
govern the state of an object:
(a) The ->n_ops is now the number of extant operations on the object
and is now decremented by fscache_put_operation() only.
(b) The ->n_in_progress is simply the number of operations that have been
taken off of the object's pending queue for the purposes of being
run. This is decremented by fscache_op_complete() only.
(c) The ->n_exclusive is the number of exclusive ops that have been
submitted and queued or are in progress. It is decremented by
fscache_op_complete() and by fscache_cancel_op().
fscache_put_operation() and fscache_operation_gc() now no longer try to
clean up ->n_exclusive and ->n_in_progress. That was leading to double
decrements against fscache_cancel_op().
fscache_cancel_op() now no longer decrements ->n_ops. That was leading to
double decrements against fscache_put_operation().
fscache_submit_exclusive_op() now decides whether it has to queue an op
based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
will persist in being true even after all preceding operations have been
cancelled or completed. Furthermore, if an object is active and there are
runnable ops against it, there must be at least one op running.
(4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
provide a function to record completion of the pages as they complete.
When n_pages reaches 0, the operation is deemed to be complete and
fscache_op_complete() is called.
Add calls to fscache_retrieval_complete() anywhere we've finished with a
page we've been given to read or allocate for. This includes places where
we just return pages to the netfs for reading from the server and where
accessing the cache fails and we discard the proposed netfs page.
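As a rough illustration of points (1), (2) and (4) above, the following user-space sketch shows a single state enum with checked transitions and a remaining-pages counter that completes the operation when it reaches zero. All names here (op_state, retrieval_op, op_set_state and so on) are hypothetical and are not taken from the fscache sources; the set of legal transitions is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; the real enum in fs/fscache may differ. */
enum op_state {
	OP_ST_PENDING,      /* queued, waiting to run */
	OP_ST_IN_PROGRESS,  /* taken off the pending queue and running */
	OP_ST_COMPLETE,     /* finished normally */
	OP_ST_CANCELLED,    /* abandoned before completion */
	OP_ST_DEAD          /* final reference dropped */
};

struct retrieval_op {
	enum op_state state;
	unsigned int n_pages;   /* pages not yet accounted for */
};

/* Return true if moving from 'from' to 'to' is a legal transition. */
bool op_transition_ok(enum op_state from, enum op_state to)
{
	switch (from) {
	case OP_ST_PENDING:
		return to == OP_ST_IN_PROGRESS || to == OP_ST_CANCELLED;
	case OP_ST_IN_PROGRESS:
		return to == OP_ST_COMPLETE || to == OP_ST_CANCELLED;
	case OP_ST_COMPLETE:
	case OP_ST_CANCELLED:
		return to == OP_ST_DEAD;
	default:
		return false;
	}
}

/* Change state, asserting that the transition is a valid one. */
void op_set_state(struct retrieval_op *op, enum op_state to)
{
	assert(op_transition_ok(op->state, to));
	op->state = to;
}

/* Record that n pages are done; complete the op when none remain. */
void retrieval_complete(struct retrieval_op *op, unsigned int n)
{
	assert(op->state == OP_ST_IN_PROGRESS);
	assert(n <= op->n_pages);
	op->n_pages -= n;
	if (op->n_pages == 0)
		op_set_state(op, OP_ST_COMPLETE);
}

/* Final put: only legal once the op is complete or cancelled. */
void op_put(struct retrieval_op *op)
{
	op_set_state(op, OP_ST_DEAD);
}
```

Keeping the state in one variable means a stray final put on an operation that is still in progress trips an assertion immediately, rather than surfacing later as a counter imbalance.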
The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs. This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.
FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY
RIP: 0010:[<ffffffffa007050a>] [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
[<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
[<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
[<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
[<ffffffff810d8d47>] evict+0xa1/0x15c
[<ffffffff810d8e2e>] dispose_list+0x2c/0x38
[<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
[<ffffffff810c56b7>] prune_super+0xd5/0x140
[<ffffffff8109b615>] shrink_slab+0x102/0x1ab
[<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
[<ffffffff8103e009>] ? process_timeout+0xb/0xb
[<ffffffff8109dba3>] kswapd+0x270/0x289
[<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
[<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
[<ffffffff8104bf7a>] kthread+0x7f/0x87
[<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
[<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
[<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
[<ffffffff813ad6b0>] ? gs_change+0xb/0xb
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:35 +0000 (21:52 +0000)]
FS-Cache: Make cookie relinquishment wait for outstanding reads
Make fscache_relinquish_cookie() log a warning and wait if there are any
outstanding reads left on the cookie it was given.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:34 +0000 (21:52 +0000)]
CacheFiles: Make some debugging statements conditional
Downgrade some debugging statements to not unconditionally print stuff, but
rather be conditional on the appropriate module parameter setting.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:33 +0000 (21:52 +0000)]
FS-Cache: Check that there are no read ops when cookie relinquished
Check that the netfs isn't trying to relinquish a cookie that still has read
operations in progress upon it. If there are, then log a warning and BUG.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:33 +0000 (21:52 +0000)]
CacheFiles: Downgrade the requirements passed to the allocator
Downgrade the requirements passed to the allocator in the gfp flags parameter.
FS-Cache/CacheFiles can handle OOM conditions simply by aborting the attempt to
store an object or a page in the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 20 Dec 2012 21:52:32 +0000 (21:52 +0000)]
CacheFiles: Fix the marking of cached pages
Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs's hook to keep track of this.
There are, however, two problems with this:
(1) It can lead to the PG_fscache mark being applied _after_ the page is set
PG_uptodate and unlocked (by the call to fscache_end_io()).
(2) CacheFiles's ref on the page is dropped immediately following
fscache_end_io() - and so may not still be held when the mark is applied.
This can lead to the page being passed back to the allocator before the
mark is applied.
Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page. This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.
The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:
BUG: Bad page state in process tar pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping: (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G W 3.1.0-rc4-fsdevel+ #1064
Call Trace:
[<ffffffff8109583c>] ? dump_page+0xb9/0xbe
[<ffffffff81095916>] bad_page+0xd5/0xea
[<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a
[<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662
[<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b
As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.
Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was
set appropriately showed that sometimes it wasn't. This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.
Reported-by: M. Stevens <m@tippett.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Marco Stornelli [Sat, 15 Dec 2012 10:53:15 +0000 (11:53 +0100)]
hfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:52:33 +0000 (11:52 +0100)]
bfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:51:53 +0000 (11:51 +0100)]
affs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:51:11 +0000 (11:51 +0100)]
adfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:50:20 +0000 (11:50 +0100)]
ocfs2: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:49:42 +0000 (11:49 +0100)]
omfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:48:48 +0000 (11:48 +0100)]
procfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:47:31 +0000 (11:47 +0100)]
reiserfs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:45:58 +0000 (11:45 +0100)]
sysv: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Marco Stornelli [Sat, 15 Dec 2012 10:45:14 +0000 (11:45 +0100)]
ufs: drop vmtruncate
Removed vmtruncate
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jan Kara [Wed, 5 Dec 2012 13:40:14 +0000 (14:40 +0100)]
fs: Fix imbalance in freeze protection in mark_files_ro()
File descriptors (even those for writing) do not hold freeze protection.
Thus mark_files_ro() must call __mnt_drop_write() to only drop protection
against remount read-only. Calling mnt_drop_write_file() as we do now
results in:
[ BUG: bad unlock balance detected! ]
3.7.0-rc6-00028-g88e75b6 #101 Not tainted
-------------------------------------
kworker/1:2/79 is trying to release lock (sb_writers) at:
[<ffffffff811b33b4>] mnt_drop_write+0x24/0x30
but there are no more locks to release!
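The imbalance can be pictured with a toy model that keeps two independent counts, one for protection against remount read-only and one for freeze protection. This is only a sketch: the struct and function names below are hypothetical stand-ins, not the kernel's, and the counters replace the real locking.

```c
#include <assert.h>

/* Toy model with two independent counts; all names are hypothetical. */
struct mnt_model {
	int mnt_writers;   /* protection against remount read-only */
	int sb_writers;    /* freeze protection */
};

/* An open-for-write file takes only the remount protection. */
void open_for_write(struct mnt_model *m)
{
	m->mnt_writers++;
}

/* Drop only the remount read-only protection. */
void drop_write_only(struct mnt_model *m)
{
	m->mnt_writers--;
}

/* Drop both protections, as the drop half of a full want/drop pair does. */
void drop_write_and_freeze(struct mnt_model *m)
{
	drop_write_only(m);
	m->sb_writers--;
}
```

Pairing open_for_write() with drop_write_and_freeze() leaves sb_writers at -1, which is the model's analogue of the bad unlock balance report; pairing it with drop_write_only() leaves both counts at zero.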
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Wed, 28 Nov 2012 16:30:53 +0000 (11:30 -0500)]
vfs: remove DCACHE_NEED_LOOKUP
The code that relied on that flag was ripped out of btrfs quite some
time ago, and never added back. Josef indicated that he was going to
take a different approach to the problem in btrfs, and that we
could just eliminate this flag.
Cc: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 20 Dec 2012 18:41:28 +0000 (13:41 -0500)]
path_init(): make -ENOTDIR failure exits consistent
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jeff Layton [Tue, 11 Dec 2012 13:56:16 +0000 (08:56 -0500)]
vfs: remove unneeded permission check from path_init
When path_init is called with a valid dfd, that code checks permissions
on the open directory fd and returns an error if the check fails. This
permission check is redundant, however.
Both callers of path_init immediately call link_path_walk afterward. The
first thing that link_path_walk does for pathnames that do not consist
only of slashes is to check for exec permissions at the starting point of
the path walk. And this check in path_init() is on the path taken only
when *name != '/' && *name != '\0'.
In most cases, these checks are very quick, but when the dfd is for a
file on an NFS mount with actimeo=0, each permission check goes
out onto the wire. The result is 2 identical ACCESS calls.
Given that these codepaths are fairly "hot", I think it makes sense to
eliminate the permission check in path_init and simply assume that the
caller will eventually check the permissions before proceeding.
Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miao Xie [Fri, 16 Nov 2012 09:23:50 +0000 (17:23 +0800)]
vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags
The compiler may optimize the while loop and make the check just be done once,
so we should use ACCESS_ONCE() to guard access to ->mnt_flags.
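A minimal user-space sketch of the idea follows. The macro has the same shape as the kernel's ACCESS_ONCE() and relies on the GNU __typeof__ extension; the struct, the flag value and the bounded loop are assumptions made so the example can run outside the kernel, and the commit text above does not say which flag the loop tests.

```c
#include <assert.h>

/* Same shape as the kernel macro; needs the GNU __typeof__ extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define MNT_WRITE_HOLD 0x200   /* flag value chosen for illustration only */

/* Hypothetical stand-in for the structure carrying ->mnt_flags. */
struct mount_sketch {
	int mnt_flags;
};

/*
 * Poll until MNT_WRITE_HOLD is clear or max_spins polls have been made.
 * Returns the number of polls in which the flag was still seen set.
 */
unsigned int wait_for_write_hold(struct mount_sketch *m, unsigned int max_spins)
{
	unsigned int spins = 0;

	/* The volatile access forces a fresh load on every iteration. */
	while ((ACCESS_ONCE(m->mnt_flags) & MNT_WRITE_HOLD) && spins < max_spins)
		spins++;
	return spins;
}
```

Without the volatile access the compiler is free to load mnt_flags once before the loop and never look at it again; the spin bound exists only so that the sketch terminates when nothing clears the flag.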
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Vaibhav Bedia [Wed, 19 Dec 2012 06:53:10 +0000 (06:53 +0000)]
ARM: OMAP: Fix build breakage due to missing include in i2c.c
Merge commit 752451f01c45 ("Merge branch 'i2c-embedded/for-next' of
git://git.pengutronix.de/git/wsa/linux") resulted in a build breakage
for OMAP
arch/arm/mach-omap2/i2c.c: In function 'omap_pm_set_max_mpu_wakeup_lat_compat':
arch/arm/mach-omap2/i2c.c:130:2: error: implicit declaration of function 'omap_pm_set_max_mpu_wakeup_lat'
make[1]: *** [arch/arm/mach-omap2/i2c.o] Error 1
Fix this by including the appropriate header file with the function
prototype.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Vaibhav Bedia <vaibhav.bedia@ti.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 20 Dec 2012 16:37:04 +0000 (08:37 -0800)]
Merge tag 'virtio-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull virtio update from Rusty Russell:
"Some nice cleanups, and even a patch my wife did as a "live" demo for
Latinoware 2012.
There's a slightly non-trivial merge in virtio-net, as we cleaned up
the virtio add_buf interface while DaveM accepted the mq virtio-net
patches."
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
virtio_console: Add support for remoteproc serial
virtio_console: Merge struct buffer_token into struct port_buffer
virtio: add drv_to_virtio to make code clearly
virtio: use dev_to_virtio wrapper in virtio
virtio-mmio: Fix irq parsing in command line parameter
virtio_console: Free buffers from out-queue upon close
virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
virtio_console: Use kmalloc instead of kzalloc
virtio_console: Free buffer if splice fails
virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
virtio: console: don't rely on virtqueue_add_buf() returning capacity.
virtio_net: don't rely on virtqueue_add_buf() returning capacity.
virtio-net: remove unused skb_vnet_hdr->num_sg field
virtio-net: correct capacity math on ring full
virtio: move queue_index and num_free fields into core struct virtqueue.
...
Linus Torvalds [Thu, 20 Dec 2012 15:52:13 +0000 (07:52 -0800)]
Merge tag 'sound-3.8' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This update contains overall only driver-specific fixes. Slightly
large LOC are seen in usb-audio driver for a couple of new device
quirks and cs42l73 ASoC driver for enhanced features. The others are
a few small (regression) fixes for HD-audio, and yet other small / trivial
ASoC fixes."
* tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Support for Digidesign Mbox 2 USB sound card:
ALSA: HDA: Fix sound resume hang
ALSA: hda - bug fix for invalid connection list of Haswell HDMI codec pins
ALSA: hda - Fix the wrong pincaps set in ALC861VD dallas/hp fixup
ALSA: hda - Set codec->single_adc_amp flag for Realtek codecs
ASoC: atmel-ssc: change disable to disabled in dts node
ASoC: Prevent pop_wait overwrite
ALSA: usb-audio: ignore-quirk for HP Wireless Audio
ALSA: hda - Always turn on pins for HDMI/DP
ALSA: hda - Fix pin configuration of HP Pavilion dv7
ASoC: core: Fix splitting of log messages
ASoC: cs42l73: Change VSPIN/VSPOUT to VSPINOUT
ASoC: cs42l73: Add DAPM events for power down.
ASoC: cs42l73: Add DMIC's as DAPM inputs.
ASoC: sigmadsp: Fix endianness conversion issue
ASoC: tpa6130a2: Use devm_* APIs
Linus Torvalds [Thu, 20 Dec 2012 15:39:03 +0000 (07:39 -0800)]
Merge tag 'upstream-3.8-rc1' of git://git.infradead.org/linux-ubi
Pull UBI update from Artem Bityutskiy:
"Nothing exciting, just clean-ups and nicification. Oh, and one small
optimization which makes UBI use less RAM."
* tag 'upstream-3.8-rc1' of git://git.infradead.org/linux-ubi:
UBI: embed ubi_debug_info field in ubi_device struct
UBI: introduce helpers dbg_chk_{io, gen}
UBI: replace memcpy with struct assignment
UBI: remove spurious comment
UBI: gluebi: rename misleading variables
UBI: do not allocate the memory unnecessarily
UBI: use list_move_tail instead of list_del/list_add_tail
Linus Torvalds [Thu, 20 Dec 2012 15:27:44 +0000 (07:27 -0800)]
Merge tags 'disintegrate-h8300-20121219', 'disintegrate-m32r-20121219' and 'disintegrate-score-20121220' of git://git.infradead.org/users/dhowells/linux-headers
Pull UAPI disintegration for H8/300, M32R and Score from David Howells.
Scripted UAPI patches for architectures that apparently never reacted to
it on their own.
* tag 'disintegrate-h8300-20121219' of git://git.infradead.org/users/dhowells/linux-headers:
UAPI: (Scripted) Disintegrate arch/h8300/include/asm
* tag 'disintegrate-m32r-20121219' of git://git.infradead.org/users/dhowells/linux-headers:
UAPI: (Scripted) Disintegrate arch/m32r/include/asm
* tag 'disintegrate-score-20121220' of git://git.infradead.org/users/dhowells/linux-headers:
UAPI: (Scripted) Disintegrate arch/score/include/asm
Linus Torvalds [Thu, 20 Dec 2012 15:24:17 +0000 (07:24 -0800)]
Merge tag 'cris-for-linus-3.8' of git://jni.nu/cris
Pull CRIS changes from Jesper Nilsson.
... mainly the UAPI disintegration.
* tag 'cris-for-linus-3.8' of git://jni.nu/cris:
UAPI: Fix up empty files in arch/cris/
CRIS: locking: fix the return value of arch_read_trylock()
CRIS: use kbuild.h instead of defining macros in asm-offset.c
UAPI: (Scripted) Disintegrate arch/cris/include/asm
UAPI: (Scripted) Disintegrate arch/cris/include/arch-v32/arch
UAPI: (Scripted) Disintegrate arch/cris/include/arch-v10/arch
Linus Torvalds [Thu, 20 Dec 2012 15:21:54 +0000 (07:21 -0800)]
Merge tag 'fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"This is a batch of fixes for arm-soc platforms, most of it is for OMAP
but there are others too (i.MX, Tegra, ep93xx). Fixes warnings, some
broken platforms and drivers, etc. A bit all over the map really."
There was some concern about commit 68136b10 ("ARM: sunxi: Change device
tree naming scheme for sunxi"), but Tony says:
"Looks like that's trivial to fix as needed, no need to rebuild the
branch to fix that AFAIK.
The fix can be done once Olof is available online again.
Linus, I suggest that you go ahead and pull this if there are no other
issues with this branch."
* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (32 commits)
ARM: sunxi: Change device tree naming scheme for sunxi
ARM: ux500: fix missing include
ARM: u300: delete custom pin hog code
ARM: davinci: fix build break due to missing include
ARM: exynos: Fix warning due to missing 'inline' in stub
ARM: imx: Move platform-mx2-emma to arch/arm/mach-imx/devices
ARM i.MX51 clock: Fix regression since enabling MIPI/HSP clocks
ARM: dts: mx27: Fix the AIPI bus for FEC
ARM: OMAP2+: common: remove use of vram
ARM: OMAP3/4: cpuidle: fix sparse and checkpatch warnings
ARM: OMAP4: clock data: DPLLs are missing bypass clocks in their parent lists
ARM: OMAP4: clock data: div_iva_hs_clk is a power-of-two divider
ARM: OMAP4: Fix EMU clock domain always on
ARM: OMAP4460: Workaround ABE DPLL failing to turn-on
ARM: OMAP4: Enhance support for DPLLs with 4X multiplier
ARM: OMAP4: Add function table for non-M4X dplls
ARM: OMAP4: Update timer clock aliases
ARM: OMAP: Move plat/omap-serial.h to include/linux/platform_data/serial-omap.h
ARM: dts: Add build target for omap4-panda-a4
ARM: dts: OMAP2420: Correct H4 board memory size
...
Linus Torvalds [Thu, 20 Dec 2012 15:18:29 +0000 (07:18 -0800)]
Merge tag 'tag-for-linus-3.8' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf
Pull dma-buf updates from Sumit Semwal:
"A fairly small dma-buf pull request for 3.8 - only 2 patches"
* tag 'tag-for-linus-3.8' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf:
dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER
dma-buf: might_sleep() in dma_buf_unmap_attachment()
Linus Torvalds [Thu, 20 Dec 2012 15:07:18 +0000 (07:07 -0800)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull hwmon subsystem update from Jean Delvare:
"There are many improvements to the it87 driver, as well as suspend
support for the Winbond Super-I/O chips, and a few other fixes."
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon-vid: Add support for AMD family 11h to 15h processors
hwmon: (it87) Support PECI for additional chips
hwmon: (it87) Report thermal sensor type as Intel PECI if appropriate
hwmon: (it87) Manage device specific features with table
hwmon: (it87) Replace pwm group macro with direct attribute definitions
hwmon: (it87) Avoid quoted string splits across lines
hwmon: (it87) Save fan registers in 2-dimensional array
hwmon: (it87) Introduce support for tempX_offset sysfs attribute
hwmon: (it87) Replace macro defining tempX_type sensors with direct definitions
hwmon: (it87) Save voltage register values in 2-dimensional array
hwmon: (it87) Save temperature registers in 2-dimensional array
hwmon: (w83627ehf) Get rid of smatch warnings
hwmon: (w83627hf) Don't touch nonexistent I2C address registers
hwmon: (w83627ehf) Add support for suspend
hwmon: (w83627hf) Add support for suspend
hwmon: Fix PCI device reference leak in quirk
Hugh Dickins [Thu, 20 Dec 2012 01:44:29 +0000 (17:44 -0800)]
ksm: make rmap walks more scalable
The rmap walks in ksm.c are like those in rmap.c: they can safely be
done with anon_vma_lock_read().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 20 Dec 2012 01:42:16 +0000 (17:42 -0800)]
sched: numa: ksm: fix oops in task_numa_placement()
task_numa_placement() oopsed on NULL p->mm when task_numa_fault() got
called in the handling of break_ksm() for ksmd. That might be a
peculiar case, which perhaps KSM could take steps to avoid, but it's
more robust if task_numa_placement() allows for such a possibility.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zlatko Calusic [Wed, 19 Dec 2012 23:25:13 +0000 (00:25 +0100)]
mm: do not sleep in balance_pgdat if there's no i/o congestion
On a 4GB RAM machine, where Normal zone is much smaller than DMA32 zone,
the Normal zone gets fragmented over time. This requires relatively more
pressure in balance_pgdat to get the zone above the required watermark.
Unfortunately, the congestion_wait() call in there slows it down for a
completely wrong reason, expecting that there's a lot of
writeback/swapout, even when there's none (much more common). After a
few days, when fragmentation progresses, this flawed logic translates to
very high CPU iowait times, even though there's no I/O congestion at
all. If THP is enabled, the problem occurs sooner, but I was able to
see it even on !THP kernels, just by giving it a bit more time to occur.
The proper way to deal with this is to not wait, unless there's
congestion. Thanks to Mel Gorman, we already have the function that
perfectly fits the job. The patch was tested on a machine which nicely
revealed the problem after only 1 day of uptime, and it's been working
great.
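The change in behaviour can be sketched as a helper that decides how long the reclaim loop should sleep. This is a simplified user-space model: the struct, the function name and the millisecond timeout are hypothetical, and the commit text above does not name the kernel helper it switches to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-zone congestion state. */
struct zone_sketch {
	bool congested;   /* true while writeback or swapout is backed up */
};

/*
 * Return how long (in ms) the reclaim loop should sleep.
 * Old behaviour: always sleep for timeout_ms.
 * New behaviour: sleep only when the zone is actually congested.
 */
unsigned int reclaim_wait_ms(const struct zone_sketch *z, unsigned int timeout_ms)
{
	if (!z->congested)
		return 0;       /* nothing to wait for; keep reclaiming */
	return timeout_ms;      /* back off while writeback catches up */
}
```

In this model an uncongested zone never contributes sleep time, so time spent waiting reflects real I/O pressure rather than fragmentation.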
Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Mon, 22 Oct 2012 12:18:44 +0000 (14:18 +0200)]
UAPI: Fix up empty files in arch/cris/
Fix up three empty files in arch/cris/ by sticking placeholder comments in
there to prevent the patch program from deleting them.
I decided not to delete the arch-v*/Kbuild files as it's possible someone might
want to use them for genhdr-y lines in the future, but they could be deleted
and the pointer lines removed from asm/Kbuild. The uapi/arch-v*/Kbuild files
ought to be unaffected by such a change.
asm/swab.h didn't have anything outside of __KERNEL__ so nothing appeared in
uapi/asm/swab.h. The latter, however, is exported by Kbuild.asm.
This needs to be applied after the CRIS UAPI disintegration patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Wei Yongjun [Wed, 17 Oct 2012 14:54:27 +0000 (16:54 +0200)]
CRIS: locking: fix the return value of arch_read_trylock()
arch_write_trylock() should return 'ret' instead of always
returning 1.
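The class of bug described above can be shown with a small single-threaded model of a write trylock. It is only a sketch: the struct and function names are hypothetical, and the plain integer fields stand in for the atomic operations a real lock would need.

```c
#include <assert.h>

/* Toy, non-atomic lock state; names are hypothetical. */
struct rwlock_sketch {
	int writer;    /* 1 while a writer holds the lock */
	int readers;   /* number of readers holding the lock */
};

/* Buggy shape: reports success even when the lock was not taken. */
int write_trylock_buggy(struct rwlock_sketch *rw)
{
	int ret = 0;

	if (!rw->writer && rw->readers == 0) {
		rw->writer = 1;
		ret = 1;
	}
	(void)ret;      /* the computed result is ignored */
	return 1;
}

/* Fixed shape: report whether the lock was actually acquired. */
int write_trylock_fixed(struct rwlock_sketch *rw)
{
	int ret = 0;

	if (!rw->writer && rw->readers == 0) {
		rw->writer = 1;
		ret = 1;
	}
	return ret;
}
```

With the buggy variant a caller proceeds as if it held the lock while a reader is still inside; the fixed variant lets the caller see the failure and back off or retry.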
The dpatch engine was used to auto-generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Jesper Nilsson [Thu, 20 Dec 2012 11:48:53 +0000 (12:48 +0100)]
Merge tag 'disintegrate-cris-20121009' of git://git.infradead.org/users/dhowells/linux-headers into for-linus2
UAPI Disintegration 2012-10-09
* tag 'disintegrate-cris-20121009' of git://git.infradead.org/users/dhowells/linux-headers:
UAPI: (Scripted) Disintegrate arch/cris/include/asm
UAPI: (Scripted) Disintegrate arch/cris/include/arch-v32/arch
UAPI: (Scripted) Disintegrate arch/cris/include/arch-v10/arch
James Hogan [Thu, 11 Oct 2012 09:00:58 +0000 (11:00 +0200)]
CRIS: use kbuild.h instead of defining macros in asm-offset.c
This is modelled on commits such as the one below:
Commit fc1c3a003edb8a6778e64e10ef671a38c76c969e ("sh: use kbuild.h
instead of defining macros in asm-offsets.c") introduced in v2.6.26.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
David Howells [Thu, 20 Dec 2012 10:53:58 +0000 (10:53 +0000)]
UAPI: (Scripted) Disintegrate arch/score/include/asm
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Acked-by: Lennox Wu <lennox.wu@gmail.com>
Acked-by: Liqin Chen <liqin299@gmail.com>
Maarten Lankhorst [Wed, 12 Dec 2012 09:23:03 +0000 (10:23 +0100)]
dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER
Documentation says that code requiring dma-buf should add it to
select, so inline fallbacks are not going to be used. A link error
will make it obvious what went wrong, instead of silently doing
nothing at runtime.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Rob Clark [Fri, 28 Sep 2012 07:29:43 +0000 (09:29 +0200)]
dma-buf: might_sleep() in dma_buf_unmap_attachment()
We never really clarified if unmap could be done in atomic context.
But since mapping might require sleeping, this implies a mutex is in use
to synchronize mapping/unmapping, so unmap could sleep as well. Add
a might_sleep() to clarify this.
Signed-off-by: Rob Clark <rob@ti.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Linus Torvalds [Thu, 20 Dec 2012 04:31:02 +0000 (20:31 -0800)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
"Please pull to get these sparc AES/DES/CAMELLIA crypto bug fixes as
well as an addition of a pte_accessible() define for sparc64 and a
hugetlb fix from Dave Kleikamp."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in CAMELLIA code.
sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in DES code.
sparc64: Fix ECB looping constructs in AES code.
sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in AES code.
sparc64: Fix AES ctr mode block size.
sparc64: Fix unrolled AES 256-bit key loops.
sparc64: Define pte_accessible()
sparc: huge_ptep_set_* functions need to call set_huge_pte_at()
Linus Torvalds [Thu, 20 Dec 2012 04:29:15 +0000 (20:29 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Really fix tuntap SKB use after free bug, from Eric Dumazet.
2) Adjust SKB data pointer to point past the transport header before
calling icmpv6_notify() so that the headers are in the state which
that function expects. From Duan Jiong.
3) Fix ambiguities in the new tuntap multi-queue APIs. From Jason
Wang.
4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov.
5) Don't destroy mutex after freeing up device private in mac802154,
fix also from Konstantin Khlebnikov.
6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch.
7) SCTP HMAC kconfig rework, from Neil Horman.
8) Fix SCTP jprobes function signature, otherwise things explode, from
Daniel Borkmann.
9) Fix typo in ipv6-offload Makefile variable reference, from Simon
Arlott.
10) Don't fail USBNET open just because remote wakeup isn't supported,
from Oliver Neukum.
11) be2net driver bug fixes from Sathya Perla.
12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David
Woodhouse.
13) Fix MTU changing regression in 8139cp driver, from John Greene.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
solos-pci: ensure all TX packets are aligned to 4 bytes
solos-pci: add firmware upgrade support for new models
solos-pci: remove superfluous debug output
solos-pci: add GPIO support for newer versions on Geos board
8139cp: Prevent dev_close/cp_interrupt race on MTU change
net: qmi_wwan: add ZTE MF880
drivers/net: Use of_match_ptr() macro in smsc911x.c
drivers/net: Use of_match_ptr() macro in smc91x.c
ipv6: addrconf.c: remove unnecessary "if"
bridge: Correctly encode addresses when dumping mdb entries
bridge: Do not unregister all PF_BRIDGE rtnl operations
use generic usbnet_manage_power()
usbnet: generic manage_power()
usbnet: handle PM failure gracefully
ksz884x: fix receive polling race condition
qlcnic: update driver version
qlcnic: fix unused variable warnings
net: fec: forbid FEC_PTP on SoCs that do not support
be2net: fix wrong frag_idx reported by RX CQ
be2net: fix be_close() to ensure all events are ack'ed
...
Linus Torvalds [Thu, 20 Dec 2012 04:26:16 +0000 (20:26 -0800)]
Merge tags 'dt-for-linus', 'gpio-for-linus' and 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6
Pull devicetree, gpio and spi bugfixes from Grant Likely:
"Device tree v3.8 bug fix:
- Fixes an undefined struct device build error and a missing symbol
export.
GPIO device driver bug fixes:
- gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG
- gpio/ich: Add missing spinlock init
SPI device driver bug fixes:
- Most of this is bug fixes to the core code and the sh-hspi and
s3c64xx device drivers.
- There is also a patch here to add DT support to the Atmel driver.
This one should have been in the first round, but I missed it.
It's a low risk change contained within a single driver and the
Atmel maintainer has requested it."
* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux-2.6:
of: define struct device in of_platform.h if !OF_DEVICE and !OF_ADDRESS
of: Fix export of of_find_matching_node_and_match()
* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6:
gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG
gpio/ich: Add missing spinlock init
* tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6:
spi/sh-hspi: fix return value check in hspi_probe().
spi: fix tegra SPI binding examples
spi/atmel: add DT support
of/spi: Fix SPI module loading by using proper "spi:" modalias prefixes.
spi: Change FIFO flush operation and spi channel off
spi: Keep chipselect assertion during one message
Linus Torvalds [Thu, 20 Dec 2012 04:24:25 +0000 (20:24 -0800)]
Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm bugfix from Dave Airlie:
"Just a single urgent regression fix, seeing a few weird behaviours I'd
like not to persist."
* 'drm-next' of git://people.freedesktop.org/~airlied/linux:
drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling
Linus Torvalds [Thu, 20 Dec 2012 04:23:37 +0000 (20:23 -0800)]
Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random
Pull random updates from Ted Ts'o:
"A few /dev/random improvements for the v3.8 merge window."
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: Mix cputime from each thread that exits to the pool
random: prime last_data value per fips requirements
random: fix debug format strings
random: make it possible to enable debugging without rebuild
David S. Miller [Wed, 19 Dec 2012 23:44:31 +0000 (15:44 -0800)]
sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in CAMELLIA code.
We use the FPU and therefore cannot sleep during the crypto
loops.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2012 23:43:38 +0000 (15:43 -0800)]
sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in DES code.
We use the FPU and therefore cannot sleep during the crypto
loops.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2012 23:30:07 +0000 (15:30 -0800)]
sparc64: Fix ECB looping constructs in AES code.
Things work better when you increment the source buffer pointer
properly.
Signed-off-by: David S. Miller <davem@davemloft.net>
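To illustrate the class of bug fixed by the ECB commit above, here is a
minimal userspace sketch in C, not the sparc64 assembler itself. The names
toy_encrypt_block() and toy_ecb_encrypt() are hypothetical, and the "cipher"
is a plain XOR stand-in rather than AES. The point is only that an ECB loop
must advance the source pointer along with the destination pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16

/* Hypothetical stand-in for a real block cipher: XOR each byte with the key. */
void toy_encrypt_block(const uint8_t key[BLOCK_SIZE],
                       const uint8_t *in, uint8_t *out)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        out[i] = (uint8_t)(in[i] ^ key[i]);
}

/*
 * ECB processes every block independently.
 * len is assumed to be a multiple of BLOCK_SIZE.
 */
void toy_ecb_encrypt(const uint8_t key[BLOCK_SIZE],
                     const uint8_t *src, uint8_t *dst, size_t len)
{
    while (len >= BLOCK_SIZE) {
        toy_encrypt_block(key, src, dst);
        src += BLOCK_SIZE;  /* omitting this would re-encrypt the first block */
        dst += BLOCK_SIZE;
        len -= BLOCK_SIZE;
    }
}
```

If the source pointer is not advanced, every output block is the encryption
of the first input block, which multi-block test vectors catch immediately.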
David S. Miller [Wed, 19 Dec 2012 23:22:03 +0000 (15:22 -0800)]
sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in AES code.
We use the FPU and therefore cannot sleep during the crypto
loops.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2012 23:20:23 +0000 (15:20 -0800)]
sparc64: Fix AES ctr mode block size.
Like the generic versions, we need to support a block size
of '1' for CTR mode AES.
This was discovered thanks to all of the new test cases added by
Jussi Kivilinna.
Signed-off-by: David S. Miller <davem@davemloft.net>
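The CTR commit above relies on the fact that CTR mode turns a block cipher
into a stream cipher, so requests of any byte length must be accepted. The
following is a minimal userspace sketch in C under stated assumptions:
toy_keystream_block(), ctr_increment() and toy_ctr_crypt() are hypothetical
names, and the keystream function is a toy, not AES. It shows a final partial
block being handled by using only part of the keystream.

```c
#include <stddef.h>
#include <stdint.h>

#define CTR_BLOCK 16

/* Hypothetical toy keystream generator; a real one would encrypt ctr with AES. */
void toy_keystream_block(const uint8_t key[CTR_BLOCK],
                         const uint8_t ctr[CTR_BLOCK],
                         uint8_t out[CTR_BLOCK])
{
    for (size_t i = 0; i < CTR_BLOCK; i++)
        out[i] = (uint8_t)(key[i] ^ ctr[i] ^ (uint8_t)(0x5a + i));
}

/* Increment the counter as a big-endian integer, carrying as needed. */
void ctr_increment(uint8_t ctr[CTR_BLOCK])
{
    for (int i = CTR_BLOCK - 1; i >= 0; i--)
        if (++ctr[i] != 0)
            break;
}

/* Encrypts or decrypts len bytes; len need not be a multiple of CTR_BLOCK. */
void toy_ctr_crypt(const uint8_t key[CTR_BLOCK], uint8_t ctr[CTR_BLOCK],
                   const uint8_t *src, uint8_t *dst, size_t len)
{
    uint8_t ks[CTR_BLOCK];

    while (len > 0) {
        size_t n = len < CTR_BLOCK ? len : CTR_BLOCK;

        toy_keystream_block(key, ctr, ks);
        for (size_t i = 0; i < n; i++)
            dst[i] = (uint8_t)(src[i] ^ ks[i]);  /* partial block uses n bytes */
        ctr_increment(ctr);
        src += n;
        dst += n;
        len -= n;
    }
}
```

Because encryption and decryption are the same XOR against the keystream,
running toy_ctr_crypt() twice with the same key and starting counter returns
the original bytes, whatever the length.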
David S. Miller [Wed, 19 Dec 2012 23:19:11 +0000 (15:19 -0800)]
sparc64: Fix unrolled AES 256-bit key loops.
The basic scheme of the block mode assembler is that we start by
enabling the FPU, loading the key into the floating point registers,
then iterate calling the encrypt/decrypt routine for each block.
For the 256-bit key cases, we run short on registers in the unrolled
loops.
So the {ENCRYPT,DECRYPT}_256_2() macros reload the key registers that
get clobbered.
The non-unrolled macros, {ENCRYPT,DECRYPT}_256(), are not mindful of this.
So if we have a mix of multi-block and single-block calls, the
single-block non-unrolled 256-bit encrypt/decrypt can run with some
of the key registers clobbered.
Handle this by always explicitly loading those registers before using
the non-unrolled 256-bit macro.
This was discovered thanks to all of the new test cases added by
Jussi Kivilinna.
Signed-off-by: David S. Miller <davem@davemloft.net>
David Woodhouse [Wed, 19 Dec 2012 11:01:21 +0000 (11:01 +0000)]
solos-pci: ensure all TX packets are aligned to 4 bytes
The FPGA can't handle unaligned DMA (yet). So copy into an aligned buffer
if skb->data isn't suitably aligned.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
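The alignment check described in the solos-pci commit above can be sketched
in plain userspace C. This is an illustration under stated assumptions, not
the driver code: is_dma_aligned() and pick_dma_source() are hypothetical
names, and the caller-supplied bounce buffer stands in for whatever aligned
buffer the driver actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_ALIGN 4  /* the commit requires 4-byte alignment */

/* Returns nonzero when p sits on a DMA_ALIGN-byte boundary. */
int is_dma_aligned(const void *p)
{
    return ((uintptr_t)p & (DMA_ALIGN - 1)) == 0;
}

/*
 * Returns a pointer that is safe to hand to the (hypothetical) DMA engine:
 * the original data if it is already aligned, otherwise a copy in the
 * caller-supplied bounce buffer. The bounce buffer is typed uint32_t so
 * that it is itself 4-byte aligned. Returns NULL if the copy does not fit.
 */
const void *pick_dma_source(const void *data, size_t len,
                            uint32_t *bounce, size_t bounce_bytes)
{
    if (is_dma_aligned(data))
        return data;
    if (len > bounce_bytes)
        return NULL;
    memcpy(bounce, data, len);
    return bounce;
}
```

The copy is only paid for on the unaligned path, so already-aligned packets
are passed through untouched.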