Bryan Schumaker [Thu, 10 May 2012 19:07:32 +0000 (15:07 -0400)]
NFS: Don't pass mount data to nfs_fscache_get_super_cookie()
I intend on creating a single nfs_fs_mount() function used by all our
mount paths. To avoid checking between new mounts and clone mounts, I
instead pass both structures to a new function in super.c that finds the
cache key and then looks up the super cookie.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Thu, 10 May 2012 19:07:31 +0000 (15:07 -0400)]
NFS: Create a single nfs_get_root()
This patch splits out the NFS v4 specific functionality of
nfs4_get_root() into its own rpc_op called by the generic client, and
leaves nfs4_proc_get_rootfh() as its own stand alone function. This
also allows me to change nfs4_remote_mount(), nfs4_xdev_mount() and
nfs4_remote_referral_mount() to use the generic client's nfs_get_root()
function. Later patches in this series will collapse these functions
into one common function, so using the same get_root() function
everywhere simplifies future changes.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Thu, 10 May 2012 19:07:30 +0000 (15:07 -0400)]
NFS: Rename nfs4_proc_get_root()
This function is really getting the root filehandle and not the root
dentry of the filesystem. I also removed the rpc_ops lookup from
nfs4_get_rootfh() under the assumption that if we reach this function
then we already know we are using NFS v4.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Wed, 9 May 2012 17:54:53 +0000 (13:54 -0400)]
NFS: Clean up - Simplify reference counting in fs/nfs/direct.c
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Wed, 9 May 2012 18:04:55 +0000 (14:04 -0400)]
NFS: Clean up - Rename nfs_unlock_request and nfs_unlock_request_dont_release
Function rename to ensure that the functionality of nfs_unlock_request()
mirrors that of nfs_lock_request(). Then let nfs_unlock_and_release_request()
do the work of what used to be called nfs_unlock_request()...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Wed, 9 May 2012 17:19:15 +0000 (13:19 -0400)]
NFS: Clean up - simplify nfs_lock_request()
We only have two places where we need to grab a reference when trying
to lock the nfs_page. We're better off making that explicit.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Wed, 9 May 2012 17:37:43 +0000 (13:37 -0400)]
NFS: nfs_set_page_writeback no longer needs to reference the page
We now hold a reference to the nfs_page across the calls to
nfs_set_page_writeback and nfs_end_page_writeback, and that
means we already have a reference to the struct page.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Wed, 9 May 2012 18:30:35 +0000 (14:30 -0400)]
NFS: Prevent a deadlock in the new writeback code
We have to unlock the nfs_page before we call nfs_end_page_writeback
to avoid races with functions that expect the page to be unlocked
when PG_locked and PG_writeback are not set.
The problem is that nfs_unlock_request also releases the nfs_page,
causing a deadlock if the release of the nfs_open_context
triggers an iput() while the PG_writeback flag is still set...
The solution is to separate the unlocking and release of the nfs_page,
so that we can do the former before nfs_end_page_writeback and the
latter after.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Sun, 6 May 2012 23:46:30 +0000 (19:46 -0400)]
NFSv4: nfs_client_return_marked_delegations can't flush data
Since even filemap_flush() needs to lock pages that are dirty, we
cannot risk calling it from the state manager context. Therefore,
we need to move the call to filemap_flush() to
nfs_async_inode_return_delegation().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 6 May 2012 23:34:17 +0000 (19:34 -0400)]
NFS: nfs_inode_return_delegation() should always flush dirty data
The assumption is that if you are in a situation where you need to
return the delegation, then you should probably stop caching the
data anyway.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 6 May 2012 23:10:59 +0000 (19:10 -0400)]
NFS: Don't do a full flush to disk on close() if we hold a delegation
If we hold a delegation then we know that it should be safe to continue
to cache the data beyond the close(). However since the process that wrote
the data may die after close(), we may still want to send the data to
server before those RPCSEC_GSS credentials expire. We therefore compromise
by starting writeback to the server, but don't wait for completion.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 4 May 2012 17:54:24 +0000 (13:54 -0400)]
NFS: Fix sparse warnings
Fix the following sparse warnings:
fs/nfs/direct.c:221:6: warning: symbol 'nfs_direct_readpage_release' was
not declared. Should it be static?
fs/nfs/read.c:38:43: warning: non-ANSI function declaration of function
'nfs_readhdr_alloc'
fs/nfs/objlayout/objio_osd.c:214:5: warning: symbol '__alloc_objio_seg'
was not declared. Should it be static?
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Trond Myklebust [Fri, 4 May 2012 17:47:16 +0000 (13:47 -0400)]
NFS: Fix O_DIRECT compile warnings
Fix the following compile warnings:
fs/nfs/direct.c: In function 'nfs_direct_read_schedule_segment':
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:352:27: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c: In function 'nfs_direct_write_schedule_segment':
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:650:27: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Trond Myklebust [Tue, 1 May 2012 21:37:59 +0000 (17:37 -0400)]
NFS: Adapt readdirplus to application usage patterns
While the use of READDIRPLUS is significantly more efficient than
READDIR followed by many LOOKUP calls, it is still less efficient
than just READDIR if the attributes are not required.
This patch tracks when lookups are attempted on the directory,
and uses that information to selectively disable READDIRPLUS
on that directory.
The first 'readdir' call is always served using READDIRPLUS.
Subsequent calls only use READDIRPLUS if there was a successful
lookup or revalidation on a child in the mean time.
Credit for the original idea should go to Neil Brown. See:
http://www.spinics.net/lists/linux-nfs/msg19996.html
However, the implementation in this patch differs from Neil's
in that it focuses on tracking lookups rather than calls to
stat().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Trond Myklebust [Sun, 29 Apr 2012 14:44:42 +0000 (10:44 -0400)]
NFSv4: COMMIT does not need post-op attributes
No attributes are supposed to change during a COMMIT call, so there
is no need to request post-op attributes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sat, 28 Apr 2012 18:55:16 +0000 (14:55 -0400)]
NFSv4: Don't request cache consistency attributes on some writes
We don't need cache consistency information when we're doing O_DIRECT
writes. Ditto for the case of delegated writes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:19 +0000 (13:48 -0400)]
NFSv4: Simplify the NFSv4 REMOVE, LINK and RENAME compounds
Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFSv4: Simplify the NFSv4 CREATE compound
Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFSv4: Simplify the NFSv4 OPEN compound
Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.
The cost is that if we later need to stat() the directory, then we
know that the ctime and mtime are likely to be invalid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFS: Simplify the cache invalidation code
Now that NFSv2 and NFSv3 have simulated change attributes,
instead of using all three of mtime, ctime and change attribute to
manage data cache consistency, we can simplify the code to just use
the change attribute.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFSv2/v3: Simulate the change attribute
Use the ctime to simulate a change attribute for NFSv2 and NFSv3.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFS: Change attribute updates should set NFS_INO_REVAL_PAGECACHE
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFS: Simplify nfs_fhget()
If the inode is being initialised, there is no point in
setting flags such as NFS_INO_INVALID_ACCESS,
NFS_INO_INVALID_ACL or NFS_INO_INVALID_DATA since there are
no cached access calls, acls or data caches to invalidate.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 29 Apr 2012 16:50:01 +0000 (12:50 -0400)]
NFS: Always trust the PageUptodate flag when we have a delegation
We can always use the optimal full page write if we know that we
hold a delegation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 29 Apr 2012 16:30:19 +0000 (12:30 -0400)]
NFS: Optimise away nfs_check_inode_attributes() when holding a delegation
We already know that the attribute cache is valid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 29 Apr 2012 15:23:50 +0000 (11:23 -0400)]
NFS: Don't force page cache revalidations when holding a delegation
If we're holding a delegation, then we already know that our
page cache is valid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sat, 28 Apr 2012 20:05:03 +0000 (16:05 -0400)]
NFSv4: Retrieve attributes _before_ calling delegreturn
In order to retrieve cache consistency attributes before
anyone else has a chance to change the inode, we need to
put the GETATTR op _before_ the DELEGRETURN op.
We can then use that as part of a 'nfs_post_op_update_inode_force_wcc()'
call, to ensure that we update the attributes without clearing our
cached data.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFSv4: Delegreturn only needs the cache consistency bitmask
In order to do close-to-open cache consistency checking after
a delegreturn, we don't need to retrieve the full set of
attributes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFSv4: Fix a typo in NFS4_enc_link_sz
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 1 May 2012 16:49:58 +0000 (12:49 -0400)]
NFS: Simplify the nfs_read_completion functions
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Tue, 1 May 2012 16:07:22 +0000 (12:07 -0400)]
NFS: Clean up nfs read and write error paths
Move the error handling for nfs_generic_pagein() into a single function.
Ditto for nfs_generic_flush().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Tue, 1 May 2012 15:21:43 +0000 (11:21 -0400)]
NFS: Read cleanups
Remove unused variables, and reformat some code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Mon, 30 Apr 2012 22:39:20 +0000 (18:39 -0400)]
NFS: Fix a compile issue when CONFIG_NFS_V4_1 is undefined
struct nfs_direct_req can't compile when struct pnfs_ds_commit_info
is undefined.
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Mon, 30 Apr 2012 22:31:49 +0000 (18:31 -0400)]
NFS: Use kmem_cache_zalloc() in nfs_direct_req_alloc
Simplify the initialisation of O_DIRECT requests.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Mon, 30 Apr 2012 17:27:31 +0000 (13:27 -0400)]
NFS: Simplify O_DIRECT page referencing
The O_DIRECT code shouldn't need to hold 2 references to each page. The
reference held by the struct nfs_page should suffice.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Mon, 30 Apr 2012 17:40:06 +0000 (13:40 -0400)]
NFS: O_DIRECT pgio_completion_ops error_cleanup must unlock the request
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Trond Myklebust [Mon, 30 Apr 2012 17:22:54 +0000 (13:22 -0400)]
NFS: Ensure that we break out of read/write_schedule_segment on error
Currently we do break out of the for() loop, but we also need to
break out of the enclosing do {} while()...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Bryan Schumaker [Mon, 30 Apr 2012 18:30:22 +0000 (14:30 -0400)]
NFS: Define dummy nfs_init_cinfo() and nfs_init_cinfo_from_inode()
These are needed when v3 and v4 are not enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 30 Apr 2012 17:27:11 +0000 (13:27 -0400)]
NFS: Define nfs_direct_write_schedule_work() when v3 and v4 are disabled
v2 doesn't have commits, so this function can be a no-op.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Mon, 30 Apr 2012 17:06:53 +0000 (13:06 -0400)]
NFS: pnfs_pageio_init_read() and init_write() need an extra argument
This is only when CONFIG_NFS_V4_1 isn't enabled.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Fri, 27 Apr 2012 18:31:47 +0000 (14:31 -0400)]
NFS: Fix a use-before-initialised warning in fs/nfs/write.c and fs/nfs/pnfs.c
If the allocation of nfs_write_header fails, the list of nfs_pages that
needs to be cleaned up is still on desc->pg_list...
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Fred Isaman <iisaman@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:46 +0000 (13:27 -0400)]
NFS: Remove extra rpc_clnt argument to proc_lookup
Now that I'm doing secinfo automatically in the v4 code this extra
argument isn't needed.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:45 +0000 (13:27 -0400)]
NFS: Create a submount rpc_op
This simplifies the code for v2 and v3 and gives v4 a chance to decide
on referrals without needing to modify the generic client.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:44 +0000 (13:27 -0400)]
NFS: Remove secinfo knowledge out of the generic client
And also remove the unneeded rpc_op.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Tue, 24 Apr 2012 18:50:34 +0000 (14:50 -0400)]
NFS: Prevent garbage cinfo->ds from leaking out
This is a bugfix that applies on top of the previous directio patches,
that fixes a bug introduced in "NFS: create struct nfs_commit_info".
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:57 +0000 (14:47 -0400)]
NFS: rewrite directio write to use async coalesce code
This also has the advantage that it allows directio to use pnfs.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:56 +0000 (14:47 -0400)]
NFS: avoid some stat gathering for direct io
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:55 +0000 (14:47 -0400)]
NFS: add dreq to nfs_commit_info
Need this to pass into nfs_commitdata_init, in order to keep data->dreq
accurate.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:54 +0000 (14:47 -0400)]
NFS: create nfs_commit_completion_ops
Factors out the code that needs to change when directio
starts using these code paths.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:53 +0000 (14:47 -0400)]
NFS: create struct nfs_commit_info
It is COMMIT that is handled the most differently between
the paged and direct paths. Create a structure that encapsulates
everything either path needs to know about the commit state.
We could use void to hide some of the layout driver stuff, but
Trond suggests pulling it out to ensure type checking, given the
huge changes being made, and the fact that it doesn't interfere
with other drivers.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:52 +0000 (14:47 -0400)]
NFS: create nfs_generic_commit_list
Simple refactoring.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:51 +0000 (14:47 -0400)]
NFS: rewrite directio read to use async coalesce code
This also has the advantage that it allows directio to use pnfs.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 23:55:31 +0000 (19:55 -0400)]
NFS: prepare coalesce testing for directio
The coalesce code made assumptions that will no longer be true once
non-page aligned io occurs. This introduces no change in
current behavior, but allows for more general situations to come.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:49 +0000 (14:47 -0400)]
NFS: remove unused wb_complete field from struct nfs_page
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:48 +0000 (14:47 -0400)]
NFS: create completion structure to pass into page_init functions
Factors out the code that will need to change when directio
starts using these code paths. This will allow directio to use
the generic pagein and flush routines
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:47 +0000 (14:47 -0400)]
NFS: merge _full and _partial write rpc_ops
Decouple nfs_pgio_header and nfs_write_data, and have (possibly
multiple) nfs_write_datas each take a refcount on nfs_pgio_header.
For the moment keeps nfs_write_header as a way to preallocate a single
nfs_write_data with the nfs_pgio_header. The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.
This also fixes bug in pnfs_ld_handle_write_error. In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
replay attempt to do nothing.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:46 +0000 (14:47 -0400)]
NFS: merge _full and _partial read rpc_ops
Decouple nfs_pgio_header and nfs_read_data, and have (possibly
multiple) nfs_read_datas each take a refcount on nfs_pgio_header.
For the moment keeps nfs_read_header as a way to preallocate a single
nfs_read_data with the nfs_pgio_header. The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.
This also fixes bug in pnfs_ld_handle_read_error. In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
replay attempt to do nothing.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:45 +0000 (14:47 -0400)]
NFS: create struct nfs_page_array
Both nfs_read_data and nfs_write_data devote several fields which
can be combined into a single shared struct.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:44 +0000 (14:47 -0400)]
NFS: create common nfs_pgio_header for both read and write
In order to avoid duplicating all the data in nfs_read_data whenever we
split it up into multiple RPC calls (either due to a short read result
or due to rsize < PAGE_SIZE), we split out the bits that are the same
per RPC call into a separate "header" structure.
The goal this patch moves towards is to have a single header
refcounted by several rpc_data structures. Thus, want to always refer
from rpc_data to the header, and not the other way. This patch comes
close to that ideal, but the directio code currently needs some
special casing, isolated in the nfs_direct_[read_write]hdr_release()
functions. This will be dealt with in a future patch.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:43 +0000 (14:47 -0400)]
NFS: use req_offset where appropriate
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:42 +0000 (14:47 -0400)]
NFS: remove unnecessary casts of void pointers in nfs4filelayout.c
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:41 +0000 (14:47 -0400)]
NFS: reverse arg order in nfs_initiate_[read|write]
Make it consistent with nfs_initiate_commit.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:40 +0000 (14:47 -0400)]
NFS: dprintks in directio code were referencing task after put
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:39 +0000 (14:47 -0400)]
NFS: add a struct nfs_commit_data to replace nfs_write_data in commits
Commits don't need the vectors of pages, etc. that writes do. Split out
a separate structure for the commit operation.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:38 +0000 (14:47 -0400)]
NFS4.1: Add lseg to struct nfs4_fl_commit_bucket
Also create a commit_info structure to hold the bucket array and push
it up from the lseg to the layout where it really belongs.
While we are at it, fix a refcounting bug due to an (incorrect)
implicit assumption that filelayout_scan_ds_commit_list always
completely emptied the src list.
This clarifies refcounting, removes the ugly find_only_write_lseg
functions, and pushes the file layout commit code along on the path to
supporting multiple lsegs.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:37 +0000 (14:47 -0400)]
NFS4.1: make pnfs_ld_[read|write]_done consistent
The two functions had diverged quite a bit, with the write function
being a bit more robust than the read.
However, these still break badly in the desc->pg_bsize < PAGE_CACHE_SIZE case,
as then there is nothing hanging on the data->pages list, and the resend
ends up doing nothing. This will be fixed in a patch later in the series.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fred Isaman [Fri, 20 Apr 2012 18:47:36 +0000 (14:47 -0400)]
NFS: grab open context in direct read
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:43 +0000 (13:27 -0400)]
NFS: Remove unused function nfs_lookup_with_sec()
This fixes a compiler warning.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:42 +0000 (13:27 -0400)]
NFS: Honor the authflavor set in the clone mount data
The authflavor is set in an nfs_clone_mount structure and passed to the
xdev_mount() functions where it was promptly ignored. Instead, use it
to initialize an rpc_clnt for the cloned server.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:41 +0000 (13:27 -0400)]
NFS: Fix following referral mount points with different security
I create a new proc_lookup_mountpoint() to use when submounting an NFS
v4 share. This function returns an rpc_clnt to use for performing an
fs_locations() call on a referral's mountpoint.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:40 +0000 (13:27 -0400)]
NFS: Do secinfo as part of lookup
Whenever lookup sees wrongsec do a secinfo and retry the lookup to find
attributes of the file or directory, such as "is this a referral
mountpoint?". This also allows me to remove handling -NFS4ERR_WRONSEC
as part of getattr xdr decoding.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:39 +0000 (13:27 -0400)]
NFS: Handle exceptions coming out of nfs4_proc_fs_locations()
We don't want to return -NFS4ERR_WRONGSEC to the VFS because it could
cause the kernel to oops.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bryan Schumaker [Fri, 27 Apr 2012 17:27:38 +0000 (13:27 -0400)]
NFS: Fix SECINFO_NO_NAME
I was using the same decoder function for SECINFO and SECINFO_NO_NAME,
so it was returning an error when it tried to decode an OP_SECINFO_NO_NAME
header as OP_SECINFO.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stanislav Kinsbursky [Fri, 27 Apr 2012 09:00:17 +0000 (13:00 +0400)]
SUNRPC: traverse clients tree on PipeFS event
v2: recursion was replaced by loop
If client is a clone, then it's parent can not be in the list.
But parent's Pipefs dentries have to be created and destroyed.
Note: event skip helper for clients introduced
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stanislav Kinsbursky [Fri, 20 Apr 2012 14:19:56 +0000 (18:19 +0400)]
SUNRPC: set per-net PipeFS superblock before notification
There can be a case, when on MOUNT event RPC client (after it's dentries were
created) is not longer hold by anyone except notification callback.
I.e. on release this client will be destoroyed. And it's dentries have to be
destroyed as well. Which in turn requires per-net PipeFS superblock to be set.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stanislav Kinsbursky [Fri, 20 Apr 2012 14:19:18 +0000 (18:19 +0400)]
SUNRPC: skip clients with program without PipeFS entries
1) This is sane.
2) Otherwise there will be soft lockup:
do {
rpc_get_client_for_event (clnt->cl_dentry == NULL ==> choose)
__rpc_pipefs_event (clnt->cl_program->pipe_dir_name == NULL ==> return)
} while (1)
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stanislav Kinsbursky [Fri, 20 Apr 2012 14:11:02 +0000 (18:11 +0400)]
SUNRPC: skip dead but not buried clients on PipeFS events
These clients can't be safely dereferenced if their counter in 0.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Sachin Prabhu [Tue, 17 Apr 2012 13:36:40 +0000 (14:36 +0100)]
Avoid beyond bounds copy while caching ACL
When attempting to cache ACLs returned from the server, if the bitmap
size + the ACL size is greater than a PAGE_SIZE but the ACL size itself
is smaller than a PAGE_SIZE, we can read past the buffer page boundary.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Sachin Prabhu [Tue, 17 Apr 2012 13:35:39 +0000 (14:35 +0100)]
Avoid reading past buffer when calling GETACL
Bug noticed in commit
bf118a342f10dafe44b14451a1392c3254629a1f
When calling GETACL, if the size of the bitmap array, the length
attribute and the acl returned by the server is greater than the
allocated buffer(args.acl_len), we can Oops with a General Protection
fault at _copy_from_pages() when we attempt to read past the pages
allocated.
This patch allocates an extra PAGE for the bitmap and checks to see that
the bitmap + attribute_length + ACLs don't exceed the buffer space
allocated to it.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
[Trond: Fixed a size_t vs unsigned int printk() warning]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jim Rees [Tue, 10 Apr 2012 02:33:39 +0000 (22:33 -0400)]
fix page number calculation bug for block layout decode buffer
Signed-off-by: Jim Rees <rees@umich.edu>
Suggested-by: Andy Adamson <andros@netapp.com>
Suggested-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Sat, 14 Apr 2012 07:56:35 +0000 (03:56 -0400)]
NFSv4.1 fix page number calculation bug for filelayout decode buffers
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Sachin Bhamare [Fri, 30 Mar 2012 21:29:59 +0000 (14:29 -0700)]
pnfs-obj: Remove unused variable from objlayout_get_deviceinfo()
Local variable 'sb' was not being used in objlayout_get_deviceinfo().
Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Weston Andros Adamson [Tue, 24 Apr 2012 20:50:37 +0000 (16:50 -0400)]
nfs4: fix referrals on mounts that use IPv6 addrs
All referrals (IPv4 addr, IPv6 addr, and DNS) are broken on mounts of
IPv6 addresses, because validation code uses a path that is parsed
from the dev_name ("<server>:<path>") by splitting on the first colon and
colons are used in IPv6 addrs.
This patch ignores colons within IPv6 addresses that are escaped by '[' and ']'.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Linus Torvalds [Thu, 26 Apr 2012 04:38:44 +0000 (21:38 -0700)]
Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix NFSv4 infinite loops on open(O_TRUNC)
- Fix an Oops and an infinite loop in the NFSv4 flock code
- Don't register the PipeFS filesystem until it has been set up
- Fix an Oops in nfs_try_to_update_request
- Don't reuse NFSv4 open owners: fixes a bad sequence id storm.
* tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Keep dropped state owners on the LRU list for a while
NFSv4: Ensure that we don't drop a state owner more than once
NFSv4: Ensure we do not reuse open owner names
nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
NFS: put open context on error in nfs_flush_multi
NFS: put open context on error in nfs_pagein_multi
NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
NFSv4: Ensure that we check lock exclusive/shared type against open modes
NFSv4: Ensure that the LOCK code sets exception->inode
NFS: check for req==NULL in nfs_try_to_update_request cleanup
SUNRPC: register PipeFS file system after pernet sybsystem
Linus Torvalds [Thu, 26 Apr 2012 04:29:26 +0000 (21:29 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from H. Peter Anvin.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x32, siginfo: Provide proper overrides for x32 siginfo_t
asm-generic: Allow overriding clock_t and add attributes to siginfo_t
x32: Check __ILP32__ instead of __LP64__ for x32
x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler
ACPI: Convert wake_sleep_flags to a value instead of function
x86, apic: APIC code touches invalid MSR on P5 class machines
i387: ptrace breaks the lazy-fpu-restore logic
x86/platform: Remove incorrect error message in x86_default_fixup_cpu_id()
x86, efi: Add dedicated EFI stub entry point
x86/amd: Remove broken links from comment and kernel message
x86, microcode: Ensure that module is only loaded on supported AMD CPUs
x86, microcode: Fix sysfs warning during module unload on unsupported CPUs
Linus Torvalds [Thu, 26 Apr 2012 04:28:10 +0000 (21:28 -0700)]
Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86
Pull x86 platform driver fixes from Matthew Garrett:
"One annoyance fix (make intel_ips stop complaining unnecessarily) and
one oops fix (unterminated list in dell-laptop). Both have been in
-next for a while with no complaints."
* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
dell-laptop: Terminate quirks list properly
intel_ips: Hush the i915 symbols message
Johannes Weiner [Tue, 24 Apr 2012 18:22:33 +0000 (20:22 +0200)]
mm: memcg: move pc lookup point to commit_charge()
None of the callsites actually need the page_cgroup descriptor
themselves, so just pass the page and do the look up in there.
We already had two bugs (6568d4a 'mm: memcg: update the correct soft
limit tree during migration' and 'memcg: fix Bad page state after
replace_page_cache') where the passed page and pc were not referring
to the same page frame.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Miller [Wed, 25 Apr 2012 20:10:50 +0000 (16:10 -0400)]
mm: nobootmem: Correct alloc_bootmem semantics.
The comments above __alloc_bootmem_node() claim that the code will
first try the allocation using 'goal' and if that fails it will
try again but with the 'goal' requirement dropped.
Unfortunately, this is not what the code does, so fix it to do so.
This is important for nobootmem conversions to architectures such
as sparc where MAX_DMA_ADDRESS is infinity.
On such architectures all of the allocations done by generic spots,
such as the sparse-vmemmap implementation, will pass in:
__pa(MAX_DMA_ADDRESS)
as the goal, and with the limit given as "-1" this will always fail
unless we add the appropriate fallback logic here.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 24 Apr 2012 15:22:25 +0000 (08:22 -0700)]
Merge git://git./linux/kernel/git/steve/gfs2-3.0-fixes
Pull gfs2 fixes from Steven Whitehouse.
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Instruct DLM to avoid queue convert slowdown
Linus Torvalds [Tue, 24 Apr 2012 15:20:28 +0000 (08:20 -0700)]
Merge tag 'hsi_fixes_for_3.4' of git://gitorious.org/kernel-hsi/kernel-hsi
Pull HSI fixes and ABI documentation from Carlos Chinea
* tag 'hsi_fixes_for_3.4' of git://gitorious.org/kernel-hsi/kernel-hsi:
HSI: Add HSI ABI documentation
HSI: hsi_char: Remove max_data_size from sysfs
HSI: hsi: Rework hsi_event interface
HSI: hsi: Remove controllers and ports from the bus
HSI: hsi: Fix error path cleanup on client registration
HSI: hsi: Rework hsi_controller release
Bob Peterson [Tue, 10 Apr 2012 18:45:24 +0000 (14:45 -0400)]
GFS2: Instruct DLM to avoid queue convert slowdown
This patch instructs DLM to prevent an "in place" conversion, where the
lock just stays on the granted queue, and instead forces the conversion to
the back of the convert queue. This is done on upward conversions only.
This is useful in cases where, for example, a lock is frequently needed in
PR on one node, but another node needs it temporarily in EX to update it.
This may happen, for example, when the rindex is being updated by gfs2_grow.
The gfs2_grow needs to have the lock in EX, but the other nodes need to
re-read it to retrieve the updates. The glock is already granted in PR on
the non-growing nodes, so this prevents them from continually re-granting
the lock in PR, and forces the EX from gfs2_grow to go through.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Linus Torvalds [Tue, 24 Apr 2012 02:52:00 +0000 (19:52 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
"These are two low-risk bug fixes for ext4, fixing a compile warning
and a potential deadlock."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
super.c: unused variable warning without CONFIG_QUOTA
jbd2: use GFP_NOFS for blkdev_issue_flush
Linus Torvalds [Tue, 24 Apr 2012 02:50:48 +0000 (19:50 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rkuo/linux-hexagon-kernel
Pull Hexagon fixes from Richard Kuo:
"It's mostly compile fixes and the Hexagon portion of a CPU hotplug
patch set."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel:
hexagon: add missing cpu.h include
hexagon/CPU hotplug: Add missing call to notify_cpu_starting()
hexagon: use renamed tick_nohz_idle_* functions
Hexagon: misc compile warning/error cleanup due to missing headers
Linus Torvalds [Tue, 24 Apr 2012 02:45:19 +0000 (19:45 -0700)]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild
Pull build system failure fix from Michal Marek:
"This fixes build failure with newer gcc that adds some internal
symbols that end in "__mod_*_device_table", but are not actually the
tables themselves."
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Fix modpost failures in fedora 17
Eldad Zack [Sun, 22 Apr 2012 15:50:52 +0000 (17:50 +0200)]
super.c: unused variable warning without CONFIG_QUOTA
sb info is only checked with quota support.
fs/ext4/super.c: In function ‘parse_options’:
fs/ext4/super.c:1600:23: warning: unused variable ‘sbi’ [-Wunused-variable]
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Shaohua Li [Fri, 13 Apr 2012 02:27:35 +0000 (10:27 +0800)]
jbd2: use GFP_NOFS for blkdev_issue_flush
flush request is issued in transaction commit code path, so looks using
GFP_KERNEL to allocate memory for flush request bio falls into the classic
deadlock issue. I saw btrfs and dm get it right, but ext4, xfs and md are
using GFP.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Linus Torvalds [Tue, 24 Apr 2012 01:25:01 +0000 (18:25 -0700)]
Merge tag 'md-3.4-fixes' of git://neil.brown.name/md
Pull a few more md bug fixes from NeilBrown:
"2 are tagged for -stable, one being for a fairly serious bug that can
corrupt metadata and make it hard to recovery an array. The other is
for a more recent regression since 3.3"
* tag 'md-3.4-fixes' of git://neil.brown.name/md:
md: fix possible corruption of array metadata on shutdown.
md: don't call ->add_disk unless there is good reason.
DM RAID: Use safe version of rdev_for_each
Linus Torvalds [Tue, 24 Apr 2012 01:22:42 +0000 (18:22 -0700)]
Merge tag 'dlm-fixes-3.4' of git://git./linux/kernel/git/teigland/linux-dlm
Pull dlm fixes from David Teigland:
"This includes one short patch fixing the behavior of the QUECVT flag,
which the gfs2 folks are waiting on."
* tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: fix QUECVT when convert queue is empty
Hugh Dickins [Mon, 23 Apr 2012 18:14:50 +0000 (11:14 -0700)]
mm: fix s390 BUG by __set_page_dirty_no_writeback on swap
Mel reports a BUG_ON(slot == NULL) in radix_tree_tag_set() on s390
3.0.13: called from __set_page_dirty_nobuffers() when page_remove_rmap()
tries to transfer dirty flag from s390 storage key to struct page and
radix_tree.
That would be because of reclaim's shrink_page_list() calling
add_to_swap() on this page at the same time: first PageSwapCache is set
(causing page_mapping(page) to appear as &swapper_space), then
page->private set, then tree_lock taken, then page inserted into
radix_tree - so there's an interval before taking the lock when the
radix_tree slot is empty.
We could fix this by moving __add_to_swap_cache()'s spin_lock_irq up
before the SetPageSwapCache. But a better fix is simply to do what's
five years overdue: Ken Chen introduced __set_page_dirty_no_writeback()
(if !PageDirty TestSetPageDirty) for tmpfs to skip all the radix_tree
overhead, and swap is just the same - it ignores the radix_tree tag, and
does not participate in dirty page accounting, so should be using
__set_page_dirty_no_writeback() too.
s390 testing now confirms that this does indeed fix the problem.
Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ken Chen <kenchen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
H. Peter Anvin [Mon, 23 Apr 2012 23:34:12 +0000 (16:34 -0700)]
x32, siginfo: Provide proper overrides for x32 siginfo_t
Provide the proper override macros for x32 siginfo_t. The combination
of a special type here and an overall alignment constraint actually
ends up with all the types being properly aligned, but the hack is
needed to keep the substructures inside siginfo_t from adding padding.
Note: use __attribute__((aligned())) since __aligned() is not exported
to user space.
[ v2: fix stray semicolon ]
Reported-by: H.J. Lu <hjl.rools@gmail.com>
Cc: Bruce J. Beare <bruce.j.beare@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/CAMe9rOqF6Kh6-NK7oP0Fpzkd4SBAWU%2BG53hwBbSD4iA2UzyxuA@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>