Pavel Emelyanov [Mon, 4 Oct 2010 12:58:02 +0000 (16:58 +0400)]
sunrpc: Remove UDP worker wrappers
Same for UDP sockets creation paths.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:57:40 +0000 (16:57 +0400)]
sunrpc: Remove TCP worker wrappers
The v4 and the v6 wrappers only pass the respective family
to the xs_tcp_setup_socket. This family can be taken from the
xprt's sockaddr.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:57:11 +0000 (16:57 +0400)]
sunrpc: Pass family to setup_socket calls
Now we have a single socket creation routine and can call it
directly from the setup_socket routines.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:56:38 +0000 (16:56 +0400)]
sunrpc: Merge xs_create_sock code
After xs_bind is merged it's easy to merge its callers.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@redhat.com: fix address family initialization]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Tue, 5 Oct 2010 11:53:08 +0000 (15:53 +0400)]
sunrpc: Merge the xs_bind code
The only difference between xs_bind4 and xs_bind6 is the size
of the sockaddr structure they use.
Fortunately, that size can be obtained indirectly from the transport.
Change since v1:
* use sockaddr_storage instead of sockaddr
* use rpc_set_port instead of manual port assigning
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@redhat.com: fix address family initialization]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
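A rough userspace illustration of the idea behind the xs_bind merge above, not the kernel code: once the address is kept in a sockaddr_storage, both the length to hand to bind and the port field can be derived from the stored family, so one routine can serve v4 and v6. The helper names sketch_addr_len and sketch_set_port are hypothetical; the second only mimics what the message says rpc_set_port is used for.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: length of the concrete address held in ss. */
static size_t sketch_addr_len(const struct sockaddr_storage *ss)
{
	assert(ss != NULL);
	switch (ss->ss_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;	/* family not handled by this sketch */
	}
}

/* Hypothetical stand-in for a family-agnostic port setter. */
static int sketch_set_port(struct sockaddr_storage *ss, unsigned short port)
{
	assert(ss != NULL);
	switch (ss->ss_family) {
	case AF_INET:
		((struct sockaddr_in *)ss)->sin_port = htons(port);
		return 0;
	case AF_INET6:
		((struct sockaddr_in6 *)ss)->sin6_port = htons(port);
		return 0;
	default:
		return -1;	/* family not handled by this sketch */
	}
}
```

A caller would fill in ss_family once, call sketch_set_port(), and pass sketch_addr_len() as the length argument to bind(), without needing separate v4 and v6 code paths.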
Pavel Emelyanov [Mon, 4 Oct 2010 12:55:38 +0000 (16:55 +0400)]
sunrpc: Call xs_create_sockX directly from setup_socket
Remove now unneeded wrappers that just add type and protocol
to socket creation callback.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:54:55 +0000 (16:54 +0400)]
sunrpc: Factor out v6 sockets creation
Same patch for v6 protocols.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:54:26 +0000 (16:54 +0400)]
sunrpc: Factor out v4 sockets creation
The UDPv4 and TCPv4 socket creation callbacks now look very similar.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:53:46 +0000 (16:53 +0400)]
sunrpc: Factor out udp sockets creation
Make it look like the TCP sockets creation.
Unfortunately the git diff made the patch look messy :(
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:52:55 +0000 (16:52 +0400)]
sunrpc: Remove duplicate xprt/transport arguments from calls
The xs_tcp_reuse_connection takes the xprt only to pass it down
to the xs_abort_connection. The latter one can get it from the given
transport itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:52:25 +0000 (16:52 +0400)]
sunrpc: Get xprt pointer once in xs_tcp_setup_socket
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:51:56 +0000 (16:51 +0400)]
sunrpc: Remove unused sock arg from xs_next_srcport
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 4 Oct 2010 12:51:23 +0000 (16:51 +0400)]
sunrpc: Remove unused sock arg from xs_get_srcport
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Tom Tucker [Tue, 12 Oct 2010 20:33:57 +0000 (15:33 -0500)]
svcrdma: Cleanup DMA unmapping in error paths.
There are several error paths in the code that do not unmap DMA. This
patch adds calls to svc_rdma_unmap_dma to free these DMA contexts.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Tom Tucker [Tue, 12 Oct 2010 20:33:52 +0000 (15:33 -0500)]
svcrdma: Change DMA mapping logic to avoid the page_address kernel API
There was logic in the send path that assumed that a page containing data
to send to the client has a KVA. This is not always the case and can result
in data corruption when page_address returns zero and we end up DMA mapping
zero.
This patch changes the bus mapping logic to avoid page_address() where
necessary and converts all calls from ib_dma_map_single to ib_dma_map_page
in order to keep the map/unmap calls symmetric.
Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 11 Oct 2010 20:49:44 +0000 (16:49 -0400)]
nfsd4: expire clients more promptly
Expire clients more promptly, at the expense of possibly running the
laundromat thread more frequently.
Though it's not the default, I'd like it to be feasible to run with a
lease time of just a few seconds, at which point a minimum 10 second
wait between laundromat runs seems a little much.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
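One way to picture the trade-off described in the laundromat change above is a function that picks the delay before the next pass from the leases still outstanding. This is only a sketch under assumptions: the name sketch_next_run, the 1-second floor, and the flat array of renewal times are all hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical floor on the delay, in seconds; the value is an assumption. */
#define SKETCH_MIN_DELAY 1

/*
 * Return how long to sleep before the next pass: the time until the
 * earliest remaining lease expires, never less than SKETCH_MIN_DELAY
 * and never more than one full lease period.
 */
static long sketch_next_run(const long *last_renew, size_t n, long now,
			    long lease)
{
	long delay = lease;
	size_t i;

	assert(lease > 0);
	for (i = 0; i < n; i++) {
		long left = last_renew[i] + lease - now;

		if (left <= 0)
			continue;	/* already expired; reaped in this pass */
		if (left < delay)
			delay = left;
	}
	if (delay < SKETCH_MIN_DELAY)
		delay = SKETCH_MIN_DELAY;
	return delay;
}
```

With a lease of a few seconds the result tracks the next expiry closely, which is the "more frequent runs" cost the message mentions; with a long lease the behaviour is dominated by the lease period itself.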
Pavel Emelyanov [Tue, 5 Oct 2010 16:48:02 +0000 (20:48 +0400)]
sunrpc: Use helper to set v4 mapped addr in ip_map_parse
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 7 Oct 2010 04:29:46 +0000 (15:29 +1100)]
sunrpc/cache: centralise handling of size limit on deferred list.
We limit the number of 'defer' requests to DFR_MAX.
The imposition of this limit is spread about a bit - sometimes we don't
add new things to the list, sometimes we remove old things.
Also it is currently applied to requests which we are 'waiting' for
rather than 'deferring'. This doesn't seem ideal as 'waiting'
requests are naturally limited by the number of threads.
So gather the DFR_MAX handling code to one place and only apply it to
requests that are actually being deferred.
This means that not all 'cache_deferred_req' structures go on the
'cache_defer_list', so we need to be careful when adding and removing
things.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
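The shape of "enforce the limit in one place" from the DFR_MAX change above might look roughly like the sketch below. It is an assumption-laden illustration: the sketch_ names, the tiny limit, the singly linked list, and the choice to evict the oldest entry are all hypothetical and may differ from what the kernel does. The on_list flag models the point that not every request ends up on the list, so removal has to cope with unlisted requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DFR_MAX 3	/* small value, for illustration only */

struct sketch_req {
	struct sketch_req *next;
	bool on_list;		/* not every request is put on the list */
};

struct sketch_defer_list {
	struct sketch_req *head;	/* newest entry first */
	size_t count;
};

/*
 * The single place where the limit is enforced: add the request, then
 * evict the oldest entry if the list grew too long. Returns the evicted
 * request, or NULL if nothing had to be dropped.
 */
static struct sketch_req *sketch_defer(struct sketch_defer_list *l,
				       struct sketch_req *r)
{
	struct sketch_req **pp, *victim;

	assert(l != NULL && r != NULL && !r->on_list);
	r->next = l->head;
	l->head = r;
	r->on_list = true;
	if (++l->count <= SKETCH_DFR_MAX)
		return NULL;

	/* Over the limit: walk to the tail (oldest entry) and unlink it. */
	for (pp = &l->head; (*pp)->next != NULL; pp = &(*pp)->next)
		;
	victim = *pp;
	*pp = NULL;
	victim->on_list = false;
	l->count--;
	return victim;
}

/* Removal must tolerate requests that were never listed (e.g. waiters). */
static void sketch_remove(struct sketch_defer_list *l, struct sketch_req *r)
{
	struct sketch_req **pp;

	assert(l != NULL && r != NULL);
	if (!r->on_list)
		return;
	for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			r->on_list = false;
			l->count--;
			return;
		}
	}
}
```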
NeilBrown [Thu, 7 Oct 2010 04:29:46 +0000 (15:29 +1100)]
sunrpc: Simplify cache_defer_req and related functions.
The return value from cache_defer_req is somewhat confusing.
Various different error codes are returned, but the single caller is
only interested in success or failure.
In fact it can measure this success or failure itself by checking
CACHE_PENDING, which makes the point of the code more explicit.
So change cache_defer_req to return 'void' and test CACHE_PENDING
after it completes, to see if the request was actually deferred or
not.
Similarly setup_deferral and cache_wait_req don't need a return value,
so make them void and remove some code.
The call to cache_revisit_request (to guard against a race) is only
needed for the second call to setup_deferral, so move it out of
setup_deferral to after that second call. With the first call the
race is handled differently (by explicitly calling
'wait_for_completion').
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Sat, 2 Oct 2010 22:42:39 +0000 (18:42 -0400)]
nfsd4: return expired on unfound stateid's
Commit 78155ed75f470710f2aecb3e75e3d97107ba8374 "nfsd4: distinguish
expired from stale stateids" attempted to distinguish expired and stale
stateid's using time information that may not have been completely
reliable, so I reverted it.
That was throwing out the baby with the bathwater; we still do want to
return expired, but let's do that using the simpler approach of just
assuming any stateid is expired if it looks like it was given out by the
current server instance, but we can't find it any more.
This may help clients that are recovering from network partitions.
Reported-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Wed, 29 Sep 2010 20:11:06 +0000 (16:11 -0400)]
nfsd4: add new connections to session
As long as we're not implementing any session security, we should just
automatically add any new connections that come along to the list of
connections associated with the session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Wed, 29 Sep 2010 19:29:32 +0000 (15:29 -0400)]
nfsd4: refactor connection allocation
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Sun, 6 Jun 2010 22:37:16 +0000 (18:37 -0400)]
nfsd4: use callbacks on svc_xprt_deletion
Remove connections from the list when they go down.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Mon, 22 Mar 2010 19:37:17 +0000 (15:37 -0400)]
nfsd: provide callbacks on svc_xprt deletion
NFSv4.1 needs warning when a client tcp connection goes down, if that
connection is being used as a backchannel, so that it can warn the
client that it has lost the backchannel connection.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Sun, 6 Jun 2010 22:12:14 +0000 (18:12 -0400)]
nfsd4: keep per-session list of connections
The spec requires us in various places to keep track of the connections
associated with each session.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Mon, 27 Sep 2010 21:12:05 +0000 (17:12 -0400)]
nfsd4: clean up session allocation
Changes:
- make sure session memory reservation is released on failure
path.
- use min_t()/min() for more compact code in several places.
- break alloc_init_session into smaller pieces.
- miscellaneous other cleanup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Mon, 27 Sep 2010 20:26:25 +0000 (16:26 -0400)]
nfsd4: fix alloc_init_session return type
This returns an nfs error, not -ERRNO.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 27 Sep 2010 20:22:30 +0000 (16:22 -0400)]
nfsd4: fix alloc_init_session BUILD_BUG_ON()
Note we're allocating an array of nfsd4_slot *'s, not nfsd4_slot's.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
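The distinction in the BUILD_BUG_ON fix above matters because a compile-time size check is only useful if it measures what is actually allocated. The sketch below uses C11 static_assert as a userspace analogue of BUILD_BUG_ON; the struct layouts, the slot count of 160, and the 4096-byte page are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SLOTS 160	/* hypothetical per-session limit */
#define SKETCH_PAGE_SIZE 4096	/* hypothetical page size */

struct sketch_slot {
	char cache[128];	/* stand-in for per-slot reply cache data */
};

struct sketch_session {
	unsigned int nslots;
	struct sketch_slot *slots[];	/* array of pointers, not of slots */
};

/* Bytes needed for the session header plus its table of slot pointers. */
static size_t sketch_session_bytes(size_t nslots)
{
	return sizeof(struct sketch_session) +
	       nslots * sizeof(struct sketch_slot *);
}

/* Compile-time check sized by the pointer, matching the allocation. */
static_assert(sizeof(struct sketch_session) +
	      SKETCH_MAX_SLOTS * sizeof(struct sketch_slot *) <=
	      SKETCH_PAGE_SIZE,
	      "session header plus slot pointers must fit in one page");
```

Had the check used sizeof(struct sketch_slot) instead, it would have tested 160 * 128 bytes, a quantity unrelated to the pointer table that is really allocated.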
J. Bruce Fields [Sat, 5 Jun 2010 00:04:45 +0000 (20:04 -0400)]
nfsd4: Move callback setup to callback queue
Instead of creating the new rpc client from a regular server thread,
set a flag, kick off a null call, and allow the null call to do the work
of setting up the client on the callback workqueue.
Use a spinlock to ensure the callback work gets a consistent view of the
callback parameters.
This allows, for example, changing the callback from contexts where
sleeping is not allowed. I hope it will also keep the locking simple as
we add more session and trunking features, by serializing most of the
callback-specific work.
This also closes a small race where the new cb_ident could be used
with an old connection (or vice-versa).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Mon, 31 May 2010 22:21:37 +0000 (18:21 -0400)]
nfsd4: remove separate cb_args struct
I don't see the point of the separate struct. It seems to just be
getting in the way.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Wed, 26 May 2010 21:52:14 +0000 (17:52 -0400)]
nfsd4: use generic callback code in null case
This will eventually allow us, for example, to kick off null callback
from contexts where we can't sleep.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Sun, 16 May 2010 20:47:08 +0000 (16:47 -0400)]
nfsd4: generic callback code
Make the recall callback code more generic, so that other callbacks
will be able to use it too.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Wed, 26 May 2010 21:46:00 +0000 (17:46 -0400)]
nfsd4: rename nfs4_rpc_args->nfsd4_cb_args
With apologies for the gratuitous rename, the new name seems more
helpful to me.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Wed, 26 May 2010 21:40:53 +0000 (17:40 -0400)]
nfsd4: combine nfs4_rpc_args and nfsd4_cb_sequence
These two structs don't really need to be distinct as far as I can tell.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Mon, 31 May 2010 23:09:40 +0000 (19:09 -0400)]
nfsd4: minor variable renaming (cb -> conn)
Now that we have both nfsd4_callback and nfsd4_cb_conn structures, I get
confused if variables of both types are always named cb....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields [Fri, 1 Oct 2010 19:40:01 +0000 (15:40 -0400)]
nfsd4: remove spkm3
Unfortunately, spkm3 never got very far; while interoperability with one
other implementation was demonstrated at some point, problems were found
with the spec that were deemed not worth fixing.
The kernel code is useless on its own without nfs-utils patches which
were never merged into nfs-utils, and were only ever available from
citi.umich.edu. They appear not to have been updated since 2005.
Therefore it seems safe to assume that this code has no users, and never
will.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:06 +0000 (12:55 +1000)]
sunrpc: fix race in new cache_wait code.
If we set up to wait for a cache item to be filled in, and then find
that it is no longer pending, it could be that some other thread is
in 'cache_revisit_request' and has moved our request to its 'pending' list.
So when our setup_deferral calls cache_revisit_request it will find nothing to
put on the pending list, and do nothing.
We then return from cache_wait_req, thus leaving the 'sleeper'
on-stack structure open to being corrupted by subsequent stack usage.
However that 'sleeper' could still be on the 'pending' list that the
other thread is looking at and so any corruption could cause it to behave badly.
To avoid this race we simply take the same path as if the
'wait_for_completion_interruptible_timeout' was interrupted and if the
sleeper is no longer on the list (which it won't be) we wait on the
completion - which will ensure that any other cache_revisit_request
will have let go of the sleeper.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:06:57 +0000 (16:06 +0400)]
sunrpc: Create sockets in net namespaces
The context is already known in all the sock_create callers.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:06:32 +0000 (16:06 +0400)]
net: Export __sock_create
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:05:43 +0000 (16:05 +0400)]
sunrpc: Tag rpc_xprt with net
The net is known from the xprt_create and this tagging will also
give us the context in the connection workers where real sockets
are created.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:05:12 +0000 (16:05 +0400)]
sunrpc: Add net to xprt_create
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:04:45 +0000 (16:04 +0400)]
sunrpc: Add net to rpc_create_args
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:04:18 +0000 (16:04 +0400)]
sunrpc: Pull net argument down to svc_create_socket
After this the socket creation in it knows the context.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:03:50 +0000 (16:03 +0400)]
sunrpc: Add net argument to svc_create_xprt
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:03:13 +0000 (16:03 +0400)]
sunrpc: Factor out rpc_xprt freeing
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Wed, 29 Sep 2010 12:02:43 +0000 (16:02 +0400)]
sunrpc: Factor out rpc_xprt allocation
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Benny Halevy [Thu, 30 Sep 2010 18:47:46 +0000 (20:47 +0200)]
nfsd4: adjust buflen for encoded attrs bitmap based on actual bitmap length
The existing code adjusted it based on the worst case scenario for the returned
bitmap and the best case scenario for the supported attrs attribute.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[bfields@redhat.com: removed likely/unlikely's]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Stephen Rothwell [Wed, 29 Sep 2010 04:16:57 +0000 (14:16 +1000)]
sunrpc: fix up rpcauth_remove_module section mismatch
On Wed, 29 Sep 2010 14:02:38 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> After merging the final tree, today's linux-next build (powerpc
> ppc44x_defconfig) produced this warning:
>
> WARNING: net/sunrpc/sunrpc.o(.init.text+0x110): Section mismatch in reference from the function init_sunrpc() to the function .exit.text:rpcauth_remove_module()
> The function __init init_sunrpc() references
> a function __exit rpcauth_remove_module().
> This is often seen when error handling in the init function
> uses functionality in the exit path.
> The fix is often to remove the __exit annotation of
> rpcauth_remove_module() so it may be used outside an exit section.
>
> Probably caused by commit 2f72c9b73730c335381b13e2bd221abe1acea394
> ("sunrpc: The per-net skeleton").
This actually causes a build failure on a sparc32 defconfig build:
`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o
I applied the following patch for today:
Fixes:
`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:02:29 +0000 (14:02 +0400)]
sunrpc: Make the ip_map_cache be per-net
Everything that is required for that already exists:
* the per-net cache registration with respective proc entries
* the context (struct net) is available in all the users
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:01:58 +0000 (14:01 +0400)]
sunrpc: Make the /proc/net/rpc appear in net namespaces
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:01:27 +0000 (14:01 +0400)]
sunrpc: The per-net skeleton
Register empty per-net operations for the sunrpc layer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:00:49 +0000 (14:00 +0400)]
sunrpc: Tag svc_xprt with net
The transport representation should be per-net of course.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 10:00:15 +0000 (14:00 +0400)]
sunrpc: Add routines that allow registering per-net caches
Existing calls do the same, but for the init_net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:59:48 +0000 (13:59 +0400)]
sunrpc: Add net to pure API calls
There are two calls that operate on ip_map_cache and are
directly called from the nfsd code. Other places will be
handled in a different way.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:59:13 +0000 (13:59 +0400)]
sunrpc: Pass xprt to cached get/put routines
They do not require the rqst actually and having the xprt simplifies
further patching.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:58:42 +0000 (13:58 +0400)]
sunrpc: Make xprt auth cache release work with the xprt
This is done in order to facilitate getting the ip_map_cache from
which to put the ip_map.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Mon, 27 Sep 2010 09:57:36 +0000 (13:57 +0400)]
sunrpc: Pass the ip_map_parse's cd to lower calls
The target is to have many ip_map_cache-s in the system. This particular
patch handles its usage by the ip_map_parse callback.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 24 Sep 2010 21:43:59 +0000 (17:43 -0400)]
nfsd: fix /proc/net/rpc/nfsd.export/content display
Note with "first" always 0, and "lastflags" initially 0, we always dump
a spurious set of 0 flags at the start, among other problems.
Fix. And attempt to make the code a little more obvious.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pavel Emelyanov [Thu, 23 Sep 2010 14:26:58 +0000 (18:26 +0400)]
nfsd: Export get_task_comm for nfsd
The git://linux-nfs.org/~bfields/linux.git nfsd-next branch doesn't
compile when nfsd is a module with the following error:
ERROR: "get_task_comm" [fs/nfsd/nfsd.ko] undefined!
Replace the get_task_comm call with direct comm access, which is
safe for current.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:07 +0000 (12:55 +1000)]
nfsd: allow deprecated interface to be compiled out.
Add CONFIG_NFSD_DEPRECATED, default to y.
Only include deprecated interface if this is defined.
This allows distros to remove this interface before the official
removal, and allows developers to test without it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:07 +0000 (12:55 +1000)]
nfsd: formally deprecate legacy nfsd syscall interface
The syscall interface has been replaced by a more flexible
interface since 2.6.0. It is time to work towards discarding
the old interface.
So add an entry in feature-removal-schedule.txt and print a warning
when the interface is used.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Wed, 22 Sep 2010 02:55:06 +0000 (12:55 +1000)]
sunrpc/cache: fix recent breakage of cache_clean_deferred
commit 6610f720e9e8103c22d1f1ccf8fbb695550a571f
broke cache_clean_deferred as entries are no longer added to the
pending list for subsequent revisiting.
So put those requests back on the pending list.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Bryan Schumaker [Tue, 21 Sep 2010 20:38:12 +0000 (16:38 -0400)]
lockd: Mostly remove BKL from the server
This patch removes all but one call to lock_kernel() from the server.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Andy Shevchenko [Tue, 21 Sep 2010 06:40:25 +0000 (09:40 +0300)]
sunrpc/cache: don't use custom hex_to_bin() converter
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:08 +0000 (17:04 +1000)]
sunrpc/cache: change deferred-request hash table to use hlist.
Being a hash table, hlist is the best option.
There is currently some ugliness where we treat "->next == NULL" as
a special case to avoid having to initialise the whole array.
This change nicely gets rid of that case.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
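Why an hlist removes the "->next == NULL" special case mentioned above can be seen in a minimal userspace imitation of the structure. The sketch_ names are hypothetical and this is not the kernel's list.h; it only reproduces the layout idea: a bucket head is a single pointer, so an all-zero array of heads is already a valid table of empty lists.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_hnode {
	struct sketch_hnode *next;
	struct sketch_hnode **pprev;	/* address of the pointer aimed at us */
};

struct sketch_hhead {
	struct sketch_hnode *first;	/* NULL means empty */
};

static int sketch_hempty(const struct sketch_hhead *h)
{
	assert(h != NULL);
	return h->first == NULL;
}

/* Insert n at the front of bucket h. */
static void sketch_hadd_head(struct sketch_hnode *n, struct sketch_hhead *h)
{
	assert(n != NULL && h != NULL);
	n->next = h->first;
	if (h->first != NULL)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n without needing to know which bucket it is in. */
static void sketch_hdel(struct sketch_hnode *n)
{
	assert(n != NULL && n->pprev != NULL);
	*n->pprev = n->next;
	if (n->next != NULL)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}
```

A circular doubly linked list head, by contrast, must point at itself to be empty, which is what forces either initialising the whole array or treating a NULL pointer as a special case.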
NeilBrown [Thu, 12 Aug 2010 07:04:07 +0000 (17:04 +1000)]
svcauth_gss: replace a trivial 'switch' with an 'if'
Code like:
switch(xxx) {
case -error1:
case -error2:
..
return;
case 0:
stuff;
}
can more naturally be written:
if (xxx < 0)
return;
stuff;
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
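The rewrite described in the 'switch' to 'if' change above can be checked with a compilable pair of functions. The function names and the particular error codes are placeholders, and the two forms are only equivalent under the assumption that the value is either 0 or a negative error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Original shape: every error code listed, success handled last. */
static void sketch_with_switch(int rv, int *did_stuff)
{
	assert(did_stuff != NULL);
	switch (rv) {
	case -EAGAIN:
	case -ETIMEDOUT:
	case -ENOENT:
		return;
	case 0:
		*did_stuff = 1;
	}
}

/* Simplified shape: any negative value is an error. */
static void sketch_with_if(int rv, int *did_stuff)
{
	assert(did_stuff != NULL);
	if (rv < 0)
		return;
	*did_stuff = 1;
}
```

The 'if' form also covers error codes the 'switch' forgot to list, which is part of why it reads more naturally.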
NeilBrown [Thu, 12 Aug 2010 07:04:06 +0000 (17:04 +1000)]
nfsd/idmap: drop special request deferral in favour of improved default.
The idmap code manages request deferral by waiting for a reply from
userspace rather than putting the NFS request on a queue to be retried
from the start.
Now that the common deferral code does this there is no need for the
special code in idmap.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:07 +0000 (17:04 +1000)]
nfsd: disable deferral for NFSv4
Now that a slight delay in getting a reply to an upcall doesn't
require deferring of requests, disable request deferral for all NFSv4
requests - the concept doesn't really fit with the v4 model.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown [Thu, 12 Aug 2010 07:04:07 +0000 (17:04 +1000)]
sunrpc: close connection when a request is irretrievably lost.
If we drop a request in the sunrpc layer, either due to kmalloc failure,
or due to a cache miss when we could not queue the request for later
replay, then close the connection to encourage the client to retry sooner.
Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX
(aka NFS4ERR_DELAY) is returned to guide the client concerning
replay.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 20 Sep 2010 02:55:06 +0000 (22:55 -0400)]
nfsd4: fix hang on fast-booting nfs servers
The last_close field of a cache_detail is initialized to zero, so the
condition
detail->last_close < seconds_since_boot() - 30
may be false even for a cache that was never opened.
However, we want to immediately fail upcalls to caches that were never
opened: in the case of the auth_unix_gid cache, especially, which may
never be opened by mountd (if the --manage-gids option is not set), we
want to fail the upcall immediately. Otherwise client requests will be
dropped unnecessarily on reboot.
Also document these conditions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
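The fast-boot problem described above is easiest to see with numbers: with last_close at 0 and the machine up for 10 seconds, the test reads 0 < 10 - 30, which is false, so a cache nobody ever opened is treated as recently closed. The sketch below contrasts the bare test with one that handles the never-opened case explicitly. The struct and function names are hypothetical, and the second function is only a guess at the kind of check the message describes, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_cache {
	int readers;		/* number of currently open readers */
	long last_close;	/* seconds since boot; 0 if never opened */
};

/* The bare timing test: "closed within the last 30 seconds". */
static bool sketch_recently_closed(const struct sketch_cache *c, long now)
{
	assert(c != NULL);
	return !(c->last_close < now - 30);
}

/* Variant that fails immediately for a cache that was never opened. */
static bool sketch_listeners_exist(const struct sketch_cache *c, long now)
{
	assert(c != NULL);
	if (c->readers > 0)
		return true;
	if (c->last_close == 0)
		return false;	/* never opened: fail the upcall at once */
	if (c->last_close < now - 30)
		return false;	/* closed long ago */
	return true;		/* closed recently; a listener may return */
}
```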
J. Bruce Fields [Mon, 20 Sep 2010 03:48:00 +0000 (23:48 -0400)]
Merge remote branch 'trond/bugfixes' into for-2.6.37
Without some client-side fixes, server testing is currently difficult.
Trond Myklebust [Sun, 12 Sep 2010 23:57:50 +0000 (19:57 -0400)]
SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies
The NFSv4 client's callback server calls svc_gss_principal(), which
is defined in auth_rpcgss.ko.
The NFSv4 server has the same dependency, and in addition calls
svcauth_gss_flavor(), gss_mech_get_by_pseudoflavor(),
gss_pseudoflavor_to_service() and gss_mech_put() from the same module.
The module auth_rpcgss itself has no dependencies aside from sunrpc,
so we only need to select RPCSEC_GSS.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Menyhart Zoltan [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
statfs() gives ESTALE error
Hi,
An NFS client executes a statfs("file", &buff) call.
"file" exists / existed, the client has read / written it,
but it has already closed it.
user_path(pathname, &path) looks up "file" successfully in the
directory-cache and restarts the aging timer of the directory-entry.
This succeeds even if "file" has already been removed from the server, because
the lookupcache=positive option I use keeps the entries valid for a while.
nfs_statfs() returns ESTALE if "file" has already been removed from the
server.
If the user application repeats the statfs("file", &buff) call, we
are stuck: "file" remains young forever in the directory-cache.
Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Trond Myklebust [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
NFS: Fix a typo in nfs_sockaddr_match_ipaddr6
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Miquel van Smoorenburg [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
sunrpc: increase MAX_HASHTABLE_BITS to 14
The maximum size of the authcache is now set to 1024 (10 bits),
but on our server we need at least 4096 (12 bits). Increase
MAX_HASHTABLE_BITS to 14. This is a maximum of 16384 entries,
each containing a pointer (8 bytes on x86_64). This is
exactly the limit of kmalloc() (128K).
Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
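The sizing argument in the MAX_HASHTABLE_BITS change above is simple arithmetic, which a tiny helper makes explicit. The function name is hypothetical; the 8-byte pointer size is the x86_64 figure quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used by a table of 2^bits pointers of the given size. */
static size_t sketch_table_bytes(unsigned int bits, size_t ptr_size)
{
	assert(bits < 8 * sizeof(size_t));
	return ((size_t)1 << bits) * ptr_size;
}
```

For 14 bits and 8-byte pointers this gives 16384 * 8 = 131072 bytes, i.e. 128K, while the previous 10-bit maximum corresponds to 8192 bytes.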
Bian Naimeng [Sun, 12 Sep 2010 23:55:26 +0000 (19:55 -0400)]
gss:spkm3 miss returning error to caller when import security context
spkm3 fails to return an error to the upper layer when importing a security
context; it may return OK even though the import has failed.
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Bian Naimeng [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
gss:krb5 miss returning error to caller when import security context
krb5 fails to return an error to the upper layer when importing a security
context; it may return OK even though the import has failed.
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fabio Olive Leite [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
Remove incorrect do_vfs_lock message
The do_vfs_lock function in fs/nfs/file.c is only called if NLM is
not being used, via the -onolock mount option. Therefore it cannot
really be "out of sync with lock manager" when the local locking
function called returns an error, as there will be no corresponding
call to the NLM. For details, simply check the if/else in do_setlk
and do_unlk in fs/nfs/file.c.
Signed-Off-By: Fabio Olive Leite <fleite@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
SUNRPC: cleanup state-machine ordering
This is just a minor cleanup: net/sunrpc/clnt.c clarifies the rpc client
state machine by commenting each state and by laying out the functions
implementing each state in the order that each state is normally
executed (in the absence of errors).
The previous patch "Fix null dereference in call_allocate" changed the
order of the states. Move the functions and update the comments to
reflect the change.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
SUNRPC: Fix a race in rpc_info_open
There is a race between rpc_info_open and rpc_release_client()
in that nothing stops a process from opening the file after
the clnt->cl_kref goes to zero.
Fix this by using atomic_inc_not_zero()...
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
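The race fix above depends on an "increment only if not already zero" primitive (atomic_inc_not_zero() in the kernel). The following is a minimal userspace sketch of that idea using C11 atomics; struct client, client_get_unless_zero() and client_put() are hypothetical names for illustration and are not the SUNRPC code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object whose lifetime is reference counted. */
struct client {
    atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true on success, false if the object is already being torn down.
 */
static bool client_get_unless_zero(struct client *c)
{
    int old = atomic_load(&c->refcount);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool client_put(struct client *c)
{
    int prev = atomic_fetch_sub(&c->refcount, 1);

    assert(prev > 0);
    return prev == 1;
}
```

An opener that gets false back must treat the object as gone instead of using it, which is what closes the window between the count reaching zero and the object being freed.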
Trond Myklebust [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
SUNRPC: Fix race corrupting rpc upcall
If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
after rpc_pipe_release calls rpc_purge_list(), but before it calls
gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
will free a message without deleting it from the rpci->pipe list.
We will be left with a freed object on the rpci->pipe list. The most
frequent symptoms are kernel crashes in rpc.gssd system calls on the
pipe in question.
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
J. Bruce Fields [Sun, 12 Sep 2010 23:55:25 +0000 (19:55 -0400)]
Fix null dereference in call_allocate
In call_allocate we need to reach the auth in order to factor au_cslack
into the allocation.
As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound
cred to struct rpc_rqst", call_allocate attempts to do this by
dereferencing tk_client->cl_auth, however this is not guaranteed to be
defined--cl_auth can be zero in the case of gss context destruction (see
rpc_free_auth).
Reorder the client state machine to bind credentials before allocating,
so that we can instead reach the auth through the cred.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Linus Torvalds [Sun, 12 Sep 2010 23:07:37 +0000 (16:07 -0700)]
Linux 2.6.36-rc4
Randy Dunlap [Sat, 11 Sep 2010 22:55:26 +0000 (15:55 -0700)]
docbook: skip files with no docs since they generate scary warnings
Fix docbook templates that reference files that do not contain the
expected kernel-doc notation.
Fixes these warnings:
Warning(arch/x86/include/asm/unaligned.h): no structured comments found
Warning(lib/vsprintf.c): no structured comments found
These cause errors in the generated html output, like below, so drop
these lines.
Name
arch/x86/include/asm/unaligned.h - Document generation inconsistency
Oops
Warning
The template for this document tried to insert the structured comment from the file arch/x86/include/asm/unaligned.h at this point, but none was found. This dummy section is inserted to allow generation to continue.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Sat, 11 Sep 2010 22:55:22 +0000 (15:55 -0700)]
docbook: warn on unused doc entries
When you don't use !E or !I but only !F, then it's very easy to miss
including some functions, structs etc. in documentation. To help
find which ones were missed, allow printing out the unused ones as
warnings.
For example, using this on mac80211 yields a lot of warnings like this:
Warning: didn't use docs for DOC: mac80211 workqueue
Warning: didn't use docs for ieee80211_max_queues
Warning: didn't use docs for ieee80211_bss_change
Warning: didn't use docs for ieee80211_bss_conf
when generating the documentation for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Sat, 11 Sep 2010 22:55:12 +0000 (15:55 -0700)]
kernel-doc: ignore case when stripping attributes
There are valid attributes that can contain upper case letters but that
we still want to remove, for example
__attribute__((aligned(NETDEV_ALIGN)))
as encountered in the wireless code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
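To illustrate why case sensitivity matters here, the sketch below uses POSIX regcomp() with and without REG_ICASE. The pattern is an illustrative assumption and is not copied from scripts/kernel-doc (which is a Perl script); matches_attribute() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative pattern (an assumption): only lower case letters are listed. */
static const char attr_pattern[] =
    "__attribute__[[:space:]]*\\(\\([a-z,_*[:space:]()]*\\)\\)";

/* Return true if text contains an attribute the pattern would strip. */
static bool matches_attribute(const char *text, bool ignore_case)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB | (ignore_case ? REG_ICASE : 0);
    bool found;

    if (regcomp(&re, attr_pattern, flags) != 0)
        return false;
    found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

With a case-sensitive match, the upper case NETDEV_ALIGN argument falls outside the [a-z] range, so the attribute is left in place; ignoring case lets the same pattern cover it.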
Linus Torvalds [Sat, 11 Sep 2010 22:50:53 +0000 (15:50 -0700)]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / Hibernate: Avoid hitting OOM during preallocation of memory
PM QoS: Correct pr_debug() misuse and improve parameter checks
PM: Prevent waiting forever on asynchronous resume after failing suspend
Linus Torvalds [Sat, 11 Sep 2010 19:17:02 +0000 (12:17 -0700)]
Merge git://git./linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] fix use-after-free in scsi_init_io()
[SCSI] sd: fix medium-removal bug
[SCSI] qla2xxx: Update version number to 8.03.04-k0.
[SCSI] qla2xxx: Check for empty slot in request queue before posting Command type 6 request.
[SCSI] qla2xxx: Cover UNDERRUN case where SCSI status is set.
[SCSI] qla2xxx: Correctly set fw hung and complete only waiting mbx.
[SCSI] qla2xxx: Reset seconds_since_last_heartbeat correctly.
[SCSI] qla2xxx: make rport deletions explicit during vport removal
[SCSI] qla2xxx: Fix vport delete issues
[SCSI] sd, sym53c8xx: Remove warnings after vsprintf %pV introduction.
[SCSI] Fix warning: zero-length gnu_printf format string
[SCSI] hpsa: disable doorbell reset on reset_devices
[SCSI] be2iscsi: Fix for Login failure
[SCSI] fix bio.bi_rw handling
Rafael J. Wysocki [Sat, 11 Sep 2010 18:58:27 +0000 (20:58 +0200)]
PM / Hibernate: Avoid hitting OOM during preallocation of memory
There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the total number of available non-highmem memory pages. If that's
the case, the OOM condition is guaranteed to trigger, which in turn
can cause significant slowdown to occur during hibernation.
To avoid that, make preallocate_image_memory() adjust its argument
before calling preallocate_image_pages(), so that the total number of
saveable non-highmem pages left is not less than the minimum size of
a hibernation image. Change hibernate_preallocate_memory() to try to
allocate from highmem if the number of pages allocated by
preallocate_image_memory() is too low.
Modify free_unnecessary_pages() to take all possible memory
allocation patterns into account.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: M. Vefa Bicakci <bicave@superonline.com>
Linus Torvalds [Sat, 11 Sep 2010 15:06:38 +0000 (08:06 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (28 commits)
ipheth: remove incorrect devtype to WWAN
MAINTAINERS: Add CAIF
sctp: fix test for end of loop
KS8851: Correct RX packet allocation
udp: add rehash on connect()
net: blackhole route should always be recalculated
ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
niu: Fix kernel buffer overflow for ETHTOOL_GRXCLSRLALL
ipvs: fix active FTP
gro: Re-fix different skb headrooms
via-velocity: Turn scatter-gather support back off.
ipv4: Fix reverse path filtering with multipath routing.
UNIX: Do not loop forever at unix_autobind().
PATCH: b44 Handle RX FIFO overflow better (simplified)
irda: off by one
3c59x: Fix deadlock in vortex_error()
netfilter: discard overlapping IPv6 fragment
ipv6: discard overlapping fragment
net: fix tx queue selection for bridged devices implementing select_queue
bonding: Fix jiffies overflow problems (again)
...
Fix up trivial conflicts due to the same cgroup API thinko fix going
through both Andrew and the networking tree. However, there were small
differences between the two, with Andrew's version generally being the
nicer one, and the one I merged first. So pick that one.
Conflicts in: include/linux/cgroup.h and kernel/cgroup.c
Linus Torvalds [Sat, 11 Sep 2010 15:01:09 +0000 (08:01 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Kill all BKL usage.
Linus Torvalds [Sat, 11 Sep 2010 14:59:49 +0000 (07:59 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
sched: Move sched_avg_update() to update_cpu_load()
Peter Zijlstra [Fri, 10 Sep 2010 20:32:53 +0000 (22:32 +0200)]
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
Doh, a real life genuine preemption leak..
This caused a suspend failure.
Reported-bisected-and-tested-by-the-invaluable: Jeff Chua <jeff.chua.linux@gmail.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nico Schottelius <nico-linux-20100709@schottelius.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Pritz <flo@xssn.at>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: <stable@kernel.org> # Greg, please apply after: cd7240c ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states")
LKML-Reference: <1284150773.402.122.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Sat, 11 Sep 2010 01:19:43 +0000 (18:19 -0700)]
Merge branch 'drm-intel-fixes' of git://git./linux/kernel/git/ickle/drm-intel
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
drm/i915: don't enable self-refresh on Ironlake
drm/i915: Double check that the wait_request is not pending before warning
Revert "drm/i915: Warn if we run out of FIFO space for a mode"
Revert "drm/i915: Allow LVDS on pipe A on gen4+"
Revert "drm/i915: Enable RC6 on Ironlake."
Linus Torvalds [Sat, 11 Sep 2010 01:19:26 +0000 (18:19 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log IO completion workqueue is a high priority queue
xfs: prevent reading uninitialized stack memory
Peter Zijlstra [Fri, 10 Sep 2010 20:32:53 +0000 (22:32 +0200)]
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
A real life genuine preemption leak..
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mark gross [Thu, 9 Sep 2010 21:20:09 +0000 (23:20 +0200)]
PM QoS: Correct pr_debug() misuse and improve parameter checks
Correct some pr_debug() misuse and add a stronger parameter check to
pm_qos_write() for the ASCII hex value case. Thanks to Dan Carpenter
for pointing out the problem!
Signed-off-by: mark gross <markgross@thegnar.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Dave Chinner [Wed, 8 Sep 2010 09:00:22 +0000 (09:00 +0000)]
xfs: log IO completion workqueue is a high priority queue
The workqueue implementation in 2.6.36-rcX has changed, resulting
in the workqueues no longer having dedicated threads for work
processing. This has caused severe livelocks under heavy parallel
create workloads because the log IO completions have been getting
held up behind metadata IO completions. Hence log commits would
stall, memory allocation would stall because pages could not be
cleaned, and lock contention on the AIL during inode IO completion
processing was being seen to slow everything down even further.
By making the log IO completion workqueue a high priority workqueue,
log IO completions are queued ahead of all data/metadata IO completions
and processed before them. Hence the log never
gets stalled, and operations needed to clean memory can continue as
quickly as possible. This avoids the livelock conditions and allows
the system to keep running under heavy load as per normal.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Roland McGrath [Wed, 8 Sep 2010 02:37:06 +0000 (19:37 -0700)]
execve: make responsive to SIGKILL with large arguments
An execve with a very large total of argument/environment strings
can take a really long time in the execve system call. It runs
uninterruptibly to count and copy all the strings. This change
makes it abort the exec quickly if sent a SIGKILL.
Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending(). It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
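The behaviour described above amounts to polling for a fatal signal inside the copy loop. The following is a minimal userspace sketch under stated assumptions: fatal_pending is a hypothetical stand-in for the kernel's fatal_signal_pending(current), copy_strings_sketch() is not the kernel's copy_strings(), and the -1 return stands in for whatever error the kernel reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's fatal_signal_pending(current). */
static bool fatal_pending;

/*
 * Copy n strings into dst, checking for a fatal signal before each one.
 * Returns the number of strings copied, or -1 if the copy was interrupted.
 */
static int copy_strings_sketch(char dst[][32], const char *const *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fatal_pending)
            return -1;              /* abort quickly instead of finishing */
        strncpy(dst[i], src[i], 31);
        dst[i][31] = '\0';          /* strncpy may not terminate the string */
    }
    return (int)n;
}
```

Checking only for a fatal signal keeps the change conservative: a process that is about to die never observes the aborted exec, whereas interrupting on any signal could change behaviour that user space can see.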
Roland McGrath [Wed, 8 Sep 2010 02:36:28 +0000 (19:36 -0700)]
execve: improve interactivity with large arguments
This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings(). There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.
When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>