Robert Milkowski [Thu, 30 Jan 2020 09:43:25 +0000 (09:43 +0000)]
NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
Currently, each time nfs4_do_fsinfo() is called it will do an implicit
NFS4 lease renewal, which is not compliant with the NFS4 specification.
This can result in a lease being expired by an NFS server.
Commit
83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing leases")
introduced implicit client lease renewal in nfs4_do_fsinfo(),
which can result in the NFSv4.0 lease to expire on a server side,
and servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.
This can easily be reproduced by frequently unmounting a sub-mount,
then stat'ing it to get it mounted again, which will delay or even
completely prevent client from sending RENEW operations if no other
NFS operations are issued. Eventually nfs server will expire client's
lease and return an error on file access or next RENEW.
This can also happen when a sub-mount is automatically unmounted
due to inactivity (after nfs_mountpoint_expiry_timeout), then it is
mounted again via stat(). This can result in a short window during
which client's lease will expire on a server but not on a client.
This specific case was observed on production systems.
This patch removes the implicit lease renewal from nfs4_do_fsinfo().
Fixes: 83ca7f5ab31f ("NFS: Avoid PUTROOTFH when managing leases")
Signed-off-by: Robert Milkowski <rmilkowski@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Robert Milkowski [Tue, 28 Jan 2020 08:37:47 +0000 (08:37 +0000)]
NFSv4: try lease recovery on NFS4ERR_EXPIRED
Currently, if an nfs server returns NFS4ERR_EXPIRED to open(),
we return EIO to applications without even trying to recover.
Fixes: 272289a3df72 ("NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid")
Signed-off-by: Robert Milkowski <rmilkowski@gmail.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Wenwen Wang [Mon, 3 Feb 2020 03:47:53 +0000 (03:47 +0000)]
NFS: Fix memory leaks
In _nfs42_proc_copy(), 'res->commit_res.verf' is allocated through
kzalloc() if 'args->sync' is true. In the following code, if
'res->synchronous' is false, handle_async_copy() will be invoked. If an
error occurs during the invocation, the following code will not be executed
and the error will be returned . However, the allocated
'res->commit_res.verf' is not deallocated, leading to a memory leak. This
is also true if the invocation of process_copy_commit() returns an error.
To fix the above leaks, redirect the execution to the 'out' label if an
error is encountered.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Dai Ngo [Thu, 23 Jan 2020 01:45:39 +0000 (20:45 -0500)]
nfs: optimise readdir cache page invalidation
When the directory is large and it's being modified by one client
while another client is doing the 'ls -l' on the same directory then
the cache page invalidation from nfs_force_use_readdirplus causes
the reading client to keep restarting READDIRPLUS from cookie 0
which causes the 'ls -l' to take a very long time to complete,
possibly never completing.
Currently when nfs_force_use_readdirplus is called to switch from
READDIR to READDIRPLUS, it invalidates all the cached pages of the
directory. This cache page invalidation causes the next nfs_readdir
to re-read the directory content from cookie 0.
This patch is to optimise the cache invalidation in
nfs_force_use_readdirplus by only truncating the cached pages from
last page index accessed to the end the file. It also marks the
inode to delay invalidating all the cached page of the directory
until the next initial nfs_readdir of the next 'ls' instance.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Anna - Fix conflicts with Trond's readdir patches]
[Anna - Remove redundant call to nfs_zap_mapping()]
[Anna - Replace d_inode(file_dentry(desc->file)) with file_inode(desc->file)]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 2 Feb 2020 22:53:56 +0000 (17:53 -0500)]
NFS: Switch readdir to using iterate_shared()
Now that the page cache locking is repaired, we should be able to
switch to using iterate_shared() for improved concurrency when
doing readdir().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 2 Feb 2020 22:53:55 +0000 (17:53 -0500)]
NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
The directory strings stored in the readdir cache may be used with
printk(), so it is better to ensure they are nul-terminated.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 2 Feb 2020 22:53:54 +0000 (17:53 -0500)]
NFS: Directory page cache pages need to be locked when read
When a NFS directory page cache page is removed from the page cache,
its contents are freed through a call to nfs_readdir_clear_array().
To prevent the removal of the page cache entry until after we've
finished reading it, we must take the page lock.
Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir")
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 2 Feb 2020 22:53:53 +0000 (17:53 -0500)]
NFS: Fix memory leaks and corruption in readdir
nfs_readdir_xdr_to_array() must not exit without having initialised
the array, so that the page cache deletion routines can safely
call nfs_readdir_clear_array().
Furthermore, we should ensure that if we exit nfs_readdir_filler()
with an error, we free up any page contents to prevent a leak
if we try to fill the page again.
Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir")
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 2 Feb 2020 22:57:08 +0000 (17:57 -0500)]
SUNRPC: Use kmemdup_nul() in rpc_parse_scope_id()
Using kmemdup_nul() is more efficient when the length is known.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 2 Feb 2020 22:57:07 +0000 (17:57 -0500)]
NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
When we already know the string length, it is more efficient to
use kmemdup_nul().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Anna - Changes to super.c were already made during fscontext conversion]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 27 Jan 2020 14:58:19 +0000 (09:58 -0500)]
NFSv4: Limit the total number of cached delegations
Delegations can be expensive to return, and can cause scalability issues
for the server. Let's therefore try to limit the number of inactive
delegations we hold.
Once the number of delegations is above a certain threshold, start
to return them on close.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 27 Jan 2020 14:58:18 +0000 (09:58 -0500)]
NFSv4: Add accounting for the number of active delegations held
In order to better manage our delegation caching, add a counter
to track the number of active delegations.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 27 Jan 2020 14:58:17 +0000 (09:58 -0500)]
NFSv4: Try to return the delegation immediately when marked for return on close
Add a routine to return the delegation immediately upon close of the
file if it was marked for return-on-close.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 27 Jan 2020 14:58:16 +0000 (09:58 -0500)]
NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
If a delegation is marked as needing to be returned when the file is
closed, then don't clear that marking until we're ready to return
it.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 27 Jan 2020 14:58:15 +0000 (09:58 -0500)]
NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
In particular, the pnfs return-on-close code will check for that flag,
so ensure we set it appropriately.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 26 Jan 2020 22:31:15 +0000 (17:31 -0500)]
NFS: nfs_find_open_context() should use cred_fscmp()
We want to find open contexts that match our filesystem access
properties. They don't have to exactly match the cred.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 26 Jan 2020 22:31:14 +0000 (17:31 -0500)]
NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
We do not need to have the rcu lookup method fail in the case where
the fsuid/fsgid and supplemental groups match.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Sun, 26 Jan 2020 22:31:13 +0000 (17:31 -0500)]
NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
When comparing two 'struct cred' for equality w.r.t. behaviour under
filesystem access, we need to use cred_fscmp().
Fixes: a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Alex Shi [Tue, 21 Jan 2020 08:49:56 +0000 (16:49 +0800)]
NFS: remove unused macros
MNT_fhs_status_sz/MNT_fhandle3_sz are never used after they were
introduced. So better to remove them.
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Fri, 17 Jan 2020 15:55:09 +0000 (15:55 +0000)]
nfs: Return EINVAL rather than ERANGE for mount parse errors
Return EINVAL rather than ERANGE for mount parse errors as the userspace
mount command doesn't necessarily understand what to do with anything other
than EINVAL.
The old code returned -ERANGE as an intermediate error that then get
converted to -EINVAL, whereas the new code returns -ERANGE.
This was induced by passing minorversion=1 to a v4 mount where
CONFIG_NFS_V4_1 was disabled in the kernel build.
Fixes: 68f65ef40e1e ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Olga Kornievskaia [Mon, 16 Dec 2019 21:34:02 +0000 (16:34 -0500)]
NFS: allow deprecation of NFS UDP protocol
Add a kernel config CONFIG_NFS_DISABLE_UDP_SUPPORT to disallow NFS
UDP mounts and enable it by default.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 14 Jan 2020 17:06:34 +0000 (12:06 -0500)]
NFS: Add softreval behaviour to nfs_lookup_revalidate()
If the server is unavaliable, we want to allow the revalidating
lookup to time out, and to default to validating the cached dentry
if the 'softreval' mount option is set.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Su Yanjun [Wed, 25 Dec 2019 03:37:57 +0000 (11:37 +0800)]
NFSv3: FIx bug when using chacl and chmod to change acl
We find a bug when running test under nfsv3 as below.
1)
chacl u::r--,g::rwx,o:rw- file1
2)
chmod u+w file1
3)
chacl -l file1
We expect u::rw-, but it shows u::r--, more likely it returns the
cached acl in inode.
We dig the code find that the code path is different.
chacl->..->__nfs3_proc_setacls->nfs_zap_acl_cache
Then nfs_zap_acl_cache clears the NFS_INO_INVALID_ACL in
NFS_I(inode)->cache_validity.
chmod->..->nfs3_proc_setattr
Because NFS_INO_INVALID_ACL has been cleared by chacl path,
nfs_zap_acl_cache wont be called.
nfs_setattr_update_inode will set NFS_INO_INVALID_ACL so let it
before nfs_zap_acl_cache call.
Signed-off-by: Su Yanjun <suyanjun218@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Olga Kornievskaia [Wed, 18 Dec 2019 21:50:42 +0000 (16:50 -0500)]
NFSv4.x recover from pre-mature loss of openstateid
Ever since the commit
0e0cb35b417f, it's possible to lose an open stateid
while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens,
operations that require openstateid fail with EAGAIN which is propagated
to the application then tests like generic/446 and generic/168 fail with
"Resource temporarily unavailable".
Instead of returning this error, initiate state recovery when possible to
recover the open stateid and then try calling nfs4_select_rw_stateid()
again.
Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Olga Kornievskaia [Thu, 2 Jan 2020 22:09:54 +0000 (17:09 -0500)]
NFSv4 fix acl retrieval over krb5i/krb5p mounts
For the krb5i and krb5p mount, it was problematic to truncate the
received ACL to the provided buffer because an integrity check
could not be preformed.
Instead, provide enough pages to accommodate the largest buffer
bounded by the largest RPC receive buffer size.
Note: I don't think it's possible for the ACL to be truncated now.
Thus NFS4_ACL_TRUNC flag and related code could be possibly
removed but since I'm unsure, I'm leaving it.
v2: needs +1 page.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:39:37 +0000 (15:39 -0500)]
NFS: Add mount option 'softreval'
Add a mount option 'softreval' that allows attribute revalidation 'getattr'
calls to time out, and causes them to fall back to using the cached
attributes.
The use case for this option is for ensuring that we can still (slowly)
traverse paths and use cached information even when the server is down.
Once the server comes back up again, the getattr calls start succeeding,
and the caches will revalidate as usual.
The 'softreval' mount option is automatically enabled if you have
specified 'softerr'. It can be turned off using the options
'nosoftreval', or 'hard'.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:39:36 +0000 (15:39 -0500)]
NFS: Trust cached access if we've already revalidated the inode once
If we've already revalidated the inode once then don't distrust the
access cache unless the NFS_INO_INVALID_ACCESS flag is actually set.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:14 +0000 (15:25 -0500)]
NFS: Fix nfs_direct_write_reschedule_io()
The 'hdr->good_bytes' is defined as the number of bytes we expect to
read or write starting at offset hdr->io_start. In the case of a partial
read/write we may end up adjusting hdr->args.offset and hdr->args.count
to skip I/O for data that was already read/written, and so we must ensure
the calculation takes that into account.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:13 +0000 (15:25 -0500)]
NFS: When resending after a short write, reset the reply count to zero
If we're resending a write due to a short read or write, ensure we
reset the reply count to zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:12 +0000 (15:25 -0500)]
NFS: Improve tracing of permission calls
On exit from nfs_do_access(), record the mask representing the requested
permissions, as well as the server-supplied set of access rights for
this user.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:11 +0000 (15:25 -0500)]
pNFS/flexfiles: Add tracing for layout errors
Trace layout errors for pNFS/flexfiles on read/write/commit operations.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:10 +0000 (15:25 -0500)]
NFS: Clean up generic file commit tracepoint
Clean up the generic file commit tracepoints to use a 64-bit value
for the verifier, and to display the pNFS filehandle, if it exists.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:09 +0000 (15:25 -0500)]
NFS: Clean up generic writeback tracepoints
Clean up the generic writeback tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually written.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:08 +0000 (15:25 -0500)]
NFS: Clean up generic file read tracepoints
Clean up the generic file read tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually read.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:07 +0000 (15:25 -0500)]
pNFS/flexfiles: Record resend attempts on I/O failure
If the attempt to do pNFS fails, then record what action we
take to recover (resend, reset to pnfs or reset to mds).
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:06 +0000 (15:25 -0500)]
NFS: Fix fix of show_nfs_errors
Casting a negative value to an unsigned long is not the same as
converting it to its absolute value.
Fixes: 96650e2effa2 ("NFS: Fix show_nfs_errors macros again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:05 +0000 (15:25 -0500)]
NFSv4: Improve read/write/commit tracing
Ensure we always return the number of bytes read/written. Also display
the pnfs filehandle if it is in use.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:04 +0000 (15:25 -0500)]
NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()
Instead of making assumptions about the commit verifier contents, change
the commit code to ensure we always check that the verifier was set
by the XDR code.
Fixes: f54bcf2ecee9 ("pnfs: Prepare for flexfiles by pulling out common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:03 +0000 (15:25 -0500)]
NFS: Fix up fsync() when the server rebooted
Don't clear the NFS_CONTEXT_RESEND_WRITES flag until after calling
nfs_commit_inode(). Otherwise, if nfs_commit_inode() returns an
error, we end up with dirty pages in the page cache, but no tag
to tell us that those pages need resending.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:02 +0000 (15:25 -0500)]
SUNRPC: Remove broken gss_mech_list_pseudoflavors()
Remove gss_mech_list_pseudoflavors() and its callers. This is part of
an unused API, and could leak an RCU reference if it were ever called.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:01 +0000 (15:25 -0500)]
NFS: Revalidate the file mapping on all fatal writeback errors
If a write or commit failed, and the mapping sees a fatal error, we
need to revalidate the contents of that mapping.
Fixes: 06c9fdf3b9f1 ("NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Mon, 6 Jan 2020 20:25:00 +0000 (15:25 -0500)]
NFS: Revalidate the file size on a fatal write error
If we suffer a fatal error upon writing a file, which causes us to
need to revalidate the entire mapping, then we should also revalidate
the file size.
Fixes: d2ceb7e57086 ("NFS: Don't use page_file_mapping after removing the page")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:57:09 +0000 (11:57 -0500)]
xprtrdma: DMA map rr_rdma_buf as each rpcrdma_rep is created
Clean up: This simplifies the logic in rpcrdma_post_recvs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:57:04 +0000 (11:57 -0500)]
xprtrdma: Destroy reps from previous connection instance
To safely get rid of all rpcrdma_reps from a particular connection
instance, xprtrdma has to wait until each of those reps is finished
being used. A rep may be backing the rq_rcv_buf of an RPC that has
just completed, for example.
Since it is safe to invoke rpcrdma_rep_destroy() only in the Receive
completion handler, simply mark reps remaining in the rb_all_reps
list after the transport is drained. These will then be deleted as
rpcrdma_post_recvs pulls them off the rep free list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:56:58 +0000 (11:56 -0500)]
xprtrdma: Destroy rpcrdma_rep when Receive is flushed
This reduces the hardware and memory footprint of an unconnected
transport.
At some point in the future, transport reconnect will allow
resolving the destination IP address through a different device. The
current change enables reps for the new connection to be allocated
on whichever NUMA node the new device affines to after a reconnect.
Note that this does not destroy _all_ the transport's reps... there
will be a few that are still part of a running RPC completion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:56:53 +0000 (11:56 -0500)]
xprtrdma: Allocate and map transport header buffers at connect time
Currently the underlying RDMA device is chosen at transport set-up
time. But it will soon be at connect time instead.
The maximum size of a transport header is based on device
capabilities. Thus transport header buffers have to be allocated
_after_ the underlying device has been chosen (via address and route
resolution); ie, in the connect worker.
Thus, move the allocation of transport header buffers to the connect
worker, after the point at which the underlying RDMA device has been
chosen.
This also means the RDMA device is available to do a DMA mapping of
these buffers at connect time, instead of in the hot I/O path. Make
that optimization as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:56:48 +0000 (11:56 -0500)]
xprtrdma: Refactor frwr_is_supported
Refactor: Perform the "is supported" check in rpcrdma_ep_create()
instead of in rpcrdma_ia_open(). frwr_open() is where most of the
logic to query device attributes is already located.
The current code displays a redundant error message when the device
does not support FRWR. As an additional clean-up, this patch removes
the extra message.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:56:43 +0000 (11:56 -0500)]
xprtrdma: Eliminate per-transport "max pages"
To support device hotplug and migrating a connection between devices
of different capabilities, we have to guarantee that all in-kernel
devices can support the same max NFS payload size (1 megabyte).
This means that possibly one or two in-tree devices are no longer
supported for NFS/RDMA because they cannot support 1MB rsize/wsize.
The only one I confirmed was cxgb3, but it has already been removed
from the kernel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:56:37 +0000 (11:56 -0500)]
xprtrdma: Refactor initialization of ep->rep_max_requests
Clean up: there is no need to keep two copies of the same value.
Also, in subsequent patches, rpcrdma_ep_create() will be called in
the connect worker rather than at set-up time.
Minor fix: Initialize the transport's sendctx to the value based on
the capabilities of the underlying device, not the maximum setting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:56:32 +0000 (11:56 -0500)]
xprtrdma: Make sendctx queue lifetime the same as connection lifetime
The size of the sendctx queue depends on the value stored in
ia->ri_max_send_sges. This value is determined by querying the
underlying device.
Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called
in the connect worker rather than at transport set-up time. The
underlying device will not have been chosen device set-up time.
The sendctx queue will thus have to be created after the underlying
device has been chosen via address and route resolution; in other
words, in the connect worker.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:56:27 +0000 (11:56 -0500)]
xprtrdma: Eliminate ri_max_send_sges
Clean-up. The max_send_sge value also happens to be stored in
ep->rep_attr. Let's keep just a single copy.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Colin Ian King [Mon, 6 Jan 2020 13:17:34 +0000 (13:17 +0000)]
NFS: Add missing null check for failed allocation
Currently the allocation of buf is not being null checked and
a null pointer dereference can occur when the memory allocation fails.
Fix this by adding a check and returning -ENOMEM.
Addresses-Coverity: ("Dereference null return")
Fixes: 6d972518b821 ("NFS: Add fs_context support.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Geert Uytterhoeven [Mon, 30 Dec 2019 15:32:38 +0000 (16:32 +0100)]
nfs: NFS_SWAP should depend on SWAP
If CONFIG_SWAP=n, it does not make much sense to offer the user the
option to enable support for swapping over NFS, as that will still fail
at run time:
# swapon /swap
swapon: /swap: swapon failed: Function not implemented
Fix this by adding a dependency on CONFIG_SWAP.
Fixes: a564b8f0398636ba ("nfs: enable swap on NFS")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Julia Lawall [Wed, 1 Jan 2020 07:43:30 +0000 (08:43 +0100)]
SUNRPC: constify copied structure
The empty_iov structure is only copied into another structure,
so make it const.
The opportunity for this change was found using Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Murphy Zhou [Thu, 2 Jan 2020 08:04:26 +0000 (16:04 +0800)]
fs/nfs, swapon: check holes in swapfile
swapon over NFS does not go through generic_swapfile_activate
code path when setting up extents. This makes holes in NFS
swapfiles possible which is not expected for swapon.
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 23 Dec 2019 15:28:44 +0000 (10:28 -0500)]
SUNRPC: call_connect_status should handle -EPROTO
The xprtrdma connect logic can return -EPROTO if the underlying
device or network path does not support RDMA. This can happen
after a device removal/insertion.
- When SOFTCONN is set, EPROTO is a permanent error.
- When SOFTCONN is not set, EPROTO is treated as a temporary error.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 23 Dec 2019 15:28:38 +0000 (10:28 -0500)]
NFS4: Report callback authentication errors
This seems to be a somewhat common issue with Kerberos NFSv4.0
set-ups.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 23 Dec 2019 15:28:33 +0000 (10:28 -0500)]
NFS: Introduce trace events triggered by page writeback errors
Try to capture the reason for the writeback path tagging an error on
a page.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 23 Dec 2019 15:28:28 +0000 (10:28 -0500)]
SUNRPC: Capture signalled RPC tasks
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
zhengbin [Thu, 19 Dec 2019 06:14:18 +0000 (14:14 +0800)]
NFS: move dprintk after nfs_alloc_fattr in nfs3_proc_lookup
In nfs3_proc_lookup, if nfs_alloc_fattr fails, will only print
"NFS call lookup". This may be confusing, move dprintk after
nfs_alloc_fattr.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
zhengbin [Thu, 19 Dec 2019 10:34:47 +0000 (18:34 +0800)]
NFS4: Remove unneeded semicolon
Fixes coccicheck warning:
fs/nfs/nfs4state.c:1138:2-3: Unneeded semicolon
fs/nfs/nfs4proc.c:6862:2-3: Unneeded semicolon
fs/nfs/nfs4proc.c:8629:2-3: Unneeded semicolon
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Arnd Bergmann [Mon, 11 Nov 2019 20:16:27 +0000 (21:16 +0100)]
nfs: encode nfsv4 timestamps as 64-bit
On 32-bit architectures, xdr_encode_nfstime4() needlessly
truncates timestamps to a 32-bit value in the range between
year 1902 and 2038.
Change it to use 'struct timespec64' to allow the entire range
of values supported by the server.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Arnd Bergmann [Mon, 11 Nov 2019 20:16:26 +0000 (21:16 +0100)]
nfs: remove timespec from xdr_encode_nfstime
For NFSv2 and NFSv3, timestamps are stored using 32-bit entities
and overflow in y2038. For historic reasons we truncate the
64-bit timestamps by converting from a timespec64 to a timespec
first.
Remove this unnecessary conversion step and do the truncation
in the final functions that take a timestamp.
This is transparent to users, but avoids one of the last uses
of 'timespec' and lets us remove it later.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Arnd Bergmann [Mon, 11 Nov 2019 20:16:25 +0000 (21:16 +0100)]
nfs: fscache: use timespec64 in inode auxdata
nfs currently behaves differently on 32-bit and 64-bit kernels regarding
the on-disk format of nfs_fscache_inode_auxdata.
That format should really be the same on any kernel, and we should avoid
the 'timespec' type in order to remove that from the kernel later on.
Using plain 'timespec64' would not be good here, since that includes
implied padding and would possibly leak kernel stack data to the on-disk
format on 32-bit architectures.
struct __kernel_timespec would work as a replacement, but open-coding
the two struct members in nfs_fscache_inode_auxdata makes it more
obvious what's going on here, and keeps the current format for 64-bit
architectures.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Arnd Bergmann [Mon, 11 Nov 2019 20:16:23 +0000 (21:16 +0100)]
nfs: use timespec64 in nfs_fattr
Push down the use of timespec64 into NFS nfs_fattr, to avoid needless
conversions, and get closer to having 64-bit time_t support on 32-bit
NFSv4 and removing some old interfaces from the kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Arnd Bergmann [Mon, 11 Nov 2019 20:16:21 +0000 (21:16 +0100)]
sunrpc: convert to time64_t for expiry
Using signed 32-bit types for UTC time leads to the y2038 overflow,
which is what happens in the sunrpc code at the moment.
This changes the sunrpc code over to use time64_t where possible.
The one exception is the gss_import_v{1,2}_context() function for
kerberos5, which uses 32-bit timestamps in the protocol. Here,
we can at least treat the numbers as 'unsigned', which extends the
range from 2038 to 2106.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Scott Mayhew [Tue, 10 Dec 2019 12:31:15 +0000 (07:31 -0500)]
NFS: Attach supplementary error information to fs_context.
Split out from commit "NFS: Add fs_context support."
Add wrappers nfs_errorf(), nfs_invalf(), and nfs_warnf() which log error
information to the fs_context. Convert some printk's to use these new
wrappers instead.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Scott Mayhew [Tue, 10 Dec 2019 12:31:14 +0000 (07:31 -0500)]
NFS: Additional refactoring for fs_context conversion
Split out from commit "NFS: Add fs_context support."
This patch adds additional refactoring for the conversion of NFS to use
fs_context, namely:
(*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
nfs_clone_mount has had several fields removed, and nfs_mount_info
has been removed altogether.
(*) Various functions now take an fs_context as an argument instead
of nfs_mount_info, nfs_fs_context, etc.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:13 +0000 (07:31 -0500)]
NFS: Add fs_context support.
Add filesystem context support to NFS, parsing the options in advance and
attaching the information to struct nfs_fs_context. The highlights are:
(*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context. This
structure represents NFS's superblock config.
(*) Make use of the VFS's parsing support to split comma-separated lists
(*) Pin the NFS protocol module in the nfs_fs_context.
(*) Attach supplementary error information to fs_context. This has the
downside that these strings must be static and can't be formatted.
(*) Remove the auxiliary file_system_type structs since the information
necessary can be conveyed in the nfs_fs_context struct instead.
(*) Root mounts are made by duplicating the config for the requested mount
so as to have the same parameters. Submounts pick up their parameters
from the parent superblock.
[AV -- retrans is u32, not string]
[SM -- Renamed cfg to ctx in a few functions in an earlier patch]
[SM -- Moved fs_context mount option parsing to an earlier patch]
[SM -- Moved fs_context error logging to a later patch]
[SM -- Fixed printks in nfs4_try_get_tree() and nfs4_get_referral_tree()]
[SM -- Added is_remount_fc() helper]
[SM -- Deferred some refactoring to a later patch]
[SM -- Fixed referral mounts, which were broken in the original patch]
[SM -- Fixed leak of nfs_fattr when fs_context is freed]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Scott Mayhew [Tue, 10 Dec 2019 12:31:12 +0000 (07:31 -0500)]
NFS: Convert mount option parsing to use functionality from fs_parser.h
Split out from commit "NFS: Add fs_context support."
Convert existing mount option definitions to fs_parameter_enum's and
fs_parameter_spec's. Parse mount options using fs_parse() and
lookup_constant().
Notes:
1) Fixed a typo in the udp6 definition in nfs_xprt_protocol_tokens
from the original commit.
2) fs_parse() expects an fs_context as the first arg so that any
errors can be logged to the fs_context. We're passing NULL for the
fs_context (this will change in commit "NFS: Add fs_context support.")
which is okay as it will cause logfc() to do a printk() instead.
3) fs_parse() expects an fs_paramter as the third arg. We're
building an fs_parameter manually in nfs_fs_context_parse_option(),
which will go away in commit "NFS: Add fs_context support.".
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Scott Mayhew [Tue, 10 Dec 2019 12:31:11 +0000 (07:31 -0500)]
NFS: rename nfs_fs_context pointer arg in a few functions
Split out from commit "NFS: Add fs_context support."
Rename cfg to ctx in nfs_init_server(), nfs_verify_authflavors(),
and nfs_request_mount(). No functional changes.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:10 +0000 (07:31 -0500)]
NFS: Do some tidying of the parsing code
Do some tidying of the parsing code, including:
(*) Returning 0/error rather than true/false.
(*) Putting the nfs_fs_context pointer first in some arg lists.
(*) Unwrap some lines that will now fit on one line.
(*) Provide unioned sockaddr/sockaddr_storage fields to avoid casts.
(*) nfs_parse_devname() can paste its return values directly into the
nfs_fs_context struct as that's where the caller puts them.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:09 +0000 (07:31 -0500)]
NFS: Add a small buffer in nfs_fs_context to avoid string dup
Add a small buffer in nfs_fs_context to avoid string duplication when
parsing numbers. Also make the parsing function wrapper place the parsed
integer directly in the appropriate nfs_fs_context struct member.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:08 +0000 (07:31 -0500)]
NFS: Deindent nfs_fs_context_parse_option()
Deindent nfs_fs_context_parse_option().
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:07 +0000 (07:31 -0500)]
NFS: Split nfs_parse_mount_options()
Split nfs_parse_mount_options() to move the prologue, list-splitting and
epilogue into one function and the per-option processing into another.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:06 +0000 (07:31 -0500)]
NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context
Rename struct nfs_parsed_mount_data to struct nfs_fs_context and rename
pointers to it to "ctx". At some point this will be pointed to by an
fs_context struct's fs_private pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:05 +0000 (07:31 -0500)]
NFS: Constify mount argument match tables
The mount argument match tables should never be altered so constify them.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
David Howells [Tue, 10 Dec 2019 12:31:04 +0000 (07:31 -0500)]
NFS: Move mount parameterisation bits into their own file
Split various bits relating to mount parameterisation out from
fs/nfs/super.c into their own file to form the basis of filesystem context
handling for NFS.
No other changes are made to the code beyond removing 'static' qualifiers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:31:03 +0000 (07:31 -0500)]
nfs: get rid of ->set_security()
it's always either nfs_set_sb_security() or nfs_clone_sb_security(),
the choice being controlled by mount_info->cloned != NULL. No need
to add methods, especially when both instances live right next to
the caller and are never accessed anywhere else.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:31:02 +0000 (07:31 -0500)]
nfs_clone_sb_security(): simplify the check for server bogosity
We used to check ->i_op for being nfs_dir_inode_operations. With
separate inode_operations for v3 and v4 that became bogus, but
rather than going for protocol-dependent comparison we could've
just checked ->i_fop instead; _that_ is the same for all protocol
versions.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:31:01 +0000 (07:31 -0500)]
nfs: get rid of mount_info ->fill_super()
The only possible values are nfs_fill_super and nfs_clone_super. The
latter is used only when crossing into a submount and it is almost
identical to the former; the only differences are
* ->s_time_gran unconditionally set to 1 (even for v2 mounts).
Regression dating back to 2012, actually.
* ->s_blocksize/->s_blocksize_bits set to that of parent.
Rather than messing with the method, stash ->s_blocksize_bits in
mount_info in submount case and after the (now unconditional)
call of nfs_fill_super() override ->s_blocksize/->s_blocksize_bits
if that has been set.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:31:00 +0000 (07:31 -0500)]
nfs: don't pass nfs_subversion to ->create_server()
pick it from mount_info
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:59 +0000 (07:30 -0500)]
nfs: unexport nfs_fs_mount_common()
Make it static, even. And remove a stale extern of (long-gone)
nfs_xdev_mount_common() from internal.h, while we are at it.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:58 +0000 (07:30 -0500)]
nfs: merge xdev and remote file_system_type
they are identical now...
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:57 +0000 (07:30 -0500)]
nfs: don't bother passing nfs_subversion to ->try_mount() and nfs_fs_mount_common()
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:56 +0000 (07:30 -0500)]
nfs: stash nfs_subversion reference into nfs_mount_info
That will allow to get rid of passing those references around in
quite a few places. Moreover, that will allow to merge xdev and
remote file_system_type.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:55 +0000 (07:30 -0500)]
nfs: lift setting mount_info from nfs_xdev_mount()
Do it in nfs_do_submount() instead. As a side benefit, nfs_clone_data
doesn't need ->fh and ->fattr anymore.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:54 +0000 (07:30 -0500)]
nfs4: fold nfs_do_root_mount/nfs_follow_remote_path
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:53 +0000 (07:30 -0500)]
nfs: don't bother setting/restoring export_path around do_nfs_root_mount()
nothing in it will be looking at that thing anyway
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:52 +0000 (07:30 -0500)]
nfs: fold nfs4_remote_fs_type and nfs4_remote_referral_fs_type
They are identical now.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:51 +0000 (07:30 -0500)]
nfs: lift setting mount_info from nfs4_remote{,_referral}_mount
Do that (fhandle allocation, setting struct server up) in
nfs4_referral_mount() and nfs4_try_mount() resp. and pass the
server and pointer to mount_info into nfs_do_root_mount() so that
nfs4_remote_referral_mount()/nfs_remote_mount() could be merged.
Since we are moving stuff from ->mount() instances to the points
prior to vfs_kern_mount() that would trigger those, we need to
make sure that do_nfs_root_mount() will do the corresponding
cleanup itself if it doesn't trigger those ->mount() instances.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:50 +0000 (07:30 -0500)]
nfs: stash server into struct nfs_mount_info
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Al Viro [Tue, 10 Dec 2019 12:30:49 +0000 (07:30 -0500)]
saner calling conventions for nfs_fs_mount_common()
Allow it to take ERR_PTR() for server and return ERR_CAST() of it in
such case. All callers used to open-code that...
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Tue, 14 Jan 2020 21:33:14 +0000 (13:33 -0800)]
Merge tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"Three NFS over RDMA fixes for bugs Chuck found that can be hit during
device removal:
- Fix create_qp crash on device unload
- Fix completion wait during device removal
- Fix oops in receive handler after device removal"
* tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
xprtrdma: Fix oops in Receive handler after device removal
xprtrdma: Fix completion wait during device removal
xprtrdma: Fix create_qp crash on device unload
Chuck Lever [Fri, 3 Jan 2020 16:52:22 +0000 (11:52 -0500)]
xprtrdma: Fix oops in Receive handler after device removal
Since v5.4, a device removal occasionally triggered this oops:
Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address:
0000000c00000219
Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
Dec 2 17:13:53 manet kernel: PGD 0 P4D 0
Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W
5.4.0-00050-g53717e43af61 #883
Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
Dec 2 17:13:53 manet kernel: RSP: 0018:
ffffc900035dfe00 EFLAGS:
00010246
Dec 2 17:13:53 manet kernel: RAX:
ffff888467290000 RBX:
ffff88846c638400 RCX:
0000000000000048
Dec 2 17:13:53 manet kernel: RDX:
0000000000000048 RSI:
00000000f942e000 RDI:
0000000c00000001
Dec 2 17:13:53 manet kernel: RBP:
ffff888467611b00 R08:
ffff888464e4a3c4 R09:
0000000000000000
Dec 2 17:13:53 manet kernel: R10:
ffffc900035dfc88 R11:
fefefefefefefeff R12:
ffff888865af4428
Dec 2 17:13:53 manet kernel: R13:
ffff888466023000 R14:
ffff88846c63f000 R15:
0000000000000010
Dec 2 17:13:53 manet kernel: FS:
0000000000000000(0000) GS:
ffff88846fa80000(0000) knlGS:
0000000000000000
Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Dec 2 17:13:53 manet kernel: CR2:
0000000c00000219 CR3:
0000000002009002 CR4:
00000000001606e0
Dec 2 17:13:53 manet kernel: Call Trace:
Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9
Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30
The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
is still pointing to the old ib_device, which has been freed. The
only way that is possible is if this rpcrdma_rep was not destroyed
by rpcrdma_ia_remove.
Debugging showed that was indeed the case: this rpcrdma_rep was
still in use by a completing RPC at the time of the device removal,
and thus wasn't on the rep free list. So, it was not found by
rpcrdma_reps_destroy().
The fix is to introduce a list of all rpcrdma_reps so that they all
can be found when a device is removed. That list is used to perform
only regbuf DMA unmapping, replacing that call to
rpcrdma_reps_destroy().
Meanwhile, to prevent corruption of this list, I've moved the
destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
protecting the rb_all_reps list.
Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:52:17 +0000 (11:52 -0500)]
xprtrdma: Fix completion wait during device removal
I've found that on occasion, "rmmod <dev>" will hang while if an NFS
is under load.
Ensure that ri_remove_done is initialized only just before the
transport is woken up to force a close. This avoids the completion
possibly getting initialized again while the CM event handler is
waiting for a wake-up.
Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 3 Jan 2020 16:52:12 +0000 (11:52 -0500)]
xprtrdma: Fix create_qp crash on device unload
On device re-insertion, the RDMA device driver crashes trying to set
up a new QP:
Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address:
00000000000001c0
Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
Nov 27 16:32:06 manet kernel: RSP: 0018:
ffffc900035abbf0 EFLAGS:
00010046
Nov 27 16:32:06 manet kernel: RAX:
0000000000000000 RBX:
00000000000001c0 RCX:
0000000000000000
Nov 27 16:32:06 manet kernel: RDX:
0000000000000001 RSI:
ffffc900035abbfc RDI:
00000000000001c0
Nov 27 16:32:06 manet kernel: RBP:
ffffc900035abde0 R08:
000000000000000e R09:
ffffffffffffc000
Nov 27 16:32:06 manet kernel: R10:
0000000000000000 R11:
000000000002e800 R12:
ffff88886169d9f8
Nov 27 16:32:06 manet kernel: R13:
ffff88886169d9f4 R14:
0000000000000246 R15:
0000000000000000
Nov 27 16:32:06 manet kernel: FS:
0000000000000000(0000) GS:
ffff88846fa40000(0000) knlGS:
0000000000000000
Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Nov 27 16:32:06 manet kernel: CR2:
00000000000001c0 CR3:
0000000002009006 CR4:
00000000001606e0
Nov 27 16:32:06 manet kernel: Call Trace:
Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]
The fix is to copy the qp_init_attr struct that was just created by
rpcrdma_ep_create() instead of using the one from the previous
connection instance.
Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Tue, 14 Jan 2020 18:22:10 +0000 (10:22 -0800)]
Merge branch 'parisc-5.5-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"A boot crash fix by Mike Rapoport and a printk fix by Krzysztof
Kozlowski"
* 'parisc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix map_pages() to actually populate upper directory
parisc: Use proper printk format for resource_size_t
Linus Torvalds [Tue, 14 Jan 2020 18:17:15 +0000 (10:17 -0800)]
Merge tag 'asm-generic-5.5' of git://git./linux/kernel/git/arnd/playground
Pull asm-generic fixes from Arnd Bergmann:
"Here are two bugfixes from Mike Rapoport, both fixing compile-time
errors for the nds32 architecture that were recently introduced"
* tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
nds32: fix build failure caused by page table folding updates
asm-generic/nds32: don't redefine cacheflush primitives
Linus Torvalds [Tue, 14 Jan 2020 18:14:06 +0000 (10:14 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two simple fixes in the upper drivers (so both fairly core), one in
enclosures, which fixes replugging a device into an enclosure slot and
one in the disk driver which fixes revalidating a drive with
protection information (PI) to make it a non-PI drive ... previously
we were still remembering the old PI state.
Both fixed issues are quite rare in the field"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: enclosure: Fix stale device oops with hot replug
scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI