platform/kernel/linux-rpi.git
2 years agoNFS: Remove the nfs4_label argument from nfs_setsecurity
Anna Schumaker [Fri, 22 Oct 2021 17:11:12 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label argument from nfs_setsecurity

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label argument from nfs_fhget()
Anna Schumaker [Fri, 22 Oct 2021 17:11:11 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label argument from nfs_fhget()

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label argument from nfs_add_or_obtain()
Anna Schumaker [Fri, 22 Oct 2021 17:11:10 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label argument from nfs_add_or_obtain()

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label argument from nfs_instantiate()
Anna Schumaker [Fri, 22 Oct 2021 17:11:09 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label argument from nfs_instantiate()

Pull the label from the fattr instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label from the nfs_setattrres
Anna Schumaker [Fri, 22 Oct 2021 17:11:08 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label from the nfs_setattrres

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label from the nfs4_getattr_res
Anna Schumaker [Fri, 22 Oct 2021 17:11:07 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label from the nfs4_getattr_res

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the f_label from the nfs4_opendata and nfs_openres
Anna Schumaker [Fri, 22 Oct 2021 17:11:06 +0000 (13:11 -0400)]
NFS: Remove the f_label from the nfs4_opendata and nfs_openres

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label from the nfs4_lookupp_res struct
Anna Schumaker [Fri, 22 Oct 2021 17:11:05 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label from the nfs4_lookupp_res struct

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the label from the nfs4_lookup_res struct
Anna Schumaker [Fri, 22 Oct 2021 17:11:04 +0000 (13:11 -0400)]
NFS: Remove the label from the nfs4_lookup_res struct

And use the fattr's label field instead. I also adjust function calls to
remove labels along the way.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label from the nfs4_link_res struct
Anna Schumaker [Fri, 22 Oct 2021 17:11:03 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label from the nfs4_link_res struct

Again, use the fattr's label field instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label from the nfs4_create_res struct
Anna Schumaker [Fri, 22 Oct 2021 17:11:02 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label from the nfs4_create_res struct

Instead, use the label embedded in the attached fattr.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove the nfs4_label from the nfs_entry struct
Anna Schumaker [Fri, 22 Oct 2021 17:11:01 +0000 (13:11 -0400)]
NFS: Remove the nfs4_label from the nfs_entry struct

And instead allocate the fattr using nfs_alloc_fattr_with_label()

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Create a new nfs_alloc_fattr_with_label() function
Anna Schumaker [Fri, 22 Oct 2021 17:11:00 +0000 (13:11 -0400)]
NFS: Create a new nfs_alloc_fattr_with_label() function

For creating fattrs with the label field already allocated for us. I
also update nfs_free_fattr() to free the label in the end.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
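
As an illustration only, the allocation pattern described in the entry above can be sketched in userspace C. The struct layouts and the names demo_fattr_alloc(), demo_fattr_alloc_with_label() and demo_fattr_free() are hypothetical stand-ins, not the kernel's definitions. The sketch shows two things: a plain allocation always leaves the label pointer initialised to NULL, and the free routine releases the label together with the fattr.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_label {
	char *data;
	size_t len;
};

struct demo_fattr {
	unsigned long valid;
	struct demo_label *label;	/* NULL when no label was requested */
};

/* Allocate a zeroed fattr; the label pointer is always initialised. */
struct demo_fattr *demo_fattr_alloc(void)
{
	return calloc(1, sizeof(struct demo_fattr));
}

/* Allocate a fattr with its label already attached. */
struct demo_fattr *demo_fattr_alloc_with_label(void)
{
	struct demo_fattr *f = demo_fattr_alloc();

	if (!f)
		return NULL;
	f->label = calloc(1, sizeof(*f->label));
	if (!f->label) {
		free(f);
		return NULL;
	}
	return f;
}

/* Free the fattr and, if present, its label. Safe to call with NULL. */
void demo_fattr_free(struct demo_fattr *f)
{
	if (!f)
		return;
	if (f->label)
		free(f->label->data);
	free(f->label);
	free(f);
}
```

Because the free routine checks the label pointer, callers no longer need to carry a separate label argument around just to release it later.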
2 years agoNFS: Always initialise fattr->label in nfs_fattr_alloc()
Trond Myklebust [Thu, 4 Nov 2021 22:03:26 +0000 (18:03 -0400)]
NFS: Always initialise fattr->label in nfs_fattr_alloc()

We're about to add a check in nfs_free_fattr() for whether or not the
label is non-zero.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4.2: alloc_file_pseudo() takes an open flag, not an f_mode
Trond Myklebust [Fri, 5 Nov 2021 18:32:28 +0000 (14:32 -0400)]
NFSv4.2: alloc_file_pseudo() takes an open flag, not an f_mode

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open()
Trond Myklebust [Fri, 5 Nov 2021 18:23:30 +0000 (14:23 -0400)]
NFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open()

The preferred behaviour is always to allocate struct nfs_fattr from the
slab.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4: Remove unnecessary 'minor version' check
Trond Myklebust [Fri, 5 Nov 2021 17:40:11 +0000 (13:40 -0400)]
NFSv4: Remove unnecessary 'minor version' check

It is completely redundant to the server capability check.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4: Fix potential Oops in decode_op_map()
Trond Myklebust [Thu, 4 Nov 2021 21:33:36 +0000 (17:33 -0400)]
NFSv4: Fix potential Oops in decode_op_map()

The return value of xdr_inline_decode() is not being checked, leading to
a potential Oops. Just replace the open coded array decode with the
generic XDR version.

Reported-by: <rtm@csail.mit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
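
The idea behind the fix above is that every read from a received buffer must be bounds-checked and its failure reported. A minimal userspace sketch follows. struct demo_cursor, demo_decode_u32() and demo_decode_u32_array() are hypothetical names for illustration and are not the kernel's XDR API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read cursor over a received buffer. */
struct demo_cursor {
	const uint8_t *p;
	const uint8_t *end;
};

/* Decode one big-endian 32-bit word; returns -1 if the buffer is short. */
int demo_decode_u32(struct demo_cursor *c, uint32_t *out)
{
	if ((size_t)(c->end - c->p) < 4)
		return -1;
	*out = ((uint32_t)c->p[0] << 24) | ((uint32_t)c->p[1] << 16) |
	       ((uint32_t)c->p[2] << 8) | (uint32_t)c->p[3];
	c->p += 4;
	return 0;
}

/*
 * Decode a counted array of 32-bit words into 'array', which holds at
 * most 'max' entries. Returns the element count, or -1 on a short
 * buffer or an oversized count.
 */
int demo_decode_u32_array(struct demo_cursor *c, uint32_t *array, uint32_t max)
{
	uint32_t count, i;

	if (demo_decode_u32(c, &count) < 0 || count > max)
		return -1;
	for (i = 0; i < count; i++)
		if (demo_decode_u32(c, &array[i]) < 0)
			return -1;
	return (int)count;
}
```

A truncated reply then produces an error return instead of a dereference of an invalid pointer.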
2 years agoNFSv4: Ensure decode_compound_hdr() sanity checks the tag
Trond Myklebust [Thu, 4 Nov 2021 21:18:01 +0000 (17:18 -0400)]
NFSv4: Ensure decode_compound_hdr() sanity checks the tag

The server is supposed to return the same tag that the client sends in
the outgoing RPC call, but we should still sanity check the length just
in case.

Reported-by: <rtm@csail.mit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Don't trace an uninitialised value
Trond Myklebust [Fri, 5 Nov 2021 16:35:26 +0000 (12:35 -0400)]
NFS: Don't trace an uninitialised value

If fhandle is NULL or fattr is NULL, then 'error' is uninitialised.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2 years agoSUNRPC: Prevent immediate close+reconnect
Trond Myklebust [Tue, 26 Oct 2021 22:01:07 +0000 (18:01 -0400)]
SUNRPC: Prevent immediate close+reconnect

If we have already set up the socket and are waiting for it to connect,
then don't immediately close and retry.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Fix races when closing the socket
Trond Myklebust [Fri, 29 Oct 2021 16:26:17 +0000 (12:26 -0400)]
SUNRPC: Fix races when closing the socket

Ensure that we bump the xprt->connect_cookie when we set the
XPRT_CLOSE_WAIT flag so that another call to
xprt_conditional_disconnect() won't race with the reconnection.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
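
The cookie check described above can be pictured with a small single-threaded sketch. The struct and the demo_ functions are hypothetical simplifications, not the SUNRPC code, and the real implementation also involves locking that is omitted here. The sketch shows that bumping the cookie when a close is requested makes a second caller holding the old cookie a no-op.

```c
#include <stdbool.h>

/* Hypothetical, simplified transport state. */
struct demo_xprt {
	unsigned int connect_cookie;	/* identifies the current connection */
	bool close_wait;		/* a close has been requested */
	bool connected;
};

/* Request a close and invalidate cookies held by earlier callers. */
void demo_mark_close_wait(struct demo_xprt *x)
{
	x->close_wait = true;
	x->connect_cookie++;
}

/*
 * Disconnect only if the caller's cookie still names the current
 * connection. Returns true if a close was requested by this call.
 */
bool demo_conditional_disconnect(struct demo_xprt *x, unsigned int cookie)
{
	if (cookie != x->connect_cookie || !x->connected)
		return false;
	demo_mark_close_wait(x);
	return true;
}
```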
2 years agoNFSv4.2 add tracepoint to OFFLOAD_CANCEL
Olga Kornievskaia [Thu, 4 Nov 2021 14:57:14 +0000 (10:57 -0400)]
NFSv4.2 add tracepoint to OFFLOAD_CANCEL

Add tracepoint to OFFLOAD_CANCEL operation.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4.2 add tracepoint to COPY_NOTIFY
Olga Kornievskaia [Thu, 4 Nov 2021 14:57:13 +0000 (10:57 -0400)]
NFSv4.2 add tracepoint to COPY_NOTIFY

Add a tracepoint to COPY_NOTIFY operation.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4.2 add tracepoint to CB_OFFLOAD
Olga Kornievskaia [Thu, 4 Nov 2021 14:57:12 +0000 (10:57 -0400)]
NFSv4.2 add tracepoint to CB_OFFLOAD

Add a tracepoint to the CB_OFFLOAD operation.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4.2 add tracepoint to CLONE
Olga Kornievskaia [Thu, 4 Nov 2021 14:57:11 +0000 (10:57 -0400)]
NFSv4.2 add tracepoint to CLONE

Add a tracepoint to the CLONE operation.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4.2 add tracepoint to COPY
Olga Kornievskaia [Thu, 4 Nov 2021 14:57:10 +0000 (10:57 -0400)]
NFSv4.2 add tracepoint to COPY

Add a tracepoint to the COPY operation.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4.2 add tracepoints to FALLOCATE and DEALLOCATE
Olga Kornievskaia [Thu, 4 Nov 2021 14:57:09 +0000 (10:57 -0400)]
NFSv4.2 add tracepoints to FALLOCATE and DEALLOCATE

Add a tracepoint to the FALLOCATE/DEALLOCATE operations.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4.2 add tracepoint to SEEK
Olga Kornievskaia [Thu, 4 Nov 2021 14:57:08 +0000 (10:57 -0400)]
NFSv4.2 add tracepoint to SEEK

Add a tracepoint to the SEEK operation.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Check if the xprt is connected before handling sysfs reads
Anna Schumaker [Thu, 28 Oct 2021 19:17:41 +0000 (15:17 -0400)]
SUNRPC: Check if the xprt is connected before handling sysfs reads

xprts don't immediately reconnect when the "dstaddr" property changes;
instead, this gets handled the next time an operation uses the transport.
This could lead to NULL pointer dereferences when trying to read sysfs
files between the disconnect and reconnect operations. Fix this by
returning an error if the xprt is not connected.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agonfs: remove unused header <linux/pnfs_osd_xdr.h>
Jonathan Corbet [Tue, 2 Nov 2021 22:01:56 +0000 (16:01 -0600)]
nfs: remove unused header <linux/pnfs_osd_xdr.h>

Commit 19fcae3d4f2dd ("scsi: remove the SCSI OSD library") deleted the last
file that included <linux/pnfs_osd_xdr.h> but left that file behind.  It's
unused, so get rid of it now.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agonfs4: take a reference on the nfs_client when running FREE_STATEID
Scott Mayhew [Wed, 3 Nov 2021 10:24:40 +0000 (06:24 -0400)]
nfs4: take a reference on the nfs_client when running FREE_STATEID

During umount, the session slot tables are freed.  If there are
outstanding FREE_STATEID tasks, a use-after-free and slab corruption can
occur when rpc_exit_task calls rpc_call_done -> nfs41_sequence_done ->
nfs4_sequence_process/nfs41_sequence_free_slot.

Prevent that from happening by taking a reference on the nfs_client in
nfs41_free_stateid and putting it in nfs41_free_stateid_release.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
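
The reference-holding pattern used by the fix above can be sketched as follows. This is a single-threaded illustration with hypothetical names (demo_client, demo_task and the demo_ helpers). The counter here is a plain int, whereas real kernel reference counts are atomic.

```c
#include <stdlib.h>

/* Hypothetical client object that owns a session slot table. */
struct demo_client {
	int refcount;		/* not atomic: single-threaded sketch only */
	int *slot_table;
};

/* Hypothetical asynchronous task that touches the slot table when it ends. */
struct demo_task {
	struct demo_client *client;
};

/* Take a reference so the client outlives the caller's use of it. */
struct demo_client *demo_client_get(struct demo_client *c)
{
	c->refcount++;
	return c;
}

/* Drop a reference; frees the client and returns 1 on the last put. */
int demo_client_put(struct demo_client *c)
{
	if (--c->refcount > 0)
		return 0;
	free(c->slot_table);
	free(c);
	return 1;
}

/* Starting the task pins the client. */
void demo_task_start(struct demo_task *t, struct demo_client *c)
{
	t->client = demo_client_get(c);
}

/* The release callback is the only place the pin is dropped. */
int demo_task_release(struct demo_task *t)
{
	return demo_client_put(t->client);
}
```

With the pin in place, an unmount that drops its own reference while the task is outstanding does not free the slot table; the final free happens in the task's release step.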
2 years agoNFS: Add offset to nfs_aop_readahead tracepoint
Dave Wysochanski [Tue, 2 Nov 2021 19:51:55 +0000 (15:51 -0400)]
NFS: Add offset to nfs_aop_readahead tracepoint

Add the byte offset of the readahead request to the tracepoint output
so we know where the read starts.

Before this patch:
cat-8104    [002] .....   813.168775: nfs_aop_readahead: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=256
cat-8104    [002] .....   813.174973: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=256 ret=0
cat-8104    [002] .....   813.175963: nfs_aop_readahead: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=256
cat-8104    [002] .....   813.183742: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=1 ret=0

After this patch:
cat-6392    [001] .....    73.107782: nfs_aop_readahead: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 offset=5242880 nr_pages=256
cat-6392    [001] .....    73.112466: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 nr_pages=256 ret=0
cat-6392    [001] .....    73.115692: nfs_aop_readahead: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 offset=6291456 nr_pages=256
cat-6392    [001] .....    73.123283: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 nr_pages=256 ret=0

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
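
The two offsets in the "after" trace are consistent with 4096-byte pages: 256 pages of 4096 bytes is 1048576 bytes, which is exactly the distance between offset=5242880 and offset=6291456. The page size is an assumption here (it is not stated in the commit), and the helper names below are hypothetical. The arithmetic can be sketched as:

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Byte offset of the page with the given index. */
uint64_t demo_page_offset(uint64_t index)
{
	return index << DEMO_PAGE_SHIFT;
}

/* Offset at which the next sequential readahead window would start. */
uint64_t demo_next_readahead_offset(uint64_t offset, unsigned int nr_pages)
{
	return offset + ((uint64_t)nr_pages << DEMO_PAGE_SHIFT);
}
```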
2 years agoxprtrdma: Fix a maybe-uninitialized compiler warning
Benjamin Coddington [Tue, 2 Nov 2021 18:48:59 +0000 (14:48 -0400)]
xprtrdma: Fix a maybe-uninitialized compiler warning

This minor fix-up keeps GCC from complaining that "'last' may be used
uninitialized", which breaks some build workflows that have been running
with all warnings treated as errors.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Move NFS protocol display macros to global header
Chuck Lever [Fri, 22 Oct 2021 20:17:03 +0000 (16:17 -0400)]
NFS: Move NFS protocol display macros to global header

Refactor: surface useful show_ macros so they can be shared between
the client and server trace code.

Additional clean up:
- Housekeeping: ensure the correct #include files are pulled in
  and add proper TRACE_DEFINE_ENUM where they are missing
- Use a consistent naming scheme for the helpers
- Store values to be displayed symbolically as unsigned long, as
  that is the type that the __print_yada() functions take

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Move generic FS show macros to global header
Chuck Lever [Fri, 22 Oct 2021 20:16:56 +0000 (16:16 -0400)]
NFS: Move generic FS show macros to global header

Refactor: Surface useful show_ macros for use by other trace
subsystems.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Clean up xs_tcp_setup_sock()
Trond Myklebust [Fri, 29 Oct 2021 16:05:48 +0000 (12:05 -0400)]
SUNRPC: Clean up xs_tcp_setup_sock()

Move the error handling into a single switch statement for clarity.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Replace use of socket sk_callback_lock with sock_lock
Trond Myklebust [Fri, 29 Oct 2021 15:02:20 +0000 (11:02 -0400)]
SUNRPC: Replace use of socket sk_callback_lock with sock_lock

Since we do things like setting flags, etc., it really is more appropriate
to use sock_lock().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4: Fix a regression in nfs_set_open_stateid_locked()
Trond Myklebust [Wed, 27 Oct 2021 01:56:40 +0000 (21:56 -0400)]
NFSv4: Fix a regression in nfs_set_open_stateid_locked()

If we already hold open state on the client, yet the server gives us a
completely different stateid to the one we already hold, then we
currently treat it as if it were an out-of-sequence update, and wait for
5 seconds for other updates to come in.
This commit fixes the behaviour so that we immediately start processing
of the new stateid, and then leave it to the call to
nfs4_test_and_free_stateid() to decide what to do with the old stateid.

Fixes: b4868b44c562 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove redundant call to __set_page_dirty_nobuffers
Trond Myklebust [Thu, 21 Oct 2021 21:11:37 +0000 (17:11 -0400)]
NFS: Remove redundant call to __set_page_dirty_nobuffers

Remove a redundant call in nfs_updatepage(). nfs_writepage_setup() will
have already called nfs_mark_request_dirty() on success.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agosunrpc: remove unnecessary test in rpc_task_set_client()
Thiago Rafael Becker [Wed, 20 Oct 2021 21:04:28 +0000 (18:04 -0300)]
sunrpc: remove unnecessary test in rpc_task_set_client()

In rpc_task_set_client(), testing for a NULL clnt is not necessary, as
clnt should always be a valid pointer to a rpc_client.

Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Unexport nfs_probe_fsinfo()
Anna Schumaker [Thu, 14 Oct 2021 17:55:08 +0000 (13:55 -0400)]
NFS: Unexport nfs_probe_fsinfo()

All the callers are now in client.c so we can remove the
EXPORT_SYMBOL_GPL() and make it static.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Call nfs_probe_server() during a fscontext-reconfigure event
Anna Schumaker [Thu, 14 Oct 2021 17:55:07 +0000 (13:55 -0400)]
NFS: Call nfs_probe_server() during a fscontext-reconfigure event

This lets us update the server's attributes when the user does a "mount
-o remount" on the filesystem.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Replace calls to nfs_probe_fsinfo() with nfs_probe_server()
Anna Schumaker [Thu, 14 Oct 2021 17:55:06 +0000 (13:55 -0400)]
NFS: Replace calls to nfs_probe_fsinfo() with nfs_probe_server()

Clean up. There are a few places where we want to probe the server, but
don't actually care about the fsinfo result. Change these to use
nfs_probe_server(), which handles the fattr allocation for us.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Move nfs_probe_destination() into the generic client
Anna Schumaker [Thu, 14 Oct 2021 17:55:05 +0000 (13:55 -0400)]
NFS: Move nfs_probe_destination() into the generic client

And rename it to nfs_probe_server(). I also change it to take the nfs_fh
as an argument so callers can choose what filehandle to probe.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Create an nfs4_server_set_init_caps() function
Anna Schumaker [Thu, 14 Oct 2021 17:55:04 +0000 (13:55 -0400)]
NFS: Create an nfs4_server_set_init_caps() function

And call it before doing an FSINFO probe to reset to the baseline
capabilities before probing.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove --> and <-- dprintk call sites
Chuck Lever [Sat, 16 Oct 2021 22:03:04 +0000 (18:03 -0400)]
NFS: Remove --> and <-- dprintk call sites

dprintk call sites that display no other information than the
function name can be replaced with use of the trace "function" or
"function_graph" plug-ins.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Trace calls to .rpc_call_done
Chuck Lever [Sat, 16 Oct 2021 22:02:57 +0000 (18:02 -0400)]
SUNRPC: Trace calls to .rpc_call_done

Introduce a single tracepoint that can replace simple dprintk call
sites in upper layer "rpc_call_done" callbacks. Example:

   kworker/u24:2-1254  [001]   771.026677: rpc_stats_latency:    task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555
   kworker/u24:2-1254  [001]   771.026677: rpc_task_call_done:   task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done
   kworker/u24:2-1254  [001]   771.026678: rpcb_setport:         task:00000001@00000002 status=0 port=20048

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Replace dprintk callsites in nfs_readpage(s)
Chuck Lever [Sat, 16 Oct 2021 22:02:51 +0000 (18:02 -0400)]
NFS: Replace dprintk callsites in nfs_readpage(s)

These new events report slightly different information for readpage
and readpages/readahead.

For readpage:
             fsx-1387  [006]   380.761896: nfs_aop_readpage:    fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072
             fsx-1387  [006]   380.761900: nfs_aop_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 ret=0

The index of a synchronous single-page read is reported.

For readpages:

             fsx-1387  [006]   380.760847: nfs_aop_readahead:   fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3
             fsx-1387  [006]   380.760853: nfs_aop_readahead_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 ret=0

The count of pages requested is reported. nfs_readpages does not
wait for the READ requests to complete.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Use BIT() macro in rpc_show_xprt_state()
Chuck Lever [Sat, 16 Oct 2021 22:02:38 +0000 (18:02 -0400)]
SUNRPC: Use BIT() macro in rpc_show_xprt_state()

Clean up: BIT() is preferred over open-coding the shift.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
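
For readers unfamiliar with the idiom, a userspace equivalent of a BIT()-style macro is sketched below. DEMO_BIT and the flag positions are hypothetical and are not the transport state bits used by the kernel.

```c
/* Userspace stand-in for a BIT()-style helper. */
#define DEMO_BIT(nr)	(1UL << (nr))

/* Hypothetical flag positions, for illustration only. */
#define DEMO_STATE_LOCKED	0
#define DEMO_STATE_CONNECTED	1
#define DEMO_STATE_CLOSING	3

/* Open-coded form: mask built with an explicit shift. */
unsigned long demo_mask_open_coded(void)
{
	return (1UL << DEMO_STATE_CONNECTED) | (1UL << DEMO_STATE_CLOSING);
}

/* Same mask, with the shift hidden behind the macro. */
unsigned long demo_mask_with_bit(void)
{
	return DEMO_BIT(DEMO_STATE_CONNECTED) | DEMO_BIT(DEMO_STATE_CLOSING);
}
```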
2 years agoSUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field
Chuck Lever [Sat, 16 Oct 2021 22:02:24 +0000 (18:02 -0400)]
SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field

For certain special cases, RPC-related tracepoints record a -1 as
the task ID or the client ID. It's ugly for a trace event to display
4 billion in these cases.

To help keep SUNRPC tracepoints consistent, create a macro that
defines the print format specifiers for tk_pid and cl_clid. At some
point in the future we might try tk_pid with a wider range of values
than 0..64K so this makes it easier to make that change.

RPC tracepoints now look like this:

<...>-1276  [009]   149.720358: rpc_clnt_new:         client=00000005 peer=[192.168.2.55]:20049 program=nfs server=klimt.ib

<...>-1342  [004]   149.921234: rpc_xdr_recvfrom:     task:0000001a@00000005 head=[0xff1242d9ab6dc01c,144] page=0 tail=[(nil),0] len=144
<...>-1342  [004]   149.921235: xprt_release_cong:    task:0000001a@00000005 snd_task:ffffffff cong=256 cwnd=16384
<...>-1342  [004]   149.921235: xprt_put_cong:        task:0000001a@00000005 snd_task:ffffffff cong=0 cwnd=16384

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
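
The effect of a shared, fixed-width format specifier can be reproduced in userspace. DEMO_TASK_FMT and demo_format_task() are hypothetical names; the format was chosen to match the "task:0000001a@00000005" output shown above and is not copied from the kernel macro.

```c
#include <stdio.h>

/* One shared specifier keeps every event's task/client field the same width. */
#define DEMO_TASK_FMT	"task:%08x@%08x"

/* Format a task ID and client ID; returns the snprintf() result. */
int demo_format_task(char *buf, size_t len, unsigned int tk_pid,
		     unsigned int cl_clid)
{
	return snprintf(buf, len, DEMO_TASK_FMT, tk_pid, cl_clid);
}
```

A special-case ID of -1 then prints as ffffffff, the same width as any other value, rather than as a ten-digit decimal number.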
2 years agoxprtrdma: Remove rpcrdma_ep::re_implicit_roundup
Chuck Lever [Tue, 5 Oct 2021 14:18:06 +0000 (10:18 -0400)]
xprtrdma: Remove rpcrdma_ep::re_implicit_roundup

Clean up: this field is no longer used.

xprt_rdma_pad_optimize is also no longer used, but is left in place
because it is part of the kernel/userspace API.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoxprtrdma: Provide a buffer to pad Write chunks of unaligned length
Chuck Lever [Tue, 5 Oct 2021 14:17:59 +0000 (10:17 -0400)]
xprtrdma: Provide a buffer to pad Write chunks of unaligned length

This is a buffer to be left persistently registered while a
connection is up. Connection tear-down will automatically DMA-unmap,
invalidate, and dereg the MR. A persistently registered buffer is
lower in cost to provide, and it can never be coalesced into the
RDMA segment that carries the data payload.

An RPC that provisions a Write chunk with a non-aligned length now
uses this MR rather than the tail buffer of the RPC's rq_rcv_buf.

Reviewed-By: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
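
XDR rounds variable-length data up to a multiple of four bytes, so a payload whose length is not a multiple of four needs one to three pad bytes. Assuming that this round-up is the padding the entry above refers to, the pad size can be computed as in this sketch. demo_xdr_pad_size() is a hypothetical helper name.

```c
#include <stddef.h>

/* Number of pad bytes needed to round 'len' up to a 4-byte boundary. */
size_t demo_xdr_pad_size(size_t len)
{
	return (4 - (len & 3)) & 3;
}
```

Since the result is never more than three bytes, a small buffer registered once per connection is enough to absorb the pad for every such Write chunk.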
2 years agoFix user namespace leak
Alexey Gladkov [Thu, 14 Oct 2021 16:02:30 +0000 (18:02 +0200)]
Fix user namespace leak

Fixes: 61ca2c4afd9d ("NFS: Only reference user namespace from nfs4idmap struct instead of cred")
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Save some space in the inode
Trond Myklebust [Tue, 28 Sep 2021 21:41:41 +0000 (17:41 -0400)]
NFS: Save some space in the inode

Save some space in the nfs_inode by setting up an anonymous union with
the fields that are peculiar to a specific type of filesystem object.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
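
The space saving from an anonymous union can be demonstrated with two toy layouts. The field names and sizes are hypothetical and much smaller than the real nfs_inode. Anonymous unions require C11 or a compiler extension.

```c
/* Flat layout: every object carries fields for every object type. */
struct demo_inode_flat {
	unsigned long flags;
	unsigned long dir_cookies[4];		/* used by directories only */
	unsigned long file_requests[4];		/* used by regular files only */
};

/* Union layout: type-specific fields share the same storage. */
struct demo_inode_union {
	unsigned long flags;
	union {
		struct {
			unsigned long cookies[4];
		} dir;
		struct {
			unsigned long requests[4];
		} file;
	};
};
```

The cost of sharing storage is that a field in the union is only meaningful for the object type that owns it, so code must check the type before reading such a field.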
2 years agoNFS: Fix WARN_ON due to unionization of nfs_inode.nrequests
Dave Wysochanski [Sun, 10 Oct 2021 22:23:13 +0000 (18:23 -0400)]
NFS: Fix WARN_ON due to unionization of nfs_inode.nrequests

Fixes the following WARN_ON:
WARNING: CPU: 2 PID: 18678 at fs/nfs/inode.c:123 nfs_clear_inode+0x3b/0x50 [nfs]
...
Call Trace:
  nfs4_evict_inode+0x57/0x70 [nfsv4]
  evict+0xd1/0x180
  dispose_list+0x48/0x60
  evict_inodes+0x156/0x190
  generic_shutdown_super+0x37/0x110
  nfs_kill_super+0x1d/0x40 [nfs]
  deactivate_locked_super+0x36/0xa0

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFSv4: Fixes for nfs4_inode_return_delegation()
Trond Myklebust [Sun, 10 Oct 2021 08:58:12 +0000 (10:58 +0200)]
NFSv4: Fixes for nfs4_inode_return_delegation()

We mustn't call nfs_wb_all() on anything other than a regular file.
Furthermore, we can exit early when we don't hold a delegation.

Reported-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Fix an Oops in pnfs_mark_request_commit()
Trond Myklebust [Tue, 5 Oct 2021 18:05:02 +0000 (14:05 -0400)]
NFS: Fix an Oops in pnfs_mark_request_commit()

Olga reports seeing the following Oops when doing O_DIRECT writes to a
pNFS flexfiles server:

Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 234186 Comm: kworker/u8:1 Not tainted 5.15.0-rc4+ #4
Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
Workqueue: nfsiod rpc_async_release [sunrpc]
RIP: 0010:nfs_mark_request_commit+0x12/0x30 [nfs]
Code: ff ff be 03 00 00 00 e8 ac 34 83 eb e9 29 ff ff
ff e8 22 bc d7 eb 66 90 0f 1f 44 00 00 48 85 f6 74 16 48 8b 42 10 48
8b 40 18 <48> 8b 40 18 48 85 c0 74 05 e9 70 fc 15 ec 48 89 d6 e9 68 ed
ff ff
RSP: 0018:ffffa82f0159fe00 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8f3393141880 RCX: 0000000000000000
RDX: ffffa82f0159fe08 RSI: ffff8f3381252500 RDI: ffff8f3393141880
RBP: ffff8f33ac317c00 R08: 0000000000000000 R09: ffff8f3487724cb0
R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000001
R13: ffff8f3485bccee0 R14: ffff8f33ac317c10 R15: ffff8f33ac317cd8
FS:  0000000000000000(0000) GS:ffff8f34fbc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 0000000122120006 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 nfs_direct_write_completion+0x13b/0x250 [nfs]
 rpc_free_task+0x39/0x60 [sunrpc]
 rpc_async_release+0x29/0x40 [sunrpc]
 process_one_work+0x1ce/0x370
 worker_thread+0x30/0x380
 ? process_one_work+0x370/0x370
 kthread+0x11a/0x140
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 9c455a8c1e14 ("NFS/pNFS: Clean up pNFS commit operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Fix up commit deadlocks
Trond Myklebust [Mon, 4 Oct 2021 19:37:42 +0000 (15:37 -0400)]
NFS: Fix up commit deadlocks

If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensure that O_DIRECT calls
nfs_commit_end().

Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Fix deadlocks in nfs_scan_commit_list()
Trond Myklebust [Mon, 4 Oct 2021 19:44:16 +0000 (15:44 -0400)]
NFS: Fix deadlocks in nfs_scan_commit_list()

Partially revert commit 2ce209c42c01 ("NFS: Wait for requests that are
locked on the commit list"), since it can lead to deadlocks between
commit requests and nfs_join_page_group().
For now we should assume that any locked requests on the commit list are
either about to be removed and committed by another task, or the writes
they describe are about to be retransmitted. In either case, we should
not need to worry.

Fixes: 2ce209c42c01 ("NFS: Wait for requests that are locked on the commit list")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Instrument i_size_write()
Chuck Lever [Mon, 4 Oct 2021 14:10:16 +0000 (10:10 -0400)]
NFS: Instrument i_size_write()

Generate a trace event whenever the NFS client modifies the size of
a file. These new events aid troubleshooting workloads that trigger
races around size updates.

There are four new trace points, all named nfs_size_something so
they are easy to grep for or enable as a group with a single glob.

Size updated on the server:

  kworker/u24:10-194   [010]   369.939174: nfs_size_update:      fileid=00:28:2 fhandle=0x36fbbe51 version=1752899344277980615 cursize=250471 newsize=172083

Server-side size update reported via NFSv3 WCC attributes:

             fsx-1387  [006]   380.760686: nfs_size_wcc:         fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 cursize=146792 newsize=171216

File has been truncated locally:

             fsx-1387  [007]   369.437421: nfs_size_truncate:    fileid=00:28:2 fhandle=0x36fbbe51 version=1752899231200117272 cursize=215244 newsize=0

File has been extended locally:

             fsx-1387  [007]   369.439213: nfs_size_grow:        fileid=00:28:2 fhandle=0x36fbbe51 version=1752899343704248410 cursize=258048 newsize=262144

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Per-rpc_clnt task PIDs
Chuck Lever [Mon, 4 Oct 2021 14:10:10 +0000 (10:10 -0400)]
SUNRPC: Per-rpc_clnt task PIDs

The current range of RPC task PIDs is 0..65535. That's not adequate
for distinguishing tasks across multiple rpc_clnts running high
throughput workloads.

To help relieve this situation and to reduce the bottleneck of
having a single atomic for assigning all RPC task PIDs, assign task
PIDs per rpc_clnt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
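
A per-client counter, as opposed to one global counter, can be sketched with C11 atomics. The structures and function names are hypothetical, and the starting value of 1 is an arbitrary choice for the example.

```c
#include <stdatomic.h>

/* Hypothetical client: each one owns its own task ID counter. */
struct demo_clnt {
	unsigned int cl_clid;
	atomic_uint next_pid;
};

/* Hypothetical task, identified by the pair (pid, clid). */
struct demo_task {
	unsigned int pid;
	unsigned int clid;
};

void demo_clnt_init(struct demo_clnt *c, unsigned int clid)
{
	c->cl_clid = clid;
	atomic_init(&c->next_pid, 1);
}

/* Assign the next ID from the owning client's counter. */
void demo_task_init(struct demo_task *t, struct demo_clnt *c)
{
	t->pid = atomic_fetch_add(&c->next_pid, 1);
	t->clid = c->cl_clid;
}
```

Tasks on different clients may now share a pid value, so a task is identified by the pair of task ID and client ID; clients also stop contending on a single shared counter.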
2 years agoNFS: Remove unnecessary TRACE_DEFINE_ENUM()s
Chuck Lever [Mon, 4 Oct 2021 14:09:57 +0000 (10:09 -0400)]
NFS: Remove unnecessary TRACE_DEFINE_ENUM()s

Clean up: TRACE_DEFINE_ENUM is unnecessary because the target
symbols are all C macros, not enums.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agopnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
Baptiste Lepers [Mon, 6 Sep 2021 01:59:24 +0000 (11:59 +1000)]
pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds

_nfs4_pnfs_v3/v4_ds_connect do
   some work
   smp_wmb
   ds->ds_clp = clp;

And nfs4_ff_layout_prepare_ds currently does
   smp_rmb
   if(ds->ds_clp)
      ...

This patch places the smp_rmb after the if. This ensures that following
reads only happen once nfs4_ff_layout_prepare_ds has checked that data
has been properly initialized.
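
The ordering described above can be sketched in userspace with C11 fences. This is an analogy under stated assumptions, not the flexfiles code: demo_ds, demo_client, demo_ds_connect and demo_prepare_ds are hypothetical names, a release fence plays the role of smp_wmb, and an acquire fence plays the role of smp_rmb.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_client { int ready_state; };

struct demo_ds {
	_Atomic(struct demo_client *) ds_clp;
};

/* Writer: finish initialisation, fence, then publish the pointer. */
static void demo_ds_connect(struct demo_ds *ds, struct demo_client *clp)
{
	assert(ds != NULL && clp != NULL);
	clp->ready_state = 1;				/* "some work" */
	atomic_thread_fence(memory_order_release);	/* smp_wmb analogue */
	atomic_store_explicit(&ds->ds_clp, clp, memory_order_relaxed);
}

/* Reader: check the pointer first, then fence before the dependent reads. */
static int demo_prepare_ds(struct demo_ds *ds)
{
	struct demo_client *clp =
		atomic_load_explicit(&ds->ds_clp, memory_order_relaxed);

	if (clp == NULL)
		return -1;
	atomic_thread_fence(memory_order_acquire);	/* smp_rmb analogue, after the if */
	return clp->ready_state;
}
```

A fence placed before the pointer check orders nothing useful, because the reads that need protecting are the ones that follow a successful check.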

Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Remove unnecessary page cache invalidations
Trond Myklebust [Sat, 2 Oct 2021 23:21:49 +0000 (19:21 -0400)]
NFS: Remove unnecessary page cache invalidations

Remove cache invalidations that are already covered by change attribute
updates.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Do not flush the readdir cache in nfs_dentry_iput()
Trond Myklebust [Sat, 2 Oct 2021 23:04:59 +0000 (19:04 -0400)]
NFS: Do not flush the readdir cache in nfs_dentry_iput()

The original premise in commit 83672d392f7b ("NFS: Fix directory caching
problem - with test case and patch.") was that readdirplus was caching
attribute information and replaying it later. This is no longer the
case.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Fix dentry verifier races
Trond Myklebust [Wed, 29 Sep 2021 12:12:53 +0000 (08:12 -0400)]
NFS: Fix dentry verifier races

If the directory changed while we were revalidating the dentry, then
don't update the dentry verifier. There is no value in setting the
verifier to an older value, and we could end up overwriting a more up to
date verifier from a parallel revalidation.
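
A minimal sketch of the rule, assuming the verifier is simply the directory's change attribute sampled when revalidation started. The types and the function demo_set_verifier are hypothetical and much simpler than the real dcache code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_dir    { uint64_t change_attr; };
struct demo_dentry { uint64_t verf; };

/*
 * Store 'seen' as the verifier only if the directory still has the change
 * attribute that was sampled before revalidation began.
 */
static bool demo_set_verifier(struct demo_dentry *d,
			      const struct demo_dir *dir, uint64_t seen)
{
	assert(d && dir);
	if (dir->change_attr != seen)
		return false;	/* directory changed: keep the existing verifier */
	d->verf = seen;
	return true;
}
```

If a parallel revalidation has already stored a newer value, the stale caller returns without touching it.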

Fixes: efeda80da38d ("NFSv4: Fix revalidation of dentries with delegations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2 years agoNFS: Further optimisations for 'ls -l'
Trond Myklebust [Tue, 28 Sep 2021 18:33:44 +0000 (14:33 -0400)]
NFS: Further optimisations for 'ls -l'

If a user is doing 'ls -l', we have a heuristic in GETATTR that tells
the readdir code to try to use READDIRPLUS in order to refresh the inode
attributes. In certain circumstances, we also try to invalidate the
remaining directory entries in order to ensure this refresh.

If there are multiple readers of the directory, we probably should avoid
invalidating the page cache, since the heuristic breaks down in that
situation anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2 years agoNFS: Fix up nfs_readdir_inode_mapping_valid()
Trond Myklebust [Tue, 28 Sep 2021 16:37:05 +0000 (12:37 -0400)]
NFS: Fix up nfs_readdir_inode_mapping_valid()

The check for duplicate readdir cookies should only care if the change
attribute is invalid or the data cache is invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2 years agoNFS: Ignore the directory size when marking for revalidation
Trond Myklebust [Tue, 28 Sep 2021 15:24:57 +0000 (11:24 -0400)]
NFS: Ignore the directory size when marking for revalidation

If we want to revalidate the directory, then just mark the change
attribute as invalid.

Fixes: 13c0b082b6a9 ("NFS: Replace use of NFS_INO_REVAL_PAGECACHE when checking cache validity")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2 years agoNFS: Don't set NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA
Trond Myklebust [Tue, 28 Sep 2021 15:15:53 +0000 (11:15 -0400)]
NFS: Don't set NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA

NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA should be considered
mutually exclusive.

Fixes: 1c341b777501 ("NFS: Add deferred cache invalidation for close-to-open consistency violations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2 years agoNFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED
Trond Myklebust [Sun, 26 Sep 2021 18:05:04 +0000 (14:05 -0400)]
NFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED

Both NFSv3 and NFSv2 generate their change attribute from the ctime
value that was supplied by the server. However the problem is that there
are plenty of servers out there with ctime resolutions of 1ms or worse.
In a modern performance system, this is insufficient when trying to
decide which is the most recent set of attributes when, for instance, a
READ or GETATTR call races with a WRITE or SETATTR.

For this reason, let's revert to labelling the NFSv2/v3 change
attributes as NFS4_CHANGE_TYPE_IS_UNDEFINED. This will ensure we protect
against such races.
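
To see why a coarse ctime makes a poor change attribute, consider the sketch below. It assumes the change attribute is built by packing seconds and nanoseconds into one 64-bit value, and it models the server's granularity with a hypothetical res_ns parameter; demo_ctime_to_change_attr is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a change attribute from a ctime that the server has truncated to
 * res_ns nanoseconds of resolution (1000000 models a 1ms server).
 */
static uint64_t demo_ctime_to_change_attr(uint64_t sec, uint32_t nsec,
					  uint32_t res_ns)
{
	assert(res_ns > 0);
	nsec -= nsec % res_ns;		/* drop what the server cannot resolve */
	return (sec << 30) + nsec;	/* nsec < 2^30, so the fields do not overlap */
}
```

Two updates that land 0.4ms apart within the same millisecond produce identical values at 1ms resolution, so the client cannot tell which set of attributes is newer.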

Fixes: 7b24dacf0840 ("NFS: Another inode revalidation improvement")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSv4: Retrieve ACCESS on open if we're not using NFS4_CREATE_EXCLUSIVE
Trond Myklebust [Wed, 14 Jul 2021 17:00:58 +0000 (13:00 -0400)]
NFSv4: Retrieve ACCESS on open if we're not using NFS4_CREATE_EXCLUSIVE

NFS4_CREATE_EXCLUSIVE does not allow the caller to set an access mode,
so for most Linux filesystems, the access call ends up returning no
permissions. However both NFS4_CREATE_EXCLUSIVE4_1 and
NFS4_CREATE_GUARDED allow the client to set the access mode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Fix a few more clear_bit() instances that need release semantics
Trond Myklebust [Tue, 13 Jul 2021 16:28:22 +0000 (12:28 -0400)]
NFS: Fix a few more clear_bit() instances that need release semantics

All these bits are being used as bit locks.
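
For readers unfamiliar with the pattern, here is a userspace sketch of a bit lock using C11 atomics. The demo_ names are hypothetical; in the kernel the release-ordered clear would be something like clear_bit_unlock(), which is stated here as background rather than as a description of this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_LOCK_BIT 0x1UL

/* Try to take the bit lock; acquire ordering keeps the critical section after it. */
static bool demo_trylock(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_or_explicit(flags, DEMO_LOCK_BIT,
						     memory_order_acquire);
	return !(old & DEMO_LOCK_BIT);
}

/* Drop the bit lock; release ordering keeps the critical section before it. */
static void demo_unlock(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_and_explicit(flags, ~DEMO_LOCK_BIT,
						      memory_order_release);
	assert(old & DEMO_LOCK_BIT);	/* unlocking an unheld lock is a bug */
}
```

A plain, unordered clear would allow stores made inside the critical section to become visible after the lock appears free, which is the hazard release semantics rule out.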

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: xprt_clear_locked() only needs release memory semantics
Trond Myklebust [Mon, 12 Jul 2021 16:34:34 +0000 (12:34 -0400)]
SUNRPC: xprt_clear_locked() only needs release memory semantics

The clearing of the XPRT_LOCKED bit has to happen after we clear
xprt->snd_task, but we don't require any extra memory barriers after
that.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Remove unnecessary memory barriers
Trond Myklebust [Mon, 12 Jul 2021 16:24:15 +0000 (12:24 -0400)]
SUNRPC: Remove unnecessary memory barriers

The only check for RPC_TASK_RUNNING is the one in rpc_make_runnable(),
which happens under the same spin lock held when we call
rpc_clear_running().

Ditto, the last check for RPC_TASK_QUEUED in rpc_execute() is performed
under the same lock as the one held when we call rpc_clear_queued().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Remove WQ_HIGHPRI from xprtiod
Trond Myklebust [Mon, 12 Jul 2021 15:57:15 +0000 (11:57 -0400)]
SUNRPC: Remove WQ_HIGHPRI from xprtiod

Don't let xprtiod pre-empt softirq.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Add cond_resched() at the appropriate point in __rpc_execute()
Trond Myklebust [Mon, 12 Jul 2021 13:57:08 +0000 (09:57 -0400)]
SUNRPC: Add cond_resched() at the appropriate point in __rpc_execute()

Allow tasks that need to pre-empt rpciod/xprtiod to do so when it is
safe.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoSUNRPC: Partial revert of commit 6f9f17287e78
Trond Myklebust [Mon, 12 Jul 2021 13:52:59 +0000 (09:52 -0400)]
SUNRPC: Partial revert of commit 6f9f17287e78

The premise of commit 6f9f17287e78 ("SUNRPC: Mitigate cond_resched() in
xprt_transmit()") was that cond_resched() is expensive and unnecessary
when there has been just a single send.
The point of cond_resched() is to ensure that tasks that should pre-empt
this one get a chance to do so when it is safe to do so. The code prior
to commit 6f9f17287e78 failed to take into account that it was keeping a
rpc_task pinned for longer than it needed to, and so rather than doing a
full revert, let's just move the cond_resched.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Fix up nfs_ctx_key_to_expire()
Trond Myklebust [Sat, 10 Jul 2021 22:07:14 +0000 (18:07 -0400)]
NFS: Fix up nfs_ctx_key_to_expire()

If the cached credential exists but doesn't have any expiration callback
then exit early.
Fix up atomicity issues when replacing the credential with a new one
since the existing code could lead to refcount leaks.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()
Trond Myklebust [Thu, 8 Jul 2021 01:43:09 +0000 (21:43 -0400)]
NFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()

After the success of an operation such as rmdir() or unlink(), we expect
to add the dentry back to the dcache as an ordinary negative dentry.
However in NFS, unless it is labelled with the appropriate verifier for
the parent directory state, then nfs_lookup_revalidate will end up
discarding that dentry and forcing a new lookup.

The fix is to ensure that we relabel the dentry appropriately on
success.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoNFS: Label the dentry with a verifier in nfs_link(), nfs_symlink()
Trond Myklebust [Thu, 8 Jul 2021 02:08:32 +0000 (22:08 -0400)]
NFS: Label the dentry with a verifier in nfs_link(), nfs_symlink()

After the success of an operation such as link(), or symlink(), we
expect to add the dentry back to the dcache as an ordinary positive
dentry.
However in NFS, unless it is labelled with the appropriate verifier for
the parent directory state, then nfs_lookup_revalidate will end up
discarding that dentry and forcing a new lookup.

The fix is to ensure that we relabel the dentry appropriately on
success.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2 years agoLinux 5.15-rc4
Linus Torvalds [Sun, 3 Oct 2021 21:08:47 +0000 (14:08 -0700)]
Linux 5.15-rc4

2 years agoelf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
Chen Jingwen [Tue, 28 Sep 2021 12:56:57 +0000 (20:56 +0800)]
elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings

In commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.

Unfortunately, this will cause the kernel to fail to start with:

    1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
    Failed to execute /init (error -17)

The reason is that the elf interpreter (ld.so) has overlapping segments.

  readelf -l ld-2.31.so
  Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                   FileSiz            MemSiz              Flags  Align
    LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                   0x000000000002c94c 0x000000000002c94c  R E    0x10000
    LOAD           0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
                   0x00000000000021e8 0x0000000000002320  RW     0x10000
    LOAD           0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
                   0x00000000000011ac 0x0000000000001328  RW     0x10000

The reason for this problem is the same as described in commit
ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments").

Not only executable binaries but also elf interpreters (e.g. ld.so) can
have overlapping elf segments, so we had better drop MAP_FIXED_NOREPLACE
and go back to MAP_FIXED in load_elf_interp.
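
The difference between the two flags can be observed from userspace. The sketch below is only an illustration of mmap() behaviour, not of the ELF loader: demo_map_at is a hypothetical helper, and the fallback definition of MAP_FIXED_NOREPLACE assumes the generic Linux value for builds against older libc headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* assumed generic Linux value */
#endif

/* Map 'len' anonymous bytes at 'addr'. Returns 0 on success, else an errno value. */
static int demo_map_at(void *addr, size_t len, int extra_flags)
{
	void *p;

	assert(len > 0);
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
	if (p == MAP_FAILED)
		return errno;
	if (p != addr) {
		/* Kernels without MAP_FIXED_NOREPLACE treat addr as a hint. */
		munmap(p, len);
		return EEXIST;
	}
	return 0;
}
```

Mapping over an address range that is already in use reports EEXIST (17) with MAP_FIXED_NOREPLACE, which matches the "error -17" above, while MAP_FIXED silently replaces the existing mapping. That replacement is what overlapping LOAD segments rely on.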

Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 Oct 2021 20:56:53 +0000 (13:56 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a number of ext4 bugs in fast_commit, inline data, and delayed
  allocation.

  Also fix error handling code paths in ext4_dx_readdir() and
  ext4_fill_super().

  Finally, avoid grabbing a journal head in the delayed allocation
  write in the common cases where we are overwriting a pre-existing
  block or appending to an inode"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: recheck buffer uptodate bit under buffer lock
  ext4: fix potential infinite loop in ext4_dx_readdir()
  ext4: flush s_error_work before journal destroy in ext4_fill_super
  ext4: fix loff_t overflow in ext4_max_bitmap_size()
  ext4: fix reserved space counter leakage
  ext4: limit the number of blocks in one ADD_RANGE TLV
  ext4: enforce buffer head state assertion in ext4_da_map_blocks
  ext4: remove extent cache entries when truncating inline data
  ext4: drop unnecessary journal handle in delalloc write
  ext4: factor out write end code of inline file
  ext4: correct the error path of ext4_write_inline_data_end()
  ext4: check and update i_disksize properly
  ext4: add error checking to ext4_ext_replay_set_iblocks()

2 years agoobjtool: print out the symbol type when complaining about it
Linus Torvalds [Sun, 3 Oct 2021 20:45:48 +0000 (13:45 -0700)]
objtool: print out the symbol type when complaining about it

The objtool warning that the kvm instruction emulation code triggered
wasn't very useful:

    arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception

in that it helpfully tells you which symbol name it had trouble figuring
out the relocation for, but it doesn't actually say what the unknown
symbol type was that triggered it all.

In this case it was because of missing type information (type 0, aka
STT_NOTYPE), but on the whole it really should just have printed that
out as part of the message.

Because if this warning triggers, that's very much the first thing you
want to know - why did reloc2sec_off() return failure for that symbol?

So rather than just saying you can't handle some type of symbol without
saying what the type _was_, just print out the type number too.

Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agokvm: fix objtool relocation warning
Linus Torvalds [Sun, 3 Oct 2021 20:34:19 +0000 (13:34 -0700)]
kvm: fix objtool relocation warning

The recent change to make objtool aware of more symbol relocation types
(commit 24ff65257375: "objtool: Teach get_alt_entry() about more
relocation types") also added another check, and resulted in this
objtool warning when building kvm on x86:

    arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception

The reason seems to be that kvm_fastop_exception() is marked as a global
symbol, which causes the relocation to be kept around for objtool.  And
at the same time, the kvm_fastop_exception definition (which is done as
an inline asm statement) doesn't actually set the type of the global,
which then makes objtool unhappy.

The minimal fix is to just not mark kvm_fastop_exception as being a
global symbol.  It's only used in that one compilation unit anyway, so
it was always pointless.  That's how all the other local exception table
labels are done.

I'm not entirely happy about the kinds of games that the kvm code plays
with doing its own exception handling, and the fact that it confused
objtool is most definitely a symptom of the code being a bit too subtle
and ad-hoc.  But at least this trivial one-liner makes objtool no longer
upset about what is going on.

Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 3 Oct 2021 18:19:02 +0000 (11:19 -0700)]
Merge tag 'char-misc-5.15-rc4' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small misc driver fixes for 5.15-rc4. They are in two
  "groups":

   - ipack driver fixes for issues found by Johan Hovold

   - interconnect driver fixes for reported problems

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  ipack: ipoctal: fix module reference leak
  ipack: ipoctal: fix missing allocation-failure check
  ipack: ipoctal: fix tty-registration error handling
  ipack: ipoctal: fix tty registration race
  ipack: ipoctal: fix stack information leak
  interconnect: qcom: sdm660: Add missing a2noc qos clocks
  dt-bindings: interconnect: sdm660: Add missing a2noc qos clocks
  interconnect: qcom: sdm660: Correct NOC_QOS_PRIORITY shift and mask
  interconnect: qcom: sdm660: Fix id of slv_cnoc_mnoc_cfg

2 years agoMerge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 Oct 2021 18:10:09 +0000 (11:10 -0700)]
Merge tag 'driver-core-5.15-rc4' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some driver core and kernfs fixes for reported issues for
  5.15-rc4. These fixes include:

   - kernfs positive dentry bugfix

   - debugfs_create_file_size error path fix

   - cpumask sysfs file bugfix to preserve the user/kernel abi (has been
     reported multiple times.)

   - devlink fixes for mdiobus devices as reported by the subsystem
     maintainers.

  Also included in here are some devlink debugging changes to make it
  easier for people to report problems when asked. They have already
  helped with the mdiobus and other subsystems reporting issues.

  All of these have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: also call kernfs_set_rev() for positive dentry
  driver core: Add debug logs when fwnode links are added/deleted
  driver core: Create __fwnode_link_del() helper function
  driver core: Set deferred probe reason when deferred by driver core
  net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
  driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  driver core: fw_devlink: Improve handling of cyclic dependencies
  cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
  debugfs: debugfs_create_file_size(): use IS_ERR to check for error

2 years agoMerge tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 3 Oct 2021 17:49:00 +0000 (10:49 -0700)]
Merge tag 'sched_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Tell the compiler to always inline is_percpu_thread()

 - Make sure tunable_scaling buffer is null-terminated after an update
   in sysfs

 - Fix LTP named regression due to cgroup list ordering

* tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Always inline is_percpu_thread()
  sched/fair: Null terminate buffer when updating tunable_scaling
  sched/fair: Add ancestors of unthrottled undecayed cfs_rq

2 years agoMerge tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 3 Oct 2021 17:32:27 +0000 (10:32 -0700)]
Merge tag 'perf_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure the destroy callback is reset when an event initialization
   fails

 - Update the event constraints for Icelake

 - Make sure the active time of an event is updated even for inactive
   events

* tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: fix userpage->time_enabled of inactive events
  perf/x86/intel: Update event constraints for ICX
  perf/x86: Reset destroy callback on event init failure

2 years agoMerge tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 3 Oct 2021 17:23:54 +0000 (10:23 -0700)]
Merge tag 'objtool_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip

Pull objtool fix from Borislav Petkov:

 - Handle symbol relocations properly due to changes in the toolchains
   which remove section symbols now

* tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Teach get_alt_entry() about more relocation types

2 years agoMerge tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 Oct 2021 00:51:01 +0000 (17:51 -0700)]
Merge tag 'hwmon-for-v5.15-rc4' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Fixed various potential NULL pointer accesses in w8379* drivers

 - Improved error handling, fault reporting, and fixed rounding in
   tmp421 driver

 - Fixed error handling in ltc2947 driver

 - Added missing attribute to pmbus/mp2975 driver

 - Fixed attribute values in pmbus/ibm-cffps, occ, and mlxreg-fan
   drivers

 - Removed unused residual code from k10temp driver

* tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller
  hwmon: (pmbus/ibm-cffps) max_power_out swap changes
  hwmon: (occ) Fix P10 VRM temp sensors
  hwmon: (ltc2947) Properly handle errors when looking for the external clock
  hwmon: (tmp421) fix rounding for negative values
  hwmon: (tmp421) report /PVLD condition as fault
  hwmon: (tmp421) handle I2C errors
  hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs
  hwmon: (k10temp) Remove residues of current and voltage

2 years agoMerge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Sun, 3 Oct 2021 00:43:54 +0000 (17:43 -0700)]
Merge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:
 "Eleven fixes for the ksmbd kernel server, mostly security related:

   - an important fix for disabling weak NTLMv1 authentication

   - seven security (improved buffer overflow checks) fixes

   - fix for wrong infolevel struct used in some getattr/setattr paths

   - two small documentation fixes"

* tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: missing check for NULL in convert_to_nt_pathname()
  ksmbd: fix transform header validation
  ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
  ksmbd: add validation in smb2 negotiate
  ksmbd: add request buffer validation in smb2_set_info
  ksmbd: use correct basic info level in set_file_basic_info()
  ksmbd: remove NTLMv1 authentication
  ksmbd: fix documentation for 2 functions
  MAINTAINERS: rename cifs_common to smbfs_common in cifs and ksmbd entry
  ksmbd: fix invalid request buffer access in compound
  ksmbd: remove RFC1002 check in smb2 request

2 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 2 Oct 2021 19:56:03 +0000 (12:56 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Five fairly minor fixes and spelling updates, all in drivers. Even
  though the ufs fix is in tracing, it's a potentially exploitable use
  beyond end of array bug"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: csiostor: Add module softdep on cxgb4
  scsi: qla2xxx: Fix excessive messages during device logout
  scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported"
  scsi: ses: Fix unsigned comparison with less than zero
  scsi: ufs: Fix illegal offset in UPIU event trace

2 years agoMerge tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 2 Oct 2021 18:00:36 +0000 (11:00 -0700)]
Merge tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few block fixes for this release:

   - Revert a BFQ commit that causes breakage for people. Unfortunately
     it was auto-selected for stable as well, so now 5.14.7 suffers from
     it too. Hopefully stable will pick up this revert quickly too, so
     we can remove the issue on that end as well.

   - Add a quirk for Apple NVMe controllers, which due to their
     non-compliance broke due to the introduction of command sequences
     (Keith)

   - Use shifts in nbd, fixing a __divdi3 issue (Nick)"

* tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
  nbd: use shifts rather than multiplies
  Revert "block, bfq: honor already-setup queue merges"
  nvme: add command id quirk for apple controllers

2 years agoMerge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 2 Oct 2021 17:26:19 +0000 (10:26 -0700)]
Merge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two fixes in here:

   - The signal issue that was discussed start of this week (me).

   - Kill dead fasync support in io_uring. Looks like it was broken
     since io_uring was initially merged, and given that nobody has ever
     complained about it, let's just kill it (Pavel)"

* tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
  io_uring: kill fasync
  io-wq: exclusively gate signal based exit on get_signal() return

2 years agoMerge tag 'libnvdimm-fixes-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 2 Oct 2021 17:08:35 +0000 (10:08 -0700)]
Merge tag 'libnvdimm-fixes-5.15-rc4' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A fix for a regression added this cycle in the pmem driver, and for a
  long standing bug for failed NUMA node lookups on ARM64.

  This has appeared in -next for several days with no reported issues.

  Summary:

   - Fix a regression that caused the sysfs ABI for pmem block devices
     to not be registered. This fails the nvdimm unit tests and dax
     xfstests.

   - Fix numa node lookups for dax-kmem memory (device-dax memory
     assigned to the page allocator) on ARM64"

* tag 'libnvdimm-fixes-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm/pmem: fix creating the dax group
  ACPI: NFIT: Use fallback node id when numa info in NFIT table is incorrect

2 years agocachefiles: Fix oops in trace_cachefiles_mark_buried due to NULL object
Dave Wysochanski [Fri, 1 Oct 2021 14:37:31 +0000 (15:37 +0100)]
cachefiles: Fix oops in trace_cachefiles_mark_buried due to NULL object

In cachefiles_mark_object_buried, the dentry in question may not have an
owner, and thus our cachefiles_object pointer may be NULL when calling
the tracepoint, in which case we will also not have a valid debug_id to
print in the tracepoint.

Check for NULL object in the tracepoint and if so, just set debug_id to
MAX_UINT as was done in 2908f5e101e3 ("fscache: Add a cookie debug ID
and use that in traces").
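
The guard amounts to a NULL check when the trace entry is filled in. The following is a sketch with hypothetical names (demo_object, demo_trace_debug_id, demo_format_event), not the tracepoint macro itself. It uses the standard C constant UINT_MAX for the value the message calls MAX_UINT, and the "obj=%08x" output format is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct cachefiles_object. */
struct demo_object { unsigned int debug_id; };

/* Pick the ID to record: the object's own, or UINT_MAX when there is no object. */
static unsigned int demo_trace_debug_id(const struct demo_object *obj)
{
	return obj ? obj->debug_id : UINT_MAX;
}

/* Format the event the way a trace line might print it (format is an assumption). */
static int demo_format_event(char *buf, size_t len, const struct demo_object *obj)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "obj=%08x", demo_trace_debug_id(obj));
}
```

Dereferencing obj unconditionally is what produced the oops shown below when the dentry had no owner.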

This fixes the following oops:

    FS-Cache: Cache "mycache" added (type cachefiles)
    CacheFiles: File cache on vdc registered
    ...
    Workqueue: fscache_object fscache_object_work_func [fscache]
    RIP: 0010:trace_event_raw_event_cachefiles_mark_buried+0x4e/0xa0 [cachefiles]
    ....
    Call Trace:
     cachefiles_mark_object_buried+0xa5/0xb0 [cachefiles]
     cachefiles_bury_object+0x270/0x430 [cachefiles]
     cachefiles_walk_to_object+0x195/0x9c0 [cachefiles]
     cachefiles_lookup_object+0x5a/0xc0 [cachefiles]
     fscache_look_up_object+0xd7/0x160 [fscache]
     fscache_object_work_func+0xb2/0x340 [fscache]
     process_one_work+0x1f1/0x390
     worker_thread+0x53/0x3e0
     kthread+0x127/0x150

Fixes: 2908f5e101e3 ("fscache: Add a cookie debug ID and use that in traces")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agodrm/i915: fix blank screen booting crashes
Hugh Dickins [Sat, 2 Oct 2021 10:17:29 +0000 (03:17 -0700)]
drm/i915: fix blank screen booting crashes

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: ffffffff814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.
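
One common reason a function must be aligned is that its address shares a word with flag bits. The sketch below assumes that is the situation here. The demo_ names are hypothetical, the code uses the GCC/Clang aligned attribute, and it is not the i915 implementation.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_MASK 0x3UL	/* low two bits of the word carry flags */

typedef int (*demo_notify_t)(int);

/* Force 4-byte alignment so the low two address bits are zero even at -Os. */
static __attribute__((aligned(4))) int demo_dummy_notify(int state)
{
	return state;
}

/* Pack a callback address and flags into one word. */
static uintptr_t demo_pack(demo_notify_t fn, unsigned long flags)
{
	uintptr_t v = (uintptr_t)fn;

	assert((v & DEMO_FLAG_MASK) == 0);	/* a misaligned callback trips here */
	return v | (flags & DEMO_FLAG_MASK);
}

/* Recover the callback by masking the flag bits back out. */
static demo_notify_t demo_unpack_fn(uintptr_t packed)
{
	return (demo_notify_t)(packed & ~(uintptr_t)DEMO_FLAG_MASK);
}
```

When optimizing for size, a compiler may place functions on smaller boundaries than it would otherwise, so the alignment has to be requested explicitly rather than assumed.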

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>