platform/kernel/linux-starfive.git
3 years ago nfsd: Fix fall-through warnings for Clang
Gustavo A. R. Silva [Fri, 20 Nov 2020 18:26:40 +0000 (12:26 -0600)]
nfsd: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding a couple of break statements instead of
just letting the code fall through to the next case.
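
As a rough illustration of the kind of change described here (not the actual nfsd code), the sketch below uses a hypothetical event enum and handler. The explicit break after the last real case is what silences Clang's -Wimplicit-fallthrough, which also flags falling into a case that contains only a break.

```c
#include <assert.h>

/* Hypothetical event codes, for illustration only. */
enum event { EV_OPEN, EV_CLOSE, EV_OTHER };

/* Returns a small status code for each event. */
int handle_event(enum event ev)
{
	int status = 0;

	switch (ev) {
	case EV_OPEN:
		status = 1;
		break;
	case EV_CLOSE:
		status = 2;
		break;	/* explicit: without it, control falls into 'default' */
	default:
		break;
	}
	assert(status >= 0);
	return status;
}
```

Behavior is unchanged by the added break; only the intent becomes explicit.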

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: grant read delegations to clients holding writes
J. Bruce Fields [Fri, 16 Apr 2021 18:00:18 +0000 (14:00 -0400)]
nfsd: grant read delegations to clients holding writes

It's OK to grant a read delegation to a client that holds a write,
as long as it's the only client holding the write.
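
The rule above can be sketched as a simple predicate. This is only an illustration under assumed data structures: struct open_rec and may_grant_read_deleg() are hypothetical names, not what nfsd uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one open on a file; not the kernel's structure. */
struct open_rec {
	int client_id;
	bool for_write;
};

/*
 * A read delegation may go to 'client_id' if every write open on the
 * file belongs to that same client.
 */
bool may_grant_read_deleg(const struct open_rec *opens, size_t n, int client_id)
{
	assert(opens != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (opens[i].for_write && opens[i].client_id != client_id)
			return false;
	}
	return true;
}
```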

We originally tried to do this in commit 94415b06eb8a ("nfsd4: a
client's own opens needn't prevent delegations"), which had to be
reverted in commit 6ee65a773096 ("Revert "nfsd4: a client's own
opens needn't prevent delegations"").

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: reshuffle some code
J. Bruce Fields [Fri, 16 Apr 2021 18:00:17 +0000 (14:00 -0400)]
nfsd: reshuffle some code

No change in behavior, I'm just moving some code around to avoid forward
references in a following patch.

(To do someday: figure out how to split up nfs4state.c.  It's big and
disorganized.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: track filehandle aliasing in nfs4_files
J. Bruce Fields [Fri, 16 Apr 2021 18:00:16 +0000 (14:00 -0400)]
nfsd: track filehandle aliasing in nfs4_files

It's unusual but possible for multiple filehandles to point to the same
file.  In that case, we may end up with multiple nfs4_files referencing
the same inode.

For delegation purposes it will turn out to be useful to flag those
cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: hash nfs4_files by inode number
J. Bruce Fields [Fri, 16 Apr 2021 18:00:15 +0000 (14:00 -0400)]
nfsd: hash nfs4_files by inode number

The nfs4_file structure is per-filehandle, not per-inode, because the
spec requires open and other state to be per filehandle.

But it will turn out to be convenient for nfs4_files associated with the
same inode to be hashed to the same bucket, so let's hash on the inode
instead of the filehandle.
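
A minimal sketch of the idea, assuming a hypothetical struct file_rec and an arbitrary multiplicative hash (the real hash function and table size are not given here): because only the inode number feeds the hash, two records with different filehandles for the same inode share a bucket.

```c
#include <assert.h>
#include <stdint.h>

#define FILE_HASH_BITS 8
#define FILE_HASH_SIZE (1u << FILE_HASH_BITS)

/* Hypothetical stand-in for an nfs4_file: one per filehandle. */
struct file_rec {
	uint64_t ino;          /* inode number of the underlying file */
	unsigned char fh[16];  /* opaque filehandle bytes (not hashed) */
};

/* Bucket index derived only from the inode number. */
unsigned int file_hashval(const struct file_rec *f)
{
	/* 64-bit multiplicative hash; keep the top FILE_HASH_BITS bits. */
	uint64_t h = f->ino * UINT64_C(0x9E3779B97F4A7C15);
	unsigned int bucket = (unsigned int)(h >> (64 - FILE_HASH_BITS));

	assert(bucket < FILE_HASH_SIZE);
	return bucket;
}
```

In this sketch, records from different filesystems that happen to share an inode number also land in the same bucket.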

Filehandle aliasing is rare, so that shouldn't have much performance
impact.

(If you have a ton of exported filesystems, though, and all of them have
a root with inode number 2, could that get you an overlong hash chain?
Perhaps this (and the v4 open file cache) should be hashed on the inode
pointer instead.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: ensure new clients break delegations
J. Bruce Fields [Fri, 16 Apr 2021 18:00:14 +0000 (14:00 -0400)]
nfsd: ensure new clients break delegations

If nfsd already has an open file that it plans to use for IO from
another client, it may not need to do another vfs open, but it still
may need to break any delegations in case the existing opens are for
another client.

Symptoms are that we may incorrectly fail to break a delegation on a
write open from a different client, when the delegation-holding client
already has a write open.

Fixes: 28df3d1539de ("nfsd: clients don't need to break their own delegations")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: removed unused argument in nfsd_startup_generic()
Vasily Averin [Thu, 15 Apr 2021 12:00:58 +0000 (15:00 +0300)]
nfsd: removed unused argument in nfsd_startup_generic()

Since commit 501cb1849f86 ("nfsd: rip out the raparms cache")
nrservs is not used in nfsd_startup_generic().

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: remove unused function
Jiapeng Chong [Thu, 15 Apr 2021 08:38:24 +0000 (16:38 +0800)]
nfsd: remove unused function

Fix the following clang warning:

fs/nfsd/nfs4state.c:6276:1: warning: unused function 'end_offset'
[-Wunused-function].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Pass a useful error code to the send_err tracepoint
Chuck Lever [Sun, 11 Apr 2021 18:19:08 +0000 (14:19 -0400)]
svcrdma: Pass a useful error code to the send_err tracepoint

Capture error codes in @ret, which is passed to the send_err
tracepoint, so that they can be logged when something goes awry.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Rename goto labels in svc_rdma_sendto()
Chuck Lever [Tue, 13 Apr 2021 21:55:28 +0000 (17:55 -0400)]
svcrdma: Rename goto labels in svc_rdma_sendto()

Clean up: Make the goto labels consistent with other similar
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Don't leak send_ctxt on Send errors
Chuck Lever [Tue, 13 Apr 2021 21:53:22 +0000 (17:53 -0400)]
svcrdma: Don't leak send_ctxt on Send errors

Address a rare send_ctxt leak in the svc_rdma_sendto() error paths.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Use DEFINE_SPINLOCK() for spinlock
Guobin Huang [Tue, 6 Apr 2021 12:08:18 +0000 (20:08 +0800)]
NFSD: Use DEFINE_SPINLOCK() for spinlock

A spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than by explicitly calling spin_lock_init().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago sunrpc: Remove unused function ip_map_lookup
Jiapeng Chong [Tue, 6 Apr 2021 03:46:59 +0000 (11:46 +0800)]
sunrpc: Remove unused function ip_map_lookup

Fix the following clang warnings:

net/sunrpc/svcauth_unix.c:306:30: warning: unused function
'ip_map_lookup' [-Wunused-function].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSv4.2: fix copy stateid copying for the async copy
Olga Kornievskaia [Tue, 30 Mar 2021 19:03:59 +0000 (15:03 -0400)]
NFSv4.2: fix copy stateid copying for the async copy

This patch fixes Dan Carpenter's report that the static checker
found a problem where memcpy() was copying into too small of a buffer.
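
The general hazard can be sketched as follows. The two struct types and copy_prefix() are hypothetical and do not reflect the actual stateid layouts; the point is only that sizing the copy by the destination object keeps memcpy() from writing past it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types: the source is larger than the destination. */
struct small_id { uint8_t b[12]; };
struct big_id   { uint8_t b[16]; };

/* Copy the leading bytes of 'src' into 'dst' without overflowing 'dst'. */
void copy_prefix(struct small_id *dst, const struct big_id *src)
{
	static_assert(sizeof(*dst) <= sizeof(*src),
		      "source must cover the destination");
	/* sizeof(*src) here would write 4 bytes past the end of *dst */
	memcpy(dst, src, sizeof(*dst));
}
```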

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc5805a ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Dai Ngo <dai.ngo@oracle.com>
3 years ago UAPI: nfsfh.h: Replace one-element array with flexible-array member
Gustavo A. R. Silva [Tue, 23 Mar 2021 22:48:58 +0000 (17:48 -0500)]
UAPI: nfsfh.h: Replace one-element array with flexible-array member

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
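
For readers unfamiliar with the construct, here is a minimal user-space sketch of a flexible array member. struct fh_blob and fh_blob_alloc() are hypothetical names chosen for illustration and are unrelated to the nfsfh.h layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header followed by a run-time-sized array of 32-bit words. */
struct fh_blob {
	uint8_t  version;
	uint8_t  nwords;
	uint32_t words[];   /* flexible array member: must be the last member */
};

/* Allocate a zeroed blob with room for 'n' trailing words; caller frees it. */
struct fh_blob *fh_blob_alloc(uint8_t n)
{
	struct fh_blob *b = calloc(1, sizeof(*b) + (size_t)n * sizeof(b->words[0]));

	if (b)
		b->nwords = n;
	return b;
}
```

Unlike a one-element array, indexing words[1] or words[2] here is not an out-of-bounds access as far as the compiler is concerned, provided the allocation is large enough.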

Use an anonymous union with a couple of anonymous structs in order to
keep userspace unchanged:

$ pahole -C nfs_fhbase_new fs/nfsd/nfsfh.o
struct nfs_fhbase_new {
        union {
                struct {
                        __u8       fb_version_aux;       /*     0     1 */
                        __u8       fb_auth_type_aux;     /*     1     1 */
                        __u8       fb_fsid_type_aux;     /*     2     1 */
                        __u8       fb_fileid_type_aux;   /*     3     1 */
                        __u32      fb_auth[1];           /*     4     4 */
                };                                       /*     0     8 */
                struct {
                        __u8       fb_version;           /*     0     1 */
                        __u8       fb_auth_type;         /*     1     1 */
                        __u8       fb_fsid_type;         /*     2     1 */
                        __u8       fb_fileid_type;       /*     3     1 */
                        __u32      fb_auth_flex[0];      /*     4     0 */
                };                                       /*     0     4 */
        };                                               /*     0     8 */

        /* size: 8, cachelines: 1, members: 1 */
        /* last cacheline: 8 bytes */
};

Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warnings:

fs/nfsd/nfsfh.c: In function ‘nfsd_set_fh_dentry’:
fs/nfsd/nfsfh.c:191:41: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  191 |        ntohl((__force __be32)fh->fh_fsid[1])));
      |                              ~~~~~~~~~~~^~~
./include/linux/kdev_t.h:12:46: note: in definition of macro ‘MKDEV’
   12 | #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi))
      |                                              ^~
./include/uapi/linux/byteorder/little_endian.h:40:26: note: in expansion of macro ‘__swab32’
   40 | #define __be32_to_cpu(x) __swab32((__force __u32)(__be32)(x))
      |                          ^~~~~~~~
./include/linux/byteorder/generic.h:136:21: note: in expansion of macro ‘__be32_to_cpu’
  136 | #define ___ntohl(x) __be32_to_cpu(x)
      |                     ^~~~~~~~~~~~~
./include/linux/byteorder/generic.h:140:18: note: in expansion of macro ‘___ntohl’
  140 | #define ntohl(x) ___ntohl(x)
      |                  ^~~~~~~~
fs/nfsd/nfsfh.c:191:8: note: in expansion of macro ‘ntohl’
  191 |        ntohl((__force __be32)fh->fh_fsid[1])));
      |        ^~~~~
fs/nfsd/nfsfh.c:192:32: warning: array subscript 2 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  192 |    fh->fh_fsid[1] = fh->fh_fsid[2];
      |                     ~~~~~~~~~~~^~~
fs/nfsd/nfsfh.c:192:15: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  192 |    fh->fh_fsid[1] = fh->fh_fsid[2];
      |    ~~~~~~~~~~~^~~

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()
Chuck Lever [Mon, 1 Mar 2021 15:44:49 +0000 (10:44 -0500)]
svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()

This, to me, seems less cluttered and less redundant. I was hoping
it could help reduce lock contention on the dto_q lock by reducing
the size of the critical section, but alas, the only improvement is
readability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg
Chuck Lever [Wed, 13 Jan 2021 14:31:50 +0000 (09:31 -0500)]
svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg

These fields are no longer used.

The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Remove sc_read_complete_q
Chuck Lever [Wed, 30 Dec 2020 17:43:34 +0000 (12:43 -0500)]
svcrdma: Remove sc_read_complete_q

Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Single-stage RDMA Read
Chuck Lever [Tue, 22 Dec 2020 18:22:20 +0000 (13:22 -0500)]
svcrdma: Single-stage RDMA Read

Currently the generic RPC server layer calls svc_rdma_recvfrom()
twice to retrieve an RPC message that uses Read chunks. I'm not
exactly sure why this design was chosen originally.

Instead, let's wait for the Read chunk completion inline in the
first call to svc_rdma_recvfrom().

The goal is to eliminate some page allocator churn.
rdma_read_complete() replaces pages in the second svc_rqst by
calling put_page() repeatedly while the upper layer waits for the
request to be constructed, which adds unnecessary NFS WRITE round-
trip latency.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
3 years ago SUNRPC: Move svc_xprt_received() call sites
Chuck Lever [Tue, 5 Jan 2021 15:15:09 +0000 (10:15 -0500)]
SUNRPC: Move svc_xprt_received() call sites

Currently, XPT_BUSY is not cleared until xpo_recvfrom returns.
That effectively blocks the receipt and handling of the next RPC
message until the current one has been taken off the transport.
This strict ordering is a requirement for socket transports.

For our kernel RPC/RDMA transport implementation, however, dequeuing
an ingress message is nothing more than a list_del(). The transport
can safely be marked un-busy as soon as that is done.

To keep the changes simpler, this patch just moves the
svc_xprt_received() call site from svc_handle_xprt() into the
transports, so that the actual optimization can be done in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago SUNRPC: Export svc_xprt_received()
Chuck Lever [Fri, 29 Jan 2021 18:04:04 +0000 (13:04 -0500)]
SUNRPC: Export svc_xprt_received()

Prepare svc_xprt_received() to be called from transport code instead
of from generic RPC server code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Retain the page backing rq_res.head[0].iov_base
Chuck Lever [Mon, 1 Feb 2021 20:16:57 +0000 (15:16 -0500)]
svcrdma: Retain the page backing rq_res.head[0].iov_base

svc_rdma_sendto() now waits for the NIC hardware to finish with
the pages backing rq_res. We still have to release the page array
in some cases, but now it's always safe to immediately re-use the
page backing rq_res's head buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Remove unused sc_pages field
Chuck Lever [Thu, 28 Jan 2021 21:47:56 +0000 (16:47 -0500)]
svcrdma: Remove unused sc_pages field

Clean up. This significantly reduces the size of struct
svc_rdma_send_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Normalize Send page handling
Chuck Lever [Wed, 13 Jan 2021 18:57:18 +0000 (13:57 -0500)]
svcrdma: Normalize Send page handling

Currently svc_rdma_sendto() migrates xdr_buf pages into a separate
page list and NULLs out a bunch of entries in rq_pages while the
pages are under I/O. The Send completion handler then frees those
pages later.

Instead, let's wait for the Send completion, then handle page
releasing in the nfsd thread. I'd like to avoid the cost of 250+
put_page() calls in the Send completion handler, which is single-
threaded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Add a "deferred close" helper
Chuck Lever [Sat, 20 Feb 2021 23:53:40 +0000 (18:53 -0500)]
svcrdma: Add a "deferred close" helper

Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).

Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.
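
The pattern can be sketched in user-space C with C11 atomics. Everything below is a hypothetical stand-in (struct xprt, XPT_CLOSE_BIT, xprt_enqueue()); it shows only the "set the bit atomically, enqueue just once" logic, not the kernel helper itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define XPT_CLOSE_BIT (1u << 0)   /* hypothetical bit value for this sketch */

/* Hypothetical, simplified transport: a flags word and an enqueue counter. */
struct xprt {
	atomic_uint flags;
	unsigned int enqueued;
};

/* Stand-in for handing the transport to a worker thread. */
static void xprt_enqueue(struct xprt *x)
{
	x->enqueued++;
}

/* Request a close; returns true if this call was the one that queued it. */
bool xprt_deferred_close(struct xprt *x)
{
	unsigned int old = atomic_fetch_or(&x->flags, XPT_CLOSE_BIT);

	if (old & XPT_CLOSE_BIT)
		return false;	/* already set: the close is being handled */
	xprt_enqueue(x);
	return true;
}
```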

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Maintain a Receive water mark
Chuck Lever [Thu, 11 Mar 2021 23:32:30 +0000 (18:32 -0500)]
svcrdma: Maintain a Receive water mark

Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.
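
A rough sketch of water-mark refilling, under assumptions not stated in this message: the field names, the batch size, and the convention that a batch of 0 means "batching disabled" are all hypothetical.

```c
#include <assert.h>

/* Hypothetical receive-queue bookkeeping; field names are illustrative. */
struct recv_queue {
	unsigned int pending;     /* Receives currently posted */
	unsigned int watermark;   /* refill when pending drops below this */
	unsigned int batch;       /* Receives per refill; 0 disables batching */
	unsigned int depth;       /* queue capacity */
};

/* Called when one Receive completes. Returns how many new ones to post. */
unsigned int recvs_to_post(struct recv_queue *q)
{
	unsigned int n;

	assert(q->pending > 0);
	q->pending--;

	if (q->batch == 0)
		n = 1;                     /* no batching: replace the one consumed */
	else if (q->pending < q->watermark)
		n = q->batch;              /* below the mark: top up in one batch */
	else
		n = 0;                     /* enough posted: nothing to do */

	if (n > q->depth - q->pending)
		n = q->depth - q->pending; /* never exceed the queue capacity */
	q->pending += n;
	return n;
}
```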

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Use svc_rdma_refresh_recvs() in wc_receive
Chuck Lever [Thu, 11 Mar 2021 21:15:22 +0000 (16:15 -0500)]
svcrdma: Use svc_rdma_refresh_recvs() in wc_receive

Replace svc_rdma_post_recv() with the new batch receive mechanism.
For the moment it is posting just a single Receive WR at a time,
so no change in behavior is expected.

Since svc_rdma_wc_receive() was the last call site for
svc_rdma_post_recv(), it is removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Add a batch Receive posting mechanism
Chuck Lever [Thu, 11 Mar 2021 18:54:34 +0000 (13:54 -0500)]
svcrdma: Add a batch Receive posting mechanism

Introduce a server-side mechanism similar to commit e340c2d6ef2a
("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive
WRs in batch. Its first consumer is svc_rdma_post_recvs(), which
posts the initial set of Receive WRs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Remove stale comment for svc_rdma_wc_receive()
Chuck Lever [Thu, 11 Mar 2021 18:49:25 +0000 (13:49 -0500)]
svcrdma: Remove stale comment for svc_rdma_wc_receive()

xprt pinning was removed in commit 365e9992b90f ("svcrdma: Remove
transport reference counting"), but this comment was not updated
to reflect that change.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: Provide an explanatory comment in CMA event handler
Chuck Lever [Mon, 1 Mar 2021 18:34:38 +0000 (13:34 -0500)]
svcrdma: Provide an explanatory comment in CMA event handler

Clean up: explain why svc_xprt_enqueue() is invoked in the event
handler even though no xpt_flags bits are toggled here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago svcrdma: RPCDBG_FACILITY is no longer used
Chuck Lever [Sun, 21 Feb 2021 00:11:55 +0000 (19:11 -0500)]
svcrdma: RPCDBG_FACILITY is no longer used

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: report client confirmation status in "info" file
NeilBrown [Fri, 19 Mar 2021 22:38:04 +0000 (09:38 +1100)]
nfsd: report client confirmation status in "info" file

mountd can now monitor clients appearing and disappearing in
/proc/fs/nfsd/clients, and will log these events, in lieu of the logging
of mount/unmount events for NFSv3.

Currently it cannot distinguish between unconfirmed clients (which might
be transient and totally uninteresting) and confirmed clients.

So add a "status: " line which reports either "confirmed" or
"unconfirmed", and use fsnotify to report that the info file
has been modified.

This requires a bit of infrastructure to keep the dentry for the "info"
file.  There is no need to take a counted reference as the dentry must
remain around until the client is removed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: don't ignore high bits of copy count
J. Bruce Fields [Fri, 19 Mar 2021 00:03:22 +0000 (20:03 -0400)]
nfsd: don't ignore high bits of copy count

Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined
by the protocol to be 64 bit, so we could be turning a large copy into a
0-length copy here.
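
The truncation can be reproduced on any host by emulating a 32-bit size_t. The typedef and both helper functions below are hypothetical and exist only to illustrate the effect; they are not the nfsd code.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates a 32-bit size_t so the effect is visible on a 64-bit host too. */
typedef uint32_t size32_t;

/* Buggy: narrowing first drops the high 32 bits of the count. */
size32_t chunk_len_truncating(uint64_t count, size32_t max_chunk)
{
	size32_t n = (size32_t)count;

	return n < max_chunk ? n : max_chunk;
}

/* Fixed: clamp in 64 bits first, then narrow. */
size32_t chunk_len_clamped(uint64_t count, size32_t max_chunk)
{
	uint64_t n = count < max_chunk ? count : max_chunk;

	return (size32_t)n;
}
```

With a count of exactly 2^32, the first helper yields 0 while the second yields max_chunk.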

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: COPY with length 0 should copy to end of file
J. Bruce Fields [Fri, 19 Mar 2021 00:03:23 +0000 (20:03 -0400)]
nfsd: COPY with length 0 should copy to end of file

From https://tools.ietf.org/html/rfc7862#page-65:

A count of 0 (zero) requests that all bytes from ca_src_offset
through EOF be copied to the destination.
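
One way to express that rule, as a sketch with a hypothetical helper name and parameters (it does not validate non-zero counts against the file size):

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes a COPY should move. Hypothetical helper. */
uint64_t effective_copy_len(uint64_t count, uint64_t src_offset, uint64_t src_size)
{
	/* Bytes between the source offset and EOF (0 if at or past EOF). */
	uint64_t to_eof = src_offset < src_size ? src_size - src_offset : 0;

	if (count == 0)
		return to_eof;   /* a count of 0 means "through EOF" */
	return count;            /* otherwise honor the requested count */
}
```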

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: Fix typo "accesible"
Ricardo Ribalda [Thu, 18 Mar 2021 20:22:21 +0000 (21:22 +0100)]
nfsd: Fix typo "accesible"

Trivial fix.

Cc: linux-nfs@vger.kernel.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted
Trond Myklebust [Sat, 13 Mar 2021 21:08:47 +0000 (16:08 -0500)]
nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted

In order to ensure that knfsd threads don't linger once the nfsd
pseudofs is unmounted (e.g. when the container is killed) we let
nfsd_umount() shut down those threads and wait for them to exit.

This also should ensure that we don't need to do a kernel mount of
the pseudofs, since the thread lifetime is now limited by the
lifetime of the filesystem.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: Log client tracking type log message as info instead of warning
Paul Menzel [Fri, 12 Mar 2021 21:03:00 +0000 (22:03 +0100)]
nfsd: Log client tracking type log message as info instead of warning

`printk()`, by default, uses the log level warning, which leaves the
user reading

    NFSD: Using UMH upcall client tracking operations.

wondering what to do about it (`dmesg --level=warn`).

Several client tracking methods are tried, and expected to fail. That’s
why a message is printed only on success. It might be interesting for
users to know the chosen method, so log it at info level rather than
demoting it to debug level.

Cc: linux-nfs@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago nfsd: helper for laundromat expiry calculations
J. Bruce Fields [Tue, 2 Mar 2021 15:46:23 +0000 (10:46 -0500)]
nfsd: helper for laundromat expiry calculations

We do this same logic repeatedly, and it's easy to get the sense of the
comparison wrong.
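
A helper of this kind might look roughly like the following. The struct and function names are hypothetical; the sketch only shows how one function can own both the expiry comparison and the "when should the next pass run" bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one cleanup pass. */
struct expiry_pass {
	int64_t cutoff;       /* anything last refreshed before this has expired */
	int64_t next_wakeup;  /* shortest remaining lifetime seen so far (seconds) */
};

/* True if the item has expired; otherwise records how soon it will. */
bool item_expired(struct expiry_pass *p, int64_t last_refresh)
{
	int64_t remaining;

	if (last_refresh < p->cutoff)
		return true;
	remaining = last_refresh - p->cutoff;
	assert(remaining >= 0);
	if (remaining < p->next_wakeup)
		p->next_wakeup = remaining;
	return false;
}
```

Callers then write a single call per item instead of repeating the comparison, so its direction is decided in one place.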

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Clean up NFSDDBG_FACILITY macro
Chuck Lever [Fri, 5 Mar 2021 19:22:32 +0000 (14:22 -0500)]
NFSD: Clean up NFSDDBG_FACILITY macro

These are no longer needed because there are no dprintk() call sites
in these files.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Add a tracepoint to record directory entry encoding
Chuck Lever [Fri, 5 Mar 2021 18:57:40 +0000 (13:57 -0500)]
NFSD: Add a tracepoint to record directory entry encoding

Enable watching the progress of directory encoding to capture the
timing of any issues with reading or encoding a directory. The
new tracepoint captures dirent encoding for all NFS versions.

For example, here's what a few NFSv4 directory entries might look
like:

nfsd-989   [002]   468.596265: nfsd_dirent:          fh_hash=0x5d162594 ino=2 name=.
nfsd-989   [002]   468.596267: nfsd_dirent:          fh_hash=0x5d162594 ino=1 name=..
nfsd-989   [002]   468.596299: nfsd_dirent:          fh_hash=0x5d162594 ino=3827 name=zlib.c
nfsd-989   [002]   468.596325: nfsd_dirent:          fh_hash=0x5d162594 ino=3811 name=xdiff
nfsd-989   [002]   468.596351: nfsd_dirent:          fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h
nfsd-989   [002]   468.596377: nfsd_dirent:          fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Clean up after updating NFSv3 ACL encoders
Chuck Lever [Sun, 15 Nov 2020 20:09:16 +0000 (15:09 -0500)]
NFSD: Clean up after updating NFSv3 ACL encoders

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 21:21:24 +0000 (16:21 -0500)]
NFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 21:11:42 +0000 (16:11 -0500)]
NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Clean up after updating NFSv2 ACL encoders
Chuck Lever [Sun, 15 Nov 2020 19:31:42 +0000 (14:31 -0500)]
NFSD: Clean up after updating NFSv2 ACL encoders

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:52:09 +0000 (14:52 -0500)]
NFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:49:57 +0000 (14:49 -0500)]
NFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:47:56 +0000 (14:47 -0500)]
NFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream

The SETACL result encoder is exactly the same as the NFSv2
attrstatres encoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:38:47 +0000 (14:38 -0500)]
NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs
Chuck Lever [Wed, 18 Nov 2020 19:55:05 +0000 (14:55 -0500)]
NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Remove unused NFSv2 directory entry encoders
Chuck Lever [Sun, 15 Nov 2020 19:30:13 +0000 (14:30 -0500)]
NFSD: Remove unused NFSv2 directory entry encoders

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream
Chuck Lever [Sat, 14 Nov 2020 18:45:35 +0000 (13:45 -0500)]
NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 20:49:01 +0000 (16:49 -0400)]
NFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Count bytes instead of pages in the NFSv2 READDIR encoder
Chuck Lever [Fri, 13 Nov 2020 21:57:44 +0000 (16:57 -0500)]
NFSD: Count bytes instead of pages in the NFSv2 READDIR encoder

Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.
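
As a sketch of per-entry byte accounting: the entry layout assumed below (4-byte fileid, 4-byte name length, name padded to a 4-byte boundary, 4-byte cookie) and all names are illustrative assumptions, not the encoder's actual format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define XDR_UNIT 4u

/* Hypothetical running total for a directory listing being encoded. */
struct dirlist {
	size_t used;     /* bytes consumed so far */
	size_t limit;    /* byte budget requested by the client */
};

/* Encoded size of one entry under the assumed layout described above. */
size_t entry_size(const char *name)
{
	size_t len = strlen(name);
	size_t padded = (len + XDR_UNIT - 1) & ~(size_t)(XDR_UNIT - 1);

	return 3 * XDR_UNIT + padded;
}

/* Account for one entry; returns false and changes nothing if it won't fit. */
bool dirlist_add(struct dirlist *d, const char *name)
{
	size_t need = entry_size(name);

	assert(d->used <= d->limit);
	if (need > d->limit - d->used)
		return false;
	d->used += need;
	return true;
}
```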

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Add a helper that encodes NFSv3 directory offset cookies
Chuck Lever [Fri, 13 Nov 2020 21:53:17 +0000 (16:53 -0500)]
NFSD: Add a helper that encodes NFSv3 directory offset cookies

Refactor: Add helper function similar to nfs3svc_encode_cookie3().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 23:01:38 +0000 (19:01 -0400)]
NFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 READ result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 20:40:11 +0000 (16:40 -0400)]
NFSD: Update the NFSv2 READ result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 19:41:09 +0000 (15:41 -0400)]
NFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 diropres encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 20:44:16 +0000 (16:44 -0400)]
NFSD: Update the NFSv2 diropres encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 19:28:59 +0000 (15:28 -0400)]
NFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv2 stat encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 15:08:02 +0000 (11:08 -0400)]
NFSD: Update the NFSv2 stat encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Reduce svc_rqst::rq_pages churn during READDIR operations
Chuck Lever [Fri, 15 Jan 2021 14:28:44 +0000 (09:28 -0500)]
NFSD: Reduce svc_rqst::rq_pages churn during READDIR operations

During NFSv2 and NFSv3 READDIR/PLUS operations, NFSD advances
rq_next_page to the full size of the client-requested buffer, then
releases all those pages at the end of the request. The next request
to use that nfsd thread has to refill the pages.

NFSD does this even when the dirlist in the reply is small. With
NFSv3 clients that send READDIR operations with large buffer sizes,
that can be 256 put_page/alloc_page pairs per READDIR request, even
though those pages often remain unused.

We can save some work by not releasing dirlist buffer pages that
were not used to form the READDIR Reply. I've left the NFSv2 code
alone since there are never more than three pages involved in an
NFSv2 READDIR Reply.

Eventually we should nail down why these pages need to be released
at all in order to avoid allocating and releasing pages
unnecessarily.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Remove unused NFSv3 directory entry encoders
Chuck Lever [Fri, 13 Nov 2020 16:27:13 +0000 (11:27 -0500)]
NFSD: Remove unused NFSv3 directory entry encoders

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 23:46:58 +0000 (19:46 -0400)]
NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream

The benefit of the xdr_stream helpers is that they transparently
handle encoding an XDR data item that crosses page boundaries.
Most of the open-coded logic to do that here can be eliminated.
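
To illustrate what "transparently handling page crossings" means, here is a toy stream over an array of separately allocated buffers. It is a hypothetical sketch with a deliberately tiny page size, and is not the kernel's struct xdr_stream API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 8u   /* deliberately tiny so items cross boundaries */

/* Hypothetical stream over an array of separately allocated pages. */
struct page_stream {
	uint8_t **pages;
	size_t npages;
	size_t pos;      /* absolute byte offset of the next write */
};

/* Append 'len' bytes, continuing on the next page when one fills up. */
bool stream_put(struct page_stream *s, const void *data, size_t len)
{
	const uint8_t *src = data;

	assert(s->pos <= s->npages * PAGE_BYTES);
	if (len > s->npages * PAGE_BYTES - s->pos)
		return false;
	for (size_t i = 0; i < len; i++, s->pos++)
		s->pages[s->pos / PAGE_BYTES][s->pos % PAGE_BYTES] = src[i];
	return true;
}

/* Append a 64-bit value in big-endian (XDR) byte order. */
bool stream_put_u64(struct page_stream *s, uint64_t v)
{
	uint8_t be[8];

	for (int i = 0; i < 8; i++)
		be[i] = (uint8_t)(v >> (56 - 8 * i));
	return stream_put(s, be, sizeof(be));
}
```

The caller of stream_put_u64() never checks where a page ends; that is the bookkeeping the open-coded encoders previously had to do by hand.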

A sub-buffer and sub-stream are set up as a sink buffer for the
directory entry encoder. As an entry is encoded, it is added to
the end of the content in this buffer/stream. The total length of
the directory list is tracked in the buffer's @len field.

When it comes time to encode the Reply, the sub-buffer is merged
into rq_res's page array at the correct place using
xdr_write_pages().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 23:31:48 +0000 (19:31 -0400)]
NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder
Chuck Lever [Mon, 9 Nov 2020 18:13:21 +0000 (13:13 -0500)]
NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder

Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Add a helper that encodes NFSv3 directory offset cookies
Chuck Lever [Tue, 10 Nov 2020 14:57:14 +0000 (09:57 -0500)]
NFSD: Add a helper that encodes NFSv3 directory offset cookies

Refactor: De-duplicate identical code that handles encoding of
directory offset cookies across page boundaries.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years ago NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:35:46 +0000 (15:35 -0400)]
NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream

As an additional clean up, encode_wcc_data() is removed because it
is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream
Chuck Lever [Fri, 6 Nov 2020 18:15:09 +0000 (13:15 -0500)]
NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 17:42:13 +0000 (13:42 -0400)]
NFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream
Chuck Lever [Fri, 6 Nov 2020 18:08:45 +0000 (13:08 -0500)]
NFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:08:29 +0000 (15:08 -0400)]
NFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:33:05 +0000 (15:33 -0400)]
NFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:27:23 +0000 (15:27 -0400)]
NFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:26:31 +0000 (15:26 -0400)]
NFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 READ3res encode to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:23:50 +0000 (15:23 -0400)]
NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:18:40 +0000 (15:18 -0400)]
NFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:12:38 +0000 (15:12 -0400)]
NFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 18:46:58 +0000 (14:46 -0400)]
NFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream

Also, clean up: Rename the encoder function to match the name of
the result structure in RFC 1813, consistent with other encoder
function names in nfs3xdr.c. "diropres" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 17:56:58 +0000 (13:56 -0400)]
NFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the GETATTR3res encoder to use struct xdr_stream
Chuck Lever [Wed, 21 Oct 2020 15:58:41 +0000 (11:58 -0400)]
NFSD: Update the GETATTR3res encoder to use struct xdr_stream

As an additional clean up, some renaming is done to more closely
reflect the data type and variable names used in the NFSv3 XDR
definition provided in RFC 1813. "attrstat" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Extract the svcxdr_init_encode() helper
Chuck Lever [Tue, 27 Oct 2020 19:53:42 +0000 (15:53 -0400)]
NFSD: Extract the svcxdr_init_encode() helper

NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode(), which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
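The difference between the two initialization styles can be pictured
with a small userspace model. Everything here (demo_buf, demo_enc and
the demo_* functions) is a hypothetical stand-in; the real xdr_buf and
xdr_stream are considerably more elaborate.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat reply buffer. */
struct demo_buf {
	unsigned char data[64];
	size_t len;		/* bytes already present */
};

struct demo_enc {
	struct demo_buf *buf;
	unsigned char *p;	/* next byte to write */
};

/* Empty-buffer init: assumes no prior content and starts at offset 0. */
void demo_init_encode_empty(struct demo_enc *e, struct demo_buf *b)
{
	b->len = 0;
	e->buf = b;
	e->p = b->data;
}

/* Server-style init: keeps what is already there and appends after it. */
void demo_init_encode_after_header(struct demo_enc *e, struct demo_buf *b)
{
	e->buf = b;
	e->p = b->data + b->len;
}

/* Append n bytes; returns 0 on success, -1 if out of room. */
int demo_put(struct demo_enc *e, const void *src, size_t n)
{
	size_t used = (size_t)(e->p - e->buf->data);

	if (n > sizeof(e->buf->data) - used)
		return -1;
	memcpy(e->p, src, n);
	e->p += n;
	e->buf->len = used + n;
	return 0;
}
```

With a four-byte header already in the buffer, the server-style init
appends the body after it, while the empty-buffer init would overwrite
the header.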
3 years agoLinux 5.12-rc4
Linus Torvalds [Sun, 21 Mar 2021 21:56:43 +0000 (14:56 -0700)]
Linux 5.12-rc4

3 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 21:06:10 +0000 (14:06 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes for v5.12"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: initialize ret to suppress smatch warning
  ext4: stop inode update before return
  ext4: fix rename whiteout with fast commit
  ext4: fix timer use-after-free on failed mount
  ext4: fix potential error in ext4_do_update_inode
  ext4: do not try to set xattr into ea_inode if value is empty
  ext4: do not iput inode under running transaction in ext4_rename()
  ext4: find old entry again if failed to rename whiteout
  ext4: fix error handling in ext4_end_enable_verity()
  ext4: fix bh ref count on error paths
  fs/ext4: fix integer overflow in s_log_groups_per_flex
  ext4: add reclaim checks to xattr code
  ext4: shrink race window in ext4_should_retry_alloc()

3 years agoMerge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 21 Mar 2021 19:25:54 +0000 (12:25 -0700)]
Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block

Pull io_uring followup fixes from Jens Axboe:

 - The SIGSTOP change from Eric, so we properly ignore that for
   PF_IO_WORKER threads.

 - Disallow sending signals to PF_IO_WORKER threads in general, we're
   not interested in having them funnel back to the io_uring owning
   task.

 - Stable fix from Stefan, ensuring we properly break links for short
   send/sendmsg recv/recvmsg if MSG_WAITALL is set.

 - Catch and loop when needing to run task_work before a PF_IO_WORKER
   thread goes to sleep.

* tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block:
  io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  io-wq: ensure task is running before processing task_work
  signal: don't allow STOP on PF_IO_WORKER threads
  signal: don't allow sending any signals to PF_IO_WORKER threads

3 years agoMerge tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 21 Mar 2021 18:54:04 +0000 (11:54 -0700)]
Merge tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver fixes from Greg KH:
 "Some small staging and IIO driver fixes:

   - MAINTAINERS changes for the move of the staging mailing list

   - comedi driver fixes to get request_irq() to work correctly

   - counter driver fixes for reported issues with iio devices

   - tiny iio driver fixes for reported issues.

  All of these have been in linux-next with no reported problems"

* tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vt665x: fix alignment constraints
  staging: comedi: cb_pcidas64: fix request_irq() warn
  staging: comedi: cb_pcidas: fix request_irq() warn
  MAINTAINERS: move the staging subsystem to lists.linux.dev
  MAINTAINERS: move some real subsystems off of the staging mailing list
  iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler
  iio: hid-sensor-temperature: Fix issues of timestamp channel
  iio: hid-sensor-humidity: Fix alignment issue of timestamp channel
  counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register
  counter: stm32-timer-cnt: fix ceiling write max value
  counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED
  iio: adc: ab8500-gpadc: Fix off by 10 to 3
  iio:adc:stm32-adc: Add HAS_IOMEM dependency
  iio: adis16400: Fix an error code in adis16400_initial_setup()
  iio: adc: adi-axi-adc: add proper Kconfig dependencies
  iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask
  iio: hid-sensor-prox: Fix scale not correct issue
  iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel

3 years agoMerge tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 21 Mar 2021 18:49:16 +0000 (11:49 -0700)]
Merge tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB and Thunderbolt driver fixes from Greg KH:
 "Here are some small Thunderbolt and USB driver fixes for some reported
  issues:

   - thunderbolt fixes for minor problems

   - typec fixes for power issues

   - usb-storage quirk addition

   - usbip bugfix

   - dwc3 bugfix when stopping transfers

   - cdnsp bugfix for isoc transfers

   - gadget use-after-free fix

  All have been in linux-next this week with no reported issues"

* tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tcpm: Skip sink_cap query only when VDM sm is busy
  usb: dwc3: gadget: Prevent EP queuing while stopping transfers
  usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy-
  usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct
  usb-storage: Add quirk to defeat Kindle's automatic unload
  usb: gadget: configfs: Fix KASAN use-after-free
  usbip: Fix incorrect double assignment to udc->ud.tcp_rx
  usb: cdnsp: Fixes incorrect value in ISOC TRB
  thunderbolt: Increase runtime PM reference count on DP tunnel discovery
  thunderbolt: Initialize HopID IDAs in tb_switch_alloc()

3 years agoMerge tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 18:34:24 +0000 (11:34 -0700)]
Merge tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "A change to robustify force-threaded IRQ handlers to always disable
  interrupts, plus a DocBook fix.

  The force-threaded IRQ handler change has been accelerated from the
  normal schedule of such a change to keep the bad pattern/workaround of
  spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from
  spreading"

* tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Disable interrupts for force threaded handlers
  genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)

3 years agoMerge tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 18:26:21 +0000 (11:26 -0700)]
Merge tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Boundary condition fixes for bugs unearthed by the perf fuzzer"

* tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT
  perf/x86/intel: Fix a crash caused by zero PEBS status

3 years agoMerge tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Mar 2021 18:19:29 +0000 (11:19 -0700)]
Merge tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:

 - Get static calls & modules right. Hopefully.

 - WW mutex fixes

* tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call: Fix static_call_update() sanity check
  static_call: Align static_call_is_init() patching condition
  static_call: Fix static_call_set_init()
  locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
  locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling

3 years agoMerge tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 18:11:22 +0000 (11:11 -0700)]
Merge tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:

 - another missing RT_PROP table related fix, to ensure that the
   efivarfs pseudo filesystem fails gracefully if variable services
   are unsupported

 - use the correct alignment for literal EFI GUIDs

 - fix a use after unmap issue in the memreserve code

* tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: use 32-bit alignment for efi_guid_t literals
  firmware/efi: Fix a use after bug in efi_mem_reserve_persistent
  efivars: respect EFI_UNSUPPORTED return from firmware

3 years agoMerge tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Mar 2021 18:04:20 +0000 (11:04 -0700)]
Merge tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "The freshest pile of shiny x86 fixes for 5.12:

   - Add the arch-specific mapping between physical and logical CPUs to
     fix devicetree-node lookups

   - Restore the IRQ2 ignore logic

   - Fix get_nr_restart_syscall() to return the correct restart syscall
     number. Split into a four-patch set to avoid kABI breakage when
     backporting to dead kernels"

* tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/of: Fix CPU devicetree-node lookups
  x86/ioapic: Ignore IRQ2 again
  x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
  x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
  x86: Move TS_COMPAT back to asm/thread_info.h
  kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()

3 years agoMerge tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 21 Mar 2021 17:57:35 +0000 (10:57 -0700)]
Merge tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a possible stack corruption and subsequent DLPAR failure in the
   rpadlpar_io PCI hotplug driver

 - Two build fixes for uncommon configurations

Thanks to Christophe Leroy and Tyrel Datwyler.

* tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  PCI: rpadlpar: Fix potential drc_name corruption in store functions
  powerpc: Force inlining of cpu_has_feature() to avoid build failure
  powerpc/vdso32: Add missing _restgpr_31_x to fix build failure

3 years agoio_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
Stefan Metzmacher [Sat, 20 Mar 2021 19:33:36 +0000 (20:33 +0100)]
io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL

Without that it's not safe to use them in a linked combination with
others.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

There might be applications around that depend on the behavior
that even short send[msg]()/recv[msg]() returns continue an
IOSQE_IO_LINK chain.

It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.

It's expected that the low-level sock_sendmsg() call just ignores
MSG_WAITALL, just as MSG_ZEROCOPY is ignored unless SO_ZEROCOPY is
explicitly set.

We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.

cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
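The rule described in this commit can be sketched as a plain predicate.
This is a userspace model, not the io_uring source:
demo_should_fail_links() and its parameters are hypothetical, and it
assumes the MSG_TRUNC/MSG_CTRUNC check applies only when MSG_WAITALL
was requested.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Decide whether a completed send/recv should break an IOSQE_IO_LINK
 * chain.
 *   ret:       bytes transferred, or a negative error
 *   len:       bytes requested
 *   flags:     flags passed in by the caller
 *   msg_flags: flags reported back by recvmsg (0 for the other opcodes)
 */
int demo_should_fail_links(ssize_t ret, size_t len, int flags, int msg_flags)
{
	/* Without MSG_WAITALL, any non-negative result counts as complete. */
	ssize_t min_ret = (flags & MSG_WAITALL) ? (ssize_t)len : 0;

	if (ret < min_ret)
		return 1;	/* error, or short transfer under MSG_WAITALL */

	/* Assumption: truncation only matters when MSG_WAITALL was set. */
	if ((flags & MSG_WAITALL) && (msg_flags & (MSG_TRUNC | MSG_CTRUNC)))
		return 1;

	return 0;
}
```

A 50-byte result for a 100-byte request breaks the chain only when
MSG_WAITALL is set, which is why applications that never pass that flag
see no change in behavior.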
3 years agoio-wq: ensure task is running before processing task_work
Jens Axboe [Sun, 21 Mar 2021 13:06:56 +0000 (07:06 -0600)]
io-wq: ensure task is running before processing task_work

Mark the current task as running if we need to run task_work from the
io-wq threads as part of work handling. If that is the case, then return
as such so that the caller can appropriately loop back and reset if it
was part of a going-to-sleep flush.

Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agosignal: don't allow STOP on PF_IO_WORKER threads
Eric W. Biederman [Sun, 21 Mar 2021 15:37:48 +0000 (09:37 -0600)]
signal: don't allow STOP on PF_IO_WORKER threads

Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.

Longer term, we may want to look into allowing stop of these threads,
as it relates to e.g. process freezing. For now, this prevents a spin
issue if a SIGSTOP is delivered to the parent task.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
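One way to picture the effect is a loop over a thread group that skips
IO workers when applying a stop. This is a minimal userspace model and
not the kernel's signal code: demo_task, demo_group_stop() and the
DEMO_PF_IO_WORKER bit value are all assumptions for illustration.

```c
#include <stddef.h>

#define DEMO_PF_IO_WORKER 0x10u	/* illustrative flag bit */

/* Hypothetical stand-in for a thread in a thread group. */
struct demo_task {
	unsigned int flags;
	int stopped;
};

/*
 * Apply a group stop to every thread except IO workers.
 * Returns the number of threads that were stopped.
 */
size_t demo_group_stop(struct demo_task *threads, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (threads[i].flags & DEMO_PF_IO_WORKER)
			continue;	/* IO workers cannot flush out a stop */
		threads[i].stopped = 1;
		count++;
	}
	return count;
}
```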
3 years agosignal: don't allow sending any signals to PF_IO_WORKER threads
Jens Axboe [Sat, 20 Mar 2021 01:25:13 +0000 (19:25 -0600)]
signal: don't allow sending any signals to PF_IO_WORKER threads

They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread. Linux does allow this kind of behavior for regular threads, but
it's really a compatibility thing that we need not care about for the IO
threads.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoext4: initialize ret to suppress smatch warning
Theodore Ts'o [Sun, 21 Mar 2021 04:45:37 +0000 (00:45 -0400)]
ext4: initialize ret to suppress smatch warning

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: stop inode update before return
Pan Bian [Sun, 17 Jan 2021 08:57:32 +0000 (00:57 -0800)]
ext4: stop inode update before return

The inode update should be stopped before returning the error code.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210117085732.93788-1-bianpan2016@163.com
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix rename whiteout with fast commit
Harshad Shirwadkar [Tue, 16 Mar 2021 22:19:21 +0000 (15:19 -0700)]
ext4: fix rename whiteout with fast commit

This patch adds rename whiteout support in fast commits. Note that the
whiteout object that gets created is actually a char device, which
implies that the function ext4_inode_journal_mode(struct inode *inode)
would return "JOURNAL_DATA" for this inode. As a consequence, the
fast commit code treats creation of the whiteout object as a
fast-commit ineligible operation and falls back to full commits.
With this patch, this can be observed by running fast commits
with rename whiteout and seeing the stats generated by ext4_fc_stats
tracepoint as follows:

ext4_fc_stats: dev 254:32 fc ineligible reasons:
XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0,
RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16;
num_commits:6, ineligible: 6, numblks: 3

So in short, this patch guarantees that in case of rename whiteout, we
fall back to full commits.

Amir mentioned that instead of creating a new whiteout object for
every rename, we can create a static whiteout object with irrelevant
nlink. That would keep fast commits from falling back to full
commits. But until this happens, this patch will ensure correctness by
falling back to full commits.

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210316221921.1124955-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix timer use-after-free on failed mount
Jan Kara [Mon, 15 Mar 2021 16:59:06 +0000 (17:59 +0100)]
ext4: fix timer use-after-free on failed mount

When a filesystem mount fails because of a corrupted filesystem, we first
cancel the s_err_report timer, which reminds about fs errors every day,
and only then flush s_error_work. However, s_error_work may report
another fs error and re-arm the timer, resulting in a timer
use-after-free. Fix the problem by first flushing the work and only
after that canceling the s_err_report timer.

Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com
Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
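The ordering problem in this commit can be reproduced in a toy model.
The structure and functions below are hypothetical stand-ins for the
superblock's s_err_report timer and s_error_work, written for userspace;
they only demonstrate why the order of the two teardown steps matters.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the timer and the deferred error work. */
struct demo_sb {
	bool timer_armed;	/* models the s_err_report timer */
	bool work_pending;	/* models queued s_error_work */
	bool freed;
};

/* Running the work may report another error, which re-arms the timer. */
void demo_flush_work(struct demo_sb *sb)
{
	if (sb->work_pending) {
		sb->work_pending = false;
		sb->timer_armed = true;
	}
}

void demo_cancel_timer(struct demo_sb *sb)
{
	sb->timer_armed = false;
}

/*
 * Old order: cancel, then flush.
 * Returns true if a timer is still armed once sb has been freed.
 */
bool demo_teardown_buggy(struct demo_sb *sb)
{
	demo_cancel_timer(sb);
	demo_flush_work(sb);	/* may re-arm the timer after the cancel */
	sb->freed = true;
	return sb->timer_armed;
}

/* Fixed order: flush first, so nothing can re-arm after the cancel. */
bool demo_teardown_fixed(struct demo_sb *sb)
{
	demo_flush_work(sb);
	demo_cancel_timer(sb);
	sb->freed = true;
	return sb->timer_armed;
}
```

Starting from a state with pending work, the old order leaves the timer
armed against freed memory, while the fixed order does not.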