platform/kernel/linux-starfive.git
2 years agoNFSD: use (un)lock_inode instead of fh_(un)lock for file operations
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: use (un)lock_inode instead of fh_(un)lock for file operations

When locking a file to access ACLs and xattrs etc, use explicit locking
with inode_lock() instead of fh_lock().  This means that the calls to
fh_fill_pre/post_attr() are also explicit which improves readability and
allows us to place them only where they are needed.  Only the xattr
calls need pre/post information.

When locking a file we don't need I_MUTEX_PARENT as the file is not a
parent of anything, so we can use inode_lock() directly rather than the
inode_lock_nested() call that fh_lock() uses.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: use explicit lock/unlock for directory ops
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: use explicit lock/unlock for directory ops

When creating or unlinking a name in a directory use explicit
inode_lock_nested() instead of fh_lock(), and explicit calls to
fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done
for renames, with lock_rename() as the explicit locking.

Also move the 'fill' calls closer to the operation that might change the
attributes.  This way they are avoided on some error paths.

For the v2-only code in nfsproc.c, the fill calls are not replaced as
they aren't needed.

Making the locking explicit will simplify proposed future changes to
locking for directories.  It also makes it easily visible exactly where
pre/post attributes are used - not all callers of fh_lock() actually
need the pre/post attributes.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: reduce locking in nfsd_lookup()
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: reduce locking in nfsd_lookup()

nfsd_lookup() takes an exclusive lock on the parent inode, but no
callers want the lock and it may not be needed at all if the
result is in the dcache.

Change nfsd_lookup_dentry() to not take the lock, and call
lookup_one_len_locked() which takes lock only if needed.

nfsd4_open() currently expects the lock to still be held, but that isn't
necessary as nfsd_validate_delegated_dentry() provides required
guarantees without the lock.

NOTE: NFSv4 requires directory changeinfo for OPEN even when a create
  wasn't requested and no change happened.  Now that nfsd_lookup()
  doesn't use fh_lock(), we need to explicitly fill the attributes
  when no create happens.  A new fh_fill_both_attrs() is provided
  for that task.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: only call fh_unlock() once in nfsd_link()
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: only call fh_unlock() once in nfsd_link()

On non-error paths, nfsd_link() calls fh_unlock() twice.  This is safe
because fh_unlock() records that the unlock has been done and doesn't
repeat it.
However it makes the code a little confusing and interferes with changes
that are planned for directory locking.

So rearrange the code to ensure fh_unlock() is called exactly once if
fh_lock() was called.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: always drop directory lock in nfsd_unlink()
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: always drop directory lock in nfsd_unlink()

Some error paths in nfsd_unlink() allow it to exit without unlocking the
directory.  This is not a problem in practice as the directory will be
locked with an fh_put(), but it is untidy and potentially confusing.

This allows us to remove all the fh_unlock() calls that are immediately
after nfsd_unlink() calls.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.

nfsd_create() usually returns with the directory still locked.
nfsd_symlink() usually returns with it unlocked.  This is clumsy.

Until recently nfsd_create() needed to keep the directory locked until
ACLs and security label had been set.  These are now set inside
nfsd_create() (in nfsd_setattr()) so this need is gone.

So change nfsd_create() and nfsd_symlink() to always unlock, and remove
any fh_unlock() calls that follow calls to these functions.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: add posix ACLs to struct nfsd_attrs
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: add posix ACLs to struct nfsd_attrs

pacl and dpacl pointers are added to struct nfsd_attrs, which requires
that we have an nfsd_attrs_free() function to free them.
Those nfsv4 functions that can set ACLs now set up these pointers
based on the passed in NFSv4 ACL.

nfsd_setattr() sets the acls as appropriate.

Errors are handled as with security labels.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: add security label to struct nfsd_attrs
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: add security label to struct nfsd_attrs

nfsd_setattr() now sets a security label if provided, and nfsv4 provides
it in the 'open' and 'create' paths and the 'setattr' path.
If setting the label failed (including because the kernel doesn't
support labels), an error field in 'struct nfsd_attrs' is set, and the
caller can respond.  The open/create callers clear
FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case.
The setattr caller returns the error.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: set attributes when creating symlinks
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: set attributes when creating symlinks

The NFS protocol includes attributes when creating symlinks.
Linux does store attributes for symlinks and allows them to be set,
though they are not used for permission checking.

NFSD currently doesn't set standard (struct iattr) attributes when
creating symlinks, but for NFSv4 it does set ACLs and security labels.
This is inconsistent.

To improve consistency, pass the provided attributes into nfsd_symlink()
and call nfsd_create_setattr() to set them.

NOTE: this results in a behaviour change for all NFS versions when the
client sends non-default attributes with a SYMLINK request. With the
Linux client, the only attributes are:
attr.ia_mode = S_IFLNK | S_IRWXUGO;
attr.ia_valid = ATTR_MODE;
so the final outcome will be unchanged. Other clients might sent
different attributes, and if they did they probably expect them to be
honoured.

We ignore any error from nfsd_create_setattr().  It isn't really clear
what should be done if a file is successfully created, but the
attributes cannot be set.  NFS doesn't allow partial success to be
reported.  Reporting failure is probably more misleading than reporting
success, so the status is ignored.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: introduce struct nfsd_attrs
NeilBrown [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: introduce struct nfsd_attrs

The attributes that nfsd might want to set on a file include 'struct
iattr' as well as an ACL and security label.
The latter two are passed around quite separately from the first, in
part because they are only needed for NFSv4.  This leads to some
clumsiness in the code, such as the attributes NOT being set in
nfsd_create_setattr().

We need to keep the directory locked until all attributes are set to
ensure the file is never visibile without all its attributes.  This need
combined with the inconsistent handling of attributes leads to more
clumsiness.

As a first step towards tidying this up, introduce 'struct nfsd_attrs'.
This is passed (by reference) to vfs.c functions that work with
attributes, and is assembled by the various nfs*proc functions which
call them.  As yet only iattr is included, but future patches will
expand this.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: verify the opened dentry after setting a delegation
Jeff Layton [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: verify the opened dentry after setting a delegation

Between opening a file and setting a delegation on it, someone could
rename or unlink the dentry. If this happens, we do not want to grant a
delegation on the open.

On a CLAIM_NULL open, we're opening by filename, and we may (in the
non-create case) or may not (in the create case) be holding i_rwsem
when attempting to set a delegation.  The latter case allows a
race.

After getting a lease, redo the lookup of the file being opened and
validate that the resulting dentry matches the one in the open file
description.

To properly redo the lookup we need an rqst pointer to pass to
nfsd_lookup_dentry(), so make sure that is available.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: drop fh argument from alloc_init_deleg
Jeff Layton [Tue, 26 Jul 2022 06:45:30 +0000 (16:45 +1000)]
NFSD: drop fh argument from alloc_init_deleg

Currently, we pass the fh of the opened file down through several
functions so that alloc_init_deleg can pass it to delegation_blocked.
The filehandle of the open file is available in the nfs4_file however,
so there's no need to pass it in a separate argument.

Drop the argument from alloc_init_deleg, nfs4_open_delegation and
nfs4_set_delegation.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Move copy offload callback arguments into a separate structure
Chuck Lever [Wed, 27 Jul 2022 18:41:18 +0000 (14:41 -0400)]
NFSD: Move copy offload callback arguments into a separate structure

Refactor so that CB_OFFLOAD arguments can be passed without
allocating a whole struct nfsd4_copy object. On my system (x86_64)
this removes another 96 bytes from struct nfsd4_copy.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Add nfsd4_send_cb_offload()
Chuck Lever [Wed, 27 Jul 2022 18:41:12 +0000 (14:41 -0400)]
NFSD: Add nfsd4_send_cb_offload()

Refactor for legibility.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Remove kmalloc from nfsd4_do_async_copy()
Chuck Lever [Wed, 27 Jul 2022 18:41:06 +0000 (14:41 -0400)]
NFSD: Remove kmalloc from nfsd4_do_async_copy()

Instead of manufacturing a phony struct nfsd_file, pass the
struct file returned by nfs42_ssc_open() directly to
nfsd4_do_copy().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Refactor nfsd4_do_copy()
Chuck Lever [Wed, 27 Jul 2022 18:40:59 +0000 (14:40 -0400)]
NFSD: Refactor nfsd4_do_copy()

Refactor: Now that nfsd4_do_copy() no longer calls the cleanup
helpers, plumb the use of struct file pointers all the way down to
_nfsd_copy_file_range().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
Chuck Lever [Wed, 27 Jul 2022 18:40:53 +0000 (14:40 -0400)]
NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)

Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A
subsequent patch will modify one of the new call sites to avoid
the need to manufacture the phony struct nfsd_file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
Chuck Lever [Wed, 27 Jul 2022 18:40:47 +0000 (14:40 -0400)]
NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)

The @src parameter is sometimes a pointer to a struct nfsd_file and
sometimes a pointer to struct file hiding in a phony struct
nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter
is always an explicit struct file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Replace boolean fields in struct nfsd4_copy
Chuck Lever [Wed, 27 Jul 2022 18:40:41 +0000 (14:40 -0400)]
NFSD: Replace boolean fields in struct nfsd4_copy

Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy()
with an atomic bitop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Make nfs4_put_copy() static
Chuck Lever [Wed, 27 Jul 2022 18:40:35 +0000 (14:40 -0400)]
NFSD: Make nfs4_put_copy() static

Clean up: All call sites are in fs/nfsd/nfs4proc.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Reorder the fields in struct nfsd4_op
Chuck Lever [Wed, 27 Jul 2022 18:40:28 +0000 (14:40 -0400)]
NFSD: Reorder the fields in struct nfsd4_op

Pack the fields to reduce the size of struct nfsd4_op, which is used
an array in struct nfsd4_compoundargs.

sizeof(struct nfsd4_op):
Before: /* size: 672, cachelines: 11, members: 5 */
After:  /* size: 640, cachelines: 10, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Shrink size of struct nfsd4_copy
Chuck Lever [Wed, 27 Jul 2022 18:40:22 +0000 (14:40 -0400)]
NFSD: Shrink size of struct nfsd4_copy

struct nfsd4_copy is part of struct nfsd4_op, which resides in an
8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 1696, cachelines: 27, members: 5 */
After:  /* size: 672, cachelines: 11, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Shrink size of struct nfsd4_copy_notify
Chuck Lever [Wed, 27 Jul 2022 18:40:16 +0000 (14:40 -0400)]
NFSD: Shrink size of struct nfsd4_copy_notify

struct nfsd4_copy_notify is part of struct nfsd4_op, which resides
in an 8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 2208, cachelines: 35, members: 5 */
After:  /* size: 1696, cachelines: 27, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: nfserrno(-ENOMEM) is nfserr_jukebox
Chuck Lever [Wed, 27 Jul 2022 18:40:09 +0000 (14:40 -0400)]
NFSD: nfserrno(-ENOMEM) is nfserr_jukebox

Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Fix strncpy() fortify warning
Chuck Lever [Wed, 27 Jul 2022 18:40:03 +0000 (14:40 -0400)]
NFSD: Fix strncpy() fortify warning

In function ‘strncpy’,
    inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3,
    inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11:
/home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation]
   52 | #define __underlying_strncpy    __builtin_strncpy
      |                                 ^
/home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’
   89 |         return __underlying_strncpy(p, q, size);
      |                ^~~~~~~~~~~~~~~~~~~~

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Clean up nfsd4_encode_readlink()
Chuck Lever [Fri, 22 Jul 2022 20:09:23 +0000 (16:09 -0400)]
NFSD: Clean up nfsd4_encode_readlink()

Similar changes to nfsd4_encode_readv(), all bundled into a single
patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Use xdr_pad_size()
Chuck Lever [Fri, 22 Jul 2022 20:09:16 +0000 (16:09 -0400)]
NFSD: Use xdr_pad_size()

Clean up: Use a helper instead of open-coding the calculation of
the XDR pad size.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Simplify starting_len
Chuck Lever [Fri, 22 Jul 2022 20:09:10 +0000 (16:09 -0400)]
NFSD: Simplify starting_len

Clean-up: Now that nfsd4_encode_readv() does not have to encode the
EOF or rd_length values, it no longer needs to subtract 8 from
@starting_len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Optimize nfsd4_encode_readv()
Chuck Lever [Fri, 22 Jul 2022 20:09:04 +0000 (16:09 -0400)]
NFSD: Optimize nfsd4_encode_readv()

write_bytes_to_xdr_buf() is pretty expensive to use for inserting
an XDR data item that is always 1 XDR_UNIT at an address that is
always XDR word-aligned.

Since both the readv and splice read paths encode EOF and maxcount
values, move both to a common code path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Add an nfsd4_read::rd_eof field
Chuck Lever [Fri, 22 Jul 2022 20:08:57 +0000 (16:08 -0400)]
NFSD: Add an nfsd4_read::rd_eof field

Refactor: Make the EOF result available in the entire NFSv4 READ
path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Clean up SPLICE_OK in nfsd4_encode_read()
Chuck Lever [Fri, 22 Jul 2022 20:08:51 +0000 (16:08 -0400)]
NFSD: Clean up SPLICE_OK in nfsd4_encode_read()

Do the test_bit() once -- this reduces the number of locked-bus
operations and makes the function a little easier to read.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Optimize nfsd4_encode_fattr()
Chuck Lever [Fri, 22 Jul 2022 20:08:45 +0000 (16:08 -0400)]
NFSD: Optimize nfsd4_encode_fattr()

write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.

However, it is costly. In nfsd4_encode_fattr(), it is unnecessary
because the data item is fixed in size and the buffer destination
address is always word-aligned.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Optimize nfsd4_encode_operation()
Chuck Lever [Fri, 22 Jul 2022 20:08:38 +0000 (16:08 -0400)]
NFSD: Optimize nfsd4_encode_operation()

write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.
However, it is costly, and here, it is unnecessary because the
data item is fixed in size, the buffer destination address is
always word-aligned, and the destination location is already in
@p.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agonfsd: silence extraneous printk on nfsd.ko insertion
Jeff Layton [Wed, 20 Jul 2022 12:39:23 +0000 (08:39 -0400)]
nfsd: silence extraneous printk on nfsd.ko insertion

This printk pops every time nfsd.ko gets plugged in. Most kmods don't do
that and this one is not very informative. Olaf's email address seems to
be defunct at this point anyway. Just drop it.

Cc: Olaf Kirch <okir@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: limit the number of v4 clients to 1024 per 1GB of system memory
Dai Ngo [Fri, 15 Jul 2022 23:54:53 +0000 (16:54 -0700)]
NFSD: limit the number of v4 clients to 1024 per 1GB of system memory

Currently there is no limit on how many v4 clients are supported
by the system. This can be a problem in systems with small memory
configuration to function properly when a very large number of
clients exist that creates memory shortage conditions.

This patch enforces a limit of 1024 NFSv4 clients, including courtesy
clients, per 1GB of system memory.  When the number of the clients
reaches the limit, requests that create new clients are returned
with NFS4ERR_DELAY and the laundromat is kicked start to trim old
clients. Due to the overhead of the upcall to remove the client
record, the maximun number of clients the laundromat removes on
each run is limited to 128. This is done to ensure the laundromat
can still process the other tasks in a timely manner.

Since there is now a limit of the number of clients, the 24-hr
idle time limit of courtesy client is no longer needed and was
removed.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: keep track of the number of v4 clients in the system
Dai Ngo [Fri, 15 Jul 2022 23:54:52 +0000 (16:54 -0700)]
NFSD: keep track of the number of v4 clients in the system

Add counter nfs4_client_count to keep track of the total number
of v4 clients, including courtesy clients, in the system.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: refactoring v4 specific code to a helper in nfs4state.c
Dai Ngo [Fri, 15 Jul 2022 23:54:51 +0000 (16:54 -0700)]
NFSD: refactoring v4 specific code to a helper in nfs4state.c

This patch moves the v4 specific code from nfsd_init_net() to
nfsd4_init_leases_net() helper in nfs4state.c

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Ensure nf_inode is never dereferenced
Chuck Lever [Fri, 8 Jul 2022 18:27:09 +0000 (14:27 -0400)]
NFSD: Ensure nf_inode is never dereferenced

The documenting comment for struct nf_file states:

/*
 * A representation of a file that has been opened by knfsd. These are hashed
 * in the hashtable by inode pointer value. Note that this object doesn't
 * hold a reference to the inode by itself, so the nf_inode pointer should
 * never be dereferenced, only used for comparison.
 */

Replace the two existing dereferences to make the comment always
true.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: NFSv4 CLOSE should release an nfsd_file immediately
Chuck Lever [Fri, 8 Jul 2022 18:27:02 +0000 (14:27 -0400)]
NFSD: NFSv4 CLOSE should release an nfsd_file immediately

The last close of a file should enable other accessors to open and
use that file immediately. Leaving the file open in the filecache
prevents other users from accessing that file until the filecache
garbage-collects the file -- sometimes that takes several seconds.

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Move nfsd_file_trace_alloc() tracepoint
Chuck Lever [Fri, 8 Jul 2022 18:26:49 +0000 (14:26 -0400)]
NFSD: Move nfsd_file_trace_alloc() tracepoint

Avoid recording the allocation of an nfsd_file item that is
immediately released because a matching item was already
inserted in the hash.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Separate tracepoints for acquire and create
Chuck Lever [Fri, 8 Jul 2022 18:26:43 +0000 (14:26 -0400)]
NFSD: Separate tracepoints for acquire and create

These tracepoints collect different information: the create case does
not open a file, so there's no nf_file available.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Clean up unused code after rhashtable conversion
Chuck Lever [Fri, 8 Jul 2022 18:26:36 +0000 (14:26 -0400)]
NFSD: Clean up unused code after rhashtable conversion

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Convert the filecache to use rhashtable
Chuck Lever [Fri, 8 Jul 2022 18:26:30 +0000 (14:26 -0400)]
NFSD: Convert the filecache to use rhashtable

Enable the filecache hash table to start small, then grow with the
workload. Smaller server deployments benefit because there should
be lower memory utilization. Larger server deployments should see
improved scaling with the number of open files.

Suggested-by: Jeff Layton <jlayton@kernel.org>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Set up an rhashtable for the filecache
Chuck Lever [Fri, 8 Jul 2022 18:26:23 +0000 (14:26 -0400)]
NFSD: Set up an rhashtable for the filecache

Add code to initialize and tear down an rhashtable. The rhashtable
is not used yet.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Replace the "init once" mechanism
Chuck Lever [Fri, 8 Jul 2022 18:26:16 +0000 (14:26 -0400)]
NFSD: Replace the "init once" mechanism

In a moment, the nfsd_file_hashtbl global will be replaced with an
rhashtable. Replace the one or two spots that need to check if the
hash table is available. We can easily reuse the SHUTDOWN flag for
this purpose.

Document that this mechanism relies on callers to hold the
nfsd_mutex to prevent init, shutdown, and purging to run
concurrently.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Remove nfsd_file::nf_hashval
Chuck Lever [Fri, 8 Jul 2022 18:26:10 +0000 (14:26 -0400)]
NFSD: Remove nfsd_file::nf_hashval

The value in this field can always be computed from nf_inode, thus
it is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: nfsd_file_hash_remove can compute hashval
Chuck Lever [Fri, 8 Jul 2022 18:26:03 +0000 (14:26 -0400)]
NFSD: nfsd_file_hash_remove can compute hashval

Remove an unnecessary use of nf_hashval.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Refactor __nfsd_file_close_inode()
Chuck Lever [Fri, 8 Jul 2022 18:25:57 +0000 (14:25 -0400)]
NFSD: Refactor __nfsd_file_close_inode()

The code that computes the hashval is the same in both callers.

To prevent them from going stale, reframe the documenting comments
to remove descriptions of the underlying hash table structure, which
is about to be replaced.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
Chuck Lever [Fri, 8 Jul 2022 18:25:50 +0000 (14:25 -0400)]
NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode

Remove an unnecessary usage of nf_hashval.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Remove lockdep assertion from unhash_and_release_locked()
Chuck Lever [Fri, 8 Jul 2022 18:25:44 +0000 (14:25 -0400)]
NFSD: Remove lockdep assertion from unhash_and_release_locked()

IIUC, holding the hash bucket lock is needed only in
nfsd_file_unhash, and there is already a lockdep assertion there.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: No longer record nf_hashval in the trace log
Chuck Lever [Fri, 8 Jul 2022 18:25:37 +0000 (14:25 -0400)]
NFSD: No longer record nf_hashval in the trace log

I'm about to replace nfsd_file_hashtbl with an rhashtable. The
individual hash values will no longer be visible or relevant, so
remove them from the tracepoints.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Never call nfsd_file_gc() in foreground paths
Chuck Lever [Fri, 8 Jul 2022 18:25:30 +0000 (14:25 -0400)]
NFSD: Never call nfsd_file_gc() in foreground paths

The checks in nfsd_file_acquire() and nfsd_file_put() that directly
invoke filecache garbage collection are intended to keep cache
occupancy between a low- and high-watermark. The reason to limit the
capacity of the filecache is to keep filecache lookups reasonably
fast.

However, invoking garbage collection at those points has some
undesirable negative impacts. Files that are held open by NFSv4
clients often push the occupancy of the filecache over these
watermarks. At that point:

- Every call to nfsd_file_acquire() and nfsd_file_put() results in
  an LRU walk. This has the same effect on lookup latency as long
  chains in the hash table.
- Garbage collection will then run on every nfsd thread, causing a
  lot of unnecessary lock contention.
- Limiting cache capacity pushes out files used only by NFSv3
  clients, which are the type of files the filecache is supposed to
  help.

To address those negative impacts, remove the direct calls to the
garbage collector. Subsequent patches will address maintaining
lookup efficiency as cache capacity increases.

Suggested-by: Wang Yugui <wangyugui@e16-tech.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Fix the filecache LRU shrinker
Chuck Lever [Fri, 8 Jul 2022 18:25:24 +0000 (14:25 -0400)]
NFSD: Fix the filecache LRU shrinker

Without LRU item rotation, the shrinker visits only a few items on
the end of the LRU list, and those would always be long-term OPEN
files for NFSv4 workloads. That makes the filecache shrinker
completely ineffective.

Adopt the same strategy as the inode LRU by using LRU_ROTATE.

Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Leave open files out of the filecache LRU
Chuck Lever [Fri, 8 Jul 2022 18:25:17 +0000 (14:25 -0400)]
NFSD: Leave open files out of the filecache LRU

There have been reports of problems when running fstests generic/531
against Linux NFS servers with NFSv4. The NFS server that hosts the
test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
Analysis shows that:

fs/nfsd/filecache.c
 482                 ret = list_lru_walk(&nfsd_file_lru,
 483                                 nfsd_file_lru_cb,
 484                                 &head, LONG_MAX);

causes nfsd_file_gc() to walk the entire length of the filecache LRU
list every time it is called (which is quite frequently). The walk
holds a spinlock the entire time that prevents other nfsd threads
from accessing the filecache.

What's more, for NFSv4 workloads, none of the items that are visited
during this walk may be evicted, since they are all files that are
held OPEN by NFS clients.

Address this by ensuring that open files are not kept on the LRU
list.

Reported-by: Frank van der Linden <fllinden@amazon.com>
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Trace filecache LRU activity
Chuck Lever [Fri, 8 Jul 2022 18:25:11 +0000 (14:25 -0400)]
NFSD: Trace filecache LRU activity

Observe the operation of garbage collection and the lifetime of
filecache items.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: WARN when freeing an item still linked via nf_lru
Chuck Lever [Fri, 8 Jul 2022 18:25:04 +0000 (14:25 -0400)]
NFSD: WARN when freeing an item still linked via nf_lru

Add a guardrail to prevent freeing memory that is still on a list.
This includes either a dispose list or the LRU list.

This is the sign of a bug, but this class of bugs can be detected
so that they don't endanger system stability, especially while
debugging.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Hook up the filecache stat file
Chuck Lever [Fri, 8 Jul 2022 18:24:58 +0000 (14:24 -0400)]
NFSD: Hook up the filecache stat file

There has always been the capability of exporting filecache metrics
via /proc, but it was never hooked up. Let's surface these metrics
to enable better observability of the filecache.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Zero counters when the filecache is re-initialized
Chuck Lever [Fri, 8 Jul 2022 18:24:51 +0000 (14:24 -0400)]
NFSD: Zero counters when the filecache is re-initialized

If nfsd_file_cache_init() is called after a shutdown, be sure the
stat counters are reset.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Record number of flush calls
Chuck Lever [Fri, 8 Jul 2022 18:24:45 +0000 (14:24 -0400)]
NFSD: Record number of flush calls

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Report the number of items evicted by the LRU walk
Chuck Lever [Fri, 8 Jul 2022 18:24:38 +0000 (14:24 -0400)]
NFSD: Report the number of items evicted by the LRU walk

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Refactor nfsd_file_lru_scan()
Chuck Lever [Fri, 8 Jul 2022 18:24:31 +0000 (14:24 -0400)]
NFSD: Refactor nfsd_file_lru_scan()

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Refactor nfsd_file_gc()
Chuck Lever [Fri, 8 Jul 2022 18:24:25 +0000 (14:24 -0400)]
NFSD: Refactor nfsd_file_gc()

Refactor nfsd_file_gc() to use the new list_lru helper.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Add nfsd_file_lru_dispose_list() helper
Chuck Lever [Fri, 8 Jul 2022 18:24:18 +0000 (14:24 -0400)]
NFSD: Add nfsd_file_lru_dispose_list() helper

Refactor the invariant part of nfsd_file_lru_walk_list() into a
separate helper function.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Report average age of filecache items
Chuck Lever [Fri, 8 Jul 2022 18:24:12 +0000 (14:24 -0400)]
NFSD: Report average age of filecache items

This is a measure of how long items stay in the filecache, to help
assess how efficient the cache is.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Report count of freed filecache items
Chuck Lever [Fri, 8 Jul 2022 18:24:05 +0000 (14:24 -0400)]
NFSD: Report count of freed filecache items

Surface the count of freed nfsd_file items.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Report count of calls to nfsd_file_acquire()
Chuck Lever [Fri, 8 Jul 2022 18:23:59 +0000 (14:23 -0400)]
NFSD: Report count of calls to nfsd_file_acquire()

Count the number of successful acquisitions that did not create a
file (ie, acquisitions that do not result in a compulsory cache
miss). This count can be compared directly with the reported hit
count to compute a hit ratio.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Report filecache LRU size
Chuck Lever [Fri, 8 Jul 2022 18:23:52 +0000 (14:23 -0400)]
NFSD: Report filecache LRU size

Surface the NFSD filecache's LRU list length to help field
troubleshooters monitor filecache issues.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Demote a WARN to a pr_warn()
Chuck Lever [Fri, 8 Jul 2022 18:23:45 +0000 (14:23 -0400)]
NFSD: Demote a WARN to a pr_warn()

The call trace doesn't add much value, but it sure is noisy.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoSUNRPC: Fix server-side fault injection documentation
Chuck Lever [Fri, 1 Jul 2022 14:37:15 +0000 (10:37 -0400)]
SUNRPC: Fix server-side fault injection documentation

Fixes: 37324e6bb120 ("SUNRPC: Cache deferral injection")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agonfsd: remove redundant assignment to variable len
Colin Ian King [Tue, 28 Jun 2022 21:25:25 +0000 (22:25 +0100)]
nfsd: remove redundant assignment to variable len

Variable len is being assigned a value zero and this is never
read, it is being re-assigned later. The assignment is redundant
and can be removed.

Cleans up clang scan-build warning:
fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Fix space and spelling mistake
Zhang Jiaming [Thu, 23 Jun 2022 08:20:05 +0000 (16:20 +0800)]
NFSD: Fix space and spelling mistake

Add a blank space after ','.
Change 'succesful' to 'successful'.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNFSD: Instrument fh_verify()
Chuck Lever [Tue, 21 Jun 2022 14:06:23 +0000 (10:06 -0400)]
NFSD: Instrument fh_verify()

Capture file handles and how they map to local inodes. In particular,
NFSv4 PUTFH uses fh_verify() so we can now observe which file handles
are the target of OPEN, LOOKUP, RENAME, and so on.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoSUNRPC: Expand the svc_alloc_arg_err tracepoint
Chuck Lever [Tue, 21 Jun 2022 14:06:16 +0000 (10:06 -0400)]
SUNRPC: Expand the svc_alloc_arg_err tracepoint

Record not only the number of pages requested, but the number of
pages that were actually allocated, to get a measure of progress
(or lack thereof).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoNLM: Defend against file_lock changes after vfs_test_lock()
Benjamin Coddington [Mon, 13 Jun 2022 13:40:06 +0000 (09:40 -0400)]
NLM: Defend against file_lock changes after vfs_test_lock()

Instead of trusting that struct file_lock returns completely unchanged
after vfs_test_lock() when there's no conflicting lock, stash away our
nlm_lockowner reference so we can properly release it for all cases.

This defends against another file_lock implementation overwriting fl_owner
when the return type is F_UNLCK.

Reported-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Tested-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoSUNRPC: Fix xdr_encode_bool()
Chuck Lever [Tue, 19 Jul 2022 13:18:35 +0000 (09:18 -0400)]
SUNRPC: Fix xdr_encode_bool()

I discovered that xdr_encode_bool() was returning the same address
that was passed in the @p parameter. The documenting comment states
that the intent is to return the address of the next buffer
location, just like the other "xdr_encode_*" helpers.

The result was the encoded results of NFSv3 PATHCONF operations were
not formed correctly.

Fixes: ded04a587f6c ("NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2 years agonfsd: eliminate the NFSD_FILE_BREAK_* flags
Jeff Layton [Fri, 29 Jul 2022 21:01:07 +0000 (17:01 -0400)]
nfsd: eliminate the NFSD_FILE_BREAK_* flags

We had a report from the spring Bake-a-thon of data corruption in some
nfstest_interop tests. Looking at the traces showed the NFS server
allowing a v3 WRITE to proceed while a read delegation was still
outstanding.

Currently, we only set NFSD_FILE_BREAK_* flags if
NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc.
NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for
COMMIT ops, where we need a writeable filehandle but don't need to
break read leases.

It doesn't make any sense to consult that flag when allocating a file
since the file may be used on subsequent calls where we do want to break
the lease (and the usage of it here seems to be reverse from what it
should be anyway).

Also, after calling nfsd_open_break_lease, we don't want to clear the
BREAK_* bits. A lease could end up being set on it later (more than
once) and we need to be able to break those leases as well.

This means that the NFSD_FILE_BREAK_* flags now just mirror
NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just
drop those flags and unconditionally call nfsd_open_break_lease every
time.

Reported-by: Olga Kornieskaia <kolga@netapp.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360
Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd)
Cc: <stable@vger.kernel.org> # 5.4.x : bb283ca18d1e NFSD: Clean up the show_nf_flags() macro
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 years agoLinux 5.19-rc7
Linus Torvalds [Sun, 17 Jul 2022 20:30:22 +0000 (13:30 -0700)]
Linux 5.19-rc7

2 years agoMerge tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel
Linus Torvalds [Sun, 17 Jul 2022 20:08:03 +0000 (13:08 -0700)]
Merge tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel

Pull intel drm build fix from Rodrigo Vivi:
 "Our 'dim' flow has a problem with fixes of fixes getting missed. We
  need to take a look on that later.

  Meanwhile, please allow me to quickly propagate this fix for the
  32-bit build issue here upstream"

* tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915/ttm: fix 32b build

2 years agoMerge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 17 Jul 2022 19:42:57 +0000 (12:42 -0700)]
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix SIGSEGV when processing syscall args in perf.data files in 'perf
   trace'

 - Sync kvm, msr-index and cpufeatures headers with the kernel sources

 - Fix 'convert perf time to TSC' 'perf test':
     - No need to open events twice
     - Fix finding correct event on hybrid systems

* tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf trace: Fix SIGSEGV when processing syscall args
  perf tests: Fix Convert perf time to TSC test for hybrid
  perf tests: Stop Convert perf time to TSC test opening events twice
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources

2 years agodrm/i915/ttm: fix 32b build
Matthew Auld [Tue, 12 Jul 2022 17:40:50 +0000 (18:40 +0100)]
drm/i915/ttm: fix 32b build

Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: aff1e0b09b54 ("drm/i915/ttm: fix sg_table construction")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com
(cherry picked from commit 9306b2b2dfce6931241ef804783692cee526599c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 years agoMerge tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Jul 2022 15:34:02 +0000 (08:34 -0700)]
Merge tag 'perf_urgent_for_v5.19_rc7' of git://git./linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - A single data race fix on the perf event cleanup path to avoid
   endless loops due to insufficient locking

* tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()

2 years agoMerge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Jul 2022 15:27:30 +0000 (08:27 -0700)]
Merge tag 'x86_urgent_for_v5.19_rc7' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Improve the check whether the kernel supports WP mappings so that it
   can accomodate a XenPV guest due to how the latter is setting up the
   PAT machinery

  - Now that the retbleed nightmare is public, here's the first round of
    fallout fixes:

      * Fix a build failure on 32-bit due to missing include

      * Remove an untraining point in espfix64 return path

      * other small cleanups

* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Remove apostrophe typo
  um: Add missing apply_returns()
  x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
  x86/bugs: Mark retbleed_strings static
  x86/pat: Fix x86_has_pat_wp()
  x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit

2 years agoMerge tag 'gpio-fixes-for-v5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Jul 2022 14:58:19 +0000 (07:58 -0700)]
Merge tag 'gpio-fixes-for-v5.19-rc7' of git://git./linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:

 - fix a configfs attribute of the gpio-sim module

* tag 'gpio-fixes-for-v5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sim: fix the chip_name configfs item

2 years agoMerge tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 17 Jul 2022 14:52:46 +0000 (07:52 -0700)]
Merge tag 'input-for-v5.19-rc6' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - fix Goodix driver to properly behave on the Aya Neo Next

 - some more sanity checks in usbtouchscreen driver

 - a tweak in wm97xx driver in preparation for remove() to return void

 - a clarification in input core regarding units of measurement for
   resolution on touch events.

* tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: document the units for resolution of size axes
  Input: goodix - call acpi_device_fix_up_power() in some cases
  Input: wm97xx - make .remove() obviously always return 0
  Input: usbtouchscreen - add driver_info sanity check

2 years agoMerge tag 'for-v5.19-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Sun, 17 Jul 2022 14:45:51 +0000 (07:45 -0700)]
Merge tag 'for-v5.19-rc' of git://git./linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:

 - power-supply core temperature interpolation regression fix for
   incorrect boundaries

 - ab8500 needs to destroy its work queues in error paths

 - Fix old DT refcount leak in arm-versatile

* tag 'for-v5.19-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: core: Fix boundary conditions in interpolation
  power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
  power: supply: ab8500_fg: add missing destroy_workqueue in ab8500_fg_probe

2 years agoperf trace: Fix SIGSEGV when processing syscall args
Naveen N. Rao [Thu, 7 Jul 2022 09:09:00 +0000 (14:39 +0530)]
perf trace: Fix SIGSEGV when processing syscall args

On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':

  #0  0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
  #1  syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
  #2  syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
  #3  0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
  #4  syscall__scnprintf_args <snip> at builtin-trace.c:2041
  #5  0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319

That points to the below code in tools/perf/builtin-trace.c:
/*
 * If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
 * arguments, even if the syscall being handled, say "openat", uses only 4 arguments
 * this breaks syscall__augmented_args() check for augmented args, as we calculate
 * syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
 * so when handling, say the openat syscall, we end up getting 6 args for the
 * raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
 * thinking that the extra 2 u64 args are the augmented filename, so just check
 * here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
 */
if (evsel != trace->syscalls.events.sys_enter)
augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);

As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.

Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf tests: Fix Convert perf time to TSC test for hybrid
Adrian Hunter [Wed, 13 Jul 2022 12:34:59 +0000 (15:34 +0300)]
perf tests: Fix Convert perf time to TSC test for hybrid

The test does not always correctly determine the number of events for
hybrids, nor allow for more than 1 evsel when parsing.

Fix by iterating the events actually created and getting the correct
evsel for the events processed.

Fixes: d9da6f70eb235110 ("perf tests: Support 'Convert perf time to TSC' test for hybrid")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713123459.24145-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf tests: Stop Convert perf time to TSC test opening events twice
Adrian Hunter [Wed, 13 Jul 2022 12:34:58 +0000 (15:34 +0300)]
perf tests: Stop Convert perf time to TSC test opening events twice

Do not call evlist__open() twice.

Fixes: 5bb017d4b97a0f13 ("perf test: Fix error message for test case 71 on s390, where it is not supported")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713123459.24145-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agotools arch x86: Sync the msr-index.h copy with the kernel sources
Arnaldo Carvalho de Melo [Thu, 1 Jul 2021 16:32:18 +0000 (13:32 -0300)]
tools arch x86: Sync the msr-index.h copy with the kernel sources

To pick up the changes from these csets:

  4ad3278df6fe2b08 ("x86/speculation: Disable RRSBA behavior")
  d7caac991feeef1b ("x86/cpu/amd: Add Spectral Chicken")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

Just silences this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YtQTm9wsB3hxQWvy@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agotools headers cpufeatures: Sync with the kernel sources
Arnaldo Carvalho de Melo [Thu, 1 Jul 2021 16:39:15 +0000 (13:39 -0300)]
tools headers cpufeatures: Sync with the kernel sources

To pick the changes from:

  f43b9876e857c739 ("x86/retbleed: Add fine grained Kconfig knobs")
  a149180fbcf336e9 ("x86: Add magic AMD return-thunk")
  15e67227c49a5783 ("x86: Undo return-thunk damage")
  369ae6ffc41a3c11 ("x86/retpoline: Cleanup some #ifdefery")
  4ad3278df6fe2b08 x86/speculation: Disable RRSBA behavior
  26aae8ccbc197223 x86/cpu/amd: Enumerate BTC_NO
  9756bba28470722d x86/speculation: Fill RSB on vmexit for IBRS
  3ebc170068885b6f x86/bugs: Add retbleed=ibpb
  2dbb887e875b1de3 x86/entry: Add kernel IBRS implementation
  6b80b59b35557065 x86/bugs: Report AMD retbleed vulnerability
  a149180fbcf336e9 x86: Add magic AMD return-thunk
  15e67227c49a5783 x86: Undo return-thunk damage
  a883d624aed463c8 x86/cpufeatures: Move RETPOLINE flags to word 11
  51802186158c74a0 x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org
Link: https://lore.kernel.org/lkml/YtQM40VmiLTkPND2@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agotools headers UAPI: Sync linux/kvm.h with the kernel sources
Arnaldo Carvalho de Melo [Sun, 9 May 2021 12:39:02 +0000 (09:39 -0300)]
tools headers UAPI: Sync linux/kvm.h with the kernel sources

To pick the changes in:

  1b870fa5573e260b ("kvm: stats: tell userspace which values are boolean")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.

This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/YtQLDvQrBhJNl3n5@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoMerge tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Sat, 16 Jul 2022 20:48:55 +0000 (13:48 -0700)]
Merge tag 'for-5.19-rc7-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs reverts from David Sterba:
 "Due to a recent report [1] we need to revert the radix tree to xarray
  conversion patches.

  There's a problem with sleeping under spinlock, when xa_insert could
  allocate memory under pressure. We use GFP_NOFS so this is a real
  problem that we unfortunately did not discover during review.

  I'm sorry to do such change at rc6 time but the revert is IMO the
  safer option, there are patches to use mutex instead of the spin locks
  but that would need more testing. The revert branch has been tested on
  a few setups, all seem ok.

  The conversion to xarray will be revisited in the future"

Link: https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/
* tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: turn delayed_nodes_tree into an XArray"
  Revert "btrfs: turn name_cache radix tree into XArray in send_ctx"
  Revert "btrfs: turn fs_info member buffer_radix into XArray"
  Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray"

2 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 16 Jul 2022 18:45:40 +0000 (11:45 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Six small and reasonably obvious fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: pm80xx: Set stopped phy's linkrate to Disabled
  scsi: pm80xx: Fix 'Unknown' max/min linkrate
  scsi: ufs: core: Fix missing clk change notification on host reset
  scsi: ufs: core: Drop loglevel of WriteBoost message
  scsi: megaraid: Clear READ queue map's nr_queues
  scsi: target: Fix WRITE_SAME No Data Buffer crash

2 years agoMerge tag 'block-5.19-2022-07-15' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 16 Jul 2022 18:40:10 +0000 (11:40 -0700)]
Merge tag 'block-5.19-2022-07-15' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Two NVMe fixes, and a regression fix for the core block layer from
  this merge window"

* tag 'block-5.19-2022-07-15' of git://git.kernel.dk/linux-block:
  block: fix missing blkcg_bio_issue_init
  nvme: fix block device naming collision
  nvme-pci: fix freeze accounting for error handling

2 years agoMerge tag 'usb-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 16 Jul 2022 18:21:15 +0000 (11:21 -0700)]
Merge tag 'usb-5.19-rc7' of git://git./linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes and new device ids for 5.19-rc7.
  They include:

   - new usb-serial driver ids

   - typec uevent fix

   - uvc gadget driver fix

   - dwc3 driver fixes

   - ehci-fsl driver fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: ftdi_sio: add Belimo device ids
  drivers/usb/host/ehci-fsl: Fix interrupt setup in host mode.
  usb: gadget: uvc: fix changing interface name via configfs
  usb: typec: add missing uevent when partner support PD
  usb: dwc3-am62: remove unnecesary clk_put()
  usb: dwc3: gadget: Fix event pending check

2 years agoMerge tag 'tty-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 16 Jul 2022 18:11:56 +0000 (11:11 -0700)]
Merge tag 'tty-5.19-rc7' of git://git./linux/kernel/git/gregkh/tty

Pull tty and serial driver fixes from Greg KH:
 "Here are some TTY and Serial driver fixes for 5.19-rc7. They resolve a
  number of reported problems including:

   - longtime bug in pty_write() that has been reported in the past.

   - 8250 driver fixes

   - new serial device ids

   - vt overlapping data copy bugfix

   - other tiny serial driver bugfixes

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'tty-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
  tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
  serial: 8250: dw: Fix the macro RZN1_UART_xDMACR_8_WORD_BURST
  vt: fix memory overlapping when deleting chars in the buffer
  serial: mvebu-uart: correctly report configured baudrate value
  serial: 8250: Fix PM usage_count for console handover
  serial: 8250: fix return error code in serial8250_request_std_resource()
  serial: stm32: Clear prev values before setting RTS delays
  tty: Add N_CAN327 line discipline ID for ELM327 based CAN driver
  serial: 8250: Fix __stop_tx() & DMA Tx restart races
  serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
  tty: serial: samsung_tty: set dma burst_size to 1
  serial: 8250: dw: enable using pdata with ACPI

2 years agoMerge tag 's390-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 16 Jul 2022 18:00:40 +0000 (11:00 -0700)]
Merge tag 's390-5.19-6' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

 - Fix building of out-of-tree kernel modules without a pre-built kernel
   in case CONFIG_EXPOLINE_EXTERN=y.

 - Fix a reference counting error that could prevent unloading of zcrypt
   modules.

* tag 's390-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ap: fix error handling in __verify_queue_reservations()
  s390/nospec: remove unneeded header includes
  s390/nospec: build expoline.o for modules_prepare target

2 years agoMerge tag 'pm-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 16 Jul 2022 17:56:28 +0000 (10:56 -0700)]
Merge tag 'pm-5.19-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki
 "Fix recent regression in the cpufreq mediatek driver related to
  incorrect handling of regulator_get_optional() return value
  (AngeloGioacchino Del Regno)"

* tag 'pm-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: mediatek: Handle sram regulator probe deferral

2 years agoMerge tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 16 Jul 2022 17:52:41 +0000 (10:52 -0700)]
Merge tag 'acpi-5.19-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Fix more fallout from recent changes of the ACPI CPPC handling on AMD
  platforms (Mario Limonciello)"

* tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory

2 years agoMerge tag 'printk-for-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 16 Jul 2022 17:46:03 +0000 (10:46 -0700)]
Merge tag 'printk-for-5.19-rc7' of git://git./linux/kernel/git/printk/linux

Pull printk fix from Petr Mladek:

 - Make pr_flush() fast when consoles are suspended.

* tag 'printk-for-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: do not wait for consoles when suspended