Christoph Hellwig [Sat, 29 Jun 2019 02:27:28 +0000 (19:27 -0700)]
xfs: remove the b_io_length field in struct xfs_buf
This field is now always identical to b_length.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:28 +0000 (19:27 -0700)]
xfs: properly type the b_log_item field in struct xfs_buf
Now that the log code doesn't abuse this field any more we can
declare it as a struct xfs_buf_log_item pointer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:27 +0000 (19:27 -0700)]
xfs: remove unused buffer cache APIs
Now that the log code uses bios directly we can drop various special
cases in the buffer cache code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:27 +0000 (19:27 -0700)]
xfs: stop using bp naming for log recovery buffers
Now that we don't use struct xfs_buf to hold log recovery buffers, rename
the related functions and variables to just talk of a buffer instead of
using the bp name that we usually use for xfs_buf related functionality.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:26 +0000 (19:27 -0700)]
xfs: use bios directly to read and write the log recovery buffers
The xfs_buf structure is basically used as a glorified container for
a memory allocation in the log recovery code. Replace it with a
call to kmem_alloc_large and a simple abstraction to read into or
write from it synchronously using chained bios.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:26 +0000 (19:27 -0700)]
xfs: return an offset instead of a pointer from xlog_align
This simplifies both the helper and the callers. We lose a bit of
size sanity checking, but that is already covered by KASAN if needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
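As an illustration of the xlog_align change above, here is a minimal user-space sketch of a helper that returns a byte offset rather than a pointer. The name log_align_offset and its parameters are hypothetical and are not the kernel's signature; the sketch assumes 512-byte basic blocks and a power-of-two log sector size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BBSHIFT 9	/* assumed: 512-byte basic blocks */

/*
 * Hypothetical model: a read is assumed to start at the sector boundary
 * that contains blk_no, so the requested block sits this many bytes into
 * the buffer. sect_bbsize is the log sector size in basic blocks.
 */
static size_t log_align_offset(uint64_t blk_no, uint32_t sect_bbsize)
{
	/* The mask below only works for a non-zero power of two. */
	assert(sect_bbsize != 0 && (sect_bbsize & (sect_bbsize - 1)) == 0);

	return (size_t)(blk_no & (uint64_t)(sect_bbsize - 1)) << BBSHIFT;
}
```

A caller would then add the returned offset to its own buffer pointer, which is what lets the helper drop the pointer and size arguments.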
Christoph Hellwig [Sat, 29 Jun 2019 02:27:25 +0000 (19:27 -0700)]
xfs: move the log ioend workqueue to struct xlog
Move the workqueue used for log I/O completions from struct xfs_mount
to struct xlog to keep it self contained in the log code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: destroy the log workqueue after ensuring log ios are done]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:25 +0000 (19:27 -0700)]
xfs: use bios directly to write log buffers
Currently the XFS logging code uses the xfs_buf structure and
associated APIs to write the log buffers to disk. This requires
various special cases in the log code and is generally not very
optimal.
Instead of using a buffer, just allocate a kmem_alloc_large region for
each log buffer, and use a bio and bio_vec array embedded in the iclog
structure to write the buffer to disk. This also allows for using
the bio split and chaining case to deal with the case of a log
buffer wrapping around the end of the log.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: don't split if/else with an #endif]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:24 +0000 (19:27 -0700)]
xfs: make use of the l_targ field in struct xlog
Use the slightly shorter way to get at the buftarg for the log device
wherever we can in the log and log recovery code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:24 +0000 (19:27 -0700)]
xfs: remove the syncing argument from xlog_verify_iclog
The only caller unconditionally passes true here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:23 +0000 (19:27 -0700)]
xfs: update both stat counters together in xlog_sync
Just a small bit of code tidying up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:23 +0000 (19:27 -0700)]
xfs: factor out iclog size calculation from xlog_sync
Split out another self-contained bit of code from xlog_sync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:22 +0000 (19:27 -0700)]
xfs: factor out splitting of an iclog from xlog_sync
Split out a self-contained chunk of code from xlog_sync that calculates
the split offset for an iclog that wraps the log end and bumps the
cycles for the second half.
Use the chance to bring some sanity to the variables used to track the
split in xlog_sync by not changing the count variable, and instead use
split as the offset for the split and use those to calculate the
sizes and offsets for the two write buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
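The split calculation described in the commit above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel function: the names split_log_write and struct log_write_split are hypothetical, basic blocks are assumed to be 512 bytes, and the cycle bump for the second half is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define BBSHIFT 9	/* assumed: 512-byte basic blocks */

struct log_write_split {
	uint32_t first_bytes;	/* written at bno, up to the end of the log */
	uint32_t second_bytes;	/* written at block 0 after wrapping, or 0 */
};

/*
 * Hypothetical model: count stays untouched, and the split offset alone
 * decides the sizes of the two writes.
 */
static struct log_write_split
split_log_write(uint64_t log_bbsize, uint64_t bno, uint32_t count)
{
	struct log_write_split s = { count, 0 };
	uint64_t bytes_to_end;

	assert(bno < log_bbsize);

	bytes_to_end = (log_bbsize - bno) << BBSHIFT;
	if (count > bytes_to_end) {
		/* The write wraps: split it at the physical end of the log. */
		s.first_bytes = (uint32_t)bytes_to_end;
		s.second_bytes = count - s.first_bytes;
	}
	return s;
}
```

A write that ends exactly at the last block of the log is not split, since nothing remains to be written at block 0.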
Christoph Hellwig [Sat, 29 Jun 2019 02:27:22 +0000 (19:27 -0700)]
xfs: factor out log buffer writing from xlog_sync
Replace the not very useful xlog_bdstrat wrapper with a new version that
takes care of all the common logic for writing log buffers. Use
the opportunity to avoid overloading the buffer address with the log
relative address, and to shed the unused return value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:21 +0000 (19:27 -0700)]
xfs: don't use REQ_PREFLUSH for split log writes
If we have to split a log write because it wraps the end of the log we
can't just use REQ_PREFLUSH to flush before the first log write,
as the writes might get reordered somewhere in the I/O stack. Issue
a manual flush in that case so that the ordering of the two log I/Os
doesn't matter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
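The decision described in the commit above can be modelled as a small pure function. This is only a sketch: struct flush_plan and plan_log_flush are hypothetical names, and the model covers only the choice between a flush attached to the write and a separate manual flush, not the actual I/O submission.

```c
#include <assert.h>
#include <stdbool.h>

struct flush_plan {
	bool manual_flush;	/* issue a separate cache flush up front */
	bool preflush_on_write;	/* attach a pre-flush to the single write */
};

/*
 * Hypothetical model: a pre-flush attached to the first of two writes only
 * helps if the two writes stay in order. When the write is split, flush
 * manually before either half is submitted so their order stops mattering.
 */
static struct flush_plan plan_log_flush(bool need_flush, bool split)
{
	struct flush_plan plan = { false, false };

	if (!need_flush)
		return plan;

	if (split)
		plan.manual_flush = true;
	else
		plan.preflush_on_write = true;

	/* The two mechanisms are never combined in this model. */
	assert(!(plan.manual_flush && plan.preflush_on_write));
	return plan;
}
```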
Christoph Hellwig [Sat, 29 Jun 2019 02:27:21 +0000 (19:27 -0700)]
xfs: remove XLOG_STATE_IOABORT
This value is the only flag in ic_state, which we otherwise use as
a state. Switch it to a new debug-only field and also report an
actual error in the buffer in the I/O completion path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:20 +0000 (19:27 -0700)]
xfs: reformat xlog_get_lowest_lsn
Reformat xlog_get_lowest_lsn to our usual style.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:20 +0000 (19:27 -0700)]
xfs: cleanup xlog_get_iclog_buffer_size
We don't really need all the messy branches in the function, as it
really does three things, out of which 2 are common for all branches:
1) set up mount point log buffer size and count values if not already
done from mount options
2) calculate the number of log headers
3) set up all the values in struct xlog based on the above
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
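The three steps listed in the commit above can be sketched as straight-line code. All names here are hypothetical, and the default values (8 buffers of 32 KiB) and the 32 KiB covered per log header are assumptions made for the illustration rather than values taken from the commit.

```c
#include <assert.h>

#define BBSHIFT			9		/* assumed: 512-byte basic blocks */
#define HEADER_CYCLE_SIZE	(32 * 1024)	/* assumed bytes covered per header */
#define DEFAULT_ICLOG_COUNT	8		/* assumed default buffer count */
#define DEFAULT_ICLOG_SIZE	(32 * 1024)	/* assumed default buffer size */

/* Values <= 0 mean "not set by a mount option". */
struct mount_opts {
	int logbufs;
	int logbsize;
};

struct log_geom {
	int iclog_bufs;
	int iclog_size;
	int iclog_heads;
	int iclog_hsize;
};

static struct log_geom get_iclog_buffer_size(struct mount_opts *mp)
{
	struct log_geom g;

	/* 1) Fill in mount-level values unless mount options already did. */
	if (mp->logbufs <= 0)
		mp->logbufs = DEFAULT_ICLOG_COUNT;
	if (mp->logbsize <= 0)
		mp->logbsize = DEFAULT_ICLOG_SIZE;

	/* 2) One log header per HEADER_CYCLE_SIZE bytes, rounded up. */
	g.iclog_heads = (mp->logbsize + HEADER_CYCLE_SIZE - 1) / HEADER_CYCLE_SIZE;
	assert(g.iclog_heads >= 1);

	/* 3) Derive the remaining log values from the above. */
	g.iclog_bufs = mp->logbufs;
	g.iclog_size = mp->logbsize;
	g.iclog_hsize = g.iclog_heads << BBSHIFT;
	return g;
}
```

Steps 2 and 3 run unconditionally, which is the point of the cleanup: only step 1 depends on whether mount options were given.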
Christoph Hellwig [Sat, 29 Jun 2019 02:27:19 +0000 (19:27 -0700)]
xfs: remove the l_iclog_size_log field from struct xlog
This field is never used, so we can simply kill it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:19 +0000 (19:27 -0700)]
xfs: make mem_to_page available outside of xfs_buf.c
Rename the function to kmem_to_page and move it to kmem.h together
with our kmem_large allocator that may either return kmalloced or
vmalloc pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:18 +0000 (19:27 -0700)]
xfs: renumber XBF_WRITE_FAIL
Assigning a numerical value that is not close to the flags
defined nearby is just asking for conflicts later on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:18 +0000 (19:27 -0700)]
xfs: remove the never used _XBF_COMPOUND flag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Sat, 29 Jun 2019 02:27:17 +0000 (19:27 -0700)]
xfs: remove the no-op spinlock_destroy stub
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Sat, 29 Jun 2019 02:25:35 +0000 (19:25 -0700)]
xfs: move xfs_ino_geometry to xfs_shared.h
The inode geometry structure isn't related to ondisk format; it's
support for the mount structure. Move it to xfs_shared.h.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Sat, 29 Jun 2019 02:25:35 +0000 (19:25 -0700)]
xfs: claim maintainership of loose files
Claim maintainership over the miscellaneous files outside of fs/xfs/
that came from xfs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Eric Sandeen [Wed, 12 Jun 2019 16:00:00 +0000 (09:00 -0700)]
xfs: remove unused flag arguments
There are several functions which take a flag argument that is
only ever passed as "0," so remove these arguments.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Wed, 12 Jun 2019 15:59:59 +0000 (08:59 -0700)]
xfs: remove the debug-only q_transp field from struct xfs_dquot
The field is only used for a few assertions. Shrink the dquot
structure instead, similarly to what commit f3ca87389dbf
("xfs: remove i_transp") did for the xfs_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Wed, 12 Jun 2019 15:59:59 +0000 (08:59 -0700)]
xfs: merge xfs_buf_zero and xfs_buf_iomove
xfs_buf_zero is the only caller of xfs_buf_iomove. Remove support
for copying from or to the buffer in xfs_buf_iomove and merge the
two functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Wed, 12 Jun 2019 15:59:58 +0000 (08:59 -0700)]
xfs: remove unused flags arg from getsb interfaces
The flags value is always passed as 0 so remove the argument.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Wed, 5 Jun 2019 18:19:48 +0000 (11:19 -0700)]
xfs: include WARN, REPAIR build options in XFS_BUILD_OPTIONS
The XFS_BUILD_OPTIONS string, shown at module init time and
in modinfo output, does not currently include all available
build options. So, add in CONFIG_XFS_WARN and CONFIG_XFS_REPAIR.
It has been suggested in some quarters
That this is not enough.
Well ...
Anybody who would like to see this in a sysfs file can send
a patch. :)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Wed, 5 Jun 2019 18:19:36 +0000 (11:19 -0700)]
xfs: finish converting to inodes_per_cluster
Finish converting all the old inode_cluster_size >> inopblog users to
inodes_per_cluster.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 5 Jun 2019 18:19:35 +0000 (11:19 -0700)]
xfs: fix inode_cluster_size rounding mayhem
inode_cluster_size is supposed to represent the size (in bytes) of an
inode cluster buffer. We avoid having to handle multiple clusters per
filesystem block on filesystems with large blocks by openly rounding
this value up to 1 FSB when necessary. However, we never reset
inode_cluster_size to reflect this new rounded value, which adds to the
potential for mistakes in calculating geometries.
Fix this by setting inode_cluster_size to reflect the rounded-up size if
needed, and special-case the few places in the sparse inodes code where
we actually need the smaller value to validate on-disk metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
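A small sketch of the rounding fix described above, assuming the cluster is first expressed in whole filesystem blocks and the byte size is then derived from that count. The structure and function names are hypothetical, and the sparse-inode special cases that still need the smaller value are not modelled.

```c
#include <assert.h>
#include <stdint.h>

struct ino_geom {
	uint32_t blocks_per_cluster;	/* cluster size in filesystem blocks */
	uint32_t inode_cluster_size;	/* cluster size in bytes, after rounding */
	uint32_t inodes_per_cluster;
};

/*
 * Hypothetical model: raw_cluster_size is the nominal cluster size in
 * bytes. On filesystems whose block size exceeds it, the cluster is
 * rounded up to one block and the stored byte size reflects the rounding.
 */
static struct ino_geom
ino_geom_init(uint32_t raw_cluster_size, uint32_t block_size, uint32_t inode_size)
{
	struct ino_geom g;

	assert(block_size != 0 && inode_size != 0 && inode_size <= block_size);

	g.blocks_per_cluster = raw_cluster_size > block_size ?
			raw_cluster_size / block_size : 1;
	g.inode_cluster_size = g.blocks_per_cluster * block_size;
	g.inodes_per_cluster = g.inode_cluster_size / inode_size;
	return g;
}
```

With an 8 KiB nominal cluster on 64 KiB blocks, the stored size becomes 64 KiB, so code that divides by the inode size no longer has to remember that the buffer was silently rounded up.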
Darrick J. Wong [Wed, 5 Jun 2019 18:19:35 +0000 (11:19 -0700)]
xfs: refactor inode geometry setup routines
Migrate all of the inode geometry setup code from xfs_mount.c into a
single libxfs function that we can share with xfsprogs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 5 Jun 2019 18:19:34 +0000 (11:19 -0700)]
xfs: separate inode geometry
Separate the inode geometry information into a distinct structure.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Amir Goldstein [Wed, 5 Jun 2019 15:04:51 +0000 (08:04 -0700)]
fuse: copy_file_range needs to strip setuid bits and update timestamps
Like ->write_iter(), we update mtime and strip setuid of dst file before
copy and like ->read_iter(), we update atime of src file after copy.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Amir Goldstein [Wed, 5 Jun 2019 15:04:50 +0000 (08:04 -0700)]
vfs: allow copy_file_range to copy across devices
We want to enable cross-filesystem copy_file_range functionality
where possible, so push the "same superblock only" checks down to
the individual filesystem callouts so they can make their own
decisions about cross-superblock copy offload and fall back to
generic_copy_file_range() for cross-superblock copy.
[Amir] We do not call ->remap_file_range() in case the files are not
on the same sb and do not call ->copy_file_range() in case the files
do not belong to the same filesystem driver.
This changes behavior of the copy_file_range(2) syscall, which will
now allow cross filesystem in-kernel copy. CIFS already supports
cross-superblock copy, between two shares to the same server. This
functionality will now be available via the copy_file_range(2) syscall.
Cc: Steve French <stfrench@microsoft.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
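The dispatch rules in the [Amir] note above can be modelled in user space roughly as follows. Everything here is a hypothetical stand-in for the VFS structures: a driver is represented by a pair of capability flags, a superblock by an integer id, and the fall-through that happens when an offload attempt fails is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum copy_path { COPY_VIA_REMAP, COPY_VIA_FS_METHOD, COPY_VIA_GENERIC };

/* Hypothetical stand-ins for a filesystem driver and an open file. */
struct fs_driver_model {
	bool has_remap_file_range;
	bool has_copy_file_range;
};

struct file_model {
	const struct fs_driver_model *drv;
	int sb_id;	/* identifies the superblock the file lives on */
};

static enum copy_path
pick_copy_path(const struct file_model *in, const struct file_model *out)
{
	assert(in != NULL && out != NULL && in->drv != NULL && out->drv != NULL);

	/* Remapping is only attempted within a single superblock. */
	if (in->sb_id == out->sb_id && in->drv->has_remap_file_range)
		return COPY_VIA_REMAP;

	/* The driver's own copy method needs both files on the same driver. */
	if (out->drv->has_copy_file_range && in->drv == out->drv)
		return COPY_VIA_FS_METHOD;

	/* Everything else takes the generic in-kernel copy. */
	return COPY_VIA_GENERIC;
}
```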
Amir Goldstein [Wed, 5 Jun 2019 15:04:50 +0000 (08:04 -0700)]
xfs: use file_modified() helper
Note that by using the helper, the order of calling file_remove_privs()
after file_update_mtime() in xfs_file_aio_write_checks() has changed.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Amir Goldstein [Wed, 5 Jun 2019 15:04:49 +0000 (08:04 -0700)]
vfs: introduce file_modified() helper
The combination of file_remove_privs() and file_update_mtime() is
quite common in filesystem ->write_iter() methods.
Modelled after the helper file_accessed(), introduce file_modified()
and use it from generic_remap_file_range_prep().
Note that the order of calling file_remove_privs() before
file_update_mtime() in the helper was matched to the more common order by
filesystems and not the current order in generic_remap_file_range_prep().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
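The call order that the helper settles on can be illustrated with a user-space model that records which step ran. The types and functions below are hypothetical stand-ins and do not touch real files; they only show that privileges are dropped first and that a failure there skips the timestamp update.

```c
#include <assert.h>
#include <stddef.h>

enum step { STEP_REMOVE_PRIVS = 1, STEP_UPDATE_MTIME };

struct file_model {
	int remove_privs_err;	/* error the first step should report, or 0 */
	enum step calls[2];	/* steps in the order they ran */
	size_t ncalls;
};

static void record(struct file_model *f, enum step s)
{
	assert(f->ncalls < 2);
	f->calls[f->ncalls++] = s;
}

static int model_remove_privs(struct file_model *f)
{
	record(f, STEP_REMOVE_PRIVS);
	return f->remove_privs_err;
}

static int model_update_mtime(struct file_model *f)
{
	record(f, STEP_UPDATE_MTIME);
	return 0;
}

/* Hypothetical model of the helper: strip privileges, then update mtime. */
static int model_file_modified(struct file_model *f)
{
	int err = model_remove_privs(f);

	if (err)
		return err;
	return model_update_mtime(f);
}
```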
Amir Goldstein [Wed, 5 Jun 2019 15:04:49 +0000 (08:04 -0700)]
vfs: add missing checks to copy_file_range
Like the clone and dedupe interfaces we've recently fixed, the
copy_file_range() implementation is missing basic sanity, limits and
boundary condition tests on the parameters that are passed to it
from userspace. Create a new "generic_copy_file_checks()" function
modelled on the generic_remap_checks() function to provide this
missing functionality.
[Amir] Shorten copy length instead of checking pos_in limits
because input file size already abides by the limits.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Amir Goldstein [Wed, 5 Jun 2019 15:04:48 +0000 (08:04 -0700)]
vfs: remove redundant checks from generic_remap_checks()
The access limit checks on input file range in generic_remap_checks()
are redundant because the input file size is guaranteed to be within
limits and pos+len are already checked to be within input file size.
Beyond the fact that the check cannot fail, if it would have failed,
it could return -EFBIG for input file range error. There is no precedent
for that. -EFBIG is returned in syscalls that would change file length.
With that call removed, we can fold generic_access_check_limits() into
generic_write_check_limits().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Amir Goldstein [Wed, 5 Jun 2019 15:04:48 +0000 (08:04 -0700)]
vfs: introduce generic_file_rw_checks()
Factor out helper with some checks on in/out file that are
common to clone_file_range and copy_file_range.
Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Dave Chinner [Wed, 5 Jun 2019 15:04:47 +0000 (08:04 -0700)]
vfs: no fallback for ->copy_file_range
Now that we have generic_copy_file_range(), remove it as a fallback
case when offloads fail. This puts the responsibility for executing
fallbacks on the filesystems that implement ->copy_file_range and
allows us to add operational validity checks to
generic_copy_file_range().
Rework vfs_copy_file_range() to call a new do_copy_file_range()
helper to execute the copying callout, and move calls to
generic_copy_file_range() into filesystem methods where they
currently return failures.
[Amir] overlayfs is not responsible for executing the fallback.
It is the responsibility of the underlying filesystem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Dave Chinner [Wed, 5 Jun 2019 15:04:47 +0000 (08:04 -0700)]
vfs: introduce generic_copy_file_range()
Right now if vfs_copy_file_range() does not use any offload
mechanism, it falls back to calling do_splice_direct(). This fails
to do basic sanity checks on the files being copied. Before we
start adding this necessary functionality to the fallback path,
separate it out into generic_copy_file_range().
generic_copy_file_range() has the same prototype as
->copy_file_range() so that filesystems can use it in their custom
->copy_file_range() method if they so choose.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Linus Torvalds [Sun, 9 Jun 2019 03:24:46 +0000 (20:24 -0700)]
Linux 5.2-rc4
Linus Torvalds [Sat, 8 Jun 2019 22:57:35 +0000 (15:57 -0700)]
Merge tag 'ceph-for-5.2-rc4' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A change to call iput() asynchronously to avoid a possible deadlock
when iput_final() needs to wait for in-flight I/O (e.g. readahead) and
a fixup for a cleanup that went into -rc1"
* tag 'ceph-for-5.2-rc4' of git://github.com/ceph/ceph-client:
ceph: fix error handling in ceph_get_caps()
ceph: avoid iput_final() while holding mutex or in dispatch thread
ceph: single workqueue for inode related works
Linus Torvalds [Sat, 8 Jun 2019 20:16:05 +0000 (13:16 -0700)]
Merge tag 'for-linus-5.2b-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"Just one fix for the Xen block frontend driver avoiding allocations
with order > 0"
* tag 'for-linus-5.2b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-blkfront: switch kcalloc to kvcalloc for large array allocation
Linus Torvalds [Sat, 8 Jun 2019 20:12:54 +0000 (13:12 -0700)]
Merge tag 's390-5.2-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- fix stack unwinder: the stack unwinder rework has an off-by-one bug
which prevents following stack backchains over more than one context
(e.g. irq -> process).
- fix address space detection in exception handler: if user space
switches to access register mode, which is not supported anymore, the
exception handler may resolve to the wrong address space.
* tag 's390-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/unwind: correct stack switching during unwind
s390/mm: fix address space detection in exception handling
Linus Torvalds [Sat, 8 Jun 2019 20:09:31 +0000 (13:09 -0700)]
Merge tag 'mips_fixes_5.2_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
- Declare ginvt() __always_inline due to its use of an argument as an
inline asm immediate.
- A VDSO build fix following Kbuild changes made this cycle.
- A fix for boot failures on txx9 systems following memory
initialization changes made this cycle.
- Bounds check virt_addr_valid() to prevent it spuriously indicating
that bogus addresses are valid, in turn fixing hardened usercopy
failures that have been present since v4.12.
- Build uImage.gz for pistachio systems by default, since this is the
image we need in order to actually boot on a board.
- Remove an unused variable in our uprobes code.
* tag 'mips_fixes_5.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: uprobes: remove set but not used variable 'epc'
MIPS: pistachio: Build uImage.gz by default
MIPS: Make virt_addr_valid() return bool
MIPS: Bounds check virt_addr_valid
MIPS: TXx9: Fix boot crash in free_initmem()
MIPS: remove a space after -I to cope with header search paths for VDSO
MIPS: mark ginvt() as __always_inline
Linus Torvalds [Sat, 8 Jun 2019 19:52:42 +0000 (12:52 -0700)]
Merge tag 'spdx-5.2-rc4' of git://git./linux/kernel/git/gregkh/driver-core
Pull yet more SPDX updates from Greg KH:
"Another round of SPDX header file fixes for 5.2-rc4
These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.
We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0
I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through"
* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
...
Linus Torvalds [Sat, 8 Jun 2019 19:50:36 +0000 (12:50 -0700)]
Merge tag 'char-misc-5.2-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.2-rc4 to resolve
a number of reported issues.
The most "notable" one here is the kernel headers in proc^Wsysfs
fixes. Those changes move the header file info into sysfs and fixes
the build issues that you reported.
Other than that, a bunch of small habanalabs driver fixes, some fpga
driver fixes, and a few other tiny driver fixes.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
habanalabs: Read upper bits of trace buffer from RWPHI
habanalabs: Fix virtual address access via debugfs for 2MB pages
fpga: zynqmp-fpga: Correctly handle error pointer
habanalabs: fix bug in checking huge page optimization
habanalabs: Avoid using a non-initialized MMU cache mutex
habanalabs: fix debugfs code
uapi/habanalabs: add opcode for enable/disable device debug mode
habanalabs: halt debug engines on user process close
test_firmware: Use correct snprintf() limit
genwqe: Prevent an integer overflow in the ioctl
parport: Fix mem leak in parport_register_dev_model
fpga: dfl: expand minor range when registering chrdev region
fpga: dfl: Add lockdep classes for pdata->lock
fpga: dfl: afu: Pass the correct device to dma_mapping_error()
fpga: stratix10-soc: fix use-after-free on s10_init()
w1: ds2408: Fix typo after 49695ac46861 (reset on output_write retry with readback)
kheaders: Do not regenerate archive if config is not changed
kheaders: Move from proc to sysfs
lkdtm/bugs: Adjust recursion test to avoid elision
lkdtm/usercopy: Moves the KERNEL_DS test to non-canonical
Linus Torvalds [Sat, 8 Jun 2019 19:48:49 +0000 (12:48 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has a driver bugfix and a MAINTAINERS fix"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: Karthikeyan Ramasubramanian is MIA
i2c: xiic: Add max_read_len quirk
Linus Torvalds [Sat, 8 Jun 2019 19:46:31 +0000 (12:46 -0700)]
Merge tag 'dmaengine-fix-5.2-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
- jz4780 transfer fix for acking descriptors early
- fsl-qdma: clean registers on error
- dw-axi-dmac: null pointer dereference fix
- mediatek-cqdma: fix sleeping in atomic context
- tegra210-adma: fix a bunch of issues like crashing in driver probe,
channel FIFO configuration etc.
- sprd: Fixes for possible crash on descriptor status, block length
overflow. For 2-stage transfer fix incorrect start, configuration and
interrupt handling.
* tag 'dmaengine-fix-5.2-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: sprd: Add interrupt support for 2-stage transfer
dmaengine: sprd: Fix the right place to configure 2-stage transfer
dmaengine: sprd: Fix block length overflow
dmaengine: sprd: Fix the incorrect start for 2-stage destination channels
dmaengine: sprd: Add validation of current descriptor in irq handler
dmaengine: sprd: Fix the possible crash when getting descriptor status
dmaengine: tegra210-adma: Fix spelling
dmaengine: tegra210-adma: Fix channel FIFO configuration
dmaengine: tegra210-adma: Fix crash during probe
dmaengine: mediatek-cqdma: sleeping in atomic context
dmaengine: dw-axi-dmac: fix null dereference when pointer first is null
dmaengine: fsl-qdma: Add improvement
dmaengine: jz4780: Fix transfers being ACKed too soon
Linus Torvalds [Sat, 8 Jun 2019 19:12:11 +0000 (12:12 -0700)]
Merge tag 'for-linus-20190608' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Allow symlink from the bfq.weight cgroup parameter to the general
weight (Angelo)
- Damien is new skd maintainer (Bart)
- NVMe pull request from Sagi, with a few small fixes.
- Ensure we set DMA segment size properly, dma-debug is now tripping on
these (Christoph)
- Remove useless debugfs_create() return check (Greg)
- Remove redundant unlikely() check on IS_ERR() (Kefeng)
- Fixup request freeing on exit (Ming)
* tag 'for-linus-20190608' of git://git.kernel.dk/linux-block:
block, bfq: add weight symlink to the bfq.weight cgroup parameter
cgroup: let a symlink too be created with a cftype file
block: free sched's request pool in blk_cleanup_queue
nvme-rdma: use dynamic dma mapping per command
nvme: Fix u32 overflow in the number of namespace list calculation
mmc: also set max_segment_size in the device
mtip32xx: also set max_segment_size in the device
rsxx: don't call dma_set_max_seg_size
nvme-pci: don't limit DMA segement size
block: Drop unlikely before IS_ERR(_OR_NULL)
block: aoe: no need to check return value of debugfs_create functions
nvmet: fix data_len to 0 for bdev-backed write_zeroes
MAINTAINERS: Hand over skd maintainership
nvme-tcp: fix queue mapping when queue count is limited
nvme-rdma: fix queue mapping when queue count is limited
Linus Torvalds [Sat, 8 Jun 2019 18:54:17 +0000 (11:54 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two bug fixes, both for fairly serious problems; the UFS one looks
like it could be used to exfiltrate data from the kernel, although
probably only a privileged user has access to the command management
interface and the missing unlock in smartpqi is long standing and
probably a little used error path"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: smartpqi: unlock on error in pqi_submit_raid_request_synchronous()
scsi: ufs: Check that space was properly alloced in copy_query_response
Linus Torvalds [Sat, 8 Jun 2019 17:57:32 +0000 (10:57 -0700)]
Merge tag 'linux-kselftest-5.2-rc4-2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
"This consists of a single fix for a vm test build failure regression
when it is built by itself"
* tag 'linux-kselftest-5.2-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: vm: Fix test build failure when built by itself
Linus Torvalds [Sat, 8 Jun 2019 00:39:31 +0000 (17:39 -0700)]
Merge tag 'drm-fixes-2019-06-07-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"A small bit more lively this week but not majorly so. I'm away in
Japan next week for family holiday, so I'll be pretty disconnected,
I've asked Daniel to do fixes for the week while I'm out.
The nouveau firmware changes are a bit large, but they address a big
problem where a whole set of boards don't load with the driver, and
the new firmware fixes that, so I think it's worth trying to land it
now.
core:
- Allow fb changes in async commits (drivers as well)
udmabuf:
- Unmap scatterlist when unmapping udmabuf
nouveau:
- firmware loading fixes for secboot firmware on new GPU revision.
komeda:
- oops, dma mapping and warning fixes
arm-hdlcd:
- clock fixes
- mode validation fix
i915:
- Add a missing Icelake workaround
- GVT - DMA map fault fix and enforcement fixes
amdgpu:
- DCE resume fix
- New raven variation updates"
* tag 'drm-fixes-2019-06-07-1' of git://anongit.freedesktop.org/drm/drm: (33 commits)
drm/nouveau/secboot/gp10[2467]: support newer FW to fix SEC2 failures on some boards
drm/nouveau/secboot: enable loading of versioned LS PMU/SEC2 ACR msgqueue FW
drm/nouveau/secboot: split out FW version-specific LS function pointers
drm/nouveau/secboot: pass max supported FW version to LS load funcs
drm/nouveau/core: support versioned firmware loading
drm/nouveau/core: pass subdev into nvkm_firmware_get, rather than device
drm/komeda: Potential error pointer dereference
drm/komeda: remove set but not used variable 'kcrtc'
drm/amd/amdgpu: add RLC firmware to support raven1 refresh
drm/amd/powerplay: add set_power_profile_mode for raven1_refresh
drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)
udmabuf: actually unmap the scatterlist
drm/arm/hdlcd: Allow a bit of clock tolerance
drm/arm/hdlcd: Actually validate CRTC modes
drm/arm/mali-dp: Add a loop around the second set CVAL and try 5 times
drm/komeda: fixing of DMA mapping sg segment warning
drm: don't block fb changes for async plane updates
drm/vc4: fix fb references in async update
drm/msm: fix fb references in async update
drm/amd: fix fb references in async update
...
Wolfram Sang [Mon, 27 May 2019 19:45:45 +0000 (21:45 +0200)]
MAINTAINERS: Karthikeyan Ramasubramanian is MIA
A mail just bounced back with "user unknown":
550 5.1.1 <kramasub@codeaurora.org> User doesn't exist
I also couldn't find a more recent address in git history. So, remove
this stale entry.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Robert Hancock [Tue, 4 Jun 2019 21:55:51 +0000 (15:55 -0600)]
i2c: xiic: Add max_read_len quirk
This driver does not support reading more than 255 bytes at once because
the register for storing the number of bytes to read is only 8 bits. Add
a max_read_len quirk to enforce this.
This was found when using this driver with the SFP driver, which was
previously reading all 256 bytes in the SFP EEPROM in one transaction.
This caused a bunch of hard-to-debug errors in the xiic driver since the
driver/logic was treating the number of bytes to read as zero.
Rejecting transactions that aren't supported at least allows the problem
to be diagnosed more easily.
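The wrap-around described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `program_count_register()` and `check_read_len()` are hypothetical helpers, not functions from the xiic driver, and the `-1` return value stands in for whatever error the I2C core actually reports.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest length that fits in an 8-bit byte-count register. */
#define MAX_READ_LEN 255

/*
 * Hypothetical model of writing the read length into an 8-bit register:
 * anything above 255 is silently truncated, so 256 becomes 0.
 */
static uint8_t program_count_register(size_t len)
{
	return (uint8_t)len;
}

/*
 * Hypothetical up-front check in the spirit of a max_read_len quirk:
 * reject the transfer instead of letting the length wrap.
 */
static int check_read_len(size_t len)
{
	return len <= MAX_READ_LEN ? 0 : -1;
}
```

With these helpers, a 256-byte request would program a count of zero, while the check rejects it before any hardware access takes place.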
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Linus Torvalds [Fri, 7 Jun 2019 20:38:53 +0000 (13:38 -0700)]
Merge tag 'hwmon-for-v5.2-rc4' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix a couple of inconsistencies and locking problems in pmbus driver
- Register with thermal subsystem only on systems supporting devicetree
* tag 'hwmon-for-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus/core) Treat parameters as paged if on multiple pages
hwmon: (pmbus/core) mutex_lock write in pmbus_set_samples
hwmon: (core) add thermal sensors only if dev->of_node is present
Jan Glauber [Wed, 5 Jun 2019 13:48:49 +0000 (15:48 +0200)]
lockref: Limit number of cmpxchg loop retries
The lockref cmpxchg loop is unbounded as long as the spinlock is not
taken. Depending on the hardware implementation of compare-and-swap
a high number of loop retries might happen.
Add an upper bound to the loop to force the fallback to spinlocks
after some time. A retry value of 100 should not impact any hardware
that does not have this issue.
With the retry limit the performance of an open-close testcase
improved by 60-70% on ThunderX2.
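A minimal userspace sketch of a bounded compare-and-swap loop, using C11 atomics rather than the kernel's cmpxchg helpers. The word layout (`LOCKED_BIT`, `COUNT_ONE`) and the function name `try_get_lockless()` are assumptions made for illustration; only the retry value of 100 comes from the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Retry value taken from the commit message. */
#define CAS_RETRY_LIMIT 100

/* Hypothetical packed word: bit 0 means "locked", the rest is a count. */
#define LOCKED_BIT 1u
#define COUNT_ONE  2u

/*
 * Try to increment the count without taking the lock.
 * Returns true on success, false when the caller should fall back
 * to the locked slow path (lock held, or too many failed attempts).
 */
static bool try_get_lockless(_Atomic uint64_t *word)
{
	uint64_t old = atomic_load_explicit(word, memory_order_relaxed);
	int retries = CAS_RETRY_LIMIT;

	while (!(old & LOCKED_BIT)) {
		uint64_t next = old + COUNT_ONE;

		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_strong_explicit(word, &old, next,
							    memory_order_relaxed,
							    memory_order_relaxed))
			return true;

		/* Give up after a bounded number of failed attempts. */
		if (--retries == 0)
			break;
	}
	return false;
}
```

The bound only matters under heavy contention; an uncontended caller succeeds on the first attempt and never reaches the retry check.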
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jan Glauber <jglauber@marvell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Konovalov [Tue, 4 Jun 2019 12:04:47 +0000 (14:04 +0200)]
uaccess: add noop untagged_addr definition
Architectures that support memory tagging have a need to perform untagging
(stripping the tag) in various parts of the kernel. This patch adds an
untagged_addr() macro, which is defined as noop for architectures that do
not support memory tagging. The upcoming patch series will define it at
least for sparc64 and arm64.
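The idea of a no-op fallback can be sketched as follows. The `#ifndef` guard shows the general pattern of letting an architecture override the macro; `strip_top_byte_tag()` is a purely hypothetical tagging scheme for contrast and is not how arm64 or sparc64 actually untag addresses.

```c
#include <stdint.h>

/*
 * Generic fallback: architectures without memory tagging leave the
 * address untouched. An architecture with tagging would define its own
 * untagged_addr() before this point.
 */
#ifndef untagged_addr
#define untagged_addr(addr) (addr)
#endif

/*
 * Hypothetical example of what a tagging architecture might do instead:
 * treat the top byte as a tag and clear it. Illustration only.
 */
static inline uint64_t strip_top_byte_tag(uint64_t addr)
{
	return addr & 0x00ffffffffffffffULL;
}
```

Generic code can then call `untagged_addr()` unconditionally and pay nothing on architectures that do not tag pointers.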
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 7 Jun 2019 20:06:00 +0000 (13:06 -0700)]
Merge tag 'xtensa-20190607' of git://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fix from Max Filippov:
"Fix a section mismatch between memblock_reserve and mem_reserve.
This fixes tinyconfig xtensa builds"
* tag 'xtensa-20190607' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: Fix section mismatch between memblock_reserve and mem_reserve
Jens Axboe [Fri, 7 Jun 2019 20:04:28 +0000 (14:04 -0600)]
Merge branch 'nvme-5.2-rc-next' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Sagi.
* 'nvme-5.2-rc-next' of git://git.infradead.org/nvme:
nvme-rdma: use dynamic dma mapping per command
nvme: Fix u32 overflow in the number of namespace list calculation
nvmet: fix data_len to 0 for bdev-backed write_zeroes
nvme-tcp: fix queue mapping when queue count is limited
nvme-rdma: fix queue mapping when queue count is limited
Linus Torvalds [Fri, 7 Jun 2019 18:59:20 +0000 (11:59 -0700)]
Merge tag 'kbuild-fixes-v5.2-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild fixes from Masahiro Yamada:
- fix kselftest-merge to find config fragments in deeper directories
- fix kconfig unit test, which was broken by SPDX tag addition
- add + prefix to buildtar to suppress jobserver unavailable warning
- fix checkstack.pl to recognize arch=arm64
- suppress noisy warning from cc-cross-prefix
* tag 'kbuild-fixes-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: use more portable 'command -v' for cc-cross-prefix
scripts/checkstack.pl: Fix arm64 wrong or unknown architecture
kbuild: tar-pkg: enable communication with jobserver
kconfig: tests: fix recursive inclusion unit test
kbuild: teach kselftest-merge to find nested config files
Linus Torvalds [Fri, 7 Jun 2019 18:52:31 +0000 (11:52 -0700)]
Merge tag 'mmc-v5.2-rc2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here's a couple of MMC and MEMSTICK fixes:
MMC host:
- sdhci: Fix SDIO IRQ thread deadlock
- sdhci-tegra: Fix a warning message
- sdhci_am654: Fix SLOTTYPE write
- meson-gx: Fix IRQ ack
- tmio: Fix SCC error handling to avoid false positive CRC error
MEMSTICK core:
- mspro_block: Fix returning a correct error code"
* tag 'mmc-v5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci_am654: Fix SLOTTYPE write
mmc: sdhci: Fix SDIO IRQ thread deadlock
mmc: meson-gx: fix irq ack
mmc: tmio: fix SCC error handling to avoid false positive CRC error
mmc: tegra: Fix a warning message
memstick: mspro_block: Fix an error code in mspro_block_issue_req()
Linus Torvalds [Fri, 7 Jun 2019 18:36:17 +0000 (11:36 -0700)]
Merge tag 'pm-5.2-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a crash during resume from hibernation introduced during the
4.19 cycle, cause the new Performance and Energy Bias Hint (EPB) code
to be built only if CONFIG_PM is set and add a few missing kerneldoc
comments.
Specifics:
- Fix a crash that occurs when a kernel with 'nosmt' in the command
line is used to resume the system from hibernation (as the
"restore" kernel), because memory mapping differences between the
restore and image kernels cause SMT siblings to be woken up from
idle states and subsequently they try to fetch instructions from
incorrect memory locations (Jiri Kosina).
- Cause the new Performance and Energy Bias Hint (EPB) code to be
built only if CONFIG_PM is set, because that code is not really
necessary otherwise (Rafael Wysocki).
- Add kerneldoc comments to document some helper functions related
to system-wide suspend to avoid possible confusion regarding their
purpose (Rafael Wysocki)"
* tag 'pm-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/power: Fix 'nosmt' vs hibernation triple fault during resume
PM: sleep: Add kerneldoc comments to some functions
x86: intel_epb: Do not build when CONFIG_PM is unset
Jann Horn [Sun, 2 Jun 2019 01:15:58 +0000 (03:15 +0200)]
x86/insn-eval: Fix use-after-free access to LDT entry
get_desc() computes a pointer into the LDT while holding a lock that
protects the LDT from being freed, but then drops the lock and returns the
(now potentially dangling) pointer to its caller.
Fix it by giving the caller a copy of the LDT entry instead.
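The pattern of handing out a copy rather than a pointer can be sketched in userspace C with a pthread mutex. The types `struct desc` and `struct table` and the function `get_desc_copy()` are hypothetical stand-ins, not the kernel's LDT structures or the real get_desc() signature.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor entry. */
struct desc {
	unsigned int base;
	unsigned int limit;
};

/* Hypothetical table whose entries may be freed once the lock is dropped. */
struct table {
	pthread_mutex_t lock;
	struct desc *entries;
	size_t nr_entries;
};

/*
 * Copy the entry into caller-owned storage while the lock is held.
 * Returning &t->entries[idx] instead would hand back a pointer that
 * can dangle as soon as the lock is released.
 */
static bool get_desc_copy(struct table *t, size_t idx, struct desc *out)
{
	bool found = false;

	pthread_mutex_lock(&t->lock);
	if (t->entries && idx < t->nr_entries) {
		*out = t->entries[idx];
		found = true;
	}
	pthread_mutex_unlock(&t->lock);

	return found;
}
```

The copy costs a few bytes per lookup, but the caller no longer depends on the table's lifetime after the function returns.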
Fixes: 670f928ba09b ("x86/insn-eval: Add utility function to get segment descriptor")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 7 Jun 2019 16:29:14 +0000 (09:29 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Free AF_PACKET po->rollover properly, from Willem de Bruijn.
2) Read SFP eeprom in max 16 byte increments to avoid problems with
some SFP modules, from Russell King.
3) Fix UDP socket lookup wrt. VRF, from Tim Beale.
4) Handle route invalidation properly in s390 qeth driver, from Julian
Wiedmann.
5) Memory leak on unload in RDS, from Zhu Yanjun.
6) sctp_process_init leak, from Neil Horman.
7) Fix fib_rules rule insertion semantic change that broke Android,
from Hangbin Liu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
pktgen: do not sleep with the thread lock held.
net: mvpp2: Use strscpy to handle stat strings
net: rds: fix memory leak in rds_ib_flush_mr_pool
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
net: aquantia: fix wol configuration not applied sometimes
ethtool: fix potential userspace buffer overflow
Fix memory leak in sctp_process_init
net: rds: fix memory leak when unload rds_rdma
ipv6: fix the check before getting the cookie in rt6_get_cookie
ipv4: not do cache for local delivery if bc_forwarding is enabled
s390/qeth: handle error when updating TX queue count
s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
s390/qeth: check dst entry before use
s390/qeth: handle limited IPv4 broadcast in L3 TX path
net: fix indirect calls helpers for ptype list hooks.
net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
udp: only choose unbound UDP socket for multicast when not in a VRF
net/tls: replace the sleeping lock around RX resync with a bit lock
...
Linus Torvalds [Fri, 7 Jun 2019 16:25:27 +0000 (09:25 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Things are looking pretty quiet here in RDMA, not too many bug fixes
rolling in right now. The usual driver bug fixes and fixes for a
couple of regressions introduced in 5.2:
- Fix a race on bootup with RDMA device renaming and srp. SRP also
needs to rename its internal sys files
- Fix a memory leak in hns
- Don't leak resources in efa on certain error unwinds
- Don't panic in certain error unwinds in ib_register_device
- Various small user visible bug fix patches for the hfi and efa
drivers
- Fix the 32 bit compilation break"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/efa: Remove MAYEXEC flag check from mmap flow
mlx5: avoid 64-bit division
IB/hfi1: Validate page aligned for a given virtual address
IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
IB/rdmavt: Fix alloc_qpn() WARN_ON()
RDMA/core: Fix panic when port_data isn't initialized
RDMA/uverbs: Pass udata on uverbs error unwind
RDMA/core: Clear out the udata before error unwind
RDMA/hns: Fix PD memory leak for internal allocation
RDMA/srp: Rename SRP sysfs name after IB device rename trigger
Linus Torvalds [Fri, 7 Jun 2019 16:21:48 +0000 (09:21 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Another round of mostly-benign fixes, the exception being a boot crash
on SVE2-capable CPUs (although I don't know where you'd find such a
thing, so maybe it's benign too).
We're in the process of resolving some big-endian ptrace breakage, so
I'll probably have some more for you next week.
Summary:
- Fix boot crash on platforms with SVE2 due to missing register
encoding
- Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y
- Move cpu_logical_map into smp.h for use by upcoming irqchip drivers
- Trivial typo fix in comment
- Disable some useless, noisy warnings from GCC 9"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Silence gcc warnings about arch ABI drift
ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix
arm64: arch_timer: mark functions as __always_inline
arm64: smp: Moved cpu_logical_map[] to smp.h
arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()
Masahiro Yamada [Thu, 6 Jun 2019 04:13:58 +0000 (13:13 +0900)]
kbuild: use more portable 'command -v' for cc-cross-prefix
To print the pathname that will be used by shell in the current
environment, 'command -v' is a standardized way. [1]
'which' is also often used in scripts, but it is less portable.
When I worked on commit bd55f96fa9fc ("kbuild: refactor cc-cross-prefix
implementation"), I was eager to use 'command -v' but it did not work.
(The reason is explained below.)
I kept 'which' as before but got rid of '> /dev/null 2>&1' as I
thought it was no longer needed. Sorry, I was wrong.
It works well on my Ubuntu machine, but Alexey Brodkin reports noisy
warnings on CentOS7 when 'which' fails to find the given command in
the PATH environment.
$ which foo
which: no foo in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
Given that behavior of 'which' depends on system (and it may not be
installed by default), I want to try 'command -v' once again.
The specification [1] clearly describes the behavior of 'command -v'
when the given command is not found:
Otherwise, no output shall be written and the exit status shall reflect
that the name was not found.
However, we need a little magic to use 'command -v' from Make.
$(shell ...) passes the argument to a subshell for execution, and
returns the standard output of the command.
Here is a trick. GNU Make may optimize this by executing the command
directly instead of forking a subshell, if no shell special characters
are found in the command and omitting the subshell will not change the
behavior.
In this case, no shell special character is used. So, Make will try
to run it directly. However, 'command' is a shell-builtin command,
so Make fails to find it in the PATH environment:
$ make ARCH=m68k defconfig
make: command: Command not found
make: command: Command not found
make: command: Command not found
In fact, Make has a table of shell-builtin commands because it must
ask the shell to execute them.
Until recently, 'command' was missing in the table.
This issue was fixed by the following commit:
| commit 1af314465e5dfe3e8baa839a32a72e83c04f26ef
| Author: Paul Smith <psmith@gnu.org>
| Date: Sun Nov 12 18:10:28 2017 -0500
|
| * job.c: Add "command" as a known shell built-in.
|
| This is not a POSIX shell built-in but it's common in UNIX shells.
| Reported by Nick Bowler <nbowler@draconx.ca>.
Because the latest release is GNU Make 4.2.1 in 2016, this commit is
not included in any released versions. (But some distributions may
have back-ported it.)
We need to trick Make to spawn a subshell. There are various ways to
do so:
1) Use a shell special character '~' as dummy
$(shell : ~; command -v $(c)gcc)
2) Use a variable reference that always expands to the empty string
(suggested by David Laight)
$(shell command$${x:+} -v $(c)gcc)
3) Use redirect
$(shell command -v $(c)gcc 2>/dev/null)
I chose 3) to not confuse people. The stderr would not be polluted
anyway, but it will provide extra safety, and is easy to understand.
Tested on Make 3.81, 3.82, 4.0, 4.1, 4.2, 4.2.1
[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/command.html
Fixes: bd55f96fa9fc ("kbuild: refactor cc-cross-prefix implementation")
Cc: linux-stable <stable@vger.kernel.org> # 5.1
Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Vasily Gorbik [Thu, 6 Jun 2019 14:58:45 +0000 (16:58 +0200)]
s390/unwind: correct stack switching during unwind
Adjust conditions in on_stack function. That fixes backchain unwinder
which was unable to read pt_regs at the very bottom of the stack and
hence couldn't follow stacks (e.g. from async stack to a task stack).
Fixes: 78c98f907413 ("s390/unwind: introduce stack unwind API")
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Rafael J. Wysocki [Fri, 7 Jun 2019 08:48:57 +0000 (10:48 +0200)]
Merge branch 'pm-x86'
* pm-x86:
x86/power: Fix 'nosmt' vs hibernation triple fault during resume
x86: intel_epb: Do not build when CONFIG_PM is unset
Angelo Ruocco [Tue, 21 May 2019 08:01:55 +0000 (10:01 +0200)]
block, bfq: add weight symlink to the bfq.weight cgroup parameter
Many userspace tools and services use the proportional-share policy of
the blkio/io cgroups controller. The CFQ I/O scheduler implemented
this policy for the legacy block layer. To modify the weight of a
group in case CFQ was in charge, the 'weight' parameter of the group
must be modified. On the other hand, the BFQ I/O scheduler implements
the same policy in blk-mq, but, with BFQ, the parameter to modify has
a different name: bfq.weight (forced choice until legacy block was
present, because two different policies cannot share a common parameter
in cgroups).
Due to CFQ legacy, most if not all userspace configurations still use
the parameter 'weight', and for the moment do not seem likely to be
changed. But, when CFQ went away with legacy block, such a parameter
ceased to exist.
So, a simple workaround has been proposed [1] to make all
configurations work: add a symlink, named weight, to bfq.weight. This
commit adds such a symlink.
[1] https://lkml.org/lkml/2019/4/8/555
Suggested-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Angelo Ruocco [Tue, 21 May 2019 08:01:54 +0000 (10:01 +0200)]
cgroup: let a symlink too be created with a cftype file
This commit enables a cftype to have a symlink (of any name) that
points to the file associated with the cftype.
Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dave Airlie [Fri, 7 Jun 2019 07:14:19 +0000 (17:14 +1000)]
Merge branch 'linux-5.2' of git://github.com/skeggsb/linux into drm-fixes
" This is a bit more than I'd like to be pushing at this point in a
cycle, but it's a fairly important issue. There's been numerous
reports of more recent GP10[2467] boards failing to load, and I've
worked with NVIDIA FW engineers and tracked this down to the FW we've
been using not properly supporting the boards in question.
I've pushed an update to linux-firmware with the new FW version, which
unfortunately contains API changes vs the older firmware.
This series teaches the ACR subsystem inside nouveau enough to be able
to deal with supporting multiple incompatible FW revisions, and adds
support to the relevant chipsets for loading the newer FW revision, if
it's available."
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv7pG+vur0Kn_TyU3ainnkvJVw07upnnaQNOToF+kzQtDQ@mail.gmail.com
Ben Skeggs [Thu, 6 Jun 2019 06:28:35 +0000 (16:28 +1000)]
drm/nouveau/secboot/gp10[2467]: support newer FW to fix SEC2 failures on some boards
Some newer boards with these chipsets aren't compatible with the prior
version of the SEC2 FW, and fail to load as a result.
This newer FW is actually the one we already use on >=GP108.
Unfortunately, there are interface differences in GP108's FW, making it
impossible to simply move files around in linux-firmware to solve this.
We need to be able to keep compatibility with all linux-firmware/kernel
combinations, which means supporting both firmwares.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 6 Jun 2019 06:32:31 +0000 (16:32 +1000)]
drm/nouveau/secboot: enable loading of versioned LS PMU/SEC2 ACR msgqueue FW
Some chipsets will be switching to updated SEC2 LS firmware, so we need to
plumb that through.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 6 Jun 2019 05:57:14 +0000 (15:57 +1000)]
drm/nouveau/secboot: split out FW version-specific LS function pointers
It's not enough to have per-falcon structures anymore, we have multiple
versions of some firmware now that have interface differences.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 6 Jun 2019 05:38:25 +0000 (15:38 +1000)]
drm/nouveau/secboot: pass max supported FW version to LS load funcs
Will be passed to the FW loader function as an upper bound on the supported
FW version to attempt to load.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Wed, 22 May 2019 06:15:54 +0000 (16:15 +1000)]
drm/nouveau/core: support versioned firmware loading
We have a need for this now with updated SEC2 LS FW images that have an
incompatible interface from the previous version.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 6 Jun 2019 07:08:12 +0000 (17:08 +1000)]
drm/nouveau/core: pass subdev into nvkm_firmware_get, rather than device
It'd be nice to have FW loading debug messages to appear for the relevant
subsystem, when enabled.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ming Lei [Tue, 4 Jun 2019 13:08:02 +0000 (21:08 +0800)]
block: free sched's request pool in blk_cleanup_queue
In theory, IO scheduler belongs to request queue, and the request pool
of sched tags belongs to the request queue too.
However, the current tag allocation interfaces are reused for both
driver tags and sched tags, and driver tags are definitely host wide
and do not belong to any request queue; the same holds for their request pool.
So we need the tagset instance to free the requests of sched tags.
Meanwhile, blk_mq_free_tag_set() often follows blk_cleanup_queue() in the
non-BLK_MQ_F_TAG_SHARED case, which requires the request pool of sched
tags to be freed before blk_mq_free_tag_set() is called.
Commit 47cdee29ef9d94e ("block: move blk_exit_queue into __blk_release_queue")
moved blk_exit_queue into __blk_release_queue to simplify the fast
path in generic_make_request(), which causes an oops when freeing the requests
of sched tags in __blk_release_queue().
Fix the above issue by moving the freeing of the request pool of sched tags into
blk_cleanup_queue(). This is safe because the queue has been frozen and there
are no in-queue requests at that time. Freeing the sched tags themselves has to
stay in the queue's release handler because there might be uncompleted dispatch
activity that still refers to sched tags.
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Fixes: 47cdee29ef9d94e485eb08f962c74943023a5271 ("block: move blk_exit_queue into __blk_release_queue")
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dave Airlie [Fri, 7 Jun 2019 00:41:32 +0000 (10:41 +1000)]
Merge tag 'drm-intel-fixes-2019-06-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Include gvt-fixes-2019-06-05
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190606120401.GA16071@jlahtine-desk.ger.corp.intel.com
Dave Airlie [Thu, 6 Jun 2019 21:31:49 +0000 (07:31 +1000)]
Merge branch 'malidp-fixes' of git://linux-arm.org/linux-ld into drm-fixes
Assorted set of patches for Arm DRM drivers that I maintain
in my tree.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Liviu Dudau <Liviu.Dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190604144205.GO15316@e110455-lin.cambridge.arm.com
Linus Torvalds [Thu, 6 Jun 2019 20:13:09 +0000 (13:13 -0700)]
Merge branch 'parisc-5.2-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Fix crashes when accessing PCI devices on some machines like C240 and
J5000. The crashes were triggered because we replaced cache flushes
by nops in the alternative coding where we shouldn't for some
machines.
- Dave fixed a race in the usage of the sr1 space register when used to
load the coherence index.
- Use the hardware lpa instruction to load the physical address of
kernel virtual addresses in the iommu driver code.
- The kernel may fail to link when CONFIG_MLONGCALLS isn't set. Solve
that by rearranging functions in the final vmlinux executable.
- Some defconfig cleanups and removal of compiler warnings.
* 'parisc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix crash due alternative coding for NP iopdir_fdc bit
parisc: Use lpa instruction to load physical addresses in driver code
parisc: configs: Remove useless UEVENT_HELPER_PATH
parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
parisc: Fix compiler warnings in float emulation code
parisc/slab: cleanup after /proc/slab_allocators removal
parisc: Allow building 64-bit kernel without -mlong-calls compiler option
parisc: Kconfig: remove ARCH_DISCARD_MEMBLOCK
Linus Torvalds [Thu, 6 Jun 2019 20:10:49 +0000 (13:10 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a regression that breaks the jitterentropy RNG and a
potential memory leak in hmac"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hmac - fix memory leak in hmac_init_tfm()
crypto: jitterentropy - change back to module_init()
Linus Torvalds [Thu, 6 Jun 2019 19:36:54 +0000 (12:36 -0700)]
Merge tag 'xfs-5.2-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are a couple more bug fixes for 5.2. Changes since last update:
- Fix some forgotten strings in a log debugging function
- Fix incorrect unit conversion in online fsck code"
* tag 'xfs-5.2-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: inode btree scrubber should calculate im_boffset correctly
xfs: fix broken log reservation debugging
Linus Torvalds [Thu, 6 Jun 2019 19:33:52 +0000 (12:33 -0700)]
Merge tag 'gfs2-v5.2.fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
"A revert for a patch that turned out to be broken"
* tag 'gfs2-v5.2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Revert "gfs2: Replace gl_revokes with a GLF flag"
Linus Torvalds [Thu, 6 Jun 2019 19:31:15 +0000 (12:31 -0700)]
Merge tag 'ovl-fixes-5.2-rc4' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Here's one fix for a class of bugs triggered by syzkaller, and one
that makes xfstests fail less"
* tag 'ovl-fixes-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: doc: add non-standard corner cases
ovl: detect overlapping layers
ovl: support the FS_IOC_FS[SG]ETXATTR ioctls
Linus Torvalds [Thu, 6 Jun 2019 19:25:56 +0000 (12:25 -0700)]
Merge tag 'fuse-fixes-5.2-rc4' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"This fixes a leaked inode lock in an error cleanup path and a data
consistency issue with copy_file_range().
It also adds a new flag for the WRITE request that allows userspace
filesystems to clear suid/sgid bits on the file if necessary"
* tag 'fuse-fixes-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: extract helper for range writeback
fuse: fix copy_file_range() in the writeback case
fuse: add FUSE_WRITE_KILL_PRIV
fuse: fallocate: fix return with locked inode
Linus Torvalds [Thu, 6 Jun 2019 19:19:37 +0000 (12:19 -0700)]
Merge tag 'nfs-for-5.2-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These are mostly stable bugfixes found during testing, many during the
recent NFS bake-a-thon.
Stable bugfixes:
- SUNRPC: Fix regression in umount of a secure mount
- SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS credential
- NFSv4.1: Again fix a race where CB_NOTIFY_LOCK fails to wake a waiter
- NFSv4.1: Fix bug only first CB_NOTIFY_LOCK is handled
Other bugfixes:
- xprtrdma: Use struct_size() in kzalloc()"
* tag 'nfs-for-5.2-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.1: Fix bug only first CB_NOTIFY_LOCK is handled
NFSv4.1: Again fix a race where CB_NOTIFY_LOCK fails to wake a waiter
SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS credential
SUNRPC fix regression in umount of a secure mount
xprtrdma: Use struct_size() in kzalloc()
Paolo Abeni [Thu, 6 Jun 2019 13:45:03 +0000 (15:45 +0200)]
pktgen: do not sleep with the thread lock held.
Currently, the process issuing a "start" command on the pktgen procfs
interface acquires the pktgen thread lock and does not release it until
all pktgen threads have completed. This can block indefinitely any
other pktgen command and any (even unrelated) netdevice removal, as
the pktgen netdev notifier acquires the same lock.
The issue is demonstrated by the following script, reported by Matteo:
ip -b - <<'EOF'
link add type dummy
link add type veth
link set dummy0 up
EOF
modprobe pktgen
echo reset >/proc/net/pktgen/pgctrl
{
echo rem_device_all
echo add_device dummy0
} >/proc/net/pktgen/kpktgend_0
echo count 0 >/proc/net/pktgen/dummy0
echo start >/proc/net/pktgen/pgctrl &
sleep 1
rmmod veth
Fix the above by releasing the thread lock around the sleep call.
Additionally, we must prevent racing with a forceful rmmod, as the
thread lock no longer protects against it. Instead, acquire a
self-reference before waiting for any thread. As a side effect, running
rmmod pktgen
while some thread is running now fails with a "module in use" error;
before this patch such a command hung indefinitely.
Note: the issue predates the commit reported in the fixes tag, but
this fix can't be applied before the mentioned commit.
v1 -> v2:
- no need to check for thread existence after flipping the lock,
pktgen threads are freed only at net exit time
Fixes: 6146e6a43b35 ("[PKTGEN]: Removes thread_{un,}lock() macros.")
Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 6 Jun 2019 18:02:54 +0000 (11:02 -0700)]
Merge tag 'for-rc-adfs' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ADFS cleanups/fixes from Russell King:
"As a result of some of Al Viro's great work, here are a few cleanups
with fixes for adfs:
- factor out filename comparison, so we can be sure that
adfs_compare() (used for namei compare) and adfs_match() (used for
lookup) have the same behaviour.
- factor out filename lowering (which is not the same as tolower()
which will lower top-bit-set characters) to ensure that we have the
same behaviour when comparing filenames as when we hash them.
- factor out the object fixups, so we are applying all fixups to
directory objects in the same way, independent of the disk format.
- factor out the object name fixup (into the previously factored out
function) to ensure that filenames are appropriately translated -
for example, adfs allows '/' in filenames, which, being the Unix
path separator, needs to be translated to a different character,
which is normally '.' (DOS 8.3 filenames represent the . as a / on
adfs, so this is the expected reverse translation.)
- remove filename truncation; Al asked about this and apparently the
decision is to remove it. In any case, adfs's truncation was buggy,
so this rids us of that bug by removing the truncation feature.
- we now have only one location which adds the "filetype" suffix to
the filename, so there's no point that code being out of line.
- since we translate '/' into '.', an adfs filename of "/" or "//"
would end up being translated to "." and ".." which have special
meanings. In this case, change the first character to "^" to avoid
these special directory names being abused"
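The name translation described in the last two items can be sketched in
user space as follows. This is only an illustrative model, not the
kernel's adfs code: the function name fixup_name() and its signature are
hypothetical, and it assumes that only names made up entirely of
translated '/' characters get the "^" treatment.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the adfs name fixup described above.
 * Every '/' in name[0..len) becomes '.'. If the name is one or two
 * characters long and consisted only of '/' characters, it would now
 * read "." or "..", so its first character is changed to '^'.
 */
static void fixup_name(char *name, size_t len)
{
	size_t i, translated = 0;

	assert(name != NULL);

	for (i = 0; i < len; i++) {
		if (name[i] == '/') {
			name[i] = '.';
			translated++;
		}
	}

	/* "/" and "//" must not turn into the special names "." and "..". */
	if (len >= 1 && len <= 2 && translated == len)
		name[0] = '^';
}
```

With this model, "/" becomes "^", "//" becomes "^." and "a/b" becomes
"a.b".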
* tag 'for-rc-adfs' of git://git.armlinux.org.uk/~rmk/linux-arm:
fs/adfs: fix filename fixup handling for "/" and "//" names
fs/adfs: move append_filetype_suffix() into adfs_object_fixup()
fs/adfs: remove truncated filename hashing
fs/adfs: factor out filename fixup
fs/adfs: factor out object fixups
fs/adfs: factor out filename case lowering
fs/adfs: factor out filename comparison
Maxime Chevallier [Thu, 6 Jun 2019 08:42:56 +0000 (10:42 +0200)]
net: mvpp2: Use strscpy to handle stat strings
Use a safe strscpy call to copy the ethtool stat strings into the
relevant buffers, instead of a memcpy that would access
out-of-bounds data.
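To illustrate why a bounded string copy avoids the problem, here is a
minimal user-space sketch. It is not the kernel's strscpy() and not the
driver code: bounded_copy() and STAT_NAME_LEN are hypothetical names, and
the -1 return on truncation is a simplification chosen for this example.
The point is that the copy stops at the source's terminating NUL, whereas
a fixed-length memcpy of the whole slot would read past the end of a
shorter source string.

```c
#include <assert.h>
#include <stddef.h>

#define STAT_NAME_LEN 32	/* assumed slot size, for illustration only */

/*
 * Hypothetical bounded copy: copies at most size - 1 characters from
 * src, never reads past src's terminating NUL, and always
 * NUL-terminates dst when size > 0.
 * Returns the number of characters copied, or -1 if src was truncated.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	if (size == 0)
		return -1;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -1;
}
```

Copying a short name into a STAT_NAME_LEN-sized slot touches only the
bytes of the name plus its terminator.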
Fixes: 118d6298f6f0 ("net: mvpp2: add ethtool GOP statistics")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Thu, 6 Jun 2019 08:00:03 +0000 (04:00 -0400)]
net: rds: fix memory leak in rds_ib_flush_mr_pool
When the following tests last for several hours, the problem will occur.
Server:
rds-stress -r 1.1.1.16 -D 1M
Client:
rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
The following will occur.
"
Starting up....
tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu %
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
"
From vmcore, we can find that clean_list is NULL.
From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
Then rds_ib_mr_pool_flush_worker calls
"
rds_ib_flush_mr_pool(pool, 0, NULL);
"
Then in function
"
int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
int free_all, struct rds_ib_mr **ibmr_ret)
"
ibmr_ret is NULL.
In the source code,
"
...
list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
if (ibmr_ret)
*ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
/* more than one entry in llist nodes */
if (clean_nodes->next)
llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
...
"
When ibmr_ret is NULL, llist_entry() is not executed, yet
clean_nodes->next rather than clean_nodes is still added to clean_list.
The head node, clean_nodes, is therefore discarded and can never be used
again. The workqueue is executed periodically, so more and more
clean_nodes are discarded until the clean_list is finally NULL.
At that point the problem above occurs.
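The leak can be modelled in user space with a plain singly linked list.
The sketch below is an assumption-laden illustration, not the rds code:
struct node, add_batch(), recycle_buggy() and recycle_fixed() are
hypothetical stand-ins for the llist helpers and the snippet quoted
above. recycle_fixed() shows one possible way to keep the head node when
the caller does not ask for it.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
};

/* Push the chain first..last onto the front of *head. */
static void add_batch(struct node *first, struct node *last,
		      struct node **head)
{
	last->next = *head;
	*head = first;
}

static size_t list_len(const struct node *n)
{
	size_t count = 0;

	for (; n != NULL; n = n->next)
		count++;
	return count;
}

/* Shape of the quoted code: the head is dropped when ret is NULL. */
static void recycle_buggy(struct node *clean_nodes, struct node *clean_tail,
			  struct node **clean_list, struct node **ret)
{
	assert(clean_nodes != NULL && clean_tail != NULL);

	if (ret)
		*ret = clean_nodes;
	if (clean_nodes->next)
		add_batch(clean_nodes->next, clean_tail, clean_list);
}

/* Only skip the head when it is actually handed to the caller. */
static void recycle_fixed(struct node *clean_nodes, struct node *clean_tail,
			  struct node **clean_list, struct node **ret)
{
	assert(clean_nodes != NULL && clean_tail != NULL);

	if (ret) {
		*ret = clean_nodes;
		clean_nodes = clean_nodes->next;
	}
	if (clean_nodes)
		add_batch(clean_nodes, clean_tail, clean_list);
}
```

Calling recycle_buggy() with ret == NULL on a three-node chain leaves
only two nodes on the clean list, while recycle_fixed() keeps all three.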
Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 6 Jun 2019 17:29:21 +0000 (10:29 -0700)]
Merge branch 'ipv6-fix-EFAULT-on-sendto-with-icmpv6-and-hdrincl'
Olivier Matz says:
====================
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
The following code returns EFAULT (Bad address):
s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */
The problem is fixed in the second patch. The first one aligns the
code to ipv4, to avoid a race condition in the second patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Olivier Matz [Thu, 6 Jun 2019 07:15:19 +0000 (09:15 +0200)]
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
The following code returns EFAULT (Bad address):
s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */
The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
instead of IPPROTO_ICMPV6.
The failure happens because 2 bytes are eaten from the msghdr by
rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6
equivalent of "ipv4: Avoid reading user iov twice after
raw_probe_proto_opt""), but at that time it was not a problem because
IPV6_HDRINCL was not yet introduced.
Only eat these 2 bytes if hdrincl == 0.
Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Olivier Matz [Thu, 6 Jun 2019 07:15:18 +0000 (09:15 +0200)]
ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
As was done in commit 8f659a03a0ba ("net: ipv4: fix for a race
condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate
READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
value of inet->hdrincl into a local variable, to avoid introducing a race
condition in the next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Max Gurtovoy [Thu, 6 Jun 2019 09:27:36 +0000 (12:27 +0300)]
nvme-rdma: use dynamic dma mapping per command
Commit 87fd125344d6 ("nvme-rdma: remove redundant reference between
ib_device and tagset") caused a kernel panic when disconnecting from an
inaccessible controller (disconnect during re-connection).
--
nvme nvme0: Removing ctrl: NQN "testnqn1"
nvme_rdma: nvme_rdma_exit_request: hctx 0 queue_idx 1
BUG: unable to handle kernel paging request at 0000000080000228
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
...
Call Trace:
blk_mq_exit_hctx+0x5c/0xf0
blk_mq_exit_queue+0xd4/0x100
blk_cleanup_queue+0x9a/0xc0
nvme_rdma_destroy_io_queues+0x52/0x60 [nvme_rdma]
nvme_rdma_shutdown_ctrl+0x3e/0x80 [nvme_rdma]
nvme_do_delete_ctrl+0x53/0x80 [nvme_core]
nvme_sysfs_delete+0x45/0x60 [nvme_core]
kernfs_fop_write+0x105/0x180
vfs_write+0xad/0x1a0
ksys_write+0x5a/0xd0
do_syscall_64+0x55/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa215417154
--
The crash is caused by accessing an already freed ib_device to perform
dma_unmap during exit_request. The root cause is that during
re-connection all the queues are destroyed and re-created (and the
ib_device, which is reference counted by the queues, is freed as well),
but the tagset stays alive and all the DMA mappings (which we perform in
init_request) are kept in the request context. The original commit fixed
a different bug, exposed during bonding (aka NIC teaming) tests, in which
some scenarios change the underlying ib_device and cause memory leaks
and a possible segmentation fault. This commit complements it: it also
replaces the wrong DMA mappings that were saved in the request context
by making the request sqe DMA mappings dynamic, tied to the command
lifetime (i.e. mapped in .queue_rq and unmapped in .complete). It also
fixes the above crash of accessing a freed ib_device during destruction
of the tagset.
Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reported-by: Jim Harris <james.r.harris@intel.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>