Linus Torvalds [Mon, 28 Aug 2023 19:15:00 +0000 (12:15 -0700)]
Merge tag 'fscrypt-for-linus' of git://git./fs/fscrypt/linux
Pull fscrypt update from Eric Biggers:
"Just a small documentation improvement"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: improve the "Encryption modes and usage" section
Linus Torvalds [Mon, 28 Aug 2023 18:59:52 +0000 (11:59 -0700)]
Merge tag 'iomap-6.6-merge-3' of git://git./fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"We've got some big changes for this release -- I'm very happy to be
landing willy's work to enable large folios for the page cache for
general read and write IOs when the fs can make contiguous space
allocations, and Ritesh's work to track sub-folio dirty state to
eliminate the write amplification problems inherent in using large
folios.
As a bonus, io_uring can now process write completions in the caller's
context instead of bouncing through a workqueue, which should reduce
io latency dramatically. IOWs, XFS should see a nice performance bump
for both IO paths.
Summary:
- Make large writes to the page cache fill sparse parts of the cache
with large folios, then use large memcpy calls for the large folio.
- Track the per-block dirty state of each large folio so that a
buffered write to a single byte on a large folio does not result in
a (potentially) multi-megabyte writeback IO.
- Allow some directio completions to be performed in the initiating
task's context instead of punting through a workqueue. This will
reduce latency for some io_uring requests"
* tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
iomap: support IOCB_DIO_CALLER_COMP
io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
fs: add IOCB flags related to passing back dio completions
iomap: add IOMAP_DIO_INLINE_COMP
iomap: only set iocb->private for polled bio
iomap: treat a write through cache the same as FUA
iomap: use an unsigned type for IOMAP_DIO_* defines
iomap: cleanup up iomap_dio_bio_end_io()
iomap: Add per-block dirty state tracking to improve performance
iomap: Allocate ifs in ->write_begin() early
iomap: Refactor iomap_write_delalloc_punch() function out
iomap: Use iomap_punch_t typedef
iomap: Fix possible overflow condition in iomap_write_delalloc_scan
iomap: Add some uptodate state handling helpers for ifs state bitmap
iomap: Drop ifs argument from iomap_set_range_uptodate()
iomap: Rename iomap_page to iomap_folio_state and others
iomap: Copy larger chunks from userspace
iomap: Create large folios in the buffered write path
filemap: Allow __filemap_get_folio to allocate large folios
filemap: Add fgf_t typedef
...
Linus Torvalds [Mon, 28 Aug 2023 18:52:10 +0000 (11:52 -0700)]
Merge tag 'erofs-for-6.6-rc1' of git://git./linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, a xattr bloom filter feature is introduced to speed up
negative xattr lookups, which was originally suggested by Alexander
for Composefs use cases.
Additionally, the DEFLATE algorithm is now supported, which can be
used together with hardware accelerators for our cloud workloads. Each
supported compression algorithm can be selected on a per-file basis
for specific access patterns too.
There are also some random fixes and cleanups as usual:
- Support xattr bloom filter to optimize negative xattr lookups
- Support DEFLATE compression algorithm as an alternative
- Fix a regression where ztailpacking pclusters were not released properly
- No longer warn about the dedupe and fragments features
- Some folio conversions and cleanups"
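The idea behind the xattr bloom filter can be sketched in a few lines of Python. This is an illustration under assumptions, not the erofs on-disk format: the 32-bit filter width, the use of a single hash, and zlib.crc32 as that hash are all stand-ins picked for the example. A clear bit proves the name is absent, so a negative lookup can return without scanning the xattr list.

```python
import zlib

FILTER_BITS = 32  # assumed filter width for this sketch


def _bit_for(name):
    """Map an xattr name to one bit position (crc32 is a stand-in hash)."""
    return zlib.crc32(name.encode()) % FILTER_BITS


def build_filter(names):
    """Set one bit per xattr name that exists on the inode."""
    bits = 0
    for name in names:
        bits |= 1 << _bit_for(name)
    return bits


def may_have_xattr(filter_bits, name):
    """False means definitely absent; True means a real lookup is needed."""
    return bool(filter_bits & (1 << _bit_for(name)))


def lookup(xattrs, filter_bits, name):
    """Look up `name`, skipping the scan when the filter rules it out."""
    if not may_have_xattr(filter_bits, name):
        return None              # fast negative lookup
    return xattrs.get(name)      # may still miss: false positives are possible


xattrs = {"user.comment": b"hello", "security.selinux": b"ctx"}
name_filter = build_filter(xattrs)
print(lookup(xattrs, name_filter, "user.comment"))
```

A filter like this never produces false negatives, which is what makes it safe to trust on the negative path.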
* tag 'erofs-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: release ztailpacking pclusters properly
erofs: don't warn dedupe and fragments features anymore
erofs: adapt folios for z_erofs_read_folio()
erofs: adapt folios for z_erofs_readahead()
erofs: get rid of fe->backmost for cache decompression
erofs: drop z_erofs_page_mark_eio()
erofs: tidy up z_erofs_do_read_page()
erofs: move preparation logic into z_erofs_pcluster_begin()
erofs: avoid obsolete {collector,collection} terms
erofs: simplify z_erofs_read_fragment()
erofs: remove redundant erofs_fs_type declaration in super.c
erofs: add necessary kmem_cache_create flags for erofs inode cache
erofs: clean up redundant comment and adjust code alignment
erofs: refine warning messages for zdata I/Os
erofs: boost negative xattr lookup with bloom filter
erofs: update on-disk format for xattr name filter
erofs: DEFLATE compression support
Linus Torvalds [Mon, 28 Aug 2023 18:47:24 +0000 (11:47 -0700)]
Merge tag 'filelock-v6.6' of git://git./linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
- new functionality for F_OFD_GETLK: requesting a type of F_UNLCK will
find info about whatever lock happens to be first in the given range,
regardless of type.
- an OFD lock selftest
- bugfix involving a UAF in a tracepoint
- comment typo fix
* tag 'filelock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock
fs/locks: Fix typo
selftests: add OFD lock tests
fs/locks: F_UNLCK extension for F_OFD_GETLK
Linus Torvalds [Mon, 28 Aug 2023 18:43:19 +0000 (11:43 -0700)]
Merge tag 'v6.6-fs.proc.uapi' of git://git./linux/kernel/git/vfs/vfs
Pull procfs fixes from Christian Brauner:
"Mode changes to files under /proc/<pid>/ aren't supported ever since
commit 6d76fa58b050 ("Don't allow chmod() on the /proc/<pid>/ files").
Due to an oversight in commit 1b3044e39a89 ("procfs: fix pthread
cross-thread naming if !PR_DUMPABLE") in switching from REG to NOD,
mode changes on /proc/thread-self/comm were accidentally allowed.
Similarly, mode changes for all files beneath /proc/<pid>/net/ are
blocked but mode changes on /proc/<pid>/net itself were accidentally
allowed.
Both issues come down to not using the generic proc_setattr() helper
which blocks all mode changes. This is rectified with this pull
request.
This also removes a strange nolibc test that abused /proc/<pid>/net
for testing mode changes. Using procfs for this test never made a lot
of sense given procfs has special semantics for almost everything
anyway.
Both changes are minor user-visible changes. It is however very
unlikely that mode changes on /proc/<pid>/net and
/proc/thread-self/comm are something that userspace relies on"
* tag 'v6.6-fs.proc.uapi' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
procfs: block chmod on /proc/thread-self/comm
proc: use generic setattr() for /proc/$PID/net
selftests/nolibc: drop test chmod_net
Linus Torvalds [Mon, 28 Aug 2023 18:39:14 +0000 (11:39 -0700)]
Merge tag 'v6.6-vfs.autofs' of git://git./linux/kernel/git/vfs/vfs
Pull autofs fixes from Christian Brauner:
"This fixes a memory leak in autofs reported by syzkaller and a missing
conversion from uninterruptible to interruptible wake up when autofs
is in catatonic mode"
* tag 'v6.6-vfs.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
autofs: use wake_up() instead of wake_up_interruptible(()
autofs: fix memory leak of waitqueues in autofs_catatonic_mode
Linus Torvalds [Mon, 28 Aug 2023 18:25:27 +0000 (11:25 -0700)]
Merge tag 'v6.6-vfs.fchmodat2' of git://git./linux/kernel/git/vfs/vfs
Pull fchmodat2 system call from Christian Brauner:
"This adds the fchmodat2() system call. It is a revised version of the
fchmodat() system call, adding a missing flag argument. Support for
both AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH is included.
Adding this system call revision has been a longstanding request but
so far has always fallen through the cracks. While the kernel
implementation of fchmodat() does not have a flag argument, the
libc-provided POSIX-compliant fchmodat(3) version does. Both glibc and musl
have to implement a workaround in order to support AT_SYMLINK_NOFOLLOW
(see [1] and [2]).
The workaround is brittle because it relies not just on O_PATH and
O_NOFOLLOW semantics and procfs magic links but also on our rather
inconsistent symlink semantics.
This gives userspace a proper fchmodat2() system call that libcs can
use to properly implement fchmodat(3) and allows them to get rid of
their hacks. In this case it will immediately benefit them as the
current workaround is already defunct because of the aforementioned
inconsistencies.
In addition to AT_SYMLINK_NOFOLLOW, give userspace the ability to use
AT_EMPTY_PATH with fchmodat2(). This is already possible with
fchownat() so there's no reason to not also support it for
fchmodat2().
The implementation is simple and comes with selftests. Implementation
of the system call and wiring up the system call are done as separate
patches even though they could arguably be one patch. But in case
there are merge conflicts from other system call additions it can be
beneficial to have separate patches"
Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35 [1]
Link: https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28 [2]
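Until a libc wrapper is available, the new system call can be exercised directly. The following Python sketch shows one way to do that through ctypes. It rests on assumptions: syscall number 452 (the shortlog below says "usually", so it is not universal), the flag values from the Linux uapi headers, and a kernel new enough to have the call. The helper names fchmodat2() and demo() are hypothetical, introduced only for this example.

```python
import ctypes
import errno
import os
import stat
import sys
import tempfile

SYS_FCHMODAT2 = 452            # assumption: valid on most architectures only
AT_FDCWD = -100
AT_SYMLINK_NOFOLLOW = 0x100
AT_EMPTY_PATH = 0x1000


def fchmodat2(dirfd, path, mode, flags=0):
    """Invoke fchmodat2() via syscall(2); raise OSError if it fails."""
    if not sys.platform.startswith("linux"):
        raise OSError(errno.ENOSYS, "fchmodat2 is Linux-only")
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(
        ctypes.c_long(SYS_FCHMODAT2),
        ctypes.c_long(dirfd),
        ctypes.c_char_p(os.fsencode(path)),
        ctypes.c_long(mode),
        ctypes.c_long(flags),
    )
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def demo():
    """Try the call on a scratch file and report what this kernel did."""
    with tempfile.NamedTemporaryFile() as scratch:
        try:
            fchmodat2(AT_FDCWD, scratch.name, 0o640, AT_SYMLINK_NOFOLLOW)
        except OSError as exc:
            # Older kernels return ENOSYS; sandboxes may return other errors.
            return "unavailable: " + os.strerror(exc.errno)
        mode = stat.S_IMODE(os.stat(scratch.name).st_mode)
        return "mode is now " + oct(mode)


print(demo())
```

On kernels that predate the call, the sketch reports the error rather than falling back to the procfs workaround described above.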
* tag 'v6.6-vfs.fchmodat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: fchmodat2: remove duplicate unneeded defines
fchmodat2: add support for AT_EMPTY_PATH
selftests: Add fchmodat2 selftest
arch: Register fchmodat2, usually as syscall 452
fs: Add fchmodat2()
Non-functional cleanup of a "__user * filename"
Linus Torvalds [Mon, 28 Aug 2023 18:04:18 +0000 (11:04 -0700)]
Merge tag 'v6.6-vfs.super' of git://git./linux/kernel/git/vfs/vfs
Pull superblock updates from Christian Brauner:
"This contains the super rework that was ready for this cycle. The
first part changes the order of how we open block devices and allocate
superblocks, contains various cleanups, simplifications, and a new
mechanism to wait on superblock state changes.
This unblocks work to ultimately limit the number of writers to a
block device. Jan has already scheduled follow-up work that will be
ready for v6.7 and allows us to restrict the number of writers to a
given block device. That series builds on this work right here.
The second part contains filesystem freezing updates.
Overview:
The generic superblock changes are roughly organized as follows
(ignoring additional minor cleanups):
(1) Removal of the bd_super member from struct block_device.
This was a very odd back pointer to struct super_block with
unclear rules. For all relevant places we have other means to get
the same information so just get rid of this.
(2) Simplify rules for superblock cleanup.
Roughly, everything that is allocated during fs_context
initialization and that's stored in fs_context->s_fs_info needs
to be cleaned up by the fs_context->free() implementation before
the superblock allocation function has been called successfully.
After sget_fc() returned fs_context->s_fs_info has been
transferred to sb->s_fs_info at which point sb->kill_sb() is
fully responsible for cleanup. Adhering to these rules means that
cleanup of sb->s_fs_info in fill_super() is to be avoided as it's
brittle and inconsistent.
Cleanup shouldn't be duplicated between sb->put_super() as
sb->put_super() is only called if sb->s_root has been set aka
when the filesystem has been successfully born (SB_BORN). That
complexity should be avoided.
This also means that block devices are to be closed in
sb->kill_sb() instead of sb->put_super(). More details in the
lower section.
(3) Make it possible to lookup or create a superblock before opening
block devices.
There's a subtle dependency on (2) as some filesystems did rely
on fill_super() to be called in order to correctly clean up
sb->s_fs_info. All these filesystems have been fixed.
(4) Switch most filesystems to follow the same logic as the generic
mount code now does as outlined in (3).
(5) Use the superblock as the holder of the block device. We can now
easily go back from block device to owning superblock.
(6) Export and extend the generic fs_holder_ops and use them as
holder ops everywhere and remove the filesystem specific holder
ops.
(7) Call from the block layer up into the filesystem layer when the
block device is removed, allowing the filesystem to be shut down
without risk of deadlocks.
(8) Get rid of get_super().
We can now easily go back from the block device to owning
superblock and can call up from the block layer into the
filesystem layer when the device is removed. So no need to wade
through all registered superblocks to find the owning superblock
anymore"
Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/
* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
super: use higher-level helper for {freeze,thaw}
super: wait until we passed kill super
super: wait for nascent superblocks
super: make locking naming consistent
super: use locking helpers
fs: simplify invalidate_inodes
fs: remove get_super
block: call into the file system for ioctl BLKFLSBUF
block: call into the file system for bdev_mark_dead
block: consolidate __invalidate_device and fsync_bdev
block: drop the "busy inodes on changed media" log message
dasd: also call __invalidate_device when setting the device offline
amiflop: don't call fsync_bdev in FDFMTBEG
floppy: call disk_force_media_change when changing the format
block: simplify the disk_force_media_change interface
nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
xfs use fs_holder_ops for the log and RT devices
xfs: drop s_umount over opening the log and RT devices
ext4: use fs_holder_ops for the log device
ext4: drop s_umount over opening the log device
...
Linus Torvalds [Mon, 28 Aug 2023 17:17:14 +0000 (10:17 -0700)]
Merge tag 'v6.6-vfs.misc' of git://git./linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual filesystems.
Features:
- Block mode changes on symlinks and rectify our broken semantics
- Report file modifications via fsnotify() for splice
- Allow specifying an explicit timeout for the "rootwait" kernel
command line option. This makes it possible to time out and reboot
instead of always waiting indefinitely for the root device to show up
- Use synchronous fput for the close system call
Cleanups:
- Get rid of open-coded lockdep workarounds for async io submitters
and replace it all with a single consolidated helper
- Simplify epoll allocation helper
- Convert simple_write_begin and simple_write_end to use a folio
- Convert page_cache_pipe_buf_confirm() to use a folio
- Simplify __range_close to avoid pointless locking
- Disable per-cpu buffer head cache for isolated cpus
- Port ecryptfs to kmap_local_page() api
- Remove redundant initialization of pointer buf in pipe code
- Unexport the d_genocide() function which is only used within core
vfs
- Replace printk(KERN_ERR) and WARN_ON() with WARN()
Fixes:
- Fix various kernel-doc issues
- Fix refcount underflow for eventfds when used as EFD_SEMAPHORE
- Fix a mainly theoretical issue in devpts
- Check the return value of __getblk() in reiserfs
- Fix a racy assert in i_readcount_dec
- Fix integer conversion issues in various functions
- Fix LSM security context handling during automounts that prevented
NFS superblock sharing"
* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
cachefiles: use kiocb_{start,end}_write() helpers
ovl: use kiocb_{start,end}_write() helpers
aio: use kiocb_{start,end}_write() helpers
io_uring: use kiocb_{start,end}_write() helpers
fs: create kiocb_{start,end}_write() helpers
fs: add kerneldoc to file_{start,end}_write() helpers
io_uring: rename kiocb_end_write() local helper
splice: Convert page_cache_pipe_buf_confirm() to use a folio
libfs: Convert simple_write_begin and simple_write_end to use a folio
fs/dcache: Replace printk and WARN_ON by WARN
fs/pipe: remove redundant initialization of pointer buf
fs: Fix kernel-doc warnings
devpts: Fix kernel-doc warnings
doc: idmappings: fix an error and rephrase a paragraph
init: Add support for rootwait timeout parameter
vfs: fix up the assert in i_readcount_dec
fs: Fix one kernel-doc comment
docs: filesystems: idmappings: clarify from where idmappings are taken
fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
...
Linus Torvalds [Mon, 28 Aug 2023 16:55:25 +0000 (09:55 -0700)]
Merge tag 'v6.6-vfs.tmpfs' of git://git./linux/kernel/git/vfs/vfs
Pull libfs and tmpfs updates from Christian Brauner:
"This cycle saw a lot of work for tmpfs that required changes to the
vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this
cycle. Things will go back to mm next cycle.
Features
========
- By far the biggest work is the quota support for tmpfs. New tmpfs
quota infrastructure is added to support it and a new QFMT_SHMEM
uapi option is exposed.
This offers user and group quotas to tmpfs (project quotas will be
added later). Similar to other filesystems, tmpfs quotas are not
supported within user namespaces yet.
- Add support for user xattrs. While tmpfs has long supported security
xattrs (security.*) and POSIX ACLs, it lacked
support for user xattrs (user.*). With this pull request tmpfs will
be able to support a limited number of user xattrs.
This is accompanied by a fix (see below) to limit persistent simple
xattr allocations.
- Add support for stable directory offsets. Currently tmpfs relies on
the libfs provided cursor-based mechanism for readdir. This causes
issues when a tmpfs filesystem is exported via NFS.
NFS clients do not open directories. Instead, each server-side
readdir operation opens the directory, reads it, and then closes
it. Since the cursor state for that directory is associated with
the opened file it is discarded after each readdir operation. Such
directory offsets are not just cached by NFS clients but also
various userspace libraries based on these clients.
As it stands there is no way to invalidate the caches when
directory offsets have changed and the whole application depends on
unchanging directory offsets.
At LSFMM we discussed how to solve this problem and decided to
support stable directory offsets. libfs now allows filesystems like
tmpfs to use an xarray to map a directory offset to a dentry. This
mechanism is currently only used by tmpfs but can be supported by
others as well.
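The offset-to-dentry mapping described above can be modelled in a few lines of Python. This is a hedged sketch, not the libfs code: a dict stands in for the xarray, the class and method names are hypothetical, and starting at offset 2 (leaving 0 and 1 for "." and "..") is an assumption. What it shows is that readdir needs no per-open cursor, so a client can resume from a cached offset even after other entries have been removed.

```python
class StableDir:
    """Toy directory whose entries keep their offset for their lifetime."""

    def __init__(self):
        self._by_offset = {}      # offset -> name (stand-in for the xarray)
        self._next_offset = 2     # assumption: 0 and 1 are "." and ".."

    def create(self, name):
        """Add an entry and return the offset assigned to it."""
        offset = self._next_offset
        self._next_offset += 1
        self._by_offset[offset] = name
        return offset

    def unlink(self, name):
        """Remove an entry; offsets of the remaining entries do not move."""
        for offset, entry in list(self._by_offset.items()):
            if entry == name:
                del self._by_offset[offset]
                return
        raise FileNotFoundError(name)

    def readdir(self, start, count):
        """Return up to `count` (offset, name) pairs at or after `start`."""
        offsets = sorted(o for o in self._by_offset if o >= start)
        return [(o, self._by_offset[o]) for o in offsets[:count]]


d = StableDir()
for name in ("a", "b", "c", "d"):
    d.create(name)
first = d.readdir(0, 2)            # client caches the last offset it saw
d.unlink("a")                      # directory changes between calls
print(d.readdir(first[-1][0] + 1, 10))
```

Because each call is stateless, it does not matter that an NFS server opens and closes the directory around every readdir operation.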
Fixes
=====
- Change persistent simple xattrs allocations in libfs from
GFP_KERNEL to GFP_KERNEL_ACCOUNT so they're subject to memory
cgroup limits. Since this is a change to libfs it affects both
tmpfs and kernfs.
- Correctly verify {g,u}id mount options.
A new filesystem context is created via fsopen() which records the
namespace that becomes the owning namespace of the superblock when
fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are
mountable in namespaces. However, fsconfig() calls can occur in a
namespace different from the namespace where fsopen() has been
called.
Currently, when fsconfig() is called to set {g,u}id mount options
the requested {g,u}id is mapped into a k{g,u}id according to the
namespace where fsconfig() was called from. The resulting k{g,u}id
is not guaranteed to be resolvable in the namespace of the
filesystem (the one that fsopen() was called in).
This means it's possible for an unprivileged user to create files
owned by any group in a tmpfs mount since it's possible to set the
setid bits on the tmpfs directory.
The contract for {g,u}id mount options and {g,u}id values in
general set from userspace has always been that they are translated
according to the caller's idmapping. In that respect, tmpfs has been
doing the correct thing. But since tmpfs is mountable in
unprivileged contexts it is also necessary to verify that the
resulting k{g,u}id is representable in the namespace of the
superblock to avoid such bugs.
The new mount api's cross-namespace delegation abilities are
already widely used. Having talked to a bunch of userspace developers, this is
the most faithful solution with minimal regression risks"
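The {g,u}id verification described in the fixes above boils down to two mapping steps, sketched here in Python. This is a simplified model under assumptions: the IdMap class, the parse_id_option() helper and the example ranges are hypothetical, and real idmappings carry more state than a list of ranges. The value is first translated through the caller's idmapping, then checked for representability in the idmapping of the namespace that owns the superblock.

```python
class IdMap:
    """uid_map/gid_map-style ranges: (id in namespace, kernel id, count)."""

    def __init__(self, *ranges):
        self.ranges = ranges

    def to_kernel(self, ns_id):
        """Translate a namespace-relative id into a kernel id, or None."""
        for inside, kernel, count in self.ranges:
            if inside <= ns_id < inside + count:
                return kernel + (ns_id - inside)
        return None

    def from_kernel(self, kernel_id):
        """Translate a kernel id back into this namespace, or None."""
        for inside, kernel, count in self.ranges:
            if kernel <= kernel_id < kernel + count:
                return inside + (kernel_id - kernel)
        return None


def parse_id_option(value, caller_map, fs_map):
    """Resolve a uid=/gid= mount option the way the fix describes it."""
    kernel_id = caller_map.to_kernel(value)
    if kernel_id is None:
        raise ValueError("id is not mapped in the caller's namespace")
    if fs_map.from_kernel(kernel_id) is None:
        raise ValueError("id is not representable in the filesystem's namespace")
    return kernel_id


initial_ns = IdMap((0, 0, 2 ** 32))          # identity mapping
container_ns = IdMap((0, 100000, 65536))     # hypothetical unprivileged userns
print(parse_id_option(5, container_ns, container_ns))
```

With only the first step, a request for gid 0 issued from the initial namespace against a filesystem context owned by the container namespace would be accepted, which is the gap the fix closes.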
* tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
mm: invalidation check mapping before folio_contains
tmpfs: trivial support for direct IO
tmpfs,xattr: enable limited user extended attributes
tmpfs: track free_ispace instead of free_inodes
xattr: simple_xattr_set() return old_xattr to be freed
tmpfs: verify {g,u}id mount options correctly
shmem: move spinlock into shmem_recalc_inode() to fix quota support
libfs: Remove parent dentry locking in offset_iterate_dir()
libfs: Add a lock class for the offset map's xa_lock
shmem: stable directory offsets
shmem: Refactor shmem_symlink()
libfs: Add directory operations for stable offsets
shmem: fix quota lock nesting in huge hole handling
shmem: Add default quota limit mount options
shmem: quota support
shmem: prepare shmem quota infrastructure
quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks
shmem: make shmem_get_inode() return ERR_PTR instead of NULL
shmem: make shmem_inode_acct_block() return error
Linus Torvalds [Mon, 28 Aug 2023 16:31:32 +0000 (09:31 -0700)]
Merge tag 'v6.6-vfs.ctime' of git://git./linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner:
"This adds VFS support for multi-grain timestamps and converts tmpfs,
xfs, ext4, and btrfs to use them. This carries acks from all relevant
filesystems.
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot of metadata updates, down to around 1 per
jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support
a change attribute and are subject to the same problems with timestamp
granularity. Other applications have similar issues with timestamps
(e.g., backup applications).
If we were to always use fine-grained timestamps, that would improve
the situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
This introduces fine-grained timestamps that are used when they are
actively queried.
This uses the 31st bit of the ctime tv_nsec field to indicate that
something has queried the inode for the mtime or ctime. When this flag
is set, on the next mtime or ctime update, the kernel will fetch a
fine-grained timestamp instead of the usual coarse-grained one.
As POSIX generally mandates that when the mtime changes, the ctime
must also change, the kernel always stores normalized ctime values, so
only the first 30 bits of the tv_nsec field are ever used.
Filesystems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.
Various preparatory changes, fixes and cleanups are included:
- Fixup all relevant places where POSIX requires updating ctime
together with mtime. This is a wide range of places and all
maintainers provided necessary Acks.
- Add new accessors for inode->i_ctime directly and change all
callers to rely on them. Plain accesses to inode->i_ctime are now
gone and it is accordingly renamed to inode->__i_ctime and commented
as requiring accessors.
- Extend generic_fillattr() to pass in a request mask mirroring in a
sense the statx() uapi. This allows callers to pass in a request
mask to only get a subset of attributes filled in.
- Rework timestamp updates so it's possible to drop the @now
parameter from the update_time() inode operation and associated helpers.
- Add inode_update_timestamps() and convert all filesystems to it,
removing a bunch of open-coding"
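The flag-in-tv_nsec scheme described above can be modelled as follows. This is a Python sketch under stated assumptions, not the kernel code: "the 31st bit" is read as bit index 30, the 4 ms coarse tick is a hypothetical jiffy length, and the Inode class and its method names are invented for the example. Nanosecond values stay below 10^9, which is less than 2^30, so the flag bit never collides with a valid tv_nsec.

```python
QUERIED = 1 << 30              # assumption: "31st bit" means bit index 30
NSEC_MASK = QUERIED - 1        # low 30 bits hold the nanoseconds
NSEC_PER_SEC = 1_000_000_000
COARSE_TICK_NS = 4_000_000     # hypothetical jiffy length (250 Hz)


class Inode:
    """Toy inode that stores the 'queried' flag inside the ctime nsec field."""

    def __init__(self):
        self.ctime_sec = 0
        self._ctime_nsec = 0

    def getattr_ctime(self):
        """Report the ctime and remember that somebody looked at it."""
        self._ctime_nsec |= QUERIED
        return self.ctime_sec, self._ctime_nsec & NSEC_MASK

    def update_ctime(self, now_ns):
        """Stamp a change observed at clock reading `now_ns`."""
        if self._ctime_nsec & QUERIED:
            stamp = now_ns                               # fine-grained
        else:
            stamp = now_ns - now_ns % COARSE_TICK_NS     # coarse-grained
        # Storing a normalized value (< 10**9) also clears the flag bit.
        self.ctime_sec, self._ctime_nsec = divmod(stamp, NSEC_PER_SEC)


inode = Inode()
now = 5 * NSEC_PER_SEC + 123_456_789
inode.update_ctime(now)              # nobody has looked yet: coarse stamp
print(inode.getattr_ctime())         # (5, 120000000)
inode.update_ctime(now + 1_000)      # queried since last update: fine stamp
print(inode.getattr_ctime())         # (5, 123457789)
```

The second update lands in the same coarse tick as the first, yet the observer still sees the timestamp change, which is what an NFS client needs to invalidate its cache.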
* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
tmpfs: add support for multigrain timestamps
fs: add infrastructure for multigrain timestamps
fs: drop the timespec64 argument from update_time
xfs: have xfs_vn_update_time gets its own timestamp
fat: make fat_update_time get its own timestamp
fat: remove i_version handling from fat_update_time
ubifs: have ubifs_update_time use inode_update_timestamps
btrfs: have it use inode_update_timestamps
fs: drop the timespec64 arg from generic_update_time
fs: pass the request_mask to generic_fillattr
fs: remove silly warning from current_time
gfs2: fix timestamp handling on quota inodes
fs: rename i_ctime field to __i_ctime
selinux: convert to ctime accessor functions
security: convert to ctime accessor functions
apparmor: convert to ctime accessor functions
sunrpc: convert to ctime accessor functions
...
Helge Deller [Mon, 28 Aug 2023 15:29:46 +0000 (17:29 +0200)]
parisc: ccio-dma: Create private runway procfs root entry
Create a separate procfs "runway" root entry for the CCIO driver.
No need to share it with the sba_iommu driver, as only one
of those buses can be active in one machine anyway.
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 547259580dfa ("parisc: Move proc_mckinley_root and proc_runway_root to sba_iommu")
Cc: <stable@vger.kernel.org> # v6.5
Linus Torvalds [Mon, 28 Aug 2023 16:00:09 +0000 (09:00 -0700)]
Merge tag 'v6.6-vfs.fs_context' of git://git./linux/kernel/git/vfs/vfs
Pull mount API updates from Christian Brauner:
"This introduces FSCONFIG_CMD_CREATE_EXCL which allows userspace to
implement something like
$ mount -t ext4 --exclusive /dev/sda /B
which fails if a superblock for the requested filesystem already
exists instead of silently reusing an existing superblock.
Without it, in the sequence
$ move-mount -f xfs -o source=/dev/sda4 /A
$ move-mount -f xfs -o noacl,source=/dev/sda4 /B
the initial mounter will create a superblock. The second mounter will
reuse the existing superblock, creating a bind-mount (see [1] for the
source of the move-mount binary).
The problem is that reusing an existing superblock means all mount
options other than read-only and read-write will be silently ignored
even if they are incompatible requests. For example, the second mount
has requested no POSIX ACL support but since the existing superblock
is reused POSIX ACL support will remain enabled.
Such silent superblock reuse can easily become a security issue.
After adding support for FSCONFIG_CMD_CREATE_EXCL to mount(8) in
util-linux this can be fixed:
$ move-mount -f xfs --exclusive -o source=/dev/sda4 /A
$ move-mount -f xfs --exclusive -o noacl,source=/dev/sda4 /B
Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed
This requires the new mount api. With the old mount api it would be
necessary to plumb this through every legacy filesystem's
file_system_type->mount() method. If they want this feature they are
most welcome to switch to the new mount api"
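The difference between the default behaviour and exclusive creation can be modelled with a small Python sketch. It is an illustration only: the SuperblockTable class is hypothetical, superblocks are keyed by (filesystem type, source) as a simplification, and the special handling of read-only versus read-write mentioned above is not modelled.

```python
import errno


class SuperblockTable:
    """Toy registry of live superblocks keyed by (fstype, source)."""

    def __init__(self):
        self._live = {}   # (fstype, source) -> frozenset of mount options

    def create(self, fstype, source, options=(), exclusive=False):
        """Return the options of the superblock the mounter ends up with."""
        key = (fstype, source)
        existing = self._live.get(key)
        if existing is not None:
            if exclusive:
                # Behaviour requested via FSCONFIG_CMD_CREATE_EXCL.
                raise OSError(errno.EBUSY,
                              "reusing existing filesystem not allowed")
            # Default behaviour: the requested options are silently dropped.
            return existing
        self._live[key] = frozenset(options)
        return self._live[key]


table = SuperblockTable()
table.create("xfs", "/dev/sda4")
reused = table.create("xfs", "/dev/sda4", options={"noacl"})
print("noacl" in reused)     # False: the second mounter did not get noacl
```

With exclusive=True the second call raises EBUSY instead, matching the "Device or resource busy" failure shown in the move-mount output above.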
Link: https://github.com/brauner/move-mount-beneath [1]
Link: https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner
Link: https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner
Link: https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner
Link: https://lore.kernel.org/lkml/20230824-anzog-allheilmittel-e8c63e429a79@brauner/
* tag 'v6.6-vfs.fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add FSCONFIG_CMD_CREATE_EXCL
fs: add vfs_cmd_reconfigure()
fs: add vfs_cmd_create()
super: remove get_tree_single_reconf()
Helge Deller [Sun, 27 Aug 2023 11:54:12 +0000 (13:54 +0200)]
parisc: chassis: Do not overwrite string on LCD display
If we send a chassis code via PDC, PDC usually overwrites the
contents on the LCD display. Just call lcd_print() in this case
so that the LCD/LED driver prints the last string again.
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller [Sun, 27 Aug 2023 11:50:00 +0000 (13:50 +0200)]
parisc: led: Rewrite LED/LCD driver to utilizize Linux LED subsystem
Rewrite the whole driver and drop its own code for calculating load
average, disk and LAN load. Switch instead to the in-kernel LED
subsystem, which brings several advantages, e.g.
- existing triggers for heartbeat and disk/LAN activity can be used
- users can configure the LEDs at will to any existing trigger via
/sys/class/leds
- less overhead since we don't need to run our own timers
- fully integrated in Linux and as such cleaner code.
Note that the driver now depends on CONFIG_LEDS_CLASS, which has to
be built in rather than built as a module.
Signed-off-by: Helge Deller <deller@gmx.de>
David Heidelberg [Wed, 23 Aug 2023 22:36:22 +0000 (00:36 +0200)]
dt-bindings: thermal: lmh: update maintainer address
The old email is no longer functioning.
Fixes: 17b1362d4919 ("MAINTAINERS: Update email address")
Signed-off-by: David Heidelberg <david@ixit.cz>
Link: https://lore.kernel.org/r/20230823223622.91789-1-david@ixit.cz
Signed-off-by: Rob Herring <robh@kernel.org>
Rob Herring [Thu, 24 Aug 2023 22:17:34 +0000 (17:17 -0500)]
of: unittest: Fix of_unittest_pci_node() kconfig dependencies
of_unittest_pci_node test depends on both CONFIG_PCI_DYNAMIC_OF_NODES
and CONFIG_OF_OVERLAY. Move the test into the existing
CONFIG_OF_OVERLAY ifdef and rework the CONFIG_PCI_DYNAMIC_OF_NODES
dependency to use IS_ENABLED() instead. This reduces the combinations to
build.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308241954.oRNfVqmB-lkp@intel.com/
Fixes: 26409dd04589 ("of: unittest: Add pci_dt_testdrv pci driver")
Cc: Lizhi Hou <lizhi.hou@amd.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230824221743.1581707-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Jakub Kicinski [Mon, 28 Aug 2023 15:02:37 +0000 (08:02 -0700)]
Merge branch 'devlink-finish-file-split-and-get-retire-leftover-c'
Jiri Pirko says:
====================
devlink: finish file split and get retire leftover.c
This patchset finishes a move Jakub started and Moshe continued in the
past. I was planning to do this for a long time, so here it is, finally.
This patchset does not change any behaviour. It just splits leftover.c
into per-object files and makes the necessary changes, like declaring
functions used from other code, along the way.
The last 3 patches are pushing the rest of the code into appropriate
existing files.
====================
Link: https://lore.kernel.org/r/20230828061657.300667-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:57 +0000 (08:16 +0200)]
devlink: move devlink_notify_register/unregister() to dev.c
Finally, move the last bits out of leftover.c,
the devlink_notify_register/unregister() functions, to dev.c.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-16-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:56 +0000 (08:16 +0200)]
devlink: move small_ops definition into netlink.c
Move the generic netlink small_ops definition to where it is consumed,
in netlink.c.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-15-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:55 +0000 (08:16 +0200)]
devlink: move tracepoint definitions into core.c
Move the remaining tracepoint definitions to the most suitable file, core.c.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-14-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:54 +0000 (08:16 +0200)]
devlink: push linecard related code into separate file
Cut out another chunk from leftover.c and put linecard related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-13-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:53 +0000 (08:16 +0200)]
devlink: push rate related code into separate file
Cut out another chunk from leftover.c and put rate related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-12-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:52 +0000 (08:16 +0200)]
devlink: push trap related code into separate file
Cut out another chunk from leftover.c and put trap related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-11-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:51 +0000 (08:16 +0200)]
devlink: use tracepoint_enabled() helper
In preparation for the trap code move, use tracepoint_enabled() helper
instead of trace_devlink_trap_report_enabled() which would not be
defined in that scope.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-10-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:50 +0000 (08:16 +0200)]
devlink: push region related code into separate file
Cut out another chunk from leftover.c and put region related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-9-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:49 +0000 (08:16 +0200)]
devlink: push param related code into separate file
Cut out another chunk from leftover.c and put param related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-8-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:48 +0000 (08:16 +0200)]
devlink: push resource related code into separate file
Cut out another chunk from leftover.c and put resource related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-7-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:47 +0000 (08:16 +0200)]
devlink: push dpipe related code into separate file
Cut out another chunk from leftover.c and put dpipe related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-6-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:46 +0000 (08:16 +0200)]
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
Since both the dpipe and resource code use this helper, in preparation
for the code split into separate files, move the
devlink_dpipe_send_and_alloc_skb() helper into netlink.c. Rename it
along the way.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:45 +0000 (08:16 +0200)]
devlink: push shared buffer related code into separate file
Cut out another chunk from leftover.c and put sb related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:44 +0000 (08:16 +0200)]
devlink: push port related code into separate file
Cut out another chunk from leftover.c and put port related code
into a separate file.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Mon, 28 Aug 2023 06:16:43 +0000 (08:16 +0200)]
devlink: push object register/unregister notifications into separate helpers
In preparation for splitting leftover.c into individual files, avoid the
need to expose object structures in devl_internal.h and allow them to be
maintained in the object files.
The register/unregister notifications need to know the structures
to iterate lists. To avoid the need, introduce per-object
register/unregister notification helpers and use them.
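The idea can be sketched in userspace C, under loud assumptions: struct port, struct dev, ports_notify_register() and dev_notify_register() are hypothetical stand-ins, not the devlink structures or functions. Only the per-object helper walks the object list, so the core code never needs to see the object structure.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real devlink structures. */
struct port {
    int index;
    struct port *next;
};

struct dev {
    struct port *ports;   /* in the split layout, only port code walks this */
    int notifications;    /* counts emitted notifications for illustration */
};

/* Per-object helper: the only place that needs to know struct port. */
void ports_notify_register(struct dev *d)
{
    for (struct port *p = d->ports; p != NULL; p = p->next)
        d->notifications++;
}

/* Core code calls the helper instead of iterating the list itself. */
void dev_notify_register(struct dev *d)
{
    ports_notify_register(d);
}
```

With this shape, the list iteration could move into a per-object file together with the structure definition, and the core would only need the helper's prototype.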
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230828061657.300667-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Takashi Iwai [Mon, 28 Aug 2023 14:56:54 +0000 (16:56 +0200)]
Merge tag 'asoc-fix-v6.5-merge-window' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes that got left after v6.4
These were some changes in my v6.4 branch that never got sent as fixes,
none of them super urgent thankfully.
Takashi Iwai [Mon, 28 Aug 2023 11:35:37 +0000 (13:35 +0200)]
ASoC: dwc: i2s: Fix unused functions
A few newly added functions are only used when CONFIG_OF is set,
which results in a build failure due to defined-but-not-used errors
otherwise.
Put "#ifdef CONFIG_OF" around those functions to suppress the build
error.
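A minimal sketch of the pattern, assuming hypothetical names (parse_of_config() and probe_model() are illustrative, not functions from the driver): a static helper whose only caller is compiled under CONFIG_OF is itself placed under the same guard, so a build without CONFIG_OF never sees an unused static function.

```c
/* CONFIG_OF is normally provided by Kconfig; it is left undefined here
 * to model the configuration that used to fail. */

#ifdef CONFIG_OF
/* Only referenced from the CONFIG_OF branch below, so guard it too. */
static int parse_of_config(int raw)
{
    return raw * 2;
}
#endif

int probe_model(int raw)
{
#ifdef CONFIG_OF
    return parse_of_config(raw);
#else
    return raw;   /* no device tree support: use the value as-is */
#endif
}
```

Without the guard around the helper, a compiler run with -Wunused-function and -Werror would typically reject the build when CONFIG_OF is unset.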
Fixes: 52ea7c0543f8 ("ASoC: dwc: i2s: Add StarFive JH7110 SoC support")
Link: https://lore.kernel.org/r/20230828113537.27600-1-tiwai@suse.de
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Mon, 28 Aug 2023 14:13:03 +0000 (16:13 +0200)]
Merge tag 'asoc-v6.6' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v6.6
The rest of the updates for v6.6, some of the highlights include:
- A big API cleanup from Morimoto-san, rationalising the places we put
functions.
- Lots of work on the SOF framework, AMD and Intel drivers, including a
lot of cleanup and new device support.
- Standardisation of the presentation of jacks from drivers.
- Provision of some generic sound card DT properties.
- Conversion of more drivers to the maple tree register cache.
- New drivers for AMD Van Gogh, AWInic AW88261, Cirrus Logic cs42l43,
various Intel platforms, Mediatek MT7986, RealTek RT1017 and StarFive
JH7110.
Takashi Iwai [Mon, 28 Aug 2023 10:19:24 +0000 (12:19 +0200)]
ALSA: usb-audio: Don't try to submit URBs after disconnection
The USB-audio driver can still submit URBs while the device is being
disconnected, which may result in spurious error messages like:
usb 1-2: cannot submit urb (err = -19)
usb 1-2: Unable to submit urb #0: -19 at snd_usb_queue_pending_output_urbs
usb 1-2: cannot submit urb 0, error -19: no device
Although those are harmless, they are just ugly.
This patch tries to avoid spewing such error messages when the device
is already in the disconnected state. It also skips the superfluous
xfer notification.
Link: https://lore.kernel.org/r/20230828101924.27107-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Rafael J. Wysocki [Mon, 28 Aug 2023 12:15:41 +0000 (14:15 +0200)]
Merge tag 'opp-updates-6.6' of git://git./linux/kernel/git/vireshk/pm
Pull OPP updates for 6.6 from Viresh Kumar:
"- Minor core cleanup and addition of new frequency related APIs (Viresh
Kumar and Manivannan Sadhasivam).
- Convert ti cpufreq/opp bindings to json schema (Nishanth Menon)."
* tag 'opp-updates-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
dt-bindings: cpufreq: Convert ti-cpufreq to json schema
dt-bindings: opp: Convert ti-omap5-opp-supply to json schema
OPP: Fix argument name in doc comment
dt-bindings: opp: Increase maxItems for opp-hz property
OPP: Fix passing 0 to PTR_ERR in _opp_attach_genpd()
OPP: Fix potential null ptr dereference in dev_pm_opp_get_required_pstate()
OPP: Reuse dev_pm_opp_get_freq_indexed()
OPP: Update _read_freq() to return the correct frequency
OPP: Add dev_pm_opp_find_freq_exact_indexed()
OPP: Introduce dev_pm_opp_get_freq_indexed() API
OPP: Introduce dev_pm_opp_find_freq_{ceil/floor}_indexed() APIs
OPP: Rearrange entries in pm_opp.h
Rafael J. Wysocki [Mon, 28 Aug 2023 12:13:54 +0000 (14:13 +0200)]
Merge branch 'pm-cpufreq'
Merge ARM cpufreq updates for 6.6:
- Migrate various platforms to use remove callback returning void
(Yangtao Li).
- Add online/offline/exit hooks for Tegra driver (Sumit Gupta).
- Explicitly include correct DT includes (Rob Herring).
- Frequency domain updates for qcom-hw driver (Neil Armstrong).
- Modify AMD pstate driver to return the highest_perf value (Meng Li).
- Generic cleanups for cppc, mediatek and powernow driver (Liao Chang,
Konrad Dybcio).
- Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino
Del Regno, Konrad Dybcio).
- brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva).
* pm-cpufreq: (33 commits)
cpufreq: tegra194: remove opp table in exit hook
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
cpufreq: tegra194: add online/offline hooks
cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message
cpufreq: amd-pstate-ut: Modify the function to get the highest_perf value
cpufreq: mediatek-hw: Remove unused define
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug
cpufreq: blocklist MSM8998 in cpufreq-dt-platdev
cpufreq: omap: Convert to platform remove callback returning void
cpufreq: qoriq: Convert to platform remove callback returning void
cpufreq: acpi: Convert to platform remove callback returning void
cpufreq: tegra186: Convert to platform remove callback returning void
cpufreq: qcom-nvmem: Convert to platform remove callback returning void
cpufreq: kirkwood: Convert to platform remove callback returning void
cpufreq: pcc-cpufreq: Convert to platform remove callback returning void
...
Rafael J. Wysocki [Mon, 28 Aug 2023 12:12:05 +0000 (14:12 +0200)]
Merge tag 'cpufreq-arm-updates-6.6' of git://git./linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 6.6 from Viresh Kumar:
"- Migrate various platforms to use remove callback returning void
(Yangtao Li).
- Add online/offline/exit hooks for Tegra driver (Sumit Gupta).
- Explicitly include correct DT includes (Rob Herring).
- Frequency domain updates for qcom-hw driver (Neil Armstrong).
- Modify AMD pstate driver to return the highest_perf value (Meng Li).
- Generic cleanups for cppc, mediatek and powernow driver (Liao Chang
and Konrad Dybcio).
- Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino
Del Regno and Konrad Dybcio).
- brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)."
* tag 'cpufreq-arm-updates-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (33 commits)
cpufreq: tegra194: remove opp table in exit hook
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
cpufreq: tegra194: add online/offline hooks
cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message
cpufreq: amd-pstate-ut: Modify the function to get the highest_perf value
cpufreq: mediatek-hw: Remove unused define
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug
cpufreq: blocklist MSM8998 in cpufreq-dt-platdev
cpufreq: omap: Convert to platform remove callback returning void
cpufreq: qoriq: Convert to platform remove callback returning void
cpufreq: acpi: Convert to platform remove callback returning void
cpufreq: tegra186: Convert to platform remove callback returning void
cpufreq: qcom-nvmem: Convert to platform remove callback returning void
cpufreq: kirkwood: Convert to platform remove callback returning void
cpufreq: pcc-cpufreq: Convert to platform remove callback returning void
...
Ard Biesheuvel [Mon, 28 Aug 2023 10:57:05 +0000 (12:57 +0200)]
Merge remote-tracking branch 'linux-efi/urgent' into efi/next
Sumit Gupta [Fri, 25 Aug 2023 11:16:17 +0000 (16:46 +0530)]
cpufreq: tegra194: remove opp table in exit hook
Add an exit hook and remove the OPP table when the device gets unregistered.
This fixes the error messages seen when the cpufreq driver module is
removed and then re-inserted. It also fixes these messages when
onlining the first CPU of a policy whose CPUs were all previously
offlined.
debugfs: File 'cpu5' in directory 'opp' already present!
debugfs: File 'cpu6' in directory 'opp' already present!
debugfs: File 'cpu7' in directory 'opp' already present!
Fixes: f41e1442ac5b ("cpufreq: tegra194: add OPP support and set bandwidth")
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
[ Viresh: Dropped irrelevant change from it ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Takashi Iwai [Mon, 28 Aug 2023 09:56:39 +0000 (11:56 +0200)]
Merge branch 'for-next' into for-linus
Pull materials for 6.5 merge window.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Thomas Gleixner [Mon, 28 Aug 2023 09:33:03 +0000 (11:33 +0200)]
Merge tag 'irqchip-6.6' of git://git./linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- Fix for Loongson eiointc init error handling
- Fix a bunch of warnings showing up when -Wmissing-prototypes is set
- A set of fixes for drivers checking for 0 as a potential return
value from platform_get_irq()
- Another set of patches converting existing code to the use of helpers
such as of_address_count() and devm_platform_get_and_ioremap_resource()
- A tree-wide cleanup of drivers including of_*.h without discrimination
- Added support for the Amlogic C3 SoCs
Link: https://lore.kernel.org/lkml/20230828091543.4001857-1-maz@kernel.org
Eric Dumazet [Mon, 28 Aug 2023 08:47:32 +0000 (08:47 +0000)]
inet: fix IP_TRANSPARENT error handling
My recent patch forgot to change error handling for IP_TRANSPARENT
socket option.
WARNING: bad unlock balance detected!
6.5.0-rc7-syzkaller-01717-g59da9885767a #0 Not tainted
-------------------------------------
syz-executor151/5028 is trying to release lock (sk_lock-AF_INET) at:
[<ffffffff88213983>] sockopt_release_sock+0x53/0x70 net/core/sock.c:1073
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syz-executor151/5028:
stack backtrace:
CPU: 0 PID: 5028 Comm: syz-executor151 Not tainted 6.5.0-rc7-syzkaller-01717-g59da9885767a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
__lock_release kernel/locking/lockdep.c:5438 [inline]
lock_release+0x4b5/0x680 kernel/locking/lockdep.c:5781
sock_release_ownership include/net/sock.h:1824 [inline]
release_sock+0x175/0x1b0 net/core/sock.c:3527
sockopt_release_sock+0x53/0x70 net/core/sock.c:1073
do_ip_setsockopt+0x12c1/0x3640 net/ipv4/ip_sockglue.c:1364
ip_setsockopt+0x59/0xe0 net/ipv4/ip_sockglue.c:1419
raw_setsockopt+0x218/0x290 net/ipv4/raw.c:833
__sys_setsockopt+0x2cd/0x5b0 net/socket.c:2305
__do_sys_setsockopt net/socket.c:2316 [inline]
__se_sys_setsockopt net/socket.c:2313 [inline]
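The class of bug behind this splat can be modelled in userspace C. This is a sketch under stated assumptions, not the kernel code: struct sock_model, the OPT_* constants and setsockopt_model() are hypothetical. An option handled before the lock is taken must return directly on error; jumping to the shared error label would release a lock that was never acquired.

```c
#include <errno.h>

/* Hypothetical model of a socket lock: depth must never go negative. */
struct sock_model {
    int lock_depth;
};

static void lock_sock_model(struct sock_model *sk)    { sk->lock_depth++; }
static void release_sock_model(struct sock_model *sk) { sk->lock_depth--; }

enum { OPT_LOCKLESS = 1, OPT_LOCKED = 2 };   /* illustrative option names */

int setsockopt_model(struct sock_model *sk, int opt, int optlen)
{
    if (opt == OPT_LOCKLESS) {
        /* The lock is not held here, so errors return directly. */
        if (optlen < 1)
            return -EINVAL;
        return 0;
    }

    lock_sock_model(sk);
    if (optlen < 1)
        goto e_inval;
    release_sock_model(sk);
    return 0;

e_inval:
    /* Only reachable with the lock held. */
    release_sock_model(sk);
    return -EINVAL;
}
```

In this model the lock depth is back at zero after every call, whichever path fails, which is the balance lockdep checks for.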
Fixes: 4bd0623f04ee ("inet: move inet->transparent to inet->inet_flags")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhengchao Shao [Sat, 26 Aug 2023 02:23:30 +0000 (10:23 +0800)]
selftests: bonding: create directly devices in the target namespaces
If setting link1_1 to netns client fails, we should delete link1_1 in the
cleanup path. But if link1_1 is set to netns client successfully, deleting
link1_1 will report a warning. So it is safer to create the devices
directly in the target namespaces.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Closes: https://lore.kernel.org/all/ZNyJx1HtXaUzOkNA@Laptop-X1/
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 25 Aug 2023 19:44:01 +0000 (21:44 +0200)]
r8169: fix ASPM-related issues on a number of systems with NIC version from RTL8168h
This effectively reverts 4b5f82f6aaef. On a number of systems ASPM L1
causes tx timeouts with RTL8168h, see referenced bug report.
Fixes: 4b5f82f6aaef ("r8169: enable ASPM L1/L1.1 from RTL8168h")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217814
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mikhail Kobuk [Fri, 25 Aug 2023 19:04:41 +0000 (22:04 +0300)]
ethernet: tg3: remove unreachable code
'tp->irq_max' value is either 1 [L16336] or 5 [L16354], as indicated in
tg3_get_invariants(). Therefore, 'i' can't exceed 4 in tg3_init_one(),
which makes (i <= 4) always true. Moreover, the 'intmbx' value set at the
last iteration is not used later in its scope.
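The bound can be checked with a small model, assuming the loop runs 'i' from 0 to irq_max - 1 (guard_always_true() is a hypothetical helper, not driver code):

```c
#include <stdbool.h>

/* Returns true if the guard (i <= 4) held on every loop iteration. */
bool guard_always_true(int irq_max)
{
    for (int i = 0; i < irq_max; i++) {
        if (!(i <= 4))
            return false;   /* the guard would have been false here */
    }
    return true;
}
```

For the two possible values, 1 and 5, the guard never fails; it would only fail for a value of 6 or more, which the commit message says cannot occur.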
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 78f90dcf184b ("tg3: Move napi_add calls below tg3_get_invariants")
Signed-off-by: Mikhail Kobuk <m.kobuk@ispras.ru>
Reviewed-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 25 Aug 2023 13:49:46 +0000 (15:49 +0200)]
net: Make consumed action consistent in sch_handle_egress
While looking at TC_ACT_* handling, the TC_ACT_CONSUMED is only handled in
sch_handle_ingress but not sch_handle_egress. This was added via cd11b164073b
("net/tc: introduce TC_ACT_REINSERT.") and e5cf1baf92cb ("act_mirred: use
TC_ACT_REINSERT when possible") and later got renamed into TC_ACT_CONSUMED
via 720f22fed81b ("net: sched: refactor reinsert action").
The initial work was targeted for ovs back then and only needed on ingress,
and the mirred action module also restricts it to only that. However, given
it's an API contract it would still make sense to make this consistent to
sch_handle_ingress and handle it on egress side in the same way, that is,
setting return code to "success" and returning NULL back to the caller as
otherwise an action module sitting on egress and returning TC_ACT_CONSUMED
could lead to a UAF if left unhandled.
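As a rough illustration of the contract described above, here is a userspace sketch. The verdict names, return codes and handle_egress_model() are hypothetical stand-ins, not the kernel's TC_ACT_* or NET_XMIT_* definitions. A consumed packet yields a "success" return code and a NULL pointer, so the caller cannot touch a buffer it no longer owns.

```c
#include <stddef.h>

enum verdict { V_OK, V_SHOT, V_CONSUMED };    /* stand-ins for TC_ACT_* */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1 };      /* stand-ins for return codes */

struct pkt {
    int id;
};

/* Returns the packet to continue with, or NULL if the caller must stop. */
struct pkt *handle_egress_model(struct pkt *p, enum verdict v, int *ret)
{
    switch (v) {
    case V_SHOT:
        *ret = XMIT_DROP;
        return NULL;
    case V_CONSUMED:
        /* The action already owns the packet: report success and hide
         * the pointer so the caller cannot use it after the fact. */
        *ret = XMIT_SUCCESS;
        return NULL;
    default:
        return p;
    }
}
```

Handling the consumed verdict the same way on both paths keeps an action module's behaviour independent of which hook it is attached to.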
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 25 Aug 2023 13:49:45 +0000 (15:49 +0200)]
net: Fix skb consume leak in sch_handle_egress
Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:
[...]
unreferenced object 0xffff88818bcb4f00 (size 232):
comm "softirq", pid 0, jiffies 4299085078 (age 134.028s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 80 70 61 81 88 ff ff 00 41 31 14 81 88 ff ff ..pa.....A1.....
backtrace:
[<ffffffff9991b938>] kmem_cache_alloc_node+0x268/0x400
[<ffffffff9b3d9231>] __alloc_skb+0x211/0x2c0
[<ffffffff9b3f0c7e>] alloc_skb_with_frags+0xbe/0x6b0
[<ffffffff9b3bf9a9>] sock_alloc_send_pskb+0x6a9/0x870
[<ffffffff9b6b3f00>] __ip_append_data+0x14d0/0x3bf0
[<ffffffff9b6ba24e>] ip_append_data+0xee/0x190
[<ffffffff9b7e1496>] icmp_push_reply+0xa6/0x470
[<ffffffff9b7e4030>] icmp_reply+0x900/0xa00
[<ffffffff9b7e42e3>] icmp_echo.part.0+0x1a3/0x230
[<ffffffff9b7e444d>] icmp_echo+0xcd/0x190
[<ffffffff9b7e9566>] icmp_rcv+0x806/0xe10
[<ffffffff9b699bd1>] ip_protocol_deliver_rcu+0x351/0x3d0
[<ffffffff9b699f14>] ip_local_deliver_finish+0x2b4/0x450
[<ffffffff9b69a234>] ip_local_deliver+0x174/0x1f0
[<ffffffff9b69a4b2>] ip_sublist_rcv_finish+0x1f2/0x420
[<ffffffff9b69ab56>] ip_sublist_rcv+0x466/0x920
[...]
I was able to reproduce this via:
ip link add dev dummy0 type dummy
ip link set dev dummy0 up
tc qdisc add dev eth0 clsact
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip protocol 1 0xff action mirred egress redirect dev dummy0
ping 1.1.1.1
<stolen>
After the fix, there are no kmemleak reports with the reproducer. This is
in line with what is also done on the ingress side, and from debugging the
skb_unref(skb) on dummy xmit and sch_handle_egress() side, it is visible
that these are two different skbs with both skb_unref(skb) as true. The two
seen skbs are due to mirred doing a skb_clone() internally as use_reinsert
is false in tcf_mirred_act() for egress. This was initially reported by Gal.
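The ownership problem can be modelled with a simple live-buffer counter. This is a userspace sketch under stated assumptions: struct buf, buf_alloc(), buf_free(), redirect_action() and egress_model() are hypothetical and only mimic the clone-and-steal behaviour described above.

```c
#include <stdlib.h>

enum { ACT_OK = 0, ACT_STOLEN = 1 };   /* hypothetical stand-ins for TC_ACT_* */

struct buf {
    int len;
};

static int live_bufs;                  /* buffers allocated and not yet freed */

struct buf *buf_alloc(int len)
{
    struct buf *b = malloc(sizeof(*b));
    if (b) {
        b->len = len;
        live_bufs++;
    }
    return b;
}

void buf_free(struct buf *b)
{
    if (b) {
        free(b);
        live_bufs--;
    }
}

int live_buffers(void)
{
    return live_bufs;
}

/* Models a redirect action: it works on its own clone and reports "stolen". */
int redirect_action(const struct buf *orig)
{
    struct buf *clone = buf_alloc(orig->len);
    buf_free(clone);                   /* the target device consumes the clone */
    return ACT_STOLEN;
}

/* Returns the buffer to keep transmitting, or NULL once it was released. */
struct buf *egress_model(struct buf *b)
{
    if (redirect_action(b) == ACT_STOLEN) {
        buf_free(b);                   /* without this, the original leaks */
        return NULL;
    }
    return b;
}
```

Because the action only ever frees its clone, the original buffer has to be released by the egress path itself; dropping that one buf_free() call leaves the counter at one, which is the model's equivalent of the kmemleak report.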
Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jann Horn [Fri, 25 Aug 2023 13:32:41 +0000 (15:32 +0200)]
dccp: Fix out of bounds access in DCCP error handler
There was a previous attempt to fix an out-of-bounds access in the DCCP
error handlers, but that fix assumed that the error handlers only want
to access the first 8 bytes of the DCCP header. Actually, they also look
at the DCCP sequence number, which is stored beyond 8 bytes, so an
explicit pskb_may_pull() is required.
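A userspace sketch of the required check, under assumptions: read_seq48() is a hypothetical helper, and the 48-bit field at byte offset 10 is an assumed layout used for illustration only. The point is that a field stored past the first 8 bytes must be length-checked before it is read.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reads a 48-bit big-endian value at offset 'off' from a header of which
 * only 'avail' bytes are known to be present.
 * Returns 0 on success, or -1 if the field would lie out of bounds.
 */
int read_seq48(const uint8_t *hdr, size_t avail, size_t off, uint64_t *seq)
{
    uint64_t v = 0;

    if (off > avail || avail - off < 6)
        return -1;              /* caller must make more data available */

    for (size_t i = 0; i < 6; i++)
        v = (v << 8) | hdr[off + i];

    *seq = v;
    return 0;
}
```

With only 8 bytes available, a read at offset 10 is refused instead of running past the end of the data, which is the role the explicit pskb_may_pull() plays in the error handlers.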
Fixes: 6706a97fec96 ("dccp: fix out of bound access in dccp_v4_err()")
Fixes: 1aa9d1a0e7ee ("ipv6: dccp: fix out of bound access in dccp_v6_err()")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Aug 2023 09:05:56 +0000 (10:05 +0100)]
Merge branch 'octeontx2-af-misc-mac-block-changes'
Hariprasad Kelam says:
====================
octeontx2-af: misc MAC block changes
This series of patches adds recent changes made to the MAC (CGX/RPM) block.
Patch1: Adds a new LMAC mode supported by CN10KB silicon.
Patch2: In a scenario where the system boots with no cgx devices, the AF
driver currently treats this as an error, and as a result no interfaces
will work. This patch relaxes this check so that non-cgx-mapped netdev
devices will work.
Patch3: Adds the required lmac validation in MAC block APIs.
Patch4: Prints an error message in case no netdev is mapped to the given
cgx,lmac pair.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 25 Aug 2023 10:40:22 +0000 (16:10 +0530)]
octeontx2-af: print error message incase of invalid pf mapping
During initialization, the AF driver creates a mapping from each pf to a
cgx,lmac pair. Whenever there is a physical link change, the driver uses
this mapping to forward the message to the associated netdev.
This patch prints an error message in case the cgx,lmac pair is not
associated with any pf netdev.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 25 Aug 2023 10:40:21 +0000 (16:10 +0530)]
octeontx2-af: Add validation of lmac
With the addition of new MAC blocks like CN10K RPM and CN10KB
RPM_USX, LMACs are noncontiguous. Most functions already have lmac
validation checks, but a few are missing them.
This patch adds the missing checks.
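Why a plain range check is not enough for noncontiguous LMACs can be sketched as follows. This assumes presence is tracked in a bitmap; struct mac_block and lmac_is_valid() are hypothetical names for illustration, not the driver's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MAC block: bit N set means LMAC N is present. */
struct mac_block {
    uint32_t lmac_bmap;
};

/* An id is valid only if it is in range AND its bit is set. */
bool lmac_is_valid(const struct mac_block *m, int lmac_id)
{
    if (lmac_id < 0 || lmac_id >= 32)
        return false;
    return (m->lmac_bmap >> lmac_id) & 1u;
}
```

With a bitmap of 0x5 (LMACs 0 and 2 present), id 1 lies below the highest valid id yet does not exist, so a "less than count" comparison would wrongly accept it.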
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Fri, 25 Aug 2023 10:40:20 +0000 (16:10 +0530)]
octeontx2-af: Don't treat lack of CGX interfaces as error
Don't treat lack of CGX LMACs on the system as an error.
Instead ignore it so that LBK VFs are created and can be used.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Fri, 25 Aug 2023 10:40:19 +0000 (16:10 +0530)]
octeontx2-af: CN10KB: Add USGMII LMAC mode
Upon physical link change, firmware reports to the kernel about the
change along with the details like speed, lmac_type_id, etc.
Kernel derives lmac_type based on lmac_type_id received from firmware.
This patch extends current lmac list with new USGMII mode supported
by CN10KB RPM block.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexis Lothoré [Fri, 25 Aug 2023 08:20:27 +0000 (10:20 +0200)]
dt-bindings: net: dsa: marvell: fix wrong model in compatibility list
Fix the wrong switch name in the compatibility list. The 88E6163 switch
does not exist; the intended model is in fact the 88E6361.
Fixes: 9229a9483d80 ("dt-bindings: net: dsa: marvell: add MV88E6361 switch to compatibility list")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Liao Chang [Sat, 26 Aug 2023 09:51:13 +0000 (09:51 +0000)]
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
Since the 'cpus' field of the policy structure will become empty in the
cpufreq core API, it is better to use 'related_cpus' in the exit()
callback of the driver.
Fixes: c3274763bfc3 ("cpufreq: powernow-k8: Initialize per-cpu data-structures properly")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Sumit Gupta [Fri, 25 Aug 2023 11:19:20 +0000 (16:49 +0530)]
cpufreq: tegra194: add online/offline hooks
Implement the light-weight tear down and bring up helpers to reduce the
amount of work to do on CPU offline/online operation.
This change helps to make the hotplugging paths much faster.
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Link: https://lore.kernel.org/lkml/20230816033402.3abmugb5goypvllm@vireshk-i7/
[ Viresh: Fixed rebase conflict ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Radoslaw Tyl [Thu, 24 Aug 2023 20:46:19 +0000 (13:46 -0700)]
igb: set max size RX buffer when store bad packet is enabled
Increase the RX buffer size to 3K when the SBP bit is on. The size of
the RX buffer determines the number of pages allocated which may not
be sufficient for receive frames larger than the set MTU size.
Cc: stable@vger.kernel.org
Fixes: 89eaefb61dc9 ("igb: Support RX-ALL feature flag.")
Reported-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Thu, 24 Aug 2023 16:50:59 +0000 (09:50 -0700)]
netrom: Deny concurrent connect().
syzkaller reported null-ptr-deref [0] related to AF_NETROM.
This is another self-accept issue from the strace log. [1]
syz-executor creates an AF_NETROM socket and calls connect(), which
is blocked at that time. Then, sk->sk_state is TCP_SYN_SENT and
sock->state is SS_CONNECTING.
[pid 5059] socket(AF_NETROM, SOCK_SEQPACKET, 0) = 4
[pid 5059] connect(4, {sa_family=AF_NETROM, sa_data="..." <unfinished ...>
Another thread calls connect() concurrently, which finally fails
with -EINVAL. However, the problem here is the socket state is
reset even while the first connect() is blocked.
[pid 5060] connect(4, NULL, 0 <unfinished ...>
[pid 5060] <... connect resumed>) = -1 EINVAL (Invalid argument)
As sk->sk_state is TCP_CLOSE and sock->state is SS_UNCONNECTED, the
following listen() succeeds. Then, the first connect() looks up
itself as a listener and puts skb into the queue with skb->sk itself.
As a result, the next accept() gets another FD of itself as 3, and
the first connect() finishes.
[pid 5060] listen(4, 0 <unfinished ...>
[pid 5060] <... listen resumed>) = 0
[pid 5060] accept(4, NULL, NULL <unfinished ...>
[pid 5060] <... accept resumed>) = 3
[pid 5059] <... connect resumed>) = 0
Then, accept4() is called but blocked, which causes the general protection
fault later.
[pid 5059] accept4(4, NULL, 0x20000400, SOCK_NONBLOCK <unfinished ...>
After that, another self-accept occurs by accept() and writev().
[pid 5060] accept(4, NULL, NULL <unfinished ...>
[pid 5061] writev(3, [{iov_base=...}] <unfinished ...>
[pid 5061] <... writev resumed>) = 99
[pid 5060] <... accept resumed>) = 6
Finally, the leader thread close()s all FDs. Since the three FDs
reference the same socket, nr_release() does the cleanup for it
three times, and the remaining accept4() causes the following fault.
[pid 5058] close(3) = 0
[pid 5058] close(4) = 0
[pid 5058] close(5) = -1 EBADF (Bad file descriptor)
[pid 5058] close(6) = 0
[pid 5058] <... exit_group resumed>) = ?
[ 83.456055][ T5059] general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
To avoid the issue, we need to return an error for connect() if
another connect() is in progress, as done in __inet_stream_connect().
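The rule can be modelled as a tiny state machine. This is a sketch under assumptions: struct nr_sock_model and connect_model() are hypothetical, and EALREADY is an assumed choice of error code, since the text above only says that an error is returned. The check happens before any state is touched, so a failing second connect() cannot reset the socket under the first one.

```c
#include <errno.h>

enum sock_state { ST_UNCONNECTED, ST_CONNECTING, ST_CONNECTED };

/* Hypothetical model of the socket; not the kernel's struct socket. */
struct nr_sock_model {
    enum sock_state state;
};

int connect_model(struct nr_sock_model *s, int addr_valid)
{
    /* Deny a concurrent connect() before modifying anything. */
    if (s->state == ST_CONNECTING)
        return -EALREADY;
    if (s->state == ST_CONNECTED)
        return -EISCONN;
    if (!addr_valid)
        return -EINVAL;

    s->state = ST_CONNECTING;   /* the first connect() is now in progress */
    return 0;
}
```

In the scenario from the strace log, the second connect() with a NULL address would fail early in this model and leave the state at ST_CONNECTING, so the subsequent listen() could not succeed on a half-connected socket.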
[0]:
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 0 PID: 5059 Comm: syz-executor.0 Not tainted 6.5.0-rc5-syzkaller-00194-gace0ab3a4b54 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5012
Code: 45 85 c9 0f 84 cc 0e 00 00 44 8b 05 11 6e 23 0b 45 85 c0 0f 84 be 0d 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 d1 48 c1 e9 03 <80> 3c 11 00 0f 85 e8 40 00 00 49 81 3a a0 69 48 90 0f 84 96 0d 00
RSP: 0018:ffffc90003d6f9e0 EFLAGS: 00010006
RAX: ffff8880244c8000 RBX: 1ffff920007adf6c RCX: 0000000000000003
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000018
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000018 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f51d519a6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f51d5158d58 CR3: 000000002943f000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
lock_acquire kernel/locking/lockdep.c:5761 [inline]
lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3a/0x50 kernel/locking/spinlock.c:162
prepare_to_wait+0x47/0x380 kernel/sched/wait.c:269
nr_accept+0x20d/0x650 net/netrom/af_netrom.c:798
do_accept+0x3a6/0x570 net/socket.c:1872
__sys_accept4_file net/socket.c:1913 [inline]
__sys_accept4+0x99/0x120 net/socket.c:1943
__do_sys_accept4 net/socket.c:1954 [inline]
__se_sys_accept4 net/socket.c:1951 [inline]
__x64_sys_accept4+0x96/0x100 net/socket.c:1951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f51d447cae9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f51d519a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
RAX: ffffffffffffffda RBX: 00007f51d459bf80 RCX: 00007f51d447cae9
RDX: 0000000020000400 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f51d44c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f51d459bf80 R15: 00007ffc25c34e48
</TASK>
Link: https://syzkaller.appspot.com/text?tag=CrashLog&x=152cdb63a80000
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+666c97e4686410e79649@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=666c97e4686410e79649
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pranavi Somisetty [Thu, 24 Aug 2023 11:44:56 +0000 (17:14 +0530)]
dt-bindings: net: xilinx_gmii2rgmii: Convert to json schema
Convert the Xilinx GMII to RGMII Converter device tree binding
documentation to json schema.
This converter is usually used as gem <---> gmii2rgmii <---> external phy
and its phy-handle should point to the phandle of the external phy.
Signed-off-by: Pranavi Somisetty <pranavi.somisetty@amd.com>
Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 28 Aug 2023 00:17:44 +0000 (17:17 -0700)]
Merge branch 'tls-expand-tls_cipher_size_desc-to-simplify-getsockopt-setsockopt'
Sabrina Dubroca says:
====================
tls: expand tls_cipher_size_desc to simplify getsockopt/setsockopt
Commit 2d2c5ea24243 ("net/tls: Describe ciphers sizes by const
structs") introduced tls_cipher_size_desc to describe the size of the
fields of the per-cipher crypto_info structs, and commit ea7a9d88ba21
("net/tls: Use cipher sizes structs") used it, but only in
tls_device.c and tls_device_fallback.c, and skipped converting similar
code in tls_main.c and tls_sw.c.
This series expands tls_cipher_size_desc (renamed to tls_cipher_desc
to better fit this expansion) to fully describe a cipher:
- offset of the fields within the per-cipher crypto_info
- size of the full struct (for copies to/from userspace)
- offload flag
- algorithm name used by SW crypto
With these additions, we can remove ~350 lines of
switch (crypto_info->cipher_type) { ... }
from tls_set_device_offload, tls_sw_fallback_init,
do_tls_getsockopt_conf, do_tls_setsockopt_conf, tls_set_sw_offload
(mainly do_tls_getsockopt_conf and tls_set_sw_offload).
This series also adds the ARIA ciphers to the tls selftests, and some
more getsockopt/setsockopt tests to cover more of the code changed by
this series.
====================
Link: https://lore.kernel.org/r/cover.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:22 +0000 (23:35 +0200)]
tls: get cipher_name from cipher_desc in tls_set_sw_offload
tls_cipher_desc also contains the algorithm name needed by
crypto_alloc_aead, use it.
Finally, use get_cipher_desc to check if the cipher_type coming from
userspace is valid, and remove the cipher_type switch.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/53d021d80138aa125a9cef4468aa5ce531975a7b.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:21 +0000 (23:35 +0200)]
tls: use tls_cipher_desc to access per-cipher crypto_info in tls_set_sw_offload
The crypto_info_* helpers allow us to fetch pointers into the
per-cipher crypto_info's data.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c23af110caf0af6b68de2f86c58064913e2e902a.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:20 +0000 (23:35 +0200)]
tls: use tls_cipher_desc to get per-cipher sizes in tls_set_sw_offload
We can get rid of some local variables, but we have to keep nonce_size
because tls1.3 uses nonce_size = 0 for all ciphers.
We can also drop the runtime sanity checks on iv/rec_seq/tag size,
since we have compile time checks on those values.
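As a rough illustration of why the runtime checks become redundant, a compile-time assertion can reject an oversized field before the code ever runs. This is only a sketch: the DEMO_* names and the size values are hypothetical and are not the kernel's definitions.

```c
/* Hypothetical upper bounds that the buffers are sized for. */
enum {
	DEMO_MAX_IV_SIZE      = 16,
	DEMO_MAX_REC_SEQ_SIZE = 8,
	DEMO_MAX_TAG_SIZE     = 16,
};

/* Hypothetical sizes for one cipher (illustrative values only). */
enum {
	DEMO_CIPHER_IV_SIZE      = 8,
	DEMO_CIPHER_REC_SEQ_SIZE = 8,
	DEMO_CIPHER_TAG_SIZE     = 16,
};

/*
 * If any of these conditions were false, the build would fail, so no
 * equivalent check is needed when a socket is configured at runtime.
 */
_Static_assert(DEMO_CIPHER_IV_SIZE <= DEMO_MAX_IV_SIZE,
	       "iv does not fit in the buffer");
_Static_assert(DEMO_CIPHER_REC_SEQ_SIZE <= DEMO_MAX_REC_SEQ_SIZE,
	       "rec_seq does not fit in the buffer");
_Static_assert(DEMO_CIPHER_TAG_SIZE <= DEMO_MAX_TAG_SIZE,
	       "tag does not fit in the buffer");
```

Changing DEMO_CIPHER_IV_SIZE to 32 in this sketch would stop the build with the "iv does not fit in the buffer" message.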
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/deed9c4430a62c31751a72b8c03ad66ffe710717.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:19 +0000 (23:35 +0200)]
tls: use tls_cipher_desc to simplify do_tls_getsockopt_conf
Every cipher uses the same code to update its crypto_info struct based
on the values contained in the cctx, with only the struct type and
size/offset changing. We can get those from tls_cipher_desc, and use
a single pair of memcpy and final copy_to_user.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c21a904b91e972bdbbf9d1c6d2731ccfa1eedf72.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:18 +0000 (23:35 +0200)]
tls: get crypto_info size from tls_cipher_desc in do_tls_setsockopt_conf
We can simplify do_tls_setsockopt_conf using tls_cipher_desc. Also use
get_cipher_desc's result to check if the cipher_type coming from
userspace is valid.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/e97658eb4c6a5832f8ba20a06c4f36a77763c59e.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:17 +0000 (23:35 +0200)]
tls: expand use of tls_cipher_desc in tls_sw_fallback_init
tls_sw_fallback_init already gets the key and tag size from
tls_cipher_desc. We can now also check that the cipher type is valid,
and stop hard-coding the algorithm name passed to crypto_alloc_aead.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c8c94b8fcafbfb558e09589c1f1ad48dbdf92f76.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:16 +0000 (23:35 +0200)]
tls: allocate the fallback aead after checking that the cipher is valid
No need to allocate the aead if we're going to fail afterwards.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/335e32511ed55a0b30f3f81a78fa8f323b3bdf8f.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:15 +0000 (23:35 +0200)]
tls: expand use of tls_cipher_desc in tls_set_device_offload
tls_set_device_offload is already getting iv and rec_seq sizes from
tls_cipher_desc. We can now also check if the cipher_type coming from
userspace is valid and can be offloaded.
We can also remove the runtime check on rec_seq, since we validate it
at compile time.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:14 +0000 (23:35 +0200)]
tls: validate cipher descriptions at compile time
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/b38fb8cf60e099e82ae9979c3c9c92421042417c.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:13 +0000 (23:35 +0200)]
tls: extend tls_cipher_desc to fully describe the ciphers
- add nonce, usually equal to iv_size but not for chacha
- add offsets into the crypto_info for each field
- add algorithm name
- add offloadable flag
Also add helpers to access each field of a crypto_info struct
described by a tls_cipher_desc.
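A minimal sketch of the idea, assuming a made-up crypto_info layout: the descriptor records the size and offset of each field, and small helpers turn a base pointer plus a descriptor into a pointer to the field. All demo_* names and the struct layout are hypothetical; only the algorithm name string "gcm(aes)" is a real crypto API name.

```c
#include <stddef.h>

/* Hypothetical stand-in for one per-cipher crypto_info struct. */
struct demo_crypto_info {
	unsigned short version;
	unsigned short cipher_type;
	unsigned char iv[8];
	unsigned char key[16];
	unsigned char salt[4];
	unsigned char rec_seq[8];
};

/* Illustrative descriptor: field sizes, field offsets, name, offload flag. */
struct demo_cipher_desc {
	unsigned int nonce;
	unsigned int iv;
	unsigned int key;
	unsigned int iv_offset;
	unsigned int key_offset;
	const char *cipher_name;
	int offloadable;
};

static const struct demo_cipher_desc demo_desc = {
	.nonce = 8,
	.iv = sizeof(((struct demo_crypto_info *)0)->iv),
	.key = sizeof(((struct demo_crypto_info *)0)->key),
	.iv_offset = offsetof(struct demo_crypto_info, iv),
	.key_offset = offsetof(struct demo_crypto_info, key),
	.cipher_name = "gcm(aes)",
	.offloadable = 1,
};

/* Accessors: the caller no longer needs to know the concrete struct type. */
static inline unsigned char *demo_info_iv(void *info,
					  const struct demo_cipher_desc *d)
{
	return (unsigned char *)info + d->iv_offset;
}

static inline unsigned char *demo_info_key(void *info,
					   const struct demo_cipher_desc *d)
{
	return (unsigned char *)info + d->key_offset;
}
```

With accessors of this shape, code that previously needed one switch case per cipher can operate on any cipher through its descriptor.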
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/39d5f476d63c171097764e8d38f6f158b7c109ae.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:12 +0000 (23:35 +0200)]
tls: rename tls_cipher_size_desc to tls_cipher_desc
We're going to add other fields to it to fully describe a cipher, so
the "_size" name won't match the contents.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:11 +0000 (23:35 +0200)]
tls: reduce size of tls_cipher_size_desc
tls_cipher_size_desc indexes ciphers by their type, but we're not
using indices 0..50 of the array. Each struct tls_cipher_size_desc is
20B, so that's a lot of unused memory. We can reindex the array
starting at the lowest used cipher_type.
Introduce the get_cipher_size_desc helper to find the right item and
avoid out-of-bounds accesses, and make tls_cipher_size_desc's size
explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we
add a new cipher.
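The reindexing described above might look roughly like the following sketch. The DEMO_* bounds, the table contents and the helper name are assumptions made for illustration; the sizes are placeholders rather than the kernel's values.

```c
#include <stddef.h>

/* Hypothetical lowest and highest cipher type in use. */
#define DEMO_CIPHER_MIN 51
#define DEMO_CIPHER_MAX 54

/* Five 4-byte fields: 20 bytes on common ABIs. */
struct demo_size_desc {
	unsigned int iv;
	unsigned int key;
	unsigned int salt;
	unsigned int tag;
	unsigned int rec_seq;
};

/*
 * The explicit array size means an initializer for a cipher type above
 * DEMO_CIPHER_MAX is rejected by the compiler, which is the reminder
 * to update the MIN/MAX bounds.
 */
static const struct demo_size_desc
demo_descs[DEMO_CIPHER_MAX + 1 - DEMO_CIPHER_MIN] = {
	[51 - DEMO_CIPHER_MIN] = { .iv = 8,  .key = 16, .salt = 4, .tag = 16, .rec_seq = 8 },
	[52 - DEMO_CIPHER_MIN] = { .iv = 8,  .key = 32, .salt = 4, .tag = 16, .rec_seq = 8 },
	[53 - DEMO_CIPHER_MIN] = { .iv = 8,  .key = 16, .salt = 4, .tag = 16, .rec_seq = 8 },
	[54 - DEMO_CIPHER_MIN] = { .iv = 12, .key = 32, .salt = 0, .tag = 16, .rec_seq = 8 },
};

/* Bounds-checked lookup: returns NULL for any type outside the table. */
static inline const struct demo_size_desc *
demo_get_size_desc(unsigned int cipher_type)
{
	if (cipher_type < DEMO_CIPHER_MIN || cipher_type > DEMO_CIPHER_MAX)
		return NULL;
	return &demo_descs[cipher_type - DEMO_CIPHER_MIN];
}
```

In this sketch the table holds 4 entries instead of 55, and callers get NULL rather than an out-of-bounds read for an unknown type.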
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:10 +0000 (23:35 +0200)]
tls: add TLS_CIPHER_ARIA_GCM_* to tls_cipher_size_desc
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/b2e0fb79e6d0a4478be9bf33781dc9c9281c9d56.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:09 +0000 (23:35 +0200)]
tls: move tls_cipher_size_desc to net/tls/tls.h
It's only used in net/tls/*, no need to bloat include/net/tls.h.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:08 +0000 (23:35 +0200)]
selftests: tls: test some invalid inputs for setsockopt
This test will need to be updated if new ciphers are added.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/bfcfa9cffda56d2064296ab7c99a05775dd4c28e.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:07 +0000 (23:35 +0200)]
selftests: tls: add getsockopt test
The kernel accepts fetching either just the version and cipher type,
or exactly the per-cipher struct. Also check that getsockopt returns
what we just passed to the kernel.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/81a007ca13de9a74f4af45635d06682cdb385a54.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Fri, 25 Aug 2023 21:35:06 +0000 (23:35 +0200)]
selftests: tls: add test variants for aria-gcm
Only supported for TLS1.2.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/ccf4a4d3f3820f8ff30431b7629f5210cb33fa89.1692977948.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 28 Aug 2023 00:17:19 +0000 (17:17 -0700)]
Merge branch 'tools-net-ynl-add-support-for-netlink-raw-families'
Donald Hunter says:
====================
tools/net/ynl: Add support for netlink-raw families
This patchset adds support for netlink-raw families such as rtnetlink.
Patch 1 fixes a typo in existing schemas
Patch 2 contains the schema definition
Patches 3 & 4 update the schema documentation
Patches 5 - 9 extend ynl
Patches 10 - 12 add several netlink-raw specs
The netlink-raw schema is very similar to genetlink-legacy and I thought
about making the changes there and symlinking to it. On balance I
thought that might be problematic for accurate schema validation.
rtnetlink doesn't seem to fit into unified or directional message
enumeration models. It seems like an 'explicit' model would be useful,
to force the schema author to specify the message ids directly.
There is not yet support for notifications because ynl currently doesn't
support defining 'event' properties on a 'do' operation. The message ids
are shared so ops need to be both sync and async. I plan to look at this
in a future patch.
The link and route messages contain different nested attributes
dependent on the type of link or route. Decoding these will need some
kind of attr-space selection that uses the value of another attribute as
the selector key. These nested attributes have been left with type
'binary' for now.
====================
Link: https://lore.kernel.org/r/20230825122756.7603-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:55 +0000 (13:27 +0100)]
doc/netlink: Add spec for rt route messages
Add schema for rt route with support for getroute, newroute and
delroute.
Routes can be dumped with filter attributes like this:
./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/rt_route.yaml \
--dump getroute --json '{"rtm-family": 2, "rtm-table": 254}'
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-13-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:54 +0000 (13:27 +0100)]
doc/netlink: Add spec for rt link messages
Add schema for rt link with support for newlink, dellink, getlink,
setlink and getstats.
A dummy link can be created like this:
sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/rt_link.yaml \
--do newlink --create \
--json '{"ifname": "dummy0", "linkinfo": {"kind": "dummy"}}'
For example, offload stats can be fetched like this:
./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/rt_link.yaml \
--dump getstats --json '{ "filter-mask": 8 }'
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-12-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:53 +0000 (13:27 +0100)]
doc/netlink: Add spec for rt addr messages
Add schema for rt addr with support for:
- newaddr, deladdr, getaddr (dump)
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-11-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:52 +0000 (13:27 +0100)]
tools/net/ynl: Add support for create flags
Add support for using NLM_F_REPLACE, _EXCL, _CREATE and _APPEND flags
in requests.
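For readers unfamiliar with these flags, the sketch below shows how they combine into the nlmsg_flags field of a request header. The DEMO_ constants copy the values from the uapi <linux/netlink.h> header so that the example compiles without kernel headers; the helper function is hypothetical and is not part of ynl.

```c
#include <stdint.h>

/* Same values as NLM_F_* in <linux/netlink.h>, redefined for portability. */
#define DEMO_NLM_F_REQUEST 0x01
#define DEMO_NLM_F_ACK     0x04
#define DEMO_NLM_F_REPLACE 0x100
#define DEMO_NLM_F_EXCL    0x200
#define DEMO_NLM_F_CREATE  0x400
#define DEMO_NLM_F_APPEND  0x800

/* Build the 16-bit flags word for a "new" style request. */
static inline uint16_t demo_request_flags(int replace, int excl,
					  int create, int append)
{
	uint16_t flags = DEMO_NLM_F_REQUEST | DEMO_NLM_F_ACK;

	if (replace)
		flags |= DEMO_NLM_F_REPLACE;
	if (excl)
		flags |= DEMO_NLM_F_EXCL;
	if (create)
		flags |= DEMO_NLM_F_CREATE;
	if (append)
		flags |= DEMO_NLM_F_APPEND;
	return flags;
}
```

A "create, but fail if the object already exists" request would set both the create and excl bits, giving 0x605 in this sketch.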
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-10-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:51 +0000 (13:27 +0100)]
tools/net/ynl: Implement nlattr array-nest decoding in ynl
Add support for the 'array-nest' attribute type that is used by several
netlink-raw families.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-9-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:50 +0000 (13:27 +0100)]
tools/net/ynl: Add support for netlink-raw families
Refactor the ynl code to encapsulate protocol specifics into
NetlinkProtocol and GenlProtocol.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20230825122756.7603-8-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:49 +0000 (13:27 +0100)]
tools/net/ynl: Fix extack parsing with fixed header genlmsg
Move decode_fixed_header into YnlFamily and add a _fixed_header_size
method to allow extack decoding to skip the fixed header.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-7-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:48 +0000 (13:27 +0100)]
tools/ynl: Add mcast-group schema parsing to ynl
Add a SpecMcastGroup class to the nlspec lib.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-6-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:47 +0000 (13:27 +0100)]
doc/netlink: Document the netlink-raw schema extensions
Add a doc page for netlink-raw that describes the schema attributes
needed for netlink-raw.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:46 +0000 (13:27 +0100)]
doc/netlink: Update genetlink-legacy documentation
Add documentation for recently added genetlink-legacy schema attributes.
Remove statements about 'work in progress' and 'todo'.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:45 +0000 (13:27 +0100)]
doc/netlink: Add a schema for netlink-raw families
This schema is largely a copy of the genetlink-legacy schema with the
following modifications:
- change the schema id to netlink-raw
- add a top-level protonum property, e.g. 0 (for NETLINK_ROUTE)
- change the protocol enumeration to netlink-raw, removing the
genetlink options.
- replace doc references to generic netlink with raw netlink
- add a value property to mcast-group definitions
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-3-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald Hunter [Fri, 25 Aug 2023 12:27:44 +0000 (13:27 +0100)]
doc/netlink: Fix typo in genetlink-* schemas
Fix typo verion -> version in genetlink-c and genetlink-legacy.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230825122756.7603-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 28 Aug 2023 00:08:47 +0000 (17:08 -0700)]
Merge branch 'devlink-mlx5-add-port-function-attributes-for-ipsec'
Saeed Mahameed says:
====================
{devlink,mlx5}: Add port function attributes for ipsec
From Dima:
Introduce hypervisor-level control knobs to set the functionality of PCI
VF devices passed through to guests. The administrator of a hypervisor
host may choose to change the settings of a port function from the
defaults configured by the device firmware.
The software stack has two types of IPsec offload - crypto and packet.
Specifically, the ip xfrm command has sub-commands for "state" and
"policy" that have an "offload" parameter. With ip xfrm state, both
crypto and packet offload types are supported, while ip xfrm policy can
only be offloaded in packet mode.
The series introduces two new boolean attributes of a port function:
ipsec_crypto and ipsec_packet. The goal is to provide a similar level of
granularity for controlling VF IPsec offload capabilities, which would
be aligned with the software model. This will allow users to decide if
they want both types of offload enabled for a VF, just one of them, or
none at all (which is the default).
At a high level, the difference between the two knobs is that with
ipsec_crypto, only XFRM state can be offloaded. Specifically, only the
crypto operation (Encrypt/Decrypt) is offloaded. With ipsec_packet, both
XFRM state and policy can be offloaded. Furthermore, in addition to
crypto operation offload, IPsec encapsulation is also offloaded. For
XFRM state, choosing between crypto and packet offload types is
possible. From the HW perspective, different resources may be required
for each offload type.
Example of a user enabling IPsec packet offload for a VF when using
switchdev mode:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet disable
$ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable migratable disable ipsec_crypto disable ipsec_packet enable
This enables the corresponding IPsec capability of the function before
it's enumerated, so when the driver reads the capability from the device
firmware, it is enabled. The driver is then able to configure
corresponding features and ops of the VF net device to support IPsec
state and policy offloading.
v2: https://lore.kernel.org/netdev/20230421104901.897946-1-dchumak@nvidia.com/
====================
Link: https://lore.kernel.org/r/20230825062836.103744-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dima Chumak [Fri, 25 Aug 2023 06:28:36 +0000 (23:28 -0700)]
net/mlx5: Implement devlink port function cmds to control ipsec_packet
Implement devlink port function commands to enable / disable IPsec
packet offloads. This is used to control the IPsec capability of the
device.
When ipsec_packet is enabled for a VF, it prevents adding IPsec packet
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec packet
offloads on the PF, it's not allowed to enable ipsec_packet on a VF,
until PF IPsec offloads are cleared.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dima Chumak [Fri, 25 Aug 2023 06:28:35 +0000 (23:28 -0700)]
net/mlx5: Implement devlink port function cmds to control ipsec_crypto
Implement devlink port function commands to enable / disable IPsec
crypto offloads. This is used to control the IPsec capability of the
device.
When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec crypto
offloads on the PF, it's not allowed to enable ipsec_crypto on a VF,
until PF IPsec offloads are cleared.
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Fri, 25 Aug 2023 06:28:34 +0000 (23:28 -0700)]
net/mlx5: Provide an interface to block change of IPsec capabilities
mlx5 HW can't perform IPsec offload operations on the PF and VFs at the
same time. While the previous patches added devlink knobs to change
IPsec capabilities dynamically, there is a need to add logic to block
such changes in cases where IPsec is already configured.
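The mutual exclusion between PF offloads and VF capabilities can be pictured with the sketch below. It is a simplified model and not the mlx5 implementation: the struct, the counters and the function names are all hypothetical, and real code would also need locking.

```c
#include <errno.h>

/* Hypothetical bookkeeping for one device. */
struct demo_ipsec_state {
	unsigned int pf_offloads;     /* active IPsec offloads on the PF */
	unsigned int vfs_with_ipsec;  /* VFs with the capability enabled */
};

/* Enabling the capability on a VF is refused while the PF uses IPsec. */
static inline int demo_vf_enable_ipsec(struct demo_ipsec_state *s)
{
	if (s->pf_offloads)
		return -EBUSY;
	s->vfs_with_ipsec++;
	return 0;
}

/* Adding a PF offload is refused while any VF has the capability. */
static inline int demo_pf_add_offload(struct demo_ipsec_state *s)
{
	if (s->vfs_with_ipsec)
		return -EBUSY;
	s->pf_offloads++;
	return 0;
}

/* Removing the last PF offload makes VF enablement possible again. */
static inline void demo_pf_del_offload(struct demo_ipsec_state *s)
{
	if (s->pf_offloads)
		s->pf_offloads--;
}
```

In this model, whichever side configures IPsec first blocks the other until its own configuration is cleared.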
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Fri, 25 Aug 2023 06:28:33 +0000 (23:28 -0700)]
net/mlx5: Add IFC bits to support IPsec enable/disable
Add hardware definitions to allow control of IPsec capabilities.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Fri, 25 Aug 2023 06:28:32 +0000 (23:28 -0700)]
net/mlx5e: Rewrite IPsec vs. TC block interface
In commit 366e46242b8e ("net/mlx5e: Make IPsec offload work together
with eswitch and TC"), a new API to block IPsec vs. TC creation was
introduced. Internally, that API used the devlink lock to avoid races
with userspace, but it is not really needed as dev->priv.eswitch is
stable and can't be changed. So remove the dependency on the devlink
lock and move the block encap code back to its original place.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Fri, 25 Aug 2023 06:28:31 +0000 (23:28 -0700)]
net/mlx5: Drop extra layer of locks in IPsec
There is no need to hold the devlink lock, as it gives nothing
compared to the already used write mode_lock.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>