Linus Torvalds [Tue, 25 Apr 2023 18:05:04 +0000 (11:05 -0700)]
Merge tag 'core-entry-2023-04-24' of git://git./linux/kernel/git/tip/tip
Pull core entry/ptrace update from Thomas Gleixner:
"Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes
which utilize syscall user dispatch correctly"
* tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest, ptrace: Add selftest for syscall user dispatch config api
ptrace: Provide set/get interface for syscall user dispatch
syscall_user_dispatch: Untag selector address before access_ok()
syscall_user_dispatch: Split up set_syscall_user_dispatch()
Linus Torvalds [Tue, 25 Apr 2023 18:00:45 +0000 (11:00 -0700)]
Merge tag 'core-debugobjects-2023-04-24' of git://git./linux/kernel/git/tip/tip
Pull core debugobjects update from Thomas Gleixner:
"A single update to debugobjects:
Prevent a race vs statically initialized objects. Such objects are
usually not initialized via an init() function. They are special cased
and detected on first use under the assumption that they are already
correctly initialized via the static initializer.
This works correctly unless there are two concurrent debug object
operations on such an object.
The first one detects that the object is not yet tracked and tries to
establish a tracking object after dropping the debug objects hash
bucket lock. The concurrent operation does the same. The one which
wins the race ends up modifying the state of the object which makes
the other one fail resulting in a bogus debug objects warning.
Prevent this by making the detection of a static object and the
allocation of a tracking object atomic under the hash bucket lock. So
the first one to acquire the hash bucket lock will succeed and the
second one will observe the correct tracking state.
This race existed forever but was only exposed when the timer wheel
code added a debug_object_assert_init() call outside of the timer base
locked region. This replaced the previous warning about
timer::function being NULL which had to be removed when the
timer_shutdown() mechanics were added"
* tag 'core-debugobjects-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Prevent init race with static objects
Linus Torvalds [Tue, 25 Apr 2023 17:48:08 +0000 (10:48 -0700)]
Merge tag 'x86_sev_for_v6.4_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Add the necessary glue so that the kernel can run as a confidential
SEV-SNP vTOM guest on Hyper-V. A vTOM guest basically splits the
address space in two parts: encrypted and unencrypted. The use case
being running unmodified guests on the Hyper-V confidential computing
hypervisor
- Double-buffer messages between the guest and the hardware PSP device
so that no partial buffers are copied back'n'forth and thus potential
message integrity and leak attacks are possible
- Name the return value the sev-guest driver returns when the hw PSP
device hasn't been called, explicitly
- Cleanups
* tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Change vTOM handling to use standard coco mechanisms
init: Call mem_encrypt_init() after Hyper-V hypercall init is done
x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
x86/hyperv: Reorder code to facilitate future work
x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
x86/sev: Change snp_guest_issue_request()'s fw_err argument
virt/coco/sev-guest: Double-buffer messages
crypto: ccp: Get rid of __sev_platform_init_locked()'s local function pointer
crypto: ccp - Name -1 return value as SEV_RET_NO_FW_CALL
Linus Torvalds [Tue, 25 Apr 2023 17:32:51 +0000 (10:32 -0700)]
Merge tag 'x86_paravirt_for_v6.4_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:
- Convert a couple of paravirt callbacks to asm to prevent
'-fzero-call-used-regs' builds from zeroing live registers because
paravirt hides the CALLs from the compiler so latter doesn't know
there's a CALL in the first place
- Merge two paravirt callbacks into one, as their functionality is
identical
* tag 'x86_paravirt_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Convert simple paravirt functions to asm
x86/paravirt: Merge activate_mm() and dup_mmap() callbacks
Linus Torvalds [Tue, 25 Apr 2023 17:27:02 +0000 (10:27 -0700)]
Merge tag 'x86_misc_for_v6.4_rc1' of git://git./linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Add a x86 hw vulnerabilities section to MAINTAINERS so that the folks
involved in it can get CCed on patches
- Add some more CPUID leafs to the kcpuid tool and extend its
functionality to be more useful when grepping for CPUID bits
* tag 'x86_misc_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add x86 hardware vulnerabilities section
tools/x86/kcpuid: Dump the CPUID function in detailed view
tools/x86/kcpuid: Update AMD leaf Fn80000001
tools/x86/kcpuid: Fix avx512bw and avx512lvl fields in Fn00000007
Linus Torvalds [Tue, 25 Apr 2023 17:20:52 +0000 (10:20 -0700)]
Merge tag 'x86_cpu_for_v6.4_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 cpu model updates from Borislav Petkov:
- Add Emerald Rapids to the list of Intel models supporting PPIN
- Finally use a CPUID bit for split lock detection instead of
enumerating every model
- Make sure automatic IBRS is set on AMD, even though the AP bringup
code does that now by replicating the MSR which contains the switch
* tag 'x86_cpu_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Xeon Emerald Rapids to list of CPUs that support PPIN
x86/split_lock: Enumerate architectural split lock disable bit
x86/CPU/AMD: Make sure EFER[AIBRSE] is set
Linus Torvalds [Tue, 25 Apr 2023 17:05:00 +0000 (10:05 -0700)]
Merge tag 'x86_acpi_for_v6.4_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 ACPI update from Borislav Petkov:
- Improve code generation in ACPI's global lock's acquisition function
* tag 'x86_acpi_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ACPI/boot: Improve __acpi_acquire_global_lock
Linus Torvalds [Tue, 25 Apr 2023 16:56:33 +0000 (09:56 -0700)]
Merge tag 'ras_core_for_v6.4_rc1' of git://git./linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- Just cleanups and fixes this time around: make threshold_ktype const,
an objtool fix and use proper size for a bitmap
* tag 'ras_core_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE/AMD: Use an u64 for bank_map
x86/mce: Always inline old MCA stubs
x86/MCE/AMD: Make kobj_type structure constant
Linus Torvalds [Tue, 25 Apr 2023 16:44:07 +0000 (09:44 -0700)]
Merge tag 'edac_updates_for_v6.4' of git://git./linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- skx_edac: Fix overflow when decoding 32G DIMM ranks
- i10nm_edac: Add Sierra Forest support
- amd64_edac: Split driver code between legacy and SMCA systems. The
final goal is adding support for more hw, like GPUs
- The usual minor cleanups and fixes
* tag 'edac_updates_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (25 commits)
EDAC/i10nm: Add Intel Sierra Forest server support
EDAC/amd64: Fix indentation in umc_determine_edac_cap()
EDAC/altera: Remove MODULE_LICENSE in non-module
EDAC: Sanitize MODULE_AUTHOR strings
EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
EDAC/amd64: Add get_err_info() to pvt->ops
EDAC/amd64: Split dump_misc_regs() into dct/umc functions
EDAC/amd64: Split init_csrows() into dct/umc functions
EDAC/amd64: Split determine_edac_cap() into dct/umc functions
EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
EDAC/amd64: Split ecc_enabled() into dct/umc functions
EDAC/amd64: Split read_mc_regs() into dct/umc functions
EDAC/amd64: Split determine_memory_type() into dct/umc functions
EDAC/amd64: Split read_base_mask() into dct/umc functions
EDAC/amd64: Split prep_chip_selects() into dct/umc functions
EDAC/amd64: Rework hw_info_{get,put}
EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
...
Linus Torvalds [Tue, 25 Apr 2023 16:37:43 +0000 (09:37 -0700)]
Merge tag 'm68k-for-v6.4-tag1' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- defconfig updates
- miscellaneous fixes and improvements
* tag 'm68k-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: kexec: Include <linux/reboot.h>
m68k: defconfig: Update defconfigs for v6.3-rc1
m68k: Remove obsolete config NO_KERNEL_MSG
nubus: Drop noop match function
Dhruva Gole [Mon, 24 Apr 2023 10:25:46 +0000 (15:55 +0530)]
spi: bcm63xx: use macro DEFINE_SIMPLE_DEV_PM_OPS
Using this macro makes the code more readable.
It also inits the members of dev_pm_ops in the following manner
without us explicitly needing to:
.suspend = bcm63xx_spi_suspend, \
.resume = bcm63xx_spi_resume, \
.freeze = bcm63xx_spi_suspend, \
.thaw = bcm63xx_spi_resume, \
.poweroff = bcm63xx_spi_suspend, \
.restore = bcm63xx_spi_resume
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Link: https://lore.kernel.org/r/20230424102546.1604484-1-d-gole@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Juergen Gross [Fri, 16 Dec 2022 13:49:18 +0000 (14:49 +0100)]
xen/blkback: move blkif_get_x86_*_req() into blkback.c
There is no need to have the functions blkif_get_x86_32_req() and
blkif_get_x86_64_req() in a header file, as they are used in one place
only.
So move them into the using source file and drop the inline qualifier.
While at it fix some style issues, and simplify the code by variable
reusing and using min() instead of open coding it.
Instead of using barrier() use READ_ONCE() for avoiding multiple reads
of nr_segments.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Mon, 12 Dec 2022 09:37:42 +0000 (10:37 +0100)]
xen/blkback: simplify free_persistent_gnts() interface
The interface of free_persistent_gnts() can be simplified, as there is
only a single caller of free_persistent_gnts() and the 2nd and 3rd
parameters are easily obtainable via the ring pointer, which is passed
as the first parameter anyway.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Mon, 12 Dec 2022 08:43:28 +0000 (09:43 +0100)]
xen/blkback: remove stale prototype
There is no function xen_blkif_purge_persistent(), so remove its
prototype from common.h.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Mon, 12 Dec 2022 08:41:34 +0000 (09:41 +0100)]
xen/blkback: fix white space code style issues
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Bob Peterson [Fri, 21 Apr 2023 19:07:10 +0000 (15:07 -0400)]
gfs2: gfs2_ail_empty_gl no log flush on error
Before this patch, function gfs2_ail_empty_gl called gfs2_log_flush even
in cases where it encountered an error. It should probably skip the log
flush and leave the file system in an inconsistent state, letting a
subsequent withdraw force the journal to be replayed to reestablish
metadata consistency.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Bob Peterson [Fri, 21 Apr 2023 19:07:09 +0000 (15:07 -0400)]
gfs2: Issue message when revokes cannot be written
Before this patch, function gfs2_ail_empty_gl would silently return an
error to the caller. This would get silently set into sd_log_error which
would cause a withdraw, but there was no indication why the file system
was withdrawn. This patch adds a fs_err to log the appropriate error
message.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Bob Peterson [Fri, 21 Apr 2023 19:07:08 +0000 (15:07 -0400)]
gfs2: Perform second log flush in gfs2_make_fs_ro
Before this patch, function gfs2_make_fs_ro called gfs2_log_flush once to
finalize the log. However, if there's dirty metadata, log flushes tend
to sync the metadata and formulate revokes. Before this patch, those
revokes may not be written out to the journal immediately, which meant
unresolved glocks could still have revokes in their ail lists. When the
glock worker runs, it tries to transition the glock, but the unresolved
revokes in the ail still need to be written, so it tries to start a
transaction. It's impossible to start a transaction because at that
point, the SDF_JOURNAL_LIVE flag has been cleared by gfs2_make_fs_ro.
That causes the glock worker to fail, unable to write the revokes. The
calling sequence looked something like this:
gfs2_make_fs_ro
gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set
if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN)
clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
...meanwhile...
glock_work_func
do_xmote
rgrp_go_sync (or possibly inode_go_sync)
...
gfs2_ail_empty_gl
__gfs2_trans_begin
if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) {
...
return -EROFS;
The previous patch in the series ("gfs2: return errors from
gfs2_ail_empty_gl") now causes the transaction error to no longer be
ignored, so it causes a warning from MOST of the xfstests:
WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2]
which corresponds to:
WARN_ON(gfs2_withdrawing(sdp));
The withdraw was triggered silently from do_xmote by:
if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp)))
gfs2_withdraw_delayed(sdp);
This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the
data and one to sync any outstanding revokes and finalize the journal.
Note that both of these log flushes need to be "special," in other
words, not GFS2_LOG_HEAD_FLUSH_NORMAL.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Bob Peterson [Fri, 21 Apr 2023 19:07:07 +0000 (15:07 -0400)]
gfs2: return errors from gfs2_ail_empty_gl
Before this patch, function gfs2_ail_empty_gl did not return errors it
encountered from __gfs2_trans_begin. Those errors usually came from the
fact that the file system was made read-only, often due to unmount
(but theoretically could be due to -o remount,ro), which prevented
the transaction from starting.
The inability to start a transaction prevented its revokes from being
properly written to the journal for glocks during unmount (and
transition to ro).
That meant glocks could be unlocked without the metadata properly
revoked in the journal. So other nodes could grab the glock thinking
that their lvb values were correct, but in fact corresponded to the
glock without its revokes properly synced. That presented as lvb
mismatch errors.
This patch allows gfs2_ail_empty_gl to return the error properly to
the caller.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Basavaraj Natikar [Mon, 24 Apr 2023 16:04:06 +0000 (21:34 +0530)]
HID: amd_sfh: Fix max supported HID devices
commit
4bd763568dbd ("HID: amd_sfh: Support for additional light sensor")
adds additional sensor devices, but forgets to add the number of HID
devices to match. Thus, the number of HID devices does not match the
actual number of sensors.
In order to prevent corruption and system hangs when more than the
allowed number of HID devices are accessed, the number of HID devices is
increased accordingly.
Fixes: 4bd763568dbd ("HID: amd_sfh: Support for additional light sensor")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217354
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Link: https://lore.kernel.org/r/20230424160406.2579888-1-Basavaraj.Natikar@amd.com
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
wuych [Tue, 25 Apr 2023 05:15:32 +0000 (13:15 +0800)]
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
Pointer variables of void * type do not require type cast.
Signed-off-by: wuych <yunchuan@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Mon, 24 Apr 2023 22:20:22 +0000 (15:20 -0700)]
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
syzkaller reported [0] memory leaks of an UDP socket and ZEROCOPY
skbs. We can reproduce the problem with these sequences:
sk = socket(AF_INET, SOCK_DGRAM, 0)
sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE)
sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1)
sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53))
sk.close()
sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets
skb->cb->ubuf.refcnt to 1, and calls sock_hold(). Here, struct
ubuf_info_msgzc indirectly holds a refcnt of the socket. When the
skb is sent, __skb_tstamp_tx() clones it and puts the clone into
the socket's error queue with the TX timestamp.
When the original skb is received locally, skb_copy_ubufs() calls
skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt.
This additional count is decremented while freeing the skb, but struct
ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is
not called.
The last refcnt is not released unless we retrieve the TX timestamped
skb by recvmsg(). Since we clear the error queue in inet_sock_destruct()
after the socket's refcnt reaches 0, there is a circular dependency.
If we close() the socket holding such skbs, we never call sock_put()
and leak the count, sk, and skb.
TCP has the same problem, and commit
e0c8bccd40fc ("net: stream:
purge sk_error_queue in sk_stream_kill_queues()") tried to fix it
by calling skb_queue_purge() during close(). However, there is a
small chance that skb queued in a qdisc or device could be put
into the error queue after the skb_queue_purge() call.
In __skb_tstamp_tx(), the cloned skb should not have a reference
to the ubuf to remove the circular dependency, but skb_clone() does
not call skb_copy_ubufs() for zerocopy skb. So, we need to call
skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs().
[0]:
BUG: memory leak
unreferenced object 0xffff88800c6d2d00 (size 1152):
comm "syz-executor392", pid 264, jiffies
4294785440 (age 13.044s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00 ................
02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............
backtrace:
[<
0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024
[<
0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083
[<
0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline]
[<
0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245
[<
000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515
[<
00000000b9b11231>] sock_create net/socket.c:1566 [inline]
[<
00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline]
[<
00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline]
[<
00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636
[<
000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline]
[<
000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline]
[<
000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647
[<
0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<
0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<
0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
BUG: memory leak
unreferenced object 0xffff888017633a00 (size 240):
comm "syz-executor392", pid 264, jiffies
4294785440 (age 13.044s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff .........-m.....
backtrace:
[<
000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497
[<
00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline]
[<
00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596
[<
00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline]
[<
00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370
[<
00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037
[<
0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652
[<
00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253
[<
0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819
[<
000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<
000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734
[<
0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117
[<
00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline]
[<
00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline]
[<
00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125
[<
0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<
0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<
0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
Fixes: b5947e5d1e71 ("udp: msg_zerocopy")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gencen Gan [Mon, 24 Apr 2023 15:28:01 +0000 (23:28 +0800)]
net: amd: Fix link leak when verifying config failed
After failing to verify configuration, it returns directly without
releasing link, which may cause memory leak.
Paolo Abeni thinks that the whole code of this driver is quite
"suboptimal" and looks unmainatained since at least ~15y, so he
suggests that we could simply remove the whole driver, please
take it into consideration.
Simon Horman suggests that the fix label should be set to
"Linux-2.6.12-rc2" considering that the problem has existed
since the driver was introduced and the commit above doesn't
seem to exist in net/net-next.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Gan Gecen <gangecen@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sakari Ailus [Wed, 29 Mar 2023 14:57:50 +0000 (15:57 +0100)]
media: ov5670: Fix probe on ACPI
devm_clk_get() will return either an error or NULL, which the driver
handles, continuing to use the clock of reading the value of the
clock-frequency property.
However, the value of ov5670->xvclk is left as-is and the other clock
framework functions aren't capable of handling error values.
Use devm_clk_get_optional() to obtain NULL instead of -ENOENT.
Fixes: 8004c91e2095 ("media: i2c: ov5670: Use common clock framework")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Geert Uytterhoeven [Thu, 23 Mar 2023 10:22:59 +0000 (11:22 +0100)]
sh: Replace <uapi/asm/types.h> by <asm-generic/int-ll64.h>
As arch/sh/include/uapi/asm/types.h doesn't exist, sh doesn't provide
any sh-specific uapi definitions, and it can just include
<asm-generic/int-ll64.h>, like most other architectures.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/26932016c83c2ad350db59f5daf96117a38bbbd8.1679566927.git.geert@linux-m68k.org
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Geert Uytterhoeven [Thu, 23 Mar 2023 10:18:07 +0000 (11:18 +0100)]
sh: Use generic GCC library routines
The C implementations of __ashldi3(), __ashrdi3__(), and __lshrdi3() in
arch/sh/lib/ are identical to the generic C implementations in lib/.
Reduce duplication by switching SH to the generic versions.
Update the include path in arch/sh/boot/compressed accordingly.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/74dbe68dc8e2ffb6180092f73723fe21ab692c7a.1679566500.git.geert+renesas@glider.be
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Linus Torvalds [Tue, 25 Apr 2023 02:43:32 +0000 (19:43 -0700)]
Merge tag 'pull-nios2' of git://git./linux/kernel/git/viro/vfs
Pull trivial nios2 cleanup from Al Viro.
* tag 'pull-nios2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nios2: _TIF_ALLWORK_MASK is unused
Linus Torvalds [Tue, 25 Apr 2023 02:38:34 +0000 (19:38 -0700)]
Merge tag 'pull-misc' of git://git./linux/kernel/git/viro/vfs
Pull misc vfs pile from Al Viro.
Random minor cleanups.
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Fix description of vfs_tmpfile()
sysv: switch to put_and_unmap_page()
fs/sysv: Don't round down address for kunmap_flush_on_unmap()
Linus Torvalds [Tue, 25 Apr 2023 02:28:49 +0000 (19:28 -0700)]
Merge tag 'pull-old-dio' of git://git./linux/kernel/git/viro/vfs
Pull legacy dio cleanup from Al Viro.
* tag 'pull-old-dio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
__blockdev_direct_IO(): get rid of submit_io callback
Linus Torvalds [Tue, 25 Apr 2023 02:20:27 +0000 (19:20 -0700)]
Merge tag 'pull-write-one-page' of git://git./linux/kernel/git/viro/vfs
Pull vfs write_one_page removal from Al Viro:
"write_one_page series"
* tag 'pull-write-one-page' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mm,jfs: move write_one_page/folio_write_one to jfs
ocfs2: don't use write_one_page in ocfs2_duplicate_clusters_by_page
ufs: don't flush page immediately for DIRSYNC directories
Linus Torvalds [Tue, 25 Apr 2023 02:14:20 +0000 (19:14 -0700)]
Merge tag 'pull-fd' of git://git./linux/kernel/git/viro/vfs
Pull vfs fget updates from Al Viro:
"fget() to fdget() conversions"
* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fuse_dev_ioctl(): switch to fdget()
cgroup_get_from_fd(): switch to fdget_raw()
bpf: switch to fdget_raw()
build_mount_idmapped(): switch to fdget()
kill the last remaining user of proc_ns_fget()
SVM-SEV: convert the rest of fget() uses to fdget() in there
convert sgx_set_attribute() to fdget()/fdput()
convert setns(2) to fdget()/fdput()
Christian Marangi [Sun, 23 Apr 2023 17:28:00 +0000 (19:28 +0200)]
net: phy: marvell: Fix inconsistent indenting in led_blink_set
Fix inconsistent indeinting in m88e1318_led_blink_set reported by kernel
test robot, probably done by the presence of an if condition dropped in
later revision of the same code.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304240007.0VEX8QYG-lkp@intel.com/
Fixes: ea9e86485dec ("net: phy: marvell: Implement led_blink_set()")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230423172800.3470-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Horatiu Vultur [Sat, 22 Apr 2023 14:23:44 +0000 (16:23 +0200)]
lan966x: Don't use xdp_frame when action is XDP_TX
When the action of an xdp program was XDP_TX, lan966x was creating
a xdp_frame and use this one to send the frame back. But it is also
possible to send back the frame without needing a xdp_frame, because
it is possible to send it back using the page.
And then once the frame is transmitted is possible to use directly
page_pool_recycle_direct as lan966x is using page pools.
This would save some CPU usage on this path, which results in higher
number of transmitted frames. Bellow are the statistics:
Frame size: Improvement:
64 ~8%
256 ~11%
512 ~8%
1000 ~0%
1500 ~0%
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230422142344.3630602-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 25 Apr 2023 01:45:11 +0000 (18:45 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2023-04-24
We've added 5 non-merge commits during the last 3 day(s) which contain
a total of 7 files changed, 87 insertions(+), 44 deletions(-).
The main changes are:
1) Workaround for bpf iter selftest due to lack of subprog support
in precision tracking, from Andrii.
2) Disable bpf_refcount_acquire kfunc until races are fixed, from Dave.
3) One more test_verifier test converted from asm macro to asm in C,
from Eduard.
4) Fix build with NETFILTER=y INET=n config, from Florian.
5) Add __rcu_read_{lock,unlock} into deny list, from Yafang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests
bpf: Add __rcu_read_{lock,unlock} into btf id deny list
bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed
selftests/bpf: verifier/prevent_map_lookup converted to inline assembly
bpf: fix link failure with NETFILTER=y INET=n
====================
Link: https://lore.kernel.org/r/20230425005648.86714-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 25 Apr 2023 01:22:39 +0000 (18:22 -0700)]
Merge branch 'tsnep-xdp-socket-zero-copy-support'
Gerhard Engleder says:
====================
tsnep: XDP socket zero-copy support
Implement XDP socket zero-copy support for tsnep driver. I tried to
follow existing drivers like igc as far as possible. But one main
difference is that tsnep does not need any reconfiguration for XDP BPF
program setup. So I decided to keep this behavior no matter if a XSK
pool is used or not. As a result, tsnep starts using the XSK pool even
if no XDP BPF program is available.
Another difference is that I tried to prevent potentially failing
allocations during XSK pool setup. E.g. both memory models for page pool
and XSK pool are registered all the time. Thus, XSK pool setup cannot
end up with not working queues.
Some prework is done to reduce the last two XSK commits to actual XSK
changes.
====================
Link: https://lore.kernel.org/r/20230421194656.48063-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerhard Engleder [Fri, 21 Apr 2023 19:46:56 +0000 (21:46 +0200)]
tsnep: Add XDP socket zero-copy TX support
Send and complete XSK pool frames within TX NAPI context. NAPI context
is triggered by ndo_xsk_wakeup.
Test results with A53 1.2GHz:
xdpsock txonly copy mode, 64 byte frames:
pps pkts 1.00
tx 284,409 11,398,144
Two CPUs with 100% and 10% utilization.
xdpsock txonly zero-copy mode, 64 byte frames:
pps pkts 1.00
tx 511,929 5,890,368
Two CPUs with 100% and 1% utilization.
xdpsock l2fwd copy mode, 64 byte frames:
pps pkts 1.00
rx 248,985 7,315,885
tx 248,921 7,315,885
Two CPUs with 100% and 10% utilization.
xdpsock l2fwd zero-copy mode, 64 byte frames:
pps pkts 1.00
rx 254,735 3,039,456
tx 254,735 3,039,456
Two CPUs with 100% and 4% utilization.
Packet rate increases and CPU utilization is reduced in both cases.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerhard Engleder [Fri, 21 Apr 2023 19:46:55 +0000 (21:46 +0200)]
tsnep: Add XDP socket zero-copy RX support
Add support for XSK zero-copy to RX path. The setup of the XSK pool can
be done at runtime. If the netdev is running, then the queue must be
disabled and enabled during reconfiguration. This can be done easily
with functions introduced in previous commits.
A more important property is that, if the netdev is running, then the
setup of the XSK pool shall not stop the netdev in case of errors. A
broken netdev after a failed XSK pool setup is bad behavior. Therefore,
the allocation and setup of resources during XSK pool setup is done only
before any queue is disabled. Additionally, freeing and later allocation
of resources is eliminated in some cases. Page pool entries are kept for
later use. Two memory models are registered in parallel. As a result,
the XSK pool setup cannot fail during queue reconfiguration.
In contrast to other drivers, XSK pool setup and XDP BPF program setup
are separate actions. XSK pool setup can be done without any XDP BPF
program. The XDP BPF program can be added, removed or changed without
any reconfiguration of the XSK pool.
Test results with A53 1.2GHz:
xdpsock rxdrop copy mode, 64 byte frames:
pps pkts 1.00
rx 856,054 10,625,775
Two CPUs with both 100% utilization.
xdpsock rxdrop zero-copy mode, 64 byte frames:
pps pkts 1.00
rx 889,388 4,615,284
Two CPUs with 100% and 20% utilization.
Packet rate increases and CPU utilization is reduced.
100% CPU load seems to the base load. This load is consumed by ksoftirqd
just for dropping the generated packets without xdpsock running.
Using batch API reduced CPU utilization slightly, but measurements are
not stable enough to provide meaningful numbers.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerhard Engleder [Fri, 21 Apr 2023 19:46:54 +0000 (21:46 +0200)]
tsnep: Move skb receive action to separate function
The function tsnep_rx_poll() is already pretty long and the skb receive
action can be reused for XSK zero-copy support. Move page based skb
receive to separate function.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerhard Engleder [Fri, 21 Apr 2023 19:46:53 +0000 (21:46 +0200)]
tsnep: Add functions for queue enable/disable
Move queue enable and disable code to separate functions. This way the
activation and deactivation of the queues are defined actions, which can
be used in future execution paths.
This functions will be used for the queue reconfiguration at runtime,
which is necessary for XSK zero-copy support.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerhard Engleder [Fri, 21 Apr 2023 19:46:52 +0000 (21:46 +0200)]
tsnep: Rework TX/RX queue initialization
Make initialization of TX and RX queues less dynamic by moving some
initialization from netdev open/close to device probing.
Additionally, move some initialization code to separate functions to
enable future use in other execution paths.
This is done as preparation for queue reconfigure at runtime, which is
necessary for XSK zero-copy support.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerhard Engleder [Fri, 21 Apr 2023 19:46:51 +0000 (21:46 +0200)]
tsnep: Replace modulo operation with mask
TX/RX ring size is static and power of 2 to enable compiler to optimize
modulo operation to mask operation. Make this optimization already in
the code and don't rely on the compiler.
CPU utilisation during high packet rate has not changed. So no
performance improvement has been measured. But it is best practice to
prevent modulo operations.
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Stein [Mon, 24 Apr 2023 13:46:25 +0000 (15:46 +0200)]
net: phy: dp83867: Add led_brightness_set support
Up to 4 LEDs can be attached to the PHY, add support for setting
brightness manually.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230424134625.303957-1-alexander.stein@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Stein [Mon, 24 Apr 2023 14:16:48 +0000 (16:16 +0200)]
net: phy: Fix reading LED reg property
'reg' is always encoded in 32 bits, thus it has to be read using the
function with the corresponding bit width.
Fixes: 01e5b728e9e4 ("net: phy: Add a binding for PHY LEDs")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230424141648.317944-1-alexander.stein@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianuo Kuang [Mon, 24 Apr 2023 02:41:40 +0000 (10:41 +0800)]
drivers: nfc: nfcsim: remove return value check of `dev_dir`
Smatch complains that:
nfcsim_debugfs_init_dev() warn: 'dev_dir' is an error pointer or valid
According to the documentation of the debugfs_create_dir() function,
there is no need to check the return value of this function.
Just delete the dead code.
Signed-off-by: Jianuo Kuang <u202110722@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230424024140.34607-1-u202110722@hust.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wuych [Mon, 24 Apr 2023 10:15:50 +0000 (18:15 +0800)]
net: phy: dp83867: Remove unnecessary (void*) conversions
Pointer variables of void * type do not require type cast.
Signed-off-by: wuych <yunchuan@nfschina.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230424101550.664319-1-yunchuan@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 20 Apr 2023 23:33:02 +0000 (16:33 -0700)]
net: ethtool: coalesce: try to make user settings stick twice
SET_COALESCE may change operation mode and parameters in one call.
Changing operation mode may cause the driver to reset the parameter
values to what is a reasonable default for new operation mode.
Since driver does not know which parameters come from user and which
are echoed back from ->get, driver may ignore the parameters when
switching operation modes.
This used to be inevitable for ioctl() but in netlink we know which
parameters are actually specified by the user.
We could inform which parameters were set by the user but this would
lead to a lot of code duplication in the drivers. Instead try to call
the drivers twice if both mode and params are changed. The set method
already checks if any params need updating so in case the driver did
the right thing the first time around - there will be no second call
to it's ->set method (only an extra call to ->get()).
For mlx5 for example before this patch we'd see:
# ethtool -C eth0 adaptive-rx on adaptive-tx on
# ethtool -C eth0 adaptive-rx off adaptive-tx off \
tx-usecs 123 rx-usecs 123
Adaptive RX: off TX: off
rx-usecs: 3
rx-frames: 32
tx-usecs: 16
tx-frames: 32
[...]
After the change:
# ethtool -C eth0 adaptive-rx on adaptive-tx on
# ethtool -C eth0 adaptive-rx off adaptive-tx off \
tx-usecs 123 rx-usecs 123
Adaptive RX: off TX: off
rx-usecs: 123
rx-frames: 32
tx-usecs: 123
tx-frames: 32
[...]
This only works for netlink, so it's a small discrepancy between
netlink and ioctl(). Since we anticipate most users to move to
netlink I believe it's worth making their lives easier.
Link: https://lore.kernel.org/r/20230420233302.944382-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 25 Apr 2023 01:08:54 +0000 (18:08 -0700)]
Merge branch 'update-coding-style-and-check-alloc_frag'
Haiyang Zhang says:
====================
Update coding style and check alloc_frag
Follow up patches for the jumbo frame support.
As suggested by Jakub Kicinski, update coding style, and check napi_alloc_frag
for possible fallback to single pages.
====================
Link: https://lore.kernel.org/r/1682096818-30056-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Haiyang Zhang [Fri, 21 Apr 2023 17:06:58 +0000 (10:06 -0700)]
net: mana: Check if netdev/napi_alloc_frag returns single page
netdev/napi_alloc_frag() may fall back to single page which is smaller
than the requested size.
Add error checking to avoid memory overwritten.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Haiyang Zhang [Fri, 21 Apr 2023 17:06:57 +0000 (10:06 -0700)]
net: mana: Rename mana_refill_rxoob and remove some empty lines
Rename mana_refill_rxoob for naming consistency.
And remove some empty lines between function call and error
checking.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 25 Apr 2023 01:07:09 +0000 (18:07 -0700)]
Merge branch 'add-page_pool-support-for-page-recycling-in-veth-driver'
Lorenzo Bianconi says:
====================
add page_pool support for page recycling in veth driver
Introduce page_pool support in veth driver in order to recycle pages in
veth_convert_skb_to_xdp_buff routine and avoid reallocating the skb through
the page allocator when we run a xdp program on the device and we receive
skbs from the stack.
====================
Link: https://lore.kernel.org/r/cover.1682188837.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Sat, 22 Apr 2023 18:54:33 +0000 (20:54 +0200)]
net: veth: add page_pool stats
Introduce page_pool stats support to report info about local page_pool
through ethtool
Tested-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Sat, 22 Apr 2023 18:54:32 +0000 (20:54 +0200)]
net: veth: add page_pool for page recycling
Introduce page_pool support in veth driver in order to recycle pages
in veth_convert_skb_to_xdp_buff routine and avoid reallocating the skb
through the page allocator.
The patch has been tested sending tcp traffic to a veth pair where the
remote peer is running a simple xdp program just returning xdp_pass:
veth upstream codebase:
MTU 1500B: ~ 8Gbps
MTU 8000B: ~ 13.9Gbps
veth upstream codebase + pp support:
MTU 1500B: ~ 9.2Gbps
MTU 8000B: ~ 16.2Gbps
Tested-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Fri, 21 Apr 2023 18:52:55 +0000 (11:52 -0700)]
netlink: Use copy_to_user() for optval in netlink_getsockopt().
Brad Spencer provided a detailed report [0] that when calling getsockopt()
for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such
options require at least sizeof(int) as length.
The options return a flag value that fits into 1 byte, but such behaviour
confuses users who do not initialise the variable before calling
getsockopt() and do not strictly check the returned value as char.
Currently, netlink_getsockopt() uses put_user() to copy data to optlen and
optval, but put_user() casts the data based on the pointer, char *optval.
As a result, only 1 byte is set to optval.
To avoid this behaviour, we need to use copy_to_user() or cast optval for
put_user().
Note that this changes the behaviour on big-endian systems, but we document
that the size of optval is int in the man page.
$ man 7 netlink
...
Socket options
To set or get a netlink socket option, call getsockopt(2) to read
or setsockopt(2) to write the option with the option level argument
set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to
an int.
Fixes: 9a4595bc7e67 ("[NETLINK]: Add set/getsockopt options to support more than 32 groups")
Fixes: be0c22a46cfb ("netlink: add NETLINK_BROADCAST_ERROR socket option")
Fixes: 38938bfe3489 ("netlink: add NETLINK_NO_ENOBUFS socket flag")
Fixes: 0a6a3a23ea6e ("netlink: add NETLINK_CAP_ACK socket option")
Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting")
Fixes: 89d35528d17d ("netlink: Add new socket option to enable strict checking on dumps")
Reported-by: Brad Spencer <bspencer@blackberry.com>
Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrii Nakryiko [Mon, 24 Apr 2023 23:51:28 +0000 (16:51 -0700)]
selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests
iter_pass_iter_ptr_to_subprog subtest is relying on actual array size
being passed as subprog parameter. This combined with recent fixes to
precision tracking in conditional jumps ([0]) is now causing verifier to
backtrack all the way to the point where sum() and fill() subprogs are
called, at which point precision backtrack bails out and forces all the
states to have precise SCALAR registers. This in turn causes each
possible value of i within fill() and sum() subprogs to cause
a different non-equivalent state, preventing iterator code to converge.
For now, change the test to assume fixed size of passed in array. Once
BPF verifier supports precision tracking across subprogram calls, these
changes will be reverted as unnecessary.
[0]
71b547f56124 ("bpf: Fix incorrect verifier pruning due to missing register precision taints")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230424235128.1941726-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Kicinski [Mon, 24 Apr 2023 22:37:35 +0000 (15:37 -0700)]
Merge tag 'nf-next-23-04-22' of git://git./linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Reduce jumpstack footprint: Stash chain in last rule marker in blob for
tracing. Remove last rule and chain from jumpstack. From Florian Westphal.
2) nf_tables validates all tables before committing the new rules.
Unfortunately, this has two drawbacks:
- Since addition of the transaction mutex pernet state gets written to
outside of the locked section from the cleanup callback, this is
wrong so do this cleanup directly after table has passed all checks.
- Revalidate tables that saw no changes. This can be avoided by
keeping the validation state per table, not per netns.
From Florian Westphal.
3) Get rid of a few redundant pointers in the traceinfo structure.
The three removed pointers are used in the expression evaluation loop,
so gcc keeps them in registers. Passing them to the (inlined) helpers
thus doesn't increase nft_do_chain text size, while stack is reduced
by another 24 bytes on 64bit arches. From Florian Westphal.
4) IPVS cleanups in several ways without implementing any functional
changes, aside from removing some debugging output:
- Update width of source for ip_vs_sync_conn_options
The operation is safe, use an annotation to describe it properly.
- Consistently use array_size() in ip_vs_conn_init()
It seems better to use helpers consistently.
- Remove {Enter,Leave}Function. These seem to be well past their
use-by date.
- Correct spelling in comments.
From Simon Horman.
5) Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device.
* tag 'nf-next-23-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: allow to create netdev chain without device
netfilter: nf_tables: support for deleting devices in an existing netdev chain
netfilter: nf_tables: support for adding new devices to an existing netdev chain
netfilter: nf_tables: rename function to destroy hook list
netfilter: nf_tables: do not send complete notification of deletions
netfilter: nf_tables: extended netlink error reporting for netdevice
ipvs: Correct spelling in comments
ipvs: Remove {Enter,Leave}Function
ipvs: Consistently use array_size() in ip_vs_conn_init()
ipvs: Update width of source for ip_vs_sync_conn_options
netfilter: nf_tables: do not store rule in traceinfo structure
netfilter: nf_tables: do not store verdict in traceinfo structure
netfilter: nf_tables: do not store pktinfo in traceinfo structure
netfilter: nf_tables: remove unneeded conditional
netfilter: nf_tables: make validation state per table
netfilter: nf_tables: don't write table validation state without mutex
netfilter: nf_tables: don't store chain address on jump
netfilter: nf_tables: don't store address of last rule on jump
netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker
====================
Link: https://lore.kernel.org/r/20230421235021.216950-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Mon, 24 Apr 2023 21:25:39 +0000 (14:25 -0700)]
Merge tag 'erofs-for-6.4-rc1' of git://git./linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, sub-page block support for uncompressed files is
available. It's mainly used to enable original signing ('golden')
4k-block images on arm64 with 16/64k pages. In addition, end users
could also use this feature to build a manifest to directly refer to
golden tar data.
Besides, long xattr name prefix support is also introduced in this
cycle to avoid too many xattrs with the same prefix (e.g. overlayfs
xattrs). It's useful for erofs + overlayfs combination (like Composefs
model): the image size is reduced by ~14% and runtime performance is
also slightly improved.
Others are random fixes and cleanups as usual.
Summary:
- Add sub-page block size support for uncompressed files
- Support flattened block device for multi-blob images to be attached
into virtual machines (including cloud servers) and bare metals
- Support long xattr name prefixes to optimize images with common
xattr namespaces (e.g. files with overlayfs xattrs) use cases
- Various minor cleanups & fixes"
* tag 'erofs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: cleanup i_format-related stuffs
erofs: sunset erofs_dbg()
erofs: fix potential overflow calculating xattr_isize
erofs: get rid of z_erofs_fill_inode()
erofs: enable long extended attribute name prefixes
erofs: handle long xattr name prefixes properly
erofs: add helpers to load long xattr name prefixes
erofs: introduce on-disk format for long xattr name prefixes
erofs: move packed inode out of the compression part
erofs: keep meta inode into erofs_buf
erofs: initialize packed inode after root inode is assigned
erofs: stop parsing non-compact HEAD index if clusterofs is invalid
erofs: don't warn ztailpacking feature anymore
erofs: simplify erofs_xattr_generic_get()
erofs: rename init_inode_xattrs with erofs_ prefix
erofs: move several xattr helpers into xattr.c
erofs: tidy up EROFS on-disk naming
erofs: support flattened block device for multi-blob images
erofs: set block size to the on-disk block size
erofs: avoid hardcoded blocksize for subpage block support
Yafang Shao [Mon, 24 Apr 2023 16:11:03 +0000 (16:11 +0000)]
bpf: Add __rcu_read_{lock,unlock} into btf id deny list
The tracing recursion prevention mechanism must be protected by rcu, that
leaves __rcu_read_{lock,unlock} unprotected by this mechanism. If we trace
them, the recursion will happen. Let's add them into the btf id deny list.
When CONFIG_PREEMPT_RCU is enabled, it can be reproduced with a simple bpf
program as such:
SEC("fentry/__rcu_read_lock")
int fentry_run()
{
return 0;
}
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230424161104.3737-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Mon, 24 Apr 2023 21:06:41 +0000 (14:06 -0700)]
Merge tag 'v6.4/vfs.open' of git://git./linux/kernel/git/vfs/vfs
Pull vfs open fixlet from Christian Brauner:
"EINVAL ist keinmal: This contains the changes to make O_DIRECTORY when
specified together with O_CREAT an invalid request.
The wider background is that a regression report about the behavior of
O_DIRECTORY | O_CREAT was sent to fsdevel about a behavior that was
changed multiple years and LTS releases earlier during v5.7
development.
This has also been covered in
https://lwn.net/Articles/926782/
which provides an excellent summary of the discussion"
* tag 'v6.4/vfs.open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
open: return EINVAL for O_DIRECTORY | O_CREAT
Dave Marchevsky [Mon, 24 Apr 2023 20:43:21 +0000 (13:43 -0700)]
bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed
As reported by Kumar in [0], the shared ownership implementation for BPF
programs has some race conditions which need to be addressed before it
can safely be used. This patch does so in a minimal way instead of
ripping out shared ownership entirely, as proper fixes for the issues
raised will follow ASAP, at which point this patch's commit can be
reverted to re-enable shared ownership.
The patch removes the ability to call bpf_refcount_acquire_impl from BPF
programs. Programs can only bump refcount and obtain a new owning
reference using this kfunc, so removing the ability to call it
effectively disables shared ownership.
Instead of changing success / failure expectations for
bpf_refcount-related selftests, this patch just disables them from
running for now.
[0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2/
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230424204321.2680232-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Mon, 24 Apr 2023 20:39:58 +0000 (13:39 -0700)]
Merge tag 'v6.4/vfs.misc' of git://git./linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains a pile of various smaller fixes. Most of them aren't
very interesting so this just highlights things worth mentioning:
- Various filesystems contained the same little helper to convert
from the mode of a dentry to the DT_* type of that dentry.
They have now all been switched to rely on the generic
fs_umode_to_dtype() helper. All custom helpers are removed (Jeff)
- Fsnotify now reports ACCESS and MODIFY events for splice
(Chung-Chiang Cheng)
- After converting timerfd a long time ago to rely on
wait_event_interruptible_*() apis, convert eventfd as well. This
removes the complex open-coded wait code (Wen Yang)
- Simplify sysctl registration for devpts, avoiding the declaration
of two tables. Instead, just use a prefixed path with
register_sysctl() (Luis)
- The setattr_should_drop_sgid() helper is now exported so NFS can
use it. By switching NFS to this helper an NFS setgid inheritance
bug is fixed (me)"
* tag 'v6.4/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: hfsplus: remove WARN_ON() from hfsplus_cat_{read,write}_inode()
pnode: pass mountpoint directly
eventfd: use wait_event_interruptible_locked_irq() helper
splice: report related fsnotify events
fs: consolidate duplicate dt_type helpers
nfs: use vfs setgid helper
Update relatime comments to include equality
fs/buffer: Remove redundant assignment to err
fs_context: drop the unused lsm_flags member
fs/namespace: fnic: Switch to use %ptTd
Documentation: update idmappings.rst
devpts: simplify two-level sysctl registration for pty_kern_table
eventpoll: align comment with nested epoll limitation
Linus Torvalds [Mon, 24 Apr 2023 20:35:23 +0000 (13:35 -0700)]
Merge tag 'v6.4/vfs.acl' of git://git./linux/kernel/git/vfs/vfs
Pull acl updates from Christian Brauner:
"After finishing the introduction of the new posix acl api last cycle
the generic POSIX ACL xattr handlers are still around in the
filesystems xattr handlers for two reasons:
(1) Because a few filesystems rely on the ->list() method of the
generic POSIX ACL xattr handlers in their ->listxattr() inode
operation.
(2) POSIX ACLs are only available if IOP_XATTR is raised. The
IOP_XATTR flag is raised in inode_init_always() based on whether
the sb->s_xattr pointer is non-NULL. IOW, the registered xattr
handlers of the filesystem are used to raise IOP_XATTR. Removing
the generic POSIX ACL xattr handlers from all filesystems would
risk regressing filesystems that only implement POSIX ACL support
and no other xattrs (nfs3 comes to mind).
This contains the work to decouple POSIX ACLs from the IOP_XATTR flag
as they don't depend on xattr handlers anymore. So it's now possible
to remove the generic POSIX ACL xattr handlers from the sb->s_xattr
list of all filesystems. This is a crucial step as the generic POSIX
ACL xattr handlers aren't used for POSIX ACLs anymore and POSIX ACLs
don't depend on the xattr infrastructure anymore.
Adressing problem (1) will require more long-term work. It would be
best to get rid of the ->list() method of xattr handlers completely at
some point.
For erofs, ext{2,4}, f2fs, jffs2, ocfs2, and reiserfs the nop POSIX
ACL xattr handler is kept around so they can continue to use
array-based xattr handler indexing.
This update does simplify the ->listxattr() implementation of all
these filesystems however"
* tag 'v6.4/vfs.acl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
acl: don't depend on IOP_XATTR
ovl: check for ->listxattr() support
reiserfs: rework priv inode handling
fs: rename generic posix acl handlers
reiserfs: rework ->listxattr() implementation
fs: simplify ->listxattr() implementation
fs: drop unused posix acl handlers
xattr: remove unused argument
xattr: add listxattr helper
xattr: simplify listxattr helpers
Linus Torvalds [Mon, 24 Apr 2023 20:03:42 +0000 (13:03 -0700)]
Merge tag 'v6.4/pidfd.file' of git://git./linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner:
"This adds a new pidfd_prepare() helper which allows the caller to
reserve a pidfd number and allocates a new pidfd file that stashes the
provided struct pid.
It should be avoided installing a file descriptor into a task's file
descriptor table just to close it again via close_fd() in case an
error occurs. The fd has been visible to userspace and might already
be in use. Instead, a file descriptor should be reserved but not
installed into the caller's file descriptor table.
If another failure path is hit then the reserved file descriptor and
file can just be put without any userspace visible side-effects. And
if all failure paths are cleared the file descriptor and file can be
installed into the task's file descriptor table.
This helper is now used in all places that open coded this
functionality before. For example, this is currently done during
copy_process() and fanotify used pidfd_create(), which returns a pidfd
that has already been made visibile in the caller's file descriptor
table, but then closed it using close_fd().
In one of the next merge windows there is also new functionality
coming to unix domain sockets that will have to rely on
pidfd_prepare()"
* tag 'v6.4/pidfd.file' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fanotify: use pidfd_prepare()
fork: use pidfd_prepare()
pid: add pidfd_prepare()
Linus Torvalds [Mon, 24 Apr 2023 19:52:35 +0000 (12:52 -0700)]
Merge tag 'v6.4/kernel.user_worker' of git://git./linux/kernel/git/brauner/linux
Pull user work thread updates from Christian Brauner:
"This contains the work generalizing the ability to create a kernel
worker from a userspace process.
Such user workers will run with the same credentials as the userspace
process they were created from providing stronger security and
accounting guarantees than the traditional override_creds() approach
ever could've hoped for.
The original work was heavily based and optimzed for the needs of
io_uring which was the first user. However, as it quickly turned out
the ability to create user workers inherting properties from a
userspace process is generally useful.
The vhost subsystem currently creates workers using the kthread api.
The consequences of using the kthread api are that RLIMITs don't work
correctly as they are inherited from khtreadd. This leads to bugs
where more workers are created than would be allowed by the RLIMITs of
the userspace process in lieu of which workers are created.
Problems like this disappear with user workers created from the
userspace processes for which they perform the work. In addition,
providing this api allows vhost to remove additional complexity. For
example, cgroup and mm sharing will just work out of the box with user
workers based on the relevant userspace process instead of manually
ensuring the correct cgroup and mm contexts are used.
So the vhost subsystem should simply be made to use the same mechanism
as io_uring. To this end the original mechanism used for
create_io_thread() is generalized into user workers:
- Introduce PF_USER_WORKER as a generic indicator that a given task
is a user worker, i.e., a kernel task that was created from a
userspace process. Now a PF_IO_WORKER thread is just a specialized
version of PF_USER_WORKER. So io_uring io workers raise both flags.
- Make copy_process() available to core kernel code
- Extend struct kernel_clone_args with the following bitfields
allowing to indicate to copy_process():
- to create a user worker (raise PF_USER_WORKER)
- to not inherit any files from the userspace process
- to ignore signals
After all generic changes are in place the vhost subsystem implements
a new dedicated vhost api based on user workers. Finally, vhost is
switched to rely on the new api moving it off of kthreads.
Thanks to Mike for sticking it out and making it through this rather
arduous journey"
* tag 'v6.4/kernel.user_worker' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
vhost: use vhost_tasks for worker threads
vhost: move worker thread fields to new struct
vhost_task: Allow vhost layer to use copy_process
fork: allow kernel code to call copy_process
fork: Add kernel_clone_args flag to ignore signals
fork: add kernel_clone_args flag to not dup/clone files
fork/vm: Move common PF_IO_WORKER behavior to new flag
kernel: Make io_thread and kthread bit fields
kthread: Pass in the thread's name during creation
kernel: Allow a kernel thread's name to be set in copy_process
csky: Remove kernel_thread declaration
Linus Torvalds [Mon, 24 Apr 2023 19:48:33 +0000 (12:48 -0700)]
Merge tag 'v6.4/kernel.clone3.tests' of git://git./linux/kernel/git/brauner/linux
Pull clone3 selftest fix from Christian Brauner:
"This is a single fix to the clone3() selftstests.
It fell through the sefltest tree cracks a few times so I'll provide
it here. It has low urgency but we should still correctly report the
number of tests"
* tag 'v6.4/kernel.clone3.tests' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: fix number of tests in ksft_set_plan
Linus Torvalds [Mon, 24 Apr 2023 19:35:49 +0000 (12:35 -0700)]
Merge tag 'docs-6.4' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Commit volume in documentation is relatively low this time, but there
is still a fair amount going on, including:
- Reorganize the architecture-specific documentation under
Documentation/arch
This makes the structure match the source directory and helps to
clean up the mess that is the top-level Documentation directory a
bit. This work creates the new directory and moves x86 and most of
the less-active architectures there.
The current plan is to move the rest of the architectures in 6.5,
with the patches going through the appropriate subsystem trees.
- Some more Spanish translations and maintenance of the Italian
translation
- A new "Kernel contribution maturity model" document from Ted
- A new tutorial on quickly building a trimmed kernel from Thorsten
Plus the usual set of updates and fixes"
* tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits)
media: Adjust column width for pdfdocs
media: Fix building pdfdocs
docs: clk: add documentation to log which clocks have been disabled
docs: trace: Fix typo in ftrace.rst
Documentation/process: always CC responsible lists
docs: kmemleak: adjust to config renaming
ELF: document some de-facto PT_* ABI quirks
Documentation: arm: remove stih415/stih416 related entries
docs: turn off "smart quotes" in the HTML build
Documentation: firmware: Clarify firmware path usage
docs/mm: Physical Memory: Fix grammar
Documentation: Add document for false sharing
dma-api-howto: typo fix
docs: move m68k architecture documentation under Documentation/arch/
docs: move parisc documentation under Documentation/arch/
docs: move ia64 architecture docs under Documentation/arch/
docs: Move arc architecture docs under Documentation/arch/
docs: move nios2 documentation under Documentation/arch/
docs: move openrisc documentation under Documentation/arch/
docs: move superh documentation under Documentation/arch/
...
Linus Torvalds [Mon, 24 Apr 2023 19:31:32 +0000 (12:31 -0700)]
Merge tag 'linux-kselftest-kunit-6.4-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
* tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: add tests for using current KUnit test field
kunit: tool: Add support for SH under QEMU
kunit: tool: Add support for overriding the QEMU serial port
.gitignore: Unignore .kunitconfig
list: test: Test the klist structure
kunit: increase KUNIT_LOG_SIZE to 2048 bytes
kunit: Use gfp in kunit_alloc_resource() kernel-doc
kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py
kunit: tool: remove unused imports and variables
kunit: tool: add subscripts for type annotations where appropriate
kunit: fix bug of extra newline characters in debugfs logs
kunit: fix bug in the order of lines in debugfs logs
kunit: fix bug in debugfs logs of parameterized tests
kunit: tool: Add support for m68k under QEMU
Linus Torvalds [Mon, 24 Apr 2023 19:28:34 +0000 (12:28 -0700)]
Merge tag 'linux-kselftest-next-6.4-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
* tag 'linux-kselftest-next-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests/resctrl: Fix incorrect error return on test complete
selftests/resctrl: Remove duplicate codes that clear each test result file
selftests/resctrl: Commonize the signal handler register/unregister for all tests
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Return MBA check result and make it to output message
selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test
selftests/resctrl: Use correct exit code when tests fail
kselftest/arm64: Convert za-fork to use kselftest.h
kselftest: Support nolibc
tools/nolibc/stdio: Implement vprintf()
selftests/resctrl: Correct get_llc_perf() param in function comment
selftests/resctrl: Use remount_resctrlfs() consistently with boolean
selftests/resctrl: Change name from CBM_MASK_PATH to INFO_PATH
selftests/resctrl: Change initialize_llc_perf() return type to void
selftests/resctrl: Replace obsolete memalign() with posix_memalign()
selftests/resctrl: Check for return value after write_schemata()
selftests/resctrl: Allow ->setup() to return errors
selftests/resctrl: Move ->setup() call outside of test specific branches
selftests/resctrl: Return NULL if malloc_and_init_memory() did not alloc mem
...
Linus Torvalds [Mon, 24 Apr 2023 19:16:14 +0000 (12:16 -0700)]
Merge tag 'rcu.6.4.april5.2023.3' of git://git./linux/kernel/git/jfern/linux
Pull RCU updates from Joel Fernandes:
- Updates and additions to MAINTAINERS files, with Boqun being added to
the RCU entry and Zqiang being added as an RCU reviewer.
I have also transitioned from reviewer to maintainer; however, Paul
will be taking over sending RCU pull-requests for the next merge
window.
- Resolution of hotplug warning in nohz code, achieved by fixing
cpu_is_hotpluggable() through interaction with the nohz subsystem.
Tick dependency modifications by Zqiang, focusing on fixing usage of
the TICK_DEP_BIT_RCU_EXP bitmask.
- Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n
kernels, fixed by Zqiang.
- Improvements to rcu-tasks stall reporting by Neeraj.
- Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for
increased robustness, affecting several components like mac802154,
drbd, vmw_vmci, tracing, and more.
A report by Eric Dumazet showed that the API could be unknowingly
used in an atomic context, so we'd rather make sure they know what
they're asking for by being explicit:
https://lore.kernel.org/all/
20221202052847.
2623997-1-edumazet@google.com/
- Documentation updates, including corrections to spelling,
clarifications in comments, and improvements to the srcu_size_state
comments.
- Better srcu_struct cache locality for readers, by adjusting the size
of srcu_struct in support of SRCU usage by Christoph Hellwig.
- Teach lockdep to detect deadlocks between srcu_read_lock() vs
synchronize_srcu() contributed by Boqun.
Previously lockdep could not detect such deadlocks, now it can.
- Integration of rcutorture and rcu-related tools, targeted for v6.4
from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis
module parameter, and more
- Miscellaneous changes, various code cleanups and comment improvements
* tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits)
checkpatch: Error out if deprecated RCU API used
mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()
rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep()
ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access
rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()
rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels
rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
rcu/trace: use strscpy() to instead of strncpy()
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
...
Linus Torvalds [Mon, 24 Apr 2023 19:09:43 +0000 (12:09 -0700)]
Merge tag 'nolibc.2023.04.04a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Add support for loongarch
- Fix stack-protector issues
- Support additional integral types and signal-related macros
- Add support for stdin, stdout, and stderr
- Add getuid() and geteuid()
- Allow S_I* macros to be overridden by program
- Defer to linux/fcntl.h and linux/stat.h to avoid duplicate
definitions
- Many improvements to the selftests
* tag 'nolibc.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (22 commits)
tools/nolibc: x86_64: add stackprotector support
tools/nolibc: i386: add stackprotector support
tools/nolibc: tests: add test for -fstack-protector
tools/nolibc: tests: fold in no-stack-protector cflags
tools/nolibc: add support for stack protector
tools/nolibc: tests: constify test_names
tools/nolibc: add helpers for wait() signal exits
tools/nolibc: add definitions for standard fds
selftests/nolibc: Adjust indentation for Makefile
selftests/nolibc: Add support for LoongArch
tools/nolibc: Add support for LoongArch
tools/nolibc: Add statx() and make stat() rely on statx() if necessary
tools/nolibc: Include linux/fcntl.h and remove duplicate code
tools/nolibc: check for S_I* macros before defining them
selftests/nolibc: skip the chroot_root and link_dir tests when not privileged
tools/nolibc: add getuid() and geteuid()
tools/nolibc: add tests for the integer limits in stdint.h
tools/nolibc: enlarge column width of tests
tools/nolibc: add integer types and integer limit macros
tools/nolibc: add stdint.h
...
Linus Torvalds [Mon, 24 Apr 2023 19:05:08 +0000 (12:05 -0700)]
Merge tag 'locktorture.2023.04.04a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull locktorture updates from Paul McKenney:
"This adds tests for nested locking and also adds support for testing
raw spinlocks in PREEMPT_RT kernels"
* tag 'locktorture.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
locktorture: Add raw_spinlock* torture tests for PREEMPT_RT kernels
locktorture: With nested locks, occasionally skip main lock
locktorture: Add nested locking to rtmutex torture tests
locktorture: Add nested locking to mutex torture tests
locktorture: Add nested_[un]lock() hooks and nlocks parameter
Linus Torvalds [Mon, 24 Apr 2023 19:02:25 +0000 (12:02 -0700)]
Merge tag 'lkmm-scripting.2023.04.07a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull Linux Kernel Memory Model scripting updates from Paul McKenney:
"This improves litmus-test documentation and improves the ability to do
before/after tests on the https://github.com/paulmckrcu/litmus repo"
* tag 'lkmm-scripting.2023.04.07a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
tools/memory-model: Remove out-of-date SRCU documentation
tools/memory-model: Document LKMM test procedure
tools/memory-model: Use "grep -E" instead of "egrep"
tools/memory-model: Use "-unroll 0" to keep --hw runs finite
tools/memory-model: Make judgelitmus.sh handle scripted Result: tag
tools/memory-model: Add data-race capabilities to judgelitmus.sh
tools/memory-model: Add checktheselitmus.sh to run specified litmus tests
tools/memory-model: Repair parseargs.sh header comment
tools/memory-model: Add "--" to parseargs.sh for additional arguments
tools/memory-model: Make history-check scripts use mselect7
tools/memory-model: Make checkghlitmus.sh use mselect7
tools/memory-model: Fix scripting --jobs argument
tools/memory-model: Implement --hw support for checkghlitmus.sh
tools/memory-model: Add -v flag to jingle7 runs
tools/memory-model: Make runlitmus.sh check for jingle errors
tools/memory-model: Allow herd to deduce CPU type
tools/memory-model: Keep assembly-language litmus tests
tools/memory-model: Move from .AArch64.litmus.out to .litmus.AArch.out
tools/memory-model: Make runlitmus.sh generate .litmus.out for --hw
tools/memory-model: Split runlitmus.sh out of checklitmus.sh
...
Linus Torvalds [Mon, 24 Apr 2023 19:00:51 +0000 (12:00 -0700)]
Merge tag 'lkmm.2023.04.07a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull Linux Kernel Memory Model updates from Paul McKenney
"This improves LKMM diagnostic messages, unifies handling of the
ordering produced by unlock/lock pairs, adds support for the
smp_mb__after_srcu_read_unlock() macro, removes redundant members from
the to-r relation, brings SRCU read-side semantics into alignment with
Linux-kernel SRCU, makes ppo a subrelation of po, and improves
documentation"
* tag 'lkmm.2023.04.07a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
Documentation: litmus-tests: Correct spelling
tools/memory-model: Add documentation about SRCU read-side critical sections
tools/memory-model: Make ppo a subrelation of po
tools/memory-model: Provide exact SRCU semantics
tools/memory-model: Restrict to-r to read-read address dependency
tools/memory-model: Add smp_mb__after_srcu_read_unlock()
tools/memory-model: Unify UNLOCK+LOCK pairings to po-unlock-lock-po
tools/memory-model: Update some warning labels
Linus Torvalds [Mon, 24 Apr 2023 18:46:53 +0000 (11:46 -0700)]
Merge tag 'kcsan.2023.04.04a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney:
"Kernel concurrency sanitizer (KCSAN) updates for v6.4
This fixes kernel-doc warnings and also updates instrumentation from
READ_ONCE() to volatile in order to avoid unaligned load-acquire
instructions on arm64 in kernels built with LTO"
* tag 'kcsan.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: Avoid READ_ONCE() in read_instrumented_memory()
instrumented.h: Fix all kernel-doc format warnings
Linus Torvalds [Mon, 24 Apr 2023 18:40:26 +0000 (11:40 -0700)]
Merge tag 'tpmdd-v6.4-rc1' of git://git./linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
- The .machine keyring, used for Machine Owner Keys (MOK), acquired the
ability to store only CA enforced keys, and put rest to the .platform
keyring, thus separating the code signing keys from the keys that are
used to sign certificates.
This essentially unlocks the use of the .machine keyring as a trust
anchor for IMA. It is an opt-in feature, meaning that the additional
contraints won't brick anyone who does not care about them.
- Enable interrupt based transactions with discrete TPM chips (tpm_tis).
There was code for this existing but it never really worked so I
consider this a new feature rather than a bug fix. Before the driver
just fell back to the polling mode.
Link: https://lore.kernel.org/linux-integrity/a93b6222-edda-d43c-f010-a59701f2aeef@gmx.de/
Link: https://lore.kernel.org/linux-integrity/20230302164652.83571-1-eric.snowberg@oracle.com/
* tag 'tpmdd-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (29 commits)
tpm: Add !tpm_amd_is_rng_defective() to the hwrng_unregister() call site
tpm_tis: fix stall after iowrite*()s
tpm/tpm_tis_synquacer: Convert to platform remove callback returning void
tpm/tpm_tis: Convert to platform remove callback returning void
tpm/tpm_ftpm_tee: Convert to platform remove callback returning void
tpm: tpm_tis_spi: Mark ACPI and OF related data as maybe unused
tpm: st33zp24: Mark ACPI and OF related data as maybe unused
tpm, tpm_tis: Enable interrupt test
tpm, tpm_tis: startup chip before testing for interrupts
tpm, tpm_tis: Claim locality when interrupts are reenabled on resume
tpm, tpm_tis: Claim locality in interrupt handler
tpm, tpm_tis: Request threaded interrupt handler
tpm, tpm: Implement usage counter for locality
tpm, tpm_tis: do not check for the active locality in interrupt handler
tpm, tpm_tis: Move interrupt mask checks into own function
tpm, tpm_tis: Only handle supported interrupts
tpm, tpm_tis: Claim locality before writing interrupt registers
tpm, tpm_tis: Do not skip reset of original interrupt vector
tpm, tpm_tis: Disable interrupts if tpm_tis_probe_irq() failed
tpm, tpm_tis: Claim locality before writing TPM_INT_ENABLE register
...
Linus Torvalds [Mon, 24 Apr 2023 18:37:24 +0000 (11:37 -0700)]
Merge tag 'Smack-for-6.4' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
"There are two changes, one small and one more substantial:
- Remove of an unnecessary cast
- The mount option processing introduced with the mount rework makes
copies of mount option values. There is no good reason to make
copies of Smack labels, as they are maintained on a list and never
removed.
The code now uses pointers to entries on the list, reducing
processing time and memory use"
* tag 'Smack-for-6.4' of https://github.com/cschaufler/smack-next:
Smack: Improve mount process memory use
smack_lsm: remove unnecessary type casting
Linus Torvalds [Mon, 24 Apr 2023 18:35:15 +0000 (11:35 -0700)]
Merge tag 'landlock-6.4-rc1' of git://git./linux/kernel/git/mic/linux
Pull landlock update from Mickaël Salaün:
"Improve user space documentation"
* tag 'landlock-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Clarify documentation for the LANDLOCK_ACCESS_FS_REFER right
Linus Torvalds [Mon, 24 Apr 2023 18:33:07 +0000 (11:33 -0700)]
Merge tag 'tomoyo-pr-
20230424' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1
Pull tomoyo update from Tetsuo Handa:
"One cleanup patch from Vlastimil Babka"
* tag 'tomoyo-pr-
20230424' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: replace tomoyo_round2() with kmalloc_size_roundup()
Linus Torvalds [Mon, 24 Apr 2023 18:21:50 +0000 (11:21 -0700)]
Merge tag 'lsm-pr-
20230420' of git://git./linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Move the LSM hook comment blocks into security/security.c
For many years the LSM hook comment blocks were located in a very odd
place, include/linux/lsm_hooks.h, where they lived on their own,
disconnected from both the function prototypes and definitions.
In keeping with current kernel conventions, this moves all of these
comment blocks to the top of the function definitions, transforming
them into the kdoc format in the process. This should make it much
easier to maintain these comments, which are the main source of LSM
hook documentation.
For the most part the comment contents were left as-is, although some
glaring errors were corrected. Expect additional edits in the future
as we slowly update and correct the comment blocks.
This is the bulk of the diffstat.
- Introduce LSM_ORDER_LAST
Similar to how LSM_ORDER_FIRST is used to specify LSMs which should
be ordered before "normal" LSMs, the LSM_ORDER_LAST is used to
specify LSMs which should be ordered after "normal" LSMs.
This is one of the prerequisites for transitioning IMA/EVM to a
proper LSM.
- Remove the security_old_inode_init_security() hook
The security_old_inode_init_security() LSM hook only allows for a
single xattr which is problematic both for LSM stacking and the
IMA/EVM-as-a-LSM effort. This finishes the conversion over to the
security_inode_init_security() hook and removes the single-xattr LSM
hook.
- Fix a reiserfs problem with security xattrs
During the security_old_inode_init_security() removal work it became
clear that reiserfs wasn't handling security xattrs properly so we
fixed it.
* tag 'lsm-pr-
20230420' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (32 commits)
reiserfs: Add security prefix to xattr name in reiserfs_security_write()
security: Remove security_old_inode_init_security()
ocfs2: Switch to security_inode_init_security()
reiserfs: Switch to security_inode_init_security()
security: Remove integrity from the LSM list in Kconfig
Revert "integrity: double check iint_cache was initialized"
security: Introduce LSM_ORDER_LAST and set it for the integrity LSM
device_cgroup: Fix typo in devcgroup_css_alloc description
lsm: fix a badly named parameter in security_get_getsecurity()
lsm: fix doc warnings in the LSM hook comments
lsm: styling fixes to security/security.c
lsm: move the remaining LSM hook comments to security/security.c
lsm: move the io_uring hook comments to security/security.c
lsm: move the perf hook comments to security/security.c
lsm: move the bpf hook comments to security/security.c
lsm: move the audit hook comments to security/security.c
lsm: move the binder hook comments to security/security.c
lsm: move the sysv hook comments to security/security.c
lsm: move the key hook comments to security/security.c
lsm: move the xfrm hook comments to security/security.c
...
Linus Torvalds [Mon, 24 Apr 2023 18:11:59 +0000 (11:11 -0700)]
Merge tag 'selinux-pr-
20230420' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Stop passing the 'selinux_state' pointers as function arguments
As discussed during the end of the last development cycle, passing a
selinux_state pointer through the SELinux code has a noticeable
impact on performance, and with the current code it is not strictly
necessary.
This simplifies things by referring directly to the single
selinux_state global variable which should help improve SELinux
performance.
- Uninline the unlikely portions of avc_has_perm_noaudit()
This change was also based on a discussion from the last development
cycle, and is heavily based on an initial proof of concept patch from
you. The core issue was that avc_has_perm_noaudit() was not able to
be inlined, as intended, due to its size. We solved this issue by
extracting the less frequently hit portions of avc_has_perm_noaudit()
into a separate function, reducing the size of avc_has_perm_noaudit()
to the point where the compiler began inlining the function. We also
took the opportunity to clean up some ugly RCU locking in the code
that became uglier with the change.
- Remove the runtime disable functionality
After several years of work by the userspace and distro folks, we are
finally in a place where we feel comfortable removing the runtime
disable functionality which we initially deprecated at the start of
2020.
There is plenty of information in the kernel's deprecation (now
removal) notice, but the main motivation was to be able to safely
mark the LSM hook structures as '__ro_after_init'.
LWN also wrote a good summary of the deprecation this morning which
offers a more detailed history:
https://lwn.net/SubscriberLink/927463/
dcfa0d4ed2872f03
- Remove the checkreqprot functionality
The original checkreqprot deprecation notice stated that the removal
would happen no sooner than June 2021, which means this falls hard
into the "better late than never" bucket.
The Kconfig and deprecation notice has more detail on this setting,
but the basic idea is that we want to ensure that the SELinux policy
allows for the memory protections actually applied by the kernel, and
not those requested by the process.
While we haven't found anyone running a supported distro that is
affected by this deprecation/removal, anyone who is affected would
only need to update their policy to reflect the reality of their
applications' mapping protections.
- Minor Makefile improvements
Some minor Makefile improvements to correct some dependency issues
likely only ever seen by SELinux developers. I expect we will have at
least one more tweak to the Makefile during the next merge window,
but it didn't quite make the cutoff this time around.
* tag 'selinux-pr-
20230420' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: ensure av_permissions.h is built when needed
selinux: fix Makefile dependencies of flask.h
selinux: stop returning node from avc_insert()
selinux: clean up dead code after removing runtime disable
selinux: update the file list in MAINTAINERS
selinux: remove the runtime disable functionality
selinux: remove the 'checkreqprot' functionality
selinux: stop passing selinux_state pointers and their offspring
selinux: uninline unlikely parts of avc_has_perm_noaudit()
Qi Han [Tue, 18 Apr 2023 06:09:54 +0000 (14:09 +0800)]
f2fs: remove unnessary comment in __may_age_extent_tree
This comment make no sense and is in the wrong place, so let's
remove it.
Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daeho Jeong [Tue, 18 Apr 2023 17:42:01 +0000 (10:42 -0700)]
f2fs: allocate node blocks for atomic write block replacement
When a node block is missing for atomic write block replacement, we need
to allocate it in advance of the replacement.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Daeho Jeong [Tue, 18 Apr 2023 17:52:06 +0000 (10:52 -0700)]
f2fs: use cow inode data when updating atomic write
Need to use cow inode data content instead of the one in the original
inode, when we try to write the already updated atomic write files.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Apr 2023 00:12:52 +0000 (17:12 -0700)]
f2fs: remove power-of-two limitation of zoned device
In f2fs, there's no reason to force po2.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Mon, 24 Apr 2023 17:39:27 +0000 (10:39 -0700)]
Merge branch 'x86-rep-insns': x86 user copy clarifications
Merge my x86 user copy updates branch.
This cleans up a lot of our x86 memory copy code, particularly for user
accesses. I've been pushing for microarchitectural support for good
memory copying and clearing for a long while, and it's been visible in
how the kernel has aggressively used 'rep movs' and 'rep stos' whenever
possible.
And that micro-architectural support has been improving over the years,
to the point where on modern CPU's the best option for a memory copy
that would become a function call (as opposed to being something that
can just be turned into individual 'mov' instructions) is now to inline
the string instruction sequence instead.
However, that only makes sense when we have the modern markers for this:
the x86 FSRM and FSRS capabilities ("Fast Short REP MOVS/STOS").
So this cleans up a lot of our historical code, gets rid of the legacy
marker use ("REP_GOOD" and "ERMS") from the memcpy/memset cases, and
replaces it with that modern reality. Note that REP_GOOD and ERMS end
up still being used by the known large cases (ie page copyin gand
clearing).
The reason much of this ends up being about user memory accesses is that
the normal in-kernel cases are done by the compiler (__builtin_memcpy()
and __builtin_memset()) and getting to the point where we can use our
instruction rewriting to inline those to be string instructions will
need some compiler support.
In contrast, the user accessor functions are all entirely controlled by
the kernel code, so we can change those arbitrarily.
Thanks to Borislav Petkov for feedback on the series, and Jens testing
some of this on micro-architectures I didn't personally have access to.
* x86-rep-insns:
x86: rewrite '__copy_user_nocache' function
x86: remove 'zerorest' argument from __copy_user_nocache()
x86: set FSRS automatically on AMD CPUs that have FSRM
x86: improve on the non-rep 'copy_user' function
x86: improve on the non-rep 'clear_user' function
x86: inline the 'rep movs' in user copies for the FSRM case
x86: move stac/clac from user copy routines into callers
x86: don't use REP_GOOD or ERMS for user memory clearing
x86: don't use REP_GOOD or ERMS for user memory copies
x86: don't use REP_GOOD or ERMS for small memory clearing
x86: don't use REP_GOOD or ERMS for small memory copies
Linus Torvalds [Thu, 30 Mar 2023 21:53:51 +0000 (14:53 -0700)]
iov: improve copy_iovec_from_user() code generation
Use the same pattern as the compat version of this code does: instead of
copying the whole array to a kernel buffer and then having a separate
phase of verifying it, just do it one entry at a time, verifying as you
go.
On Jens' /dev/zero readv() test this improves performance by ~6%.
[ This was obviously triggered by Jens' ITER_UBUF updates series ]
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/all/de35d11d-bce7-e976-7372-1f2caf417103@kernel.dk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 24 Apr 2023 17:29:28 +0000 (10:29 -0700)]
Merge tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux
Pull ITER_UBUF updates from Jens Axboe:
"This turns singe vector imports into ITER_UBUF, rather than
ITER_IOVEC.
The former is more trivial to iterate and advance, and hence a bit
more efficient. From some very unscientific testing, ~60% of all iovec
imports are single vector"
* tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux:
iov_iter: Mark copy_compat_iovec_from_user() noinline
iov_iter: import single vector iovecs as ITER_UBUF
iov_iter: convert import_single_range() to ITER_UBUF
iov_iter: overlay struct iovec and ubuf/len
iov_iter: set nr_segs = 1 for ITER_UBUF
iov_iter: remove iov_iter_iovec()
iov_iter: add iter_iov_addr() and iter_iov_len() helpers
ALSA: pcm: check for user backed iterator, not specific iterator type
IB/qib: check for user backed iterator, not specific iterator type
IB/hfi1: check for user backed iterator, not specific iterator type
iov_iter: add iter_iovec() helper
block: ensure bio_alloc_map_data() deals with ITER_UBUF correctly
Linus Torvalds [Mon, 24 Apr 2023 17:26:22 +0000 (10:26 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM development updates from Russell King:
"Four changes for v6.4:
- simplify the path to the top vmlinux
- three patches to fix vfp with instrumentation enabled (eg lockdep)"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9294/2: vfp: Fix broken softirq handling with instrumentation enabled
ARM: 9293/1: vfp: Pass successful return address via register R3
ARM: 9292/1: vfp: Pass thread_info pointer to vfp_support_entry
ARM: 9291/1: decompressor: simplify the path to the top vmlinux
Ruihan Li [Mon, 24 Apr 2023 16:21:10 +0000 (00:21 +0800)]
scripts: Remove ICC-related dead code
Intel compiler support has already been completely removed in commit
95207db8166a ("Remove Intel compiler support"). However, it appears
that there is still some ICC-related code in scripts/cc-version.sh.
There is no harm in leaving the code as it is, but removing the dead
code makes the codebase a bit cleaner.
Hopefully all ICC-related stuff in the build scripts will be removed
after this commit, given the grep output as below:
(linux/scripts) $ grep -i -w -R 'icc'
cc-version.sh:ICC)
cc-version.sh: min_version=$($min_tool_version icc)
dtc/include-prefixes/arm64/qcom/sm6350.dtsi:#include <dt-bindings/interconnect/qcom,icc.h>
Fixes: 95207db8166a ("Remove Intel compiler support")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Mon, 24 Apr 2023 16:37:20 +0000 (18:37 +0200)]
Merge branches 'pm-core', 'pm-sleep', 'pm-opp' and 'pm-tools'
Merge PM core changes, updates related to system sleep support,
operating performance points (OPP) changes and power management
utilities changes for 6.4-rc1:
- Drop unnecessary (void *) conversions from the PM core (Li zeming).
- Add sysfs files to represent time spent in a platform sleep state
during suspend-to-idle and make AMD and Intel PMC drivers use them
(Mario Limonciello).
- Use of_property_present() for testing DT property presence (Rob
Herring).
- Add set_required_opps() callback to the 'struct opp_table', to make
the code paths cleaner (Viresh Kumar).
- Update the pm-graph siute of utilities to v5.11 with the following
changes:
* New script which allows users to install the latest pm-graph
from the upstream github repo.
* Update all the dmesg suspend/resume PM print formats to be able to
process recent timelines using dmesg only.
* Add ethtool output to the log for the system's ethernet device if
ethtool exists.
* Make the tool more robustly handle events where mangled dmesg or
ftrace outputs do not include all the requisite data.
- Make the sleepgraph utility recognize "CPU killed" messages (Xueqin
Luo).
* pm-core:
PM: core: Remove unnecessary (void *) conversions
* pm-sleep:
platform/x86/intel/pmc: core: Report duration of time in HW sleep state
platform/x86/intel/pmc: core: Always capture counters on suspend
platform/x86/amd: pmc: Report duration of time in hw sleep state
PM: Add sysfs files to represent time spent in hardware sleep state
* pm-opp:
OPP: Move required opps configuration to specialized callback
OPP: Handle all genpd cases together in _set_required_opps()
opp: Use of_property_present() for testing DT property presence
* pm-tools:
PM: tools: sleepgraph: Recognize "CPU killed" messages
pm-graph: Update to v5.11
Rafael J. Wysocki [Mon, 24 Apr 2023 16:36:07 +0000 (18:36 +0200)]
Merge branch 'pm-cpuidle'
Merge a cpuidle change for 6.4-rc1:
- Use of_property_present() for testing DT property presence in the
cpuidle code (Rob Herring).
* pm-cpuidle:
cpuidle: Use of_property_present() for testing DT property presence
Rafael J. Wysocki [Mon, 24 Apr 2023 16:24:47 +0000 (18:24 +0200)]
Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.4-rc1:
- Fix the frequency unit in cpufreq_verify_current_freq checks()
(Sanjay Chandrashekara).
- Make mode_state_machine in amd-pstate static (Tom Rix).
- Make the cpufreq core require drivers with target_index() to set
freq_table (Viresh Kumar).
- Fix typo in the ARM_BRCMSTB_AVS_CPUFREQ Kconfig entry (Jingyu Wang).
- Use of_property_read_bool() for boolean properties in the pmac32
cpufreq driver (Rob Herring).
- Make the cpufreq sysfs interface return proper error codes on
obviously invalid input (qinyu).
- Add guided autonomous mode support to the AMD P-state driver (Wyes
Karny).
- Make the Intel P-state driver enable HWP IO boost on all server
platforms (Srinivas Pandruvada).
- Add opp and bandwidth support to tegra194 cpufreq driver (Sumit
Gupta).
- Use of_property_present() for testing DT property presence (Rob
Herring).
- Remove MODULE_LICENSE in non-modules (Nick Alcock).
- Add SM7225 to cpufreq-dt-platdev blocklist (Luca Weiss).
- Optimizations and fixes for qcom-cpufreq-hw driver (Krzysztof
Kozlowski, Konrad Dybcio, and Bjorn Andersson).
- DT binding updates for qcom-cpufreq-hw driver (Konrad Dybcio and
Bartosz Golaszewski).
- Updates and fixes for mediatek driver (Jia-Wei Chang and
AngeloGioacchino Del Regno).
* pm-cpufreq: (29 commits)
cpufreq: use correct unit when verify cur freq
cpufreq: tegra194: add OPP support and set bandwidth
cpufreq: amd-pstate: Make varaiable mode_state_machine static
cpufreq: drivers with target_index() must set freq_table
cpufreq: qcom-cpufreq-hw: Revert adding cpufreq qos
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCM2290
dt-bindings: cpufreq: cpufreq-qcom-hw: Sanitize data per compatible
dt-bindings: cpufreq: cpufreq-qcom-hw: Allow just 1 frequency domain
cpufreq: Add SM7225 to cpufreq-dt-platdev blocklist
cpufreq: qcom-cpufreq-hw: fix double IO unmap and resource release on exit
cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623
cpufreq: mediatek: raise proc/sram max voltage for MT8516
cpufreq: mediatek: fix KP caused by handler usage after regulator_put/clk_put
cpufreq: mediatek: fix passing zero to 'PTR_ERR'
cpufreq: pmac32: Use of_property_read_bool() for boolean properties
cpufreq: Fix typo in the ARM_BRCMSTB_AVS_CPUFREQ Kconfig entry
cpufreq: warn about invalid vals to scaling_max/min_freq interfaces
Documentation: cpufreq: amd-pstate: Update amd_pstate status sysfs for guided
cpufreq: amd-pstate: Add guided mode control support via sysfs
cpufreq: amd-pstate: Add guided autonomous mode
...
Rafael J. Wysocki [Mon, 24 Apr 2023 16:07:56 +0000 (18:07 +0200)]
Merge branches 'acpi-utils' and 'acpi-docs'
Merge ACPI utilities and documentation updates for 6.4-rc1:
- Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K).
- Update the pm_profile sysfs attribute documentation (Rafael Wysocki).
* acpi-utils:
ACPI: utils: Fix acpi_evaluate_dsm_typed() redefinition error
* acpi-docs:
ACPI: docs: Update the pm_profile sysfs attribute documentation
Rafael J. Wysocki [Mon, 24 Apr 2023 16:01:57 +0000 (18:01 +0200)]
Merge branches 'acpi-bus', 'acpi-video' and 'acpi-misc'
Merge ACPI bus type driver changes, ACPI backlight driver updates and a
series of cleanups related to of.h for 6.4-rc1:
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki).
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede).
- Replace irqdomain.h include with struct declarations in ACPI headers
and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring).
* acpi-bus:
ACPI: bus: Ensure that notify handlers are not running after removal
ACPI: bus: Add missing braces to acpi_sb_notify()
* acpi-video:
ACPI: video: Remove desktops without backlight DMI quirks
ACPI: video: Remove register_backlight_delay module option and code
* acpi-misc:
ACPI: Replace irqdomain.h include with struct declarations
fpga: lattice-sysconfig-spi: Add explicit include for of.h
tpm: atmel: Add explicit include for of.h
virtio-mmio: Add explicit include for of.h
pata: ixp4xx: Add explicit include for of.h
ata: pata_macio: Add explicit include of irqdomain.h
serial: 8250_tegra: Add explicit include for of.h
net: rfkill-gpio: Add explicit include for of.h
staging: iio: resolver: ad2s1210: Add explicit include for of.h
iio: adc: ad7292: Add explicit include for of.h
Rafael J. Wysocki [Mon, 24 Apr 2023 15:54:02 +0000 (17:54 +0200)]
Merge branches 'acpi-apei', 'acpi-properties', 'acpi-sbs' and 'acpi-thermal'
Merge ACPI APEI changes, ACPI device properties handling update, ACPI
SBS driver fixes and ACPI thermal driver cleanup for 6.4-rc1:
- Make the APEI error injection code warn on invalid arguments when
explicitly indicated by platform (Shuai Xue).
- Add CXL error types to the error injection code in APEI (Tony Luck).
- Refactor acpi_data_prop_read_single() (Andy Shevchenko).
- Fix two issues in the ACPI SBS driver (Armin Wolf).
- Replace ternary operator with min_t() in the generic ACPI thermal
zone driver (Jiangshan Yi).
* acpi-apei:
ACPI: APEI: EINJ: warn on invalid argument when explicitly indicated by platform
ACPI: APEI: EINJ: Add CXL error types
* acpi-properties:
ACPI: property: Refactor acpi_data_prop_read_single()
* acpi-sbs:
ACPI: SBS: Fix handling of Smart Battery Selectors
ACPI: EC: Fix oops when removing custom query handlers
ACPI: EC: Limit explicit removal of query handlers to custom query handlers
* acpi-thermal:
ACPI: thermal: Replace ternary operator with min_t()
Rafael J. Wysocki [Mon, 24 Apr 2023 15:45:49 +0000 (17:45 +0200)]
Merge branches 'acpi-processor', 'acpi-pm', 'acpi-tables' and 'acpi-sysfs'
Merge ACPI processor driver changes, ACPI power management updates,
changes related to parsing ACPI tables and an ACPI sysfs interface
update for 6.4-rc1:
- Fix evaluating the _PDC ACPI control method when running as Xen
dom0 (Roger Pau Monne).
- Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu).
- Check for null return of devm_kzalloc() in fch_misc_setup() (Kang
Chen).
- Log a message if enable_irq_wake() fails for the ACPI SCI (Simon
Gaiser).
- Initialize the correct IOMMU fwspec while parsing ACPI VIOT
(Jean-Philippe Brucker).
- Amend indentation and prefix error messages with FW_BUG in the ACPI
SPCR parsing code (Andy Shevchenko).
- Enable ACPI sysfs support for CCEL records (Kuppuswamy
Sathyanarayanan).
* acpi-processor:
ACPI: processor: Fix evaluating _PDC method when running as Xen dom0
ACPI: cpufreq: Use platform devices to load ACPI PPC and PCC drivers
ACPI: processor: Check for null return of devm_kzalloc() in fch_misc_setup()
* acpi-pm:
ACPI: s2idle: Log when enabling wakeup IRQ fails
* acpi-tables:
ACPI: VIOT: Initialize the correct IOMMU fwspec
ACPI: SPCR: Amend indentation
ACPI: SPCR: Prefix error messages with FW_BUG
* acpi-sysfs:
ACPI: sysfs: Enable ACPI sysfs support for CCEL records
Rafael J. Wysocki [Mon, 24 Apr 2023 15:31:41 +0000 (17:31 +0200)]
Merge branch 'acpica'
Merge ACPICA material for 6.4-rc1:
- Delete bogus node_array array of pointers from AEST table (Jessica
Clarke).
- Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang).
- Add missing macro ACPI_FUNCTION_TRACE() for acpi_ns_repair_HID()
(Xiongfeng Wang).
- Add missing tables to astable (Pedro Falcato).
- Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen).
- Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski).
- Add support for Arm's MPAM ACPI table version 2 (Hesham Almatary).
- Update all copyrights/signons in ACPICA to 2023 (Bob Moore).
- Add support for ClockInput resource (v6.5) (Niyas Sait).
- Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L).
- Add structure definitions for the RISC-V RHCT ACPI table (Sunil V L).
- Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein).
- Make ACPICA code support flexible arrays properly (Kees Cook).
- Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red).
- Add os specific support for Zephyr RTOS to ACPICA (Najumon).
- Update version to
20230331 (Bob Moore).
* acpica: (32 commits)
ACPICA: Update version to
20230331
ACPICA: add os specific support for Zephyr RTOS
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array
ACPICA: acpi_madt_oem_data: Fix flexible array member definition
ACPICA: acpi_dmar_andd: Replace 1-element array with flexible array
ACPICA: acpi_pci_routing_table: Replace fixed-size array with flex array member
ACPICA: struct acpi_resource_dma: Replace 1-element array with flexible array
ACPICA: Introduce ACPI_FLEX_ARRAY
ACPICA: struct acpi_nfit_interleave: Replace 1-element array with flexible array
ACPICA: actbl2: Replace 1-element arrays with flexible arrays
ACPICA: actbl1: Replace 1-element arrays with flexible arrays
ACPICA: struct acpi_resource_vendor: Replace 1-element array with flexible array
ACPICA: Avoid undefined behavior: load of misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within null pointer
ACPICA: Avoid undefined behavior: applying zero offset to null pointer
...
Petr Mladek [Mon, 24 Apr 2023 14:49:08 +0000 (16:49 +0200)]
Merge branch 'for-6.4/doc' into for-linus
Jarkko Sakkinen [Sun, 23 Apr 2023 15:49:58 +0000 (18:49 +0300)]
tpm: Add !tpm_amd_is_rng_defective() to the hwrng_unregister() call site
The following crash was reported:
[ 1950.279393] list_del corruption,
ffff99560d485790->next is NULL
[ 1950.279400] ------------[ cut here ]------------
[ 1950.279401] kernel BUG at lib/list_debug.c:49!
[ 1950.279405] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 1950.279407] CPU: 11 PID: 5886 Comm: modprobe Tainted: G O 6.2.8_1 #1
[ 1950.279409] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO-P/B550M AORUS PRO-P,
BIOS F15c 05/11/2022
[ 1950.279410] RIP: 0010:__list_del_entry_valid+0x59/0xc0
[ 1950.279415] Code: 48 8b 01 48 39 f8 75 5a 48 8b 72 08 48 39 c6 75 65 b8 01 00 00 00 c3 cc cc cc
cc 48 89 fe 48 c7 c7 08 a8 13 9e e8 b7 0a bc ff <0f> 0b 48 89 fe 48 c7 c7 38 a8 13 9e e8 a6 0a bc
ff 0f 0b 48 89 fe
[ 1950.279416] RSP: 0018:
ffffa96d05647e08 EFLAGS:
00010246
[ 1950.279418] RAX:
0000000000000033 RBX:
ffff99560d485750 RCX:
0000000000000000
[ 1950.279419] RDX:
0000000000000000 RSI:
ffffffff9e107c59 RDI:
00000000ffffffff
[ 1950.279420] RBP:
ffffffffc19c5168 R08:
0000000000000000 R09:
ffffa96d05647cc8
[ 1950.279421] R10:
0000000000000003 R11:
ffffffff9ea2a568 R12:
0000000000000000
[ 1950.279422] R13:
ffff99560140a2e0 R14:
ffff99560127d2e0 R15:
0000000000000000
[ 1950.279422] FS:
00007f67da795380(0000) GS:
ffff995d1f0c0000(0000) knlGS:
0000000000000000
[ 1950.279424] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1950.279424] CR2:
00007f67da7e65c0 CR3:
00000001feed2000 CR4:
0000000000750ee0
[ 1950.279426] PKRU:
55555554
[ 1950.279426] Call Trace:
[ 1950.279428] <TASK>
[ 1950.279430] hwrng_unregister+0x28/0xe0 [rng_core]
[ 1950.279436] tpm_chip_unregister+0xd5/0xf0 [tpm]
Add the forgotten !tpm_amd_is_rng_defective() invariant to the
hwrng_unregister() call site inside tpm_chip_unregister().
Cc: stable@vger.kernel.org
Reported-by: Martin Dimov <martin@dmarto.com>
Link: https://lore.kernel.org/linux-integrity/3d1d7e9dbfb8c96125bc93b6b58b90a7@dmarto.com/
Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs")
Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources")
Tested-by: Martin Dimov <martin@dmarto.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Haris Okanovic [Wed, 19 Apr 2023 15:41:30 +0000 (17:41 +0200)]
tpm_tis: fix stall after iowrite*()s
ioread8() operations to TPM MMIO addresses can stall the CPU when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclitest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in CPU or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Link: https://lore.kernel.org/r/20230323153436.B2SATnZV@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Uwe Kleine-König [Mon, 20 Mar 2023 08:06:07 +0000 (09:06 +0100)]
tpm/tpm_tis_synquacer: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>