Linus Torvalds [Wed, 28 Jun 2023 04:12:41 +0000 (21:12 -0700)]
Merge tag 'execve-v6.5-rc1' of git://git./linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Fix a few comments for correctness and typos (Baruch Siach)
- Small simplifications for binfmt (Christophe JAILLET)
- Set p_align to 4 for PT_NOTE in core dump (Fangrui Song)
* tag 'execve-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: fix comment typo s/reset/regset/
elf: correct note name comment
binfmt: Slightly simplify elf_fdpic_map_file()
binfmt: Use struct_size()
coredump, vmcore: Set p_align to 4 for PT_NOTE
Linus Torvalds [Wed, 28 Jun 2023 00:58:06 +0000 (17:58 -0700)]
Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
"There are two patches, both of which change how Smack initializes the
SMACK64TRANSMUTE extended attribute.
The first corrects the behavior of overlayfs, which creates inodes
differently from other filesystems. The second ensures that transmute
attributes specified by mount options are correctly assigned"
* tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next:
smack: Record transmuting in smk_transmuted
smack: Retrieve transmuting information in smack_inode_getsecurity()
Linus Torvalds [Wed, 28 Jun 2023 00:32:34 +0000 (17:32 -0700)]
Merge tag 'integrity-v6.5' of git://git./linux/kernel/git/zohar/linux-integrity
Pull integrity subsystem updates from Mimi Zohar:
"An i_version change, one bug fix, and three kernel doc fixes:
- instead of IMA detecting file change by directly accesssing
i_version, it now calls vfs_getattr_nosec().
- fix a race condition when inserting a new node in the iint rb-tree"
* tag 'integrity-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Fix build warnings
evm: Fix build warnings
evm: Complete description of evm_inode_setattr()
integrity: Fix possible multiple allocation in integrity_inode_get()
IMA: use vfs_getattr_nosec to get the i_version
Linus Torvalds [Wed, 28 Jun 2023 00:24:26 +0000 (17:24 -0700)]
Merge tag 'lsm-pr-
20230626' of git://git./linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- A SafeSetID patch to correct what appears to be a cut-n-paste typo in
the code causing a UID to be printed where a GID was desired.
This is coming via the LSM tree because we haven't been able to get a
response from the SafeSetID maintainer (Micah Morton) in several
months. Hopefully we are able to get in touch with Micah, but until
we do I'm going to pick them up in the LSM tree.
- A small fix to the reiserfs LSM xattr code.
We're continuing to work through some issues with the reiserfs code
as we try to fixup the LSM xattr handling, but in the process we're
uncovering some ugly problems in reiserfs and we may just end up
removing the LSM xattr support in reiserfs prior to reiserfs'
removal.
For better or worse, this shouldn't impact any of the reiserfs users,
as we discovered that LSM xattrs on reiserfs were completely broken,
meaning no one is currently using the combo of reiserfs and a file
labeling LSM.
- A tweak to how the cap_user_data_t struct/typedef is declared in the
header file to appease the Sparse gods.
- In the process of trying to sort out the SafeSetID lost-maintainer
problem I realized that I needed to update the labeled networking
entry to "Supported".
- Minor comment/documentation and spelling fixes.
* tag 'lsm-pr-
20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
device_cgroup: Fix kernel-doc warnings in device_cgroup
SafeSetID: fix UID printed instead of GID
MAINTAINERS: move labeled networking to "supported"
capability: erase checker warnings about struct __user_cap_data_struct
lsm: fix a number of misspellings
reiserfs: Initialize sec->length in reiserfs_security_init().
capability: fix kernel-doc warnings in capability.c
Linus Torvalds [Wed, 28 Jun 2023 00:18:48 +0000 (17:18 -0700)]
Merge tag 'selinux-pr-
20230626' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Thanks to help from the MPTCP folks, it looks like we have finally
sorted out a proper solution to the MPTCP socket labeling issue, see
the new security_mptcp_add_subflow() LSM hook.
- Fix the labeled NFS handling such that a labeled NFS share mounted
prior to the initial SELinux policy load is properly labeled once a
policy is loaded; more information in the commit description.
- Two patches to security/selinux/Makefile, the first took the cleanups
in v6.4 a bit further and the second removed the grouped targets
support as that functionality doesn't appear to be properly supported
prior to make v4.3.
- Deprecate the "fs" object context type in SELinux policies. The fs
object context type was an old vestige that was introduced back in
v2.6.12-rc2 but never really used.
- A number of small changes that remove dead code, clean up some
awkward bits, and generally improve the quality of the code. See the
individual commit descriptions for more information.
* tag 'selinux-pr-
20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: avoid bool as identifier name
selinux: fix Makefile for versions of make < v4.3
selinux: make labeled NFS work when mounted before policy load
selinux: cleanup exit_sel_fs() declaration
selinux: deprecated fs ocon
selinux: make header files self-including
selinux: keep context struct members in sync
selinux: Implement mptcp_add_subflow hook
security, lsm: Introduce security_mptcp_add_subflow()
selinux: small cleanups in selinux_audit_rule_init()
selinux: declare read-only data arrays const
selinux: retain const qualifier on string literal in avtab_hash_eval()
selinux: drop return at end of void function avc_insert()
selinux: avc: drop unused function avc_disable()
selinux: adjust typos in comments
selinux: do not leave dangling pointer behind
selinux: more Makefile tweaks
Linus Torvalds [Wed, 28 Jun 2023 00:15:35 +0000 (17:15 -0700)]
Merge tag 'audit-pr-
20230626' of git://git./linux/kernel/git/pcmoore/audit
Pull audit update from Paul Moore:
"A single audit patch that resolves two compiler warnings regarding
missing function prototypes"
* tag 'audit-pr-
20230626' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: avoid missing-prototype warnings
Linus Torvalds [Wed, 28 Jun 2023 00:10:27 +0000 (17:10 -0700)]
Merge tag 'landlock-6.5-rc1' of git://git./linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"Add support for Landlock to UML.
To do this, this fixes the way hostfs manages inodes according to the
underlying filesystem [1]. They are now properly handled as for other
filesystems, which enables Landlock support (and probably other
features).
This also extends Landlock's tests with 6 pseudo filesystems,
including hostfs"
[1] https://lore.kernel.org/all/
20230612191430.339153-1-mic@digikod.net/
* tag 'landlock-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add hostfs tests
selftests/landlock: Add tests for pseudo filesystems
selftests/landlock: Make mounts configurable
selftests/landlock: Add supports_filesystem() helper
selftests/landlock: Don't create useless file layouts
hostfs: Fix ephemeral inodes
Linus Torvalds [Tue, 27 Jun 2023 23:54:21 +0000 (16:54 -0700)]
Merge tag 'cgroup-for-6.5' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Whenever cpuset needs to rebuild sched_domain, it walked all tasks
looking for DEADLINE tasks as they need to be accounted on the new
domain. Walking all tasks can be expensive and there may not be any
DEADLINE tasks at all. Task iteration is now omitted if there are no
DEADLINE tasks
- Fixes DEADLINE bandwidth misaccounting after task migration failures
- When no controller is enabled, -Wstringop-overflow warning is
triggered. The fix patch added an early exit which is too eager and
got reverted for now. Will fix later
- Everything else is minor cleanups
* tag 'cgroup-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: Avoid -Wstringop-overflow warnings"
cgroup/misc: Expose misc.current on cgroup v2 root
cgroup: Avoid -Wstringop-overflow warnings
cgroup: remove obsolete comment on cgroup_on_dfl()
cgroup: remove unused task_cgroup_path()
cgroup/cpuset: remove unneeded header files
cgroup: make cgroup_is_threaded() and cgroup_is_thread_root() static
rdmacg: fix kernel-doc warnings in rdmacg
cgroup: Replace the css_set call with cgroup_get
cgroup: remove unused macro for_each_e_css()
cgroup: Update out-of-date comment in cgroup_migrate()
cgroup: Replace all non-returning strlcpy with strscpy
cgroup/cpuset: remove unneeded header files
cgroup/cpuset: Free DL BW in case can_attach() fails
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Iterate only if DEADLINE tasks are present
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
sched/cpuset: Bring back cpuset_mutex
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Linus Torvalds [Tue, 27 Jun 2023 23:46:06 +0000 (16:46 -0700)]
Merge tag 'wq-for-6.5-cleanup-ordered' of git://git./linux/kernel/git/tj/wq
Pull ordered workqueue creation updates from Tejun Heo:
"For historical reasons, unbound workqueues with max concurrency limit
of 1 are considered ordered, even though the concurrency limit hasn't
been system-wide for a long time.
This creates ambiguity around whether ordered execution is actually
required for correctness, which was actually confusing for e.g. btrfs
(btrfs updates are being routed through the btrfs tree).
There aren't that many users in the tree which use the combination and
there are pending improvements to unbound workqueue affinity handling
which will make inadvertent use of ordered workqueue a bigger loss.
This clarifies the situation for most of them by updating the ones
which require ordered execution to use alloc_ordered_workqueue().
There are some conversions being routed through subsystem-specific
trees and likely a few stragglers. Once they're all converted,
workqueue can trigger a warning on unbound + @max_active==1 usages and
eventually drop the implicit ordered behavior"
* tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues
net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues
net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues
dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues
media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues
scsi: NCR5380: Use default @max_active for hostdata->work_q
media: coda: Use alloc_ordered_workqueue() to create ordered workqueues
crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: mwifiex: Use default @max_active for workqueues
wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq
xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues
virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues
net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues
greybus: Use alloc_ordered_workqueue() to create ordered workqueues
powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues
Linus Torvalds [Tue, 27 Jun 2023 23:32:52 +0000 (16:32 -0700)]
Merge tag 'wq-for-6.5' of git://git./linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Concurrency-managed per-cpu work items that hog CPUs and delay the
execution of other work items are now automatically detected and
excluded from concurrency management. Reporting on such work items
can also be enabled through a config option.
- Added tools/workqueue/wq_monitor.py which improves visibility into
workqueue usages and behaviors.
- Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
superseded by commit
afa4bb778e48 in mainline.
* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
workqueue: fix enum type for gcc-13
workqueue: Track and monitor per-workqueue CPU time usage
workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
workqueue: Improve locking rule description for worker fields
workqueue: Move worker_set/clr_flags() upwards
workqueue: Re-order struct worker fields
workqueue: Add pwq->stats[] and a monitoring script
Further upgrade queue_work_on() comment
Linus Torvalds [Tue, 27 Jun 2023 23:03:20 +0000 (16:03 -0700)]
Merge tag 'for-linus-6.5-rc1-tag' of git://git./linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- three patches adding missing prototypes
- a fix for finding the iBFT in a Xen dom0 for supporting diskless
iSCSI boot
* tag 'for-linus-6.5-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86: xen: add missing prototypes
x86/xen: add prototypes for paravirt mmu functions
iscsi_ibft: Fix finding the iBFT under Xen Dom 0
xen: xen_debug_interrupt prototype to global header
Linus Torvalds [Tue, 27 Jun 2023 22:49:10 +0000 (15:49 -0700)]
Merge tag 's390-6.5-1' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Fix the style of protected key API driver source: use x-mas tree for
all local variable declarations
- Rework protected key API driver to not use the struct pkey_protkey
and pkey_clrkey anymore. Both structures have a fixed size buffer,
but with the support of ECC protected key these buffers are not big
enough. Use dynamic buffers internally and transparently for
userspace
- Add support for a new 'non CCA clear key token' with ECC clear keys
supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448.
This makes it possible to derive a protected key from the ECC clear
key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way
to derive is via PCKMO instruction
- The s390 PMU of PAI crypto and extension 1 NNPA counters use atomic_t
for reference counting. Replace this with the proper data type
refcount_t
- Select ARCH_SUPPORTS_INT128, but limit this to clang for now, since
gcc generates inefficient code, which may lead to stack overflows
- Replace one-element array with flexible-array member in struct
vfio_ccw_parent and refactor the rest of the code accordingly. Also,
prefer struct_size() over sizeof() open- coded versions
- Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and
OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether the
system memory should be cleared or not once dumped
- Fix a hang when a user attempts to remove a VFIO-AP mediated device
attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and
VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver callback
to request a release of the device
- Fix calculation for R_390_GOTENT relocations for modules
- Allow any user space process with CAP_PERFMON capability read and
display the CPU Measurement facility counter sets
- Rework large statically-defined per-CPU cpu_cf_events data structure
and replace it with dynamically allocated structures created when a
perf_event_open() system call is invoked or /dev/hwctr device is
accessed
* tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_cf: rework PER_CPU_DEFINE of struct cpu_cf_events
s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process
s390/module: fix rela calculation for R_390_GOTENT
s390/vfio-ap: wire in the vfio_device_ops request callback
s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctl
s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl
s390/pkey: add support for ecc clear key
s390/pkey: do not use struct pkey_protkey
s390/pkey: introduce reverse x-mas trees
s390/zcore: conditionally clear memory on reipl
s390/ipl: add REIPL_CLEAR flag to os_info
vfio/ccw: use struct_size() helper
vfio/ccw: replace one-element array with flexible-array member
s390: select ARCH_SUPPORTS_INT128
s390/pai_ext: replace atomic_t with refcount_t
s390/pai_crypto: replace atomic_t with refcount_t
Linus Torvalds [Tue, 27 Jun 2023 22:44:11 +0000 (15:44 -0700)]
Merge tag 'xtensa-
20230627' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa updates from Max Filippov:
- clean up platform_* interface of the xtensa architecture
- enable HAVE_ASM_MODVERSIONS
- drop ARCH_WANT_FRAME_POINTERS
- clean up unaligned access exception handler
- provide handler for load/store exceptions
- various small fixes and cleanups
* tag 'xtensa-
20230627' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: dump userspace code around the exception PC
xtensa: rearrange show_stack output
xtensa: add load/store exception handler
xtensa: rearrange unaligned exception handler
xtensa: always install slow handler for unaligned access exception
xtensa: move early_trap_init from kasan_early_init to init_arch
xtensa: drop ARCH_WANT_FRAME_POINTERS
xtensa: report trax and perf counters in cpuinfo
xtensa: add asm-prototypes.h
xtensa: only build __strncpy_user with CONFIG_ARCH_HAS_STRNCPY_FROM_USER
xtensa: drop bcopy implementation
xtensa: drop EXPORT_SYMBOL for common_exception_return
xtensa: boot-redboot: clean up Makefile
xtensa: clean up default platform functions
xtensa: drop platform_halt and platform_power_off
xtensa: drop platform_restart
xtensa: drop platform_heartbeat
xtensa: xt2000: drop empty platform_init
Dinh Nguyen [Tue, 27 Jun 2023 22:14:30 +0000 (17:14 -0500)]
Revert "nios2: Convert __pte_free_tlb() to use ptdescs"
This reverts commit
6ebe94baa2b9ddf3ccbb7f94df6ab26234532734.
The patch "nios2: Convert __pte_free_tlb() to use ptdescs" was supposed
to go together with a patchset that Vishal Moola had planned taking it
through the mm tree. By just having this patch, all NIOS2 builds are
broken.
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 27 Jun 2023 22:05:41 +0000 (15:05 -0700)]
Merge tag 'objtool-core-2023-06-27' of git://git./linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molar:
"Build footprint & performance improvements:
- Reduce memory usage with CONFIG_DEBUG_INFO=y
In the worst case of an allyesconfig+CONFIG_DEBUG_INFO=y kernel,
DWARF creates almost 200 million relocations, ballooning objtool's
peak heap usage to 53GB. These patches reduce that to 25GB.
On a distro-type kernel with kernel IBT enabled, they reduce
objtool's peak heap usage from 4.2GB to 2.8GB.
These changes also improve the runtime significantly.
Debuggability improvements:
- Add the unwind_debug command-line option, for more extend unwinding
debugging output
- Limit unreachable warnings to once per function
- Add verbose option for disassembling affected functions
- Include backtrace in verbose mode
- Detect missing __noreturn annotations
- Ignore exc_double_fault() __noreturn warnings
- Remove superfluous global_noreturns entries
- Move noreturn function list to separate file
- Add __kunit_abort() to noreturns
Unwinder improvements:
- Allow stack operations in UNWIND_HINT_UNDEFINED regions
- drm/vmwgfx: Add unwind hints around RBP clobber
Cleanups:
- Move the x86 entry thunk restore code into thunk functions
- x86/unwind/orc: Use swap() instead of open coding it
- Remove unnecessary/unused variables
Fixes for modern stack canary handling"
* tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/orc: Make the is_callthunk() definition depend on CONFIG_BPF_JIT=y
objtool: Skip reading DWARF section data
objtool: Free insns when done
objtool: Get rid of reloc->rel[a]
objtool: Shrink elf hash nodes
objtool: Shrink reloc->sym_reloc_entry
objtool: Get rid of reloc->jump_table_start
objtool: Get rid of reloc->addend
objtool: Get rid of reloc->type
objtool: Get rid of reloc->offset
objtool: Get rid of reloc->idx
objtool: Get rid of reloc->list
objtool: Allocate relocs in advance for new rela sections
objtool: Add for_each_reloc()
objtool: Don't free memory in elf_close()
objtool: Keep GElf_Rel[a] structs synced
objtool: Add elf_create_section_pair()
objtool: Add mark_sec_changed()
objtool: Fix reloc_hash size
objtool: Consolidate rel/rela handling
...
Linus Torvalds [Tue, 27 Jun 2023 21:47:17 +0000 (14:47 -0700)]
Merge tag 'x86-mm-2023-06-27' of git://git./linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
- Remove Xen-PV leftovers from init_32.c
- Fix __swp_entry_to_pte() warning splat for Xen PV guests, triggered
on CONFIG_DEBUG_VM_PGTABLE=y
* tag 'x86-mm-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove Xen-PV leftovers from init_32.c
x86/mm: Fix __swp_entry_to_pte() for Xen PV guests
Linus Torvalds [Tue, 27 Jun 2023 21:43:02 +0000 (14:43 -0700)]
Merge tag 'perf-core-2023-06-27' of git://git./linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Rework & fix the event forwarding logic by extending the core
interface.
This fixes AMD PMU events that have to be forwarded from the
core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Re-instate the linear PMU search
perf/x86/intel: Define bit macros for FixCntrCtl MSR
perf test: Add selftest to test IBS invocation via core pmu events
perf/core: Remove pmu linear searching code
perf/ibs: Fix interface via core pmu events
perf/core: Rework forwarding of {task|cpu}-clock events
Linus Torvalds [Tue, 27 Jun 2023 21:14:30 +0000 (14:14 -0700)]
Merge tag 'locking-core-2023-06-27' of git://git./linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()
The cmpxchg128() family of functions is basically & functionally the
same as cmpxchg_double(), but with a saner interface.
Instead of a 6-parameter horror that forced u128 - u64/u64-halves
layout details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128
types.
- Restructure the generated atomic headers, and add kerneldoc comments
for all of the generic atomic{,64,_long}_t operations.
The generated definitions are much cleaner now, and come with
documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
taking multiple locks of the same type.
This gets rid of one use of lockdep_set_novalidate_class() in the
bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
shadowing generating garbage code on Clang on certain ARM builds.
* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
locking/atomic: treewide: delete arch_atomic_*() kerneldoc
locking/atomic: docs: Add atomic operations to the driver basic API documentation
locking/atomic: scripts: generate kerneldoc comments
docs: scripts: kernel-doc: accept bitwise negation like ~@var
locking/atomic: scripts: simplify raw_atomic*() definitions
locking/atomic: scripts: simplify raw_atomic_long*() definitions
locking/atomic: scripts: split pfx/name/sfx/order
locking/atomic: scripts: restructure fallback ifdeffery
locking/atomic: scripts: build raw_atomic_long*() directly
locking/atomic: treewide: use raw_atomic*_<op>()
locking/atomic: scripts: add trivial raw_atomic*_<op>()
locking/atomic: scripts: factor out order template generation
locking/atomic: scripts: remove leftover "${mult}"
locking/atomic: scripts: remove bogus order parameter
locking/atomic: xtensa: add preprocessor symbols
locking/atomic: x86: add preprocessor symbols
locking/atomic: sparc: add preprocessor symbols
locking/atomic: sh: add preprocessor symbols
...
Linus Torvalds [Tue, 27 Jun 2023 21:03:21 +0000 (14:03 -0700)]
Merge tag 'sched-core-2023-06-27' of git://git./linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Scheduler SMP load-balancer improvements:
- Avoid unnecessary migrations within SMT domains on hybrid systems.
Problem:
On hybrid CPU systems, (processors with a mixture of
higher-frequency SMT cores and lower-frequency non-SMT cores),
under the old code lower-priority CPUs pulled tasks from the
higher-priority cores if more than one SMT sibling was busy -
resulting in many unnecessary task migrations.
Solution:
The new code improves the load balancer to recognize SMT cores
with more than one busy sibling and allows lower-priority CPUs
to pull tasks, which avoids superfluous migrations and lets
lower-priority cores inspect all SMT siblings for the busiest
queue.
- Implement the 'runnable boosting' feature in the EAS balancer:
consider CPU contention in frequency, EAS max util & load-balance
busiest CPU selection.
This improves CPU utilization for certain workloads, while leaves
other key workloads unchanged.
Scheduler infrastructure improvements:
- Rewrite the scheduler topology setup code by consolidating it into
the build_sched_topology() helper function and building it
dynamically on the fly.
- Resolve the local_clock() vs. noinstr complications by rewriting
the code: provide separate sched_clock_noinstr() and
local_clock_noinstr() functions to be used in instrumentation code,
and make sure it is all instrumentation-safe.
Fixes:
- Fix a kthread_park() race with wait_woken()
- Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
- Fix UP PREEMPT bug by unifying the SMP and UP implementations
- Fix task_struct::saved_state handling
- Fix various rq clock update bugs, unearthed by turning on the rq
clock debugging code.
- Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger
by creating enough cgroups, by removing the warnign and restricting
window size triggers to PSI file write-permission or
CAP_SYS_RESOURCE.
- Propagate SMT flags in the topology when removing degenerate domain
- Fix grub_reclaim() calculation bug in the deadline scheduler code
- Avoid resetting the min update period when it is unnecessary, in
psi_trigger_destroy().
- Don't balance a task to its current running CPU in load_balance(),
which was possible on certain NUMA topologies with overlapping
groups.
- Fix the sched-debug printing of rq->nr_uninterruptible
Cleanups:
- Address various -Wmissing-prototype warnings, as a preparation to
(maybe) enable this warning in the future.
- Remove unused code
- Mark more functions __init
- Fix shadow-variable warnings"
* tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
sched/core: Fixed missing rq clock update before calling set_rq_offline()
sched/deadline: Update GRUB description in the documentation
sched/deadline: Fix bandwidth reclaim equation in GRUB
sched/wait: Fix a kthread_park race with wait_woken()
sched/topology: Mark set_sched_topology() __init
sched/fair: Rename variable cpu_util eff_util
arm64/arch_timer: Fix MMIO byteswap
sched/fair, cpufreq: Introduce 'runnable boosting'
sched/fair: Refactor CPU utilization functions
cpuidle: Use local_clock_noinstr()
sched/clock: Provide local_clock_noinstr()
x86/tsc: Provide sched_clock_noinstr()
clocksource: hyper-v: Provide noinstr sched_clock()
clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
x86/vdso: Fix gettimeofday masking
math64: Always inline u128 version of mul_u64_u64_shr()
s390/time: Provide sched_clock_noinstr()
loongarch: Provide noinstr sched_clock_read()
...
Arnd Bergmann [Tue, 27 Jun 2023 20:54:28 +0000 (22:54 +0200)]
Merge tag 'soc-fsl-next-v6.5' of git://git./linux/kernel/git/leo/linux into soc/drivers
NXP/FSL SoC driver updates for v6.5
- fsl-mc: Make remove function return void
- QE USB: fix build issue caused by missing dependency
* tag 'soc-fsl-next-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux:
bus: fsl-mc: fsl-mc-allocator: Drop a write-only variable
bus: fsl-mc: fsl-mc-allocator: Initialize mc_bus_dev before use
soc/fsl/qe: fix usb.c build errors
bus: fsl-mc: Make remove function return void
soc: fsl: dpio: Suppress duplicated error reporting on device remove
bus: fsl-mc: fsl-mc-allocator: Improve error reporting
bus: fsl-mc: fsl-mc-allocator: Drop if block with always wrong condition
bus: fsl-mc: dprc: Push down error message from fsl_mc_driver_remove()
bus: fsl-mc: Only warn once about errors on device unbind
Link: https://lore.kernel.org/r/20230621222503.12402-1-leoyang.li@nxp.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Tue, 27 Jun 2023 20:49:33 +0000 (13:49 -0700)]
Merge tag 'x86_sgx_for_v6.5' of git://git./linux/kernel/git/tip/tip
Pull SGX update from Borislav Petkov:
- A fix to avoid using a list iterator variable after the loop it is
used in
* tag 'x86_sgx_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Avoid using iterator after loop in sgx_mmu_notifier_release()
Jiri Kosina [Tue, 27 Jun 2023 20:43:39 +0000 (22:43 +0200)]
Merge branch 'for-6.5/wacom' into for-linus
- touch selftests for hid-wacom (Joshua Dickens)
- conversion of hid-wacom to use ktime_t (Jason Gerecke)
Jiri Kosina [Tue, 27 Jun 2023 20:42:48 +0000 (22:42 +0200)]
Merge branch 'for-6.5/nvidia' into for-linus
- support for nVidia Thunderstrike (SHIELD 2017) controller (Rahul Rameshbabu)
Jiri Kosina [Tue, 27 Jun 2023 20:42:28 +0000 (22:42 +0200)]
Merge branch 'for-6.5/i2c-hid' into for-linus
Jiri Kosina [Tue, 27 Jun 2023 20:41:03 +0000 (22:41 +0200)]
Merge branch 'for-6.5/goodix' into for-linus
- power management reset-during-suspend fix for goodix Chromebook
devices (Fei Shao)
Jiri Kosina [Tue, 27 Jun 2023 20:38:37 +0000 (22:38 +0200)]
Merge branch 'for-6.5/core' into for-linus
- more bullet-proof handling of devres-managed resources in HID core
(Dmitry Torokhov)
- kunit test Kconfig dependency fix (Geert Uytterhoeven)
Jiri Kosina [Tue, 27 Jun 2023 20:37:24 +0000 (22:37 +0200)]
Merge branch 'for-6.5/apple' into for-linus
- improved support for Keychron K8 keyboard (Lasse Brun)
Jiri Kosina [Tue, 27 Jun 2023 20:32:06 +0000 (22:32 +0200)]
Merge branch 'for-6.5/amd-sfh' into for-linus
- amd-sfh driver code cleanups (Basavaraj Natikar)
Jiri Kosina [Tue, 27 Jun 2023 20:30:34 +0000 (22:30 +0200)]
Merge branch 'for-6.5/acer' into for-linus
- ASUS ROG Z13 keyboard support and other assorted fixes to
hid-asus (Luke D. Jones)
Linus Torvalds [Tue, 27 Jun 2023 20:26:30 +0000 (13:26 -0700)]
Merge tag 'x86_sev_for_v6.5' of git://git./linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Some SEV and CC platform helpers cleanup and simplifications now that
the usage patterns are becoming apparent
[ I'm sure I'm the only one that has gets confused by all the TLAs, but
in case there are others: here SEV is AMD's "Secure Encrypted
Virtualization" and CC is generic "Confidential Computing".
There's also Intel SGX (Software Guard Extensions) and TDX (Trust
Domain Extensions), along with all the vendor memory encryption
extensions (SME, TSME, TME, and WTF).
And then we have arm64 with RMA and CCA, and I probably forgot another
dozen or so related acronyms - Linus ]
* tag 'x86_sev_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/coco: Get rid of accessor functions
x86/sev: Get rid of special sev_es_enable_key
x86/coco: Mark cc_platform_has() and descendants noinstr
Linus Torvalds [Tue, 27 Jun 2023 20:11:32 +0000 (13:11 -0700)]
Merge tag 'x86_mtrr_for_v6.5' of git://git./linux/kernel/git/tip/tip
Pull x86 mtrr updates from Borislav Petkov:
"A serious scrubbing of the MTRR code including adding a new map
mechanism in order to look up the memory type of a region easily.
Also address memory range lookup issues like returning an invalid
memory type. Furthermore, this handles the decoupling of PAT from MTRR
more naturally.
All work by Juergen Gross"
* tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Set default memory type for PV guests to WB
x86/mtrr: Unify debugging printing
x86/mtrr: Remove unused code
x86/mm: Only check uniform after calling mtrr_type_lookup()
x86/mtrr: Don't let mtrr_type_lookup() return MTRR_TYPE_INVALID
x86/mtrr: Use new cache_map in mtrr_type_lookup()
x86/mtrr: Add mtrr=debug command line option
x86/mtrr: Construct a memory map with cache modes
x86/mtrr: Add get_effective_type() service function
x86/mtrr: Allocate mtrr_value array dynamically
x86/mtrr: Move 32-bit code from mtrr.c to legacy.c
x86/mtrr: Have only one set_mtrr() variant
x86/mtrr: Replace vendor tests in MTRR code
x86/xen: Set MTRR state when running as Xen PV initial domain
x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest
x86/mtrr: Support setting MTRR state for software defined MTRRs
x86/mtrr: Replace size_or_mask and size_and_mask with a much easier concept
x86/mtrr: Remove physical address size calculation
Mikulas Patocka [Mon, 26 Jun 2023 14:48:40 +0000 (16:48 +0200)]
dm: get rid of GFP_NOIO workarounds for __vmalloc and kvmalloc
In the past, the function __vmalloc didn't respect the GFP flags - it
allocated memory with the provided gfp flags, but it allocated page tables
with GFP_KERNEL. This was fixed in commit
451769ebb7e7 ("mm/vmalloc:
alloc GFP_NO{FS,IO} for vmalloc") so the memalloc_noio_{save,restore}
workaround is no longer needed.
The function kvmalloc didn't like flags different from GFP_KERNEL. This
was fixed in commit
a421ef303008 ("mm: allow !GFP_KERNEL allocations
for kvmalloc"), so kvmalloc can now be called with GFP_NOIO.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Mon, 26 Jun 2023 14:46:57 +0000 (16:46 +0200)]
dm integrity: scale down the recalculate buffer if memory allocation fails
If memory allocation fails, try to reduce the size of the recalculate
buffer and continue with that smaller buffer.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Mon, 26 Jun 2023 14:46:00 +0000 (16:46 +0200)]
dm integrity: only allocate recalculate buffer when needed
dm-integrity preallocated 8MiB buffer for recalculating in the
constructor and freed it in the destructor. This wastes memory when
the user has many dm-integrity devices.
Fix dm-integrity so that the buffer is only allocated when
recalculation is in progress; allocate the buffer at the beginning of
integrity_recalc() and free it at the end.
Note that integrity_recalc() doesn't hold any locks when allocating
the buffer, so it shouldn't cause low-memory deadlock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Mon, 26 Jun 2023 14:44:34 +0000 (16:44 +0200)]
dm integrity: reduce vmalloc space footprint on 32-bit architectures
It was reported that dm-integrity runs out of vmalloc space on 32-bit
architectures. On x86, there is only 128MiB vmalloc space and dm-integrity
consumes it quickly because it has a 64MiB journal and 8MiB recalculate
buffer.
Fix this by reducing the size of the journal to 4MiB and the size of
the recalculate buffer to 1MiB, so that multiple dm-integrity devices
can be created and activated on 32-bit architectures.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Linus Torvalds [Tue, 27 Jun 2023 19:25:42 +0000 (12:25 -0700)]
Merge tag 'x86_misc_for_v6.5' of git://git./linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Remove the local symbols prefix of the get/put_user() exception
handling symbols so that tools do not get confused by the presence of
code belonging to the wrong symbol/not belonging to any symbol
- Improve csum_partial()'s performance
- Some improvements to the kcpuid tool
* tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Make get/put_user() exception handling a visible symbol
x86/csum: Fix clang -Wuninitialized in csum_partial()
x86/csum: Improve performance of `csum_partial`
tools/x86/kcpuid: Add .gitignore
tools/x86/kcpuid: Dump the correct CPUID function in error
Linus Torvalds [Tue, 27 Jun 2023 19:03:44 +0000 (12:03 -0700)]
Merge tag 'x86_microcode_for_v6.5' of git://git./linux/kernel/git/tip/tip
Pull x86 microcode loader updates from Borislav Petkov:
- Load late on both SMT threads on AMD, just like it is being done in
the early loading procedure
- Cleanups
* tag 'x86_microcode_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Load late on both threads too
x86/microcode/amd: Remove unneeded pointer arithmetic
x86/microcode/AMD: Get rid of __find_equiv_id()
Linus Torvalds [Tue, 27 Jun 2023 18:58:16 +0000 (11:58 -0700)]
Merge tag 'docs-arm-move' of git://git.lwn.net/linux
Pull arm documentation move from Jonathan Corbet:
"Move the Arm architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm-move' of git://git.lwn.net/linux:
dt-bindings: Update Documentation/arm references
docs: update some straggling Documentation/arm references
crypto: update some Arm documentation references
mips: update a reference to a moved Arm Document
arm64: Update Documentation/arm references
arm: update in-source documentation references
arm: docs: Move Arm documentation to Documentation/arch/
Linus Torvalds [Tue, 27 Jun 2023 18:33:47 +0000 (11:33 -0700)]
Merge tag 'docs-6.5' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's been a relatively calm cycle in docsland. We do have:
- Some initial page-table documentation from Linus (the other Linus)
- Regression-handling documentation improvements from Thorsten
- Addition of kerneldoc documentation for the ERR_PTR() and related
macros from James Seo
... and the usual collection of fixes and updates"
* tag 'docs-6.5' of git://git.lwn.net/linux:
docs: consolidate storage interfaces
Documentation: update git configuration for Link: tag
Documentation: KVM: make corrections to vcpu-requests.rst
Documentation: KVM: make corrections to ppc-pv.rst
Documentation: KVM: make corrections to locking.rst
Documentation: KVM: make corrections to halt-polling.rst
Documentation: virt: correct location of haltpoll module params
Documentation/mm: Initial page table documentation
docs: crypto: async-tx-api: fix typo in struct name
docs/doc-guide: Clarify how to write tables
docs: handling-regressions: rework section about fixing procedures
docs: process: fix a typoed cross-reference
docs: submitting-patches: Discuss interleaved replies
MAINTAINERS: direct process doc changes to a dedicated ML
Documentation: core-api: Add error pointer functions to kernel-api
err.h: Add missing kerneldocs for error pointer functions
Documentation: conf.py: Add __force to c_id_attributes
docs: clarify KVM related kernel parameters' descriptions
docs: consolidate human interface subsystems
docs: admin-guide: Add information about intel_pstate active mode
Linus Torvalds [Tue, 27 Jun 2023 18:28:56 +0000 (11:28 -0700)]
Merge tag 'linux-kselftest-next-6.5-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- allow runners to override the timeout
This change is made to avoid future increases of long timeouts
- several other spelling and cleanups
- a new subtest to video_device_test
- enhancements to test coverage in clone3 test
- other fixes to ftrace and cpufreq tests
* tag 'linux-kselftest-next-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftace: Fix KTAP output ordering
selftests/cpufreq: Don't enable generic lock debugging options
kselftests: Sort the collections list to avoid duplicate tests
selftest: pidfd: Omit long and repeating outputs
selftests: allow runners to override the timeout
selftests/ftrace: Add new test case which checks for optimized probes
selftests/clone3: test clone3 with exit signal in flags
kselftest: vDSO: Fix accumulation of uninitialized ret when CLOCK_REALTIME is undefined
selftests: prctl: Fix spelling mistake "anonynous" -> "anonymous"
selftests: media_tests: Add new subtest to video_device_test
Linus Torvalds [Tue, 27 Jun 2023 18:12:55 +0000 (11:12 -0700)]
Merge tag 'linux-kselftest-kunit-6.5-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- kunit_add_action() API to defer a call until test exit
- Update document to add kunit_add_action() usage notes
- Changes to always run cleanup from a test kthread
- Documentation updates to clarify cleanup usage (assertions should not
be used in cleanup)
- Documentation update to clearly indicate that exit functions should
run even if init fails
- Several fixes and enhancements to existing tests
* tag 'linux-kselftest-kunit-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
MAINTAINERS: Add source tree entry for kunit
Documentation: kunit: Rename references to kunit_abort()
kunit: Move kunit_abort() call out of kunit_do_failed_assertion()
kunit: Fix obsolete name in documentation headers (func->action)
Documentation: Kunit: add MODULE_LICENSE to sample code
kunit: Update kunit_print_ok_not_ok function
kunit: Fix reporting of the skipped parameterized tests
kunit/test: Add example test showing parameterized testing
Documentation: kunit: Add usage notes for kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
kunit: executor_test: Use kunit_add_action()
kunit: Add kunit_add_action() to defer a call until test exit
kunit: example: Provide example exit functions
Documentation: kunit: Warn that exit functions run even if init fails
Documentation: kunit: Note that assertions should not be used in cleanup
kunit: Always run cleanup from a test kthread
Documentation: kunit: Modular tests should not depend on KUNIT=y
kunit: tool: undo type subscripts for subprocess.Popen
Linus Torvalds [Tue, 27 Jun 2023 17:56:41 +0000 (10:56 -0700)]
Merge tag 'nolibc.2023.06.22a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Add stackprotector support
- Fix RISC-V load-store instruction syntax to support 32-bit binaries,
plus fixes for generic 32-bit support
- Fix use of s390 sys_fork()
- Add my_syscall6() for ARM
- Support different platforms having different errno definitions
- Fix ppoll/ppoll_time64 arguments (add the fifth argument)
- Force use of little endian on MIPS
- Improved testing, for example, better handling of different compilers
and compiler versions, comparing nolibc behavior to that of libc, and
additional test cases
- Improve syntax and header ordering
- Use existing <linux/reboot.h> instead of redefining constants
- Add syscall()
* tag 'nolibc.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (53 commits)
selftests/nolibc: make sure gcc always use little endian on MIPS
selftests/nolibc: also count skipped and failed tests in output
selftests/nolibc: add new gettimeofday test cases
selftests/nolibc: remove gettimeofday_bad1/2 completely
selftests/nolibc: support two errnos with EXPECT_SYSER2()
tools/nolibc: open: fix up compile warning for arm
tools/nolibc: arm: add missing my_syscall6
selftests/nolibc: use INT_MAX instead of __INT_MAX__
selftests/nolibc: not include limits.h for nolibc
selftests/nolibc: fix up compile warning with glibc on x86_64
selftests/nolibc: allow specify extra arguments for qemu
selftests/nolibc: remove test gettimeofday_null
tools/nolibc: ensure fast64 integer types have 64 bits
selftests/nolibc: test_fork: fix up duplicated print
tools/nolibc: ppoll/ppoll_time64: add a missing argument
selftests/nolibc: remove the duplicated gettimeofday_bad2
selftests/nolibc: print name instead of number for EOVERFLOW
tools/nolibc: support nanoseconds in stat()
selftests/nolibc: prevent coredumps during test execution
tools/nolibc: add support for prctl()
...
Jakub Kicinski [Tue, 27 Jun 2023 17:50:24 +0000 (10:50 -0700)]
Merge branch 'af_unix-followup-fixes-for-so_passpidfd'
Kuniyuki Iwashima says:
====================
af_unix: Followup fixes for SO_PASSPIDFD.
This series fixes 2 issues introduced by commit
5e2ff6704a27 ("scm: add
SO_PASSPIDFD and SCM_PIDFD").
The 1st patch fixes a warning in scm_pidfd_recv() reported by syzkaller.
The 2nd patch fixes a regression that bluetooth can't be built as module.
====================
Link: https://lore.kernel.org/r/20230627174314.67688-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Mikhalitsyn [Tue, 27 Jun 2023 17:43:14 +0000 (10:43 -0700)]
net: scm: introduce and use scm_recv_unix helper
Recently, our friends from bluetooth subsystem reported [1] that after
commit
5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD") scm_recv()
helper become unusable in kernel modules (because it uses unexported
pidfd_prepare() API).
We were aware of this issue and workarounded it in a hard way
by commit
97154bcf4d1b ("af_unix: Kconfig: make CONFIG_UNIX bool").
But recently a new functionality was added in the scope of commit
817efd3cad74 ("Bluetooth: hci_sock: Forward credentials to monitor")
and after that bluetooth can't be compiled as a kernel module.
After some discussion in [1] we decided to split scm_recv() into
two helpers, one won't support SCM_PIDFD (used for unix sockets),
and another one will be completely the same as it was before commit
5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD").
Link: https://lore.kernel.org/lkml/CAJqdLrpFcga4n7wxBhsFqPQiN8PKFVr6U10fKcJ9W7AcZn+o6Q@mail.gmail.com/
Fixes:
5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230627174314.67688-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Tue, 27 Jun 2023 17:43:13 +0000 (10:43 -0700)]
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
syzkaller hit a WARN_ON_ONCE(!scm->pid) in scm_pidfd_recv().
In unix_stream_read_generic(), if there is no skb in the queue, we could
bail out the do-while loop without calling scm_set_cred():
1. No skb in the queue
2. sk is non-blocking
or
shutdown(sk, RCV_SHUTDOWN) is called concurrently
or
peer calls close()
If the socket is configured with SO_PASSPIDFD, scm_pidfd_recv() would
populate cmsg with garbage emitting the warning.
Let's skip SCM_PIDFD if scm->pid is NULL in scm_pidfd_recv().
Note another way would be skip calling scm_recv() in such cases, but this
caused a regression resulting in commit
9d797ee2dce1 ("Revert "af_unix:
Call scm_recv() only after scm_set_cred()."").
WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_pidfd_recv include/net/scm.h:138 [inline]
WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177
Modules linked in:
CPU: 1 PID: 3245 Comm: syz-executor.1 Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:scm_pidfd_recv include/net/scm.h:138 [inline]
RIP: 0010:scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177
Code: 67 fd e9 55 fd ff ff e8 4a 70 67 fd e9 7f fd ff ff e8 40 70 67 fd e9 3e fb ff ff e8 36 70 67 fd e9 02 fd ff ff e8 8c 3a 20 fd <0f> 0b e9 fe fb ff ff e8 50 70 67 fd e9 2e f9 ff ff e8 46 70 67 fd
RSP: 0018:
ffffc90009af7660 EFLAGS:
00010216
RAX:
00000000000000a1 RBX:
ffff888041e58a80 RCX:
ffffc90003852000
RDX:
0000000000040000 RSI:
ffffffff842675b4 RDI:
0000000000000007
RBP:
ffffc90009af7810 R08:
0000000000000007 R09:
0000000000000013
R10:
00000000000000f8 R11:
0000000000000001 R12:
ffffc90009af7db0
R13:
0000000000000000 R14:
ffff888041e58a88 R15:
1ffff9200135eecc
FS:
00007f6b7113f640(0000) GS:
ffff88806cf00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f6b7111de38 CR3:
0000000012a6e002 CR4:
0000000000770ee0
PKRU:
55555554
Call Trace:
<TASK>
unix_stream_read_generic+0x5fe/0x1f50 net/unix/af_unix.c:2830
unix_stream_recvmsg+0x194/0x1c0 net/unix/af_unix.c:2880
sock_recvmsg_nosec net/socket.c:1019 [inline]
sock_recvmsg+0x188/0x1d0 net/socket.c:1040
____sys_recvmsg+0x210/0x610 net/socket.c:2712
___sys_recvmsg+0xff/0x190 net/socket.c:2754
do_recvmmsg+0x25d/0x6c0 net/socket.c:2848
__sys_recvmmsg net/socket.c:2927 [inline]
__do_sys_recvmmsg net/socket.c:2950 [inline]
__se_sys_recvmmsg net/socket.c:2943 [inline]
__x64_sys_recvmmsg+0x224/0x290 net/socket.c:2943
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f6b71da2e5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:
00007f6b7113ecc8 EFLAGS:
00000246 ORIG_RAX:
000000000000012b
RAX:
ffffffffffffffda RBX:
00000000004bc050 RCX:
00007f6b71da2e5d
RDX:
0000000000000007 RSI:
0000000020006600 RDI:
000000000000000b
RBP:
00000000004bc050 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000120 R11:
0000000000000246 R12:
0000000000000000
R13:
000000000000006e R14:
00007f6b71e03530 R15:
0000000000000000
</TASK>
Fixes:
5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230627174314.67688-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 27 Jun 2023 17:37:01 +0000 (10:37 -0700)]
Merge tag 'rcu.2023.06.22a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
"Documentation updates
Miscellaneous fixes, perhaps most notably:
- Remove RCU_NONIDLE(). The new visibility of most of the idle loop
to RCU has obsoleted this API.
- Make the RCU_SOFTIRQ callback-invocation time limit also apply to
the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.
- Add a jiffies-based callback-invocation time limit to handle
long-running callbacks. (The local_clock() function is only invoked
once per 32 callbacks due to its high overhead.)
- Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which
fixes a bug that can occur on systems with non-contiguous CPU
numbering.
kvfree_rcu updates:
- Eliminate the single-argument variant of k[v]free_rcu() now that
all uses have been converted to k[v]free_rcu_mightsleep().
- Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too
soon. Yes, this is closing the barn door after the horse has
escaped, but Murphy says that there will be more horses.
Callback-offloading updates:
- Fix a number of bugs involving the shrinker and lazy callbacks.
Tasks RCU updates
Torture-test updates"
* tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
torture: Remove duplicated argument -enable-kvm for ppc64
doc/rcutorture: Add description of rcutorture.stall_cpu_block
rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
rcutorture: Correct name of use_softirq module parameter
locktorture: Add long_hold to adjust lock-hold delays
rcu/nocb: Make shrinker iterate only over NOCB CPUs
rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
rcu: Make rcu_cpu_starting() rely on interrupts being disabled
rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
rcu: Employ jiffies-based backstop to callback time limit
rcu: Check callback-invocation time limit for rcuc kthreads
rcu: Remove RCU_NONIDLE()
rcu: Add more RCU files to kernel-api.rst
rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output
rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
rcu/nocb: Fix shrinker race against callback enqueuer
rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
...
Jason Gunthorpe [Tue, 27 Jun 2023 17:06:29 +0000 (14:06 -0300)]
Merge tag 'v6.4' into rdma.git for-next
Linux 6.4
Resolve conflicts between rdma rc and next in rxe_cq matching linux-next:
drivers/infiniband/sw/rxe/rxe_cq.c:
https://lore.kernel.org/r/
20230622115246.
365d30ad@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Dan Carpenter [Tue, 27 Jun 2023 07:20:13 +0000 (10:20 +0300)]
RDMA/bnxt_re: Fix an IS_ERR() vs NULL check
The bnxt_re_mmap_entry_insert() function returns NULL, not error pointers.
Update the check for errors accordingly.
Fixes:
360da60d6c6e ("RDMA/bnxt_re: Enable low latency push")
Link: https://lore.kernel.org/r/8d92e85f-626b-4eca-8501-ca7024cfc0ee@moroto.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Moritz Fischer [Tue, 27 Jun 2023 03:54:32 +0000 (03:54 +0000)]
net: lan743x: Simplify comparison
Simplify comparison, no functional changes.
Cc: Bryan Whitehead <bryan.whitehead@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Moritz Fischer <moritzf@google.com>
Link: https://lore.kernel.org/r/20230627035432.1296760-1-moritzf@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 27 Jun 2023 16:45:22 +0000 (09:45 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.5 net-next PR.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Sun, 25 Jun 2023 21:02:25 +0000 (14:02 -0700)]
gup: add warning if some caller would seem to want stack expansion
It feels very unlikely that anybody would want to do a GUP in an
unmapped area under the stack pointer, but real users sometimes do some
really strange things. So add a (temporary) warning for the case where
a GUP fails and expanding the stack might have made it work.
It's trivial to do the expansion in the caller as part of getting the mm
lock in the first place - see __access_remote_vm() for ptrace, for
example - it's just that it's unnecessarily painful to do it deep in the
guts of the GUP lookup when we might have to drop and re-take the lock.
I doubt anybody actually does anything quite this strange, but let's be
proactive: adding these warnings is simple, and will make debugging it
much easier if they trigger.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 24 Jun 2023 20:45:51 +0000 (13:45 -0700)]
mm: always expand the stack with the mmap write lock held
This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.
For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks. Let's see if any
strange users really wanted that.
It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma. This makes it fairly
straightforward to convert the remaining architectures.
As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid. So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64
Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kuniyuki Iwashima [Mon, 26 Jun 2023 16:43:13 +0000 (09:43 -0700)]
netlink: Add __sock_i_ino() for __netlink_diag_dump().
syzbot reported a warning in __local_bh_enable_ip(). [0]
Commit
8d61f926d420 ("netlink: fix potential deadlock in
netlink_set_err()") converted read_lock(&nl_table_lock) to
read_lock_irqsave() in __netlink_diag_dump() to prevent a deadlock.
However, __netlink_diag_dump() calls sock_i_ino() that uses
read_lock_bh() and read_unlock_bh(). If CONFIG_TRACE_IRQFLAGS=y,
read_unlock_bh() finally enables IRQ even though it should stay
disabled until the following read_unlock_irqrestore().
Using read_lock() in sock_i_ino() would trigger a lockdep splat
in another place that was fixed in commit
f064af1e500a ("net: fix
a lockdep splat"), so let's add __sock_i_ino() that would be safe
to use under BH disabled.
[0]:
WARNING: CPU: 0 PID: 5012 at kernel/softirq.c:376 __local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376
Modules linked in:
CPU: 0 PID: 5012 Comm: syz-executor487 Not tainted 6.4.0-rc7-syzkaller-00202-g6f68fc395f49 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
RIP: 0010:__local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376
Code: 45 bf 01 00 00 00 e8 91 5b 0a 00 e8 3c 15 3d 00 fb 65 8b 05 ec e9 b5 7e 85 c0 74 58 5b 5d c3 65 8b 05 b2 b6 b4 7e 85 c0 75 a2 <0f> 0b eb 9e e8 89 15 3d 00 eb 9f 48 89 ef e8 6f 49 18 00 eb a8 0f
RSP: 0018:
ffffc90003a1f3d0 EFLAGS:
00010046
RAX:
0000000000000000 RBX:
0000000000000201 RCX:
1ffffffff1cf5996
RDX:
0000000000000000 RSI:
0000000000000201 RDI:
ffffffff8805c6f3
RBP:
ffffffff8805c6f3 R08:
0000000000000001 R09:
ffff8880152b03a3
R10:
ffffed1002a56074 R11:
0000000000000005 R12:
00000000000073e4
R13:
dffffc0000000000 R14:
0000000000000002 R15:
0000000000000000
FS:
0000555556726300(0000) GS:
ffff8880b9800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000000000045ad50 CR3:
000000007c646000 CR4:
00000000003506f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
sock_i_ino+0x83/0xa0 net/core/sock.c:2559
__netlink_diag_dump+0x45c/0x790 net/netlink/diag.c:171
netlink_diag_dump+0xd6/0x230 net/netlink/diag.c:207
netlink_dump+0x570/0xc50 net/netlink/af_netlink.c:2269
__netlink_dump_start+0x64b/0x910 net/netlink/af_netlink.c:2374
netlink_dump_start include/linux/netlink.h:329 [inline]
netlink_diag_handler_dump+0x1ae/0x250 net/netlink/diag.c:238
__sock_diag_cmd net/core/sock_diag.c:238 [inline]
sock_diag_rcv_msg+0x31e/0x440 net/core/sock_diag.c:269
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2547
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x71c/0x900 net/socket.c:2503
___sys_sendmsg+0x110/0x1b0 net/socket.c:2557
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2586
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5303aaabb9
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007ffc7506e548 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f5303aaabb9
RDX:
0000000000000000 RSI:
0000000020000180 RDI:
0000000000000003
RBP:
00007f5303a6ed60 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00007f5303a6edf0
R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
</TASK>
Fixes:
8d61f926d420 ("netlink: fix potential deadlock in netlink_set_err()")
Reported-by: syzbot+5da61cf6a9bc1902d422@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=5da61cf6a9bc1902d422
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230626164313.52528-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Mon, 26 Jun 2023 15:44:02 +0000 (18:44 +0300)]
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
When using the felix driver (the only one which supports UC filtering
and MC filtering) as a DSA master for a random other DSA switch, one can
see the following stack trace when the downstream switch ports join a
VLAN-aware bridge:
=============================
WARNING: suspicious RCU usage
-----------------------------
net/8021q/vlan_core.c:238 suspicious rcu_dereference_protected() usage!
stack backtrace:
Workqueue: dsa_ordered dsa_slave_switchdev_event_work
Call trace:
lockdep_rcu_suspicious+0x170/0x210
vlan_for_each+0x8c/0x188
dsa_slave_sync_uc+0x128/0x178
__hw_addr_sync_dev+0x138/0x158
dsa_slave_set_rx_mode+0x58/0x70
__dev_set_rx_mode+0x88/0xa8
dev_uc_add+0x74/0xa0
dsa_port_bridge_host_fdb_add+0xec/0x180
dsa_slave_switchdev_event_work+0x7c/0x1c8
process_one_work+0x290/0x568
What it's saying is that vlan_for_each() expects rtnl_lock() context and
it's not getting it, when it's called from the DSA master's ndo_set_rx_mode().
The caller of that - dsa_slave_set_rx_mode() - is the slave DSA
interface's dsa_port_bridge_host_fdb_add() which comes from the deferred
dsa_slave_switchdev_event_work().
We went to great lengths to avoid the rtnl_lock() context in that call
path in commit
0faf890fc519 ("net: dsa: drop rtnl_lock from
dsa_slave_switchdev_event_work"), and calling rtnl_lock() is simply not
an option due to the possibility of deadlocking when calling
dsa_flush_workqueue() from the call paths that do hold rtnl_lock() -
basically all of them.
So, when the DSA master calls vlan_for_each() from its ndo_set_rx_mode(),
the state of the 8021q driver on this device is really not protected
from concurrent access by anything.
Looking at net/8021q/, I don't think that vlan_info->vid_list was
particularly designed with RCU traversal in mind, so introducing an RCU
read-side form of vlan_for_each() - vlan_for_each_rcu() - won't be so
easy, and it also wouldn't be exactly what we need anyway.
In general I believe that the solution isn't in net/8021q/ anyway;
vlan_for_each() is not cut out for this task. DSA doesn't need rtnl_lock()
to be held per se - since it's not a netdev state change that we're
blocking, but rather, just concurrent additions/removals to a VLAN list.
We don't even need sleepable context - the callback of vlan_for_each()
just schedules deferred work.
The proposed escape is to remove the dependency on vlan_for_each() and
to open-code a non-sleepable, rtnl-free alternative to that, based on
copies of the VLAN list modified from .ndo_vlan_rx_add_vid() and
.ndo_vlan_rx_kill_vid().
Fixes:
64fdc5f341db ("net: dsa: sync unicast and multicast addresses for VLAN filters too")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230626154402.3154454-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 26 Jun 2023 20:58:37 +0000 (13:58 -0700)]
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
This reverts commit
3f5f118bb657f94641ea383c7c1b8c09a5d46ea2.
Konrad reported that desktop environment below cannot be reached after
commit
3f5f118bb657 ("af_unix: Call scm_recv() only after scm_set_cred().")
- postmarketOS (Alpine Linux w/ musl 1.2.4)
- busybox 1.36.1
- GNOME 44.1
- networkmanager 1.42.6
- openrc 0.47
Regarding to the warning of SO_PASSPIDFD, I'll post another patch to
suppress it by skipping SCM_PIDFD if scm->pid == NULL in scm_pidfd_recv().
Reported-by: Konrad Dybcio <konradybcio@kernel.org>
Link: https://lore.kernel.org/netdev/8c7f9abd-4f84-7296-2788-1e130d6304a0@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20230626205837.82086-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 26 Jun 2023 21:46:40 +0000 (14:46 -0700)]
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
Stephen reports warnings when rendering phylink kdocs as HTML:
include/linux/phylink.h:110: ERROR: Unexpected indentation.
include/linux/phylink.h:111: WARNING: Block quote ends without a blank line; unexpected unindent.
include/linux/phylink.h:614: WARNING: Inline literal start-string without end-string.
include/linux/phylink.h:644: WARNING: Inline literal start-string without end-string.
Make phylink_pcs_neg_mode() use a proper list format to fix the first
two warnings.
The last two warnings, AFAICT, come from the use of shorthand like
phylink_mode_*(). Perhaps those should be special-cased at the Sphinx
level.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/all/20230626162908.2f149f98@canb.auug.org.au/
Link: https://lore.kernel.org/r/20230626214640.3142252-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Tue, 27 Jun 2023 13:49:48 +0000 (14:49 +0100)]
libceph: Partially revert changes to support MSG_SPLICE_PAGES
Fix the mishandling of MSG_DONTWAIT and also reinstates the per-page
checking of the source pages (which might have come from a DIO write by
userspace) by partially reverting the changes to support MSG_SPLICE_PAGES
and doing things a little differently. In messenger_v1:
(1) The ceph_tcp_sendpage() is resurrected and the callers reverted to use
that.
(2) The callers now pass MSG_MORE unconditionally. Previously, they were
passing in MSG_MORE|MSG_SENDPAGE_NOTLAST and then degrading that to
just MSG_MORE on the last call to ->sendpage().
(3) Make ceph_tcp_sendpage() a wrapper around sendmsg() rather than
sendpage(), setting MSG_SPLICE_PAGES if sendpage_ok() returns true on
the page.
In messenger_v2:
(4) Bring back do_try_sendpage() and make the callers use that.
(5) Make do_try_sendpage() use sendmsg() for both cases and set
MSG_SPLICE_PAGES if sendpage_ok() is set.
Fixes:
40a8c17aa770 ("ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage")
Fixes:
fa094ccae1e7 ("ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()")
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Link: https://lore.kernel.org/r/CAOi1vP9vjLfk3W+AJFeexC93jqPaPUn2dD_4NrzxwoZTbYfOnw@mail.gmail.com/
Link: https://lore.kernel.org/r/CAOi1vP_Bn918j24S94MuGyn+Gxk212btw7yWeDrRcW1U8pc_BA@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/3101881.1687801973@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/3111635.1687813501@warthog.procyon.org.uk/
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Link: https://lore.kernel.org/r/3199652.1687873788@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 27 Jun 2023 13:42:35 +0000 (16:42 +0300)]
net: phy: mscc: fix packet loss due to RGMII delays
Two deadly typos break RX and TX traffic on the VSC8502 PHY using RGMII
if phy-mode = "rgmii-id" or "rgmii-txid", and no "tx-internal-delay-ps"
override exists. The negative error code from phy_get_internal_delay()
does not get overridden with the delay deduced from the phy-mode, and
later gets committed to hardware. Also, the rx_delay gets overridden by
what should have been the tx_delay.
Fixes:
dbb050d2bfc8 ("phy: mscc: Add support for RGMII delay configuration")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230627134235.3453358-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 27 Jun 2023 16:28:25 +0000 (09:28 -0700)]
Merge branch 'use-vmalloc_array-and-vcalloc'
Julia Lawall says:
====================
use vmalloc_array and vcalloc
The functions vmalloc_array and vcalloc were introduced in
commit
a8749a35c399 ("mm: vmalloc: introduce array allocation functions")
but are not used much yet. This series introduces uses of
these functions, to protect against multiplication overflows.
The changes were done using the following Coccinelle semantic
patch.
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
v2: This series uses vmalloc_array and vcalloc instead of
array_size. It also leaves a multiplication of a constant by a
sizeof as is. Two patches are thus dropped from the series.
====================
Link: https://lore.kernel.org/r/20230627144339.144478-1-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julia Lawall [Tue, 27 Jun 2023 14:43:37 +0000 (16:43 +0200)]
net: mana: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-23-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julia Lawall [Tue, 27 Jun 2023 14:43:33 +0000 (16:43 +0200)]
net: enetc: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-19-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julia Lawall [Tue, 27 Jun 2023 14:43:26 +0000 (16:43 +0200)]
ionic: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-12-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julia Lawall [Tue, 27 Jun 2023 14:43:24 +0000 (16:43 +0200)]
pds_core: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-10-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julia Lawall [Tue, 27 Jun 2023 14:43:19 +0000 (16:43 +0200)]
gve: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-5-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julia Lawall [Tue, 27 Jun 2023 14:43:17 +0000 (16:43 +0200)]
octeon_ep: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-3-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafael J. Wysocki [Tue, 27 Jun 2023 14:35:52 +0000 (16:35 +0200)]
ACPI: EC: Fix acpi_ec_dispatch_gpe()
Commit
896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only")
broke suspend-to-idle at least on Dell XPS13 9360 and 9380.
The problem is that acpi_ec_dispatch_gpe() must clear the EC GPE,
because the EC GPE handler never runs when the system is in the
suspend-to-idle state and if the EC GPE is not cleared by the suspend-
to-idle loop, it is never cleared at all which leads to a GPE storm.
This causes suspend-to-idle to burn energy instead of saving it which
is potentially dangerous (the affected machines heat up rather badly
when that happens).
Addess this by making acpi_ec_dispatch_gpe() clear the EC GPE as it did
before.
Fixes:
896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only")
Tested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sergio Paracuellos [Fri, 23 Jun 2023 03:59:01 +0000 (05:59 +0200)]
dt-bindings: interrupt-controller: add Ralink SoCs interrupt controller
Add YAML doc for the interrupt controller which is present on Ralink SoCs.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20230623035901.1514341-1-sergio.paracuellos@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
Sebastian Reichel [Fri, 16 Jun 2023 17:00:21 +0000 (19:00 +0200)]
dt-bindings: PCI: dwc: rockchip: Update for RK3588
The PCIe 2.0 controllers on RK3588 need one additional clock,
one additional reset line and one for ranges entry.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230616170022.76107-4-sebastian.reichel@collabora.com
Signed-off-by: Rob Herring <robh@kernel.org>
Davide Tronchin [Mon, 26 Jun 2023 12:53:36 +0000 (14:53 +0200)]
net: usb: qmi_wwan: add u-blox 0x1312 composition
Add RmNet support for LARA-R6 01B.
The new LARA-R6 product variant identified by the "01B" string can be
configured (by AT interface) in three different USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial
interface and 1 CDC-ECM virtual network interface
The first 4 interfaces of all the 3 configurations (default, RmNet, ECM)
are the same.
In RmNet mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: RMNET interface
Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com>
Link: https://lore.kernel.org/r/20230626125336.3127-1-davide.tronchin.94@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Matthieu Baerts [Mon, 26 Jun 2023 09:02:39 +0000 (11:02 +0200)]
perf trace: fix MSG_SPLICE_PAGES build error
Our MPTCP CI and Stephen got this error:
In file included from builtin-trace.c:907:
trace/beauty/msg_flags.c: In function 'syscall_arg__scnprintf_msg_flags':
trace/beauty/msg_flags.c:28:21: error: 'MSG_SPLICE_PAGES' undeclared (first use in this function)
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
trace/beauty/msg_flags.c:28:21: note: each undeclared identifier is reported only once for each function it appears in
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
The fix is similar to what was done with MSG_FASTOPEN: the new macro is
defined if it is not defined in the system headers.
Fixes:
b848b26c6672 ("net: Kill MSG_SENDPAGE_NOTLAST")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/r/
20230626112847.
2ef3d422@canb.auug.org.au/
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230626090239.899672-1-matthieu.baerts@tessares.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 27 Jun 2023 10:47:43 +0000 (12:47 +0200)]
Merge tag 'nf-23-06-27' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Reset shift on Boyer-Moore string match for each block,
from Jeremy Sowden.
2) Fix acccess to non-linear area in DCCP conntrack helper,
from Florian Westphal.
3) Fix kernel-doc warnings, by Randy Dunlap.
4) Bail out if expires= does not show in SIP helper message,
or make ct_sip_parse_numerical_param() tristate and report
error if expires= cannot be parsed.
5) Unbind non-anonymous set in case rule construction fails.
6) Fix underflow in chain reference counter in case set element
already exists or it cannot be created.
netfilter pull request 23-06-27
====================
Link: https://lore.kernel.org/r/20230627065304.66394-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cambda Zhu [Mon, 26 Jun 2023 09:33:47 +0000 (17:33 +0800)]
ipvlan: Fix return value of ipvlan_queue_xmit()
ipvlan_queue_xmit() should return NET_XMIT_XXX, but
ipvlan_xmit_mode_l2/l3() returns rx_handler_result_t or NET_RX_XXX
in some cases. ipvlan_rcv_frame() will only return RX_HANDLER_CONSUMED
in ipvlan_xmit_mode_l2/l3() because 'local' is true. It's equal to
NET_XMIT_SUCCESS. But dev_forward_skb() can return NET_RX_SUCCESS or
NET_RX_DROP, and returning NET_RX_DROP(NET_XMIT_DROP) will increase
both ipvlan and ipvlan->phy_dev drops counter.
The skb to forward can be treated as xmitted successfully. This patch
makes ipvlan_queue_xmit() return NET_XMIT_SUCCESS for forward skb.
Fixes:
2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230626093347.7492-1-cambda@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Huacai Chen [Tue, 27 Jun 2023 06:56:51 +0000 (14:56 +0800)]
Merge 'irq/loongarch-fixes-6.5' into loongarch-next
LoongArch architecture changes for 6.5 depend on the irq changes
to work on both ACPI and FDT systems, so merge them to create a base.
Masahiro Yamada [Thu, 19 Jan 2023 08:22:50 +0000 (17:22 +0900)]
powerpc: remove checks for binutils older than 2.25
Commit
e4412739472b ("Documentation: raise minimum supported version of
binutils to 2.25") allows us to remove the checks for old binutils.
There is no more user for ld-ifversion. Remove it as well.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230119082250.151485-1-masahiroy@kernel.org
Naveen N Rao [Tue, 30 May 2023 06:14:36 +0000 (11:44 +0530)]
powerpc: Fail build if using recordmcount with binutils v2.37
binutils v2.37 drops unused section symbols, which prevents recordmcount
from capturing mcount locations in sections that have no non-weak
symbols. This results in a build failure with a message such as:
Cannot find symbol for section 12: .text.perf_callchain_kernel.
kernel/events/callchain.o: failed
The change to binutils was reverted for v2.38, so this behavior is
specific to binutils v2.37:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=
c09c8b42021180eee9495bd50d8b35e683d3901b
Objtool is able to cope with such sections, so this issue is specific to
recordmcount.
Fail the build and print a warning if binutils v2.37 is detected and if
we are using recordmcount.
Cc: stable@vger.kernel.org
Suggested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230530061436.56925-1-naveen@kernel.org
Linus Torvalds [Tue, 27 Jun 2023 03:12:07 +0000 (20:12 -0700)]
Merge tag 'tag-chrome-platform-for-v6.5' of git://git./linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"Improvements:
- Support Pin Assignment D in getting mux state
- Emit an uevent when EC panics so that userland programs get chance
to capture EC coredumps (LPC interface only)
- Send EC_CMD_HOST_SLEEP_EVENT to EC at the very beginning/end of
system suspend/resume so that EC can watch the duration more
accurately (LPC interface only)
Misc:
- Switch back from I2C .probe_new() to .probe()
- Use %*ph for printing hexdump of small buffers"
* tag 'tag-chrome-platform-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_spi: Use %*ph for printing hexdump of a small buffer
platform/chrome: Switch i2c drivers back to use .probe()
platform/chrome: cros_ec_lpc: Move host command to prepare/complete
platform/chrome: cros_ec: Report EC panic as uevent
platform/chrome: cros_typec_switch: Add Pin D support
Linus Torvalds [Tue, 27 Jun 2023 02:41:26 +0000 (19:41 -0700)]
Merge tag 'thermal-6.5-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These extend the int340x thermal driver, add thermal DT bindings for
some Qcom platforms, add DT bindings and support for Armada AP807 and
MSM8909, allow selecting the bang-bang thermal governor as the default
one, address issues in several thermal drivers for ARM platforms and
clean up code.
Specifics:
- Add new IOCTLs to the int340x thermal driver to allow user space to
retrieve the Passive v2 thermal table (Srinivas Pandruvada)
- Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms
(Konrad Dybcio)
- Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki)
- Add DT bindings for QCom ipq9574 (Praveenkumar I)
- Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren)
- Allow selecting the bang-bang governor as default (Thierry Reding)
- Refactor and prepare the code to set the scene for RCar Gen4
(Wolfram Sang)
- Clean up and fix the QCom tsens drivers. Add DT bindings and
calibration for the MSM8909 platform (Stephan Gerhold)
- Revert a patch introducing a wrong usage of devm_of_iomap() on the
Mediatek platform (Ricardo Cañuelo)
- Fix the clock vs reset ordering in order to conform to the
documentation on the sun8i (Christophe JAILLET)
- Prevent setting up undocumented registers, enable the only
described sensors and add the version 2.1 on the Qoriq sensor (Peng
Fan)
- Add DT bindings and support for the Armada AP807 (Alex Leibovich)
- Update the mlx5 driver with the recent thermal changes (Daniel
Lezcano)
- Convert to platform remove callback returning void on STM32 (Uwe
Kleine-König)
- Add an error information printing for devm_thermal_add_hwmon_sysfs()
and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra,
Qoriq, Mediateka and QCom (Yangtao Li)
- Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai)
- Use the dev_err_probe() function in the QCom tsens alarm driver
(Luca Weiss)"
* tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits)
thermal/drivers/qcom/temp-alarm: Use dev_err_probe
thermal/drivers/generic-adc: Register thermal zones as hwmon sensors
thermal/drivers/mediatek/lvts_thermal: Remove redundant msg in lvts_ctrl_start()
thermal/drivers/qcom: Remove redundant msg at probe time
thermal/drivers/ti-soc: Remove redundant msg in ti_thermal_expose_sensor()
thermal/drivers/qoriq: Remove redundant msg in qoriq_tmu_register_tmu_zone()
thermal/drivers/tegra: Remove redundant msg in tegra_tsensor_register_channel()
drivers/thermal/k3: Remove redundant msg in k3_bandgap_probe()
thermal/drivers/imx: Remove redundant msg in imx8mm_tmu_probe() and imx_sc_thermal_probe()
thermal/drivers/amlogic: Remove redundant msg in amlogic_thermal_probe()
thermal/drivers/sun8i: Remove redundant msg in sun8i_ths_register()
thermal/hwmon: Add error information printing for devm_thermal_add_hwmon_sysfs()
thermal/drivers/stm32: Convert to platform remove callback returning void
net/mlx5: Update the driver with the recent thermal changes
thermal/drivers/armada: Add support for AP807 thermal data
dt-bindings: armada-thermal: Add armada-ap807-thermal compatible
thermal/drivers/qoriq: Support version 2.1
thermal/drivers/qoriq: Only enable supported sensors
thermal/drivers/qoriq: No need to program site adjustment register
thermal/drivers/mediatek/lvts_thermal: Register thermal zones as hwmon sensors
...
Linus Torvalds [Tue, 27 Jun 2023 02:36:30 +0000 (19:36 -0700)]
Merge tag 'pm-6.5-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add Intel TPMI (Topology Aware Register and PM Capsule
Interface) support to the power capping subsystem, extend the
intel_idle driver to work in VM guests where MWAIT is not available,
extend the system-wide power management diagnostics, fix bugs and
clean up code.
Specifics:
- Introduce power capping core support for Intel TPMI (Topology Aware
Register and PM Capsule Interface) and a TPMI interface driver for
Intel RAPL (Zhang Rui, Dan Carpenter)
- Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping
driver (Zhang Rui)
- Fix invalid initialization for pl4_supported field in the Intel
RAPL power capping driver (Sumeet Pawnikar)
- Clean up the intel_idle driver, make it work with VM guests that
cannot use the MWAIT instruction and address the case in which the
host may enter a deep idle state when the guest is idle (Arjan van
de Ven)
- Prevent cpufreq drivers that provide the ->adjust_perf() callback
without a ->fast_switch() one which is used as a fallback from the
former in some cases (Wyes Karny)
- Fix some issues related to the AMD P-state cpufreq driver (Mario
Limonciello, Wyes Karny)
- Fix the energy_performance_preference attribute handling in the
intel_pstate driver in passive mode (Tero Kristo)
- Fix the handling of pm_suspend_target_state when CONFIG_PM is unset
(Kai-Heng Feng)
- Correct spelling mistake in a comment in the hibernation code (Wang
Honghui)
- Add arch_resume_nosmt() prototype to avoid a "missing prototypes"
build warning (Arnd Bergmann)
- Restrict pm_pr_dbg() to system-wide power transitions and use it in
a few additional places (Mario Limonciello)
- Drop verification of in-params from genpd_add_device() and ensure
that all of its callers will do it (Ulf Hansson)
- Prevent possible integer overflows from occurring in
genpd_parse_state() (Nikita Zhandarovich)
- Reorder fieldls in 'struct devfreq_dev_status' to reduce its size
somewhat (Christophe JAILLET)
- Ensure that the Exynos PPMU driver is already loaded before the
Exynos Bus driver starts probing so as to avoid a possible freeze
loading of the kernel modules (Marek Szyprowski)
- Fix variable deferencing before NULL check in the mtk-cci devfreq
driver (Sukrut Bellary)"
* tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (42 commits)
intel_idle: Add a "Long HLT" C1 state for the VM guest mode
cpufreq: intel_pstate: Fix energy_performance_preference for passive
cpufreq: amd-pstate: Add a kernel config option to set default mode
cpufreq: amd-pstate: Set a fallback policy based on preferred_profile
ACPI: CPPC: Add definition for undefined FADT preferred PM profile value
cpufreq: amd-pstate: Set default governor to schedutil
PM: domains: Move the verification of in-params from genpd_add_device()
cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated
cpufreq: amd-pstate: Write CPPC enable bit per-socket
intel_idle: Add support for using intel_idle in a VM guest using just hlt
cpufreq: Fail driver register if it has adjust_perf without fast_switch
intel_idle: clean up the (new) state_update_enter_method function
intel_idle: refactor state->enter manipulation into its own function
platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages
pinctrl: amd: Use pm_pr_dbg to show debugging messages
ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking
include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume
powercap: RAPL: Fix a NULL vs IS_ERR() bug
powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
powercap: RAPL: fix invalid initialization for pl4_supported field
...
Linus Torvalds [Tue, 27 Jun 2023 02:30:19 +0000 (19:30 -0700)]
Merge tag 'acpi-6.5-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These rework the handling of notifications in ACPI button drivers (to
enable future simplifications and cleanups), clean up the ACPI thermal
driver, update the ACPI backlight driver, add quirks working around
AML bugs on some systems, fix some assorted issues and clean up code.
Specifics:
- Reduce ACPI device enumeration overhead related to devices with
dependencies (Rafael Wysocki)
- Fix the handling of Microsoft LPS0 _DSM for suspend-to-idle (Mario
Limonciello)
- Fix section mismatch warning in the ACPI suspend-to-idle code (Arnd
Bergmann)
- Drop several ACPI resource management quirks related to IRQ
ovverides on AMD "Zen" systems (Mario Limonciello)
- Modify the ACPI EC driver to make it only clear the EC GPE status
when handling the GPE (Jeremy Compostella)
- Add quirks to work around ACPI tables defects on Lenovo Yoga Book
yb1-x90f/l and Nextbook Ares 8A (Hans de Goede)
- Add ACPi backlight quirks for Dell Studio 1569, Lenovo ThinkPad
X131e (3371 AMD version) and Apple iMac11,3 and stop trying to use
vendor backlight control on relatively recent systems (Hans de
Goede)
- Add pwm_lookup_table entry for second PWM on CHT/BSW devices in the
ACPI LPSS (Intel SoC) driver (Hans de Goede)
- Add nfit_intel_shutdown_status() declaration to a local header to
avoid a "missing prototypes" build warning (Arnd Bergmann)
- Clean up the ACPI thermal driver and drop some dead or otherwise
unneded code from it (Rafael Wysocki)
- Rework the handling of notifications in the ACPI button drivers so
as to allow the common notification handling code for devices to be
simplified (Rafael Wysocki)
- Make ghes_get_devices() return NULL to indicate that there are no
GHES devices so as to allow vendor-specific EDAC drivers to probe
then (Li Yang)
- Mark bert_disable() as __initdata and drop an unused function from
the APEI GHES code (Miaohe Lin)
- Make the ACPI PAD (Processor Aggregator Device) driver realize that
Zhaoxin CPUs support nonstop TSC (Tony W Wang-oc)
- Drop the certainly unnecessary and likely incorrect inclusion of
linux/arm-smccc.h from acpi_ffh.c (Sudeep Holla)"
* tag 'acpi-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (30 commits)
ACPI: video: Add backlight=native DMI quirk for Dell Studio 1569
ACPI: thermal: Drop struct acpi_thermal_flags
ACPI: thermal: Drop struct acpi_thermal_state
ACPI: bus: Simplify installation and removal of notify callback
ACPI: tiny-power-button: Eliminate the driver notify callback
ACPI: button: Use different notify handlers for lid and buttons
ACPI: button: Eliminate the driver notify callback
ACPI: thermal: Eliminate struct acpi_thermal_state_flags
ACPI: thermal: Move acpi_thermal_driver definition
ACPI: thermal: Move symbol definitions to one place
ACPI: thermal: Drop redundant ACPI_TRIPS_REFRESH_DEVICES symbol
ACPI: thermal: Use BIT() macro for defining flags
APEI: GHES: correctly return NULL for ghes_get_devices()
ACPI: FFH: Drop the inclusion of linux/arm-smccc.h
ACPI: PAD: mark Zhaoxin CPUs NONSTOP TSC correctly
ACPI: APEI: mark bert_disable as __initdata
ACPI: EC: Clear GPE on interrupt handling only
ACPI: video: Stop trying to use vendor backlight control on laptops from after ~2012
ACPI: x86: s2idle: Adjust Microsoft LPS0 _DSM handling sequence
ACPI: resource: Remove "Zen" specific match and quirks
...
Linus Torvalds [Tue, 27 Jun 2023 00:11:53 +0000 (17:11 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Notable features are user-space support for the memcpy/memset
instructions and the permission indirection extension.
- Support for the Armv8.9 Permission Indirection Extensions. While
this feature doesn't add new functionality, it enables future
support for Guarded Control Stacks (GCS) and Permission Overlays
- User-space support for the Armv8.8 memcpy/memset instructions
- arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs
identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and
cleanups
- Removal of superfluous ISBs on context switch (following
retrospective architecture tightening)
- Decode the ISS2 register during faults for additional information
to help with debugging
- KPTI clean-up/simplification of the trampoline exit code
- Addressing several -Wmissing-prototype warnings
- Kselftest improvements for signal handling and ptrace
- Fix TPIDR2_EL0 restoring on sigreturn
- Clean-up, robustness improvements of the module allocation code
- More sysreg conversions to the automatic register/bitfields
generation
- CPU capabilities handling cleanup
- Arm documentation updates: ACPI, ptdump"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits)
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
Documentation/arm64: Add ptdump documentation
arm64: hibernate: remove WARN_ON in save_processor_state
kselftest/arm64: Log signal code and address for unexpected signals
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
arm64/fpsimd: Exit streaming mode when flushing tasks
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
...
Linus Torvalds [Tue, 27 Jun 2023 00:07:53 +0000 (17:07 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
- lots of build cleanups from Arnd spread throughout the arch/arm tree
- replace strlcpy() with the preferred strscpy()
- use sign_extend32() in the module linker
- drop handle_irq() machine descriptor method
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9315/1: fiq: include asm/mach/irq.h for prototypes
ARM: 9314/1: tcm: move tcm_init() prototype to asm/tcm.h
ARM: 9313/1: vdso: add missing prototypes
ARM: 9312/1: vfp: include asm/neon.h in vfpmodule.c
ARM: 9311/1: decompressor: move function prototypes to misc.h
ARM: 9310/1: xip-kernel: add __inflate_kernel_data prototype
ARM: 9309/1: add missing syscall prototypes
ARM: 9308/1: move setup functions to header
ARM: 9307/1: nommu: include asm/idmap.h
ARM: 9306/1: cacheflush: avoid __flush_anon_page() missing-prototype warning
ARM: 9305/1: add clear/copy_user_highpage declarations
ARM: 9304/1: add prototype for function called only from asm
ARM: 9303/1: kprobes: avoid missing-declaration warnings
ARM: 9302/1: traps: hide unused functions on NOMMU
ARM: 9301/1: dma-mapping: hide unused dma_contiguous_early_fixup function
ARM: 9300/1: Replace all non-returning strlcpy with strscpy
ARM: 9299/1: module: use sign_extend32() to extend the signedness
ARM: 9298/1: Drop custom mdesc->handle_irq()
Linus Torvalds [Tue, 27 Jun 2023 00:04:03 +0000 (17:04 -0700)]
Merge tag 'm68k-for-v6.5-tag1' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- miscellaneous NuBus fixes and improvements
- defconfig updates
* tag 'm68k-for-v6.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: defconfig: Update defconfigs for v6.4-rc1
nubus: Don't list slot resources by default
nubus: Remove proc entries before adding them
nubus: Partially revert proc_create_single_data() conversion
Linus Torvalds [Mon, 26 Jun 2023 23:43:54 +0000 (16:43 -0700)]
Merge tag 'x86_cleanups_for_6.5' of git://git./linux/kernel/git/tip/tip
Pull x86 cleanups from Dave Hansen:
"As usual, these are all over the map. The biggest cluster is work from
Arnd to eliminate -Wmissing-prototype warnings:
- Address -Wmissing-prototype warnings
- Remove repeated 'the' in comments
- Remove unused current_untag_mask()
- Document urgent tip branch timing
- Clean up MSR kernel-doc notation
- Clean up paravirt_ops doc
- Update Srivatsa S. Bhat's maintained areas
- Remove unused extern declaration acpi_copy_wakeup_routine()"
* tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine()
Documentation: virt: Clean up paravirt_ops doc
x86/mm: Remove unused current_untag_mask()
x86/mm: Remove repeated word in comments
x86/lib/msr: Clean up kernel-doc notation
x86/platform: Avoid missing-prototype warnings for OLPC
x86/mm: Add early_memremap_pgprot_adjust() prototype
x86/usercopy: Include arch_wb_cache_pmem() declaration
x86/vdso: Include vdso/processor.h
x86/mce: Add copy_mc_fragile_handle_tail() prototype
x86/fbdev: Include asm/fb.h as needed
x86/hibernate: Declare global functions in suspend.h
x86/entry: Add do_SYSENTER_32() prototype
x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled()
x86/mm: Include asm/numa.h for set_highmem_pages_init()
x86: Avoid missing-prototype warnings for doublefault code
x86/fpu: Include asm/fpu/regset.h
x86: Add dummy prototype for mk_early_pgtbl_32()
x86/pci: Mark local functions as 'static'
x86/ftrace: Move prepare_ftrace_return prototype to header
...
Theodore Ts'o [Fri, 23 Jun 2023 14:18:51 +0000 (10:18 -0400)]
ext4: avoid updating the superblock on a r/o mount if not needed
This was noticed by a user who noticied that the mtime of a file
backing a loopback device was getting bumped when the loopback device
is mounted read/only. Note: This doesn't show up when doing a
loopback mount of a file directly, via "mount -o ro /tmp/foo.img
/mnt", since the loop device is set read-only when mount automatically
creates loop device. However, this is noticeable for a LUKS loop
device like this:
% cryptsetup luksOpen /tmp/foo.img test
% mount -o ro /dev/loop0 /mnt ; umount /mnt
or, if LUKS is not in use, if the user manually creates the loop
device like this:
% losetup /dev/loop0 /tmp/foo.img
% mount -o ro /dev/loop0 /mnt ; umount /mnt
The modified mtime causes rsync to do a rolling checksum scan of the
file on the local and remote side, incrementally increasing the time
to rsync the not-modified-but-touched image file.
Fixes:
eee00237fa5e ("ext4: commit super block if fs record error when journal record without error")
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/ZIauBR7YiV3rVAHL@glitch
Reported-by: Sean Greenslade <sean@seangreenslade.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Fri, 16 Jun 2023 01:55:47 +0000 (09:55 +0800)]
jbd2: skip reading super block if it has been verified
We got a NULL pointer dereference issue below while running generic/475
I/O failure pressure test.
BUG: kernel NULL pointer dereference, address:
0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
RIP: 0010:jbd2_journal_set_features+0x13d/0x430
...
Call Trace:
<TASK>
? __die+0x23/0x60
? page_fault_oops+0xa4/0x170
? exc_page_fault+0x67/0x170
? asm_exc_page_fault+0x26/0x30
? jbd2_journal_set_features+0x13d/0x430
jbd2_journal_revoke+0x47/0x1e0
__ext4_forget+0xc3/0x1b0
ext4_free_blocks+0x214/0x2f0
ext4_free_branches+0xeb/0x270
ext4_ind_truncate+0x2bf/0x320
ext4_truncate+0x1e4/0x490
ext4_handle_inode_extension+0x1bd/0x2a0
? iomap_dio_complete+0xaf/0x1d0
The root cause is the journal super block had been failed to write out
due to I/O fault injection, it's uptodate bit was cleared by
end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
And it raced by journal_get_superblock()->bh_read(), unfortunately, the
read IO is also failed, so the error handling in
journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
finally lead to above NULL pointer dereference issue.
If the journal super block had been read and verified, there is no need
to call bh_read() read it again even if it has been failed to written
out. So the fix could be simply move buffer_verified(bh) in front of
bh_read(). Also remove a stale comment left in
jbd2_journal_check_used_features().
Fixes:
51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230616015547.3155195-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Chao Yu [Tue, 6 Jun 2023 07:32:03 +0000 (15:32 +0800)]
ext4: fix to check return value of freeze_bdev() in ext4_shutdown()
freeze_bdev() can fail due to a lot of reasons, it needs to check its
reason before later process.
Fixes:
783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Mon, 27 Mar 2023 14:16:30 +0000 (22:16 +0800)]
ext4: refactoring to use the unified helper ext4_quotas_off()
Rename ext4_quota_off_umount() to ext4_quotas_off(), and add type
parameter to replace open code in ext4_enable_quotas().
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230327141630.156875-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Mon, 27 Mar 2023 14:16:29 +0000 (22:16 +0800)]
ext4: turn quotas off if mount failed after enabling quotas
Yi found during a review of the patch "ext4: don't BUG on inconsistent
journal feature" that when ext4_mark_recovery_complete() returns an error
value, the error handling path does not turn off the enabled quotas,
which triggers the following kmemleak:
================================================================
unreferenced object 0xffff8cf68678e7c0 (size 64):
comm "mount", pid 746, jiffies
4294871231 (age 11.540s)
hex dump (first 32 bytes):
00 90 ef 82 f6 8c ff ff 00 00 00 00 41 01 00 00 ............A...
c7 00 00 00 bd 00 00 00 0a 00 00 00 48 00 00 00 ............H...
backtrace:
[<
00000000c561ef24>] __kmem_cache_alloc_node+0x4d4/0x880
[<
00000000d4e621d7>] kmalloc_trace+0x39/0x140
[<
00000000837eee74>] v2_read_file_info+0x18a/0x3a0
[<
0000000088f6c877>] dquot_load_quota_sb+0x2ed/0x770
[<
00000000340a4782>] dquot_load_quota_inode+0xc6/0x1c0
[<
0000000089a18bd5>] ext4_enable_quotas+0x17e/0x3a0 [ext4]
[<
000000003a0268fa>] __ext4_fill_super+0x3448/0x3910 [ext4]
[<
00000000b0f2a8a8>] ext4_fill_super+0x13d/0x340 [ext4]
[<
000000004a9489c4>] get_tree_bdev+0x1dc/0x370
[<
000000006e723bf1>] ext4_get_tree+0x1d/0x30 [ext4]
[<
00000000c7cb663d>] vfs_get_tree+0x31/0x160
[<
00000000320e1bed>] do_new_mount+0x1d5/0x480
[<
00000000c074654c>] path_mount+0x22e/0xbe0
[<
0000000003e97a8e>] do_mount+0x95/0xc0
[<
000000002f3d3736>] __x64_sys_mount+0xc4/0x160
[<
0000000027d2140c>] do_syscall_64+0x3f/0x90
================================================================
To solve this problem, we add a "failed_mount10" tag, and call
ext4_quota_off_umount() in this tag to release the enabled qoutas.
Fixes:
11215630aada ("ext4: don't BUG on inconsistent journal feature")
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230327141630.156875-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 22 Mar 2023 01:33:53 +0000 (09:33 +0800)]
ext4: update doc about journal superblock description
Update the description in doc about the s_head parameter in the journal
superblock.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20230322013353.1843306-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 22 Mar 2023 01:33:52 +0000 (09:33 +0800)]
ext4: add journal cycled recording support
Always enable 'JBD2_CYCLE_RECORD' journal option on ext4, letting the
jbd2 continue to record new journal transactions from the recovered
journal head or the checkpointed transactions in the previous mount.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230322013353.1843306-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 22 Mar 2023 01:33:51 +0000 (09:33 +0800)]
jbd2: continue to record log between each mount
For a newly mounted file system, the journal committing thread always
record new transactions from the start of the journal area, no matter
whether the journal was clean or just has been recovered. So the logdump
code in debugfs cannot dump continuous logs between each mount, it is
disadvantageous to analysis corrupted file system image and locate the
file system inconsistency bugs.
If we get a corrupted file system in the running products and want to
find out what has happened, besides lookup the system log, one effective
way is to backtrack the journal log. But we may not always run e2fsck
before each mount and the default fsck -a mode also cannot always
checkout all inconsistencies, so it could left over some inconsistencies
into the next mount until we detect it. Finally, transactions in the
journal may probably discontinuous and some relatively new transactions
has been covered, it becomes hard to analyse. If we could record
transactions continuously between each mount, we could acquire more
useful info from the journal. Like this:
|Previous mount checkpointed/recovered logs|Current mount logs |
|{------}{---}{--------} ... {------}| ... |{======}{========}...000000|
And yes the journal area is limited and cannot record everything, the
problematic transaction may also be covered even if we do this, but
this is still useful for fuzzy tests and short-running products.
This patch save the head blocknr in the superblock after flushing the
journal or unmounting the file system, let the next mount could continue
to record new transaction behind it. This change is backward compatible
because the old kernel does not care about the head blocknr of the
journal. It is also fine if we mount a clean old image without valid
head blocknr, we fail back to set it to s_first just like before.
Finally, for the case of mount an unclean file system, we could also get
the journal head easily after scanning/replaying the journal, it will
continue to record new transaction after the recovered transactions.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230322013353.1843306-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 15 Mar 2023 01:31:28 +0000 (09:31 +0800)]
jbd2: remove j_format_version
journal->j_format_version is no longer used, remove it.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-7-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 15 Mar 2023 01:31:27 +0000 (09:31 +0800)]
jbd2: factor out journal initialization from journal_get_superblock()
Current journal_get_superblock() couple journal superblock checking and
partial journal initialization, factor out initialization part from it
to make things clear.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-6-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 15 Mar 2023 01:31:26 +0000 (09:31 +0800)]
jbd2: switch to check format version in superblock directly
We should only check and set extented features if journal format version
is 2, and now we check the in memory copy of the superblock
'journal->j_format_version', which relys on the parameter initialization
sequence, switch to use the h_blocktype in superblock cloud be more
clear.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-5-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 15 Mar 2023 01:31:25 +0000 (09:31 +0800)]
jbd2: remove unused feature macros
JBD2_HAS_[IN|RO_]COMPAT_FEATURE macros are no longer used, just remove
them.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-4-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhihao Cheng [Wed, 15 Mar 2023 01:31:24 +0000 (09:31 +0800)]
ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev'
As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' will always
become true if sbi->s_journal_bdev exists. Filesystem block device and
journal block device are both opened with 'FMODE_EXCL' mode, so these
two devices can't be same one. Then we can remove the redundant checking
'sbi->s_journal_bdev != sb->s_bdev' if 'sbi->s_journal_bdev' exists.
[1] https://lore.kernel.org/lkml/
f86584f6-3877-ff18-47a1-
2efaa12d18b2@huawei.com/
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-3-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhihao Cheng [Wed, 15 Mar 2023 01:31:23 +0000 (09:31 +0800)]
ext4: Fix reusing stale buffer heads from last failed mounting
Following process makes ext4 load stale buffer heads from last failed
mounting in a new mounting operation:
mount_bdev
ext4_fill_super
| ext4_load_and_init_journal
| ext4_load_journal
| jbd2_journal_load
| load_superblock
| journal_get_superblock
| set_buffer_verified(bh) // buffer head is verified
| jbd2_journal_recover // failed caused by EIO
| goto failed_mount3a // skip 'sb->s_root' initialization
deactivate_locked_super
kill_block_super
generic_shutdown_super
if (sb->s_root)
// false, skip ext4_put_super->invalidate_bdev->
// invalidate_mapping_pages->mapping_evict_folio->
// filemap_release_folio->try_to_free_buffers, which
// cannot drop buffer head.
blkdev_put
blkdev_put_whole
if (atomic_dec_and_test(&bdev->bd_openers))
// false, systemd-udev happens to open the device. Then
// blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
// truncate_inode_folio->truncate_cleanup_folio->
// folio_invalidate->block_invalidate_folio->
// filemap_release_folio->try_to_free_buffers will be skipped,
// dropping buffer head is missed again.
Second mount:
ext4_fill_super
ext4_load_and_init_journal
ext4_load_journal
ext4_get_journal
jbd2_journal_init_inode
journal_init_common
bh = getblk_unmovable
bh = __find_get_block // Found stale bh in last failed mounting
journal->j_sb_buffer = bh
jbd2_journal_load
load_superblock
journal_get_superblock
if (buffer_verified(bh))
// true, skip journal->j_format_version = 2, value is 0
jbd2_journal_recover
do_one_pass
next_log_block += count_tags(journal, bh)
// According to journal_tag_bytes(), 'tag_bytes' calculating is
// affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3()
// returns false because 'j->j_format_version >= 2' is not true,
// then we get wrong next_log_block. The do_one_pass may exit
// early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'.
The filesystem is corrupted here, journal is partially replayed, and
new journal sequence number actually is already used by last mounting.
The invalidate_bdev() can drop all buffer heads even racing with bare
reading block device(eg. systemd-udev), so we can fix it by invalidating
bdev in error handling path in __ext4_fill_super().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
Fixes:
25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming")
Cc: stable@vger.kernel.org # v3.5
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-2-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Brian Foster [Tue, 14 Mar 2023 13:07:59 +0000 (09:07 -0400)]
ext4: allow concurrent unaligned dio overwrites
We've had reports of significant performance regression of sub-block
(unaligned) direct writes due to the added exclusivity restrictions
in ext4. The purpose of the exclusivity requirement for unaligned
direct writes is to avoid data corruption caused by unserialized
partial block zeroing in the iomap dio layer across overlapping
writes.
XFS has similar requirements for the same underlying reasons, yet
doesn't suffer the extreme performance regression that ext4 does.
The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY
mode, which allows for optimistic submission of concurrent unaligned
I/O and kicks back writes that require partial block zeroing such
that they can be submitted in a safe, exclusive context. Since ext4
already performs most of these checks pre-submission, it can support
something similar without necessarily relying on the iomap flag and
associated retry mechanism.
Update the dio write submission path to allow concurrent submission
of unaligned direct writes that are purely overwrite and so will not
require block zeroing. To improve readability of the various related
checks, move the unaligned I/O handling down into
ext4_dio_write_checks(), where the dio draining and force wait logic
can immediately follow the locking requirement checks. Finally, the
IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a
precaution should the ext4 overwrite logic ever become inconsistent
with the zeroing expectations of iomap dio.
The performance improvement of sub-block direct write I/O is shown
in the following fio test on a 64xcpu guest vm:
Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting
--overwrite=1 --thread --size=10G --filename=/mnt/fio
--readwrite=write --ramp_time=10s --runtime=60s --numjobs=8
--blocksize=2k --iodepth=256 --allow_file_create=0
v6.2: write: IOPS=4328, BW=8724KiB/s
v6.2 (patched): write: IOPS=801k, BW=1565MiB/s
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230314130759.642710-1-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Thu, 8 Jun 2023 14:39:35 +0000 (10:39 -0400)]
ext4: clean up mballoc criteria comments
Line wrap and slightly clarify the comments describing mballoc's
cirtiera.
Define EXT4_MB_NUM_CRS as part of the enum, so that it will
automatically get updated when criteria is added or removed.
Also fix a potential unitialized use of 'cr' variable if
CONFIG_EXT4_DEBUG is enabled.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Mon, 24 Apr 2023 03:38:46 +0000 (11:38 +0800)]
ext4: make ext4_zeroout_es() return void
After ext4_es_insert_extent() returns void, the return value in
ext4_zeroout_es() is also unnecessary, so make it return void too.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-13-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>