Catalin Marinas [Wed, 25 Mar 2020 11:10:51 +0000 (11:10 +0000)]
Merge branch 'for-next/asm-cleanups' into for-next/core
* for-next/asm-cleanups:
: Various asm clean-ups (alignment, mov_q vs ldr, .idmap)
arm64: move kimage_vaddr to .rodata
arm64: use mov_q instead of literal ldr
Catalin Marinas [Wed, 25 Mar 2020 11:10:46 +0000 (11:10 +0000)]
Merge branch 'for-next/asm-annotations' into for-next/core
* for-next/asm-annotations:
: Modernise arm64 assembly annotations
arm64: head: Convert install_el2_stub to SYM_INNER_LABEL
arm64: Mark call_smc_arch_workaround_1 as __maybe_unused
arm64: entry-ftrace.S: Fix missing argument for CONFIG_FUNCTION_GRAPH_TRACER=y
arm64: vdso32: Convert to modern assembler annotations
arm64: vdso: Convert to modern assembler annotations
arm64: sdei: Annotate SDEI entry points using new style annotations
arm64: kvm: Modernize __smccc_workaround_1_smc_start annotations
arm64: kvm: Modernize annotation for __bp_harden_hyp_vecs
arm64: kvm: Annotate assembly using modern annoations
arm64: kernel: Convert to modern annotations for assembly data
arm64: head: Annotate stext and preserve_boot_args as code
arm64: head.S: Convert to modern annotations for assembly functions
arm64: ftrace: Modernise annotation of return_to_handler
arm64: ftrace: Correct annotation of ftrace_caller assembly
arm64: entry-ftrace.S: Convert to modern annotations for assembly functions
arm64: entry: Additional annotation conversions for entry.S
arm64: entry: Annotate ret_from_fork as code
arm64: entry: Annotate vector table and handlers as code
arm64: crypto: Modernize names for AES function macros
arm64: crypto: Modernize some extra assembly annotations
Catalin Marinas [Wed, 25 Mar 2020 11:10:32 +0000 (11:10 +0000)]
Merge branches 'for-next/memory-hotremove', 'for-next/arm_sdei', 'for-next/amu', 'for-next/final-cap-helper', 'for-next/cpu_ops-cleanup', 'for-next/misc' and 'for-next/perf' into for-next/core
* for-next/memory-hotremove:
: Memory hot-remove support for arm64
arm64/mm: Enable memory hot remove
arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
* for-next/arm_sdei:
: SDEI: fix double locking on return from hibernate and clean-up
firmware: arm_sdei: clean up sdei_event_create()
firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp
firmware: arm_sdei: fix possible double-lock on hibernate error path
firmware: arm_sdei: fix double-lock on hibernate with shared events
* for-next/amu:
: ARMv8.4 Activity Monitors support
clocksource/drivers/arm_arch_timer: validate arch_timer_rate
arm64: use activity monitors for frequency invariance
cpufreq: add function to get the hardware max frequency
Documentation: arm64: document support for the AMU extension
arm64/kvm: disable access to AMU registers from kvm guests
arm64: trap to EL1 accesses to AMU counters from EL0
arm64: add support for the AMU extension v1
* for-next/final-cap-helper:
: Introduce cpus_have_final_cap_helper(), migrate arm64 KVM to it
arm64: kvm: hyp: use cpus_have_final_cap()
arm64: cpufeature: add cpus_have_final_cap()
* for-next/cpu_ops-cleanup:
: cpu_ops[] access code clean-up
arm64: Introduce get_cpu_ops() helper function
arm64: Rename cpu_read_ops() to init_cpu_ops()
arm64: Declare ACPI parking protocol CPU operation if needed
* for-next/misc:
: Various fixes and clean-ups
arm64: define __alloc_zeroed_user_highpage
arm64/kernel: Simplify __cpu_up() by bailing out early
arm64: remove redundant blank for '=' operator
arm64: kexec_file: Fixed code style.
arm64: add blank after 'if'
arm64: fix spelling mistake "ca not" -> "cannot"
arm64: entry: unmask IRQ in el0_sp()
arm64: efi: add efi-entry.o to targets instead of extra-$(CONFIG_EFI)
arm64: csum: Optimise IPv6 header checksum
arch/arm64: fix typo in a comment
arm64: remove gratuitious/stray .ltorg stanzas
arm64: Update comment for ASID() macro
arm64: mm: convert cpu_do_switch_mm() to C
arm64: fix NUMA Kconfig typos
* for-next/perf:
: arm64 perf updates
arm64: perf: Add support for ARMv8.5-PMU 64-bit counters
KVM: arm64: limit PMU version to PMUv3 for ARMv8.1
arm64: cpufeature: Extract capped perfmon fields
arm64: perf: Clean up enable/disable calls
perf: arm-ccn: Use scnprintf() for robustness
arm64: perf: Support new DT compatibles
arm64: perf: Refactor PMU init callbacks
perf: arm_spe: Remove unnecessary zero check on 'nr_pages'
Mark Brown [Mon, 23 Mar 2020 12:33:36 +0000 (12:33 +0000)]
arm64: head: Convert install_el2_stub to SYM_INNER_LABEL
New assembly annotations have recently been introduced which aim to
make the way we describe symbols in assembly more consistent. Recently the
arm64 assembler was converted to use these but install_el2_stub was missed.
Signed-off-by: Mark Brown <broonie@kernel.org>
[catalin.marinas@arm.com: changed to SYM_L_LOCAL]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Gavin Shan [Wed, 18 Mar 2020 23:01:44 +0000 (10:01 +1100)]
arm64: Introduce get_cpu_ops() helper function
This introduces get_cpu_ops() to return the CPU operations according to
the given CPU index. For now, it simply returns the @cpu_ops[cpu] as
before. Also, helper function __cpu_try_die() is introduced to be shared
by cpu_die() and ipi_cpu_crash_stop(). So it shouldn't introduce any
functional changes.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Gavin Shan [Wed, 18 Mar 2020 23:01:43 +0000 (10:01 +1100)]
arm64: Rename cpu_read_ops() to init_cpu_ops()
This renames cpu_read_ops() to init_cpu_ops() as the function is only
called in initialization phase. Also, we will introduce get_cpu_ops() in
the subsequent patches, to retireve the CPU operation by the given CPU
index. The usage of cpu_read_ops() and get_cpu_ops() are difficult to be
distinguished from their names.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Gavin Shan [Wed, 18 Mar 2020 23:01:42 +0000 (10:01 +1100)]
arm64: Declare ACPI parking protocol CPU operation if needed
It's obvious we needn't declare the corresponding CPU operation when
CONFIG_ARM64_ACPI_PARKING_PROTOCOL is disabled, even it doesn't cause
any compiling warnings.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Remi Denis-Courmont [Thu, 12 Mar 2020 09:40:02 +0000 (11:40 +0200)]
arm64: move kimage_vaddr to .rodata
This datum is not referenced from .idmap.text: it does not need to be
mapped in idmap. Lets move it to .rodata as it is never written to after
early boot of the primary CPU.
(Maybe .data.ro_after_init would be cleaner though?)
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Remi Denis-Courmont [Wed, 4 Mar 2020 09:36:31 +0000 (11:36 +0200)]
arm64: use mov_q instead of literal ldr
In practice, this requires only 2 instructions, or even only 1 for
the idmap_pg_dir size (with 4 or 64 KiB pages). Only the MAIR values
needed more than 2 instructions and it was already converted to mov_q
by
95b3f74bec203804658e17f86fe20755bb8abcb9.
Signed-off-by: Remi Denis-Courmont <remi.denis.courmont@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Andrew Murray [Mon, 2 Mar 2020 18:17:52 +0000 (18:17 +0000)]
arm64: perf: Add support for ARMv8.5-PMU 64-bit counters
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.
With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters.
Let's enable 64-bit event counters where support exists. Unless the
user sets config1:0 we will adjust the counter value such that it
overflows upon 32-bit overflow. This follows the same behaviour as
the cycle counter which has always been (and remains) 64-bits.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[Mark: fix ID field names, compare with 8.5 value]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Andrew Murray [Mon, 2 Mar 2020 18:17:51 +0000 (18:17 +0000)]
KVM: arm64: limit PMU version to PMUv3 for ARMv8.1
We currently expose the PMU version of the host to the guest via
emulation of the DFR0_EL1 and AA64DFR0_EL1 debug feature registers.
However many of the features offered beyond PMUv3 for 8.1 are not
supported in KVM. Examples of this include support for the PMMIR
registers (added in PMUv3 for ARMv8.4) and 64-bit event counters
added in (PMUv3 for ARMv8.5).
Let's trap the Debug Feature Registers in order to limit
PMUVer/PerfMon in the Debug Feature Registers to PMUv3 for ARMv8.1
to avoid unexpected behaviour.
Both ID_AA64DFR0.PMUVer and ID_DFR0.PerfMon follow the "Alternative ID
scheme used for the Performance Monitors Extension version" where 0xF
means an IMPLEMENTATION DEFINED PMU is implemented, and values 0x0-0xE
are treated as with an unsigned field (with 0x0 meaning no PMU is
present). As we don't expect to expose an IMPLEMENTATION DEFINED PMU,
and our cap is below 0xF, we can treat these fields as unsigned when
applying the cap.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[Mark: make field names consistent, use perfmon cap]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Andrew Murray [Mon, 2 Mar 2020 18:17:50 +0000 (18:17 +0000)]
arm64: cpufeature: Extract capped perfmon fields
When emulating ID registers there is often a need to cap the version
bits of a feature such that the guest will not use features that the
host is not aware of. For example, when KVM mediates access to the PMU
by emulating register accesses.
Let's add a helper that extracts a performance monitors ID field and
caps the version to a given value.
Fields that identify the version of the Performance Monitors Extension
do not follow the standard ID scheme, and instead follow the scheme
described in ARM DDI 0487E.a page D13-2825 "Alternative ID scheme used
for the Performance Monitors Extension version". The value 0xF means an
IMPLEMENTATION DEFINED PMU is present, and values 0x0-OxE can be treated
the same as an unsigned field with 0x0 meaning no PMU is present.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[Mark: rework to handle perfmon fields]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy [Tue, 17 Mar 2020 18:22:54 +0000 (18:22 +0000)]
arm64: perf: Clean up enable/disable calls
Reading this code bordered on painful, what with all the repetition and
pointless return values. More fundamentally, dribbling the hardware
enables and disables in one bit at a time incurs needless system
register overhead for chained events and on reset. We already use
bitmask values for the KVM hooks, so consolidate all the register
accesses to match, and make a reasonable saving in both source and
object code.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Takashi Iwai [Sun, 15 Mar 2020 09:37:15 +0000 (10:37 +0100)]
perf: arm-ccn: Use scnprintf() for robustness
snprintf() is a hard-to-use function, it's especially difficult to use
it for concatenating substrings in a buffer with a limited size.
Since snprintf() returns the would-be-output size, not the actual
size, the subsequent use of snprintf() may point to the incorrect
position easily. Although the current code doesn't actually overflow
the buffer, it's an incorrect usage.
This patch replaces such snprintf() calls with a safer version,
scnprintf().
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Will Deacon <will@kernel.org>
glider@google.com [Thu, 12 Mar 2020 15:59:20 +0000 (16:59 +0100)]
arm64: define __alloc_zeroed_user_highpage
When running the kernel with init_on_alloc=1, calling the default
implementation of __alloc_zeroed_user_highpage() from include/linux/highmem.h
leads to double-initialization of the allocated page (first by the page
allocator, then by clear_user_page().
Calling alloc_page_vma() with __GFP_ZERO, similarly to e.g. x86, seems
to be enough to ensure the user page is zeroed only once.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Gavin Shan [Mon, 2 Mar 2020 02:03:40 +0000 (13:03 +1100)]
arm64/kernel: Simplify __cpu_up() by bailing out early
The function __cpu_up() is invoked to bring up the target CPU through
the backend, PSCI for example. The nested if statements won't be needed
if we bail out early on the following two conditions where the status
won't be checked. The code looks simplified in that case.
* Error returned from the backend (e.g. PSCI)
* The target CPU has been marked as onlined
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
韩科才 [Wed, 11 Mar 2020 06:52:49 +0000 (14:52 +0800)]
arm64: remove redundant blank for '=' operator
remove redundant blank for '=' operator, it may be more elegant.
Signed-off-by: hankecai <hankecai@vivo.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Li Tao [Wed, 11 Mar 2020 07:31:55 +0000 (15:31 +0800)]
arm64: kexec_file: Fixed code style.
Remove unnecessary blank.
Signed-off-by: Li Tao <tao.li@vivo.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Zheng Wei [Fri, 13 Mar 2020 14:54:02 +0000 (22:54 +0800)]
arm64: add blank after 'if'
add blank after 'if' for armv8_deprecated_init()
to make it comply with kernel coding style.
Signed-off-by: Zheng Wei <wei.zheng@vivo.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
韩科才 [Sun, 15 Mar 2020 04:01:19 +0000 (12:01 +0800)]
arm64: fix spelling mistake "ca not" -> "cannot"
There is a spelling mistake in the comment, Fix it.
Signed-off-by: hankecai <hankecai@bbktel.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Rutland [Fri, 21 Feb 2020 14:50:22 +0000 (14:50 +0000)]
arm64: kvm: hyp: use cpus_have_final_cap()
The KVM hyp code is only run after system capabilities have been
finalized, and thus all const cap checks have been patched. This is
noted in in __cpu_init_hyp_mode(), where we BUG() if called too early:
| /*
| * Call initialization code, and switch to the full blown HYP code.
| * If the cpucaps haven't been finalized yet, something has gone very
| * wrong, and hyp will crash and burn when it uses any
| * cpus_have_const_cap() wrapper.
| */
Given this, the hyp code can use cpus_have_final_cap() and avoid
generating code to check the cpu_hwcaps array, which would be unsafe to
run in hyp context.
This patch migrate the KVM hyp code to cpus_have_final_cap(), avoiding
this redundant code generation, and making it possible to detect if we
accidentally invoke this code too early. In the latter case, the BUG()
in cpus_have_final_cap() will cause a hyp panic.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Rutland [Fri, 21 Feb 2020 14:50:21 +0000 (14:50 +0000)]
arm64: cpufeature: add cpus_have_final_cap()
When cpus_have_const_cap() was originally introduced it was intended to
be safe in hyp context, where it is not safe to access the cpu_hwcaps
array as cpus_have_cap() did. For more details see commit:
a4023f682739439b ("arm64: Add hypervisor safe helper for checking constant capabilities")
We then made use of cpus_have_const_cap() throughout the kernel.
Subsequently, we had to defer updating the static_key associated with
each capability in order to avoid lockdep complaints. To avoid breaking
kernel-wide usage of cpus_have_const_cap(), this was updated to fall
back to the cpu_hwcaps array if called before the static_keys were
updated. As the kvm hyp code was only called later than this, the
fallback is redundant but not functionally harmful. For more details,
see commit:
63a1e1c95e60e798 ("arm64/cpufeature: don't use mutex in bringup path")
Today we have more users of cpus_have_const_cap() which are only called
once the relevant static keys are initialized, and it would be
beneficial to avoid the redundant code.
To that end, this patch adds a new cpus_have_final_cap(), helper which
is intend to be used in code which is only run once capabilities have
been finalized, and will never check the cpus_hwcap array. This helps
the compiler to generate better code as it no longer needs to generate
code to address and test the cpus_hwcap array. To help catch misuse,
cpus_have_final_cap() will BUG() if called before capabilities are
finalized.
In hyp context, BUG() will result in a hyp panic, but the specific BUG()
instance will not be identified in the usual way.
Comments are added to the various cpus_have_*_cap() helpers to describe
the constraints on when they can be used. For clarity cpus_have_cap() is
moved above the other helpers. Similarly the helpers are updated to use
system_capabilities_finalized() consistently, and this is made
__always_inline as required by its new callers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Rutland [Fri, 28 Feb 2020 14:59:42 +0000 (14:59 +0000)]
arm64: entry: unmask IRQ in el0_sp()
Currently, the EL0 SP alignment handler masks IRQs unnecessarily. It
does so due to historic code sharing of the EL0 SP and PC alignment
handlers, and branch predictor hardening applicable to the EL0 SP
handler.
We began masking IRQs in the EL0 SP alignment handler in commit:
5dfc6ed27710c42c ("arm64: entry: Apply BP hardening for high-priority synchronous exception")
... as this shared code with the EL0 PC alignment handler, and branch
predictor hardening made it necessary to disable IRQs for early parts of
the EL0 PC alignment handler. It was not necessary to mask IRQs during
EL0 SP alignment exceptions, but it was not considered harmful to do so.
This masking was carried forward into C code in commit:
582f95835a8fc812 ("arm64: entry: convert el0_sync to C")
... where the SP/PC cases were split into separate handlers, and the
masking duplicated.
Subsequently the EL0 PC alignment handler was refactored to perform
branch predictor hardening before unmasking IRQs, in commit:
bfe298745afc9548 ("arm64: entry-common: don't touch daif before bp-hardening")
... but the redundant masking of IRQs was not removed from the EL0 SP
alignment handler.
Let's do so now, and make it interruptible as with most other
synchronous exception handlers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Nathan Chancellor [Tue, 10 Mar 2020 23:25:44 +0000 (16:25 -0700)]
arm64: Mark call_smc_arch_workaround_1 as __maybe_unused
When building allnoconfig:
arch/arm64/kernel/cpu_errata.c:174:13: warning: unused function
'call_smc_arch_workaround_1' [-Wunused-function]
static void call_smc_arch_workaround_1(void)
^
1 warning generated.
Follow arch/arm and mark this function as __maybe_unused.
Fixes:
4db61fef16a1 ("arm64: kvm: Modernize __smccc_workaround_1_smc_start annotations")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Kunihiko Hayashi [Wed, 11 Mar 2020 02:36:53 +0000 (11:36 +0900)]
arm64: entry-ftrace.S: Fix missing argument for CONFIG_FUNCTION_GRAPH_TRACER=y
Missing argument of another SYM_INNER_LABEL() breaks build for
CONFIG_FUNCTION_GRAPH_TRACER=y.
Fixes:
e2d591d29d44 ("arm64: entry-ftrace.S: Convert to modern annotations for assembly functions")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
Masahiro Yamada [Thu, 5 Mar 2020 05:20:52 +0000 (14:20 +0900)]
arm64: efi: add efi-entry.o to targets instead of extra-$(CONFIG_EFI)
efi-entry.o is built on demand for efi-entry.stub.o, so you do not have
to repeat $(CONFIG_EFI) here. Adding it to 'targets' is enough.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Robin Murphy [Mon, 20 Jan 2020 18:52:29 +0000 (18:52 +0000)]
arm64: csum: Optimise IPv6 header checksum
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
about 1.3x-2x faster across a range of microarchitecture/compiler
combinations. Not much in absolute terms, but every little helps.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:42 +0000 (19:58 +0000)]
arm64: vdso32: Convert to modern assembler annotations
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions. Use these for the compat VDSO,
allowing us to drop the custom ARM_ENTRY() and ARM_ENDPROC() macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:41 +0000 (19:58 +0000)]
arm64: vdso: Convert to modern assembler annotations
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions. Convert the assembly function in the
arm64 VDSO to the new macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:40 +0000 (19:58 +0000)]
arm64: sdei: Annotate SDEI entry points using new style annotations
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions.
The SDEI entry points are currently annotated as normal functions but
are called from non-kernel contexts with non-standard calling convention
and should therefore be annotated as such so do so.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: James Morse <james.Morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:39 +0000 (19:58 +0000)]
arm64: kvm: Modernize __smccc_workaround_1_smc_start annotations
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC with separate annotations for standard C callable functions,
data and code with different calling conventions.
Using these for __smccc_workaround_1_smc is more involved than for most
symbols as this symbol is annotated quite unusually, rather than just have
the explicit symbol we define _start and _end symbols which we then use to
compute the length. This does not play at all nicely with the new style
macros. Instead define a constant for the size of the function and use that
in both the C code and for .org based size checks in the assembly code.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Mark Brown [Tue, 18 Feb 2020 19:58:38 +0000 (19:58 +0000)]
arm64: kvm: Modernize annotation for __bp_harden_hyp_vecs
We have recently introduced new macros for annotating assembly symbols
for things that aren't C functions, SYM_CODE_START() and SYM_CODE_END(),
in an effort to clarify and simplify our annotations of assembly files.
Using these for __bp_harden_hyp_vecs is more involved than for most symbols
as this symbol is annotated quite unusually as rather than just have the
explicit symbol we define _start and _end symbols which we then use to
compute the length. This does not play at all nicely with the new style
macros. Since the size of the vectors is a known constant which won't vary
the simplest thing to do is simply to drop the separate _start and _end
symbols and just use a #define for the size.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Mark Brown [Tue, 18 Feb 2020 19:58:37 +0000 (19:58 +0000)]
arm64: kvm: Annotate assembly using modern annoations
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC with separate annotations for standard C callable functions,
data and code with different calling conventions. Update the more
straightforward annotations in the kvm code to the new macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Mark Brown [Tue, 18 Feb 2020 19:58:35 +0000 (19:58 +0000)]
arm64: kernel: Convert to modern annotations for assembly data
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These include specific
annotations for the start and end of data, update symbols for data to use
these.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:34 +0000 (19:58 +0000)]
arm64: head: Annotate stext and preserve_boot_args as code
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions. Neither stext nor preserve_boot_args
is called with the usual AAPCS calling conventions and they should
therefore be annotated as code.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:33 +0000 (19:58 +0000)]
arm64: head.S: Convert to modern annotations for assembly functions
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC and also add a new annotation for static functions which previously
had no ENTRY equivalent. Update the annotations in the core kernel code to
the new macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:32 +0000 (19:58 +0000)]
arm64: ftrace: Modernise annotation of return_to_handler
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions.
return_to_handler does entertaining things with LR so doesn't follow the
usual C conventions and should therefore be annotated as code rather than
a function.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:31 +0000 (19:58 +0000)]
arm64: ftrace: Correct annotation of ftrace_caller assembly
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions.
The patchable function entry versions of ftrace_*_caller don't follow the
usual AAPCS rules, pushing things onto the stack which they don't clean up,
and therefore should be annotated as code rather than functions.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:30 +0000 (19:58 +0000)]
arm64: entry-ftrace.S: Convert to modern annotations for assembly functions
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC and also add a new annotation for static functions which previously
had no ENTRY equivalent. Update the annotations in the core kernel code to
the new macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:29 +0000 (19:58 +0000)]
arm64: entry: Additional annotation conversions for entry.S
In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC with separate annotations for standard C callable functions,
data and code with different calling conventions. Update the
remaining annotations in the entry.S code to the new macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:28 +0000 (19:58 +0000)]
arm64: entry: Annotate ret_from_fork as code
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions.
ret_from_fork is not a normal C function and should therefore be
annotated as code.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:27 +0000 (19:58 +0000)]
arm64: entry: Annotate vector table and handlers as code
In an effort to clarify and simplify the annotation of assembly
functions new macros have been introduced. These replace ENTRY and
ENDPROC with two different annotations for normal functions and those
with unusual calling conventions. The vector table and handlers aren't
normal C style code so should be annotated as CODE.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:26 +0000 (19:58 +0000)]
arm64: crypto: Modernize names for AES function macros
Now that the rest of the code has been converted to the modern START/END
macros the AES_ENTRY() and AES_ENDPROC() macros look out of place and
like they need updating. Rename them to AES_FUNC_START() and AES_FUNC_END()
to line up with the modern style assembly macros.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Feb 2020 19:58:25 +0000 (19:58 +0000)]
arm64: crypto: Modernize some extra assembly annotations
A couple of functions were missed in the modernisation of assembly macros,
update them too.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
王程刚 [Mon, 9 Mar 2020 07:21:42 +0000 (15:21 +0800)]
arch/arm64: fix typo in a comment
Fix typo in a comment in arch/arm64/include/asm/esr.h
"Unallocted" -> "Unallocated"
Signed-off-by: Chenggang Wang <wangchenggang@vivo.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ionela Voinescu [Thu, 5 Mar 2020 09:06:27 +0000 (09:06 +0000)]
clocksource/drivers/arm_arch_timer: validate arch_timer_rate
Using an arch timer with a frequency of less than 1MHz can potentially
result in incorrect functionality in systems that assume a reasonable
rate of the arch timer of 1 to 50MHz, described as typical in the
architecture specification.
Therefore, warn if the arch timer rate is below 1MHz, which is
considered atypical and worth emphasizing.
Suggested-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ionela Voinescu [Thu, 5 Mar 2020 09:06:26 +0000 (09:06 +0000)]
arm64: use activity monitors for frequency invariance
The Frequency Invariance Engine (FIE) is providing a frequency
scaling correction factor that helps achieve more accurate
load-tracking.
So far, for arm and arm64 platforms, this scale factor has been
obtained based on the ratio between the current frequency and the
maximum supported frequency recorded by the cpufreq policy. The
setting of this scale factor is triggered from cpufreq drivers by
calling arch_set_freq_scale. The current frequency used in computation
is the frequency requested by a governor, but it may not be the
frequency that was implemented by the platform.
This correction factor can also be obtained using a core counter and a
constant counter to get information on the performance (frequency based
only) obtained in a period of time. This will more accurately reflect
the actual current frequency of the CPU, compared with the alternative
implementation that reflects the request of a performance level from
the OS.
Therefore, implement arch_scale_freq_tick to use activity monitors, if
present, for the computation of the frequency scale factor.
The use of AMU counters depends on:
- CONFIG_ARM64_AMU_EXTN - depents on the AMU extension being present
- CONFIG_CPU_FREQ - the current frequency obtained using counter
information is divided by the maximum frequency obtained from the
cpufreq policy.
While it is possible to have a combination of CPUs in the system with
and without support for activity monitors, the use of counters for
frequency invariance is only enabled for a CPU if all related CPUs
(CPUs in the same frequency domain) support and have enabled the core
and constant activity monitor counters. In this way, there is a clear
separation between the policies for which arch_set_freq_scale (cpufreq
based FIE) is used, and the policies for which arch_scale_freq_tick
(counter based FIE) is used to set the frequency scale factor. For
this purpose, a late_initcall_sync is registered to trigger validation
work for policies that will enable or disable the use of AMU counters
for frequency invariance. If CONFIG_CPU_FREQ is not defined, the use
of counters is enabled on all CPUs only if all possible CPUs correctly
support the necessary counters.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ionela Voinescu [Thu, 5 Mar 2020 09:06:25 +0000 (09:06 +0000)]
cpufreq: add function to get the hardware max frequency
Add weak function to return the hardware maximum frequency of a CPU,
with the default implementation returning cpuinfo.max_freq, which is
the best information we can generically get from the cpufreq framework.
The default can be overwritten by a strong function in platforms
that want to provide an alternative implementation, with more accurate
information, obtained either from hardware or firmware.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ionela Voinescu [Thu, 5 Mar 2020 09:06:24 +0000 (09:06 +0000)]
Documentation: arm64: document support for the AMU extension
The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture.
Add initial documentation for the AMUv1 extension:
- arm64/amu.txt: AMUv1 documentation
- arm64/booting.txt: system registers initialisation
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ionela Voinescu [Thu, 5 Mar 2020 09:06:23 +0000 (09:06 +0000)]
arm64/kvm: disable access to AMU registers from kvm guests
Access to the AMU counters should be disabled by default in kvm guests,
as information from the counters might reveal activity in other guests
or activity on the host.
Therefore, disable access to AMU registers from EL0 and EL1 in kvm
guests by:
- Hiding the presence of the extension in the feature register
(SYS_ID_AA64PFR0_EL1) on the VCPU.
- Disabling access to the AMU registers before switching to the guest.
- Trapping accesses and injecting an undefined instruction into the
guest.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ionela Voinescu [Thu, 5 Mar 2020 09:06:22 +0000 (09:06 +0000)]
arm64: trap to EL1 accesses to AMU counters from EL0
The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture. In order to access the activity
monitors counters safely, if desired, the kernel should detect the
presence of the extension through the feature register, and mediate
the access.
Therefore, disable direct accesses to activity monitors counters
from EL0 (userspace) and trap them to EL1 (kernel).
To be noted that the ARM64_AMU_EXTN kernel config does not have an
effect on this code. Given that the amuserenr_el0 resets to an
UNKNOWN value, setting the trap of EL0 accesses to EL1 is always
attempted for safety and security considerations. Therefore firmware
should still ensure accesses to AMU registers are not trapped in
EL2/EL3 as this code cannot be bypassed if the CPU implements the
Activity Monitors Unit.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ionela Voinescu [Thu, 5 Mar 2020 09:06:21 +0000 (09:06 +0000)]
arm64: add support for the AMU extension v1
The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture. This implements basic support for
version 1 of the activity monitors architecture, AMUv1.
This support includes:
- Extension detection on each CPU (boot, secondary, hotplugged)
- Register interface for AMU aarch64 registers
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Remi Denis-Courmont [Wed, 4 Mar 2020 09:35:46 +0000 (11:35 +0200)]
arm64: remove gratuitious/stray .ltorg stanzas
There are no applicable literals above them.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Remi Denis-Courmont <remi.denis.courmont@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Wed, 4 Mar 2020 04:28:43 +0000 (09:58 +0530)]
arm64/mm: Enable memory hot remove
The arch code for hot-remove must tear down portions of the linear map and
vmemmap corresponding to memory being removed. In both cases the page
tables mapping these regions must be freed, and when sparse vmemmap is in
use the memory backing the vmemmap must also be freed.
This patch adds unmap_hotplug_range() and free_empty_tables() helpers which
can be used to tear down either region and calls it from vmemmap_free() and
___remove_pgd_mapping(). The free_mapped argument determines whether the
backing memory will be freed.
It makes two distinct passes over the kernel page table. In the first pass
with unmap_hotplug_range() it unmaps, invalidates applicable TLB cache and
frees backing memory if required (vmemmap) for each mapped leaf entry. In
the second pass with free_empty_tables() it looks for empty page table
sections whose page table page can be unmapped, TLB invalidated and freed.
While freeing intermediate level page table pages bail out if any of its
entries are still valid. This can happen for partially filled kernel page
table either from a previously attempted failed memory hot add or while
removing an address range which does not span the entire page table page
range.
The vmemmap region may share levels of table with the vmalloc region.
There can be conflicts between hot remove freeing page table pages with
a concurrent vmalloc() walking the kernel page table. This conflict can
not just be solved by taking the init_mm ptl because of existing locking
scheme in vmalloc(). So free_empty_tables() implements a floor and ceiling
method which is borrowed from user page table tear with free_pgd_range()
which skips freeing page table pages if intermediate address range is not
aligned or maximum floor-ceiling might not own the entire page table page.
Boot memory on arm64 cannot be removed. Hence this registers a new memory
hotplug notifier which prevents boot memory offlining and it's removal.
While here update arch_add_memory() to handle __add_pages() failures by
just unmapping recently added kernel linear mapping. Now enable memory hot
remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.
This implementation is overall inspired from kernel page table tear down
procedure on X86 architecture and user page table tear down method.
[Mike and Catalin added P4D page table level support]
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Anshuman Khandual [Wed, 4 Mar 2020 04:28:42 +0000 (09:58 +0530)]
arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
The arm64 page table dump code can race with concurrent modification of the
kernel page tables. When a leaf entries are modified concurrently, the dump
code may log stale or inconsistent information for a VA range, but this is
otherwise not harmful.
When intermediate levels of table are freed, the dump code will continue to
use memory which has been freed and potentially reallocated for another
purpose. In such cases, the dump code may dereference bogus addresses,
leading to a number of potential problems.
Intermediate levels of table may by freed during memory hot-remove,
which will be enabled by a subsequent patch. To avoid racing with
this, take the memory hotplug lock when walking the kernel page table.
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Robin Murphy [Fri, 21 Feb 2020 19:35:32 +0000 (19:35 +0000)]
arm64: perf: Support new DT compatibles
Add support for matching the new PMUs. For now, this just wires them up
as generic PMUv3 such that people writing DTs for new SoCs can do the
right thing, and at least have architectural and raw events be usable.
We can come back and fill in event maps for sysfs and/or perf tools at
a later date.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy [Fri, 21 Feb 2020 19:35:31 +0000 (19:35 +0000)]
arm64: perf: Refactor PMU init callbacks
The PMU init callbacks are already drowning in boilerplate, so before
doubling the number of supported PMU models, give it a sensible refactor
to significantly reduce the bloat, both in source and object code.
Although nobody uses non-default sysfs attributes today, there's minimal
impact to preserving the notion that maybe, some day, somebody might, so
we may as well keep up appearances.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
luanshi [Wed, 26 Feb 2020 04:25:06 +0000 (12:25 +0800)]
perf: arm_spe: Remove unnecessary zero check on 'nr_pages'
We already check that the 'nr_pages' is > 2, so there's no need to check
that it's != 0 later on.
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Will Deacon <will@kernel.org>
Will Deacon [Fri, 28 Feb 2020 12:43:55 +0000 (12:43 +0000)]
arm64: Update comment for ASID() macro
Commit
25b92693a1b6 ("arm64: mm: convert cpu_do_switch_mm() to C") added
a new use of the ASID() macro, so update the comment in asm/mmu.h which
reasons about why an atomic reload of 'mm->context.id.counter' is not
required.
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Liguang Zhang [Fri, 21 Feb 2020 16:35:09 +0000 (16:35 +0000)]
firmware: arm_sdei: clean up sdei_event_create()
Function sdei_event_find() is always called in sdei_event_create(), but
it is already called in sdei_event_register(). This code is trying to
avoid a double-create of the same event, which can't happen as we still
hold the sdei_events_lock. We can remove this needless sdei_event_find()
call.
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
[expanded commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
James Morse [Fri, 21 Feb 2020 16:35:08 +0000 (16:35 +0000)]
firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp
SDEI has private events that need registering and enabling on each CPU.
CPUs can come and go while we are trying to do this. SDEI tries to avoid
these problems by setting the reregister flag before the register call,
so any CPUs that come online register the event too. Sticking plaster
like this doesn't work, as if the register call fails, a CPU that
subsequently comes online will register the event before reregister
is cleared.
Take cpus_read_lock() around the register and enable calls. We don't
want surprise CPUs to do the wrong thing if they race with these calls
failing.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Liguang Zhang [Fri, 21 Feb 2020 16:35:07 +0000 (16:35 +0000)]
firmware: arm_sdei: fix possible double-lock on hibernate error path
We call sdei_reregister_event() with sdei_list_lock held, if the register
fails we call sdei_event_destroy() which also acquires sdei_list_lock
thus creating A-A deadlock.
Add '_llocked' to sdei_reregister_event(), to indicate the list lock
is held, and add a _llocked variant of sdei_event_destroy().
Fixes:
da351827240e ("firmware: arm_sdei: Add support for CPU and system power states")
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
[expanded subject, added wrappers instead of duplicating contents]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
James Morse [Fri, 21 Feb 2020 16:35:06 +0000 (16:35 +0000)]
firmware: arm_sdei: fix double-lock on hibernate with shared events
SDEI has private events that must be registered on each CPU. When
CPUs come and go they must re-register and re-enable their private
events. Each event has flags to indicate whether this should happen
to protect against an event being registered on a CPU coming online,
while all the others are unregistering the event.
These flags are protected by the sdei_list_lock spinlock, because
the cpuhp callbacks can't take the mutex.
Hibernate needs to unregister all events, but keep the in-memory
re-register and re-enable as they are. sdei_unregister_shared()
takes the spinlock to walk the list, then calls _sdei_event_unregister()
on each shared event. _sdei_event_unregister() tries to take the
same spinlock to update re-register and re-enable. This doesn't go
so well.
Push the re-register and re-enable updates out to their callers.
sdei_unregister_shared() doesn't want these values updated, so
doesn't need to do anything.
This also fixes shared events getting lost over hibernate as this
path made them look unregistered.
Fixes:
da351827240e ("firmware: arm_sdei: Add support for CPU and system power states")
Reported-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Rutland [Thu, 13 Feb 2020 12:14:52 +0000 (12:14 +0000)]
arm64: mm: convert cpu_do_switch_mm() to C
There's no reason that cpu_do_switch_mm() needs to be written as an
assembly function, and having it as a C function would make it easier to
maintain.
This patch converts cpu_do_switch_mm() to C, removing code that this
change makes redundant (e.g. the mmid macro). Since the header comment
was stale and the prototype now implies all the necessary information,
this comment is removed. The 'pgd_phys' argument is made a phys_addr_t
to match the return type of virt_to_phys().
At the same time, post_ttbr_update_workaround() is updated to use
IS_ENABLED(), which allows the compiler to figure out it can elide calls
for !CONFIG_CAVIUM_ERRATUM_27456 builds.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
[catalin.marinas@arm.com: change comments from asm-style to C-style]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Randy Dunlap [Sat, 1 Feb 2020 01:51:06 +0000 (17:51 -0800)]
arm64: fix NUMA Kconfig typos
Fix typos in arch/arm64/Kconfig:
- spell Numa as NUMA
- add hyphenation to Non-Uniform
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Mon, 24 Feb 2020 00:17:42 +0000 (16:17 -0800)]
Linux 5.6-rc3
Linus Torvalds [Sun, 23 Feb 2020 17:43:50 +0000 (09:43 -0800)]
Merge tag 'for-5.6-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"These are fixes that were found during testing with help of error
injection, plus some other stable material.
There's a fixup to patch added to rc1 causing locking in wrong context
warnings, tests found one more deadlock scenario. The patches are
tagged for stable, two of them now in the queue but we'd like all
three released at the same time.
I'm not happy about fixes to fixes in such a fast succession during
rcs, but I hope we found all the fallouts of commit
28553fa992cb
('Btrfs: fix race between shrinking truncate and fiemap')"
* tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof
Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
btrfs: fix bytes_may_use underflow in prealloc error condtition
btrfs: handle logged extent failure properly
btrfs: do not check delayed items are empty for single transaction cleanup
btrfs: reset fs_root to NULL on error in open_ctree
btrfs: destroy qgroup extent records on transaction abort
Linus Torvalds [Sun, 23 Feb 2020 17:42:19 +0000 (09:42 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"More miscellaneous ext4 bug fixes (all stable fodder)"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix mount failure with quota configured as module
jbd2: fix ocfs2 corrupt when clearing block group bits
ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
ext4: fix potential race between s_flex_groups online resizing and access
ext4: fix potential race between s_group_info online resizing and access
ext4: fix potential race between online resizing and write operations
ext4: add cond_resched() to __ext4_find_entry()
ext4: fix a data race in EXT4_I(inode)->i_disksize
Linus Torvalds [Sun, 23 Feb 2020 17:37:41 +0000 (09:37 -0800)]
Merge tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux
Pull csky updates from Guo Ren:
"Sorry, I missed 5.6-rc1 merge window, but in this pull request the
most are the fixes and the rests are between fixes and features. The
only outside modification is the MAINTAINERS file update with our
mailing list.
- cache flush implementation fixes
- ftrace modify panic fix
- CONFIG_SMP boot problem fix
- fix pt_regs saving for atomic.S
- fix fixaddr_init without highmem.
- fix stack protector support
- fix fake Tightly-Coupled Memory code compile and use
- fix some typos and coding convention"
* tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux: (23 commits)
csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>
csky: Implement copy_thread_tls
csky: Add PCI support
csky: Minimize defconfig to support buildroot config.fragment
csky: Add setup_initrd check code
csky: Cleanup old Kconfig options
arch/csky: fix some Kconfig typos
csky: Fixup compile warning for three unimplemented syscalls
csky: Remove unused cache implementation
csky: Fixup ftrace modify panic
csky: Add flush_icache_mm to defer flush icache all
csky: Optimize abiv2 copy_to_user_page with VM_EXEC
csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860)
csky: Remove unnecessary flush_icache_* implementation
csky: Support icache flush without specific instructions
csky/Kconfig: Add Kconfig.platforms to support some drivers
csky/smp: Fixup boot failed when CONFIG_SMP
csky: Set regs->usp to kernel sp, when the exception is from kernel
csky/mm: Fixup export invalid_pte_table symbol
csky: Separate fixaddr_init from highmem
...
Geert Uytterhoeven [Wed, 12 Feb 2020 10:10:58 +0000 (11:10 +0100)]
csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>
The C-Sky platform code is not a clock provider, and just needs to call
of_clk_init().
Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Linus Torvalds [Sun, 23 Feb 2020 02:02:10 +0000 (18:02 -0800)]
Merge tag 'ras-urgent-2020-02-22' of git://git./linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
"Two fixes for the AMD MCE driver:
- Populate the per CPU MCA bank descriptor pointer only after it has
been completely set up to prevent a use-after-free in case that one
of the subsequent initialization step fails
- Implement a proper release function for the sysfs entries of MCA
threshold controls instead of freeing the memory right in the CPU
teardown code, which leads to another use-after-free when the
associated sysfs file is opened and accessed"
* tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/amd: Fix kobject lifetime
x86/mce/amd: Publish the bank pointer only after setup has succeeded
Linus Torvalds [Sun, 23 Feb 2020 01:25:46 +0000 (17:25 -0800)]
Merge tag 'irq-urgent-2020-02-22' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Two fixes for the irq core code which are follow ups to the recent MSI
fixes:
- The WARN_ON which was put into the MSI setaffinity callback for
paranoia reasons actually triggered via a callchain which escaped
when all the possible ways to reach that code were analyzed.
The proc/irq/$N/*affinity interfaces have a quirk which came in
when ALPHA moved to the generic interface: In case that the written
affinity mask does not contain any online CPU it calls into ALPHAs
magic auto affinity setting code.
A few years later this mechanism was also made available to x86 for
no good reasons and in a way which circumvents all sanity checks
for interrupts which cannot have their affinity set from process
context on X86 due to the way the X86 interrupt delivery works.
It would be possible to make this work properly, but there is no
point in doing so. If the interrupt is not yet started then the
affinity setting has no effect and if it is started already then it
is already assigned to an online CPU so there is no point to
randomly move it to some other CPU. Just return EINVAL as the code
has done before that change forever.
- The new MSI quirk bit in the irq domain flags turned out to be
already occupied, which escaped the author and the reviewers
because the already in use bits were 0,6,2,3,4,5 listed in that
order.
That bit 6 was simply overlooked because the ordering was straight
forward linear otherwise. So the new bit ended up being a
duplicate.
Fix it up by switching the oddball 6 to the obvious 1"
* tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/irqdomain: Make sure all irq domain flags are distinct
genirq/proc: Reject invalid affinity masks (again)
Linus Torvalds [Sun, 23 Feb 2020 01:08:16 +0000 (17:08 -0800)]
Merge tag 'x86-urgent-2020-02-22' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two fixes for x86:
- Remove the __force_oder definiton from the kaslr boot code as it is
already defined in the page table code which makes GCC 10 builds
fail because it changed the default to -fno-common.
- Address the AMD erratum 1054 concerning the IRPERF capability and
enable the Instructions Retired fixed counter on machines which are
not affected by the erratum"
* tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF
x86/boot/compressed: Don't declare __force_order in kaslr_64.c
Linus Torvalds [Sat, 22 Feb 2020 19:38:20 +0000 (11:38 -0800)]
Merge tag 'zonefs-5.6-rc3' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
"A single patch fixing typos in the documentation file"
* tag 'zonefs-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: fix documentation typos etc.
Linus Torvalds [Sat, 22 Feb 2020 19:12:55 +0000 (11:12 -0800)]
Merge tag 'io_uring-5.6-2020-02-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Here's a small collection of fixes that were queued up:
- Remove unnecessary NULL check (Dan)
- Missing io_req_cancelled() call in fallocate (Pavel)
- Put the cleanup check for aux data in the right spot (Pavel)
- Two fixes for SQPOLL (Stefano, Xiaoguang)"
* tag 'io_uring-5.6-2020-02-22' of git://git.kernel.dk/linux-block:
io_uring: fix __io_iopoll_check deadlock in io_sq_thread
io_uring: prevent sq_thread from spinning when it should stop
io_uring: fix use-after-free by io_cleanup_req()
io_uring: remove unnecessary NULL checks
io_uring: add missing io_req_cancelled()
Linus Torvalds [Sat, 22 Feb 2020 19:09:06 +0000 (11:09 -0800)]
Merge tag 'block-5.6-2020-02-22' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just a set of NVMe fixes via Keith"
* tag 'block-5.6-2020-02-22' of git://git.kernel.dk/linux-block:
nvme-multipath: Fix memory leak with ana_log_buf
nvme: Fix uninitialized-variable warning
nvme-pci: Use single IRQ vector for old Apple models
nvme/pci: Add sleep quirk for Samsung and Toshiba drives
Linus Torvalds [Sat, 22 Feb 2020 19:00:52 +0000 (11:00 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four non-core fixes.
Two are reverts of target fixes which turned out to have unwanted side
effects, one is a revert of an RDMA fix with the same problem and the
final one fixes an incorrect warning about memory allocation failures
in megaraid_sas (the driver actually reduces the allocation size until
it succeeds)"
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
scsi: megaraid_sas: silence a warning
scsi: Revert "target/core: Inline transport_lun_remove_cmd()"
Linus Torvalds [Sat, 22 Feb 2020 18:52:54 +0000 (10:52 -0800)]
Merge tag 'hwmon-for-v5.6-rc3' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix crash in w83627ehf driver seen with W83627DHG-P
- Fix lockdep splat in acpi_power_meter driver
- Fix xdpe12284 documentation Sphinx warnings
* tag 'hwmon-for-v5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (w83627ehf) Fix crash seen with W83627DHG-P
hwmon: (acpi_power_meter) Fix lockdep splat
Documentation/hwmon: fix xdpe12284 Sphinx warnings
Linus Torvalds [Sat, 22 Feb 2020 18:49:59 +0000 (10:49 -0800)]
Merge tag 'devicetree-fixes-for-5.6-2' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes deom Rob Herring:
"A handful of fixes in DT bindings for MDIO bus, Allwinner CSI, OMAP
HSMMC, and Tegra124 EMC"
* tag 'devicetree-fixes-for-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: media: csi: Fix clocks description
dt-bindings: media: csi: Add interconnects properties
dt-bindings: net: mdio: remove compatible string from example
dt-bindings: memory-controller: Update example for Tegra124 EMC
dt-bindings: mmc: omap-hsmmc: Fix SDIO interrupt
Linus Torvalds [Sat, 22 Feb 2020 18:43:41 +0000 (10:43 -0800)]
Merge tag 's390-5.6-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Remove ieee_emulation_warnings sysctl which is a dead code.
- Avoid triggering rebuild of the kernel during make install.
- Enable protected virtualization guest support in default configs.
- Fix cio_ignore seq_file .next function to increase position index.
And use kobj_to_dev instead of container_of in cio code.
- Fix storage block address lists to contain absolute addresses in qdio
code.
- Few clang warnings and spelling fixes.
* tag 's390-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/qdio: fill SBALEs with absolute addresses
s390/qdio: fill SL with absolute addresses
s390: remove obsolete ieee_emulation_warnings
s390: make 'install' not depend on vmlinux
s390/kaslr: Fix casts in get_random
s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range
s390/pkey/zcrypt: spelling s/crytp/crypt/
s390/cio: use kobj_to_dev() API
s390/defconfig: enable CONFIG_PROTECTED_VIRTUALIZATION_GUEST
s390/cio: cio_ignore_proc_seq_next should increase position index
Xiaoguang Wang [Sat, 22 Feb 2020 06:46:05 +0000 (14:46 +0800)]
io_uring: fix __io_iopoll_check deadlock in io_sq_thread
Since commit
a3a0e43fd770 ("io_uring: don't enter poll loop if we have
CQEs pending"), if we already events pending, we won't enter poll loop.
In case SETUP_IOPOLL and SETUP_SQPOLL are both enabled, if app has
been terminated and don't reap pending events which are already in cq
ring, and there are some reqs in poll_list, io_sq_thread will enter
__io_iopoll_check(), and find pending events, then return, this loop
will never have a chance to exit.
I have seen this issue in fio stress tests, to fix this issue, let
io_sq_thread call io_iopoll_getevents() with argument 'min' being zero,
and remove __io_iopoll_check().
Fixes:
a3a0e43fd770 ("io_uring: don't enter poll loop if we have CQEs pending")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 21 Feb 2020 10:08:35 +0000 (11:08 +0100)]
ext4: fix mount failure with quota configured as module
When CONFIG_QFMT_V2 is configured as a module, the test in
ext4_feature_set_ok() fails and so mount of filesystems with quota or
project features fails. Fix the test to use IS_ENABLED macro which
works properly even for modules.
Link: https://lore.kernel.org/r/20200221100835.9332-1-jack@suse.cz
Fixes:
d65d87a07476 ("ext4: improve explanation of a mount failure caused by a misconfigured kernel")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
wangyan [Thu, 20 Feb 2020 13:46:14 +0000 (21:46 +0800)]
jbd2: fix ocfs2 corrupt when clearing block group bits
I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
The running environment:
kernel version: 4.19
A cluster with two nodes, 5 luns mounted on two nodes, and do some
file operations like dd/fallocate/truncate/rm on every lun with storage
network disconnection.
The fallocate operation on dm-23-45 caused an null pointer dereference.
The information of NULL pointer dereference as follows:
[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
[577992.878290] Aborting journal on device dm-23-45.
...
[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
[577992.890908] __journal_remove_journal_head: freeing b_committed_data
[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
[577992.890918] __journal_remove_journal_head: freeing b_committed_data
[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
[577992.890922] __journal_remove_journal_head: freeing b_committed_data
[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
[577992.890928] __journal_remove_journal_head: freeing b_committed_data
[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
[577992.890933] __journal_remove_journal_head: freeing b_committed_data
[577992.890939] __journal_remove_journal_head: freeing b_committed_data
[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address
0000000000000020
[577992.890950] Mem abort info:
[577992.890951] ESR = 0x96000004
[577992.890952] Exception class = DABT (current EL), IL = 32 bits
[577992.890952] SET = 0, FnV = 0
[577992.890953] EA = 0, S1PTW = 0
[577992.890954] Data abort info:
[577992.890955] ISV = 0, ISS = 0x00000004
[577992.890956] CM = 0, WnR = 0
[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp =
00000000f8da07a9
[577992.890960] [
0000000000000020] pgd=
0000000000000000
[577992.890964] Internal error: Oops:
96000004 [#1] SMP
[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
[577992.890971] pstate:
60400009 (nZCv daif +PAN -UAO)
[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
[577992.891084] sp :
ffff0000c8e2b810
[577992.891085] x29:
ffff0000c8e2b820 x28:
0000000000000000
[577992.891087] x27:
00000000000006f3 x26:
ffffa07957b02e70
[577992.891089] x25:
ffff807c59d50000 x24:
00000000000006f2
[577992.891091] x23:
0000000000000001 x22:
ffff807bd39abc30
[577992.891093] x21:
ffff0000811d9000 x20:
ffffa07535d6a000
[577992.891097] x19:
ffff000001681638 x18:
ffffffffffffffff
[577992.891098] x17:
0000000000000000 x16:
ffff000080a03df0
[577992.891100] x15:
ffff0000811d9708 x14:
203d207375746174
[577992.891101] x13:
73203a524f525245 x12:
20373439343a6565
[577992.891103] x11:
0000000000000038 x10:
0101010101010101
[577992.891106] x9 :
ffffa07c68a85d70 x8 :
7f7f7f7f7f7f7f7f
[577992.891109] x7 :
0000000000000000 x6 :
0000000000000080
[577992.891110] x5 :
0000000000000000 x4 :
0000000000000002
[577992.891112] x3 :
ffff000001713390 x2 :
2ff90f88b1c22f00
[577992.891114] x1 :
ffff807bd39abc30 x0 :
0000000000000000
[577992.891116] Call trace:
[577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
[577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
[577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
[577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
[577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
[577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
[577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
[577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
[577992.891316] vfs_fallocate+0x11c/0x250
[577992.891317] ksys_fallocate+0x54/0x88
[577992.891319] __arm64_sys_fallocate+0x28/0x38
[577992.891323] el0_svc_common+0x78/0x130
[577992.891325] el0_svc_handler+0x38/0x78
[577992.891327] el0_svc+0x8/0xc
My analysis process as follows:
ocfs2_fallocate
__ocfs2_change_file_space
ocfs2_allocate_extents
ocfs2_extend_allocation
ocfs2_add_inode_data
ocfs2_add_clusters_in_btree
ocfs2_insert_extent
ocfs2_do_insert_extent
ocfs2_rotate_tree_right
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2_journal_restart
jbd2__journal_restart
/* handle->h_transaction is NULL,
* is_handle_aborted(handle) is true
*/
handle->h_transaction = NULL;
start_this_handle
return -EROFS;
ocfs2_free_clusters
_ocfs2_free_clusters
_ocfs2_free_suballoc_bits
ocfs2_block_group_clear_bits
ocfs2_journal_access_gd
__ocfs2_journal_access
jbd2_journal_get_undo_access
/* I think jbd2_write_access_granted() will
* return true, because do_get_write_access()
* will return -EROFS.
*/
if (jbd2_write_access_granted(...)) return 0;
do_get_write_access
/* handle->h_transaction is NULL, it will
* return -EROFS here, so do_get_write_access()
* was not called.
*/
if (is_handle_aborted(handle)) return -EROFS;
/* bh2jh(group_bh) is NULL, caused NULL
pointer dereference */
undo_bg = (struct ocfs2_group_desc *)
bh2jh(group_bh)->b_committed_data;
If handle->h_transaction == NULL, then jbd2_write_access_granted()
does not really guarantee that journal_head will stay around,
not even speaking of its b_committed_data. The bh2jh(group_bh)
can be removed after ocfs2_journal_access_gd() and before call
"bh2jh(group_bh)->b_committed_data". So, we should move
is_handle_aborted() check from do_get_write_access() into
jbd2_journal_get_undo_access() and jbd2_journal_get_write_access()
before the call to jbd2_write_access_granted().
Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
Signed-off-by: Yan Wang <wangyan122@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Eric Biggers [Wed, 19 Feb 2020 18:30:47 +0000 (10:30 -0800)]
ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
on it, the following warning in ext4_add_complete_io() can be hit:
WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120
Here's a minimal reproducer (not 100% reliable) (root isn't required):
while true; do
sync
done &
while true; do
rm -f file
touch file
chattr -e file
echo X >> file
chattr +e file
done
The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
(which only returns true on extent-based files) is checked once to set
the number of reserved journal credits, and also again later to select
the flags for ext4_map_blocks() and copy the reserved journal handle to
ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set,
the first check can see dioread_nolock disabled while the later one can
see it enabled, causing the reserved handle to unexpectedly be NULL.
Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
related to doing so as well, fix this by synchronizing changing
EXT4_EXTENTS_FL with ext4_writepages() via the existing
s_writepages_rwsem (previously called s_journal_flag_rwsem).
This was originally reported by syzbot without a reproducer at
https://syzkaller.appspot.com/bug?extid=
2202a584a00fffd19fbf,
but now that dioread_nolock is the default I also started seeing this
when running syzkaller locally.
Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org
Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
Fixes:
6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Eric Biggers [Wed, 19 Feb 2020 18:30:46 +0000 (10:30 -0800)]
ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
In preparation for making s_journal_flag_rwsem synchronize
ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
flags (rather than just JOURNAL_DATA as it does currently), rename it to
s_writepages_rwsem.
Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Suraj Jitindar Singh [Wed, 19 Feb 2020 03:08:51 +0000 (19:08 -0800)]
ext4: fix potential race between s_flex_groups online resizing and access
During an online resize an array of s_flex_groups structures gets replaced
so it can get enlarged. If there is a concurrent access to the array and
this memory has been reused then this can lead to an invalid memory access.
The s_flex_group array has been converted into an array of pointers rather
than an array of structures. This is to ensure that the information
contained in the structures cannot get out of sync during a resize due to
an accessor updating the value in the old structure after it has been
copied but before the array pointer is updated. Since the structures them-
selves are no longer copied but only the pointers to them this case is
mitigated.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.edu
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Linus Torvalds [Sat, 22 Feb 2020 00:10:10 +0000 (16:10 -0800)]
Merge tag 'for-linus-5.6-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two small fixes for Xen:
- a fix to avoid warnings with new gcc
- a fix for incorrectly disabled interrupts when calling
_cond_resched()"
* tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Enable interrupts when calling _cond_resched()
x86/xen: Distribute switch variables for initialization
Linus Torvalds [Sat, 22 Feb 2020 00:03:36 +0000 (16:03 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"It's all straightforward apart from the changes to mmap()/mremap() in
relation to their handling of address arguments from userspace with
non-zero tag bits in the upper byte.
The change to brk() is necessary to fix a nasty user-visible
regression in malloc(), but we tightened up mmap() and mremap() at the
same time because they also allow the user to create virtual aliases
by accident. It's much less likely than brk() to matter in practice,
but enforcing the principle of "don't permit the creation of mappings
using tagged addresses" leads to a straightforward ABI without having
to worry about the "but what if a crazy program did foo?" aspect of
things.
Summary:
- Fix regression in malloc() caused by ignored address tags in brk()
- Add missing brackets around argument to untagged_addr() macro
- Fix clang build when using binutils assembler
- Fix silly typo in virtual memory map documentation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()
docs: arm64: fix trivial spelling enought to enough in memory.rst
arm64: memory: Add missing brackets to untagged_addr() macro
arm64: lse: Fix LSE atomics with LLVM
Linus Torvalds [Fri, 21 Feb 2020 23:57:56 +0000 (15:57 -0800)]
Merge tag 'powerpc-5.6-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.6. This is two weeks worth as I was out
sick last week:
- Three fixes for the recently added VMAP_STACK on 32-bit.
- Three fixes related to hugepages on 8xx (32-bit).
- A fix for a bug in our transactional memory handling that could
lead to a kernel crash if we saw a page fault during signal
delivery.
- A fix for a deadlock in our PCI EEH (Enhanced Error Handling) code.
- A couple of other minor fixes.
Thanks to: Christophe Leroy, Erhard F, Frederic Barrat, Gustavo Luiz
Duarte, Larry Finger, Leonardo Bras, Oliver O'Halloran, Sam Bobroff"
* tag 'powerpc-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/entry: Fix an #if which should be an #ifdef in entry_32.S
powerpc/xmon: Fix whitespace handling in getstring()
powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK
powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK
powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK
powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss
powerpc/hugetlb: Fix 8M hugepages on 8xx
powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size
powerpc/eeh: Fix deadlock handling dead PHB
Linus Torvalds [Fri, 21 Feb 2020 21:02:49 +0000 (13:02 -0800)]
Merge tag 'linux-watchdog-5.6-rc3' of git://linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- mtk_wdt needs RESET_CONTROLLER to build
- da9062 driver fixes:
- fix power management ops
- do not ping the hw during stop()
- add dependency on I2C
* tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: da9062: Add dependency on I2C
watchdog: da9062: fix power management ops
watchdog: da9062: do not ping the hw during stop()
watchdog: fix mtk_wdt.c RESET_CONTROLLER build error
Linus Torvalds [Fri, 21 Feb 2020 20:57:05 +0000 (12:57 -0800)]
Merge tag 'char-misc-5.6-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.6-rc3.
Also included in here are some updates for some documentation files
that I seem to be maintaining these days.
The driver fixes are:
- small fixes for the habanalabs driver
- fsi driver bugfix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Documentation/process: Swap out the ambassador for Canonical
habanalabs: patched cb equals user cb in device memset
habanalabs: do not halt CoreSight during hard reset
habanalabs: halt the engines before hard-reset
MAINTAINERS: remove unnecessary ':' characters
fsi: aspeed: add unspecified HAS_IOMEM dependency
COPYING: state that all contributions really are covered by this file
Documentation/process: Change Microsoft contact for embargoed hardware issues
embargoed-hardware-issues: drop Amazon contact as the email address now bounces
Documentation/process: Add Arm contact for embargoed HW issues
Linus Torvalds [Fri, 21 Feb 2020 20:53:53 +0000 (12:53 -0800)]
Merge tag 'staging-5.6-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.6-rc3, along with the
removal of an unused/unneeded driver as well.
The android vsoc driver is not needed anymore by anyone, so it was
removed.
The other driver fixes are:
- ashmem bugfixes
- greybus audio driver bugfix
- wireless driver bugfixes and tiny cleanups to error paths
All of these have been in linux-next for a while now with no reported
issues"
* tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723bs: Remove unneeded goto statements
staging: rtl8188eu: Remove some unneeded goto statements
staging: rtl8723bs: Fix potential overuse of kernel memory
staging: rtl8188eu: Fix potential overuse of kernel memory
staging: rtl8723bs: Fix potential security hole
staging: rtl8188eu: Fix potential security hole
staging: greybus: use after free in gb_audio_manager_remove_all()
staging: android: Delete the 'vsoc' driver
staging: rtl8723bs: fix copy of overlapping memory
staging: android: ashmem: Disallow ashmem memory from being remapped
staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
Linus Torvalds [Fri, 21 Feb 2020 20:48:29 +0000 (12:48 -0800)]
Merge tag 'tty-5.6-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 5.6-rc3
that resolve a bunch of reported issues.
They are:
- vt selection and ioctl fixes
- serdev bugfix
- atmel serial driver fixes
- qcom serial driver fixes
- other minor serial driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: selection, close sel_buffer race
vt: selection, handle pending signals in paste_selection
serial: cpm_uart: call cpm_muram_init before registering console
tty: serial: qcom_geni_serial: Fix RX cancel command failure
serial: 8250: Check UPF_IRQ_SHARED in advance
tty: serial: imx: setup the correct sg entry for tx dma
vt: vt_ioctl: fix race in VT_RESIZEX
vt: fix scrollback flushing on background consoles
tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started
tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode
serdev: ttyport: restore client ops on deregistration
serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
Linus Torvalds [Fri, 21 Feb 2020 20:44:53 +0000 (12:44 -0800)]
Merge tag 'usb-5.6-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.6-rc3.
Included in here are:
- MAINTAINER file updates
- USB gadget driver fixes
- usb core quirk additions and fixes for regressions
- xhci driver fixes
- usb serial driver id additions and fixes
- thunderbolt bugfix
Thunderbolt patches come in through here now that USB4 is really
thunderbolt.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits)
USB: misc: iowarrior: add support for the 100 device
thunderbolt: Prevent crash if non-active NVMem file is read
usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format
USB: misc: iowarrior: add support for the 28 and 28L devices
USB: misc: iowarrior: add support for 2 OEMed devices
USB: Fix novation SourceControl XL after suspend
xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2
Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables"
MAINTAINERS: Sort entries in database for THUNDERBOLT
usb: dwc3: debug: fix string position formatting mixup with ret and len
usb: gadget: serial: fix Tx stall after buffer overflow
usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows
usb: dwc2: Fix in ISOC request length checking
usb: gadget: composite: Support more than 500mA MaxPower
usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
usb: gadget: u_audio: Fix high-speed max packet size
usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields
USB: core: clean up endpoint-descriptor parsing
USB: quirks: blacklist duplicate ep on Sound Devices USBPre2
...
Linus Torvalds [Fri, 21 Feb 2020 20:18:02 +0000 (12:18 -0800)]
Merge tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Varied fixes for rc3.
i915 is the largest, they are seeing some ACPI problems with their CI
which hopefully get solved soon [1].
msm has a bunch of fixes for new hw added in the merge, a bunch of
amdgpu fixes, and nouveau adds support for some new firmwares for
turing tu11x GPUs that were just released into linux-firmware by
nvidia, they operate the same as the ones we already have for tu10x so
should be fine to hook up.
Otherwise it's just misc fixes for panfrost and sun4i.
core:
- Allow only one rotation argument, and allow zero rotation in video
cmdline.
i915:
- Workaround missing Display Stream Compression (DSC) state readout
by forcing modeset when its enabled at probe
- Fix EHL port clock voltage level requirements
- Fix queuing retire workers on the virtual engine
- Fix use of partially initialized waiters
- Stop using drm_pci_alloc/drm_pci/free
- Fix rewind of RING_TAIL by forcing a context reload
- Fix locking on resetting ring->head
- Propagate our bug filing URL change to stable kernels
panfrost:
- Small compiler warning fix for panfrost.
- Fix when using performance counters in panfrost when using per fd
address space.
sun4xi:
- Fix dt binding
nouveau:
- tu11x modesetting fix
- ACR/GR firmware support for tu11x (fw is public now)
msm:
- fix UBWC on GPU and display side for sc7180
- fix DSI suspend/resume issue encountered on sc7180
- fix some breakage on so called "linux-android" devices
(fallout from sc7180/a618 support, not seen earlier due to
bootloader/firmware differences)
- couple other misc fixes
amdgpu:
- HDCP fixes
- xclk fix for raven
- GFXOFF fixes"
[1] The Intel suspend testing should now be fixed by commit
63fb9623427f
("ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()")
* tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm: (39 commits)
drm/amdgpu/display: clean up hdcp workqueue handling
drm/amdgpu: add is_raven_kicker judgement for raven1
drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex
drm/i915/execlists: Always force a context reload when rewinding RING_TAIL
drm/i915: Wean off drm_pci_alloc/drm_pci_free
drm/i915/gt: Protect defer_request() from new waiters
drm/i915/gt: Prevent queuing retire workers on the virtual engine
drm/i915/dsc: force full modeset whenever DSC is enabled at probe
drm/i915/ehl: Update port clock voltage level requirements
drm/i915: Update drm/i915 bug filing URL
MAINTAINERS: Update drm/i915 bug filing URL
drm/i915: Initialise basic fence before acquiring seqno
drm/i915/gem: Require per-engine reset support for non-persistent contexts
drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets
drm/nouveau/gr/tu11x: initial support
drm/nouveau/acr/tu11x: initial support
drm/amdgpu/gfx10: disable gfxoff when reading rlc clock
drm/amdgpu/gfx9: disable gfxoff when reading rlc clock
drm/amdgpu/soc15: fix xclk for raven
drm/amd/powerplay: always refetch the enabled features status on dpm enablement
...
Linus Torvalds [Fri, 21 Feb 2020 19:59:51 +0000 (11:59 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from
Cong Wang.
2) Fix deadlock in xsk by publishing global consumer pointers when NAPI
is finished, from Magnus Karlsson.
3) Set table field properly to RT_TABLE_COMPAT when necessary, from
Jethro Beekman.
4) NLA_STRING attributes are not necessary NULL terminated, deal wiht
that in IFLA_ALT_IFNAME. From Eric Dumazet.
5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov.
6) Handle mtu==0 devices properly in wireguard, from Jason A.
Donenfeld.
7) Fix several lockdep warnings in bonding, from Taehee Yoo.
8) Fix cls_flower port blocking, from Jason Baron.
9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen.
10) Fix RDMA race in qede driver, from Michal Kalderon.
11) Fix several false lockdep warnings by adding conditions to
list_for_each_entry_rcu(), from Madhuparna Bhowmik.
12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen.
13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song.
14) Hey, variables declared in switch statement before any case
statements are not initialized. I learn something every day. Get
rids of this stuff in several parts of the networking, from Kees
Cook.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.
bnxt_en: Improve device shutdown method.
net: netlink: cap max groups which will be considered in netlink_bind()
net: thunderx: workaround BGX TX Underflow issue
ionic: fix fw_status read
net: disable BRIDGE_NETFILTER by default
net: macb: Properly handle phylink on at91rm9200
s390/qeth: fix off-by-one in RX copybreak check
s390/qeth: don't warn for napi with 0 budget
s390/qeth: vnicc Fix EOPNOTSUPP precedence
openvswitch: Distribute switch variables for initialization
net: ip6_gre: Distribute switch variables for initialization
net: core: Distribute switch variables for initialization
udp: rehash on disconnect
net/tls: Fix to avoid gettig invalid tls record
bpf: Fix a potential deadlock with bpf_map_do_batch
bpf: Do not grab the bucket spinlock by default on htab batch ops
ice: Wait for VF to be reset/ready before configuration
ice: Don't tell the OS that link is going down
ice: Don't reject odd values of usecs set by user
...
Linus Torvalds [Fri, 21 Feb 2020 19:40:10 +0000 (11:40 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
- A few y2038 fixes which missed the merge window while dependencies
in NFS were being sorted out.
- A bunch of fixes. Some minor, some not.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MAINTAINERS: use tabs for SAFESETID
lib/stackdepot.c: fix global out-of-bounds in stack_slabs
mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
mm/vmscan.c: don't round up scan size for online memory cgroup
lib/string.c: update match_string() doc-strings with correct behavior
mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()
mm/swapfile.c: fix a comment in sys_swapon()
scripts/get_maintainer.pl: deprioritize old Fixes: addresses
get_maintainer: remove uses of P: for maintainer name
selftests/vm: add missed tests in run_vmtests
include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap
Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
y2038: hide timeval/timespec/itimerval/itimerspec types
y2038: remove unused time32 interfaces
y2038: remove ktime to/from timespec/timeval conversion
Randy Dunlap [Fri, 21 Feb 2020 04:04:33 +0000 (20:04 -0800)]
MAINTAINERS: use tabs for SAFESETID
Use tabs for indentation instead of spaces for SAFESETID. All (!) other
entries in MAINTAINERS use tabs (according to my simple grepping).
Link: http://lkml.kernel.org/r/2bb2e52a-2694-816d-57b4-6cabfadd6c1a@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Micah Morton <mortonm@chromium.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Potapenko [Fri, 21 Feb 2020 04:04:30 +0000 (20:04 -0800)]
lib/stackdepot.c: fix global out-of-bounds in stack_slabs
Walter Wu has reported a potential case in which init_stack_slab() is
called after stack_slabs[STACK_ALLOC_MAX_SLABS - 1] has already been
initialized. In that case init_stack_slab() will overwrite
stack_slabs[STACK_ALLOC_MAX_SLABS], which may result in a memory
corruption.
Link: http://lkml.kernel.org/r/20200218102950.260263-1-glider@google.com
Fixes:
cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Walter Wu <walter-zh.wu@mediatek.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Fri, 21 Feb 2020 04:04:27 +0000 (20:04 -0800)]
mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page()
doesn't work before sparse_init_one_section() is called.
This leads to a crash when hotplug memory:
BUG: unable to handle page fault for address:
0000000006400000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G W 5.5.0-next-
20200205+ #343
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:__memset+0x24/0x30
Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
RSP: 0018:
ffffb43ac0373c80 EFLAGS:
00010a87
RAX:
ffffffffffffffff RBX:
ffff8a1518800000 RCX:
0000000000050000
RDX:
0000000000000000 RSI:
00000000000000ff RDI:
0000000006400000
RBP:
0000000000140000 R08:
0000000000100000 R09:
0000000006400000
R10:
0000000000000000 R11:
0000000000000002 R12:
0000000000000000
R13:
0000000000000028 R14:
0000000000000000 R15:
ffff8a153ffd9280
FS:
0000000000000000(0000) GS:
ffff8a153ab00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000006400000 CR3:
0000000136fca000 CR4:
00000000000006e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
sparse_add_section+0x1c9/0x26a
__add_pages+0xbf/0x150
add_pages+0x12/0x60
add_memory_resource+0xc8/0x210
__add_memory+0x62/0xb0
acpi_memory_device_add+0x13f/0x300
acpi_bus_attach+0xf6/0x200
acpi_bus_scan+0x43/0x90
acpi_device_hotplug+0x275/0x3d0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x1a7/0x370
worker_thread+0x30/0x380
kthread+0x112/0x130
ret_from_fork+0x35/0x40
We should use memmap as it did.
On x86 the impact is limited to x86_32 builds, or x86_64 configurations
that override the default setting for SPARSEMEM_VMEMMAP.
Other memory hotplug archs (arm64, ia64, and ppc) also default to
SPARSEMEM_VMEMMAP=y.
[dan.j.williams@intel.com: changelog update]
{rppt@linux.ibm.com: changelog update]
Link: http://lkml.kernel.org/r/20200219030454.4844-1-bhe@redhat.com
Fixes:
ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>