platform/kernel/linux-rpi.git
19 months ago powerpc/feature-fixups: Refactor entry fixups patching
Christophe Leroy [Fri, 2 Dec 2022 08:31:40 +0000 (09:31 +0100)]
powerpc/feature-fixups: Refactor entry fixups patching

Several functions have the same loop for patching instructions.

Introduce function do_patch_entry_fixups() to refactor those loops.
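
As a hedged sketch, the refactored helper could look roughly like this
(the parameter names and the fallback handling are assumptions, not the
exact upstream signature):

  static int do_patch_entry_fixups(long *start, long *end,
                                   unsigned int *instrs, bool do_fallback,
                                   void *fallback)
  {
          int i;

          /* Each table entry is a self-relative offset to its patch site. */
          for (i = 0; start < end; start++, i++) {
                  unsigned int *dest = (void *)start + *start;

                  patch_instruction(dest, ppc_inst(instrs[0]));
                  /* remaining instructions / fallback patching elided */
          }
          return i;
  }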

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/79eeff7b20a98f7136da5f79b1f7c436928f27f3.1669969781.git.christophe.leroy@csgroup.eu
19 months ago powerpc/code-patching: Remove #ifdef CONFIG_STRICT_KERNEL_RWX
Christophe Leroy [Fri, 2 Dec 2022 08:31:39 +0000 (09:31 +0100)]
powerpc/code-patching: Remove #ifdef CONFIG_STRICT_KERNEL_RWX

No need to have one implementation of patch_instruction() for
CONFIG_STRICT_KERNEL_RWX and one for !CONFIG_STRICT_KERNEL_RWX.

In patch_instruction(), call raw_patch_instruction() when
!CONFIG_STRICT_KERNEL_RWX.

In poking_init(), bail out immediately; it is then equivalent
to the weak default implementation.

Everything else is declared static and will be discarded by
GCC when !CONFIG_STRICT_KERNEL_RWX.
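
A minimal sketch of the resulting shape of patch_instruction(), assuming
a do_patch_instruction() helper for the strict-RWX path (the upstream
code has more detail):

  int patch_instruction(u32 *addr, ppc_inst_t instr)
  {
          /* Without strict kernel RWX, text is writable: patch directly. */
          if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
                  return raw_patch_instruction(addr, instr);

          /* Strict-RWX path: map the page writable, then patch. */
          return do_patch_instruction(addr, instr);
  }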

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f67d2a109404d03e8fdf1ea15388c8778337a76b.1669969781.git.christophe.leroy@csgroup.eu
19 months ago powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1
Michael Jeanson [Thu, 1 Dec 2022 16:14:42 +0000 (11:14 -0500)]
powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1

In v5.7 the powerpc syscall entry/exit logic was rewritten in C; on
PPC64_ELF_ABI_V1 this resulted in the symbols in the syscall table
changing from their dot-prefixed variant to the non-prefixed ones.

Since ftrace prefixes a dot to the syscall names when matching them to
build its syscall event list, this resulted in no syscall events being
available.

Remove the PPC64_ELF_ABI_V1 specific version of
arch_syscall_match_sym_name to have the same behavior across all powerpc
variants.
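
For reference, a simplified sketch of the dot-skipping matcher being
removed; the generic default that now applies everywhere is a plain
strcmp() of symbol and name:

  /* Removed PPC64_ELF_ABI_V1 variant (simplified): skips the leading
   * '.' of ELFv1 dot-symbols, which the syscall table no longer uses. */
  static inline bool arch_syscall_match_sym_name(const char *sym,
                                                 const char *name)
  {
          return !strcmp(sym + 1, name);
  }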

Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201161442.2127231-1-mjeanson@efficios.com
19 months ago powerpc/64: Sanitise user registers on interrupt in pseries, POWERNV
Rohan McLure [Thu, 1 Dec 2022 07:10:19 +0000 (18:10 +1100)]
powerpc/64: Sanitise user registers on interrupt in pseries, POWERNV

Cause pseries and POWERNV platforms to default to zeroising all potentially
user-defined registers when entering the kernel by means of any interrupt
source, reducing user influence on the kernel and the likelihood of
producing speculation gadgets.

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-7-rmclure@linux.ibm.com
19 months ago powerpc/64e: Clear gprs on interrupt routine entry on Book3E
Rohan McLure [Thu, 1 Dec 2022 07:10:18 +0000 (18:10 +1100)]
powerpc/64e: Clear gprs on interrupt routine entry on Book3E

Zero GPRs r14-r31 on entry into the kernel for interrupt sources to
limit the influence of user-space values in potential speculation gadgets.
Prior to this commit, all other GPRs are reassigned during the common
prologue to interrupt handlers and so need not be zeroised explicitly.

This may be done safely, without loss of register state prior to the
interrupt, as the common prologue saves the initial values of
non-volatiles, which are unconditionally restored in interrupt_64.S.
Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-6-rmclure@linux.ibm.com
19 months ago powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S
Rohan McLure [Thu, 1 Dec 2022 07:10:17 +0000 (18:10 +1100)]
powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S

Zeroise user state in gprs (assign to zero) to reduce the influence of user
registers on speculation within kernel syscall handlers. Clears occur
at the very beginning of the sc and scv 0 interrupt handlers, with
restores occurring following the execution of the syscall handler.

Zeroise GPRs r0, r2-r11 and r14-r31 on entry into the kernel for all
other interrupt sources. The remaining gprs are overwritten by
entry macros to interrupt handlers, irrespective of whether or not a
given handler consumes these register values. If an interrupt does not
select the IMSR_R12 IOption, zeroise r12.

Prior to this commit, r14-r31 are restored on a per-interrupt basis at
exit, but now they are always restored on 64-bit Book3S. Remove explicit
REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear
user registers on interrupt, and continue to depend on the return value
of interrupt_exit_user_prepare to determine whether or not to restore
non-volatiles.

The mmap_bench benchmark in selftests should rapidly invoke page faults.
We see a ~0.8% performance regression with this mitigation, but this
indicates the worst case due to heavier-weight interrupt handlers. The
mitigation can be enabled/disabled through
CONFIG_INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-5-rmclure@linux.ibm.com
19 months ago powerpc/64s: IOption for MSR stored in r12
Rohan McLure [Thu, 1 Dec 2022 07:10:16 +0000 (18:10 +1100)]
powerpc/64s: IOption for MSR stored in r12

Interrupt handlers in asm/exceptions-64s.S contain a great deal of common
code produced by the GEN_COMMON macros. Currently, at the exit point of
the macro, r12 will contain the contents of the MSR. A future patch will
cause these macros to zeroise architected registers to avoid potential
speculation influence of user data.

Provide an IOption that signals that r12 must be retained, as the
interrupt handler assumes it to hold the contents of the MSR.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-4-rmclure@linux.ibm.com
19 months ago powerpc/64: Sanitise common exit code for interrupts
Rohan McLure [Thu, 1 Dec 2022 07:10:15 +0000 (18:10 +1100)]
powerpc/64: Sanitise common exit code for interrupts

Interrupt handler code is shared between 64-bit Book3E and Book3S
systems. Ensure that the exit code correctly restores non-volatile gprs on
each system when CONFIG_INTERRUPT_SANITIZE_REGISTERS is enabled.

Also introduce macros for clearing/restoring registers on interrupt
entry for when this configuration option is either disabled or enabled.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-3-rmclure@linux.ibm.com
19 months ago powerpc/64: Add interrupt register sanitisation macros
Rohan McLure [Thu, 1 Dec 2022 07:10:14 +0000 (18:10 +1100)]
powerpc/64: Add interrupt register sanitisation macros

Include in asm/ppc_asm.h macros to be used in multiple successive
patches to implement zeroising architected registers in interrupt
handlers. Registers will be sanitised in this fashion in future patches
to reduce the speculation influence of user-controlled register values.
These mitigations will be configurable through the
CONFIG_INTERRUPT_SANITIZE_REGISTERS Kconfig option.

Included are macros for conditionally zeroising registers and restoring
as required with the mitigation enabled. With the mitigation disabled,
non-volatiles must be restored on demand at separate locations to
those required by the mitigation.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-2-rmclure@linux.ibm.com
19 months ago powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig
Rohan McLure [Thu, 1 Dec 2022 07:10:13 +0000 (18:10 +1100)]
powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig

Add Kconfig option for enabling clearing of registers on arrival in an
interrupt handler. This reduces the speculation influence of registers
on kernel internals. The option will be consumed by 64-bit systems that
feature speculation and wish to implement this mitigation.

This patch only introduces the Kconfig option, no actual mitigations.

The primary overhead of this mitigation lies in an increased number of
registers that must be saved and restored by interrupt handlers on
Book3S systems. Enable it by default on Book3E systems, which already
eagerly save and restore register state prior to this patch, meaning
that the mitigation will have minimal overhead when implemented.

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221201071019.1953023-1-rmclure@linux.ibm.com
19 months ago powerpc/hv-gpci: Fix hv_gpci event list
Kajol Jain [Wed, 30 Nov 2022 17:45:13 +0000 (23:15 +0530)]
powerpc/hv-gpci: Fix hv_gpci event list

Based on getPerfCountInfo v1.018 documentation, some of the
hv_gpci events were deprecated for platform firmware that
supports counter_info_version 0x8 or above.

Fix the hv_gpci event list by adding a new attribute group
called "hv_gpci_event_attrs_v6" and an "ENABLE_EVENTS_COUNTERINFO_V6"
macro to enable these events for platform firmware
that supports counter_info_version 0x6 or below. The hv_gpci event
list is then assigned based on the counter info version reported by
the underlying platform.
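
A hedged sketch of the selection logic, using the identifier names from
the text (the helper and the surrounding driver code here are assumed):

  /* Pick the event list based on the firmware's counter info version. */
  static void hv_gpci_select_events(struct attribute_group *event_group,
                                    u8 counter_info_version)
  {
          if (counter_info_version <= 0x6)
                  event_group->attrs = hv_gpci_event_attrs_v6;
          else
                  event_group->attrs = hv_gpci_event_attrs;
  }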

Fixes: 97bf2640184f ("powerpc/perf/hv-gpci: add the remaining gpci requests")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221130174513.87501-1-kjain@linux.ibm.com
19 months ago powerpc/83xx/mpc832x_rdb: call platform_device_put() in error case in of_fsl_spi_probe()
Yang Yingliang [Sat, 29 Oct 2022 11:16:26 +0000 (19:16 +0800)]
powerpc/83xx/mpc832x_rdb: call platform_device_put() in error case in of_fsl_spi_probe()

If platform_device_add() is not called or fails, platform_device_del()
cannot be used to clean up the memory; platform_device_put() must be
called instead in the error path.
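
The general pattern, as a hedged sketch (the device name and surrounding
details are illustrative only):

  struct platform_device *pdev;
  int ret;

  pdev = platform_device_alloc("fsl_spi", 0);     /* holds a reference */
  if (!pdev)
          return -ENOMEM;

  ret = platform_device_add(pdev);
  if (ret) {
          /* add failed (or was never reached): drop the reference with
           * _put(); _del() is only valid after a successful add. */
          platform_device_put(pdev);
          return ret;
  }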

Fixes: 26f6cb999366 ("[POWERPC] fsl_soc: add support for fsl_spi")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221029111626.429971-1-yangyingliang@huawei.com
19 months ago Merge branch 'topic/qspinlock' into next
Michael Ellerman [Fri, 2 Dec 2022 07:04:56 +0000 (18:04 +1100)]
Merge branch 'topic/qspinlock' into next

Merge Nick's powerpc qspinlock implementation. From his cover letter:

This replaces the generic queued spinlock code (like s390 does) with our
own implementation.

Generic PV qspinlock code is causing latency / starvation regressions on
large systems, resulting in reported hard lockups (mostly in
pathological cases). The generic qspinlock code has a number of issues
important for powerpc hardware and hypervisors that aren't easily solved
without changing code that would impact other architectures. Follow
s390's lead and implement our own for now.

Issues for powerpc using generic qspinlocks:
  - The previous lock value should not be loaded with simple loads, and
    need not be passed around from previous loads or cmpxchg results,
    because powerpc uses ll/sc-style atomics which can perform more
    complex operations that do not require this. powerpc implementations
    tend to prefer that loads use larx for improved coherency performance.
  - The queueing process should absolutely minimise the number of stores
    to the lock word to reduce exclusive coherency probes, important for
    large system scalability. The pending logic is counterproductive
    here.
  - Non-atomic unlock for paravirt locks is important (atomic
    instructions tend to still be more expensive than on x86 CPUs).
  - Yielding to the lock owner is important in the oversubscribed
    paravirt case, which requires storing the owner CPU in the lock
    word.
  - More control of lock stealing for the paravirt case is important to
    keep latency down on large systems.
  - The lock acquisition operation should always be made with a special
    variant of atomic instructions with the lock hint bit set,
    including (especially) in the queueing paths. This is more a matter
    of adding more arch lock helpers so not an insurmountable problem
    for generic code.

19 months ago selftests/powerpc: Add ptrace setup_core_pattern() null-terminator
Benjamin Gray [Mon, 28 Nov 2022 04:19:43 +0000 (15:19 +1100)]
selftests/powerpc: Add ptrace setup_core_pattern() null-terminator

- malloc() does not zero the buffer,
- fread() does not null-terminate its output,
- `cat /proc/sys/kernel/core_pattern | hexdump -C` shows the file is
  not inherently null-terminated

So using string operations on the buffer is risky. Explicitly add a null
character to the end to make it safer.
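
A small userspace sketch of the safer read (the buffer size and error
handling here are illustrative):

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
          char *buf = malloc(4096);       /* malloc() does not zero */
          FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
          size_t n;

          if (!buf || !f)
                  return 1;
          n = fread(buf, 1, 4095, f);     /* leave room for the NUL */
          buf[n] = '\0';                  /* make string ops safe */
          printf("%s", buf);
          return 0;
  }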

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-3-bgray@linux.ibm.com
19 months ago selftests/powerpc: Use mfspr/mtspr macros
Benjamin Gray [Mon, 28 Nov 2022 04:19:42 +0000 (15:19 +1100)]
selftests/powerpc: Use mfspr/mtspr macros

No need to write inline asm for mtspr/mfspr; we have macros for this
in reg.h.
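
Usage then reduces to something like the following (assuming reg.h
provides the SPR number, as it does for common SPRs):

  #include "reg.h"        /* selftest helpers: mfspr()/mtspr() */

  static void frob_dscr(void)
  {
          unsigned long dscr = mfspr(SPRN_DSCR); /* read the SPR */

          mtspr(SPRN_DSCR, dscr | 1);            /* write it back modified */
  }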

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-2-bgray@linux.ibm.com
19 months ago selftests: powerpc: Use "grep -E" instead of "egrep"
Tiezhu Yang [Thu, 1 Dec 2022 02:49:57 +0000 (10:49 +0800)]
selftests: powerpc: Use "grep -E" instead of "egrep"

The latest version of grep claims egrep is now obsolete, so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
Fix this by using "grep -E" instead.

  sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/powerpc`

Here are the steps to install the latest grep:

  wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
  tar xf grep-3.8.tar.gz
  cd grep-3.8 && ./configure && make
  sudo make install
  export PATH=/usr/local/bin:$PATH

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1669862997-31335-1-git-send-email-yangtiezhu@loongson.cn
19 months ago powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults
Nicholas Piggin [Mon, 24 Oct 2022 03:01:50 +0000 (13:01 +1000)]
powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults

This option increases the number of hash misses by limiting the number
of kernel HPT entries, by keeping a per-CPU record of the last kernel
HPTEs installed, and removing that from the hash table on the next hash
insertion. A timer round-robins CPUs removing remaining kernel HPTEs and
clearing the TLB (in the case of bare metal) to increase and slightly
randomise kernel fault activity.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add comment about NR_CPUS usage, fixup whitespace]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221024030150.852517-1-npiggin@gmail.com
19 months ago powerpc: remove STACK_FRAME_OVERHEAD
Nicholas Piggin [Sun, 27 Nov 2022 12:49:42 +0000 (22:49 +1000)]
powerpc: remove STACK_FRAME_OVERHEAD

This is equal to STACK_FRAME_MIN_SIZE on 32-bit and 64-bit ELFv1, and no
longer used in 64-bit ELFv2, so replace STACK_FRAME_OVERHEAD occurrences
with STACK_FRAME_MIN_SIZE.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-18-npiggin@gmail.com
19 months ago powerpc/64: ELFv2 use minimal stack frames in int and switch frame sizes
Nicholas Piggin [Sun, 27 Nov 2022 12:49:41 +0000 (22:49 +1000)]
powerpc/64: ELFv2 use minimal stack frames in int and switch frame sizes

Adjust the ELFv2 interrupt and switch frames to the minimum C ABI size,
plus pt_regs, plus 16 bytes for the aligned regs marker for the int
frame (and the switch frame needs to match that because it uses the same
regs offset as the int frame).

This saves 80 bytes of kernel stack per interrupt. It's the principle of
getting our accounting right that's more important than the practical
saving.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-17-npiggin@gmail.com
19 months ago powerpc: allow minimum sized kernel stack frames
Nicholas Piggin [Sun, 27 Nov 2022 12:49:40 +0000 (22:49 +1000)]
powerpc: allow minimum sized kernel stack frames

This affects only 64-bit ELFv2 kernels, and reduces the minimum
asm-created stack frame size from 112 to 32 bytes on those kernels.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-16-npiggin@gmail.com
19 months ago powerpc: split validate_sp into two functions
Nicholas Piggin [Sun, 27 Nov 2022 12:49:39 +0000 (22:49 +1000)]
powerpc: split validate_sp into two functions

Most callers just want to validate an arbitrary kernel stack pointer,
some need a particular size. Make the size case the exceptional one
with an extra function.
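
The resulting split looks roughly like this (a sketch based on the
description):

  /* The exceptional case: validate a frame of a specific size. */
  int validate_sp_size(unsigned long sp, struct task_struct *p,
                       unsigned long nbytes);

  /* The common case: any valid kernel stack pointer. */
  static inline int validate_sp(unsigned long sp, struct task_struct *p)
  {
          return validate_sp_size(sp, p, STACK_FRAME_MIN_SIZE);
  }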

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-15-npiggin@gmail.com
19 months ago powerpc: copy_thread add a back chain to the switch stack frame
Nicholas Piggin [Sun, 27 Nov 2022 12:49:38 +0000 (22:49 +1000)]
powerpc: copy_thread add a back chain to the switch stack frame

Stack unwinders need LR and the back chain as a minimum. The switch
stack uses regs->nip for its return pointer rather than lrsave, so
that was not set in the fork frame, and neither was the back chain.
This change sets those fields in the stack.

With this and the previous change, a stack trace in the switch or
interrupt stack goes from looking like this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 3 PID: 90 Comm: systemd Not tainted
  NIP:  c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff
  [ ... regs ... ]
  NIP [c000000000011060] _switch+0x160/0x17c
  LR [c000000000010f68] _switch+0x68/0x17c
  Call Trace:

To this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 0 PID: 93 Comm: systemd Not tainted
  NIP:  c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff
  [ ... regs ... ]
  NIP [c000000000011060] _switch+0x160/0x17c
  LR [c000000000010f68] _switch+0x68/0x17c
  Call Trace:
  [c000000005a93e10] [c00000000000cdbc] ret_from_fork_scv+0x0/0x54
  --- interrupt: 3000 at 0x7fffa72f56d8
  NIP:  00007fffa72f56d8 LR: 0000000000000000 CTR: 0000000000000000
  [ ... regs ... ]
  NIP [00007fffa72f56d8] 0x7fffa72f56d8
  LR [0000000000000000] 0x0
  --- interrupt: 3000

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-14-npiggin@gmail.com
19 months ago powerpc: copy_thread fill in interrupt frame marker and back chain
Nicholas Piggin [Sun, 27 Nov 2022 12:49:37 +0000 (22:49 +1000)]
powerpc: copy_thread fill in interrupt frame marker and back chain

Backtraces will not recognise the fork system call interrupt without
the regs marker. And regular interrupt entry from userspace creates
the back chain to the user stack, so do this for the initial fork
frame too, to be consistent.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-13-npiggin@gmail.com
19 months ago powerpc: add a define for the switch frame size and regs offset
Nicholas Piggin [Sun, 27 Nov 2022 12:49:36 +0000 (22:49 +1000)]
powerpc: add a define for the switch frame size and regs offset

This is open-coded in process.c, ppc32 uses a different define with the
same value, and the C definition is named differently, which makes it an
extra indirection to grep for.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-12-npiggin@gmail.com
19 months ago powerpc: add a define for the user interrupt frame size
Nicholas Piggin [Sun, 27 Nov 2022 12:49:35 +0000 (22:49 +1000)]
powerpc: add a define for the user interrupt frame size

The user interrupt frame is a different size from the kernel frame, so
give it its own name.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-11-npiggin@gmail.com
19 months ago powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset
Nicholas Piggin [Sun, 27 Nov 2022 12:49:34 +0000 (22:49 +1000)]
powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset

This is a count of longs from the stack pointer to the regs marker.
Rename it to make it more distinct from the other byte offsets. It
can be derived from the byte offset definitions just added.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-10-npiggin@gmail.com
19 months ago powerpc: add a definition for the marker offset within the interrupt frame
Nicholas Piggin [Sun, 27 Nov 2022 12:49:33 +0000 (22:49 +1000)]
powerpc: add a definition for the marker offset within the interrupt frame

Define a constant rather than open-code the offset for the
"regs" marker.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-9-npiggin@gmail.com
19 months ago powerpc: add definition for pt_regs offset within an interrupt frame
Nicholas Piggin [Sun, 27 Nov 2022 12:49:32 +0000 (22:49 +1000)]
powerpc: add definition for pt_regs offset within an interrupt frame

This is a common offset that currently uses the overloaded
STACK_FRAME_OVERHEAD constant. It's easier to read and more
flexible to use a specific regs offset for this.
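
A sketch of the kind of definition added (the name is taken from the
commit title; the value shown is an assumption):

  /* Byte offset of pt_regs within an interrupt frame, instead of
   * overloading STACK_FRAME_OVERHEAD for the same purpose. */
  #define STACK_INT_FRAME_REGS    STACK_FRAME_MIN_SIZE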

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-8-npiggin@gmail.com
19 months ago powerpc: simplify ppc_save_regs
Nicholas Piggin [Sun, 27 Nov 2022 12:49:31 +0000 (22:49 +1000)]
powerpc: simplify ppc_save_regs

Adjust the pt_regs pointer so the interrupt frame offsets can be used
to save registers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-7-npiggin@gmail.com
19 months ago powerpc/pseries: hvcall stack frame overhead
Nicholas Piggin [Sun, 27 Nov 2022 12:49:30 +0000 (22:49 +1000)]
powerpc/pseries: hvcall stack frame overhead

This call may use the min size stack frame. The scratch space used is
in the caller's parameter area frame, not this function's frame.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-6-npiggin@gmail.com
19 months ago powerpc: Rearrange copy_thread child stack creation
Nicholas Piggin [Sun, 27 Nov 2022 12:49:29 +0000 (22:49 +1000)]
powerpc: Rearrange copy_thread child stack creation

This makes it a bit clearer where the stack frame is created, and will
allow easier use of some of the stack offset constants in a later
change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-5-npiggin@gmail.com
19 months ago powerpc/perf: callchain validate kernel stack pointer bounds
Nicholas Piggin [Sun, 27 Nov 2022 12:49:28 +0000 (22:49 +1000)]
powerpc/perf: callchain validate kernel stack pointer bounds

The interrupt frame detection and loads from the hypothetical pt_regs
are not bounds-checked. The next-frame validation only bounds-checks
STACK_FRAME_OVERHEAD, which does not include the pt_regs. Add another
test for this.

The user could set r1 to be equal to the address matching the first
interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
due to the kernel redzone, and induce the kernel to load the marker from
there. Possibly this could cause a crash at least. If the user could
induce the previous page to contain a valid marker, then it might be
able to direct perf to read specific memory addresses in a way that
could be transmitted back to the user in the perf data.
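
A hedged sketch of the added check, reusing the validate_sp_size()
helper introduced earlier in this series (its exact placement in the
unwind loop differs):

  /* Before treating the frame as an interrupt frame, make sure the
   * hypothetical pt_regs would lie entirely within the kernel stack. */
  if (!validate_sp_size(sp, current, STACK_INT_FRAME_SIZE))
          return;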

Fixes: 20002ded4d93 ("perf_counter: powerpc: Add callchain support")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-4-npiggin@gmail.com
19 months ago powerpc/64: Remove asm interrupt tracing call helpers
Nicholas Piggin [Sun, 27 Nov 2022 12:49:27 +0000 (22:49 +1000)]
powerpc/64: Remove asm interrupt tracing call helpers

These are now unused. Remove.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-3-npiggin@gmail.com
19 months ago powerpc/64: Option to build big-endian with ELFv2 ABI
Nicholas Piggin [Mon, 28 Nov 2022 04:15:39 +0000 (14:15 +1000)]
powerpc/64: Option to build big-endian with ELFv2 ABI

Provide an option to build big-endian kernels using the ELFv2 ABI. This
works on GCC only for now. Clang is rumored to support this, but core
build files need updating first, at least.

This gives big-endian kernels useful advantages of the ELFv2 ABI, e.g.,
less stack usage, -mprofile-kernel support, better compatibility with
eBPF tools.

BE+ELFv2 is not officially supported by the GNU toolchain, but it works
fine in testing and has been used by some userspace for some time (e.g.,
Void Linux).

Tested-by: Michal Suchánek <msuchanek@suse.de>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041539.1742489-5-npiggin@gmail.com
19 months ago powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation
Nicholas Piggin [Mon, 28 Nov 2022 04:15:38 +0000 (14:15 +1000)]
powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation

This allows asm generation for big-endian ELFv2 builds.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041539.1742489-4-npiggin@gmail.com
19 months ago powerpc/64: Add module check for ELF ABI version
Nicholas Piggin [Mon, 28 Nov 2022 04:15:37 +0000 (14:15 +1000)]
powerpc/64: Add module check for ELF ABI version

Override the generic module ELF check to provide a check for the ELF ABI
version. This becomes important if we allow big-endian ELF ABI V2 builds
but it doesn't hurt to check now.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041539.1742489-3-npiggin@gmail.com
19 months ago module: add module_elf_check_arch for module-specific checks
Nicholas Piggin [Mon, 28 Nov 2022 04:15:36 +0000 (14:15 +1000)]
module: add module_elf_check_arch for module-specific checks

The elf_check_arch() function is also used to test compatibility of
usermode binaries. Kernel modules may have more specific requirements,
for example powerpc would like to test for ABI version compatibility.

Add a weak module_elf_check_arch() that defaults to true, and call it
from elf_validity_check().
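
The weak default is trivial; a sketch consistent with the description:

  /* kernel/module: architectures override this to apply extra
   * module-specific checks; the default accepts everything. */
  bool __weak module_elf_check_arch(Elf_Ehdr *hdr)
  {
          return true;
  }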

Signed-off-by: Jessica Yu <jeyu@kernel.org>
[np: added changelog, adjust name, rebase]
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041539.1742489-2-npiggin@gmail.com
19 months ago powerpc/code-patching: Consolidate and cache per-cpu patching context
Benjamin Gray [Wed, 9 Nov 2022 04:51:12 +0000 (15:51 +1100)]
powerpc/code-patching: Consolidate and cache per-cpu patching context

With the temp mm context support, there are CPU local variables to hold
the patch address and pte. Use these in the non-temp mm path as well
instead of adding a level of indirection through the text_poke_area
vm_struct and pointer chasing the pte.

As both paths use these fields now, there is no need to let unreferenced
variables be dropped by the compiler, so it is cleaner to merge them
into a single context struct. This has the additional benefit of
removing a redundant CPU local pointer, as only one of cpu_patching_mm /
text_poke_area is ever used, while remaining well-typed. It also groups
each CPU's data into a single cacheline.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Shorten name to 'area' as suggested by Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-10-bgray@linux.ibm.com
19 months ago powerpc/code-patching: Use temporary mm for Radix MMU
Christopher M. Riedl [Wed, 9 Nov 2022 04:51:11 +0000 (15:51 +1100)]
powerpc/code-patching: Use temporary mm for Radix MMU

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. Another benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use - this may include breakpoints set by perf.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm using mm_alloc(). Choose a
randomized patching address inside the temporary mm userspace address
space. The patching address is randomized between PAGE_SIZE and
DEFAULT_MAP_WINDOW-PAGE_SIZE.

Bits of entropy with 64K page size on BOOK3S_64:

bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
bits of entropy = log2(128TB / 64K)
bits of entropy = 31

The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU
operates - by default the space above DEFAULT_MAP_WINDOW is not
available. Currently the Hash MMU does not use a temporary mm so
technically this upper limit isn't necessary; however, a larger
randomization range does not further "harden" this overall approach and
future work may introduce patching with a temporary mm on Hash as well.

Randomization occurs only once during initialization for each CPU as it
comes online.

The patching page is mapped with PAGE_KERNEL to set EAA[0] for the PTE
which ignores the AMR (so no need to unlock/lock KUAP) according to
PowerISA v3.0b Figure 35 on Radix.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

and:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

From: Benjamin Gray <bgray@linux.ibm.com>

Synchronisation is done according to ISA 3.1B Book 3 Chapter 13
"Synchronization Requirements for Context Alterations". Switching the mm
is a change to the PID, which requires a CSI before and after the change,
and a hwsync between the last instruction that performs address
translation for an associated storage access.

Instruction fetch is an associated storage access, but the instruction
address mappings are not being changed, so it should not matter which
context they use. We must still perform a hwsync to guard arbitrary
prior code that may have accessed a userspace address.

TLB invalidation is local and VA specific. Local because only this core
used the patching mm, and VA specific because we only care that the
writable mapping is purged. Leaving the other mappings intact is more
efficient, especially when performing many code patches in a row (e.g.,
as ftrace would).
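
Putting the pieces together, one patch cycle is roughly the following
(helper names are approximations, not the exact upstream code):

  /* Hedged sketch of a single patch under the temporary mm. */
  static int patch_via_temp_mm(u32 *addr, ppc_inst_t instr)
  {
          struct mm_struct *orig_mm;

          map_patch_page(addr);                   /* map target at patch_addr */

          /* hwsync + PID switch, with CSI before and after the change */
          orig_mm = start_using_temp_mm(patching_mm);

          __patch_instruction(addr, instr, patch_addr);

          stop_using_temp_mm(patching_mm, orig_mm);

          /* local, VA-specific invalidation of the writable alias only */
          local_flush_tlb_page_psize(patching_mm, (unsigned long)patch_addr,
                                     mmu_virtual_psize);
          return 0;
  }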

Signed-off-by: Christopher M. Riedl <cmr@bluescreens.de>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
[mpe: Use mm_alloc() per 107b6828a7cd ("x86/mm: Use mm_alloc() in poking_init()")]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-9-bgray@linux.ibm.com
19 months ago powerpc/qspinlock: add compile-time tuning adjustments
Nicholas Piggin [Sat, 26 Nov 2022 09:59:32 +0000 (19:59 +1000)]
powerpc/qspinlock: add compile-time tuning adjustments

This adds compile-time options that allow the EH lock hint bit to be
enabled or disabled, and adds some new options that may or may not
help matters, to aid experimentation and tuning.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-18-npiggin@gmail.com
19 months ago powerpc/qspinlock: provide accounting and options for sleepy locks
Nicholas Piggin [Sat, 26 Nov 2022 09:59:31 +0000 (19:59 +1000)]
powerpc/qspinlock: provide accounting and options for sleepy locks

Finding the owner or a queued waiter on a lock with a preempted vcpu is
indicative of an oversubscribed guest causing the lock to get into
trouble. Provide some options to detect this situation and have new CPUs
avoid queueing for a longer time (more steal iterations) to minimise the
problems caused by vcpu preemption on the queue.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-17-npiggin@gmail.com
19 months ago powerpc/qspinlock: allow indefinite spinning on a preempted owner
Nicholas Piggin [Sat, 26 Nov 2022 09:59:30 +0000 (19:59 +1000)]
powerpc/qspinlock: allow indefinite spinning on a preempted owner

Provide an option that holds off queueing indefinitely while the lock
owner is preempted. This could reduce queueing latencies for very
overcommitted vcpu situations.

This is disabled by default.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-16-npiggin@gmail.com
19 months ago powerpc/qspinlock: reduce remote node steal spins
Nicholas Piggin [Sat, 26 Nov 2022 09:59:29 +0000 (19:59 +1000)]
powerpc/qspinlock: reduce remote node steal spins

Allow for a reduction in the number of times a CPU from a different
node than the owner can attempt to steal the lock before queueing.
This could bias the transfer behaviour of the lock across the
machine and reduce NUMA crossings.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-15-npiggin@gmail.com
19 months ago powerpc/qspinlock: use spin_begin/end API
Nicholas Piggin [Sat, 26 Nov 2022 09:59:28 +0000 (19:59 +1000)]
powerpc/qspinlock: use spin_begin/end API

Use the spin_begin/spin_cpu_relax/spin_end APIs in qspinlock, which helps
to prevent threads issuing a lot of expensive priority nops which may not
have much effect due to immediately executing low then medium priority.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-14-npiggin@gmail.com
19 months ago powerpc/qspinlock: allow lock stealing in trylock and lock fastpath
Nicholas Piggin [Sat, 26 Nov 2022 09:59:27 +0000 (19:59 +1000)]
powerpc/qspinlock: allow lock stealing in trylock and lock fastpath

This change allows trylock to steal the lock. It also allows the initial
lock attempt to steal the lock rather than bailing out and going to the
slow path.

This gives trylock more strength: without this a continually-contended
lock will never permit a trylock to succeed. With this change, the
trylock has a small but non-zero chance.

It also gives the lock fastpath most of the benefit of passing the
reservation back through to the steal loop in the slow path without the
complexity.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-13-npiggin@gmail.com
19 months ago powerpc/qspinlock: add ability to prod new queue head CPU
Nicholas Piggin [Sat, 26 Nov 2022 09:59:26 +0000 (19:59 +1000)]
powerpc/qspinlock: add ability to prod new queue head CPU

After the head of the queue acquires the lock, it releases the
next waiter in the queue to become the new head. Add an option
to prod the new head if its vCPU was preempted. This may only
have an effect if queue waiters are yielding.

Disable this option by default for now, i.e., no logical change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-12-npiggin@gmail.com
19 months ago powerpc/qspinlock: allow propagation of yield CPU down the queue
Nicholas Piggin [Sat, 26 Nov 2022 09:59:25 +0000 (19:59 +1000)]
powerpc/qspinlock: allow propagation of yield CPU down the queue

Having all CPUs poll the lock word for the owner CPU that should be
yielded to defeats most of the purpose of using MCS queueing for
scalability. Yet it may be desirable for queued waiters to yield to a
preempted owner.

With this change, queue waiters never sample the owner CPU directly from
the lock word. The queue head (which is spinning on the lock) propagates
the owner CPU back to the next waiter if it finds the owner has been
preempted. That waiter then propagates the owner CPU back to the next
waiter, and so on.

s390 addresses this problem differently, by having queued waiters sample
the lock word to find the owner at a low frequency. That has the
advantage of being simpler; the advantage of propagation is that the
lock word never has to be accessed by queued waiters, and the transfer of
cache lines to transmit the owner data is only required when lock holder
vCPU preemption occurs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-11-npiggin@gmail.com
19 months ago powerpc/qspinlock: allow stealing when head of queue yields
Nicholas Piggin [Sat, 26 Nov 2022 09:59:24 +0000 (19:59 +1000)]
powerpc/qspinlock: allow stealing when head of queue yields

If the head of queue is preventing stealing but it finds the owner vCPU
is preempted, it will yield its cycles to the owner which could cause it
to become preempted. Add an option to re-allow stealers before yielding,
and disallow them again after returning from the yield.

Disable this option by default for now, i.e., no logical change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-10-npiggin@gmail.com
19 months ago powerpc/qspinlock: implement option to yield to previous node
Nicholas Piggin [Sat, 26 Nov 2022 09:59:23 +0000 (19:59 +1000)]
powerpc/qspinlock: implement option to yield to previous node

Queued waiters which are not at the head of the queue don't spin on
the lock word but their qnode lock word, waiting for the previous queued
CPU to release them. Add an option which allows these waiters to yield
to the previous CPU if its vCPU is preempted.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-9-npiggin@gmail.com
19 months ago powerpc/qspinlock: paravirt yield to lock owner
Nicholas Piggin [Sat, 26 Nov 2022 09:59:22 +0000 (19:59 +1000)]
powerpc/qspinlock: paravirt yield to lock owner

Waiters spinning on the lock word should yield to the lock owner if the
vCPU is preempted. This improves performance when the hypervisor has
oversubscribed physical CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-8-npiggin@gmail.com
19 months ago powerpc/qspinlock: store owner CPU in lock word
Nicholas Piggin [Sat, 26 Nov 2022 09:59:21 +0000 (19:59 +1000)]
powerpc/qspinlock: store owner CPU in lock word

Store the owner CPU number in the lock word so it may be yielded to,
as powerpc's paravirtualised simple spinlocks do.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-7-npiggin@gmail.com
19 months ago powerpc/qspinlock: theft prevention to control latency
Nicholas Piggin [Sat, 26 Nov 2022 09:59:20 +0000 (19:59 +1000)]
powerpc/qspinlock: theft prevention to control latency

Give the queue head the ability to stop stealers. After a number of
spins without successfully acquiring the lock, the queue head sets
this, which halts stealing and will assure it is the next owner.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-6-npiggin@gmail.com
19 months ago powerpc/qspinlock: allow new waiters to steal the lock before queueing
Nicholas Piggin [Sat, 26 Nov 2022 09:59:19 +0000 (19:59 +1000)]
powerpc/qspinlock: allow new waiters to steal the lock before queueing

Allow new waiters to "steal" the lock before queueing. That is, to
acquire it while other CPUs have queued.

This particularly helps paravirt performance when physical CPUs are
oversubscribed, by keeping the lock from becoming a strict FIFO and
vCPU preemption causing queue train wrecks.

The new __queued_spin_trylock_steal() function is put in qspinlock.h
to save having to move it, because it will be used there by a later
change.
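
In generic-atomics terms (the real code is hand-written ll/sc with the
EH hint), the steal is roughly:

  /* Hedged sketch: take the lock bit even while waiters are queued,
   * i.e. regardless of the tail field. */
  static __always_inline int __queued_spin_trylock_steal(struct qspinlock *lock)
  {
          int val = atomic_read(&lock->val);

          do {
                  if (val & _Q_LOCKED_VAL)
                          return 0;       /* owned: steal failed */
          } while (!atomic_try_cmpxchg_acquire(&lock->val, &val,
                                               val | _Q_LOCKED_VAL));
          return 1;
  }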

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-5-npiggin@gmail.com
19 months ago powerpc/qspinlock: convert atomic operations to assembly
Nicholas Piggin [Sat, 26 Nov 2022 09:59:18 +0000 (19:59 +1000)]
powerpc/qspinlock: convert atomic operations to assembly

This uses more optimal ll/sc style access patterns (rather than
cmpxchg), and also sets the EH=1 lock hint on those operations
which acquire ownership of the lock.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-4-npiggin@gmail.com
19 months ago powerpc/qspinlock: use a half-word store to unlock to avoid larx/stcx.
Nicholas Piggin [Sat, 26 Nov 2022 09:59:17 +0000 (19:59 +1000)]
powerpc/qspinlock: use a half-word store to unlock to avoid larx/stcx.

The first 16 bits of the lock are only modified by the owner, and other
modifications always use atomic operations on the entire 32 bits, so
unlocks can use plain stores on the 16 bits. This is the same kind of
optimisation done by core qspinlock code.
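
A sketch, assuming the lock word overlays a 16-bit 'locked' field on its
first half:

  static inline void queued_spin_unlock(struct qspinlock *lock)
  {
          /* Only the owner writes the low 16 bits, so a plain release
           * store suffices; no larx/stcx. sequence is required. */
          smp_store_release(&lock->locked, 0);
  }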

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-3-npiggin@gmail.com
19 months ago powerpc/qspinlock: add mcs queueing for contended waiters
Nicholas Piggin [Sat, 26 Nov 2022 09:59:16 +0000 (19:59 +1000)]
powerpc/qspinlock: add mcs queueing for contended waiters

This forms the basis of the qspinlock slow path.

Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock
owner does not participate in the queue, only waiters. The first waiter
spins on the lock word, then when the lock is released it takes
ownership and unqueues the next waiter. This is how qspinlocks can be
implemented with the spinlock API -- lock owners don't need a node, only
waiters do.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-2-npiggin@gmail.com
19 months ago powerpc/qspinlock: powerpc qspinlock implementation
Nicholas Piggin [Mon, 28 Nov 2022 03:11:13 +0000 (13:11 +1000)]
powerpc/qspinlock: powerpc qspinlock implementation

Add a powerpc specific implementation of queued spinlocks. This is the
build framework with a very simple (non-queued) spinlock implementation
to begin with. Later changes add queueing, and other features and
optimisations one-at-a-time. It is done this way to more easily see how
the queued spinlocks are built, and to make performance and correctness
bisects more useful.
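
The starting point is about as simple as a spinlock can be; a hedged
sketch of the initial, non-queued form:

  static __always_inline int queued_spin_trylock(struct qspinlock *lock)
  {
          /* A 0 -> 1 transition of the lock word takes the lock. */
          return atomic_cmpxchg_acquire(&lock->val, 0, 1) == 0;
  }

  static inline void queued_spin_lock(struct qspinlock *lock)
  {
          while (!queued_spin_trylock(lock))
                  cpu_relax();
  }

  static inline void queued_spin_unlock(struct qspinlock *lock)
  {
          atomic_set_release(&lock->val, 0);
  }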

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop paravirt.h & processor.h changes to fix 32-bit build]
[mpe: Fix 32-bit build of qspinlock.o & disallow GENERIC_LOCKBREAK per Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/CONLLQB6DCJU.2ZPOS7T6S5GRR@bobo
19 months ago powerpc/tlb: Add local flush for page given mm_struct and psize
Benjamin Gray [Wed, 9 Nov 2022 04:51:10 +0000 (15:51 +1100)]
powerpc/tlb: Add local flush for page given mm_struct and psize

Adds a local TLB flush operation that works given an mm_struct, VA to
flush, and page size representation. Most implementations mirror the
surrounding code. The book3s/32/tlbflush.h implementation is left as
a BUILD_BUG because it is more complicated and not required for
anything as yet.

This removes the need to create a vm_area_struct, which the temporary
patching mm work does not need.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-8-bgray@linux.ibm.com
19 months ago powerpc/mm: Remove flush_all_mm, local_flush_all_mm
Benjamin Gray [Wed, 9 Nov 2022 04:51:09 +0000 (15:51 +1100)]
powerpc/mm: Remove flush_all_mm, local_flush_all_mm

These functions were introduced for "cxl: Enable global TLBIs for cxl
contexts" [1], which ended up using them for Radix only. They were never
implemented on Hash (and creating an implementation appears to be
difficult), so nothing can actually rely on them.

They behave differently to the existing surrounding functions too, in
that they actually need to do something on Hash. The other functions
are primarily for use in generic code that expects their definitions,
but Hash updates the TLB during PTE updates.

After replacing the only usage with the Radix specific version, there
are no more users of these functions, and given they are not implemented
anyway it is safe to delete them.

[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20170903181513.29635-1-fbarrat@linux.vnet.ibm.com/

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-7-bgray@linux.ibm.com
19 months ago cxl: Use radix__flush_all_mm instead of generic flush_all_mm
Benjamin Gray [Wed, 9 Nov 2022 04:51:08 +0000 (15:51 +1100)]
cxl: Use radix__flush_all_mm instead of generic flush_all_mm

The generic implementation of this function isn't really generic (Hash
is not implemented). Unfortunately, the runtime warnings cannot be
replaced with BUILD_BUG's, so it seems safer not to provide a stub in
the first place.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-6-bgray@linux.ibm.com
19 months ago powerpc/mm: Remove empty hash__ functions
Benjamin Gray [Wed, 9 Nov 2022 04:51:07 +0000 (15:51 +1100)]
powerpc/mm: Remove empty hash__ functions

The empty hash__* functions are unnecessary. The empty definitions were
introduced when 64-bit Hash support was added, as the functions were
still used in generic code. These empty definitions were prefixed with
hash__ when Radix support was added, and new wrappers with the original
names were added that selected the Radix or Hash version based on
radix_enabled().

But the hash__ prefixed functions were not part of a public interface,
so there is no need to include them for compatibility with anything.
Generic code will use the non-prefixed wrappers, and Hash specific code
will know that there is no point in calling them (or even worse, call
them and expect them to do something).

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-5-bgray@linux.ibm.com
19 months ago powerpc/code-patching: Use WARN_ON and fix check in poking_init
Benjamin Gray [Wed, 9 Nov 2022 04:51:05 +0000 (15:51 +1100)]
powerpc/code-patching: Use WARN_ON and fix check in poking_init

BUG_ON() when failing to initialise the code patching window is
unnecessary, and use of BUG_ON is discouraged. We don't set
poking_init_done in this case, so failure to init the boot CPU will
result in a strict RWX error when a following patch_instruction uses
raw_patch_instruction. If it only fails for later CPUs, they won't be
onlined in the first place.

The return value of cpuhp_setup_state() is also >= 0 on success,
so check for < 0.
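
A hedged sketch of the fixed function (callback and state names follow
the surrounding code):

  void __init poking_init(void)
  {
          int ret;

          ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                  "powerpc/text_poke:online",
                                  text_area_cpu_up, text_area_cpu_down);

          /* A positive dynamic state number also means success, so only
           * negative values are errors; warn instead of BUG_ON(). */
          if (WARN_ON(ret < 0))
                  return; /* poking_init_done stays unset */

          static_branch_enable(&poking_init_done);
  }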

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-3-bgray@linux.ibm.com
19 months ago powerpc: Allow clearing and restoring registers independent of saved breakpoint state
Jordan Niethe [Wed, 9 Nov 2022 04:51:04 +0000 (15:51 +1100)]
powerpc: Allow clearing and restoring registers independent of saved breakpoint state

For the coming temporary mm used for instruction patching, the
breakpoint registers need to be cleared to prevent them from
accidentally being triggered. As soon as the patching is done, the
breakpoints will be restored.

The breakpoint state is stored in the per-cpu variable current_brk[].
Add a suspend_breakpoints() function which will clear the breakpoint
registers without touching the state in current_brk[]. Add a paired
function restore_breakpoints() which moves the state in
current_brk[] back into the registers.
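
A hedged sketch of the pair (the slot iteration and helper names are
assumptions based on the surrounding breakpoint code):

  void suspend_breakpoints(void)
  {
          int i;
          struct arch_hw_breakpoint null_brk = {};

          /* Clear the registers only; current_brk[] is left untouched. */
          for (i = 0; i < nr_wp_slots(); i++)
                  __set_breakpoint(i, &null_brk);
  }

  void restore_breakpoints(void)
  {
          int i;

          /* Move the saved state back into the registers. */
          for (i = 0; i < nr_wp_slots(); i++)
                  __set_breakpoint(i, this_cpu_ptr(&current_brk[i]));
  }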

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-2-bgray@linux.ibm.com
19 months ago powerpc/fsl-pci: Choose PCI host bridge with alias pci0 as the primary
Pali Rohár [Sat, 20 Aug 2022 12:33:27 +0000 (14:33 +0200)]
powerpc/fsl-pci: Choose PCI host bridge with alias pci0 as the primary

If there's no PCI host bridge with ISA, then check for a PCI host bridge
with alias "pci0" (the first PCI host bridge) and, if it exists, choose it
as the primary PCI host bridge.

This makes the choice of primary PCI host bridge more stable across boots
and updates, as the last fallback candidate for primary PCI host bridge
(if there is no other choice) is selected arbitrarily.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220820123327.20551-1-pali@kernel.org
19 months ago powerpc: dts: turris1x.dts: Add channel labels for temperature sensor
Pali Rohár [Fri, 30 Sep 2022 12:39:01 +0000 (14:39 +0200)]
powerpc: dts: turris1x.dts: Add channel labels for temperature sensor

Channel 0 of the SA56004ED chip refers to the internal SA56004ED sensor
(the chip itself is located on the board) and channel 1 refers to the
external sensor, which is connected to the temperature diode of the
P2020 CPU.

Fixes: 54c15ec3b738 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers")
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930123901.10251-1-pali@kernel.org
19 months ago powerpc/book3e: remove #include <generated/utsrelease.h>
Thomas Weißschuh [Sat, 26 Nov 2022 05:10:00 +0000 (06:10 +0100)]
powerpc/book3e: remove #include <generated/utsrelease.h>

Commit 7ad4bd887d27 ("powerpc/book3e: get rid of #include <generated/compile.h>")
removed the usage of the define UTS_RELEASE but forgot to drop the
include.

utsrelease.h is potentially generated on each build. By removing the
unused include we can get rid of some spurious recompilations.

Fixes: 7ad4bd887d27 ("powerpc/book3e: get rid of #include <generated/compile.h>")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix typo in change log and add more explanation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126051002.123199-2-linux@weissschuh.net
19 months ago selftests/powerpc: Account for offline cpus in perf-hwbreak test
Naveen N. Rao [Tue, 22 Nov 2022 06:40:54 +0000 (12:10 +0530)]
selftests/powerpc: Account for offline cpus in perf-hwbreak test

For systemwide tests, use the online cpu mask to only open events on
online cpus. This enables the test to work on systems in lower SMT modes.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/15fd447dcefd19945a7d31f0a475349f548a3603.1669096083.git.naveen.n.rao@linux.vnet.ibm.com
19 months ago selftests/powerpc: Bump up rlimit for perf-hwbreak test
Naveen N. Rao [Tue, 22 Nov 2022 06:40:53 +0000 (12:10 +0530)]
selftests/powerpc: Bump up rlimit for perf-hwbreak test

The systemwide perf hardware breakpoint test tries to open a perf event
on each cpu. On large systems, we run out of file descriptors and fail
the test. Instead, have the test set the file descriptor limit to an
arbitrarily high value.
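
A minimal sketch of the limit bump (the value is the arbitrary choice
described above):

  #include <sys/resource.h>

  static void bump_nofile_limit(void)
  {
          /* Arbitrarily high; raising rlim_max may require privilege. */
          struct rlimit rlim = { .rlim_cur = 65536, .rlim_max = 65536 };

          setrlimit(RLIMIT_NOFILE, &rlim);
  }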

Reported-by: Rohan Deshpande <rohan_d@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/187fed5843cecc1e5066677b6296ee88337d7bef.1669096083.git.naveen.n.rao@linux.vnet.ibm.com
19 months ago selftests/powerpc: Move perror closer to its use
Naveen N. Rao [Tue, 22 Nov 2022 06:40:52 +0000 (12:10 +0530)]
selftests/powerpc: Move perror closer to its use

Right now, if perf_event_open() fails for the systemwide tests, the error
report is printed too late, sometimes after subsequent system calls. Move
the use of perror() to the main function, just after the syscall.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/372ac78c27899f1f612fbd6ac796604a4a9310aa.1669096083.git.naveen.n.rao@linux.vnet.ibm.com
19 months ago powerpc/ps3: mark ps3_system_bus_type static
Christoph Hellwig [Tue, 22 Nov 2022 07:22:25 +0000 (08:22 +0100)]
powerpc/ps3: mark ps3_system_bus_type static

ps3_system_bus_type is only used inside of system-bus.c, so remove
the external declaration and the very outdated comment next to it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221122072225.423432-1-hch@lst.de
19 months agoMerge branch 'fixes' into next
Michael Ellerman [Wed, 30 Nov 2022 10:46:06 +0000 (21:46 +1100)]
Merge branch 'fixes' into next

Merge our fixes branch to bring in some changes that are prerequisites
for work in next.

19 months agoMerge branch 'topic/ppc-kvm' into next
Michael Ellerman [Wed, 30 Nov 2022 09:42:22 +0000 (20:42 +1100)]
Merge branch 'topic/ppc-kvm' into next

Merge our KVM topic branch.

19 months agoKVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support
Nicholas Piggin [Sun, 27 Nov 2022 12:49:26 +0000 (22:49 +1000)]
KVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support

32-bit code does not call trace_irqs_off() to match the trace_irqs_on() call
in kvmppc_fix_ee_before_entry(). This can lead to irqs being enabled twice
in the trace, and the irqs-off region between guest exit and the host
enabling local irqs again is not properly traced.

64-bit code does call this, but from asm code where volatile registers are
live and so incorrectly get clobbered.

Move the irq reconcile into C to fix both problems.
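
A hedged sketch of the idea (simplified; not the actual hunk):

  /* On guest exit irqs are hard-disabled, but the tracer still
   * thinks they are on because of the trace_irqs_on() before
   * entry. Reconciling from C, where no volatile GPRs hold live
   * values, keeps the irqs-off region visible in the trace: */
  trace_hardirqs_off();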

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-2-npiggin@gmail.com
19 months agopowerpc/64s: Add missing declaration for machine_check_early_boot()
Michael Ellerman [Fri, 25 Nov 2022 11:34:42 +0000 (22:34 +1100)]
powerpc/64s: Add missing declaration for machine_check_early_boot()

There's no declaration for machine_check_early_boot(), which leads to a
build failure with W=1. Add one.
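
A minimal sketch of the fix, assuming a plain prototype in a suitable
header (the real declaration may go through the interrupt-handler
declaration macros):

  void machine_check_early_boot(void);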

Fixes: 2f5182cffa43 ("powerpc/64s: early boot machine check handler")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221125132521.2167039-1-mpe@ellerman.id.au
19 months agopowerpc: Use "grep -E" instead of "egrep"
Tiezhu Yang [Fri, 18 Nov 2022 09:40:29 +0000 (17:40 +0800)]
powerpc: Use "grep -E" instead of "egrep"

The latest version of grep claims that egrep is now obsolete, so the build
now contains warnings that look like:

  egrep: warning: egrep is obsolescent; using grep -E

Fix this up by moving the related files to use "grep -E" instead.

  sed -i "s/egrep/grep -E/g" `grep egrep -rwl arch/powerpc`

Here are the steps to install the latest grep:

  wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
  tar xf grep-3.8.tar.gz
  cd grep-3.8 && ./configure && make
  sudo make install
  export PATH=/usr/local/bin:$PATH

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1668764429-11540-1-git-send-email-yangtiezhu@loongson.cn
19 months agopowerpc/pseries: fix plpks_read_var() code for different consumers
Nayna Jain [Sun, 6 Nov 2022 20:58:39 +0000 (15:58 -0500)]
powerpc/pseries: fix plpks_read_var() code for different consumers

Even though plpks_read_var() is currently called to read variables
owned by different consumers, it internally supports only the OS consumer.

Fix plpks_read_var() to handle different consumers correctly.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-7-nayna@linux.ibm.com
19 months agopowerpc/pseries: replace kmalloc with kzalloc in PLPKS driver
Nayna Jain [Sun, 6 Nov 2022 20:58:38 +0000 (15:58 -0500)]
powerpc/pseries: replace kmalloc with kzalloc in PLPKS driver

Replace kmalloc with kzalloc in the construct_auth() function so the
structure is zero-initialised by default.
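
The pattern, with illustrative names rather than the actual
construct_auth() code:

  /* before: unset members and padding carry stale kernel data */
  auth = kmalloc(sizeof(*auth), GFP_KERNEL);

  /* after: the whole structure starts zeroed */
  auth = kzalloc(sizeof(*auth), GFP_KERNEL);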

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-6-nayna@linux.ibm.com
19 months agopowerpc/pseries: cleanup error logs in plpks driver
Nayna Jain [Sun, 6 Nov 2022 20:58:37 +0000 (15:58 -0500)]
powerpc/pseries: cleanup error logs in plpks driver

H_CALL return codes logged in the PLPKS driver are easy to confuse with
Linux error codes.

Let the caller of the function log the converted Linux error code.

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-5-nayna@linux.ibm.com
19 months agopowerpc/pseries: Return -EIO instead of -EINTR for H_ABORTED error
Nayna Jain [Sun, 6 Nov 2022 20:58:36 +0000 (15:58 -0500)]
powerpc/pseries: Return -EIO instead of -EINTR for H_ABORTED error

Some commands, e.g. "cat", might continue to retry on encountering
EINTR. This is not expected for the original error code H_ABORTED.

Map H_ABORTED to the more relevant Linux error code EIO.
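
A minimal sketch of the mapping, with a hypothetical helper name (the
driver's actual conversion routine may differ):

  /* H_* constants from asm/hvcall.h, errnos from linux/errno.h */
  static int plpks_hcall_to_errno(long rc)
  {
          switch (rc) {
          case H_SUCCESS:
                  return 0;
          case H_PARAMETER:
                  return -EINVAL;
          case H_ABORTED:
                  return -EIO;  /* was -EINTR, which makes callers retry */
          default:
                  return -EIO;
          }
  }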

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-4-nayna@linux.ibm.com
19 months agopowerpc/pseries: Fix the H_CALL error code in PLPKS driver
Nayna Jain [Sun, 6 Nov 2022 20:58:35 +0000 (15:58 -0500)]
powerpc/pseries: Fix the H_CALL error code in PLPKS driver

The PAPR specification actually defines H_P1 as H_PARAMETER and maps
H_ABORTED to a different numerical value.

Fix the error codes as per the PAPR specification.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-3-nayna@linux.ibm.com
19 months agopowerpc/pseries: fix the object owners enum value in plpks driver
Nayna Jain [Sun, 6 Nov 2022 20:58:34 +0000 (15:58 -0500)]
powerpc/pseries: fix the object owners enum value in plpks driver

The OS_VAR_LINUX enum value in the PLPKS driver should be 0x02 instead of 0x01.

Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-2-nayna@linux.ibm.com
19 months agopowerpc/powermac: Fix symbol not declared warnings
Chen Lifu [Thu, 3 Nov 2022 07:01:22 +0000 (15:01 +0800)]
powerpc/powermac: Fix symbol not declared warnings

1. ppc_override_l2cr and ppc_override_l2cr_value are only used in the
   l2cr_init() function; remove them and use *l2cr directly.
2. has_l2cache is not used outside of the file, so mark it static and
   drop the initialisation to 0 (statics are zero-initialised).

Fixes the following warnings:

  arch/powerpc/platforms/powermac/setup.c:73:5: warning: symbol
  'ppc_override_l2cr' was not declared. Should it be static?
  arch/powerpc/platforms/powermac/setup.c:74:5: warning: symbol
  'ppc_override_l2cr_value' was not declared. Should it be static?
  arch/powerpc/platforms/powermac/setup.c:75:5: warning: symbol
  'has_l2cache' was not declared. Should it be static?

Signed-off-by: Chen Lifu <chenlifu@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Unwrap printk string]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221103070122.340773-1-chenlifu@huawei.com
19 months agopowerpc/pseries/eeh: Fix some kernel-doc warnings
Bo Liu [Mon, 31 Oct 2022 06:37:06 +0000 (02:37 -0400)]
powerpc/pseries/eeh: Fix some kernel-doc warnings

Fixes the following W=1 kernel build warning(s):
  arch/powerpc/platforms/pseries/eeh_pseries.c:163: warning: Function parameter or member 'config_addr' not described in 'pseries_eeh_phb_reset'
  arch/powerpc/platforms/pseries/eeh_pseries.c:163: warning: Excess function parameter 'config_adddr' description in 'pseries_eeh_phb_reset'
  arch/powerpc/platforms/pseries/eeh_pseries.c:198: warning: Function parameter or member 'config_addr' not described in 'pseries_eeh_phb_configure_bridge'
  arch/powerpc/platforms/pseries/eeh_pseries.c:198: warning: Excess function parameter 'config_adddr' description in 'pseries_eeh_phb_configure_bridge'

Signed-off-by: Bo Liu <liubo03@inspur.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221031063706.2770-1-liubo03@inspur.com
19 months agopowerpc/8xx: Fix warning in hw_breakpoint_handler()
Russell Currey [Mon, 24 Oct 2022 04:13:46 +0000 (15:13 +1100)]
powerpc/8xx: Fix warning in hw_breakpoint_handler()

In hw_breakpoint_handler(), ea is set by wp_get_instr_detail() except
for 8xx, causing the variable to be passed uninitialised to
wp_check_constraints(). This is safe, as wp_check_constraints() returns
early without using ea, so just set it to make the compiler happy.
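
The fix amounts to initialising the variable, roughly (the type is an
assumption from context):

  unsigned long ea = 0;  /* unused on 8xx; set by wp_get_instr_detail()
                          * everywhere else */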

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221024041346.103608-1-ruscur@russell.cc
19 months agoselftests/powerpc: Remove repeated word in comments
Shaomin Deng [Sat, 29 Oct 2022 09:46:43 +0000 (05:46 -0400)]
selftests/powerpc: Remove repeated word in comments

Remove the repeated word "not" in comments.

Signed-off-by: Shaomin Deng <dengshaomin@cdjrlc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221029094643.5595-1-dengshaomin@cdjrlc.com
19 months agoselftests/powerpc: Fix spelling mistake "mmaping" -> "mmapping"
Colin Ian King [Fri, 21 Oct 2022 08:45:45 +0000 (09:45 +0100)]
selftests/powerpc: Fix spelling mistake "mmaping" -> "mmapping"

There is a spelling mistake in a perror message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221021084545.65973-1-colin.i.king@gmail.com
19 months agopowerpc/kprobes: Use preempt_enable() rather than the no_resched variant
Naveen N. Rao [Thu, 20 Oct 2022 17:28:59 +0000 (22:58 +0530)]
powerpc/kprobes: Use preempt_enable() rather than the no_resched variant

preempt_enable_no_resched() is just the same as preempt_enable() when we
are in an irqs-disabled context. kprobe_handler() and the post/fault
handlers are all called with irqs disabled. As such, convert those to
just use preempt_enable().
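
A minimal sketch of the pattern (not the exact handler code):

  preempt_disable();
  /* ... kprobe handling, irqs already disabled ... */
  preempt_enable();  /* with irqs off no preemption point is lost,
                      * so preempt_enable_no_resched() buys nothing */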

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/72639f75fe66f931ec8c2165276ffbfb0fe1006f.1666262278.git.naveen.n.rao@linux.vnet.ibm.com
19 months agopowerpc/kprobes: Have optimized_callback() use preempt_enable()
Naveen N. Rao [Thu, 20 Oct 2022 17:28:58 +0000 (22:58 +0530)]
powerpc/kprobes: Have optimized_callback() use preempt_enable()

Similar to x86 commit 2e62024c265aa6 ("kprobes/x86: Use preempt_enable()
in optimized_callback()"), change powerpc optprobes to use
preempt_enable() rather than preempt_enable_no_resched() since powerpc
also removed irq disabling for optprobes in commit f72180cc93a2c6
("powerpc/kprobes: Do not disable interrupts for optprobes and
kprobes_on_ftrace").

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1885bab182626c33d9bf6421f430abf924c521a5.1666262278.git.naveen.n.rao@linux.vnet.ibm.com
19 months agopowerpc/kprobes: Remove preempt disable around call to get_kprobe() in arch_prepare_k...
Naveen N. Rao [Thu, 20 Oct 2022 17:28:57 +0000 (22:58 +0530)]
powerpc/kprobes: Remove preempt disable around call to get_kprobe() in arch_prepare_kprobe()

arch_prepare_kprobe() is called from register_kprobe() via
prepare_kprobe(), or through register_aggr_kprobe(), both with the
kprobe_mutex held. Per the comment for get_kprobe():
  /*
   * This routine is called either:
   * - under the 'kprobe_mutex' - during kprobe_[un]register().
   * OR
   * - with preemption disabled - from architecture specific code.
   */

As such, there is no need to disable preemption around the call to
get_kprobe(). Drop the same.
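
A minimal sketch of the change (illustrative, not the exact hunk):

  struct kprobe *prev;

  /* before: preemption pinned around the lookup */
  preempt_disable();
  prev = get_kprobe(p->addr);
  preempt_enable_no_resched();

  /* after: kprobe_mutex is held by every caller, so no pinning needed */
  prev = get_kprobe(p->addr);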

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1043d06a0affed83a4a46dd29466e72820ee215d.1666262278.git.naveen.n.rao@linux.vnet.ibm.com
19 months agopowerpc/mpic_msgr: fix cast removes address space of expression warnings
ruanjinjie [Wed, 19 Oct 2022 06:34:14 +0000 (14:34 +0800)]
powerpc/mpic_msgr: fix cast removes address space of expression warnings

When building the Linux kernel, the following warnings are encountered:

./arch/powerpc/sysdev/mpic_msgr.c:230:38: warning: cast removes address space '__iomem' of expression
./arch/powerpc/sysdev/mpic_msgr.c:230:27: warning: incorrect type in assignment (different address spaces)

The data types of msgr->mer and msgr->base are 'u32 __iomem *', but they
are converted to 'u32 *' and 'u8 *' directly, causing the above warnings.
Instead of using a type cast, change the size of the pointer offset to fix
these warnings.
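
A sketch of the before/after (variable names are illustrative):

  u32 __iomem *base;     /* MMIO base                   */
  unsigned long offset;  /* byte offset into the region */
  u32 __iomem *reg;

  /* before: the cast strips the __iomem address space */
  reg = (u32 *)((u8 *)base + offset);

  /* after: scale the offset to u32 units, keep __iomem intact */
  reg = base + offset / sizeof(u32);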

Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221019063414.3758087-1-ruanjinjie@huawei.com
19 months agopowerpc/xive: add missing iounmap() in error path in xive_spapr_populate_irq_data()
Yang Yingliang [Mon, 17 Oct 2022 03:23:33 +0000 (11:23 +0800)]
powerpc/xive: add missing iounmap() in error path in xive_spapr_populate_irq_data()

If remapping 'data->trig_page' fails, 'data->eoi_mmio' needs to be
unmapped before returning from xive_spapr_populate_irq_data().
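
A sketch of the fixed error path (the mapping size and message text are
assumptions, not the driver's exact code):

  data->trig_mmio = ioremap(data->trig_page, 1u << data->esb_shift);
  if (!data->trig_mmio) {
          iounmap(data->eoi_mmio);  /* added: release the earlier mapping */
          pr_err("Failed to map trigger page!\n");
          return -ENOMEM;
  }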

Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221017032333.1852406-1-yangyingliang@huawei.com
19 months agopowerpc: suppress some linker warnings in recent linker versions
Stephen Rothwell [Mon, 10 Oct 2022 05:57:21 +0000 (16:57 +1100)]
powerpc: suppress some linker warnings in recent linker versions

This is a follow-on from commit

  0d362be5b142 ("Makefile: link with -z noexecstack --no-warn-rwx-segments")

for arch/powerpc/boot, to address warnings like:

  ld: warning: opal-calls.o: missing .note.GNU-stack section implies executable stack
  ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
  ld: warning: arch/powerpc/boot/zImage.epapr has a LOAD segment with RWX permissions

This fixes issue https://github.com/linuxppc/issues/issues/417

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221010165721.106267e6@canb.auug.org.au
19 months agopowerpc/sysdev: Remove some duplicate prefix in some messages
Christophe JAILLET [Sun, 9 Oct 2022 10:49:50 +0000 (12:49 +0200)]
powerpc/sysdev: Remove some duplicate prefix in some messages

At the beginning of the file, we have:
   #define pr_fmt(fmt) "xive: " fmt

So, there is no need to duplicate "XIVE:" in debug and error messages.
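
For illustration (message text made up), with pr_fmt() in effect every
pr_*() call is already prefixed:

  #define pr_fmt(fmt) "xive: " fmt

  pr_err("XIVE: setup failed\n");  /* prints "xive: XIVE: setup failed" */
  pr_err("setup failed\n");        /* prints "xive: setup failed"       */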

For the record, these useless prefixes were added in commit
5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt
controller")

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7b8b5915a2c7c1616b33e8433ebe0a0bf07070a2.1665312579.git.christophe.jaillet@wanadoo.fr
19 months agopowerpc: remove the last remnants of cputime_t
Nicholas Piggin [Thu, 6 Oct 2022 10:56:53 +0000 (20:56 +1000)]
powerpc: remove the last remnants of cputime_t

cputime_t was a core kernel type, removed by commits
ed5c8c854f2b..b672592f0221. As explained in commit b672592f0221
("sched/cputime: Remove generic asm headers"), the final cleanup
is for the arch to provide cputime_to_nsec[s](). Commit ade7667a981b
("powerpc: Add cputime_to_nsecs()") did that, but justdidn't remove
the then-unused cputime_to_usecs(), cputime_t type, and associated
remnants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006105653.115829-1-npiggin@gmail.com
19 months agopowerpc: Print instruction dump on a single line
Michael Ellerman [Thu, 6 Oct 2022 03:20:19 +0000 (14:20 +1100)]
powerpc: Print instruction dump on a single line

Although the previous commit made the powerpc instruction dump usable
with scripts/decodecode, there are still some problems.

Because the dump is split across multiple lines, the script doesn't cope
with printk timestamps or caller info.

That can be fixed by printing the entire dump on one line, eg:

  [   12.016307][  T112] --- interrupt: c00
  [   12.016605][  T112] Code: 4b7aae15 60000000 3d22016e 3c62ffec 39291160 38639bc0 e8890000 4b7aadf9 60000000 4bfffee8 7c0802a6 60000000 <0fe00000> 60420000 3c4c008f 384268a0
  [   12.017655][  T112] ---[ end trace 0000000000000000 ]---

That output can then be piped directly into scripts/decodecode and
interpreted correctly.

Printing the dump on a single line does produce a very long line, about
173 characters. That is still shorter than x86, which prints nearly 200
characters even without timestamps etc.

All consoles I'm aware of will wrap the line if it's too long, so the
length should not be a functional problem. If anything it should help on
consoles like VGA by using less vertical space.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006032019.1128624-2-mpe@ellerman.id.au
19 months agopowerpc: Make instruction dump work with scripts/decodecode
Michael Ellerman [Thu, 6 Oct 2022 03:20:18 +0000 (14:20 +1100)]
powerpc: Make instruction dump work with scripts/decodecode

Matt reported that scripts/decodecode doesn't work for the instruction
dump in the powerpc oops output. Although there are scripts around that
can decode it, it would be preferable if the standard in-tree script
worked.

All other arches prefix the instruction dump with "Code:", and that's
what the script looks for, so use that.

The script then works as expected:

  $ CROSS_COMPILE=powerpc64le-linux-gnu- ./scripts/decodecode
  Code:
  fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000
  60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378
  ^D

  All code
  ========
     0:   f0 ff c1 fb     std     r30,-16(r1)
     4:   c1 ff 21 f8     stdu    r1,-64(r1)
     8:   78 1b 7d 7c     mr      r29,r3
     ...

Note that the script doesn't cope well with printk timestamps or printk
caller info.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006032019.1128624-1-mpe@ellerman.id.au
19 months agopowerpc/microwatt: Add litesd
Joel Stanley [Fri, 30 Sep 2022 06:50:12 +0000 (16:20 +0930)]
powerpc/microwatt: Add litesd

This is the register layout of the litesd peripheral for the fusesoc-based
Microwatt SoC.

It requires a description of the system clock, which is hardcoded to
100MHz.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930065012.2860577-1-joel@jms.id.au
19 months agopowerpc/8xx: Reverse order entries are written by __set_pte_at()
Christophe Leroy [Wed, 28 Sep 2022 06:29:22 +0000 (08:29 +0200)]
powerpc/8xx: Reverse order entries are written by __set_pte_at()

At the time being, with 16k pages __set_pte_at() writes table entries
in reverse order:

 294: 91 49 00 0c  stw     r10,12(r9)
 298: 91 49 00 08  stw     r10,8(r9)
 29c: 91 49 00 04  stw     r10,4(r9)
 2a0: 91 49 00 00  stw     r10,0(r9)

Although there should be no impact at all as it stays within a single
cacheline, reverse the writes into a more natural order.

 288: 91 49 00 00  stw     r10,0(r9)
 28c: 91 49 00 04  stw     r10,4(r9)
 290: 91 49 00 08  stw     r10,8(r9)
 294: 91 49 00 0c  stw     r10,12(r9)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/67c3b5d44edfec054234ea9b4d05fc4b4f7f8a0e.1664346554.git.christophe.leroy@csgroup.eu
19 months agopowerpc/8xx: Simplify pte_update() with 16k pages
Christophe Leroy [Wed, 28 Sep 2022 06:29:00 +0000 (08:29 +0200)]
powerpc/8xx: Simplify pte_update() with 16k pages

While looking at code generated for code patching, I saw that
pte_clear generated:

 2d8: 38 a0 00 00  li      r5,0
 2dc: 38 e0 10 00  li      r7,4096
 2e0: 39 00 20 00  li      r8,8192
 2e4: 39 40 30 00  li      r10,12288
 2e8: 90 a9 00 00  stw     r5,0(r9)
 2ec: 90 e9 00 04  stw     r7,4(r9)
 2f0: 91 09 00 08  stw     r8,8(r9)
 2f4: 91 49 00 0c  stw     r10,12(r9)

With 16k pages, only the first entry is used by the kernel, so there is
no need to adapt the addresses of the other entries; only the first
entry's value needs to be duplicated for the hardware.

Now it is:

 2cc: 39 40 00 00  li      r10,0
 2d0: 91 49 00 00  stw     r10,0(r9)
 2d4: 91 49 00 04  stw     r10,4(r9)
 2d8: 91 49 00 08  stw     r10,8(r9)
 2dc: 91 49 00 0c  stw     r10,12(r9)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65f76300de07091a59a042a3db2d0ce9b939a05c.1664346532.git.christophe.leroy@csgroup.eu
19 months agopowerpc/sgy_cts1000: convert to using gpiod API and facelift
Dmitry Torokhov [Tue, 27 Sep 2022 19:23:58 +0000 (12:23 -0700)]
powerpc/sgy_cts1000: convert to using gpiod API and facelift

This patch converts the driver to the newer gpiod API, and away from the
OF-specific legacy gpio API that we want to stop using.

While at it, let's address a few more issues:

- switch to using dev_info()/pr_info() and friends
- cancel work when unbinding the driver

Note that the original code handled the halt GPIO polarity incorrectly:
in the halt callback, when the line polarity is "low" it would set the
trigger to "1" and drive the halt line high, which runs counter to the
annotation. The gpiod API will drive such a line low. However, I do not
see any DTSes in mainline that have a DT node with the "sgy,gpio-halt"
compatible.
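
A sketch of the gpiod pattern (the connection ID and variable names are
illustrative; polarity comes from the DT flags, so the driver only
expresses logical assertion):

  #include <linux/gpio/consumer.h>

  struct gpio_desc *halt_gpio;

  halt_gpio = devm_gpiod_get(&pdev->dev, "halt", GPIOD_OUT_LOW);
  if (IS_ERR(halt_gpio))
          return PTR_ERR(halt_gpio);

  /* later, in the halt callback: logical 1 asserts the line,
   * whatever its physical active level */
  gpiod_set_value(halt_gpio, 1);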

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YzNNznewTyCJiGFz@google.com