platform/kernel/linux-starfive.git
Nick Child [Thu, 16 Dec 2021 22:00:17 +0000 (17:00 -0500)]
powerpc/lib: Add __init attribute to eligible functions

Some functions defined in 'arch/powerpc/lib' are deserving of an `__init`
macro attribute. These functions are only called by other initialization
functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.
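
A minimal sketch of the pattern (hypothetical function name); the attribute is
added to both the definition and the header declaration:

  /* header: declaration carries __init as well */
  void __init setup_feature_table(void);

  /* only ever called from other __init code during boot, so its text can be
   * discarded along with the rest of .init.text once boot completes */
  void __init setup_feature_table(void)
  {
  }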

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-3-nick.child@ibm.com
Nick Child [Thu, 16 Dec 2021 22:00:16 +0000 (17:00 -0500)]
powerpc/kernel: Add __init attribute to eligible functions

Some functions defined in `arch/powerpc/kernel` (and one in `arch/powerpc/
kexec`) are deserving of an `__init` macro attribute. These functions are
only called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-2-nick.child@ibm.com
Michael Ellerman [Thu, 9 Dec 2021 11:59:44 +0000 (22:59 +1100)]
selftests/powerpc: Add a test of sigreturning to the kernel

We have a general signal fuzzer, sigfuz, which can modify the MSR & NIP
before sigreturn. But the chance of it hitting a kernel address and also
clearing MSR_PR is fairly slim.

So add a specific test of sigreturn to a kernel address, both with and
without attempting to clear MSR_PR (which the kernel must block).
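
A rough sketch of the test's core idea (the UCONTEXT_NIA helper from the
powerpc selftests headers is assumed): the signal handler rewrites the saved
NIP so that sigreturn tries to resume at a kernel address:

  static void sigusr1_handler(int sig, siginfo_t *info, void *ptr)
  {
          ucontext_t *uc = ptr;

          /* point the return NIP at a kernel address; the variant that also
           * clears MSR_PR in the saved MSR must be blocked by the kernel */
          UCONTEXT_NIA(uc) = 0xc000000000000000ULL;
  }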

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211209115944.4062384-1-mpe@ellerman.id.au
Rob Herring [Fri, 17 Dec 2021 22:14:00 +0000 (16:14 -0600)]
powerpc/dts: Remove "spidev" nodes

"spidev" is not a real device, but a Linux implementation detail. It has
never been documented either. The kernel has WARNed on the use of it for
over 6 years. Time to remove its usage from the tree.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211217221400.3667133-1-robh@kernel.org
Minghao Chi [Wed, 15 Dec 2021 06:04:38 +0000 (06:04 +0000)]
ocxl: remove redundant rc variable

Return the value from ocxl_context_attach() directly instead
of storing it in a redundant intermediate variable.
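
A sketch of the resulting pattern (argument names illustrative):

  -       rc = ocxl_context_attach(ctx, amr, mm);
  -       return rc;
  +       return ocxl_context_attach(ctx, amr, mm);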

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211215060438.441918-1-chi.minghao@zte.com.cn
Nicholas Piggin [Thu, 16 Dec 2021 10:33:42 +0000 (20:33 +1000)]
powerpc/64s/radix: Fix huge vmap false positive

pmd_huge() is defined to false when HUGETLB_PAGE is not configured, but
the vmap code still installs huge PMDs. This leads to false bad PMD
errors when vunmapping because it is not seen as a huge PTE, and the bad
PMD check catches it. The end result may not be much more serious than
some bad pmd warning messages, because pmd_none_or_clear_bad() does
what we wanted and clears the huge PTE anyway.

Fix this by checking pmd_is_leaf(), which checks for a PTE regardless of
config options. The whole huge/large/leaf stuff is a tangled mess but
that's kernel-wide and not something we can improve much in arch/powerpc
code.

pmd_page(), pud_page(), etc., called by vmalloc_to_page() on huge vmaps
can similarly trigger a false VM_BUG_ON when CONFIG_HUGETLB_PAGE=n, so
those checks are adjusted. The checks were added by commit d6eacedd1f0e
("powerpc/book3s: Use config independent helpers for page table walk"),
while implementing a similar fix for other page table walking functions.

Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216103342.609192-1-npiggin@gmail.com
Yang Guang [Sat, 18 Dec 2021 01:59:17 +0000 (09:59 +0800)]
powerpc: use swap() to make code cleaner

Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
open-coding it.
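
For illustration (variable names hypothetical), the transformation is:

  /* open-coded exchange */
  tmp = start;
  start = end;
  end = tmp;

  /* with the helper from include/linux/minmax.h */
  swap(start, end);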

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
[mpe: Add include of linux/minmax.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/71a702c2189b16c152affd8a8cda1d84ce32741c.1639792543.git.yang.guang5@zte.com.cn
Christophe JAILLET [Fri, 17 Dec 2021 21:54:12 +0000 (22:54 +0100)]
powerpc/mpic: Use bitmap_zalloc() when applicable

'mpic->protected' is a bitmap, so use 'bitmap_zalloc()' to simplify the
code and improve the semantics, instead of hand-writing the allocation.
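
A sketch of the idea (the open-coded form and the bit-count variable shown
here are illustrative, not the exact original):

  /* hand-written bitmap allocation */
  mpic->protected = kzalloc(BITS_TO_LONGS(nr_sources) * sizeof(long), GFP_KERNEL);

  /* with the dedicated helper */
  mpic->protected = bitmap_zalloc(nr_sources, GFP_KERNEL);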

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa145f674e08044c98f13f1a985faa9cc29c3708.1639777976.git.christophe.jaillet@wanadoo.fr
Sachin Sant [Mon, 13 Dec 2021 16:42:23 +0000 (22:12 +0530)]
selftests/powerpc: skip tests for unavailable mitigations.

The mitigation patching test iterates over a set of mitigations irrespective
of whether a given mitigation is supported/available in the kernel.
This causes the following messages on a kernel where some mitigations
are unavailable:

  Spawned threads enabling/disabling mitigations ...
  cat: entry_flush: No such file or directory
  cat: uaccess_flush: No such file or directory
  Waiting for timeout ...
  OK

This patch adds a check for available mitigations in the kernel.
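
A sketch of such a check (path construction and array names illustrative),
done before the test starts toggling a given mitigation:

  char path[PATH_MAX];

  snprintf(path, sizeof(path), "/sys/kernel/debug/powerpc/%s", mitigations[i]);
  if (access(path, F_OK) != 0)
          continue;       /* mitigation not exposed by this kernel, skip it */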

Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/163941374362.36967.18016981579099073379.sendpatchset@1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
Nicholas Piggin [Fri, 5 Nov 2021 13:29:23 +0000 (23:29 +1000)]
powerpc/pseries: use slab context cpumask allocation in CPU hotplug init

Slab is up at this point, so using the bootmem allocator triggers a
warning. Switch to using the regular cpumask allocator.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105132923.1582514-1-npiggin@gmail.com
Nicholas Piggin [Wed, 22 Sep 2021 14:54:52 +0000 (00:54 +1000)]
powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts

Reading the CFAR register is quite costly (~20 cycles on POWER9). It is
a good idea to have for most synchronous interrupts, but for async ones
it is much less important.

Doorbell, external, and decrementer interrupts are the important
asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible,
because it might be a guest exit that requires CFAR preserved. But the
important pseries interrupts can avoid loading CFAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-7-npiggin@gmail.com
Nicholas Piggin [Wed, 22 Sep 2021 14:54:51 +0000 (00:54 +1000)]
powerpc/64/interrupt: reduce expensive debug tests

Move the assertions requiring restart table searches under
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-6-npiggin@gmail.com
Nicholas Piggin [Wed, 22 Sep 2021 14:54:50 +0000 (00:54 +1000)]
powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use

Enabling MSR[EE] in interrupt handlers while interrupts are still soft
masked allows PMIs to profile interrupt handlers to some degree, beyond
what SIAR latching allows.

When perf is not being used, this is almost useless work. It requires an
extra mtmsrd in the irq handler, and it also opens the door to masked
interrupts hitting and requiring replay, which is more expensive than
just taking them directly. This effect can be noticeable in high IRQ
workloads.

Avoid enabling MSR[EE] unless perf is currently in use. This saves about
60 cycles (or 8%) on a simple decrementer interrupt microbenchmark.
Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%.

This does prevent the soft-nmi interrupt being taken in these handlers,
but that's not too reliable anyway. The SMP watchdog will continue to be
the reliable way to catch lockups.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
Nicholas Piggin [Wed, 22 Sep 2021 14:54:49 +0000 (00:54 +1000)]
powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI

Interrupt code enables MSR[EE] in some irq handlers while keeping local
irqs disabled via soft-mask, allowing PMI interrupts to be taken as
soft-NMI to improve profiling of irq handlers.

When perf is not enabled, there is no point in doing this; it is just
additional overhead. So provide a function that says whether perf wants
PMIs to be taken promptly if possible.
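
A sketch of the intended caller-side use (surrounding interrupt code
simplified):

  /* only hard-enable MSR[EE] in the irq handler when perf wants PMIs to
   * arrive promptly as soft-NMIs */
  if (power_pmu_wants_prompt_pmi())
          __hard_irq_enable();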

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-4-npiggin@gmail.com
Nicholas Piggin [Wed, 22 Sep 2021 14:54:48 +0000 (00:54 +1000)]
powerpc/64s/interrupt: handle MSR EE and RI in interrupt entry wrapper

The mtmsrd to enable MSR[RI] can be combined with the mtmsrd to enable
MSR[EE] in interrupt entry code, for those interrupts which enable EE.
This helps performance of important synchronous interrupts (e.g., page
faults).

This is similar to what commit dd152f70bdc1 ("powerpc/64s: system call
avoid setting MSR[RI] until we set MSR[EE]") does for system calls.

Do this by enabling EE and RI together at the beginning of the entry
wrapper if PACA_IRQ_HARD_DIS is clear, and only enabling RI if it is
set.

Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones
leave it unchanged, so by default they always get EE=1 unless they have
interrupted a caller that is hard disabled. When the sync interrupt
later calls interrupt_cond_local_irq_enable(), it will not require
another mtmsrd because MSR[EE] was already enabled here.

This avoids one mtmsrd L=1 for synchronous interrupts on 64s, which
saves about 20 cycles on POWER9. And for kernel-mode interrupts, both
synchronous and asynchronous, this saves an additional 40 cycles due to
the mtmsrd being moved ahead of mfspr SPRN_AMR, which prevents a SPR
scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-3-npiggin@gmail.com
Nicholas Piggin [Wed, 22 Sep 2021 14:54:47 +0000 (00:54 +1000)]
powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible

Make synchronous interrupt handler entry wrappers enable MSR[EE] if
MSR[EE] was enabled in the interrupted context. IRQs are soft-disabled
at this point so there is no change to high level code, but it does mean
a masked interrupt could fire.

This is a performance disadvantage for interrupts which do not later
call interrupt_cond_local_irq_enable(), because an additional mtmsrd
or wrtee instruction is executed. However the important synchronous
interrupts (e.g., page fault) do enable interrupts, so the performance
disadvantage is mostly avoided.

In the next patch, MSR[RI] enabling can be combined with MSR[EE]
enabling, which mitigates the performance drop for the former and gives
a performance advantage for the latter interrupts, on 64s machines. 64e
is coming along for the ride for now to avoid divergences with 64s in
this tricky code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-2-npiggin@gmail.com
Nicholas Piggin [Fri, 26 Nov 2021 05:21:33 +0000 (15:21 +1000)]
powerpc/pseries/vas: Don't print an error when VAS is unavailable

KVM does not support VAS so guests always print a useless error on boot

    vas: HCALL(398) error -2, query_type 0, result buffer 0x57f2000

Change this to only print the message if the error is not H_FUNCTION.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126052133.1664375-1-npiggin@gmail.com
Kajol Jain [Mon, 6 Dec 2021 09:17:49 +0000 (14:47 +0530)]
powerpc/perf: Add data source encodings for power10 platform

The code represents memory/cache level data based on the PERF_MEM_LVL_*
namespace, which is in the process of deprecation in favour of the
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on the
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.

Add data source encodings to represent data coming from local memory,
remote memory, distant memory, and remote/distant cache hits.

In order to represent data coming from OpenCAPI cache/memory, use the
LVLNUM "PMEM" field, which is used to represent persistent memory accesses.

Result in power10 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    29.46%          2331  L1 or L1 hit              [.] __random                                     libc-2.28.so
    23.11%          2121  L1 or L1 hit              [.] producer_populate_cache                      producer_consumer
    18.56%          1758  L1 or L1 hit              [.] __random_r                                   libc-2.28.so
    15.64%          1559  L2 or L2 hit              [.] __random                                     libc-2.28.so
    .....
    0.09%              5  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    0.07%              4  Remote socket, same board Any cache hit             [.] __random         libc-2.28.so
    .....

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-5-kjain@linux.ibm.com
Kajol Jain [Mon, 6 Dec 2021 09:17:48 +0000 (14:47 +0530)]
powerpc/perf: Add encodings to represent data based on newer composite PERF_MEM_LVLNUM* fields

The code represents data coming from L1/L2/L3 cache hits based on the
PERF_MEM_LVL_* namespace, which is in the process of deprecation in
favour of the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
fields.

Add data source encodings to represent L1/L2/L3 cache hits based on the
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
power10 and older platforms.

Result in power9 system without patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                             Shared Object
 # ........  ............  ........................  .................................  ................
 #
    29.51%             1  L2 hit                    [k] perf_event_exec                [kernel.vmlinux]
    27.05%             1  L1 hit                    [k] perf_ctx_unlock                [kernel.vmlinux]
    13.93%             1  L1 hit                    [k] vtime_delta                    [kernel.vmlinux]
    13.11%             1  L1 hit                    [k] prepend_path.isra.11           [kernel.vmlinux]
     8.20%             1  L1 hit                    [.] 00000038.plt_call.__GI_strlen  libc-2.28.so
     8.20%             1  L1 hit                    [k] perf_event_interrupt           [kernel.vmlinux]

Result in power9 system with patch changes:

localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
 # Overhead       Samples  Memory access             Symbol                      Shared Object
 # ........  ............  ........................  ..........................  ................
 #
    36.63%             1  L2 or L2 hit              [k] perf_event_exec         [kernel.vmlinux]
    25.50%             1  L1 or L1 hit              [k] vtime_delta             [kernel.vmlinux]
    13.12%             1  L1 or L1 hit              [k] unmap_region            [kernel.vmlinux]
    12.62%             1  L1 or L1 hit              [k] perf_sample_event_took  [kernel.vmlinux]
     6.93%             1  L1 or L1 hit              [k] perf_ctx_unlock         [kernel.vmlinux]
     5.20%             1  L1 or L1 hit              [.] __memcpy_power7         libc-2.28.so

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-4-kjain@linux.ibm.com
Kajol Jain [Mon, 6 Dec 2021 09:17:46 +0000 (14:47 +0530)]
perf: Add new macros for mem_hops field

Add new macros for the mem_hops field which can be used to
represent remote-node, socket and board level details.

Currently the code has a macro for HOPS_0, which corresponds
to data coming from another core on the same node.
Add new macros for HOPS_1 to HOPS_3 to represent
remote-node, socket and board level data.

For example, encodings for the mem_hops field with the L2 cache
(a short code sketch follows the list):

L2 - local L2
L2 | REMOTE | HOPS_0 - remote core, same node L2
L2 | REMOTE | HOPS_1 - remote node, same socket L2
L2 | REMOTE | HOPS_2 - remote socket, same board L2
L2 | REMOTE | HOPS_3 - remote board L2
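
A hedged sketch of composing one such encoding in a PMU driver (field and
macro names are from the perf UAPI; the particular combination shown is
illustrative):

  union perf_mem_data_src dsrc = { .val = 0 };

  dsrc.mem_lvl_num = PERF_MEM_LVLNUM_L2;        /* L2 cache */
  dsrc.mem_remote  = PERF_MEM_REMOTE_REMOTE;
  dsrc.mem_hops    = PERF_MEM_HOPS_2;           /* remote socket, same board */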

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-2-kjain@linux.ibm.com
Michael Ellerman [Wed, 15 Dec 2021 00:29:53 +0000 (11:29 +1100)]
Merge branch 'topic/ppc-kvm' into next

Bring in some more KVM commits from our KVM topic branch.

Sean Christopherson [Mon, 13 Dec 2021 17:45:56 +0000 (17:45 +0000)]
KVM: PPC: Book3S HV P9: Use kvm_arch_vcpu_get_wait() to get rcuwait object

Use kvm_arch_vcpu_get_wait() to get a vCPU's rcuwait object instead of
using vcpu->wait directly in kvmhv_run_single_vcpu().  Functionally, this
is a nop as vcpu->arch.waitp is guaranteed to point at vcpu->wait.  But
that is not obvious at first glance, and a future change coming in via
the KVM tree, commit 510958e99721 ("KVM: Force PPC to define its own
rcuwait object"), will hide vcpu->wait from architectures that define
__KVM_HAVE_ARCH_WQP to prevent generic KVM from attempting to wake a vCPU
with the wrong rcuwait object.
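
A hedged sketch of the substitution (the surrounding wait logic is elided and
assumed):

  struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);

  prepare_to_rcuwait(wait);
  /* ... existing wait loop unchanged ... */
  finish_rcuwait(wait);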

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211213174556.3871157-1-seanjc@google.com
Christophe Leroy [Wed, 8 Dec 2021 17:36:52 +0000 (17:36 +0000)]
powerpc/powermac: Add additional missing lockdep_register_key()

Commit df1f679d19ed ("powerpc/powermac: Add missing
lockdep_register_key()") fixed a problem that was causing a WARNING.

There are two other places in the same file with the same problem
originating from commit 9e607f72748d ("i2c_powermac: shut up lockdep
warning").

Add the missing lockdep_register_key() calls.
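
A hedged sketch of the pattern being added (structure and field names are
illustrative; the point is that a lock_class_key living in dynamically
allocated memory must be registered before use):

  host = kzalloc(sizeof(*host), GFP_KERNEL);
  if (!host)
          return NULL;

  lockdep_register_key(&host->lock_key);
  mutex_init(&host->mutex);
  lockdep_set_class(&host->mutex, &host->lock_key);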

Fixes: 9e607f72748d ("i2c_powermac: shut up lockdep warning")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: df1f679d19ed ("powerpc/powermac: Add missing lockdep_register_key()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200055
Link: https://lore.kernel.org/r/2c7e421874e21b2fb87813d768cf662f630c2ad4.1638984999.git.christophe.leroy@csgroup.eu
Hari Bathini [Tue, 7 Dec 2021 10:37:19 +0000 (16:07 +0530)]
powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f222d7 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    dereferencing a NULL pointer to force a kernel crash. But then
    commit de6e5d38417e ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post-processing of register data for all other CPUs happens
as intended, let the panic() function take the crash-friendly path (read:
crash_smp_send_stop()) with the help of the crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by firmware, skip
the IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
Hari Bathini [Tue, 7 Dec 2021 10:37:18 +0000 (16:07 +0530)]
powerpc: handle kdump appropriately with crash_kexec_post_notifiers option

Kdump can be triggered after panic_notifers since commit f06e5153f4ae2
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
after panic_notifers") introduced crash_kexec_post_notifiers option.
But using this option would mean smp_send_stop(), which marks all other
CPUs as offline, gets called before kdump is triggered. As a result,
kdump routines fail to save other CPUs' registers. To fix this, kdump
friendly crash_smp_send_stop() function was introduced with kernel
commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path"). Override this kdump friendly weak
function to handle crash_kexec_post_notifiers option appropriately
on powerpc.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com
Thadeu Lima de Souza Cascardo [Tue, 7 Dec 2021 13:05:57 +0000 (10:05 -0300)]
selftests/powerpc/spectre_v2: Return skip code when miss_percent is high

A mismatch between the reported and actual mitigation is not restricted to the
Vulnerable case. The guest might also report the mitigation as "Software
count cache flush" and the host will still mitigate with branch cache
disabled.

So, instead of skipping depending on the detected mitigation, simply skip
whenever the detected miss_percent is the expected one for a fully
mitigated system, that is, above 95%.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207130557.40566-1-cascardo@canonical.com
Anders Roxell [Tue, 7 Dec 2021 11:02:28 +0000 (12:02 +0100)]
powerpc/cell: Fix clang -Wimplicit-fallthrough warning

Clang warns:

arch/powerpc/platforms/cell/pervasive.c:81:2: error: unannotated fall-through between switch labels
        case SRR1_WAKEEE:
        ^
arch/powerpc/platforms/cell/pervasive.c:81:2: note: insert 'break;' to avoid fall-through
        case SRR1_WAKEEE:
        ^
        break;
1 error generated.

Clang is more pedantic than GCC, which does not warn when falling
through to a case that is just break or return. Clang's behaviour is more
in line with the kernel's own stance in deprecated.rst. Add the missing
break to silence the warning.
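
A hedged sketch of the kind of change (the exact context in pervasive.c is
abridged):

  case SRR1_WAKEDEC:
          set_dec(1);
          break;          /* added; previously fell through into SRR1_WAKEEE */
  case SRR1_WAKEEE: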

Fixes: 6e83985b0f6e ("powerpc/cbe: Do not process external or decremeter interrupts from sreset")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207110228.698956-1-anders.roxell@linaro.org
Xiang wangx [Sun, 5 Dec 2021 13:09:25 +0000 (21:09 +0800)]
macintosh: Add const to of_device_id

struct of_device_id should normally be const.
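
For illustration (table and compatible string hypothetical):

  static const struct of_device_id therm_match[] = {
          { .compatible = "vendor,sensor" },
          { }
  };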

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211205130925.28389-1-wangxiang@cdjrlc.com
Christophe Leroy [Mon, 29 Nov 2021 17:49:41 +0000 (18:49 +0100)]
powerpc/inst: Optimise copy_inst_from_kernel_nofault()

copy_inst_from_kernel_nofault() uses copy_from_kernel_nofault() to
copy one or two 32bits words. This means calling an out-of-line
function which itself calls back copy_from_kernel_nofault_allowed()
then performs a generic copy with loops.

Rewrite copy_inst_from_kernel_nofault() to do everything at a
single place and use __get_kernel_nofault() directly to perform
single accesses without loops.

Although the generic function uses pagefault_disable(), it is not
required on powerpc because do_page_fault() bails earlier when a
kernel mode fault happens on a kernel address.

As the function has now become very small, inline it.

With this change, on an 8xx the time spent in the loop in
ftrace_replace_code() is reduced by 23% at function tracer activation
and 27% at nop tracer activation.
The overall time to activate function tracer (measured with shell
command 'time') is 570ms before the patch and 470ms after the patch.

Even the vmlinux size is reduced (by 152 instructions).

Before the patch:

00000018 <copy_inst_from_kernel_nofault>:
  18: 94 21 ff e0  stwu    r1,-32(r1)
  1c: 7c 08 02 a6  mflr    r0
  20: 38 a0 00 04  li      r5,4
  24: 93 e1 00 1c  stw     r31,28(r1)
  28: 7c 7f 1b 78  mr      r31,r3
  2c: 38 61 00 08  addi    r3,r1,8
  30: 90 01 00 24  stw     r0,36(r1)
  34: 48 00 00 01  bl      34 <copy_inst_from_kernel_nofault+0x1c>
34: R_PPC_REL24 copy_from_kernel_nofault
  38: 2c 03 00 00  cmpwi   r3,0
  3c: 40 82 00 0c  bne     48 <copy_inst_from_kernel_nofault+0x30>
  40: 81 21 00 08  lwz     r9,8(r1)
  44: 91 3f 00 00  stw     r9,0(r31)
  48: 80 01 00 24  lwz     r0,36(r1)
  4c: 83 e1 00 1c  lwz     r31,28(r1)
  50: 38 21 00 20  addi    r1,r1,32
  54: 7c 08 03 a6  mtlr    r0
  58: 4e 80 00 20  blr

After the patch (before inlining):

00000018 <copy_inst_from_kernel_nofault>:
  18: 3d 20 b0 00  lis     r9,-20480
  1c: 7c 04 48 40  cmplw   r4,r9
  20: 7c 69 1b 78  mr      r9,r3
  24: 41 80 00 14  blt     38 <copy_inst_from_kernel_nofault+0x20>
  28: 81 44 00 00  lwz     r10,0(r4)
  2c: 38 60 00 00  li      r3,0
  30: 91 49 00 00  stw     r10,0(r9)
  34: 4e 80 00 20  blr

  38: 38 60 ff de  li      r3,-34
  3c: 4e 80 00 20  blr
  40: 38 60 ff f2  li      r3,-14
  44: 4e 80 00 20  blr
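
For reference, a C sketch of the inlined PPC32 path described above (the real
function additionally reads the suffix of prefixed instructions on PPC64):

  static inline int copy_inst_from_kernel_nofault(ppc_inst_t *inst, u32 *src)
  {
          unsigned int val;

          if (unlikely(!is_kernel_addr((unsigned long)src)))
                  return -ERANGE;

          __get_kernel_nofault(&val, src, u32, Efault);
          *inst = ppc_inst(val);
          return 0;

  Efault:
          return -EFAULT;
  }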

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add clang workaround, with version check as suggested by Nathan]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d5b12183d5176dd702d29ad94c39c384e51c78f.1638208156.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Nov 2021 17:49:40 +0000 (18:49 +0100)]
powerpc/inst: Move ppc_inst_t definition in asm/reg.h

Because of the circular inclusion of asm/hw_breakpoint.h, we
need to move the definition of ppc_inst_t into asm/reg.h
so that asm/hw_breakpoint.h gets it without including
asm/inst.h.

Also remove asm/inst.h from asm/uprobes.h as it's not
needed anymore.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4b79f1491118af96b1ac0735e74aeca02ea4c04e.1638208156.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Nov 2021 17:49:39 +0000 (18:49 +0100)]
powerpc/inst: Define ppc_inst_t as u32 on PPC32

Unlike PPC64 ABI, PPC32 uses the stack to pass a parameter defined
as a struct, even when the struct has a single simple element.

To avoid that, define ppc_inst_t as u32 on PPC32.

Keep it as 'struct ppc_inst' when __CHECKER__ is defined so that
sparse can perform type checking.

Also revert commit 511eea5e2ccd ("powerpc/kprobes: Fix Oops by passing
ppc_inst as a pointer to emulate_step() on ppc32") as now the
instruction to be emulated is passed as a register to emulate_step().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c6d0c46f598f76ad0b0a88bc0d84773bd921b17c.1638208156.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Nov 2021 17:49:38 +0000 (18:49 +0100)]
powerpc/inst: Define ppc_inst_t

In order to stop using 'struct ppc_inst' on PPC32,
define a ppc_inst_t typedef.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Nov 2021 17:49:37 +0000 (18:49 +0100)]
powerpc/inst: Refactor ___get_user_instr()

PPC64 version of ___get_user_instr() can be used for PPC32 as well,
by simply disabling the suffix part with IS_ENABLED(CONFIG_PPC64).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1f0ede830ccb33a659119a55cb590820c27004db.1638208156.git.christophe.leroy@csgroup.eu
Christophe Leroy [Fri, 26 Nov 2021 12:40:35 +0000 (13:40 +0100)]
powerpc/32s: Allocate one 256k IBAT instead of two consecutive 128k IBATs

Today we have the following IBATs allocated:

---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
4: 0xc0780000-0xc079ffff 0x00780000       128K Kernel   x     m
5: 0xc07a0000-0xc07bffff 0x007a0000       128K Kernel   x     m
6:         -
7:         -

The two 128K should be a single 256K instead.

When _etext is not aligned to 128Kbytes, the system will allocate
all necessary BATs to the lower 128Kbytes boundary, then allocate
an additional 128Kbytes BAT for the remaining block.

Instead, align the top to 128Kbytes so that the function directly
allocates a 256Kbytes last block:

---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000         4M Kernel   x     m
1: 0xc0400000-0xc05fffff 0x00400000         2M Kernel   x     m
2: 0xc0600000-0xc06fffff 0x00600000         1M Kernel   x     m
3: 0xc0700000-0xc077ffff 0x00700000       512K Kernel   x     m
4: 0xc0780000-0xc07bffff 0x00780000       256K Kernel   x     m
5:         -
6:         -
7:         -

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ab58b296832b0ec650e2203200e060adbcb2677d.1637930421.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:33 +0000 (09:29 +0200)]
powerpc: Remove CONFIG_PPC_HAVE_KUAP and CONFIG_PPC_HAVE_KUEP

All platforms now have KUAP and KUEP so remove CONFIG_PPC_HAVE_KUAP
and CONFIG_PPC_HAVE_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a3c007ad0951965199e6ab2ef1035966bc66e771.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:32 +0000 (09:29 +0200)]
powerpc/kuap: Wire-up KUAP on book3e/64

This adds KUAP support to book3e/64.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time the user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2c2c9375afd4bbc06aa904d0103a5f5102a2b1a.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:31 +0000 (09:29 +0200)]
powerpc/kuap: Wire-up KUAP on 85xx in 32 bits mode.

This adds KUAP support to 85xx in 32 bits mode.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time the user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f8696f8980ca1532ada3a2f0e0a03e756269c7fe.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:30 +0000 (09:29 +0200)]
powerpc/kuap: Wire-up KUAP on 40x

This adds KUAP support to 40x. This is done by checking
the content of SPRN_PID at the time the user pgtable is loaded.

40x doesn't have KUEP, but KUAP implies KUEP because when the
PID doesn't match the page's PID, the page can be neither read
nor executed.

So KUEP is now automatically selected when KUAP is selected and
disabled when KUAP is disabled.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaefa91897ddc42ac11019dc0e1d1a525bd08e90.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:29 +0000 (09:29 +0200)]
powerpc/kuap: Wire-up KUAP on 44x

This adds KUAP support to 44x. This is done by checking
the content of SPRN_PID at the time it is read and written
into SPRN_MMUCR.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d6c3f1978a26feada74b084f651e8cf1e3b3a47.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:28 +0000 (09:29 +0200)]
powerpc: Add KUAP support for BOOKE and 40x

On booke/40x we don't have segments like book3s/32.
On booke/40x we don't have access protection groups like 8xx.

Use the PID register to provide user access protection.
Kernel address space can be accessed with any PID.
User address space has to be accessed with the PID of the user.
User PID is always not null.

Every time the kernel is entered, set the PID register to 0 and
restore the PID register when returning to user.

Every time the kernel needs to access user data, the PID is restored
for the access.

In TLB miss handlers, check the PID and bail out to the data storage
exception when the PID is 0 and the accessed address is in user space.

Note that this also forbids execution of user text by the kernel except
when user access is unlocked. But this shouldn't be a problem
as the kernel is not supposed to ever run user text.

This patch prepares the infrastructure but the real activation of KUAP
is done by following patches for each processor type one by one.
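
A much-simplified C sketch of the idea (the real enforcement lives in the
entry and TLB miss assembly, and the thread-struct field name is an
assumption):

  static inline void __prevent_user_access(void)
  {
          mtspr(SPRN_PID, 0);     /* kernel runs with PID 0: user pages unmapped */
          isync();
  }

  static inline void __allow_user_access(void)
  {
          mtspr(SPRN_PID, current->thread.pid);   /* restore the user's PID */
          isync();
  }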

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:27 +0000 (09:29 +0200)]
powerpc/kuap: Make PPC_KUAP_DEBUG depend on PPC_KUAP only

PPC_KUAP_DEBUG is supported by all platforms doing PPC_KUAP,
it doesn't depend on Radix on book3s/64.

This will avoid adding one more dependency when implementing
KUAP on book3e/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a5ff6228a36e51783b83d8c10d058db76e450f63.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:26 +0000 (09:29 +0200)]
powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64

Also call kuap_lock() and kuap_save_and_lock() from
interrupt functions with CONFIG_PPC64.

For book3s/64 we keep them empty as it is done in assembly.

Also do the locked assert when switching tasks unless it is
book3s/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:25 +0000 (09:29 +0200)]
powerpc/config: Add CONFIG_BOOKE_OR_40x

We have many functionalities common to 40x and BOOKE, which leads to
many places with #if defined(CONFIG_BOOKE) || defined(CONFIG_40x).

We are going to add a few more with KUAP for booke/40x, so create
a new symbol which is defined when either BOOKE or 40x is defined.
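
The resulting simplification, for illustration:

  /* before */
  #if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
          /* booke/40x-only code */
  #endif

  /* after */
  #ifdef CONFIG_BOOKE_OR_40x
          /* booke/40x-only code */
  #endif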

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a3dbd60924cb25c9f944d3d8205ac5a0d15e229.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:24 +0000 (09:29 +0200)]
powerpc/nohash: Move setup_kuap out of 8xx.c

In order to reuse it on booke/4xx, move the KUAP
setup routine out of 8xx.c.

Make it usable on SMP by removing the __init tag,
as it is called for each CPU.

And use __prevent_user_access() instead of hard-coding
the initial lock.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae35eec3426509efc2b8ae69586c822e2fe2642a.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:23 +0000 (09:29 +0200)]
powerpc/kuap: Add kuap_lock()

Add kuap_lock() and call it when entering interrupts from user.

It is called kuap_lock() as it is similar to kuap_save_and_lock()
without the save.

However book3s/32 already has a kuap_lock(). Rename it
kuap_lock_addr().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4437e2deb9f6f549f7089d45e9c6f96a7e77905a.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:22 +0000 (09:29 +0200)]
powerpc/kuap: Remove __kuap_assert_locked()

__kuap_assert_locked() is redundant with
__kuap_get_and_assert_locked().

Move the verification of CONFIG_PPC_KUAP_DEBUG in kuap_assert_locked()
and make it call __kuap_get_and_assert_locked() directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1a60198a25d2ba38a37f1b92bc7d096435df4224.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:21 +0000 (09:29 +0200)]
powerpc/kuap: Check KUAP activation in generic functions

Today, every platform checks that KUAP is not de-activated
before doing the real job.

Move the verification out of platform specific functions.
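
A hedged sketch of the resulting shape of one generic helper (the function is
chosen for illustration; the activation check name follows the kup code):

  static __always_inline void kuap_kernel_restore(struct pt_regs *regs, unsigned long amr)
  {
          if (kuap_is_disabled())
                  return;

          __kuap_kernel_restore(regs, amr);
  }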

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/894f110397fcd248e125fb855d1e863e4e633a0d.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:20 +0000 (09:29 +0200)]
powerpc/kuap: Add a generic intermediate layer

Make the following functions generic to all platforms.
- bad_kuap_fault()
- kuap_assert_locked()
- kuap_save_and_lock() (PPC32 only)
- kuap_kernel_restore()
- kuap_get_and_assert_locked()

And for all platforms except book3s/64
- allow_user_access()
- prevent_user_access()
- prevent_user_access_return()
- restore_user_access()

Prepend __ in front of the name of platform specific ones.

For now the generic functions just call the platform-specific ones, but
the next patch will move redundant parts of the specific functions
into the generic ones.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaef143a8dae7288cd34565ffa7b49c16aee1ec3.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:19 +0000 (09:29 +0200)]
powerpc/kuep: Remove 'nosmep' boot time parameter except for book3s/64

Deactivating KUEP at boot time is irrelevant for PPC32 and BOOK3E/64.

Remove it.

This allows refactoring setup_kuep() via a __weak function
that only PPC64s will override for now.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix CONFIG_PPC_BOOKS_64 -> CONFIG_PPC_BOOK3S_64 typo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4c36df18b41c988c4512f45d96220486adbe4c99.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:18 +0000 (09:29 +0200)]
powerpc/32s: Save content of sr0 to avoid 'mfsr'

Calling 'mfsr' to get the content of segment registers is heavy;
in addition it requires clearing the 'reserved' bits.

In order to avoid this operation, save it in the mm context and in
the thread struct.

The saved sr0 is the one used by the kernel, which means that on
locking entry it can be used as is.

For unlocking, the only thing to do is to clear SR_NX.

This improves the null_syscall selftest by 12 cycles, i.e. 4%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b02baf2ed8f09bad910dfaeeb7353b2ae6830525.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:17 +0000 (09:29 +0200)]
powerpc/32s: Do kuep_lock() and kuep_unlock() in assembly

When interrupt and syscall entries were converted to C, KUEP locking
and unlocking was also converted. It improved performance by unrolling
the loop, and allowed easily implementing boot time deactivation of
KUEP.

However, null_syscall selftest shows that KUEP is still heavy
(361 cycles with KUEP, 212 cycles without).

A way to improve more is to group 'mtsr's together, instead of
repeating 'addi' + 'mtsr' several times.

In order to do that, more registers need to be available. In C, GCC
will always be able to provide the requested number of registers, but
at the cost of saving some data on the stack, which is
counter-productive here.

So let's do it in assembly, where we have full control of which
registers can be used. It also has the advantage of locking earlier
and unlocking later, and it helps GCC generate less tricky code.
The only drawback is that it makes boot time deactivation less
straightforward and requires 'hand' instruction patching.

Group 'mtsr's by 4.

With this change, null_syscall selftest reports 336 cycles. Without
the change it was 361 cycles, that's a 7% reduction.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/115cb279e9b9948dfd93a065e047081c59e3a2a6.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:16 +0000 (09:29 +0200)]
powerpc/32s: Remove capability to disable KUEP at boottime

Disabling KUEP at boottime makes things unnecessarily complex.

Still allow disabling KUEP at build time, but when it's built-in
it is always there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96f583f82423a29a4205c60b9721079111b35567.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:15 +0000 (09:29 +0200)]
powerpc/book3e: Activate KUEP at all time

On book3e,
- When using 64-bit PTEs: user pages don't have the SX bit defined,
so KUEP is always active.
- When using 32-bit PTEs: implement KUEP by clearing the SX bit during
TLB miss for user pages. The impact is minimal and worth neither
boot-time nor build-time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e376b114283fb94504e2aa2de846780063252cde.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:14 +0000 (09:29 +0200)]
powerpc/44x: Activate KUEP at all time

On 44x, KUEP is implemented by clearing the SX bit during TLB miss
for user pages. The impact is minimal and worth neither
boot-time nor build-time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2414d662558e7fb27d1ed41c8e47c591d576acac.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:13 +0000 (09:29 +0200)]
powerpc/8xx: Activate KUEP at all time

On the 8xx, there is absolutely no runtime impact with KUEP. Protection
against execution of user code in kernel mode is set up at boot time
by configuring the groups which contain all user pages as having swapped
protection rights, namely EX for user and NA for supervisor.

Configure KUEP at startup and force selection of CONFIG_PPC_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2129e86944323ffe9ed07fffbeafdfd2e363690a.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 19 Oct 2021 07:29:12 +0000 (09:29 +0200)]
Revert "powerpc: Inline setup_kup()"

This reverts commit 1791ebd131c46539b024c0f2ebf12b6c88a265b9.

setup_kup() was inlined to manage conflict between PPC32 marking
setup_{kuap/kuep}() __init and PPC64 not marking them __init.

But in fact PPC32 has removed the __init mark for all but 8xx
in order to properly handle SMP.

In order to make setup_kup() grow a bit, revert the commit
mentioned above but remove __init for 8xx as well so that
we don't have to mark setup_kup() as __ref.

Also switch the order so that KUAP is initialised before KUEP
because on the 40x, KUEP will depend on the activation of KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7691088fd0994ee3c8db6298dc8c00259e3f6a7f.1634627931.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 27 Sep 2021 15:12:39 +0000 (17:12 +0200)]
powerpc/40x: Map 32Mbytes of memory at startup

As reported by Carlo, 16Mbytes is not enough with modern kernels
that tend to be a bit big, so map another 16M page at boot.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu
Nicholas Piggin [Wed, 1 Dec 2021 14:41:53 +0000 (00:41 +1000)]
powerpc/microwatt: add POWER9_CPU, clear PPC_64S_HASH_MMU

Microwatt implements a subset of ISA v3.0 (which is equivalent to
the POWER9_CPU option). It is radix-only, so does not require hash
MMU support.

This saves 20kB compressed dtbImage and 56kB vmlinux size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-19-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:52 +0000 (00:41 +1000)]
powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU

Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
      Fix radix_enabled() use in setup_initial_memory_limit(). Add some
      stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:51 +0000 (00:41 +1000)]
powerpc/64s: Make hash MMU support configurable

This adds Kconfig selection which allows 64s hash MMU support to be
disabled. It can be disabled if radix support is enabled, the minimum
supported CPU type is POWER9 (or higher), and KVM is not selected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:50 +0000 (00:41 +1000)]
powerpc/64s: Always define arch unmapped area calls

To avoid any functional changes to radix paths when building with hash
MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
arch get_unmapped_area calls on 64s platforms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:49 +0000 (00:41 +1000)]
powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clear

There are a few places that require MMU_FTR_HPTE_TABLE to be set even
when running in radix mode. Fix those up.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-15-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:48 +0000 (00:41 +1000)]
powerpc/64e: remove mmu_linear_psize

mmu_linear_psize is only set at boot once on 64e, is not necessarily
the correct size of the linear map pages, and is never used anywhere.
Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain the extern, so we can use IS_ENABLED() for related code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-14-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:47 +0000 (00:41 +1000)]
powerpc: make memremap_compat_align 64s-only

memremap_compat_align is only relevant when ZONE_DEVICE is selected.
ZONE_DEVICE depends on ARCH_HAS_PTE_DEVMAP, which is only selected
by PPC_BOOK3S_64.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-13-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:46 +0000 (00:41 +1000)]
powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix

Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu
atom_size to always be PAGE_SIZE. 64e sets it to 1GB always.

Make the paths for these platforms explicit about what value they set
atom_size to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-12-npiggin@gmail.com
Nicholas Piggin [Wed, 1 Dec 2021 14:41:45 +0000 (00:41 +1000)]
powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.c

This file contains functions and data common to radix, so rename it to
remove the hash_ prefix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-11-npiggin@gmail.com
2 years agopowerpc/64s: move page size definitions from hash specific file
Nicholas Piggin [Wed, 1 Dec 2021 14:41:44 +0000 (00:41 +1000)]
powerpc/64s: move page size definitions from hash specific file

The radix code uses some of the psize variables. Move the common
ones from hash_utils.c to pgtable.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-10-npiggin@gmail.com
2 years agopowerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled
Nicholas Piggin [Wed, 1 Dec 2021 14:41:43 +0000 (00:41 +1000)]
powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled

The radix test can exclude slb_flush_all_realmode() from being called
because flush_and_reload_slb() is only expected to flush ERAT when
called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not
support radix.

This helps the later change to make hash support configurable to not
introduce runtime changes to radix mode behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-9-npiggin@gmail.com
2 years agopowerpc/64s: move THP trace point creation out of hash specific file
Nicholas Piggin [Wed, 1 Dec 2021 14:41:42 +0000 (00:41 +1000)]
powerpc/64s: move THP trace point creation out of hash specific file

In preparation for making hash MMU support configurable, move THP
trace point function definitions out of an otherwise hash-specific
file.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-8-npiggin@gmail.com
2 years agopowerpc/pseries: lparcfg don't include slb_size line in radix mode
Nicholas Piggin [Wed, 1 Dec 2021 14:41:41 +0000 (00:41 +1000)]
powerpc/pseries: lparcfg don't include slb_size line in radix mode

This avoids a change in behaviour in the later patch making hash
support configurable. This is possibly a user interface change, so
the alternative would be a hard-coded slb_size=0 here.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-7-npiggin@gmail.com
2 years agopowerpc/pseries: move process table registration away from hash-specific code
Nicholas Piggin [Wed, 1 Dec 2021 14:41:40 +0000 (00:41 +1000)]
powerpc/pseries: move process table registration away from hash-specific code

This reduces ifdefs in a later change which makes hash support configurable.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-6-npiggin@gmail.com
2 years agopowerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific
Nicholas Piggin [Wed, 1 Dec 2021 14:41:39 +0000 (00:41 +1000)]
powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific

slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
segment interrupts that occur with radix MMU as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2 years agopowerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE
Nicholas Piggin [Wed, 1 Dec 2021 14:41:38 +0000 (00:41 +1000)]
powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE

The pseries platform does not use the native hash code but the PAPR
virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE.

This requires moving tlbiel code from hash_native.c to hash_utils.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-4-npiggin@gmail.com
2 years agopowerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE
Nicholas Piggin [Wed, 1 Dec 2021 14:41:37 +0000 (00:41 +1000)]
powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE

PPC_NATIVE now only controls the native HPT code, so rename it to be
more descriptive. Restrict it to Book3S only.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-3-npiggin@gmail.com
2 years agopowerpc: Remove unused FW_FEATURE_NATIVE references
Nicholas Piggin [Wed, 1 Dec 2021 14:41:36 +0000 (00:41 +1000)]
powerpc: Remove unused FW_FEATURE_NATIVE references

FW_FEATURE_NATIVE_ALWAYS and FW_FEATURE_NATIVE_POSSIBLE are always
zero and never do anything. Remove them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-2-npiggin@gmail.com
2 years agoKVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
Alexey Kardashevskiy [Wed, 1 Sep 2021 08:45:50 +0000 (18:45 +1000)]
KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST

H_COPY_TOFROM_GUEST is an hcall for an upper level VM to access its nested
VMs' memory. Userspace can trigger WARN_ON_ONCE(!(gfp & __GFP_NOWARN))
in __alloc_pages() by constructing a tiny VM which only does
H_COPY_TOFROM_GUEST with a too-big GPR9 (the number of bytes to copy).

This silences the warning by adding __GFP_NOWARN.
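
A minimal sketch of the pattern (the allocation call and error path are
illustrative, not the actual KVM code):

  /* The guest controls the requested size, so allocation failure is
   * expected; __GFP_NOWARN suppresses the allocation-failure splat.
   */
  void *buf = kzalloc(n, GFP_KERNEL | __GFP_NOWARN);
  if (!buf)
      return H_HARDWARE;  /* illustrative error return */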

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084550.1658699-1-aik@ozlabs.ru
2 years agoKVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
Alexey Kardashevskiy [Wed, 1 Sep 2021 08:45:12 +0000 (18:45 +1000)]
KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots

Userspace can trigger "vmalloc size %lu allocation failure: exceeds
total pages" via the KVM_SET_USER_MEMORY_REGION ioctl.

This silences the warning by checking the limit before calling vzalloc()
and returning -ENOMEM on failure.

This does not call the underlying vmalloc helpers, as __vmalloc_node() is
only exported when CONFIG_TEST_VMALLOC_MODULE is set, and
__vmalloc_node_range() is not exported at all.
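
A minimal sketch of the check, assuming total pages as the limit:

  /* Reject slots that cannot possibly be allocated instead of letting
   * vzalloc() warn about it.
   */
  if ((size >> PAGE_SHIFT) > totalram_pages())
      return -ENOMEM;

  slots = vzalloc(size);
  if (!slots)
      return -ENOMEM;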

Spotted by syzkaller.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Use 'size' for the variable rather than 'cb']
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901084512.1658628-1-aik@ozlabs.ru
2 years agoKVM: PPC: Book3S HV P9: Remove unused ri_set local variable
Nicholas Piggin [Wed, 1 Dec 2021 05:21:12 +0000 (15:21 +1000)]
KVM: PPC: Book3S HV P9: Remove unused ri_set local variable

ri_set is set and never used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201052112.2137167-1-npiggin@gmail.com
2 years agopowerpc/xive: Fix compile when !CONFIG_PPC_POWERNV.
Cédric Le Goater [Wed, 1 Dec 2021 16:54:18 +0000 (17:54 +0100)]
powerpc/xive: Fix compile when !CONFIG_PPC_POWERNV.

The automatic "save & restore" of interrupt context is a POWER10/XIVE2
feature exploited by KVM under the PowerNV platform. It is not
available under pSeries and the associated toggle should not be
exposed under the XIVE debugfs directory.

Introduce a platform handler for debugfs initialization and move the
'save-restore' entry under the native (PowerNV) backend to fix compile
when !CONFIG_PPC_POWERNV.

Fixes: 1e7684dc4fc7 ("powerpc/xive: Add a debugfs toggle for save-restore")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201165418.1041842-1-clg@kaod.org
2 years agopowerpc/signal32: Use struct_group() to zero spe regs
Kees Cook [Thu, 18 Nov 2021 20:36:04 +0000 (12:36 -0800)]
powerpc/signal32: Use struct_group() to zero spe regs

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the spe registers so that memset() can correctly reason
about the size:

   In function 'fortify_memset_chk',
       inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
   >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
     195 |    __write_overflow_field();
         |    ^~~~~~~~~~~~~~~~~~~~~~~~
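
A minimal sketch of the struct_group() pattern (the container and field
names are illustrative, not the exact thread_struct layout):

  struct spe_regs_example {
      u32 before;
      struct_group(spe,          /* named group covering the SPE state */
          unsigned long evr[32];
          u64 acc;
          u32 spefscr;
      );
      u32 after;
  };

  /* memset() now targets a single named member of known size: */
  memset(&regs->spe, 0, sizeof(regs->spe));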

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
2 years agopowerpc/32s: Fix shift-out-of-bounds in KASAN init
Christophe Leroy [Tue, 30 Nov 2021 08:42:37 +0000 (09:42 +0100)]
powerpc/32s: Fix shift-out-of-bounds in KASAN init

================================================================================
UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23
shift exponent -1 is negative
CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9
Call Trace:
[c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
[c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c
[c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138
[c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c
[c214bf30] [c1c0ed84] kasan_init+0xc0/0x198
[c214bf70] [c1c08024] setup_arch+0x18/0x54c
[c214bfc0] [c1c037f0] start_kernel+0x90/0x33c
[c214bff0] [00003610] 0x3610
================================================================================

This happens when the directly mapped memory is a power of 2.

Fix it by checking the shift and setting the result to 0 when the shift is -1.
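
A minimal sketch of the guard (variable names are illustrative):

  /* ffs() returns 0 when no bit is set, so the shift would be -1 and
   * 1 << -1 is undefined; treat that case as a zero-sized block.
   */
  int shift = ffs(remaining) - 1;
  unsigned long block = (shift == -1) ? 0 : (1UL << shift);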

Fixes: 7974c4732642 ("powerpc/32s: Implement dedicated kasan_init_region()")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169
Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu
2 years agopowerpc/powermac: Add missing lockdep_register_key()
Christophe Leroy [Tue, 30 Nov 2021 09:32:42 +0000 (10:32 +0100)]
powerpc/powermac: Add missing lockdep_register_key()

KeyWest i2c @0xf8001003 irq 42 /uni-n@f8000000/i2c@f8001000
BUG: key c2d00cbc has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:4801 lockdep_init_map_type+0x4c0/0xb4c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.5-gentoo-PowerMacG4 #9
NIP:  c01a9428 LR: c01a9428 CTR: 00000000
REGS: e1033cf0 TRAP: 0700   Not tainted  (5.15.5-gentoo-PowerMacG4)
MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24002002  XER: 00000000

GPR00: c01a9428 e1033db0 c2d1cf20 00000016 00000004 00000001 c01c0630 e1033a73
GPR08: 00000000 00000000 00000000 e1033db0 24002004 00000000 f8729377 00000003
GPR16: c1829a9c 00000000 18305357 c1416fc0 c1416f80 c006ac60 c2d00ca8 c1416f00
GPR24: 00000000 c21586f0 c2160000 00000000 c2d00cbc c2170000 c216e1a0 c2160000
NIP [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
LR [c01a9428] lockdep_init_map_type+0x4c0/0xb4c
Call Trace:
[e1033db0] [c01a9428] lockdep_init_map_type+0x4c0/0xb4c (unreliable)
[e1033df0] [c1c177b8] kw_i2c_add+0x334/0x424
[e1033e20] [c1c18294] pmac_i2c_init+0x9ec/0xa9c
[e1033e80] [c1c1a790] smp_core99_probe+0xbc/0x35c
[e1033eb0] [c1c03cb0] kernel_init_freeable+0x190/0x5a4
[e1033f10] [c000946c] kernel_init+0x28/0x154
[e1033f30] [c0035148] ret_from_kernel_thread+0x14/0x1c

Add missing lockdep_register_key()

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69e4f55565bb45ebb0843977801b245af0c666fe.1638264741.git.christophe.leroy@csgroup.eu
2 years agopowerpc/modules: Don't WARN on first module allocation attempt
Christophe Leroy [Tue, 30 Nov 2021 10:10:43 +0000 (11:10 +0100)]
powerpc/modules: Don't WARN on first module allocation attempt

module_alloc() first tries to allocate module text within 24-bit direct
jump range of the kernel text, and tries a wider allocation if the first
one fails.

When the first allocation fails, the following is observed in the kernel logs:

  vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size
  systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null)
  CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G        W         5.15.5-gentoo-PowerMacG4 #9
  Call Trace:
  [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
  [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4
  [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c
  [e2a53c10] [c00338c0] module_alloc+0xa0/0xac
  [e2a53c40] [c027a368] load_module+0x2ae0/0x8148
  [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130
  [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28
  ...

Add the __GFP_NOWARN flag to the first allocation so that no warning appears
when it fails.
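
A minimal sketch of the two-step allocation (the alloc_in_range() helper and
the near-kernel limits are placeholders; the real code threads the gfp flags
through its own wrapper):

  void *module_alloc(unsigned long size)
  {
      /* First try close to the kernel text so 24-bit branches reach it;
       * failing here is expected, so stay quiet about it.
       */
      void *ptr = alloc_in_range(size, limit_start, limit_end,
                                 GFP_KERNEL | __GFP_NOWARN);
      if (ptr)
          return ptr;

      /* Fall back to the full module area, warnings allowed. */
      return alloc_in_range(size, MODULES_VADDR, MODULES_END, GFP_KERNEL);
  }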

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 2ec13df16704 ("powerpc/modules: Load modules closer to kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
2 years agopowerpc/64s: Get LPID bit width from device tree
Nicholas Piggin [Mon, 29 Nov 2021 03:09:15 +0000 (13:09 +1000)]
powerpc/64s: Get LPID bit width from device tree

Allow the LPID bit width and partition table size to be set at runtime
from the device tree.

Move the PID bit width detection into the same place.

KVM does not support using the extra bits yet; this is mainly required
to get the PTCR register values correct (so KVM will run, but it will
not allocate > 4096 LPIDs).

OPAL firmware provides this property for POWER10 CPUs since skiboot
commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for
POWER10 CPUs").
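
A minimal sketch of picking up such a property (the property name follows
the skiboot commit title; the exact prefix, node lookup and variable are
assumptions):

  struct device_node *cpu = of_find_node_by_type(NULL, "cpu");
  u32 lpid_bits;

  /* Use the firmware-provided width if present, else keep the default. */
  if (cpu && !of_property_read_u32(cpu, "mmu-lpid-bits", &lpid_bits))
      mmu_lpid_bits = lpid_bits;
  of_node_put(cpu);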

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
2 years agopowerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
Athira Rajeev [Wed, 21 Jul 2021 05:48:29 +0000 (01:48 -0400)]
powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

Running the perf fuzzer showed the following in the dmesg logs:
  "Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMCs (Performance
Monitor Counters) were found to be overflown. There are some corner cases
that clear the PMCs after the PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow, which leads to this message while replaying.

Case 1: A PMU interrupt happens during replay of other interrupts and
the counter values get cleared by PMU callbacks before replay:

During replay of interrupts like timer, __do_irq() and doorbell
exception, we conditionally enable interrupts via may_hard_irq_enable().
This could potentially create a window to generate a PMI. Since irq soft
mask is set to ALL_DISABLED, the PMI will get masked here. IPIs could
run before the perf interrupt is replayed, and the PMU events could
be deleted or stopped. This will change the PMU SPR values and reset
the counters. A snippet of the ftrace log showing PMU callbacks invoked in
__do_irq():

  <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
  <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
  <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
  <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
  <<>>
  <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
  <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
  <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
  <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
  <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
  <<>>
  <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
  <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts

Case 2: PMIs masked during local_* operations, for example local_add(). If
the local_add() operation happens within a local_irq_save(), replay of the
PMI will be during local_irq_restore(). Similar to case 1, this could
also create a window before replay where PMU events get deleted or
stopped.

Fix it by updating the PMU callback function power_pmu_disable() to
check for a pending perf interrupt. If there is an overflown PMC and a
pending perf interrupt indicated in the paca, clear the PMI bit in the paca
to drop that sample. Clearing of the PMI bit is done in power_pmu_disable()
since disable is invoked before any event gets deleted/stopped. With
this fix, if there is more than one event running in the PMU, there is
a chance that we clear the PMI bit for an event which is not getting
deleted/stopped. The other events may still remain active. Hence, to make
sure we don't drop a valid sample in such cases, another check is added in
power_pmu_enable(). This checks whether an overflown PMC is found among
the active events and, if so, sets the PMI bit back. Two new helper
functions are introduced to clear/set the PMI, i.e.
clear_pmi_irq_pending() and set_pmi_irq_pending(). The helper function
pmi_irq_pending() is introduced to warn if the PMI bit is pending in the
paca but no PMC is overflown.
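
A minimal sketch of the power_pmu_disable() side (the overflow check is
illustrative; clear_pmi_irq_pending() is the helper named above):

  static void power_pmu_disable(struct pmu *pmu)
  {
      /* ... freeze the counters and clear the relevant MMCR0 bits ... */

      /* A PMC overflowed while PMIs were soft-masked: once the counters
       * are reset the sample is gone, so drop the stale pending PMI.
       */
      if (any_pmc_overflown())    /* illustrative check */
          clear_pmi_irq_pending();

      /* ... */
  }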

Also, there are corner cases which result in performance monitor
interrupts being triggered during power_pmu_disable(). This happens
because the PMXE bit is not cleared along with the other MMCR0 bits
in pmu_disable. Such PMIs could leave the PMU running and trigger
another PMI, which will set the MMCR0 PMAO bit. This could lead to
spurious interrupts in some corner cases. For example, a timer after
power_pmu_del() re-enables interrupts and triggers a PMI again since
the PMAO bit is still set, but fails to find a valid overflow since the
PMC was cleared in power_pmu_del(). Fix that by disabling PMXE along
with the other MMCR0 bits in power_pmu_disable().

We can't just replay the PMI at any time, hence this approach is preferred
over replaying the PMI before resetting the overflown PMC. The patch also
documents, in core-book3s, a race condition which can trigger these PMC
messages during the idle path on PowerNV.

Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make pmi_irq_pending() return bool, reflow/reword some comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
2 years agopowerpc/atomics: Remove atomic_inc()/atomic_dec() and friends
Christophe Leroy [Tue, 21 Sep 2021 15:09:49 +0000 (17:09 +0200)]
powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends

Now that atomic_add() and atomic_sub() handle immediate operands,
atomic_inc() and atomic_dec() have no added value compared to the
generic fallback which calls atomic_add(1) and atomic_sub(1).

Also remove atomic_inc_not_zero(), which falls back to
atomic_add_unless(), which itself falls back to
atomic_fetch_add_unless(), which now handles immediate operands.
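
For reference, the generic fallbacks are roughly:

  static __always_inline void arch_atomic_inc(atomic_t *v)
  {
      arch_atomic_add(1, v);
  }

  static __always_inline void arch_atomic_dec(atomic_t *v)
  {
      arch_atomic_sub(1, v);
  }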

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0bc64a2f18726055093dbb2e479cefc60a409cfd.1632236981.git.christophe.leroy@csgroup.eu
2 years agopowerpc/atomics: Use immediate operand when possible
Christophe Leroy [Tue, 21 Sep 2021 15:09:48 +0000 (17:09 +0200)]
powerpc/atomics: Use immediate operand when possible

Today we get the following code generation for atomic operations:

c001bb2c: 39 20 00 01  li      r9,1
c001bb30: 7d 40 18 28  lwarx   r10,0,r3
c001bb34: 7d 09 50 50  subf    r8,r9,r10
c001bb38: 7d 00 19 2d  stwcx.  r8,0,r3

c001c7a8: 39 40 00 01  li      r10,1
c001c7ac: 7d 00 18 28  lwarx   r8,0,r3
c001c7b0: 7c ea 42 14  add     r7,r10,r8
c001c7b4: 7c e0 19 2d  stwcx.  r7,0,r3

By allowing GCC to choose between immediate or regular operation,
we get:

c001bb2c: 7d 20 18 28  lwarx   r9,0,r3
c001bb30: 39 49 ff ff  addi    r10,r9,-1
c001bb34: 7d 40 19 2d  stwcx.  r10,0,r3
--
c001c7a4: 7d 40 18 28  lwarx   r10,0,r3
c001c7a8: 39 0a 00 01  addi    r8,r10,1
c001c7ac: 7d 00 19 2d  stwcx.  r8,0,r3

For "and", the dot form has to be used because "andi" doesn't exist.

For logical operations we use an unsigned 16-bit immediate.
For arithmetic operations we use a signed 16-bit immediate.

On pmac32_defconfig, it reduces the text by approx another 8 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2ec558d44db8045752fe9dbd29c9ba84bab6030b.1632236981.git.christophe.leroy@csgroup.eu
2 years agopowerpc/bitops: Use immediate operand when possible
Christophe Leroy [Tue, 21 Sep 2021 15:09:47 +0000 (17:09 +0200)]
powerpc/bitops: Use immediate operand when possible

Today we get the following code generation for bitops like
set or clear bit:

c0009fe0: 39 40 08 00  li      r10,2048
c0009fe4: 7c e0 40 28  lwarx   r7,0,r8
c0009fe8: 7c e7 53 78  or      r7,r7,r10
c0009fec: 7c e0 41 2d  stwcx.  r7,0,r8

c000d568: 39 00 18 00  li      r8,6144
c000d56c: 7c c0 38 28  lwarx   r6,0,r7
c000d570: 7c c6 40 78  andc    r6,r6,r8
c000d574: 7c c0 39 2d  stwcx.  r6,0,r7

Most set-bit operations use a constant within the lower 16 bits, so they
can easily be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

For clearing bits, on 32 bits 'rlwinm' can be used instead of 'andc' when
all bits to be cleared are consecutive.

On 64 bits we don't have any equivalent single operation for clearing
single bits or a few bits; we'd need two 'rldicl', so it is not
worth it, and the li/andc sequence does the same.

With this patch we get:

c0009fe0: 7d 00 50 28  lwarx   r8,0,r10
c0009fe4: 61 08 08 00  ori     r8,r8,2048
c0009fe8: 7d 00 51 2d  stwcx.  r8,0,r10

c000d558: 7c e0 40 28  lwarx   r7,0,r8
c000d55c: 54 e7 05 64  rlwinm  r7,r7,0,21,18
c000d560: 7c e0 41 2d  stwcx.  r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
2 years agopowerpc: flexible GPR range save/restore macros
Nicholas Piggin [Fri, 22 Oct 2021 06:13:22 +0000 (16:13 +1000)]
powerpc: flexible GPR range save/restore macros

Introduce macros that operate on a (start, end) range of GPRs, which
reduces lines of code and the need to do mental arithmetic while reading
the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2 years agopowerpc/watchdog: help remote CPUs to flush NMI printk output
Nicholas Piggin [Fri, 19 Nov 2021 11:31:46 +0000 (21:31 +1000)]
powerpc/watchdog: help remote CPUs to flush NMI printk output

The printk layer at the moment does not seem to have a good way to force
flush printk messages that are created in NMI context, except in the
panic path.

NMI-context printk messages normally get to the console with irq_work,
but that won't help if the CPU is stuck with irqs disabled, as can be
the case for hard lockup watchdog messages.

The watchdog currently flushes the printk buffers after detecting a
lockup on remote CPUs, but they may not have processed their NMI IPI
yet by that stage, or they may have self-detected a lockup in which
case they won't go via this NMI IPI path.

Improve the situation by having NMI context set a flag if it called
printk, and have the watchdog timer interrupt check whether that flag was
set and try to flush if it was. Latency is not a big problem because we
were already stuck for a while; we just need to make sure the
messages eventually make it out.
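
A minimal sketch of the idea, assuming printk_trigger_flush() as the flush
primitive (the flag name is illustrative):

  static DEFINE_PER_CPU(bool, nmi_printed);

  /* In NMI context, after printing something: */
  this_cpu_write(nmi_printed, true);

  /* In the watchdog timer interrupt, in normal interrupt context: */
  if (this_cpu_read(nmi_printed)) {
      this_cpu_write(nmi_printed, false);
      printk_trigger_flush();
  }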

Depends-on: 5d5e4522a7f4 ("printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119113146.752759-6-npiggin@gmail.com
2 years agopowerpc: Don't bother about .data..Lubsan sections
Christophe Leroy [Thu, 25 Nov 2021 11:43:33 +0000 (12:43 +0100)]
powerpc: Don't bother about .data..Lubsan sections

Since commit 9a427556fb8e ("vmlinux.lds.h: catch compound literals
into data and BSS") .data..Lubsan sections are taken into account
in DATA_MAIN, which is included in the DATA_DATA macro.

There is no need to take care of them anymore in the powerpc vmlinux.lds.S.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3eb14570612eef17e01bb67f14a4450136001794.1637840601.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ptdump: Fix display a BAT's size unit
Christophe Leroy [Fri, 26 Nov 2021 10:30:03 +0000 (11:30 +0100)]
powerpc/ptdump: Fix display a BAT's size unit

We have wrong units on the BAT sizes (G instead of M, M instead of ...)

---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000         4G Kernel   x     m
1: 0xc0400000-0xc05fffff 0x00400000         2G Kernel   x     m
2: 0xc0600000-0xc06fffff 0x00600000         1G Kernel   x     m
3: 0xc0700000-0xc077ffff 0x00700000       512M Kernel   x     m
4: 0xc0780000-0xc079ffff 0x00780000       128M Kernel   x     m
5: 0xc07a0000-0xc07bffff 0x007a0000       128M Kernel   x     m
6:         -
7:         -

This is because pt_dump_size() expects a size in Kbytes but
bat_show_603() gives the size in bytes.

To avoid risk of confusion, change pt_dump_size() to take bytes.
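
A minimal sketch of the helper taking bytes (not the exact kernel function
body):

  static void pt_dump_size(struct seq_file *m, unsigned long size)
  {
      if (size >= SZ_1G)
          seq_printf(m, "%9luG", size >> 30);
      else if (size >= SZ_1M)
          seq_printf(m, "%9luM", size >> 20);
      else
          seq_printf(m, "%9luK", size >> 10);
  }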

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f16c30f5c9185a63335322cf1a8b22f189d335ef.1637922595.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32
Christophe Leroy [Thu, 28 Oct 2021 12:24:04 +0000 (14:24 +0200)]
powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32

Unlike PPC64, PPC32 doesn't require any special compiler option
to get the _mcount() call to not clobber registers.

Provide ftrace_regs_caller() and ftrace_regs_call() and activate
HAVE_DYNAMIC_FTRACE_WITH_REGS.

That's heavily copied from ftrace_64_mprofile.S.

For the time being, leave livepatching aside; it will come with a
following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1862dc7719855cc2a4eec80920d94c955877557e.1635423081.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Add module_trampoline_target() for PPC32
Christophe Leroy [Thu, 28 Oct 2021 12:24:03 +0000 (14:24 +0200)]
powerpc/ftrace: Add module_trampoline_target() for PPC32

module_trampoline_target() is used by __ftrace_modify_call().

Implement it for PPC32 so that CONFIG_DYNAMIC_FTRACE_WITH_REGS
can be activated on PPC32 as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/42345f464fb465f0fc76f3090e250be8fc1729f0.1635423081.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: No need to read LR from stack in _mcount()
Christophe Leroy [Thu, 28 Oct 2021 12:24:02 +0000 (14:24 +0200)]
powerpc/ftrace: No need to read LR from stack in _mcount()

All functions calling _mcount do it exactly the same way, with the
following sequence of instructions:

c07de788:       7c 08 02 a6     mflr    r0
c07de78c:       90 01 00 04     stw     r0,4(r1)
c07de790:       4b 84 13 65     bl      c001faf4 <_mcount>

Although LR is pushed on the stack, it is still in r0 when entering
_mcount().

Function arguments are in r3-r10, so r11 and r12 are still available
at that point.

Do like PPC64 and use r12 to move LR into CTR, so that r0 is preserved
and doesn't need to be restored from the stack.

While at it, bring back the EXPORT_SYMBOL at the end of _mcount.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a3ba7db388537c44a038026f926d885372e6d3.1635423081.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Mark probe_machine() __init and static
Michael Ellerman [Wed, 24 Nov 2021 09:32:54 +0000 (20:32 +1100)]
powerpc: Mark probe_machine() __init and static

Prior to commit b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit
setup_arch()") probe_machine() was called from setup_32/64.c and lived
in setup-common.c. But now it's only called from setup-common.c so it
can be static and __init, and we don't need the declaration in
machdep.h either.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2 years agopowerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
Michael Ellerman [Wed, 24 Nov 2021 09:32:53 +0000 (20:32 +1100)]
powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING

setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled.

Fixes the following W=1 warning when CONFIG_PROFILING=n:
  linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’
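
The resulting arrangement is roughly:

  #ifdef CONFIG_PROFILING
  /* Only referenced by the profiling code, so only built when it can be used. */
  int setup_profiling_timer(unsigned int multiplier)
  {
      return 0;
  }
  #endif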

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au
2 years agopowerpc/mm: Move tlbcam_sz() and make it static
Michael Ellerman [Wed, 24 Nov 2021 09:32:52 +0000 (20:32 +1100)]
powerpc/mm: Move tlbcam_sz() and make it static

Building with W=1 we see a warning:
  linux/arch/powerpc/mm/nohash/fsl_book3e.c:63:15: error: no previous prototype for ‘tlbcam_sz’

tlbcam_sz() is not used outside this file, so we can make it static.
However, it's only used inside #ifdef CONFIG_PPC32, so move it within
that ifdef; otherwise we would get a defined-but-not-used error.
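
The resulting shape, roughly (the function body is illustrative):

  #ifdef CONFIG_PPC32
  /* Now static, and only built where it is actually used. */
  static unsigned long tlbcam_sz(int idx)
  {
      return tlbcam_addrs[idx].limit - tlbcam_addrs[idx].start + 1;
  }
  /* ... existing CONFIG_PPC32-only users of tlbcam_sz() ... */
  #endif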

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-4-mpe@ellerman.id.au
2 years agopowerpc/85xx: Make c293_pcie_pic_init() static
Michael Ellerman [Wed, 24 Nov 2021 09:32:51 +0000 (20:32 +1100)]
powerpc/85xx: Make c293_pcie_pic_init() static

To fix the W=1 warning:
  linux/arch/powerpc/platforms/85xx/c293pcie.c:22:13: error: no previous prototype for ‘c293_pcie_pic_init’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-3-mpe@ellerman.id.au
2 years agopowerpc/85xx: Make mpc85xx_smp_kexec_cpu_down() static
Michael Ellerman [Wed, 24 Nov 2021 09:32:50 +0000 (20:32 +1100)]
powerpc/85xx: Make mpc85xx_smp_kexec_cpu_down() static

To fix the W=1 warning:
  arch/powerpc/platforms/85xx/smp.c:369:6: error: no previous prototype for ‘mpc85xx_smp_kexec_cpu_down’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-2-mpe@ellerman.id.au