platform/kernel/linux-rpi.git
3 years agopowerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
Michael Ellerman [Tue, 15 Dec 2020 01:57:20 +0000 (12:57 +1100)]
powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug

Currently pmac32_defconfig with SMP=y doesn't build:

  arch/powerpc/platforms/powermac/smp.c:
  error: implicit declaration of function 'cleanup_cpu_mmu_context'

It would be nice for consistency if all platforms clear mm_cpumask and
flush TLBs on unplug, but the TLB invalidation bug described in commit
01b0f0eae081 ("powerpc/64s: Trim offlined CPUs from mm_cpumasks") only
applies to 64s and for now we only have the TLB flush code for that
platform.

So just add an empty version for 32-bit Book3S.

Fixes: 01b0f0eae081 ("powerpc/64s: Trim offlined CPUs from mm_cpumasks")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log based on comments from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
3 years agopowerpc: Add config fragment for disabling -Werror
Michael Ellerman [Fri, 23 Oct 2020 04:00:02 +0000 (15:00 +1100)]
powerpc: Add config fragment for disabling -Werror

This makes it easy to disable building with -Werror:

  $ make defconfig
  $ grep WERROR .config
  # CONFIG_PPC_DISABLE_WERROR is not set
  CONFIG_PPC_WERROR=y

  $ make disable-werror.config
    GEN     Makefile
  Using .config as base
  Merging arch/powerpc/configs/disable-werror.config
  Value of CONFIG_PPC_DISABLE_WERROR is redefined by fragment arch/powerpc/configs/disable-werror.config:
  Previous value: # CONFIG_PPC_DISABLE_WERROR is not set
  New value: CONFIG_PPC_DISABLE_WERROR=y
  ...

  $ grep WERROR .config
  CONFIG_PPC_DISABLE_WERROR=y

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023040002.3313371-1-mpe@ellerman.id.au
3 years agopowerpc/configs: Add ppc64le_allnoconfig target
Michael Ellerman [Wed, 25 Nov 2020 03:15:51 +0000 (14:15 +1100)]
powerpc/configs: Add ppc64le_allnoconfig target

Add a phony target for ppc64le_allnoconfig, which tests some
combinations of CONFIG symbols that aren't covered by any of our
defconfigs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201125031551.2112715-1-mpe@ellerman.id.au
3 years agopowerpc/powernv: Rate limit opal-elog read failure message
Andrew Donnellan [Fri, 11 Dec 2020 02:11:41 +0000 (13:11 +1100)]
powerpc/powernv: Rate limit opal-elog read failure message

Sometimes we can't read an error log from OPAL, and we print an error
message accordingly. But the OPAL userspace tools seem to like retrying a
lot, in which case we flood the kernel log with a lot of messages.

Change pr_err() to pr_err_ratelimited() to help with this.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201211021140.28402-1-ajd@linux.ibm.com
3 years agopowerpc/pseries/memhotplug: Quieten some DLPAR operations
Laurent Dufour [Fri, 11 Dec 2020 14:59:54 +0000 (15:59 +0100)]
powerpc/pseries/memhotplug: Quieten some DLPAR operations

When attempting to remove by index a set of LMBs a lot of messages are
displayed on the console, even when everything goes fine:

  pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000002d
  Offlined Pages 4096
  pseries-hotplug-mem: Memory at 2d0000000 was hot-removed

The 2 messages prefixed by "pseries-hotplug-mem" are not really
helpful for the end user, they should be debug outputs.

In case of error, because some of the LMB's pages couldn't be
offlined, the following is displayed on the console:

  pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000003e
  pseries-hotplug-mem: Failed to hot-remove memory at 3e0000000
  dlpar: Could not handle DLPAR request "memory remove index 0x8000003e"

Again, the 2 messages prefixed by "pseries-hotplug-mem" are useless,
and the generic DLPAR prefixed message should be enough.

These 2 first changes are mainly triggered by the changes introduced
in drmgr:
  https://groups.google.com/g/powerpc-utils-devel/c/Y6ef4NB3EzM/m/9cu5JHRxAQAJ

Also, when adding a bunch of LMBs, a message is displayed in the console per LMB
like these ones:
  pseries-hotplug-mem: Memory at 7e0000000 (drc index 8000007e) was hot-added
  pseries-hotplug-mem: Memory at 7f0000000 (drc index 8000007f) was hot-added
  pseries-hotplug-mem: Memory at 800000000 (drc index 80000080) was hot-added
  pseries-hotplug-mem: Memory at 810000000 (drc index 80000081) was hot-added

When adding 1TB of memory and LMB size is 256MB, this leads to 4096
messages to be displayed on the console. These messages are not really
helpful for the end user, so moving them to the DEBUG level.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[mpe: Tweak change log wording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201211145954.90143-1-ldufour@linux.ibm.com
3 years agopowerpc/ps3: use dma_mapping_error()
Vincent Stehlé [Sun, 13 Dec 2020 18:26:22 +0000 (19:26 +0100)]
powerpc/ps3: use dma_mapping_error()

The DMA address returned by dma_map_single() should be checked with
dma_mapping_error(). Fix the ps3stor_setup() function accordingly.

Fixes: 80071802cb9c ("[POWERPC] PS3: Storage Driver Core")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201213182622.23047-1-vincent.stehle@laposte.net
3 years agopowerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
Christophe Leroy [Thu, 15 Oct 2020 10:52:20 +0000 (10:52 +0000)]
powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10

ppc-linux-objdump -d vmlinux | grep -e "<csum_partial>" -e "<__csum_partial>"

With gcc9 I get:

c0017ef8 <__csum_partial>:
c00182fc: 4b ff fb fd  bl      c0017ef8 <__csum_partial>
c0018478: 4b ff fa 80  b       c0017ef8 <__csum_partial>
c03e8458: 4b c2 fa a0  b       c0017ef8 <__csum_partial>
c03e8518: 4b c2 f9 e1  bl      c0017ef8 <__csum_partial>
c03ef410: 4b c2 8a e9  bl      c0017ef8 <__csum_partial>
c03f0b24: 4b c2 73 d5  bl      c0017ef8 <__csum_partial>
c04279a4: 4b bf 05 55  bl      c0017ef8 <__csum_partial>
c0429820: 4b be e6 d9  bl      c0017ef8 <__csum_partial>
c0429944: 4b be e5 b5  bl      c0017ef8 <__csum_partial>
c042b478: 4b be ca 81  bl      c0017ef8 <__csum_partial>
c042b554: 4b be c9 a5  bl      c0017ef8 <__csum_partial>
c045f15c: 4b bb 8d 9d  bl      c0017ef8 <__csum_partial>
c0492190: 4b b8 5d 69  bl      c0017ef8 <__csum_partial>
c0492310: 4b b8 5b e9  bl      c0017ef8 <__csum_partial>
c0495594: 4b b8 29 65  bl      c0017ef8 <__csum_partial>
c049c420: 4b b7 ba d9  bl      c0017ef8 <__csum_partial>
c049c870: 4b b7 b6 89  bl      c0017ef8 <__csum_partial>
c049c930: 4b b7 b5 c9  bl      c0017ef8 <__csum_partial>
c04a9ca0: 4b b6 e2 59  bl      c0017ef8 <__csum_partial>
c04bdde4: 4b b5 a1 15  bl      c0017ef8 <__csum_partial>
c04be480: 4b b5 9a 79  bl      c0017ef8 <__csum_partial>
c04be710: 4b b5 97 e9  bl      c0017ef8 <__csum_partial>
c04c969c: 4b b4 e8 5d  bl      c0017ef8 <__csum_partial>
c04ca2fc: 4b b4 db fd  bl      c0017ef8 <__csum_partial>
c04cf5bc: 4b b4 89 3d  bl      c0017ef8 <__csum_partial>
c04d0440: 4b b4 7a b9  bl      c0017ef8 <__csum_partial>

With gcc10 I get:

c0018d08 <__csum_partial>:
c0019020 <csum_partial>:
c0019020: 4b ff fc e8  b       c0018d08 <__csum_partial>
c001914c: 4b ff fe d4  b       c0019020 <csum_partial>
c0019250: 4b ff fd d1  bl      c0019020 <csum_partial>
c03e404c <csum_partial>:
c03e404c: 4b c3 4c bc  b       c0018d08 <__csum_partial>
c03e4050: 4b ff ff fc  b       c03e404c <csum_partial>
c03e40fc: 4b ff ff 51  bl      c03e404c <csum_partial>
c03e6680: 4b ff d9 cd  bl      c03e404c <csum_partial>
c03e68c4: 4b ff d7 89  bl      c03e404c <csum_partial>
c03e7934: 4b ff c7 19  bl      c03e404c <csum_partial>
c03e7bf8: 4b ff c4 55  bl      c03e404c <csum_partial>
c03eb148: 4b ff 8f 05  bl      c03e404c <csum_partial>
c03ecf68: 4b c2 bd a1  bl      c0018d08 <__csum_partial>
c04275b8 <csum_partial>:
c04275b8: 4b bf 17 50  b       c0018d08 <__csum_partial>
c0427884: 4b ff fd 35  bl      c04275b8 <csum_partial>
c0427b18: 4b ff fa a1  bl      c04275b8 <csum_partial>
c0427bd8: 4b ff f9 e1  bl      c04275b8 <csum_partial>
c0427cd4: 4b ff f8 e5  bl      c04275b8 <csum_partial>
c0427e34: 4b ff f7 85  bl      c04275b8 <csum_partial>
c045a1c0: 4b bb eb 49  bl      c0018d08 <__csum_partial>
c0489464 <csum_partial>:
c0489464: 4b b8 f8 a4  b       c0018d08 <__csum_partial>
c04896b0: 4b ff fd b5  bl      c0489464 <csum_partial>
c048982c: 4b ff fc 39  bl      c0489464 <csum_partial>
c0490158: 4b b8 8b b1  bl      c0018d08 <__csum_partial>
c0492f0c <csum_partial>:
c0492f0c: 4b b8 5d fc  b       c0018d08 <__csum_partial>
c049326c: 4b ff fc a1  bl      c0492f0c <csum_partial>
c049333c: 4b ff fb d1  bl      c0492f0c <csum_partial>
c0493b18: 4b ff f3 f5  bl      c0492f0c <csum_partial>
c0493f50: 4b ff ef bd  bl      c0492f0c <csum_partial>
c0493ffc: 4b ff ef 11  bl      c0492f0c <csum_partial>
c04a0f78: 4b b7 7d 91  bl      c0018d08 <__csum_partial>
c04b3e3c: 4b b6 4e cd  bl      c0018d08 <__csum_partial>
c04b40d0 <csum_partial>:
c04b40d0: 4b b6 4c 38  b       c0018d08 <__csum_partial>
c04b4448: 4b ff fc 89  bl      c04b40d0 <csum_partial>
c04b46f4: 4b ff f9 dd  bl      c04b40d0 <csum_partial>
c04bf448: 4b b5 98 c0  b       c0018d08 <__csum_partial>
c04c5264: 4b b5 3a a5  bl      c0018d08 <__csum_partial>
c04c61e4: 4b b5 2b 25  bl      c0018d08 <__csum_partial>

gcc10 defines multiple versions of csum_partial() which are just
an unconditionnal branch to __csum_partial().

To enforce inlining of that branch to __csum_partial(),
mark csum_partial() as __always_inline.

With this patch with gcc10:

c0018d08 <__csum_partial>:
c0019148: 4b ff fb c0  b       c0018d08 <__csum_partial>
c001924c: 4b ff fa bd  bl      c0018d08 <__csum_partial>
c03e40ec: 4b c3 4c 1d  bl      c0018d08 <__csum_partial>
c03e4120: 4b c3 4b e8  b       c0018d08 <__csum_partial>
c03eb004: 4b c2 dd 05  bl      c0018d08 <__csum_partial>
c03ecef4: 4b c2 be 15  bl      c0018d08 <__csum_partial>
c0427558: 4b bf 17 b1  bl      c0018d08 <__csum_partial>
c04286e4: 4b bf 06 25  bl      c0018d08 <__csum_partial>
c0428cd8: 4b bf 00 31  bl      c0018d08 <__csum_partial>
c0428d84: 4b be ff 85  bl      c0018d08 <__csum_partial>
c045a17c: 4b bb eb 8d  bl      c0018d08 <__csum_partial>
c0489450: 4b b8 f8 b9  bl      c0018d08 <__csum_partial>
c0491860: 4b b8 74 a9  bl      c0018d08 <__csum_partial>
c0492eec: 4b b8 5e 1d  bl      c0018d08 <__csum_partial>
c04a0eac: 4b b7 7e 5d  bl      c0018d08 <__csum_partial>
c04b3e34: 4b b6 4e d5  bl      c0018d08 <__csum_partial>
c04b426c: 4b b6 4a 9d  bl      c0018d08 <__csum_partial>
c04b463c: 4b b6 46 cd  bl      c0018d08 <__csum_partial>
c04c004c: 4b b5 8c bd  bl      c0018d08 <__csum_partial>
c04c0368: 4b b5 89 a1  bl      c0018d08 <__csum_partial>
c04c5254: 4b b5 3a b5  bl      c0018d08 <__csum_partial>
c04c60d4: 4b b5 2c 35  bl      c0018d08 <__csum_partial>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a1d31f84ddb0926813b17fcd5cc7f3fa7b4deac2.1602759123.git.christophe.leroy@csgroup.eu
3 years agopowerpc/perf: Fix Threshold Event Counter Multiplier width for P10
Madhavan Srinivasan [Tue, 15 Dec 2020 08:56:18 +0000 (03:56 -0500)]
powerpc/perf: Fix Threshold Event Counter Multiplier width for P10

Threshold Event Counter Multiplier (TECM) is part of Monitor Mode
Control Register A (MMCRA). This field along with Threshold Event
Counter Exponent (TECE) is used to get threshould counter value.
In Power10, this is a 8bit field, so patch fixes the
current code to modify the MMCRA[TECM] extraction macro to
handle this change. ISA v3.1 says this is a 7 bit field but
POWER10 it's actually 8 bits which will hopefully be fixed
in ISA v3.1 update.

Fixes: 170a315f41c6 ("powerpc/perf: Support to export MMCRA[TEC*] field to userspace")
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1608022578-1532-1-git-send-email-atrajeev@linux.vnet.ibm.com
3 years agopowerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
Christophe Leroy [Sat, 12 Dec 2020 13:41:25 +0000 (13:41 +0000)]
powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()

Commit 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in
hugetlb range freeing functions") inadvertely removed the mask
applied to start parameter in those two functions, leading to the
following crash on power9.

  LTP: starting hugemmap05_1 (hugemmap05 -m)
  ------------[ cut here ]------------
  kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
  ...
  CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G           O      5.10.0-rc7-next-20201211 #1
  NIP:  c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000
  REGS: c00020000bb6f830 TRAP: 0700   Tainted: G           O       (5.10.0-rc7-next-20201211)
  MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002284  XER: 20040000
  GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0
  GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001
  GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002
  GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000
  GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10
  GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4
  GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000
  NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0
                         pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387
                         (inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405
  LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
  Call Trace:
    __tlb_remove_table+0x13c/0x1e0 (unreliable)
    tlb_remove_table_rcu+0x54/0xa0
    __tlb_remove_table_free at mm/mmu_gather.c:101
    (inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156
    rcu_core+0x35c/0xbb0
    rcu_do_batch at kernel/rcu/tree.c:2502
    (inlined by) rcu_core at kernel/rcu/tree.c:2737
    __do_softirq+0x480/0x704
    run_ksoftirqd+0x74/0xd0
    run_ksoftirqd at kernel/softirq.c:651
    (inlined by) run_ksoftirqd at kernel/softirq.c:642
    smpboot_thread_fn+0x278/0x320
    kthread+0x1c4/0x1d0
    ret_from_kernel_thread+0x5c/0x80

Properly apply the masks before calling pmd_free_tlb() and
pud_free_tlb() respectively.

Fixes: 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions")
Reported-by: Qian Cai <qcai@redhat.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/56feccd7b6fcd98e353361a233fa7bb8e67c3164.1607780469.git.christophe.leroy@csgroup.eu
3 years agoKVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
Leonardo Bras [Tue, 8 Dec 2020 21:57:08 +0000 (18:57 -0300)]
KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp

According to ISAv3.1 and ISAv3.0b, the msgsndp is described to split
RB in:
  msgtype <- (RB) 32:36
  payload <- (RB) 37:63
  t       <- (RB) 57:63

The current way of getting 'msgtype', and 't' is missing their MSB:
  msgtype: ((arg >> 27) & 0xf) : Gets (RB) 33:36, missing bit 32
  t:       (arg &= 0x3f)       : Gets (RB) 58:63, missing bit 57

Fixes this by applying the correct mask.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201208215707.31149-1-leobras.c@gmail.com
3 years agoKVM: PPC: fix comparison to bool warning
Kaixu Xia [Sat, 7 Nov 2020 15:49:38 +0000 (23:49 +0800)]
KVM: PPC: fix comparison to bool warning

Fix the following coccicheck warning:

./arch/powerpc/kvm/booke.c:503:6-16: WARNING: Comparison to bool
./arch/powerpc/kvm/booke.c:505:6-17: WARNING: Comparison to bool
./arch/powerpc/kvm/booke.c:507:6-16: WARNING: Comparison to bool

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604764178-8087-1-git-send-email-kaixuxia@tencent.com
3 years agoKVM: PPC: Book3S: Assign boolean values to a bool variable
Kaixu Xia [Sat, 7 Nov 2020 06:26:22 +0000 (14:26 +0800)]
KVM: PPC: Book3S: Assign boolean values to a bool variable

Fix the following coccinelle warnings:

./arch/powerpc/kvm/book3s_xics.c:476:3-15: WARNING: Assignment of 0/1 to bool variable
./arch/powerpc/kvm/book3s_xics.c:504:3-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604730382-5810-1-git-send-email-kaixuxia@tencent.com
3 years agopowerpc: Inline setup_kup()
Michael Ellerman [Mon, 14 Dec 2020 10:56:16 +0000 (21:56 +1100)]
powerpc: Inline setup_kup()

setup_kup() is used by both 64-bit and 32-bit code. However on 64-bit
it must not be __init, because it's used for CPU hotplug, whereas on
32-bit it should be __init because it calls setup_kuap/kuep() which
are __init.

We worked around that problem in the past by marking it __ref, see
commit 67d53f30e23e ("powerpc/mm: fix section mismatch for
setup_kup()").

Marking it __ref basically just omits it from section mismatch
checking, which can lead to bugs, and in fact it did, see commit
44b4c4450f8d ("powerpc/64s: Mark the kuap/kuep functions non __init")

We can avoid all these problems by just making it static inline.
Because all it does is call other functions, making it inline actually
shrinks the 32-bit vmlinux by ~76 bytes.

Make it __always_inline as pointed out by Christophe.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201214123011.311024-1-mpe@ellerman.id.au
3 years agopowerpc/64s: Mark the kuap/kuep functions non __init
Aneesh Kumar K.V [Mon, 14 Dec 2020 08:01:21 +0000 (13:31 +0530)]
powerpc/64s: Mark the kuap/kuep functions non __init

The kernel calls these functions on CPU online and hence they must not
be marked __init.

Otherwise if the memory they occupied has been reused the system can
crash in various ways. Sachin reported it caused his LPAR to
spontaneously restart with no other output. With xmon enabled it may
drop into xmon with a dump like:

  cpu 0x1: Vector: 700 (Program Check) at [c000000003c5fcb0]
      pc: 00000000011e0a78
      lr: 00000000011c51d4
      sp: c000000003c5ff50
     msr: 8000000000081001
    current = 0xc000000002c12b00
    paca    = 0xc000000003cff280  irqmask: 0x03  irq_happened: 0x01
      pid   = 0, comm = swapper/1
  ...
  [c000000003c5ff500000000000087c38 (unreliable)
  [c000000003c5ff70000000000003870c
  [c000000003c5ff90000000000000d108

Fixes: 3b47b7549ead ("powerpc/book3s64/kuap: Move KUAP related function outside radix")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Expand change log with details and xmon output]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201214080121.358567-1-aneesh.kumar@linux.ibm.com
3 years agoKVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
Cédric Le Goater [Thu, 10 Dec 2020 17:14:50 +0000 (18:14 +0100)]
KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering

When the XIVE resources are allocated at the HW level, the VP
structures describing the vCPUs of a guest are distributed among
the chips to optimize the PowerBUS usage. For best performance, the
guest vCPUs can be pinned to match the VP structure distribution.

Currently, the VP identifiers are deduced from the vCPU id using
the kvmppc_pack_vcpu_id() routine which is not incorrect but not
optimal either. It VSMT is used, the result is not continuous and
the constraints on HW resources described above can not be met.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-14-clg@kaod.org
3 years agopowerpc/xive: Improve error reporting of OPAL calls
Cédric Le Goater [Thu, 10 Dec 2020 17:14:49 +0000 (18:14 +0100)]
powerpc/xive: Improve error reporting of OPAL calls

Introduce a vp_err() macro to standardize error reporting.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-13-clg@kaod.org
3 years agopowerpc/xive: Simplify xive_do_source_eoi()
Cédric Le Goater [Thu, 10 Dec 2020 17:14:48 +0000 (18:14 +0100)]
powerpc/xive: Simplify xive_do_source_eoi()

Previous patches removed the need of the first argument which was a
hack for Firwmware EOI. Remove it and flatten the routine which has
became simpler.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-12-clg@kaod.org
3 years agopowerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
Cédric Le Goater [Thu, 10 Dec 2020 17:14:47 +0000 (18:14 +0100)]
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW

This flag was used to support the P9 DD1 and we have stopped
supporting this CPU when DD2 came out. See skiboot commit:

  https://github.com/open-power/skiboot/commit/0b0d15e3c170

Also, remove eoi handler which is now unused.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-11-clg@kaod.org
3 years agopowerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
Cédric Le Goater [Thu, 10 Dec 2020 17:14:46 +0000 (18:14 +0100)]
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW

This flag was used to support the PHB4 LSIs on P9 DD1 and we have
stopped supporting this CPU when DD2 came out. See skiboot commit:

  https://github.com/open-power/skiboot/commit/0b0d15e3c170

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-10-clg@kaod.org
3 years agopowerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
Cédric Le Goater [Thu, 10 Dec 2020 17:14:45 +0000 (18:14 +0100)]
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG

This flag was used to support the PHB4 LSIs on P9 DD1 and we have
stopped supporting this CPU when DD2 came out. See skiboot commit:

  https://github.com/open-power/skiboot/commit/0b0d15e3c170

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-9-clg@kaod.org
3 years agopowerpc: Increase NR_IRQS range to support more KVM guests
Cédric Le Goater [Thu, 10 Dec 2020 17:14:44 +0000 (18:14 +0100)]
powerpc: Increase NR_IRQS range to support more KVM guests

PowerNV systems can handle up to 4K guests and 1M interrupt numbers
per chip. Increase the range of allowed interrupts to support a larger
number of guests.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-8-clg@kaod.org
3 years agopowerpc/xive: Add a debug_show handler to the XIVE irq_domain
Cédric Le Goater [Thu, 10 Dec 2020 17:14:43 +0000 (18:14 +0100)]
powerpc/xive: Add a debug_show handler to the XIVE irq_domain

Full state of the Linux interrupt descriptors can be dumped under
debugfs when compiled with CONFIG_GENERIC_IRQ_DEBUGFS. Add support for
the XIVE interrupt controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-7-clg@kaod.org
3 years agopowerpc/xive: Add a name to the IRQ domain
Cédric Le Goater [Thu, 10 Dec 2020 17:14:42 +0000 (18:14 +0100)]
powerpc/xive: Add a name to the IRQ domain

We hope one day to handle multiple irq_domain in the XIVE driver.
Start simple by setting the name using the DT node.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-6-clg@kaod.org
3 years agopowerpc/xive: Introduce XIVE_IPI_HW_IRQ
Cédric Le Goater [Thu, 10 Dec 2020 17:14:40 +0000 (18:14 +0100)]
powerpc/xive: Introduce XIVE_IPI_HW_IRQ

The XIVE driver deals with CPU IPIs in a peculiar way. Each CPU has
its own XIVE IPI interrupt allocated at the HW level, for PowerNV, or
at the hypervisor level for pSeries. In practice, these interrupts are
not always used. pSeries/PowerVM prefers local doorbells for local
threads since they are faster. On PowerNV, global doorbells are also
preferred for the same reason.

The mapping in the Linux is reduced to a single interrupt using HW
interrupt number 0 and a custom irq_chip to handle EOI. This can cause
performance issues in some benchmark (ipistorm) on multichip systems.

Clarify the use of the 0 value, it will help in improving multichip
support.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-4-clg@kaod.org
3 years agopowerpc/xive: Rename XIVE_IRQ_NO_EOI to show its a flag
Cédric Le Goater [Thu, 10 Dec 2020 17:14:39 +0000 (18:14 +0100)]
powerpc/xive: Rename XIVE_IRQ_NO_EOI to show its a flag

This is a simple cleanup to identify easily all flags of the XIVE
interrupt structure. The interrupts flagged with XIVE_IRQ_FLAG_NO_EOI
are the escalations used to wake up vCPUs in KVM. They are handled
very differently from the rest.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-3-clg@kaod.org
3 years agoKVM: PPC: Book3S HV: XIVE: Show detailed configuration in debug output
Cédric Le Goater [Thu, 10 Dec 2020 17:14:38 +0000 (18:14 +0100)]
KVM: PPC: Book3S HV: XIVE: Show detailed configuration in debug output

This is useful to track allocation of the HW resources on per guest
basis. Making sure IPIs are local to the chip of the vCPUs reduces
rerouting between interrupt controllers and gives better performance
in case of pinning. Checking the distribution of VP structures on the
chips also helps in reducing PowerBUS traffic.

[ clg: resurrected show_sources and reworked ouput ]

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-2-clg@kaod.org
3 years agopowerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:59 +0000 (16:08 +0530)]
powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache

On POWER platforms where only some groups of threads within a core
share the L2-cache (indicated by the ibm,thread-groups device-tree
property), we currently print the incorrect shared_cpu_map/list for
L2-cache in the sysfs.

This patch reports the correct shared_cpu_map/list on such platforms.

Example:
On a platform with "ibm,thread-groups" set to
                 00000001 00000002 00000004 00000000
                 00000002 00000004 00000006 00000001
                 00000003 00000005 00000007 00000002
                 00000002 00000004 00000000 00000002
                 00000004 00000006 00000001 00000003
                 00000005 00000007

This indicates that threads {0,2,4,6} in the core share the L2-cache
and threads {1,3,5,7} in the core share the L2 cache.

However, without the patch, the shared_cpu_map/list for L2 for CPUs 0,
1 is reported in the sysfs as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,000000ff

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000ff

With the patch, the shared_cpu_map/list for L2 cache for CPUs 0, 1 is
correctly reported as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa

This patch also defines cpu_l2_cache_mask() for !CONFIG_SMP case.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-6-git-send-email-ego@linux.vnet.ibm.com
3 years agopowerpc/smp: Add support detecting thread-groups sharing L2 cache
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:58 +0000 (16:08 +0530)]
powerpc/smp: Add support detecting thread-groups sharing L2 cache

On POWER systems, groups of threads within a core sharing the L2-cache
can be indicated by the "ibm,thread-groups" property array with the
identifier "2".

This patch adds support for detecting this, and when present, populate
the populating the cpu_l2_cache_mask of every CPU to the core-siblings
which share L2 with the CPU as specified in the by the
"ibm,thread-groups" property array.

On a platform with the following "ibm,thread-group" configuration
 00000001 00000002 00000004 00000000
 00000002 00000004 00000006 00000001
 00000003 00000005 00000007 00000002
 00000002 00000004 00000000 00000002
 00000004 00000006 00000001 00000003
 00000005 00000007

Without this patch, the sched-domain hierarchy for CPUs 0,1 would be
CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE

CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE

The CACHE domain at 0-7 is incorrect since the ibm,thread-groups
sub-array
[00000002 00000002 00000004
 00000000 00000002 00000004 00000006
 00000001 00000003 00000005 00000007]
indicates that L2 (Property "2") is shared only between the threads of a single
group. There are "2" groups of threads where each group contains "4"
threads each. The groups being {0,2,4,6} and {1,3,5,7}.

With this patch, the sched-domain hierarchy for CPUs 0,1 would be
      CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE

CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE

The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1
resp.) gets degenerated into the SMT domain. Furthermore, the
last-level-cache domain gets correctly set to the SMT sched-domain.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-5-git-send-email-ego@linux.vnet.ibm.com
3 years agopowerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:57 +0000 (16:08 +0530)]
powerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic

init_thread_group_l1_cache_map() initializes the per-cpu cpumask
thread_group_l1_cache_map with the core-siblings which share L1 cache
with the CPU. Make this function generic to the cache-property (L1 or
L2) and update a suitable mask. This is a preparatory patch for the
next patch where we will introduce discovery of thread-groups that
share L2-cache.

No functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-4-git-send-email-ego@linux.vnet.ibm.com
3 years agopowerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:56 +0000 (16:08 +0530)]
powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map

On platforms which have the "ibm,thread-groups" property, the per-cpu
variable cpu_l1_cache_map keeps a track of which group of threads
within the same core share the L1 cache, Instruction and Data flow.

This patch renames the variable to "thread_group_l1_cache_map" to make
it consistent with a subsequent patch which will introduce
thread_group_l2_cache_map.

This patch introduces no functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-3-git-send-email-ego@linux.vnet.ibm.com
3 years agopowerpc/smp: Parse ibm,thread-groups with multiple properties
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:55 +0000 (16:08 +0530)]
powerpc/smp: Parse ibm,thread-groups with multiple properties

The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed up into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

where in,

a) provides information of Property "1" being shared by "2" groups,
   each with "4" threads each. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
   the second group is {9,11,13,15}. Property "1" is indicative of
   the thread in the group sharing L1 cache, translation cache and
   Instruction Data flow.

b) provides information of Property "2" being shared by "2" groups,
   each group with "4" threads. The "ibm,ppc-interrupt-server#s" of
   the first group is {8,10,12,14} and the
   "ibm,ppc-interrupt-server#s" of the second group is
   {9,11,13,15}. Property "2" indicates that the threads in each group
   share the L2-cache.

The existing code assumes that the "ibm,thread-groups" encodes
information about only one property. Hence even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).

This patch extends the parsing support on platforms which encode
information about multiple properties being shared by the
corresponding groups of threads.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-2-git-send-email-ego@linux.vnet.ibm.com
3 years agopowerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions
Ravi Bangoria [Fri, 6 Nov 2020 04:56:50 +0000 (10:26 +0530)]
powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions

POWER10 DD1 has an issue where it generates watchpoint exceptions when
it shouldn't. The conditions where this occur are:

 - octword op
 - ending address of DAWR range is less than starting address of op
 - those addresses need to be in the same or in two consecutive 512B
   blocks
 - 'op address + 64B' generates an address that has a carry into bit
   52 (crosses 2K boundary)

Handle such spurious exception by considering them as extraneous and
emulating/single-steeping instruction without generating an event.

[ravi: Fixed build warning reported by lkp@intel.com]
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201106045650.278987-1-ravi.bangoria@linux.ibm.com
3 years agopowerpc/sstep: Add testcases for VSX vector paired load/store instructions
Balamuruhan S [Sun, 11 Oct 2020 05:09:08 +0000 (10:39 +0530)]
powerpc/sstep: Add testcases for VSX vector paired load/store instructions

Add testcases for VSX vector paired load/store instructions.
Sample o/p:

  emulate_step_test: lxvp           : PASS
  emulate_step_test: stxvp          : PASS
  emulate_step_test: lxvpx          : PASS
  emulate_step_test: stxvpx         : PASS
  emulate_step_test: plxvp          : PASS
  emulate_step_test: pstxvp         : PASS

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-6-ravi.bangoria@linux.ibm.com
3 years agopowerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
Balamuruhan S [Sun, 11 Oct 2020 05:09:07 +0000 (10:39 +0530)]
powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions

Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-5-ravi.bangoria@linux.ibm.com
3 years agopowerpc/sstep: Support VSX vector paired storage access instructions
Balamuruhan S [Sun, 11 Oct 2020 05:09:06 +0000 (10:39 +0530)]
powerpc/sstep: Support VSX vector paired storage access instructions

VSX Vector Paired instructions loads/stores an octword (32 bytes)
from/to storage into two sequential VSRs. Add emulation support
for these new instructions:
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

[kernel test robot reported a build failure]

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-4-ravi.bangoria@linux.ibm.com
3 years agopowerpc/sstep: Cover new VSX instructions under CONFIG_VSX
Ravi Bangoria [Sun, 11 Oct 2020 05:09:05 +0000 (10:39 +0530)]
powerpc/sstep: Cover new VSX instructions under CONFIG_VSX

Recently added Power10 prefixed VSX instruction are included
unconditionally in the kernel. If they are executed on a
machine without VSX support, it might create issues. Fix that.
Also fix one mnemonics spelling mistake in comment.

Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-3-ravi.bangoria@linux.ibm.com
3 years agopowerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
Balamuruhan S [Sun, 11 Oct 2020 05:09:04 +0000 (10:39 +0530)]
powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set

Unconditional emulation of prefixed instructions will allow
emulation of them on Power10 predecessors which might cause
issues. Restrict that.

Fixes: 3920742b92f5 ("powerpc sstep: Add support for prefixed fixed-point arithmetic")
Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-2-ravi.bangoria@linux.ibm.com
3 years agopowerpc/64s: Remove idle workaround code from restore_cpu_cpufeatures
Nicholas Piggin [Thu, 11 Jul 2019 02:24:04 +0000 (12:24 +1000)]
powerpc/64s: Remove idle workaround code from restore_cpu_cpufeatures

Idle code no longer uses the .cpu_restore CPU operation to restore
SPRs, so this workaround is no longer required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711022404.18132-2-npiggin@gmail.com
3 years agopowerpc/perf: Exclude kernel samples while counting events in user space.
Athira Rajeev [Wed, 25 Nov 2020 07:26:55 +0000 (02:26 -0500)]
powerpc/perf: Exclude kernel samples while counting events in user space.

Perf event attritube supports exclude_kernel flag to avoid
sampling/profiling in supervisor state (kernel). Based on this event
attr flag, Monitor Mode Control Register bit is set to freeze on
supervisor state. But sometimes (due to hardware limitation), Sampled
Instruction Address Register (SIAR) locks on to kernel address even
when freeze on supervisor is set. Patch here adds a check to drop
those samples.

Cc: stable@vger.kernel.org
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com
3 years agopowerpc/64: irq replay remove decrementer overflow check
Nicholas Piggin [Sat, 7 Nov 2020 01:43:36 +0000 (11:43 +1000)]
powerpc/64: irq replay remove decrementer overflow check

This is way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.

With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.

So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long.  It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved a significantly with this check. For
large decrementers, it's useless.

Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
3 years agopowerpc/64s: Remove MSR[ISF] bit
Nicholas Piggin [Fri, 6 Nov 2020 04:53:40 +0000 (14:53 +1000)]
powerpc/64s: Remove MSR[ISF] bit

No supported processor implements this mode. Setting the bit in
MSR values can be a bit confusing (and would prevent the bit from
ever being reused). Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201106045340.1935841-1-npiggin@gmail.com
3 years agopowerpc/64s/iommu: Don't use atomic_ function on atomic64_t type
Nicholas Piggin [Wed, 11 Nov 2020 11:07:22 +0000 (21:07 +1000)]
powerpc/64s/iommu: Don't use atomic_ function on atomic64_t type

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111110723.3148665-3-npiggin@gmail.com
3 years agopowerpc/32s: Cleanup around PTE_FLAGS_OFFSET in hash_low.S
Christophe Leroy [Tue, 24 Nov 2020 19:51:57 +0000 (19:51 +0000)]
powerpc/32s: Cleanup around PTE_FLAGS_OFFSET in hash_low.S

PTE_FLAGS_OFFSET is defined in asm/page_32.h and used only
in hash_low.S

And PTE_FLAGS_OFFSET nullity depends on CONFIG_PTE_64BIT

Instead of tests like #if (PTE_FLAGS_OFFSET != 0), use
CONFIG_PTE_64BIT related code.

Also move the definition of PTE_FLAGS_OFFSET into hash_low.S
directly, that improves readability.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f5bc21db7a33dab55924734e6060c2e9daed562e.1606247495.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: In add_hash_page(), calculate VSID later
Christophe Leroy [Tue, 24 Nov 2020 19:51:56 +0000 (19:51 +0000)]
powerpc/32s: In add_hash_page(), calculate VSID later

VSID is only for create_hpte(). When _PAGE_HASHPTE is
already set, add_hash_page() bails out without calling
create_hpte() and doesn't need the value of VSID.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3907199974c89b85a3441cf3f528751173b7649c.1606247495.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Remove unused counters incremented by create_hpte()
Christophe Leroy [Tue, 24 Nov 2020 19:51:55 +0000 (19:51 +0000)]
powerpc/32s: Remove unused counters incremented by create_hpte()

primary_pteg_full and htab_hash_searches are not used.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6470ab99e58c84a5445af43ce4d1d772b0dc3e93.1606247495.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions
Christophe Leroy [Fri, 6 Nov 2020 13:20:54 +0000 (13:20 +0000)]
powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions

All hugetlb range freeing functions have a verification like the following,
which only differs by the mask used, depending on the page table level.

start &= MASK;
if (start < floor)
return;
if (ceiling) {
ceiling &= MASK;
if (! ceiling)
return;
}
if (end - 1 > ceiling - 1)
return;

Refactor that into a helper function which takes the mask as
an argument, returning true when [start;end[ is not fully
contained inside [floor;ceiling[

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/16a571bb32eb6e8cd44bda484c8d81cd8a25e6d7.1604668827.git.christophe.leroy@csgroup.eu
3 years agopowerpc/fault: Perform exception fixup in do_page_fault()
Christophe Leroy [Wed, 9 Dec 2020 05:29:25 +0000 (05:29 +0000)]
powerpc/fault: Perform exception fixup in do_page_fault()

Exception fixup doesn't require the heady full regs saving,
do it from do_page_fault() directly.

For that, split bad_page_fault() in two parts.

As bad_page_fault() can also be called from other places than
handle_page_fault(), it will still perform exception fixup and
fallback on __bad_page_fault().

handle_page_fault() directly calls __bad_page_fault() as the
exception fixup will now be done by do_page_fault()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bd07d6fef9237614cd6d318d8f19faeeadaa816b.1607491748.git.christophe.leroy@csgroup.eu
3 years agopowerpc/fault: Avoid heavy search_exception_tables() verification
Christophe Leroy [Wed, 9 Dec 2020 05:29:24 +0000 (05:29 +0000)]
powerpc/fault: Avoid heavy search_exception_tables() verification

search_exception_tables() is an heavy operation, we have to avoid it.
When KUAP is selected, we'll know the fault has been blocked by KUAP.
When it is blocked by KUAP, check whether we are in an expected
userspace access place. If so, emit a warning to spot something is
going work. Otherwise, just remain silent, it will likely Oops soon.

When KUAP is not selected, it behaves just as if the address was
already in the TLBs and no fault was generated.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9870f01e293a5a76c4f4e4ddd4a6b0f63038c591.1607491748.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mm: Move the WARN() out of bad_kuap_fault()
Christophe Leroy [Wed, 9 Dec 2020 05:29:23 +0000 (05:29 +0000)]
powerpc/mm: Move the WARN() out of bad_kuap_fault()

In order to prepare the removal of calls to
search_exception_tables() on the fast path, move the
WARN() out of bad_kuap_fault().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9501311014bd6507e04b27a0c3035186ccf65cd5.1607491748.git.christophe.leroy@csgroup.eu
3 years agopowerpc/fault: Unnest definition of page_fault_is_write() and page_fault_is_bad()
Christophe Leroy [Wed, 9 Dec 2020 05:29:22 +0000 (05:29 +0000)]
powerpc/fault: Unnest definition of page_fault_is_write() and page_fault_is_bad()

To make it more readable, separate page_fault_is_write() and page_fault_is_bad()
to avoir several levels of #ifdefs

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6afaac2495248d68f94c438c5ec36b6010931de5.1607491748.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mm: sanity_check_fault() should work for all, not only BOOK3S
Christophe Leroy [Wed, 9 Dec 2020 05:29:21 +0000 (05:29 +0000)]
powerpc/mm: sanity_check_fault() should work for all, not only BOOK3S

The verification and message introduced by commit 374f3f5979f9
("powerpc/mm/hash: Handle user access of kernel address gracefully")
applies to all platforms, it should not be limited to BOOK3S.

Make the BOOK3S version of sanity_check_fault() the one for all,
and bail out earlier if not BOOK3S.

Fixes: 374f3f5979f9 ("powerpc/mm/hash: Handle user access of kernel address gracefully")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe199d5af3578d3bf80035d203a94d742a7a28af.1607491748.git.christophe.leroy@csgroup.eu
3 years agopowerpc/ppc-opcode: Add PPC_RAW_MFSPR()
Christophe Leroy [Tue, 24 Nov 2020 15:24:59 +0000 (15:24 +0000)]
powerpc/ppc-opcode: Add PPC_RAW_MFSPR()

Add PPC_RAW_MFSPR() to replace open coding done in 8xx-pmu.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e281e3a611eead8817c49cf06a60072a021af823.1606231483.git.christophe.leroy@csgroup.eu
3 years agopowerpc/8xx: Use SPRN_SPRG_SCRATCH2 in DTLB miss exception
Christophe Leroy [Tue, 24 Nov 2020 15:24:58 +0000 (15:24 +0000)]
powerpc/8xx: Use SPRN_SPRG_SCRATCH2 in DTLB miss exception

Use SPRN_SPRG_SCRATCH2 in DTLB miss exception instead of DAR
in order to be similar to ITLB miss exception.

This also simplifies mpc8xx_pmu_del()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e3cc8f023ef40e1e8ae144e4dd1330a5ff022528.1606231483.git.christophe.leroy@csgroup.eu
3 years agopowerpc/8xx: Use SPRN_SPRG_SCRATCH2 in ITLB miss exception
Christophe Leroy [Tue, 24 Nov 2020 15:24:57 +0000 (15:24 +0000)]
powerpc/8xx: Use SPRN_SPRG_SCRATCH2 in ITLB miss exception

In order to re-enable MMU earlier, ensure ITLB miss exception
cannot clobber SPRN_SPRG_SCRATCH0 and SPRN_SPRG_SCRATCH1.
Do so by using SPRN_SPRG_SCRATCH2 and SPRN_M_TW instead, like
the DTLB miss exception.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/abc78e8e9577d473691ebb9996c6413b37bfd9ca.1606231483.git.christophe.leroy@csgroup.eu
3 years agopowerpc/8xx: Simplify INVALIDATE_ADJACENT_PAGES_CPU15
Christophe Leroy [Tue, 24 Nov 2020 15:24:56 +0000 (15:24 +0000)]
powerpc/8xx: Simplify INVALIDATE_ADJACENT_PAGES_CPU15

We now have r11 available as a scratch register so
INVALIDATE_ADJACENT_PAGES_CPU15() can be simplified.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bdafd651b4ac3a851fd09249f5f3699c50da29f2.1606231483.git.christophe.leroy@csgroup.eu
3 years agopowerpc/8xx: Always pin kernel text TLB
Christophe Leroy [Tue, 24 Nov 2020 15:24:55 +0000 (15:24 +0000)]
powerpc/8xx: Always pin kernel text TLB

There is no big poing in not pinning kernel text anymore, as now
we can keep pinned TLB even with things like DEBUG_PAGEALLOC.

Remove CONFIG_PIN_TLB_TEXT, making it always right.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop ifdef around mmu_pin_tlb() to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/203b89de491e1379f1677a2685211b7c32adfff0.1606231483.git.christophe.leroy@csgroup.eu
3 years agopowerpc/8xx: DEBUG_PAGEALLOC doesn't require an ITLB miss exception handler
Christophe Leroy [Tue, 24 Nov 2020 15:24:54 +0000 (15:24 +0000)]
powerpc/8xx: DEBUG_PAGEALLOC doesn't require an ITLB miss exception handler

Since commit e611939fc8ec ("powerpc/mm: Ensure change_page_attr()
doesn't invalidate pinned TLBs"), pinned TLBs are not anymore
invalidated by __kernel_map_pages() when CONFIG_DEBUG_PAGEALLOC is
selected.

Remove the dependency on CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e796c5fcb5898de827c803cf1ab8ba1d7a5d4b76.1606231483.git.christophe.leroy@csgroup.eu
3 years agopowerpc/process: Remove target specific __set_dabr()
Christophe Leroy [Fri, 4 Dec 2020 10:12:51 +0000 (10:12 +0000)]
powerpc/process: Remove target specific __set_dabr()

__set_dabr() are simple functions that can be inline directly
inside set_dabr() and using IS_ENABLED() instead of #ifdef

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c10b263668e137236c71d76648b03cf2cd1ee66f.1607076733.git.christophe.leroy@csgroup.eu
3 years agopowerpc/8xx: Fix early debug when SMC1 is relocated
Christophe Leroy [Fri, 4 Dec 2020 10:11:34 +0000 (10:11 +0000)]
powerpc/8xx: Fix early debug when SMC1 is relocated

When SMC1 is relocated and early debug is selected, the
board hangs is ppc_md.setup_arch(). This is because ones
the microcode has been loaded and SMC1 relocated, early
debug writes in the weed.

To allow smooth continuation, the SMC1 parameter RAM set up
by the bootloader have to be copied into the new location.

Fixes: 43db76f41824 ("powerpc/8xx: Add microcode patch to move SMC parameter RAM.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b2f71f39eca543f1e4ec06596f09a8b12235c701.1607076683.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Handle PROTFAULT in hash_page() also for CONFIG_PPC_KUAP
Christophe Leroy [Mon, 16 Nov 2020 16:09:31 +0000 (16:09 +0000)]
powerpc/32s: Handle PROTFAULT in hash_page() also for CONFIG_PPC_KUAP

On hash 32 bits, handling minor protection faults like unsetting
dirty flag is heavy if done from the normal page_fault processing,
because it implies hash table software lookup for flushing the entry
and then a DSI is taken anyway to add the entry back.

When KUAP was implemented, as explained in commit a68c31fc01ef
("powerpc/32s: Implement Kernel Userspace Access Protection"),
protection faults has been diverted from hash_page() because
hash_page() was not able to identify a KUAP fault.

Implement KUAP verification in hash_page(), by clearing write
permission when the access is a kernel access and Ks is 1.
This works regardless of the address because kernel segments always
have Ks set to 0 while user segments have Ks set to 0 only
when kernel write to userspace is granted.

Then protection faults can be handled by hash_page() even for KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8a4ffe4798e9ea32aaaccdf85e411bb1beed3500.1605542955.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Make support for 603 and 604+ selectable
Christophe Leroy [Thu, 22 Oct 2020 06:29:44 +0000 (06:29 +0000)]
powerpc/32s: Make support for 603 and 604+ selectable

book3s/32 has two main families:
- CPU with 603 cores that don't have HASH PTE table and
perform SW TLB loading.
- Other CPUs based on 604+ cores that have HASH PTE table.

This leads to some complex logic and additionnal code to
support both. This makes sense for distribution kernels
that aim at running on any CPU, but when you are fine
tuning a kernel for an embedded 603 based board you
don't need all the HASH logic.

Allow selection of support for each family, in order to opt
out unneeded parts of code. At least one must be selected.

Note that some of the CPU supporting HASH also support SW TLB
loading, however it is not supported by Linux kernel at the
time being, because they do not have alternate registers in
the TLB miss exception handlers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8dde0cdb629a71abc29b0d85a52a86e920376cb6.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Regroup 603 based CPUs in cputable
Christophe Leroy [Thu, 22 Oct 2020 06:29:43 +0000 (06:29 +0000)]
powerpc/32s: Regroup 603 based CPUs in cputable

In order to selectively build the kernel for 603 SW TLB handling,
regroup all 603 based CPUs together.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/45065263fdb9f5cc2a2d210ec2a762ac8bf5b2bc.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Remove CONFIG_PPC_BOOK3S_6xx
Christophe Leroy [Thu, 22 Oct 2020 06:29:42 +0000 (06:29 +0000)]
powerpc/32s: Remove CONFIG_PPC_BOOK3S_6xx

As 601 is gone, CONFIG_PPC_BOO3S_6xx and CONFIG_PPC_BOOK3S_32
are dedundant.

Remove CONFIG_PPC_BOOK3S_6xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f18c16af37f6f77b577bed8d9e12831b695617ae.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Move early_mmu_init() into mmu.c
Christophe Leroy [Thu, 22 Oct 2020 06:29:41 +0000 (06:29 +0000)]
powerpc/32s: Move early_mmu_init() into mmu.c

early_mmu_init() is independent of MMU type and not
directly linked to tlb handling.

In a following patch, tlb.c will be restricted to HASH mmu.

Move early_mmu_init() into mmu.c which is common.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e51b5e2fe6bca623b33116403043d3a1b5eaf826.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Inline flush_hash_entry()
Christophe Leroy [Thu, 22 Oct 2020 06:29:40 +0000 (06:29 +0000)]
powerpc/32s: Inline flush_hash_entry()

flush_hash_entry() is a simple function calling
flush_hash_pages() if it's a hash MMU or doing nothing otherwise.

Inline it.

And use it also in __ptep_test_and_clear_young().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9af895be7d4b404d40e749a2659552fd138e62c4.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Inline tlb_flush()
Christophe Leroy [Thu, 22 Oct 2020 06:29:39 +0000 (06:29 +0000)]
powerpc/32s: Inline tlb_flush()

On book3s/32, tlb_flush() does nothing when the CPU has a hash table,
it calls _tlbia() otherwise.

Inline it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ebc933d1c530a19ef3cf7983f6ae94814f6e92ac.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Split and inline flush_range()
Christophe Leroy [Thu, 22 Oct 2020 06:29:38 +0000 (06:29 +0000)]
powerpc/32s: Split and inline flush_range()

flush_range() handle both the MMU_FTR_HPTE_TABLE case and
the other case.

The non MMU_FTR_HPTE_TABLE case is trivial as it is only a call
to _tlbie()/_tlbia() which is not worth a dedicated function.

Make flush_range() a hash specific and call it from tlbflush.h based
on mmu_has_feature(MMU_FTR_HPTE_TABLE).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/132ab19aae52abc8e06ab524ec86d4229b5b9c3d.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Inline flush_tlb_range() and flush_tlb_kernel_range()
Christophe Leroy [Thu, 22 Oct 2020 06:29:37 +0000 (06:29 +0000)]
powerpc/32s: Inline flush_tlb_range() and flush_tlb_kernel_range()

flush_tlb_range() and flush_tlb_kernel_range() are trivial calls to
flush_range().

Make flush_range() global and inline flush_tlb_range()
and flush_tlb_kernel_range().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c7029a78e78709ad9272d7a44260e06b649169b2.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Split and inline flush_tlb_mm() and flush_tlb_page()
Christophe Leroy [Thu, 22 Oct 2020 06:29:36 +0000 (06:29 +0000)]
powerpc/32s: Split and inline flush_tlb_mm() and flush_tlb_page()

flush_tlb_mm() and flush_tlb_page() handle both the MMU_FTR_HPTE_TABLE
case and the other case.

The non MMU_FTR_HPTE_TABLE case is trivial as it is only a call
to _tlbie()/_tlbia() which is not worth a dedicated function.

Make flush_tlb_mm() and flush_tlb_page() hash specific and call
them from tlbflush.h based on mmu_has_feature(MMU_FTR_HPTE_TABLE).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/11e932ded41ba6d9b251d89b7afa33cc060d3aa4.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Move _tlbie() and _tlbia() in a new file
Christophe Leroy [Thu, 22 Oct 2020 06:29:35 +0000 (06:29 +0000)]
powerpc/32s: Move _tlbie() and _tlbia() in a new file

_tlbie() and _tlbia() are used only on 603 cores while the
other functions are used only on cores having a hash table.

Move them into a new file named nohash_low.S

Add mmu_hash_lock var is used by both, it needs to go
in a common file.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a265b1b17a64153463d361280cb4b43eb1266a4.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Inline _tlbie() on non SMP
Christophe Leroy [Thu, 22 Oct 2020 06:29:34 +0000 (06:29 +0000)]
powerpc/32s: Inline _tlbie() on non SMP

On non SMP, _tlbie() is just a tlbie plus a sync instruction.

Make it static inline.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/475136425541db5c7c8a0395d19d400525b251bc.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Move _tlbie() and _tlbia() prototypes to tlbflush.h
Christophe Leroy [Thu, 22 Oct 2020 06:29:33 +0000 (06:29 +0000)]
powerpc/32s: Move _tlbie() and _tlbia() prototypes to tlbflush.h

In order to use _tlbie() and _tlbia() directly
from asm/book3s/32/tlbflush.h, move their prototypes
from mm/mm_decl.h to there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/867587af929973ad65f8ef6972f2474a80c1737a.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Declare Hash related vars as __initdata
Christophe Leroy [Thu, 22 Oct 2020 06:29:32 +0000 (06:29 +0000)]
powerpc/32s: Declare Hash related vars as __initdata

Hash related vars are used at init only.

Declare them in __initdata.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3878ea30706839fcff9196790ff3f99c128c3f6a.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Make Hash var static
Christophe Leroy [Thu, 22 Oct 2020 06:29:31 +0000 (06:29 +0000)]
powerpc/32s: Make Hash var static

Hash var is used only locally in mmu.c now.

No need to set it in head_32.S anymore.

Make it static and initialises it to the early hash table.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/786c82a89cdfdaabb32b72a44f7c312fa81d192b.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Use mmu_has_feature(MMU_FTR_HPTE_TABLE) instead of checking Hash var
Christophe Leroy [Thu, 22 Oct 2020 06:29:30 +0000 (06:29 +0000)]
powerpc/32s: Use mmu_has_feature(MMU_FTR_HPTE_TABLE) instead of checking Hash var

We now have an early hash table on hash MMU, so no need to check
Hash var to know if the Hash table is set of not.

Use mmu_has_feature(MMU_FTR_HPTE_TABLE) instead. This will allow
optimisation via jump_label.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f1766631a9e014b6433f1a3c12c726ddfce34220.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/32s: Make bat_addrs[] static
Christophe Leroy [Thu, 22 Oct 2020 06:29:29 +0000 (06:29 +0000)]
powerpc/32s: Make bat_addrs[] static

This table is used only locally. Declare it static.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/054fec0c139fc4c0a306360b360784733c0a6e65.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mm: Remove flush_tlb_page_nohash() prototype.
Christophe Leroy [Thu, 22 Oct 2020 06:29:28 +0000 (06:29 +0000)]
powerpc/mm: Remove flush_tlb_page_nohash() prototype.

flush_tlb_page_nohash() was removed by
commit 703b41ad1a87 ("powerpc/mm: remove flush_tlb_page_nohash")

Remove stale prototype and comment.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a58831da6d6ba4fe309b94aa1dd8f02982d46b2.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/mm: Add mask of always present MMU features
Christophe Leroy [Thu, 22 Oct 2020 06:29:27 +0000 (06:29 +0000)]
powerpc/mm: Add mask of always present MMU features

On the same principle as commit 773edeadf672 ("powerpc/mm: Add mask
of possible MMU features"), add mask for MMU features that are
always there in order to optimise out dead branches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4943775fbe91885eb3e09133b093aaf62e55c715.1603348103.git.christophe.leroy@csgroup.eu
3 years agopowerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter
Tyrel Datwyler [Tue, 8 Dec 2020 19:54:34 +0000 (13:54 -0600)]
powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter

Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
introduced the following error when invoking the errinjct userspace
tool:

  [root@ltcalpine2-lp5 librtas]# errinjct open
  [327884.071171] sys_rtas: RTAS call blocked - exploit attempt?
  [327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct)
  errinjct: Could not open RTAS error injection facility
  errinjct: librtas: open: Unexpected I/O error

The entry for ibm,open-errinjct in rtas_filter array has a typo where
the "j" is omitted in the rtas call name. After fixing this typo the
errinjct tool functions again as expected.

  [root@ltcalpine2-lp5 linux]# errinjct open
  RTAS error injection facility open, token = 1

Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201208195434.8289-1-tyreld@linux.ibm.com
3 years agopowerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK
Christophe Leroy [Tue, 8 Dec 2020 05:24:19 +0000 (05:24 +0000)]
powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK

low_sleep_handler() can't restore the context from standard
stack because the stack can hardly be accessed with MMU OFF.

Store everything in a global storage area instead of storing
a pointer to the stack in that global storage area.

To avoid a complete churn of the function, still use r1 as
the pointer to the storage area during restore.

Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Reported-by: Giuseppe Sacco <giuseppe@sguazz.it>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Giuseppe Sacco <giuseppe@sguazz.it>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e3e0d8042a3ba75cb4a9546c19c408b5b5b28994.1607404931.git.christophe.leroy@csgroup.eu
3 years agopowerpc: fix spelling mistake in Kconfig "seleted" -> "selected"
Colin Ian King [Mon, 7 Dec 2020 15:54:20 +0000 (15:54 +0000)]
powerpc: fix spelling mistake in Kconfig "seleted" -> "selected"

There is a spelling mistake in the help text of the Kconfig. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207155420.172370-1-colin.king@canonical.com
3 years agopowerpc/pseries/mobility: refactor node lookup during DT update
Nathan Lynch [Mon, 7 Dec 2020 21:52:00 +0000 (15:52 -0600)]
powerpc/pseries/mobility: refactor node lookup during DT update

In pseries_devicetree_update(), with each call to ibm,update-nodes the
partition firmware communicates the node to be deleted or updated by
placing its phandle in the work buffer. Each of delete_dt_node(),
update_dt_node(), and add_dt_node() have duplicate lookups using the
phandle value and corresponding refcount management.

Move the lookup and of_node_put() into pseries_devicetree_update(),
and emit a warning on any failed lookups.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-29-nathanl@linux.ibm.com
3 years agopowerpc/rtas: remove unused rtas_suspend_me_data
Nathan Lynch [Mon, 7 Dec 2020 21:51:59 +0000 (15:51 -0600)]
powerpc/rtas: remove unused rtas_suspend_me_data

All code which used this type has been removed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-28-nathanl@linux.ibm.com
3 years agopowerpc/pseries/hibernation: remove prepare_late() callback
Nathan Lynch [Mon, 7 Dec 2020 21:51:58 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: remove prepare_late() callback

The pseries hibernate code no longer calls into the original
join/suspend code in kernel/rtas.c, so pseries_prepare_late() and
related code don't accomplish anything now.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-27-nathanl@linux.ibm.com
3 years agopowerpc/pseries/hibernation: perform post-suspend fixups later
Nathan Lynch [Mon, 7 Dec 2020 21:51:57 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: perform post-suspend fixups later

The pseries hibernate code calls post_mobility_fixup() which is sort
of a dumping ground of fixups that need to run after resuming from
suspend regardless of whether suspend was a hibernation or a
migration. Calling post_mobility_fixup() from
pseries_suspend_enable_irqs() runs this code early in resume with
devices suspended and only one CPU up, while the much more commonly
used migration case runs these fixups in a more typical process
context.

Call post_mobility_fixup() after the suspend core returns a success
status to the hibernate sysfs store method and remove
pseries_suspend_enable_irqs().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-26-nathanl@linux.ibm.com
3 years agopowerpc/pseries/hibernation: remove redundant cacheinfo update
Nathan Lynch [Mon, 7 Dec 2020 21:51:56 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: remove redundant cacheinfo update

Partitions with cache nodes in the device tree can encounter the
following warning on resume:

CPU 0 already accounted in PowerPC,POWER9@0(Data)
WARNING: CPU: 0 PID: 3177 at arch/powerpc/kernel/cacheinfo.c:197 cacheinfo_cpu_online+0x640/0x820

These calls to cacheinfo_cpu_offline/online have been redundant since
commit e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo
hierarchy post-migration").

Fixes: e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-25-nathanl@linux.ibm.com
3 years agopowerpc/rtas: remove unused rtas_suspend_last_cpu()
Nathan Lynch [Mon, 7 Dec 2020 21:51:55 +0000 (15:51 -0600)]
powerpc/rtas: remove unused rtas_suspend_last_cpu()

rtas_suspend_last_cpu() is now unused, remove it and
__rtas_suspend_last_cpu() which also becomes unused.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-24-nathanl@linux.ibm.com
3 years agopowerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()
Nathan Lynch [Mon, 7 Dec 2020 21:51:54 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()

rtas_suspend_last_cpu() and related code perform a lot of work that
isn't relevant to the hibernation workflow. All other CPUs are offline
when called so there is no need to place them in H_JOIN or prod them
on resume, nor is there need for retries or operations on shared
state.

Call the rtas_ibm_suspend_me() wrapper function directly from
pseries_suspend_enter() instead of using rtas_suspend_last_cpu().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-23-nathanl@linux.ibm.com
3 years agopowerpc/rtas: remove rtas_suspend_cpu()
Nathan Lynch [Mon, 7 Dec 2020 21:51:53 +0000 (15:51 -0600)]
powerpc/rtas: remove rtas_suspend_cpu()

rtas_suspend_cpu() no longer has users; remove it and
__rtas_suspend_cpu() which now becomes unused as well.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-22-nathanl@linux.ibm.com
3 years agopowerpc/machdep: remove suspend_disable_cpu()
Nathan Lynch [Mon, 7 Dec 2020 21:51:52 +0000 (15:51 -0600)]
powerpc/machdep: remove suspend_disable_cpu()

There are no users left of the suspend_disable_cpu() callback, remove
it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-21-nathanl@linux.ibm.com
3 years agopowerpc/pseries/hibernation: remove pseries_suspend_cpu()
Nathan Lynch [Mon, 7 Dec 2020 21:51:51 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: remove pseries_suspend_cpu()

Since commit 48f6e7f6d948 ("powerpc/pseries: remove cede offline state
for CPUs"), ppc_md.suspend_disable_cpu() is no longer used and all
CPUs (save one) are placed into true offline state as opposed to
H_JOIN. So pseries_suspend_cpu() is effectively unused; remove it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-20-nathanl@linux.ibm.com
3 years agopowerpc/pseries/hibernation: pass stream id via function arguments
Nathan Lynch [Mon, 7 Dec 2020 21:51:50 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: pass stream id via function arguments

There is no need for the stream id to be a file-global variable; pass
it from hibernate_store() to pseries_suspend_begin() for the
H_VASI_STATE call.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-19-nathanl@linux.ibm.com
3 years agopowerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops
Nathan Lynch [Mon, 7 Dec 2020 21:51:49 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops

There are three ways pseries_suspend_begin() can be reached:

1. When "mem" is written to /sys/power/state:

kobj_attr_store()
-> state_store()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This never works because there is no way to supply a valid stream id
using this interface, and H_VASI_STATE is called with a stream id of
zero. So this call path is useless at best.

2. When a stream id is written to /sys/devices/system/power/hibernate.
pseries_suspend_begin() is polled directly from store_hibernate()
until the stream is in the "Suspending" state (i.e. the platform is
ready for the OS to suspend execution):

dev_attr_store()
-> store_hibernate()
  -> pseries_suspend_begin()

3. When a stream id is written to /sys/devices/system/power/hibernate
(continued). After #2, pseries_suspend_begin() is called once again
from the pm core:

dev_attr_store()
-> store_hibernate()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This is redundant because the VASI suspend state is already known to
be Suspending.

The begin() callback of platform_suspend_ops is optional, so we can
simply remove that assignment with no loss of function.

Fixes: 32d8ad4e621d ("powerpc/pseries: Partition hibernation support")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-18-nathanl@linux.ibm.com
3 years agopowerpc/rtas: remove rtas_ibm_suspend_me_unsafe()
Nathan Lynch [Mon, 7 Dec 2020 21:51:48 +0000 (15:51 -0600)]
powerpc/rtas: remove rtas_ibm_suspend_me_unsafe()

rtas_ibm_suspend_me_unsafe() is now unused; remove it and
rtas_percpu_suspend_me() which becomes unused as a result.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-17-nathanl@linux.ibm.com
3 years agopowerpc/rtas: dispatch partition migration requests to pseries
Nathan Lynch [Mon, 7 Dec 2020 21:51:47 +0000 (15:51 -0600)]
powerpc/rtas: dispatch partition migration requests to pseries

sys_rtas() cannot call ibm,suspend-me directly in the same way it
handles other inputs. Instead it must dispatch the request to code
that can first perform the H_JOIN sequence before any call to
ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair
amount of platform-specific code to implement this.

Since a different, more robust implementation of the suspend sequence
is now in the pseries platform code, we want to dispatch the request
there.

Note that invoking ibm,suspend-me via the RTAS syscall is all but
deprecated; this change preserves ABI compatibility for old programs
while providing to them the benefit of the new partition suspend
implementation. This is a behavior change in that the kernel performs
the device tree update and firmware activation before returning, but
experimentation indicates this is tolerated fine by legacy user space.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-16-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: retry partition suspend after error
Nathan Lynch [Mon, 7 Dec 2020 21:51:46 +0000 (15:51 -0600)]
powerpc/pseries/mobility: retry partition suspend after error

This is a mitigation for the relatively rare occurrence where a
virtual IOA can be in a transient state that prevents the
suspend/migration from succeeding, resulting in an error from
ibm,suspend-me.

If the join/suspend sequence returns an error, it is acceptable to
retry as long as the VASI suspend session state is still
"Suspending" (i.e. the platform is still waiting for the OS to
suspend).

Retry a few times on suspend failure while this condition holds,
progressively increasing the delay between attempts. We don't want to
retry indefinitey because firmware emits an error log event on each
unsuccessful attempt.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-15-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: signal suspend cancellation to platform
Nathan Lynch [Mon, 7 Dec 2020 21:51:45 +0000 (15:51 -0600)]
powerpc/pseries/mobility: signal suspend cancellation to platform

If we're returning an error to user space, use H_VASI_SIGNAL to send a
cancellation request to the platform. This isn't strictly required but
it communicates that Linux will not attempt to complete the suspend,
which allows the various entities involved to promptly end the
operation in progress.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-14-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: use stop_machine for join/suspend
Nathan Lynch [Mon, 7 Dec 2020 21:51:44 +0000 (15:51 -0600)]
powerpc/pseries/mobility: use stop_machine for join/suspend

The partition suspend sequence as specified in the platform
architecture requires that all active processor threads call
H_JOIN, which:

- suspends the calling thread until it is the target of
  an H_PROD; or
- immediately returns H_CONTINUE, if the calling thread is the last to
  call H_JOIN. This thread is expected to call ibm,suspend-me to
  completely suspend the partition.

Upon returning from ibm,suspend-me the calling thread must wake all
others using H_PROD.

rtas_ibm_suspend_me_unsafe() uses on_each_cpu() to implement this
protocol, but because of its synchronizing nature this is susceptible
to deadlock versus users of stop_machine() or other callers of
on_each_cpu().

Not only is stop_machine() intended for use cases like this, it
handles error propagation and allows us to keep the data shared
between CPUs minimal: a single atomic counter which ensures exactly
one CPU will wake the others from their joined states.

Switch the migration code to use stop_machine() and a less complex
local implementation of the H_JOIN/ibm,suspend-me logic, which
carries additional benefits:

- more informative error reporting, appropriately ratelimited
- resets the lockup detector / watchdog on resume to prevent lockup
  warnings when the OS has been suspended for a time exceeding the
  threshold.

Fixes: 91dc182ca6e2 ("[PATCH] powerpc: special-case ibm,suspend-me RTAS call")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-13-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: extract VASI session polling logic
Nathan Lynch [Mon, 7 Dec 2020 21:51:43 +0000 (15:51 -0600)]
powerpc/pseries/mobility: extract VASI session polling logic

The behavior of rtas_ibm_suspend_me_unsafe() is to return -EAGAIN to
the caller until the specified VASI suspend session state makes the
transition from H_VASI_ENABLED to H_VASI_SUSPENDING. In the interest
of separating concerns to prepare for a new implementation of the
join/suspend sequence, extract VASI session polling logic into a
couple of local functions. Waiting for the session state to reach
H_VASI_SUSPENDING before calling rtas_ibm_suspend_me_unsafe() ensures
that we will never get an EAGAIN result necessitating a retry. No
user-visible change in behavior is intended.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-12-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: use rtas_activate_firmware() on resume
Nathan Lynch [Mon, 7 Dec 2020 21:51:42 +0000 (15:51 -0600)]
powerpc/pseries/mobility: use rtas_activate_firmware() on resume

It's incorrect to abort post-suspend processing if
ibm,activate-firmware isn't available. Use rtas_activate_firmware(),
which logs this condition appropriately and allows us to proceed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-11-nathanl@linux.ibm.com