Christophe Leroy [Sun, 27 Sep 2020 09:16:24 +0000 (09:16 +0000)]
powerpc/vdso: Remove unnecessary ifdefs in vdso_pagelist initialization
There is no need for all those #ifdefs around the pagelist initialisation.
Use IS_ENABLED() instead; GCC will discard the unused static variables.
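As a rough userspace illustration of the idea (not the kernel code), a compile-time constant tested in a plain if keeps both branches visible to the compiler, which can then drop the dead branch and the static data only it uses. The SKETCH_* macros and sketch_* names are hypothetical stand-ins for Kconfig symbols and the real pagelists, and the constants stand in for IS_ENABLED().

```c
#include <stddef.h>

/* Hypothetical stand-ins for Kconfig options: 1 = enabled, 0 = disabled. */
#define SKETCH_HAS_VDSO32 1
#define SKETCH_HAS_VDSO64 0

#define SKETCH_VDSO32_NPAGES 3
#define SKETCH_VDSO64_NPAGES 5

/* Hypothetical backing pages and pagelists for both flavours. */
static int sketch_vdso32_pages[SKETCH_VDSO32_NPAGES];
static int sketch_vdso64_pages[SKETCH_VDSO64_NPAGES];
static const void *sketch_vdso32_pagelist[SKETCH_VDSO32_NPAGES];
static const void *sketch_vdso64_pagelist[SKETCH_VDSO64_NPAGES];

/*
 * Both branches are always parsed and type-checked. Because the
 * conditions are constants, the compiler is free to remove the dead
 * branch together with the static arrays referenced only from it.
 * Returns the number of pagelist entries that were initialised.
 */
size_t sketch_init_pagelists(void)
{
	size_t i, n = 0;

	if (SKETCH_HAS_VDSO32) {
		for (i = 0; i < SKETCH_VDSO32_NPAGES; i++)
			sketch_vdso32_pagelist[i] = &sketch_vdso32_pages[i];
		n += SKETCH_VDSO32_NPAGES;
	}
	if (SKETCH_HAS_VDSO64) {
		for (i = 0; i < SKETCH_VDSO64_NPAGES; i++)
			sketch_vdso64_pagelist[i] = &sketch_vdso64_pages[i];
		n += SKETCH_VDSO64_NPAGES;
	}
	return n;
}
```

Compared with #ifdef, a typo in the disabled branch still breaks the build, which is the usual argument for this style.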
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f9333432e329b1fcbbbf846cb1cd4a1c4127a60b.1601197618.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 27 Sep 2020 09:16:23 +0000 (09:16 +0000)]
powerpc/vdso: Refactor 32 bits and 64 bits pages setup
The setup of VDSO pages is identical for 32 bits VDSO and
64 bits VDSO.
Refactor that setup.
And use &vdsoXX_start, which is a synonym of vdsoXX_kbase.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/269ffb54c37fc1d46128f77d7a39f88ef4a9957d.1601197618.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 27 Sep 2020 09:16:22 +0000 (09:16 +0000)]
powerpc/vdso: Remove NULL termination element in vdso_pagelist
There is no need for a NULL last element in the pagelists:
install_special_mapping() knows how long the list is.
Remove that element.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e58d95ab859e3cbc9bae3c9ce2959e17d2864f5d.1601197618.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 27 Sep 2020 09:16:21 +0000 (09:16 +0000)]
powerpc/vdso: Remove get_page() in vdso_pagelist initialization
Partly copied from commit 16fb1a9bec61 ("arm64: vdso: clean up
vdso_pagelist initialization").
No need to get_page() the vdso text/data - these are part of the
kernel image.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9d14540bd10832b6c9519d74fb5728fdc4974b36.1601197618.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 27 Sep 2020 09:16:20 +0000 (09:16 +0000)]
powerpc/vdso: Rename syscall_map_32/64 to simplify vdso_setup_syscall_map()
Today vdso_data structure has:
- syscall_map_32[] and syscall_map_64[] on PPC64
- syscall_map_32[] on PPC32
On PPC32, syscall_map_32[] is populated using sys_call_table[].
On PPC64, syscall_map_64[] is populated using sys_call_table[]
and syscall_map_32[] is populated using compat_sys_call_table[].
To simplify vdso_setup_syscall_map(),
- On PPC32 rename syscall_map_32[] into syscall_map[],
- On PPC64 rename syscall_map_64[] into syscall_map[],
- On PPC64 rename syscall_map_32[] into compat_syscall_map[].
That way, syscall_map[] gets populated using sys_call_table[] and
compat_syscall_map[] gets populated using compat_sys_call_table[].
Also define an empty compat_syscall_map[] on PPC32 to avoid ifdefs.
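A minimal userspace sketch of how such a map can be populated from a call table, assuming one bit per syscall stored most-significant-bit first in 32-bit words. All sketch_* names and the table size are hypothetical; the real code works on sys_call_table[] and sys_ni_syscall().

```c
#include <stdint.h>

#define SKETCH_NR_SYSCALLS 64
#define SKETCH_MAP_WORDS ((SKETCH_NR_SYSCALLS + 31) / 32)

/* Hypothetical placeholder used for unimplemented table slots. */
long sketch_ni_syscall(void)
{
	return -38;	/* -ENOSYS on Linux */
}

/* Hypothetical implemented syscall. */
long sketch_some_syscall(void)
{
	return 0;
}

/*
 * Set bit i of the map when table[i] is not the "not implemented"
 * placeholder. The placeholder is a function, so its address is cast
 * to an integer for the comparison.
 */
void sketch_setup_syscall_map(uint32_t *map, const uintptr_t *table,
			      unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		if (table[i] != (uintptr_t)&sketch_ni_syscall)
			map[i >> 5] |= 0x80000000u >> (i & 0x1f);
	}
}

/* Return non-zero when syscall number i is marked as implemented. */
int sketch_syscall_present(const uint32_t *map, unsigned int i)
{
	return (map[i >> 5] & (0x80000000u >> (i & 0x1f))) != 0;
}
```

With a single name per table, the same helper can be called once for the native table and once for the compat table.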
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/472734be0d9991eee320a06824219a5b2663736b.1601197618.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 27 Sep 2020 09:16:19 +0000 (09:16 +0000)]
powerpc/vdso: Add missing includes and clean vdso_setup_syscall_map()
Instead of including extern references locally in
vdso_setup_syscall_map(), add the missing headers.
sys_ni_syscall() being a function, cast its address to
an unsigned long instead of declaring it as a fake
unsigned long object.
At the same time, remove a comment which paraphrases the
function name.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b4afedce748ed2858299ceab5ae29b52109263ef.1601197618.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 27 Sep 2020 09:16:18 +0000 (09:16 +0000)]
powerpc/vdso: Stripped VDSO is not needed, don't build it
Since commit 24b659a13866 ("powerpc: Use unstripped VDSO image for
more accurate profiling data"), only the unstripped VDSO image
has been used.
Partially revert commit 8150caad0226 ("[POWERPC] powerpc vDSO: install
unstripped copies on disk") to avoid building the stripped version.
And the unstripped version in $(MODLIB)/vdso/ is not required
anymore as it is the one embedded in the kernel image.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5986ca25be44fe6e9790486304507f240077d8c4.1601197618.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:38 +0000 (17:19 +0000)]
powerpc/signal32: Transform save_user_regs() and save_tm_user_regs() in 'unsafe' version
Change those two functions to be used within a user access block.
For that, change save_general_regs() to unsafe_save_general_regs(),
then replace all user accesses by unsafe_ versions.
This series leads to a reduction from 2.55s to 1.73s of
the system CPU time with the following microbench app
on an mpc832x with KUAP (approx 32%)
Without KUAP, the difference is in the noise.
#include <signal.h>
#include <stdlib.h>
void sigusr1(int sig) { }
int main(int argc, char **argv)
{
	int i = 100000;
	signal(SIGUSR1, sigusr1);
	for (;i--;)
		raise(SIGUSR1);
	exit(0);
}
An additional 0.10s reduction is achieved by removing
CONFIG_PPC_FPU, as the mpc832x has no FPU.
This is a bit less spectacular on an 8xx, as KUAP is less heavy there. Prior to
the series (with KUAP) it ran in 8.10s. Once the removal of FPU regs
handling is applied, we get 7.05s. With the full series, we get 6.9s.
If artificially re-activating FPU regs handling with the full series,
we get 7.6s.
So for the 8xx, the removal of the FPU regs copy is what makes the
difference, but the rework of handle_signal also has a benefit.
Same as above, without KUAP the difference is in the noise.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fixup typo in SPE handling]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c7b37b385ccf9666066452e58f018a86573f83e8.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:36 +0000 (17:19 +0000)]
powerpc/signal32: Isolate non-copy actions in save_user_regs() and save_tm_user_regs()
Reorder actions in save_user_regs() and save_tm_user_regs() to
regroup copies together in order to switch to user_access_begin()
logic in a later patch.
Move non-copy actions into new functions called
prepare_save_user_regs() and prepare_save_tm_user_regs().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f6eac65781b4a57220477c8864bca2b57f29a5d5.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:35 +0000 (17:19 +0000)]
powerpc/signal: Create 'unsafe' versions of copy_[ck][fpr/vsx]_to_user()
For the non VSX version, that's trivial. Just use unsafe_copy_to_user()
instead of __copy_to_user().
For the VSX version, remove the intermediate step through a buffer and
use unsafe_put_user() directly. This generates a far smaller code which
is acceptable to inline, see below:
Standard VSX version:
0000000000000000 <.copy_fpr_to_user>:
0: 7c 08 02 a6 mflr r0
4: fb e1 ff f8 std r31,-8(r1)
8: 39 00 00 20 li r8,32
c: 39 24 0b 80 addi r9,r4,2944
10: 7d 09 03 a6 mtctr r8
14: f8 01 00 10 std r0,16(r1)
18: f8 21 fe 71 stdu r1,-400(r1)
1c: 39 41 00 68 addi r10,r1,104
20: e9 09 00 00 ld r8,0(r9)
24: 39 4a 00 08 addi r10,r10,8
28: 39 29 00 10 addi r9,r9,16
2c: f9 0a 00 00 std r8,0(r10)
30: 42 00 ff f0 bdnz 20 <.copy_fpr_to_user+0x20>
34: e9 24 0d 80 ld r9,3456(r4)
38: 3d 42 00 00 addis r10,r2,0
3a: R_PPC64_TOC16_HA .toc
3c: eb ea 00 00 ld r31,0(r10)
3e: R_PPC64_TOC16_LO_DS .toc
40: f9 21 01 70 std r9,368(r1)
44: e9 3f 00 00 ld r9,0(r31)
48: 81 29 00 20 lwz r9,32(r9)
4c: 2f 89 00 00 cmpwi cr7,r9,0
50: 40 9c 00 18 bge cr7,68 <.copy_fpr_to_user+0x68>
54: 4c 00 01 2c isync
58: 3d 20 40 00 lis r9,16384
5c: 79 29 07 c6 rldicr r9,r9,32,31
60: 7d 3d 03 a6 mtspr 29,r9
64: 4c 00 01 2c isync
68: 38 a0 01 08 li r5,264
6c: 38 81 00 70 addi r4,r1,112
70: 48 00 00 01 bl 70 <.copy_fpr_to_user+0x70>
70: R_PPC64_REL24 .__copy_tofrom_user
74: 60 00 00 00 nop
78: e9 3f 00 00 ld r9,0(r31)
7c: 81 29 00 20 lwz r9,32(r9)
80: 2f 89 00 00 cmpwi cr7,r9,0
84: 40 9c 00 18 bge cr7,9c <.copy_fpr_to_user+0x9c>
88: 4c 00 01 2c isync
8c: 39 20 ff ff li r9,-1
90: 79 29 00 44 rldicr r9,r9,0,1
94: 7d 3d 03 a6 mtspr 29,r9
98: 4c 00 01 2c isync
9c: 38 21 01 90 addi r1,r1,400
a0: e8 01 00 10 ld r0,16(r1)
a4: eb e1 ff f8 ld r31,-8(r1)
a8: 7c 08 03 a6 mtlr r0
ac: 4e 80 00 20 blr
'unsafe' simulated VSX version (The ... are only nops) using
unsafe_copy_fpr_to_user() macro:
unsigned long copy_fpr_to_user(void __user *to,
			       struct task_struct *task)
{
	unsafe_copy_fpr_to_user(to, task, failed);
	return 0;
failed:
	return 1;
}
0000000000000000 <.copy_fpr_to_user>:
0: 39 00 00 20 li r8,32
4: 39 44 0b 80 addi r10,r4,2944
8: 7d 09 03 a6 mtctr r8
c: 7c 69 1b 78 mr r9,r3
...
20: e9 0a 00 00 ld r8,0(r10)
24: f9 09 00 00 std r8,0(r9)
28: 39 4a 00 10 addi r10,r10,16
2c: 39 29 00 08 addi r9,r9,8
30: 42 00 ff f0 bdnz 20 <.copy_fpr_to_user+0x20>
34: e9 24 0d 80 ld r9,3456(r4)
38: f9 23 01 00 std r9,256(r3)
3c: 38 60 00 00 li r3,0
40: 4e 80 00 20 blr
...
50: 38 60 00 01 li r3,1
54: 4e 80 00 20 blr
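The difference between the two listings can be sketched in plain C. This is a userspace illustration only: struct sketch_fp_state, its layout (one FPR in the first half of each 16-byte slot) and the sketch_* function names are assumptions, and ordinary stores stand in for unsafe_put_user().

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NFPR 32

/* Hypothetical register file: each FPR is the first half of a 16-byte slot. */
struct sketch_fp_state {
	uint64_t fpr[SKETCH_NFPR][2];
	uint64_t fpscr;
};

/* Old shape: gather into a stack buffer, then copy the buffer out. */
void sketch_copy_fpr_buffered(uint64_t *to, const struct sketch_fp_state *fp)
{
	uint64_t buf[SKETCH_NFPR + 1];
	int i;

	for (i = 0; i < SKETCH_NFPR; i++)
		buf[i] = fp->fpr[i][0];
	buf[i] = fp->fpscr;
	memcpy(to, buf, sizeof(buf));
}

/* New shape: store each element straight to the destination. */
void sketch_copy_fpr_direct(uint64_t *to, const struct sketch_fp_state *fp)
{
	int i;

	for (i = 0; i < SKETCH_NFPR; i++)
		to[i] = fp->fpr[i][0];
	to[i] = fp->fpscr;
}
```

Both functions write the same 33 doublewords; the direct form needs no stack frame for the buffer and no call to a bulk copy routine.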
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/29f6c4b8e7a5bbc61e6a8801b78bbf493f9f819e.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:34 +0000 (17:19 +0000)]
powerpc/signal32: Switch swap_context() to user_access_begin() logic
As this was the last user of put_sigset_t(), remove it as well.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c3ac4f2d134a3391bb51bdaa2d00e9a409aba9f8.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:33 +0000 (17:19 +0000)]
powerpc/signal32: Add and use unsafe_put_sigset_t()
put_sigset_t() calls copy_to_user() for copying two words.
This is terribly inefficient for copying two words.
By switching to unsafe_put_user(), we end up with something as
simple as:
3cc: 81 3d 00 00 lwz r9,0(r29)
3d0: 91 26 00 b4 stw r9,180(r6)
3d4: 81 3d 00 04 lwz r9,4(r29)
3d8: 91 26 00 b8 stw r9,184(r6)
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/06def97e87ac1c4ae8e3197e0982e1fab7b3c8ae.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:32 +0000 (17:19 +0000)]
signal: Add unsafe_put_compat_sigset()
Implement an 'unsafe' version of put_compat_sigset().
For big-endian, use unsafe_put_user() directly
to avoid an intermediate copy through the stack.
For little-endian, use a straight unsafe_copy_to_user().
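The reason the two cases differ can be sketched as follows, assuming a native signal set made of 64-bit words and a compat set made of 32-bit words with the low half first. The function name is hypothetical and plain stores stand in for unsafe_put_user().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split each 64-bit word into two 32-bit compat words, low half first.
 * On little-endian this is already the in-memory layout of the 64-bit
 * word, so a straight copy gives the same result; on big-endian the
 * halves have to be written out explicitly, as done here.
 */
void sketch_put_compat_sigset(uint32_t *compat, const uint64_t *set,
			      size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++) {
		compat[2 * i] = (uint32_t)set[i];
		compat[2 * i + 1] = (uint32_t)(set[i] >> 32);
	}
}
```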
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/537c7082ee309a0bb9c67a50c5d9dd929aedb82d.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:31 +0000 (17:19 +0000)]
powerpc/signal32: Remove ifdefery in middle of if/else
MSR_TM_ACTIVE() is always defined and always returns 0 when
CONFIG_PPC_TRANSACTIONAL_MEM is not selected, so the awful
ifdefery in the middle of an if/else can be removed.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f3c36d687e4228f58d5c207a4036aa9ddcc7420a.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:30 +0000 (17:19 +0000)]
powerpc/signal32: Switch handle_rt_signal32() to user_access_begin() logic
In the same way as handle_signal32(), replace all user
accesses with equivalent unsafe_ versions, and move the
trampoline code icache flush outside the user access block.
Functions that have no unsafe_ equivalent also remain outside
the access block.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2974314226256f958e2984912b48883ef1754185.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:29 +0000 (17:19 +0000)]
powerpc/signal32: Switch handle_signal32() to user_access_begin() logic
Replace the access_ok() by user_access_begin() and change all user
accesses to unsafe_ version.
Move flush_icache_range() outside the user access block.
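The resulting control flow can be mocked in userspace as below. This is only a sketch of the shape: the sketch_* helpers are hypothetical stand-ins for user_access_begin(), user_access_end() and unsafe_put_user(), and the window flag replaces the real hardware access control.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mock of the access window: true while "user access" is open. */
static bool sketch_window_open;

bool sketch_user_access_begin(const void *ptr, size_t size)
{
	if (ptr == NULL || size == 0)
		return false;	/* stands in for a failed range check */
	sketch_window_open = true;
	return true;
}

void sketch_user_access_end(void)
{
	sketch_window_open = false;
}

/* Mock of the "store or jump to label" shape of an unsafe_ accessor. */
#define sketch_unsafe_put_user(val, ptr, label)	\
	do {						\
		if (!sketch_window_open)		\
			goto label;			\
		*(ptr) = (val);				\
	} while (0)

/* Open the window once, do all the stores, then close it again. */
int sketch_save_regs(uint32_t *frame, const uint32_t *regs, size_t n)
{
	size_t i;

	if (!sketch_user_access_begin(frame, n * sizeof(*frame)))
		return -1;
	for (i = 0; i < n; i++)
		sketch_unsafe_put_user(regs[i], &frame[i], failed);
	sketch_user_access_end();

	/* Work with no unsafe_ form (e.g. an icache flush) goes here. */
	return 0;

failed:
	sketch_user_access_end();
	return -1;
}
```

The window is opened and closed once per frame instead of once per store, which is where the gain comes from when opening it is expensive.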
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a27797f781aa00da96f8284c898173d18e952361.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:28 +0000 (17:19 +0000)]
powerpc/signal32: Move signal trampoline setup to handle_[rt_]signal32
Move signal trampoline setup into handle_signal32()
and handle_rt_signal32().
At the same time, remove the define which hides the mc_pad field
used for trampoline.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e439cc0fa35aa45da6776520777a61848b92fd4b.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:27 +0000 (17:19 +0000)]
powerpc/signal32: Misc changes to make handle_[rt_]_signal32() more similar
Miscellaneous changes to clean and make handle_signal32() and
handle_rt_signal32() even more similar.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/df0bc8c3b8fa96390c46f611df79b2a94ac21844.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:26 +0000 (17:19 +0000)]
powerpc/signal32: Rename local pointers in handle_rt_signal32()
Rename pointers in handle_rt_signal32() to make it more similar to
handle_signal32()
tm_frame becomes tm_mctx
frame becomes mctx
rt_sf becomes frame
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be77477b0f05397876015b218e36548ee8f5e10b.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:25 +0000 (17:19 +0000)]
powerpc/signal32: Move handle_signal32() close to handle_rt_signal32()
Those two functions are similar and serving the same purpose.
To ease refactorisation, move them close to each other.
This is a pure move: no code change, no cosmetic change. Yes, checkpatch is
not happy; most of its complaints will clear later.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dbce67900bf566bcf40179467bf1eb500814c405.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:24 +0000 (17:19 +0000)]
powerpc/signal32: Simplify logging in handle_rt_signal32()
If something is bad in the frame, there is no point in
knowing which part of the frame exactly is wrong as it
got allocated as a single block.
Always print the root address of the frame in case of
failed user access, just like handle_signal32().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/691895bd31fee89a2d8370befd66ad4eff5b63f2.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:23 +0000 (17:19 +0000)]
powerpc/signal: Refactor bad frame logging
The logging of a bad frame appears half a dozen times
and is pretty similar each time.
Create a signal_fault() function to perform that logging.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fa094445c119fc00315e1c13783b493346306c6a.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:22 +0000 (17:19 +0000)]
powerpc/signal: Call get_tm_stackpointer() from get_sigframe()
Instead of calling get_tm_stackpointer() from the caller, call it
directly from get_sigframe(). This avoids a double call and
allows get_tm_stackpointer() to become static and be inlined
into get_sigframe() by GCC.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/abfdc105b8b28c4eb3ab9a26297d17f302b600ea.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:21 +0000 (17:19 +0000)]
powerpc/signal: Remove get_clean_sp()
get_clean_sp() is only used once, in kernel/signal.c.
GCC is smart enough to see that x & 0xffffffff is a nop
calculation on PPC32, no need of a special PPC32 trivial version.
Include the logic from the PPC64 version of get_clean_sp() directly
in get_sigframe().
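A small sketch of the masking logic, written with a fixed 64-bit type so that it behaves the same on any host. The function name and the is_32 parameter are illustrative assumptions.

```c
#include <stdint.h>

/*
 * Keep only the low 32 bits of the stack pointer for a 32-bit task.
 * When the pointer type is itself 32 bits wide, the mask cannot change
 * the value, so a compiler can drop it entirely.
 */
uint64_t sketch_clean_sp(uint64_t sp, int is_32)
{
	if (is_32)
		return sp & 0x0ffffffffULL;
	return sp;
}
```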
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/13ef6510ce30a4867e043157b93af5bb8c67fb3b.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:20 +0000 (17:19 +0000)]
powerpc/signal: Move access_ok() out of get_sigframe()
This access_ok() will soon be performed by user_access_begin().
So move it out of get_sigframe().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/900b93744732ed0887f28f5b6a40730fb04a43fa.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:19 +0000 (17:19 +0000)]
powerpc/signal: Remove BUG_ON() in handler_signal functions
There is already the same BUG_ON() check in do_signal() which
is the only caller of handle_rt_signal64(), handle_rt_signal32() and
handle_signal32().
Remove those three redundant BUG_ON().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3582e10a341d523c9c3f1ac925c3aaefc9d9293d.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:18 +0000 (17:19 +0000)]
powerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832x
The e300c2 core which is embedded in mpc832x CPU doesn't have
an FPU.
Make it possible to not select CONFIG_PPC_FPU when building a
kernel dedicated to that target.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fcdc60d85baf80eaa0a7f3261d9d889282068216.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:17 +0000 (17:19 +0000)]
powerpc/signal: Don't manage floating point regs when no FPU
There is no point in copying floating point regs when there
is no FPU and MATH_EMULATION is not selected.
Create a new CONFIG_PPC_FPU_REGS bool that is selected by
CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to
opt out everything related to fp_state in thread_struct.
The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU
as the fpu.S build is conditional on CONFIG_PPC_FPU.
The following app spends approx 8.1 seconds system time on an 8xx
without the patch, and 7.0 seconds with the patch (13.5% reduction).
On an 832x, it spends approx 2.6 seconds system time without
the patch and 2.1 seconds with the patch (19% reduction).
#include <signal.h>
#include <stdlib.h>
void sigusr1(int sig) { }
int main(int argc, char **argv)
{
	int i = 100000;
	signal(SIGUSR1, sigusr1);
	for (;i--;)
		raise(SIGUSR1);
	exit(0);
}
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:16 +0000 (17:19 +0000)]
powerpc/ptrace: Create ptrace_get_fpr() and ptrace_put_fpr()
On the same model as ptrace_get_reg() and ptrace_put_reg(),
create ptrace_get_fpr() and ptrace_put_fpr() to get/set
the floating point registers.
Move the boundary checks into them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a1baedea7f7ae7b6bf27be98bab6d01b5ca2c1.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:15 +0000 (17:19 +0000)]
powerpc/ptrace: Consolidate reg index calculation
Today we have:
#ifdef CONFIG_PPC32
index = addr >> 2;
if ((addr & 3) || child->thread.regs == NULL)
#else
index = addr >> 3;
if ((addr & 7))
#endif
sizeof(long) has value 4 for PPC32 and value 8 for PPC64.
Dividing by 4 is equivalent to >> 2 and dividing by 8 is equivalent
to >> 3.
And 3 and 7 are respectively (sizeof(long) - 1).
Use sizeof(long) to get rid of the #ifdef CONFIG_PPC32 and consolidate
the calculation and checking.
thread.regs have to be not NULL on both PPC32 and PPC64 so adding
that test on PPC64 is harmless.
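The consolidated calculation can be sketched like this. The word size is passed as a parameter purely so that both the 4-byte and the 8-byte case can be exercised on one host; the function name is hypothetical and the thread.regs check is left out.

```c
#include <stddef.h>

/*
 * Reject addresses that are not a multiple of the word size, otherwise
 * turn the byte offset into a register index. word_size is assumed to
 * be a power of two (4 or 8 here).
 */
int sketch_reg_index(unsigned long addr, size_t word_size,
		     unsigned long *index)
{
	if (addr & (word_size - 1))
		return -1;
	*index = addr / word_size;
	return 0;
}
```

Called with sizeof(long) as word_size, the same source gives the >> 2 / & 3 behaviour on a 32-bit build and the >> 3 / & 7 behaviour on a 64-bit build.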
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3cd1e284e93c60db981659585e18d1f6bb73ed2f.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:14 +0000 (17:19 +0000)]
powerpc/ptrace: Move declaration of ptrace_get_reg() and ptrace_set_reg()
ptrace_get_reg() and ptrace_set_reg() are only used internally by
ptrace.
Move them into arch/powerpc/kernel/ptrace/ptrace-decl.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/376c258267aeae54a4423bc4a2e107a9611f0039.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 18 Aug 2020 17:19:13 +0000 (17:19 +0000)]
powerpc/signal: Move inline functions in signal.h
To really be inlined, the functions need to be defined in the
same C file as the caller, or in an included header.
Move functions defined inline from signal.c into signal.h
Fixes: 3dd4eb83a9c0 ("powerpc: move common register copy functions from signal_32.c to signal.c")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/35b1bd44a1a66f5bcf9b457a1c480ac8d5ef50b2.1597770847.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 26 Nov 2020 13:10:06 +0000 (00:10 +1100)]
powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-9-mpe@ellerman.id.au
Christophe Leroy [Thu, 26 Nov 2020 13:10:05 +0000 (00:10 +1100)]
powerpc/vdso: Switch VDSO to generic C implementation.
With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.
On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
gettimeofday: vdso: 828 nsec/call
clock-getres-realtime-coarse: vdso: 391 nsec/call
clock-gettime-realtime-coarse: vdso: 614 nsec/call
clock-getres-realtime: vdso: 460 nsec/call
clock-gettime-realtime: vdso: 876 nsec/call
clock-getres-monotonic-coarse: vdso: 399 nsec/call
clock-gettime-monotonic-coarse: vdso: 691 nsec/call
clock-getres-monotonic: vdso: 460 nsec/call
clock-gettime-monotonic: vdso: 1026 nsec/call
On an 8xx at 132 MHz, vdsotest with the C VDSO:
gettimeofday: vdso: 955 nsec/call
clock-getres-realtime-coarse: vdso: 545 nsec/call
clock-gettime-realtime-coarse: vdso: 592 nsec/call
clock-getres-realtime: vdso: 545 nsec/call
clock-gettime-realtime: vdso: 941 nsec/call
clock-getres-monotonic-coarse: vdso: 545 nsec/call
clock-gettime-monotonic-coarse: vdso: 591 nsec/call
clock-getres-monotonic: vdso: 545 nsec/call
clock-gettime-monotonic: vdso: 940 nsec/call
It is even better for gettime with monotonic clocks.
Unsupported clocks with ASM VDSO:
clock-gettime-boottime: vdso: 3851 nsec/call
clock-gettime-tai: vdso: 3852 nsec/call
clock-gettime-monotonic-raw: vdso: 3396 nsec/call
Same clocks with C VDSO:
clock-gettime-tai: vdso: 941 nsec/call
clock-gettime-monotonic-raw: vdso: 1001 nsec/call
clock-gettime-monotonic-coarse: vdso: 591 nsec/call
On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
gettimeofday: vdso: 220 nsec/call
clock-getres-realtime-coarse: vdso: 102 nsec/call
clock-gettime-realtime-coarse: vdso: 178 nsec/call
clock-getres-realtime: vdso: 129 nsec/call
clock-gettime-realtime: vdso: 235 nsec/call
clock-getres-monotonic-coarse: vdso: 105 nsec/call
clock-gettime-monotonic-coarse: vdso: 208 nsec/call
clock-getres-monotonic: vdso: 129 nsec/call
clock-gettime-monotonic: vdso: 274 nsec/call
On an 8321E at 333 MHz, vdsotest with the C VDSO:
gettimeofday: vdso: 272 nsec/call
clock-getres-realtime-coarse: vdso: 160 nsec/call
clock-gettime-realtime-coarse: vdso: 184 nsec/call
clock-getres-realtime: vdso: 166 nsec/call
clock-gettime-realtime: vdso: 281 nsec/call
clock-getres-monotonic-coarse: vdso: 160 nsec/call
clock-gettime-monotonic-coarse: vdso: 184 nsec/call
clock-getres-monotonic: vdso: 169 nsec/call
clock-gettime-monotonic: vdso: 275 nsec/call
On a Power9 Nimbus DD2.2 at 3.8GHz, with the ASM VDSO:
clock-gettime-monotonic: vdso: 35 nsec/call
clock-getres-monotonic: vdso: 16 nsec/call
clock-gettime-monotonic-coarse: vdso: 18 nsec/call
clock-getres-monotonic-coarse: vdso: 522 nsec/call
clock-gettime-monotonic-raw: vdso: 598 nsec/call
clock-getres-monotonic-raw: vdso: 520 nsec/call
clock-gettime-realtime: vdso: 34 nsec/call
clock-getres-realtime: vdso: 16 nsec/call
clock-gettime-realtime-coarse: vdso: 18 nsec/call
clock-getres-realtime-coarse: vdso: 517 nsec/call
getcpu: vdso: 8 nsec/call
gettimeofday: vdso: 25 nsec/call
And with the C VDSO:
clock-gettime-monotonic: vdso: 37 nsec/call
clock-getres-monotonic: vdso: 20 nsec/call
clock-gettime-monotonic-coarse: vdso: 21 nsec/call
clock-getres-monotonic-coarse: vdso: 19 nsec/call
clock-gettime-monotonic-raw: vdso: 38 nsec/call
clock-getres-monotonic-raw: vdso: 20 nsec/call
clock-gettime-realtime: vdso: 37 nsec/call
clock-getres-realtime: vdso: 20 nsec/call
clock-gettime-realtime-coarse: vdso: 20 nsec/call
clock-getres-realtime-coarse: vdso: 19 nsec/call
getcpu: vdso: 8 nsec/call
gettimeofday: vdso: 28 nsec/call
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-8-mpe@ellerman.id.au
Christophe Leroy [Thu, 26 Nov 2020 13:10:04 +0000 (00:10 +1100)]
powerpc/vdso: Save and restore TOC pointer on PPC64
On PPC64, the TOC pointer needs to be saved and restored.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-7-mpe@ellerman.id.au
Christophe Leroy [Thu, 26 Nov 2020 13:10:03 +0000 (00:10 +1100)]
powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions
powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.
Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retrieving the vdso data pointer there is lighter.
Implement __arch_vdso_capable() and always return true.
Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:
18: 35 25 ff e0 addic. r9,r5,-32
1c: 41 80 00 10 blt 2c <shift+0x14>
20: 7c 64 4c 30 srw r4,r3,r9
24: 38 60 00 00 li r3,0
...
2c: 54 69 08 3c rlwinm r9,r3,1,0,30
30: 21 45 00 1f subfic r10,r5,31
34: 7c 84 2c 30 srw r4,r4,r5
38: 7d 29 50 30 slw r9,r9,r10
3c: 7c 63 2c 30 srw r3,r3,r5
40: 7d 24 23 78 or r4,r9,r4
In our case the shift is always <= 32. In addition, the upper 32 bits
of the result are likely to be zero. Let GCC know this; it also optimises
the following calculations.
With the patch, we get:
0: 21 25 00 20 subfic r9,r5,32
4: 7c 69 48 30 slw r9,r3,r9
8: 7c 84 2c 30 srw r4,r4,r5
c: 7d 24 23 78 or r4,r9,r4
10: 7c 63 2c 30 srw r3,r3,r5
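One way to express that hint in C is to shift the two 32-bit halves separately, roughly as sketched below. This is a userspace sketch with a hypothetical name; it is restricted to shifts of 1 to 31 so that every shift is well defined in C, which is narrower than the range mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 64-bit right shift built from 32-bit operations. Only one code path
 * is needed because the shift is known to be small, and the common
 * case of an empty upper half returns early.
 */
uint64_t sketch_shift_ns(uint64_t ns, unsigned int shift)
{
	uint32_t hi = (uint32_t)(ns >> 32);
	uint32_t lo = (uint32_t)ns;

	/* Assumption of this sketch: 0 < shift < 32. */
	assert(shift > 0 && shift < 32);

	lo >>= shift;
	lo |= hi << (32 - shift);
	hi >>= shift;

	if (hi == 0)
		return lo;
	return ((uint64_t)hi << 32) | lo;
}
```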
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-6-mpe@ellerman.id.au
Michael Ellerman [Thu, 26 Nov 2020 13:10:02 +0000 (00:10 +1100)]
powerpc/barrier: Use CONFIG_PPC64 for barrier selection
Currently we use ifdef __powerpc64__ in barrier.h to decide if we
should use lwsync or eieio for SMPWMB which is then used by
__smp_wmb().
That means when we are building the compat VDSO we will use eieio,
because it's 32-bit code, even though we're building a 64-bit kernel
for a 64-bit CPU.
Although eieio should work, it would be cleaner if we always used the
same barrier, even for the 32-bit VDSO.
So change the ifdef to CONFIG_PPC64, so that the selection is made
based on the bitness of the kernel we're building for, not the current
compilation unit.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-5-mpe@ellerman.id.au
Michael Ellerman [Thu, 26 Nov 2020 13:10:01 +0000 (00:10 +1100)]
powerpc/time: Fix mftb()/get_tb() for use with the compat VDSO
When we're building the compat VDSO we are building 32-bit code but in
the context of a 64-bit kernel configuration.
To make this work we need to be careful in some places when using
ifdefs to differentiate between CONFIG_PPC64 and __powerpc64__.
CONFIG_PPC64 indicates the kernel we're building is 64-bit, but it
doesn't tell us that we're currently building 64-bit code - we could
be building 32-bit code for the compat VDSO.
On the other hand __powerpc64__ tells us that we are currently
building 64-bit code (and therefore we must also be building a 64-bit
kernel).
In the case of get_tb() we want to use the 32-bit code sequence
regardless of whether the kernel we're building for is 64-bit or
32-bit, what matters is the word size of the current object. So we
need to check __powerpc64__ to decide if we use mftb() or the
mftbu()/mftb() sequence.
For mftb() the logic for CPU_FTR_CELL_TB_BUG only makes sense if we're
building 64-bit code, so guard that with a __powerpc64__ check.
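For reference, the usual shape of a two-register timebase read is a retry loop around the upper half, sketched here against a simulated counter. The sketch_* names and the simulated counter that ticks on every read are assumptions made so the example can run in userspace.

```c
#include <stdint.h>

/* Simulated 64-bit timebase; it advances by one on every half read. */
static uint64_t sketch_tb;

void sketch_tb_set(uint64_t value)
{
	sketch_tb = value;
}

/* Read the upper 32 bits, then let the counter tick. */
static uint32_t sketch_mftbu(void)
{
	return (uint32_t)(sketch_tb++ >> 32);
}

/* Read the lower 32 bits, then let the counter tick. */
static uint32_t sketch_mftb(void)
{
	return (uint32_t)sketch_tb++;
}

/*
 * Read upper, lower, then upper again. If the upper half changed, the
 * lower half wrapped in between and the pair is inconsistent, so retry.
 */
uint64_t sketch_get_tb(void)
{
	uint32_t hi, lo, hi2;

	do {
		hi = sketch_mftbu();
		lo = sketch_mftb();
		hi2 = sketch_mftbu();
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

Starting the simulated counter just below a 32-bit wrap shows why the second upper read matters: without it, the first attempt would pair the old upper half with the new lower half.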
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-4-mpe@ellerman.id.au
Christophe Leroy [Thu, 26 Nov 2020 13:10:00 +0000 (00:10 +1100)]
powerpc/time: Move timebase functions into new asm/vdso/timebase.h
In order to easily use get_tb() from C VDSO, move timebase
functions into a new header named asm/vdso/timebase.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-3-mpe@ellerman.id.au
Christophe Leroy [Thu, 26 Nov 2020 13:09:59 +0000 (00:09 +1100)]
powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
cpu_relax() needs to be in asm/vdso/processor.h to be used by
the C VDSO generic library.
Move it there.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-2-mpe@ellerman.id.au
Christophe Leroy [Thu, 26 Nov 2020 13:09:58 +0000 (00:09 +1100)]
powerpc/feature: Use CONFIG_PPC64 instead of __powerpc64__ to define possible features
In order to build VDSO32 for PPC64, we need to have CPU_FTRS_POSSIBLE
and CPU_FTRS_ALWAYS independent of whether we are building the
32 bits VDSO or the 64 bits VDSO.
Use #ifdef CONFIG_PPC64 instead of #ifdef __powerpc64__
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-1-mpe@ellerman.id.au
Michael Ellerman [Tue, 24 Nov 2020 12:05:47 +0000 (23:05 +1100)]
powerpc: Update NUMA Kconfig description & help text
Update the NUMA Kconfig description to match other architectures, and
add some help text. Shamelessly borrowed from x86/arm64.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201124120547.1940635-3-mpe@ellerman.id.au
Michael Ellerman [Tue, 24 Nov 2020 12:05:46 +0000 (23:05 +1100)]
powerpc: Make NUMA default y for powernv
Our NUMA option is default y for pseries, but not powernv. The bulk of
powernv systems are NUMA, so make NUMA default y for powernv also.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20201124120547.1940635-2-mpe@ellerman.id.au
Michael Ellerman [Tue, 24 Nov 2020 12:05:45 +0000 (23:05 +1100)]
powerpc: Make NUMA depend on SMP
Our Kconfig allows NUMA to be enabled without SMP, but none of
our defconfigs use that combination. This means it can easily be
broken inadvertently by code changes, which has happened recently.
Although it's theoretically possible to have a machine with a single
CPU and multiple memory nodes, I can't think of any real systems where
that's the case. Even so if such a system exists, it can just run an
SMP kernel anyway.
So to avoid the need to add extra #ifdefs and/or build breaks, make
NUMA depend on SMP.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201124120547.1940635-1-mpe@ellerman.id.au
Christophe Leroy [Sat, 21 Nov 2020 17:59:19 +0000 (17:59 +0000)]
powerpc: inline iomap accessors
ioreadXX()/ioreadXXbe() accessors are equivalent to ppc
in_leXX()/in_beXX() accessors but they are not inlined.
Since commit 0eb573682872 ("powerpc/kernel: Enable EEH for IO
accessors"), the 'le' versions are equivalent to the ones
defined in asm-generic/io.h, although the ones there are inlined.
Include asm-generic/io.h to get them. Keep ppc versions of the
'be' ones as they are optimised, but make them inline in ppc io.h.
This reduces the size of ppc64e_defconfig build by 3 kbytes:
    text    data     bss      dec     hex filename
10160733 4343422  562972 15067127  e5e7f7 vmlinux.before
10159239 4341590  562972 15063801  e5daf9 vmlinux.after
A typical function using ioread and iowrite before the change:
c00000000066a3c4 <.ata_bmdma_stop>:
c00000000066a3c4: 7c 08 02 a6 mflr r0
c00000000066a3c8: fb c1 ff f0 std r30,-16(r1)
c00000000066a3cc: f8 01 00 10 std r0,16(r1)
c00000000066a3d0: fb e1 ff f8 std r31,-8(r1)
c00000000066a3d4: f8 21 ff 81 stdu r1,-128(r1)
c00000000066a3d8: eb e3 00 00 ld r31,0(r3)
c00000000066a3dc: eb df 00 98 ld r30,152(r31)
c00000000066a3e0: 7f c3 f3 78 mr r3,r30
c00000000066a3e4: 4b 9b 6f 7d bl c000000000021360 <.ioread8>
c00000000066a3e8: 60 00 00 00 nop
c00000000066a3ec: 7f c4 f3 78 mr r4,r30
c00000000066a3f0: 54 63 06 3c rlwinm r3,r3,0,24,30
c00000000066a3f4: 4b 9b 70 4d bl c000000000021440 <.iowrite8>
c00000000066a3f8: 60 00 00 00 nop
c00000000066a3fc: 7f e3 fb 78 mr r3,r31
c00000000066a400: 38 21 00 80 addi r1,r1,128
c00000000066a404: e8 01 00 10 ld r0,16(r1)
c00000000066a408: eb c1 ff f0 ld r30,-16(r1)
c00000000066a40c: 7c 08 03 a6 mtlr r0
c00000000066a410: eb e1 ff f8 ld r31,-8(r1)
c00000000066a414: 4b ff ff 8c b c00000000066a3a0 <.ata_sff_dma_pause>
The same function with this patch:
c000000000669cb4 <.ata_bmdma_stop>:
c000000000669cb4: e8 63 00 00 ld r3,0(r3)
c000000000669cb8: e9 43 00 98 ld r10,152(r3)
c000000000669cbc: 7c 00 04 ac hwsync
c000000000669cc0: 89 2a 00 00 lbz r9,0(r10)
c000000000669cc4: 0c 09 00 00 twi 0,r9,0
c000000000669cc8: 4c 00 01 2c isync
c000000000669ccc: 55 29 06 3c rlwinm r9,r9,0,24,30
c000000000669cd0: 7c 00 04 ac hwsync
c000000000669cd4: 99 2a 00 00 stb r9,0(r10)
c000000000669cd8: a1 4d 06 f0 lhz r10,1776(r13)
c000000000669cdc: 2c 2a 00 00 cmpdi r10,0
c000000000669ce0: 41 c2 00 08 beq- c000000000669ce8 <.ata_bmdma_stop+0x34>
c000000000669ce4: b1 4d 06 f2 sth r10,1778(r13)
c000000000669ce8: 4b ff ff a8 b c000000000669c90 <.ata_sff_dma_pause>
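To illustrate why inlining helps, here is a minimal user-space sketch of a big-endian 16-bit accessor pair written as static inline functions. The names sketch_ioread16be() and sketch_iowrite16be() are hypothetical, the byte-wise access is an assumption made for portability, and the ordering barriers visible in the disassembly above (hwsync, twi, isync) are deliberately left out.

```c
#include <stdint.h>

/*
 * Hypothetical big-endian 16-bit read: the most significant byte comes
 * first in memory. Being static inline, the compiler can fold the body
 * into the caller instead of emitting a branch-and-link.
 */
static inline uint16_t sketch_ioread16be(const volatile void *addr)
{
	const volatile uint8_t *p = addr;

	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical big-endian 16-bit write, the mirror of the read above. */
static inline void sketch_iowrite16be(uint16_t val, volatile void *addr)
{
	volatile uint8_t *p = addr;

	p[0] = (uint8_t)(val >> 8);
	p[1] = (uint8_t)val;
}
```

A real MMIO accessor would also need the architecture's ordering barriers; the sketch only shows the byte order and the inlining shape.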
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18b357d68c4cde149f75c7a1031c850925cd8128.1605981539.git.christophe.leroy@csgroup.eu
Athira Rajeev [Tue, 24 Nov 2020 02:40:40 +0000 (21:40 -0500)]
powerpc/perf: Fix crash with is_sier_available when pmu is not set
On systems without any specific PMU driver support registered, running
'perf record' with --intr-regs will crash (perf record -I <workload>).
The relevant portion from crash logs and Call Trace:
Unable to handle kernel paging request for data at address 0x00000068
Faulting instruction address: 0xc00000000013eb18
Oops: Kernel access of bad area, sig: 11 [#1]
CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1
NIP: c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80
REGS: c0000004a07ab4f0 TRAP: 0300 Not tainted (4.18.0-193.el8.ppc64le)
NIP [c00000000013eb18] is_sier_available+0x18/0x30
LR [c000000000139f2c] perf_reg_value+0x6c/0xb0
Call Trace:
[c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable)
[c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0
[c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0
[c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0
[c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0
[c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480
[c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520
[c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0
[c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120
When perf record session is started with "-I" option, capturing registers
on each sample calls is_sier_available() to check for the
SIER (Sample Instruction Event Register) availability in the platform.
This function in core-book3s accesses 'ppmu->flags'. If a platform specific
PMU driver is not registered, ppmu is set to NULL and accessing its
members results in a crash. Fix the crash by returning false in
is_sier_available() if ppmu is not set.
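The shape of the fix can be sketched as follows. This is a stand-alone illustration, not the patch itself: struct sketch_pmu, sketch_ppmu and the SKETCH_PPMU_HAS_SIER bit value are hypothetical names and values chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real flag name and value are not shown here. */
#define SKETCH_PPMU_HAS_SIER 0x40u

struct sketch_pmu {
	unsigned int flags;
};

/* Stays NULL until a platform specific PMU driver registers itself. */
static struct sketch_pmu *sketch_ppmu;

bool sketch_is_sier_available(void)
{
	/* No driver registered: report "not available" instead of dereferencing NULL. */
	if (!sketch_ppmu)
		return false;

	return (sketch_ppmu->flags & SKETCH_PPMU_HAS_SIER) != 0;
}
```

Returning false for the unregistered case lets callers keep a single code path without each of them checking the pointer.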
Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
Alan Modra [Fri, 27 Nov 2020 00:48:42 +0000 (11:48 +1100)]
powerpc/boot: Make use of REL16 relocs in powerpc/boot/util.S
Use bcl 20,31,0f rather than plain bl to avoid unbalancing the link
stack.
Update the code to use REL16 relocs, available for ppc64 in 2009 (and
ppc32 in 2005).
Signed-off-by: Alan Modra <amodra@gmail.com>
[mpe: Incorporate more detail into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Bill Wendling [Fri, 20 Nov 2020 22:40:34 +0000 (14:40 -0800)]
powerpc: Work around inline asm issues in alternate feature sections
The clang toolchain treats inline assembly a bit differently than
straight assembly code. In particular, inline assembly doesn't have
the complete context available to resolve expressions. This is
intentional to avoid divergence in the resulting assembly code.
We can work around this issue by borrowing a workaround done for ARM,
i.e. not directly testing the labels themselves, but by moving the
current output pointer by a value that should always be zero. If this
value is not null, then we will trigger a backward move, which is
explicitly forbidden.
Signed-off-by: Bill Wendling <morbo@google.com>
[mpe: Put it in a macro and only do the workaround for clang]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-4-morbo@google.com
Bill Wendling [Fri, 20 Nov 2020 22:40:33 +0000 (14:40 -0800)]
powerpc/boot: Use clang when CC is clang
The gcc compiler may not be available if CC is clang.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-3-morbo@google.com
Bill Wendling [Fri, 20 Nov 2020 22:40:32 +0000 (14:40 -0800)]
powerpc/boot/wrapper: Add "-z notext" flag to disable diagnostic
The "-z notext" flag disables reporting an error if DT_TEXTREL is set.
ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against
symbol: _start in readonly segment; recompile object files with
-fPIC or pass '-Wl,-z,notext' to allow text relocations in the
output
>>> defined in
>>> referenced by crt0.o:(.text+0x8) in archive arch/powerpc/boot/wrapper.a
The BFD linker disables this by default (though it's configurable in
current versions). LLD enables this by default. So we add the flag to
keep LLD from emitting the error.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-2-morbo@google.com
Bill Wendling [Wed, 18 Nov 2020 22:39:10 +0000 (14:39 -0800)]
powerpc/boot/wrapper: Add "-z rodynamic" when using LLD
Normally all read-only sections precede SHF_WRITE sections. .dynamic
and .got have the SHF_WRITE flag; .dynamic probably because of
DT_DEBUG. LLD emits an error when this happens, so use "-z rodynamic"
to mark .dynamic as read-only.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201118223910.2711337-1-morbo@google.com
Bill Wendling [Sat, 17 Oct 2020 00:01:51 +0000 (17:01 -0700)]
powerpc/boot: Move the .got section to after the .dynamic section
Both .dynamic and .got are RELRO sections and should be placed
together, and LLD emits an error:
ld.lld: error: section: .got is not contiguous with other relro sections
Place them together to avoid this.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201017000151.150788-1-morbo@google.com
Oleg Nesterov [Thu, 19 Nov 2020 16:02:47 +0000 (17:02 +0100)]
powerpc/ptrace: Hard wire PT_SOFTE value to 1 in gpr_get() too
The commit a8a4b03ab95f ("powerpc: Hard wire PT_SOFTE value to 1 in
ptrace & signals") changed ptrace_get_reg(PT_SOFTE) to report 0x1,
but PTRACE_GETREGS still copies pt_regs->softe as is.
This is not consistent and this breaks the user-regs-peekpoke test
from https://sourceware.org/systemtap/wiki/utrace/tests/
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201119160247.GB5188@redhat.com
Oleg Nesterov [Thu, 19 Nov 2020 16:02:21 +0000 (17:02 +0100)]
powerpc/ptrace: Simplify gpr_get()/tm_cgpr_get()
gpr_get() does membuf_write() twice to override pt_regs->msr in
between. We can call membuf_write() once and change ->msr in the
kernel buffer, this simplifies the code and the next fix.
The patch adds a new simple helper, membuf_at(offs), it returns the
new membuf which can be safely used after membuf_write().
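A user-space sketch of the idea, under stated assumptions: struct sketch_membuf, sketch_membuf_write() and sketch_membuf_at() are simplified, hypothetical models of the kernel helpers, and struct sketch_regs is a made-up register layout. The point is the ordering: take a cursor at the msr offset first, copy the registers once, then overwrite msr in the destination through the saved cursor.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of an output cursor: current position and bytes left. */
struct sketch_membuf {
	char *p;
	size_t left;
};

/* Copy at most s->left bytes and advance the cursor; returns bytes left. */
static size_t sketch_membuf_write(struct sketch_membuf *s, const void *v,
				  size_t size)
{
	if (s->left) {
		if (size > s->left)
			size = s->left;
		memcpy(s->p, v, size);
		s->p += size;
		s->left -= size;
	}
	return s->left;
}

/* Return a new cursor 'offs' bytes further on, clamped to the space left. */
static struct sketch_membuf sketch_membuf_at(const struct sketch_membuf *s,
					     size_t offs)
{
	struct sketch_membuf n = *s;

	if (offs > n.left)
		offs = n.left;
	n.p += offs;
	n.left -= offs;
	return n;
}

/* Made-up register block for the example. */
struct sketch_regs {
	unsigned long gpr[4];
	unsigned long msr;
};

/* One bulk write, then patch msr in the output through the saved cursor. */
void sketch_gpr_get(const struct sketch_regs *regs, unsigned long user_msr,
		    struct sketch_membuf to)
{
	struct sketch_membuf to_msr =
		sketch_membuf_at(&to, offsetof(struct sketch_regs, msr));

	sketch_membuf_write(&to, regs, sizeof(*regs));
	sketch_membuf_write(&to_msr, &user_msr, sizeof(user_msr));
}
```

Because the saved cursor is clamped, a destination too short to reach msr is simply left without the override rather than written out of bounds.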
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[mpe: Fixup some minor whitespace issues noticed by Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201119160221.GA5188@redhat.com
Michael Ellerman [Wed, 25 Nov 2020 12:17:31 +0000 (23:17 +1100)]
Merge branch 'fixes' into next
Merge our fixes branch, in particular to bring in the changes for the
entry/uaccess flush.
Stephen Rothwell [Mon, 23 Nov 2020 07:40:16 +0000 (18:40 +1100)]
powerpc/64s: Fix allnoconfig build since uaccess flush
Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h.
Otherwise the build fails with eg:
arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
Michael Ellerman [Mon, 23 Nov 2020 10:16:27 +0000 (21:16 +1100)]
Merge tag 'powerpc-cve-2020-4788' into fixes
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.
This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.
Daniel Axtens [Tue, 17 Nov 2020 05:59:16 +0000 (16:59 +1100)]
powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and
we just added entry and uaccess flushes. So the name is not very accurate
any more. In both platforms we then also immediately set up the STF flush.
Rename them to _setup_security_mitigations and fold the STF flush in.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 17 Nov 2020 05:59:15 +0000 (16:59 +1100)]
selftests/powerpc: refactor entry and rfi_flush tests
For simplicity in backporting, the original entry_flush test contained
a lot of duplicated code from the rfi_flush test. De-duplicate that code.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Tue, 17 Nov 2020 05:59:14 +0000 (16:59 +1100)]
selftests/powerpc: entry flush test
Add a test modelled on the RFI flush test which counts the number
of L1D misses doing a simple syscall with the entry flush on and off.
For simplicity of backporting, this test duplicates a lot of code from
rfi_flush. We clean that up in the next patch.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 19 Nov 2020 12:43:53 +0000 (23:43 +1100)]
powerpc: Only include kup-radix.h for 64-bit Book3S
In kup.h we currently include kup-radix.h for all 64-bit builds, which
includes Book3S and Book3E. The latter doesn't make sense, Book3E
never uses the Radix MMU.
This has worked up until now, but almost by accident, and the recent
uaccess flush changes introduced a build breakage on Book3E because of
the bad structure of the code.
So disentangle things so that we only use kup-radix.h for Book3S. This
requires some more stubs in kup.h and fixing an include in
syscall_64.c.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 17 Nov 2020 05:59:13 +0000 (16:59 +1100)]
powerpc/64s: flush L1D after user accesses
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache after user accesses.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 17 Nov 2020 05:59:12 +0000 (16:59 +1100)]
powerpc/64s: flush L1D on kernel entry
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache on kernel entry.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Russell Currey [Tue, 17 Nov 2020 05:59:11 +0000 (16:59 +1100)]
selftests/powerpc: rfi_flush: disable entry flush if present
We are about to add an entry flush. The rfi (exit) flush test measures
the number of L1D flushes over a syscall with the RFI flush enabled and
disabled. But if the entry flush is also enabled, the effect of enabling
and disabling the RFI flush is masked.
If there is a debugfs entry for the entry flush, disable it during the RFI
flush and restore it later.
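The save, disable and restore pattern can be sketched with ordinary file I/O. This is an assumption-laden illustration: the helper names are hypothetical, the knob is taken to be a file holding a single integer, and the actual debugfs path is not named here, so the caller passes it in.

```c
#include <stdio.h>

/* Read a single integer from a file; returns 0 on success, -1 on failure. */
static int sketch_read_int(const char *path, int *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a single integer to a file; returns 0 on success, -1 on failure. */
static int sketch_write_int(const char *path, int val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%d\n", val) > 0);
	return (fclose(f) == 0 && ok) ? 0 : -1;
}

/*
 * Save the current value and force the knob to 0.
 * Returns 1 if the knob existed and was disabled, 0 if it is absent
 * (nothing to do), -1 if it exists but could not be written.
 */
int sketch_knob_disable(const char *path, int *saved)
{
	if (sketch_read_int(path, saved) != 0)
		return 0;
	return sketch_write_int(path, 0) == 0 ? 1 : -1;
}

/* Put back the value saved by sketch_knob_disable(). */
int sketch_knob_restore(const char *path, int saved)
{
	return sketch_write_int(path, saved);
}
```

Treating a missing file as "nothing to do" keeps the test usable on kernels that do not have the entry flush.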
Reported-by: Spoorthy S <spoorts2@in.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
David Hildenbrand [Wed, 11 Nov 2020 14:53:22 +0000 (15:53 +0100)]
powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
Let's use alloc_contig_pages() for allocating memory and remove the
linear mapping manually via arch_remove_linear_mapping(). Mark all pages
PG_offline, such that they will definitely not get touched - e.g.,
when hibernating. When freeing memory, try to revert what we did.
The original idea was discussed in:
https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com
This is similar to CONFIG_DEBUG_PAGEALLOC handling on other
architectures, whereby only single pages are unmapped from the linear
mapping. Let's mimic what memory hot(un)plug would do with the linear
mapping.
We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO
that we want to use __GFP_ZERO for clearing once alloc_contig_pages()
understands that.
Tested in QEMU/TCG with 10 GiB of main memory:
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 145.042493][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
[ 145.049019][ T1080] memtrace: Freed trace memory back on node 0
[ 145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
[root@localhost ~]# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 213.606916][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
[ 213.613855][ T1080] memtrace: Freed trace memory back on node 0
[ 214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
[root@localhost ~]# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 234.874872][ T1080] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
[ 234.886974][ T1080] memtrace: Freed trace memory back on node 0
[ 234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
I also made sure allocated memory is properly zeroed.
Note 1: We currently won't be allocating from ZONE_MOVABLE - because our
pages are not movable. However, as we don't run with any memory
hot(un)plug mechanism around, we could make an exception to
increase the chance of allocations succeeding.
Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used
along with PG_reserved in hibernation code will always return "true"
on powerpc, resulting in the pages getting touched. It's too
generic - e.g., indicates boot allocations.
Note 3: For now, we keep using memory_block_size_bytes() as minimum
granularity.
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-9-david@redhat.com
David Hildenbrand [Wed, 11 Nov 2020 14:53:21 +0000 (15:53 +0100)]
powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
Let's revert what we did in case something goes wrong and we return an
error - as already done on arm64 and s390x.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-8-david@redhat.com
David Hildenbrand [Wed, 11 Nov 2020 14:53:20 +0000 (15:53 +0100)]
powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping()
The single caller (arch_remove_linear_mapping()) prints a proper
warning when this function fails. No need to eventually crash the
kernel - let's drop this WARN_ON.
Suggested-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-7-david@redhat.com
David Hildenbrand [Wed, 11 Nov 2020 14:53:19 +0000 (15:53 +0100)]
powerpc/mm: print warning in arch_remove_linear_mapping()
Let's print a warning similar to the one in arch_create_linear_mapping() instead of
WARN_ON_ONCE() and eventually crashing the kernel.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-6-david@redhat.com
David Hildenbrand [Wed, 11 Nov 2020 14:53:18 +0000 (15:53 +0100)]
powerpc/mm: protect linear mapping modifications by a mutex
This code currently relies on mem_hotplug_begin()/mem_hotplug_done() -
create_section_mapping()/remove_section_mapping() implementations
cannot tolerate getting called concurrently.
Let's prepare for callers (memtrace) not holding any such locks (and
don't force them to mess with memory hotplug locks).
Other parts in these functions don't seem to rely on external locking.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-5-david@redhat.com
David Hildenbrand [Wed, 11 Nov 2020 14:53:17 +0000 (15:53 +0100)]
powerpc/mm: factor out creating/removing linear mapping
We want to stop abusing memory hotplug infrastructure in memtrace code
to perform allocations and remove the linear mapping. Instead we will use
alloc_contig_pages() and remove the linear mapping manually.
Let's factor out creating/removing the linear mapping into
arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the
future, we might be able to have whole arch_add_memory() /
arch_remove_memory() be implemented in common code.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-4-david@redhat.com
David Hildenbrand [Wed, 11 Nov 2020 14:53:16 +0000 (15:53 +0100)]
powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
It's very easy to crash the kernel right now by simply trying to
enable memtrace concurrently, hammering on the "enable" interface
loop.sh:
#!/bin/bash
dmesg --console-off
while true; do
echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
done
[root@localhost ~]# loop.sh &
[root@localhost ~]# loop.sh &
Resulting quickly in a kernel crash. Let's properly protect using a
mutex.
Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
David Hildenbrand [Wed, 11 Nov 2020 14:53:15 +0000 (15:53 +0100)]
powerpc/powernv/memtrace: Don't leak kernel memory to user space
We currently leak kernel memory to user space, because memory
offlining doesn't do any implicit clearing of memory and we are
missing explicit clearing of memory.
Let's keep it simple and clear pages before removing the linear
mapping.
Reproduced in QEMU/TCG with 10 GiB of main memory:
[root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null
[... wait until "free -m" used counter no longer changes and cancel]
19665802+0 records in
1+0 records out
9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s
[root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes
40000000
[root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900
[ 402.980063][ T1086] flags: 0x7ffff000001000(reserved)
[ 402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000
[ 402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 402.980845][ T1086] page dumped because: unmovable page
[ 402.989608][ T1086] Offlined Pages 16384
[ 403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000
Before this patch:
[root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head
00000000 c8 25 72 51 4d 26 36 c5 5c c2 56 15 d5 1a cd 10 |.%rQM&6.\.V.....|
00000010 19 b9 50 b2 cb e3 60 b8 ec 0a f3 ec 4b 3c 39 f0 |..P...`.....K<9.|
00000020 4e 5a 4c cf bd 26 19 ff 37 79 13 67 24 b7 b8 57 |NZL..&..7y.g$..W|
00000030 98 3e f5 be 6f 14 6a bd a4 52 bc 6e e9 e0 c1 5d |.>..o.j..R.n...]|
00000040 76 b3 ae b5 88 d7 da e3 64 23 85 2c 10 88 07 b6 |v.......d#.,....|
00000050 9a d8 91 de f7 50 27 69 2e 64 9c 6f d3 19 45 79 |.....P'i.d.o..Ey|
00000060 6a 6f 8a 61 71 19 1f c7 f1 df 28 26 ca 0f 84 55 |jo.aq.....(&...U|
00000070 01 3f be e4 e2 e1 da ff 7b 8c 8e 32 37 b4 24 53 |.?......{..27.$S|
00000080 1b 70 30 45 56 e6 8c c4 0e b5 4c fb 9f dd 88 06 |.p0EV.....L.....|
00000090 ef c4 18 79 f1 60 b1 5c 79 59 4d f4 36 d7 4a 5c |...y.`.\yYM.6.J\|
After this patch:
[root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
40000000
Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com
Madhavan Srinivasan [Wed, 21 Oct 2020 08:53:29 +0000 (14:23 +0530)]
powerpc/perf: Use regs->nip when SIAR is zero
In power10 DD1, there is an issue where the SIAR (Sampled Instruction
Address Register) is not latching to the sampled address during random
sampling. This results in a value of 0 in the SIAR. Add a check to use
regs->nip when SIAR is zero.
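A minimal sketch of the fallback, assuming the check is gated on the affected chip revision. The function name and the is_p10_dd1 parameter are hypothetical; in real code the SIAR value would come from the register and NIP from the saved register state.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the sampled instruction address. On the affected revision a zero
 * SIAR is treated as "not latched", so fall back to the interrupted NIP.
 */
uint64_t sketch_sample_ip(uint64_t siar, uint64_t nip, bool is_p10_dd1)
{
	if (is_p10_dd1 && siar == 0)
		return nip;

	return siar;
}
```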
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-5-maddy@linux.ibm.com
Athira Rajeev [Wed, 21 Oct 2020 08:53:27 +0000 (14:23 +0530)]
powerpc/perf: Use the address from SIAR register to set cpumode flags
While setting the processor mode for any sample, perf_get_misc_flags()
expects the privilege level to differentiate the userspace and kernel
address. On power10 DD1, there is an issue that causes the MSR_HV and MSR_PR
bits of the Sampled Instruction Event Register (SIER) not to be set for
marked events. Hence add a check to use the address in SIAR (Sampled
Instruction Address Register) to identify the privilege level.
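Classifying a sample by address rather than by privilege bits can be sketched like this. The names are hypothetical, SKETCH_KERNEL_START is an assumed boundary chosen to match the 0xc000... kernel addresses typical of 64-bit powerpc, and the fallback to NIP when SIAR is zero is an assumption of the sketch.

```c
#include <stdint.h>

enum sketch_cpumode {
	SKETCH_MODE_USER,
	SKETCH_MODE_KERNEL,
};

/* Assumed start of the kernel address range; not taken from the patch. */
#define SKETCH_KERNEL_START 0xc000000000000000ULL

/* Derive the mode from the sampled address, or from NIP if SIAR is zero. */
enum sketch_cpumode sketch_cpumode_from_addr(uint64_t siar, uint64_t nip)
{
	uint64_t addr = siar ? siar : nip;

	return addr >= SKETCH_KERNEL_START ? SKETCH_MODE_KERNEL
					   : SKETCH_MODE_USER;
}
```

This trades precision for availability: an address cannot distinguish hypervisor from kernel mode, which the missing MSR_HV bit would normally do.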
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-3-maddy@linux.ibm.com
Athira Rajeev [Wed, 21 Oct 2020 08:53:26 +0000 (14:23 +0530)]
powerpc/perf: Drop the check for SIAR_VALID
In power10 DD1, there is an issue that causes the SIAR_VALID bit of
the SIER (Sampled Instruction Event Register) to not be set. But the
SIAR_VALID bit is used for fetching the instruction address from the
SIAR (Sampled Instruction Address Register), and marked events are
sampled only if the SIAR_VALID bit is set. So drop the check for
SIAR_VALID and always return true in case of power10 DD1.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-2-maddy@linux.ibm.com
Athira Rajeev [Wed, 21 Oct 2020 08:53:25 +0000 (14:23 +0530)]
powerpc/perf: Add new power PMU flag "PPMU_P10_DD1" for power10 DD1
Add a new power PMU flag "PPMU_P10_DD1" which can be used to
conditionally add any code path for power10 DD1 processor version.
Also modify power10 PMU driver code to set this flag only for DD1,
based on the Processor Version Register (PVR) value.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-1-maddy@linux.ibm.com
Kaixu Xia [Tue, 10 Nov 2020 02:56:01 +0000 (10:56 +0800)]
powerpc/mm: Fix comparing pointer to 0 warning
Fixes coccicheck warning:
./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0
Avoid pointer type value compared to 0.
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604976961-20441-1-git-send-email-kaixuxia@tencent.com
Christophe Leroy [Sun, 8 Nov 2020 16:57:37 +0000 (16:57 +0000)]
powerpc: Remove RFI macro
The RFI macro is just there to add an infinite loop past
rfi in order to avoid prefetch on 40x in half a dozen
places in entry_32 and head_32.
Those places are already full of #ifdefs, so just add a
few more to explicitly show those loops and remove RFI.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7e9cb9e9240feec63cb330abf40b67d1aad852f.1604854583.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 8 Nov 2020 16:57:36 +0000 (16:57 +0000)]
powerpc: Replace RFI by rfi on book3s/32 and booke
For book3s/32 and for booke, RFI is just an rfi.
Only 40x has a non trivial RFI.
CONFIG_PPC_RTAS is never selected by 40x platforms.
Make it more explicit by replacing RFI by rfi wherever possible.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu
Christophe Leroy [Sun, 8 Nov 2020 16:57:35 +0000 (16:57 +0000)]
powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI
In head_64.S, we have two places using RFI to return to
kernel. Use RFI_TO_KERNEL instead.
They are the only two places using RFI on book3s/64, so
the RFI macro can go away.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7719261b0a0d2787772339484c33eb809723bca7.1604854583.git.christophe.leroy@csgroup.eu
Kaixu Xia [Tue, 10 Nov 2020 11:19:30 +0000 (19:19 +0800)]
powerpc/powernv/sriov: fix unsigned int win compared to less than zero
Fix coccicheck warning:
arch/powerpc/platforms/powernv/pci-sriov.c:443:7-10:
WARNING: Unsigned expression compared with zero: win < 0
arch/powerpc/platforms/powernv/pci-sriov.c:462:7-10:
WARNING: Unsigned expression compared with zero: win < 0
Fixes: 39efc03e3ee8 ("powerpc/powernv/sriov: Move M64 BAR allocation into a helper")
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1605007170-22171-1-git-send-email-kaixuxia@tencent.com
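The class of bug fixed by the commit above can be shown in isolation. The sketch below is hypothetical (ex_alloc_window() is an invented allocator, not the powernv code): a negative error return stored in an unsigned variable becomes a large positive value, so a `win < 0` test can never fire. Holding the result in a signed int is one way to keep the check meaningful.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical allocator: returns a window index, or -1 if none is free. */
int ex_alloc_window(bool available)
{
	return available ? 3 : -1;
}

/* What an unsigned variable would hold after receiving the return value. */
unsigned int ex_as_unsigned(int win)
{
	return (unsigned int)win;
}

/* Correct form: keep the result signed so the error test can succeed. */
bool ex_alloc_failed(bool available)
{
	int win = ex_alloc_window(available);

	return win < 0;
}
```

Here ex_as_unsigned(-1) yields UINT_MAX, which is why the comparison against zero is flagged as always false.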
Zhang Xiaoxu [Wed, 11 Nov 2020 02:07:52 +0000 (21:07 -0500)]
Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980.
Since commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR
support for drc-info property"), 'cpu_drcs' is no longer double
freed when the 'cpus' node is not found.
So the patch is not needed, and applying it causes a memory leak.
Fixes: a0ff72f9f5a7 ("powerpc/pseries/hotplug-cpu: Remove double free in error path")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[mpe: Caused by me applying a patch to a function that had changed in the interim]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111020752.1686139-1-zhangxiaoxu5@huawei.com
Nicholas Piggin [Wed, 11 Nov 2020 12:01:51 +0000 (22:01 +1000)]
powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory
read_user_stack_slow that walks user address translation by hand is
only required on hash, because a hash fault can not be serviced from
"NMI" context (to avoid re-entering the hash code) so the user stack
can be mapped into Linux page tables but not accessible by the CPU.
Radix MMU mode does not have this restriction. A page fault failure
would indicate the page is not accessible via get_user_pages either,
so avoid this on radix.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111120151.3150658-1-npiggin@gmail.com
Youling Tang [Wed, 4 Nov 2020 10:59:10 +0000 (18:59 +0800)]
powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
Use the common INIT_DATA_SECTION rule for the linker script in an effort
to regularize the linker script.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604487550-20040-1-git-send-email-tangyouling@loongson.cn
Christophe Leroy [Tue, 3 Nov 2020 18:07:12 +0000 (18:07 +0000)]
powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
On 8xx, we get the following features:
[ 0.000000] cpu_features = 0x0000000000000100
[ 0.000000] possible = 0x0000000000000120
[ 0.000000] always = 0x0000000000000000
This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all
other configurations, the three lines should be equal.
The problem is due to CPU_FTRS_GENERIC_32 which is taken when
CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is
pointless because there is no generic configuration supporting
all 32 bits but book3s/32.
Remove this pointless generic features definition to unbreak the
calculation of 'possible' features and 'always' features.
Fixes: 76bc080ef5a3 ("[POWERPC] Make default cputable entries reflect selected CPU family")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/76a85f30bf981d1aeaae00df99321235494da254.1604426550.git.christophe.leroy@csgroup.eu
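The numbers in the commit above are consistent with 'possible' being the OR and 'always' being the AND of every feature set compiled in. The sketch below assumes that construction; the helper names are hypothetical, and the 0x20 value for the extra generic set is inferred from the difference between 'possible' and 'cpu_features' in the boot log, not taken from the headers.

```c
#include <stddef.h>

/* 'possible': a feature that any configured CPU family may have. */
unsigned long ex_ftrs_possible(const unsigned long *sets, size_t n)
{
	unsigned long mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= sets[i];
	return mask;
}

/* 'always': a feature that every configured CPU family has. */
unsigned long ex_ftrs_always(const unsigned long *sets, size_t n)
{
	unsigned long mask = ~0UL;

	for (size_t i = 0; i < n; i++)
		mask &= sets[i];
	return mask;
}
```

With the sets {0x100, 0x20} this gives possible = 0x120 and always = 0, matching the log; with only {0x100} all three values agree, which is the expected result once the generic set is removed.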
Aneesh Kumar K.V [Wed, 7 Oct 2020 05:33:05 +0000 (11:03 +0530)]
powerpc/mm: Update tlbiel loop on POWER10
With POWER10, a single tlbiel instruction invalidates all the congruence
classes of the TLB and hence we need to issue only one tlbiel with SET=0.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
Ard Biesheuvel [Wed, 28 Oct 2020 08:04:33 +0000 (09:04 +0100)]
powerpc: Avoid broken GCC __attribute__((optimize))
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early
boot") introduced a couple of uses of __attribute__((optimize)) with
function scope, to disable the stack protector in some early boot
code.
Unfortunately, and this is documented in the GCC man pages [0],
overriding function attributes for optimization is broken, and is only
supported for debug scenarios, not for production: the problem appears
to be that setting GCC -f flags using this method will cause it to
forget about some or all other optimization settings that have been
applied.
So the only safe way to disable the stack protector is to disable it
for the entire source file.
[0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
[mpe: Drop one remaining use of __nostackprotector, reported by snowpatch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
Qinglang Miao [Wed, 28 Oct 2020 09:15:51 +0000 (17:15 +0800)]
powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Po-Hsu Lin [Fri, 23 Oct 2020 02:45:39 +0000 (10:45 +0800)]
selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic
The eeh-basic test got its own 60 seconds timeout (defined in commit
414f50434aa2 "selftests/eeh: Bump EEH wait time to 60s") per breakable
device.
And we have discovered that the number of breakable devices varies
on different hardware. The device recovery time ranges from 0 to 35
seconds. In our test pool it will take about 30 seconds to run on a
Power8 system with 5 breakable devices, and 60 seconds to run on a
Power9 system with 4 breakable devices.
Extend the timeout setting in the kselftest framework to 5 minutes
to give it a chance to finish.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023024539.9512-1-po-hsu.lin@canonical.com
Michael Ellerman [Fri, 23 Oct 2020 03:13:05 +0000 (14:13 +1100)]
powerpc/ps3: Drop unused DBG macro
This DBG macro is unused, and has been unused since the file was
originally merged into mainline. Just drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023031305.3284819-1-mpe@ellerman.id.au
Michael Ellerman [Fri, 23 Oct 2020 02:08:38 +0000 (13:08 +1100)]
powerpc/85xx: Fix declaration made after definition
Currently the clang build of corenet64_smp_defconfig fails with:
arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error:
attribute declaration must precede definition
machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);
Fix it by moving the initcall definition prior to the machine
definition, and directly below the function it calls, which is the
usual style anyway.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023020838.3274226-1-mpe@ellerman.id.au
Aneesh Kumar K.V [Thu, 22 Oct 2020 09:11:15 +0000 (14:41 +0530)]
powerpc/mm: Move setting PTE specific flags to pfn_pmd()
powerpc used to set the PTE specific flags in set_pte_at(). That is
different from other architectures. To be consistent with other
architectures powerpc updated pfn_pte() to set _PAGE_PTE in commit
379c926d6334 ("powerpc/mm: move setting pte specific flags to
pfn_pte")
That commit didn't do the same for pfn_pmd() because we expect
pmd_mkhuge() to do that. But as per Linus that is a bad rule:
The rule that you must use "pmd_mkhuge()" seems _completely_ wrong.
The only valid use to ever make a pmd out of a pfn is to make a
huge-page.
Hence update pfn_pmd() to set _PAGE_PTE.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201022091115.39568-1-aneesh.kumar@linux.ibm.com
Christophe Leroy [Thu, 22 Oct 2020 14:05:46 +0000 (14:05 +0000)]
powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
fls() and fls64() are using __builtin_clz() and __builtin_clzll().
On powerpc, those builtins trivially use the cntlzw and cntlzd power
instructions.
Although those instructions provide the expected result with
input argument 0, __builtin_clz() and __builtin_clzll() are
documented as undefined for value 0.
The easiest fix would be to use fls() and fls64() functions
defined in include/asm-generic/bitops/builtin-fls.h and
include/asm-generic/bitops/fls64.h, but GCC output is not optimal:
00000388 <testfls>:
388: 2c 03 00 00 cmpwi r3,0
38c: 41 82 00 10 beq 39c <testfls+0x14>
390: 7c 63 00 34 cntlzw r3,r3
394: 20 63 00 20 subfic r3,r3,32
398: 4e 80 00 20 blr
39c: 38 60 00 00 li r3,0
3a0: 4e 80 00 20 blr
000003b0 <testfls64>:
3b0: 2c 03 00 00 cmpwi r3,0
3b4: 40 82 00 1c bne 3d0 <testfls64+0x20>
3b8: 2f 84 00 00 cmpwi cr7,r4,0
3bc: 38 60 00 00 li r3,0
3c0: 4d 9e 00 20 beqlr cr7
3c4: 7c 83 00 34 cntlzw r3,r4
3c8: 20 63 00 20 subfic r3,r3,32
3cc: 4e 80 00 20 blr
3d0: 7c 63 00 34 cntlzw r3,r3
3d4: 20 63 00 40 subfic r3,r3,64
3d8: 4e 80 00 20 blr
When the input of fls(x) is a constant, just check x for nullity and
return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
directly.
For fls64() on PPC64, do the same but with __builtin_clzll() and
cntlzd instruction. On PPC32, let's take the generic fls64() which
will use our fls(). The result is as expected:
00000388 <testfls>:
388: 7c 63 00 34 cntlzw r3,r3
38c: 20 63 00 20 subfic r3,r3,32
390: 4e 80 00 20 blr
000003a0 <testfls64>:
3a0: 2c 03 00 00 cmpwi r3,0
3a4: 40 82 00 10 bne 3b4 <testfls64+0x14>
3a8: 7c 83 00 34 cntlzw r3,r4
3ac: 20 63 00 20 subfic r3,r3,32
3b0: 4e 80 00 20 blr
3b4: 7c 63 00 34 cntlzw r3,r3
3b8: 20 63 00 40 subfic r3,r3,64
3bc: 4e 80 00 20 blr
Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
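The zero-guarded form that the commit above applies to constant inputs can be written portably as below. This is a sketch with hypothetical names (ex_fls, ex_fls64), not the powerpc implementation: the commit's non-constant path uses the cntlzw/cntlzd instructions directly, which this example does not attempt to reproduce.

```c
#include <limits.h>

/*
 * Find last set bit, 1-based; returns 0 when no bit is set.
 * __builtin_clz(0) is undefined, so zero is handled explicitly.
 */
int ex_fls(unsigned int x)
{
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x) : 0;
}

/* Same for 64-bit values, using __builtin_clzll(). */
int ex_fls64(unsigned long long x)
{
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clzll(x) : 0;
}
```

The explicit test for zero is what produces the extra compare and branch in the first disassembly; it is the price of staying within the documented behaviour of the builtins.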
Jordan Niethe [Wed, 14 Oct 2020 07:28:37 +0000 (18:28 +1100)]
powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C
The only thing keeping the cpu_setup() and cpu_restore() functions
used in the cputable entries for Power7, Power8, Power9 and Power10 in
assembly was cpu_restore() being called before there was a stack in
generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel
stack for secondaries before cpu_restore()") means that it is now
possible to use C.
Rewrite the functions in C so they are a little bit easier to read.
This is not changing their functionality.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Tweak copyright and authorship notes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com
Nicholas Piggin [Tue, 17 Nov 2020 13:56:17 +0000 (23:56 +1000)]
powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context
Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR
interrupts when PR KVM is supported") removed KVM guest tests from
interrupts that do not set HV=1, when PR-KVM is not configured.
This is wrong for the HV-KVM HPT guest MMIO emulation case, which attempts
to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with
the guest MMU context loaded. This can cause host DSI, DSLB interrupts
which must test for KVM guest. Restore this and add a comment.
Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
Michael Ellerman [Mon, 16 Nov 2020 12:09:13 +0000 (23:09 +1100)]
powerpc: Drop -me200 addition to build flags
Currently a build with CONFIG_E200=y will fail with:
Error: invalid switch -me200
Error: unrecognized option -me200
Upstream binutils has never supported an -me200 option. Presumably it
was supported at some point by either a fork or Freescale internal
binutils.
We can't support code that we can't even build test, so drop the
addition of -me200 to the build flags, so we can at least build with
CONFIG_E200=y.
Reported-by: Németh Márton <nm127@freemail.hu>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Scott Wood <oss@buserror.net>
Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
Cédric Le Goater [Thu, 5 Nov 2020 13:47:13 +0000 (14:47 +0100)]
KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page
When accessing the ESB page of a source interrupt, the fault handler
will retrieve the page address from the XIVE interrupt 'xive_irq_data'
structure. If the associated KVM XIVE interrupt is not valid, that is
not allocated at the HW level for some reason, the fault handler will
dereference a NULL pointer leading to the oops below :
WARNING: CPU: 40 PID: 59101 at arch/powerpc/kvm/book3s_xive_native.c:259 xive_native_esb_fault+0xe4/0x240 [kvm]
CPU: 40 PID: 59101 Comm: qemu-system-ppc Kdump: loaded Tainted: G W --------- - - 4.18.0-240.el8.ppc64le #1
NIP: c00800000e949fac LR: c00000000044b164 CTR: c00800000e949ec8
REGS: c000001f69617840 TRAP: 0700 Tainted: G W --------- - - (4.18.0-240.el8.ppc64le)
MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44044282 XER: 00000000
CFAR: c00000000044b160 IRQMASK: 0
GPR00: c00000000044b164 c000001f69617ac0 c00800000e96e000 c000001f69617c10
GPR04: 05faa2b21e000080 0000000000000000 0000000000000005 ffffffffffffffff
GPR08: 0000000000000000 0000000000000001 0000000000000000 0000000000000001
GPR12: c00800000e949ec8 c000001ffffd3400 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 c000001f5c065160 c000000001c76f90
GPR24: c000001f06f20000 c000001f5c065100 0000000000000008 c000001f0eb98c78
GPR28: c000001dcab40000 c000001dcab403d8 c000001f69617c10 0000000000000011
NIP [c00800000e949fac] xive_native_esb_fault+0xe4/0x240 [kvm]
LR [c00000000044b164] __do_fault+0x64/0x220
Call Trace:
[c000001f69617ac0] [0000000137a5dc20] 0x137a5dc20 (unreliable)
[c000001f69617b50] [c00000000044b164] __do_fault+0x64/0x220
[c000001f69617b90] [c000000000453838] do_fault+0x218/0x930
[c000001f69617bf0] [c000000000456f50] __handle_mm_fault+0x350/0xdf0
[c000001f69617cd0] [c000000000457b1c] handle_mm_fault+0x12c/0x310
[c000001f69617d10] [c00000000007ef44] __do_page_fault+0x264/0xbb0
[c000001f69617df0] [c00000000007f8c8] do_page_fault+0x38/0xd0
[c000001f69617e30] [c00000000000a714] handle_page_fault+0x18/0x38
Instruction dump:
40c2fff0 7c2004ac 2fa90000 409e0118 73e90001 41820080 e8bd0008 7c2004ac
7ca90074 39400000 915c0000 7929d182 <0b090000> 2fa50000 419e0080 e89e0018
---[ end trace 66c6ff034c53f64f ]---
xive-kvm: xive_native_esb_fault: accessing invalid ESB page for source 8 !
Fix that by checking the validity of the KVM XIVE interrupt structure.
Fixes: 6520ca64cde7 ("KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201105134713.656160-1-clg@kaod.org
Nicholas Piggin [Sat, 14 Nov 2020 11:47:43 +0000 (21:47 +1000)]
powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs,
which is basically the same as the regular handlers for those
interrupts.
The system reset FWNMI handler did not have a KVM guest test in it,
although it probably should have because the guest can itself run
guests.
Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt
handlers to new style code gen macros") converted the handler faithfully
to avoid a KVM test with a "clever" trick to modify the IKVM_REAL
setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y).
This worked when the KVM test was generated in the interrupt entry
handlers, but a later patch moved the KVM test to the common handler,
and the common handler macro is expanded below the fwnmi entry. This
prevents the KVM test from being generated for the 0x100 entry
point as well.
The result is that NMI IPIs in the host kernel when a guest is running will
use guest registers. This goes particularly badly when an HPT guest is
running and the MMU is set to guest mode.
Remove this trickery and just generate the test always.
Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
Christophe Leroy [Sat, 7 Nov 2020 09:07:40 +0000 (09:07 +0000)]
powerpc/32s: Use relocation offset when setting early hash table
When calling early_hash_table(), the kernel hasn't yet been
relocated to its linking address, so data must be addressed
with the relocation offset.
Add the relocation offset when writing to Hash in early_hash_table().
Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.")
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu
Scott Cheloha [Thu, 5 Nov 2020 22:30:40 +0000 (16:30 -0600)]
powerpc/numa: Fix build when CONFIG_NUMA=n
Add a non-NUMA definition for of_drconf_to_nid_single() to topology.h
so we have one even if powerpc/mm/numa.c is not compiled. On a
non-NUMA kernel the appropriate node id is always first_online_node.
Fixes: 72cdd117c449 ("pseries/hotplug-memory: hot-add: skip redundant LMB lookup")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201105223040.3612663-1-cheloha@linux.ibm.com
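The header pattern described in the commit above (a real declaration when NUMA is built in, a trivial inline fallback otherwise) looks roughly like the sketch below. All names are hypothetical stand-ins: EX_CONFIG_NUMA plays the role of CONFIG_NUMA, EX_FIRST_ONLINE_NODE the role of first_online_node, and struct ex_lmb is an invented placeholder for the real argument type.

```c
/* Invented placeholder for the memory block descriptor. */
struct ex_lmb {
	unsigned int aa_index;
};

/* Stand-in for first_online_node on a single-node system. */
#define EX_FIRST_ONLINE_NODE 0

#ifdef EX_CONFIG_NUMA
/* NUMA build: the real lookup lives in a separate source file. */
int ex_lmb_to_nid(const struct ex_lmb *lmb);
#else
/* Non-NUMA build: every block belongs to the first online node. */
static inline int ex_lmb_to_nid(const struct ex_lmb *lmb)
{
	(void)lmb;
	return EX_FIRST_ONLINE_NODE;
}
#endif
```

Callers can then use the function unconditionally, without any #ifdef at the call site, and the non-NUMA build no longer depends on the NUMA source file being compiled.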