Alastair D'Silva [Wed, 15 Apr 2020 01:23:42 +0000 (11:23 +1000)]
ocxl: Remove unnecessary externs
Function declarations don't need externs, remove the existing ones
so they are consistent with newer code
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200415012343.919255-2-alastair@d-silva.org
Thadeu Lima de Souza Cascardo [Tue, 28 Jul 2020 15:50:39 +0000 (12:50 -0300)]
selftests/powerpc: Return skip code for spectre_v2
When running under older versions of qemu of under newer versions with
old machine types, some security features will not be reported to the
guest. This will lead the guest OS to consider itself Vulnerable to
spectre_v2.
So, spectre_v2 test fails in such cases when the host is mitigated and
miss predictions cannot be detected as expected by the test.
Make it return the skip code instead, for this particular case. We
don't want to miss the case when the test fails and the system reports
as mitigated or not affected. But it is not a problem to miss failures
when the system reports as Vulnerable.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728155039.401445-1-cascardo@canonical.com
Balamuruhan S [Tue, 28 Jul 2020 13:03:08 +0000 (18:33 +0530)]
powerpc/test_emulate_step: Add testcases for divde[.] and divdeu[.] instructions
Add testcases for divde, divde., divdeu, divdeu. emulated instructions
to cover few scenarios,
- with same dividend and divisor to have undefine RT
for divdeu[.]
- with divide by zero to have undefine RT for both
divde[.] and divdeu[.]
- with negative dividend to cover -|divisor| < r <= 0 if
the dividend is negative for divde[.]
- normal case with proper dividend and divisor for both
divde[.] and divdeu[.]
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728130308.1790982-4-bala24@linux.ibm.com
Balamuruhan S [Tue, 28 Jul 2020 13:03:07 +0000 (18:33 +0530)]
powerpc/sstep: Add support for divde[.] and divdeu[.] instructions
This patch adds emulation support for divde, divdeu instructions,
- Divide Doubleword Extended (divde[.])
- Divide Doubleword Extended Unsigned (divdeu[.])
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728130308.1790982-3-bala24@linux.ibm.com
Balamuruhan S [Tue, 28 Jul 2020 13:03:06 +0000 (18:33 +0530)]
powerpc/ppc-opcode: Add divde and divdeu opcodes
Include instruction opcodes for divde and divdeu as macros.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728130308.1790982-2-bala24@linux.ibm.com
Harish [Tue, 9 Jun 2020 08:14:23 +0000 (13:44 +0530)]
selftests/powerpc: Fix CPU affinity for child process
On systems with large number of cpus, test fails trying to set
affinity by calling sched_setaffinity() with smaller size for affinity
mask. This patch fixes it by making sure that the size of allocated
affinity mask is dependent on the number of CPUs as reported by
get_nprocs().
Fixes:
00b7ec5c9cf3 ("selftests/powerpc: Import Anton's context_switch2 benchmark")
Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Harish <harish@linux.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609081423.529664-1-harish@linux.ibm.com
Wei Yongjun [Mon, 27 Jul 2020 17:11:12 +0000 (01:11 +0800)]
powerpc/powernv/sriov: Remove unused but set variable 'phb'
Gcc report warning as follows:
arch/powerpc/platforms/powernv/pci-sriov.c:602:25: warning:
variable 'phb' set but not used [-Wunused-but-set-variable]
602 | struct pnv_phb *phb;
| ^~~
This variable is not used, so this commit removing it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727171112.2781-1-weiyongjun1@huawei.com
Qinglang Miao [Tue, 28 Jul 2020 02:28:07 +0000 (10:28 +0800)]
powerpc: use for_each_child_of_node() macro
Use for_each_child_of_node() macro instead of open coding it.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728022807.87815-1-miaoqinglang@huawei.com
Nicholas Piggin [Tue, 3 Mar 2020 01:27:48 +0000 (11:27 +1000)]
powerpc/build: vdso linker warning for orphan sections
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200303012748.4190929-1-npiggin@gmail.com
Gustavo A. R. Silva [Mon, 27 Jul 2020 22:42:01 +0000 (17:42 -0500)]
powerpc: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727224201.GA10133@embeddedor
Aneesh Kumar K.V [Mon, 27 Jul 2020 08:59:08 +0000 (14:29 +0530)]
powerpc/book3s64/radix: Add kernel command line option to disable radix GTSE
This adds a kernel command line option that can be used to disable GTSE support.
Disabling GTSE implies kernel will make hcalls to invalidate TLB entries.
This was done so that we can do VM migration between configs that enable/disable
GTSE support via hypervisor. To migrate a VM from a system that supports
GTSE to a system that doesn't, we can boot the guest with
radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB
invalidates.
The check for hcall availability is done in pSeries_setup_arch so that
the panic message appears on the console. This should only happen on
a hypervisor that doesn't force the guest to hash translation even
though it can't handle the radix GTSE=0 request via CAS. With
radix_hcall_invalidate=on if the hypervisor doesn't support hcall_rpt_invalidate
hcall it should force the LPAR to hash translation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727085908.420806-1-aneesh.kumar@linux.ibm.com
Aneesh Kumar K.V [Mon, 13 Jul 2020 15:07:49 +0000 (20:37 +0530)]
powerpc/kvm/cma: Improve kernel log during boot
Current kernel gives:
[ 0.000000] cma: Reserved 26224 MiB at 0x0000007959000000
[ 0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node
[ 0.000000] cma: Reserved 16384 MiB at 0x0000001800000000
With the fix
[ 0.000000] kvm_cma_reserve: reserving 26214 MiB for global area
[ 0.000000] cma: Reserved 26224 MiB at 0x0000007959000000
[ 0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node
[ 0.000000] cma: Reserved 16384 MiB at 0x0000001800000000
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713150749.25245-2-aneesh.kumar@linux.ibm.com
Aneesh Kumar K.V [Mon, 13 Jul 2020 15:07:48 +0000 (20:37 +0530)]
powerpc/hugetlb/cma: Allocate gigantic hugetlb pages using CMA
commit:
cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
added support for allocating gigantic hugepages using CMA. This patch
enables the same for powerpc
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713150749.25245-1-aneesh.kumar@linux.ibm.com
Balamuruhan S [Mon, 30 Mar 2020 07:59:54 +0000 (13:29 +0530)]
powerpc/xmon: Use `dcbf` inplace of `dcbi` instruction for 64bit Book3S
Data Cache Block Invalidate (dcbi) instruction implemented back in
PowerPC architecture version 2.03. But as per Power Processor Users Manual
it is obsolete and not supported by POWER8/POWER9 core. Attempt to use of
this illegal instruction results in a hypervisor emulation assistance
interrupt. So, ifdef it out the option `i` in xmon for 64bit Book3S.
0:mon> fi
cpu 0x0: Vector: 700 (Program Check) at [
c000000003be74a0]
pc:
c000000000102030: cacheflush+0x180/0x1a0
lr:
c000000000101f3c: cacheflush+0x8c/0x1a0
sp:
c000000003be7730
msr:
8000000000081033
current = 0xc0000000035e5c00
paca = 0xc000000001910000 irqmask: 0x03 irq_happened: 0x01
pid = 1025, comm = bash
Linux version 5.6.0-rc5-g5aa19adac (root@ltc-wspoon6) (gcc version 7.4.0
(Ubuntu 7.4.0-1ubuntu1~18.04.1)) #1 SMP Tue Mar 10 04:38:41 CDT 2020
cpu 0x0: Exception 700 (Program Check) in xmon, returning to main loop
[
c000000003be7c50]
c00000000084abb0 __handle_sysrq+0xf0/0x2a0
[
c000000003be7d00]
c00000000084b3c0 write_sysrq_trigger+0xb0/0xe0
[
c000000003be7d30]
c0000000004d1edc proc_reg_write+0x8c/0x130
[
c000000003be7d60]
c00000000040dc7c __vfs_write+0x3c/0x70
[
c000000003be7d80]
c000000000410e70 vfs_write+0xd0/0x210
[
c000000003be7dd0]
c00000000041126c ksys_write+0xdc/0x130
[
c000000003be7e20]
c00000000000b9d0 system_call+0x5c/0x68
--- Exception: c01 (System Call) at
00007fffa345e420
SP (
7ffff0b08ab0) is in userspace
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200330075954.538773-1-bala24@linux.ibm.com
Michael Ellerman [Fri, 24 Jul 2020 13:17:28 +0000 (23:17 +1000)]
powerpc: Drop old comment about CONFIG_POWER
There's a comment in time.h referring to CONFIG_POWER, which doesn't
exist. That confuses scripts/checkkconfigsymbols.py.
Presumably the comment was referring to a CONFIG_POWER vs CONFIG_PPC,
in which case for CONFIG_POWER we would #define __USE_RTC to 1. But
instead we have CONFIG_PPC_BOOK3S_601, and these days we have
IS_ENABLED().
So the comment is no longer relevant, drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-9-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:27 +0000 (23:17 +1000)]
powerpc/kvm: Use correct CONFIG symbol in comment
This comment refers to the non-existent CONFIG_PPC_BOOK3S_XX, which
confuses scripts/checkkconfigsymbols.py.
Change it to use the correct symbol.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-8-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:26 +0000 (23:17 +1000)]
powerpc/boot: Fix CONFIG_PPC_MPC52XX references
Commit
866bfc75f40e ("powerpc: conditionally compile platform-specific
serial drivers") made some code depend on CONFIG_PPC_MPC52XX, which
doesn't exist.
Fix it to use CONFIG_PPC_MPC52xx.
Fixes:
866bfc75f40e ("powerpc: conditionally compile platform-specific serial drivers")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-7-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:25 +0000 (23:17 +1000)]
powerpc/32s: Remove TAUException wart in traps.c
All 32 and 64-bit builds that don't have CONFIG_TAU_INT enabled (all
of them), get a definition of TAUException() in traps.c.
On 64-bit it's completely useless, and just wastes ~120 bytes of text.
On 32-bit it allows the kernel to link because head_32.S calls it
unconditionally.
Instead follow the example of altivec_assist_exception(), and if
CONFIG_TAU_INT is not enabled just point it at unknown_exception using
the preprocessor.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-6-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:24 +0000 (23:17 +1000)]
powerpc/32s: Fix CONFIG_BOOK3S_601 uses
We have two uses of CONFIG_BOOK3S_601, which doesn't exist. Fix them
to use CONFIG_PPC_BOOK3S_601 which is the correct symbol.
Fixes:
12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-5-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:23 +0000 (23:17 +1000)]
powerpc/64e: Drop dead BOOK3E_MMU_TLB_STATS code
This code was merged 11 years ago in commit
13363ab9b9d0 ("powerpc:
Add definitions used by exception handling on 64-bit Book3E") but was
never able to be built because CONFIG_BOOK3E_MMU_TLB_STATS never
existed. Remove it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-4-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:22 +0000 (23:17 +1000)]
powerpc/52xx: Fix comment about CONFIG_BDI*
There's a comment in lite5200_sleep.S that refers to "CONFIG_BDI*".
This confuses scripts/checkkconfigsymbols.py, which thinks it should
be able to find CONFIG_BDI.
Change the comment to refer to CONFIG_BDI_SWITCH which is presumably
roughly what it was referring to. AFAICS there never has been a
CONFIG_BDI.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-3-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:21 +0000 (23:17 +1000)]
powerpc/configs: Remove dead symbols
Remove references to symbols that no longer exist as reported by
scripts/checkkconfigsymbols.py.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-2-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 13:17:20 +0000 (23:17 +1000)]
powerpc/configs: Drop old symbols from ppc6xx_defconfig
ppc6xx_defconfig refers to quite a few symbols that no longer exist,
as reported by scripts/checkkconfigsymbols.py, remove them.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-1-mpe@ellerman.id.au
Bharata B Rao [Mon, 27 Jul 2020 09:57:04 +0000 (15:27 +0530)]
powerpc/mm: Limit resize_hpt_for_hotplug() call to hash guests only
During memory hotplug and unplug, resize_hpt_for_hotplug() gets called
for both hash and radix guests but it should be called only for hash
guests. Though the call does nothing in the radix guest case, it is
cleaner to push this call into hash specific memory hotplug routines.
Reported-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727095704.1432916-1-bharata@linux.ibm.com
Michael Ellerman [Fri, 24 Jul 2020 09:25:28 +0000 (19:25 +1000)]
selftests/powerpc: Remove powerpc special cases from stack expansion test
Now that the powerpc code behaves the same as other architectures we
can drop the special cases we had.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-5-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 09:25:27 +0000 (19:25 +1000)]
powerpc/mm: Remove custom stack expansion checking
We have powerpc specific logic in our page fault handling to decide if
an access to an unmapped address below the stack pointer should expand
the stack VMA.
The logic aims to prevent userspace from doing bad accesses below the
stack pointer. However as long as the stack is < 1MB in size, we allow
all accesses without further checks. Adding some debug I see that I
can do a full kernel build and LTP run, and not a single process has
used more than 1MB of stack. So for the majority of processes the
logic never even fires.
We also recently found a nasty bug in this code which could cause
userspace programs to be killed during signal delivery. It went
unnoticed presumably because most processes use < 1MB of stack.
The generic mm code has also grown support for stack guard pages since
this code was originally written, so the most heinous case of the
stack expanding into other mappings is now handled for us.
Finally although some other arches have special logic in this path,
from what I can tell none of x86, arm64, arm and s390 impose any extra
checks other than those in expand_stack().
So drop our complicated logic and like other architectures just let
the stack expand as long as its within the rlimit.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Daniel Axtens <dja@axtens.net>
Link: https://lore.kernel.org/r/20200724092528.1578671-4-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 09:25:26 +0000 (19:25 +1000)]
selftests/powerpc: Update the stack expansion test
Update the stack expansion load/store test to take into account the
new allowance of 4224 bytes below the stack pointer.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-3-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 09:25:25 +0000 (19:25 +1000)]
powerpc: Allow 4224 bytes of stack expansion for the signal frame
We have powerpc specific logic in our page fault handling to decide if
an access to an unmapped address below the stack pointer should expand
the stack VMA.
The code was originally added in 2004 "ported from 2.4". The rough
logic is that the stack is allowed to grow to 1MB with no extra
checking. Over 1MB the access must be within 2048 bytes of the stack
pointer, or be from a user instruction that updates the stack pointer.
The 2048 byte allowance below the stack pointer is there to cover the
288 byte "red zone" as well as the "about 1.5kB" needed by the signal
delivery code.
Unfortunately since then the signal frame has expanded, and is now
4224 bytes on 64-bit kernels with transactional memory enabled. This
means if a process has consumed more than 1MB of stack, and its stack
pointer lies less than 4224 bytes from the next page boundary, signal
delivery will fault when trying to expand the stack and the process
will see a SEGV.
The total size of the signal frame is the size of struct rt_sigframe
(which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on
64-bit).
The 2048 byte allowance was correct until 2008 as the signal frame
was:
struct rt_sigframe {
struct ucontext uc; /* 0 1440 */
/* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */
long unsigned int _unused[2]; /* 1440 16 */
unsigned int tramp[6]; /* 1456 24 */
struct siginfo * pinfo; /* 1480 8 */
void * puc; /* 1488 8 */
struct siginfo info; /* 1496 128 */
/* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */
char abigap[288]; /* 1624 288 */
/* size: 1920, cachelines: 15, members: 7 */
/* padding: 8 */
};
1920 + 128 = 2048
Then in commit
ce48b2100785 ("powerpc: Add VSX context save/restore,
ptrace and signal support") (Jul 2008) the signal frame expanded to
2304 bytes:
struct rt_sigframe {
struct ucontext uc; /* 0 1696 */ <--
/* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
long unsigned int _unused[2]; /* 1696 16 */
unsigned int tramp[6]; /* 1712 24 */
struct siginfo * pinfo; /* 1736 8 */
void * puc; /* 1744 8 */
struct siginfo info; /* 1752 128 */
/* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */
char abigap[288]; /* 1880 288 */
/* size: 2176, cachelines: 17, members: 7 */
/* padding: 8 */
};
2176 + 128 = 2304
At this point we should have been exposed to the bug, though as far as
I know it was never reported. I no longer have a system old enough to
easily test on.
Then in 2010 commit
320b2b8de126 ("mm: keep a guard page below a
grow-down stack segment") caused our stack expansion code to never
trigger, as there was always a VMA found for a write up to PAGE_SIZE
below r1.
That meant the bug was hidden as we continued to expand the signal
frame in commit
2b0a576d15e0 ("powerpc: Add new transactional memory
state to the signal context") (Feb 2013):
struct rt_sigframe {
struct ucontext uc; /* 0 1696 */
/* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
struct ucontext uc_transact; /* 1696 1696 */ <--
/* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
long unsigned int _unused[2]; /* 3392 16 */
unsigned int tramp[6]; /* 3408 24 */
struct siginfo * pinfo; /* 3432 8 */
void * puc; /* 3440 8 */
struct siginfo info; /* 3448 128 */
/* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
char abigap[288]; /* 3576 288 */
/* size: 3872, cachelines: 31, members: 8 */
/* padding: 8 */
/* last cacheline: 32 bytes */
};
3872 + 128 = 4000
And commit
573ebfa6601f ("powerpc: Increase stack redzone for 64-bit
userspace to 512 bytes") (Feb 2014):
struct rt_sigframe {
struct ucontext uc; /* 0 1696 */
/* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
struct ucontext uc_transact; /* 1696 1696 */
/* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
long unsigned int _unused[2]; /* 3392 16 */
unsigned int tramp[6]; /* 3408 24 */
struct siginfo * pinfo; /* 3432 8 */
void * puc; /* 3440 8 */
struct siginfo info; /* 3448 128 */
/* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
char abigap[512]; /* 3576 512 */ <--
/* size: 4096, cachelines: 32, members: 8 */
/* padding: 8 */
};
4096 + 128 = 4224
Then finally in 2017, commit
1be7107fbe18 ("mm: larger stack guard
gap, between vmas") exposed us to the existing bug, because it changed
the stack VMA to be the correct/real size, meaning our stack expansion
code is now triggered.
Fix it by increasing the allowance to 4224 bytes.
Hard-coding 4224 is obviously unsafe against future expansions of the
signal frame in the same way as the existing code. We can't easily use
sizeof() because the signal frame structure is not in a header. We
will either fix that, or rip out all the custom stack expansion
checking logic entirely.
Fixes:
ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support")
Cc: stable@vger.kernel.org # v2.6.27+
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-2-mpe@ellerman.id.au
Michael Ellerman [Fri, 24 Jul 2020 09:25:24 +0000 (19:25 +1000)]
selftests/powerpc: Add test of stack expansion logic
We have custom stack expansion checks that it turns out are extremely
badly tested and contain bugs, surprise. So add some tests that
exercise the code and capture the current boundary conditions.
The signal test currently fails on 64-bit kernels because the 2048
byte allowance for the signal frame is too small, we will fix that in
a subsequent patch.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-1-mpe@ellerman.id.au
Oliver O'Halloran [Mon, 27 Jul 2020 01:01:27 +0000 (11:01 +1000)]
selftests/powerpc: Squash spurious errors due to device removal
For drivers that don't have the error handling callbacks we implement
recovery by removing the device and re-probing it. This causes the sysfs
directory for the PCI device to be removed which causes the following
spurious error to be printed when checking the PE state:
Breaking 0005:03:00.0...
./eeh-basic.sh: line 13: can't open /sys/bus/pci/devices/0005:03:00.0/eeh_pe_state: no such file
0005:03:00.0, waited 0/60
0005:03:00.0, waited 1/60
0005:03:00.0, waited 2/60
0005:03:00.0, waited 3/60
0005:03:00.0, waited 4/60
0005:03:00.0, waited 5/60
0005:03:00.0, waited 6/60
0005:03:00.0, waited 7/60
0005:03:00.0, Recovered after 8 seconds
We currently try to avoid this by checking if the PE state file exists
before reading from it. This is however inherently racy so re-work the
state checking so that we only read from the file once, and we squash any
errors that occur while reading.
Fixes:
85d86c8aa52e ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727010127.23698-1-oohall@gmail.com
Sandipan Das [Mon, 27 Jul 2020 04:00:40 +0000 (09:30 +0530)]
selftests/powerpc: Add test for pkey siginfo verification
Commit
c46241a370a61 ("powerpc/pkeys: Check vma before
returning key fault error to the user") fixes a bug which
causes the kernel to set the wrong pkey in siginfo when a
pkey fault occurs after two competing threads that have
allocated different pkeys, one fully permissive and the
other restrictive, attempt to protect a common page at the
same time. This adds a test to detect the bug.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ce40b6ee270bda52e8f4088578ed2faf7d1d509a.1595821792.git.sandipan@linux.ibm.com
Sandipan Das [Mon, 27 Jul 2020 04:00:39 +0000 (09:30 +0530)]
selftests/powerpc: Add wrapper for gettid
The gettid() syscall wrapper was first introduced in
glibc 2.30. This adds a wrapper for use in distros
running older versions.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8ca3b0eeda989707815d1cf337cc33f090408965.1595821792.git.sandipan@linux.ibm.com
Sandipan Das [Mon, 27 Jul 2020 04:00:38 +0000 (09:30 +0530)]
selftests/powerpc: Add helper to exit on failure
This adds a helper similar to FAIL_IF() which lets a
program exit with code 1 (to indicate failure) when
the given condition is true.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dac282d5c2e96e7816dc522e4e20d56d7c79c898.1595821792.git.sandipan@linux.ibm.com
Sandipan Das [Mon, 27 Jul 2020 04:00:37 +0000 (09:30 +0530)]
selftests/powerpc: Harden test for execute-disabled pkeys
Commit
192b6a7805989 ("powerpc/book3s64/pkeys: Fix
pkey_access_permitted() for execute disable pkey") fixed a
bug that caused repetitive faults for pkeys with no execute
rights alongside some combination of read and write rights.
This removes the last two cases of the test, which check
the behaviour of pkeys with read, write but no execute
rights and all the rights, in favour of checking all the
possible combinations of read, write and execute rights
to be able to detect bugs like the one mentioned above.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/db467500f8af47727bba6b35796e8974a78b71e5.1595821792.git.sandipan@linux.ibm.com
Sandipan Das [Mon, 27 Jul 2020 04:00:36 +0000 (09:30 +0530)]
selftests/powerpc: Add pkey helpers for rights
This adds some new pkey-related helper to print
access rights of a pkey in the "rwx" format and
to generate different valid combinations of pkey
rights starting from a given combination.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6cc1c7d1f686618668a3e090f1d0c2a4cd9dea3f.1595821792.git.sandipan@linux.ibm.com
Sandipan Das [Mon, 27 Jul 2020 04:00:35 +0000 (09:30 +0530)]
selftests/powerpc: Move pkey helpers to headers
This moves all the pkey-related helpers to a new header
file and also a helper to print error messages in signal
handlers to the existing utils header file.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/28e633fa9ec1a6500c12188e09ea1887b10a10c1.1595821792.git.sandipan@linux.ibm.com
Nicholas Piggin [Sun, 26 Jul 2020 03:51:55 +0000 (13:51 +1000)]
powerpc/pseries: Add KVM guest doorbell restrictions
KVM guests have certain restrictions and performance quirks when using
doorbells. This patch moves the EPAPR KVM guest test so it can be shared
with PSERIES, and uses that in doorbell setup code to apply the KVM
guest quirks and improves IPI performance for two cases:
- PowerVM guests may now use doorbells even if they are secure.
- KVM guests no longer use doorbells if XIVE is available.
There is a valid complaint that "KVM guest" is not a very reasonable
thing to test for, it's preferable for the hypervisor to advertise
particular behaviours to the guest so they could change if the
hypervisor implementation or configuration changes. However in this case
we were already assuming a KVM guest worst case, so this patch is about
containing those quirks. If KVM later advertises fast doorbells, we
should test for that and override the quirks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-4-npiggin@gmail.com
Nicholas Piggin [Sun, 26 Jul 2020 03:51:54 +0000 (13:51 +1000)]
powerpc/pseries: Use doorbells even if XIVE is available
KVM supports msgsndp in guests by trapping and emulating the
instruction, so it was decided to always use XIVE for IPIs if it is
available. However on PowerVM systems, msgsndp can be used and gives
better performance. On large systems, high XIVE interrupt rates can
have sub-linear scaling, and using msgsndp can reduce the load on
the interrupt controller.
So switch to using core local doorbells even if XIVE is available.
This reduces performance for KVM guests with an SMT topology by
about 50% for ping-pong context switching between SMT vCPUs. An
option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
to get KVM performance back.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-3-npiggin@gmail.com
Nicholas Piggin [Sun, 26 Jul 2020 03:51:53 +0000 (13:51 +1000)]
powerpc: Inline doorbell sending functions
These are only called in one place for a given platform, so inline
them for performance.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
[mpe: Fix build errors related to KVM]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-2-npiggin@gmail.com
Athira Rajeev [Wed, 29 Jul 2020 04:16:54 +0000 (00:16 -0400)]
powerpc/perf: Fix MMCRA_BHRB_DISABLE define for binutils < 2.28
Commit
9908c826d5ed ("powerpc/perf: Add Power10 PMU feature to DT CPU
features") defines MMCRA_BHRB_DISABLE as `0x2000000000UL`. Binutils
version less than 2.28 doesn't support UL suffix.
arch/powerpc/kernel/cpu_setup_power.S: Assembler messages:
arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
arch/powerpc/kernel/cpu_setup_power.S:250: Error: junk at end of line, first unrecognized character is `L'
arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
arch/powerpc/kernel/cpu_setup_power.S:250: Error: junk at end of line, first unrecognized character is `L'
arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
arch/powerpc/kernel/cpu_setup_power.S:250: Error: found 'L', expected: ')'
arch/powerpc/kernel/cpu_setup_power.S:250: Error: operand out of range (0x0000002000000000 is not between 0xffffffffffff8000 and 0x000000000000ffff)
Fix this by wrapping it with the `_UL` macro.
Fixes:
9908c826d5ed ("Add Power10 PMU feature to DT CPU features")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1595996214-5833-1-git-send-email-atrajeev@linux.vnet.ibm.com
Michael Ellerman [Mon, 27 Jul 2020 03:44:36 +0000 (13:44 +1000)]
powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=y
skiroot_defconfig fails:
arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used
48 | static atomic_t cpus_in_fadump;
Fix it by moving the definition into the #ifdef where it's used.
Fixes:
ba608c4fa12c ("powerpc/fadump: fix race between pstore write and fadump crash trigger")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727070341.595634-1-mpe@ellerman.id.au
Randy Dunlap [Sun, 26 Jul 2020 00:38:09 +0000 (17:38 -0700)]
powerpc/powernv/pci.h: delete duplicated word
Drop the repeated word "for".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-10-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:08 +0000 (17:38 -0700)]
powerpc/smu.h: delete duplicated word
Drop the repeated word "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-9-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:07 +0000 (17:38 -0700)]
powerpc/reg.h: delete duplicated word
Drop the repeated word "a".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-8-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:06 +0000 (17:38 -0700)]
powerpc/ppc_asm.h: delete duplicated word
Drop the repeated word "in".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-7-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:05 +0000 (17:38 -0700)]
powerpc/hw_breakpoint.h: delete duplicated word
Drop the repeated word "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-6-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:04 +0000 (17:38 -0700)]
powerpc/epapr_hcalls.h: delete duplicated words
Drop the repeated words "file" and "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-5-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:03 +0000 (17:38 -0700)]
powerpc/cputime.h: delete duplicated word
Drop the repeated word "use".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-4-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:02 +0000 (17:38 -0700)]
powerpc/book3s/radix-4k.h: delete duplicated word
Drop the repeated word "per".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-3-rdunlap@infradead.org
Randy Dunlap [Sun, 26 Jul 2020 00:38:01 +0000 (17:38 -0700)]
powerpc/book3s/mmu-hash.h: delete duplicated word
Drop the repeated word "below".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726003809.20454-2-rdunlap@infradead.org
Li RongQing [Fri, 26 Apr 2019 11:36:30 +0000 (19:36 +0800)]
powerpc/lib: remove memcpy_flushcache redundant return
Align it with other architectures and none of the callers has
been interested its return
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1556278590-14727-1-git-send-email-lirongqing@baidu.com
Christophe Leroy [Mon, 29 Jun 2020 11:17:19 +0000 (11:17 +0000)]
powerpc/ptdump: Refactor update of pg_state
In note_page(), the pg_state is updated the same way in two places.
Add note_page_update_state() to do it.
Also include the display of boundary markers there as it is missing
"no level" leg, leading to a mismatch when the first two markers
are at the same address and the first displayed area uses that
address.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a284a809f01c705bbaab303b06fda216f147a99a.1593429426.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Jun 2020 11:17:18 +0000 (11:17 +0000)]
powerpc/ptdump: Refactor update of st->last_pa
st->last_pa is always updated in note_page() so it can
be done outside the if/elseif/else block.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/610d6b1a60ad0bedef865a90153c1110cfaa507e.1593429426.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Jun 2020 11:15:26 +0000 (11:15 +0000)]
powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX
When STRICT_KERNEL_RWX is set, we want to set NX bit on vmalloc
segments. But modules require exec.
Use a dedicated segment for modules. There is not much space
above kernel, and we don't waste vmalloc space to do alignment.
Therefore, we take the segment before PAGE_OFFSET for modules.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eb8faba9148b6cf17c696ba776b4e8ee2f6313bf.1593428200.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Jun 2020 11:15:24 +0000 (11:15 +0000)]
powerpc/32s: Kernel space starts at TASK_SIZE
Kernel space starts at TASK_SIZE. Select kernel page table
when address is over TASK_SIZE.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/893425e32cd0a003539573b2d115e0ffa98bc26c.1593428200.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Jun 2020 11:15:23 +0000 (11:15 +0000)]
powerpc/32: Set user/kernel boundary at TASK_SIZE instead of PAGE_OFFSET
User space stops at TASK_SIZE. At the moment, kernel space starts
at PAGE_OFFSET.
In order to use space between TASK_SIZE and PAGE_OFFSET for modules,
make TASK_SIZE the limit between user and kernel space.
Note that fault.c already considers TASK_SIZE as the boundary between
user and kernel space.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b38b52cd8dabbb56fbd6f9219d6f3cdccbb43b44.1593428200.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Jun 2020 11:15:22 +0000 (11:15 +0000)]
powerpc/32s: Only leave NX unset on segments used for modules
Instead of leaving NX unset on all segments above the start
of vmalloc space, only leave NX unset on segments used for
modules.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7172c0f5253419315e434a1816ee3d6ed6505bc0.1593428200.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Jun 2020 11:15:21 +0000 (11:15 +0000)]
powerpc: Use MODULES_VADDR if defined
In order to allow allocation of modules outside of vmalloc space,
use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined.
Redefine module_alloc() when MODULES_VADDR defined.
Unmap corresponding KASAN shadow memory.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7ecf5fff1eef67d450e73fc412b6ec3818483d75.1593428200.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 29 Jun 2020 11:15:20 +0000 (11:15 +0000)]
powerpc/lib: Prepare code-patching for modules allocated outside vmalloc space
Use is_vmalloc_or_module_addr() instead of is_vmalloc_addr()
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d884db0e5a6f521331639d8c0f13e520d5a4fef.1593428200.git.christophe.leroy@csgroup.eu
Wei Yongjun [Sat, 25 Jul 2020 09:19:49 +0000 (17:19 +0800)]
powerpc/papr_scm: Make some symbols static
The sparse tool complains as follows:
arch/powerpc/platforms/pseries/papr_scm.c:97:1: warning:
symbol 'papr_nd_regions' was not declared. Should it be static?
arch/powerpc/platforms/pseries/papr_scm.c:98:1: warning:
symbol 'papr_ndr_lock' was not declared. Should it be static?
Those variables are not used outside of papr_scm.c, so this
commit marks them static.
Fixes:
85343a8da2d9 ("powerpc/papr/scm: Add bad memory ranges to nvdimm bad ranges")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200725091949.75234-1-weiyongjun1@huawei.com
Bill Wendling [Fri, 24 Jul 2020 22:49:01 +0000 (15:49 -0700)]
powerpc/64s: allow for clang's objdump differences
Clang's objdump emits slightly different output from GNU's objdump,
causing a list of warnings to be emitted during relocatable builds.
E.g., clang's objdump emits this:
c000000000000004: 2c 00 00 48 b 0xc000000000000030
...
c000000000005c6c: 10 00 82 40 bf 2, 0xc000000000005c7c
while GNU objdump emits:
c000000000000004: 2c 00 00 48 b
c000000000000030 <__start+0x30>
...
c000000000005c6c: 10 00 82 40 bne
c000000000005c7c <masked_interrupt+0x3c>
Adjust llvm-objdump's output to remove the extraneous '0x' and convert
'bf' and 'bt' to 'bne' and 'beq' resp. to more closely match GNU
objdump's output.
Note that clang's objdump doesn't yet output the relocation symbols on
PPC.
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/191c67db31264b69cf6b566fd69851beb3dd0abb.1595630874.git.morbo@google.com
Nicholas Piggin [Fri, 24 Jul 2020 13:14:23 +0000 (23:14 +1000)]
powerpc: Implement smp_cond_load_relaxed()
This implements smp_cond_load_relaxed() with the slowpath busy loop
using the preferred SMT priority pattern.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
[mpe: Make it 64-bit only to fix build errors on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-7-npiggin@gmail.com
Nicholas Piggin [Fri, 24 Jul 2020 13:14:22 +0000 (23:14 +1000)]
powerpc/qspinlock: Optimised atomic_try_cmpxchg_lock() that adds the lock hint
This brings the behaviour of the uncontended fast path back to roughly
equivalent to simple spinlocks -- a single atomic op with lock hint.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-6-npiggin@gmail.com
Nicholas Piggin [Fri, 24 Jul 2020 13:14:21 +0000 (23:14 +1000)]
powerpc/pseries: Implement paravirt qspinlocks for SPLPAR
This implements the generic paravirt qspinlocks using H_PROD and
H_CONFER to kick and wait.
This uses an un-directed yield to any CPU rather than the directed
yield to a pre-empted lock holder that paravirtualised simple
spinlocks use, that requires no kick hcall. This is something that
could be investigated and improved in future.
Performance results can be found in the commit which added queued
spinlocks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-5-npiggin@gmail.com
Nicholas Piggin [Fri, 24 Jul 2020 13:14:20 +0000 (23:14 +1000)]
powerpc/64s: Implement queued spinlocks and rwlocks
These have shown significantly improved performance and fairness when
spinlock contention is moderate to high on very large systems.
With this series including subsequent patches, on a 16 socket 1536
thread POWER9, a stress test such as same-file open/close from all
CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs
384158op/s (33x faster), where the difference in throughput between
the fastest and slowest thread goes from 7x to 1.4x.
Thanks to the fast path being identical in terms of atomics and
barriers (after a subsequent optimisation patch), single threaded
performance is not changed (no measurable difference).
On smaller systems, performance and fairness seems to be generally
improved. Using dbench on tmpfs as a test (that starts to run into
kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was
tested with bare metal and KVM guest configurations. Results can be
found here:
https://github.com/linuxppc/issues/issues/305#issuecomment-
663487453
Observations are:
- Queued spinlocks are equal when contention is insignificant, as
expected and as measured with microbenchmarks.
- When there is contention, on bare metal queued spinlocks have better
throughput and max latency at all points.
- When virtualised, queued spinlocks are slightly worse approaching
peak throughput, but significantly better throughput and max latency
at all points beyond peak, until queued spinlock maximum latency
rises when clients are 2x vCPUs.
The regressions haven't been analysed very well yet, there are a lot
of things that can be tuned, particularly the paravirtualised locking,
but the numbers already look like a good net win even on relatively
small systems.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com
Nicholas Piggin [Fri, 24 Jul 2020 13:14:19 +0000 (23:14 +1000)]
powerpc: Move spinlock implementation to simple_spinlock
To prepare for queued spinlocks. This is a simple rename except to
update preprocessor guard name and a file reference.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-3-npiggin@gmail.com
Nicholas Piggin [Fri, 24 Jul 2020 13:14:18 +0000 (23:14 +1000)]
powerpc/pseries: Move some PAPR paravirt functions to their own file
These functions will be used by the queued spinlock implementation,
and may be useful elsewhere too, so move them out of spinlock.h.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131423.1362108-2-npiggin@gmail.com
Srikar Dronamraju [Fri, 24 Jul 2020 10:58:09 +0000 (16:28 +0530)]
powerpc/numa: Limit possible nodes to within num_possible_nodes
MAX_NUMNODES is a theoretical maximum number of nodes thats is
supported by the kernel. Device tree properties exposes the number of
possible nodes on the current platform. The kernel would detected this
and would use it for most of its resource allocations. If the platform
now increases the nodes to over what was already exposed, then it may
lead to inconsistencies. Hence limit it to the already exposed nodes.
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724105809.24733-1-srikar@linux.vnet.ibm.com
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Clarify definition of macii_init()
The function prototype correctly specifies the 'static' storage class.
Let the function definition match the declaration for better readability.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c025aed0b1506399b73ff1d1bfa40ed641fcb3e3.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Use the stack for reset request storage
The adb_request struct can be stored on the stack because the request
is synchronous and is completed before the function returns.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a40f80dde90991757007b6962c386a208c970586.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Use unsigned type for autopoll_devs variable
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ca5be30ba745c08c2b7a1f0618f99c61b303e983.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Use bool type for reading_reply variable
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/779551219a11b19e574dfcd87e4ef60af08c4fc3.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Handle poll replies correctly
Userspace applications may use /dev/adb to send Talk requests. Such
requests always have req->reply_expected == 1. The same is true of Talk
requests sent by the kernel, except for poll requests queued internally
by the via-macii driver. Those requests have req->reply_expected == 0.
Consequently, poll reply packets get treated like autopoll reply packets.
(It doesn't make sense to try to distinguish them.) Always enter 'reading'
state after a poll request, so that the reply gets collected and passed
to adb_input(), and none go missing.
All Talk replies passed to adb_input() come from polling or autopolling,
so call adb_input() with the autopoll parameter set to 1.
Fixes:
d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/754cddfa045e5bfa53e5da199831de02e7d2f27f.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Remove read_done state
The driver state machine may enter the 'read_done' state when leaving the
'idle' or 'reading' state. This transition is pointless, as is the extra
interrupt it requires. The interrupt is produced by the transceiver
(even when it has no data to send) because an extra EVEN/ODD toggle
was signalled by the driver. Drop the extra state to simplify the code.
Fixes:
1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0253194363af4426f9788796811a6a29fb87c713.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Handle /CTLR_IRQ signal correctly
I'm told that the /CTLR_IRQ signal from the ADB transceiver gets
interpreted by MacOS to mean SRQ, bus timeout or end-of-packet depending
on the circumstances, and that Linux's via-macii driver does not
correctly interpret this signal.
Instead, the via-macii driver interprets certain received byte values
(0x00 and 0xFF) as signalling end of packet and bus timeout
(respectively). Problem is, those values can also appear under other
circumstances.
This patch changes the bus timeout, end of packet and SRQ detection logic
to bring it closer to the logic that MacOS reportedly uses.
Fixes:
1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6541fda1d8db3ae87c3abe17d189a10dc96e2382.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Poll the device most likely to respond
Poll the most recently polled device by default, rather than the lowest
device address that happens to be enabled in autopoll_devs. This improves
input latency. Re-use macii_queue_poll() rather than duplicate that logic.
This eliminates a static struct and function.
Fixes:
d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5836f80886ebcfbe5be5fb7e0dc49feed6469712.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sun, 28 Jun 2020 04:23:12 +0000 (14:23 +1000)]
macintosh/via-macii: Access autopoll_devs when inside lock
The interrupt handler should be excluded when accessing the autopoll_devs
variable.
Fixes:
d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5952dd8a9bc9de90f1acc4790c51dd42b4c98065.1593318192.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Implement SRQ autopolling
The adb_driver.autopoll method is needed during ADB bus scan and device
address assignment. Implement this method so that the IOP's list of
device addresses can be updated. When the list is empty, disable SRQ
autopolling.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0fb7fdcd99d7820bb27faf1f27f7f6f1923914ef.1590880623.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Implement sending -> idle state transition
On leaving the 'sending' state, proceed to the 'idle' state if no reply is
expected. Drop redundant test for adb_iop_state == sending && current_req.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6991996dd4aaf0b52cfd650172bf0f6fbe37a452.1590880623.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Implement idle -> sending state transition
In the present algorithm, the 'idle' state transition does not take
place until there's a bus timeout. Once idle, the driver does not
automatically proceed with the next request.
Change the algorithm so that queued ADB requests will be sent as soon as
the driver becomes idle. This is to take place after the current IOP
message is completed.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dedcdfc62f43e85cc4c2a8d211a7e2fec7bc7c1a.1590880623.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Resolve static checker warnings
drivers/macintosh/adb-iop.c:215:28: warning: Using plain integer as NULL pointer
drivers/macintosh/adb-iop.c:170:5: warning: symbol 'adb_iop_probe' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:177:5: warning: symbol 'adb_iop_init' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:184:5: warning: symbol 'adb_iop_send_request' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:230:5: warning: symbol 'adb_iop_autopoll' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:236:6: warning: symbol 'adb_iop_poll' was not declared. Should it be static?
drivers/macintosh/adb-iop.c:241:5: warning: symbol 'adb_iop_reset_bus' was not declared. Should it be static?
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/25edf4450abd20e002b166ba3a11189dc1efa906.1590880623.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Access current_req and adb_iop_state when inside lock
Drop the redundant local_irq_save/restore() from adb_iop_start() because
the caller has to do it anyway. This is the pattern used in via-macii.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbe32b087c7e04d68e2425f6a2df4a414d167c32.1590880623.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Adopt bus reset algorithm from via-macii driver
This algorithm is slightly shorter and avoids the surprising
adb_iop_start() call in adb_iop_poll().
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b63d56ecb6e75f11a0bf02231f3b2db656a528a3.1590880623.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Correct comment text
This patch improves comment style and corrects some misunderstandings
in the text.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/996f835d2f3d90baaaf9ee954e252d06e8886c6f.1590880623.git.fthain@telegraphics.com.au
Finn Thain [Sat, 30 May 2020 23:17:03 +0000 (09:17 +1000)]
macintosh/adb-iop: Remove dead and redundant code
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7720ffb559c334504e16b24d9c2f3b8973d2d674.1590880623.git.fthain@telegraphics.com.au
Athira Rajeev [Thu, 23 Jul 2020 07:32:37 +0000 (03:32 -0400)]
powerpc/perf: Initialize power10 PMU registers in cpu setup routine
Initialize Monitor Mode Control Register 3 (MMCR3)
SPR which is new in power10. For PowerISA v3.1, BHRB disable
is controlled via Monitor Mode Control Register A (MMCRA) bit,
namely "BHRB Recording Disable (BHRBRD)". This patch also initializes
MMCRA BHRBRD to disable BHRB feature at boot for power10.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1595489557-2047-1-git-send-email-atrajeev@linux.vnet.ibm.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:15 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Remove vfs_expanded
Previously iov->vfs_expanded was used for two purposes.
1) To work out how much we need to multiple the per-VF BAR size to figure
out the total space required for the IOV BAR.
2) To indicate that IOV is not usable with this device (vfs_expanded == 0).
We don't really need the field for either since the multiple in 1) is
always the number PEs supported by the PHB. Similarly, we don't really need
it in 2) either since the IOV data field will be NULL if we can't use IOV
with the device.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-16-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:14 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Make single PE mode a per-BAR setting
Using single PE BARs to map an SR-IOV BAR is really a choice about what
strategy to use when mapping a BAR. It doesn't make much sense for this to
be a global setting since a device might have one large BAR which needs to
be mapped with single PE windows and another smaller BAR that can be mapped
with a regular segmented window. Make the segmented vs single decision a
per-BAR setting and clean up the logic that decides which mode to use.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-15-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:13 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Refactor M64 BAR setup
Split up the logic so that we have one branch that handles setting up a
segmented window and another that handles setting up single PE windows for
each VF.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-14-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:12 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Move M64 BAR allocation into a helper
I want to refactor the loop this code is currently inside of. Hoist it on
out.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-13-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:11 +0000 (16:57 +1000)]
powerpc/powernv/sriov: De-indent setup and teardown
Remove the IODA2 PHB checks. We already assume IODA2 in several places so
there's not much point in wrapping most of the setup and teardown process
in an if block.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-12-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:10 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Drop iov->pe_num_map[]
Currently the iov->pe_num_map[] does one of two things depending on
whether single PE mode is being used or not. When it is, this contains an
array which maps a vf_index to the corresponding PE number. When single PE
mode is not being used this contains a scalar which is the base PE for the
set of enabled VFs (for for VFn is base + n).
The array was necessary because when calling pnv_ioda_alloc_pe() there is
no guarantee that the allocated PEs would be contigious. We can now
allocate contigious blocks of PEs so this is no longer an issue. This
allows us to drop the if (single_mode) {} .. else {} block scattered
through the SR-IOV code which is a nice clean up.
This also fixes a bug in pnv_pci_sriov_disable() which is the non-atomic
bitmap_clear() to manipulate the PE allocation map. Other users of the map
assume it will be accessed with atomic ops.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-11-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:09 +0000 (16:57 +1000)]
powerpc/powernv/pci: Refactor pnv_ioda_alloc_pe()
Rework the PE allocation logic to allow allocating blocks of PEs rather
than individually. We'll use this to allocate contigious blocks of PEs for
the SR-IOVs.
This patch also adds code to pnv_ioda_alloc_pe() and pnv_ioda_reserve_pe() to
use the existing, but unused, phb->pe_alloc_mutex. Currently these functions
use atomic bit ops to release a currently allocated PE number. However,
the pnv_ioda_alloc_pe() wants to have exclusive access to the bit map while
scanning for hole large enough to accomodate the allocation size.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-10-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:08 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Factor out M64 BAR setup
The sequence required to use the single PE BAR mode is kinda janky and
requires a little explanation. The API was designed with P7-IOC style
windows where the setup process is something like:
1. Configure the window start / end address
2. Enable the window
3. Map the segments of each window to the PE
For Single PE BARs the process is:
1. Set the PE for segment zero on a disabled window
2. Set the range
3. Enable the window
Move the OPAL calls into their own helper functions where the quirks can be
contained.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-9-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:07 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Simplify used window tracking
No need for the multi-dimensional arrays, just use a bitmap.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-8-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:06 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Rename truncate_iov
This prevents SR-IOV being used by making the SR-IOV BAR resources
unallocatable. Rename it to reflect what it actually does.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-7-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:05 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Explain how SR-IOV works on PowerNV
SR-IOV support on PowerNV is a byzantine maze of hooks. I have no idea
how anyone is supposed to know how it works except through a lot of
stuffering. Write up some docs about the overall story to help out
the next sucker^Wperson who needs to tinker with it.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-6-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:04 +0000 (16:57 +1000)]
powerpc/powernv/sriov: Move SR-IOV into a separate file
pci-ioda.c is getting a bit unwieldly due to the amount of stuff jammed in
there. The SR-IOV support can be extracted easily enough and is mostly
standalone, so move it into a separate file.
This patch also moves the PowerNV SR-IOV specific fields from pci_dn and
moves them into a platform specific structure. I'm not sure how they ended
up in there in the first place, but leaking platform specifics into common
code has proven to be a terrible idea so far so lets stop doing that.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-5-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:03 +0000 (16:57 +1000)]
powerpc/powernv/pci: Initialise M64 for IODA1 as a 1-1 window
We pre-configure the m64 window for IODA1 as a 1-1 segment-PE mapping,
similar to PHB3. Currently the actual mapping of segments occurs in
pnv_ioda_pick_m64_pe(), but we can move it into pnv_ioda1_init_m64() and
drop the IODA1 specific code paths in the PE setup / teardown.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-4-oohall@gmail.com
Oliver O'Halloran [Wed, 22 Jul 2020 06:57:02 +0000 (16:57 +1000)]
powerpc/powernv/pci: Add explicit tracking of the DMA setup state
There's an optimisation in the PE setup which skips performing DMA
setup for a PE if we only have bridges in a PE. The assumption being
that only "real" devices will DMA to system memory, which is probably
fair. However, if we start off with only bridge devices in a PE then
add a non-bridge device the new device won't be able to use DMA because
we never configured it.
Fix this (admittedly pretty weird) edge case by tracking whether we've done
the DMA setup for the PE or not. If a non-bridge device is added to the PE
(via rescan or hotplug, or whatever) we can set up DMA on demand.
This also means the only remaining user of the old "DMA Weight" code is
the IODA1 DMA setup code that it was originally added for, which is good.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722065715.1432738-3-oohall@gmail.com