platform/kernel/linux-starfive.git
8 years agopowerpc: Remove unused remnants from A2 cpu
Rashmica Gupta [Tue, 12 Apr 2016 05:33:58 +0000 (15:33 +1000)]
powerpc: Remove unused remnants from A2 cpu

Support for the A2 cpu was removed in commit fb5a515704d7 ("powerpc:
Remove platforms/wsp and associated pieces"), and the externs:
__setup_cpu_a2 and __restore_cpu_a2 are still around and unused, so
remove them.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/slice: Remove slice_mm_new_context()
Aneesh Kumar K.V [Mon, 2 May 2016 12:56:07 +0000 (18:26 +0530)]
powerpc/mm/slice: Remove slice_mm_new_context()

The usage in mm mmu_context_nohash.c is bogus, because we set the
context.id value to MMU_NO_CONTEXT 4 lines previously in the same
function, meaning slice_mm_new_context() will always be true.

The book3s 64 usage was removed in the previous commit. So remove it as
unused.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/subpage: Initialise user psize correctly
Aneesh Kumar K.V [Mon, 2 May 2016 10:51:50 +0000 (16:21 +0530)]
powerpc/mm/subpage: Initialise user psize correctly

As part of the radix support we switched Book3s64 to use a value of ~0
for MMU_NO_CONTEXT. That is because id 0 is special on radix.

However that broke the logic in init_new_context(). The code there needs
to differentiate between a newly allocated context and one inherited via
fork. Previously it worked because a newly allocated context has an id
of zero (because it was just memset() to zero), which used to match
MMU_NO_CONTEXT, and therefore slice_mm_new_context() did the right
thing.

Instead check against a context.id value of zero instead of using
slice_mm_new_context().

Without this patch we never call slice_set_user_psize(), and end up with
a slice psize value of zero and we always end up using 4K HPTE.

Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Fix CONFIG_PPC_MMU_STD_64 typo
Valentin Rothberg [Tue, 3 May 2016 06:59:27 +0000 (08:59 +0200)]
powerpc/mm/radix: Fix CONFIG_PPC_MMU_STD_64 typo

It's CONFIG_PPC_STD_MMU_64 not ...
     CONFIG_PPC_MMU_STD_64.

Fixes: 11ffc1cfa4c2 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Document software bits for radix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:33 +0000 (23:26 +1000)]
powerpc/mm/radix: Document software bits for radix

Add #defines for Power ISA 3.0 software defined bits.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Use firmware feature to enable Radix MMU
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:32 +0000 (23:26 +1000)]
powerpc/mm/radix: Use firmware feature to enable Radix MMU

We use the existing "ibm,pa-features" device-tree property to enable
Radix MMU mode. This means we default to hash mode unless firmware tells
us it's OK to start using Radix mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add THP support for 4K linux page size
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:31 +0000 (23:26 +1000)]
powerpc/mm/radix: Add THP support for 4K linux page size

This adds THP support for 4K Linux page size config with radix. We still
don't do THP with 4K Linux page size and hash page table. Hash page
table needs a 16MB hugepage and we can't do THP with 16MM hugepage and
4K Linux page size.

We add missing functions to 4K hash config to get it to build and
hash__has_transparent_hugepage() makes sure we don't enable THP for 4K
hash config. To catch wrong usage of THP related with 4K config, we add
BUG() in those dummy functions we added to get it compile.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add radix THP callbacks
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:30 +0000 (23:26 +1000)]
powerpc/mm/radix: Add radix THP callbacks

The deposited pgtable_t is a pte fragment hence we cannot use page->lru
for linking then together. We use the first two 64 bits for pte fragment
as list_head type to link all deposited fragments together. On withdraw
we properly zero then out.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/thp: Abstraction for THP functions
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:29 +0000 (23:26 +1000)]
powerpc/mm/thp: Abstraction for THP functions

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: THP is only available on hash64 as of now
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:28 +0000 (23:26 +1000)]
powerpc/mm: THP is only available on hash64 as of now

Only code movement in this patch. No functionality change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add hugetlb support 4K page size
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:27 +0000 (23:26 +1000)]
powerpc/mm/radix: Add hugetlb support 4K page size

We have hugepage at the pmd level with 4K radix config. Hence we don't
need to use hugepd format with radix.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Make sure swapper pgdir is properly aligned
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:26 +0000 (23:26 +1000)]
powerpc/mm/radix: Make sure swapper pgdir is properly aligned

With 4K page size radix config our level 1 page table size is 64K and it
should be naturally aligned.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add radix support for hugetlb
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:25 +0000 (23:26 +1000)]
powerpc/mm: Add radix support for hugetlb

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Fix vma_mmu_pagesize() for radix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:24 +0000 (23:26 +1000)]
powerpc/mm: Fix vma_mmu_pagesize() for radix

Radix doesn't use the slice framework to find the page size. Hence use
vma to find the page size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: pte_frag abstraction
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:23 +0000 (23:26 +1000)]
powerpc/mm: pte_frag abstraction

In this patch we make the number of pte fragments per level 4 page table
page a variable. Radix level 4 table size is 256 bytes and hence we can
have 256 fragments per level 4 page. We don't update the fragment count
in this patch. We need to do performance measurements to find the right
value for fragment count.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/radix: Update MMU cache
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:22 +0000 (23:26 +1000)]
powerpc/radix: Update MMU cache

With radix there is no MMU cache. Hence we don't need to do anything in
update_mmu_cache().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: vmalloc abstraction in preparation for radix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:21 +0000 (23:26 +1000)]
powerpc/mm: vmalloc abstraction in preparation for radix

The vmalloc range differs between hash and radix config. Hence make
VMALLOC_START and related constants a variable which will be runtime
initialized depending on whether hash or radix mode is active.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix missing init of ioremap_bot in pgtable_64.c for ppc64e]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Update pte filter for radix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:20 +0000 (23:26 +1000)]
powerpc/mm: Update pte filter for radix

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add radix pgalloc details
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:19 +0000 (23:26 +1000)]
powerpc/mm: Add radix pgalloc details

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Make 4K and 64K use pte_t for pgtable_t
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:18 +0000 (23:26 +1000)]
powerpc/mm: Make 4K and 64K use pte_t for pgtable_t

This patch switches 4K Linux page size config to use pte_t * type
instead of struct page * for pgtable_t. This simplifies the code a lot
and helps in consolidating both 64K and 4K page allocator routines. The
changes should not have any impact, because we already store physical
address in the upper level page table tree and that implies we already
do struct page * to physical address conversion.

One change to note here is we move the pgtable_page_dtor() call for
nohash to pte_fragment_free_mm(). The nohash related change is due to
the related changes in pgtable_64.c.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Rename function to indicate we are allocating fragments
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:17 +0000 (23:26 +1000)]
powerpc/mm: Rename function to indicate we are allocating fragments

Only code cleanup. No functionality change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Simplify the code dropping 4-level table #ifdef
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:16 +0000 (23:26 +1000)]
powerpc/mm: Simplify the code dropping 4-level table #ifdef

Simplify the code by dropping 4-level page table #ifdef. We are always
4-level now.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Revert changes made to nohash pgalloc-64.h
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:15 +0000 (23:26 +1000)]
powerpc/mm: Revert changes made to nohash pgalloc-64.h

This reverts pgalloc related changes WRT implementing 4-level page
table for 64K Linux page size and storing of physical address in higher
level page tables since they are only applicable to book3s64 variant
and we now have a separate copy for book3s64. This helps to keep these
headers simpler.

Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Copy pgalloc (part 2)
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:14 +0000 (23:26 +1000)]
powerpc/mm: Copy pgalloc (part 2)

This moves the nohash variant of pgalloc headers to nohash/ directory

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Make a copy of pgalloc.h for 32 and 64 book3s
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:13 +0000 (23:26 +1000)]
powerpc/mm: Make a copy of pgalloc.h for 32 and 64 book3s

This patch start to make a book3s variant for pgalloc headers. We have
multiple book3s specific changes such as:
  * 4 level page table
  * store physical address in higher level table
  * use pte_t * for pgtable_t

Having a book3s64 specific variant helps to keep code simpler and remove
lots of #ifdef around code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Update PTCR on secondary CPUs
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:12 +0000 (23:26 +1000)]
powerpc/mm/radix: Update PTCR on secondary CPUs

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Pick the address layout for radix config
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:11 +0000 (23:26 +1000)]
powerpc/mm/radix: Pick the address layout for radix config

Hash needs special get_unmapped_area() handling because of limitations
around base page size, so we have to set HAVE_ARCH_UNMAPPED_AREA.

With radix we don't have such restrictions, so we could use the generic
code. But because we've set HAVE_ARCH_UNMAPPED_AREA (for hash), we have
to re-implement the same logic as the generic code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Limit paca allocation in radix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:10 +0000 (23:26 +1000)]
powerpc/mm/radix: Limit paca allocation in radix

On return from RTAS we access the paca variables and we have 64 bit
disabled. This requires us to limit paca in 32 bit range.

Fix this by setting ppc64_rma_size to first_memblock_size/1G range.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add checks in slice code to catch radix usage
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:09 +0000 (23:26 +1000)]
powerpc/mm/radix: Add checks in slice code to catch radix usage

Radix doesn't need slice support. Catch incorrect usage of slice code
when radix is enabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Isolate hash table function from pseries guest code
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:08 +0000 (23:26 +1000)]
powerpc/mm/radix: Isolate hash table function from pseries guest code

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:07 +0000 (23:26 +1000)]
powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code

We also use MMU_FTR_RADIX to branch out from code path specific to
hash.

No functionality change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add MMU_FTR_RADIX
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:06 +0000 (23:26 +1000)]
powerpc/mm/radix: Add MMU_FTR_RADIX

We are going to add asm changes in the follow up patches. Add the
feature bit now so that we can get it all build.

mpe: When CONFIG_PPC_RADIX_MMU=n we omit MMU_FTR_RADIX from the
MMU_FTRS_POSSIBLE mask. This allows the compiler to work out that those
checks will always be false and so the code can be elided completely.

Note we do *not* define MMU_FTR_RADIX to 0 in the RADIX_MMU=n case,
because that doesn't work with the ASM_FTR patching. In particular an
IF_SET section will result in a mask and value of zero, which is always
true, meaning the section *won't* be patched, which is the opposite of
what we want.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add mask of possible MMU features
Michael Ellerman [Wed, 11 May 2016 05:30:47 +0000 (15:30 +1000)]
powerpc/mm: Add mask of possible MMU features

Follow the example of the cpu feature code, and add a mask of possible
MMU features, MMU_FTRS_POSSIBLE.

This is used in mmu_has_feature(), which allows the possible mask to act
as a shortcut for any features that are not possible, but still allows
the feature bit itself to be defined.

We will use this in the next commit to allow MMU_FTR_RADIX checks to be
elided when MMU_FTR_RADIX is not possible.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add tlbflush routines
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:05 +0000 (23:26 +1000)]
powerpc/mm/radix: Add tlbflush routines

Core kernel doesn't track the page size of the VA range that we are
invalidating. Hence we end up flushing TLB for the entire mm here. Later
patches will improve this.

We also don't flush page walk cache separetly instead use RIC=2 when
flushing TLB, because we do a MMU gather flush after freeing page table.

MMU_NO_CONTEXT is updated for hash.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Hash abstraction for tlbflush routines
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:04 +0000 (23:26 +1000)]
powerpc/mm: Hash abstraction for tlbflush routines

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Rename mmu_context_hash64.c to mmu_context_book3s64.c
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:03 +0000 (23:26 +1000)]
powerpc/mm: Rename mmu_context_hash64.c to mmu_context_book3s64.c

This file now contains both hash and radix specific code. Rename it to
indicate this better.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add mmu context handling callback for radix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:02 +0000 (23:26 +1000)]
powerpc/mm/radix: Add mmu context handling callback for radix

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Abstraction for switch_mmu_context()
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:01 +0000 (23:26 +1000)]
powerpc/mm: Abstraction for switch_mmu_context()

How we switch MMU context differs between hash and radix. For hash we
need to switch the SLB details and for radix we need to switch the PID
SPR.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add radix callbacks for vmemmap and map_kernel page()
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:26:00 +0000 (23:26 +1000)]
powerpc/mm/radix: Add radix callbacks for vmemmap and map_kernel page()

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Abstraction for vmemmap and map_kernel_page()
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:59 +0000 (23:25 +1000)]
powerpc/mm: Abstraction for vmemmap and map_kernel_page()

For hash we create vmemmap mapping using bolted hash page table entries.
For radix we fill the radix page table. The next patch will add the
radix details for creating vmemmap mappings.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add radix callbacks for early init routines
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:58 +0000 (23:25 +1000)]
powerpc/mm/radix: Add radix callbacks for early init routines

This adds routines for early setup for radix. We use device tree
property "ibm,processor-radix-AP-encodings" to find supported page
sizes. If we don't find the above we consider 64K and 4K as supported
page sizes.

We do map vmemap using 2M page size if we can. The linear mapping is
done such that we use required page size for that range. For example
memory of 3.5G is mapped such that we use 1G mapping till 3G range and
use 2M mapping for the rest.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Abstract early MMU init in preparation for radix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:57 +0000 (23:25 +1000)]
powerpc/mm: Abstract early MMU init in preparation for radix

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add radix callback for pmd accessors
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:56 +0000 (23:25 +1000)]
powerpc/mm/radix: Add radix callback for pmd accessors

This only does 64K Linux page support for now. 64K hash Linux config
THP needs to differentiate it from hugetlb huge page because with THP we
need to track hash pte slot information with respect to each subpage.
This is not needed with hugetlb hugepage, because we don't do MPSS with
hugetlb.

Radix doesn't have any such restrictions.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move hugetlb and THP related pmd accessors to pgtable.h
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:55 +0000 (23:25 +1000)]
powerpc/mm: Move hugetlb and THP related pmd accessors to pgtable.h

Here we create pgtable-64/4k.h and move pmd accessors that are common
between hash and radix there. We can't do much sharing with 4K Linux
page size because 4K Linux page size with hash config doesn't support
THP. So for now it is empty. In later patches we will add functions that
does conditional hash/radix accessors there.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add radix callbacks to pte accessors
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:54 +0000 (23:25 +1000)]
powerpc/mm: Add radix callbacks to pte accessors

For those pte accessors, that operate on a different set of pte bits
between hash/radix, we add a generic variant that does a conditional
to hash linux or radix variant.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add dummy radix_enabled()
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:53 +0000 (23:25 +1000)]
powerpc/mm/radix: Add dummy radix_enabled()

In this patch we add the radix Kconfig and conditional check.
radix_enabled() is written to always return 0 here. Once we have all
needed radix changes added, we will update this to an mmu_feature check.

We need to add this early so that we can get it all build in the early
stage.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add radix pte #defines
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:52 +0000 (23:25 +1000)]
powerpc/mm/radix: Add radix pte #defines

This adds Power ISA 3.0 specific pte defines. We share most of the
details with hash Linux page table format. This patch indicates only
things where we differ.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move pte related functions together
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:51 +0000 (23:25 +1000)]
powerpc/mm: Move pte related functions together

Only code movement. No functionality change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move page table index and and vaddr to pgtable.h
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:50 +0000 (23:25 +1000)]
powerpc/mm: Move page table index and and vaddr to pgtable.h

Now that the page table size is a variable, we can move these to
generic pgtable.h.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Make page table size a variable
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:49 +0000 (23:25 +1000)]
powerpc/mm: Make page table size a variable

Radix and hash MMU models support different page table sizes. Make
the #defines a variable so that existing code can work with variable
sizes.

Slice related code is only used by hash, so use hash constants there. We
will replicate some of the boundary conditions with resepct to TASK_SIZE
using radix values too. Right now we do boundary condition check using
hash constants.

Swapper pgdir size is initialized in asm code. We select the max pgd
size to keep it simple. For now we select hash pgdir. When adding radix
we will switch that to radix pgdir which is 64K.

BUILD_BUG_ON check which is removed is already done in hugepage_init()
using MAYBE_BUILD_BUG_ON().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move pte accessors that operate on common pte bits to pgtable.h
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:48 +0000 (23:25 +1000)]
powerpc/mm: Move pte accessors that operate on common pte bits to pgtable.h

These pte functions will remain the same between radix and hash. Move
them to pgtable.h.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move common pte bits and accessors to book3s/64/pgtable.h
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:47 +0000 (23:25 +1000)]
powerpc/mm: Move common pte bits and accessors to book3s/64/pgtable.h

Now that we have moved book3s hash64 Linux pte bits to match Power ISA
3.0 radix pte bit positions, we move the matching pte bits to a common
header.

Only code movement in this patch. No functionality change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Handle _PTE_NONE_MASK
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:46 +0000 (23:25 +1000)]
powerpc/mm: Handle _PTE_NONE_MASK

I am splitting this as a separate patch to get better review. If ok
we should merge this with previous patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:45 +0000 (23:25 +1000)]
powerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix

This helps to make following hash only pte bits easier.

We have kept _PAGE_CHG_MASK, _HPAGE_CHG_MASK and _PAGE_PROT_BITS as it
is in this patch eventhough they use hash specific bits. Using them in
radix as it is should be ok, because with radix we expect those bit
positions to be zero.

Only renames in this patch, no change in functionality.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move hash and no hash code to separate files
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:44 +0000 (23:25 +1000)]
powerpc/mm: Move hash and no hash code to separate files

This patch reduces the number of #ifdefs in C code and will also help in
adding radix changes later. Only code movement in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Propagate copyrights and update GPL text]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/hash: Add support for Power9 Hash
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:43 +0000 (23:25 +1000)]
powerpc/mm/hash: Add support for Power9 Hash

PowerISA 3.0 adds a parition table indexed by LPID. Parition table
allows us to specify the MMU model that will be used for guest and host
translation.

This patch adds support with SLB based hash model (UPRT = 0). What is
required with this model is to support the new hash page table entry
format and also setup partition table such that we use hash table for
address translation.

We don't have segment table support yet.

In order to make sure we don't load KVM module on Power9 (since we don't
have kvm support yet) this patch also disables KVM on Power9.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/radix: Add partition table format & callback
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:42 +0000 (23:25 +1000)]
powerpc/mm/radix: Add partition table format & callback

Add structs and #defines related to the radix MMU partition table
format. We also add a ppc_md callback for updating a partition table
entry.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Move radix/hash common data structures to book3s64 headers
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:41 +0000 (23:25 +1000)]
powerpc/mm: Move radix/hash common data structures to book3s64 headers

Start moving code that is generic between radix and hash to book3s64
specific headers from the book3s64 hash specific one.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use generic version of ptep_clear_flush_young()
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:40 +0000 (23:25 +1000)]
powerpc/mm: Use generic version of ptep_clear_flush_young()

The radix variant is going to require a flush_tlb_range(). With
flush_tlb_range() added, ptep_clear_flush_young() is the same as the
generic version. So drop the powerpc specific variant.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use generic version of pmdp_clear_flush_young()
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:39 +0000 (23:25 +1000)]
powerpc/mm: Use generic version of pmdp_clear_flush_young()

The radix variant is going to require a flush_pmd_tlb_range(). With
flush_pmd_tlb_range() added, pmdp_clear_flush_young() is the same as the
generic version. So drop the powerpc specific variant.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Drop WIMG in favour of new constants
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:38 +0000 (23:25 +1000)]
powerpc/mm: Drop WIMG in favour of new constants

PowerISA 3.0 introduces two pte bits with the below meaning for radix:
  00 -> Normal Memory
  01 -> Strong Access Order (SAO)
  10 -> Non idempotent I/O (Cache inhibited and guarded)
  11 -> Tolerant I/O (Cache inhibited)

We drop the existing WIMG bits in the Linux page table in favour of the
above constants. We loose _PAGE_WRITETHRU with this conversion. We only
use writethru via pgprot_cached_wthru() which is used by
fbdev/controlfb.c which is Apple control display and also PPC32.

With respect to _PAGE_COHERENCE, we have been marking hpte always
coherent for some time now. htab_convert_pte_flags() always added
HPTE_R_M.

NOTE: KVM changes need closer review.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use a helper for finding pte bits mapping I/O area
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:37 +0000 (23:25 +1000)]
powerpc/mm: Use a helper for finding pte bits mapping I/O area

Use a helper instead of open coding with constants. A later patch will
drop the WIMG bits and use PowerISA 3.0 defines.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Update _PAGE_KERNEL_RO
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:36 +0000 (23:25 +1000)]
powerpc/mm: Update _PAGE_KERNEL_RO

PS3 had used a PPP bit hack to implement a read only mapping in the
kernel area. Since we are bolting the ioremap area, it used the pte
flags _PAGE_PRESENT | _PAGE_USER to get a PPP value of 0x3 there by
resulting in a read only mapping. This means the area can be accessed by
user space, but kernel will never return such an address to user space.

But we can do better by implementing a read only kernel mapping using
PPP bits 0b110.

This also allows us to do read only kernel mapping for radix in later
patches.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Remove RPN_SHIFT and RPN_SIZE
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:35 +0000 (23:25 +1000)]
powerpc/mm: Remove RPN_SHIFT and RPN_SIZE

PTE_RPN_SHIFT is actually page size dependent. Even though PowerISA 3.0
expects only the lower 12 bits to be zero, we will always find the pages
to be PAGE_SHIFT aligned. In case of hash config, this also allows us to
use the additional 3 bits to track pte specific information. We need
to make sure we use these bits only for hash specific pte flags.

For both 4K and 64K config, pte now can hold 57 bits address.

Inorder to keep things simpler, drop PTE_RPN_SHIFT and PTE_RPN_SIZE and
specify the 57 bit detail explicitly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:34 +0000 (23:25 +1000)]
powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED

_PAGE_PRIVILEGED means the page can be accessed only by the kernel. This
is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User
pages are now marked by clearing _PAGE_PRIVILEGED bit.

Previously we allowed the kernel to have a privileged page in the lower
address range (USER_REGION). With this patch such access is denied.

We also prevent a kernel access to a non-privileged page in higher
address range (ie, REGION_ID != 0).

Both the above access scenarios should never happen.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use pte_user() instead of open coding
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:33 +0000 (23:25 +1000)]
powerpc/mm: Use pte_user() instead of open coding

We have a common declaration in pte-common.h Add a book3s specific one
and switch to pte_user() in callchain.c. In a subsequent patch we will
switch _PAGE_USER to _PAGE_PRIVILEGED in the book3s version only.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Convert pte_user() to static inline
Michael Ellerman [Fri, 29 Apr 2016 13:25:32 +0000 (23:25 +1000)]
powerpc/mm: Convert pte_user() to static inline

In a subsequent patch we want to add a second definition of pte_user().
Before we do that, make the signature clear, ie. it takes a pte_t and
returns bool.

We move it up inside the existing #ifndef __ASSEMBLY__ block, but
otherwise it's a straight conversion.

Convert the call in settlbcam(), which passes an unsigned long, to pass
a pte_t.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm/subpage: Clear RWX bit to indicate no access
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:31 +0000 (23:25 +1000)]
powerpc/mm/subpage: Clear RWX bit to indicate no access

Subpage protection used to depend on the _PAGE_USER bit to implement no
access mode. This patch switches that to use _PAGE_RWX. We clear Read,
Write and Execute access from the pte instead of clearing _PAGE_USER
now. This was done so that we can switch to _PAGE_PRIVILEGED in a later
patch.

subpage_protection() returns pte bits that need to be cleared. Instead
of updating the interface to handle no-access in a separate way, it
appears simpler to clear RWX acecss to indicate no access.

We still don't insert hash ptes for no access implied by !_PAGE_RWX.
Hence we should not get PROT_FAULT with change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use _PAGE_READ to indicate Read access
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:30 +0000 (23:25 +1000)]
powerpc/mm: Use _PAGE_READ to indicate Read access

This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also
removes the dependency on _PAGE_USER for implying read only. Few things
to note here is that, we have read implied with write and execute
permission. Hence we should always find _PAGE_READ set on hash pte
fault.

We still can't switch PROT_NONE to !(_PAGE_RWX). Auto numa depends on
marking a prot none pte _PAGE_WRITE. (For more details look at
b191f9b106ea "mm: numa: preserve PTE write permissions across a NUMA
hinting fault")

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use pte_raw() in pte_same()/pmd_same()
Michael Ellerman [Fri, 29 Apr 2016 13:25:29 +0000 (23:25 +1000)]
powerpc/mm: Use pte_raw() in pte_same()/pmd_same()

We can avoid doing endian conversions by using pte_raw() in pxx_same().
The swap of the constant (_PAGE_HPTEFLAGS) should be done at compile
time by the compiler.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Use big endian Linux page tables for book3s 64
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:28 +0000 (23:25 +1000)]
powerpc/mm: Use big endian Linux page tables for book3s 64

Traditionally Power server machines have used the Hashed Page Table MMU
mode. In this mode Linux manages its own tree of nested page tables,
aka. "the Linux page tables", which are not used by the hardware
directly, and software loads translations into the hash page table for
use by the hardware.

Power ISA 3.0 defines a new MMU mode, known as Radix Tree Translation,
where the hardware can directly operate on the Linux page tables.
However the hardware requires that the page tables be in big endian
format.

To accommodate this, switch the pgtable types to __be64 and add
appropriate endian conversions.

Because we will be supporting a single kernel binary that boots using
either radix or hash mode, we always store the Linux page tables big
endian, even in hash mode where they are not actually used by the
hardware.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix sparse errors, flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Add pte_xchg() helper
Michael Ellerman [Fri, 29 Apr 2016 13:25:27 +0000 (23:25 +1000)]
powerpc/mm: Add pte_xchg() helper

We have five locations in 64-bit hash MMU code that do a cmpxchg() of a
PTE. Currently doing it inline OK, but in a future patch we will be
converting the PTEs to __be64 in some configs. In that case we will need
casts at every cmpxchg() site in order to keep sparse happy.

So move the logic into a helper, this is a reasonably nice cleanup on
its own.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Drop PTE_ATOMIC_UPDATES from pmd_hugepage_update()
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:26 +0000 (23:25 +1000)]
powerpc/mm: Drop PTE_ATOMIC_UPDATES from pmd_hugepage_update()

pmd_hugepage_update() is inside #ifdef CONFIG_TRANSPARENT_HUGEPAGE. THP
can only be enabled if PPC_BOOK3S_64=y && PPC_64K_PAGES=y, aka. hash64.

On hash64 we always define PTE_ATOMIC_UPDATES to 1, meaning the #ifdef
in pmd_hugepage_update() is unnecessary, so drop it.

That is also the only use of PTE_ATOMIC_UPDATES in any of the hash code,
meaning we no longer need to #define it at all in the hash headers.

Note it's still #defined and used in the nohash code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mm: Always use STRICT_MM_TYPECHECKS
Michael Ellerman [Fri, 29 Apr 2016 13:25:25 +0000 (23:25 +1000)]
powerpc/mm: Always use STRICT_MM_TYPECHECKS

Testing done by Paul Mackerras has shown that with a modern compiler
there is no negative effect on code generation from enabling
STRICT_MM_TYPECHECKS.

So remove the option, and always use the strict type definitions.

Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoIB/qib: Use cache inhibitted and guarded mapping on powerpc
Aneesh Kumar K.V [Fri, 29 Apr 2016 13:25:24 +0000 (23:25 +1000)]
IB/qib: Use cache inhibitted and guarded mapping on powerpc

The driver was requesting for a writethrough mapping. But with those
flags we will end up with an SAO mapping because we now have memory
conherence always enabled. ie, the existing mapping will end up with a
WIMG value 0b1110 which is Strong Access Order.

Update this to use cache inhibitted guarded mapping.

Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/perf: Replace raw event hex values with #defines
Madhavan Srinivasan [Thu, 21 Apr 2016 10:16:34 +0000 (15:46 +0530)]
powerpc/perf: Replace raw event hex values with #defines

Minor cleanup patch to replace the raw event hex values in
power8-pmu.c with #defines.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoftrace: Match dot symbols when searching functions on ppc64
Thiago Jung Bauermann [Mon, 25 Apr 2016 21:56:14 +0000 (18:56 -0300)]
ftrace: Match dot symbols when searching functions on ppc64

In the ppc64 big endian ABI, function symbols point to function
descriptors. The symbols which point to the function entry points
have a dot in front of the function name. Consequently, when the
ftrace filter mechanism searches for the symbol corresponding to
an entry point address, it gets the dot symbol.

As a result, ftrace filter users have to be aware of this ABI detail on
ppc64 and prepend a dot to the function name when setting the filter.

The perf probe command insulates the user from this by ignoring the dot
in front of the symbol name when matching function names to symbols,
but the sysfs interface does not. This patch makes the ftrace filter
mechanism do the same when searching symbols.

Fixes the following failure in ftracetest's kprobe_ftrace.tc:

  .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument

That failure is on this line of kprobe_ftrace.tc:

  echo _do_fork > set_ftrace_filter

This is because there's no _do_fork entry in the functions list:

  # cat available_filter_functions | grep _do_fork
  ._do_fork

This change introduces no regressions on the perf and ftracetest
testsuite results.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: rework sparse for lib/xor_vmx.c
Daniel Axtens [Tue, 26 Apr 2016 13:49:09 +0000 (23:49 +1000)]
powerpc: rework sparse for lib/xor_vmx.c

Sparse doesn't seem to be passing -maltivec around properly, leading
to lots of errors:

.../include/altivec.h:34:2: error: Use the "-maltivec" flag to enable PowerPC AltiVec support
arch/powerpc/lib/xor_vmx.c:27:16: error: Expected ; at end of declaration
arch/powerpc/lib/xor_vmx.c:27:16: error: got signed
arch/powerpc/lib/xor_vmx.c:60:9: error: No right hand side of '*'-expression
arch/powerpc/lib/xor_vmx.c:60:9: error: Expected ; at end of statement
arch/powerpc/lib/xor_vmx.c:60:9: error: got v1_in
...
arch/powerpc/lib/xor_vmx.c:87:9: error: too many errors

Only include the altivec.h header for non-__CHECKER__ builds.
For builds with __CHECKER__, make up some stubs instead, as
suggested by Balbir. (The vector size of 16 is arbitrary.)

Suggested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: Add support for userspace P9 copy paste
Chris Smart [Tue, 26 Apr 2016 00:28:50 +0000 (10:28 +1000)]
powerpc: Add support for userspace P9 copy paste

The copy paste facility introduced in POWER9 provides an optimised
mechanism for a userspace application to copy a cacheline. This is
provided by a pair of instructions, copy and paste, while a third,
cp_abort (copy paste abort), provides a clean up of the state in case of
a failure.

The copy instruction will read a 128 byte cacheline and store it in an
internal buffer. The subsequent paste instruction will store this
internal buffer to memory and set a CR field if the paste succeeds.

Since the state of the copy paste buffer is internal (and not
architecturally visible), in the unlikely event of a context switch, the
state cannot be stored and the paste should therefore fail.

The cp_abort instruction exists to fail and clean up any such
interrupted copy paste sequence and is to be called by the kernel as
part of the context switch. Doing so prevents data from a preceding copy
in one process leaking into the paste of another.

This code enables use of the cp_abort instruction if a supported
processor is detected.

NOTE: this is for userspace only, not in kernel, and does not deal
with KVM guests.

Patch created with much assistance from Michael Neuling
<mikey@neuling.org>

Signed-off-by: Chris Smart <chris@distroguy.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/mpic: handle subsys_system_register() failure
Andrew Donnellan [Tue, 26 Apr 2016 07:55:04 +0000 (17:55 +1000)]
powerpc/mpic: handle subsys_system_register() failure

mpic_init_sys() currently doesn't check whether
subsys_system_register() succeeded or not. Check the return code of
subsys_system_register() and clean up if there's an error.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/eeh: fix misleading indentation
Andrew Donnellan [Tue, 26 Apr 2016 05:02:50 +0000 (15:02 +1000)]
powerpc/eeh: fix misleading indentation

Found by smatch.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agocxl: Fix DAR check & use REGION_ID instead of opencoding
Aneesh Kumar K.V [Wed, 20 Apr 2016 07:59:47 +0000 (03:59 -0400)]
cxl: Fix DAR check & use REGION_ID instead of opencoding

The current code will set _PAGE_USER to the access flags for any
fault address, because the ~ operation will be true for all address we
take a fault on. But setting _PAGE_USER also means that the fault will
be handled only if the page table have _PAGE_USER set. Hence there is
no security hole with the current code.

Now if it is an user space access, then the change in this patch really
don't have an impact because we have (!ctx->kernel) set true
and we take the if condition true.

Now kernel context created fault on an address in the kernel range
will result in a fault loop because we will not insert the
hash pte due to access and pte permission mismatch. This patch fix
the above issue.

Fixes: f204e0b8cedd ("cxl: Driver code for powernv PCIe based cards for userspace access")
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agocxl: Increase timeout for detection of AFU mmio hang
Frederic Barrat [Tue, 19 Apr 2016 16:34:24 +0000 (18:34 +0200)]
cxl: Increase timeout for detection of AFU mmio hang

PSL designers recommend a larger value for the mmio hang pulse, 256 us
instead of 1 us. The CAIA architecture states that it needs to be
smaller than 1/2 of the RTOS timeout set in the PHB for outbound
non-posted transactions, which is still (easily) the case here.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Tested-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agocxl: Allow initialization on timebase sync failures
Frederic Barrat [Mon, 21 Mar 2016 19:32:48 +0000 (14:32 -0500)]
cxl: Allow initialization on timebase sync failures

Failure to synchronize the PSL timebase currently prevents the
initialization of the cxl card, thus rendering the card useless. This
is too extreme for a feature which is rarely used, if at all. No
hardware AFUs or software is currently using PSL timebase.

This patch still tries to synchronize the PSL timebase when the card
is initialized, but ignores the error if it can't. Instead, it reports
a status via /sys.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agotool/perf: Add sample_reg_mask to include all perf_regs
Madhavan Srinivasan [Sat, 20 Feb 2016 05:02:48 +0000 (10:32 +0530)]
tool/perf: Add sample_reg_mask to include all perf_regs

Add sample_reg_mask array with pt_regs registers.
This is needed for printing supported regs ( -I? option).

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agotools/perf: Map the ID values with register names
Anju T [Sat, 20 Feb 2016 05:02:47 +0000 (10:32 +0530)]
tools/perf: Map the ID values with register names

Map ID values with corresponding register names. These names are then
displayed when user issues perf record with the -I option
followed by perf report/script with -D option.

To test this patchset, Eg:

  $ perf record -I ls   # record machine state at interrupt
  $ perf script -D      # read the perf.data file

Sample output obtained for this patch / output looks like as follows:

  496768515470 0x1988 [0x188]: PERF_RECORD_SAMPLE(IP, 0x1): 4522/4522:
  0xc0000000001e538c period: 1 addr: 0
  ... intr regs: mask 0x7ffffffffff ABI 64-bit
  .... r0    0xc0000000001e5e34
  .... r1    0xc000000fe733f9a0
  .... r2    0xc000000001523100
  .... r3    0xc000000ffaadeb60
  .... r4    0xc000000003456800
  .... r5    0x73a9b5e000
  .... r6    0x1e000000
  .... r7    0x0
  .... r8    0x0
  .... r9    0x0
  .... r10   0x1
  .... r11   0x0
  .... r12   0x24022822
  .... r13   0xc00000000feec180
  .... r14   0x0
  .... r15   0xc000001e4be18800
  .... r16   0x0
  .... r17   0xc000000ffaac5000
  .... r18   0xc000000fe733f8a0
  .... r19   0xc000000001523100
  .... r20   0xc00000000009fd1c
  .... r21   0xc000000fcaa69000
  .... r22   0xc0000000001e4968
  .... r23   0xc000000001523100
  .... r24   0xc000000fe733f850
  .... r25   0xc000000fcaa69000
  .... r26   0xc000000003b8fcf0
  .... r27   0xfffffffffffffead
  .... r28   0x0
  .... r29   0xc000000fcaa69000
  .... r30   0x1
  .... r31   0x0
  .... nip   0xc0000000001dd320
  .... msr   0x9000000000009032
  .... orig_r3 0xc0000000001e538c
  .... ctr   0xc00000000009d550
  .... link  0xc0000000001e5e34
  .... xer   0x0
  .... ccr   0x84022882
  .... softe 0x0
  .... trap  0xf01
  .... dar   0x0
  .... dsisr 0xf00040060000004
   ... thread: :4522:4522
   ...... dso: /root/.debug/.build-id/b0/ef11b1a1629e62ac9de75199117ee5ef9469e9
             :4522 4522 496.768515: 1 cycles: c0000000001e538c
             .perf_event_context_sched_in (/boot/vmlinux)

Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/perf: Add support for sampling interrupt register state
Anju T [Sat, 20 Feb 2016 05:02:46 +0000 (10:32 +0530)]
powerpc/perf: Add support for sampling interrupt register state

The perf infrastructure uses a bit mask to find out valid registers to
display. Define a register mask for supported registers defined in
uapi/asm/perf_regs.h. The bit positions also correspond to register IDs
which is used by perf infrastructure to fetch the register values.
CONFIG_HAVE_PERF_REGS enables sampling of the interrupted machine state.

Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
[mpe: Add license, use CONFIG_PPC64, fix 32-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/perf: Assign an id to each powerpc register
Anju T [Sat, 20 Feb 2016 05:02:45 +0000 (10:32 +0530)]
powerpc/perf: Assign an id to each powerpc register

The enum definition assigns an 'id' to each register in "struct pt_regs"
of arch/powerpc. The order of these values in the enum definition are
based on the order of members in pt_regs.

Signed-off-by: Anju T <anju@linux.vnet.ibm.com>
[mpe: Rename LNK to LINK, use _UAPI_ASM for include guards]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/book3s64: Remove __end_handlers marker
Hari Bathini [Thu, 7 Apr 2016 22:00:34 +0000 (03:30 +0530)]
powerpc/book3s64: Remove __end_handlers marker

The __end_handlers marker was intended to mark down upto code that gets
called from exception prologs. But that hasn't kept pace with code
changes. Case in point, slb_miss_realmode being called from exception
prolog code but isn't below __end_handlers marker. So, __end_handlers
marker is as good as a comment but could be misleading at times if it
isn't in sync with the code, as is the case now. So, let us avoid this
confusion by having a better comment and removing __end_handlers marker
altogether.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc/book3s64: Fix branching to OOL handlers in relocatable kernel
Hari Bathini [Fri, 15 Apr 2016 12:48:02 +0000 (22:48 +1000)]
powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel

Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (ie. OOL handlers) outside this
section must be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.

However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.

But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accomodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (eg. kdump
case). While this has been the case for sometime now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:

  1. In almost all cases, production kernel (relocatable) is used for
     kdump as well, which would mean that crashed kernel's OOL handler
     would be at the same place where we end up branching to, from short
     interrupt vector of kdump kernel.
  2. Also, OOL handler was unlikely the reason for crash in almost all
     the kdump scenarios, which meant we had a sane OOL handler from
     crashed kernel that we branched to.
  3. On most 64-bit POWER server processors, page size is large enough
     that marking interrupt vector code as executable (see commit
     429d2e83) leads to marking OOL handler code from crashed kernel,
     that sits right below interrupt vector code from kdump kernel, as
     executable as well.

Let us fix this by moving the __end_interrupts marker down past OOL
handlers to make sure that we also copy OOL handlers to real address
0x100 when running a relocatable kernel.

This fix has been tested successfully in kdump scenario, on an LPAR with
4K page size by using different default/production kernel and kdump
kernel.

Also tested by manually corrupting the OOL handlers in the first kernel
and then kdump'ing, and then causing the OOL handlers to fire - mpe.

Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers")
Cc: stable@vger.kernel.org
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoMerge branch 'topic/livepatch' into next
Michael Ellerman [Mon, 18 Apr 2016 10:45:32 +0000 (20:45 +1000)]
Merge branch 'topic/livepatch' into next

Merge the support for live patching on ppc64le using mprofile-kernel.
This branch has also been merged into the livepatching tree for v4.7.

8 years agopowerpc/livepatch: Add live patching support on ppc64le
Michael Ellerman [Thu, 24 Mar 2016 11:04:05 +0000 (22:04 +1100)]
powerpc/livepatch: Add live patching support on ppc64le

Add the kconfig logic & assembly support for handling live patched
functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
depends on the new -mprofile-kernel ftrace ABI, which is only supported
currently on ppc64le.

Live patching is handled by a special ftrace handler. This means it runs
from ftrace_caller(). The live patch handler modifies the NIP so as to
redirect the return from ftrace_caller() to the new patched function.

However there is one particularly tricky case we need to handle.

If a function A calls another function B, and it is known at link time
that they share the same TOC, then A will not save or restore its TOC,
and will call the local entry point of B.

When we live patch B, we replace it with a new function C, which may
not have the same TOC as A. At live patch time it's too late to modify A
to do the TOC save/restore, so the live patching code must interpose
itself between A and C, and do the TOC save/restore that A omitted.

An additionaly complication is that the livepatch code can not create a
stack frame in order to save the TOC. That is because if C takes > 8
arguments, or is varargs, A will have written the arguments for C in
A's stack frame.

To solve this, we introduce a "livepatch stack" which grows upward from
the base of the regular stack, and is used to store the TOC & LR when
calling a live patched function.

When the patched function returns, we retrieve the real LR & TOC from
the livepatch stack, restore them, and pop the livepatch "stack frame".

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
8 years agopowerpc/livepatch: Add livepatch stack to struct thread_info
Michael Ellerman [Thu, 24 Mar 2016 11:04:04 +0000 (22:04 +1100)]
powerpc/livepatch: Add livepatch stack to struct thread_info

In order to support live patching we need to maintain an alternate
stack of TOC & LR values. We use the base of the stack for this, and
store the "live patch stack pointer" in struct thread_info.

Unlike the other fields of thread_info, we can not statically initialise
that value, so it must be done at run time.

This patch just adds the code to support that, it is not enabled until
the next patch which actually adds live patch support.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
8 years agopowerpc/livepatch: Add livepatch header
Michael Ellerman [Thu, 24 Mar 2016 11:04:03 +0000 (22:04 +1100)]
powerpc/livepatch: Add livepatch header

Add the powerpc specific livepatch definitions. In particular we provide
a non-default implementation of klp_get_ftrace_location().

This is required because the location of the mcount call is not constant
when using -mprofile-kernel (which we always do for live patching).

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agolivepatch: Allow architectures to specify an alternate ftrace location
Michael Ellerman [Thu, 24 Mar 2016 11:04:02 +0000 (22:04 +1100)]
livepatch: Allow architectures to specify an alternate ftrace location

When livepatch tries to patch a function it takes the function address
and asks ftrace to install the livepatch handler at that location.
ftrace will look for an mcount call site at that exact address.

On powerpc the mcount location is not the first instruction of the
function, and in fact it's not at a constant offset from the start of
the function. To accommodate this add a hook which arch code can
override to customise the behaviour.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agoftrace: Make ftrace_location_range() global
Michael Ellerman [Thu, 24 Mar 2016 11:04:01 +0000 (22:04 +1100)]
ftrace: Make ftrace_location_range() global

In order to support live patching on powerpc we would like to call
ftrace_location_range(), so make it global.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agocxl: Delete an unnecessary check before the function call "kfree"
Markus Elfring [Fri, 6 Nov 2015 10:00:23 +0000 (11:00 +0100)]
cxl: Delete an unnecessary check before the function call "kfree"

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agodrivers: macintosh: rack-meter: fix bogus memsets
Aaro Koskinen [Sun, 10 Apr 2016 19:53:48 +0000 (22:53 +0300)]
drivers: macintosh: rack-meter: fix bogus memsets

Fix bogus memsets pointed out by sparse:

linux-v4.3/drivers/macintosh/rack-meter.c:157:15: warning: memset with byte count of 0
linux-v4.3/drivers/macintosh/rack-meter.c:158:15: warning: memset with byte count of 0

Probably "&" is mistyped "*"; use ARRAY_SIZE to make it more safe.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agodrivers: macintosh: rack-meter: limit idle ticks to total ticks
Aaro Koskinen [Sun, 10 Apr 2016 19:53:47 +0000 (22:53 +0300)]
drivers: macintosh: rack-meter: limit idle ticks to total ticks

Limit idle ticks to total ticks. This prevents the annoying rackmeter
leds fully ON / OFF blinking state that happens on fully idling
G5 Xserve systems.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
8 years agopowerpc: sparse: Include headers for __weak symbols
Daniel Axtens [Wed, 6 Jan 2016 00:45:51 +0000 (11:45 +1100)]
powerpc: sparse: Include headers for __weak symbols

Sometimes when sparse warns about undefined symbols, it isn't
because they should have 'static' added, it's because they're
overriding __weak symbols defined elsewhere, and the header has
been missed.

Fix a few of them by adding appropriate headers.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>