review.tizen.org Git - platform/kernel/linux-rpi.git/log

of/fdt: make memblock maximum physical address arch configurable

When parsing the memory nodes to populate the memblock memory
table, we check against high and low limits and clip any memory
that exceeds either one of them.

However, for arm64, the high limit of (phys_addr_t)~0 is not very
meaningful, since phys_addr_t is 64 bits (i.e., no limit) but there
may be other constraints that limit the memory ranges that we can
support.

So rename MAX_PHYS_ADDR to MAX_MEMBLOCK_ADDR (for clarity) and only
define it if the arch does not supply a definition of its own.

Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Fix source code file path in comments

Architecture specific code for i386 and x86_64 was unified and merged to
the arch/x86. This patch fix old path of x86 architecture in a comment
from the arch/arm64/include/asm/fixmap.h.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: entry: always restore x0 from the stack on syscall return

We have a micro-optimisation on the fast syscall return path where we
take care to keep x0 live with the return value from the syscall so that
we can avoid restoring it from the stack. The benefit of doing this is
fairly suspect, since we will be restoring x1 from the stack anyway
(which lives adjacent in the pt_regs structure) and the only additional
cost is saving x0 back to pt_regs after the syscall handler, which could
be seen as a poor man's prefetch.

More importantly, this causes issues with the context tracking code.

The ct_user_enter macro ends up branching into C code, which is free to
use x0 as a scratch register and consequently leads to us returning junk
back to userspace as the syscall return value. Rather than special case
the context-tracking code, this patch removes the questionable
optimisation entirely.

Cc: <stable@vger.kernel.org>
Cc: Larry Bassel <larry.bassel@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: mdscr_el1: avoid exposing DCC to userspace

We don't want to expose the DCC to userspace, particularly as there is
a kernel console driver for it.

This patch resets mdscr_el1 to disable userspace access to the DCC
registers on the cold boot path.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kconfig: Move LIST_POISON to a safe value

Move the poison pointer offset to 0xdead000000000000, a
recognized value that is not mappable by user-space exploits.

Cc: <stable@vger.kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Thierry Strudel <tstrudel@google.com>
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Add __exception_irq_entry definition for function graph

The gic_handle_irq() is defined with __exception_irq_entry attribute.
A single remaining work is to add its definition as ARM did. Below
shows how function graph data is changed with these hunks.

A prologue of an interrupt handler is drawn as follows.

- current status

0)   0.208 us    |  cpuidle_not_available();
0)               |  default_idle_call() {
0)               |    arch_cpu_idle() {
0)               |      __handle_domain_irq() {
0)               |        irq_enter() {
0)   0.313 us    |          rcu_irq_enter();
0)   0.261 us    |          __local_bh_disable_ip();

- with this change

0)   0.625 us    |  cpuidle_not_available();
0)               |  default_idle_call() {
0)               |    arch_cpu_idle() {
0)   ==========> |
0)               |      gic_handle_irq() {
0)               |        __handle_domain_irq() {
0)               |          irq_enter() {
0)   0.885 us    |            rcu_irq_enter();
0)   0.781 us    |            __local_bh_disable_ip();

An epilogue of an interrupt handler is recorded as follows.

- current status

0)   0.261 us    |          idle_cpu();
0)               |          rcu_irq_exit() {
0)   0.521 us    |            rcu_eqs_enter_common.isra.46();
0)   2.552 us    |          }
0) ! 322.448 us  |        }
0) ! 583.437 us  |      }
0) # 1656.041 us |    }
0) # 1658.073 us |  }

- with this change

0)   0.677 us    |            idle_cpu();
0)               |            rcu_irq_exit() {
0)   1.770 us    |              rcu_eqs_enter_common.isra.46();
0)   7.968 us    |            }
0) # 1803.541 us |          }
0) # 2626.667 us |        }
0) # 2632.969 us |      }
0)   <========== |
0) # 14425.00 us |    }
0) # 14430.98 us |  }

Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rabin Vincent <rabin@rab.in>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

Merge branch 'aarch64/psci/drivers' into aarch64/for-next/core

Move our PSCI implementation out into drivers/firmware/ where it can be
shared with arch/arm/.

Conflicts:
arch/arm64/kernel/psci.c

arm64: mm: ensure patched kernel text is fetched from PoU

The arm64 booting document requires that the bootloader has cleaned the
kernel image to the PoC. However, when a CPU re-enters the kernel due to
either a CPU hotplug "on" event or resuming from a low-power state (e.g.
cpuidle), the kernel text may in-fact be dirty at the PoU due to things
like alternative patching or even module loading.

Thanks to I-cache speculation with the MMU off, stale instructions could
be fetched prior to enabling the MMU, potentially leading to crashes
when executing regions of code that have been modified at runtime.

This patch addresses the issue by ensuring that the local I-cache is
invalidated immediately after a CPU has enabled its MMU but before
jumping out of the identity mapping. Any stale instructions fetched from
the PoC will then be discarded and refetched correctly from the PoU.
Patching kernel text executed prior to the MMU being enabled is
prohibited, so the early entry code will always be clean.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: alternatives: ensure secondary CPUs execute ISB after patching

In order to guarantee that the patched instruction stream is visible to
a CPU, that CPU must execute an isb instruction after any related cache
maintenance has completed.

The instruction patching routines in kernel/insn.c get this right for
things like jump labels and ftrace, but the alternatives patching omits
it entirely leaving secondary cores in a potential limbo between the old
and the new code.

This patch adds an isb following the secondary polling loop in the
altenatives patching.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: make ll/sc __cmpxchg_case_##name asm consistent

The ll/sc __cmpxchg_case_##name assembly mostly uses symbolic names for
operands, but in a single case uses %2 to refer to what is otherwise
known as %[v]. This makes the code more painful to read than is
necessary.

Use %[v] instead.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: dma-mapping: Simplify pgprot handling

Since __get_dma_pgprot() does The Right Thing(TM) in the non-coherent
case, and the non-cacheable alias for DMA buffers is private to the
kernel anyway, we can simplify things slightly and make the code more
readable by just using PAGE_KERNEL as the base pgprot.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

MAINTAINERS: add PSCI entry

Add myself and Lorenzo as maintainers of the PSCI client code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

drivers: psci: support native SMC{32,64} calls

A 32-bit OS cannot make calls with SMC64 IDs, while a 64-bit OS must
invoke some PSCI functions with SMC64 IDs.

This patch introduces and makes use of a new macro to choose the
appropriate IDs based on the register width of the OS, which will allow
32-bit callers to use the PSCI client code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: psci: factor invocation code to drivers

To enable sharing with arm, move the core PSCI framework code to
drivers/firmware. This results in a minor gain in lines of code, but
this will quickly be amortised by the removal of code currently
duplicated in arch/arm.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: restore cpu suspend/resume functionality

Commit 4b3dc9679cf7 ("arm64: force CONFIG_SMP=y and remove redundant #ifdefs")
accidentally retained code for !CONFIG_SMP in cpu_resume function. This
resulted in the hash index being zeroed in x7 after proper computation,
which is then used to get the cpu context pointer while resuming.

This patch removes the remanant code and restores back the cpu suspend/
resume functionality.

Fixes: 4b3dc9679cf7 ("arm64: force CONFIG_SMP=y and remove redundant #ifdefs")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

ARM64: PCI: do not enable resources on PROBE_ONLY systems

On ARM64 PROBE_ONLY PCI systems resources are not currently claimed,
therefore they can't be enabled since they do not have a valid
parent pointer; this in turn prevents enabling PCI devices on
ARM64 PROBE_ONLY systems, causing PCI devices initialization to
fail.

To solve this issue, resources must be claimed when devices are
added on PROBE_ONLY systems, which ensures that the resource hierarchy
is validated and the resource tree is sane, but this requires changes
in the ARM64 resource management that can affect adversely existing
PCI set-ups (claiming resources on !PROBE_ONLY systems might break
existing ARM64 PCI platform implementations).

As a temporary solution in preparation for a proper resources claiming
implementation in ARM64 core, to enable PCI PROBE_ONLY systems on ARM64,
this patch adds a pcibios_enable_device() arch implementation that
simply prevents enabling resources on PROBE_ONLY systems (mirroring ARM
behaviour).

This is always a safe thing to do because on PROBE_ONLY systems the
configuration space set-up can be considered immutable, and it is in
preparation of proper resource claiming that would finally validate
the PCI resources tree in the ARM64 arch implementation on PROBE_ONLY
systems.

For !PROBE_ONLY systems resources enablement in pcibios_enable_device()
on ARM64 is implemented as in current PCI core, leaving the behaviour
unchanged.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: cmpxchg: truncate sub-word signed types before comparison

When performing a cmpxchg operation on a signed sub-word type (e.g. s8),
we need to ensure that the upper register bits of the "old" value used
for comparison are zeroed, otherwise we may erroneously fail the cmpxchg
which may even be interpreted as success by the caller (if the compiler
performs the truncation as part of its check). This has been observed
in mod_state, where negative values where causing problems with
this_cpu_cmpxchg.

This patch fixes the issue by explicitly casting 8-bit and 16-bit "old"
values using unsigned types in our cmpxchg wrappers. 32-bit types can be
left alone, since the underlying asm makes use of W registers in this
case.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: alternative: put secondary CPUs into polling loop during patch

When patching the kernel text with alternatives, we may end up patching
parts of the stop_machine state machine (e.g. atomic_dec_and_test in
ack_state) and consequently corrupt the instruction stream of any
secondary CPUs.

This patch passes the cpu_online_mask to stop_machine, forcing all of
the CPUs into our own callback which can place the secondary cores into
a dumb (but safe!) polling loop whilst the patching is carried out.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/Documentation: clarify wording regarding memory below the Image

Clarify that the memory below the start of the image but inside the
region covered by the linear mapping has no special significance to
the kernel, and may be used by the firmware provided that it is marked
as reserved.

Also, fix up some whitespace errors.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: lse: fix lse cmpxchg code indentation

For some reason, the ll/sc cmpxchg asm is all off to the left and
awkward to read in conjunction with the following (correctly indented)
LSE version.

This patch shifts the ll/sc code back to where it should be.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: remove redundant object file list

Commit 4b3dc9679cf7 ("arm64: force CONFIG_SMP=y and remove redundant
#ifdefs") forces SMP on arm64. To build the necessary objects for SMP,
they were added to the arm64-obj-y rule in arch/arm64/kernel/Makefile,
without removing the arm64-obj-$(CONFIG_SMP) rule.

Remove redundant object file list depending on always-yes CONFIG_SMP in
arch/arm64/kernel/Makefile.

Signed-off-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: remove dead-code depending on CONFIG_UP_LATE_INIT

Commit 4b3dc9679cf7 ("arm64: force CONFIG_SMP=y and remove redundant
and therfore can not be selected anymore.

Remove dead #ifdef-block depending on UP_LATE_INIT in
arch/arm64/kernel/setup.c

Signed-off-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
[will: kill do_post_cpus_up_work altogether]
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: pgtable: fix definition of pte_valid

pte_valid should check if the PTE_VALID bit (1 << 0) is set in the pte,
so fix the macro definition to use bitwise & instead of logical &&.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: spinlock: fix ll/sc unlock on big-endian systems

When unlocking a spinlock, we perform a read-modify-write on the owner
ticket in order to increment it and store it back with release
semantics.

In the LL/SC case, we load the 16-bit ticket using a 32-bit load and
therefore store back the wrong halfword on a big-endian system,
corrupting the lock after the first unlock and killing the system dead.

This patch fixes the unlock code to use 16-bit accessors consistently.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Use last level TLBI for user pte changes

The flush_tlb_page() function is used on user address ranges when PTEs
(or PMDs/PUDs for huge pages) were changed (attributes or clearing). For
such cases, it is more efficient to invalidate only the last level of
the TLB with the "tlbi vale1is" instruction.

In the TLB shoot-down case, the TLB caching of the intermediate page
table levels (pmd, pud, pgd) is handled by __flush_tlb_pgtable() via the
__(pte|pmd|pud)_free_tlb() functions and it is not deferred to
tlb_finish_mmu() (as of commit 285994a62c80 - "arm64: Invalidate the TLB
corresponding to intermediate page table levels"). The tlb_flush()
function only needs to invalidate the TLB for the last level of page
tables; the __flush_tlb_range() function gains a fourth argument for
last level TLBI.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Clean up __flush_tlb(_kernel)_range functions

This patch moves the MAX_TLB_RANGE check into the
flush_tlb(_kernel)_range functions directly to avoid the
undescore-prefixed definitions (and for consistency with a subsequent
patch).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: mm: mark create_mapping as __init

Currently create_mapping is marked with __ref, apparently because it
refers to early_alloc. However, create_mapping has no logic to prevent
erroneous use of early_alloc after it has been freed, and is only ever
called by __init functions anyway. Thus the __ref marker is misleading
and unnecessary.

Instead, this patch marks create_mapping as __init, resulting in
warnings if it is used from a a non __init functions, and allowing its
memory to be reclaimed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: debug: rename enum debug_el to avoid symbol collision

lib/list_sort.c defines a 'struct debug_el', where "el" is assumedly a
a contraction of "element". This conflicts with 'enum debug_el' in our
asm/debug-monitors.h header file, where "el" stands for Exception Level.

The result is build failure when targetting allmodconfig, so rename our
enum to 'dbg_active_el' to be slightly more explicit about what it is.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: mm: add __init section marker to free_initrd_mem

It is not needed after booting, this patch moves the
free_initrd_mem() function to the __init section.

This patch also make keep_initrd __initdata, to reduce kernel
size.

Signed-off-by: Wang Long <long.wanglong@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: elf: use cpuid_feature_extract_field for hwcap detection

cpuid_feature_extract_field takes care of the fiddly ID register
field sign-extension, so use that instead of rolling our own version.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: lse: use generic cpufeature detection for LSE atomics

Rework the cpufeature detection to support ISAR0 and use that for
detecting the presence of LSE atomics.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kconfig: group the v8.1 features together

ARMv8 CPUs do not support any of the v8.1 features, so group them
together in Kconfig to make it clear that they're part of 8.1 and not
relevant to older cores.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: lse: rename ARM64_CPU_FEAT_LSE_ATOMICS for consistency

Other CPU features follow an 'ARM64_HAS_*' naming scheme, so do the same
for the LSE atomics.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kconfig: select HAVE_CMPXCHG_LOCAL

We implement an optimised cmpxchg_local macro, so let the kernel know.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: atomic64_dec_if_positive: fix incorrect branch condition

If we attempt to atomic64_dec_if_positive on INT_MIN, we will underflow
and incorrectly decide that the original parameter was positive.

This patches fixes the broken condition code so that we handle this
corner case correctly.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: atomics: implement atomic{,64}_cmpxchg using cmpxchg

We don't need duplicate cmpxchg implementations, so use cmpxchg to
implement atomic{,64}_cmpxchg, like we do for xchg already.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: atomics: prefetch the destination word for write prior to stxr

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch makes use of prfm to prefetch cachelines for write prior to
ldxr/stxr loops when using the ll/sc atomic routines.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: atomics: tidy up common atomic{,64}_* macros

The common (i.e. identical for ll/sc and lse) atomic macros in atomic.h
are needlessley different for atomic_t and atomic64_t.

This patch tidies up the definitions to make them consistent across the
two atomic types and factors out common code such as the add_unless
implementation based on cmpxchg.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: cmpxchg: avoid memory barrier on comparison failure

cmpxchg doesn't require memory barrier semantics when the value
comparison fails, so make the barrier conditional on success.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: cmpxchg: avoid "cc" clobber in ll/sc routines

We can perform the cmpxchg comparison using eor and cbnz which avoids
the "cc" clobber for the ll/sc case and consequently for the LSE case
where we may have to fall-back on the ll/sc code at runtime.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our cmpxchg_double primitives
so that the LSE casp instruction is used instead.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: cmpxchg: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our cmpxchg primitives so that
the LSE cas instruction is used instead.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: xchg: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our xchg primitives so that
the LSE swp instruction (yes, you read right!) is used instead.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: bitops: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our bitops functions so that
LSE atomic instructions are used instead.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: locks: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our locking functions so that
LSE atomic instructions are used for spinlocks and rwlocks.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: atomics: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of atomic_t and atomic64_t
routines so that the call-site for the out-of-line ll/sc sequences is
patched with an LSE atomic instruction when we detect that
the CPU supports it.

If binutils is not recent enough to assemble the LSE instructions, then
the ll/sc sequences are inlined as though CONFIG_ARM64_LSE_ATOMICS=n.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics

In order to patch in the new atomic instructions at runtime, we need to
generate wrappers around the out-of-line exclusive load/store atomics.

This patch adds a new Kconfig option, CONFIG_ARM64_LSE_ATOMICS. which
causes our atomic functions to branch to the out-of-line ll/sc
implementations. To avoid the register spill overhead of the PCS, the
out-of-line functions are compiled with specific compiler flags to
force out-of-line save/restore of any registers that are usually
caller-saved.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: alternatives: add cpu feature for lse atomics

Add a CPU feature for the LSE atomic instructions, so that they can be
patched in at runtime when we detect that they are supported.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: elf: advertise 8.1 atomic instructions as new hwcap

The ARM v8.1 architecture introduces new atomic instructions to the A64
instruction set for things like cmpxchg, so advertise their availability
to userspace using a hwcap.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: atomics: move ll/sc atomics into separate header file

In preparation for the Large System Extension (LSE) atomic instructions
introduced by ARM v8.1, move the current exclusive load/store (LL/SC)
atomics into their own header file.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: cpufeature.h: add missing #include of kernel.h

cpufeature.h makes use of DECLARE_BITMAP, which in turn relies on the
BITS_TO_LONGS and DIV_ROUND_UP macros.

This patch includes kernel.h in cpufeature.h to prevent all users having
to do the same thing.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: rwlocks: don't fail trylock purely due to contention

STXR can fail for a number of reasons, so don't fail an rwlock trylock
operation simply because the STXR reported failure.

I'm not aware of any issues with the current code, but this makes it
consistent with spin_trylock and also other architectures (e.g. arch/arm).

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

Merge branch 'locking/arch-atomic' of git://git./linux/kernel/git/tip/tip into aarch64/for-next/core

Merge in PeterZ's logical atomic ops so that we can implement them in
our subsequent LSE atomics.

atomic: Add simple atomic_t tests

Add a few atomic_t tests, gets some compile coverage for the new
operations.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

atomic: Replace atomic_{set,clear}_mask() usage

Replace the deprecated atomic_{set,clear}_mask() usage with the now
ubiquous atomic_{or,andnot}() functions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

atomic: Collapse all atomic_{set,clear}_mask definitions

Move the now generic definitions of atomic_{set,clear}_mask() into
linux/atomic.h to avoid endless and pointless repetition.

Also, provide an atomic_andnot() wrapper for those few archs that can
implement that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

atomic: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

tile: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

For tilegx, these are relatively straightforward; the architecture
provides atomic "or" and "and", both 32-bit and 64-bit. To support
xor we provide a loop using "cmpexch".

For the older 32-bit tilepro architecture, we have to extend
the set of low-level assembly routines to include 32-bit "and",
as well as all three 64-bit routines. Somewhat confusingly,
some 32-bit versions are already used by the bitops inlines, with
parameter types appropriate for bitops, so we have to do a bit of
casting to match "int" to "unsigned long".

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1436474297-32187-1-git-send-email-cmetcalf@ezchip.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

h8300: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Also rework the atomic implementation in terms of CPP macros to avoid
the typical repetition -- I seem to have missed this arch the last
time around when I did that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

frv: Rewrite atomic implementation

Mostly complete rewrite of the FRV atomic implementation, instead of
using assembly files, use inline assembler.

The out-of-line CONFIG option makes a bit of a mess of things, but a
little CPP trickery gets that done too.

FRV already had the atomic logic ops but under a non standard name,
the reimplementation provides the generic names and provides the
intermediate form required for the bitops implementation.

The slightly inconsistent __atomic32_fetch_##op naming is because
__atomic_fetch_##op conlicts with GCC builtin functions.

The 64bit atomic ops use the inline assembly %Ln construct to access
the low word register (r+1), afaik this construct was not previously
used in the kernel and is completely undocumented, but I found it in
the FRV GCC code and it seems to work.

FRV had a non-standard definition of atomic_{clear,set}_mask() which
would work types other than atomic_t, the one user relying on that
(arch/frv/kernel/dma.c) got converted to use the new intermediate
form.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

s390: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

xtensa: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sparc: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sh: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

powerpc: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

parisc: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

mn10300: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

mips: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

metag: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

m68k: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

m32r: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

ia64: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

hexagon: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

blackfin: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

TODO: use inline asm or at least asm macros to collapse the lot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

avr32: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

arm64: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

arm: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

arc: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

alpha: Provide atomic_{or,xor,and}

Implement atomic logic ops -- atomic_{or,xor,and}.

These will replace the atomic_{set,clear}_mask functions that are
available on some archs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

atomic: Prepare generic atomic implementation for logic ops

Clean up the #ifdef guards a bit to prepare for architectures to
supply their own logic ops.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

arm64: include linux/types.h in asm/spinlock_types.h

Our ticket-based spinlock structures rely on a definition of u16, so
include linux/types.h explicitly to ensure the thing compiles.

Found by a module build failure in -next:

  arch/arm64/include/asm/spinlock_types.h:27:2: error: unknown type name 'u16'
  arch/arm64/include/asm/spinlock_types.h:28:2: error: unknown type name 'u16'
  arch/arm64/include/asm/spinlock_types.h:33:13: error: expected declaration specifiers or '...' before numeric constant
  include/linux/spinlock_types.h:21:2: error: unknown type name 'arch_spinlock_t'
  arch/arm64/include/asm/spinlock.h:34:35: error: unknown type name 'arch_spinlock_t'
  arch/arm64/include/asm/spinlock.h:65:37: error: unknown type name 'arch_spinlock_t'

Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/BUG: Show explicit backtrace for WARNs

The generic slowpath WARN implementation prints a backtrace, but
the report_bug() based implementation does not, opting to print the
registers instead which is generally not as useful.

Ideally, report_bug() should be fixed to make the behaviour more
consistent, but in the meantime this patch generates a backtrace
directly from the arm64 backend instead so that this functionality
is not lost with the migration to report_bug().

As a side-effect, the backtrace will be outside the oops end
marker, but that's hard to avoid without modifying generic code.

This patch can go away if report_bug() grows the ability in the
future to generate a backtrace directly or call an arch hook at the
appropriate time.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/BUG: Use BRK instruction for generic BUG traps

Currently, the minimal default BUG() implementation from asm-
generic is used for arm64.

This patch uses the BRK software breakpoint instruction to generate
a trap instead, similarly to most other arches, with the generic
BUG code generating the dmesg boilerplate.

This allows bug metadata to be moved to a separate table and
reduces the amount of inline code at BUG and WARN sites. This also
avoids clobbering any registers before they can be dumped.

To mitigate the size of the bug table further, this patch makes
use of the existing infrastructure for encoding addresses within
the bug table as 32-bit offsets instead of absolute pointers.
(Note that this limits the kernel size to 2GB.)

Traps are registered at arch_initcall time for aarch64, but BUG
has minimal real dependencies and it is desirable to be able to
generate bug splats as early as possible. This patch redirects
all debug exceptions caused by BRK directly to bug_handler() until
the full debug exception support has been initialised.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/debug: Add missing #includes

<asm/debug-monitors.h> relies on <asm/ptrace.h>, but doesn't
declare this dependency. This becomes a problem once
debug-monitors.h starts getting included all over the place to get
the BRK immedates.

The missing include of <asm/memory.h> (for UL()) in <asm/esr.h> is
also added. The series no longer relies on this, but I spotted it
during development and it may as well get fixed.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/debug: Simplify BRK insn opcode declarations

The way the KGDB_DYN_BRK_INS_BYTEx macros are declared is more
complex than it needs to be. Also, the macros are only used in one
place, which is arch-specific anyway.

This patch refactors the macros to simplify them, and exposes an
argument so that we can have a single macro instead of 4.

As a side effect, this patch also fixes some anomalous spellings of
"KGDB".

These changes alter the compile types of some integer constants
that are harmless but trigger truncation warnings in gcc when
assigning to 32-bit variables. This patch adds an explicit cast
for the affected cases.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/debug: Move BRK ESR template macro into <asm/esr.h>

It makes sense to keep all the architectural exception syndrome
definitions in the same place.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/debug: More consistent naming for the BRK ESR template macro

The naming of DBG_ESR_VAL_BRK is inconsistent with the way other
similar macros are named.

This patch makes the naming more consistent, and appends "64"
as a reminder that this ESR pattern only matches from AArch64
state.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/debug: Eliminate magic number from ESR template definition

<asm/esr.h> has perfectly good constants for defining ESR values
already. Let's use them.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/debug: Mask off all reserved bits from generated ESR values

There are only 16 comment bits in a BRK instruction, which
correspond to ESR bits 15:0. Bits 24:16 of the ESR are RES0,
and might have weird meanings in the future.

This code inserts 16 bits of comment in the ESR value instead of
20 (almost certainly a typo in the original code).

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64/debug: Eliminate magic number for size of BRK instruction

The size of an A64 BRK instruction is the same as the size of all other
A64 instructions, because all A64 instructions are the same size.

BREAK_INSTR_SIZE is retained for readibility, but it should not be
an independent constant from AARCH64_INSN_SIZE.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: insn: use set_fixmap_offset to make it more clear

A little change to patch_map() function,
use set_fixmap_offset() to make code more clear.

Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: efi: prefer AllocatePages() over efi_low_alloc() for vmlinux

When allocating memory for the kernel image, try the AllocatePages()
boot service to obtain memory at the preferred offset of
'dram_base + TEXT_OFFSET', and only revert to efi_low_alloc() if that
fails. This is the only way to allocate at the base of DRAM if DRAM
starts at 0x0, since efi_low_alloc() refuses to allocate at 0x0.

Tested-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kernel: remove non-legit DT warnings when booting using ACPI

Since both CONFIG_ACPI and CONFIG_OF are enabled when booting using ACPI
tables on ARM64 platforms, we get few device tree warnings which are not
valid for ACPI boot. We can use of_have_populated_dt to check if the
device tree is populated or not before throwing out those errors.

This patch uses of_have_populated_dt to remove non legitimate device
tree warning when booting using ACPI tables.

Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: alternatives: add enable parameter to conditional asm macros

There are cases where we want to compile out both versions of an
alternative code block, so add an enable parameter to the new conditional
alternative assembly macros in the same way as alternative_insn.

Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kernel: Add support for Privileged Access Never

'Privileged Access Never' is a new arm8.1 feature which prevents
privileged code from accessing any virtual address where read or write
access is also permitted at EL0.

This patch enables the PAN feature on all CPUs, and modifies {get,put}_user
helpers temporarily to permit access.

This will catch kernel bugs where user memory is accessed directly.
'Unprivileged loads and stores' using ldtrb et al are unaffected by PAN.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[will: use ALTERNATIVE in asm and tidy up pan_enable check]
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Generalise msr_s/mrs_s operations

The system register encoding generated by sys_reg() works only
for MRS/MSR(Register) operations, as we hardcode Bit20 to 1 in
mrs_s/msr_s mask. This makes it unusable for generating instructions
accessing registers with Op0 < 2(e.g, PSTATE.x with Op0=0).

As per ARMv8 ARM, (Ref: ARMv8 ARM, Section: "System instruction class
encoding overview", C5.2, version:ARM DDI 0487A.f), the instruction
encoding reserves bits [20-19] for Op0.

This patch generalises the sys_reg, mrs_s and msr_s macros, so that
we could use them to access any of the supported system register.

Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kernel: Add optional CONFIG_ parameter to ALTERNATIVE()

Some uses of ALTERNATIVE() may depend on a feature that is disabled at
compile time by a Kconfig option. In this case the unused alternative
instructions waste space, and if the original instruction is a nop, it
wastes time and space.

This patch adds an optional 'config' option to ALTERNATIVE() and
alternative_insn that allows the compiler to remove both the original
and alternative instructions if the config option is not defined.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kernel: Add min_field_value and use '>=' for feature detection

When a new cpu feature is available, the cpu feature bits will have some
initial value, which is incremented when the feature is updated.
This patch changes 'register_value' to be 'min_field_value', and checks
the feature bits value (interpreted as a signed int) is greater than this
minimum.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: kernel: Add cpufeature 'enable' callback

This patch adds an 'enable()' callback to cpu capability/feature
detection, allowing features that require some setup or configuration
to get this opportunity once the feature has been detected.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>