platform/kernel/linux-3.10.git
10 years ago  arm64: Relax the kernel cache requirements for boot
Catalin Marinas [Wed, 26 Mar 2014 18:25:55 +0000 (18:25 +0000)]
arm64: Relax the kernel cache requirements for boot

With system caches for the host OS or architected caches for a guest OS,
we cannot easily guarantee that there are no dirty or stale cache lines
for the areas of memory written by the kernel during boot with the MMU
off (and therefore with non-cacheable accesses).

This patch adds the necessary cache maintenance during boot and relaxes
the booting requirements.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Update the TCR_EL1 translation granule definitions for 16K pages
Catalin Marinas [Wed, 2 Apr 2014 16:55:40 +0000 (17:55 +0100)]
arm64: Update the TCR_EL1 translation granule definitions for 16K pages

The current TCR register setting in arch/arm64/mm/proc.S assumes that
TCR_EL1.TG* fields are one bit wide and bit 31 is RES1 (reserved, set to
1). With the addition of 16K pages (currently unsupported in the
kernel), the TCR_EL1.TG* fields have been extended to two bits. This
patch updates the corresponding Linux definitions and drops the bit 31
setting in proc.S in favour of the new macros.

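For illustration, the two-bit TG field encodings look roughly like this
(a sketch based on the ARMv8 ARM; the exact macro names are assumptions):

#define TCR_TG0_4K	(UL(0) << 14)	/* TG0, bits [15:14] */
#define TCR_TG0_64K	(UL(1) << 14)
#define TCR_TG0_16K	(UL(2) << 14)
#define TCR_TG1_16K	(UL(1) << 30)	/* TG1, bits [31:30] */
#define TCR_TG1_4K	(UL(2) << 30)
#define TCR_TG1_64K	(UL(3) << 30)
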
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Joe Sylve <joe.sylve@gmail.com>
10 years ago  ARM: topology: Make it clear that all CPUs need to be described
Mark Brown [Fri, 28 Mar 2014 14:21:07 +0000 (14:21 +0000)]
ARM: topology: Make it clear that all CPUs need to be described

The ARMv8 code will reject incomplete topologies that omit some CPUs (and it's
not clear that it's ever sensible to do so). Update the binding document to
make this clear.

Since we're reformatting the text, also fix the incorrect grammar in the
final "Any other configuration..." section by removing "consider".

Signed-off-by: Mark Brown <broonie@linaro.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Conflicts:
Documentation/devicetree/bindings/arm/topology.txt

Change-Id: Ic39525a02f6e20bbd34da7991c9723f9f04532e9

10 years ago  arm64: Remove pgprot_dmacoherent()
Catalin Marinas [Mon, 24 Mar 2014 10:35:35 +0000 (10:35 +0000)]
arm64: Remove pgprot_dmacoherent()

Since this macro is identical to pgprot_writecombine() and is only used
in a single place, remove it completely to avoid confusion. On ARMv7+
processors, the coherent DMA mapping must be Normal NonCacheable (a.k.a.
writecombine) to avoid mismatched hardware attribute aliases (with the
kernel linear mapping as Normal Cacheable).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Support DMA_ATTR_WRITE_COMBINE
Laura Abbott [Fri, 14 Mar 2014 19:52:24 +0000 (19:52 +0000)]
arm64: Support DMA_ATTR_WRITE_COMBINE

DMA_ATTR_WRITE_COMBINE is currently ignored. Set the pgprot
appropriately for non-coherent operations.

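A minimal sketch of how the attribute can steer the pgprot choice (the
helper name and exact placement are assumptions, not the literal patch):

static pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
{
	/* non-coherent mappings honour DMA_ATTR_WRITE_COMBINE */
	if (dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
		return pgprot_writecombine(prot);
	return prot;
}
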
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Implement custom mmap functions for dma mapping
Laura Abbott [Fri, 14 Mar 2014 19:52:23 +0000 (19:52 +0000)]
arm64: Implement custom mmap functions for dma mapping

The current dma_ops do not specify an mmap function, so mapping
falls back to the default implementation. There are at least
two issues with using the default implementation:

1) The pgprot is always pgprot_noncached (strongly ordered)
memory even with coherent operations
2) dma_common_mmap calls virt_to_page on the remapped non-coherent
address which leads to invalid memory being mapped.

Fix both of these issues by implementing a custom mmap function which
correctly accounts for remapped addresses and sets vm_page_prot
appropriately.

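A sketch of the kind of mmap helper described above (shape assumed; the
actual patch may differ in detail):

static int __dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
			     void *cpu_addr, dma_addr_t dma_addr, size_t size)
{
	int ret = -ENXIO;
	unsigned long nr_vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	unsigned long pfn = dma_to_phys(dev, dma_addr) >> PAGE_SHIFT;
	unsigned long off = vma->vm_pgoff;

	/* map the real DMA pages, not virt_to_page() of a remapped address */
	if (off < nr_pages && nr_vma_pages <= (nr_pages - off))
		ret = remap_pfn_range(vma, vma->vm_start, pfn + off,
				      vma->vm_end - vma->vm_start,
				      vma->vm_page_prot);
	return ret;
}
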
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[catalin.marinas@arm.com: replaced "arm64_" with "__" prefix for consistency]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Fix __range_ok macro
Christopher Covington [Wed, 19 Mar 2014 16:29:37 +0000 (16:29 +0000)]
arm64: Fix __range_ok macro

Without this, the following scenario is incorrectly determined
to be invalid:

addr 0x7f_ffffe000 size 8192 addr_limit 0x80_00000000

This behavior was observed while trying to vmsplice the stack
as part of a CRIU dump of a process on a system started with the
norandmaps kernel parameter.

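A plain-C model of the intended boundary behaviour (illustrative only,
not the actual asm macro): the check must accept an access whose end is
exactly equal to addr_limit.

/* returns true when [addr, addr + size) fits below addr_limit without
 * wrapping; note "<=", so addr + size == addr_limit is accepted */
static inline bool range_ok(u64 addr, u64 size, u64 addr_limit)
{
	u64 end = addr + size;

	return end >= addr && end <= addr_limit;
}

With the example above, 0x7fffffe000 + 8192 == 0x8000000000 ==
addr_limit, so the access is (correctly) accepted.
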
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Fix duplicated Kconfig entries
Mark Brown [Tue, 11 Mar 2014 13:17:31 +0000 (13:17 +0000)]
arm64: Fix duplicated Kconfig entries

Probably due to rebasing over the lengthy time it took to get the patch
merged, commit addea9ef055b (cpufreq: enable ARM drivers on arm64) added
a duplicate Power management options section. Add CPUfreq to the CPU
power management section and remove the duplicate include of the main
power section.

Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: mm: Route pmd thp functions through pte equivalents
Steve Capper [Tue, 25 Feb 2014 10:02:13 +0000 (10:02 +0000)]
arm64: mm: Route pmd thp functions through pte equivalents

Rather than have separate hugetlb and transparent huge page pmd
manipulation functions, re-wire our thp functions to simply call the
pte equivalents.

This allows THP to take advantage of the new PTE_WRITE logic introduced
in:
  c2c93e5 arm64: mm: Introduce PTE_WRITE

To represent splitting THPs we use the PTE_SPECIAL bit as this is not
used for pmds.

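The routing idea, sketched (conversion macros as described; treat this
as an illustration rather than the exact diff):

#define pmd_pte(pmd)		__pte(pmd_val(pmd))
#define pte_pmd(pte)		__pmd(pte_val(pte))

#define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
#define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
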
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: rwsem: use asm-generic rwsem implementation
Will Deacon [Fri, 14 Mar 2014 17:47:05 +0000 (17:47 +0000)]
arm64: rwsem: use asm-generic rwsem implementation

asm-generic offers an atomic-add based rwsem implementation, which
can avoid the need for heavier, spinlock-based synchronisation on the
fast path.

This patch makes use of the optimised implementation for arm64 CPUs.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  asm-generic: rwsem: de-PPCify rwsem.h
Will Deacon [Fri, 14 Mar 2014 17:47:04 +0000 (17:47 +0000)]
asm-generic: rwsem: de-PPCify rwsem.h

asm-generic/rwsem.h used to live under arch/powerpc. During its
liberation to common code, a few references to its former home were
preserved; in particular, the definition of RWSEM_ACTIVE_MASK is
predicated on CONFIG_PPC64.

This patch updates the ifdefs and comments to architecturally neutral
versions.

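The architecture-neutral form of the mask is essentially (sketch):

#ifdef CONFIG_64BIT		/* was: CONFIG_PPC64 */
# define RWSEM_ACTIVE_MASK	0xffffffffL
#else
# define RWSEM_ACTIVE_MASK	0x0000ffffL
#endif
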
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: enable generic CPU feature modalias matching for this architecture
Ard Biesheuvel [Tue, 4 Mar 2014 01:10:04 +0000 (01:10 +0000)]
arm64: enable generic CPU feature modalias matching for this architecture

This enables support for the generic CPU feature modalias implementation that
wires up optional CPU features to udev based module autoprobing.

A file <asm/cpufeature.h> is provided that maps CPU feature numbers to
elf_hwcap bits, which is the standard way on arm64 to advertise optional CPU
features both internally and to user space.

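The mapping is small; a sketch of the shape of <asm/cpufeature.h>
(names as described above, details assumed):

#define MAX_CPU_FEATURES	(8 * sizeof(elf_hwcap))
#define cpu_feature(x)		ilog2(HWCAP_ ## x)

static inline bool cpu_have_feature(unsigned int num)
{
	return elf_hwcap & (1UL << num);
}
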
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[catalin.marinas@arm.com: removed unnecessary "!!"]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: smp: make local symbol static
Jingoo Han [Wed, 5 Mar 2014 05:35:45 +0000 (05:35 +0000)]
arm64: smp: make local symbol static

Make smp_spin_table_cpu_postboot() static, because this function
is used only in this file.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: debug: make local symbols static
Jingoo Han [Wed, 5 Mar 2014 05:34:32 +0000 (05:34 +0000)]
arm64: debug: make local symbols static

Make local symbols static, because these are used only in this
file.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  ARM64: perf: support dwarf unwinding in compat mode
Jean Pihet [Mon, 3 Feb 2014 18:18:29 +0000 (19:18 +0100)]
ARM64: perf: support dwarf unwinding in compat mode

Add support for unwinding using the dwarf information in compat
mode. Using the correct user stack pointer allows perf to record
the frames correctly in the native and compat modes.

Note that although the dwarf frame unwinding works ok using
libunwind in native mode (on ARMv7 & ARMv8), some changes are
required to the libunwind code for the compat mode. Those changes
are posted separately on the libunwind mailing list.

Tested on an ARMv8 platform with v8 and compat v7 binaries, the latter
statically built.

Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  ARM64: perf: add support for frame pointer unwinding in compat mode
Jean Pihet [Mon, 3 Feb 2014 18:18:28 +0000 (19:18 +0100)]
ARM64: perf: add support for frame pointer unwinding in compat mode

When profiling a 32-bit application, user space callchain unwinding
using the frame pointer is performed in compat mode. The code is taken
over from the AArch32 code and adapted to work on AArch64.

Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  ARM64: perf: add support for perf registers API
Jean Pihet [Mon, 3 Feb 2014 18:18:27 +0000 (19:18 +0100)]
ARM64: perf: add support for perf registers API

This patch implements the functions required for the perf registers API,
allowing the perf tool to interface kernel register dumps with libunwind
in order to provide userspace backtracing.
Compat mode is also supported.

Only the general purpose user space registers are exported, i.e.:
 PERF_REG_ARM_X0,
 ...
 PERF_REG_ARM_X28,
 PERF_REG_ARM_FP,
 PERF_REG_ARM_LR,
 PERF_REG_ARM_SP,
 PERF_REG_ARM_PC
and not the PERF_REG_ARM_V* registers.

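A sketch of the core accessor (enum names assumed from the perf regs
convention; compat handling omitted):

u64 perf_reg_value(struct pt_regs *regs, int idx)
{
	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_MAX))
		return 0;

	if ((u32)idx == PERF_REG_ARM64_SP)
		return regs->sp;

	if ((u32)idx == PERF_REG_ARM64_PC)
		return regs->pc;

	return regs->regs[idx];
}
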
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Add boot time configuration of Intermediate Physical Address size
Radha Mohan Chintakuntla [Fri, 7 Mar 2014 08:49:25 +0000 (08:49 +0000)]
arm64: Add boot time configuration of Intermediate Physical Address size

ARMv8 supports a range of physical address bit sizes. The PARange bits
from ID_AA64MMFR0_EL1 register are read during boot-time and the
intermediate physical address size bits are written in the translation
control registers (TCR_EL1 and VTCR_EL2).

There is no change in the VA bits and levels of translation.

Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Reviewed-by: Will Deacon <Will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Do not synchronise I and D caches for special ptes
Catalin Marinas [Wed, 12 Mar 2014 16:28:09 +0000 (16:28 +0000)]
arm64: Do not synchronise I and D caches for special ptes

Special pte mappings are not intended to be executable and do not even
have an associated struct page. This patch ensures that we do not call
__sync_icache_dcache() on such ptes.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Steve Capper <Steve.Capper@arm.com>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Cc: <stable@vger.kernel.org>
10 years ago  arm64: Make DMA coherent and strongly ordered mappings not executable
Catalin Marinas [Wed, 12 Mar 2014 16:07:06 +0000 (16:07 +0000)]
arm64: Make DMA coherent and strongly ordered mappings not executable

pgprot_{dmacoherent,writecombine,noncached} don't need to generate
executable mappings with side-effects like __sync_icache_dcache() being
called when the mapping is in user space.

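The fix amounts to OR-ing the execute-never bits into these pgprots; a
sketch for one of them (macro shape assumed):

#define pgprot_writecombine(prot) \
	__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
			PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)
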
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Cc: <stable@vger.kernel.org>
10 years ago  arm64: barriers: add dmb barrier
Will Deacon [Mon, 10 Mar 2014 10:36:52 +0000 (10:36 +0000)]
arm64: barriers: add dmb barrier

Commit 8adbf57fc429 ("irqchip: gic: use dmb ishst instead of dsb when
raising a softirq") added an explicit dmb(...) call to the GIC driver.

This patch adds a simple dmb() macro to arm64, which expands to a DMB SY
instruction.

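The macro itself is a one-liner (sketch; per the text above, the option
is accepted but expands to a full-system DMB for now):

#define dmb(opt)	asm volatile("dmb sy" : : : "memory")
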
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: topology: Implement basic CPU topology support
Mark Brown [Tue, 4 Mar 2014 07:51:17 +0000 (07:51 +0000)]
arm64: topology: Implement basic CPU topology support

Add basic CPU topology support to arm64, based on the existing pre-v8
code and some work done by Mark Hambleton.  This patch does not
implement any topology discovery support since that should be based on
information from firmware, it merely implements the scaffolding for
integration of topology support in the architecture.

No locking of the topology data is done since it is only modified during
CPU bringup with external serialisation from the SMP code.

The goal is to separate the architecture hookup for providing topology
information from the DT parsing in order to ease review and avoid
blocking the architecture code (which will be built on by other work)
with the DT code review by providing something simple and basic.

Following patches will implement support for interpreting topology
information from MPIDR and for parsing the DT topology bindings for ARM,
similar patches will be needed for ACPI.

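The scaffolding boils down to a per-CPU topology record; a sketch of
the structure described above (field names assumed):

struct cpu_topology {
	int thread_id;
	int core_id;
	int cluster_id;
	cpumask_t thread_sibling;
	cpumask_t core_sibling;
};
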
Signed-off-by: Mark Brown <broonie@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: removed CONFIG_CPU_TOPOLOGY, always on if SMP]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries
Ard Biesheuvel [Mon, 3 Mar 2014 07:34:46 +0000 (07:34 +0000)]
arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries

This adds support for advertising the presence of ARMv8 Crypto
Extensions in the AArch32 execution state to 32-bit ELF binaries
running in 32-bit compat mode under the arm64 kernel.

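The advertised bits end up as compat hwcap2 flags; a sketch (bit
assignments assumed to match the AArch32 definitions):

#define COMPAT_HWCAP2_AES	(1 << 0)
#define COMPAT_HWCAP2_PMULL	(1 << 1)
#define COMPAT_HWCAP2_SHA1	(1 << 2)
#define COMPAT_HWCAP2_SHA2	(1 << 3)
#define COMPAT_HWCAP2_CRC32	(1 << 4)
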
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: add AT_HWCAP2 support for 32-bit compat
Ard Biesheuvel [Mon, 3 Mar 2014 07:34:45 +0000 (07:34 +0000)]
arm64: add AT_HWCAP2 support for 32-bit compat

Add support for the ELF auxv entry AT_HWCAP2 when running 32-bit
ELF binaries in compat mode.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  binfmt_elf: add ELF_HWCAP2 to compat auxv entries
Ard Biesheuvel [Mon, 3 Mar 2014 07:34:44 +0000 (07:34 +0000)]
binfmt_elf: add ELF_HWCAP2 to compat auxv entries

Add ELF_HWCAP2 to the set of auxv entries that is passed to
a 32-bit ELF program running in 32-bit compat mode under a
64-bit kernel.

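The compat override follows the existing ELF_HWCAP pattern; a sketch of
the fs/compat_binfmt_elf.c hunk (shape assumed):

#ifdef	COMPAT_ELF_HWCAP2
#undef	ELF_HWCAP2
#define	ELF_HWCAP2		COMPAT_ELF_HWCAP2
#endif
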
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: remove unnecessary cache flush at boot
Mark Rutland [Wed, 14 Aug 2013 08:54:54 +0000 (09:54 +0100)]
arm64: remove unnecessary cache flush at boot

Currently we flush the entire dcache at boot within __cpu_setup, but
this is unnecessary as the booting protocol demands that the dcache is
invalid and off upon entering the kernel. The presence of the cache
flush only serves to hide bugs in bootloaders, and is not safe in the
presence of SMP.

In an SMP boot scenario the CPUs enter coherency outside of the kernel,
and the primary CPU enables its caches before bringing up secondary
CPUs. Therefore if any secondary CPU has an entry in its cache (in
violation of the boot protocol), the primary CPU might snoop it even if
the secondary CPU's cache is disabled. The boot-time cache flush only
serves to hide a firmware bug, and slows down a cpu boot unnecessarily.

This patch removes the unnecessary boot-time cache flush.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
[catalin.marinas@arm.com: make __flush_dcache_all local only]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  cpufreq: enable ARM drivers on arm64
Rob Herring [Mon, 24 Feb 2014 02:27:57 +0000 (02:27 +0000)]
cpufreq: enable ARM drivers on arm64

Enable cpufreq and power kconfig menus on arm64 along with arm cpufreq
drivers. The power menu is needed for OPP support. At least on Calxeda
systems, the same cpufreq driver is used for arm and arm64 based
systems.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: remove return value from psci_init()
Vladimir Murzin [Fri, 28 Feb 2014 09:57:33 +0000 (09:57 +0000)]
arm64: remove return value from psci_init()

psci_init() is written to return an error code if something goes wrong.
However, its single user, setup_arch(), doesn't care about it. Moreover,
every error path is supplied with a clear message, which is enough for
pleasant debugging.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: remove redundant "psci:" prefixes
Vladimir Murzin [Fri, 28 Feb 2014 09:57:47 +0000 (09:57 +0000)]
arm64: remove redundant "psci:" prefixes

Since 652af899799354049b273af897b798b8f03fdd88 ("arm64: factor out
spin-table boot method") a "psci:" prefix has been introduced in the
messages. We have a common pr_fmt, so clean these prefixes up.

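For reference, a common pr_fmt of this shape makes per-call prefixes
redundant (sketch):

#define pr_fmt(fmt)	"psci: " fmt
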
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Implement coherent DMA API based on swiotlb
Catalin Marinas [Tue, 21 May 2013 16:35:19 +0000 (17:35 +0100)]
arm64: Implement coherent DMA API based on swiotlb

This patch adds support for DMA API cache maintenance on SoCs without
hardware device cache coherency.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Use swiotlb late initialisation
Catalin Marinas [Thu, 27 Feb 2014 12:24:57 +0000 (12:24 +0000)]
arm64: Use swiotlb late initialisation

Since arm64 does not support ISA, there is no need for early swiotlb
initialisation. This patch switches the DMA mapping code to
swiotlb_late_init_with_default_size(). A side effect of this is that
GFP_DMA is used for the swiotlb buffer and devices with a 32-bit
coherent mask are correctly supported.

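A sketch of the late-init call described above (buffer size and ops
name assumed):

static int __init swiotlb_late_init(void)
{
	size_t swiotlb_size = min(SZ_64M, MAX_ORDER_NR_PAGES << PAGE_SHIFT);

	dma_ops = &noncoherent_swiotlb_dma_ops;

	return swiotlb_late_init_with_default_size(swiotlb_size);
}
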
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Conflicts:
arch/arm64/mm/init.c

Change-Id: I314ab5f954a7f609ac965609bffef281423d098b

10 years ago  arm64: Replace ZONE_DMA32 with ZONE_DMA
Catalin Marinas [Thu, 27 Feb 2014 12:09:22 +0000 (12:09 +0000)]
arm64: Replace ZONE_DMA32 with ZONE_DMA

On arm64 we do not have two DMA zones, so it does not make sense to
implement ZONE_DMA32. This patch replaces ZONE_DMA32 with ZONE_DMA, the
latter covering the 32-bit DMA address space to honour GFP_DMA
allocations.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: vdso: clean up vdso_pagelist initialization
Nathan Lynch [Tue, 11 Feb 2014 22:28:42 +0000 (22:28 +0000)]
arm64: vdso: clean up vdso_pagelist initialization

Remove some unnecessary bits that were apparently carried over from
another architecture's implementation:

- No need to get_page() the vdso text/data - these are part of the
  kernel image.
- No need for ClearPageReserved on the vdso text.
- No need to vmap the first text page to check the ELF header - this
  can be done through &vdso_start.

Also some minor cleanup:
- Use kcalloc for vdso_pagelist array allocation.
- Don't print on allocation failure, slab/slub will do that for us.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Change misleading function names in dma-mapping
Ritesh Harjani [Thu, 6 Feb 2014 11:51:51 +0000 (17:21 +0530)]
arm64: Change misleading function names in dma-mapping

The arm64_swiotlb_alloc/free_coherent names can be misleading at times,
with CMA support being enabled after this patch
(c2104debc235b745265b64d610237a6833fd53).

Change the names to something more generic:
__dma_alloc/free_coherent

Signed-off-by: Ritesh Harjani <ritesh.harjani@gmail.com>
[catalin.marinas@arm.com: renamed arm64_swiotlb_dma_ops to coherent_swiotlb_dma_ops]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Fix the soft_restart routine
Geoff Levand [Tue, 17 Dec 2013 00:19:29 +0000 (00:19 +0000)]
arm64: Fix the soft_restart routine

Change the soft_restart() routine to call cpu_reset() at its
identity-mapped physical address.

The cpu_reset() routine must be called at its identity-mapped physical
address so that when the MMU is turned off the instruction pointer will
be at the correct location in physical memory.

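A sketch of the fixed routine (helper names as in the surrounding code;
treat as illustrative):

void soft_restart(unsigned long addr)
{
	setup_mm_for_reboot();
	cpu_soft_restart(virt_to_phys(cpu_reset), addr);
	/* should never get here */
	BUG();
}
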
Signed-off-by: Geoff Levand <geoff@infradead.org> for Huawei, Linaro
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Extend the idmap to the whole kernel image
Catalin Marinas [Mon, 17 Feb 2014 12:03:25 +0000 (12:03 +0000)]
arm64: Extend the idmap to the whole kernel image

This patch changes the idmap page table creation during boot to cover
the whole kernel image, allowing functions like cpu_reset() to be safely
called with the physical address.

This patch also simplifies the create_block_map asm macro to no longer
take an idmap argument and always use the phys/virt/end parameters. For
the idmap case, phys == virt.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Convert asm/tlb.h to generic mmu_gather
Catalin Marinas [Tue, 11 Feb 2014 15:22:01 +0000 (15:22 +0000)]
arm64: Convert asm/tlb.h to generic mmu_gather

Over the past couple of years, the generic mmu_gather gained range
tracking - 597e1c3580b7 (mm/mmu_gather: enable tlb flush range in generic
mmu_gather), 2b047252d087 (Fix TLB gather virtual address range
invalidation corner cases) - and tlb_fast_mode() has been removed -
29eb77825cc7 (arch, mm: Remove tlb_fast_mode()).

The new mmu_gather structure is now suitable for arm64 and this patch
converts the arch asm/tlb.h to the generic code. One functional
difference is the shift_arg_pages() case where previously the code was
flushing the full mm (no tlb_start_vma call) but now it flushes the
range given to tlb_gather_mmu() (possibly slightly more efficient
previously).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
10 years ago  arm64: Extend the PCI I/O space to 16MB
Catalin Marinas [Tue, 4 Feb 2014 16:37:59 +0000 (16:37 +0000)]
arm64: Extend the PCI I/O space to 16MB

The patch moves the PCI I/O space (currently at 64K) before the
earlyprintk mapping and extends it to 16MB.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: enable processor debug state for secondary cpus
Vijaya Kumar K [Fri, 21 Feb 2014 05:13:49 +0000 (05:13 +0000)]
arm64: enable processor debug state for secondary cpus

The processor debug state PSTATE.D is unmasked in the clear_os_lock SMP
call for secondary CPUs, so the debug state is still masked in normal
kernel context. With this patch, the debug state is unmasked on
secondary boot for the CPUs in normal kernel context. KGDB tests now
pass on multicore.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: KGDB: Add KGDB config
Vijaya Kumar K [Tue, 28 Jan 2014 11:20:22 +0000 (11:20 +0000)]
arm64: KGDB: Add KGDB config

Add HAVE_ARCH_KGDB to the arm64 Kconfig.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  misc: debug: remove compilation warnings
Vijaya Kumar K [Tue, 28 Jan 2014 11:20:21 +0000 (16:50 +0530)]
misc: debug: remove compilation warnings

Typecast the instruction_pointer macro to unsigned long to
resolve compiler warnings like:
warning: format '%lx' expects argument of type 'long unsigned int',
but argument 2 has type 'u64' [-Wformat]

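The cast itself is trivial (sketch):

#define instruction_pointer(regs)	((unsigned long)(regs)->pc)
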
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  KGDB: make kgdb_breakpoint() noinline
Vijaya Kumar K [Tue, 28 Jan 2014 11:20:20 +0000 (16:50 +0530)]
KGDB: make kgdb_breakpoint() noinline

The function kgdb_breakpoint() sets up a break point at
compile time by calling arch_kgdb_breakpoint().
Though this call is surrounded by wmb() barriers,
the compiler can still reorder the break point,
because this scheduling barrier is not a code motion
barrier in gcc.

Making kgdb_breakpoint() noinline solves this problem of
code reordering around the break point instruction and also
avoids the problem of it being called as an inline function
from other places.

More details about the discussion can be found here:
http://comments.gmane.org/gmane.linux.ports.arm.kernel/269732

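The resulting function, roughly (sketch of kernel/debug/debug_core.c):

noinline void kgdb_breakpoint(void)
{
	atomic_inc(&kgdb_setting_breakpoint);
	wmb(); /* Sync point before breakpoint */
	arch_kgdb_breakpoint();
	wmb(); /* Sync point after breakpoint */
	atomic_dec(&kgdb_setting_breakpoint);
}
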
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: KGDB: Add step debugging support
Vijaya Kumar K [Tue, 28 Jan 2014 11:20:19 +0000 (11:20 +0000)]
arm64: KGDB: Add step debugging support

Add KGDB software step debugging support for EL1 debug
in AArch64 mode.

KGDB registers a step debug handler with the debug monitor.
On receiving a 'step' command from the GDB tool, the target
enables software step debugging and the step address is
updated in ELR.

Software step debugging is disabled when a 'continue' command
is received.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: KGDB: Add Basic KGDB support
Vijaya Kumar K [Tue, 28 Jan 2014 11:20:18 +0000 (16:50 +0530)]
arm64: KGDB: Add Basic KGDB support

Add KGDB debug support for kernel debugging.
With this patch, basic KGDB debugging is possible: the GDB
register layout is updated and the GDB tool can establish a
connection with the target and can set/clear breakpoints.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Add macros to manage processor debug state
Vijaya Kumar K [Tue, 28 Jan 2014 11:20:17 +0000 (11:20 +0000)]
arm64: Add macros to manage processor debug state

Add macros to enable and disable PSTATE.D to manage the processor
debug state. The macros local_dbg_save and local_dbg_restore are
moved to the irqflags.h file.

KGDB boot tests fail because PSTATE.D is masked; unmask it for
debugging support.

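A sketch of the save macro, mirroring the existing irqflags helpers
(PSTATE.D is DAIF bit 3, hence the #8 mask):

#define local_dbg_save(flags)						\
	do {								\
		typecheck(unsigned long, flags);			\
		asm volatile(						\
		"mrs	%0, daif		// local_dbg_save\n"	\
		"msr	daifset, #8"					\
		: "=r" (flags) : : "memory");				\
	} while (0)
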
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: KVM: Add VGIC device control for arm64
Christoffer Dall [Sun, 2 Feb 2014 21:41:02 +0000 (13:41 -0800)]
arm64: KVM: Add VGIC device control for arm64

This fixes the build breakage introduced by
c07a0191ef2de1f9510f12d1f88e3b0b5cd8d66f and adds support for the device
control API and save/restore of the VGIC state for ARMv8.

The defines were simply missing from the arm64 header files and
uaccess.h must be implicitly imported from somewhere else on arm.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 years ago  arm64: defconfig: Expand default enabled features
Mark Rutland [Fri, 7 Feb 2014 17:12:45 +0000 (17:12 +0000)]
arm64: defconfig: Expand default enabled features

FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
Versatile Express. As these attach to a Motherboard Express V2M-P1 it
would be useful to have support for some V2M-P1 peripherals enabled by
default.

Additionally, a couple of features have been introduced since the last
defconfig update (CMA, jump labels) that would be good to have enabled
by default to ensure they are built and boot tested.

This patch updates the arm64 defconfig to enable support for these
devices and features. The arm64 Kconfig is modified to select
HAVE_PATA_PLATFORM, which is required to enable support for the
CompactFlash controller on the V2M-P1.

A few options which don't need to appear in defconfig are trimmed:

* BLK_DEV - selected by default
* EXPERIMENTAL - otherwise gone from the kernel
* MII - selected by drivers which require it
* USB_SUPPORT - selected by default

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: asm: remove redundant "cc" clobbers
Will Deacon [Tue, 4 Feb 2014 12:29:13 +0000 (12:29 +0000)]
arm64: asm: remove redundant "cc" clobbers

cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.

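For example, an exclusive-load/store loop of this shape needs no "cc"
clobber, since cbnz neither reads nor writes the flags (sketch):

static inline void atomic_add(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	asm volatile("// atomic_add\n"
"1:	ldxr	%w0, %2\n"
"	add	%w0, %w0, %w3\n"
"	stxr	%w1, %w0, %2\n"
"	cbnz	%w1, 1b"
	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
	: "Ir" (i));
}
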
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: atomics: fix use of acquire + release for full barrier semantics
Will Deacon [Tue, 4 Feb 2014 12:29:12 +0000 (12:29 +0000)]
arm64: atomics: fix use of acquire + release for full barrier semantics

Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

// A, B, C are independent memory locations

<Access [A]>

// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b

<Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

<Access [A]>

// atomic_op (B)
dmb ish // Full barrier
1: ldxr x0, [B] // Exclusive load
<op(B)>
stxr w1, x0, [B] // Exclusive store
cbnz w1, 1b
dmb ish // Full barrier

<Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

<Access [A]>

// atomic_op (B)
1: ldxr x0, [B] // Exclusive load
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
dmb ish // Full barrier

<Access [C]>

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: barriers: allow dsb macro to take option parameter
Will Deacon [Thu, 6 Feb 2014 11:30:48 +0000 (11:30 +0000)]
arm64: barriers: allow dsb macro to take option parameter

The dsb instruction takes an option specifying both the target access
types and shareability domain.

This patch allows such an option to be passed to the dsb macro,
resulting in potentially more efficient code. Currently the option is
ignored until all callers are updated (unlike ARM, the option is
mandated by the assembler).

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: compat: Wire up new AArch32 syscalls
Catalin Marinas [Wed, 5 Feb 2014 12:03:52 +0000 (12:03 +0000)]
arm64: compat: Wire up new AArch32 syscalls

This patch enables sys_compat, sys_finit_module, sys_sched_setattr and
sys_sched_getattr for compat (AArch32) applications.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: simplify pgd_alloc
Mark Rutland [Wed, 5 Feb 2014 10:24:13 +0000 (10:24 +0000)]
arm64: simplify pgd_alloc

Currently pgd_alloc has a redundant NULL check in its return path that
can be removed with no ill effects. With that removed it's also possible
to return early and eliminate the new_pgd temporary variable.

This patch applies said modifications, making the logic of pgd_alloc
correspond 1-1 with that of pgd_free.

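The simplified allocator then mirrors pgd_free (sketch):

pgd_t *pgd_alloc(struct mm_struct *mm)
{
	if (PGD_SIZE == PAGE_SIZE)
		return (pgd_t *)get_zeroed_page(GFP_KERNEL);
	else
		return kzalloc(PGD_SIZE, GFP_KERNEL);
}
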
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: fix typo: s/SERRROR/SERROR/
Mark Rutland [Wed, 5 Feb 2014 10:24:12 +0000 (10:24 +0000)]
arm64: fix typo: s/SERRROR/SERROR/

Somehow SERROR has acquired an additional 'R' in a couple of headers.
This patch removes them before they spread further. As neither instance
is in use yet, no other sites need to be fixed up.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Align CMA sizes to PAGE_SIZE
Laura Abbott [Tue, 4 Feb 2014 23:08:57 +0000 (23:08 +0000)]
arm64: Align CMA sizes to PAGE_SIZE

dma_alloc_from_contiguous takes a number of pages for a size.
Align the DMA size passed in up to PAGE_SIZE to avoid truncation
and allocation failures for sizes less than PAGE_SIZE.

Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: mm: Introduce PTE_WRITE
Steve Capper [Wed, 15 Jan 2014 14:07:13 +0000 (14:07 +0000)]
arm64: mm: Introduce PTE_WRITE

We have the following means for encoding writable or dirty ptes:

                                PTE_DIRTY       PTE_RDONLY
!pte_dirty && !pte_write        0               1
!pte_dirty && pte_write         0               1
pte_dirty && !pte_write         1               1
pte_dirty && pte_write          1               0

So we can't distinguish between writable clean ptes and read only
ptes. This can cause problems with ptes being incorrectly flagged as
read only when they are writable but not dirty.

This patch introduces a new software bit PTE_WRITE which allows us to
correctly identify writable ptes. PTE_RDONLY is now only clear for
valid ptes where a page is both writable and dirty.

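With PTE_WRITE available, the hardware read-only bit can be derived at
set_pte_at() time; a sketch of that logic (shape assumed):

static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
			      pte_t *ptep, pte_t pte)
{
	if (pte_valid_user(pte)) {
		if (pte_exec(pte))
			__sync_icache_dcache(pte, addr);
		/* only writable *and* dirty ptes are hardware-writable */
		if (pte_dirty(pte) && pte_write(pte))
			pte_val(pte) &= ~PTE_RDONLY;
		else
			pte_val(pte) |= PTE_RDONLY;
	}
	set_pte(ptep, pte);
}
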
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: mm: Remove PTE_BIT_FUNC macro
Steve Capper [Wed, 15 Jan 2014 14:07:12 +0000 (14:07 +0000)]
arm64: mm: Remove PTE_BIT_FUNC macro

Expand out the pte manipulation functions. This makes our life easier
when using things like tags and cscope.

Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: FIQs are unused
Nicolas Pitre [Wed, 29 Jan 2014 18:00:45 +0000 (18:00 +0000)]
arm64: FIQs are unused

So any FIQ handling is superfluous at the moment. The functions to
disable/enable FIQs are kept around in case someone needs them in the
future, but existing call sites, including arch_cpu_idle_prepare(),
may go for now.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: mm: fix the function name in comment of cpu_do_switch_mm
Jingoo Han [Mon, 27 Jan 2014 07:19:32 +0000 (07:19 +0000)]
arm64: mm: fix the function name in comment of cpu_do_switch_mm

Fix the function name in the comment of cpu_do_switch_mm,
because cpu_do_switch_mm is the correct name.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: fix build error if DMA_CMA is enabled
Pankaj Dubey [Fri, 24 Jan 2014 08:23:08 +0000 (08:23 +0000)]
arm64: fix build error if DMA_CMA is enabled

arm64/include/asm/dma-contiguous.h is trying to include
<asm-generic/dma-contiguous.h>, which does not exist, thus failing the
build for arm64 if we enable CONFIG_DMA_CMA. This patch fixes the build
error by removing the unwanted header inclusion from arm64's
dma-contiguous.h.

Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Signed-off-by: Somraj Mani <somraj.mani@samsung.com>
Acked-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: kernel: fix per-cpu offset restore on resume
Lorenzo Pieralisi [Fri, 24 Jan 2014 10:56:19 +0000 (10:56 +0000)]
arm64: kernel: fix per-cpu offset restore on resume

The introduction of percpu offset optimisation through tpidr_el1 in:

Commit id: 7158627686f02319c50c8d9d78f75d4c8
"arm64: percpu: implement optimised pcpu access using tpidr_el1"

requires cpu_{suspend/resume} to restore the tpidr_el1 register upon resume
so that percpu variables can be addressed correctly when a CPU comes out
of reset from warm-boot.

This patch fixes cpu_{suspend}/{resume} tpidr_el1 restoration on resume, by
calling the set_my_cpu_offset C API, as it is done on primary and secondary
CPUs on cold boot, so that, even if the register used to store the percpu
offset is changed, the save and restore of general purpose registers does not
have to be updated.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: mm: fix the function name in comment of __flush_dcache_area
Jingoo Han [Tue, 21 Jan 2014 01:17:47 +0000 (01:17 +0000)]
arm64: mm: fix the function name in comment of __flush_dcache_area

Fix the function name in the comment of __flush_dcache_area,
because __flush_dcache_area is the correct name. Also, the
missing variable 'size' is added to the comment.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: mm: use ubfm for dcache_line_size
Jingoo Han [Mon, 20 Jan 2014 05:00:21 +0000 (05:00 +0000)]
arm64: mm: use ubfm for dcache_line_size

Use 'ubfm' for the bitfield move instruction; thus, a single
instruction can be used instead of two when getting the minimum
D-cache line size from the CTR_EL0 register.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm/arm64: kvm: Use virt_to_idmap instead of virt_to_phys for idmap mappings
Santosh Shilimkar [Tue, 19 Nov 2013 19:59:12 +0000 (14:59 -0500)]
arm/arm64: kvm: Use virt_to_idmap instead of virt_to_phys for idmap mappings

KVM initialisation fails on architectures implementing virt_to_idmap()
because virt_to_phys() on such architectures won't fetch you the correct
idmap page.

So update the KVM ARM code to use the virt_to_idmap() to fix the issue.
Since the KVM code is shared between arm and arm64, we create
kvm_virt_to_phys() and handle the redirection in respective headers.

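The per-architecture redirection is a pair of small defines; a sketch
(exact forms assumed):

/* arch/arm: the idmap may differ from the linear map */
#define kvm_virt_to_phys(x)	virt_to_idmap((unsigned long)(x))

/* arch/arm64: the idmap is the physical address */
#define kvm_virt_to_phys(x)	__virt_to_phys((unsigned long)(x))
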
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
10 years ago  arm64: KVM: Force undefined exception for Guest SMC instructions
Anup Patel [Thu, 12 Dec 2013 16:12:23 +0000 (16:12 +0000)]
arm64: KVM: Force undefined exception for Guest SMC instructions

The SMC-based PSCI emulation for the Guest is going to be very different
from the in-kernel HVC-based PSCI emulation, hence for now just inject
an undefined exception when the Guest executes an SMC instruction.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
10 years ago  arm64: KVM: Support X-Gene guest VCPU on APM X-Gene host
Anup Patel [Thu, 14 Nov 2013 15:20:08 +0000 (15:20 +0000)]
arm64: KVM: Support X-Gene guest VCPU on APM X-Gene host

This patch allows us to have an X-Gene guest VCPU when using KVM arm64
on an APM X-Gene host.

We add KVM_ARM_TARGET_XGENE_POTENZA for the X-Gene Potenza compatible
guest VCPU and return KVM_ARM_TARGET_XGENE_POTENZA from kvm_target_cpu()
when running on an X-Gene host with a Potenza core.

[maz: sanitized the commit log]

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
10 years ago  arm64: KVM: Add Kconfig option for max VCPUs per-Guest
Anup Patel [Thu, 12 Dec 2013 16:12:22 +0000 (16:12 +0000)]
arm64: KVM: Add Kconfig option for max VCPUs per-Guest

The current max VCPUs per-Guest is set to 4, which prevents us from
creating a Guest (or VM) with 8 VCPUs on a Host (e.g. the X-Gene Storm
SoC) with 8 Host CPUs.

The correct value of max VCPUs per-Guest should be the same as the max
CPUs supported by GICv2, which is 8, but increasing the max VCPUs
per-Guest can make things slower, hence we add a Kconfig option to let
KVM users select an appropriate max VCPUs per-Guest.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
10 years ago  ARM/KVM: save and restore generic timer registers
Andre Przywara [Fri, 13 Dec 2013 13:23:26 +0000 (14:23 +0100)]
ARM/KVM: save and restore generic timer registers

For migration to work we need to save (and later restore) the state of
each core's virtual generic timer.
Since this is per VCPU, we can use the [gs]et_one_reg ioctl and export
the three needed registers (control, counter, compare value).
Though they live in cp15 space, we don't use the existing list, since
they need special accessor functions and the arch timer is optional.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
10 years ago  arm64: fix typo in entry.S
Neil Zhang [Mon, 13 Jan 2014 08:57:56 +0000 (08:57 +0000)]
arm64: fix typo in entry.S

Commit 64681787 (arm64: let the core code deal with preempt_count)
changed the code but left the comments unchanged; fix them.

Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: kernel: restore HW breakpoint registers in cpu_suspend
Lorenzo Pieralisi [Fri, 10 Jan 2014 13:15:05 +0000 (13:15 +0000)]
arm64: kernel: restore HW breakpoint registers in cpu_suspend

When a CPU resumes from low-power, it restores HW breakpoint and
watchpoint slots through a CPU PM notifier. Since we want to enable
debugging as early as possible in the resume path, the mdscr content
is restored along the general purpose registers in the cpu_suspend API
and debug exceptions are reenabled when cpu_suspend returns. Since the
CPU PM notifier is run after a CPU has been resumed, we cannot expect
HW breakpoint registers to contain sane values till the notifier is run,
since the HW breakpoint registers' content is unknown at reset; this means
that the CPU might run with debug exceptions enabled, mdscr restored but HW
breakpoint registers containing junk values that can trigger spurious
debug exceptions.

This patch fixes the current HW breakpoint restore by moving the HW breakpoint
registers restoration to the cpu_suspend API, before the debug exceptions are
enabled. This way, as soon as the cpu_suspend function returns the
kernel can resume debugging with sane values in HW breakpoint registers.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  jump_label: use defined macros instead of hard-coding for better readability
Jiang Liu [Tue, 7 Jan 2014 14:17:14 +0000 (22:17 +0800)]
jump_label: use defined macros instead of hard-coding for better readability

Use macro JUMP_LABEL_TRUE_BRANCH instead of hard-coding for better
readability.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64, jump label: optimize jump label implementation
Jiang Liu [Tue, 7 Jan 2014 14:17:13 +0000 (22:17 +0800)]
arm64, jump label: optimize jump label implementation

Optimize jump label implementation for ARM64 by dynamically patching
kernel text.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64, jump label: detect %c support for ARM64
Jiang Liu [Tue, 7 Jan 2014 14:17:12 +0000 (22:17 +0800)]
arm64, jump label: detect %c support for ARM64

As with commit a9468f30b5eac6 ("ARM: 7333/2: jump label: detect %c
support for ARM"), this patch detects the same thing for ARM64,
because some ARM64 GCC versions have the same issue.

Some versions of ARM64 GCC which do support asm goto do not
support the %c specifier. Since we need the %c to support jump
labels on ARM64, detect that too in the asm goto detection script
to avoid build errors with these versions.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions
Jiang Liu [Tue, 7 Jan 2014 14:17:11 +0000 (22:17 +0800)]
arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions

Introduce aarch64_insn_gen_{nop|branch_imm}() helper functions, which
will be used to implement jump label on ARM64.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: move encode_insn_immediate() from module.c to insn.c
Jiang Liu [Tue, 7 Jan 2014 14:17:10 +0000 (22:17 +0800)]
arm64: move encode_insn_immediate() from module.c to insn.c

The encode_insn_immediate() function will be used by other
instruction-manipulation functions, so move it into insn.c and
rename it aarch64_insn_encode_immediate().

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: introduce interfaces to hotpatch kernel and module code
Jiang Liu [Tue, 7 Jan 2014 14:17:09 +0000 (22:17 +0800)]
arm64: introduce interfaces to hotpatch kernel and module code

Introduce three interfaces to patch kernel and module code:

aarch64_insn_patch_text_nosync():
patch code without synchronization; it's the caller's responsibility
to synchronize all CPUs if needed.
aarch64_insn_patch_text_sync():
patch code and always synchronize with stop_machine().
aarch64_insn_patch_text():
patch code and synchronize with stop_machine() if needed.

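A hedged usage sketch of these interfaces (instruction generation via
the helpers introduced earlier in this series):

void patch_nop_example(void *addr)
{
	u32 insn = aarch64_insn_gen_nop();

	/* patches one instruction, using stop_machine() if required */
	aarch64_insn_patch_text(&addr, &insn, 1);
}
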
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: introduce basic aarch64 instruction decoding helpers
Jiang Liu [Tue, 7 Jan 2014 14:17:08 +0000 (22:17 +0800)]
arm64: introduce basic aarch64 instruction decoding helpers

Introduce the basic aarch64 instruction decoding helpers
aarch64_get_insn_class() and aarch64_insn_hotpatch_safe().

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: dts: Reduce size of virtio block device for foundation model
Mark Brown [Fri, 20 Dec 2013 12:52:44 +0000 (12:52 +0000)]
arm64: dts: Reduce size of virtio block device for foundation model

Will Deacon observed that kvmtool uses a size of 0x200 for the virtio
block memory region, and that the virtio block spec only uses 31 bytes
in the device-specific region at 0x100, so reduce the region to a less
wasteful 0x200.

Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Remove unused __data_loc variable
Geoff Levand [Sat, 14 Dec 2013 00:20:13 +0000 (00:20 +0000)]
arm64: Remove unused __data_loc variable

The __data_loc variable is an unused leftover from the 32-bit ARM
implementation. Remove the variable and adjust the __mmap_switched
startup routine accordingly.

Signed-off-by: Geoff Levand <geoff@infradead.org> for Huawei, Linaro
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  clockevents: Implement unbind functionality
Thomas Gleixner [Thu, 25 Apr 2013 20:31:50 +0000 (20:31 +0000)]
clockevents: Implement unbind functionality

Provide a sysfs interface to allow unbinding of clockevent
devices. The device is unbound if it is unused or if there is a
replacement device available. Unbinding of broadcast devices is not
supported as we don't want to foster that nonsense. If no replacement
device is available the unbind returns -EBUSY. Unbind is available
from the kernel and through sysfs, which is necessary to drop the
module refcount.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130425143436.499216659@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
10 years ago  clockevents: Define CS_NAME_LEN unconditionally
Thomas Gleixner [Tue, 28 May 2013 07:28:02 +0000 (09:28 +0200)]
clockevents: Define CS_NAME_LEN unconditionally

Unbreak architectures which do not use clockevents but need to build
some of the core timekeeping infrastructure.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
10 years ago  ARM: arch_timer: add support to configure and enable event stream
Sudeep KarkadaNagesha [Tue, 13 Aug 2013 13:30:32 +0000 (14:30 +0100)]
ARM: arch_timer: add support to configure and enable event stream

This patch adds support for configuring the event stream frequency
and enabling it.

It also adds the hwcap definitions so that the user can detect this
event stream feature.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Conflicts:
arch/arm/include/asm/arch_timer.h
arch/arm/include/uapi/asm/hwcap.h
arch/arm/kernel/setup.c

10 years ago  arm64: Enable CMA
Laura Abbott [Thu, 12 Dec 2013 19:28:33 +0000 (19:28 +0000)]
arm64: Enable CMA

arm64 targets need the features CMA provides. Add the appropriate
hooks, header files, and Kconfig entries to allow this to happen.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Warn on NULL device structure for dma APIs
Laura Abbott [Thu, 12 Dec 2013 19:28:32 +0000 (19:28 +0000)]
arm64: Warn on NULL device structure for dma APIs

Although parts of the DMA APIs may properly check for NULL devices,
there may be some places that don't. Rather than fix up all the
possible locations, just require a non-NULL device structure to be
used for allocating/freeing.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[catalin.marinas@arm.com: s/WARN/WARN_ONCE/]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Add hwcaps for crypto and CRC32 extensions.
Steve Capper [Mon, 16 Dec 2013 21:04:36 +0000 (21:04 +0000)]
arm64: Add hwcaps for crypto and CRC32 extensions.

Advertise the optional cryptographic and CRC32 instructions to
user space where present. Several hwcap bits [3-7] are allocated.

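The new bits, sketched (values assumed to follow the existing hwcap
numbering):

#define HWCAP_AES		(1 << 3)
#define HWCAP_PMULL		(1 << 4)
#define HWCAP_SHA1		(1 << 5)
#define HWCAP_SHA2		(1 << 6)
#define HWCAP_CRC32		(1 << 7)
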
Signed-off-by: Steve Capper <steve.capper@linaro.org>
[bit 2 is taken now so use bits 3-7 instead]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: drop redundant macros from read_cpuid()
Ard Biesheuvel [Mon, 16 Dec 2013 21:04:35 +0000 (21:04 +0000)]
arm64: drop redundant macros from read_cpuid()

asm/cputype.h contains a bunch of #defines for CPU id registers
that essentially map to themselves. Remove the #defines and pass
the tokens directly to the inline asm() that reads the registers.

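After the change, read_cpuid() takes the register token directly; a
sketch:

#define read_cpuid(reg) ({						\
	u64 __val;							\
	asm("mrs	%0, " #reg : "=r" (__val));			\
	__val;								\
})
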
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: Remove outdated comment
Liviu Dudau [Tue, 17 Dec 2013 18:19:46 +0000 (18:19 +0000)]
arm64: Remove outdated comment

Code referenced in the comment has moved to arch/arm64/kernel/cputable.c

Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: cmpxchg: update macros to prevent warnings
Mark Hambleton [Tue, 3 Dec 2013 19:19:12 +0000 (19:19 +0000)]
arm64: cmpxchg: update macros to prevent warnings

Make sure the value we are going to return is referenced in order to
avoid warnings from newer GCCs such as:

arch/arm64/include/asm/cmpxchg.h:162:3: warning: value computed is not used [-Wunused-value]
  ((__typeof__(*(ptr)))__cmpxchg_mb((ptr),   \
   ^
net/netfilter/nf_conntrack_core.c:674:2: note: in expansion of macro ‘cmpxchg’
  cmpxchg(&nf_conntrack_hash_rnd, 0, rand);

[Modified to use the current underlying implementation as current
mainline for both cmpxchg() and cmpxchg_local() does -- broonie]

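A sketch of the warning-free shape, where the computed value is
assigned to a typed temporary and therefore always referenced:

#define cmpxchg(ptr, o, n) \
({ \
	__typeof__(*(ptr)) __ret; \
	__ret = (__typeof__(*(ptr))) \
		__cmpxchg_mb((ptr), (unsigned long)(o), (unsigned long)(n), \
			     sizeof(*(ptr))); \
	__ret; \
})
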
Signed-off-by: Mark Hambleton <mahamble@broadcom.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years ago  arm64: support single-step and breakpoint handler hooks
Sandeepa Prabhu [Wed, 4 Dec 2013 05:50:20 +0000 (05:50 +0000)]
arm64: support single-step and breakpoint handler hooks

AArch64 single-step and breakpoint debug exceptions will be
used by multiple debug frameworks like kprobes & kgdb.

This patch implements the hooks for those frameworks to register
their own handlers for handling breakpoint and single-step events.

The debug exception handler in entry.S is reworked: do_dbg now routes
the software breakpoint (BRK64) exception to do_debug_exception().

Signed-off-by: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoARM64: fix framepointer check in unwind_frame
Konstantin Khlebnikov [Thu, 5 Dec 2013 13:30:16 +0000 (13:30 +0000)]
ARM64: fix framepointer check in unwind_frame

We need at least 24 bytes above frame pointer.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoARM64: check stack pointer in get_wchan
Konstantin Khlebnikov [Thu, 5 Dec 2013 13:30:10 +0000 (13:30 +0000)]
ARM64: check stack pointer in get_wchan

get_wchan() is lockless. Task may wakeup at any time and change its own stack,
thus each next stack frame may be overwritten and filled with random stuff.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoarm64: kconfig: select HAVE_EFFICIENT_UNALIGNED_ACCESS
Will Deacon [Mon, 16 Dec 2013 17:50:08 +0000 (17:50 +0000)]
arm64: kconfig: select HAVE_EFFICIENT_UNALIGNED_ACCESS

ARMv8 CPUs can perform efficient unaligned memory accesses in hardware
and this feature is relied up on by code such as the dcache
word-at-a-time name hashing.

This patch selects HAVE_EFFICIENT_UNALIGNED_ACCESS for arm64.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoarm64: dcache: select DCACHE_WORD_ACCESS for little-endian CPUs
Will Deacon [Wed, 6 Nov 2013 19:32:13 +0000 (19:32 +0000)]
arm64: dcache: select DCACHE_WORD_ACCESS for little-endian CPUs

DCACHE_WORD_ACCESS uses the word-at-a-time API for optimised string
comparisons in the vfs layer.

This patch implements support for load_unaligned_zeropad in much the
same way as has been done for ARM, although big-endian systems are also
supported.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Conflicts:
arch/arm64/Kconfig

10 years agoarm64: futex: ensure .fixup entries are sufficiently aligned
Will Deacon [Wed, 6 Nov 2013 19:31:24 +0000 (19:31 +0000)]
arm64: futex: ensure .fixup entries are sufficiently aligned

AArch64 instructions must be 4-byte aligned, so make sure this is true
for the futex .fixup section.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoarm64: use generic strnlen_user and strncpy_from_user functions
Will Deacon [Wed, 6 Nov 2013 17:20:22 +0000 (17:20 +0000)]
arm64: use generic strnlen_user and strncpy_from_user functions

This patch implements the word-at-a-time interface for arm64 using the
same algorithm as ARM. We use the fls64 macro, which expands to a clz
instruction via a compiler builtin. Big-endian configurations make use
of the implementation from asm-generic.

With this implemented, we can replace our byte-at-a-time strnlen_user
and strncpy_from_user functions with the optimised generic versions.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoarm64: percpu: implement optimised pcpu access using tpidr_el1
Will Deacon [Tue, 5 Nov 2013 18:10:47 +0000 (18:10 +0000)]
arm64: percpu: implement optimised pcpu access using tpidr_el1

This patch implements optimised percpu variable accesses using the
el1 r/w thread register (tpidr_el1) along the same lines as arch/arm/.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoarm64: perf: add support for percpu pmu interrupt
Vinayak Kale [Wed, 4 Dec 2013 10:09:51 +0000 (10:09 +0000)]
arm64: perf: add support for percpu pmu interrupt

Add support for irq registration when pmu interrupt is percpu.

Signed-off-by: Vinayak Kale <vkale@apm.com>
Signed-off-by: Tuan Phan <tphan@apm.com>
[will: tidied up cross-calling to pass &irq]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agogenirq: Add an accessor for IRQ_PER_CPU flag
Vinayak Kale [Wed, 4 Dec 2013 10:09:50 +0000 (10:09 +0000)]
genirq: Add an accessor for IRQ_PER_CPU flag

This patch adds an accessor function for IRQ_PER_CPU flag.
The accessor function is useful to determine whether an IRQ is percpu or not.

This patch is based on an older patch posted by Chris Smith here [1].
There is a minor change w.r.t. Chris's original patch: The accessor function
is renamed as 'irq_is_percpu' instead of 'irq_is_per_cpu'.

[1]: http://lkml.indiana.edu/hypermail/linux/kernel/1207.3/02955.html

Signed-off-by: Chris Smith <chris.smith@st.com>
Signed-off-by: Vinayak Kale <vkale@apm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agoarm64: vmlinux.lds.S: drop redundant .comment
Mark Rutland [Thu, 12 Dec 2013 12:34:05 +0000 (12:34 +0000)]
arm64: vmlinux.lds.S: drop redundant .comment

We currently try to emit .comment twice, once in STABS_DEBUG, and once
in the line immediately following it. As the two section definitions are
identical, the latter is redundant and can be dropped.

This patch drops the redundant .comment section definition.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
10 years agodrm: Update drm_addmap and drm_mmap to use PAT WC instead of MTRRs
Andy Lutomirski [Mon, 13 May 2013 23:58:42 +0000 (23:58 +0000)]
drm: Update drm_addmap and drm_mmap to use PAT WC instead of MTRRs

Previously, DRM_FRAME_BUFFER mappings, as well as DRM_REGISTERS
mappings with DRM_WRITE_COMBINING set, resulted in an unconditional
MTRR being added but the actual mappings being created as UC-.

Now these mappings have the MTRR added only if needed, but they will
be mapped with pgprot_writecombine.

The non-WC DRM_REGISTERS case now uses pgprot_noncached instead of
hardcoding the bit twiddling.

The DRM_AGP case is unchanged for now.

[airlied: fix ppc build]
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Conflicts:
drivers/gpu/drm/drm_vm.c

10 years agoAdd arch_phys_wc_{add, del} to manipulate WC MTRRs if needed
Andy Lutomirski [Mon, 13 May 2013 23:58:40 +0000 (23:58 +0000)]
Add arch_phys_wc_{add, del} to manipulate WC MTRRs if needed

Several drivers currently use mtrr_add through various #ifdef guards
and/or drm wrappers.  The vast majority of them want to add WC MTRRs
on x86 systems and don't actually need the MTRR if PAT (i.e.
ioremap_wc, etc) are working.

arch_phys_wc_add and arch_phys_wc_del are new functions, available
on all architectures and configurations, that add WC MTRRs on x86 if
needed (and handle errors) and do nothing at all otherwise.  They're
also easier to use than mtrr_add and mtrr_del, so the call sites can
be simplified.

As an added benefit, this will avoid wasting MTRRs and possibly
warning pointlessly on PAT-supporting systems.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>