platform/kernel/linux-rpi.git
3 years agoMerge branch 'for-next/tlbi' into for-next/core
Catalin Marinas [Fri, 31 Jul 2020 17:09:50 +0000 (18:09 +0100)]
Merge branch 'for-next/tlbi' into for-next/core

* for-next/tlbi:
  : Support for TTL (translation table level) hint in the TLB operations
  arm64: tlb: Use the TLBI RANGE feature in arm64
  arm64: enable tlbi range instructions
  arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
  arm64: tlb: don't set the ttl value in flush_tlb_page_nosync
  arm64: Shift the __tlbi_level() indentation left
  arm64: tlb: Set the TTL field in flush_*_tlb_range
  arm64: tlb: Set the TTL field in flush_tlb_range
  tlb: mmu_gather: add tlb_flush_*_range APIs
  arm64: Add tlbi_user_level TLB invalidation helper
  arm64: Add level-hinted TLB invalidation helper
  arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
  arm64: Detect the ARMv8.4 TTL feature

3 years agoMerge branches 'for-next/misc', 'for-next/vmcoreinfo', 'for-next/cpufeature', 'for...
Catalin Marinas [Fri, 31 Jul 2020 17:09:39 +0000 (18:09 +0100)]
Merge branches 'for-next/misc', 'for-next/vmcoreinfo', 'for-next/cpufeature', 'for-next/acpi', 'for-next/perf', 'for-next/timens', 'for-next/msi-iommu' and 'for-next/trivial' into for-next/core

* for-next/misc:
  : Miscellaneous fixes and cleanups
  arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
  arm64/mm: save memory access in check_and_switch_context() fast switch path
  recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.
  arm64: Reserve HWCAP2_MTE as (1 << 18)
  arm64/entry: deduplicate SW PAN entry/exit routines
  arm64: s/AMEVTYPE/AMEVTYPER
  arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs
  arm64: stacktrace: Move export for save_stack_trace_tsk()
  smccc: Make constants available to assembly
  arm64/mm: Redefine CONT_{PTE, PMD}_SHIFT
  arm64/defconfig: Enable CONFIG_KEXEC_FILE
  arm64: Document sysctls for emulated deprecated instructions
  arm64/panic: Unify all three existing notifier blocks
  arm64/module: Optimize module load time by optimizing PLT counting

* for-next/vmcoreinfo:
  : Export the virtual and physical address sizes in vmcoreinfo
  arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
  crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

* for-next/cpufeature:
  : CPU feature handling cleanups
  arm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[]
  arm64/cpufeature: Replace all open bits shift encodings with macros
  arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register

* for-next/acpi:
  : ACPI updates for arm64
  arm64/acpi: disallow writeable AML opregion mapping for EFI code regions
  arm64/acpi: disallow AML memory opregions to access kernel memory

* for-next/perf:
  : perf updates for arm64
  arm64: perf: Expose some new events via sysfs
  tools headers UAPI: Update tools's copy of linux/perf_event.h
  arm64: perf: Add cap_user_time_short
  perf: Add perf_event_mmap_page::cap_user_time_short ABI
  arm64: perf: Only advertise cap_user_time for arch_timer
  arm64: perf: Implement correct cap_user_time
  time/sched_clock: Use raw_read_seqcount_latch()
  sched_clock: Expose struct clock_read_data
  arm64: perf: Correct the event index in sysfs
  perf/smmuv3: To simplify code for ioremap page in pmcg

* for-next/timens:
  : Time namespace support for arm64
  arm64: enable time namespace support
  arm64/vdso: Restrict splitting VVAR VMA
  arm64/vdso: Handle faults on timens page
  arm64/vdso: Add time namespace page
  arm64/vdso: Zap vvar pages when switching to a time namespace
  arm64/vdso: use the fault callback to map vvar pages

* for-next/msi-iommu:
  : Make the MSI/IOMMU input/output ID translation PCI agnostic, augment the
  : MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID bus-specific parameter
  : and apply the resulting changes to the device ID space provided by the
  : Freescale FSL bus
  bus: fsl-mc: Add ACPI support for fsl-mc
  bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
  of/irq: Make of_msi_map_rid() PCI bus agnostic
  of/irq: make of_msi_map_get_device_domain() bus agnostic
  dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
  of/device: Add input id to of_dma_configure()
  of/iommu: Make of_map_rid() PCI agnostic
  ACPI/IORT: Add an input ID to acpi_dma_configure()
  ACPI/IORT: Remove useless PCI bus walk
  ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
  ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
  ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC

* for-next/trivial:
  : Trivial fixes
  arm64: sigcontext.h: delete duplicated word
  arm64: ptrace.h: delete duplicated word
  arm64: pgtable-hwdef.h: delete duplicated words

3 years agoarm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
Maninder Singh [Fri, 31 Jul 2020 11:49:50 +0000 (17:19 +0530)]
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack

IRQ_STACK_SIZE can be made different from THREAD_SIZE,
and as IRQ_STACK_SIZE is used while irq stack allocation,
same define should be used while printing information of irq stack.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1596196190-14141-1-git-send-email-maninder1.s@samsung.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/mm: save memory access in check_and_switch_context() fast switch path
Pingfan Liu [Fri, 10 Jul 2020 14:04:12 +0000 (22:04 +0800)]
arm64/mm: save memory access in check_and_switch_context() fast switch path

On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable,
using the per-cpu offset stored in the tpidr_el1 system register. In
some cases we generate a per-cpu address with a sequence like:

  cpu_ptr = &per_cpu(ptr, smp_processor_id());

Which potentially incurs a cache miss for both `cpu_number` and the
in-memory `__per_cpu_offset` array. This can be written more optimally
as:

  cpu_ptr = this_cpu_ptr(ptr);

Which only needs the offset from tpidr_el1, and does not need to
load from memory.

The following two test cases show a small performance improvement measured
on a 46-cpus qualcomm machine with 5.8.0-rc4 kernel.

Test 1: (about 0.3% improvement)
    #cat b.sh
    make clean && make all -j138
    #perf stat --repeat 10 --null --sync sh b.sh

    - before this patch
     Performance counter stats for 'sh b.sh' (10 runs):

                298.62 +- 1.86 seconds time elapsed  ( +-  0.62% )

    - after this patch
     Performance counter stats for 'sh b.sh' (10 runs):

               297.734 +- 0.954 seconds time elapsed  ( +-  0.32% )

Test 2: (about 1.69% improvement)
     'perf stat -r 10 perf bench sched messaging'
        Then sum the total time of 'sched/messaging' by manual.

    - before this patch
      total 0.707 sec for 10 times
    - after this patch
      totol 0.695 sec for 10 times

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/1594389852-19949-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: sigcontext.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:32:07 +0000 (17:32 -0700)]
arm64: sigcontext.h: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20200726003207.20253-4-rdunlap@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: ptrace.h: delete duplicated word
Randy Dunlap [Sun, 26 Jul 2020 00:32:06 +0000 (17:32 -0700)]
arm64: ptrace.h: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20200726003207.20253-3-rdunlap@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: pgtable-hwdef.h: delete duplicated words
Randy Dunlap [Sun, 26 Jul 2020 00:32:05 +0000 (17:32 -0700)]
arm64: pgtable-hwdef.h: delete duplicated words

Drop the repeated words "at" and "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20200726003207.20253-2-rdunlap@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agobus: fsl-mc: Add ACPI support for fsl-mc
Makarand Pawagi [Fri, 19 Jun 2020 08:20:13 +0000 (09:20 +0100)]
bus: fsl-mc: Add ACPI support for fsl-mc

Add ACPI support in the fsl-mc driver. Driver parses MC DSDT table to
extract memory and other resources.

Interrupt (GIC ITS) information is extracted from the MADT table
by drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c.

IORT table is parsed to configure DMA.

Signed-off-by: Makarand Pawagi <makarand.pawagi@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20200619082013.13661-13-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agobus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
Diana Craciun [Fri, 19 Jun 2020 08:20:12 +0000 (09:20 +0100)]
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver

The DPRC driver is not taking into account the msi-map property
and assumes that the icid is the same as the stream ID. Although
this assumption is correct, generalize the code to include a
translation between icid and streamID.

Furthermore do not just copy the MSI domain from parent (for child
containers), but use the information provided by the msi-map property.

If the msi-map property is missing from the device tree retain the old
behaviour for backward compatibility ie the child DPRC objects
inherit the MSI domain from the parent.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-12-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoof/irq: Make of_msi_map_rid() PCI bus agnostic
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:11 +0000 (09:20 +0100)]
of/irq: Make of_msi_map_rid() PCI bus agnostic

There is nothing PCI bus specific in the of_msi_map_rid()
implementation other than the requester ID tag for the input
ID space. Rename requester ID to a more generic ID so that
the translation code can be used by all busses that require
input/output ID translations.

No functional change intended.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-11-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoof/irq: make of_msi_map_get_device_domain() bus agnostic
Diana Craciun [Fri, 19 Jun 2020 08:20:10 +0000 (09:20 +0100)]
of/irq: make of_msi_map_get_device_domain() bus agnostic

of_msi_map_get_device_domain() is PCI specific but it need not be and
can be easily changed to be bus agnostic in order to be used by other
busses by adding an IRQ domain bus token as an input parameter.

Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci/msi.c
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-10-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agodt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
Laurentiu Tudor [Fri, 19 Jun 2020 08:20:09 +0000 (09:20 +0100)]
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

The existing bindings cannot be used to specify the relationship
between fsl-mc devices and GIC ITSes.
Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
msi-map property.
In addition, deprecate msi-parent property which no longer makes sense
now that we support translating the MSIs.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-9-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoof/device: Add input id to of_dma_configure()
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:08 +0000 (09:20 +0100)]
of/device: Add input id to of_dma_configure()

Devices sitting on proprietary busses have a device ID space that
is owned by the respective bus and related firmware bindings. In order
to let the generic OF layer handle the input translations to
an IOMMU id, for such busses the current of_dma_configure() interface
should be extended in order to allow the bus layer to provide the
device input id parameter - that is retrieved/assigned in bus
specific code and firmware.

Augment of_dma_configure() to add an optional input_id parameter,
leaving current functionality unchanged.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20200619082013.13661-8-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoof/iommu: Make of_map_rid() PCI agnostic
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:07 +0000 (09:20 +0100)]
of/iommu: Make of_map_rid() PCI agnostic

There is nothing PCI specific (other than the RID - requester ID)
in the of_map_rid() implementation, so the same function can be
reused for input/output IDs mapping for other busses just as well.

Rename the RID instances/names to a generic "id" tag.

No functionality change intended.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200619082013.13661-7-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoACPI/IORT: Add an input ID to acpi_dma_configure()
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:06 +0000 (09:20 +0100)]
ACPI/IORT: Add an input ID to acpi_dma_configure()

Some HW devices are created as child devices of proprietary busses,
that have a bus specific policy defining how the child devices
wires representing the devices ID are translated into IOMMU and
IRQ controllers device IDs.

Current IORT code provides translations for:

- PCI devices, where the device ID is well identified at bus level
  as the requester ID (RID)
- Platform devices that are endpoint devices where the device ID is
  retrieved from the ACPI object IORT mappings (Named components single
  mappings). A platform device is represented in IORT as a named
  component node

For devices that are child devices of proprietary busses the IORT
firmware represents the bus node as a named component node in IORT
and it is up to that named component node to define in/out bus
specific ID translations for the bus child devices that are
allocated and created in a bus specific manner.

In order to make IORT ID translations available for proprietary
bus child devices, the current ACPI (and IORT) code must be
augmented to provide an additional ID parameter to acpi_dma_configure()
representing the child devices input ID. This ID is bus specific
and it is retrieved in bus specific code.

By adding an ID parameter to acpi_dma_configure(), the IORT
code can map the child device ID to an IOMMU stream ID through
the IORT named component representing the bus in/out ID mappings.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-6-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoACPI/IORT: Remove useless PCI bus walk
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:05 +0000 (09:20 +0100)]
ACPI/IORT: Remove useless PCI bus walk

The PCI bus domain number (used in the iort_match_node_callback() -
pci_domain_nr() call) is cascaded through the PCI bus hierarchy at PCI
bus enumeration time, therefore there is no need in iort_find_dev_node()
to walk the PCI bus upwards to grab the root bus to be passed to
iort_scan_node(), the device->bus PCI bus pointer will do.

Remove this useless code.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-5-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoACPI/IORT: Make iort_msi_map_rid() PCI agnostic
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:04 +0000 (09:20 +0100)]
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic

There is nothing PCI specific in iort_msi_map_rid().

Rename the function using a bus protocol agnostic name,
iort_msi_map_id(), and convert current callers to it.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-4-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:03 +0000 (09:20 +0100)]
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic

iort_get_device_domain() is PCI specific but it need not be,
since it can be used to retrieve IRQ domain nexus of any kind
by adding an irq_domain_bus_token input to it.

Make it PCI agnostic by also renaming the requestor ID input
to a more generic ID name.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci/msi.c
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-3-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
Lorenzo Pieralisi [Fri, 19 Jun 2020 08:20:02 +0000 (09:20 +0100)]
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC

When the iort_match_node_callback is invoked for a named component
the match should be executed upon a device with an ACPI companion.

For devices with no ACPI companion set-up the ACPI device tree must be
walked in order to find the first parent node with a companion set and
check the parent node against the named component entry to check whether
there is a match and therefore an IORT node describing the in/out ID
translation for the device has been found.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lore.kernel.org/r/20200619082013.13661-2-lorenzo.pieralisi@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: enable time namespace support
Andrei Vagin [Wed, 24 Jun 2020 08:33:21 +0000 (01:33 -0700)]
arm64: enable time namespace support

CONFIG_TIME_NS is dependes on GENERIC_VDSO_TIME_NS.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-7-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/vdso: Restrict splitting VVAR VMA
Andrei Vagin [Wed, 24 Jun 2020 08:33:20 +0000 (01:33 -0700)]
arm64/vdso: Restrict splitting VVAR VMA

Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the
amount of corner-cases to consider while working further on VDSO time
namespace support.

As the offset from timens to VVAR page is computed compile-time, the pages
in VVAR should stay together and not being partically mremap()'ed.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-6-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/vdso: Handle faults on timens page
Andrei Vagin [Wed, 24 Jun 2020 08:33:19 +0000 (01:33 -0700)]
arm64/vdso: Handle faults on timens page

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-5-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/vdso: Add time namespace page
Andrei Vagin [Wed, 24 Jun 2020 08:33:18 +0000 (01:33 -0700)]
arm64/vdso: Add time namespace page

Allocate the time namespace page among VVAR pages.  Provide
__arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

The time-namespace page isn't allocated on !CONFIG_TIME_NAMESPACE, but
vma is the same size, which simplifies criu/vdso migration between
different kernel configs.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200624083321.144975-4-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/vdso: Zap vvar pages when switching to a time namespace
Andrei Vagin [Wed, 24 Jun 2020 08:33:17 +0000 (01:33 -0700)]
arm64/vdso: Zap vvar pages when switching to a time namespace

The order of vvar pages depends on whether a task belongs to the root
time namespace or not. In the root time namespace, a task doesn't have a
per-namespace page. In a non-root namespace, the VVAR page which contains
the system-wide VDSO data is replaced with a namespace specific page
that contains clock offsets.

Whenever a task changes its namespace, the VVAR page tables are cleared
and then they will be re-faulted with a corresponding layout.

A task can switch its time namespace only if its ->mm isn't shared with
another task.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200624083321.144975-3-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/vdso: use the fault callback to map vvar pages
Andrei Vagin [Wed, 24 Jun 2020 08:33:16 +0000 (01:33 -0700)]
arm64/vdso: use the fault callback to map vvar pages

Currently the vdso has no awareness of time namespaces, which may
apply distinct offsets to processes in different namespaces. To handle
this within the vdso, we'll need to expose a per-namespace data page.

As a preparatory step, this patch separates the vdso data page from
the code pages, and has it faulted in via its own fault callback.
Subsquent patches will extend this to support distinct pages per time
namespace.

The vvar vma has to be installed with the VM_PFNMAP flag to handle
faults via its vma fault callback.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20200624083321.144975-2-avagin@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agorecordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.
Gregory Herrero [Fri, 17 Jul 2020 14:33:38 +0000 (16:33 +0200)]
recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.

Currently, if a section has a relocation to '_mcount' symbol, a new
__mcount_loc entry will be added whatever the relocation type is.
This is problematic when a relocation to '_mcount' is in the middle of a
section and is not a call for ftrace use.

Such relocation could be generated with below code for example:
    bool is_mcount(unsigned long addr)
    {
        return (target == (unsigned long) &_mcount);
    }

With this snippet of code, ftrace will try to patch the mcount location
generated by this code on module load and fail with:

    Call trace:
     ftrace_bug+0xa0/0x28c
     ftrace_process_locs+0x2f4/0x430
     ftrace_module_init+0x30/0x38
     load_module+0x14f0/0x1e78
     __do_sys_finit_module+0x100/0x11c
     __arm64_sys_finit_module+0x28/0x34
     el0_svc_common+0x88/0x194
     el0_svc_handler+0x38/0x8c
     el0_svc+0x8/0xc
    ---[ end trace d828d06b36ad9d59 ]---
    ftrace failed to modify
    [<ffffa2dbf3a3a41c>] 0xffffa2dbf3a3a41c
     actual:   66:a9:3c:90
    Initializing ftrace call sites
    ftrace record flags: 2000000
     (0)
    expected tramp: ffffa2dc6cf66724

So Limit the relocation type to R_AARCH64_CALL26 as in perl version of
recordmcount.

Fixes: af64d2aa872a ("ftrace: Add arm64 support to recordmcount")
Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20200717143338.19302-1-gregory.herrero@oracle.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Reserve HWCAP2_MTE as (1 << 18)
Catalin Marinas [Fri, 24 Jul 2020 09:41:31 +0000 (10:41 +0100)]
arm64: Reserve HWCAP2_MTE as (1 << 18)

While MTE is not supported in the upstream kernel yet, add a comment
that HWCAP2_MTE as (1 << 18) is reserved. Glibc makes use of it for the
resolving (ifunc) of the MTE-safe string routines.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/entry: deduplicate SW PAN entry/exit routines
Ard Biesheuvel [Tue, 21 Jul 2020 08:33:15 +0000 (10:33 +0200)]
arm64/entry: deduplicate SW PAN entry/exit routines

Factor the 12 copies of the SW PAN entry and exit code into callable
subroutines, and use alternatives patching to either emit a 'bl'
instruction to call them, or a NOP if h/w PAN is found to be available
at runtime.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200721083315.4816-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: s/AMEVTYPE/AMEVTYPER
Vladimir Murzin [Tue, 21 Jul 2020 09:12:59 +0000 (10:12 +0100)]
arm64: s/AMEVTYPE/AMEVTYPER

Activity Monitor Event Type Registers are named as AMEVTYPER{0,1}<n>

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200721091259.102756-1-vladimir.murzin@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: perf: Expose some new events via sysfs
Shaokun Zhang [Tue, 21 Jul 2020 10:49:33 +0000 (18:49 +0800)]
arm64: perf: Expose some new events via sysfs

Some new PMU events can been detected by PMCEID1_EL0, but it can't
be listed, Let's expose these through sysfs.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1595328573-12751-2-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
4 years agotools headers UAPI: Update tools's copy of linux/perf_event.h
Leo Yan [Thu, 16 Jul 2020 05:11:30 +0000 (13:11 +0800)]
tools headers UAPI: Update tools's copy of linux/perf_event.h

To get the changes in the commit:

  "perf: Add perf_event_mmap_page::cap_user_time_short ABI"

This update is a prerequisite to add support for short clock counters
related ABI extension.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-8-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoarm64: perf: Add cap_user_time_short
Peter Zijlstra [Thu, 16 Jul 2020 05:11:29 +0000 (13:11 +0800)]
arm64: perf: Add cap_user_time_short

This completes the ARM64 cap_user_time support.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-7-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoperf: Add perf_event_mmap_page::cap_user_time_short ABI
Peter Zijlstra [Thu, 16 Jul 2020 05:11:28 +0000 (13:11 +0800)]
perf: Add perf_event_mmap_page::cap_user_time_short ABI

In order to support short clock counters, provide an ABI extension.

As a whole:

    u64 time, delta, cyc = read_cycle_counter();

+   if (cap_user_time_short)
+ cyc = time_cycle + ((cyc - time_cycle) & time_mask);

    delta = mul_u64_u32_shr(cyc, time_mult, time_shift);

    if (cap_user_time_zero)
time = time_zero + delta;

    delta += time_offset;

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-6-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoarm64: perf: Only advertise cap_user_time for arch_timer
Peter Zijlstra [Thu, 16 Jul 2020 05:11:27 +0000 (13:11 +0800)]
arm64: perf: Only advertise cap_user_time for arch_timer

When sched_clock is running on anything other than arch_timer, don't
advertise cap_user_time*.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-5-leo.yan@linaro.org
Requested-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoarm64: perf: Implement correct cap_user_time
Peter Zijlstra [Thu, 16 Jul 2020 05:11:26 +0000 (13:11 +0800)]
arm64: perf: Implement correct cap_user_time

As reported by Leo; the existing implementation is broken when the
clock and counter don't intersect at 0.

Use the sched_clock's struct clock_read_data information to correctly
implement cap_user_time and cap_user_time_zero.

Note that the ARM64 counter is architecturally only guaranteed to be
56bit wide (implementations are allowed to be wider) and the existing
perf ABI cannot deal with wrap-around.

This implementation should also be faster than the old; seeing how we
don't need to recompute mult and shift all the time.

[leoyan: Use mul_u64_u32_shr() to convert cyc to ns to avoid overflow]

Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-4-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
4 years agotime/sched_clock: Use raw_read_seqcount_latch()
Ahmed S. Darwish [Thu, 16 Jul 2020 05:11:25 +0000 (13:11 +0800)]
time/sched_clock: Use raw_read_seqcount_latch()

sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.

Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it also has one read
memory barrier less than the currently used raw_read_seqcount() API.

Use raw_read_seqcount_latch() for the seqcount_t latch read path.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
Link: https://lore.kernel.org/r/20200716051130.4359-3-leo.yan@linaro.org
References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Will Deacon <will@kernel.org>
4 years agosched_clock: Expose struct clock_read_data
Peter Zijlstra [Thu, 16 Jul 2020 05:11:24 +0000 (13:11 +0800)]
sched_clock: Expose struct clock_read_data

In order to support perf_event_mmap_page::cap_time features, an
architecture needs, aside from a userspace readable counter register,
to expose the exact clock data so that userspace can convert the
counter register into a correct timestamp.

Provide struct clock_read_data and two (seqcount) helpers so that
architectures (arm64 in specific) can expose the numbers to userspace.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-2-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoarm64: perf: Correct the event index in sysfs
Shaokun Zhang [Thu, 18 Jun 2020 13:35:44 +0000 (21:35 +0800)]
arm64: perf: Correct the event index in sysfs

When PMU event ID is equal or greater than 0x4000, it will be reduced
by 0x4000 and it is not the raw number in the sysfs. Let's correct it
and obtain the raw event ID.

Before this patch:
cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed
event=0x001
After this patch:
cat /sys/bus/event_source/devices/armv8_pmuv3_0/events/sample_feed
event=0x4001

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1592487344-30555-3-git-send-email-zhangshaokun@hisilicon.com
[will: fixed formatting of 'if' condition]
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoarm64: tlb: Use the TLBI RANGE feature in arm64
Zhenyu Ye [Wed, 15 Jul 2020 07:19:45 +0000 (15:19 +0800)]
arm64: tlb: Use the TLBI RANGE feature in arm64

Add __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().

When cpu supports TLBI feature, the minimum range granularity is
decided by 'scale', so we can not flush all pages by one instruction
in some cases.

For example, when the pages = 0xe81a, let's start 'scale' from
maximum, and find right 'num' for each 'scale':

1. scale = 3, we can flush no pages because the minimum range is
   2^(5*3 + 1) = 0x10000.
2. scale = 2, the minimum range is 2^(5*2 + 1) = 0x800, we can
   flush 0xe800 pages this time, the num = 0xe800/0x800 - 1 = 0x1c.
   Remaining pages is 0x1a;
3. scale = 1, the minimum range is 2^(5*1 + 1) = 0x40, no page
   can be flushed.
4. scale = 0, we flush the remaining 0x1a pages, the num =
   0x1a/0x2 - 1 = 0xd.

However, in most scenarios, the pages = 1 when flush_tlb_range() is
called. Start from scale = 3 or other proper value (such as scale =
ilog2(pages)), will incur extra overhead.
So increase 'scale' from 0 to maximum, the flush order is exactly
opposite to the example.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200715071945.897-4-yezhenyu2@huawei.com
[catalin.marinas@arm.com: removed unnecessary masks in __TLBI_VADDR_RANGE]
[catalin.marinas@arm.com: __TLB_RANGE_NUM subtracts 1]
[catalin.marinas@arm.com: minor adjustments to the comments]
[catalin.marinas@arm.com: introduce system_supports_tlb_range()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: enable tlbi range instructions
Zhenyu Ye [Wed, 15 Jul 2020 07:19:44 +0000 (15:19 +0800)]
arm64: enable tlbi range instructions

TLBI RANGE feature instoduces new assembly instructions and only
support by binutils >= 2.30.  Add necessary Kconfig logic to allow
this to be enabled and pass '-march=armv8.4-a' to KBUILD_CFLAGS.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200715071945.897-3-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
Zhenyu Ye [Wed, 15 Jul 2020 07:19:43 +0000 (15:19 +0800)]
arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature

ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
range of input addresses. This patch detect this feature.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200715071945.897-2-yezhenyu2@huawei.com
[catalin.marinas@arm.com: some renaming for consistency]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs
Anshuman Khandual [Wed, 1 Jul 2020 04:42:01 +0000 (10:12 +0530)]
arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs

Currently 'hugetlb_cma=' command line argument does not create CMA area on
ARM64_16K_PAGES and ARM64_64K_PAGES based platforms. Instead, it just ends
up with the following warning message. Reason being, hugetlb_cma_reserve()
never gets called for these huge page sizes.

[   64.255669] hugetlb_cma: the option isn't supported by current arch

This enables CMA areas reservation on ARM64_16K_PAGES and ARM64_64K_PAGES
configs by defining an unified arm64_hugetlb_cma_reseve() that is wrapped
in CONFIG_CMA. Call site for arm64_hugetlb_cma_reserve() is also protected
as <asm/hugetlb.h> is conditionally included and hence cannot contain stub
for the inverse config i.e !(CONFIG_HUGETLB_PAGE && CONFIG_CMA).

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593578521-24672-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: stacktrace: Move export for save_stack_trace_tsk()
Mark Brown [Fri, 10 Jul 2020 18:24:02 +0000 (19:24 +0100)]
arm64: stacktrace: Move export for save_stack_trace_tsk()

Due to refactoring way back in bb53c820c5b0f1 ("arm64: stacktrace: avoid
listing stacktrace functions in stacktrace") the EXPORT_SYMBOL_GPL() for
save_stack_trace_tsk() is at the end of __save_stack_trace() rather than
the function it exports. Move it to the expected location.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200710182402.50473-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/acpi: disallow writeable AML opregion mapping for EFI code regions
Ard Biesheuvel [Fri, 26 Jun 2020 15:58:32 +0000 (17:58 +0200)]
arm64/acpi: disallow writeable AML opregion mapping for EFI code regions

Given that the contents of EFI runtime code and data regions are
provided by the firmware, as well as the DSDT, it is not unimaginable
that AML code exists today that accesses EFI runtime code regions using
a SystemMemory OpRegion. There is nothing fundamentally wrong with that,
but since we take great care to ensure that executable code is never
mapped writeable and executable at the same time, we should not permit
AML to create writable mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20200626155832.2323789-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/acpi: disallow AML memory opregions to access kernel memory
Ard Biesheuvel [Fri, 26 Jun 2020 15:58:31 +0000 (17:58 +0200)]
arm64/acpi: disallow AML memory opregions to access kernel memory

AML uses SystemMemory opregions to allow AML handlers to access MMIO
registers of, e.g., GPIO controllers, or access reserved regions of
memory that are owned by the firmware.

Currently, we also allow AML access to memory that is owned by the
kernel and mapped via the linear region, which does not seem to be
supported by a valid use case, and exposes the kernel's internal
state to AML methods that may be buggy and exploitable.

On arm64, ACPI support requires booting in EFI mode, and so we can cross
reference the requested region against the EFI memory map, rather than
just do a minimal check on the first page. So let's only permit regions
to be remapped by the ACPI core if
- they don't appear in the EFI memory map at all (which is the case for
  most MMIO), or
- they are covered by a single region in the EFI memory map, which is not
  of a type that describes memory that is given to the kernel at boot.

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20200626155832.2323789-2-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoperf/smmuv3: To simplify code for ioremap page in pmcg
Jay Chen [Mon, 6 Jul 2020 11:22:45 +0000 (19:22 +0800)]
perf/smmuv3: To simplify code for ioremap page in pmcg

Use the devm_platform_get_and_ioremap_resource to simplify the code
a bit.

Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20200706112246.92220-2-jkchen@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
4 years agoarm64: tlb: don't set the ttl value in flush_tlb_page_nosync
Zhenyu Ye [Fri, 10 Jul 2020 09:41:58 +0000 (17:41 +0800)]
arm64: tlb: don't set the ttl value in flush_tlb_page_nosync

flush_tlb_page_nosync() may be called from pmd level, so we
can not set the ttl = 3 here.

The callstack is as follows:

pmdp_set_access_flags
ptep_set_access_flags
flush_tlb_fix_spurious_fault
flush_tlb_page
flush_tlb_page_nosync

Fixes: e735b98a5fe0 ("arm64: Add tlbi_user_level TLB invalidation helper")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200710094158.468-1-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[]
Anshuman Khandual [Tue, 7 Jul 2020 14:23:13 +0000 (19:53 +0530)]
arm64/cpufeature: Validate feature bits spacing in arm64_ftr_regs[]

arm64_feature_bits for a register in arm64_ftr_regs[] are in a descending
order as per their shift values. Validate that these features bits are
defined correctly and do not overlap with each other. This check protects
against any inadvertent erroneous changes to the register definitions.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1594131793-9498-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: Shift the __tlbi_level() indentation left
Catalin Marinas [Tue, 7 Jul 2020 10:26:14 +0000 (11:26 +0100)]
arm64: Shift the __tlbi_level() indentation left

This is for consistency with the other __tlbi macros in this file. No
functional change.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: tlb: Set the TTL field in flush_*_tlb_range
Zhenyu Ye [Thu, 25 Jun 2020 08:03:14 +0000 (16:03 +0800)]
arm64: tlb: Set the TTL field in flush_*_tlb_range

This patch implement flush_{pmd|pud}_tlb_range() in arm64 by
calling __flush_tlb_range() with the corresponding stride and
tlb_level values.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200625080314.230-7-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: tlb: Set the TTL field in flush_tlb_range
Zhenyu Ye [Thu, 25 Jun 2020 08:03:13 +0000 (16:03 +0800)]
arm64: tlb: Set the TTL field in flush_tlb_range

This patch uses the cleared_* in struct mmu_gather to set the
TTL field in flush_tlb_range().

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200625080314.230-6-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agotlb: mmu_gather: add tlb_flush_*_range APIs
Peter Zijlstra (Intel) [Thu, 25 Jun 2020 08:03:12 +0000 (16:03 +0800)]
tlb: mmu_gather: add tlb_flush_*_range APIs

tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and
tlb->end, then set corresponding cleared_*.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200625080314.230-5-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: Add tlbi_user_level TLB invalidation helper
Zhenyu Ye [Thu, 25 Jun 2020 08:03:11 +0000 (16:03 +0800)]
arm64: Add tlbi_user_level TLB invalidation helper

Add a level-hinted parameter to __tlbi_user, which only gets used
if ARMv8.4-TTL gets detected.

ARMv8.4-TTL provides the TTL field in tlbi instruction to indicate
the level of translation table walk holding the leaf entry for the
address that is being invalidated.

This patch set the default level value of flush_tlb_range() to 0,
which will be updated in future patches.  And set the ttl value of
flush_tlb_page_nosync() to 3 because it is only called to flush a
single pte page.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
Link: https://lore.kernel.org/r/20200625080314.230-4-yezhenyu2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agosmccc: Make constants available to assembly
Andrew Scull [Thu, 18 Jun 2020 14:55:11 +0000 (15:55 +0100)]
smccc: Make constants available to assembly

Move constants out of the C-only section of the header next to the other
constants that are available to assembly.

Signed-off-by: Andrew Scull <ascull@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200618145511.69203-1-ascull@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: Add level-hinted TLB invalidation helper
Marc Zyngier [Wed, 2 Jan 2019 10:21:29 +0000 (10:21 +0000)]
arm64: Add level-hinted TLB invalidation helper

Add a level-hinted TLB invalidation helper that only gets used if
ARMv8.4-TTL gets detected.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoarm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
Marc Zyngier [Fri, 28 Dec 2018 09:11:50 +0000 (09:11 +0000)]
arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors

Advertise bits [58:55] as reserved for SW in the S2 descriptors.

Reviewed-by: Andrew Scull <ascull@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoarm64: Detect the ARMv8.4 TTL feature
Marc Zyngier [Sat, 22 Dec 2018 12:00:10 +0000 (12:00 +0000)]
arm64: Detect the ARMv8.4 TTL feature

In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
feature allows TLBs to be issued with a level allowing for quicker
invalidation.

Let's detect the feature for now. Further patches will implement
its actual usage.

Reviewed-by : Suzuki K Polose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
4 years agoarm64/mm: Redefine CONT_{PTE, PMD}_SHIFT
Gavin Shan [Tue, 30 Jun 2020 06:24:28 +0000 (16:24 +1000)]
arm64/mm: Redefine CONT_{PTE, PMD}_SHIFT

Currently, the value of CONT_{PTE, PMD}_SHIFT is off from standard
{PAGE, PMD}_SHIFT. In turn, we have to consider adding {PAGE, PMD}_SHIFT
when using CONT_{PTE, PMD}_SHIFT in the function hugetlbpage_init().
It's a bit confusing.

This redefines CONT_{PTE, PMD}_SHIFT with {PAGE, PMD}_SHIFT included
so that the later values needn't be added when using the former ones
in function hugetlbpage_init(). Note that the values of CONT_{PTES, PMDS}
are unchanged.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lkml.org/lkml/2020/5/6/190
Link: https://lore.kernel.org/r/20200630062428.194235-1-gshan@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/defconfig: Enable CONFIG_KEXEC_FILE
Bhupesh Sharma [Mon, 6 Apr 2020 22:31:40 +0000 (04:01 +0530)]
arm64/defconfig: Enable CONFIG_KEXEC_FILE

kexec_file_load() syscall interface is now supported for
arm64 architecture as well via commits:
3751e728cef2 ("arm64: kexec_file: add crash dump support") and
3ddd9992a590 ("arm64: enable KEXEC_FILE config")].

This patch enables config KEXEC_FILE by default in the
arm64 defconfig, so that user-space tools like kexec-tools
can use the same as the default interface for kexec/kdump
on arm64.

Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1586212300-30797-1-git-send-email-bhsharma@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/cpufeature: Replace all open bits shift encodings with macros
Anshuman Khandual [Fri, 3 Jul 2020 03:51:37 +0000 (09:21 +0530)]
arm64/cpufeature: Replace all open bits shift encodings with macros

There are many open bits shift encodings for various CPU ID registers that
are scattered across cpufeature. This replaces them with register specific
sensible macro definitions. This should not have any functional change.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-5-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register
Anshuman Khandual [Fri, 3 Jul 2020 03:51:36 +0000 (09:21 +0530)]
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register

Enable EVT, BBM, TTL, IDS, ST, NV and CCIDX features bits in ID_AA64MMFR2
register as per ARM DDI 0487F.a specification.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register
Anshuman Khandual [Fri, 3 Jul 2020 03:51:35 +0000 (09:21 +0530)]
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register

Enable ETS, TWED, XNX and SPECSEI features bits in ID_AA64MMFR1 register as
per ARM DDI 0487F.a specification.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register
Anshuman Khandual [Fri, 3 Jul 2020 03:51:34 +0000 (09:21 +0530)]
arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register

Enable EVC, FGT, EXS features bits in ID_AA64MMFR0 register as per ARM DDI
0487F.a specification.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593748297-1965-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: Document sysctls for emulated deprecated instructions
Mark Brown [Thu, 25 Jun 2020 13:15:07 +0000 (14:15 +0100)]
arm64: Document sysctls for emulated deprecated instructions

We have support for emulating a number of deprecated instructions in the
kernel with individual Kconfig options enabling this support per
instruction. In addition to the Kconfig options we also provide runtime
control via sysctls but this is not currently mentioned in the Kconfig so
not very discoverable for users. This is particularly important for
SWP/SWPB since this is disabled by default at runtime and must be enabled
via the sysctl, causing considerable frustration for users who have enabled
the config option and are then confused to find that the instruction is
still faulting.

Add a reference to the sysctls in the help text for each of the config
options, noting that SWP/SWPB is disabled by default, to improve the
user experience.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200625131507.32334-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
Bhupesh Sharma [Wed, 13 May 2020 18:52:37 +0000 (00:22 +0530)]
arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

TCR_EL1.TxSZ, which controls the VA space size, is configured by a
single kernel image to support either 48-bit or 52-bit VA space.

If the ARMv8.2-LVA optional feature is present and we are running
with a 64KB page size, then it is possible to use 52-bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

Since TCR_EL1.T1SZ indicates the size of the memory region addressed by
TTBR1_EL1, export the same in vmcoreinfo. User-space utilities like
makedumpfile and crash-utility need to read this value from vmcoreinfo
for determining if a virtual address lies in the linear map range.

While at it also add documentation for TCR_EL1.T1SZ variable being
added to vmcoreinfo.

It indicates the size offset of the memory region addressed by
TTBR1_EL1.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: John Donnelly <john.p.donnelly@oracle.com>
Tested-by: Kamlakant Patel <kamlakantp@marvell.com>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kexec@lists.infradead.org
Link: https://lore.kernel.org/r/1589395957-24628-3-git-send-email-bhsharma@redhat.com
[catalin.marinas@arm.com: removed vabits_actual from the commit log]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agocrash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
Bhupesh Sharma [Wed, 13 May 2020 18:52:36 +0000 (00:22 +0530)]
crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

Right now user-space tools like 'makedumpfile' and 'crash' need to rely
on a best-guess method of determining value of 'MAX_PHYSMEM_BITS'
supported by underlying kernel.

This value is used in user-space code to calculate the bit-space
required to store a section for SPARESMEM (similar to the existing
calculation method used in the kernel implementation):

  #define SECTIONS_SHIFT    (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

Now, regressions have been reported in user-space utilities
like 'makedumpfile' and 'crash' on arm64, with the recently added
kernel support for 52-bit physical address space, as there is
no clear method of determining this value in user-space
(other than reading kernel CONFIG flags).

As per suggestion from makedumpfile maintainer (Kazu), it makes more
sense to append 'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself
rather than in arch-specific code, so that the user-space code for other
archs can also benefit from this addition to the vmcoreinfo and use it
as a standard way of determining 'SECTIONS_SHIFT' value in user-land.

A reference 'makedumpfile' implementation which reads the
'MAX_PHYSMEM_BITS' value from vmcoreinfo in a arch-independent fashion
is available here:

While at it also update vmcoreinfo documentation for 'MAX_PHYSMEM_BITS'
variable being added to vmcoreinfo.

'MAX_PHYSMEM_BITS' defines the maximum supported physical address
space memory.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: John Donnelly <john.p.donnelly@oracle.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Boris Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kexec@lists.infradead.org
Link: https://lore.kernel.org/r/1589395957-24628-2-git-send-email-bhsharma@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/panic: Unify all three existing notifier blocks
Anshuman Khandual [Mon, 29 Jun 2020 04:38:31 +0000 (10:08 +0530)]
arm64/panic: Unify all three existing notifier blocks

Currently there are three different registered panic notifier blocks. This
unifies all of them into a single one i.e arm64_panic_block, hence reducing
code duplication and required calling sequence during panic. This preserves
the existing dump sequence. While here, just use device_initcall() directly
instead of __initcall() which has been a legacy alias for the earlier. This
replacement is a pure cleanup with no functional implications.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1593405511-7625-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64/module: Optimize module load time by optimizing PLT counting
Saravana Kannan [Tue, 23 Jun 2020 01:18:02 +0000 (18:18 -0700)]
arm64/module: Optimize module load time by optimizing PLT counting

When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs that'll be needed to handle all the RELAs. While
doing this, it tries to dedupe PLT allocations for multiple
R_AARCH64_CALL26 relocations to the same symbol. It does the same for
R_AARCH64_JUMP26 relocations.

To make checks for duplicates easier/faster, it sorts the relocation
list by type, symbol and addend. That way, to check for a duplicate
relocation, it just needs to compare with the previous entry.

However, sorting the entire relocation array is unnecessary and
expensive (O(n log n)) because there are a lot of other relocation types
that don't need deduping or can't be deduped.

So this commit partitions the array into entries that need deduping and
those that don't. And then sorts just the part that needs deduping. And
when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
if it's disabled.

This gives significant reduction in module load time for modules with
large number of relocations with no measurable impact on modules with a
small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
enabled, these were the results for a few downstream modules:

Module Size (MB)
wlan 14
video codec 3.8
drm 1.8
IPA 2.5
audio 1.2
gpu 1.8

Without this patch:
Module Number of entries sorted Module load time (ms)
wlan 243739 283
video codec 74029 138
drm 53837 67
IPA 42800 90
audio 21326 27
gpu 20967 32

Total time to load all these module: 637 ms

With this patch:
Module Number of entries sorted Module load time (ms)
wlan 22454 61
video codec 10150 47
drm 13014 40
IPA 8097 63
audio 4606 16
gpu 6527 20

Total time to load all these modules: 247

Time saved during boot for just these 6 modules: 390 ms

Signed-off-by: Saravana Kannan <saravanak@google.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Link: https://lore.kernel.org/r/20200623011803.91232-1-saravanak@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoLinux 5.8-rc3
Linus Torvalds [Sun, 28 Jun 2020 22:00:24 +0000 (15:00 -0700)]
Linux 5.8-rc3

4 years agoMerge tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Jun 2020 21:57:14 +0000 (14:57 -0700)]
Merge tag 'arm-omap-fixes-5.8-1' of git://git./linux/kernel/git/soc/soc

Pull ARM OMAP fixes from Arnd Bergmann:
 "The OMAP developers are particularly active at hunting down
  regressions, so this is a separate branch with OMAP specific
  fixes for v5.8:

  As Tony explains
    "The recent display subsystem (DSS) related platform data changes
     caused display related regressions for suspend and resume. Looks
     like I only tested suspend and resume before dropping the legacy
     platform data, and forgot to test it after dropping it. Turns out
     the main issue was that we no longer have platform code calling
     pm_runtime_suspend for DSS like we did for the legacy platform data
     case, and that fix is still being discussed on the dri-devel list
     and will get merged separately. The DSS related testing exposed a
     pile other other display related issues that also need fixing
     though":

   - Fix ti-sysc optional clock handling and reset status checks for
     devices that reset automatically in idle like DSS

   - Ignore ti-sysc clockactivity bit unless separately requested to
     avoid unexpected performance issues

   - Init ti-sysc framedonetv_irq to true and disable for am4

   - Avoid duplicate DSS reset for legacy mode with dts data

   - Remove LCD timings for am4 as they cause warnings now that we're
     using generic panels

  Other OMAP changes from Tony include:

   - Fix omap_prm reset deassert as we still have drivers setting the
     pm_runtime_irq_safe() flag

   - Flush posted write for ti-sysc enable and disable

   - Fix droid4 spi related errors with spi flags

   - Fix am335x USB range and a typo for softreset

   - Fix dra7 timer nodes for clocks for IPU and DSP

   - Drop duplicate mailboxes after mismerge for dra7

   - Prevent pocketgeagle header line signal from accidentally setting
     micro-SD write protection signal by removing the default mux

   - Fix NFSroot flakeyness after resume for duover by switching the
     smsc911x gpio interrupt to back to level sensitive

   - Fix regression for omap4 clockevent source after recent system
     timer changes

   - Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid"
     phy-mode

   - One patch to convert am3/am4 DT files to use the regular sdhci-omap
     driver instead of the old hsmmc driver, this was meant for the
     merge window but got lost in the process"

* tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
  ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode
  ARM: dts: Fix omap4 system timer source clocks
  ARM: dts: Fix duovero smsc interrupt for suspend
  ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect
  Revert "bus: ti-sysc: Increase max softreset wait"
  ARM: dts: am437x-epos-evm: remove lcd timings
  ARM: dts: am437x-gp-evm: remove lcd timings
  ARM: dts: am437x-sk-evm: remove lcd timings
  ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes
  ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks
  ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag
  ARM: dts: Fix am33xx.dtsi USB ranges length
  bus: ti-sysc: Increase max softreset wait
  ARM: OMAP2+: Fix legacy mode dss_reset
  bus: ti-sysc: Fix uninitialized framedonetv_irq
  bus: ti-sysc: Ignore clockactivity unless specified as a quirk
  bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit
  ARM: dts: omap4-droid4: Fix spi configuration and increase rate
  bus: ti-sysc: Flush posted write on enable and disable
  soc: ti: omap-prm: use atomic iopoll instead of sleeping one
  ...

4 years agoMerge tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Sun, 28 Jun 2020 21:55:18 +0000 (14:55 -0700)]
Merge tag 'arm-fixes-5.8-1' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here are a couple of bug fixes, mostly for devicetree files

  NXP i.MX:
   - Use correct voltage on some i.MX8M board device trees to avoid
     hardware damage
   - Code fixes for a compiler warning and incorrect reference counting,
     both harmless.
   - Fix the i.MX8M SoC driver to correctly identify imx8mp
   - Fix watchdog configuration in imx6ul-kontron device tree.

  Broadcom:
   - A small regression fix for the Raspberry-Pi firmware driver
   - A Kconfig change to use the correct timer driver on Northstar
   - A DT fix for the Luxul XWC-2000 machine
   - Two more DT fixes for NSP SoCs

  STmicroelectronics STI
   - Revert one broken patch for L2 cache configuration

  ARM Versatile Express:
   - Fix a regression by reverting a broken DT cleanup

  TEE drivers:
   - MAINTAINERS: change tee mailing list"

* tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  Revert "ARM: sti: Implement dummy L2 cache's write_sec"
  soc: imx8m: fix build warning
  ARM: imx6: add missing put_device() call in imx6q_suspend_init()
  ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()
  soc: imx8m: Correct i.MX8MP UID fuse offset
  ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain
  ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM
  arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2
  arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range
  arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range
  ARM: dts: NSP: Correct FA2 mailbox node
  ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision()
  MAINTAINERS: change tee mailing list
  ARM: dts: NSP: Disable PL330 by default, add dma-coherent property
  ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP
  ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000
  arm: dts: vexpress: Move mcc node back into motherboard node

4 years agoMerge tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 28 Jun 2020 18:59:08 +0000 (11:59 -0700)]
Merge tag 'timers-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "A single DocBook fix"

* tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix kerneldoc system_device_crosststamp & al

4 years agoMerge tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Jun 2020 18:58:14 +0000 (11:58 -0700)]
Merge tag 'perf-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
 "A single Kbuild dependency fix"

* tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/rapl: Fix RAPL config variable bug

4 years agoMerge tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Jun 2020 18:42:16 +0000 (11:42 -0700)]
Merge tag 'efi-urgent-2020-06-28' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:

 - Fix build regression on v4.8 and older

 - Robustness fix for TPM log parsing code

 - kobject refcount fix for the ESRT parsing code

 - Two efivarfs fixes to make it behave more like an ordinary file
   system

 - Style fixup for zero length arrays

 - Fix a regression in path separator handling in the initrd loader

 - Fix a missing prototype warning

 - Add some kerneldoc headers for newly introduced stub routines

 - Allow support for SSDT overrides via EFI variables to be disabled

 - Report CPU mode and MMU state upon entry for 32-bit ARM

 - Use the correct stack pointer alignment when entering from mixed mode

* tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: arm: Print CPU boot mode and MMU state at boot
  efi/libstub: arm: Omit arch specific config table matching array on arm64
  efi/x86: Setup stack correctly for efi_pe_entry
  efi: Make it possible to disable efivar_ssdt entirely
  efi/libstub: Descriptions for stub helper functions
  efi/libstub: Fix path separator regression
  efi/libstub: Fix missing-prototype warning for skip_spaces()
  efi: Replace zero-length array and use struct_size() helper
  efivarfs: Don't return -EINTR when rate-limiting reads
  efivarfs: Update inode modification time for successful writes
  efi/esrt: Fix reference count leak in esre_create_sysfs_entry.
  efi/tpm: Verify event log header before parsing
  efi/x86: Fix build with gcc 4

4 years agoMerge tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 28 Jun 2020 17:37:39 +0000 (10:37 -0700)]
Merge tag 'sched_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:
 "The most anticipated fix in this pull request is probably the horrible
  build fix for the RANDSTRUCT fail that didn't make -rc2. Also included
  is the cleanup that removes those BUILD_BUG_ON()s and replaces it with
  ugly unions.

  Also included is the try_to_wake_up() race fix that was first
  triggered by Paul's RCU-torture runs, but was independently hit by
  Dave Chinner's fstest runs as well"

* tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cfs: change initial value of runnable_avg
  smp, irq_work: Continue smp_call_function*() and irq_work*() integration
  sched/core: s/WF_ON_RQ/WQ_ON_CPU/
  sched/core: Fix ttwu() race
  sched/core: Fix PI boosting between RT and DEADLINE tasks
  sched/deadline: Initialize ->dl_boosted
  sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption
  sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail

4 years agoMerge tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Jun 2020 17:35:01 +0000 (10:35 -0700)]
Merge tag 'x86_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - AMD Memory bandwidth counter width fix, by Babu Moger.

 - Use the proper length type in the 32-bit truncate() syscall variant,
   by Jiri Slaby.

 - Reinit IA32_FEAT_CTL during wakeup to fix the case where after
   resume, VMXON would #GP due to VMX not being properly enabled, by
   Sean Christopherson.

 - Fix a static checker warning in the resctrl code, by Dan Carpenter.

 - Add a CR4 pinning mask for bits which cannot change after boot, by
   Kees Cook.

 - Align the start of the loop of __clear_user() to 16 bytes, to improve
   performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.

* tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/64: Align start of __clear_user() loop to 16-bytes
  x86/cpu: Use pinning mask for CR4 bits needing to be 0
  x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
  x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
  syscalls: Fix offset type of ksys_ftruncate()
  x86/resctrl: Fix memory bandwidth counter width for AMD

4 years agoMerge tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Jun 2020 17:29:38 +0000 (10:29 -0700)]
Merge tag 'rcu_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull RCU-vs-KCSAN fixes from Borislav Petkov:
 "A single commit that uses "arch_" atomic operations to avoid the
  instrumentation that comes with the non-"arch_" versions.

  In preparation for that commit, it also has another commit that makes
  these "arch_" atomic operations available to generic code.

  Without these commits, KCSAN uses can see pointless errors"

* tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Fixup noinstr warnings
  locking/atomics: Provide the arch_atomic_ interface to generic code

4 years agoMerge tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 28 Jun 2020 17:16:15 +0000 (10:16 -0700)]
Merge tag 'objtool_urgent_for_5.8_rc3' of git://git./linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:
 "Three fixes from Peter Zijlstra suppressing KCOV instrumentation in
  noinstr sections.

  Peter Zijlstra says:
    "Address KCOV vs noinstr. There is no function attribute to
     selectively suppress KCOV instrumentation, instead teach objtool
     to NOP out the calls in noinstr functions"

  This cures a bunch of KCOV crashes (as used by syzcaller)"

* tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix noinstr vs KCOV
  objtool: Provide elf_write_{insn,reloc}()
  objtool: Clean up elf_write() condition

4 years agoMerge tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Sun, 28 Jun 2020 16:42:47 +0000 (09:42 -0700)]
Merge tag 'x86_entry_for_5.8' of git://git./linux/kernel/git/tip/tip

Pull x86 entry fixes from Borislav Petkov:
 "This is the x86/entry urgent pile which has accumulated since the
  merge window.

  It is not the smallest but considering the almost complete entry core
  rewrite, the amount of fixes to follow is somewhat higher than usual,
  which is to be expected.

  Peter Zijlstra says:
   'These patches address a number of instrumentation issues that were
    found after the x86/entry overhaul. When combined with rcu/urgent
    and objtool/urgent, these patches make UBSAN/KASAN/KCSAN happy
    again.

    Part of making this all work is bumping the minimum GCC version for
    KASAN builds to gcc-8.3, the reason for this is that the
    __no_sanitize_address function attribute is broken in GCC releases
    before that.

    No known GCC version has a working __no_sanitize_undefined, however
    because the only noinstr violation that results from this happens
    when an UB is found, we treat it like WARN. That is, we allow it to
    violate the noinstr rules in order to get the warning out'"

* tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Fix #UD vs WARN more
  x86/entry: Increase entry_stack size to a full page
  x86/entry: Fixup bad_iret vs noinstr
  objtool: Don't consider vmlinux a C-file
  kasan: Fix required compiler version
  compiler_attributes.h: Support no_sanitize_undefined check with GCC 4
  x86/entry, bug: Comment the instrumentation_begin() usage for WARN()
  x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*()
  x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline()
  compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr
  kasan: Bump required compiler version
  x86, kcsan: Add __no_kcsan to noinstr
  kcsan: Remove __no_kcsan_or_inline
  x86, kcsan: Remove __no_kcsan_or_inline usage

4 years agosched/cfs: change initial value of runnable_avg
Vincent Guittot [Wed, 24 Jun 2020 15:44:22 +0000 (17:44 +0200)]
sched/cfs: change initial value of runnable_avg

Some performance regression on reaim benchmark have been raised with
  commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")

The problem comes from the init value of runnable_avg which is initialized
with max value. This can be a problem if the newly forked task is finally
a short task because the group of CPUs is wrongly set to overloaded and
tasks are pulled less agressively.

Set initial value of runnable_avg equals to util_avg to reflect that there
is no waiting time so far.

Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200624154422.29166-1-vincent.guittot@linaro.org
4 years agosmp, irq_work: Continue smp_call_function*() and irq_work*() integration
Peter Zijlstra [Mon, 22 Jun 2020 10:01:25 +0000 (12:01 +0200)]
smp, irq_work: Continue smp_call_function*() and irq_work*() integration

Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions to make it all automatic.

Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
4 years agosched/core: s/WF_ON_RQ/WQ_ON_CPU/
Peter Zijlstra [Mon, 22 Jun 2020 10:01:24 +0000 (12:01 +0200)]
sched/core: s/WF_ON_RQ/WQ_ON_CPU/

Use a better name for this poorly named flag, to avoid confusion...

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org
4 years agosched/core: Fix ttwu() race
Peter Zijlstra [Mon, 22 Jun 2020 10:01:23 +0000 (12:01 +0200)]
sched/core: Fix ttwu() race

Paul reported rcutorture occasionally hitting a NULL deref:

  sched_ttwu_pending()
    ttwu_do_wakeup()
      check_preempt_curr() := check_preempt_wakeup()
        find_matching_se()
          is_same_group()
            if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*

Debugging showed that this only appears to happen when we take the new
code-path from commit:

  2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")

and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:

  c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

this would've unconditionally hit:

  smp_cond_load_acquire(&p->on_cpu, !VAL);

and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.

The NULL deref can be explained however if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1, it really is still running, just not here.

Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq
is from the wrong CPU, therefore we'll iterate into the non-existant
parents and NULL deref.

The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):

X->cpu = 1
rq(1)->curr = X

CPU0 CPU1 CPU2

// switch away from X
LOCK rq(1)->lock
smp_mb__after_spinlock
dequeue_task(X)
  X->on_rq = 9
switch_to(Z)
  X->on_cpu = 0
UNLOCK rq(1)->lock

// migrate X to cpu 0
LOCK rq(1)->lock
dequeue_task(X)
set_task_cpu(X, 0)
  X->cpu = 0
UNLOCK rq(1)->lock

LOCK rq(0)->lock
enqueue_task(X)
  X->on_rq = 1
UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
  X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb(); // wake X
ttwu()
  LOCK X->pi_lock
  smp_mb__after_spinlock

  if (p->state)

  cpu = X->cpu; // =? 1

  smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
  X->on_rq = 0

  if (p->on_rq)

  smp_rmb();

  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]

  smp_cond_load_acquire(&p->on_cpu, !VAL)

  cpu = select_task_rq(X, X->wake_cpu, ...)
  if (X->cpu != cpu)
switch_to(Y)
  X->on_cpu = 0
UNLOCK rq(0)->lock

However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.

(Most of the previous ttwu() races were found on very large PowerPC)

Nevertheless, this fully explains the observed failure case.

Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.

Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
4 years agosched/core: Fix PI boosting between RT and DEADLINE tasks
Juri Lelli [Mon, 19 Nov 2018 15:32:01 +0000 (16:32 +0100)]
sched/core: Fix PI boosting between RT and DEADLINE tasks

syzbot reported the following warning:

 WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
 enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504

At deadline.c:628 we have:

 623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
 624 {
 625  struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 626  struct rq *rq = rq_of_dl_rq(dl_rq);
 627
 628  WARN_ON(dl_se->dl_boosted);
 629  WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
        [...]
     }

Which means that setup_new_dl_entity() has been called on a task
currently boosted. This shouldn't happen though, as setup_new_dl_entity()
is only called when the 'dynamic' deadline of the new entity
is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
condition.

Digging through the PI code I noticed that what above might in fact happen
if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
first branch of boosting conditions we check only if a pi_task 'dynamic'
deadline is earlier than mutex holder's and in this case we set mutex
holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
initialized if such tasks get boosted at some point (or if they become
DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
to 0 and this verifies the aforementioned condition.

Fix it by checking that the potential donor task is actually (even if
temporary because in turn boosted) running at DEADLINE priority before
using its 'dynamic' deadline value.

Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
4 years agosched/deadline: Initialize ->dl_boosted
Juri Lelli [Wed, 17 Jun 2020 07:29:19 +0000 (09:29 +0200)]
sched/deadline: Initialize ->dl_boosted

syzbot reported the following warning triggered via SYSC_sched_setattr():

  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 setup_new_dl_entity /kernel/sched/deadline.c:594 [inline]
  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_dl_entity /kernel/sched/deadline.c:1370 [inline]
  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_task_dl+0x1c17/0x2ba0 /kernel/sched/deadline.c:1441

This happens because the ->dl_boosted flag is currently not initialized by
__dl_clear_params() (unlike the other flags) and setup_new_dl_entity()
rightfully complains about it.

Initialize dl_boosted to 0.

Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+5ac8bac25f95e8b221e7@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20200617072919.818409-1-juri.lelli@redhat.com
4 years agosched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask...
Scott Wood [Wed, 17 Jun 2020 12:17:42 +0000 (14:17 +0200)]
sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption

This function is concerned with the long-term CPU mask, not the
transitory mask the task might have while migrate disabled.  Before
this patch, if a task was migrate-disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the CPU that the task was running on, then the mask update
would be lost.

Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200617121742.cpxppyi7twxmpin7@linutronix.de
4 years agosched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail
Peter Zijlstra [Wed, 10 Jun 2020 10:14:09 +0000 (12:14 +0200)]
sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail

As a temporary build fix, the proper cleanup needs more work.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Eric Biggers <ebiggers@kernel.org>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Fixes: a148866489fb ("sched: Replace rq::wake_list")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoMerge tag 'imx-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo...
Arnd Bergmann [Sun, 28 Jun 2020 12:48:19 +0000 (14:48 +0200)]
Merge tag 'imx-fixes-5.8' of git://git./linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 5.8:

- Fix LDO1 and LDO2 voltage range for a couple of i.MX8M board device
  trees.
- Fix i.MX8MP UID fuse offset in i.MX8M SoC driver.
- Fix watchdog configuration in imx6ul-kontron device tree.
- Fix one build warning seen on building soc-imx8m driver with
  x86_64-randconfig.
- Add missing put_device() call for a couple of mach-imx PM functions.

* tag 'imx-fixes-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  soc: imx8m: fix build warning
  ARM: imx6: add missing put_device() call in imx6q_suspend_init()
  ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()
  soc: imx8m: Correct i.MX8MP UID fuse offset
  ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain
  ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM
  arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2
  arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range
  arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range

Link: https://lore.kernel.org/r/20200624111725.GA24312@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoMerge tag 'arm-soc/for-5.8/drivers-fixes' of https://github.com/Broadcom/stblinux...
Arnd Bergmann [Sun, 28 Jun 2020 12:48:06 +0000 (14:48 +0200)]
Merge tag 'arm-soc/for-5.8/drivers-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM/ARM64/MIPS SoCs drivers fixes
for 5.8, please pull the following:

- Andy provides a fix for the Raspberry Pi firmware driver to print the
  correct time upon boot. This is a fallout from a converstion to use
  the ptT format

* tag 'arm-soc/for-5.8/drivers-fixes' of https://github.com/Broadcom/stblinux:
  ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision()

Link: https://lore.kernel.org/r/20200619202250.19029-2-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoMerge tag 'arm-soc/for-5.8/soc-fixes' of https://github.com/Broadcom/stblinux into...
Arnd Bergmann [Sun, 28 Jun 2020 12:47:40 +0000 (14:47 +0200)]
Merge tag 'arm-soc/for-5.8/soc-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM-based SoCs machine/Kconfig fixes
for 5.8, please pull the following:

- Matthew adds a missing select to permit the use of the standard ARM
  SP804 timers on Norsthstar Plus (NSP)

* tag 'arm-soc/for-5.8/soc-fixes' of https://github.com/Broadcom/stblinux:
  ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP

Link: https://lore.kernel.org/r/20200619202250.19029-3-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoMerge tag 'arm-soc/for-5.8/devicetree-fixes' of https://github.com/Broadcom/stblinux...
Arnd Bergmann [Sun, 28 Jun 2020 12:47:24 +0000 (14:47 +0200)]
Merge tag 'arm-soc/for-5.8/devicetree-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.8, please pull the following:

- Rafal adds a missing 'device_type' property to the Luxul XWC-2000
  required for the memory nodes to be correctly parsed by Linux

- Matthew provides two fixes for the NSP SoCs, one to disable the PL330
  DMA controller by default since it can be left in reset by the
  bootloader and the second to correct the flow accelerator mailbox node

* tag 'arm-soc/for-5.8/devicetree-fixes' of https://github.com/Broadcom/stblinux:
  ARM: dts: NSP: Correct FA2 mailbox node
  ARM: dts: NSP: Disable PL330 by default, add dma-coherent property
  ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000

Link: https://lore.kernel.org/r/20200619202250.19029-1-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoRevert "ARM: sti: Implement dummy L2 cache's write_sec"
Patrice Chotard [Thu, 18 Jun 2020 17:24:56 +0000 (19:24 +0200)]
Revert "ARM: sti: Implement dummy L2 cache's write_sec"

This reverts commit 7b8e0188fa717cd9abc4fb52587445b421835c2a.

Initially, STiH410-B2260 was supposed to be secured, that's why
l2c_write_sec was stubbed to avoid secure register access from
non secure world.

But by default, STiH410-B2260 is running in non secure mode,
so L2 cache register accesses are authorized, l2c_write_sec stub
is not needed.

With this patch, L2 cache is configured and performance are enhanced.

Link: https://lore.kernel.org/r/20200618172456.29475-1-patrice.chotard@st.com
Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
Cc: Alain Volmat <alain.volmat@st.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoMerge tag 'omap-for-v5.8/fixes-rc1-signed' of git://git.kernel.org/pub/scm/linux...
Arnd Bergmann [Sun, 28 Jun 2020 12:45:05 +0000 (14:45 +0200)]
Merge tag 'omap-for-v5.8/fixes-rc1-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/omap-fixes

Few dts fixes for omaps for v5.8

Few fixes for various devices:

- Prevent pocketgeagle header line signal from accidentally setting
  micro-SD write protection signal by removing the default mux

- Fix NFSroot flakeyness after resume for duover by switching the
  smsc911x gpio interrupt to back to level sensitive

- Fix regression for omap4 clockevent source after recent system
  timer changes

- Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid"
  phy-mode

* tag 'omap-for-v5.8/fixes-rc1-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode
  ARM: dts: Fix omap4 system timer source clocks
  ARM: dts: Fix duovero smsc interrupt for suspend
  ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect

Link: https://lore.kernel.org/r/pull-1592499282-121092@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoMerge tag 'omap-for-v5.8/dt-missed-signed' of git://git.kernel.org/pub/scm/linux...
Arnd Bergmann [Sun, 28 Jun 2020 12:44:39 +0000 (14:44 +0200)]
Merge tag 'omap-for-v5.8/dt-missed-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/omap-fixes

Missed sdhci patch for am3 and am4

I forgot to send a pull request earlier for converting am3 and am4 to
use sdhci-omap driver instead of the old omap_hsmmc driver.

There was a display subsystem related suspend and resume regression found
recently and looks like I forgot to send a pull request for this patch
while debugging the regression. This patch has been tested without the
display subsystem, and has been in Linux next for several weeks now, so
would be good to have merged for v5.8.

* tag 'omap-for-v5.8/dt-missed-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Move am33xx and am43xx mmc nodes to sdhci-omap driver

Link: https://lore.kernel.org/r/pull-1591637467-607254@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoMerge tag 'tee-ml-for-v5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee...
Arnd Bergmann [Fri, 26 Jun 2020 22:20:55 +0000 (00:20 +0200)]
Merge tag 'tee-ml-for-v5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes

Change the TEE mailing list in MAINTAINERS

* tag 'tee-ml-for-v5.8' of git://git.linaro.org/people/jens.wiklander/linux-tee:
  MAINTAINERS: change tee mailing list

Link: https://lore.kernel.org/r/20200616075948.GA2288211@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoMerge tag 'omap-for-v5.8/fixes-merge-window-signed' of git://git.kernel.org/pub/scm...
Arnd Bergmann [Fri, 26 Jun 2020 22:17:23 +0000 (00:17 +0200)]
Merge tag 'omap-for-v5.8/fixes-merge-window-signed' of git://git./linux/kernel/git/tmlind/linux-omap into arm/fixes

Fixes for omaps for v5.8

The recent display subsystem (DSS) related platform data changes caused
display related regressions for suspend and resume. Looks like I only
tested suspend and resume before dropping the legacy platform data, and
forgot to test it after dropping it. Turns out the main issue was that
we no longer have platform code calling pm_runtime_suspend for DSS like
we did for the legacy platform data case, and that fix is still being
discussed on the dri-devel list and will get merged separately. The DSS
related testing exposed a pile other other display related issues that
also need fixing though:

- Fix ti-sysc optional clock handling and reset status checks
  for devices that reset automatically in idle like DSS

- Ignore ti-sysc clockactivity bit unless separately requested
  to avoid unexpected performance issues

- Init ti-sysc framedonetv_irq to true and disable for am4

- Avoid duplicate DSS reset for legacy mode with dts data

- Remove LCD timings for am4 as they cause warnings now that we're
  using generic panels

Then there is a pile of other fixes not related to the DSS:

- Fix omap_prm reset deassert as we still have drivers setting the
  pm_runtime_irq_safe() flag

- Flush posted write for ti-sysc enable and disable

- Fix droid4 spi related errors with spi flags

- Fix am335x USB range and a typo for softreset

- Fix dra7 timer nodes for clocks for IPU and DSP

- Drop duplicate mailboxes after mismerge for dra7

* tag 'omap-for-v5.8/fixes-merge-window-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  Revert "bus: ti-sysc: Increase max softreset wait"
  ARM: dts: am437x-epos-evm: remove lcd timings
  ARM: dts: am437x-gp-evm: remove lcd timings
  ARM: dts: am437x-sk-evm: remove lcd timings
  ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes
  ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks
  ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag
  ARM: dts: Fix am33xx.dtsi USB ranges length
  bus: ti-sysc: Increase max softreset wait
  ARM: OMAP2+: Fix legacy mode dss_reset
  bus: ti-sysc: Fix uninitialized framedonetv_irq
  bus: ti-sysc: Ignore clockactivity unless specified as a quirk
  bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit
  ARM: dts: omap4-droid4: Fix spi configuration and increase rate
  bus: ti-sysc: Flush posted write on enable and disable
  soc: ti: omap-prm: use atomic iopoll instead of sleeping one

Link: https://lore.kernel.org/r/pull-1591889257-410830@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 years agoafs: Fix storage of cell names
David Howells [Wed, 24 Jun 2020 16:00:24 +0000 (17:00 +0100)]
afs: Fix storage of cell names

The cell name stored in the afs_cell struct is a 64-char + NUL buffer -
when it needs to be able to handle up to AFS_MAXCELLNAME (256 chars) + NUL.

Fix this by changing the array to a pointer and allocating the string.

Found using Coverity.

Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 27 Jun 2020 22:24:04 +0000 (15:24 -0700)]
Merge tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Six cifs/smb3 fixes, three of them for stable.

  Fixes xfstests 451, 313 and 316"

* tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: misc: Use array_size() in if-statement controlling expression
  cifs: update ctime and mtime during truncate
  cifs/smb3: Fix data inconsistent when punch hole
  cifs/smb3: Fix data inconsistent when zero file range
  cifs: Fix double add page to memcg when cifs_readpages
  cifs: Fix cached_fid refcnt leak in open_shroot

4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 27 Jun 2020 22:20:03 +0000 (15:20 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Six small fixes, five in drivers and one to correct another minor
  regression from cc97923a5bcc ("block: move dma drain handling to
  scsi") where we still need the drain stub to be built in to the kernel
  for the modular libata, non-modular SAS driver case"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mptscsih: Fix read sense data size
  scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action
  scsi: lpfc: Avoid another null dereference in lpfc_sli4_hba_unset()
  scsi: libata: Fix the ata_scsi_dma_need_drain stub
  scsi: qla2xxx: Keep initiator ports after RSCN
  scsi: qla2xxx: Set NVMe status code for failed NVMe FCP request

4 years agoMerge tag 'vfio-v5.8-rc3' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Sat, 27 Jun 2020 22:17:21 +0000 (15:17 -0700)]
Merge tag 'vfio-v5.8-rc3' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Fix double free of eventfd ctx (Alex Williamson)

 - Fix duplicate use of capability ID (Alex Williamson)

 - Fix SR-IOV VF memory enable handling (Alex Williamson)

* tag 'vfio-v5.8-rc3' of git://github.com/awilliam/linux-vfio:
  vfio/pci: Fix SR-IOV VF handling with MMIO blocking
  vfio/type1: Fix migration info capability ID
  vfio/pci: Clear error and request eventfd ctx after releasing