mm: THP: introducing a fine-grained transparent hugepage technique for ARM64 architecture
author Sung-hun Kim <sfoon.kim@samsung.com>
Fri, 2 Jul 2021 09:43:36 +0000 (18:43 +0900)
committer Hoegeun Kwon <hoegeun.kwon@samsung.com>
Mon, 7 Feb 2022 08:01:41 +0000 (17:01 +0900)
commit 604430fe5316ac6002f6138c05a27a3bd58398e2
tree 0bf320ea6d83899ccb9337b72b6c58699001a12e
parent e6cd8f0a26ebd22d623b4b0c0b42fe359040d153

Transparent hugepage (THP) is one of the promising solutions for
dealing with increased memory footprints, but it has mostly been
focused on server-side environments.

This patch argues that embedded systems can also benefit from THP,
since application memory footprints on such systems have grown while
remaining relatively small.

The ARM64 architecture features a fine-grained hugepage: it supports
64KB hugepages in addition to the commonly used 2MB hugepages. We use
these two kinds of hugepages according to the required size of
virtual memory.

In this patch, we develop an eager-and-conservative policy. With this
policy, the kernel does not allocate 2MB hugepages on page faults, in
order to avoid increased page fault latencies. Instead, the kernel
allocates 64KB hugepages on page faults. Since 64KB hugepages require
lower-order pages than 2MB hugepages, memory management tasks such as
memory compaction do not severely affect user-visible memory latency.
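
The policy above can be illustrated with a minimal userspace sketch. It
is not code from this patch: the names fthp_context, size_to_order and
fthp_pick_size are hypothetical, and a 4KB base page size is assumed.
It shows only the size decision (page faults are capped at 64KB,
khugepaged may also choose 2MB) and the allocation order each size
implies.

```c
#include <assert.h>

/* Assumption: ARM64 configured with 4KB base pages. */
#define BASE_PAGE_SIZE  (4UL << 10)
#define HPAGE_64KB_SIZE (64UL << 10)
#define HPAGE_2MB_SIZE  (2UL << 20)

/* Hypothetical names for illustration; not taken from the patch. */
enum fthp_context { FTHP_PAGE_FAULT, FTHP_KHUGEPAGED };

/* Allocation order (log2 of the number of base pages) for `size` bytes. */
static unsigned int size_to_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size >= BASE_PAGE_SIZE);
	while ((BASE_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Pick the mapping size for a suitably aligned region of `len` bytes.
 * Page faults never get a 2MB hugepage; khugepaged may use either size.
 */
static unsigned long fthp_pick_size(enum fthp_context ctx, unsigned long len)
{
	if (ctx == FTHP_KHUGEPAGED && len >= HPAGE_2MB_SIZE)
		return HPAGE_2MB_SIZE;
	if (len >= HPAGE_64KB_SIZE)
		return HPAGE_64KB_SIZE;
	return BASE_PAGE_SIZE;
}
```

Under these assumptions a 64KB hugepage is an order-4 allocation (16
base pages) while a 2MB hugepage is order-9 (512 base pages), which is
why the fault path stays with the smaller size.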

On the other hand, khugepaged creates both 64KB and 2MB hugepages,
for both anonymous pages and file pages, according to virtual memory
sizes.

Moreover, our proposed fine-grained THP (fTHP) supports hugepage
mappings on pages in CMA. Since pages in CMA are already contiguous,
fTHP simply allows hugepage mappings for 64KB- or 2MB-aligned memory
areas.
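
As a rough illustration of the alignment condition, the sketch below
picks the largest hugepage size usable for a physically contiguous
area. It is an assumption-laden userspace example, not the patch's
implementation: cma_hugepage_size and is_aligned are hypothetical
helpers, and the check simply requires the virtual address, the
physical address and the length to all suit the chosen size.

```c
#include <assert.h>
#include <stdint.h>

#define HPAGE_64KB_SIZE (UINT64_C(64) << 10)
#define HPAGE_2MB_SIZE  (UINT64_C(2) << 20)

/* True if `x` is a multiple of `size`; `size` must be a power of two. */
static int is_aligned(uint64_t x, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (x & (size - 1)) == 0;
}

/*
 * Hypothetical helper: return the largest hugepage size that can map
 * the contiguous area starting at (vaddr, paddr) with `len` bytes,
 * or 0 if neither 2MB nor 64KB fits.
 */
static uint64_t cma_hugepage_size(uint64_t vaddr, uint64_t paddr, uint64_t len)
{
	static const uint64_t sizes[] = { HPAGE_2MB_SIZE, HPAGE_64KB_SIZE };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		if (len >= sizes[i] &&
		    is_aligned(vaddr, sizes[i]) &&
		    is_aligned(paddr, sizes[i]))
			return sizes[i];
	}
	return 0;
}
```

Trying 2MB first and falling back to 64KB mirrors the two sizes named
above; an area that is aligned to neither would keep its base-page
mappings.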

The proposed method achieves up to 32% higher throughput than the
Linux kernel with default THP when the system runs a read workload in
lmbench [1] and the buffer fits in the CPU last-level cache. For
large buffers (bigger than 2MB), the proposed method shows throughput
similar to that of default THP in the Linux kernel.

[1] LMbench - Tools for performance analysis:
http://lmbench.sourceforge.net

Change-Id: I750528db8f04b37fda39052bea775d18ca5d53fb
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
41 files changed:
arch/arm64/include/asm/finegrained_thp.h [new file with mode: 0644]
arch/arm64/include/asm/huge_mm.h [new file with mode: 0644]
arch/arm64/include/asm/pgtable.h
arch/arm64/mm/Makefile
arch/arm64/mm/finegrained_thp.c [new file with mode: 0644]
arch/arm64/mm/huge_memory.c [new file with mode: 0644]
arch/arm64/mm/mmu.c
fs/proc/meminfo.c
include/asm-generic/finegrained_thp.h [new file with mode: 0644]
include/asm-generic/huge_mm.h [new file with mode: 0644]
include/linux/huge_mm.h
include/linux/mm.h
include/linux/mmu_notifier.h
include/linux/mmzone.h
include/linux/pgtable.h
include/linux/rmap.h
include/linux/swapops.h
include/linux/vm_event_item.h
include/uapi/asm-generic/mman-common.h
kernel/dma/Kconfig
kernel/events/uprobes.c
mm/Kconfig
mm/filemap.c
mm/gup.c
mm/huge_memory.c
mm/internal.h
mm/ioremap.c
mm/khugepaged.c
mm/madvise.c
mm/memory.c
mm/migrate.c
mm/mmap.c
mm/mprotect.c
mm/mremap.c
mm/rmap.c
mm/shmem.c
mm/swap_slots.c
mm/swapfile.c
mm/truncate.c
mm/vmscan.c
mm/vmstat.c