platform/kernel/linux-rpi.git
kasan: introduce kasan_init_object_meta
Andrey Konovalov [Mon, 5 Sep 2022 21:05:23 +0000 (23:05 +0200)]
kasan: introduce kasan_init_object_meta

Add a kasan_init_object_meta() helper that initializes metadata for a slab
object and use it in the common code.

For now, the implementations of this helper are the same for the Generic
and tag-based modes, but they will diverge later in the series.

This change hides references to alloc_meta from the common code.  This is
desired as only the Generic mode will be using per-object metadata after
this series.
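
A minimal sketch of what such a helper can look like, based on the description
above and the existing kasan_get_alloc_meta() accessor; treat the exact body as
illustrative rather than the final upstream code:

  static void kasan_init_object_meta(struct kmem_cache *cache, const void *object)
  {
          struct kasan_alloc_meta *alloc_meta;

          /* Locate the per-object alloc metadata inside the redzone. */
          alloc_meta = kasan_get_alloc_meta(cache, object);
          if (alloc_meta)
                  /* KASAN code uses the uninstrumented __memset. */
                  __memset(alloc_meta, 0, sizeof(*alloc_meta));
  }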

Link: https://lkml.kernel.org/r/47c12938fc7f8105e7aaa592527c0e9d3c81fc37.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: introduce kasan_get_alloc_track
Andrey Konovalov [Mon, 5 Sep 2022 21:05:22 +0000 (23:05 +0200)]
kasan: introduce kasan_get_alloc_track

Add a kasan_get_alloc_track() helper that fetches alloc_track for a slab
object and use this helper in the common reporting code.

For now, the implementations of this helper are the same for the Generic
and tag-based modes, but they will diverge later in the series.

This change hides references to alloc_meta from the common reporting code.
This is desired as only the Generic mode will be using per-object
metadata after this series.
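
A minimal sketch of the helper's likely shape, assuming the existing
kasan_get_alloc_meta() accessor and the alloc_track field of struct
kasan_alloc_meta; illustrative, not the verbatim patch:

  static struct kasan_track *kasan_get_alloc_track(struct kmem_cache *cache,
                                                   void *object)
  {
          struct kasan_alloc_meta *alloc_meta;

          alloc_meta = kasan_get_alloc_meta(cache, object);
          if (!alloc_meta)
                  return NULL;

          /* The reporting code only needs the recorded alloc stack/pid. */
          return &alloc_meta->alloc_track;
  }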

Link: https://lkml.kernel.org/r/0c365a35f4a833fff46f9d42c3212b32f7166556.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: introduce kasan_print_aux_stacks
Andrey Konovalov [Mon, 5 Sep 2022 21:05:21 +0000 (23:05 +0200)]
kasan: introduce kasan_print_aux_stacks

Add a kasan_print_aux_stacks() helper that prints the auxiliary stack
traces for the Generic mode.

This change hides references to alloc_meta from the common reporting code.
This is desired as only the Generic mode will be using per-object
metadata after this series.
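
A hedged sketch of what the helper can look like; it assumes the aux_stack[]
handles in struct kasan_alloc_meta and uses stack_depot_print() for output,
which may differ from the exact printing helper used in the report code:

  static void kasan_print_aux_stacks(struct kmem_cache *cache, const void *object)
  {
          struct kasan_alloc_meta *alloc_meta;

          alloc_meta = kasan_get_alloc_meta(cache, object);
          if (!alloc_meta)
                  return;

          /* Auxiliary stacks record e.g. the work/RCU callback origin. */
          if (alloc_meta->aux_stack[0])
                  stack_depot_print(alloc_meta->aux_stack[0]);
          if (alloc_meta->aux_stack[1])
                  stack_depot_print(alloc_meta->aux_stack[1]);
  }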

Link: https://lkml.kernel.org/r/67c7a9ea6615533762b1f8ccc267cd7f9bafb749.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: drop CONFIG_KASAN_TAGS_IDENTIFY
Andrey Konovalov [Mon, 5 Sep 2022 21:05:20 +0000 (23:05 +0200)]
kasan: drop CONFIG_KASAN_TAGS_IDENTIFY

Drop CONFIG_KASAN_TAGS_IDENTIFY and related code to simplify making
changes to the reporting code.

The dropped functionality will be restored in the following patches in
this series.

Link: https://lkml.kernel.org/r/4c66ba98eb237e9ed9312c19d423bbcf4ecf88f8.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: split save_alloc_info implementations
Andrey Konovalov [Mon, 5 Sep 2022 21:05:19 +0000 (23:05 +0200)]
kasan: split save_alloc_info implementations

Provide standalone implementations of save_alloc_info() for the Generic
and tag-based modes.

For now, the implementations are the same, but they will diverge later in
the series.

Link: https://lkml.kernel.org/r/77f1a078489c1e859aedb5403f772e5e1f7410a0.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: move is_kmalloc check out of save_alloc_info
Andrey Konovalov [Mon, 5 Sep 2022 21:05:18 +0000 (23:05 +0200)]
kasan: move is_kmalloc check out of save_alloc_info

Move kasan_info.is_kmalloc check out of save_alloc_info().

This is a preparatory change that simplifies the following patches in this
series.

Link: https://lkml.kernel.org/r/df89f1915b788f9a10319905af6d0202a3b30c30.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: rename kasan_set_*_info to kasan_save_*_info
Andrey Konovalov [Mon, 5 Sep 2022 21:05:17 +0000 (23:05 +0200)]
kasan: rename kasan_set_*_info to kasan_save_*_info

Rename set_alloc_info() and kasan_set_free_info() to save_alloc_info() and
kasan_save_free_info().  The new names make more sense.

Link: https://lkml.kernel.org/r/9f04777a15cb9d96bf00331da98e021d732fe1c9.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: check KASAN_NO_FREE_META in __kasan_metadata_size
Andrey Konovalov [Mon, 5 Sep 2022 21:05:16 +0000 (23:05 +0200)]
kasan: check KASAN_NO_FREE_META in __kasan_metadata_size

Patch series "kasan: switch tag-based modes to stack ring from per-object
metadata", v3.

This series makes the tag-based KASAN modes use a ring buffer for storing
stack depot handles for alloc/free stack traces for slab objects instead
of per-object metadata.  This ring buffer is referred to as the stack
ring.

On each alloc/free of a slab object, the tagged address of the object and
the current stack trace are recorded in the stack ring.

On each bug report, if the accessed address belongs to a slab object, the
stack ring is scanned for matching entries.  The newest entries are used
to print the alloc/free stack traces in the report: one entry for alloc
and one for free.

The advantages of this approach over storing stack trace handles in
per-object metadata with the tag-based KASAN modes:

- Allows finding relevant stack traces for use-after-free bugs without
  using quarantine for freed memory. (Currently, if the object was
  reallocated multiple times, the report contains the latest alloc/free
  stack traces, not necessarily the ones relevant to the buggy allocation.)
- Allows better identification and marking of use-after-free bugs,
  effectively making the CONFIG_KASAN_TAGS_IDENTIFY functionality always-on.
- Has a fixed memory overhead.

The disadvantage:

- If the affected object was allocated/freed long before the bug happened
  and the stack trace events were purged from the stack ring, the report
  will have no stack traces.
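
As an illustration of the description above, one possible shape of a stack
ring entry (field names are illustrative, not necessarily the ones introduced
later in the series):

  struct kasan_stack_ring_entry {
          void *ptr;                      /* tagged address of the object */
          size_t size;
          u32 pid;
          depot_stack_handle_t stack;     /* stack depot handle */
          bool is_free;                   /* alloc or free event */
  };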

Discussion
==========

The proposed implementation of the stack ring uses a single ring buffer
for the whole kernel.  This might lead to contention due to atomic
accesses to the ring buffer index on multicore systems.

At this point, it is unknown whether the performance impact from this
contention would be significant compared to the slowdown introduced by
collecting stack traces; for the planned changes to the latter, see the
section below.

For now, the proposed implementation is deemed to be good enough, but this
might need to be revisited once the stack collection becomes faster.

A considered alternative is to keep a separate ring buffer for each CPU
and then iterate over all of them when printing a bug report.  This
approach requires somehow figuring out which of the stack rings has the
freshest stack traces for an object if multiple stack rings have them.

Further plans
=============

This series is a part of an effort to make KASAN stack trace collection
suitable for production.  This requires stack trace collection to be fast
and memory-bounded.

The planned steps are:

1. Speed up stack trace collection (potentially, by using SCS;
   patches on-hold until steps #2 and #3 are completed).
2. Keep stack trace handles in the stack ring (this series).
3. Add a memory-bounded mode to stack depot or provide an alternative
   memory-bounded stack storage.
4. Potentially, implement stack trace collection sampling to minimize
   the performance impact.

This patch (of 34):

__kasan_metadata_size() calculates the size of the redzone for objects in
a slab cache.

When accounting for the presence of kasan_free_meta in the redzone, this
function only compares free_meta_offset with 0.  But free_meta_offset
could also be equal to KASAN_NO_FREE_META, which indicates that
kasan_free_meta is not present at all.

Add a comparison with KASAN_NO_FREE_META into __kasan_metadata_size().
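
A sketch of the intended check, using the field names mentioned above; the
exact upstream function may be structured differently:

  size_t __kasan_metadata_size(struct kmem_cache *cache)
  {
          if (!kasan_stack_collection_enabled())
                  return 0;
          return (cache->kasan_info.alloc_meta_offset ?
                  sizeof(struct kasan_alloc_meta) : 0) +
                 ((cache->kasan_info.free_meta_offset &&
                   cache->kasan_info.free_meta_offset != KASAN_NO_FREE_META) ?
                  sizeof(struct kasan_free_meta) : 0);
  }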

Link: https://lkml.kernel.org/r/cover.1662411799.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
filemap: convert filemap_range_has_writeback() to use folios
Vishal Moola (Oracle) [Mon, 5 Sep 2022 21:45:57 +0000 (14:45 -0700)]
filemap: convert filemap_range_has_writeback() to use folios

Removes 3 calls to compound_head().

Link: https://lkml.kernel.org/r/20220905214557.868606-1-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hugetlb_encode.h: fix undefined behaviour (34 << 26)
Matthias Goergens [Mon, 5 Sep 2022 03:19:04 +0000 (11:19 +0800)]
hugetlb_encode.h: fix undefined behaviour (34 << 26)

Left-shifting a value so that the result no longer fits in its signed type
is undefined behaviour in C.  The literal 34 gets the type `int`, and
34 << 26 does not fit in a 32-bit int.

An `unsigned` is wide enough (on any machine whose ints have at least 32
bits).

For uniformity, we mark all the literals as unsigned.  But it's only
really needed for HUGETLB_FLAG_ENCODE_16GB.
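
A sketch of the kind of change described, using the macro names from
hugetlb_encode.h (treat the exact lines as illustrative):

  #define HUGETLB_FLAG_ENCODE_SHIFT       26

  /* Before: 34 << 26 overflows a signed 32-bit int -> undefined behaviour.
   * After: the unsigned literal makes the arithmetic well-defined. */
  #define HUGETLB_FLAG_ENCODE_16GB        (34U << HUGETLB_FLAG_ENCODE_SHIFT)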

Thanks to Randy Dunlap for an initial review and suggestion.

Link: https://lkml.kernel.org/r/20220905031904.150925-1-matthias.goergens@gmail.com
Signed-off-by: Matthias Goergens <matthias.goergens@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/damon/sysfs: simplify the judgement whether kdamonds are busy
Kaixu Xia [Sun, 4 Sep 2022 14:36:06 +0000 (22:36 +0800)]
mm/damon/sysfs: simplify the judgement whether kdamonds are busy

It is unnecessary to count the running kdamonds to judge whether kdamonds
are busy.  Here we can use the damon_sysfs_kdamond_running() helper and
return -EBUSY directly when finding a running kdamond.  Meanwhile, merge
this with the check of whether a kdamond has a pending sysfs command
callback request, to make the code clearer.
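
A hedged sketch of the resulting check; the helper name comes from the text
above, while the array and command-request field names are illustrative
assumptions, not verified against mm/damon/sysfs.c:

  for (i = 0; i < kdamonds->nr; i++) {
          /* Busy if any kdamond is running or has a pending sysfs command. */
          if (damon_sysfs_kdamond_running(kdamonds->kdamonds_arr[i]) ||
              damon_sysfs_cmd_request.kdamond == kdamonds->kdamonds_arr[i])
                  return -EBUSY;
  }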

Link: https://lkml.kernel.org/r/1662302166-13216-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/hugetlb.c: remove unnecessary initialization of local `err'
Li zeming [Mon, 5 Sep 2022 02:09:18 +0000 (10:09 +0800)]
mm/hugetlb.c: remove unnecessary initialization of local `err'

Link: https://lkml.kernel.org/r/20220905020918.3552-1-zeming@nfschina.com
Signed-off-by: Li zeming <zeming@nfschina.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: convert lock_page_or_retry() to folio_lock_or_retry()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:53 +0000 (20:46 +0100)]
mm: convert lock_page_or_retry() to folio_lock_or_retry()

Remove a call to compound_head() in each of the two callers.

Link: https://lkml.kernel.org/r/20220902194653.1739778-58-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
uprobes: use new_folio in __replace_page()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:52 +0000 (20:46 +0100)]
uprobes: use new_folio in __replace_page()

Saves several calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-57-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
rmap: remove page_unlock_anon_vma_read()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:51 +0000 (20:46 +0100)]
rmap: remove page_unlock_anon_vma_read()

This was simply an alias for anon_vma_unlock_read() since 2011.

Link: https://lkml.kernel.org/r/20220902194653.1739778-56-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: convert page_get_anon_vma() to folio_get_anon_vma()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:50 +0000 (20:46 +0100)]
mm: convert page_get_anon_vma() to folio_get_anon_vma()

With all callers now passing in a folio, rename the function and convert
all callers.  Removes a couple of calls to compound_head() and a reference
to page->mapping.

Link: https://lkml.kernel.org/r/20220902194653.1739778-55-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
huge_memory: convert unmap_page() to unmap_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:49 +0000 (20:46 +0100)]
huge_memory: convert unmap_page() to unmap_folio()

Remove a folio->page->folio conversion.

Link: https://lkml.kernel.org/r/20220902194653.1739778-54-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
huge_memory: convert split_huge_page_to_list() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:48 +0000 (20:46 +0100)]
huge_memory: convert split_huge_page_to_list() to use a folio

Saves many calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-53-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
migrate: convert unmap_and_move_huge_page() to use folios
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:47 +0000 (20:46 +0100)]
migrate: convert unmap_and_move_huge_page() to use folios

Saves several calls to compound_head() and removes a couple of uses of
page->lru.

Link: https://lkml.kernel.org/r/20220902194653.1739778-52-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
migrate: convert __unmap_and_move() to use folios
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:46 +0000 (20:46 +0100)]
migrate: convert __unmap_and_move() to use folios

Removes a lot of calls to compound_head().  Also remove a VM_BUG_ON that
can never trigger as the PageAnon bit is the bottom bit of page->mapping.

Link: https://lkml.kernel.org/r/20220902194653.1739778-51-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
rmap: convert page_move_anon_rmap() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:45 +0000 (20:46 +0100)]
rmap: convert page_move_anon_rmap() to use a folio

Removes one call to compound_head() and a reference to page->mapping.

Link: https://lkml.kernel.org/r/20220902194653.1739778-50-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: remove try_to_free_swap()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:44 +0000 (20:46 +0100)]
mm: remove try_to_free_swap()

All callers have now been converted to folio_free_swap() and we can remove
this wrapper.

Link: https://lkml.kernel.org/r/20220902194653.1739778-49-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memcg: convert mem_cgroup_swap_full() to take a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:43 +0000 (20:46 +0100)]
memcg: convert mem_cgroup_swap_full() to take a folio

All callers now have a folio, so convert the function to take a folio.
Saves a couple of calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-48-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: convert do_swap_page() to use folio_free_swap()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:42 +0000 (20:46 +0100)]
mm: convert do_swap_page() to use folio_free_swap()

Also convert should_try_to_free_swap() to use a folio.  This removes a few
calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-47-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: use a folio in replace_page()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:41 +0000 (20:46 +0100)]
ksm: use a folio in replace_page()

Replace three calls to compound_head() with one.

Link: https://lkml.kernel.org/r/20220902194653.1739778-46-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
uprobes: use folios more widely in __replace_page()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:40 +0000 (20:46 +0100)]
uprobes: use folios more widely in __replace_page()

Remove a few hidden calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-45-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
madvise: convert madvise_free_pte_range() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:39 +0000 (20:46 +0100)]
madvise: convert madvise_free_pte_range() to use a folio

Saves a lot of calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-44-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
huge_memory: convert do_huge_pmd_wp_page() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:38 +0000 (20:46 +0100)]
huge_memory: convert do_huge_pmd_wp_page() to use a folio

Removes many calls to compound_head().  Does not remove the assumption
that a folio may not be larger than a PMD.

Link: https://lkml.kernel.org/r/20220902194653.1739778-43-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: convert do_wp_page() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:37 +0000 (20:46 +0100)]
mm: convert do_wp_page() to use a folio

Saves many calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-42-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swap: convert swap_writepage() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:36 +0000 (20:46 +0100)]
swap: convert swap_writepage() to use a folio

Removes many calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-41-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swap_state: convert free_swap_cache() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:35 +0000 (20:46 +0100)]
swap_state: convert free_swap_cache() to use a folio

Saves several calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-40-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: remove lookup_swap_cache()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:34 +0000 (20:46 +0100)]
mm: remove lookup_swap_cache()

All callers have now been converted to swap_cache_get_folio(), so we can
remove this wrapper.

Link: https://lkml.kernel.org/r/20220902194653.1739778-39-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: convert do_swap_page() to use swap_cache_get_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:33 +0000 (20:46 +0100)]
mm: convert do_swap_page() to use swap_cache_get_folio()

Saves a folio->page->folio conversion.

Link: https://lkml.kernel.org/r/20220902194653.1739778-38-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swapfile: convert unuse_pte_range() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:32 +0000 (20:46 +0100)]
swapfile: convert unuse_pte_range() to use a folio

Delay fetching the precise page from the folio until we're in unuse_pte().
Saves many calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-37-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swapfile: convert __try_to_reclaim_swap() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:31 +0000 (20:46 +0100)]
swapfile: convert __try_to_reclaim_swap() to use a folio

Saves five calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-36-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swapfile: convert try_to_unuse() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:30 +0000 (20:46 +0100)]
swapfile: convert try_to_unuse() to use a folio

Saves five calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-35-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: remove shmem_getpage()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:29 +0000 (20:46 +0100)]
shmem: remove shmem_getpage()

With all callers removed, remove this wrapper function.  The flags are now
mysteriously called SGP, but I think we can live with that.

Link: https://lkml.kernel.org/r/20220902194653.1739778-34-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
userfaultfd: convert mcontinue_atomic_pte() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:28 +0000 (20:46 +0100)]
userfaultfd: convert mcontinue_atomic_pte() to use a folio

shmem_getpage() is being replaced by shmem_get_folio() so use a folio
throughout this function.  Saves several calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-33-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
khugepaged: call shmem_get_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:27 +0000 (20:46 +0100)]
khugepaged: call shmem_get_folio()

shmem_getpage() is being removed, so call its replacement and find the
precise page ourselves.

Link: https://lkml.kernel.org/r/20220902194653.1739778-32-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_get_link() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:26 +0000 (20:46 +0100)]
shmem: convert shmem_get_link() to use a folio

Symlinks will never use a large folio, but using the folio API removes a
lot of unnecessary folio->page->folio conversions.

Link: https://lkml.kernel.org/r/20220902194653.1739778-31-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_symlink() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:25 +0000 (20:46 +0100)]
shmem: convert shmem_symlink() to use a folio

While symlinks will always be < PAGE_SIZE, using the folio APIs gets rid
of unnecessary calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-30-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_fallocate() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:24 +0000 (20:46 +0100)]
shmem: convert shmem_fallocate() to use a folio

Call shmem_get_folio() and use the folio APIs instead of the page APIs.
Saves several calls to compound_head() and removes assumptions about the
size of a large folio.

Link: https://lkml.kernel.org/r/20220902194653.1739778-29-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_file_read_iter() to use shmem_get_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:23 +0000 (20:46 +0100)]
shmem: convert shmem_file_read_iter() to use shmem_get_folio()

Use a folio throughout, saving five calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-28-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_write_begin() to use shmem_get_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:22 +0000 (20:46 +0100)]
shmem: convert shmem_write_begin() to use shmem_get_folio()

Use a folio throughout this function, saving a couple of calls to
compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-27-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_get_partial_folio() to use shmem_get_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:21 +0000 (20:46 +0100)]
shmem: convert shmem_get_partial_folio() to use shmem_get_folio()

Get rid of an unnecessary folio->page->folio conversion.

Link: https://lkml.kernel.org/r/20220902194653.1739778-26-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: add shmem_get_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:20 +0000 (20:46 +0100)]
shmem: add shmem_get_folio()

With no remaining callers of shmem_getpage_gfp(), add shmem_get_folio()
and reimplement shmem_getpage() as a call to shmem_get_folio().
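
A plausible sketch of the reimplemented wrapper, assuming folio_file_page()
for extracting the precise page; illustrative, not the verbatim patch:

  int shmem_getpage(struct inode *inode, pgoff_t index,
                    struct page **pagep, enum sgp_type sgp)
  {
          struct folio *folio = NULL;
          int ret = shmem_get_folio(inode, index, &folio, sgp);

          if (folio)
                  *pagep = folio_file_page(folio, index);
          else
                  *pagep = NULL;
          return ret;
  }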

Link: https://lkml.kernel.org/r/20220902194653.1739778-25-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_read_mapping_page_gfp() to use shmem_get_folio_gfp()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:19 +0000 (20:46 +0100)]
shmem: convert shmem_read_mapping_page_gfp() to use shmem_get_folio_gfp()

Saves a couple of calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-24-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_fault() to use shmem_get_folio_gfp()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:18 +0000 (20:46 +0100)]
shmem: convert shmem_fault() to use shmem_get_folio_gfp()

No particular advantage for this function, but necessary to remove
shmem_getpage_gfp().

[hughd@google.com: fix crash]
Link: https://lkml.kernel.org/r/7693a84-bdc2-27b5-2695-d0fe8566571f@google.com
Link: https://lkml.kernel.org/r/20220902194653.1739778-23-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_getpage_gfp() to shmem_get_folio_gfp()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:17 +0000 (20:46 +0100)]
shmem: convert shmem_getpage_gfp() to shmem_get_folio_gfp()

Add a shmem_getpage_gfp() wrapper for compatibility with current users.

Link: https://lkml.kernel.org/r/20220902194653.1739778-22-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: eliminate struct page from shmem_swapin_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:16 +0000 (20:46 +0100)]
shmem: eliminate struct page from shmem_swapin_folio()

Convert shmem_swapin() to return a folio and use swap_cache_get_folio(),
removing all uses of struct page in this function.

Link: https://lkml.kernel.org/r/20220902194653.1739778-21-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
swap: add swap_cache_get_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:15 +0000 (20:46 +0100)]
swap: add swap_cache_get_folio()

Convert lookup_swap_cache() into swap_cache_get_folio() and add a
lookup_swap_cache() wrapper around it.
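
A plausible sketch of the compatibility wrapper, assuming folio_file_page()
and swp_offset(); illustrative rather than the verbatim patch:

  struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma,
                                 unsigned long addr)
  {
          struct folio *folio = swap_cache_get_folio(entry, vma, addr);

          if (!folio)
                  return NULL;
          /* Hand back the precise page for the old page-based callers. */
          return folio_file_page(folio, swp_offset(entry));
  }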

[akpm@linux-foundation.org: add CONFIG_SWAP=n stub for swap_cache_get_folio()]
Link: https://lkml.kernel.org/r/20220902194653.1739778-20-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_replace_page() to shmem_replace_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:14 +0000 (20:46 +0100)]
shmem: convert shmem_replace_page() to shmem_replace_folio()

The caller has a folio, so convert the calling convention and rename the
function.

Link: https://lkml.kernel.org/r/20220902194653.1739778-19-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_mfill_atomic_pte() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:13 +0000 (20:46 +0100)]
shmem: convert shmem_mfill_atomic_pte() to use a folio

Assert that this is a single-page folio as there are several assumptions
in here that it's exactly PAGE_SIZE bytes large.  Saves several calls to
compound_head() and removes the last caller of shmem_alloc_page().

Link: https://lkml.kernel.org/r/20220902194653.1739778-18-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memcg: convert mem_cgroup_swapin_charge_page() to mem_cgroup_swapin_charge_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:12 +0000 (20:46 +0100)]
memcg: convert mem_cgroup_swapin_charge_page() to mem_cgroup_swapin_charge_folio()

All callers now have a folio, so pass it in here and remove an unnecessary
call to page_folio().

Link: https://lkml.kernel.org/r/20220902194653.1739778-17-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: convert do_swap_page()'s swapcache variable to a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:11 +0000 (20:46 +0100)]
mm: convert do_swap_page()'s swapcache variable to a folio

The 'swapcache' variable is used to track whether the page is from the
swapcache or not.  It can do this equally well by being the folio of the
page rather than the page itself, and this saves a number of calls to
compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-16-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: convert do_swap_page() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:10 +0000 (20:46 +0100)]
mm: convert do_swap_page() to use a folio

Removes quite a lot of calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-15-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/swap: convert put_swap_page() to put_swap_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:09 +0000 (20:46 +0100)]
mm/swap: convert put_swap_page() to put_swap_folio()

With all callers now using a folio, we can convert this function.

Link: https://lkml.kernel.org/r/20220902194653.1739778-14-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/swap: convert add_to_swap_cache() to take a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:08 +0000 (20:46 +0100)]
mm/swap: convert add_to_swap_cache() to take a folio

With all callers using folios, we can convert add_to_swap_cache() to take
a folio and use it throughout.

Link: https://lkml.kernel.org/r/20220902194653.1739778-13-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/swap: convert __read_swap_cache_async() to use a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:07 +0000 (20:46 +0100)]
mm/swap: convert __read_swap_cache_async() to use a folio

Remove a few hidden (and one visible) calls to compound_head().

Link: https://lkml.kernel.org/r/20220902194653.1739778-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/swapfile: convert try_to_free_swap() to folio_free_swap()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:06 +0000 (20:46 +0100)]
mm/swapfile: convert try_to_free_swap() to folio_free_swap()

Add kernel-doc for folio_free_swap() and make it return bool.  Add a
try_to_free_swap() compatibility wrapper.
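
The compatibility wrapper presumably boils down to something like this sketch:

  int try_to_free_swap(struct page *page)
  {
          return folio_free_swap(page_folio(page));
  }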

Link: https://lkml.kernel.org/r/20220902194653.1739778-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/swapfile: remove page_swapcount()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:05 +0000 (20:46 +0100)]
mm/swapfile: remove page_swapcount()

By restructuring folio_swapped(), it can use swap_swapcount() instead of
page_swapcount().  It's even a little more efficient.

Link: https://lkml.kernel.org/r/20220902194653.1739778-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_replace_page() to use folios throughout
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:04 +0000 (20:46 +0100)]
shmem: convert shmem_replace_page() to use folios throughout

Introduce folio_set_swap_entry() to abstract how both folio->private and
swp_entry_t work.  Use swap_address_space() directly instead of
indirecting through folio_mapping().  Include an assertion that the old
folio is not large as we only allocate a single-page folio to replace it.
Use folio_put_refs() instead of calling folio_put() twice.
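
A minimal sketch of the abstraction described above, assuming folio->private
carries the swap entry value as on the page side:

  static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry)
  {
          /* Mirror set_page_private(page, entry.val) for folios. */
          folio->private = (void *)entry.val;
  }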

Link: https://lkml.kernel.org/r/20220902194653.1739778-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_delete_from_page_cache() to take a folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:03 +0000 (20:46 +0100)]
shmem: convert shmem_delete_from_page_cache() to take a folio

Remove the assertion that the page is not Compound as this function now
handles large folios correctly.

Link: https://lkml.kernel.org/r/20220902194653.1739778-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
shmem: convert shmem_writepage() to use a folio throughout
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:02 +0000 (20:46 +0100)]
shmem: convert shmem_writepage() to use a folio throughout

Even though we will split any large folio that comes in, write the code to
handle large folios so as to not leave a trap for whoever tries to handle
large folios in the swap cache.

Link: https://lkml.kernel.org/r/20220902194653.1739778-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: add folio_add_lru_vma()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:01 +0000 (20:46 +0100)]
mm: add folio_add_lru_vma()

Convert lru_cache_add_inactive_or_unevictable() to folio_add_lru_vma()
and add a compatibility wrapper.
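
The compatibility wrapper presumably looks roughly like this sketch:

  void lru_cache_add_inactive_or_unevictable(struct page *page,
                                             struct vm_area_struct *vma)
  {
          folio_add_lru_vma(page_folio(page), vma);
  }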

Link: https://lkml.kernel.org/r/20220902194653.1739778-6-willy@infradead.org
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: add split_folio()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:46:00 +0000 (20:46 +0100)]
mm: add split_folio()

This wrapper removes a need to use split_huge_page(&folio->page).  Convert
two callers.
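
One plausible shape of the wrapper (the tree may define it as a macro instead):

  static inline int split_folio(struct folio *folio)
  {
          return split_huge_page(&folio->page);
  }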

Link: https://lkml.kernel.org/r/20220902194653.1739778-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: reimplement folio_order() and folio_nr_pages()
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:45:59 +0000 (20:45 +0100)]
mm: reimplement folio_order() and folio_nr_pages()

Instead of calling compound_order() and compound_nr_pages(), use the folio
directly.  Saves 1905 bytes from mm/filemap.o due to folio_test_large()
now being a cheaper check than PageHead().
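
A sketch of the reimplementation described above, assuming the
underscore-prefixed _folio_order field added earlier in the series (the
64-bit variant may use a dedicated nr_pages field instead):

  static inline unsigned int folio_order(struct folio *folio)
  {
          if (!folio_test_large(folio))
                  return 0;
          return folio->_folio_order;
  }

  static inline long folio_nr_pages(struct folio *folio)
  {
          if (!folio_test_large(folio))
                  return 1;
          return 1L << folio->_folio_order;
  }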

Link: https://lkml.kernel.org/r/20220902194653.1739778-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: add the first tail page to struct folio
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:45:58 +0000 (20:45 +0100)]
mm: add the first tail page to struct folio

Some of the static checkers get confused by extracting the page from the
folio and referring to fields in the first tail page.  Adding these fields
to struct folio lets us avoid doing that.  It has the risk that people
will refer to those fields without checking that the folio is actually a
large folio, so prefix them with underscores and document the preferred
function to use instead.

Link: https://lkml.kernel.org/r/20220902194653.1739778-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/vmscan: fix a lot of comments
Matthew Wilcox (Oracle) [Fri, 2 Sep 2022 19:45:57 +0000 (20:45 +0100)]
mm/vmscan: fix a lot of comments

Patch series "MM folio changes for 6.1", v2.

My focus this round has been on shmem.  I believe it is now fully
converted to folios.  Of course, shmem interacts with a lot of the swap
cache and other parts of the kernel, so there are patches all over the MM.

This patch series survives a round of xfstests on tmpfs, which is nice,
but hardly an exhaustive test.  Hugh was nice enough to run a round of
tests on it and found a bug which is fixed in this edition.

This patch (of 57):

A lot of comments mention pages when they should say folios.
Fix them up.

[akpm@linux-foundation.org: fixups for mglru additions]
Link: https://lkml.kernel.org/r/20220902194653.1739778-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20220902194653.1739778-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: convert to use common struct mm_slot
Qi Zheng [Wed, 31 Aug 2022 03:19:51 +0000 (11:19 +0800)]
ksm: convert to use common struct mm_slot

Convert to use common struct mm_slot, no functional change.

Link: https://lkml.kernel.org/r/20220831031951.43152-8-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: convert ksm_mm_slot.link to ksm_mm_slot.hash
Qi Zheng [Wed, 31 Aug 2022 03:19:50 +0000 (11:19 +0800)]
ksm: convert ksm_mm_slot.link to ksm_mm_slot.hash

In order to use common struct mm_slot, convert ksm_mm_slot.link to
ksm_mm_slot.hash in advance, no functional change.

Link: https://lkml.kernel.org/r/20220831031951.43152-7-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: convert ksm_mm_slot.mm_list to ksm_mm_slot.mm_node
Qi Zheng [Wed, 31 Aug 2022 03:19:49 +0000 (11:19 +0800)]
ksm: convert ksm_mm_slot.mm_list to ksm_mm_slot.mm_node

In order to use common struct mm_slot, convert ksm_mm_slot.mm_list to
ksm_mm_slot.mm_node in advance, no functional change.

Link: https://lkml.kernel.org/r/20220831031951.43152-6-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: add the ksm prefix to the names of the ksm private structures
Qi Zheng [Wed, 31 Aug 2022 03:19:48 +0000 (11:19 +0800)]
ksm: add the ksm prefix to the names of the ksm private structures

In order to prevent the names of KSM's private structures from clashing
with the name of the common structure used in subsequent patches, prefix
their names with ksm in advance.

Link: https://lkml.kernel.org/r/20220831031951.43152-5-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: remove redundant declarations in ksm.h
Qi Zheng [Wed, 31 Aug 2022 03:19:47 +0000 (11:19 +0800)]
ksm: remove redundant declarations in ksm.h

Currently, struct stable_node is not used, either in the
include/linux/ksm.h file or in the file that includes it.  struct
mem_cgroup is also not used in ksm.h.  So these declarations are all
redundant; just remove them.

Link: https://lkml.kernel.org/r/20220831031951.43152-4-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: thp: convert to use common struct mm_slot
Qi Zheng [Wed, 31 Aug 2022 03:19:46 +0000 (11:19 +0800)]
mm: thp: convert to use common struct mm_slot

Rename private struct mm_slot to struct khugepaged_mm_slot and convert to
use common struct mm_slot with no functional change.

[zhengqi.arch@bytedance.com: fix build error with CONFIG_SHMEM disabled]
Link: https://lkml.kernel.org/r/639fa8d5-8e5b-2333-69dc-40ed46219364@bytedance.com
Link: https://lkml.kernel.org/r/20220831031951.43152-3-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: introduce common struct mm_slot
Qi Zheng [Wed, 31 Aug 2022 03:19:45 +0000 (11:19 +0800)]
mm: introduce common struct mm_slot

Patch series "add common struct mm_slot and use it in THP and KSM", v2.

At present, both the THP and KSM modules have a similar mm_slot structure
for organizing and recording the information required for scanning mm,
and each defines exactly the same operation functions:

 - alloc_mm_slot
 - free_mm_slot
 - get_mm_slot
 - insert_to_mm_slots_hash

In order to de-duplicate this code, this patchset introduces a common
struct mm_slot and lets THP and KSM use it.

This patch (of 7):

At present, both the THP and KSM modules have a similar mm_slot structure
for organizing and recording the information required for scanning mm,
and each defines exactly the same operation functions:

 - alloc_mm_slot
 - free_mm_slot
 - get_mm_slot
 - insert_to_mm_slots_hash

In order to de-duplicate this code, this patch introduces a common
struct mm_slot, and subsequent patches will let THP and KSM use it.
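
A rough sketch of the kind of common structure this describes (the real
header may differ in details):

  struct mm_slot {
          struct hlist_node hash;         /* link in the mm_slots hash table */
          struct list_head mm_node;       /* link in the scan list */
          struct mm_struct *mm;           /* the mm this slot tracks */
  };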

Link: https://lkml.kernel.org/r/20220831031951.43152-1-zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/20220831031951.43152-2-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: add profit monitoring documentation
xu xin [Tue, 30 Aug 2022 14:40:03 +0000 (14:40 +0000)]
ksm: add profit monitoring documentation

Add a description of KSM profit and how to determine it, both system-wide
and within a single process.

Link: https://lkml.kernel.org/r/20220830144003.299870-1-xu.xin16@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksm: count allocated ksm rmap_items for each process
xu xin [Tue, 30 Aug 2022 14:38:38 +0000 (14:38 +0000)]
ksm: count allocated ksm rmap_items for each process

Patch series "ksm: count allocated rmap_items and update documentation",
v5.

KSM can save memory by merging identical pages, but also can consume
additional memory, because it needs to generate rmap_items to save each
scanned page's brief rmap information.

To help users determine how much benefit the KSM policy they are using
(like madvise) brings, we add a new interface, /proc/<pid>/ksm_stat, for
each process.  The value "ksm_rmap_items" in it indicates the total number
of allocated KSM rmap_items for this process.

The detailed description can be seen in the following patches' commit
message.

This patch (of 2):

KSM can save memory by merging identical pages, but also can consume
additional memory, because it needs to generate rmap_items to save each
scanned page's brief rmap information.  Some of these pages may be merged,
but some may never be merged even after being checked several times; the
memory consumed for those is unprofitable.

The information about whether KSM save memory or consume memory in
system-wide range can be determined by the comprehensive calculation of
pages_sharing, pages_shared, pages_unshared and pages_volatile.  A simple
approximate calculation:

profit =~ pages_sharing * sizeof(page) - (all_rmap_items) *
         sizeof(rmap_item);

where all_rmap_items equals to the sum of pages_sharing, pages_shared,
pages_unshared and pages_volatile.

But we cannot calculate this kind of KSM profit for a single process,
because the number of KSM rmap_items allocated for a process is not
available.  If this information could be obtained, it would help users
know how much benefit the KSM policy (like madvise) they are using brings,
and then optimize their application code.  For example, if an application
madvises 1000 pages as MERGEABLE but only a few pages are really merged,
then it's not cost-efficient.

So we add a new interface, /proc/<pid>/ksm_stat, for each process.  For
now it only shows the value of ksm_rmap_items, and more values can be
added in the future.

So similarly, we can calculate the ksm profit approximately for a single
process by:

profit =~ ksm_merging_pages * sizeof(page) - ksm_rmap_items *
 sizeof(rmap_item);

where ksm_merging_pages is shown at /proc/<pid>/ksm_merging_pages, and
ksm_rmap_items is shown in /proc/<pid>/ksm_stat.

Link: https://lkml.kernel.org/r/20220830143731.299702-1-xu.xin16@zte.com.cn
Link: https://lkml.kernel.org/r/20220830143838.299758-1-xu.xin16@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: deduplicate cacheline padding code
Shakeel Butt [Fri, 26 Aug 2022 23:06:42 +0000 (23:06 +0000)]
mm: deduplicate cacheline padding code

There are three users (mmzone.h, memcontrol.h, page_counter.h) using
similar code for forcing cacheline padding between fields of different
structures.  Dedup that code.
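
A sketch of the kind of shared helper such a dedup can introduce (the exact
name and placement are illustrative):

  /* Padding-only member used to push neighbouring fields onto
   * separate cachelines / internode boundaries. */
  struct cacheline_padding {
          char x[0];
  } ____cacheline_internodealigned_in_smp;

  #define CACHELINE_PADDING(name)         struct cacheline_padding name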

Link: https://lkml.kernel.org/r/20220826230642.566725-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Suggested-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: reduce noise in show_mem for lowmem allocations
Michal Hocko [Tue, 23 Aug 2022 09:22:30 +0000 (11:22 +0200)]
mm: reduce noise in show_mem for lowmem allocations

While discussing early DMA pool pre-allocation failure with Christoph [1]
I have realized that the allocation failure warning is rather noisy for
constrained allocations like GFP_DMA{32}.  Those zones are usually not
populated on all nodes, as their memory ranges are constrained.

This is an attempt to reduce the ballast that doesn't provide any relevant
information for investigating those allocation failures.  Please note that
I have only compile tested it (in my default config setup) and I am
throwing it out mostly to see what people think about it.

[1] http://lkml.kernel.org/r/20220817060647.1032426-1-hch@lst.de

[mhocko@suse.com: update]
Link: https://lkml.kernel.org/r/Yw29bmJTIkKogTiW@dhcp22.suse.cz
[mhocko@suse.com: fix build]
[akpm@linux-foundation.org: fix it for mapletree]
[akpm@linux-foundation.org: update it for Michal's update]
[mhocko@suse.com: fix arch/powerpc/xmon/xmon.c]
Link: https://lkml.kernel.org/r/Ywh3C4dKB9B93jIy@dhcp22.suse.cz
[akpm@linux-foundation.org: fix arch/sparc/kernel/setup_32.c]
Link: https://lkml.kernel.org/r/YwScVmVofIZkopkF@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: fixup documentation regarding pte_numa() and PROT_NUMA
David Hildenbrand [Thu, 25 Aug 2022 16:46:59 +0000 (18:46 +0200)]
mm: fixup documentation regarding pte_numa() and PROT_NUMA

pte_numa() no longer exists -- replaced by pte_protnone() -- and PROT_NUMA
probably never existed: MM_CP_PROT_NUMA also ends up using PROT_NONE.

Let's fixup the doc.

Link: https://lkml.kernel.org/r/20220825164659.89824-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/gup: use gup_can_follow_protnone() also in GUP-fast
David Hildenbrand [Thu, 25 Aug 2022 16:46:58 +0000 (18:46 +0200)]
mm/gup: use gup_can_follow_protnone() also in GUP-fast

There seems to be no reason why FOLL_FORCE during GUP-fast would have to
fall back to the slow path when stumbling over a PROT_NONE mapped page.  We
only have to trigger hinting faults in case FOLL_FORCE is not set, and any
kind of fault handling naturally happens from the slow path -- where NUMA
hinting accounting/handling would be performed.

Note that the comment regarding THP migration is outdated: commit
2b4847e73004 ("mm: numa: serialise parallel get_user_page against THP
migration") described that this was required for THP due to lack of PMD
migration entries.  Nowadays, we do have proper PMD migration entries in
place -- see set_pmd_migration_entry(), which does a proper
pmdp_invalidate() when placing the migration entry.

So let's just reuse gup_can_follow_protnone() here to make it consistent
and drop the somewhat outdated comments.

Link: https://lkml.kernel.org/r/20220825164659.89824-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()
David Hildenbrand [Thu, 25 Aug 2022 16:46:57 +0000 (18:46 +0200)]
mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()

Patch series "mm: minor cleanups around NUMA hinting".

Working on some GUP cleanups (e.g., getting rid of some FOLL_ flags) and
preparing for other GUP changes (getting rid of FOLL_FORCE|FOLL_WRITE for
taking an R/O longterm pin), this is something I can easily send out
independently.

Get rid of FOLL_NUMA, allow FOLL_FORCE access to PROT_NONE mapped pages in
GUP-fast, and fixup some documentation around NUMA hinting.

This patch (of 3):

No need for a special flag that is not even properly documented to be
internal-only.

Let's just factor this check out and get rid of this flag.  The separate
function has the nice benefit that we can centralize comments.
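
A sketch of the factored-out check described above, with the centralized
comment it enables:

  static inline bool gup_can_follow_protnone(unsigned int flags)
  {
          /*
           * FOLL_FORCE has to be able to make progress even if the VMA is
           * inaccessible.  Further, FOLL_FORCE access usually does not
           * represent application behaviour and we should avoid triggering
           * NUMA hinting faults.
           */
          return flags & FOLL_FORCE;
  }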

Link: https://lkml.kernel.org/r/20220825164659.89824-2-david@redhat.com
Link: https://lkml.kernel.org/r/20220825164659.89824-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: fix the handling of Non-LRU pages returned by follow_page
Haiyue Wang [Tue, 23 Aug 2022 13:58:41 +0000 (21:58 +0800)]
mm: fix the handling of Non-LRU pages returned by follow_page

The handling of Non-LRU pages returned by follow_page() jumps directly to
the next iteration without calling put_page() to drop the reference count,
even though the 'FOLL_GET' flag passed to follow_page() means get_page()
was called.  Fix the zone device page check by handling the page reference
count correctly before returning.

And as David reviewed, "device pages are never PageKsm pages".  Drop this
zone device page check for break_ksm().

Since a zone device page can't be a transparent huge page, drop the
redundant zone device page check from split_huge_pages_pid().  (by Miaohe)
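
A hedged sketch of the corrected pattern (labels and surrounding code are
illustrative rather than the literal diff):

        page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
        if (IS_ERR_OR_NULL(page))
                goto out;
        if (is_zone_device_page(page))
                goto out_putpage;  /* drop the FOLL_GET reference, don't just bail */
        /* ... normal processing ... */
        out_putpage:
        put_page(page);
        out:
        return err;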

Link: https://lkml.kernel.org/r/20220823135841.934465-3-haiyue.wang@intel.com
Fixes: 3218f8712d6b ("mm: handling Non-LRU pages returned by vm_normal_pages")
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: add merging after mremap resize
Jakub Matěna [Fri, 3 Jun 2022 14:57:19 +0000 (16:57 +0200)]
mm: add merging after mremap resize

When an mremap call results in expansion, it might be possible to merge
the VMA with the next VMA, which might become adjacent.  This patch adds a
vma_merge() call after the expansion is done to try the merge.
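
A hedged sketch of the added call (extension_start, extension_end and
extension_pgoff are illustrative local names for the newly added range; the
argument list follows the vma_merge() prototype of this kernel version):

        vma = vma_merge(mm, vma, extension_start, extension_end,
                        vma->vm_flags, vma->anon_vma, vma->vm_file,
                        extension_pgoff, vma_policy(vma),
                        vma->vm_userfaultfd_ctx, anon_vma_name(vma));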

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20220603145719.1012094-3-matenajakub@gmail.com
Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: refactor of vma_merge()
Jakub Matěna [Fri, 3 Jun 2022 14:57:18 +0000 (16:57 +0200)]
mm: refactor of vma_merge()

Patch series "Refactor of vma_merge and new merge call", v4.

I am currently working on my master's thesis, trying to increase the
number of merges of VMAs that currently fail because of page offset
incompatibility and differences in their anon_vmas.  The refactor and the
added merge call included in this series are just two smaller upgrades I
created along the way.

This patch (of 2):

Refactor vma_merge() to make it shorter and more understandable.  The
main change is the elimination of code duplication in the merge-next
check.  This is done by first doing the checks and caching the results
before executing the merge itself.  The variable 'area' is split into
'mid' and 'res', as previously it was used for two purposes: as the
middle VMA between prev and next and also as the result of the merge
itself.  Exit paths are also unified.
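
The resulting control flow, as a rough sketch (the helper names below are
simplified and hypothetical; the real code uses can_vma_merge_after() and
can_vma_merge_before() with more arguments):

        bool merge_prev = prev && can_merge_after(prev, vm_flags);   /* hypothetical */
        bool merge_next = next && can_merge_before(next, vm_flags);  /* hypothetical */

        if (merge_prev && merge_next) {
                /* prev, mid and next collapse into a single VMA */
        } else if (merge_prev) {
                /* expand prev forward over mid */
        } else if (merge_next) {
                /* expand next backward over mid */
        }
        /* 'res' holds the resulting VMA; all exits share a single path */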

Link: https://lkml.kernel.org/r/20220603145719.1012094-1-matenajakub@gmail.com
Link: https://lkml.kernel.org/r/20220603145719.1012094-2-matenajakub@gmail.com
Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Rik van Riel <riel@surriel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: delete unused MMF_OOM_VICTIM flag
Suren Baghdasaryan [Tue, 31 May 2022 22:31:00 +0000 (15:31 -0700)]
mm: delete unused MMF_OOM_VICTIM flag

With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now
unused and can be removed.

[akpm@linux-foundation.org: remove comment about now-removed mm_is_oom_victim()]
Link: https://lkml.kernel.org/r/20220531223100.510392-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: drop oom code from exit_mmap
Suren Baghdasaryan [Tue, 31 May 2022 22:30:59 +0000 (15:30 -0700)]
mm: drop oom code from exit_mmap

The primary reason to invoke the oom reaper from the exit_mmap path used
to be to prevent excessive oom killing when the oom victim's exit races
with the oom reaper (see [1] for more details).  The invocation has moved
around since then because of the interaction with the munlock logic, but
the underlying reason has remained the same (see [2]).

The munlock code is no longer a problem since [3], and there shouldn't be
any blocking operation before the memory is unmapped by exit_mmap, so the
oom reaper invocation can be dropped.  The unmapping part can be done with
the non-exclusive mmap_sem, and the exclusive one is only required when
page tables are freed.

Remove the oom_reaper from exit_mmap, which will make the code easier to
read.  This is really unlikely to make any observable difference, although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it is almost never true.
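
A hedged sketch of the resulting locking shape in exit_mmap() (simplified;
the helpers' argument lists are omitted):

        mmap_read_lock(mm);
        /* unmap_vmas(): tear down the mappings, no oom reaper involved */
        mmap_read_unlock(mm);

        mmap_write_lock(mm);
        /* free_pgtables() and freeing of the VMA structures */
        mmap_write_unlock(mm);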

[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")

[akpm@linux-foundation.org: restore Suren's mmap_read_lock() optimization]
Link: https://lkml.kernel.org/r/20220531223100.510392-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mlock: drop dead code in count_mm_mlocked_page_nr()
Liam Howlett [Wed, 15 Jun 2022 17:40:58 +0000 (17:40 +0000)]
mm/mlock: drop dead code in count_mm_mlocked_page_nr()

The check for mm being null has never been needed since the only caller
has always passed in current->mm.  Remove the check from
count_mm_mlocked_page_nr().

Link: https://lkml.kernel.org/r/20220615174050.738523-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap.c: pass in mapping to __vma_link_file()
Liam R. Howlett [Tue, 6 Sep 2022 19:49:06 +0000 (19:49 +0000)]
mm/mmap.c: pass in mapping to __vma_link_file()

__vma_link_file() resolves the mapping from the file, if there is one.
Pass the mapping through and check vm_file externally, since most places
already have the required information and already check vm_file.
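
A minimal sketch of the resulting caller pattern (simplified from
vma_link(); error handling omitted):

        if (vma->vm_file) {
                mapping = vma->vm_file->f_mapping;
                i_mmap_lock_write(mapping);
        }
        /* ... insert the VMA into the tree ... */
        if (mapping) {
                __vma_link_file(vma, mapping);
                i_mmap_unlock_write(mapping);
        }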

Link: https://lkml.kernel.org/r/20220906194824.2110408-71-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/mmap: drop range_has_overlap() function
Liam R. Howlett [Tue, 6 Sep 2022 19:49:06 +0000 (19:49 +0000)]
mm/mmap: drop range_has_overlap() function

Since there is no longer a linked list, the range_has_overlap() function
is identical to the find_vma_intersection() function.
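
Callers can use find_vma_intersection() directly; a hedged example of the
pattern (illustrative, not a literal hunk from this patch):

        /* fail if any existing VMA overlaps [addr, addr + len) */
        if (find_vma_intersection(mm, addr, addr + len))
                return -EEXIST;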

Link: https://lkml.kernel.org/r/20220906194824.2110408-70-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm: remove the vma linked list
Liam R. Howlett [Tue, 6 Sep 2022 19:49:06 +0000 (19:49 +0000)]
mm: remove the vma linked list

Replace any vm_next use with vma_find().

Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the
maple tree.

Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap().  At
the same time, alter the loop to be more compact.

Now that free_pgtables() and unmap_vmas() take a maple tree as an
argument, rearrange do_mas_align_munmap() to use the new tree to hold the
vmas to remove.

Remove __vma_link_list() and __vma_unlink_list() as they are exclusively
used to update the linked list.

Drop linked list update from __insert_vm_struct().

Rework the validation of the tree, as it depended on the linked list.
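
The replacement iteration idiom, as a rough sketch (not a specific hunk
from this patch):

        VMA_ITERATOR(vmi, mm, 0);
        struct vm_area_struct *vma;

        for_each_vma(vmi, vma) {
                /* ... what used to be done with "vma = vma->vm_next" ... */
        }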

[yang.lee@linux.alibaba.com: fix one kernel-doc comment]
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1949
Link: https://lkml.kernel.org/r/20220824021918.94116-1-yang.lee@linux.alibaba.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/vmscan: use vma iterator instead of vm_next
Liam R. Howlett [Tue, 6 Sep 2022 19:49:05 +0000 (19:49 +0000)]
mm/vmscan: use vma iterator instead of vm_next

Use the vma iterator in get_next_vma() instead of the linked list.

[yuzhao@google.com: mm/vmscan: use the proper VMA iterator]
Link: https://lkml.kernel.org/r/Yx+QGOgHg1Wk8tGK@google.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-68-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoriscv: use vma iterator for vdso
Liam R. Howlett [Tue, 6 Sep 2022 19:49:05 +0000 (19:49 +0000)]
riscv: use vma iterator for vdso

Remove the linked list use in favour of the vma iterator.

Link: https://lkml.kernel.org/r/20220906194824.2110408-67-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agonommu: remove uses of VMA linked list
Matthew Wilcox (Oracle) [Tue, 6 Sep 2022 19:49:05 +0000 (19:49 +0000)]
nommu: remove uses of VMA linked list

Use the maple tree or VMA iterator instead.  This is faster and will allow
us to shrink the VMA.

Link: https://lkml.kernel.org/r/20220906194824.2110408-66-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agoi915: use the VMA iterator
Matthew Wilcox (Oracle) [Tue, 6 Sep 2022 19:49:04 +0000 (19:49 +0000)]
i915: use the VMA iterator

Replace the linked list in probe_range() with the VMA iterator.

Link: https://lkml.kernel.org/r/20220906194824.2110408-65-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/swapfile: use vma iterator instead of vma linked list
Liam R. Howlett [Tue, 6 Sep 2022 19:49:04 +0000 (19:49 +0000)]
mm/swapfile: use vma iterator instead of vma linked list

unuse_mm() no longer needs to reference the linked list.

Link: https://lkml.kernel.org/r/20220906194824.2110408-64-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/pagewalk: use vma_find() instead of vma linked list
Matthew Wilcox (Oracle) [Tue, 6 Sep 2022 19:49:04 +0000 (19:49 +0000)]
mm/pagewalk: use vma_find() instead of vma linked list

walk_page_range() no longer uses its one reference to the vma linked list.

Link: https://lkml.kernel.org/r/20220906194824.2110408-63-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/oom_kill: use vma iterators instead of vma linked list
Liam R. Howlett [Tue, 6 Sep 2022 19:49:03 +0000 (19:49 +0000)]
mm/oom_kill: use vma iterators instead of vma linked list

Use vma iterator in preparation of removing the linked list.

Link: https://lkml.kernel.org/r/20220906194824.2110408-62-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 years agomm/msync: use vma_find() instead of vma linked list
Liam R. Howlett [Tue, 6 Sep 2022 19:49:03 +0000 (19:49 +0000)]
mm/msync: use vma_find() instead of vma linked list

Remove a single use of the vma linked list in preparation for the
removal of the linked list.  Uses find_vma() to get the next element.

Link: https://lkml.kernel.org/r/20220906194824.2110408-61-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>