David Hildenbrand [Fri, 13 Jan 2023 17:10:24 +0000 (18:10 +0100)]
x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE just like we already do on
x86-64. After deciphering the PTE layout it becomes clear that there are
still unused bits for 2-level and 3-level page tables that we should be
able to use. Reusing a bit avoids stealing one bit from the swap offset.
While at it, mask the type in __swp_entry(); use some helper definitions
to make the macros easier to grasp.
Link: https://lkml.kernel.org/r/20230113171026.582290-25-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:23 +0000 (18:10 +0100)]
um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 10, which is yet
unused for swap PTEs.
The pte_mkuptodate() is a bit weird in __pte_to_swp_entry() for a swap PTE
... but it only messes with bit 1 and 2 and there is a comment in
set_pte(), so leave these bits alone.
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-24-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:22 +0000 (18:10 +0100)]
sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit was effectively unused.
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-23-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:21 +0000 (18:10 +0100)]
sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by reusing the SRMMU_DIRTY bit
as that seems to be safe to reuse inside a swap PTE. This avoids having
to steal one bit from the swap offset.
While at it, relocate the swap PTE layout documentation and use the same
style now used for most other archs. Note that the old documentation was
wrong: we use 20 bit for the offset and the reserved bits were 8 instead
of 7 bits in the ascii art.
Link: https://lkml.kernel.org/r/20230113171026.582290-22-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:20 +0000 (18:10 +0100)]
sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 6 in the PTE,
reducing the swap type in the !CONFIG_X2TLB case to 5 bits. Generic MM
currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the
stolen bit is effectively unused.
Interrestingly, the swap type in the !CONFIG_X2TLB case could currently
overlap with the _PAGE_PRESENT bit, because there is a sneaky shift by 1
in __pte_to_swp_entry() and __swp_entry_to_pte(). Bit 0-7 in the
architecture specific swap PTE would get shifted to bit 1-8 in the PTE.
As generic MM uses 5 bits only, this didn't matter so far.
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-21-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:19 +0000 (18:10 +0100)]
riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset. This reduces the maximum swap space per file: on 32bit to 16 GiB
(was 32 GiB).
Note that this bit does not conflict with swap PMDs and could also be used
in swap PMD context later.
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:18 +0000 (18:10 +0100)]
powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit and 64bit.
On 64bit, let's use MSB 56 (LSB 7), located right next to the page type.
On 32bit, let's use LSB 2 to avoid stealing one bit from the swap offset.
There seems to be no real reason why these bits cannot be used for swap
PTEs. The important part is that _PAGE_PRESENT and _PAGE_HASHPTE remain
0.
While at it, mask the type in __swp_entry() and remove _PAGE_BIT_SWAP_TYPE
from pte-e500.h: while it was used in 64bit code it was ignored in 32bit
code.
Link: https://lkml.kernel.org/r/20230113171026.582290-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:17 +0000 (18:10 +0100)]
powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
We already implemented support for 64bit book3s in commit
bff9beaa2e80
("powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s")
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also in 32bit by reusing yet
unused LSB 2 / MSB 29. There seems to be no real reason why that bit
cannot be used, and reusing it avoids having to steal one bit from the
swap offset.
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-18-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:16 +0000 (18:10 +0100)]
parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused
_PAGE_ACCESSED location in the swap PTE. Looking at pte_present() and
pte_none() checks, there seems to be no actual reason why we cannot use
it: we only have to make sure we're not using _PAGE_PRESENT.
Reusing this bit avoids having to steal one bit from the swap offset.
Link: https://lkml.kernel.org/r/20230113171026.582290-17-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:15 +0000 (18:10 +0100)]
openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-16-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:14 +0000 (18:10 +0100)]
nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused bit
31.
Link: https://lkml.kernel.org/r/20230113171026.582290-15-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:13 +0000 (18:10 +0100)]
nios2/mm: refactor swap PTE layout
nios2 disables swap for a good reason: it doesn't even provide sufficient
type bits as required by core MM. However, swap entries are nowadays also
used for other purposes (migration entries, PTE markers, HWPoison, ...),
and accidential use could be problematic.
Let's properly use 5 bits for the swap type and document the layout. Bits
26--31 should get ignored by hardware completely, so they can be used.
Link: https://lkml.kernel.org/r/20230113171026.582290-14-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:12 +0000 (18:10 +0100)]
mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE.
On 64bit, steal one bit from the type. Generic MM currently only uses 5
bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively
unused.
On 32bit we're able to locate unused bits. As the PTE layout for 32 bit
is very confusing, document it a bit better.
While at it, mask the type in __swp_entry()/mk_swap_pte().
Link: https://lkml.kernel.org/r/20230113171026.582290-13-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:11 +0000 (18:10 +0100)]
microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
The shift by 2 when converting between PTE and arch-specific swap entry
makes the swap PTE layout a little bit harder to decipher.
While at it, drop the comment from paulus---copy-and-paste leftover from
powerpc where we actually have _PAGE_HASHPTE---and mask the type in
__swp_entry_to_pte() as well.
Link: https://lkml.kernel.org/r/20230113171026.582290-12-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:10 +0000 (18:10 +0100)]
m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
While at it, make sure for sun3 that the valid bit never gets set by
properly masking it off and mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-11-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:09 +0000 (18:10 +0100)]
m68k/mm: remove dummy __swp definitions for nommu
The definitions are not required, let's remove them.
Link: https://lkml.kernel.org/r/20230113171026.582290-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:08 +0000 (18:10 +0100)]
loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
While at it, also mask the type in mk_swap_pte().
Note that this bit does not conflict with swap PMDs and could also be used
in swap PMD context later.
Link: https://lkml.kernel.org/r/20230113171026.582290-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:07 +0000 (18:10 +0100)]
ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
While at it, also mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:06 +0000 (18:10 +0100)]
hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset. This reduces the maximum swap space per file to 16 GiB (was 32
GiB).
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Brian Cain <bcain@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:05 +0000 (18:10 +0100)]
csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset. This reduces the maximum swap space per file to 16 GiB (was 32
GiB).
We might actually be able to reuse one of the other software bits
(_PAGE_READ / PAGE_WRITE) instead, because we only have to keep
pte_present(), pte_none() and HW happy. For now, let's keep it simple
because there might be something non-obvious.
Link: https://lkml.kernel.org/r/20230113171026.582290-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:04 +0000 (18:10 +0100)]
arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset. This reduces the maximum swap space per file to 64 GiB (was 128
GiB).
While at it drop the PTE_TYPE_FAULT from __swp_entry_to_pte() which is
defined to be 0 and is rather confusing because we should be dealing with
"Linux PTEs" not "hardware PTEs". Also, properly mask the type in
__swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:03 +0000 (18:10 +0100)]
arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 5, which is yet
unused. The only important parts seems to be to not use _PAGE_PRESENT
(bit 9).
Link: https://lkml.kernel.org/r/20230113171026.582290-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:02 +0000 (18:10 +0100)]
alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
type. Generic MM currently only uses 5 bits for the type
(MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused.
While at it, mask the type in mk_swap_pte() as well.
Link: https://lkml.kernel.org/r/20230113171026.582290-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Fri, 13 Jan 2023 17:10:01 +0000 (18:10 +0100)]
mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
architectures with swap PTEs".
This is the follow-up on [1]:
[PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
anonymous pages
After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent
enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
remaining architectures that support swap PTEs.
This makes sure that exclusive anonymous pages will stay exclusive, even
after they were swapped out -- for example, making GUP R/W FOLL_GET of
anonymous pages reliable. Details can be found in [1].
This primarily fixes remaining known O_DIRECT memory corruptions that can
happen on concurrent swapout, whereby we can lose DMA reads to a page
(modifying the user page by writing to it).
To verify, there are two test cases (requiring swap space, obviously):
(1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
triggering a race condition.
(2) My vmsplice() test case [3] that tries to detect if the exclusive
marker was lost during swapout, not relying on a race condition.
For example, on 32bit x86 (with and without PAE), my test case fails
without these patches:
$ ./test_swp_exclusive
FAIL: page was replaced during COW
But succeeds with these patches:
$ ./test_swp_exclusive
PASS: page was not replaced during COW
Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
the ones where swap support might be in a questionable state? This is the
first step towards removing "readable_exclusive" migration entries, and
instead using pte_swp_exclusive() also with (readable) migration entries
instead (as suggested by Peter). The only missing piece for that is
supporting pmd_swp_exclusive() on relevant architectures with THP
migration support.
As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,,
we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.
I tried cross-compiling all relevant setups and tested on x86 and sparc64
so far.
CCing arch maintainers only on this cover letter and on the respective
patch(es).
[1] https://lkml.kernel.org/r/
20220329164329.208407-1-david@redhat.com
[2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
[3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c
This patch (of 26):
We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures.
Let's extend our sanity checks, especially testing that our PTE bit does
not affect:
* is_swap_pte() -> pte_present() and pte_none()
* the swap entry + type
* pte_swp_soft_dirty()
Especially, the pfn_pte() is dodgy when the swap PTE layout differs
heavily from ordinary PTEs. Let's properly construct a swap PTE from swap
type+offset.
[david@redhat.com: fix build]
Link: https://lkml.kernel.org/r/6aaad548-cf48-77fa-9d6c-db83d724b2eb@redhat.com
Link: https://lkml.kernel.org/r/20230113171026.582290-1-david@redhat.com
Link: https://lkml.kernel.org/r/20230113171026.582290-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vishal Moola (Oracle) [Sat, 14 Jan 2023 00:15:56 +0000 (16:15 -0800)]
mm/khugepaged: convert release_pte_pages() to use folios
Converts release_pte_pages() to use folios instead of pages.
Link: https://lkml.kernel.org/r/20230114001556.43795-2-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vishal Moola (Oracle) [Sat, 14 Jan 2023 00:15:55 +0000 (16:15 -0800)]
mm/khugepaged: introduce release_pte_folio() to replace release_pte_page()
release_pte_page() is converted to be a wrapper for release_pte_folio() to
help facilitate the khugepaged conversion to folios.
This replaces 3 calls to compound_head() with 1, and saves 85 bytes of
kernel text.
Link: https://lkml.kernel.org/r/20230114001556.43795-1-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Potapenko [Thu, 12 Jan 2023 10:31:47 +0000 (11:31 +0100)]
kmsan: silence -Wmissing-prototypes warnings
When building the kernel with W=1, the compiler reports numerous warnings
about the missing prototypes for KMSAN instrumentation hooks.
Because these functions are not supposed to be called explicitly by the
kernel code (calls to them are emitted by the compiler), they do not have
to be declared in the headers. Instead, we add forward declarations right
before the definitions to silence the warnings produced by
-Wmissing-prototypes.
Link: https://lkml.kernel.org/r/20230112103147.382416-1-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Marco Elver <elver@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202301020356.dFruA4I5-lkp@intel.com/T/
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:32 +0000 (12:39 +0000)]
Documentation/mm: update references to __m[un]lock_page() to *_folio()
We now pass folios to these functions, so update the documentation
accordingly.
Additionally, correct the outdated reference to __pagevec_lru_add_fn(),
the referenced action occurs in __munlock_folio() directly now, replace
reference to lru_cache_add_inactive_or_unevictable() with the modern folio
equivalent folio_add_lru_vma() and reference folio flags by the flag name
rather than accessor.
Link: https://lkml.kernel.org/r/898c487169d98a7f09c1c1e57a7dfdc2b3f6bf0f.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:31 +0000 (12:39 +0000)]
mm: mlock: update the interface to use folios
Update the mlock interface to accept folios rather than pages, bringing
the interface in line with the internal implementation.
munlock_vma_page() still requires a page_folio() conversion, however this
is consistent with the existent mlock_vma_page() implementation and a
product of rmap still dealing in pages rather than folios.
Link: https://lkml.kernel.org/r/cba12777c5544305014bc0cbec56bb4cc71477d8.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:30 +0000 (12:39 +0000)]
m68k/mm/motorola: specify pmd_page() type
Failing to specify a specific type here breaks anything that relies on the
type being explicitly known, such as page_folio().
Make explicit the type of null pointer returned here.
Link: https://lkml.kernel.org/r/ad6be2821bbd6af10966b3704568ff458b270d9c.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:29 +0000 (12:39 +0000)]
mm: mlock: use folios and a folio batch internally
This brings mlock in line with the folio batches declared in mm/swap.c and
makes the code more consistent across the two.
The existing mechanism for identifying which operation each folio in the
batch is undergoing is maintained, i.e. using the lower 2 bits of the
struct folio address (previously struct page address). This should
continue to function correctly as folios remain at least system
word-aligned.
All invocations of mlock() pass either a non-compound page or the head of
a THP-compound page and no tail pages need updating so this functionality
works with struct folios being used internally rather than struct pages.
In this patch the external interface is kept identical to before in order
to maintain separation between patches in the series, using a rather
awkward conversion from struct page to struct folio in relevant functions.
However, this maintenance of the existing interface is intended to be
temporary - the next patch in the series will update the interfaces to
accept folios directly.
Link: https://lkml.kernel.org/r/9f894d54d568773f4ed3cb0eef5f8932f62c95f4.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Thu, 12 Jan 2023 12:39:28 +0000 (12:39 +0000)]
mm: pagevec: add folio_batch_reinit()
Patch series "update mlock to use folios", v4.
This series updates mlock to use folios, converting the internal interface
to using folios exclusively and exposing the folio interface externally.
As a product of this we move to using a folio batch rather than a pagevec
for mlock folios, which brings it in line with the core folio batches
contained in mm/swap.c.
This patch (of 5):
This performs the same task as pagevec_reinit(), only modifying a folio
batch rather than a pagevec.
Link: https://lkml.kernel.org/r/cover.1673526881.git.lstoakes@gmail.com
Link: https://lkml.kernel.org/r/9018cecacb39e34c883540f997f9be8281153613.1673526881.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Thu, 12 Jan 2023 12:40:28 +0000 (20:40 +0800)]
mm: madvise: use vm_normal_folio() in madvise_free_pte_range()
There is already a vm_normal_folio(), use it to make
madvise_free_pte_range() only use a folio.
Link: https://lkml.kernel.org/r/20230112124028.16964-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Thu, 12 Jan 2023 13:10:31 +0000 (13:10 +0000)]
shmem: convert shmem_write_end() to use a folio
Use a folio internally to shmem_write_end() which saves a number of calls
to compound_head() and lets us get rid of the custom code to zero out the
rest of a THP and supports folios of arbitrary size.
Link: https://lkml.kernel.org/r/20230112131031.1209553-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:08 +0000 (14:46 -0600)]
mm/memory-failure: convert unpoison_memory() to folios
Use a folio inside unpoison_memory which replaces a compound_head() call
with a call to page_folio().
Link: https://lkml.kernel.org/r/20230112204608.80136-9-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:07 +0000 (14:46 -0600)]
mm/memory-failure: convert hugetlb_set_page_hwpoison() to folios
Change hugetlb_set_page_hwpoison() to folio_set_hugetlb_hwpoison() and use
a folio internally.
Link: https://lkml.kernel.org/r/20230112204608.80136-8-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:06 +0000 (14:46 -0600)]
mm/memory-failure: convert __free_raw_hwp_pages() to folios
Change __free_raw_hwp_pages() to __folio_free_raw_hwp() and modify its
callers to pass in a folio.
Link: https://lkml.kernel.org/r/20230112204608.80136-7-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:05 +0000 (14:46 -0600)]
mm/memory-failure: convert raw_hwp_list_head() to folios
Change raw_hwp_list_head() to take in a folio and modify its callers to
pass in a folio. Also converts two users of hugetlb specific page macro
users to their folio equivalents.
Link: https://lkml.kernel.org/r/20230112204608.80136-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:04 +0000 (14:46 -0600)]
mm/memory-failure: convert free_raw_hwp_pages() to folios
Change free_raw_hwp_pages() to folio_free_raw_hwp(), converts two users of
hugetlb specific page macro users to their folio equivalents.
Link: https://lkml.kernel.org/r/20230112204608.80136-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:03 +0000 (14:46 -0600)]
mm/memory-failure: convert hugetlb_clear_page_hwpoison to folios
Change hugetlb_clear_page_hwpoison() to folio_clear_hugetlb_hwpoison() by
changing the function to take in a folio. This converts one use of
ClearPageHWPoison and HPageRawHwpUnreliable to their folio equivalents.
Link: https://lkml.kernel.org/r/20230112204608.80136-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:02 +0000 (14:46 -0600)]
mm/memory-failure: convert try_memory_failure_hugetlb() to folios
Use a struct folio rather than a head page in try_memory_failure_hugetlb.
This converts one user of SetHPageMigratable to the folio equivalent.
Link: https://lkml.kernel.org/r/20230112204608.80136-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Thu, 12 Jan 2023 20:46:01 +0000 (14:46 -0600)]
mm/memory-failure: convert __get_huge_page_for_hwpoison() to folios
Patch series "convert hugepage memory failure functions to folios".
This series contains a 1:1 straightforward page to folio conversion for
memory failure functions which deal with huge pages. I renamed a few
functions to fit with how other folio operating functions are named.
These include:
hugetlb_clear_page_hwpoison -> folio_clear_hugetlb_hwpoison
free_raw_hwp_pages -> folio_free_raw_hwp
__free_raw_hwp_pages -> __folio_free_raw_hwp
hugetlb_set_page_hwpoison -> folio_set_hugetlb_hwpoison
The goal of this series was to reduce users of the hugetlb specific page
flag macros which take in a page so users are protected by the compiler to
make sure they are operating on a head page.
This patch (of 8):
Use a folio throughout the function rather than using a head page. This
also reduces the users of the page version of hugetlb specific page flags.
Link: https://lkml.kernel.org/r/20230112204608.80136-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Potapenko [Wed, 11 Jan 2023 10:18:06 +0000 (11:18 +0100)]
Revert "x86: kmsan: sync metadata pages on page fault"
This reverts commit
3f1e2c7a9099c1ed32c67f12cdf432ba782cf51f.
As noticed by Qun-Wei Lin, arch_sync_kernel_mappings() in
arch/x86/mm/fault.c is only used with CONFIG_X86_32, whereas KMSAN is only
supported on x86_64, where this code is not compiled.
The patch in question dates back to downstream KMSAN branch based on
v5.8-rc5, it sneaked into upstream unnoticed in v6.1.
Link: https://lkml.kernel.org/r/20230111101806.3236991-1-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Link: https://github.com/google/kmsan/issues/91
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vernon Yang [Wed, 11 Jan 2023 13:20:36 +0000 (21:20 +0800)]
mm/mmap: fix comment of unmapped_area{_topdown}
The low_limit of unmapped area information is inclusive, and the
hight_limit is not, so make symbol to be [ instead of (.
And replace hight_limit to high_limit.
Link: https://lkml.kernel.org/r/20230111132036.801404-1-vernon2gm@gmail.com
Fixes:
3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}")
Signed-off-by: Vernon Yang <vernon2gm@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vernon Yang [Wed, 11 Jan 2023 13:53:48 +0000 (21:53 +0800)]
maple_tree: fix comment of mte_destroy_walk
The parameter name of maple tree is mt, make the comment be mt instead of
mn, and the separator between the parameter name and the description to be
: instead of -.
Link: https://lkml.kernel.org/r/20230111135348.803181-1-vernon2gm@gmail.com
Fixes:
54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Vernon Yang <vernon2gm@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sidhartha Kumar [Wed, 11 Jan 2023 14:29:14 +0000 (14:29 +0000)]
mm: remove the hugetlb field from struct page
Patch series "Get rid of tail page fields".
Continue the shrinkage of the struct page definition by getting rid of the
'first tail page' and 'second tail page' fields. I originally did this
patch set before Hugh's rewrite of the subpages_mapcount, so it needed
substantial updates; hope I didn't miss anything.
This patch (of 28):
commit
dad6a5eb5556(mm,hugetlb: use folio fields in second tail page)
added a transitional hugetlb field to struct page and struct folio to make
room for another int in the first tail of a compound page. Hugetlb folio
conversions have changed all page users of this field to use the fields
within the folio so struct page no longer needs this hugetlb specific
field.
Link: https://lkml.kernel.org/r/20230111142915.1001531-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230111142915.1001531-29-willy@infradead.org
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:13 +0000 (14:29 +0000)]
mm: convert deferred_split_huge_page() to deferred_split_folio()
Now that both callers use a folio, pass the folio in and save a call to
compound_head().
Link: https://lkml.kernel.org/r/20230111142915.1001531-28-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:12 +0000 (14:29 +0000)]
mm/huge_memory: convert get_deferred_split_queue() to take a folio
Removes a few calls to compound_head().
Link: https://lkml.kernel.org/r/20230111142915.1001531-27-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:11 +0000 (14:29 +0000)]
mm/huge_memory: remove page_deferred_list()
Use folio->_deferred_list directly.
Link: https://lkml.kernel.org/r/20230111142915.1001531-26-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:10 +0000 (14:29 +0000)]
mm: move page->deferred_list to folio->_deferred_list
Remove the entire block of definitions for the second tail page, and add
the deferred list to the struct folio. This actually moves _deferred_list
to a different offset in struct folio because I don't see a need to
include the padding.
This lets us use list_for_each_entry_safe() in deferred_split_scan()
and avoid a number of calls to compound_head().
Link: https://lkml.kernel.org/r/20230111142915.1001531-25-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:09 +0000 (14:29 +0000)]
doc: correct struct folio kernel-doc
Insert appropriate public: and private: markers to make the generated
kernel-doc look right.
Link: https://lkml.kernel.org/r/20230111142915.1001531-24-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:08 +0000 (14:29 +0000)]
mm: remove 'First tail page' members from struct page
All former users now use the folio equivalents, so remove them from the
definition of struct page.
Link: https://lkml.kernel.org/r/20230111142915.1001531-23-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:07 +0000 (14:29 +0000)]
hugetlb: remove uses of compound_dtor and compound_nr
Convert the entire file to use the folio equivalents.
Link: https://lkml.kernel.org/r/20230111142915.1001531-22-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:06 +0000 (14:29 +0000)]
mm: convert destroy_large_folio() to use folio_dtor
Replace a use of compound_dtor.
Link: https://lkml.kernel.org/r/20230111142915.1001531-21-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:05 +0000 (14:29 +0000)]
mm: convert is_transparent_hugepage() to use a folio
Replace a use of page->compound_dtor with its folio equivalent.
Link: https://lkml.kernel.org/r/20230111142915.1001531-20-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:04 +0000 (14:29 +0000)]
mm: convert set_compound_page_dtor() and set_compound_order() to folios
Replace uses of compound_dtor, compound_order and compound_nr by their
folio equivalents.
Link: https://lkml.kernel.org/r/20230111142915.1001531-19-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:03 +0000 (14:29 +0000)]
mm: reimplement compound_nr()
Turn compound_nr() into a wrapper around folio_nr_pages(). Similarly to
compound_order(), casting the struct page directly to struct folio
preserves the existing behaviour, while calling page_folio() would change
the behaviour. Move thp_nr_pages() down in the file so that compound_nr()
can be after folio_nr_pages().
[willy@infradead.org: fix assertion triggering]
Link: https://lkml.kernel.org/r/Y8AFgZEEjnUIaCbf@casper.infradead.org
Link: https://lkml.kernel.org/r/20230111142915.1001531-18-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:02 +0000 (14:29 +0000)]
mm: reimplement compound_order()
Make compound_order() use struct folio. It can't be turned into a wrapper
around folio_order() as a page can be turned into a tail page between a
check in compound_order() and the assertion in folio_test_large().
Link: https://lkml.kernel.org/r/20230111142915.1001531-17-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:01 +0000 (14:29 +0000)]
mm: remove head_compound_mapcount() and _ptr functions
folio_mapcount_ptr(), compound_mapcount_ptr() and subpages_mapcount_ptr()
are all now unused.
Link: https://lkml.kernel.org/r/20230111142915.1001531-16-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:29:00 +0000 (14:29 +0000)]
mm: convert page_mapcount() to use folio_entire_mapcount()
Remove a use of head_compound_mapcount().
Link: https://lkml.kernel.org/r/20230111142915.1001531-15-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:59 +0000 (14:28 +0000)]
hugetlb: remove uses of folio_mapcount_ptr
Use the entire_mapcount field directly.
Link: https://lkml.kernel.org/r/20230111142915.1001531-14-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:58 +0000 (14:28 +0000)]
mm/debug: remove call to head_compound_mapcount()
Call folio_entire_mapcount() instead.
Link: https://lkml.kernel.org/r/20230111142915.1001531-13-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:57 +0000 (14:28 +0000)]
mm: use entire_mapcount in __page_dup_rmap()
Remove the use of the compound_mapcount_ptr() wrapper, and add an
assertion that we're not passing a tail page if we're duplicating a PMD.
Link: https://lkml.kernel.org/r/20230111142915.1001531-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:56 +0000 (14:28 +0000)]
mm: use a folio in hugepage_add_anon_rmap() and hugepage_add_new_anon_rmap()
Remove uses of compound_mapcount_ptr()
Link: https://lkml.kernel.org/r/20230111142915.1001531-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:55 +0000 (14:28 +0000)]
page_alloc: use folio fields directly
Rmove the uses of compound_mapcount_ptr(), head_compound_mapcount() and
subpages_mapcount_ptr()
Link: https://lkml.kernel.org/r/20230111142915.1001531-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:54 +0000 (14:28 +0000)]
mm: add folio_add_new_anon_rmap()
In contrast to other rmap functions, page_add_new_anon_rmap() is always
called with a freshly allocated page. That means it can't be called with
a tail page. Turn page_add_new_anon_rmap() into folio_add_new_anon_rmap()
and add a page_add_new_anon_rmap() wrapper. Callers can be converted
individually.
[akpm@linux-foundation.org: fix NOMMU build. page_add_new_anon_rmap() requires CONFIG_MMU]
[willy@infradead.org: folio-compat.c needs rmap.h]
Link: https://lkml.kernel.org/r/20230111142915.1001531-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:53 +0000 (14:28 +0000)]
mm: convert page_add_file_rmap() to use a folio internally
The API for page_add_file_rmap() needs to be page-based, because we can
add mappings of individual pages. But inside the function, we want to
only call compound_head() once and then use the folio APIs instead of the
page APIs that each call compound_head().
Link: https://lkml.kernel.org/r/20230111142915.1001531-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:52 +0000 (14:28 +0000)]
mm: convert page_add_anon_rmap() to use a folio internally
The API for page_add_anon_rmap() needs to be page-based, because we can
add mappings of individual pages. But inside the function, we want to
only call compound_head() once and then use the folio APIs instead of the
page APIs that each call compound_head().
Link: https://lkml.kernel.org/r/20230111142915.1001531-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:51 +0000 (14:28 +0000)]
mm: convert page_remove_rmap() to use a folio internally
The API for page_remove_rmap() needs to be page-based, because we can
remove mappings of pages individually. But inside the function, we want
to only call compound_head() once and then use the folio APIs instead of
the page APIs that each call compound_head().
Link: https://lkml.kernel.org/r/20230111142915.1001531-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:50 +0000 (14:28 +0000)]
mm: convert total_compound_mapcount() to folio_total_mapcount()
Instead of enforcing that the argument must be a head page by naming,
enforce it with the compiler by making it a folio. Also rename the
counter in struct folio from _compound_mapcount to _entire_mapcount.
Link: https://lkml.kernel.org/r/20230111142915.1001531-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:49 +0000 (14:28 +0000)]
doc: clarify refcount section by referring to folios & pages
Include the rename of subpages_mapcount to _nr_pages_mapped.
Link: https://lkml.kernel.org/r/20230111142915.1001531-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:48 +0000 (14:28 +0000)]
mm: convert head_subpages_mapcount() into folio_nr_pages_mapped()
Calling this 'mapcount' is confusing since mapcount is usually the number
of times something is mapped; instead this is the number of mapped pages.
It's also better to enforce that this is a folio rather than a head page.
Move folio_nr_pages_mapped() into mm/internal.h since this is not
something we want device drivers or filesystems poking at. Get rid of
folio_subpages_mapcount_ptr() and use folio->_nr_pages_mapped directly.
Link: https://lkml.kernel.org/r/20230111142915.1001531-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 11 Jan 2023 14:28:47 +0000 (14:28 +0000)]
mm: remove folio_pincount_ptr() and head_compound_pincount()
We can use folio->_pincount directly, since all users are guarded by tests
of compound/large.
Link: https://lkml.kernel.org/r/20230111142915.1001531-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alistair Popple [Tue, 10 Jan 2023 02:57:22 +0000 (13:57 +1100)]
mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export
mmu_notifier_range_update_to_read_only() was originally introduced in
commit
c6d23413f81b ("mm/mmu_notifier:
mmu_notifier_range_update_to_read_only() helper") as an optimisation for
device drivers that know a range has only been mapped read-only. However
there are no users of this feature so remove it. As it is the only user
of the struct mmu_notifier_range.vma field remove that also.
Link: https://lkml.kernel.org/r/20230110025722.600912-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baolin Wang [Tue, 10 Jan 2023 13:36:22 +0000 (21:36 +0800)]
mm: compaction: avoid fragmentation score calculation for empty zones
There is no need to calculate the fragmentation score for empty zones.
Link: https://lkml.kernel.org/r/100331ad9d274a9725e687b00d85d75d7e4a17c7.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baolin Wang [Tue, 10 Jan 2023 13:36:21 +0000 (21:36 +0800)]
mm: compaction: add missing kcompactd wakeup trace event
Add missing kcompactd wakeup trace event for proactive compaction,
meanwhile use order = -1 and the highest zone index of the pgdat for the
kcompactd wakeup trace event by proactive compaction.
Link: https://lkml.kernel.org/r/cbf8097a2d8a1b6800991f2a21575550d3613ce6.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baolin Wang [Tue, 10 Jan 2023 13:36:20 +0000 (21:36 +0800)]
mm: compaction: count the migration scanned pages events for proactive compaction
The proactive compaction will reuse per-node kcompactd threads, so we
should also count the KCOMPACTD_MIGRATE_SCANNED and KCOMPACTD_FREE_SCANNED
events for proactive compaction.
Link: https://lkml.kernel.org/r/b7f1ece1adc17defa47e3667b5f9fd61f496517a.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baolin Wang [Tue, 10 Jan 2023 13:36:19 +0000 (21:36 +0800)]
mm: compaction: move list validation into compact_zone()
Move the cc.freepages and cc.migratepages list validation into compact_zone()
to remove some duplicate code.
Link: https://lkml.kernel.org/r/15cf54f7d762e87b04ac3cc74536f7d1ebbcd8cd.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baolin Wang [Tue, 10 Jan 2023 13:36:18 +0000 (21:36 +0800)]
mm: compaction: remove redundant VM_BUG_ON() in compact_zone()
Patch series "Some small improvements for compaction".
When I did some compaction testing, I found some small room for
improvement as well as some code cleanups.
This patch (of 5):
The compaction_suitable() will never return values other than
COMPACT_SUCCESS, COMPACT_SKIPPED and COMPACT_CONTINUE, so after validation
of COMPACT_SUCCESS and COMPACT_SKIPPED, we will never hit other unexpected
case. Thus remove the redundant VM_BUG_ON() validation for the return
values of compaction_suitable().
Link: https://lkml.kernel.org/r/cover.1673342761.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/740a2396d9b98154dba76e326cba5e798b640ead.1673342761.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vernon Yang [Tue, 10 Jan 2023 14:53:53 +0000 (22:53 +0800)]
mm/mmap: fix typo in comment
Replace "parital" with "partial".
Link: https://lkml.kernel.org/r/20230110145353.1658435-1-vernon2gm@gmail.com
Signed-off-by: Vernon Yang <vernon2gm@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vernon Yang [Tue, 10 Jan 2023 15:42:11 +0000 (23:42 +0800)]
maple_tree: remove the parameter entry of mas_preallocate
The parameter entry of mas_preallocate is not used, so drop it.
Link: https://lkml.kernel.org/r/20230110154211.1758562-1-vernon2gm@gmail.com
Signed-off-by: Vernon Yang <vernon2gm@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:04:00 +0000 (19:04 +0000)]
selftests/damon/debugfs_rm_non_contexts: hide expected write error messages
A selftest case for DAMON debugfs interface has a test for expected
failure. To make the test output clean, hide the expected failure error
message.
Link: https://lkml.kernel.org/r/20230110190400.119388-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:03:59 +0000 (19:03 +0000)]
selftests/damon/sysfs: hide expected write failures
DAMON selftests for sysfs (sysfs.sh) tests if some writes to DAMON sysfs
interface files fails as expected. It makes the test results noisy with
the failure error message because it tests a number of such failures.
Redirect the expected failure error messages to /dev/null to make the
results clean.
Link: https://lkml.kernel.org/r/20230110190400.119388-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:03:58 +0000 (19:03 +0000)]
MAINTAINERS/DAMON: link maintainer profile, git trees, and website
Add links to below DAMON development related resource to DAMON section in
MAINTAINERS file.
- The basic policies and expectations of DAMON development,
- DAMON development trees, and
- DAMON introduction website.
Link: https://lkml.kernel.org/r/20230110190400.119388-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:03:57 +0000 (19:03 +0000)]
Docs/mm/damon: add a maintainer-profile for DAMON
Document the basic policies and expectations for DAMON development.
Link: https://lkml.kernel.org/r/20230110190400.119388-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:03:56 +0000 (19:03 +0000)]
Docs/admin-guide/mm/damon/usage: update DAMOS actions/filters supports of each DAMON operations set
Supports of each DAMOS action and filters are up to DAMON operations set
implementation, but it's not mentioned in detail on the documentation.
Update the information on the usage document.
Link: https://lkml.kernel.org/r/20230110190400.119388-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:03:55 +0000 (19:03 +0000)]
Docs/mm/damon/index: mention DAMOS on the intro
What DAMON aims to do is not only access monitoring but efficient and
effective access-aware system operations. And DAMon-based Operation
Schemes (DAMOS) is the important feature of DAMON for the goal. Make the
intro of DAMON documentation to emphasize the goal and mention DAMOS.
Link: https://lkml.kernel.org/r/20230110190400.119388-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:03:54 +0000 (19:03 +0000)]
mm/damon/core: update kernel-doc comments for DAMOS filters supports of each DAMON operations set
Supports of each DAMOS filter type are up to DAMON operations set
implementation in use, but not well mentioned on the kernel-doc comments.
Add the comment.
Link: https://lkml.kernel.org/r/20230110190400.119388-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 10 Jan 2023 19:03:53 +0000 (19:03 +0000)]
mm/damon/core: update kernel-doc comments for DAMOS action supports of each DAMON operations set
Patch series "mm/damon: trivial fixups".
This patchset contains patches for trivial fixups of DAMON's
documentation, MAINTAINERS section, and selftests.
This patch (of 8):
Supports of each DAMOS action are up to DAMON operations set
implementation in use, but not well mentioned on the kernel-doc comments.
Add the comment.
Link: https://lkml.kernel.org/r/20230110190400.119388-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20230110190400.119388-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabio M. De Francesco [Wed, 7 Dec 2022 22:53:08 +0000 (23:53 +0100)]
mm/highmem: add notes about conversions from kmap{,_atomic}()
kmap() and kmap_atomic() have been deprecated. kmap_local_page() should
always be used in new code and the call sites of the two deprecated
functions should be converted. This latter task can lead to errors if it
is not carried out with the necessary attention to the context around and
between the maps and unmaps.
Therefore, add further information to the Highmem's documentation for the
purpose to make it clearer that (1) kmap() and kmap_atomic() must not any
longer be called in new code and (2) developers doing conversions from
kmap() amd kmap_atomic() are expected to take care of the context around
and between the maps and unmaps, in order to not break the code.
Relevant parts of this patch have been taken from messages exchanged
privately with Ira Weiny (thanks!).
[fmdefrancesco@gmail.com: merge two sentences into one, per Bagas]
Link: https://lkml.kernel.org/r/20230119123945.10471-1-fmdefrancesco@gmail.com
Link: https://lkml.kernel.org/r/20221207225308.8290-1-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Wed, 1 Feb 2023 01:25:17 +0000 (17:25 -0800)]
Sync mm-stable with mm-hotfixes-stable to pick up dependent patches
Merge branch 'mm-hotfixes-stable' into mm-stable
Kefeng Wang [Sun, 29 Jan 2023 04:09:45 +0000 (12:09 +0800)]
mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
As commit
18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
occurs a NULL pointer dereference, let's do not record the foreign
writebacks for folio memcg is null in mem_cgroup_track_foreign_dirty() to
fix it.
Link: https://lkml.kernel.org/r/20230129040945.180629-1-wangkefeng.wang@huawei.com
Fixes:
97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Ma Wupeng <mawupeng1@huawei.com>
Tested-by: Miko Larsson <mikoxyzzz@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Ma Wupeng <mawupeng1@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ye xingchen [Sun, 29 Jan 2023 02:13:57 +0000 (10:13 +0800)]
Kconfig.debug: fix the help description in SCHED_DEBUG
The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched.
Link: https://lkml.kernel.org/r/202301291013573466558@zte.com.cn
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Longlong Xia [Sat, 28 Jan 2023 09:47:57 +0000 (09:47 +0000)]
mm/swapfile: add cond_resched() in get_swap_pages()
The softlockup still occurs in get_swap_pages() under memory pressure. 64
CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram
device is 50MB with same priority as si. Use the stress-ng tool to
increase memory pressure, causing the system to oom frequently.
The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens
of thousands of times to find available space (extreme case:
cond_resched() is not called in scan_swap_map_slots()). Let's add
cond_resched() into get_swap_pages() when failed to find available space
to avoid softlockup.
Link: https://lkml.kernel.org/r/20230128094757.1060525-1-xialonglong1@huawei.com
Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhaoyang Huang [Thu, 19 Jan 2023 01:22:25 +0000 (09:22 +0800)]
mm: use stack_depot_early_init for kmemleak
Mirsad report the below error which is caused by stack_depot_init()
failure in kvcalloc. Solve this by having stackdepot use
stack_depot_early_init().
On 1/4/23 17:08, Mirsad Goran Todorovac wrote:
I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak:
[root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff951c118568b0 (size 16):
comm "kworker/u12:2", pid 56, jiffies
4294893952 (age 4356.548s)
hex dump (first 16 bytes):
6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
backtrace:
[root@pc-mtodorov ~]#
Apparently, backtrace of called functions on the stack is no longer
printed with the list of memory leaks. This appeared on Lenovo desktop
10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and
6.2-rc1 and 6.2-rc2 builds. This worked on 6.1 with the same
CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from
Mr. Torvalds' tree. I don't know if this is deliberate feature for some
reason or a bug. Please find attached the config, lshw and kmemleak
output.
[vbabka@suse.cz: remove stack_depot_init() call]
Link: https://lore.kernel.org/all/5272a819-ef74-65ff-be61-4d2d567337de@alu.unizg.hr/
Link: https://lkml.kernel.org/r/1674091345-14799-2-git-send-email-zhaoyang.huang@unisoc.com
Fixes:
56a61617dd22 ("mm: use stack_depot for recording kmemleak's backtrace")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: ke.wang <ke.wang@unisoc.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Phillip Lougher [Fri, 27 Jan 2023 06:18:42 +0000 (06:18 +0000)]
Squashfs: fix handling and sanity checking of xattr_ids count
A Sysbot [1] corrupted filesystem exposes two flaws in the handling and
sanity checking of the xattr_ids count in the filesystem. Both of these
flaws cause computation overflow due to incorrect typing.
In the corrupted filesystem the xattr_ids value is
4294967071, which
stored in a signed variable becomes the negative number -225.
Flaw 1 (64-bit systems only):
The signed integer xattr_ids variable causes sign extension.
This causes variable overflow in the SQUASHFS_XATTR_*(A) macros. The
variable is first multiplied by sizeof(struct squashfs_xattr_id) where the
type of the sizeof operator is "unsigned long".
On a 64-bit system this is 64-bits in size, and causes the negative number
to be sign extended and widened to 64-bits and then become unsigned. This
produces the very large number
18446744073709548016 or 2^64 - 3600. This
number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and
divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0
(stored in len).
Flaw 2 (32-bit systems only):
On a 32-bit system the integer variable is not widened by the unsigned
long type of the sizeof operator (32-bits), and the signedness of the
variable has no effect due it always being treated as unsigned.
The above corrupted xattr_ids value of
4294967071, when multiplied
overflows and produces the number
4294963696 or 2^32 - 3400. This number
when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by
SQUASHFS_METADATA_SIZE overflows again and produces a length of 0.
The effect of the 0 length computation:
In conjunction with the corrupted xattr_ids field, the filesystem also has
a corrupted xattr_table_start value, where it matches the end of
filesystem value of 850.
This causes the following sanity check code to fail because the
incorrectly computed len of 0 matches the incorrect size of the table
reported by the superblock (0 bytes).
len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);
/*
* The computed size of the index table (len bytes) should exactly
* match the table start and end points
*/
start = table_start + sizeof(*id_table);
end = msblk->bytes_used;
if (len != (end - start))
return ERR_PTR(-EINVAL);
Changing the xattr_ids variable to be "usigned int" fixes the flaw on a
64-bit system. This relies on the fact the computation is widened by the
unsigned long type of the sizeof operator.
Casting the variable to u64 in the above macro fixes this flaw on a 32-bit
system.
It also means 64-bit systems do not implicitly rely on the type of the
sizeof operator to widen the computation.
[1] https://lore.kernel.org/lkml/
000000000000cd44f005f1a0f17f@google.com/
Link: https://lkml.kernel.org/r/20230127061842.10965-1-phillip@squashfs.org.uk
Fixes:
506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: <syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Fedor Pchelkin <pchelkin@ispras.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tom Saeger [Tue, 24 Jan 2023 00:09:35 +0000 (17:09 -0700)]
sh: define RUNTIME_DISCARD_EXIT
sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since
commit
99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv").
This is similar to fixes for powerpc and s390:
commit
4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT").
commit
a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error
with GNU ld < 2.36").
$ sh4-linux-gnu-ld --version | head -n1
GNU ld (GNU Binutils for Debian) 2.35.2
$ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig
$ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu-
`.exit.text' referenced in section `__bug_table' of crypto/algboss.o:
defined in discarded section `.exit.text' of crypto/algboss.o
`.exit.text' referenced in section `__bug_table' of
drivers/char/hw_random/core.o: defined in discarded section
`.exit.text' of drivers/char/hw_random/core.o
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [Makefile:1252: vmlinux] Error 2
arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT:
/*
* .exit.text is discarded at runtime, not link time, to deal with
* references from __bug_table
*/
.exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT }
However, EXIT_TEXT is thrown away by
DISCARD(include/asm-generic/vmlinux.lds.h) because
sh does not define RUNTIME_DISCARD_EXIT.
GNU ld 2.40 does not have this issue and builds fine.
This corresponds with Masahiro's comments in
a494398bde27:
"Nathan [Chancellor] also found that binutils
commit
21401fc7bf67 ("Duplicate output sections in scripts") cured this
issue, so we cannot reproduce it with binutils 2.36+, but it is better
to not rely on it."
Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com
Fixes:
99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/
Link: https://lore.kernel.org/all/20230123194218.47ssfzhrpnv3xfez@oracle.com/
Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dennis Gilmore <dennis@ausil.us>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Thu, 26 Jan 2023 20:07:27 +0000 (20:07 +0000)]
highmem: round down the address passed to kunmap_flush_on_unmap()
We already round down the address in kunmap_local_indexed() which is the
other implementation of __kunmap_local(). The only implementation of
kunmap_flush_on_unmap() is PA-RISC which is expecting a page-aligned
address. This may be causing PA-RISC to be flushing the wrong addresses
currently.
Link: https://lkml.kernel.org/r/20230126200727.1680362-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes:
298fa1ad5571 ("highmem: Provide generic variant of kmap_atomic*")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Kravetz [Thu, 26 Jan 2023 22:27:21 +0000 (14:27 -0800)]
migrate: hugetlb: check for hugetlb shared PMD in node migration
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to
move pages shared with another process to a different node. page_mapcount
> 1 is being used to determine if a hugetlb page is shared. However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via
a shared PMD. As a result, hugetlb pages shared by multiple processes and
mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
consider the page shared.
Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com
Fixes:
e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Kravetz [Thu, 26 Jan 2023 22:27:20 +0000 (14:27 -0800)]
mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".
This issue of mapcount in hugetlb pages referenced by shared PMDs was
discussed in [1]. The following two patches address user visible behavior
caused by this issue.
[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/
This patch (of 2):
A hugetlb page will have a mapcount of 1 if mapped by multiple processes
via a shared PMD. This is because only the first process increases the
map count, and subsequent processes just add the shared PMD page to their
page table.
page_mapcount is being used to decide if a hugetlb page is shared or
private in /proc/PID/smaps. Pages referenced via a shared PMD were
incorrectly being counted as private.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
count the hugetlb page as shared. A new helper to check for a shared PMD
is added.
[akpm@linux-foundation.org: simplification, per David]
[akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()]
Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com
Fixes:
25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>