Matthew Wilcox (Oracle) [Fri, 17 Jun 2022 15:42:48 +0000 (16:42 +0100)]
mm/vmscan: convert reclaim_pages() to use a folio
Remove a few hidden calls to compound_head, saving 76 bytes of text.
Link: https://lkml.kernel.org/r/20220617154248.700416-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Fri, 17 Jun 2022 15:42:47 +0000 (16:42 +0100)]
mm/vmscan: convert shrink_active_list() to use a folio
Remove a few hidden calls to compound_head, saving 411 bytes of text.
Link: https://lkml.kernel.org/r/20220617154248.700416-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Fri, 17 Jun 2022 15:42:46 +0000 (16:42 +0100)]
mm/vmscan: convert move_pages_to_lru() to use a folio
Remove a few hidden calls to compound_head, saving 387 bytes of text on
my test configuration.
Link: https://lkml.kernel.org/r/20220617154248.700416-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Fri, 17 Jun 2022 15:42:45 +0000 (16:42 +0100)]
mm/vmscan: convert isolate_lru_pages() to use a folio
Remove a few hidden calls to compound_head, saving 279 bytes of text.
Link: https://lkml.kernel.org/r/20220617154248.700416-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Fri, 17 Jun 2022 15:42:44 +0000 (16:42 +0100)]
mm/vmscan: convert reclaim_clean_pages_from_list() to folios
Patch series "nvert much of vmscan to folios"
vmscan always operates on folios since it puts the pages on the LRU list.
Switching all of these functions from pages to folios saves 1483 bytes of
text from removing all the baggage around calling compound_page() and
similar functions.
This patch (of 5):
This is a straightforward conversion which removes several hidden calls
to compound_head, saving 330 bytes of kernel text.
Link: https://lkml.kernel.org/r/20220617154248.700416-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20220617154248.700416-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Tue, 14 Jun 2022 09:36:29 +0000 (11:36 +0200)]
mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection
Similar to our MM_CP_DIRTY_ACCT handling for shared, writable mappings, we
can try mapping anonymous pages in a private writable mapping writable if
they are exclusive, the PTE is already dirty, and no special handling
applies. Mapping the anonymous page writable is essentially the same
thing the write fault handler would do in this case.
Special handling is required for uffd-wp and softdirty tracking, so take
care of that properly. Also, leave PROT_NONE handling alone for now; in
the future, we could similarly extend the logic in do_numa_page() or use
pte_mk_savedwrite() here.
While this improves mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE)
performance, it should also be a valuable optimization for uffd-wp, when
un-protecting.
This has been previously suggested by Peter Collingbourne in [1], relevant
in the context of the Scudo memory allocator, before we had
PageAnonExclusive.
This commit doesn't add the same handling for PMDs (i.e., anonymous THP,
anonymous hugetlb); benchmark results from Andrea indicate that there are
minor performance gains, so it's might still be valuable to streamline
that logic for all anonymous pages in the future.
As we now also set MM_CP_DIRTY_ACCT for private mappings, let's rename it
to MM_CP_TRY_CHANGE_WRITABLE, to make it clearer what's actually
happening.
Micro-benchmark courtesy of Andrea:
===
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#define SIZE (1024*1024*1024)
int main(int argc, char *argv[])
{
char *p;
if (posix_memalign((void **)&p, sysconf(_SC_PAGESIZE)*512, SIZE))
perror("posix_memalign"), exit(1);
if (madvise(p, SIZE, argc > 1 ? MADV_HUGEPAGE : MADV_NOHUGEPAGE))
perror("madvise");
explicit_bzero(p, SIZE);
for (int loops = 0; loops < 40; loops++) {
if (mprotect(p, SIZE, PROT_READ))
perror("mprotect"), exit(1);
if (mprotect(p, SIZE, PROT_READ|PROT_WRITE))
perror("mprotect"), exit(1);
explicit_bzero(p, SIZE);
}
}
===
Results on my Ryzen 9 3900X:
Stock 10 runs (lower is better): AVG 6.398s, STDEV 0.043
Patched 10 runs (lower is better): AVG 3.780s, STDEV 0.026
===
[1] https://lkml.kernel.org/r/
20210429214801.2583336-1-pcc@google.com
Link: https://lkml.kernel.org/r/20220614093629.76309-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Suggested-by: Peter Collingbourne <pcc@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Edward Liaw [Mon, 13 Jun 2022 23:33:21 +0000 (23:33 +0000)]
userfaultfd: selftests: infinite loop in faulting_process
On Android this test is getting stuck in an infinite loop due to
indeterminate behavior:
The local variables steps and signalled were being reset to 1 and 0
respectively after every jump back to sigsetjmp by siglongjmp in the
signal handler. The test was incrementing them and expecting them to
retain their incremented values. The documentation for siglongjmp says:
All accessible objects have values as of the time sigsetjmp() was called,
except that the values of objects of automatic storage duration which are
local to the function containing the invocation of the corresponding
sigsetjmp() which do not have volatile-qualified type and which are
changed between the sigsetjmp() invocation and siglongjmp() call are
indeterminate.
Tagging steps and signalled with volatile enabled the test to pass.
Link: https://lkml.kernel.org/r/20220613233321.431282-1-edliaw@google.com
Signed-off-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:23:01 +0000 (19:23 +0000)]
Docs/admin-guide/damon: add a document for DAMON_LRU_SORT
This commit documents the usage of DAMON_LRU_SORT for admins.
Link: https://lkml.kernel.org/r/20220613192301.8817-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:23:00 +0000 (19:23 +0000)]
mm/damon: introduce DAMON-based LRU-lists Sorting
Users can do data access-aware LRU-lists sorting using 'LRU_PRIO' and
'LRU_DEPRIO' DAMOS actions. However, finding best parameters including
the hotness/coldness thresholds, CPU quota, and watermarks could be
challenging for some users. To make the scheme easy to be used without
complex tuning for common situations, this commit implements a static
kernel module called 'DAMON_LRU_SORT' using the 'LRU_PRIO' and
'LRU_DEPRIO' DAMOS actions.
It proactively sorts LRU-lists using DAMON with conservatively chosen
default values of the parameters. That is, the module under its default
parameters will make no harm for common situations but provide some level
of efficiency improvements for systems having clear hot/cold access
pattern under a level of memory pressure while consuming only a limited
small portion of CPU time.
Link: https://lkml.kernel.org/r/20220613192301.8817-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:22:59 +0000 (19:22 +0000)]
Docs/admin-guide/damon/sysfs: document 'LRU_DEPRIO' scheme action
This commit documents the 'LRU_DEPRIO' scheme action for DAMON sysfs
interface.`
Link: https://lkml.kernel.org/r/20220613192301.8817-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:22:58 +0000 (19:22 +0000)]
mm/damon/schemes: add 'LRU_DEPRIO' action
This commit adds a new DAMON-based operation scheme action called
'LRU_DEPRIO' for physical address space. The action deprioritizes pages
in the memory area of the target access pattern on their LRU lists. This
is hence supposed to be used for rarely accessed (cold) memory regions so
that cold pages could be more likely reclaimed first under memory
pressure. Internally, it simply calls 'lru_deactivate()'.
Using this with 'LRU_PRIO' action for hot pages, users can proactively
sort LRU lists based on the access pattern. That is, it can make the LRU
lists somewhat more trustworthy source of access temperature. As a
result, efficiency of LRU-lists based mechanisms including the reclamation
target selection could be improved.
Link: https://lkml.kernel.org/r/20220613192301.8817-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:22:57 +0000 (19:22 +0000)]
Docs/admin-guide/damon/sysfs: document 'LRU_PRIO' scheme action
This commit documents the 'lru_prio' scheme action for DAMON sysfs
interface.
Link: https://lkml.kernel.org/r/20220613192301.8817-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:22:56 +0000 (19:22 +0000)]
mm/damon/schemes: add 'LRU_PRIO' DAMOS action
This commit adds a new DAMOS action called 'LRU_PRIO' for the physical
address space. The action prioritizes pages in the memory regions of the
user-specified target access pattern on their LRU lists. This is hence
supposed to be used for frequently accessed (hot) memory regions so that
hot pages could be more likely protected under memory pressure.
Internally, it simply calls 'mark_page_accessed()'.
Link: https://lkml.kernel.org/r/20220613192301.8817-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:22:55 +0000 (19:22 +0000)]
mm/damon/paddr: use a separate function for 'DAMOS_PAGEOUT' handling
This commit moves code for 'DAMOS_PAGEOUT' handling of the physical
address space monitoring operations set to a separate function so that its
caller, 'damon_pa_apply_scheme()', can be more easily extended for
additional DAMOS actions later.
Link: https://lkml.kernel.org/r/20220613192301.8817-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 13 Jun 2022 19:22:53 +0000 (19:22 +0000)]
mm/damon/dbgfs: add and use mappings between 'schemes' action inputs and 'damos_action' values
Patch series "Extend DAMOS for Proactive LRU-lists Sorting".
Introduction
============
In short, this patchset 1) extends DAMON-based Operation Schemes (DAMOS)
for low overhead data access pattern based LRU-lists sorting, and 2)
implements a static kernel module for easy use of conservatively-tuned
version of that using the extended DAMOS capability.
Background
----------
As page-granularity access checking overhead could be significant on huge
systems, LRU lists are normally not proactively sorted but partially and
reactively sorted for special events including specific user requests,
system calls and memory pressure. As a result, LRU lists are sometimes
not so perfectly prepared to be used as a trustworthy access pattern
source for some situations including reclamation target pages selection
under sudden memory pressure.
DAMON-based Proactive LRU-lists Sorting
---------------------------------------
Because DAMON can identify access patterns of best-effort accuracy while
inducing only user-specified range of overhead, using DAMON for Proactive
LRU-lists Sorting (PLRUS) could be helpful for this situation. The idea
is quite simple. Find hot pages and cold pages using DAMON, and
prioritize hot pages while deprioritizing cold pages on their LRU-lists.
This patchset extends DAMON to support such schemes by introducing a
couple of new DAMOS actions for prioritizing and deprioritizing memory
regions of specific access patterns on their LRU-lists. In detail, this
patchset simply uses 'mark_page_accessed()' and 'deactivate_page()'
functions for prioritization and deprioritization of pages on their LRU
lists, respectively.
To make the scheme easy to use without complex tuning for common
situations, this patchset further implements a static kernel module called
'DAMON_LRU_SORT' using the extended DAMOS functionality. It proactively
sorts LRU-lists using DAMON with conservatively chosen default
hotness/coldness thresholds and small CPU usage quota limit. That is, the
module under its default parameters will make no harm for common situation
but provide some level of benefit for systems having clear hot/cold access
pattern under only memory pressure while consuming only limited small
portion of CPU time.
Related Works
-------------
Proactive reclamation is well known to be helpful for reducing non-optimal
reclamation target selection caused performance drops. However, proactive
reclamation is not a best option for some cases, because it could incur
additional I/O. For an example, it could be prohitive for systems using
storage devices that total number of writes is limited, or cloud block
storages that charges every I/O.
Some proactive reclamation approaches[1,2] induce a level of memory
pressure using memcg files or swappiness while monitoring PSI. As
reclamation target selection is still relying on the original LRU-lists
mechanism, using DAMON-based proactive reclamation before inducing the
proactive reclamation could allow more memory saving with same level of
performance overhead, or less performance overhead with same level of
memory saving.
[1] https://blogs.oracle.com/linux/post/anticipating-your-memory-needs
[2] https://www.pdl.cmu.edu/ftp/NVM/tmo_asplos22.pdf
Evaluation
==========
In short, PLRUS achieves 10% memory PSI (some) reduction, 14% major page
faults reduction, and 3.74% speedup under memory pressure.
Setup
-----
To show the effect of PLRUS, I run PARSEC3 and SPLASH-2X benchmarks under
below variant systems and measure a few metrics including the runtime of
each workload, number of system-wide major page faults, and system-wide
memory PSI (some).
- orig: v5.18-rc4 based mm-unstable kernel + this patchset, but no DAMON scheme
applied.
- mprs: Same to 'orig' but artificial memory pressure is induced.
- plrus: Same to 'mprs' but a radically tuned PLRUS scheme is applied to the
entire physical address space of the system.
For the artificial memory pressure, I set 'memory.limit_in_bytes' to 75%
of the running workload's peak RSS, wait 1 second, remove the pressure by
setting it to 200% of the peak RSS, wait 10 seconds, and repeat the
procedure until the workload finishes[1]. I use zram based swap device.
The tests are automated[2].
[1] https://github.com/awslabs/damon-tests/blob/next/perf/runners/back/0009_memcg_pressure.sh
[2] https://github.com/awslabs/damon-tests/blob/next/perf/full_once_config.sh
Radically Tuned PLRUS
---------------------
To show effect of PLRUS on the PARSEC3/SPLASH-2X workloads which runs for
no long time, we use radically tuned version of PLRUS. The version asks
DAMON to do the proactive LRU-lists sorting as below.
1. Find any memory regions shown some accesses (approximately >=20 accesses per
100 sampling) and prioritize pages of the regions on their LRU lists using
up to 2% CPU time. Under the CPU time limit, prioritize regions having
higher access frequency and kept the access frequency longer first.
2. Find any memory regions shown no access for at least >=5 seconds and
deprioritize pages of the rgions on their LRU lists using up to 2% CPU time.
Under the CPU time limit, deprioritize regions that not accessed for longer
time first.
Results
-------
I repeat the tests 25 times and calculate average of the measured numbers.
The results are as below:
metric orig mprs plrus plrus/mprs
runtime_seconds 190.06 292.83 281.87 0.96
pgmajfaults 852.55 8769420.00 7525040.00 0.86
memory_psi_some_us 106911.00 6943420.00 6220920.00 0.90
The first row is for legend. The first cell shows the metric that the
following cells of the row shows. Second, third, and fourth cells show
the metrics under the configs shown at the first row of the cell, and the
fifth cell shows the metric under 'plrus' divided by the metric under
'mprs'. Second row shows the averaged runtime of the workloads in
seconds. Third row shows the number of system-wide major page faults
while the test was ongoing. Fourth row shows the system-wide memory
pressure stall for some processes in microseconds while the test was
ongoing.
In short, PLRUS achieves 10% memory PSI (some) reduction, 14% major page
faults reduction, and 3.74% speedup under memory pressure. We also
confirmed the CPU usage of kdamond was 2.61% of single CPU, which is below
4% as expected.
Sequence of Patches
===================
The first and second patch cleans up DAMON debugfs interface and
DAMOS_PAGEOUT handling code of physical address space monitoring
operations implementation for easier extension of the code.
The thrid and fourth patches implement a new DAMOS action called
'lru_prio', which prioritizes pages under memory regions which have a
user-specified access pattern, and document it, respectively. The fifth
and sixth patches implement yet another new DAMOS action called
'lru_deprio', which deprioritizes pages under memory regions which have a
user-specified access pattern, and document it, respectively.
The seventh patch implements a static kernel module called
'damon_lru_sort', which utilizes the DAMON-based proactive LRU-lists
sorting under conservatively chosen default parameter. Finally, the
eighth patch documents 'damon_lru_sort'.
This patch (of 8):
DAMON debugfs interface assumes users will write 'damos_action' value
directly to the 'schemes' file. This makes adding new 'damos_action' in
the middle of its definition breaks the backward compatibility of DAMON
debugfs interface, as values of some 'damos_action' could be changed. To
mitigate the situation, this commit adds mappings between the user inputs
and 'damos_action' value and makes DAMON debugfs code uses those.
Link: https://lkml.kernel.org/r/20220613192301.8817-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220613192301.8817-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Wed, 8 Jun 2022 14:40:31 +0000 (22:40 +0800)]
mm/swap: remove swap_cache_info statistics
swap_cache_info are not statistics that could be easily used to tune
system performance because they are not easily accessile. Also they can't
provide really useful info when OOM occurs. Remove these statistics can
also help mitigate unneeded global swap_cache_info cacheline contention.
Link: https://lkml.kernel.org/r/20220608144031.829-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Wed, 8 Jun 2022 14:40:30 +0000 (22:40 +0800)]
mm/swapfile: fix possible data races of inuse_pages
si->inuse_pages could still be accessed concurrently now. The plain reads
outside si->lock critical section, i.e. swap_show and si_swapinfo, which
results in data races. READ_ONCE and WRITE_ONCE is used to fix such data
races. Note these data races should be ok because they're just used for
showing swap info.
[linmiaohe@huawei.com: use WRITE_ONCE to pair with READ_ONCE]
Link: https://lkml.kernel.org/r/20220625093346.48894-2-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220608144031.829-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Tue, 7 Jun 2022 09:34:49 +0000 (11:34 +0200)]
lib/test_vmalloc: switch to prandom_u32()
A get_random_bytes() function can cause a high contention if it is called
across CPUs simultaneously. Because it shares one lock per all CPUs:
<snip>
class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg
&crng->lock: 663145 665886 0.05 8.85 261966.66 0.39 7188152
13731279 0.04 11.89 2181582.30 0.16
-----------
&crng->lock 307835 [<
00000000acba59cd>] _extract_crng+0x48/0x90
&crng->lock 358051 [<
00000000f0075abc>] _crng_backtrack_protect+0x32/0x90
-----------
&crng->lock 234241 [<
00000000f0075abc>] _crng_backtrack_protect+0x32/0x90
&crng->lock 431645 [<
00000000acba59cd>] _extract_crng+0x48/0x90
<snip>
Switch from the get_random_bytes() to prandom_u32() that does not have any
internal contention when a random value is needed for the tests.
The reason is to minimize CPU cycles introduced by the test-suite itself
from the vmalloc performance metrics.
Link: https://lkml.kernel.org/r/20220607093449.3100-6-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Tue, 7 Jun 2022 09:34:48 +0000 (11:34 +0200)]
mm/vmalloc: extend __find_vmap_area() with one more argument
__find_vmap_area() finds a "vmap_area" based on passed address. It scan
the specific "vmap_area_root" rb-tree. Extend the function with one extra
argument, so any tree can be specified where the search has to be done.
There is no functional change as a result of this patch.
Link: https://lkml.kernel.org/r/20220607093449.3100-5-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Tue, 7 Jun 2022 09:34:47 +0000 (11:34 +0200)]
mm/vmalloc: initialize VA's list node after unlink
A vmap_area can travel between different places. For example
attached/detached to/from different rb-trees. In order to prevent fancy
bugs, initialize a VA's list node after it is removed from the list, so it
pairs with VA's rb_node which is also initialized.
There is no functional change as a result of this patch.
Link: https://lkml.kernel.org/r/20220607093449.3100-4-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Tue, 7 Jun 2022 09:34:46 +0000 (11:34 +0200)]
mm/vmalloc: extend __alloc_vmap_area() with extra arguments
It implies that __alloc_vmap_area() allocates only from the global vmap
space, therefore a list-head and rb-tree, which represent a free vmap
space, are not passed as parameters to this function and are accessed
directly from this function.
Extend the __alloc_vmap_area() and other dependent functions to have a
possibility to allocate from different trees making an interface common
and not specific.
There is no functional change as a result of this patch.
Link: https://lkml.kernel.org/r/20220607093449.3100-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Uladzislau Rezki (Sony) [Tue, 7 Jun 2022 09:34:45 +0000 (11:34 +0200)]
mm/vmalloc: make link_va()/unlink_va() common to different rb_root
Patch series "Reduce a vmalloc internal lock contention preparation work".
This small serias is preparation work to implement per-cpu vmalloc
allocation in order to reduce a high internal lock contention. This
series does not introduce any functional changes, it is only about
preparation.
This patch (of 5):
Currently link_va() and unlik_va(), in order to figure out a tree type,
compares a passed root value with a global free_vmap_area_root variable to
distinguish the augmented rb-tree from a regular one. It is hard coded
since such functions can manipulate only with specific
"free_vmap_area_root" tree that represents a global free vmap space.
Make it common by introducing "_augment" versions of both internal
functions, so it is possible to deal with different trees.
There is no functional change as a result of this patch.
Link: https://lkml.kernel.org/r/20220607093449.3100-1-urezki@gmail.com
Link: https://lkml.kernel.org/r/20220607093449.3100-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Wed, 1 Jun 2022 03:22:27 +0000 (20:22 -0700)]
mm: shrinkers: add scan interface for shrinker debugfs
Add a scan interface which allows to trigger scanning of a particular
shrinker and specify memcg and numa node. It's useful for testing,
debugging and profiling of a specific scan_objects() callback. Unlike
alternatives (creating a real memory pressure and dropping caches via
/proc/sys/vm/drop_caches) this interface allows to interact with only one
shrinker at once. Also, if a shrinker is misreporting the number of
objects (as some do), it doesn't affect scanning.
[roman.gushchin@linux.dev: improve typing, fix arg count checking]
Link: https://lkml.kernel.org/r/YpgKttTowT22mKPQ@carbon
[akpm@linux-foundation.org: fix arg count checking]
Link: https://lkml.kernel.org/r/20220601032227.4076670-7-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Wed, 1 Jun 2022 03:22:26 +0000 (20:22 -0700)]
tools: add memcg_shrinker.py
Add a simple tool which prints a sorted list of shrinker lists in the
following format: (number of objects, shrinker name, cgroup).
Example:
$ ./memcg_shrinker.py -n 10
2090 sb-sysfs-26 /sys/fs/cgroup/system.slice
1809 sb-sysfs-26 /sys/fs/cgroup/system.slice/systemd-udevd.service
1044 sb-btrfs:vda2-24 /sys/fs/cgroup/system.slice/system-dbus\x2d:1.3\...
861 sb-btrfs:vda2-24 /sys/fs/cgroup/system.slice/system-dbus\x2d:1.3\...
804 sb-btrfs:vda2-24 /sys/fs/cgroup/system.slice
643 sb-btrfs:vda2-24 /sys/fs/cgroup/system.slice/firewalld.service
616 sb-cgroup2-30 /sys/fs/cgroup/init.scope
275 sb-sysfs-26 /
238 sb-proc-25 /sys/fs/cgroup/system.slice/systemd-journald.service
225 sb-proc-25 /sys/fs/cgroup/system.slice/abrtd.service
Link: https://lkml.kernel.org/r/20220601032227.4076670-6-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Wed, 1 Jun 2022 03:22:25 +0000 (20:22 -0700)]
mm: docs: document shrinker debugfs
Add a document describing the shrinker debugfs interface.
Link: https://lkml.kernel.org/r/20220601032227.4076670-5-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Wed, 1 Jun 2022 03:22:24 +0000 (20:22 -0700)]
mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.
This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.
In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.
The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Wed, 1 Jun 2022 03:22:23 +0000 (20:22 -0700)]
mm: shrinkers: introduce debugfs interface for memory shrinkers
This commit introduces the /sys/kernel/debug/shrinker debugfs interface
which provides an ability to observe the state of individual kernel memory
shrinkers.
Because the feature adds some memory overhead (which shouldn't be large
unless there is a huge amount of registered shrinkers), it's guarded by a
config option (enabled by default).
This commit introduces the "count" interface for each shrinker registered
in the system.
The output is in the following format:
<cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>...
<cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>...
...
To reduce the size of output on machines with many thousands cgroups, if
the total number of objects on all nodes is 0, the line is omitted.
If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed as
cgroup inode id. If the shrinker is not numa-aware, 0's are printed for
all nodes except the first one.
This commit gives debugfs entries simple numeric names, which are not very
convenient. The following commit in the series will provide shrinkers
with more meaningful names.
[akpm@linux-foundation.org: remove WARN_ON_ONCE(), per Roman]
Reported-by: syzbot+300d27c79fe6d4cbcc39@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20220601032227.4076670-3-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Wed, 1 Jun 2022 03:22:22 +0000 (20:22 -0700)]
mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino()
Patch series "mm: introduce shrinker debugfs interface", v5.
The only existing debugging mechanism is a couple of tracepoints in
do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They
aren't covering everything though: shrinkers which report 0 objects will
never show up, there is no support for memcg-aware shrinkers. Shrinkers
are identified by their scan function, which is not always enough (e.g.
hard to guess which super block's shrinker it is having only
"super_cache_scan").
To provide a better visibility and debug options for memory shrinkers this
patchset introduces a /sys/kernel/debug/shrinker interface, to some extent
similar to /sys/kernel/slab.
For each shrinker registered in the system a directory is created. As
now, the directory will contain only a "scan" file, which allows to get
the number of managed objects for each memory cgroup (for memcg-aware
shrinkers) and each numa node (for numa-aware shrinkers on a numa
machine). Other interfaces might be added in the future.
To make debugging more pleasant, the patchset also names all shrinkers, so
that debugfs entries can have meaningful names.
This patch (of 5):
Shrinker debugfs requires a way to represent memory cgroups without using
full paths, both for displaying information and getting input from a user.
Cgroup inode number is a perfect way, already used by bpf.
This commit adds a couple of helper functions which will be used to handle
memcg-aware shrinkers.
Link: https://lkml.kernel.org/r/20220601032227.4076670-1-roman.gushchin@linux.dev
Link: https://lkml.kernel.org/r/20220601032227.4076670-2-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tianyu Li [Wed, 1 Jun 2022 09:32:11 +0000 (17:32 +0800)]
mm/mempolicy: fix get_nodes out of bound access
When user specified more nodes than supported, get_nodes will access nmask
array out of bounds.
Link: https://lkml.kernel.org/r/20220601093211.2970565-1-tianyu.li@arm.com
Fixes:
e130242dc351 ("mm: simplify compat numa syscalls")
Signed-off-by: Tianyu Li <tianyu.li@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baolin Wang [Fri, 27 May 2022 02:01:35 +0000 (10:01 +0800)]
mm/hugetlb: remove unnecessary huge_ptep_set_access_flags() in hugetlb_mcopy_atomic_pte()
There is no need to update the hugetlb access flags after just setting the
hugetlb page table entry by set_huge_pte_at(), since the page table entry
value has no changes.
Thus remove the unnecessary huge_ptep_set_access_flags() in
hugetlb_mcopy_atomic_pte().
Link: https://lkml.kernel.org/r/f3e28b897b53a69967a8b98a6fdcda3be80c9229.1653616175.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Thu, 9 Jun 2022 18:18:47 +0000 (20:18 +0200)]
kasan: fix zeroing vmalloc memory with HW_TAGS
HW_TAGS KASAN skips zeroing page_alloc allocations backing vmalloc
mappings via __GFP_SKIP_ZERO. Instead, these pages are zeroed via
kasan_unpoison_vmalloc() by passing the KASAN_VMALLOC_INIT flag.
The problem is that __kasan_unpoison_vmalloc() does not zero pages when
either kasan_vmalloc_enabled() or is_vmalloc_or_module_addr() fail.
Thus:
1. Change __vmalloc_node_range() to only set KASAN_VMALLOC_INIT when
__GFP_SKIP_ZERO is set.
2. Change __kasan_unpoison_vmalloc() to always zero pages when the
KASAN_VMALLOC_INIT flag is set.
3. Add WARN_ON() asserts to check that KASAN_VMALLOC_INIT cannot be set
in other early return paths of __kasan_unpoison_vmalloc().
Also clean up the comment in __kasan_unpoison_vmalloc.
Link: https://lkml.kernel.org/r/4bc503537efdc539ffc3f461c1b70162eea31cf6.1654798516.git.andreyknvl@google.com
Fixes:
23689e91fb22 ("kasan, vmalloc: add vmalloc tagging for HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Thu, 9 Jun 2022 18:18:46 +0000 (20:18 +0200)]
mm: introduce clear_highpage_kasan_tagged
Add a clear_highpage_kasan_tagged() helper that does clear_highpage() on a
page potentially tagged by KASAN.
This helper is used by the following patch.
Link: https://lkml.kernel.org/r/4471979b46b2c487787ddcd08b9dc5fedd1b6ffd.1654798516.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Thu, 9 Jun 2022 18:18:45 +0000 (20:18 +0200)]
mm: rename kernel_init_free_pages to kernel_init_pages
Rename kernel_init_free_pages() to kernel_init_pages(). This function is
not only used for free pages but also for pages that were just allocated.
Link: https://lkml.kernel.org/r/1ecaffc0a9c1404d4d7cf52efe0b2dc8a0c681d8.1654798516.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 6 Jun 2022 18:23:10 +0000 (18:23 +0000)]
mm/damon/reclaim: add 'damon_reclaim_' prefix to 'enabled_store()'
This commit adds 'damon_reclaim_' prefix to 'enabled_store()', so that we
can distinguish it easily from the stack trace using 'faddr2line.sh' like
tools.
Link: https://lkml.kernel.org/r/20220606182310.48781-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 6 Jun 2022 18:23:09 +0000 (18:23 +0000)]
mm/damon/reclaim: make 'enabled' checking timer simpler
DAMON_RECLAIM's 'enabled' parameter store callback ('enabled_store()')
schedules the parameter check timer ('damon_reclaim_timer') if the
parameter is set as 'Y'. Then, the timer schedules itself to check if
user has set the parameter as 'N'. It's unnecessarily complex.
This commit makes it simpler by making the parameter store callback to
schedule the timer regardless of the parameter value and disabling the
timer's self scheduling.
Link: https://lkml.kernel.org/r/20220606182310.48781-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 6 Jun 2022 18:23:08 +0000 (18:23 +0000)]
mm/damon/sysfs: deduplicate inputs applying
DAMON sysfs interface's DAMON context building and its online parameter
update have duplicated code. This commit removes the duplicate.
Link: https://lkml.kernel.org/r/20220606182310.48781-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 6 Jun 2022 18:23:07 +0000 (18:23 +0000)]
mm/damon/reclaim: deduplicate 'commit_inputs' handling
DAMON_RECLAIM's handling of 'commit_inputs' parameter is duplicated in
'after_aggregation()' and 'after_wmarks_check()' callbacks. This commit
deduplicates the code for better maintenance.
Link: https://lkml.kernel.org/r/20220606182310.48781-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 6 Jun 2022 18:23:06 +0000 (18:23 +0000)]
mm/damon/{dbgfs,sysfs}: move target_has_pid() from dbgfs to damon.h
The function for knowing if given monitoring context's targets will have
pid or not is defined and used in dbgfs only. However, the logic is also
needed for sysfs. This commit moves the code to damon.h and makes both
dbgfs and sysfs to use it.
Link: https://lkml.kernel.org/r/20220606182310.48781-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 6 Jun 2022 18:23:05 +0000 (18:23 +0000)]
Docs/admin-guide/damon/reclaim: remove a paragraph that been obsolete due to online tuning support
Patch series "mm/damon: trivial cleanups".
This patchset contains trivial cleansups for DAMON code.
This patch (of 6):
Commit
81a84182c343 ("Docs/admin-guide/mm/damon/reclaim: document
'commit_inputs' parameter") has documented the 'commit_inputs' parameter
which allows online parameter update, but it didn't remove a paragraph
saying the online parameter update is impossible. This commit removes the
obsolete paragraph.
Link: https://lkml.kernel.org/r/20220606182310.48781-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220606182310.48781-2-sj@kernel.org
Fixes:
81a84182c343 ("Docs/admin-guide/mm/damon/reclaim: document 'commit_inputs' parameter")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Mon, 30 May 2022 11:30:16 +0000 (19:30 +0800)]
mm/migration: fix potential pte_unmap on an not mapped pte
__migration_entry_wait and migration_entry_wait_on_locked assume pte is
always mapped from caller. But this is not the case when it's called from
migration_entry_wait_huge and follow_huge_pmd. Add a hugetlbfs variant
that calls hugetlb_migration_entry_wait(ptep == NULL) to fix this issue.
Link: https://lkml.kernel.org/r/20220530113016.16663-5-linmiaohe@huawei.com
Fixes:
30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Mon, 30 May 2022 11:30:15 +0000 (19:30 +0800)]
mm/migration: return errno when isolate_huge_page failed
We might fail to isolate huge page due to e.g. the page is under
migration which cleared HPageMigratable. We should return errno in this
case rather than always return 1 which could confuse the user, i.e. the
caller might think all of the memory is migrated while the hugetlb page is
left behind. We make the prototype of isolate_huge_page consistent with
isolate_lru_page as suggested by Huang Ying and rename isolate_huge_page
to isolate_hugetlb as suggested by Muchun to improve the readability.
Link: https://lkml.kernel.org/r/20220530113016.16663-4-linmiaohe@huawei.com
Fixes:
e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Huang Ying <ying.huang@intel.com>
Reported-by: kernel test robot <lkp@intel.com> (build error)
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miaohe Lin [Mon, 30 May 2022 11:30:14 +0000 (19:30 +0800)]
mm/migration: remove unneeded lock page and PageMovable check
When non-lru movable page was freed from under us, __ClearPageMovable must
have been done. So we can remove unneeded lock page and PageMovable check
here. Also free_pages_prepare() will clear PG_isolated for us, so we can
further remove ClearPageIsolated as suggested by David.
Link: https://lkml.kernel.org/r/20220530113016.16663-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yang Shi [Fri, 13 May 2022 19:17:05 +0000 (12:17 -0700)]
mm/page_vma_mapped.c: check possible huge PMD map with transhuge_vma_suitable()
IIUC page_vma_mapped_walk() checks if the vma is possibly huge PMD mapped
with transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
Actually pvmw->nr_pages is returned by compound_nr() or folio_nr_pages(),
so the page should be THP as long as "pvmw->nr_pages >= HPAGE_PMD_NR".
And it is guaranteed THP is allocated for valid VMA in the first place.
But it may be not PMD mapped if the VMA is file VMA and it is not properly
aligned. The transhuge_vma_suitable() is used to do such check, so
replace transparent_hugepage_active() to it, which is too heavy and
overkilling.
Link: https://lkml.kernel.org/r/20220513191705.457775-1-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yang Shi [Mon, 4 Jul 2022 01:08:36 +0000 (18:08 -0700)]
mm: rmap: use the correct parameter name for DEFINE_PAGE_VMA_WALK
The parameter used by DEFINE_PAGE_VMA_WALK is _page not page, fix the
parameter name. It didn't cause any build error, it is probably because
the only caller is write_protect_page() from ksm.c, which pass in page.
Link: https://lkml.kernel.org/r/20220512174551.81279-1-shy828301@gmail.com
Fixes:
2aff7a4755be ("mm: Convert page_vma_mapped_walk to work on PFNs")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Rapoport [Mon, 27 Jun 2022 06:00:26 +0000 (09:00 +0300)]
docs: rename Documentation/vm to Documentation/mm
so it will be consistent with code mm directory and with
Documentation/admin-guide/mm and won't be confused with virtual machines.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Wu XiangCheng <bobwxc@email.cn>
akpm [Mon, 27 Jun 2022 17:31:34 +0000 (10:31 -0700)]
Merge branch 'master' into mm-stable
Linus Torvalds [Sun, 26 Jun 2022 21:22:10 +0000 (14:22 -0700)]
Linux 5.19-rc4
Linus Torvalds [Sun, 26 Jun 2022 21:12:56 +0000 (14:12 -0700)]
Merge tag 'soc-fixes-5.19' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"A number of fixes have accumulated, but they are largely for harmless
issues:
- Several OF node leak fixes
- A fix to the Exynos7885 UART clock description
- DTS fixes to prevent boot failures on TI AM64 and J721s2
- Bus probe error handling fixes for Baikal-T1
- A fixup to the way STM32 SoCs use separate dts files for different
firmware stacks
- Multiple code fixes for Arm SCMI firmware, all dealing with
robustness of the implementation
- Multiple NXP i.MX devicetree fixes, addressing incorrect data in DT
nodes
- Three updates to the MAINTAINERS file, including Florian Fainelli
taking over BCM283x/BCM2711 (Raspberry Pi) from Nicolas Saenz
Julienne"
* tag 'soc-fixes-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
ARM: dts: aspeed: nuvia: rename vendor nuvia to qcom
arm: mach-spear: Add missing of_node_put() in time.c
ARM: cns3xxx: Fix refcount leak in cns3xxx_init
MAINTAINERS: Update email address
arm64: dts: ti: k3-am64-main: Remove support for HS400 speed mode
arm64: dts: ti: k3-j721s2: Fix overlapping GICD memory region
ARM: dts: bcm2711-rpi-400: Fix GPIO line names
bus: bt1-axi: Don't print error on -EPROBE_DEFER
bus: bt1-apb: Don't print error on -EPROBE_DEFER
ARM: Fix refcount leak in axxia_boot_secondary
ARM: dts: stm32: move SCMI related nodes in a dedicated file for stm32mp15
soc: imx: imx8m-blk-ctrl: fix display clock for LCDIF2 power domain
ARM: dts: imx6qdl-colibri: Fix capacitive touch reset polarity
ARM: dts: imx6qdl: correct PU regulator ramp delay
firmware: arm_scmi: Fix incorrect error propagation in scmi_voltage_descriptors_get
firmware: arm_scmi: Avoid using extended string-buffers sizes if not necessary
firmware: arm_scmi: Fix SENSOR_AXIS_NAME_GET behaviour when unsupported
ARM: dts: imx7: Move hsic_phy power domain to HSIC PHY node
soc: bcm: brcmstb: pm: pm-arm: Fix refcount leak in brcmstb_pm_probe
MAINTAINERS: Update BCM2711/BCM2835 maintainer
...
Linus Torvalds [Sun, 26 Jun 2022 21:00:55 +0000 (14:00 -0700)]
Merge tag 'mm-hotfixes-stable-2022-06-26' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"Minor things, mainly - mailmap updates, MAINTAINERS updates, etc.
Fixes for this merge window:
- fix for a damon boot hang, from SeongJae
- fix for a kfence warning splat, from Jason Donenfeld
- fix for zero-pfn pinning, from Alex Williamson
- fix for fallocate hole punch clearing, from Mike Kravetz
Fixes for previous releases:
- fix for a performance regression, from Marcelo
- fix for a hwpoisining BUG from zhenwei pi"
* tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Christian Marangi
mm/memory-failure: disable unpoison once hw error happens
hugetlbfs: zero partial pages during fallocate hole punch
mm: memcontrol: reference to tools/cgroup/memcg_slabinfo.py
mm: re-allow pinning of zero pfns
mm/kfence: select random number before taking raw lock
MAINTAINERS: add maillist information for LoongArch
MAINTAINERS: update MM tree references
MAINTAINERS: update Abel Vesa's email
MAINTAINERS: add MEMORY HOT(UN)PLUG section and add David as reviewer
MAINTAINERS: add Miaohe Lin as a memory-failure reviewer
mailmap: add alias for jarkko@profian.com
mm/damon/reclaim: schedule 'damon_reclaim_timer' only after 'system_wq' is initialized
kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal
mm: lru_cache_disable: use synchronize_rcu_expedited
mm/page_isolation.c: fix one kernel-doc comment
Linus Torvalds [Sun, 26 Jun 2022 19:12:25 +0000 (12:12 -0700)]
Merge tag 'perf-tools-fixes-for-v5.19-2022-06-26' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Enable ignore_missing_thread in 'perf stat', enabling counting with
'--pid' when threads disappear during counting session setup
- Adjust output data offset for backward compatibility in 'perf inject'
- Fix missing free in copy_kcore_dir() in 'perf inject'
- Fix caching files with a wrong build ID
- Sync drm, cpufeatures, vhost and svn headers with the kernel
* tag 'perf-tools-fixes-for-v5.19-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
tools headers UAPI: Synch KVM's svm.h header with the kernel
tools include UAPI: Sync linux/vhost.h with the kernel sources
perf stat: Enable ignore_missing_thread
perf inject: Adjust output data offset for backward compatibility
perf trace beauty: Fix generation of errno id->str table on ALT Linux
perf build-id: Fix caching files with a wrong build ID
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
perf inject: Fix missing free in copy_kcore_dir()
Linus Torvalds [Sun, 26 Jun 2022 17:11:36 +0000 (10:11 -0700)]
Merge tag 'for-5.19-rc3-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- zoned relocation fixes:
- fix critical section end for extent writeback, this could lead
to out of order write
- prevent writing to previous data relocation block group if space
gets low
- reflink fixes:
- fix race between reflinking and ordered extent completion
- proper error handling when block reserve migration fails
- add missing inode iversion/mtime/ctime updates on each iteration
when replacing extents
- fix deadlock when running fsync/fiemap/commit at the same time
- fix false-positive KCSAN report regarding pid tracking for read locks
and data race
- minor documentation update and link to new site
* tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Documentation: update btrfs list of features and link to readthedocs.io
btrfs: fix deadlock with fsync+fiemap+transaction commit
btrfs: don't set lock_owner when locking extent buffer for reading
btrfs: zoned: fix critical section of relocation inode writeback
btrfs: zoned: prevent allocation from previous data relocation BG
btrfs: do not BUG_ON() on failure to migrate space when replacing extents
btrfs: add missing inode updates on each iteration when replacing extents
btrfs: fix race between reflinking and ordered extent completion
Linus Torvalds [Sun, 26 Jun 2022 17:01:40 +0000 (10:01 -0700)]
Merge tag 'dma-mapping-5.19-2022-06-26' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- pass the correct size to dma_set_encrypted() when freeing memory
(Dexuan Cui)
* tag 'dma-mapping-5.19-2022-06-26' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: use the correct size for dma_set_encrypted()
Linus Torvalds [Sun, 26 Jun 2022 16:13:51 +0000 (09:13 -0700)]
Merge tag 'for-5.19/fbdev-2' of git://git./linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
"Two bug fixes for the pxa3xx and intelfb drivers:
- pxa3xx-gcu: Fix integer overflow in pxa3xx_gcu_write
- intelfb: Initialize value of stolen size
The other changes are small cleanups, simplifications and
documentation updates to the cirrusfb, skeletonfb, omapfb,
intelfb, au1100fb and simplefb drivers"
* tag 'for-5.19/fbdev-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
video: fbdev: omap: Remove duplicate 'the' in comment
video: fbdev: omapfb: Align '*' in comment
video: fbdev: simplefb: Check before clk_put() not needed
video: fbdev: au1100fb: Drop unnecessary NULL ptr check
video: fbdev: pxa3xx-gcu: Fix integer overflow in pxa3xx_gcu_write
video: fbdev: skeletonfb: Convert to generic power management
video: fbdev: cirrusfb: Remove useless reference to PCI power management
video: fbdev: intelfb: Initialize value of stolen size
video: fbdev: intelfb: Use aperture size from pci_resource_len
video: fbdev: skeletonfb: Fix syntax errors in comments
Linus Torvalds [Sun, 26 Jun 2022 16:08:30 +0000 (09:08 -0700)]
Merge tag 'for-5.19/parisc-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- enable ARCH_HAS_STRICT_MODULE_RWX to prevent a boot crash on c8000
machines
- flush all mappings of a shared anonymous page on PA8800/8900 machines
via flushing the whole data cache. This may slow down such machines
but makes sure that the cache is consistent
- Fix duplicate definition build error regarding fb_is_primary_device()
* tag 'for-5.19/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Enable ARCH_HAS_STRICT_MODULE_RWX
parisc: Fix flush_anon_page on PA8800/PA8900
parisc: align '*' in comment in math-emu code
parisc/stifb: Fix fb_is_primary_device() only available with CONFIG_FB_STI
Linus Torvalds [Sun, 26 Jun 2022 15:59:21 +0000 (08:59 -0700)]
Merge tag 'xtensa-
20220626' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:
- fix OF reference leaks in xtensa arch code
- replace '.bss' with '.section .bss' to fix entry.S build with old
assembler
* tag 'xtensa-
20220626' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: change '.bss' to '.section .bss'
xtensa: xtfpga: Fix refcount leak bug in setup
xtensa: Fix refcount leak bug in time.c
Linus Torvalds [Sun, 26 Jun 2022 15:53:20 +0000 (08:53 -0700)]
Merge tag 'powerpc-5.19-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- A fix for a CMA change that broke booting guests with > 2G RAM on
Power8 hosts.
- Fix the RTAS call filter to allow a special case that applications
rely on.
- A change to our execve path, to make the execve syscall exit
tracepoint work.
- Three fixes to wire up our various RNGs earlier in boot so they're
available for use in the initial seeding in random_init().
- A build fix for when KASAN is enabled along with
STRUCTLEAK_BYREF_ALL.
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Christophe Leroy, Jason
Donenfeld, Nathan Lynch, Naveen N. Rao, Sathvika Vasireddy, Sumit
Dubey2, Tyrel Datwyler, and Zi Yan.
* tag 'powerpc-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: wire up rng during setup_arch
powerpc/prom_init: Fix build failure with GCC_PLUGIN_STRUCTLEAK_BYREF_ALL and KASAN
powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address
powerpc: Enable execve syscall exit tracepoint
powerpc/pseries: wire up rng during setup_arch()
powerpc/microwatt: wire up rng during setup_arch()
powerpc/mm: Move CMA reservations after initmem_init()
Linus Torvalds [Sun, 26 Jun 2022 15:47:28 +0000 (08:47 -0700)]
Merge tag 'kbuild-fixes-v5.19-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix modpost to detect EXPORT_SYMBOL marked as __init or__exit
- Update the supported arch list in the LLVM document
- Avoid the second link of vmlinux for CONFIG_TRIM_UNUSED_KSYMS
- Avoid false __KSYM___this_module define in include/generated/autoksyms.h
* tag 'kbuild-fixes-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Ignore __this_module in gen_autoksyms.sh
kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS (2nd attempt)
Documentation/llvm: Update Supported Arch table
modpost: fix section mismatch check for exported init/exit sections
Linus Torvalds [Sun, 26 Jun 2022 15:41:04 +0000 (08:41 -0700)]
Merge tag 'exfat-for-5.19-rc4' of git://git./linux/kernel/git/linkinjeon/exfat
Pull exfat fix from Namjae Jeon:
- Use updated exfat_chain directly instead of snapshot values in
rename.
* tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: use updated exfat_chain directly during renaming
Linus Torvalds [Sun, 26 Jun 2022 15:34:52 +0000 (08:34 -0700)]
Merge tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
"Fixes addressing important multichannel, and reconnect issues.
Multichannel mounts when the server network interfaces changed, or ip
addresses changed, uncovered problems, especially in reconnect, but
the patches for this were held up until recently due to some lock
conflicts that are now addressed.
Included in this set of fixes:
- three fixes relating to multichannel reconnect, dynamically
adjusting the list of server interfaces to avoid problems during
reconnect
- a lock conflict fix related to the above
- two important fixes for negotiate on secondary channels (null
netname can unintentionally cause multichannel to be disabled to
some servers)
- a reconnect fix (reporting incorrect IP address in some cases)"
* tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update cifs_ses::ip_addr after failover
cifs: avoid deadlocks while updating iface
cifs: periodically query network interfaces from server
cifs: during reconnect, update interface if necessary
cifs: change iface_list from array to sorted linked list
smb3: use netname when available on secondary channels
smb3: fix empty netname context on secondary channels
Arnaldo Carvalho de Melo [Mon, 21 Dec 2020 23:04:45 +0000 (20:04 -0300)]
tools headers UAPI: Synch KVM's svm.h header with the kernel
To pick up the changes from:
d5af44dde5461d12 ("x86/sev: Provide support for SNP guest request NAEs")
0afb6b660a6b58cb ("x86/sev: Use SEV-SNP AP creation to start secondary CPUs")
dc3f3d2474b80eae ("x86/mm: Validate memory when changing the C-bit")
cbd3d4f7c4e5a93e ("x86/sev: Check SEV-SNP features support")
That gets these new SVM exit reasons:
+ { SVM_VMGEXIT_PSC, "vmgexit_page_state_change" }, \
+ { SVM_VMGEXIT_GUEST_REQUEST, "vmgexit_guest_request" }, \
+ { SVM_VMGEXIT_EXT_GUEST_REQUEST, "vmgexit_ext_guest_request" }, \
+ { SVM_VMGEXIT_AP_CREATION, "vmgexit_ap_creation" }, \
+ { SVM_VMGEXIT_HV_FEATURES, "vmgexit_hypervisor_feature" }, \
Addressing this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
This causes these changes:
CC /tmp/build/perf-urgent/arch/x86/util/kvm-stat.o
LD /tmp/build/perf-urgent/arch/x86/util/perf-in.o
LD /tmp/build/perf-urgent/arch/x86/perf-in.o
LD /tmp/build/perf-urgent/arch/perf-in.o
LD /tmp/build/perf-urgent/perf-in.o
LINK /tmp/build/perf-urgent/perf
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 14 Apr 2020 12:12:55 +0000 (09:12 -0300)]
tools include UAPI: Sync linux/vhost.h with the kernel sources
To get the changes in:
84d7c8fd3aade2fe ("vhost-vdpa: introduce uAPI to set group ASID")
2d1fcb7758e49fd9 ("vhost-vdpa: uAPI to get virtqueue group id")
a0c95f201170bd55 ("vhost-vdpa: introduce uAPI to get the number of address spaces")
3ace88bd37436abc ("vhost-vdpa: introduce uAPI to get the number of virtqueue groups")
175d493c3c3e09a3 ("vhost: move the backend feature bits to vhost_types.h")
Silencing this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
To pick up these changes and support them:
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
$ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
$ diff -u before after
--- before 2022-06-26 12:04:35.
982003781 -0300
+++ after 2022-06-26 12:04:43.
819972476 -0300
@@ -28,6 +28,7 @@
[0x74] = "VDPA_SET_CONFIG",
[0x75] = "VDPA_SET_VRING_ENABLE",
[0x77] = "VDPA_SET_CONFIG_CALL",
+ [0x7C] = "VDPA_SET_GROUP_ASID",
};
static const char *vhost_virtio_ioctl_read_cmds[] = {
[0x00] = "GET_FEATURES",
@@ -39,5 +40,8 @@
[0x76] = "VDPA_GET_VRING_NUM",
[0x78] = "VDPA_GET_IOVA_RANGE",
[0x79] = "VDPA_GET_CONFIG_SIZE",
+ [0x7A] = "VDPA_GET_AS_NUM",
+ [0x7B] = "VDPA_GET_VRING_GROUP",
[0x80] = "VDPA_GET_VQS_COUNT",
+ [0x81] = "VDPA_GET_GROUP_NUM",
};
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Gautam Dawar <gautam.dawar@xilinx.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Yrh3xMYbfeAD0MFL@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Gang Li [Wed, 22 Jun 2022 03:00:37 +0000 (11:00 +0800)]
perf stat: Enable ignore_missing_thread
perf already support ignore_missing_thread for -p, but not yet
applied to `perf stat -p <pid>`. This patch enables ignore_missing_thread
for `perf stat -p <pid>`.
Committer notes:
And here is a refresher about the 'ignore_missing_thread' knob, from a
previous patch using it:
ca8000684ec4e66f ("perf evsel: Enable ignore_missing_thread for pid option")
---
While monitoring a multithread process with pid option, perf sometimes
may return sys_perf_event_open failure with 3(No such process) if any of
the process's threads die before we open the event. However, we want
perf continue monitoring the remaining threads and do not exit with
error.
---
Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220622030037.15005-1-ligang.bdlg@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Raul Silvera [Tue, 21 Jun 2022 15:27:25 +0000 (15:27 +0000)]
perf inject: Adjust output data offset for backward compatibility
When 'perf inject' creates a new file, it reuses the data offset from
the input file. If there has been a change on the size of the header, as
happened in v5.12 -> v5.13, the new offsets will be wrong, resulting in
a corrupted output file.
This change adds the function perf_session__data_offset to compute the
data offset based on the current header size, and uses that instead of
the offset from the original input file.
Signed-off-by: Raul Silvera <rsilvera@google.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220621152725.2668041-1-rsilvera@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 21 Jun 2022 15:34:37 +0000 (12:34 -0300)]
perf trace beauty: Fix generation of errno id->str table on ALT Linux
For some reason using:
cat <<EoFuncBegin
static const char *errno_to_name__$arch(int err)
{
switch (err) {
EoFuncBegin
In tools/perf/trace/beauty/arch_errno_names.sh isn't working on ALT
Linux sisyphus (development version), which could be some distro
specific glitch, so just get this done in an alternative way that works
everywhere while giving notice to the people working on that distro to
try and figure our what really took place.
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Tue, 21 Jun 2022 12:51:44 +0000 (15:51 +0300)]
perf build-id: Fix caching files with a wrong build ID
Build ID events associate a file name with a build ID. However, when
using perf inject, there is no guarantee that the file on the current
machine at the current time has that build ID. Fix by comparing the
build IDs and skip adding to the cache if they are different.
Example:
$ echo "int main() {return 0;}" > prog.c
$ gcc -o prog prog.c
$ perf record --buildid-all ./prog
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data ]
$ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; }
$ file-buildid prog
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ file-buildid ~/.debug/$(pwd)/prog/
444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
444ad9be165d8058a48ce2ffb4e9f55854a3293e
$ echo "int main() {return 1;}" > prog.c
$ gcc -o prog prog.c
$ file-buildid prog
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
Before:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/
444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
885524d5aaa24008a3e2b06caa3ea95d013c0fc5
$
After:
$ perf buildid-cache --purge $(pwd)/prog
$ perf inject -i perf.data -o junk
$ file-buildid ~/.debug/$(pwd)/prog/
444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
$
Fixes:
454c407ec17a0c63 ("perf: add perf-inject builtin")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20220621125144.5623-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 1 Jul 2021 16:39:15 +0000 (13:39 -0300)]
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from:
d6d0c7f681fda1d0 ("x86/cpufeatures: Add PerfMonV2 feature bit")
296d5a17e793956f ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
f30903394eb62316 ("x86/cpufeatures: Add virtual TSC_AUX feature bit")
8ad7e8f696951f19 ("x86/fpu/xsave: Support XSAVEC in the kernel")
59bd54a84d15e933 ("x86/tdx: Detect running as a TDX guest in early boot")
a77d41ac3a0f41c8 ("x86/cpufeatures: Add AMD Fam19h Branch Sampling feature")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YrDkgmwhLv+nKeOo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 13 Nov 2021 14:08:31 +0000 (11:08 -0300)]
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
To pick up the changes in:
ecf8eca51f33dbfd ("drm/i915/xehp: Add compute engine ABI")
991b4de3275728fd ("drm/i915/uapi: Add kerneldoc for engine class enum")
c94fde8f516610b0 ("drm/i915/uapi: Add DRM_I915_QUERY_GEOMETRY_SUBSLICES")
1c671ad753dbbf5f ("drm/i915/doc: Link query items to their uapi structs")
a2e5402691e23269 ("drm/i915/doc: Convert perf UAPI comments to kerneldoc")
462ac1cdf4d7acf1 ("drm/i915/doc: Convert drm_i915_query_topology_info comment to kerneldoc")
034d47b25b2ce627 ("drm/i915/uapi: Document DRM_I915_QUERY_HWCONFIG_BLOB")
78e1fb3112c0ac44 ("drm/i915/uapi: Add query for hwconfig blob")
That don't add any new ioctl, so no changes in tooling.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: John Harrison <John.C.Harrison@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://lore.kernel.org/lkml/YrDi4ALYjv9Mdocq@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 Jun 2022 10:39:04 +0000 (13:39 +0300)]
perf inject: Fix missing free in copy_kcore_dir()
Free string allocated by asprintf().
Fixes:
d8fc08550929bb84 ("perf inject: Keep a copy of kcore_dir")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220620103904.7960-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Helge Deller [Sun, 26 Jun 2022 09:50:43 +0000 (11:50 +0200)]
parisc: Enable ARCH_HAS_STRICT_MODULE_RWX
Fix a boot crash on a c8000 machine as reported by Dave. Basically it changes
patch_map() to return an alias mapping to the to-be-patched code in order to
prevent writing to write-protected memory.
Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v5.2+
Link: https://lore.kernel.org/all/e8ec39e8-25f8-e6b4-b7ed-4cb23efc756e@bell.net/
John David Anglin [Sat, 18 Jun 2022 15:14:34 +0000 (15:14 +0000)]
parisc: Fix flush_anon_page on PA8800/PA8900
Anonymous pages are allocated with the shared mappings colouring,
SHM_COLOUR. Since the alias boundary on machines with PA8800 and
PA8900 processors is unknown, flush_user_cache_page() might not
flush all mappings of a shared anonymous page. Flushing the whole
data cache flushes all mappings.
This won't fix all coherency issues with shared mappings but it
seems to work well in practice. I haven't seen any random memory
faults in almost a month on a rp3440 running as a debian buildd
machine.
There is a small preformance hit.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v5.18+
Jiang Jian [Tue, 21 Jun 2022 06:38:23 +0000 (14:38 +0800)]
parisc: align '*' in comment in math-emu code
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Sami Tolvanen [Thu, 16 Jun 2022 19:57:59 +0000 (19:57 +0000)]
kbuild: Ignore __this_module in gen_autoksyms.sh
Module object files can contain an undefined reference to __this_module,
which isn't resolved until we link the final .ko. The kernel doesn't
export this symbol, so ignore it in gen_autoksyms.sh.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Steve Muckle <smuckle@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Ramji Jiyani <ramjiyani@google.com>
Masahiro Yamada [Thu, 23 Jun 2022 19:11:47 +0000 (04:11 +0900)]
kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS (2nd attempt)
If CONFIG_TRIM_UNUSED_KSYMS is enabled and the kernel is built from
a pristine state, the vmlinux is linked twice.
Commit
3fdc7d3fe4c0 ("kbuild: link vmlinux only once for
CONFIG_TRIM_UNUSED_KSYMS") explains why this happens, but it did not fix
the issue at all.
Now I realized I had applied a wrong patch.
In v1 patch [1], the autoksyms_recursive target correctly recurses to
"$(MAKE) -f $(srctree)/Makefile autoksyms_recursive".
In v2 patch [2], I accidentally dropped the diff line, and it recurses to
"$(MAKE) -f $(srctree)/Makefile vmlinux".
Restore the code I intended in v1.
[1]: https://lore.kernel.org/linux-kbuild/
1521045861-22418-8-git-send-email-yamada.masahiro@socionext.com/
[2]: https://lore.kernel.org/linux-kbuild/
1521166725-24157-8-git-send-email-yamada.masahiro@socionext.com/
Fixes:
3fdc7d3fe4c0 ("kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Linus Torvalds [Sat, 25 Jun 2022 17:07:36 +0000 (10:07 -0700)]
Merge tag 'char-misc-5.19-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull IIO driver fixes from Greg KH:
"Here are a set of IIO driver fixes for 5.19-rc4. Jonathan said it best
in his pull request to me, so I'll just quote it here below, as that's
the only changes we have right now for the char-misc driver tree:
testing:
- Fix a missing MODULE_LICENSE() warning by restricting possible
build configs.
Various drivers:
- Fix ordering of iio_get_trigger() being called before
iio_trigger_register()
adi,admv1014:
- Fix dubious x & !y warning.
adi,axi-adc:
- Fix missing of_node_put() in error and normal paths.
aspeed,adc:
- Add missing of_node_put()
fsl,mma8452:
- Fix broken probing from device tree.
- Drop check on return value of i2c write to device to cause reset
as ACK will be missing (device reset before sending it).
fsl,vf610:
- Fix documentation of in_conversion_mode ABI.
iio-trig-sysfs:
- Ensure irq work has finished before freeing the trigger.
invensense,mpu3050:
- Disable regulators in error path.
invensense,icm42600:
- Fix collision of enum value of 0 with error path where 0 is no
match.
renesas,rzg2l_Adc:
- Add missing fwnode_handle_put() in error path.
rescale:
- Fix a boolean logic bug for detection of raw + scale affecting
an obscure corner case.
semtech,sx9324:
- Check return value of read of pin_defs
st,stm32-adc:
- Fix interaction across ADC instances for some supported devices.
- Drop false spurious IRQ messages.
- Fix calibration value handling. If we can't calibrate don't
expose the vref_int channel.
- Fix maximum clock rate for stm32pm15x
ti,ads131e08:
- Add missing fwnode_handle_put() in error paths.
xilinx,ams:
- Fix variable checked for error from platform_get_irq()
x-powers,axp288:
- Overide TS_PIN bias current for boards where it is not correctly
initialized.
yamaha,yas530:
- Fix inverted check on calibration data being all zeros.
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (26 commits)
iio:proximity:sx9324: Check ret value of device_property_read_u32_array()
iio: accel: mma8452: ignore the return value of reset operation
iio: adc: stm32: fix maximum clock rate for stm32mp15x
iio: adc: stm32: fix vrefint wrong calibration value handling
iio: imu: inv_icm42600: Fix broken icm42600 (chip id 0 value)
iio: adc: vf610: fix conversion mode sysfs node name
iio: adc: adi-axi-adc: Fix refcount leak in adi_axi_adc_attach_client
iio: test: fix missing MODULE_LICENSE for IIO_RESCALE=m
iio:humidity:hts221: rearrange iio trigger get and register
iio:chemical:ccs811: rearrange iio trigger get and register
iio:accel:mxc4005: rearrange iio trigger get and register
iio:accel:kxcjk-1013: rearrange iio trigger get and register
iio:accel:bma180: rearrange iio trigger get and register
iio: afe: rescale: Fix boolean logic bug
iio: adc: aspeed: Fix refcount leak in aspeed_adc_set_trim_data
iio: adc: stm32: Fix IRQs on STM32F4 by removing custom spurious IRQs message
iio: adc: stm32: Fix ADCs iteration in irq handler
iio: adc: ti-ads131e08: add missing fwnode_handle_put() in ads131e08_alloc_channels()
iio: adc: rzg2l_adc: add missing fwnode_handle_put() in rzg2l_adc_parse_properties()
iio: trigger: sysfs: fix use-after-free on remove
...
Linus Torvalds [Sat, 25 Jun 2022 17:02:05 +0000 (10:02 -0700)]
Merge tag 'usb-5.19-rc4' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 5.19-rc4
for a few small reported issues. They include:
- new usb-serial driver ids
- MAINTAINERS file update to properly catch the USB dts files
- dt-bindings fixes for reported build warnings
- xhci driver fixes for reported problems
- typec Kconfig dependancy fix
- raw_gadget fuzzing fixes found by syzbot
- chipidea driver bugfix
- usb gadget uvc bugfix
All of these have been in linux-next with no reported issues"
* tag 'usb-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: chipidea: udc: check request status before setting device address
USB: gadget: Fix double-free bug in raw_gadget driver
xhci-pci: Allow host runtime PM as default for Intel Meteor Lake xHCI
xhci-pci: Allow host runtime PM as default for Intel Raptor Lake xHCI
xhci: turn off port power in shutdown
xhci: Keep interrupt disabled in initialization until host is running.
USB: serial: option: add Quectel RM500K module support
USB: serial: option: add Quectel EM05-G modem
USB: serial: pl2303: add support for more HXN (G) types
usb: typec: wcove: Drop wrong dependency to INTEL_SOC_PMIC
usb: gadget: uvc: fix list double add in uvcg_video_pump
dt-bindings: usb: ehci: Increase the number of PHYs
dt-bindings: usb: ohci: Increase the number of PHYs
usb: gadget: Fix non-unique driver names in raw-gadget driver
MAINTAINERS: add include/dt-bindings/usb to USB SUBSYSTEM
USB: serial: option: add Telit LE910Cx 0x1250 composition
Linus Torvalds [Sat, 25 Jun 2022 16:24:59 +0000 (09:24 -0700)]
Merge tag 'loongarch-fixes-5.19-3' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Some bug fixes and a trivial cleanup"
* tag 'loongarch-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Make compute_return_era() return void
LoongArch: Fix wrong fpu version
LoongArch: Fix EENTRY/MERRENTRY setting in setup_tlb_handler()
LoongArch: Fix sleeping in atomic context in setup_tlb_handler()
LoongArch: Fix the _stext symbol address
LoongArch: Fix the !THP build
Linus Torvalds [Sat, 25 Jun 2022 16:19:51 +0000 (09:19 -0700)]
Merge tag 'f2fs-for-5.19-rc4' of git://git./linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"Some urgent fixes to avoid generating corrupted inodes caused by
compressed and inline_data files.
In addition, avoid a wrong error report which prevents a roll-forward
recovery"
* tag 'f2fs-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: do not count ENOENT for error case
f2fs: fix iostat related lock protection
f2fs: attach inline_data after setting compression
Tiezhu Yang [Sat, 18 Jun 2022 08:39:11 +0000 (16:39 +0800)]
LoongArch: Make compute_return_era() return void
compute_return_era() always returns 0, make it return void,
and then no need to check its return value for its callers.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Sat, 18 Jun 2022 04:50:31 +0000 (12:50 +0800)]
LoongArch: Fix wrong fpu version
According to the configuration information accessible by the CPUCFG
instruction in LoongArch Reference Manual [1], FP_ver is stored in
bit [5: 3] of CPUCFG2, the current code to get fpu version is wrong,
use CPUCFG2_FPVERS to fix it.
[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html
Fixes:
628c3bb40e9a ("LoongArch: Add boot and setup routines")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Thu, 23 Jun 2022 08:11:54 +0000 (16:11 +0800)]
LoongArch: Fix EENTRY/MERRENTRY setting in setup_tlb_handler()
setup_tlb_handler() is expected to set per-cpu exception handlers, but
it only set the TLBRENTRY successfully because of copy & paste errors,
so fix it.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Thu, 23 Jun 2022 01:55:42 +0000 (09:55 +0800)]
LoongArch: Fix sleeping in atomic context in setup_tlb_handler()
Since setup_tlb_handler() is executed in atomic context, we should use
GFP_ATOMIC instead of GFP_KERNEL to alloc pages. Otherwise we will get
a "sleeping in atomic context" error:
[ 0.013118] BUG: sleeping function called from invalid context at mm/page_alloc.c:5158
[ 0.013126] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[ 0.013131] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19-rc3+ #1008
1a223086d14d07967cc427f15d52139422271360
[ 0.013136] Hardware name: Loongson Loongson-3A5000-7A1000-1w-V0.1-CRB/Loongson-LS3A5000-7A1000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V2.0.04082-beta7 04/27
[ 0.013140] Stack :
90000000015fc990 9000000100493c18 9000000000df3370 9000000100490000
[ 0.013151]
9000000100493b50 0000000000000000 9000000100493b58 9000000001417ef0
[ 0.013160]
900000000199e54e 0000000000000040 9000000100493c18 90000000015f7a98
[ 0.013168]
ffffffffffffffff 6de72f8b42179d1e 9000000100403b80 90000000015f7890
[ 0.013176]
0000000000000001 00000000fffff175 9000000000eb9860 9000000001530b4b
[ 0.013184]
9000000000e99e60 0000000000000013 0000000006ecc000 0000000000000001
[ 0.013193]
90000000015f7a98 9000000001417ef0 0000000000000004 0000000000000000
[ 0.013201]
0000000000000cc0 0000000000000000 0000000000000001 90000000015fc990
[ 0.013209]
9000000000217e74 9000000001603b6b 9000000000208640 0000000000000000
[ 0.013217]
00000000000000b0 0000000000000004 0000000000000000 0000000000070000
[ 0.013225] ...
[ 0.013229] Call Trace:
[ 0.013230] [<
9000000000208640>] show_stack+0x4c/0x14c
[ 0.013240] [<
9000000000df3370>] dump_stack_lvl+0x70/0xac
[ 0.013246] [<
9000000000270c8c>] ___might_sleep+0x104/0x124
[ 0.013253] [<
9000000000477e84>] __alloc_pages+0x240/0x464
[ 0.013260] [<
9000000000214214>] setup_tlb_handler+0x104/0x1e8
[ 0.013265] [<
9000000000214324>] tlb_init+0x2c/0x3c
[ 0.013270] [<
9000000000208b74>] per_cpu_trap_init+0xec/0x108
[ 0.013275] [<
9000000000202850>] cpu_probe+0x400/0x8a4
[ 0.013279] [<
900000000020d160>] start_secondary+0x5c/0x3d4
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Sat, 25 Jun 2022 08:55:41 +0000 (16:55 +0800)]
LoongArch: Fix the _stext symbol address
_stext means the start of .text section (see __is_kernel_text()), but we
put its definition in .ref.text by mistake. Fix it by defining it in the
vmlinux.lds.S.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Wed, 22 Jun 2022 13:56:16 +0000 (21:56 +0800)]
LoongArch: Fix the !THP build
Fix the !THP build by making pmd_pfn() available in all configurations.
Because pmd_pfn() is used in mm/page_vma_mapped.c whether or not THP is
configured.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Linus Torvalds [Sat, 25 Jun 2022 00:01:31 +0000 (17:01 -0700)]
Merge tag 'gpio-fixes-for-v5.19-rc4' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- make the irqchip immutable in gpio-realtek-otto
- fix error code propagation in gpio-winbond
- fix device removing in gpio-grgpio
- fix a typo in gpio-mxs which indicates the driver is for a different
model
- documentation fixes
- MAINTAINERS file updates
* tag 'gpio-fixes-for-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mxs: Fix header comment
gpio: Fix kernel-doc comments to nested union
gpio: grgpio: Fix device removing
gpio: winbond: Fix error code in winbond_gpio_get()
gpio: realtek-otto: Make the irqchip immutable
docs: driver-api: gpio: Fix filename mismatch
MAINTAINERS: add include/dt-bindings/gpio to GPIO SUBSYSTEM
Linus Torvalds [Fri, 24 Jun 2022 21:04:08 +0000 (14:04 -0700)]
Merge tag 'mtd/fixes-for-5.19-rc4' of git://git./linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel RAynal:
"NAND controller fix:
- gpmi: Fix busy timeout setting (wrong calculation)
NAND chip driver fix:
- Thoshiba: Revert the commit introducing support for a chip that
might have been counterfeit"
* tag 'mtd/fixes-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: gpmi: Fix setting busy timeout setting
Revert "mtd: rawnand: add support for Toshiba TC58NVG0S3HTA00 NAND flash"
Linus Torvalds [Fri, 24 Jun 2022 20:54:16 +0000 (13:54 -0700)]
Merge tag 'spi-fix-v5.19-rc3' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A bunch of driver specific fixes, plus a fix for spi-mem's status
polling for devices that use GPIO chip selects and a DT bindings
examples fix that helps with the validation work"
* tag 'spi-fix-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: rockchip: Unmask IRQ at the final to avoid preemption
spi: dt-bindings: Fix unevaluatedProperties warnings in examples
spi: spi-mem: Fix spi_mem_poll_status()
spi: cadence: Detect transmit FIFO depth
spi: spi-cadence: Fix SPI CS gets toggling sporadically
Linus Torvalds [Fri, 24 Jun 2022 20:51:07 +0000 (13:51 -0700)]
Merge tag 'regulator-fix-v5.19-rc3' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One fix for an incorrect device description for MP5496"
* tag 'regulator-fix-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom_smd: correct MP5496 ranges
Linus Torvalds [Fri, 24 Jun 2022 20:49:13 +0000 (13:49 -0700)]
Merge tag 'regmap-fix-v5.19-rc3' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"Two sets of fixes - one for things that were missed with the support
for custom bulk I/O operations introduced in the last merge window,
and another for some long standing issues with regmap-irq which affect
a fairly small subset of devices"
* tag 'regmap-fix-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap-irq: Fix offset/index mismatch in read_sub_irq_data()
regmap-irq: Fix a bug in regmap_irq_enable() for type_in_mask chips
regmap: Wire up regmap_config provided bulk write in missed functions
regmap: Make regmap_noinc_read() return -ENOTSUPP if map->read isn't set
regmap: Re-introduce bulk read support check in regmap_bulk_read()
Linus Torvalds [Fri, 24 Jun 2022 20:41:47 +0000 (13:41 -0700)]
Merge tag 'iommu-fixes-v5.19-rc3' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Add a new IOMMU mailing list to the MAINTAINERS file to prepare for
the a list migration happening on July 5th. The old list needs to
stay in place until the switch happens to guarantee seemless
archiving of list email.
- Fix compatible device-tree string for rcar-gen4 in Renesas IOMMU
driver.
* tag 'iommu-fixes-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
MAINTAINERS: Add new IOMMU development mailing list
iommu/ipmmu-vmsa: Fix compatible for rcar-gen4
Linus Torvalds [Fri, 24 Jun 2022 19:34:13 +0000 (12:34 -0700)]
Merge tag 'riscv-for-linus-5.19-rc4' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
- fix the T-Head memory type errata workaround to avoid behavior
that is unsupported in the LLVM assembler
* tag 'riscv-for-linus-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix ALT_THEAD_PMA's asm parameters
Linus Torvalds [Fri, 24 Jun 2022 19:27:24 +0000 (12:27 -0700)]
Merge tag 's390-5.19-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix perf stat accounting for cryptography counters when multiple
events are installed concurrently.
- Prevent installation of unsupported perf events for cryptography
counters.
- Treat perf events cpum_cf/CPU_CYCLES/ and cpu_cf/INSTRUCTIONS/
identical to basic events CPU_CYCLES" and INSTRUCTIONS, since they
address the same hardware.
- Restore kcrash operation which was broken by commit
5d8de293c224
("vmcore: convert copy_oldmem_page() to take an iov_iter").
* tag 's390-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pai: Fix multiple concurrent event installation
s390/pai: Prevent invalid event number for pai_crypto PMU
s390/cpumf: Handle events cycles and instructions identical
s390/crash: make copy_oldmem_page() return number of bytes copied
s390/crash: add missing iterator advance in copy_oldmem_page()
Linus Torvalds [Fri, 24 Jun 2022 19:22:11 +0000 (12:22 -0700)]
Merge tag 'for-linus-5.19a-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A rare deadlock in Qubes-OS between the i915 driver and Xen grant
unmapping, solved by making the unmapping fully asynchronous
- A bug in the Xen blkfront driver caused by incomplete error handling
- A fix for undefined behavior (shifting a signed int by 31 bits)
- A fix in the Xen drmfront driver avoiding a WARN()
* tag 'for-linus-5.19a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/gntdev: Avoid blocking in unmap_grant_pages()
drm/xen: Add missing VM_DONTEXPAND flag in mmap callback
x86/xen: Remove undefined behavior in setup_features()
xen-blkfront: Handle NULL gendisk
Linus Torvalds [Fri, 24 Jun 2022 19:17:47 +0000 (12:17 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix a regression with pKVM when kmemleak is enabled
- Add Oliver Upton as an official KVM/arm64 reviewer
selftests:
- deal with compiler optimizations around hypervisor exits
x86:
- MAINTAINERS reorganization
- Two SEV fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV: Init target VMCBs in sev_migrate_from
KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()
MAINTAINERS: Reorganize KVM/x86 maintainership
selftests: KVM: Handle compiler optimizations in ucall
KVM: arm64: Add Oliver as a reviewer
KVM: arm64: Prevent kmemleak from accessing pKVM memory
tools/kvm_stat: fix display of error when multiple processes are found
Linus Torvalds [Fri, 24 Jun 2022 18:43:49 +0000 (11:43 -0700)]
Merge tag 'drm-fixes-2022-06-24' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Fixes for this week, bit larger than normal, but I think the last
couple have been quieter, and it's only rc4.
There are a lot of small msm fixes, and a slightly larger set of vc4
fixes. The vc4 fixes clean up a lot of crashes around the rPI4
hardware differences from earlier ones, and problems in the page flip
and modeset code which assumed earlier hw, so I thought it would be
okay to keep them in.
Otherwise, it's a few amdgpu, i915, sun4i and a panel quirk.
amdgpu:
- Adjust GTT size logic
- eDP fix for RMB
- DCN 3.15 fix
- DP training fix
- Color encoding fix for DCN2+
sun4i:
- multiple suspend fixes
vc4:
- rework driver split for rpi4, fixes mulitple crashers.
panel:
- quirk for Aya Neo Next
i915:
- Revert low voltage SKU check removal to fix display issues
- Apply PLL DCO fraction workaround for ADL-S
- Don't show engine classes not present in client fdinfo
msm:
- Workaround for parade DSI bridge power sequencing
- Fix for multi-planar YUV format offsets
- Limiting WB modes to max sspp linewidth
- Fixing the supported rotations to add 180 back for IGT
- Fix to handle pm_runtime_get_sync() errors to avoid unclocked
access in the bind() path for dpu driver
- Fix the irq_free() without request issue which was a being hit
frequently in CI.
- Fix to add minimum ICC vote in the msm_mdss pm_resume path to
address bootup splats
- Fix to avoid dereferencing without checking in WB encoder
- Fix to avoid crash during suspend in DP driver by ensuring
interrupt mask bits are updated
- Remove unused code from dpu_encoder_virt_atomic_check()
- Fix to remove redundant init of dsc variable
- Fix to ensure mmap offset is initialized to avoid memory corruption
from unpin/evict
- Fix double runpm disable in probe-defer path
- VMA fenced-unpin fixes
- Fix for WB max-width
- Fix for rare dp resolution change issue"
* tag 'drm-fixes-2022-06-24' of git://anongit.freedesktop.org/drm/drm: (41 commits)
amd/display/dc: Fix COLOR_ENCODING and COLOR_RANGE doing nothing for DCN20+
drm/amd/display: Fix typo in override_lane_settings
drm/amd/display: Fix DC warning at driver load
drm/amd: Revert "drm/amd/display: keep eDP Vdd on when eDP stream is already enabled"
drm/amdgpu: Adjust logic around GTT size (v3)
drm/sun4i: Return if frontend is not present
drm/vc4: fix error code in vc4_check_tex_size()
drm/sun4i: Add DMA mask and segment size
drm/vc4: hdmi: Fixed possible integer overflow
drm/i915/display: Re-add check for low voltage sku for max dp source rate
drm/i915/fdinfo: Don't show engine classes not present
drm/i915: Implement w/a
22010492432 for adl-s
drm: panel-orientation-quirks: Add quirk for Aya Neo Next
drm/msm/dp: force link training for display resolution change
drm/msm/dpu: limit wb modes based on max_mixer_width
drm/msm/dp: check core_initialized before disable interrupts at dp_display_unbind()
drm/msm/mdp4: Fix refcount leak in mdp4_modeset_init_intf
drm/msm: Don't overwrite hw fence in hw_init
drm/msm: Drop update_fences()
drm/vc4: Warn if some v3d code is run on BCM2711
...
Paulo Alcantara [Fri, 24 Jun 2022 18:01:43 +0000 (15:01 -0300)]
cifs: update cifs_ses::ip_addr after failover
cifs_ses::ip_addr wasn't being updated in cifs_session_setup() when
reconnecting SMB sessions thus returning wrong value in
/proc/fs/cifs/DebugData.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Fri, 24 Jun 2022 18:16:27 +0000 (11:16 -0700)]
Merge tag 'for-5.19/dm-fixes-4' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM era to commit metadata during suspend using drain_workqueue
instead of flush_workqueue.
- Fix DM core's dm_io_complete to not return early if io error is
BLK_STS_AGAIN but bio polling is not in use.
- Fix DM core's dm_io_complete BLK_STS_DM_REQUEUE handling when dm_io
represents a split bio.
- Fix recent DM mirror log regression by clearing bits up to
BITS_PER_LONG boundary.
* tag 'for-5.19/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm mirror log: clear log bits up to BITS_PER_LONG boundary
dm: fix BLK_STS_DM_REQUEUE handling when dm_io represents split bio
dm: do not return early from dm_io_complete if BLK_STS_AGAIN without polling
dm era: commit metadata in postsuspend after worker stops
Linus Torvalds [Fri, 24 Jun 2022 18:12:34 +0000 (11:12 -0700)]
Merge tag 'ata-5.19-rc4' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
- a single patch to fix tracing of command completion (Edward)
* tag 'ata-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata: add qc->flags in ata_qc_complete_template tracepoint
Linus Torvalds [Fri, 24 Jun 2022 18:07:54 +0000 (11:07 -0700)]
Merge tag 'block-5.19-2022-06-24' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Series fixing issues with sysfs locking and name reuse (Christoph)
- NVMe pull request via Christoph:
- Fix the mixed up CRIMS/CRWMS constants (Joel Granados)
- Add another broken identifier quirk (Leo Savernik)
- Fix up a quirk because Samsung reuses PCI IDs over different
products (Christoph Hellwig)
- Remove old WARN_ON() that doesn't apply anymore (Li)
- Fix for using a stale cached request value for rq-qos throttling
mechanisms that may schedule(), like iocost (me)
- Remove unused parameter to blk_independent_access_range() (Damien)
* tag 'block-5.19-2022-06-24' of git://git.kernel.dk/linux-block:
block: remove WARN_ON() from bd_link_disk_holder
nvme: move the Samsung X5 quirk entry to the core quirks
nvme: fix the CRIMS and CRWMS definitions to match the spec
nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH
block: pop cached rq before potentially blocking rq_qos_throttle()
block: remove queue from struct blk_independent_access_range
block: freeze the queue earlier in del_gendisk
block: remove per-disk debugfs files in blk_unregister_queue
block: serialize all debugfs operations using q->debugfs_mutex
block: disable the elevator int del_gendisk
Linus Torvalds [Fri, 24 Jun 2022 18:02:26 +0000 (11:02 -0700)]
Merge tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into the 5.19 release. All are fixing
issues that either happened in this release, or going to stable.
In detail:
- A small series of fixlets for the poll handling, all destined for
stable (Pavel)
- Fix a merge error from myself that caused a potential -EINVAL for
the recv/recvmsg flag setting (me)
- Fix a kbuf recycling issue for partial IO (me)
- Use the original request for the inflight tracking (me)
- Fix an issue introduced this merge window with trace points using a
custom decoder function, which won't work for perf (Dylan)"
* tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block:
io_uring: use original request task for inflight tracking
io_uring: move io_uring_get_opcode out of TP_printk
io_uring: fix double poll leak on repolling
io_uring: fix wrong arm_poll error handling
io_uring: fail links when poll fails
io_uring: fix req->apoll_events
io_uring: fix merge error in checking send/recv addr2 flags
io_uring: mark reissue requests with REQ_F_PARTIAL_IO
Linus Torvalds [Fri, 24 Jun 2022 17:54:07 +0000 (10:54 -0700)]
Merge tag 'printk-for-5.19-rc4' of git://git./linux/kernel/git/printk/linux
Pull printk kernel thread revert from Petr Mladek:
"Revert printk console kthreads.
The testing of 5.19 release candidates revealed issues that did not
happen when all consoles were serialized using the console semaphore.
More time is needed to check expectations of the existing console
drivers and be confident that they can be safely used in parallel"
* tag 'printk-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
Revert "printk: add functions to prefer direct printing"
Revert "printk: add kthread console printers"
Revert "printk: extend console_lock for per-console locking"
Revert "printk: remove @console_locked"
Revert "printk: Block console kthreads when direct printing will be required"
Revert "printk: Wait for the global console lock when the system is going down"