Yu Zhao [Sun, 18 Sep 2022 08:00:11 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: design doc
Add a design doc.
Link: https://lkml.kernel.org/r/20220918080010.2920238-15-yuzhao@google.com
Change-Id: I78c0956fb9f3b34d9faf96ecdef31fe0c5aeb184
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8be976a0937a18118424dd2505925081d9192fd5)
[ Kalesh Singh - Fix trivial conflicts in Documentation/vm/index.rst ]
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit 383505860c229dcd6ef4dc5cc458dddead48e7c9 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:10 +0000 (02:00 -0600)]
UPSTREAM: mm: multi-gen LRU: admin guide
Add an admin guide.
Link: https://lkml.kernel.org/r/20220918080010.2920238-14-yuzhao@google.com
Change-Id: Ia4dba47e8231eda4f0e76fb8969df7291a9bfe7c
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 07017acb06012d250fb68930e809257e6694d324)
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit 3fa3e8ad5d5f1201852234fedccdbd33bfe731b7 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:09 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: debugfs interface
Add /sys/kernel/debug/lru_gen for working set estimation and proactive
reclaim. These techniques are commonly used to optimize job scheduling
(bin packing) in data centers [1][2].
Compared with the page table-based approach and the PFN-based
approach, this lruvec-based approach has the following advantages:
1. It offers better choices because it is aware of memcgs, NUMA nodes,
shared mappings and unmapped page cache.
2. It is more scalable because it is O(nr_hot_pages), whereas the
PFN-based approach is O(nr_total_pages).
Add /sys/kernel/debug/lru_gen_full for debugging.
[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731
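For illustration, a minimal userspace sketch of driving this interface, assuming the command format described in the accompanying documentation ('+' triggers the aging, '-' the eviction); the memcg ID, node ID, sequence numbers and limits below are made up:
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/kernel/debug/lru_gen", "w");

        if (!f)
            return 1;
        /* working set estimation: age memcg 1 on node 0 up to max_seq 4 */
        fprintf(f, "+ 1 0 4\n");
        /* proactive reclaim: evict to min_seq 2, swappiness 10, <=1024 pages */
        fprintf(f, "- 1 0 2 10 1024\n");
        return fclose(f) ? 1 : 0;
    }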
Link: https://lkml.kernel.org/r/20220918080010.2920238-13-yuzhao@google.com
Change-Id: I2d07e743e7ed139fb2ba16d5f2c1a5f32f238ccc
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d6c3af7d8a2ba5602c28841248c551a712ac50f5)
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit a95784fdac0bffbac611fe90910dbc58a00ddd23 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:08 +0000 (02:00 -0600)]
UPSTREAM: mm: multi-gen LRU: thrashing prevention
Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention, as
requested by many desktop users [1].
When set to value N, it prevents the working set of N milliseconds from
getting evicted. The OOM killer is triggered if this working set cannot
be kept in memory. Based on the average human detectable lag (~100ms),
N=1000 usually eliminates intolerable lags due to thrashing. Larger
values like N=3000 make lags less noticeable at the risk of premature OOM
kills.
Compared with the size-based approach [2], this time-based approach
has the following advantages:
1. It is easier to configure because it is agnostic to applications
and memory sizes.
2. It is more reliable because it is directly wired to the OOM killer.
[1] https://lore.kernel.org/r/Ydza%2FzXKY9ATRoh6@google.com/
[2] https://lore.kernel.org/r/20101028191523.GA14972@google.com/
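A minimal sketch (not part of this patch) of enabling the protection from a C program, equivalent to echoing the value from a shell, with N=1000 as suggested above:
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/kernel/mm/lru_gen/min_ttl_ms", "w");

        if (!f)
            return 1;
        fprintf(f, "%d\n", 1000); /* protect the last ~1s of the working set */
        return fclose(f) ? 1 : 0;
    }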
Link: https://lkml.kernel.org/r/20220918080010.2920238-12-yuzhao@google.com
Change-Id: I52c14154a55c3e131d6e43fc623b3030cfa435ec
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1332a809d95a4fc763cabe5ecb6d4fb6a6d941b2)
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit dd4f2bd6c074cbd906bbbb1de6c950b0feb782a4 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:07 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: kill switch
Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
can be disabled include:
0x0001: the multi-gen LRU core
0x0002: walking page table, when arch_has_hw_pte_young() returns true
0x0004: clearing the accessed bit in non-leaf PMD entries, when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
[yYnN]: apply to all the components above
E.g.,
echo y >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0007
echo 5 >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0005
NB: the page table walks happen on the scale of seconds under heavy memory
pressure, in which case the mmap_lock contention is a lesser concern,
compared with the LRU lock contention and the I/O congestion. So far the
only well-known case of the mmap_lock contention happens on Android, due
to Scudo [1] which allocates several thousand VMAs for merely a few
hundred MBs. The SPF and the Maple Tree also have provided their own
assessments [2][3]. However, if walking page tables does worsen the
mmap_lock contention, the kill switch can be used to disable it. In this
case the multi-gen LRU will suffer a minor performance degradation, as
shown previously.
Clearing the accessed bit in non-leaf PMD entries can also be disabled,
since this behavior was not tested on x86 varieties other than Intel and
AMD.
[1] https://source.android.com/devices/tech/debug/scudo
[2] https://lore.kernel.org/r/20220128131006.67712-1-michel@lespinasse.org/
[3] https://lore.kernel.org/r/20220426150616.3937571-1-Liam.Howlett@oracle.com/
Link: https://lkml.kernel.org/r/20220918080010.2920238-11-yuzhao@google.com
Change-Id: If3116e6698cc6967b6992c2017962fac6c2d3a11
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 354ed597442952fb680c9cafc7e4eb8a76f9514c)
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit 94d1a38c479812f97af8a865c2d8b6298898ea18 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:06 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: optimize multiple memcgs
When multiple memcgs are available, it is possible to use generations as a
frame of reference to make better choices and improve overall performance
under global memory pressure. This patch adds a basic optimization to
select memcgs that can drop single-use unmapped clean pages first. Doing
so reduces the chance of going into the aging path or swapping, which can
be costly.
A typical example that benefits from this optimization is a server running
mixed types of workloads, e.g., heavy anon workload in one memcg and heavy
buffered I/O workload in the other.
Though this optimization can be applied to both kswapd and direct reclaim,
it is only added to kswapd to keep the patchset manageable. Later
improvements may cover the direct reclaim path.
While ensuring certain fairness to all eligible memcgs, proportional scans
of individual memcgs also require proper backoff to avoid overshooting
their aggregate reclaim target by too much. Otherwise it can cause high
direct reclaim latency. The conditions for backoff are:
1. At low priorities, for direct reclaim, if aging fairness or direct
reclaim latency is at risk, i.e., aging one memcg multiple times or
swapping after the target is met.
2. At high priorities, for global reclaim, if per-zone free pages are
above respective watermarks.
Server benchmark results:
Mixed workloads:
fio (buffered I/O): +[19, 21]%
IOPS BW
patch1-8: 1880k 7343MiB/s
patch1-9: 2252k 8796MiB/s
memcached (anon): +[119, 123]%
Ops/sec KB/sec
patch1-8: 862768.65 33514.68
patch1-9: 1911022.12 74234.54
Mixed workloads:
fio (buffered I/O): +[75, 77]%
IOPS BW
5.19-rc1: 1279k 4996MiB/s
patch1-9: 2252k 8796MiB/s
memcached (anon): +[13, 15]%
Ops/sec KB/sec
5.19-rc1: 1673524.04 65008.87
patch1-9: 1911022.12 74234.54
Configurations:
(changes since patch 6)
cat mixed.sh
modprobe brd rd_nr=2 rd_size=56623104
swapoff -a
mkswap /dev/ram0
swapon /dev/ram0
mkfs.ext4 /dev/ram1
mount -t ext4 /dev/ram1 /mnt
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
--ratio 1:0 --pipeline 8 -d 2000
fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
--buffered=1 --ioengine=io_uring --iodepth=128 \
--iodepth_batch_submit=32 --iodepth_batch_complete=32 \
--rw=randread --random_distribution=random --norandommap \
--time_based --ramp_time=10m --runtime=90m --group_reporting &
pid=$!
sleep 200
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
--ratio 0:1 --pipeline 8 --randomize --distinct-client-seed
kill -INT $pid
wait
Client benchmark results:
no change (CONFIG_MEMCG=n)
Link: https://lkml.kernel.org/r/20220918080010.2920238-10-yuzhao@google.com
Change-Id: I590780e6381d577d800d4c4551a702047fc31cc7
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f76c83378851f8e70f032848c4e61203f39480e4)
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit 8726e22e86042de4fea4544854f11e60be0c20c7 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:05 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: support page table walks
To further exploit spatial locality, the aging prefers to walk page tables
to search for young PTEs and promote hot pages. A kill switch will be
added in the next patch to disable this behavior. When disabled, the
aging relies on the rmap only.
NB: this behavior has nothing in common with the page table scanning in the 2.4 kernel [1], which searches page tables for old PTEs, adds cold pages to the swapcache and unmaps them.
To avoid confusion, the term "iteration" specifically means the traversal
of an entire mm_struct list; the term "walk" will be applied to page
tables and the rmap, as usual.
An mm_struct list is maintained for each memcg, and an mm_struct follows
its owner task to the new memcg when this task is migrated. Given an
lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot pages
before it increments max_seq.
When multiple page table walkers iterate the same list, each of them gets
a unique mm_struct; therefore they can run concurrently. Page table
walkers ignore any misplaced pages, e.g., if an mm_struct was migrated,
pages it left in the previous memcg will not be promoted when its current
memcg is under reclaim. Similarly, page table walkers will not promote
pages from nodes other than the one under reclaim.
This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
page table walkers can skip processes that have been sleeping since
the last iteration.
2. It uses generational Bloom filters to record populated branches so that page table walkers can reduce their search space based on the query results, e.g., to skip page tables containing mostly holes or misplaced pages (a sketch of such a filter follows this list).
3. It takes advantage of the accessed bit in non-leaf PMD entries when
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
spanning multiple VMAs. IOW, it finishes all the VMAs within the
range of the same PMD table before it returns to a PGD table. This
improves the cache performance for workloads that have large
numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.
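A minimal userspace sketch of the generational Bloom filter in optimization 2, with illustrative sizes and hash functions (the kernel's actual parameters differ): two filters alternate per iteration; the current one records populated branches while the previous one answers queries.
    #include <stdint.h>
    #include <string.h>

    #define BLOOM_BITS (1u << 15)

    static uint8_t filters[2][BLOOM_BITS / 8];
    static unsigned int gen; /* bumped once per mm_struct list iteration */

    static uint32_t hash1(uintptr_t x) { return (x * 2654435761u) % BLOOM_BITS; }
    static uint32_t hash2(uintptr_t x) { return (x * 40503u + 1) % BLOOM_BITS; }

    /* record a populated branch, e.g., the address of a PMD table */
    static void bloom_set(uintptr_t key)
    {
        uint32_t h1 = hash1(key), h2 = hash2(key);

        filters[gen & 1][h1 / 8] |= 1 << (h1 % 8);
        filters[gen & 1][h2 / 8] |= 1 << (h2 % 8);
    }

    /* query the previous iteration's filter; false suggests skipping */
    static int bloom_test(uintptr_t key)
    {
        uint32_t h1 = hash1(key), h2 = hash2(key);
        const uint8_t *f = filters[(gen + 1) & 1];

        return (f[h1 / 8] >> (h1 % 8) & 1) && (f[h2 / 8] >> (h2 % 8) & 1);
    }

    /* start a new iteration: clear the filter about to be recorded into */
    static void bloom_next_gen(void)
    {
        gen++;
        memset(filters[gen & 1], 0, sizeof(filters[0]));
    }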
Server benchmark results:
Single workload:
fio (buffered I/O): no change
Single workload:
memcached (anon): +[8, 10]%
Ops/sec KB/sec
patch1-7: 1147696.57 44640.29
patch1-8: 1245274.91 48435.66
Configurations:
no change
Client benchmark results:
kswapd profiles:
patch1-7
48.16% lzo1x_1_do_compress (real work)
8.20% page_vma_mapped_walk (overhead)
7.06% _raw_spin_unlock_irq
2.92% ptep_clear_flush
2.53% __zram_bvec_write
2.11% do_raw_spin_lock
2.02% memmove
1.93% lru_gen_look_around
1.56% free_unref_page_list
1.40% memset
patch1-8
49.44% lzo1x_1_do_compress (real work)
6.19% page_vma_mapped_walk (overhead)
5.97% _raw_spin_unlock_irq
3.13% get_pfn_page
2.85% ptep_clear_flush
2.42% __zram_bvec_write
2.08% do_raw_spin_lock
1.92% memmove
1.44% alloc_zspage
1.36% memset
Configurations:
no change
Thanks to the following developers for their efforts [3].
kernel test robot <lkp@intel.com>
[1] https://lwn.net/Articles/23732/
[2] https://llvm.org/docs/ScudoHardenedAllocator.html
[3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/
Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com
Change-Id: I7ed3daf288e664e15bfd34991a77467a19a4e39a
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit bd74fdaea146029e4fa12c6de89adbe0779348a9)
[ Resolve conflicts in include/linux/memcontrol.h, include/linux/mm_types.h ]
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit 35e2163024497e8140fdfdf97c59f50a67d32092 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:04 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: exploit locality in rmap
Searching the rmap for PTEs mapping each page on an LRU list (to test and
clear the accessed bit) can be expensive because pages from different VMAs
(PA space) are not cache friendly to the rmap (VA space). For workloads
mostly using mapped pages, searching the rmap can incur the highest CPU
cost in the reclaim path.
This patch exploits spatial locality to reduce the trips into the rmap.
When shrink_page_list() walks the rmap and finds a young PTE, a new
function lru_gen_look_around() scans at most BITS_PER_LONG-1 adjacent
PTEs. On finding another young PTE, it clears the accessed bit and
updates the gen counter of the page mapped by this PTE to
(max_seq%MAX_NR_GENS)+1.
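A userspace model of the window scan, illustrative only (PTEs reduced to a bare accessed bit; no kernel helpers assumed):
    #include <stdint.h>

    #define BITS_PER_LONG 64
    #define PTRS_PER_PTE  512

    struct pte { int young; }; /* stand-in for a real PTE's accessed bit */

    /* Given index i of a PTE found young by the rmap, scan a window of
     * BITS_PER_LONG entries around it, clear the accessed bits and return
     * a bitmap of the other young entries for later promotion. */
    static uint64_t look_around(struct pte *table, int i)
    {
        uint64_t young = 0;
        int k, start = i - BITS_PER_LONG / 2;

        if (start < 0)
            start = 0;
        if (start > PTRS_PER_PTE - BITS_PER_LONG)
            start = PTRS_PER_PTE - BITS_PER_LONG;

        for (k = start; k < start + BITS_PER_LONG; k++) {
            if (k == i || !table[k].young)
                continue;
            table[k].young = 0;              /* clear the accessed bit */
            young |= 1ull << (k - start);    /* promote this page later */
        }
        return young;
    }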
Server benchmark results:
Single workload:
fio (buffered I/O): no change
Single workload:
memcached (anon): +[3, 5]%
Ops/sec KB/sec
patch1-6: 1106168.46 43025.04
patch1-7: 1147696.57 44640.29
Configurations:
no change
Client benchmark results:
kswapd profiles:
patch1-6
39.03% lzo1x_1_do_compress (real work)
18.47% page_vma_mapped_walk (overhead)
6.74% _raw_spin_unlock_irq
3.97% do_raw_spin_lock
2.49% ptep_clear_flush
2.48% anon_vma_interval_tree_iter_first
1.92% page_referenced_one
1.88% __zram_bvec_write
1.48% memmove
1.31% vma_interval_tree_iter_next
patch1-7
48.16% lzo1x_1_do_compress (real work)
8.20% page_vma_mapped_walk (overhead)
7.06% _raw_spin_unlock_irq
2.92% ptep_clear_flush
2.53% __zram_bvec_write
2.11% do_raw_spin_lock
2.02% memmove
1.93% lru_gen_look_around
1.56% free_unref_page_list
1.40% memset
Configurations:
no change
Link: https://lkml.kernel.org/r/20220918080010.2920238-8-yuzhao@google.com
Change-Id: Iac405b6d42e2e3f632b6748368f61202c164f1ad
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Barry Song <baohua@kernel.org>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 018ee47f14893d500131dfca2ff9f3ff8ebd4ed2)
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit 009d8570595a28fc61b5db16d93f01033c00a2b2 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:03 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: minimal implementation
To avoid confusion, the terms "promotion" and "demotion" will be applied
to the multi-gen LRU, as a new convention; the terms "activation" and
"deactivation" will be applied to the active/inactive LRU, as usual.
The aging produces young generations. Given an lruvec, it increments
max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging promotes
hot pages to the youngest generation when it finds them accessed through
page tables; the demotion of cold pages happens consequently when it
increments max_seq. Promotion in the aging path does not involve any LRU
list operations, only the updates of the gen counter and
lrugen->nr_pages[]; demotion, unless as the result of the increment of
max_seq, requires LRU list operations, e.g., lru_deactivate_fn(). The
aging has the complexity O(nr_hot_pages), since it is only interested in
hot pages.
The eviction consumes old generations. Given an lruvec, it increments
min_seq when lrugen->lists[] indexed by min_seq%MAX_NR_GENS becomes empty.
A feedback loop modeled after the PID controller monitors refaults over
anon and file types and decides which type to evict when both types are
available from the same generation.
The protection of pages accessed multiple times through file descriptors
takes place in the eviction path. Each generation is divided into
multiple tiers. A page accessed N times through file descriptors is in
tier order_base_2(N). Tiers do not have dedicated lrugen->lists[], only
bits in page->flags. The aforementioned feedback loop also monitors
refaults over all tiers and decides when to protect pages in which tiers
(N>1), using the first tier (N=0,1) as a baseline. The first tier
contains single-use unmapped clean pages, which are most likely the best
choices. In contrast to promotion in the aging path, the protection of a
page in the eviction path is achieved by moving this page to the next
generation, i.e., min_seq+1, if the feedback loop decides so. This
approach has the following advantages:
1. It removes the cost of activation in the buffered access path by
inferring whether pages accessed multiple times through file
descriptors are statistically hot and thus worth protecting in the
eviction path.
2. It takes pages accessed through page tables into account and avoids
overprotecting pages accessed multiple times through file
descriptors. (Pages accessed through page tables are in the first
tier, since N=0.)
3. More tiers provide better protection for pages accessed more than
twice through file descriptors, when under heavy buffered I/O
workloads.
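To make the tier mapping above concrete, a small sketch of order_base_2() and the resulting tiers (the kernel caps the number of tiers; ignored here):
    /* ceil(log2(n)) for n > 1; 0 for n <= 1 */
    static int order_base_2(unsigned long n)
    {
        int order = 0;

        while ((1UL << order) < n)
            order++;
        return order;
    }

    /* accesses N: 0 1 2 3 4 5 6 7 8
     * tier:       0 0 1 2 2 3 3 3 3 */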
Server benchmark results:
Single workload:
fio (buffered I/O): +[30, 32]%
IOPS BW
5.19-rc1: 2673k 10.2GiB/s
patch1-6: 3491k 13.3GiB/s
Single workload:
memcached (anon): -[4, 6]%
Ops/sec KB/sec
5.19-rc1: 1161501.04 45177.25
patch1-6: 1106168.46 43025.04
Configurations:
CPU: two Xeon 6154
Mem: total 256G
Node 1 was only used as a ram disk to reduce the variance in the
results.
patch drivers/block/brd.c <<EOF
99,100c99,100
< gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
< page = alloc_page(gfp_flags);
---
> gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM | __GFP_THISNODE;
> page = alloc_pages_node(1, gfp_flags, 0);
EOF
cat >>/etc/systemd/system.conf <<EOF
CPUAffinity=numa
NUMAPolicy=bind
NUMAMask=0
EOF
cat >>/etc/memcached.conf <<EOF
-m 184320
-s /var/run/memcached/memcached.sock
-a 0766
-t 36
-B binary
EOF
cat fio.sh
modprobe brd rd_nr=1 rd_size=113246208
swapoff -a
mkfs.ext4 /dev/ram0
mount -t ext4 /dev/ram0 /mnt
mkdir /sys/fs/cgroup/user.slice/test
echo 38654705664 >/sys/fs/cgroup/user.slice/test/memory.max
echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs
fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \
--buffered=1 --ioengine=io_uring --iodepth=128 \
--iodepth_batch_submit=32 --iodepth_batch_complete=32 \
--rw=randread --random_distribution=random --norandommap \
--time_based --ramp_time=10m --runtime=5m --group_reporting
cat memcached.sh
modprobe brd rd_nr=1 rd_size=113246208
swapoff -a
mkswap /dev/ram0
swapon /dev/ram0
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \
--ratio 1:0 --pipeline 8 -d 2000
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \
--ratio 0:1 --pipeline 8 --randomize --distinct-client-seed
Client benchmark results:
kswapd profiles:
5.19-rc1
40.33% page_vma_mapped_walk (overhead)
21.80% lzo1x_1_do_compress (real work)
7.53% do_raw_spin_lock
3.95% _raw_spin_unlock_irq
2.52% vma_interval_tree_iter_next
2.37% page_referenced_one
2.28% vma_interval_tree_subtree_search
1.97% anon_vma_interval_tree_iter_first
1.60% ptep_clear_flush
1.06% __zram_bvec_write
patch1-6
39.03% lzo1x_1_do_compress (real work)
18.47% page_vma_mapped_walk (overhead)
6.74% _raw_spin_unlock_irq
3.97% do_raw_spin_lock
2.49% ptep_clear_flush
2.48% anon_vma_interval_tree_iter_first
1.92% page_referenced_one
1.88% __zram_bvec_write
1.48% memmove
1.31% vma_interval_tree_iter_next
Configurations:
CPU: single Snapdragon 7c
Mem: total 4G
ChromeOS MemoryPressure [1]
[1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/
Link: https://lkml.kernel.org/r/20220918080010.2920238-7-yuzhao@google.com
Change-Id: I30b26b3086ce1879b83b96eb265f8f0dcb16a1fb
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ac35a490237446b71e3b4b782b1596967edd0aa8)
[Resolve conflicts in mm/Kconfig, mm/swap.c, mm/vmscan.c]
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit 53af55e4cca49a4b16946fae3ae6ab9ef3fef079 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Sun, 18 Sep 2022 08:00:02 +0000 (02:00 -0600)]
BACKPORT: mm: multi-gen LRU: groundwork
Evictable pages are divided into multiple generations for each lruvec.
The youngest generation number is stored in lrugen->max_seq for both
anon and file types as they are aged on an equal footing. The oldest
generation numbers are stored in lrugen->min_seq[] separately for anon
and file types as clean file pages can be evicted regardless of swap
constraints. These three variables are monotonically increasing.
Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
in order to fit into the gen counter in page->flags. Each truncated
generation number is an index to lrugen->lists[]. The sliding window
technique is used to track at least MIN_NR_GENS and at most
MAX_NR_GENS generations. The gen counter stores a value within [1,
MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
stores 0.
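A minimal model of this counter scheme, with illustrative constants (MAX_NR_GENS is 4 in the default configuration); lru_gen_from_seq() mirrors the kernel's helper, while seq_to_gen_counter() is a made-up name for the offset-by-one storage:
    #define MIN_NR_GENS 2
    #define MAX_NR_GENS 4

    /* truncate a sequence number into an index into lrugen->lists[] */
    static inline int lru_gen_from_seq(unsigned long seq)
    {
        return seq % MAX_NR_GENS;
    }

    /* value kept in the page->flags gen counter: within [1, MAX_NR_GENS]
     * while the page is on one of lrugen->lists[], 0 otherwise */
    static inline int seq_to_gen_counter(unsigned long seq)
    {
        return lru_gen_from_seq(seq) + 1;
    }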
There are two conceptually independent procedures: "the aging", which
produces young generations, and "the eviction", which consumes old
generations. They form a closed-loop system, i.e., "the page reclaim".
Both procedures can be invoked from userspace for the purposes of working
set estimation and proactive reclaim. These techniques are commonly used
to optimize job scheduling (bin packing) in data centers [1][2].
To avoid confusion, the terms "hot" and "cold" will be applied to the
multi-gen LRU, as a new convention; the terms "active" and "inactive" will
be applied to the active/inactive LRU, as usual.
The protection of hot pages and the selection of cold pages are based
on page access channels and patterns. There are two access channels:
one through page tables and the other through file descriptors. The
protection of the former channel is by design stronger because:
1. The uncertainty in determining the access patterns of the former
channel is higher due to the approximation of the accessed bit.
2. The cost of evicting the former channel is higher due to the TLB
flushes required and the likelihood of encountering the dirty bit.
3. The penalty of underprotecting the former channel is higher because
applications usually do not prepare themselves for major page
faults like they do for blocked I/O. E.g., GUI applications
commonly use dedicated I/O threads to avoid blocking rendering
threads.
There are also two access patterns: one with temporal locality and the
other without. For the reasons listed above, the former channel is
assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is
present; the latter channel is assumed to follow the latter pattern unless
outlying refaults have been observed [3][4].
The next patch will address the "outlying refaults". Three macros, i.e.,
LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in
this patch to make the entire patchset less diffy.
A page is added to the youngest generation on faulting. The aging needs
to check the accessed bit at least twice before handing this page over to
the eviction. The first check takes care of the accessed bit set on the
initial fault; the second check makes sure this page has not been used
since then. This protocol, AKA second chance, requires a minimum of two
generations, hence MIN_NR_GENS.
[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731
[3] https://lwn.net/Articles/495543/
[4] https://lwn.net/Articles/815342/
Link: https://lkml.kernel.org/r/20220918080010.2920238-6-yuzhao@google.com
Change-Id: I7b24d1e9d263e4eb2c2ee23f2eb143824fcb5201
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ec1c86b25f4bdd9dce6436c0539d2a6ae676e1c4)
[ Resolve conflicts in mm/memory.c, mm/memcontrol.c, mm/Kconfig, include/linux/mm_inline.h ]
Bug: 249601646
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
[backport of the commit f4d4c46c3ad74657f912f112acd6f7b171f991f8 from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Mon, 28 Sep 2020 02:49:08 +0000 (20:49 -0600)]
FROMLIST: mm/vmscan.c: refactor shrink_node()
This patch refactors shrink_node() to improve readability for the
upcoming changes to mm/vmscan.c.
Link: https://lore.kernel.org/lkml/20220309021230.721028-4-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I186f43f946de0d40d54883fb31114840fc749a57
[backport of the commit 8d47a32fa814aa5024253ba0c0a0e82c9ab2d1ae from android13-5.15 branch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Wed, 4 Aug 2021 07:31:34 +0000 (01:31 -0600)]
pgtable: add arch_has_hw_pte_young() fallback
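Presumably the usual generic stub (an assumption, as the commit carries no description; mainline's fallback returns false so that architectures which set the accessed bit in hardware can override it):
    #ifndef arch_has_hw_pte_young
    /* Fallback: assume accessing through an old PTE triggers a page fault;
     * architectures that set the accessed bit in hardware override this. */
    static inline bool arch_has_hw_pte_young(void)
    {
        return false;
    }
    #endif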
Change-Id: Ie771d5cdd3044cbc4c9918fc1dace1ceb0358ed7
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Marek Szyprowski [Tue, 9 Jan 2024 13:10:48 +0000 (14:10 +0100)]
pgtable: add pud_index() fallback
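Presumably the canonical definition used when an architecture does not provide its own (an assumption, as the commit carries no description):
    #ifndef pud_index
    /* index of the PUD entry covering a given virtual address */
    static inline unsigned long pud_index(unsigned long address)
    {
        return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
    }
    #define pud_index pud_index
    #endif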
Change-Id: I4ef23d27ca85afc9933cf1dc70b8b605ee4f6591
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Marek Szyprowski [Tue, 19 Dec 2023 13:42:36 +0000 (14:42 +0100)]
atomic: add try_cmpxchg() fallback
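Presumably a cmpxchg()-based fallback along these lines (an assumption; this mirrors the kernel's generic pattern, which updates the caller's expected value on failure):
    #ifndef try_cmpxchg
    #define try_cmpxchg(ptr, oldp, new)                        \
    ({                                                         \
        typeof(*(ptr)) *__op = (oldp), __o = *__op, __r;       \
        __r = cmpxchg((ptr), __o, (new));                      \
        if (__r != __o)                                        \
            *__op = __r;    /* report the observed value */    \
        __r == __o;                                            \
    })
    #endif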
Change-Id: I48abe90191d4f8e0adc2c6d191397b3b04418ff1
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Chris Down [Fri, 7 Aug 2020 06:22:05 +0000 (23:22 -0700)]
mm, memcg: decouple e{low,min} state mutations from protection checks
mem_cgroup_protected is currently used both to set the effective low and min and to return a mem_cgroup_protection based on the result. As a user, this
can be a little unexpected: it appears to be a simple predicate function,
if not for the big warning in the comment above about the order in which
it must be executed.
This change makes it so that we separate the state mutations from the
actual protection checks, which makes it more obvious where we need to be
careful mutating internal state, and where we are simply checking and
don't need to worry about that.
[mhocko@suse.com - don't check protection on root memcgs]
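Schematically, call sites change from one do-everything predicate into an explicit mutation step followed by pure checks (helper names follow the resulting mainline code, shown here only as a sketch):
    static void reclaim_one_memcg_sketch(struct mem_cgroup *target_memcg,
                                         struct mem_cgroup *memcg)
    {
        /* before: one call, mem_cgroup_protected(), both mutated
         * e{low,min} and returned the protection level */

        /* after: mutate first, then perform pure checks */
        mem_cgroup_calculate_protection(target_memcg, memcg);

        if (mem_cgroup_below_min(memcg))
            return;  /* hard protection: skip this memcg */
        if (mem_cgroup_below_low(memcg))
            return;  /* soft protection: usually skip as well */

        /* reclaim from this memcg */
    }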
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Change-Id: I916c4cb713450129c7010659e44340bd81a0959e
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: http://lkml.kernel.org/r/ff3f915097fcee9f6d7041c084ef92d16aaeb56a.1594638158.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 45c7f7e1ef17f09fe70bad4b705ce43772153fd7 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Hugh Dickins [Wed, 16 Jun 2021 01:23:49 +0000 (18:23 -0700)]
mm/thp: make is_huge_zero_pmd() safe and quicker
Most callers of is_huge_zero_pmd() supply a pmd already verified
present; but a few (notably zap_huge_pmd()) do not - it might be a pmd
migration entry, in which the pfn is encoded differently from a present
pmd: which might pass the is_huge_zero_pmd() test (though not on x86,
since L1TF forced us to protect against that); or perhaps even crash in
pmd_page() applied to a swap-like entry.
Make it safe by adding pmd_present() check into is_huge_zero_pmd()
itself; and make it quicker by saving huge_zero_pfn, so that
is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.
__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
but is unnecessary now that is_huge_zero_pmd() checks present.
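The resulting check, schematically (mainline caches the zero page's pfn in a huge_zero_pfn variable):
    static inline bool is_huge_zero_pmd(pmd_t pmd)
    {
        /* check present first so a migration entry, whose pfn is encoded
         * differently, cannot match */
        return pmd_present(pmd) && pmd_pfn(pmd) == READ_ONCE(huge_zero_pfn);
    }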
Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Change-Id: I4b5270605a3753233e3152829180ef86562fa2ad
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 3b77e8c8cde581dadab9a0f1543a347e24315f11 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Wed, 24 Feb 2021 20:08:13 +0000 (12:08 -0800)]
include/linux/mm_inline.h: shuffle lru list addition and deletion functions
These functions will call page_lru() in the following patches. Move them
below page_lru() to avoid the forward declaration.
Link: https://lore.kernel.org/linux-mm/20201207220949.830352-3-yuzhao@google.com/
Link: https://lkml.kernel.org/r/20210122220600.906146-3-yuzhao@google.com
Change-Id: I5dd30e1b586878b9276ccc0f9896ab47538451ee
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit f90d8191ac864df33b1898bc7edc54eaa24e22bc from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Shakeel Butt [Tue, 15 Dec 2020 03:06:39 +0000 (19:06 -0800)]
mm/rmap: always do TTU_IGNORE_ACCESS
Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic
v2"), the code to check the secondary MMU's page table access bit is
broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from the
secondary MMU's page table before the check. More specifically for those
secondary MMUs which unmap the memory in
mmu_notifier_invalidate_range_start() like kvm.
However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the
absence of TTU_IGNORE_ACCESS and it explicitly performs the page table
access check before trying to unmap the page. So, at worst the reclaim
will miss accesses in a very short window if we remove page table access
check in unmapping code.
There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg reclaim: from memcg reclaim, page_referenced() only accounts accesses from processes in the same memcg as the target page, but the unmapping code considers accesses from all processes, decreasing the effectiveness of memcg reclaim.
The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
code.
Link: https://lkml.kernel.org/r/20201104231928.1494083-1-shakeelb@google.com
Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2")
Change-Id: If6b3d528bf5d76d2004817da96dfa430e78de42f
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 013339df116c2ee0d796dd8bfb8f293a2030c063 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Alex Shi [Tue, 15 Dec 2020 20:34:16 +0000 (12:34 -0800)]
mm/lru: introduce TestClearPageLRU()
Currently lru_lock still guards both the lru list and the page's lru bit, which is fine. But if we want to use a specific lruvec lock for the page, we need to pin down the page's lruvec/memcg while taking the lock, and just taking the lruvec lock first may be undermined by the page's memcg charge/migration. To fix this problem, clear the lru bit outside of the lock and use it as a pin-down action to block page isolation during memcg changes.
The standard steps of page isolation are now as follows:
1, get_page(); #pin the page to keep it from being freed
2, TestClearPageLRU(); #block other isolation, like a memcg change
3, spin_lock on lru_lock; #serialize lru list access
4, delete page from lru list;
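In schematic code (helper signatures vary across kernel versions; this sketch uses the per-lruvec lock the series ultimately introduces):
    /* schematic only: competing isolators race on TestClearPageLRU() */
    static bool isolate_page_sketch(struct page *page, struct lruvec *lruvec)
    {
        if (!get_page_unless_zero(page))        /* step 1: pin */
            return false;
        if (!TestClearPageLRU(page)) {          /* step 2: lost the race */
            put_page(page);
            return false;
        }
        spin_lock_irq(&lruvec->lru_lock);       /* step 3: serialize */
        del_page_from_lru_list(page, lruvec);   /* step 4 */
        spin_unlock_irq(&lruvec->lru_lock);
        return true;
    }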
This patch starts with the first part: TestClearPageLRU, which combines the PageLRU check and ClearPageLRU into one macro function. It will be used as a page isolation precondition, preventing competing isolations elsewhere. Since there may then be !PageLRU pages on an lru list, the corresponding BUG() checks need to be removed.
There are now two rules for the lru bit:
1, the lru bit still indicates whether a page is on an lru list; only in a temporary moment (isolation) may a page have no lru bit while it is on an lru list. But a page must be on an lru list whenever the lru bit is set.
2, the lru bit has to be cleared before the page is deleted from the lru list.
As Andrew Morton mentioned, this change would dirty a cacheline for a page which isn't on the LRU. But the loss would be acceptable per Rong Chen's <rong.a.chen@intel.com> report:
https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/
Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Change-Id: I77f12be15d182f49521a2c2bfaf8fa31fda4212d
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit d25b5bd8a8f420b15517c19c4626c0c009f72a63 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Alex Shi [Fri, 18 Dec 2020 22:01:31 +0000 (14:01 -0800)]
mm/memcg: warning on !memcg after readahead page charged
Add VM_WARN_ON_ONCE_PAGE() macro.
Since readahead pages are charged to a memcg too, in theory we don't have to check this exception now. Before safely removing these checks, add a warning for the unexpected !memcg case.
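A sketch of the macro's shape (assumed, not copied from this patch; the point is a once-only warning that also dumps the offending page):
    #define VM_WARN_ON_ONCE_PAGE(cond, page)                       \
    ({                                                             \
        static bool __warned;                                      \
        int __ret = !!(cond);                                      \
                                                                   \
        if (unlikely(__ret && !__warned)) {                        \
            dump_page(page, "VM_WARN_ON_ONCE_PAGE(" #cond ")");    \
            __warned = true;                                       \
            WARN_ON(1);                                            \
        }                                                          \
        unlikely(__ret);                                           \
    })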
Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.alibaba.com
Change-Id: I0de058b8c4fd406c7d02ab837388d77c93ef778a
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit a4055888629bc0467d12d912cd7c90acdf3d9b12 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Yu Zhao [Fri, 16 Oct 2020 03:09:55 +0000 (20:09 -0700)]
mm: use self-explanatory macros rather than "2"
Change-Id: Ia0efe80fa2c62163b0d58a382aef1e0881cba15b
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Link: http://lkml.kernel.org/r/20200831175042.3527153-2-yuzhao@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit ed0173733dd468883198c3136284394320b8fad6 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Anshuman Khandual [Tue, 7 Apr 2020 03:03:47 +0000 (20:03 -0700)]
mm/vma: make vma_is_accessible() available for general use
Let's move the vma_is_accessible() helper to include/linux/mm.h, which makes it available for general use. While here, replace all remaining open encodings of the VMA access check with vma_is_accessible().
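The helper itself is a one-line predicate on the VMA flags (a sketch of the mainline definition at the time):
    static inline bool vma_is_accessible(struct vm_area_struct *vma)
    {
        return vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
    }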
Change-Id: I475ac8df7e31c848d3c84688f7fb9e3f27c832f5
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 3122e80efc0faf4a2accba7a46c7ed795edbfded from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Mateusz Nosek [Thu, 2 Apr 2020 04:10:15 +0000 (21:10 -0700)]
mm/vmscan.c: clean code by removing unnecessary assignment
Previously 0 was assigned to variable 'lruvec_size', but the variable was
never read later. So the assignment can be removed.
Fixes: f87bccde6a7d ("mm/vmscan: remove unused lru_pages argument")
Change-Id: Iddd0982d271ee9ab9387195b87a4f608850cadc7
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: http://lkml.kernel.org/r/20200229214022.11853-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit e072bff60a29c82b1536d78c412f009d1a1de4cf from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Steven Price [Tue, 4 Feb 2020 01:35:45 +0000 (17:35 -0800)]
mm: pagewalk: add p4d_entry() and pgd_entry()
pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410 ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were no
users. We're about to add users so reintroduce them, along with
p4d_entry() as we now have 5 levels of tables.
Note that commit a00cc7d9dd93d66a ("mm, x86: add support for PUD-sized
transparent hugepages") already re-added pud_entry() but with different
semantics to the other callbacks. This commit reverts the semantics back
to match the other callbacks.
To support hmm.c which now uses the new semantics of pud_entry() a new
member ('action') of struct mm_walk is added which allows the callbacks to
either descend (ACTION_SUBTREE, the default), skip (ACTION_CONTINUE) or
repeat the callback (ACTION_AGAIN). hmm.c is then updated to call
pud_trans_huge_lock() itself and make use of the splitting/retry logic of
the core code.
After this change pud_entry() is called for all entries, not just
transparent huge pages.
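As a rough sketch of the interface described above (the enum values and the
'action' member follow the text; the other details are simplified
assumptions):

enum page_walk_action {
	ACTION_SUBTREE = 0,	/* descend into the subtree (default) */
	ACTION_CONTINUE,	/* skip this subtree */
	ACTION_AGAIN,		/* repeat the callback for this entry */
};

struct mm_walk {
	const struct mm_walk_ops *ops;
	struct mm_struct *mm;
	enum page_walk_action action;	/* set by callbacks to steer the walk */
	void *private;
};

/* With the reverted semantics, pud_entry() sees every entry; a callback
 * that handles a PUD-sized huge page itself can skip the descent: */
static int example_pud_entry(pud_t *pud, unsigned long addr,
			     unsigned long next, struct mm_walk *walk)
{
	if (pud_trans_huge(*pud)) {
		/* process the huge entry here, then ... */
		walk->action = ACTION_CONTINUE;
	}
	return 0;
}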
[arnd@arndb.de: fix unused variable warning]
Link: http://lkml.kernel.org/r/20200107204607.1533842-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20191218162402.45610-12-steven.price@arm.com
Change-Id: Iefcf3d5f987b2fcc64262da127543ae11207086c
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 3afc423632a194d7d6afef34e4bb98f804cd071d from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Steven Price [Tue, 4 Feb 2020 01:35:01 +0000 (17:35 -0800)]
mm: add generic p?d_leaf() macros
Patch series "Generic page walk and ptdump", v17.
Many architectures currently have a debugfs file for dumping the kernel page
tables. Currently each architecture has to implement custom functions for
this because the details of walking the page tables used by the kernel are
different between architectures.
This series extends the capabilities of walk_page_range() so that it can
deal with the page tables of the kernel (which have no VMAs and can
contain larger huge pages than exist for user space). A generic PTDUMP
implementation is then built on the new functionality of
walk_page_range(), and finally arm64 and x86 are switched to using it,
removing the custom table walkers.
To enable a generic page table walker to walk the unusual mappings of the
kernel we need to implement a set of functions which let us know when the
walker has reached the leaf entry. After a suggestion from Will Deacon
I've chosen the name p?d_leaf() as this (hopefully) describes the purpose
(and is a new name so has no historic baggage). Some architectures have
p?d_large() macros, but that name is easily confused with "large pages".
This series ends with a generic PTDUMP implementation for arm64 and x86.
Mostly this is a cleanup and there should be very little functional
change. The exceptions are:
* arm64 PTDUMP debugfs now displays pages which aren't present (patch 22).
* arm64 has the ability to efficiently process KASAN pages (which
previously only x86 implemented). This means that the combination of
KASAN and DEBUG_WX is now usable.
This patch (of 23):
Exposing the pud/pgd levels of the page tables to walk_page_range() means
we may come across the exotic large mappings that come with large areas of
contiguous memory (such as the kernel's linear map).
For architectures that don't provide all p?d_leaf() macros, provide
generic do-nothing defaults that are suitable where there cannot be leaf
pages at that level. Further patches will add implementations for
individual architectures.
The name p?d_leaf() is chosen to minimize confusion with existing uses
of "large" pages and "huge" pages, which do not necessarily mean that the
entry is a leaf (for example it may be a set of contiguous entries that
only take 1 TLB slot). For the purpose of walking the page tables we
don't need to know how it will be represented in the TLB, but we do need
to know for sure if it is a leaf of the tree.
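The generic defaults are presumably just constant-false macros along these
lines (a sketch; an architecture where leaf entries can exist at a level
overrides the macro with a real test):

/* "0" means "an entry at this level is never a leaf" */
#ifndef pgd_leaf
#define pgd_leaf(x)	0
#endif
#ifndef p4d_leaf
#define p4d_leaf(x)	0
#endif
#ifndef pud_leaf
#define pud_leaf(x)	0
#endif
#ifndef pmd_leaf
#define pmd_leaf(x)	0
#endif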
Link: http://lkml.kernel.org/r/20191218162402.45610-2-steven.price@arm.com
Change-Id: Id0b9ea4602b9694a0354f056992ce74ab91d3758
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 93fab1b22ef7c4abcbc760ce4432762b02e7f3d1 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:56:02 +0000 (17:56 -0800)]
mm: vmscan: enforce inactive:active ratio at the reclaim root
We split the LRU lists into an inactive and an active part to maximize
workingset protection while allowing just enough inactive cache space to
facilitate readahead and writeback for one-off file accesses (e.g. a
linear scan through a file, or logging); or just enough inactive anon to
maintain recent reference information when reclaim needs to swap.
With cgroups and their nested LRU lists, we currently don't do this
correctly. While recursive cgroup reclaim establishes a relative LRU
order among the pages of all involved cgroups, inactive:active size
decisions are done on a per-cgroup level. As a result, we'll reclaim a
cgroup's workingset when it doesn't have cold pages, even when one of its
siblings has plenty of it that should be reclaimed first.
For example: workload A has 50M worth of hot cache but doesn't do any
one-off file accesses; meanwhile, parallel workload B scans files and
rarely accesses the same page twice.
If these workloads were to run in an uncgrouped system, A would be
protected from the high rate of cache faults from B. But if they were put
in parallel cgroups for memory accounting purposes, B's fast cache fault
rate would push out the hot cache pages of A. This is unexpected and
undesirable - the "scan resistance" of the page cache is broken.
This patch moves inactive:active size balancing decisions to the root of
reclaim - the same level where the LRU order is established.
It does this by looking at the recursive size of the inactive and the
active file sets of the cgroup subtree at the beginning of the reclaim
cycle, and then making a decision - scan or skip active pages - that
applies throughout the entire run and to every cgroup involved.
With that in place, in the test above, the VM will recognize that there
are plenty of inactive pages in the combined cache set of workloads A and
B and prefer the one-off cache in B over the hot pages in A. The scan
resistance of the cache is restored.
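Conceptually, the change looks something like the sketch below: the verdict
is computed once at the reclaim root from recursive subtree sizes and
carried in scan_control, instead of being re-decided per cgroup (the names
may_deactivate and DEACTIVATE_FILE, and the ratio shown, are assumptions):

#define DEACTIVATE_ANON	0x1
#define DEACTIVATE_FILE	0x2

struct scan_control {
	unsigned int may_deactivate;	/* decided once per reclaim cycle */
	/* other members elided */
};

static void decide_deactivation(struct scan_control *sc,
				unsigned long inactive_file,
				unsigned long active_file)
{
	/* inactive_file/active_file are recursive subtree totals */
	if (inactive_file * 2 < active_file)	/* inactive list is low */
		sc->may_deactivate |= DEACTIVATE_FILE;
	else
		sc->may_deactivate &= ~DEACTIVATE_FILE;
}

/* every lruvec scanned in this run then just tests the flag: */
/*	if (sc->may_deactivate & DEACTIVATE_FILE)             */
/*		scan the active file list                     */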
[cai@lca.pw: fix some -Wenum-conversion warnings]
Link: http://lkml.kernel.org/r/1573848697-29262-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20191107205334.158354-4-hannes@cmpxchg.org
Change-Id: If5fb00a0b06f616e0c096e95a21513bfa8a58e1b
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit b91ac374346ba206cfd568bb0ab830af6b205cfd from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:59 +0000 (17:55 -0800)]
mm: vmscan: detect file thrashing at the reclaim root
We use refault information to determine whether the cache workingset is
stable or transitioning, and dynamically adjust the inactive:active file
LRU ratio so as to maximize protection from one-off cache during stable
periods, and minimize IO during transitions.
With cgroups and their nested LRU lists, we currently don't do this
correctly. While recursive cgroup reclaim establishes a relative LRU
order among the pages of all involved cgroups, refaults only affect the
local LRU order in the cgroup in which they are occurring. As a result,
cache transitions can take longer in a cgrouped system as the active pages
of sibling cgroups aren't challenged when they should be.
[ Right now, this is somewhat theoretical, because the siblings, under
continued regular reclaim pressure, should eventually run out of
inactive pages - and since inactive:active *size* balancing is also
done on a cgroup-local level, we will challenge the active pages
eventually in most cases. But the next patch will move that relative
size enforcement to the reclaim root as well, and then this patch
here will be necessary to propagate refault pressure to siblings. ]
This patch moves refault detection to the root of reclaim. Instead of
remembering the cgroup owner of an evicted page, remember the cgroup that
caused the reclaim to happen. When refaults later occur, they'll
correctly influence the cross-cgroup LRU order that reclaim follows.
I.e. if global reclaim kicked out pages in some subgroup A/B/C, the
refault of those pages will challenge the global LRU order, and not just
the local order down inside C.
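A simplified sketch of the mechanism: the shadow entry left behind on
eviction encodes the id of the reclaim root rather than that of the page's
own cgroup (the bit layout here is purely illustrative):

static void *pack_shadow(int reclaim_root_memcg_id, int nid,
			 unsigned long eviction)
{
	unsigned long entry = eviction;

	entry = (entry << 16) | (nid & 0xffff);
	entry = (entry << 16) | (reclaim_root_memcg_id & 0xffff);

	return (void *)((entry << 1) | 1);	/* tagged as shadow entry */
}

/* On refault, unpacking yields the reclaim root's lruvec, so refault
 * distance is measured - and activation decided - against the root's
 * LRU order instead of only the local cgroup's. */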
[hannes@cmpxchg.org: use page_memcg() instead of another lookup]
Link: http://lkml.kernel.org/r/20191115160722.GA309754@cmpxchg.org
Link: http://lkml.kernel.org/r/20191107205334.158354-3-hannes@cmpxchg.org
Change-Id: Ib68cb4395227a8f49e8cf0893208a9b3a415f7c5
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit b910718a948a9120d90faf632b33ed23c70e266a from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:56 +0000 (17:55 -0800)]
mm: vmscan: move file exhaustion detection to the node level
Patch series "mm: fix page aging across multiple cgroups".
When applications are put into unconfigured cgroups for memory accounting
purposes, the cgrouping itself should not change the behavior of the page
reclaim code. We expect the VM to reclaim the coldest pages in the
system. But right now the VM can reclaim hot pages in one cgroup while
there is eligible cold cache in others.
This is because one part of the reclaim algorithm isn't truly cgroup
hierarchy aware: the inactive/active list balancing. That is the part
that is supposed to protect hot cache data from one-off streaming IO.
The recursive cgroup reclaim scheme will scan and rotate the physical LRU
lists of each eligible cgroup at the same rate in a round-robin fashion,
thereby establishing a relative order among the pages of all those
cgroups. However, the inactive/active balancing decisions are made
locally within each cgroup, so when a cgroup is running low on cold pages,
its hot pages will get reclaimed - even when sibling cgroups have plenty
of cold cache eligible in the same reclaim run.
For example:
[root@ham ~]# head -n1 /proc/meminfo
MemTotal: 1016336 kB
[root@ham ~]# ./reclaimtest2.sh
Establishing 50M active files in cgroup A...
Hot pages cached: 12800/12800 workingset-a
Linearly scanning through 18G of file data in cgroup B:
real 0m4.269s
user 0m0.051s
sys 0m4.182s
Hot pages cached: 134/12800 workingset-a
The streaming IO in B, which doesn't benefit from caching at all, pushes
out most of the workingset in A.
Solution
This series fixes the problem by elevating inactive/active balancing
decisions to the toplevel of the reclaim run. This is either a cgroup
that hit its limit, or straight-up global reclaim if there is physical
memory pressure. From there, it takes a recursive view of the cgroup
subtree to decide whether page deactivation is necessary.
In the test above, the VM will then recognize that cgroup B has plenty of
eligible cold cache, and that the hot pages in A can be spared:
[root@ham ~]# ./reclaimtest2.sh
Establishing 50M active files in cgroup A...
Hot pages cached: 12800/12800 workingset-a
Linearly scanning through 18G of file data in cgroup B:
real 0m4.244s
user 0m0.064s
sys 0m4.177s
Hot pages cached: 12800/12800 workingset-a
Implementation
Whether active pages can be deactivated or not is influenced by two
factors: the inactive list dropping below a minimum size relative to the
active list, and the occurrence of refaults.
This patch series first moves refault detection to the reclaim root, then
enforces the minimum inactive size based on a recursive view of the cgroup
tree's LRUs.
History
Note that this actually never worked correctly in Linux cgroups. In the
past it worked for global reclaim and leaf limit reclaim only (we used to
have two physical LRU linkages per page), but it never worked for
intermediate limit reclaim over multiple leaf cgroups.
We're noticing this now because 1) we're putting everything into cgroups
for accounting, not just the things we want to control and 2) we're moving
away from leaf limits that invoke reclaim on individual cgroups, toward
large tree reclaim, triggered by high-level limits, or physical memory
pressure that is influenced by local protections such as memory.low and
memory.min instead.
This patch (of 3):
When file pages are lower than the watermark on a node, we try to force
scan anonymous pages to counteract the balancing algorithm's preference
for new file pages when they are likely thrashing. This is a node-level
decision, but it's currently made each time we look at an lruvec. This is
unnecessarily expensive and also a layering violation that makes the code
harder to understand.
Clean this up by making the check once per node and setting a flag in the
scan_control.
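A sketch of the shape of the fix (the flag name and the watermark helper
are assumptions):

struct scan_control {
	unsigned int file_is_tiny:1;	/* node file pages below watermark */
	/* other members elided */
};

/* done once per node in shrink_node(), not per lruvec: */
static void check_file_exhaustion(struct pglist_data *pgdat,
				  struct scan_control *sc)
{
	unsigned long file = node_page_state(pgdat, NR_ACTIVE_FILE) +
			     node_page_state(pgdat, NR_INACTIVE_FILE);

	sc->file_is_tiny = file < node_file_watermark(pgdat); /* hypothetical */
}

/* the per-lruvec scan logic then only consults sc->file_is_tiny */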
Link: http://lkml.kernel.org/r/20191107205334.158354-2-hannes@cmpxchg.org
Change-Id: Ia0cf370aad8c58e865fae6c0e8235e06b760aca1
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 53138cea7f398d2cdd0fa22adeec7e16093e1ebd from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:52 +0000 (17:55 -0800)]
mm: vmscan: harmonize writeback congestion tracking for nodes & memcgs
The current writeback congestion tracking has separate flags for kswapd
reclaim (node level) and cgroup limit reclaim (memcg-node level). This is
unnecessarily complicated: the lruvec is an existing abstraction layer for
that node-memcg intersection.
Introduce lruvec->flags and LRUVEC_CONGESTED. Then track that at the
reclaim root level, which is either the NUMA node for global reclaim, or
the cgroup-node intersection for cgroup reclaim.
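A sketch of the new tracking, following the names given above:

enum lruvec_flags {
	LRUVEC_CONGESTED,	/* reclaim root hit writeback congestion */
};

/* struct lruvec grows a flags word, e.g.: unsigned long flags; */

static void note_congestion(struct lruvec *target_lruvec, bool congested)
{
	/* target_lruvec is the reclaim root: the NUMA node's lruvec for
	 * global reclaim, or the cgroup-node lruvec for cgroup reclaim */
	if (congested)
		set_bit(LRUVEC_CONGESTED, &target_lruvec->flags);
	else
		clear_bit(LRUVEC_CONGESTED, &target_lruvec->flags);
}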
Link: http://lkml.kernel.org/r/20191022144803.302233-9-hannes@cmpxchg.org
Change-Id: Iaa10ac1d8ee7de5c6f2563ac5ff36ae9bae876a2
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 1b05117df78e035afb5f66ef50bf8750d976ef08 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:49 +0000 (17:55 -0800)]
mm: vmscan: split shrink_node() into node part and memcgs part
This function is getting long and unwieldy; split out the memcg bits.
The updated shrink_node() handles the generic (node) reclaim aspects:
- global vmpressure notifications
- writeback and congestion throttling
- reclaim/compaction management
- kswapd giving up on unreclaimable nodes
It then calls a new shrink_node_memcgs() which handles cgroup specifics:
- the cgroup tree traversal
- memory.low considerations
- per-cgroup slab shrinking callbacks
- per-cgroup vmpressure notifications
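Structurally, the split looks like this sketch (bodies elided):

static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
{
	/* cgroup tree traversal, memory.low checks, per-cgroup slab
	 * shrinking callbacks, per-cgroup vmpressure notifications */
}

static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
{
	/* global vmpressure, writeback and congestion throttling,
	 * reclaim/compaction management, kswapd giving up on
	 * unreclaimable nodes */
	shrink_node_memcgs(pgdat, sc);
}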
[hannes@cmpxchg.org: rename "root" to "target_memcg", per Roman]
Link: http://lkml.kernel.org/r/20191025143640.GA386981@cmpxchg.org
Link: http://lkml.kernel.org/r/20191022144803.302233-8-hannes@cmpxchg.org
Change-Id: Ibef223d4f49740dc2ab2e86f2b7c1e529a9643e2
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 0f6a5cff43d3bcd6aa54c9af267737249d02aa21 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:46 +0000 (17:55 -0800)]
mm: vmscan: turn shrink_node_memcg() into shrink_lruvec()
An lruvec holds LRU pages owned by a certain NUMA node and cgroup.
Instead of awkwardly passing around a combination of a pgdat and a memcg
pointer, pass down the lruvec as soon as we can look it up.
Nested callers that need to access node or cgroup properties can look them
up if necessary, but there are only a few cases.
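The interface change, sketched:

/* before: pass the pieces, look the lruvec up inside */
static void shrink_node_memcg(struct pglist_data *pgdat,
			      struct mem_cgroup *memcg,
			      struct scan_control *sc);

/* after: resolve the lruvec once, pass it down */
static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc);

/* typical caller:                                               */
/*	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); */
/*	shrink_lruvec(lruvec, sc);                                */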
Link: http://lkml.kernel.org/r/20191022144803.302233-7-hannes@cmpxchg.org
Change-Id: Ice444a0c2c1812301d836c9a0d38532988362ea3
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit afaf07a65ddbdd70871cc3b81463f2a8f3884b6f from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:43 +0000 (17:55 -0800)]
mm: vmscan: replace shrink_node() loop with a retry jump
Most of the function body is inside a loop, which imposes an additional
indentation and scoping level that makes the code a bit hard to follow and
modify.
The looping only happens in case of reclaim-compaction, which isn't the
common case. So rather than adding yet another function level to the
reclaim path and have every reclaim invocation go through a level that
only exists for one specific cornercase, use a retry goto.
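In outline (the retry condition is a stand-in name):

static bool should_retry_compaction(void);	/* stand-in for the real check */

/* before: the whole body lives inside a loop */
static void shrink_node_old(void)
{
	do {
		/* ... entire reclaim body ... */
	} while (should_retry_compaction());
}

/* after: a flat body with a retry label for the rare case */
static void shrink_node_new(void)
{
again:
	/* ... entire reclaim body ... */
	if (should_retry_compaction())
		goto again;
}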
Link: http://lkml.kernel.org/r/20191022144803.302233-6-hannes@cmpxchg.org
Change-Id: Iab1ead6005f957bd8bbb3fcfa48e63d0ae3592a3
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit d2af339706be318dadcbe14c8935426ff401d7b1 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:40 +0000 (17:55 -0800)]
mm: vmscan: naming fixes: global_reclaim() and sane_reclaim()
Seven years after introducing the global_reclaim() function, I still have
to double take when reading a callsite. I don't know how others do it;
this is a terrible name.
Invert the meaning and rename it to cgroup_reclaim().
[ After all, "global reclaim" is just regular reclaim invoked from the
page allocator. It's reclaim on behalf of a cgroup limit that is a
special case of reclaim, and should be explicit - not the reverse. ]
sane_reclaim() isn't very descriptive either: it tests whether we can use
the regular writeback throttling - available during regular page reclaim
or cgroup2 limit reclaim - or need to use the broken
wait_on_page_writeback() method. Use "writeback_throttling_sane()".
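The renamed helpers plausibly look like this (a sketch, assuming
sc->target_mem_cgroup is set only for cgroup limit reclaim):

static bool cgroup_reclaim(struct scan_control *sc)
{
	return sc->target_mem_cgroup;	/* NULL for plain allocator reclaim */
}

static bool writeback_throttling_sane(struct scan_control *sc)
{
	if (!cgroup_reclaim(sc))
		return true;		/* regular page reclaim */
#ifdef CONFIG_CGROUP_WRITEBACK
	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
		return true;		/* cgroup2 limit reclaim */
#endif
	return false;	/* fall back to wait_on_page_writeback() */
}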
Link: http://lkml.kernel.org/r/20191022144803.302233-5-hannes@cmpxchg.org
Change-Id: Ie173107736712471293c47ee9692c3a98b4ca7da
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit b5ead35e7e1d3434ce436dfcb2af32820ce54589 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:37 +0000 (17:55 -0800)]
mm: vmscan: move inactive_list_is_low() swap check to the caller
inactive_list_is_low() should be about one thing: checking the ratio
between the inactive and active lists. Kitchensink checks like the one
for swap space make the function hard to use and its callsites hard to
modify.
Luckily, most callers already have an understanding of the swap
situation, so it's easy to clean up.
get_scan_count() has its own, memcg-aware swap check, and doesn't even get
to the inactive_list_is_low() check on the anon list when there is no swap
space available.
shrink_list() is called on the results of get_scan_count(), so that check
is redundant too.
age_active_anon() has its own totalswap_pages check right before it checks
the list proportions.
The shrink_node_memcg() site is the only one that doesn't do its own swap
check. Add it there.
Then delete the swap check from inactive_list_is_low().
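The added caller-side check is presumably along these lines (exact
arguments assumed):

/* in shrink_node_memcg(): deactivate anon only if swap is available */
if (total_swap_pages &&
    inactive_list_is_low(lruvec, false, sc, true))
	shrink_active_list(SWAP_CLUSTER_MAX, lruvec, sc, LRU_ACTIVE_ANON);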
Link: http://lkml.kernel.org/r/20191022144803.302233-4-hannes@cmpxchg.org
Change-Id: I16e03de6f88291d52fa3d6be5a999d9a4bd4f450
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit a108629149cc63cfb6fd446184e3e578e04bcfd1 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:34 +0000 (17:55 -0800)]
mm: clean up and clarify lruvec lookup procedure
There is a per-memcg lruvec and a NUMA node lruvec. Which one is being
used is somewhat confusing right now, and it's easy to make mistakes -
especially when it comes to global reclaim.
How it works: when memory cgroups are enabled, we always use the
root_mem_cgroup's per-node lruvecs. When memory cgroups are not compiled
in or disabled at runtime, we use pgdat->lruvec.
Document that in a comment.
Due to the way the reclaim code is generalized, all lookups use the
mem_cgroup_lruvec() helper function, and nobody should have to find the
right lruvec manually right now. But to avoid future mistakes, rename the
pgdat->lruvec member to pgdat->__lruvec and delete the convenience wrapper
that suggests it's a commonly accessed member.
While in this area, swap the mem_cgroup_lruvec() argument order. The name
suggests a memcg operation, yet it takes a pgdat first and a memcg second.
I have to double take every time I call this. Fix that.
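A sketch of the lookup after the change (the accessor into the memcg's
per-node data is a stand-in):

/* memcg first, pgdat second - reads like a memcg operation:     */
/*	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); */

static inline struct lruvec *lookup_lruvec(struct mem_cgroup *memcg,
					   struct pglist_data *pgdat)
{
	if (mem_cgroup_disabled() || !memcg)
		return &pgdat->__lruvec; /* renamed to discourage direct use */

	/* otherwise: the memcg's per-node lruvec for this pgdat */
	return memcg_nodeinfo_lruvec(memcg, pgdat); /* stand-in accessor */
}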
Link: http://lkml.kernel.org/r/20191022144803.302233-3-hannes@cmpxchg.org
Change-Id: Ibe111f47294a8067dbc57683a84583a33477847b
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 2230dc207a325970d54c3ca1efe896a57ea82499 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Johannes Weiner [Sun, 1 Dec 2019 01:55:31 +0000 (17:55 -0800)]
mm: vmscan: simplify lruvec_lru_size()
Patch series "mm: vmscan: cgroup-related cleanups".
Here are 8 patches that clean up the reclaim code's interaction with
cgroups a bit. They're not supposed to change any behavior, just make
the implementation easier to understand and work with.
This patch (of 8):
This function currently takes the node or lruvec size and subtracts the
zones that are excluded by the classzone index of the allocation. It uses
four different types of counters to do this.
Just add up the eligible zones.
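A sketch of the simplified computation (the per-zone accessor is a
stand-in):

static unsigned long lruvec_lru_size_sketch(struct lruvec *lruvec,
					    enum lru_list lru, int zone_idx)
{
	unsigned long size = 0;
	int zid;

	/* sum the LRU pages of every zone eligible under zone_idx */
	for (zid = 0; zid <= zone_idx; zid++)
		size += lru_zone_size(lruvec, lru, zid); /* stand-in */

	return size;
}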
[cai@lca.pw: fix an undefined behavior for zone id]
Link: http://lkml.kernel.org/r/20191108204407.1435-1-cai@lca.pw
[akpm@linux-foundation.org: deal with the MAX_NR_ZONES special case. per Qian Cai]
Link: http://lkml.kernel.org/r/64E60F6F-7582-427B-8DD5-EF97B1656F5A@lca.pw
Link: http://lkml.kernel.org/r/20191022144803.302233-2-hannes@cmpxchg.org
Change-Id: I2d00ef6cd83336d904ac58cea129404c6ba5451c
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit 78a3ee9c29c881feeb716fe70218c4fd2a7c342c from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Andrey Ryabinin [Sun, 1 Dec 2019 01:55:24 +0000 (17:55 -0800)]
mm/vmscan: remove unused lru_pages argument
Since 9092c71bb724 ("mm: use sc->priority for slab shrink targets") the
argument 'unsigned long *lru_pages' has been passed around with no
purpose. Remove it.
Link: http://lkml.kernel.org/r/20190228083329.31892-4-aryabinin@virtuozzo.com
Change-Id: I8a61dd9dab063e9db2d58aafe71f5a6238a3a525
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backport of the commit c42875e38b6fb9546f665a397b66e3a4cc2e5041 from mainline]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Chanwoo Choi [Wed, 24 Mar 2021 03:45:52 +0000 (12:45 +0900)]
ARM64: tizen_bcm2711_defconfig: Enable USB_PRINTER config
Enable USB_PRINTER config to support USB printer devices.
Change-Id: I2068a283928c8f3c85d5f31b25a13300cdc52783
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Chanwoo Choi [Wed, 24 Mar 2021 03:44:24 +0000 (12:44 +0900)]
ARM: tizen_bcm2711_defconfig: Enable USB_PRINTER config
Enable USB_PRINTER config to support USB printer devices.
Change-Id: Ic63797d93520e4bc50175c53c7e7328b55a0f724
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Jaehoon Chung [Mon, 22 Mar 2021 08:06:33 +0000 (17:06 +0900)]
ARM64: tizen_bcm2711_defconfig: Enable CONFIG_CFG80211_CRDA_SUPPORT
Enable CONFIG_CFG80211_CRDA_SUPPORT.
Change-Id: If60cab97a05ca8bc01633255ede56279fcc599a8
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Jaehoon Chung [Mon, 22 Mar 2021 07:56:08 +0000 (16:56 +0900)]
Revert "ARM: configs: tizen_bcm2711_defconfig: Disable CONFIG_CFG80211_CRDA_SUPPORT"
This reverts commit 6485924bd76b70c8ed6ba334ef9b5045f4a3686d.
- Enable CONFIG_CFG80211_CRDA_SUPPORT to use country code.
Change-Id: I951b32c86611b6240991e9bb8b362c98bb56b72c
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Seung-Woo Kim [Thu, 25 Feb 2021 08:17:12 +0000 (17:17 +0900)]
media: uvcvideo: Add a probe quirk to Jieli Technology USB PHY 2.0 (1224:2a25)
Repeated video requests on the Jieli Technology USB PHY 2.0 (1224:2a25)
device cause a data stall with the below error until reconnection:
uvcvideo: Failed to set UVC probe control : -32 (exp. 26).
To resolve the wrong state, add PROBE quirk bits.
Change-Id: I5efba6ac26d5eea70e2227f9ff9801dd5d8d4790
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Jaehoon Chung [Wed, 17 Feb 2021 01:02:48 +0000 (10:02 +0900)]
ARM/ARM64: defconfig: disable SECURITY_SMACK_NETFILTER config
Disable SECURITY_SMACK_NETFILTER configuration.
Change-Id: Id95847392ac2bf53b92dae5ee2c0d4b5c1e41ece
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Seung-Woo Kim [Mon, 25 Jan 2021 07:24:54 +0000 (16:24 +0900)]
ARM: tizen_bcm2711_defconfig: Enable OVERLAY_FS
Enable CONFIG_OVERLAY_FS for tizen application space.
Change-Id: I46b011effa26e1f7ef9acf5f18d52bc7b54452c5
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Seung-Woo Kim [Mon, 25 Jan 2021 07:24:03 +0000 (16:24 +0900)]
ARM64: tizen_bcm2711_defconfig: Enable OVERLAY_FS
Enable CONFIG_OVERLAY_FS for tizen application space.
Change-Id: I1b2aafaeea24b6ef6b07d3c57d542dce13c53b8f
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Dongwoo Lee [Fri, 22 Jan 2021 03:40:18 +0000 (12:40 +0900)]
usb: gadget: f_fs: Fix use-after-free for unbind with remaining io
If USB stalls, submitted io can remain, and unbinding f_fs while that
io is still outstanding results in a use-after-free.
Fix the use-after-free by checking the endpoint after the wait.
This fixes the following KASAN warning:
BUG: KASAN: use-after-free in ffs_epfile_io+0x654/0xb58
Read of size 4 at addr ffffffc0a44e65dc by task mtp-responder/5117
...
[<ffffff900a037794>] ffs_epfile_io+0x654/0xb58
[<ffffff900a03818c>] ffs_epfile_read_iter+0x1ac/0x3e0
...
Allocated by task 3869:
...
__kmalloc+0x234/0x760
_ffs_func_bind+0x264/0x7c8
ffs_func_bind+0xe8/0x650
usb_add_function+0x13c/0x378
...
Freed by task 3869:
...
kfree+0xa4/0x750
ffs_func_unbind+0x150/0x248
purge_configs_funcs+0x1a0/0x310
...
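The shape of the fix is presumably something like this sketch: after the
wait, re-validate the endpoint before touching it, since unbind may have
freed the underlying data while we slept (field names assumed):

/* in ffs_epfile_io(), after sleeping for the transfer to finish: */
ret = wait_for_completion_interruptible(&io_data->done);
if (ret < 0)
	goto error;

if (epfile->ep != ep) {
	/* f_fs was unbound while we slept; the endpoint is gone */
	ret = -ESHUTDOWN;
	goto error;
}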
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
[dwoo08.lee: cherry-picked from linux-amlogic commit 5dd3ffecd46f to prevent use-after-free when f_fs is unbound before all requests are over]
Signed-off-by: Dongwoo Lee <dwoo08.lee@samsung.com>
Change-Id: Idf2391c53ca0f90fc9484d725304b88fc57fa8a6
Łukasz Stelmach [Tue, 22 Dec 2020 11:56:00 +0000 (12:56 +0100)]
kdbus: build and package kdbus-tests
Change-Id: Ie5a32cb744a28b242aac1f43e879306a43795bb5
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Konrad Lipinski [Wed, 11 Sep 2019 13:42:10 +0000 (15:42 +0200)]
kdbus: test suite changed to common format
This commit adapts the kdbus test suite for use with
dbus-integration-tests.
Change-Id: Ifee21253f4e3c732b27517a0c566d4b9a569d5af
Signed-off-by: Adrian Szyndela <adrian.s@samsung.com>
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Łukasz Stelmach [Thu, 14 Jan 2021 06:02:02 +0000 (15:02 +0900)]
ARM/ARM64: tizen_bcm2711_defconfig: Enable KDBUS
Change-Id: I3f40022fa9bbb04a0cbcca1fde45a569ca2c4538
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Łukasz Stelmach [Mon, 21 Dec 2020 15:34:08 +0000 (16:34 +0100)]
kdbus: porting to 5.4
The following changes were made to adapt kdbus driver to 5.4 kernel
- use KERNEL_DS instead of get_ds()
- remove ITER_KVEC flag
- remove 'type' argument from access_ok()
- use memfd_fcntl() instead of shmem_get_seals()
- use uapi/linux/mount.h
Fixes: 736706bee329 ("get rid of legacy 'get_ds()' function")
Fixes: aa563d7bca6e ("iov_iter: Separate type from direction and use accessor functions")
Fixes: 96d4f267e40f ("Remove 'type' argument from access_ok() function")
Fixes: 5aadc431a593 ("shmem: rename functions that are memfd-related")
Fixes: 5d752600a8c3 ("mm: restructure memfd code")
Fixes: e262e32d6bde ("vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled")
Change-Id: I8d2b3db1c83bb21114554ba9eb38e5e439f5c141
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Łukasz Stelmach [Mon, 21 Dec 2020 11:40:27 +0000 (12:40 +0100)]
kdbus: Revert "fs: unexport poll_schedule_timeout"
This reverts commit 8f546ae1fc5ce8396827d4868c7eee1f1cc6947a.
Change-Id: I5429471eeb092c55a50e37c0b642d50d69daebc7
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Adrian Szyndela [Wed, 11 Sep 2019 13:34:33 +0000 (15:34 +0200)]
kdbus: porting to 4.14
This commit makes the kdbus driver work with kernel 4.14.
It also enables compilation of the driver if enabled in the config.
The list of changes needed to make it compile with kernel 4.14:
- Add missing includes for <linux/cred.h>
- put inode_lock()/inode_unlock() in place of explicit mutex locking
- replace CURRENT_TIME with proper call to current_time()
- replace PAGE_CACHE_* with PAGE_*
- replace GFP_TEMPORARY with GFP_KERNEL
- use kvecs for kernel memory, iovec for user memory instead of only iovec
for both
- fix usage of task_cgroup_path()
- replace GROUP_AT usage with 'gid' field dereference
- add 0 as an argument to init_name_hash
- add 0 as flags as an argument to vfs_iter_write
- replace __vfs_read with kernel_read to allow compilation into module
Change-Id: I3dd066ab531d0d3f7082556e4d38381961998015
Signed-off-by: Adrian Szyndela <adrian.s@samsung.com>
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Adrian Szyndela [Fri, 9 Sep 2016 11:35:49 +0000 (13:35 +0200)]
kdbus: the driver, original and non-working
[based on commit 216823ac83c0ab89348e2ed6f66179f53626586e]
Introduce the kdbus driver again. This driver worked previously
on kernel 4.1. This is the source code taken from the working driver.
It is non-working and disabled. It is a base for porting.
The documentation is moved from Documentation to ipc/kdbus/Documentation.
The references to kdbus source code are commented out or removed in Makefiles.
Original authors of the files are those from commit 216823ac83c0ab8934.
Cherry-picked from 4.14 commit 970070c4f68f113284f86cf7b6fbd23d6b35b511.
Change-Id: Id60af5faf794fc4ae7122976621076f1021f6c38
Signed-off-by: Adrian Szyndela <adrian.s@samsung.com>
Signed-off-by: Hyotaek Shim <hyotaek.shim@samsung.com>
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Łukasz Stelmach [Mon, 4 Jan 2021 19:23:41 +0000 (20:23 +0100)]
spec: set CONFIG_LOCALVERSION
Set CONFIG_LOCALVERSION instead of EXTRAVERSION in Makefile to carry
information about the platform variant.
Change-Id: If1650692162832ea86988e4adb610ce0148edde3
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Łukasz Stelmach [Mon, 4 Jan 2021 13:41:58 +0000 (14:41 +0100)]
spec: support gbs(1) incremental builds
gbs(1) enables incremental builds that are significantly faster; however,
spec files need some adjustments.
+ You can't use -n <name> with %setup
+ There is no point in making mrproper
+ use rsync(1) instead of cp(1)+find(1) to copy devel files
Other changes are:
+ use make headers_install to install headers in the buildroot
+ add _smp_mflags to make(1) to speed things up
Change-Id: Ia14fbe7acab8b83683f9e07757b4f62a7657b2a5
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Łukasz Stelmach [Mon, 21 Dec 2020 12:46:37 +0000 (13:46 +0100)]
script: adjust ccache support
Change-Id: I735295fc21cf7caa29b84548a8f21449ed97800e
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Łukasz Stelmach [Wed, 13 Jan 2021 10:24:40 +0000 (11:24 +0100)]
script: increase the number of concurrent compilers
It is safe to run as many compilation processes as twice the number of
CPUs. This is the default setting of the RPM build process in Tizen.
Change-Id: Id68d9d31bc54da11c8732689645408241a517887
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Łukasz Stelmach [Wed, 13 Jan 2021 10:12:04 +0000 (11:12 +0100)]
script: use $NCPUS for arm64 builds too
Fixes: b371be68152e ("script: Fix to use NCPUS dynamically")
Change-Id: I32b5c08f4b00ca24032a2f9355d308f3df825fbd
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Seung-Woo Kim [Thu, 7 Jan 2021 09:09:20 +0000 (18:09 +0900)]
ARM: mm: Free memblock from free_initrd_mem()
Even after free_initrd_mem(), the memblock for initrd remains. Free the
memblock for initrd from free_initrd_mem().
Reported-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
[sw0312.kim: port mainline posted patch to 5.4.y]
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Change-Id: I42a7461558868cdfdc559959112c81bc4c8b10c5
Sung-hun Kim [Thu, 7 Jan 2021 04:01:01 +0000 (13:01 +0900)]
ARM: dts: bcm2711-rpi-4-b: Adjust CMA size due to the OOM issue
Since a 32-bit kernel has a limitation in the lowmem size, a big CMA size
severely affects the memory available to the kernel. Because of that, some
test cases cause OOM kills due to the exhaustion of kernel-available
memory, as reported by Jaehoon Chung (jh80.chung@samsung.com).
To handle this, adjust the CMA size to 400MB, which still meets the
requirement for UHD resolution.
Change-Id: I189699d4ce5f2866c01edf68c8138e9278679188
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Charan Teja Reddy [Fri, 21 Aug 2020 00:42:27 +0000 (17:42 -0700)]
mm, page_alloc: fix core hung in free_pcppages_bulk()
commit 88e8ac11d2ea3acc003cf01bb5a38c8aa76c3cfd upstream.
The following race is observed with the repeated online, offline and a
delay between two successive online of memory blocks of movable zone.
P1: Online the first memory block in the movable zone. The pcp struct
    values are initialized to default values, i.e., pcp->high = 0 and
    pcp->batch = 1.
P2: Allocate the pages from the movable zone.
P1: Try to online the second memory block in the movable zone; it has
    entered online_pages() but has yet to call zone_pcp_update().
P2: This process has entered the exit path, so it tries to release the
    order-0 pages to the pcp lists through free_unref_page_commit().
    As pcp->high = 0 and pcp->count = 1, it proceeds to call
    free_pcppages_bulk().
P1: Update the pcp values; the new pcp values are, say, pcp->high = 378
    and pcp->batch = 63.
P2: Read the pcp's batch value using READ_ONCE() and pass it to
    free_pcppages_bulk(); the pcp values passed here are batch = 63,
    count = 1. Since the number of pages in the pcp lists is less than
    ->batch, it will get stuck in the while (list_empty(list)) loop with
    interrupts disabled, thus hanging a core.
Avoid this by ensuring free_pcppages_bulk() is called with a proper count
of pcp list pages.
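A sketch of that clamp inside free_pcppages_bulk():

static void free_pcppages_bulk(struct zone *zone, int count,
			       struct per_cpu_pages *pcp)
{
	/*
	 * Never ask for more pages than the pcp lists hold, otherwise
	 * the while (list_empty(list)) draining loop below spins
	 * forever with interrupts disabled.
	 */
	count = min(pcp->count, count);

	/* ... drain 'count' order-0 pages from the pcp lists ... */
}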
The mentioned race is somewhat easily reproducible without [1] because
pcps are not updated for the first memory block online, and thus there
is a wide enough race window for P2 between alloc+free and the pcp
struct values update through onlining of the second memory block.
With [1], the race still exists but it is very narrow, as we update the
pcp struct values for the first memory block online itself.
This is not limited to the movable zone, it could also happen in cases
with the normal zone (e.g., hotplug to a node that only has DMA memory, or
no other memory yet).
[1]: https://patchwork.kernel.org/patch/11696389/
Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into one-list-per-migrate-type")
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: <stable@vger.kernel.org> [2.6+]
Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[sw0312.kim: cherry-pick linux-5.4.y commit 0cfb9320d00c to resolve too large CMA issue]
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Change-Id: I55f884f9950a35c8d75affebe426d176cd91702b
Doug Berger [Fri, 21 Aug 2020 00:42:24 +0000 (17:42 -0700)]
mm: include CMA pages in lowmem_reserve at boot
commit e08d3fdfe2dafa0331843f70ce1ff6c1c4900bf4 upstream.
The lowmem_reserve arrays provide a means of applying pressure against
allocations from lower zones that were targeted at higher zones. Its
values are a function of the number of pages managed by higher zones and
are assigned by a call to the setup_per_zone_lowmem_reserve() function.
The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses of the
/proc/sys/vm/lowmem_reserve_ratio sysctl file.
The function init_per_zone_wmark_min() was moved up from a module_init to
a core_initcall to resolve a sequencing issue with khugepaged.
Unfortunately this created a sequencing issue with CMA page accounting.
The CMA pages are added to the managed page count of a zone when
cma_init_reserved_areas() is called at boot also as a core_initcall. This
makes it uncertain whether the CMA pages will be added to the managed page
counts of their zones before or after the call to
init_per_zone_wmark_min() as it becomes dependent on link order. With the
current link order the pages are added to the managed count after the
lowmem_reserve arrays are initialized at boot.
This means the lowmem_reserve values at boot may be lower than the values
used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
ratio values are unchanged.
In many cases the difference is not significant, but for example
an ARM platform with 1GB of memory and the following memory layout
cma: Reserved 256 MiB at 0x0000000030000000
Zone ranges:
DMA [mem 0x0000000000000000-0x000000002fffffff]
Normal empty
HighMem [mem 0x0000000030000000-0x000000003fffffff]
would result in 0 lowmem_reserve for the DMA zone. This would allow
userspace to deplete the DMA zone easily.
Funnily enough
$ cat /proc/sys/vm/lowmem_reserve_ratio
would fix up the situation because as a side effect it forces
setup_per_zone_lowmem_reserve.
This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
have the chance to be properly accounted in their zone(s) and allowing
the lowmem_reserve arrays to receive consistent values.
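The ordering change itself is a one-liner, sketched here:

/* before: same initcall level as cma_init_reserved_areas(), so the
 * relative order is decided by link order */
/*	core_initcall(init_per_zone_wmark_min)	*/

/* after: guaranteed to run after every core_initcall, i.e. after the
 * CMA pages have been added to the zones' managed counts */
postcore_initcall(init_per_zone_wmark_min)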
Fixes: bc22af74f271 ("mm: update min_free_kbytes from khugepaged after core initialization")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[sw0312.kim: cherry-pick linux-5.4.y commit 5663159e2930 to resolve too large CMA issue]
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Change-Id: I59cbe93ed28f09d08dced520667b32e2fb030c21
Hoegeun Kwon [Mon, 21 Dec 2020 04:54:55 +0000 (13:54 +0900)]
ARM: dts: bcm2711-rpi-4-b: Fix CMA size to 512M
Increase the CMA size to 512M to use 4K GL on the Tizen platform.
When using the RPI4 1GB model with 4K UHD, reclaim occurs a lot and it is slow.
Change-Id: Iac60a9fa0d5f76085e60cbf3db9a90585597de57
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Greg Kroah-Hartman [Tue, 7 Jul 2020 12:42:38 +0000 (14:42 +0200)]
Revert "ALSA: usb-audio: Improve frames size computation"
This reverts commit aba41867dd66939d336fdf604e4d73b805d8039f which is
commit f0bd62b64016508938df9babe47f65c2c727d25c upstream.
It causes a number of reported issues and a fix for it has not hit
Linus's tree yet. Revert this to resolve those problems.
Cc: Alexander Tsoy <alexander@tsoy.me>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Hans de Goede <jwrdegoede@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[sw0312.kim: cherry-pick linux-5.4.y commit e0ed5a36fb3a to fix specific usb-audio distortion issue]
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Change-Id: I9e7ca39dddc28d8c29bf0488419c1fbdbddfb61a
Hoegeun Kwon [Mon, 14 Dec 2020 14:24:34 +0000 (23:24 +0900)]
script: Fix to use NCPUS dynamically
It uses NCPUS dynamically.
Change-Id: Ib7b1229f387b9c91f4eb2b29cc8f3bfe207fcaaa
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@gmail.com>
Dave Stevenson [Thu, 13 Aug 2020 16:04:53 +0000 (17:04 +0100)]
staging: vc04_services: codec: Fix component enable/disable
start_streaming enabled the VPU component if ctx->component_enabled
was not set.
stop_streaming disabled the VPU component if both ports were
disabled. It didn't clear ctx->component_enabled.
If seeking, this meant that the component never got re-enabled,
and buffers never got processed afterwards.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
[sw0312.kim: cherry-pick rpi-5.4.y commit to fix video decoding seek issue]
Ref: https://github.com/raspberrypi/linux/pull/3790
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Change-Id: Ibf27e3ed2dc3d98d64a78b10878d6339d77f8a33
Dave Stevenson [Thu, 13 Aug 2020 16:01:27 +0000 (17:01 +0100)]
staging: vc04_service: codec: Allow start_streaming to update the buffernum
start_streaming passes a count of how many buffers have been queued
to videobuf2.
Allow this value to update the number of buffers the VPU allocates
on a port to avoid buffer recycling issues.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
[sw0312.kim: cherry-pick rpi-5.4.y commit to fix video decoding seek issue]
Ref: https://github.com/raspberrypi/linux/pull/3790
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Change-Id: Iaf37f57df95789cc8987b268feab1f0fa7377849
Dave Stevenson [Thu, 13 Aug 2020 15:58:18 +0000 (16:58 +0100)]
staging: vc04_services: codec: Fix incorrect buffer cleanup
The allocated input and output buffers are initialised in
buf_init and should only be cleared up in buf_cleanup.
stop_streaming was (incorrectly) cleaning up the buffers to
avoid an issue in videobuf2 that had been fixed by the orphaned
buffer support.
Remove the erroneous cleanup.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
[sw0312.kim: cherry-pick rpi-5.4.y commit to fix video decoding seek issue]
Ref: https://github.com/raspberrypi/linux/pull/3790
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Change-Id: I38e3a257a941753b02e448d6d5dd1c8486b648f1
Hoegeun Kwon [Mon, 9 Nov 2020 07:44:46 +0000 (16:44 +0900)]
ARM: tizen_bcm2711_defconfig: Fix console loglevel for logo display
Modify CONSOLE_LOGLEVEL_QUIET defconfig to 3 for logo display.
Change-Id: Idae234c40304fbe8fd5f02941e53505a6bd189fa
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Hoegeun Kwon [Mon, 9 Nov 2020 07:03:59 +0000 (16:03 +0900)]
arm64: tizen_bcm2711_defconfig: Fix console loglevel for logo display
Modify CONSOLE_LOGLEVEL_QUIET defconfig to 3 for logo display.
Change-Id: I5112d85dedd1e116a75af0d07c09d77182ff4f5a
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Hoegeun Kwon [Tue, 3 Nov 2020 08:09:20 +0000 (17:09 +0900)]
rpi4: boot: config: Enable hdmi_force_hotplug
There is a problem where booting stops when booting without HDMI
connected. To fix it, we need to enable hdmi_force_hotplug.
Change-Id: I44e960a04f75c999ab69a75ad72d67311f8b70c1
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Maxime Ripard [Tue, 3 Nov 2020 08:09:17 +0000 (17:09 +0900)]
drm/vc4: kms: Don't disable the muxing of an active CRTC
The current HVS muxing code will consider the CRTCs in a given state to
setup their muxing in the HVS, and disable the other CRTCs muxes.
However, it's valid to only update a single CRTC with a state, and in this
situation we would mux out a CRTC that was enabled but left untouched by
the new state.
Fix this by setting a flag on the CRTC state when the muxing has been
changed, and only change the muxing configuration when that flag is there.
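A sketch of the idea (the flag name is an assumption):

struct vc4_crtc_state {
	struct drm_crtc_state base;
	bool update_muxing;	/* set when atomic_check recomputed the mux */
};

/* commit then leaves untouched CRTCs' muxes alone: */
/*	if (!vc4_state->update_muxing)	*/
/*		continue;		*/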
Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix the page flip issue of dual HDMI and to enable the force hotplug configuration.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: I5c9fb79c9ad07caa8fe77ea724f35c955df6edcf
Maxime Ripard [Tue, 3 Nov 2020 08:09:14 +0000 (17:09 +0900)]
drm/vc4: kms: Store the unassigned channel list in the state
If a CRTC is enabled but not active, and we then do a page flip on
another CRTC, drm_atomic_get_crtc_state will bring the first CRTC's
state into the global state, and will make us wait for its vblank as
well, even though that might never occur.
Instead of creating the list of the free channels each time atomic_check
is called, and calling drm_atomic_get_crtc_state to retrieve the
allocated channels, let's create a private state object in the main
atomic state, and use it to store the available channels.
Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix the page flip issue of dual HDMI and to enable the force hotplug configuration. Changed unassigned_channels to unsigned long instead of integer to fix build errors.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Ic8bf07502f614595070e21ecc7912e671a193b09
Maxime Ripard [Tue, 3 Nov 2020 08:09:12 +0000 (17:09 +0900)]
drm/vc4: kms: Add functions to create the state objects
We're going to add a new private state, so let's move the previous state
creation into helper functions to make further additions easier.
[hoegeun.kwon: Needed to fix the page flip issue of dual HDMI and to enable the force hotplug configuration.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Ice57c56dfbdfbb81f486cba7626512cfbb12b905
Maxime Ripard [Tue, 3 Nov 2020 08:09:01 +0000 (17:09 +0900)]
drm/vc4: kms: Rename NUM_CHANNELS
The NUM_CHANNELS define has a pretty generic name and sat right before the
function using it. Let's switch to a name that makes its hardware-specific
nature more obvious, and move it to a more appropriate place.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix the page flip issue of dual HDMI and to enable the force hotplug configuration.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Ib94c2b3ffbc7e5c95f2c3a965919e523c68f8a12
Maxime Ripard [Tue, 3 Nov 2020 08:08:59 +0000 (17:08 +0900)]
drm/vc4: kms: Remove useless define
NUM_OUTPUTS isn't used anymore, let's remove it.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix the page flip issue of dual HDMI and to enable the force hotplug configuration.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: I20b47a2da045094a06cef9e9a924f46c4dd1d4a9
Hoegeun Kwon [Tue, 3 Nov 2020 08:08:57 +0000 (17:08 +0900)]
Revert "drm/vc4: kms: Don't disable the muxing of an active CRTC"
This reverts commit 5d2fec61a25bacc49ee8e84b3c19aee1522f7289.
Revert this patch in order to apply patch version 2.
Change-Id: Ib7ff9b242dc3f208cf58f7d8f0741321b725c657
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Hoegeun Kwon [Tue, 3 Nov 2020 08:08:54 +0000 (17:08 +0900)]
Revert "drm/vc4: kms: Fix VBLANK reporting on a disabled CRTC"
This reverts commit e805316d5d44b1f1f080fd8ae8a34b69329d940c.
Revert this patch in order to apply patch version 2.
Change-Id: If9525aaa3835ab80b8ca83271dc584f68254018a
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Sung-hun Kim [Wed, 28 Oct 2020 10:26:31 +0000 (19:26 +0900)]
mm: LKSM: bug fix for KASAN out-of-bound access error on accessing a filter
KASAN reports out-of-bound accesses (reported by Jaehoon Chung)
on slab which is performed for obtaining a next filtered
address to find a sharable page.
LKSM exploits bitmap-based filters to find sharable pages in
an efficient way. A buggy code is a kind of miscalculation for
boundary of the allocated bitmap. This patch takes care of it.
Change-Id: If45c5ce175db067523b60f11e69e12d2bc798659
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Sung-hun Kim [Tue, 27 Oct 2020 11:48:36 +0000 (20:48 +0900)]
mm: LKSM: bug fix for kernel memory leak
For efficiency, LKSM cleans exited processes in a batched manner when it
finishes a scanning iteration. When it finds an exited process during the
scanning iteration, it just appends the mm_slot of the exited process to
an internal list.
On the other hand, when the KSM daemon cleans the mm_slots of exited
processes, it should take care of the regions of those processes to
remove unreferenced lksm_region objects.
Previously, most regions were maintained properly, but regions at the
"head" of the exited-process list were not cleaned due to a buggy
implementation. As a result, the uncleaned objects remained as
unreferenced garbage.
The following message is detected by kmemleak (reported by Suengwoo Kim):
=========================================================================
unreferenced object 0xffffff80c7083600 (size 128):
comm "ksm_crawld", pid 41, jiffies 4294918362 (age 95.632s)
hex dump (first 32 bytes):
00 37 08 c7 80 ff ff ff 60 82 19 bd 80 ff ff ff .7......`.......
00 35 08 c7 80 ff ff ff 00 00 00 00 00 00 00 00 .5..............
backtrace:
[<0000000048313958>] kmem_cache_alloc_trace+0x1e0/0x348
[<00000000fd246822>] lksm_region_ref_append+0x48/0xf8
[<00000000c5a818a0>] ksm_join+0x3a0/0x498
[<00000000b2c3f36a>] lksm_prepare_full_scan+0xe8/0x390
[<00000000013943b5>] lksm_crawl_thread+0x214/0xbf8
[<00000000b4ce0593>] kthread+0x1b0/0x1b8
[<000000002a3f7216>] ret_from_fork+0x10/0x18
unreferenced object 0xffffff80c7083700 (size 128):
comm "ksm_crawld", pid 41, jiffies 4294918362 (age 95.632s)
hex dump (first 32 bytes):
00 39 08 c7 80 ff ff ff 00 36 08 c7 80 ff ff ff .9.......6......
00 35 08 c7 80 ff ff ff 00 00 00 00 00 00 00 00 .5..............
backtrace:
[<0000000048313958>] kmem_cache_alloc_trace+0x1e0/0x348
[<00000000fd246822>] lksm_region_ref_append+0x48/0xf8
[<00000000c5a818a0>] ksm_join+0x3a0/0x498
[<00000000b2c3f36a>] lksm_prepare_full_scan+0xe8/0x390
[<00000000013943b5>] lksm_crawl_thread+0x214/0xbf8
[<00000000b4ce0593>] kthread+0x1b0/0x1b8
[<000000002a3f7216>] ret_from_fork+0x10/0x18
...
=========================================================================
This patch takes care of this possible kernel memory leak problem.
Change-Id: Ifb4963773b8803da239a1d3108c5b2690d22d531
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Seung-Woo Kim [Tue, 27 Oct 2020 10:23:56 +0000 (19:23 +0900)]
brcmfmac: Fix memory leak for unpaired brcmf_{alloc/free}
There are missing brcmf_free() calls for brcmf_alloc(). Fix the memory
leak by adding the missed brcmf_free().
Change-Id: I050398a7b828b0fb2aadbe491f353b3d3c47bdd6
Reported-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Hoegeun Kwon [Mon, 26 Oct 2020 11:47:31 +0000 (20:47 +0900)]
drm/vc4: drv: Add error handling for bind
There is a problem that if the vc4_drm bind fails, a memory leak occurs
on the drm_property_create side. Add error handling for drm_mode_config.
Change-Id: If23c0dabf01b85368871981f0cba427982922615
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Seung-Woo Kim [Mon, 26 Oct 2020 07:37:19 +0000 (16:37 +0900)]
staging: mmal-vchiq: Fix memory leak for vchi_instance
The vchi_instance is allocated with vchiq_initialise() but never
handled properly. Fix memory leak for the vchi_instance.
This fixes below kmemleak report:
[<000000002312557f>] kmem_cache_alloc_trace+0x1e0/0x348
[<000000002ee5d470>] vchiq_initialise+0xd0/0x258
[<000000009b51d8f4>] vchi_initialise+0x84/0xc8
[<00000000a58f68c5>] vchiq_mmal_init+0xb0/0x2f0
[<000000006c68d7bf>] bcm2835_mmal_probe+0x90/0x660
...
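A sketch of the fix, under the assumption that the instance allocated by
vchiq_initialise() is released with vchiq_shutdown() on the failure path
(the exact teardown helper used by the patch is an assumption here):

      err = vchiq_initialise(&instance);
      if (err)
          return err;

      err = vchiq_connect(instance);
      if (err) {
          vchiq_shutdown(instance); /* previously leaked; assumed teardown */
          return err;
      }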
Change-Id: Ib10f194aa4058b3e39ec0a250fa36fd00daf9feb
Reported-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Maxime Ripard [Mon, 12 Oct 2020 03:44:25 +0000 (12:44 +0900)]
drm/vc4: kms: Fix VBLANK reporting on a disabled CRTC
If a CRTC is enabled but not active, and we then do a page flip on
another CRTC, drm_atomic_get_crtc_state() will bring the first CRTC's
state into the global state, and will make us wait for its vblank as
well, even though that vblank might never occur.
Fix this by considering all the enabled CRTCs, using either their new
state if they are part of the global state, or their current state if
they aren't, to remove their assigned channel from the pool before
starting to assign channels to the CRTCs enabled by the state.
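A condensed sketch of that accounting, assuming the assigned_channel
bookkeeping introduced by the automatic channel assignment patch
(fragment only; crtc, crtc_state and unassigned_channels come from the
surrounding atomic_check):

      /* Reserve the channel of every enabled CRTC, whether or not it
       * is part of this commit, before assigning channels to new ones.
       */
      drm_for_each_crtc(crtc, dev) {
          crtc_state = drm_atomic_get_new_crtc_state(state, crtc);
          if (!crtc_state)
              crtc_state = crtc->state; /* CRTC untouched by this commit */

          if (crtc_state->enable)
              unassigned_channels &=
                  ~BIT(to_vc4_crtc_state(crtc_state)->assigned_channel);
      }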
Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix page flip issue of dual hdmi.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Ic8b8083a652df142a17ec0d50f6c2572494ba3c0
Maxime Ripard [Thu, 8 Oct 2020 11:25:18 +0000 (13:25 +0200)]
drm/vc4: kms: Don't disable the muxing of an active CRTC
The current HVS muxing code will consider the CRTCs in a given state to
setup their muxing in the HVS, and disable the other CRTCs muxes.
However, it's valid to only update a single CRTC with a state, and in this
situation we would mux out a CRTC that was enabled but left untouched by
the new state.
Fix this by considering both the CRTCs carried by the state and the ones
that are active but have no new state.
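A sketch of the idea, with the hedge that the helper usage follows the
driver's conventions rather than quoting the exact patch hunk:

      /* Compute the set of channels still in use before muxing anything
       * out; an active CRTC absent from the state keeps its channel.
       */
      unsigned int used_channels = 0;

      drm_for_each_crtc(crtc, dev) {
          struct drm_crtc_state *s =
              drm_atomic_get_new_crtc_state(state, crtc) ?: crtc->state;

          if (s->active)
              used_channels |= BIT(to_vc4_crtc_state(s)->assigned_channel);
      }
      /* Only disable the FIFO-to-pixelvalve muxes not in used_channels. */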
Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix page flip issue of dual hdmi.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Idb6e79214f52e470e9fcf5a0579873402c631eb2
Hoegeun Kwon [Mon, 12 Oct 2020 12:10:58 +0000 (21:10 +0900)]
drm/vc4: kms: Fix to split vc4 and vc5 hvs pv muxing commit
In order to use dual HDMI properly, the vc4 and vc5 paths should be
separated. Split the HVS PV muxing commit into vc4 and vc5 variants,
referring to patch [1].
[1] drm/vc4: crtc: Assign output to channel automatically
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=87ebcd42fb7b8d1d3269007a621e41ae96a0077e
Change-Id: I5427cfd885c4643f764f8f84bed0f017c5d2a562
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Maxime Ripard [Thu, 8 Oct 2020 11:25:17 +0000 (13:25 +0200)]
drm/vc4: kms: Document the muxing corner cases
We've had a number of muxing corner cases with specific ways to reproduce
them, so let's document them to make sure they aren't lost and don't
introduce regressions later on.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix page flip issue of dual hdmi.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Icc4f82cd0c22f7b0b7e3c33be37d2e9d287638e4
Maxime Ripard [Mon, 12 Oct 2020 07:55:43 +0000 (16:55 +0900)]
drm/vc4: kms: Split the HVS muxing check in a separate function
The code that assigns HVS channels during atomic_check is starting to grow
a bit big, so let's move it into a separate function.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix page flip issue of dual hdmi.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Icf636a846282981d21ad3e6652a3ec51f14993a4
Maxime Ripard [Mon, 12 Oct 2020 07:44:43 +0000 (16:44 +0900)]
drm/vc4: crtc: Keep the previously assigned HVS FIFO
The HVS FIFOs are currently assigned each time we have an atomic_check
for all the enabled CRTCs.
However, if we are running multiple outputs in parallel and we happen to
disable the first (by index) CRTC, we end up changing the assigned FIFO
of the second CRTC without disabling and re-enabling the pixelvalve, which
ends up in a stall and eventually a VBLANK timeout.
In order to fix this, we can create a special value for our assigned
channel to mark it as disabled, and if our CRTC already had an assigned
channel in its previous state, we keep on using it.
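Sketched below with an illustrative sentinel name (upstream uses a
similar "disabled" marker); old_vc4_state/new_vc4_state stand for the
CRTC's previous and new vc4 state in atomic_check:

  #define HVS_CHANNEL_DISABLED ((unsigned int)-1) /* illustrative name */

      /* A CRTC that already owns a channel keeps it across atomic_check,
       * so its pixelvalve never sees a silent FIFO change.
       */
      if (old_vc4_state->assigned_channel != HVS_CHANNEL_DISABLED) {
          new_vc4_state->assigned_channel =
              old_vc4_state->assigned_channel;
          continue; /* skip the allocation pass for this CRTC */
      }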
Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix page flip issue of dual hdmi.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: I70e0d6b563d5c7f22a1a7e1527a401ac42395b78
Maxime Ripard [Mon, 12 Oct 2020 04:26:40 +0000 (13:26 +0900)]
drm/vc4: kms: Assign a FIFO to enabled CRTCs instead of active
The HVS has three FIFOs that can be assigned to a number of PixelValves
through a mux.
However, changing that FIFO requires that we disable and then enable the
pixelvalve, so we want to assign FIFOs to all the enabled CRTCs, and not
just the active ones.
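A sketch of the change, keying the assignment on ->enable instead of
->active (the loop scaffolding is simplified from the described approach):

      for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
          struct vc4_crtc_state *vc4_state = to_vc4_crtc_state(crtc_state);

          if (!crtc_state->enable) /* was: !crtc_state->active */
              continue;

          vc4_state->assigned_channel = ffs(unassigned_channels) - 1;
          unassigned_channels &= ~BIT(vc4_state->assigned_channel);
      }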
Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
[hoegeun.kwon: Needed to fix page flip issue of dual hdmi.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Id418d321e3cfc9b992bd1868e8084d9f8694a46e
Jaehoon Chung [Mon, 5 Oct 2020 11:39:01 +0000 (20:39 +0900)]
brcmfmac: change from brcmf_dbg to brcmf_info
Change from brcmf_dbg to brcmf_info.
This patch is a workaround to check debug messages on Testhub.
Change-Id: I7fef8f64fc38e43b6544fab3e9474e7f1829cc60
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Maxime Ripard [Thu, 17 Sep 2020 12:16:23 +0000 (14:16 +0200)]
drm/vc4: hvs: Pull the state of all the CRTCs prior to PV muxing
The vc4 display engine has a first controller called the HVS that will
perform the composition of the planes. That HVS has 3 FIFOs and can
therefore compose planes for up to three outputs. The timings part is
generated through a component called the Pixel Valve, and the BCM2711 has 6
of them.
Thus, the HVS has some bits to control which FIFO gets output to which
Pixel Valve. The current code supports that muxing by looking at all the
CRTCs in a new DRM atomic state in atomic_check, and given the set of
constraints that we have, assigns FIFOs to CRTCs or rejects the mode
entirely. The actual muxing will occur during atomic_commit.
However, that doesn't work if only a fraction of the CRTCs' state is
updated in that state, since it will ignore the CRTCs that are kept running
unmodified, and will thus unassign their associated FIFOs and later disable
them.
In order to make the code work as expected, let's pull the CRTC state of
all the enabled CRTCs in our atomic_check so that we can operate on all the
running CRTCs, no matter whether they are affected by the new state or not.
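The core of the approach, sketched as the pull performed in atomic_check
(error handling kept, surrounding context elided):

      drm_for_each_crtc(crtc, dev) {
          /* Pull every CRTC into the state so the muxing code also
           * sees the running-but-unmodified ones.
           */
          crtc_state = drm_atomic_get_crtc_state(state, crtc);
          if (IS_ERR(crtc_state))
              return PTR_ERR(crtc_state);
      }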
Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Tested-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Reviewed-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200917121623.42023-1-maxime@cerno.tech
[hoegeun.kwon: Fix dual hdmi issue]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: I750e253c7415b09e546beb2608445d581c441efe
Hoegeun Kwon [Fri, 18 Sep 2020 06:54:43 +0000 (15:54 +0900)]
drm/vc4: crtc: Add null pointer error handling
A crash occurs when vc4_encoder is NULL. Add NULL pointer error handling.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000090
Mem abort info:
ESR = 0x96000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000000f2a96000
[0000000000000090] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: brcmfmac joydev brcmutil rpivid_mem
CPU: 0 PID: 32 Comm: kworker/0:1 Not tainted 5.4.50-v8+ #414
Hardware name: Raspberry Pi 4 Model B (DT)
Workqueue: events output_poll_execute
pstate: 80000005 (Nzcv daif -PAN -UAO)
pc : vc4_crtc_atomic_disable+0x150/0x2e8
lr : vc4_crtc_atomic_disable+0x50/0x2e8
...
Call trace:
vc4_crtc_atomic_disable+0x150/0x2e8
drm_atomic_helper_commit_modeset_disables+0x344/0x410
vc4_atomic_complete_commit+0xd4/0x530
vc4_atomic_commit+0xf4/0x190
drm_atomic_commit+0x54/0x60
...
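A sketch of the guard, assuming the crash site dereferences the encoder
returned by a lookup helper (helper name per vc4 conventions, quoted from
memory):

      struct drm_encoder *encoder = vc4_get_crtc_encoder(crtc);

      /* The encoder can be NULL while the output is being torn down;
       * bail out instead of dereferencing it.
       */
      if (!encoder)
          return;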
Change-Id: Ib3271bb3c8318079fe5815711f89a217e835adc6
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Seung-Woo Kim [Tue, 15 Sep 2020 09:48:22 +0000 (18:48 +0900)]
scripts: mkbootimg_rpi4.sh: use compressed image
Use the compressed kernel image to reduce the boot image size.
Change-Id: I9ec8b7e5e35bfa8ae6ac2832d52d4748e9a79668
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Hoegeun Kwon [Tue, 15 Sep 2020 00:32:49 +0000 (09:32 +0900)]
drm/vc4: hvs: Boost the core clock during modeset
In order to prevent timeouts and stalls in the pipeline, the core clock
needs to be maxed at 500MHz during a modeset on the BCM2711.
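Roughly, the commit path pins a clock floor for the duration of the
modeset; the field names below follow the vc4 driver but are quoted from
memory:

      /* Raise the core clock floor before touching the hardware... */
      clk_set_min_rate(hvs->core_clk, 500000000);

      /* ... perform the mode set ... */

      /* ...then drop the floor once the commit is done. */
      clk_set_min_rate(hvs->core_clk, 0);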
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/37ed9e0124c5cce005ddc8dafe821d8b0da036ff.1599120059.git-series.maxime@cerno.tech
[hoegeun.kwon: A screen corruption problem occurs at FHD (1920x1080). The
clock should be kept at 500MHz, but the problem occurred when the clock
fell to 200MHz, so apply this patch.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Ie0d4880966c1644bfdf56bb49dbb82559978538c
Hoegeun Kwon [Fri, 11 Sep 2020 04:39:40 +0000 (13:39 +0900)]
drm/vc4: hdmi: Fix to use clk_set_min_rate
There is a problem where rpi_firmware_transaction() fails while setting
the clock rate. Add the missing code that was applied to mainline.
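The gist, sketched under the assumption that the pixel clock is requested
as a minimum rate instead of an exact one, so the firmware clock driver
can round it without failing the transaction:

      ret = clk_set_min_rate(vc4_hdmi->pixel_clock, mode->clock * 1000);
      if (ret) {
          DRM_ERROR("Failed to set pixel clock rate: %d\n", ret);
          return;
      }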
Change-Id: I56bcb00037fc85bb01f8a876ccd26798b4bddb39
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d757ddd6549da140f178563e5fd2bf1d129913fd.1599120059.git-series.maxime@cerno.tech
Maxime Ripard [Tue, 25 Aug 2020 09:44:04 +0000 (18:44 +0900)]
drm/vc4: hdmi: Switch to blank pixels when disabled
In order to avoid pixels getting stuck in an unflushable FIFO, when we
disable the HDMI controller we need to switch away from getting our
pixels from the pixelvalve and instead use blank pixels, switching back
to the pixelvalve when we enable the HDMI controller.
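The disable side of that switch, sketched with register and bit names per
the vc4 driver's conventions (quoted from memory, not the exact hunk):

      /* Stop taking pixels from the pixelvalve and emit blank pixels. */
      HDMI_WRITE(HDMI_VID_CTL,
                 HDMI_READ(HDMI_VID_CTL) |
                 VC4_HD_VID_CTL_CLRRGB | VC4_HD_VID_CTL_BLANKPIX);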
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fde3efb1ad79f4476a73d310cbba3ec07dc6dabe.1599120059.git-series.maxime@cerno.tech
[hoegeun.kwon: Needed to troubleshoot page flip timed out issue.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: I497f96e6d9b535d9d173d486fe1829c30093d88f
Maxime Ripard [Tue, 25 Aug 2020 09:39:06 +0000 (18:39 +0900)]
drm/vc4: hdmi: Do the VID_CTL configuration at once
The VID_CTL setup is done in several places in the driver even though it's
not really required. Let's simplify it a bit to do the configuration in one
go.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/08e7ebb605a560fcc149b69b4af52753a7870b2f.1599120059.git-series.maxime@cerno.tech
[cw00.choi: Apply it to both vc4_hdmi_set_timings and vc5_hdmi_set_timings,
needed to troubleshoot page flip timed out issue.]
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: I3b12049c9bfb69d5d21c7186b677e8e32d756959
Maxime Ripard [Tue, 25 Aug 2020 09:26:58 +0000 (18:26 +0900)]
drm/vc4: hdmi: Implement finer-grained hooks
In order to prevent some pixels getting stuck in an unflushable FIFO on
bcm2711, we need to enable the HVS, the pixelvalve (the CRTC) and the HDMI
controller (the encoder) in an intertwined way, and with tight delays.
However, the atomic callbacks don't really provide a way to work with
either constraint, so we need to roll our own callbacks so that we can
provide those guarantees.
Since those callbacks have been implemented and called in the CRTC code, we
can just implement them in the HDMI driver now.
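The shape of those hooks, sketched as the encoder-side structure; the
member names mirror the staged sequence described above and may differ
from the exact patch:

      struct vc4_encoder {
          struct drm_encoder base;
          /* Called at precisely ordered points of the CRTC sequence. */
          void (*pre_crtc_configure)(struct drm_encoder *encoder);
          void (*pre_crtc_enable)(struct drm_encoder *encoder);
          void (*post_crtc_enable)(struct drm_encoder *encoder);
          void (*post_crtc_disable)(struct drm_encoder *encoder);
          void (*post_crtc_powerdown)(struct drm_encoder *encoder);
      };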
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2e9226d971117065f3b97e597f04f7fe2f0c134c.1599120059.git-series.maxime@cerno.tech
[hoegeun.kwon: Needed to troubleshoot page flip timed out issue.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Ice9a866f353d8e478e8f24c66b6e2e836e474817
Maxime Ripard [Tue, 25 Aug 2020 09:21:50 +0000 (18:21 +0900)]
drm/vc4: hdmi: Always recenter the HDMI FIFO
In order to avoid a pixel getting stuck in an unflushable FIFO, we need to
recenter the FIFO every time we're doing a modeset and not only if we're
connected to an HDMI monitor.
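The recenter itself, sketched with a FIFO control bit named per the vc4
register headers (quoted from memory):

      /* Recenter the FIFO on every modeset, not only on the HDMI path. */
      HDMI_WRITE(HDMI_FIFO_CTL,
                 HDMI_READ(HDMI_FIFO_CTL) | VC4_HDMI_FIFO_CTL_RECENTER);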
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b3faaf05ac6c4d3c364d28fa441571eb85903269.1599120059.git-series.maxime@cerno.tech
[hoegeun.kwon: Needed to troubleshoot page flip timed out issue.]
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Change-Id: Ife295f612fb32ef2a02e1494c3a4b532735c989e