Peter Zijlstra [Thu, 6 Aug 2020 12:35:11 +0000 (14:35 +0200)]
locking/seqlock, headers: Untangle the spaghetti monster
By using lockdep_assert_*() from seqlock.h, the spaghetti monster
attacked.
Attack back by reducing seqlock.h dependencies from two key high level headers:
- <linux/seqlock.h>: -Remove <linux/ww_mutex.h>
- <linux/time.h>: -Remove <linux/seqlock.h>
- <linux/sched.h>: +Add <linux/seqlock.h>
The price was to add it to sched.h ...
Core header fallout, we add direct header dependencies instead of gaining them
parasitically from higher level headers:
- <linux/dynamic_queue_limits.h>: +Add <asm/bug.h>
- <linux/hrtimer.h>: +Add <linux/seqlock.h>
- <linux/ktime.h>: +Add <asm/bug.h>
- <linux/lockdep.h>: +Add <linux/smp.h>
- <linux/sched.h>: +Add <linux/seqlock.h>
- <linux/videodev2.h>: +Add <linux/kernel.h>
Arch headers fallout:
- PARISC: <asm/timex.h>: +Add <asm/special_insns.h>
- SH: <asm/io.h>: +Add <asm/page.h>
- SPARC: <asm/timer_64.h>: +Add <uapi/asm/asi.h>
- SPARC: <asm/vvar.h>: +Add <asm/processor.h>, <asm/barrier.h>
-Remove <linux/seqlock.h>
- X86: <asm/fixmap.h>: +Add <asm/pgtable_types.h>
-Remove <asm/acpi.h>
There's also a bunch of parasitic header dependency fallout in .c files, not listed
separately.
[ mingo: Extended the changelog, split up & fixed the original patch. ]
Co-developed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200804133438.GK2674@hirez.programming.kicks-ass.net
Peter Zijlstra [Thu, 6 Aug 2020 12:36:20 +0000 (14:36 +0200)]
locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
We want to remove the #include <asm/io.h> from <asm/smp.h>, but for this
we have to move the XTP bits into a separate header first (as these bits
rely on <asm/io.h> definitions), and include them in the .c files that rely
on those APIs.
Co-developed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 6 Aug 2020 12:34:32 +0000 (14:34 +0200)]
x86/headers: Remove APIC headers from <asm/smp.h>
The APIC headers are relatively complex and bring in additional
header dependencies - while smp.h is a relatively simple header
included from high level headers.
Remove the dependency and add in the missing #include's in .c
files where they gained it indirectly before.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 6 Aug 2020 08:16:38 +0000 (10:16 +0200)]
Merge branch 'WIP.locking/seqlocks' into locking/urgent
Pick up the full seqlock series PeterZ is working on.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 23 Jul 2020 10:11:49 +0000 (12:11 +0200)]
seqcount: More consistent seqprop names
Attempt uniformity and brevity.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra [Thu, 23 Jul 2020 10:03:13 +0000 (12:03 +0200)]
seqcount: Compress SEQCNT_LOCKNAME_ZERO()
Less is more.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra [Thu, 23 Jul 2020 10:00:53 +0000 (12:00 +0200)]
seqlock: Fold seqcount_LOCKNAME_init() definition
Manual repetition is boring and error prone.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra [Thu, 23 Jul 2020 09:56:49 +0000 (11:56 +0200)]
seqlock: Fold seqcount_LOCKNAME_t definition
Manual repetition is boring and error prone.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra [Thu, 23 Jul 2020 09:56:22 +0000 (11:56 +0200)]
seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
__SEQ_LOCKDEP() is an expression gate for the
seqcount_LOCKNAME_t::lock member. Rename it to be about the member,
not the gate condition.
Later (PREEMPT_RT) patches will make the member available for !LOCKDEP
configs.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:30 +0000 (17:55 +0200)]
hrtimer: Use sequence counter with associated raw spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_raw_spinlock_t data type, which allows to associate
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-25-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:29 +0000 (17:55 +0200)]
kvm/eventfd: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200720155530.1173732-24-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:28 +0000 (17:55 +0200)]
userfaultfd: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-23-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:27 +0000 (17:55 +0200)]
NFSv4: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-22-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:26 +0000 (17:55 +0200)]
iocost: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20200720155530.1173732-21-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:25 +0000 (17:55 +0200)]
raid5: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-20-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:24 +0000 (17:55 +0200)]
vfs: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-19-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:23 +0000 (17:55 +0200)]
timekeeping: Use sequence counter with associated raw spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_raw_spinlock_t data type, which allows to associate
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-18-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:22 +0000 (17:55 +0200)]
xfrm: policy: Use sequence counters with associated lock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.
A plain seqcount_t does not contain the information of which lock must
be held when entering a write side critical section.
Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
which allow to associate a lock with the sequence counter. This enables
lockdep to verify that the lock used for writer serialization is held
when the write side critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-17-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:21 +0000 (17:55 +0200)]
netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_rwlock_t data type, which allows to associate a
rwlock with the sequence counter. This enables lockdep to verify that
the rwlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-16-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:20 +0000 (17:55 +0200)]
netfilter: conntrack: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-15-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:19 +0000 (17:55 +0200)]
sched: tasks: Use sequence counter with associated spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-14-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:18 +0000 (17:55 +0200)]
dma-buf: Use sequence counter with associated wound/wait mutex
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.
The dma-buf reservation subsystem uses plain sequence counters to manage
updates to reservations. Writer serialization is accomplished through a
wound/wait mutex.
Acquiring a wound/wait mutex does not disable preemption, so this needs
to be done manually before and after the write side critical section.
Use the newly-added seqcount_ww_mutex_t instead:
- It associates the ww_mutex with the sequence count, which enables
lockdep to validate that the write side critical section is properly
serialized.
- It removes the need to explicitly add preempt_disable/enable()
around the write side critical section because the write_begin/end()
functions for this new data type automatically do this.
If lockdep is disabled this ww_mutex lock association is compiled out
and has neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lkml.kernel.org/r/20200720155530.1173732-13-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:17 +0000 (17:55 +0200)]
dma-buf: Remove custom seqcount lockdep class key
Commit
3c3b177a9369 ("reservation: add support for read-only access
using rcu") introduced a sequence counter to manage updates to
reservations. Back then, the reservation object initializer
reservation_object_init() was always inlined.
Having the sequence counter initialization inlined meant that each of
the call sites would have a different lockdep class key, which would've
broken lockdep's deadlock detection. The aforementioned commit thus
introduced, and exported, a custom seqcount lockdep class key and name.
The commit
8735f16803f00 ("dma-buf: cleanup reservation_object_init...")
transformed the reservation object initializer to a normal non-inlined C
function. seqcount_init(), which automatically defines the seqcount
lockdep class key and must be called non-inlined, can now be safely used.
Remove the seqcount custom lockdep class key, name, and export. Use
seqcount_init() inside the dma reservation object initializer.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lkml.kernel.org/r/20200720155530.1173732-12-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:16 +0000 (17:55 +0200)]
seqlock: Align multi-line macros newline escapes at 72 columns
Parent commit, "seqlock: Extend seqcount API with associated locks",
introduced a big number of multi-line macros that are newline-escaped
at 72 columns.
For overall cohesion, align the earlier-existing macros similarly.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-11-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:15 +0000 (17:55 +0200)]
seqlock: Extend seqcount API with associated locks
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.
There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.
Add new sequence counter types which allows to associate a lock to the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.
For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.
Introduce the following seqcount types with associated locks:
seqcount_spinlock_t
seqcount_raw_spinlock_t
seqcount_rwlock_t
seqcount_mutex_t
seqcount_ww_mutex_t
Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile-time. This avoids
kernel API explosion per each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal, seqlock header.
Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.
If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-10-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:14 +0000 (17:55 +0200)]
seqlock: lockdep assert non-preemptibility on seqcount_t write
Preemption must be disabled before entering a sequence count write side
critical section. Failing to do so, the seqcount read side can preempt
the write side section and spin for the entire scheduler tick. If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.
Assert through lockdep that preemption is disabled for seqcount writers.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-9-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:13 +0000 (17:55 +0200)]
lockdep: Add preemption enabled/disabled assertion APIs
Asserting that preemption is enabled or disabled is a critical sanity
check. Developers are usually reluctant to add such a check in a
fastpath as reading the preemption count can be costly.
Extend the lockdep API with macros asserting that preemption is disabled
or enabled. If lockdep is disabled, or if the underlying architecture
does not support kernel preemption, this assert has no runtime overhead.
References:
f54bb2ec02c8 ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-8-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:12 +0000 (17:55 +0200)]
seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
raw_seqcount_begin() has the same code as raw_read_seqcount(), with the
exception of masking the sequence counter's LSB before returning it to
the caller.
Note, raw_seqcount_begin() masks the counter's LSB before returning it
to the caller so that read_seqcount_retry() can fail if the counter is
odd -- without the overhead of an extra branching instruction.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-7-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:11 +0000 (17:55 +0200)]
seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
seqlock.h is now included by kernel's RST documentation, but a small
number of the the exported seqlock.h functions are kernel-doc annotated.
Add kernel-doc for all seqlock.h exported APIs.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-6-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:10 +0000 (17:55 +0200)]
seqlock: Reorder seqcount_t and seqlock_t API definitions
The seqlock.h seqcount_t and seqlock_t API definitions are presented in
the chronological order of their development rather than the order that
makes most sense to readers. This makes it hard to follow and understand
the header file code.
Group and reorder all of the exported seqlock.h functions according to
their function.
First, group together the seqcount_t standard read path functions:
- __read_seqcount_begin()
- raw_read_seqcount_begin()
- read_seqcount_begin()
since each function is implemented exactly in terms of the one above
it. Then, group the special-case seqcount_t readers on their own as:
- raw_read_seqcount()
- raw_seqcount_begin()
since the only difference between the two functions is that the second
one masks the sequence counter LSB while the first one does not. Note
that raw_seqcount_begin() can actually be implemented in terms of
raw_read_seqcount(), which will be done in a follow-up commit.
Then, group the seqcount_t write path functions, instead of injecting
unrelated seqcount_t latch functions between them, and order them as:
- raw_write_seqcount_begin()
- raw_write_seqcount_end()
- write_seqcount_begin_nested()
- write_seqcount_begin()
- write_seqcount_end()
- raw_write_seqcount_barrier()
- write_seqcount_invalidate()
which is the expected natural order. This also isolates the seqcount_t
latch functions into their own area, at the end of the sequence counters
section, and before jumping to the next one: sequential locks
(seqlock_t).
Do a similar grouping and reordering for seqlock_t "locking" readers vs.
the "conditionally locking or lockless" ones.
No implementation code was changed in any of the reordering above.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-5-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:09 +0000 (17:55 +0200)]
seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
The seqcount_t latch reader example at the raw_write_seqcount_latch()
kernel-doc comment ends the latch read section with a manual smp memory
barrier and sequence counter comparison.
This is technically correct, but it is suboptimal: read_seqcount_retry()
already contains the same logic of an smp memory barrier and sequence
counter comparison.
End the latch read critical section example with read_seqcount_retry().
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-4-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:08 +0000 (17:55 +0200)]
seqlock: Properly format kernel-doc code samples
Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-3-a.darwish@linutronix.de
Ahmed S. Darwish [Mon, 20 Jul 2020 15:55:07 +0000 (17:55 +0200)]
Documentation: locking: Describe seqlock design and usage
Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:
- Divide all documentation on a seqcount_t vs. seqlock_t basis. The
description for both mechanisms was intermingled, which is incorrect
since the usage constrains for each type are vastly different.
- Add an introductory paragraph describing the internal design of, and
rationale for, sequence counters.
- Document seqcount_t writer non-preemptibility requirement, which was
not previously documented anywhere, and provide a clear rationale.
- Provide template code for seqcount_t and seqlock_t initialization
and reader/writer critical sections.
- Recommend using seqlock_t by default. It implicitly handles the
serialization and non-preemptibility requirements of writers.
At seqlock.h:
- Remove references to brlocks as they've long been removed from the
kernel.
- Remove references to gcc-3.x since the kernel's minimum supported
gcc version is 4.9.
References:
0f6ed63b1707 ("no need to keep brlock macros anymore...")
References:
6ec4476ac825 ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-2-a.darwish@linutronix.de
Peter Zijlstra [Wed, 29 Jul 2020 14:14:21 +0000 (16:14 +0200)]
Merge branch 'locking/header'
Herbert Xu [Wed, 29 Jul 2020 12:33:16 +0000 (22:33 +1000)]
locking/qspinlock: Do not include atomic.h from qspinlock_types.h
This patch breaks a header loop involving qspinlock_types.h.
The issue is that qspinlock_types.h includes atomic.h, which then
eventually includes kernel.h which could lead back to the original
file via spinlock_types.h.
As ATOMIC_INIT is now defined by linux/types.h, there is no longer
any need to include atomic.h from qspinlock_types.h. This also
allows the CONFIG_PARAVIRT hack to be removed since it was trying
to prevent exactly this loop.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20200729123316.GC7047@gondor.apana.org.au
Herbert Xu [Wed, 29 Jul 2020 12:31:05 +0000 (22:31 +1000)]
locking/atomic: Move ATOMIC_INIT into linux/types.h
This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h.
This allows users of atomic_t to use ATOMIC_INIT without having to
include atomic.h as that way may lead to header loops.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20200729123105.GB7047@gondor.apana.org.au
Herbert Xu [Thu, 16 Jul 2020 06:36:50 +0000 (16:36 +1000)]
lockdep: Move list.h inclusion into lockdep.h
Currently lockdep_types.h includes list.h without actually using any
of its macros or functions. All it needs are the type definitions
which were moved into types.h long ago. This potentially causes
inclusion loops because both are included by many core header
files.
This patch moves the list.h inclusion into lockdep.h. Note that
we could probably remove it completely but that could potentially
result in compile failures should any end users not include list.h
directly and also be unlucky enough to not get list.h via some other
header file.
Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lkml.kernel.org/r/20200716063649.GA23065@gondor.apana.org.au
Ingo Molnar [Sat, 25 Jul 2020 19:49:36 +0000 (21:49 +0200)]
Merge tag 'v5.8-rc6' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Chris Wilson [Sat, 25 Jul 2020 18:51:10 +0000 (19:51 +0100)]
locking/lockdep: Fix overflow in presentation of average lock-time
Though the number of lock-acquisitions is tracked as unsigned long, this
is passed as the divisor to div_s64() which interprets it as a s32,
giving nonsense values with more than 2 billion acquisitons. E.g.
acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg
-------------------------------------------------------------------------
2350439395 0.07 353.38
649647067.36 0.-32
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200725185110.11588-1-chris@chris-wilson.co.uk
Linus Torvalds [Sat, 25 Jul 2020 01:30:24 +0000 (18:30 -0700)]
Merge tag 'pci-v5.8-fixes-2' of git://git./linux/kernel/git/helgaas/pci into master
Pull PCI fixes from Bjorn Helgaas:
- Reject invalid IRQ 0 command line argument for virtio_mmio because
IRQ 0 now generates warnings (Bjorn Helgaas)
- Revert "PCI/PM: Assume ports without DLL Link Active train links in
100 ms", which broke nouveau (Bjorn Helgaas)
* tag 'pci-v5.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
virtio-mmio: Reject invalid IRQ 0 command line argument
Linus Torvalds [Fri, 24 Jul 2020 23:27:54 +0000 (16:27 -0700)]
Merge tag 'nfsd-5.8-2' of git://linux-nfs.org/~bfields/linux into master
Pull nfsd fix from Bruce Fields:
"Just one fix for a NULL dereference if someone happens to read
/proc/fs/nfsd/client/../state at the wrong moment"
* tag 'nfsd-5.8-2' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix NULL dereference in nfsd/clients display code
Linus Torvalds [Fri, 24 Jul 2020 21:24:35 +0000 (14:24 -0700)]
Merge branch 'akpm' into master (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Subsystems affected by this patch series: mm/pagemap, mm/shmem,
mm/hotfixes, mm/memcg, mm/hugetlb, mailmap, squashfs, scripts,
io-mapping, MAINTAINERS, and gdb"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
scripts/gdb: fix lx-symbols 'gdb.error' while loading modules
MAINTAINERS: add KCOV section
io-mapping: indicate mapping failure
scripts/decode_stacktrace: strip basepath from all paths
squashfs: fix length field overlap check in metadata reading
mailmap: add entry for Mike Rapoport
khugepaged: fix null-pointer dereference due to race
mm/hugetlb: avoid hardcoding while checking if cma is enabled
mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
mm/memcg: fix refcount error while moving and swapping
mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages()
mm: initialize return of vm_insert_pages
vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way
mm/mmap.c: close race between munmap() and expand_upwards()/downwards()
Linus Torvalds [Fri, 24 Jul 2020 21:19:00 +0000 (14:19 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs into master
Pull xtensa csum regression fix from Al Viro:
"Max Filippov caught a breakage introduced in xtensa this cycle
by the csum_and_copy_..._user() series.
Cut'n'paste from the wrong source - the check that belongs
in csum_and_copy_to_user() ended up both there and in
csum_and_copy_from_user()"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xtensa: fix access check in csum_and_copy_from_user
Linus Torvalds [Fri, 24 Jul 2020 21:16:12 +0000 (14:16 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux into master
Pull arm64 fix from Will Deacon:
"Fix compat vDSO build flags for recent versions of clang to tell it
where to find the assembler"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: vdso32: Fix '--prefix=' value for newer versions of clang
Linus Torvalds [Fri, 24 Jul 2020 21:11:43 +0000 (14:11 -0700)]
Merge tag 'for-5.8-rc6-tag' of git://git./linux/kernel/git/kdave/linux into master
Pull btrfs fixes from David Sterba:
"A few resouce leak fixes from recent patches, all are stable material.
The problems have been observed during testing or have a reproducer"
* tag 'for-5.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix mount failure caused by race with umount
btrfs: fix page leaks after failure to lock page for delalloc
btrfs: qgroup: fix data leak caused by race between writeback and truncate
btrfs: fix double free on ulist after backref resolution failure
Linus Torvalds [Fri, 24 Jul 2020 21:09:19 +0000 (14:09 -0700)]
Merge tag 'zonefs-5.8-rc7' of git://git./linux/kernel/git/dlemoal/zonefs into master
Pull zonefs fixes from Damien Le Moal:
"Two fixes, the first one to remove compilation warnings and the second
to avoid potentially inefficient allocation of BIOs for direct writes
into sequential zones"
* tag 'zonefs-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: count pages after truncating the iterator
zonefs: Fix compilation warning
Linus Torvalds [Fri, 24 Jul 2020 21:02:41 +0000 (14:02 -0700)]
Merge tag 'io_uring-5.8-2020-07-24' of git://git.kernel.dk/linux-block into master
Pull io_uring fixes from Jens Axboe:
- Fix discrepancy in how sqe->flags are treated for a few requests,
this makes it consistent (Daniele)
- Ensure that poll driven retry works with double waitqueue poll users
- Fix a missing io_req_init_async() (Pavel)
* tag 'io_uring-5.8-2020-07-24' of git://git.kernel.dk/linux-block:
io_uring: missed req_init_async() for IOSQE_ASYNC
io_uring: always allow drain/link/hardlink/async sqe flags
io_uring: ensure double poll additions work with both request types
Linus Torvalds [Fri, 24 Jul 2020 20:58:05 +0000 (13:58 -0700)]
Merge tag 'iommu-fix-v5.8-rc6' of git://git./linux/kernel/git/joro/iommu into master
Pull iommu fix from Joerg Roedel:
"Fix a NULL-ptr dereference in the QCOM IOMMU driver"
* tag 'iommu-fix-v5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/qcom: Use domain rather than dev as tlb cookie
Linus Torvalds [Fri, 24 Jul 2020 20:48:57 +0000 (13:48 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma into master
Pull rdma fixes from Jason Gunthorpe:
"One merge window regression, some corruption bugs in HNS and a few
more syzkaller fixes:
- Two long standing syzkaller races
- Fix incorrect HW configuration in HNS
- Restore accidentally dropped locking in IB CM
- Fix ODP prefetch bug added in the big rework several versions ago"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Prevent prefetch from racing with implicit destruction
RDMA/cm: Protect access to remote_sidr_table
RDMA/core: Fix race in rdma_alloc_commit_uobject()
RDMA/hns: Fix wrong PBL offset when VA is not aligned to PAGE_SIZE
RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC
RDMA/mlx5: Use xa_lock_irq when access to SRQ table
Linus Torvalds [Fri, 24 Jul 2020 20:44:14 +0000 (13:44 -0700)]
Merge tag 'for-5.8/dm-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm into master
Pull device mapper fix from Mike Snitzer:
"A stable fix for DM integrity target's integrity recalculation that
gets skipped when resuming a device. This is a fix for a previous
stable@ fix"
* tag 'for-5.8/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix integrity recalculation that is improperly skipped
Linus Torvalds [Fri, 24 Jul 2020 20:41:13 +0000 (13:41 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux into master
Pull i2c fixes from Wolfram Sang:
"Again some driver bugfixes and some documentation fixes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i2c-qcom-geni: Fix DMA transfer race
i2c: rcar: always clear ICSAR to avoid side effects
MAINTAINERS: i2c: at91: handover maintenance to Codrin Ciubotariu
i2c: drop duplicated word in the header file
i2c: cadence: Clear HOLD bit at correct time in Rx path
Revert "i2c: cadence: Fix the hold bit setting"
Linus Torvalds [Fri, 24 Jul 2020 20:37:38 +0000 (13:37 -0700)]
Merge tag 'mmc-v5.8-rc5' of git://git./linux/kernel/git/ulfh/mmc into master
Pull MMC fix from Ulf Hansson:
"Fix clock divider calculation in the ASPEED SDHCI controller"
* tag 'mmc-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-aspeed: Fix clock divider calculation
Linus Torvalds [Fri, 24 Jul 2020 20:35:55 +0000 (13:35 -0700)]
Merge tag 'drm-fixes-2020-07-24' of git://anongit.freedesktop.org/drm/drm into master
Pull drm fixes from Dave Airlie:
"Quiet fixes, I may have a single regression fix follow up to this for
nouveau, but it might be next week, Ben was testing it a bit more .
Otherwise two amdgpu fixes, one lima and one sun4i:
amdgpu:
- Fix crash when overclocking VegaM
- Fix possible crash when editing dpm levels
sun4i:
- Fix inverted HPD result; fixes an earlier fix
lima:
- fix timeout during reset"
* tag 'drm-fixes-2020-07-24' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: Fix NULL dereference in dpm sysfs handlers
drm/amd/powerplay: fix a crash when overclocking Vega M
drm/lima: fix wait pp reset timeout
drm: sun4i: hdmi: Fix inverted HPD result
Stefano Garzarella [Fri, 24 Jul 2020 04:15:52 +0000 (21:15 -0700)]
scripts/gdb: fix lx-symbols 'gdb.error' while loading modules
Commit
ed66f991bb19 ("module: Refactor section attr into bin attribute")
removed the 'name' field from 'struct module_sect_attr' triggering the
following error when invoking lx-symbols:
(gdb) lx-symbols
loading vmlinux
scanning for modules in linux/build
loading @0xffffffffc014f000: linux/build/drivers/net/tun.ko
Python Exception <class 'gdb.error'> There is no member named name.:
Error occurred in Python: There is no member named name.
This patch fixes the issue taking the module name from the 'struct
attribute'.
Fixes:
ed66f991bb19 ("module: Refactor section attr into bin attribute")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Kieran Bingham <kbingham@kernel.org>
Link: http://lkml.kernel.org/r/20200722102239.313231-1-sgarzare@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Konovalov [Fri, 24 Jul 2020 04:15:49 +0000 (21:15 -0700)]
MAINTAINERS: add KCOV section
To link KCOV to the kasan-dev@ mailing list.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/5fa344db7ac4af2213049e5656c0f43d6ecaa379.1595331682.git.andreyknvl@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael J. Ruhl [Fri, 24 Jul 2020 04:15:46 +0000 (21:15 -0700)]
io-mapping: indicate mapping failure
The !ATOMIC_IOMAP version of io_maping_init_wc will always return
success, even when the ioremap fails.
Since the ATOMIC_IOMAP version returns NULL when the init fails, and
callers check for a NULL return on error this is unexpected.
During a device probe, where the ioremap failed, a crash can look like
this:
BUG: unable to handle page fault for address:
0000000000210000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
Oops: 0002 [#1] PREEMPT SMP
CPU: 0 PID: 177 Comm:
RIP: 0010:fill_page_dma [i915]
gen8_ppgtt_create [i915]
i915_ppgtt_create [i915]
intel_gt_init [i915]
i915_gem_init [i915]
i915_driver_probe [i915]
pci_device_probe
really_probe
driver_probe_device
The remap failure occurred much earlier in the probe. If it had been
propagated, the driver would have exited with an error.
Return NULL on ioremap failure.
[akpm@linux-foundation.org: detect ioremap_wc() errors earlier]
Fixes:
cafaf14a5d8f ("io-mapping: Always create a struct to hold metadata about the io-mapping")
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200721171936.81563-1-michael.j.ruhl@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pi-Hsun Shih [Fri, 24 Jul 2020 04:15:43 +0000 (21:15 -0700)]
scripts/decode_stacktrace: strip basepath from all paths
Currently the basepath is removed only from the beginning of the string.
When the symbol is inlined and there's multiple line outputs of
addr2line, only the first line would have basepath removed.
Change to remove the basepath prefix from all lines.
Fixes:
31013836a71e ("scripts/decode_stacktrace: match basepath using shell prefix operator, not regex")
Co-developed-by: Shik Chen <shik@chromium.org>
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Signed-off-by: Shik Chen <shik@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Link: http://lkml.kernel.org/r/20200720082709.252805-1-pihsun@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Phillip Lougher [Fri, 24 Jul 2020 04:15:40 +0000 (21:15 -0700)]
squashfs: fix length field overlap check in metadata reading
This is a regression introduced by the "migrate from ll_rw_block usage
to BIO" patch.
Squashfs packs structures on byte boundaries, and due to that the length
field (of the metadata block) may not be fully in the current block.
The new code rewrote and introduced a faulty check for that edge case.
Fixes:
93e72b3c612adcaca1 ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: Bernd Amend <bernd.amend@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Adrien Schildknecht <adrien+dev@schischi.me>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Daniel Rosenberg <drosen@google.com>
Link: http://lkml.kernel.org/r/20200717195536.16069-1-phillip@squashfs.org.uk
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Fri, 24 Jul 2020 04:15:37 +0000 (21:15 -0700)]
mailmap: add entry for Mike Rapoport
Add an entry to correct my email addresses.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200708095414.12275-1-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 24 Jul 2020 04:15:34 +0000 (21:15 -0700)]
khugepaged: fix null-pointer dereference due to race
khugepaged has to drop mmap lock several times while collapsing a page.
The situation can change while the lock is dropped and we need to
re-validate that the VMA is still in place and the PMD is still subject
for collapse.
But we miss one corner case: while collapsing an anonymous pages the VMA
could be replaced with file VMA. If the file VMA doesn't have any
private pages we get NULL pointer dereference:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
anon_vma_lock_write include/linux/rmap.h:120 [inline]
collapse_huge_page mm/khugepaged.c:1110 [inline]
khugepaged_scan_pmd mm/khugepaged.c:1349 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline]
khugepaged_do_scan mm/khugepaged.c:2193 [inline]
khugepaged+0x3bba/0x5a10 mm/khugepaged.c:2238
The fix is to make sure that the VMA is anonymous in
hugepage_vma_revalidate(). The helper is only used for collapsing
anonymous pages.
Fixes:
99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Reported-by: syzbot+ed318e8b790ca72c5ad0@syzkaller.appspotmail.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200722121439.44328-1-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Barry Song [Fri, 24 Jul 2020 04:15:30 +0000 (21:15 -0700)]
mm/hugetlb: avoid hardcoding while checking if cma is enabled
hugetlb_cma[0] can be NULL due to various reasons, for example, node0
has no memory. so NULL hugetlb_cma[0] doesn't necessarily mean cma is
not enabled. gigantic pages might have been reserved on other nodes.
This patch fixes possible double reservation and CMA leak.
[akpm@linux-foundation.org: fix CONFIG_CMA=n warning]
[sfr@canb.auug.org.au: better checks before using hugetlb_cma]
Link: http://lkml.kernel.org/r/20200721205716.6dbaa56b@canb.auug.org.au
Fixes:
cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200710005726.36068-1-song.bao.hua@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Muchun Song [Fri, 24 Jul 2020 04:15:27 +0000 (21:15 -0700)]
mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
If the kmem_cache refcount is greater than one, we should not mark the
root kmem_cache as dying. If we mark the root kmem_cache dying
incorrectly, the non-root kmem_cache can never be destroyed. It
resulted in memory leak when memcg was destroyed. We can use the
following steps to reproduce.
1) Use kmem_cache_create() to create a new kmem_cache named A.
2) Coincidentally, the kmem_cache A is an alias for kmem_cache B,
so the refcount of B is just increased.
3) Use kmem_cache_destroy() to destroy the kmem_cache A, just
decrease the B's refcount but mark the B as dying.
4) Create a new memory cgroup and alloc memory from the kmem_cache
B. It leads to create a non-root kmem_cache for allocating memory.
5) When destroy the memory cgroup created in the step 4), the
non-root kmem_cache can never be destroyed.
If we repeat steps 4) and 5), this will cause a lot of memory leak. So
only when refcount reach zero, we mark the root kmem_cache as dying.
Fixes:
92ee383f6daa ("mm: fix race between kmem_cache destroy, create and deactivate")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200716165103.83462-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 24 Jul 2020 04:15:24 +0000 (21:15 -0700)]
mm/memcg: fix refcount error while moving and swapping
It was hard to keep a test running, moving tasks between memcgs with
move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s
refcount is discovered to be 0 (supposedly impossible), so it is then
forced to REFCOUNT_SATURATED, and after thousands of warnings in quick
succession, the test is at last put out of misery by being OOM killed.
This is because of the way moved_swap accounting was saved up until the
task move gets completed in __mem_cgroup_clear_mc(), deferred from when
mem_cgroup_move_swap_account() actually exchanged old and new ids.
Concurrent activity can free up swap quicker than the task is scanned,
bringing id refcount down 0 (which should only be possible when
offlining).
Just skip that optimization: do that part of the accounting immediately.
Fixes:
615d66c37c75 ("mm: memcontrol: fix memcg id ref counter on swap charge move")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bhupesh Sharma [Fri, 24 Jul 2020 04:15:21 +0000 (21:15 -0700)]
mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages()
Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages()
function in a corner case seen on some arm64 boards when kdump kernel
runs with "cgroup_disable=memory" passed to the kdump kernel via
bootargs.
The root-cause behind the same is that currently mem_cgroup_swap_init()
function is implemented as a subsys_initcall() call instead of a
core_initcall(), this means 'cgroup_memory_noswap' still remains set to
the default value (false) even when memcg is disabled via
"cgroup_disable=memory" boot parameter.
This may result in premature OOPS inside mem_cgroup_get_nr_swap_pages()
function in corner cases:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000188
Mem abort info:
ESR = 0x96000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
[
0000000000000188] user address but active_mm is swapper
Internal error: Oops:
96000006 [#1] SMP
Modules linked in:
<..snip..>
Call trace:
mem_cgroup_get_nr_swap_pages+0x9c/0xf4
shrink_lruvec+0x404/0x4f8
shrink_node+0x1a8/0x688
do_try_to_free_pages+0xe8/0x448
try_to_free_pages+0x110/0x230
__alloc_pages_slowpath.constprop.106+0x2b8/0xb48
__alloc_pages_nodemask+0x2ac/0x2f8
alloc_page_interleave+0x20/0x90
alloc_pages_current+0xdc/0xf8
atomic_pool_expand+0x60/0x210
__dma_atomic_pool_init+0x50/0xa4
dma_atomic_pool_init+0xac/0x158
do_one_initcall+0x50/0x218
kernel_init_freeable+0x22c/0x2d0
kernel_init+0x18/0x110
ret_from_fork+0x10/0x18
Code:
aa1403e3 91106000 97f82a27 14000011 (
f940c663)
---[ end trace
9795948475817de4 ]---
Kernel panic - not syncing: Fatal exception
Rebooting in 10 seconds..
Fixes:
eccb52e78809 ("mm: memcontrol: prepare swap controller setup for integration")
Reported-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsharma@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tom Rix [Fri, 24 Jul 2020 04:15:18 +0000 (21:15 -0700)]
mm: initialize return of vm_insert_pages
clang static analysis reports a garbage return
In file included from mm/memory.c:84:
mm/memory.c:1612:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
return err;
^~~~~~~~~~
The setting of err depends on a loop executing. So initialize err.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200703155354.29132-1-trix@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chengguang Xu [Fri, 24 Jul 2020 04:15:14 +0000 (21:15 -0700)]
vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way
After commit
fdc85222d58e ("kernfs: kvmalloc xattr value instead of
kmalloc"), simple xattr entry is allocated with kvmalloc() instead of
kmalloc(), so we should release it with kvfree() instead of kfree().
Fixes:
fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc")
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Cc: Chris Down <chris@chrisdown.name>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> [5.7]
Link: http://lkml.kernel.org/r/20200704051608.15043-1-cgxu519@mykernel.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Fri, 24 Jul 2020 04:15:11 +0000 (21:15 -0700)]
mm/mmap.c: close race between munmap() and expand_upwards()/downwards()
VMA with VM_GROWSDOWN or VM_GROWSUP flag set can change their size under
mmap_read_lock(). It can lead to race with __do_munmap():
Thread A Thread B
__do_munmap()
detach_vmas_to_be_unmapped()
mmap_write_downgrade()
expand_downwards()
vma->vm_start = address;
// The VMA now overlaps with
// VMAs detached by the Thread A
// page fault populates expanded part
// of the VMA
unmap_region()
// Zaps pagetables partly
// populated by Thread B
Similar race exists for expand_upwards().
The fix is to avoid downgrading mmap_lock in __do_munmap() if detached
VMAs are next to VM_GROWSDOWN or VM_GROWSUP VMA.
[akpm@linux-foundation.org: s/mmap_sem/mmap_lock/ in comment]
Fixes:
dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.20+]
Link: http://lkml.kernel.org/r/20200709105309.42495-1-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 23 Jul 2020 20:42:46 +0000 (13:42 -0700)]
Merge tag 's390-5.8-6' of git://git./linux/kernel/git/s390/linux into master
Pull s390 fixes from Heiko Carstens:
- Change cpum_cf/perf counter name from DFLT_CCERROR to DFLT_CCFINISH
to reflect reality and avoid further confusion. This is a user space
visible change therefore the commit has also a stable tag for 5.7,
where this counter was introduced.
- Add Matthew Rosato as s390 IOMMU maintainer.
* tag 's390-5.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
MAINTAINERS: add Matthew for s390 IOMMU
s390/cpum_cf,perf: change DFLT_CCERROR counter name
Douglas Anderson [Wed, 22 Jul 2020 22:00:21 +0000 (15:00 -0700)]
i2c: i2c-qcom-geni: Fix DMA transfer race
When I have KASAN enabled on my kernel and I start stressing the
touchscreen my system tends to hang. The touchscreen is one of the
only things that does a lot of big i2c transfers and ends up hitting
the DMA paths in the geni i2c driver. It appears that KASAN adds
enough delay in my system to tickle a race condition in the DMA setup
code.
When the system hangs, I found that it was running the geni_i2c_irq()
over and over again. It had these:
m_stat = 0x04000080
rx_st = 0x30000011
dm_tx_st = 0x00000000
dm_rx_st = 0x00000000
dma = 0x00000001
Notably we're in DMA mode but are getting M_RX_IRQ_EN and
M_RX_FIFO_WATERMARK_EN over and over again.
Putting some traces in geni_i2c_rx_one_msg() showed that when we
failed we were getting to the start of geni_i2c_rx_one_msg() but were
never executing geni_se_rx_dma_prep().
I believe that the problem here is that we are starting the geni
command before we run geni_se_rx_dma_prep(). If a transfer makes it
far enough before we do that then we get into the state I have
observed. Let's change the order, which seems to work fine.
Although problems were seen on the RX path, code inspection suggests
that the TX should be changed too. Change it as well.
Fixes:
37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Mukesh Kumar Savaliya <msavaliy@codeaurora.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Wolfram Sang [Sat, 4 Jul 2020 13:38:29 +0000 (15:38 +0200)]
i2c: rcar: always clear ICSAR to avoid side effects
On R-Car Gen2, we get a timeout when reading from the address set in
ICSAR, even though the slave interface is disabled. Clearing it fixes
this situation. Note that Gen3 is not affected.
To reproduce: bind and undbind an I2C slave on some bus, run
'i2cdetect' on that bus.
Fixes:
de20d1857dd6 ("i2c: rcar: add slave support")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Mikulas Patocka [Thu, 23 Jul 2020 14:42:09 +0000 (10:42 -0400)]
dm integrity: fix integrity recalculation that is improperly skipped
Commit
adc0daad366b62ca1bce3e2958a40b0b71a8b8b3 ("dm: report suspended
device during destroy") broke integrity recalculation.
The problem is dm_suspended() returns true not only during suspend,
but also during resume. So this race condition could occur:
1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
2. integrity_recalc (&ic->recalc_work) preempts the current thread
3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
4. integrity_recalc exits and no recalculating is done.
To fix this race condition, add a function dm_post_suspending that is
only true during the postsuspend phase and use it instead of
dm_suspended().
Signed-off-by: Mikulas Patocka <mpatocka redhat com>
Fixes:
adc0daad366b ("dm: report suspended device during destroy")
Cc: stable vger kernel org # v4.18+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pavel Begunkov [Thu, 23 Jul 2020 17:17:20 +0000 (20:17 +0300)]
io_uring: missed req_init_async() for IOSQE_ASYNC
IOSQE_ASYNC branch of io_queue_sqe() is another place where an
unitialised req->work can be accessed (i.e. prior io_req_init_async()).
Nothing really bad though, it just looses IO_WQ_WORK_CONCURRENT flag.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nathan Chancellor [Thu, 23 Jul 2020 04:15:10 +0000 (21:15 -0700)]
arm64: vdso32: Fix '--prefix=' value for newer versions of clang
Newer versions of clang only look for $(COMPAT_GCC_TOOLCHAIN_DIR)as [1],
rather than $(COMPAT_GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE_COMPAT)as,
resulting in the following build error:
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
CROSS_COMPILE_COMPAT=arm-linux-gnueabi- LLVM=1 O=out/aarch64 distclean \
defconfig arch/arm64/kernel/vdso32/
...
/home/nathan/cbl/toolchains/llvm-binutils/bin/as: unrecognized option '-EL'
clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
make[3]: *** [arch/arm64/kernel/vdso32/Makefile:181: arch/arm64/kernel/vdso32/note.o] Error 1
...
Adding the value of CROSS_COMPILE_COMPAT (adding notdir to account for a
full path for CROSS_COMPILE_COMPAT) fixes this issue, which matches the
solution done for the main Makefile [2].
[1]: https://github.com/llvm/llvm-project/commit/
3452a0d8c17f7166f479706b293caf6ac76ffd90
[2]: https://lore.kernel.org/lkml/
20200721173125.1273884-1-maskray@google.com/
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1099
Link: https://lore.kernel.org/r/20200723041509.400450-1-natechancellor@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Dave Airlie [Thu, 23 Jul 2020 04:06:14 +0000 (14:06 +1000)]
Merge tag 'amd-drm-fixes-5.8-2020-07-22' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.8-2020-07-22:
amdgpu:
- Fix crash when overclocking VegaM
- Fix possible crash when editing dpm levels
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200723032608.3865-1-alexander.deucher@amd.com
Dave Airlie [Thu, 23 Jul 2020 04:05:28 +0000 (14:05 +1000)]
Merge tag 'drm-misc-fixes-2020-07-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
* sun4i: Fix inverted HPD result; fixes an earlier fix
* lima: fix timeout during reset
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200722070321.GA29190@linux-uq9g
J. Bruce Fields [Wed, 15 Jul 2020 17:31:36 +0000 (13:31 -0400)]
nfsd4: fix NULL dereference in nfsd/clients display code
We hold the cl_lock here, and that's enough to keep stateid's from going
away, but it's not enough to prevent the files they point to from going
away. Take fi_lock and a reference and check for NULL, as we do in
other code.
Reported-by: NeilBrown <neilb@suse.de>
Fixes:
78599c42ae3c ("nfsd4: add file to display list of client's opens")
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus Torvalds [Wed, 22 Jul 2020 18:56:00 +0000 (11:56 -0700)]
Merge tag 'media/v5.8-3' of git://git./linux/kernel/git/mchehab/linux-media into master
Pull media fixes from Mauro Carvalho Chehab:
"A series of fixes for the upcoming atomisp driver. They solve issues
when probing atomisp on devices with multiple cameras and get rid of
warnings when built with W=1.
The diffstat is a bit long, as this driver has several abstractions.
The patches that solved the issues with W=1 had to get rid of some
duplicated code (there used to have 2 versions of the same code, one
for ISP2401 and another one for ISP2400).
As this driver is not in 5.7, such changes won't cause regressions"
* tag 'media/v5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (38 commits)
Revert "media: atomisp: keep the ISP powered on when setting it"
media: atomisp: fix mask and shift operation on ISPSSPM0
media: atomisp: move system_local consts into a C file
media: atomisp: get rid of version-specific system_local.h
media: atomisp: move global stuff into a common header
media: atomisp: remove non-used 32-bits consts at system_local
media: atomisp: get rid of some unused static vars
media: atomisp: Fix error code in ov5693_probe()
media: atomisp: Replace trace_printk by pr_info
media: atomisp: Fix __func__ style warnings
media: atomisp: fix help message for ISP2401 selection
media: atomisp: i2c: atomisp-ov2680.c: fixed a brace coding style issue.
media: atomisp: make const arrays static, makes object smaller
media: atomisp: Clean up non-existing folders from Makefile
media: atomisp: Get rid of ACPI specifics in gmin_subdev_add()
media: atomisp: Provide Gmin subdev as parameter to gmin_subdev_add()
media: atomisp: Use temporary variable for device in gmin_subdev_add()
media: atomisp: Refactor PMIC detection to a separate function
media: atomisp: Deduplicate return ret in gmin_i2c_write()
media: atomisp: Make pointer to PMIC client global
...
Linus Torvalds [Wed, 22 Jul 2020 18:30:07 +0000 (11:30 -0700)]
Merge tag 'exfat-for-5.8-rc7' of git://git./linux/kernel/git/linkinjeon/exfat into master
Pull exfat fixes from Namjae Jeon:
- fix overflow issue at sector calculation
- fix wrong hint_stat initialization
- fix wrong size update of stream entry
- fix endianness of upname in name_hash computation
* tag 'exfat-for-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: fix name_hash computation on big endian systems
exfat: fix wrong size update of stream entry by typo
exfat: fix wrong hint_stat initialization in exfat_find_dir_entry()
exfat: fix overflow issue in exfat_cluster_to_sector()
Bjorn Helgaas [Fri, 17 Jul 2020 22:21:28 +0000 (17:21 -0500)]
Revert "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
This reverts commit
ec411e02b7a2e785a4ed9ed283207cd14f48699d.
Patrick reported that this commit broke hybrid graphics on a ThinkPad X1
Extreme 2nd with Intel UHD Graphics 630 and NVIDIA GeForce GTX 1650 Mobile:
nouveau 0000:01:00.0: fifo: PBDMA0:
01000000 [] ch 0 [
00ff992000 DRM] subc 0 mthd 0008 data
00000000
Karol reported that this commit broke Nouveau firmware loading on a Lenovo
P1G2 with Intel UHD Graphics 630 and NVIDIA TU117GLM [Quadro T1000 Mobile]:
nouveau 0000:01:00.0: acr: AHESASC binary failed
In both cases, reverting
ec411e02b7a2 solved the problem. Unfortunately,
this revert will reintroduce the "Thunderbolt bridges take long time to
resume from D3cold" problem:
https://bugzilla.kernel.org/show_bug.cgi?id=206837
Link: https://lore.kernel.org/r/CAErSpo5sTeK_my1dEhWp7aHD0xOp87+oHYWkTjbL7ALgDbXo-Q@mail.gmail.com
Link: https://lore.kernel.org/r/CACO55tsAEa5GXw5oeJPG=mcn+qxNvspXreJYWDJGZBy5v82JDA@mail.gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208597
Reported-by: Patrick Volkerding <volkerdi@gmail.com>
Reported-by: Karol Herbst <kherbst@redhat.com>
Fixes:
ec411e02b7a2 ("PCI/PM: Assume ports without DLL Link Active train links in 100 ms")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Wed, 1 Jul 2020 20:53:15 +0000 (15:53 -0500)]
virtio-mmio: Reject invalid IRQ 0 command line argument
The "virtio_mmio.device=" command line argument allows a user to specify
the size, address, and IRQ of a virtio device. Previously the only
requirement for the IRQ was that it be an unsigned integer.
Zero is an unsigned integer but an invalid IRQ number, and after
a85a6c86c25be ("driver core: platform: Clarify that IRQ 0 is invalid"),
attempts to use IRQ 0 cause warnings.
If the user specifies IRQ 0, return failure instead of registering a device
with IRQ 0.
Fixes:
a85a6c86c25be ("driver core: platform: Clarify that IRQ 0 is invalid")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Rob Clark [Mon, 20 Jul 2020 15:52:17 +0000 (08:52 -0700)]
iommu/qcom: Use domain rather than dev as tlb cookie
The device may be torn down, but the domain should still be valid. Lets
use that as the tlb flush ops cookie.
Fixes a problem reported in [1]
[1] https://lkml.org/lkml/2020/7/20/104
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes:
09b5dfff9ad6 ("iommu/qcom: Use accessor functions for iommu private data")
Link: https://lore.kernel.org/r/20200720155217.274994-1-robdclark@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Gerald Schaefer [Tue, 21 Jul 2020 13:04:30 +0000 (15:04 +0200)]
MAINTAINERS: add Matthew for s390 IOMMU
Acked-By: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Ludovic Desroches [Thu, 9 Jul 2020 08:42:33 +0000 (10:42 +0200)]
MAINTAINERS: i2c: at91: handover maintenance to Codrin Ciubotariu
My colleague Codrin Ciubotariu, now, maintains this driver internally.
Then I handover the mainline maintenance to him.
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Randy Dunlap [Fri, 17 Jul 2020 23:38:15 +0000 (16:38 -0700)]
i2c: drop duplicated word in the header file
Drop the doubled word "be" in a comment.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Raviteja Narayanam [Fri, 3 Jul 2020 13:56:12 +0000 (19:26 +0530)]
i2c: cadence: Clear HOLD bit at correct time in Rx path
There are few issues on Zynq SOC observed in the stress tests causing
timeout errors. Even though all the data is received, timeout error
is thrown. This is due to an IP bug in which the COMP bit in ISR is
not set at end of transfer and completion interrupt is not generated.
This bug is seen on Zynq platforms when the following condition occurs:
Master read & HOLD bit set & Transfer size register reaches '0'.
One workaround is to clear the HOLD bit before the transfer size
register reaches '0'. The current implementation checks for this at
the start of the loop and also only for less than FIFO DEPTH case
(ignoring the equal to case).
So clear the HOLD bit when the data yet to receive is less than or
equal to the FIFO DEPTH. This avoids the IP bug condition.
Signed-off-by: Raviteja Narayanam <raviteja.narayanam@xilinx.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Raviteja Narayanam [Fri, 3 Jul 2020 13:55:49 +0000 (19:25 +0530)]
Revert "i2c: cadence: Fix the hold bit setting"
This reverts commit
d358def706880defa4c9e87381c5bf086a97d5f9.
There are two issues with "i2c: cadence: Fix the hold bit setting" commit.
1. In case of combined message request from user space, when the HOLD
bit is cleared in cdns_i2c_mrecv function, a STOP condition is sent
on the bus even before the last message is started. This is because when
the HOLD bit is cleared, the FIFOS are empty and there is no pending
transfer. The STOP condition should occur only after the last message
is completed.
2. The code added by the commit is redundant. Driver is handling the
setting/clearing of HOLD bit in right way before the commit.
The setting of HOLD bit based on 'bus_hold_flag' is taken care in
cdns_i2c_master_xfer function even before cdns_i2c_msend/cdns_i2c_recv
functions.
The clearing of HOLD bit is taken care at the end of cdns_i2c_msend and
cdns_i2c_recv functions based on bus_hold_flag and byte count.
Since clearing of HOLD bit is done after the slave address is written to
the register (writing to address register triggers the message transfer),
it is ensured that STOP condition occurs at the right time after
completion of the pending transfer (last message).
Signed-off-by: Raviteja Narayanam <raviteja.narayanam@xilinx.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Max Filippov [Tue, 21 Jul 2020 22:00:35 +0000 (15:00 -0700)]
xtensa: fix access check in csum_and_copy_from_user
Commit
d341659f470b ("xtensa: switch to providing
csum_and_copy_from_user()") introduced access check, but incorrectly
tested dst instead of src.
Fix access_ok argument in csum_and_copy_from_user.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes:
d341659f470b ("xtensa: switch to providing csum_and_copy_from_user()")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Boris Burkov [Thu, 16 Jul 2020 20:29:46 +0000 (13:29 -0700)]
btrfs: fix mount failure caused by race with umount
It is possible to cause a btrfs mount to fail by racing it with a slow
umount. The crux of the sequence is generic_shutdown_super not yet
calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
If that occurs, btrfs_open_devices will decide the opened counter is
non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
0. From here, mount will call sget which will result in grab_super
trying to take the super block umount semaphore. That semaphore will be
held by the slow umount, so mount will block. Before up-ing the
semaphore, umount will delete the super block, resulting in mount's sget
reliably allocating a new one, which causes the mount path to dutifully
fill it out, and increment total_rw_bytes a second time, which causes
the mount to fail, as we see double the expected bytes.
Here is the sequence laid out in greater detail:
CPU0 CPU1
down_write sb->s_umount
btrfs_kill_super
kill_anon_super(sb)
generic_shutdown_super(sb);
shrink_dcache_for_umount(sb);
sync_filesystem(sb);
evict_inodes(sb); // SLOW
btrfs_mount_root
btrfs_scan_one_device
fs_devices = device->fs_devices
fs_info->fs_devices = fs_devices
// fs_devices-opened makes this a no-op
btrfs_open_devices(fs_devices, mode, fs_type)
s = sget(fs_type, test, set, flags, fs_info);
find sb in s_instances
grab_super(sb);
down_write(&s->s_umount); // blocks
sop->put_super(sb)
// sb->fs_devices->opened == 2; no-op
spin_lock(&sb_lock);
hlist_del_init(&sb->s_instances);
spin_unlock(&sb_lock);
up_write(&sb->s_umount);
return 0;
retry lookup
don't find sb in s_instances (deleted by CPU0)
s = alloc_super
return s;
btrfs_fill_super(s, fs_devices, data)
open_ctree // fs_devices total_rw_bytes improperly set!
btrfs_read_chunk_tree
read_one_dev // increment total_rw_bytes again!!
super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!
To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
before the calls to read_one_dev, while holding the sb umount semaphore
and the uuid mutex.
To reproduce, it is sufficient to dirty a decent number of inodes, then
quickly umount and mount.
for i in $(seq 0 500)
do
dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
done
umount /mnt/foo&
mount /mnt/foo
does the trick for me.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Robbie Ko [Mon, 20 Jul 2020 01:42:09 +0000 (09:42 +0800)]
btrfs: fix page leaks after failure to lock page for delalloc
When locking pages for delalloc, we check if it's dirty and mapping still
matches. If it does not match, we need to return -EAGAIN and release all
pages. Only the current page was put though, iterate over all the
remaining pages too.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 17 Jul 2020 07:12:05 +0000 (15:12 +0800)]
btrfs: qgroup: fix data leak caused by race between writeback and truncate
[BUG]
When running tests like generic/013 on test device with btrfs quota
enabled, it can normally lead to data leak, detected at unmount time:
BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
------------[ cut here ]------------
WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
Call Trace:
btrfs_put_super+0x15/0x17 [btrfs]
generic_shutdown_super+0x72/0x110
kill_anon_super+0x18/0x30
btrfs_kill_super+0x17/0x30 [btrfs]
deactivate_locked_super+0x3b/0xa0
deactivate_super+0x40/0x50
cleanup_mnt+0x135/0x190
__cleanup_mnt+0x12/0x20
task_work_run+0x64/0xb0
__prepare_exit_to_usermode+0x1bc/0x1c0
__syscall_return_slowpath+0x47/0x230
do_syscall_64+0x64/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace
caf08beafeca2392 ]---
BTRFS error (device dm-3): qgroup reserved space leaked
[CAUSE]
In the offending case, the offending operations are:
2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
The following sequence of events could happen after the writev():
CPU1 (writeback) | CPU2 (truncate)
-----------------------------------------------------------------
btrfs_writepages() |
|- extent_write_cache_pages() |
|- Got page for 1003520 |
| 1003520 is Dirty, no writeback |
| So (!clear_page_dirty_for_io()) |
| gets called for it |
|- Now page 1003520 is Clean. |
| | btrfs_setattr()
| | |- btrfs_setsize()
| | |- truncate_setsize()
| | New i_size is 18388
|- __extent_writepage() |
| |- page_offset() > i_size |
|- btrfs_invalidatepage() |
|- Page is clean, so no qgroup |
callback executed
This means, the qgroup reserved data space is not properly released in
btrfs_invalidatepage() as the page is Clean.
[FIX]
Instead of checking the dirty bit of a page, call
btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
As qgroup rsv are completely bound to the QGROUP_RESERVED bit of
io_tree, not bound to page status, thus we won't cause double freeing
anyway.
Fixes:
0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Paweł Gronowski [Sun, 19 Jul 2020 15:54:53 +0000 (17:54 +0200)]
drm/amdgpu: Fix NULL dereference in dpm sysfs handlers
NULL dereference occurs when string that is not ended with space or
newline is written to some dpm sysfs interface (for example pp_dpm_sclk).
This happens because strsep replaces the tmp with NULL if the delimiter
is not present in string, which is then dereferenced by tmp[0].
Reproduction example:
sudo sh -c 'echo -n 1 > /sys/class/drm/card0/device/pp_dpm_sclk'
Signed-off-by: Paweł Gronowski <me@woland.xyz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Qiu Wenbo [Fri, 17 Jul 2020 07:09:57 +0000 (15:09 +0800)]
drm/amd/powerplay: fix a crash when overclocking Vega M
Avoid kernel crash when vddci_control is SMU7_VOLTAGE_CONTROL_NONE and
vddci_voltage_table is empty. It has been tested on Intel Hades Canyon
(i7-8809G).
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208489
Fixes:
ac7822b0026f ("drm/amd/powerplay: add smumgr support for VEGAM (v2)")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Filipe Manana [Mon, 13 Jul 2020 14:11:56 +0000 (15:11 +0100)]
btrfs: fix double free on ulist after backref resolution failure
At btrfs_find_all_roots_safe() we allocate a ulist and set the **roots
argument to point to it. However if later we fail due to an error returned
by find_parent_nodes(), we free that ulist but leave a dangling pointer in
the **roots argument. Upon receiving the error, a caller of this function
can attempt to free the same ulist again, resulting in an invalid memory
access.
One such scenario is during qgroup accounting:
btrfs_qgroup_account_extents()
--> calls btrfs_find_all_roots() passes &new_roots (a stack allocated
pointer) to btrfs_find_all_roots()
--> btrfs_find_all_roots() just calls btrfs_find_all_roots_safe()
passing &new_roots to it
--> allocates ulist and assigns its address to **roots (which
points to new_roots from btrfs_qgroup_account_extents())
--> find_parent_nodes() returns an error, so we free the ulist
and leave **roots pointing to it after returning
--> btrfs_qgroup_account_extents() sees btrfs_find_all_roots() returned
an error and jumps to the label 'cleanup', which just tries to
free again the same ulist
Stack trace example:
------------[ cut here ]------------
BTRFS: tree first key check failed
WARNING: CPU: 1 PID: 1763215 at fs/btrfs/disk-io.c:422 btrfs_verify_level_key+0xe0/0x180 [btrfs]
Modules linked in: dm_snapshot dm_thin_pool (...)
CPU: 1 PID: 1763215 Comm: fsstress Tainted: G W 5.8.0-rc3-btrfs-next-64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:btrfs_verify_level_key+0xe0/0x180 [btrfs]
Code: 28 5b 5d (...)
RSP: 0018:
ffffb89b473779a0 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
ffff90397759bf08 RCX:
0000000000000000
RDX:
0000000000000001 RSI:
0000000000000027 RDI:
00000000ffffffff
RBP:
ffff9039a419c000 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
ffffb89b43301000 R12:
000000000000005e
R13:
ffffb89b47377a2e R14:
ffffb89b473779af R15:
0000000000000000
FS:
00007fc47e1e1000(0000) GS:
ffff9039ac200000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fc47e1df000 CR3:
00000003d9e4e001 CR4:
00000000003606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
read_block_for_search+0xf6/0x350 [btrfs]
btrfs_next_old_leaf+0x242/0x650 [btrfs]
resolve_indirect_refs+0x7cf/0x9e0 [btrfs]
find_parent_nodes+0x4ea/0x12c0 [btrfs]
btrfs_find_all_roots_safe+0xbf/0x130 [btrfs]
btrfs_qgroup_account_extents+0x9d/0x390 [btrfs]
btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
btrfs_sync_file+0x3d4/0x4d0 [btrfs]
do_fsync+0x38/0x70
__x64_sys_fdatasync+0x13/0x20
do_syscall_64+0x5c/0xe0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc47e2d72e3
Code: Bad RIP value.
RSP: 002b:
00007fffa32098c8 EFLAGS:
00000246 ORIG_RAX:
000000000000004b
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
00007fc47e2d72e3
RDX:
00007fffa3209830 RSI:
00007fffa3209830 RDI:
0000000000000003
RBP:
000000000000072e R08:
0000000000000001 R09:
0000000000000003
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000000003e8
R13:
0000000051eb851f R14:
00007fffa3209970 R15:
00005607c4ac8b50
irq event stamp: 0
hardirqs last enabled at (0): [<
0000000000000000>] 0x0
hardirqs last disabled at (0): [<
ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
softirqs last enabled at (0): [<
ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
softirqs last disabled at (0): [<
0000000000000000>] 0x0
---[ end trace
8639237550317b48 ]---
BTRFS error (device sdc): tree first key mismatch detected, bytenr=
62324736 parent_transid=94 key expected=(262,108,1351680) has=(259,108,1921024)
general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
CPU: 2 PID: 1763215 Comm: fsstress Tainted: G W 5.8.0-rc3-btrfs-next-64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:ulist_release+0x14/0x60 [btrfs]
Code: c7 07 00 (...)
RSP: 0018:
ffffb89b47377d60 EFLAGS:
00010282
RAX:
6b6b6b6b6b6b6b6b RBX:
ffff903959b56b90 RCX:
0000000000000000
RDX:
0000000000000001 RSI:
0000000000270024 RDI:
ffff9036e2adc840
RBP:
ffff9036e2adc848 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000000 R12:
ffff9036e2adc840
R13:
0000000000000015 R14:
ffff9039a419ccf8 R15:
ffff90395d605840
FS:
00007fc47e1e1000(0000) GS:
ffff9039ac600000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f8c1c0a51c8 CR3:
00000003d9e4e004 CR4:
00000000003606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
ulist_free+0x13/0x20 [btrfs]
btrfs_qgroup_account_extents+0xf3/0x390 [btrfs]
btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
btrfs_sync_file+0x3d4/0x4d0 [btrfs]
do_fsync+0x38/0x70
__x64_sys_fdatasync+0x13/0x20
do_syscall_64+0x5c/0xe0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc47e2d72e3
Code: Bad RIP value.
RSP: 002b:
00007fffa32098c8 EFLAGS:
00000246 ORIG_RAX:
000000000000004b
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
00007fc47e2d72e3
RDX:
00007fffa3209830 RSI:
00007fffa3209830 RDI:
0000000000000003
RBP:
000000000000072e R08:
0000000000000001 R09:
0000000000000003
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000000003e8
R13:
0000000051eb851f R14:
00007fffa3209970 R15:
00005607c4ac8b50
Modules linked in: dm_snapshot dm_thin_pool (...)
---[ end trace
8639237550317b49 ]---
RIP: 0010:ulist_release+0x14/0x60 [btrfs]
Code: c7 07 00 (...)
RSP: 0018:
ffffb89b47377d60 EFLAGS:
00010282
RAX:
6b6b6b6b6b6b6b6b RBX:
ffff903959b56b90 RCX:
0000000000000000
RDX:
0000000000000001 RSI:
0000000000270024 RDI:
ffff9036e2adc840
RBP:
ffff9036e2adc848 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000000 R12:
ffff9036e2adc840
R13:
0000000000000015 R14:
ffff9039a419ccf8 R15:
ffff90395d605840
FS:
00007fc47e1e1000(0000) GS:
ffff9039ad200000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f6a776f7d40 CR3:
00000003d9e4e002 CR4:
00000000003606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Fix this by making btrfs_find_all_roots_safe() set *roots to NULL after
it frees the ulist.
Fixes:
8da6d5815c592b ("Btrfs: added btrfs_find_all_roots()")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jason Gunthorpe [Sun, 19 Jul 2020 06:54:35 +0000 (09:54 +0300)]
RDMA/mlx5: Prevent prefetch from racing with implicit destruction
Prefetch work in mlx5_ib_prefetch_mr_work can be queued and able to run
concurrently with destruction of the implicit MR. The num_deferred_work
was intended to serialize this, but there is a race:
CPU0 CPU1
mlx5_ib_free_implicit_mr()
xa_erase(odp_mkeys)
synchronize_srcu()
__xa_erase(implicit_children)
mlx5_ib_prefetch_mr_work()
pagefault_mr()
pagefault_implicit_mr()
implicit_get_child_mr()
xa_cmpxchg()
atomic_dec_and_test(num_deferred_mr)
wait_event(imr->q_deferred_work)
ib_umem_odp_release(odp_imr)
kfree(odp_imr)
At this point in mlx5_ib_free_implicit_mr() the implicit_children list is
supposed to be empty forever so that destroy_unused_implicit_child_mr()
and related are not and will not be running.
Since it is not empty the destroy_unused_implicit_child_mr() flow ends up
touching deallocated memory as mlx5_ib_free_implicit_mr() already tore down the
imr parent.
The solution is to flush out the prefetch wq by driving num_deferred_work
to zero after creation of new prefetch work is blocked.
Fixes:
5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200719065435.130722-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Linus Torvalds [Tue, 21 Jul 2020 15:06:45 +0000 (08:06 -0700)]
Merge tag 'sound-5.8-rc7' of git://git./linux/kernel/git/tiwai/sound into master
Pull sound fixes from Takashi Iwai:
"This became fairly large, containing mostly the collection of ASoC
fixes that slipped from the previous request, so I sent now a bit
earlier than usual. But all changes look small and mostly
device-specific, hence nothing to worry too much.
Majority of changes are for x86 based platforms and their CODEC
drivers, in order to address some issues hit by their recent tests and
fuzzing. The rest are other ASoC device-specific fixes (imx, qcom,
wm8974, amd, rockchip) as well as a trivial fix for a kernel WARNING
hit by syzkaller"
* tag 'sound-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
ALSA: hda/realtek: Fixed ALC298 sound bug by adding quirk for Samsung Notebook Pen S
ALSA: info: Drop WARN_ON() from buffer NULL sanity check
ASoC: rt5682: Report the button event in the headset type only
ASoC: Intel: bytcht_es8316: Add missed put_device()
ASoC: rt5682: Enable Vref2 under using PLL2
ASoC: rt286: fix unexpected interrupt happens
ASoC: wm8974: remove unsupported clock mode
ASoC: wm8974: fix Boost Mixer Aux Switch
ASoC: SOF: core: fix null-ptr-deref bug during device removal
ASoc: codecs: max98373: remove Idle_bias_on to let codec suspend
ASoC: codecs: max98373: Removed superfluous volume control from chip default
ASoC: topology: fix tlvs in error handling for widget_dmixer
ASoC: topology: fix kernel oops on route addition error
ASoC: SOF: imx: add min/max channels for SAI/ESAI on i.MX8/i.MX8M
ASoC: Intel: bdw-rt5677: fix non BE conversion
ASoC: soc-dai: set dai_link dpcm_ flags with a helper
MAINTAINERS: Add Shengjiu to reviewer list of sound/soc/fsl
ASoC: core: Remove only the registered component in devm functions
MAINTAINERS: Change Maintainer for some at91 drivers
ASoC: dt-bindings: simple-card: Fix 'make dt_binding_check' warnings
...
Thomas Richter [Fri, 17 Jul 2020 09:27:22 +0000 (11:27 +0200)]
s390/cpum_cf,perf: change DFLT_CCERROR counter name
Change the counter name DLFT_CCERROR to DLFT_CCFINISH on IBM z15.
This counter counts completed DEFLATE instructions with exit code
0, 1 or 2. Since exit code 0 means success and exit code 1 or 2
indicate errors, change the counter name to avoid confusion.
This counter is incremented each time the DEFLATE instruction
completed regardless if an error was detected or not.
Fixes:
d68d5d51dc89 ("s390/cpum_cf: Add new extended counters for IBM z15")
Fixes:
e7950166e402 ("perf vendor events s390: Add new deflate counters for IBM z15")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Ilya Ponetayev [Thu, 16 Jul 2020 08:27:53 +0000 (17:27 +0900)]
exfat: fix name_hash computation on big endian systems
On-disk format for name_hash field is LE, so it must be explicitly
transformed on BE system for proper result.
Fixes:
370e812b3ec1 ("exfat: add nls operations")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Chen Minqiang <ptpt52@gmail.com>
Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Hyeongseok Kim [Wed, 8 Jul 2020 09:52:33 +0000 (18:52 +0900)]
exfat: fix wrong size update of stream entry by typo
The stream.size field is updated to the value of create timestamp
of the file entry. Fix this to use correct stream entry pointer.
Fixes:
29bbb14bfc80 ("exfat: fix incorrect update of stream entry in __exfat_truncate()")
Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Namjae Jeon [Fri, 3 Jul 2020 02:19:46 +0000 (11:19 +0900)]
exfat: fix wrong hint_stat initialization in exfat_find_dir_entry()
We found the wrong hint_stat initialization in exfat_find_dir_entry().
It should be initialized when cluster is EXFAT_EOF_CLUSTER.
Fixes:
ca06197382bd ("exfat: add directory operations")
Cc: stable@vger.kernel.org # v5.7
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Namjae Jeon [Fri, 3 Jul 2020 02:16:32 +0000 (11:16 +0900)]
exfat: fix overflow issue in exfat_cluster_to_sector()
An overflow issue can occur while calculating sector in
exfat_cluster_to_sector(). It needs to cast clus's type to sector_t
before left shifting.
Fixes:
1acf1a564b60 ("exfat: add in-memory and on-disk structures and headers")
Cc: stable@vger.kernel.org # v5.7
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>