Thomas Gleixner [Sat, 28 Mar 2020 10:59:24 +0000 (11:59 +0100)]
Merge branch 'uaccess.futex' of git://git./linux/kernel/git/viro/vfs into locking/core
Pull uaccess futex cleanups for Al Viro:
Consolidate access_ok() usage and the futex uaccess function zoo.
Thomas Gleixner [Sat, 21 Mar 2020 19:22:10 +0000 (20:22 +0100)]
m68knommu: Remove mm.h include from uaccess_no.h
In file included
from include/linux/huge_mm.h:8,
from include/linux/mm.h:567,
from arch/m68k/include/asm/uaccess_no.h:8,
from arch/m68k/include/asm/uaccess.h:3,
from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from kernel/locking/percpu-rwsem.c:6:
include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
Removing the include of linux/mm.h from the uaccess header solves the problem
and various build tests of nommu configurations still work.
Fixes: 80fbaf1c3f29 ("rcuwait: Add @state argument to rcuwait_wait_event()")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lkml.kernel.org/r/87fte1qzh0.fsf@nanos.tec.linutronix.de
Al Viro [Fri, 20 Mar 2020 02:23:48 +0000 (22:23 -0400)]
x86: get rid of user_atomic_cmpxchg_inatomic()
Only one user left; the thing had been made polymorphic back in 2013
for the sake of MPX. No point keeping it now that MPX is gone.
Convert futex_atomic_cmpxchg_inatomic() to user_access_{begin,end}()
while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 18 Feb 2020 17:19:23 +0000 (12:19 -0500)]
generic arch_futex_atomic_op_inuser() doesn't need access_ok()
uses get_user() and put_user() for memory accesses
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 26 Mar 2020 21:34:40 +0000 (17:34 -0400)]
x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
lock cmpxchg leaves the current value in eax; no need to reload it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 16 Feb 2020 18:10:42 +0000 (13:10 -0500)]
x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
Lift stac/clac pairs from __futex_atomic_op{1,2} into arch_futex_atomic_op_inuser(),
fold them with access_ok() in there. The switch in arch_futex_atomic_op_inuser()
is what has required the previous (objtool) commit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 16 Feb 2020 18:07:49 +0000 (13:07 -0500)]
objtool: whitelist __sanitizer_cov_trace_switch()
it's not really different from e.g. __sanitizer_cov_trace_cmp4();
as it is, the switches that generate an array of labels get
rejected by objtool, while slightly different set of cases
that gets compiled into a series of comparisons is accepted.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 16 Feb 2020 15:26:50 +0000 (10:26 -0500)]
[parisc, s390, sparc64] no need for access_ok() in futex handling
access_ok() is always true on those
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 16 Feb 2020 15:27:46 +0000 (10:27 -0500)]
sh: no need of access_ok() in arch_futex_atomic_op_inuser()
everything it uses is doing access_ok() already
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 16 Feb 2020 15:17:27 +0000 (10:17 -0500)]
futex: arch_futex_atomic_op_inuser() calling conventions change
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sebastian Siewior [Mon, 23 Mar 2020 15:20:19 +0000 (16:20 +0100)]
completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
The warning was intended to spot complete_all() users from hardirq
context on PREEMPT_RT. The warning as-is will also trigger in interrupt
handlers, which are threaded on PREEMPT_RT, which was not intended.
Use lockdep_assert_RT_in_threaded_ctx() which triggers in non-preemptive
context on PREEMPT_RT.
Fixes: a5c6234e1028 ("completion: Use simple wait queues")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200323152019.4qjwluldohuh3by5@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:26:04 +0000 (12:26 +0100)]
lockdep: Add posixtimer context tracing bits
Splitting run_posix_cpu_timers() into two parts is work in progress which
is stuck on other entry code related problems. The heavy lifting which
involves locking of sighand lock will be moved into task context so the
necessary execution time is burdened on the task and not on interrupt
context.
Until this work completes lockdep with the spinlock nesting rules enabled
would emit warnings for this known context.
Prevent it by setting "->irq_config = 1" for the invocation of
run_posix_cpu_timers() so lockdep does not complain when sighand lock is
acquried. This will be removed once the split is completed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.751182723@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:26:03 +0000 (12:26 +0100)]
lockdep: Annotate irq_work
Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in
hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be
invoked in softirq context on PREEMPT_RT.
Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq
context so lockdep knows that these can safely acquire a spinlock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:26:02 +0000 (12:26 +0100)]
lockdep: Add hrtimer context tracing bits
Set current->irq_config = 1 for hrtimers which are not marked to expire in
hard interrupt context during hrtimer_init(). These timers will expire in
softirq context on PREEMPT_RT.
Setting this allows lockdep to differentiate these timers. If a timer is
marked to expire in hard interrupt context then the timer callback is not
supposed to acquire a regular spinlock instead of a raw_spinlock in the
expiry callback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.534508206@linutronix.de
Peter Zijlstra [Sat, 21 Mar 2020 11:26:01 +0000 (12:26 +0100)]
lockdep: Introduce wait-type checks
Extend lockdep to validate lock wait-type context.
The current wait-types are:
LD_WAIT_FREE, /* wait free, rcu etc.. */
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).
This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, its a more fancy might_sleep().
Obviously RCU made the entire ordeal more complex than a simple single
value test because RCU can be acquired in (pretty much) any context and
while it presents a context to nested locks it is not the same as it
got acquired in.
Therefore its necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.
[ To make static initialization easier we have the rule that:
.outer == INV means .outer == .inner; because INV == 0. ]
It further means that its required to find the minimal .inner of the held
stack to compare against the outer of the new lock; because while 'normal'
RCU presents a CONFIG type to nested locks, if it is taken while already
holding a SPIN type it obviously doesn't relax the rules.
Below is an example output generated by the trivial test code:
raw_spin_lock(&foo);
spin_lock(&bar);
spin_unlock(&bar);
raw_spin_unlock(&foo);
[ BUG: Invalid wait context ]
-----------------------------
swapper/0/1 is trying to lock:
ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
other info that might help us debug this:
1 lock held by swapper/0/1:
#0:
ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187
The way to read it is to look at the new -{n,m} part in the lock
description; -{3:3} for the attempted lock, and try and match that up to
the held locks, which in this case is the one: -{2,2}.
This tells that the acquiring lock requires a more relaxed environment than
presented by the lock stack.
Currently only the normal locks and RCU are converted, the rest of the
lockdep users defaults to .inner = INV which is ignored. More conversions
can be done when desired.
The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently
addressed. The config option allows to identify these problems and to
verify that the solutions found are indeed solving them.
The config switch will be removed and the checks will permanently enabled
once the vast majority of issues has been addressed.
[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de
Thomas Gleixner [Sat, 21 Mar 2020 11:26:00 +0000 (12:26 +0100)]
completion: Use simple wait queues
completion uses a wait_queue_head_t to enqueue waiters.
wait_queue_head_t contains a spinlock_t to protect the list of waiters
which excludes it from being used in truly atomic context on a PREEMPT_RT
enabled kernel.
The spinlock in the wait queue head cannot be replaced by a raw_spinlock
because:
- wait queues can have custom wakeup callbacks, which acquire other
spinlock_t locks and have potentially long execution times
- wake_up() walks an unbounded number of list entries during the wake up
and may wake an unbounded number of waiters.
For simplicity and performance reasons complete() should be usable on
PREEMPT_RT enabled kernels.
completions do not use custom wakeup callbacks and are usually single
waiter, except for a few corner cases.
Replace the wait queue in the completion with a simple wait queue (swait),
which uses a raw_spinlock_t for protecting the waiter list and therefore is
safe to use inside truly atomic regions on PREEMPT_RT.
There is no semantical or functional change:
- completions use the exclusive wait mode which is what swait provides
- complete() wakes one exclusive waiter
- complete_all() wakes all waiters while holding the lock which protects
the wait queue against newly incoming waiters. The conversion to swait
preserves this behaviour.
complete_all() might cause unbound latencies with a large number of waiters
being woken at once, but most complete_all() usage sites are either in
testing or initialization code or have only a really small number of
concurrent waiters which for now does not cause a latency problem. Keep it
simple for now.
The fixup of the warning check in the USB gadget driver is just a straight
forward conversion of the lockless waiter check from one waitqueue type to
the other.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200321113242.317954042@linutronix.de
Thomas Gleixner [Sat, 21 Mar 2020 11:25:59 +0000 (12:25 +0100)]
sched/swait: Prepare usage in completions
As a preparation to use simple wait queues for completions:
- Provide swake_up_all_locked() to support complete_all()
- Make __prepare_to_swait() public available
This is done to enable the usage of complete() within truly atomic contexts
on a PREEMPT_RT enabled kernel.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.228481202@linutronix.de
Thomas Gleixner [Sat, 21 Mar 2020 11:25:58 +0000 (12:25 +0100)]
timekeeping: Split jiffies seqlock
seqlock consists of a sequence counter and a spinlock_t which is used to
serialize the writers. spinlock_t is substituted by a "sleeping" spinlock
on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping
code as the writers are executed in hard interrupt and therefore
non-preemptible context even on PREEMPT_RT.
The spinlock in seqlock cannot be unconditionally replaced by a
raw_spinlock_t as many seqlock users have nesting spinlock sections or
other code which is not suitable to run in truly atomic context on RT.
Instead of providing a raw_seqlock API for a single use case, open code the
seqlock for the jiffies use case and implement it with a raw_spinlock_t and
a sequence counter.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.120587764@linutronix.de
Thomas Gleixner [Sat, 21 Mar 2020 11:25:57 +0000 (12:25 +0100)]
Documentation: Add lock ordering and nesting documentation
The kernel provides a variety of locking primitives. The nesting of these
lock types and the implications of them on RT enabled kernels is nowhere
documented.
Add initial documentation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.026561244@linutronix.de
Peter Zijlstra (Intel) [Sat, 21 Mar 2020 11:25:56 +0000 (12:25 +0100)]
powerpc/ps3: Convert half completion to rcuwait
The PS3 notification interrupt and kthread use a hacked up completion to
communicate. Since we're wanting to change the completion implementation and
this is abuse anyway, replace it with a simple rcuwait since there is only ever
the one waiter.
AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg, kthreads
cannot receive signals by default and this one doesn't look different. Use
TASK_IDLE instead.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.930037873@linutronix.de
Peter Zijlstra (Intel) [Sat, 21 Mar 2020 11:25:55 +0000 (12:25 +0100)]
rcuwait: Add @state argument to rcuwait_wait_event()
Extend rcuwait_wait_event() with a state variable so that it is not
restricted to UNINTERRUPTIBLE waits.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.824030968@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:25:54 +0000 (12:25 +0100)]
microblaze: Remove mm.h from asm/uaccess.h
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/microblaze/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
once rcuwait.h includes linux/sched/signal.h.
Remove the linux/mm.h include.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.719022171@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:25:53 +0000 (12:25 +0100)]
ia64: Remove mm.h from asm/uaccess.h
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/ia64/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
once rcuwait.h includes linux/sched/signal.h.
Remove the linux/mm.h include.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.624070289@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:25:52 +0000 (12:25 +0100)]
hexagon: Remove mm.h from asm/uaccess.h
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/hexagon/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
once rcuwait.h includes linux/sched/signal.h.
Remove the linux/mm.h include.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.531525286@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:25:51 +0000 (12:25 +0100)]
csky: Remove mm.h from asm/uaccess.h
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/csky/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
once rcuwait.h includes linux/sched/signal.h.
Remove the linux/mm.h include.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.434999165@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:25:50 +0000 (12:25 +0100)]
nds32: Remove mm.h from asm/uaccess.h
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
| CC kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
| from include/linux/mm.h:567,
| from arch/nds32/include/asm/uaccess.h:,
| from include/linux/uaccess.h:11,
| from include/linux/sched/task.h:11,
| from include/linux/sched/signal.h:9,
| from include/linux/rcuwait.h:6,
| from include/linux/percpu-rwsem.h:8,
| from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
| 1422 | struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
once rcuwait.h includes linux/sched/signal.h.
Remove the linux/mm.h include.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.339289758@linutronix.de
Peter Zijlstra [Sat, 21 Mar 2020 11:25:49 +0000 (12:25 +0100)]
acpi: Remove header dependency
In order to avoid future header hell, remove the inclusion of
proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a
struct.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lkml.kernel.org/r/20200321113241.246190285@linutronix.de
Thomas Gleixner [Sat, 21 Mar 2020 11:25:48 +0000 (12:25 +0100)]
orinoco_usb: Use the regular completion interfaces
The completion usage in this driver is interesting:
- it uses a magic complete function which according to the comment was
implemented by invoking complete() four times in a row because
complete_all() was not exported at that time.
- it uses an open coded wait/poll which checks completion:done. Only one wait
side (device removal) uses the regular wait_for_completion() interface.
The rationale behind this is to prevent that wait_for_completion() consumes
completion::done which would prevent that all waiters are woken. This is not
necessary with complete_all() as that sets completion::done to UINT_MAX which
is left unmodified by the woken waiters.
Replace the magic complete function with complete_all() and convert the
open coded wait/poll to regular completion interfaces.
This changes the wait to exclusive wait mode. But that does not make any
difference because the wakers use complete_all() which ignores the
exclusive mode.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200321113241.150783464@linutronix.de
Thomas Gleixner [Sat, 21 Mar 2020 11:25:47 +0000 (12:25 +0100)]
usb: gadget: Use completion interface instead of open coding it
ep_io() uses a completion on stack and open codes the waiting with:
wait_event_interruptible (done.wait, done.done);
and
wait_event (done.wait, done.done);
This waits in non-exclusive mode for complete(), but there is no reason to
do so because the completion can only be waited for by the task itself and
complete() wakes exactly one exlusive waiter.
Replace the open coded implementation with the corresponding
wait_for_completion*() functions.
No functional change.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200321113241.043380271@linutronix.de
Sebastian Andrzej Siewior [Sat, 21 Mar 2020 11:25:46 +0000 (12:25 +0100)]
pci/switchtec: Replace completion wait queue usage for poll
The poll callback is using the completion wait queue and sticks it into
poll_wait() to wake up pollers after a command has completed.
This works to some extent, but cannot provide EPOLLEXCLUSIVE support
because the waker side uses complete_all() which unconditionally wakes up
all waiters. complete_all() is required because completions internally use
exclusive wait and complete() only wakes up one waiter by default.
This mixes conceptually different mechanisms and relies on internal
implementation details of completions, which in turn puts contraints on
changing the internal implementation of completions.
Replace it with a regular wait queue and store the state in struct
switchtec_user.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113240.936097534@linutronix.de
Logan Gunthorpe [Sat, 21 Mar 2020 11:25:45 +0000 (12:25 +0100)]
PCI/switchtec: Fix init_completion race condition with poll_wait()
The call to init_completion() in mrpc_queue_cmd() can theoretically
race with the call to poll_wait() in switchtec_dev_poll().
poll() write()
switchtec_dev_poll() switchtec_dev_write()
poll_wait(&s->comp.wait); mrpc_queue_cmd()
init_completion(&s->comp)
init_waitqueue_head(&s->comp.wait)
To my knowledge, no one has hit this bug.
Fix this by using reinit_completion() instead of init_completion() in
mrpc_queue_cmd().
Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lkml.kernel.org/r/20200313183608.2646-1-logang@deltatee.com
Peter Zijlstra [Thu, 20 Feb 2020 08:45:02 +0000 (09:45 +0100)]
lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions
nmi_enter() does lockdep_off() and hence lockdep ignores everything.
And NMI context makes it impossible to do full IN-NMI tracking like we
do IN-HARDIRQ, that could result in graph_lock recursion.
However, since look_up_lock_class() is lockless, we can find the class
of a lock that has prior use and detect IN-NMI after USED, just not
USED after IN-NMI.
NOTE: By shifting the lockdep_off() recursion count to bit-16, we can
easily differentiate between actual recursion and off.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lkml.kernel.org/r/20200221134215.090538203@infradead.org
Peter Zijlstra [Fri, 13 Mar 2020 10:09:49 +0000 (11:09 +0100)]
locking/lockdep: Rework lockdep_lock
A few sites want to assert we own the graph_lock/lockdep_lock, provide
a more conventional lock interface for it with a number of trivial
debug checks.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313102107.GX12561@hirez.programming.kicks-ass.net
Peter Zijlstra [Fri, 13 Mar 2020 08:56:38 +0000 (09:56 +0100)]
locking/lockdep: Fix bad recursion pattern
There were two patterns for lockdep_recursion:
Pattern-A:
if (current->lockdep_recursion)
return
current->lockdep_recursion = 1;
/* do stuff */
current->lockdep_recursion = 0;
Pattern-B:
current->lockdep_recursion++;
/* do stuff */
current->lockdep_recursion--;
But a third pattern has emerged:
Pattern-C:
current->lockdep_recursion = 1;
/* do stuff */
current->lockdep_recursion = 0;
And while this isn't broken per-se, it is highly dangerous because it
doesn't nest properly.
Get rid of all Pattern-C instances and shore up Pattern-A with a
warning.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313093325.GW12561@hirez.programming.kicks-ass.net
Boqun Feng [Thu, 12 Mar 2020 15:12:55 +0000 (23:12 +0800)]
locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
Qian Cai reported a bug when PROVE_RCU_LIST=y, and read on /proc/lockdep
triggered a warning:
[ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
...
[ ] Call Trace:
[ ] lock_is_held_type+0x5d/0x150
[ ] ? rcu_lockdep_current_cpu_online+0x64/0x80
[ ] rcu_read_lock_any_held+0xac/0x100
[ ] ? rcu_read_lock_held+0xc0/0xc0
[ ] ? __slab_free+0x421/0x540
[ ] ? kasan_kmalloc+0x9/0x10
[ ] ? __kmalloc_node+0x1d7/0x320
[ ] ? kvmalloc_node+0x6f/0x80
[ ] __bfs+0x28a/0x3c0
[ ] ? class_equal+0x30/0x30
[ ] lockdep_count_forward_deps+0x11a/0x1a0
The warning got triggered because lockdep_count_forward_deps() call
__bfs() without current->lockdep_recursion being set, as a result
a lockdep internal function (__bfs()) is checked by lockdep, which is
unexpected, and the inconsistency between the irq-off state and the
state traced by lockdep caused the warning.
Apart from this warning, lockdep internal functions like __bfs() should
always be protected by current->lockdep_recursion to avoid potential
deadlocks and data inconsistency, therefore add the
current->lockdep_recursion on-and-off section to protect __bfs() in both
lockdep_count_forward_deps() and lockdep_count_backward_deps()
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
Will Deacon [Thu, 13 Feb 2020 09:39:27 +0000 (09:39 +0000)]
asm-generic/bitops: Update stale comment
The comment in 'asm-generic/bitops.h' states that you should "recode
these in the native assembly language, if at all possible". This is
pretty crappy advice now that the generic implementation is defined in
terms of atomic_long_t rather than a spinlock, so update the comment and
hopefully save future architecture maintainers a bit of work.
Reported-by: Stefan Asserhall <stefana@xilinx.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200213093927.1836-1-will@kernel.org
Peter Zijlstra [Wed, 4 Mar 2020 12:24:24 +0000 (13:24 +0100)]
futex: Remove {get,drop}_futex_key_refs()
Now that {get,drop}_futex_key_refs() have become a glorified NOP,
remove them entirely.
The only thing get_futex_key_refs() is still doing is an smp_mb(), and
now that we don't need to (ab)use existing atomic ops to obtain them,
we can place it explicitly where we need it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra [Wed, 4 Mar 2020 12:02:41 +0000 (13:02 +0100)]
futex: Remove pointless mmgrap() + mmdrop()
We always set 'key->private.mm' to 'current->mm', getting an extra
reference on 'current->mm' is quite pointless, because as long as the
task is blocked it isn't going to go away.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra [Fri, 6 Mar 2020 10:06:17 +0000 (11:06 +0100)]
Merge branch 'locking/urgent'
Peter Zijlstra [Wed, 4 Mar 2020 10:28:31 +0000 (11:28 +0100)]
futex: Fix inode life-time issue
As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.
This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Linus Torvalds [Sun, 1 Mar 2020 22:38:46 +0000 (16:38 -0600)]
Linux 5.6-rc4
Linus Torvalds [Sun, 1 Mar 2020 22:35:08 +0000 (16:35 -0600)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Two more bug fixes (including a regression) for 5.6"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
jbd2: fix data races at struct journal_head
Linus Torvalds [Sun, 1 Mar 2020 21:16:35 +0000 (15:16 -0600)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"More bugfixes, including a few remaining "make W=1" issues such as too
large frame sizes on some configurations.
On the ARM side, the compiler was messing up shadow stacks between EL1
and EL2 code, which is easily fixed with __always_inline"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: check descriptor table exits on instruction emulation
kvm: x86: Limit the number of "kvm: disabled by bios" messages
KVM: x86: avoid useless copy of cpufreq policy
KVM: allow disabling -Werror
KVM: x86: allow compiling as non-module with W=1
KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
KVM: Introduce pv check helpers
KVM: let declaration of kvm_get_running_vcpus match implementation
KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
arm64: Ask the compiler to __always_inline functions used by KVM at HYP
KVM: arm64: Define our own swab32() to avoid a uapi static inline
KVM: arm64: Ask the compiler to __always_inline functions used at HYP
kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
KVM: arm/arm64: Fix up includes for trace.h
Oliver Upton [Sat, 29 Feb 2020 19:30:14 +0000 (11:30 -0800)]
KVM: VMX: check descriptor table exits on instruction emulation
KVM emulates UMIP on hardware that doesn't support it by setting the
'descriptor table exiting' VM-execution control and performing
instruction emulation. When running nested, this emulation is broken as
KVM refuses to emulate L2 instructions by default.
Correct this regression by allowing the emulation of descriptor table
instructions if L1 hasn't requested 'descriptor table exiting'.
Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode")
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Sun, 1 Mar 2020 01:16:46 +0000 (19:16 -0600)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has three driver bugfixes for you. We agreed on the Mac regression
to go in via I2C"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
macintosh: therm_windtunnel: fix regression when instantiating devices
i2c: altera: Fix potential integer overflow
i2c: jz4780: silence log flood on txabrt
Dan Carpenter [Fri, 28 Feb 2020 09:22:56 +0000 (12:22 +0300)]
ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
If sbi->s_flex_groups_allocated is zero and the first allocation fails
then this code will crash. The problem is that "i--" will set "i" to
-1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX
is more than zero, the condition is true so we call kvfree(new_groups[-1]).
The loop will carry on freeing invalid memory until it crashes.
Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access")
Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Wolfram Sang [Tue, 25 Feb 2020 14:12:29 +0000 (15:12 +0100)]
macintosh: therm_windtunnel: fix regression when instantiating devices
Removing attach_adapter from this driver caused a regression for at
least some machines. Those machines had the sensors described in their
DT, too, so they didn't need manual creation of the sensor devices. The
old code worked, though, because manual creation came first. Creation of
DT devices then failed later and caused error logs, but the sensors
worked nonetheless because of the manually created devices.
When removing attach_adaper, manual creation now comes later and loses
the race. The sensor devices were already registered via DT, yet with
another binding, so the driver could not be bound to it.
This fix refactors the code to remove the race and only manually creates
devices if there are no DT nodes present. Also, the DT binding is updated
to match both, the DT and manually created devices. Because we don't
know which device creation will be used at runtime, the code to start
the kthread is moved to do_probe() which will be called by both methods.
Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Tested-by: Erhard Furtner <erhard_f@mailbox.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org # v4.19+
Qian Cai [Sat, 22 Feb 2020 04:31:11 +0000 (23:31 -0500)]
jbd2: fix data races at struct journal_head
journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,
LTP: starting fsync04
/dev/zero: Can't open blockdev
EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
__jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
__jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
(inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
kjournald2+0x13b/0x450 [jbd2]
kthread+0x1cd/0x1f0
ret_from_fork+0x27/0x50
read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
jbd2_write_access_granted+0x1b2/0x250 [jbd2]
jbd2_write_access_granted at fs/jbd2/transaction.c:1155
jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
__ext4_journal_get_write_access+0x50/0x90 [ext4]
ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
ext4_mb_new_blocks+0x54f/0xca0 [ext4]
ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
ext4_map_blocks+0x3b4/0x950 [ext4]
_ext4_get_block+0xfc/0x270 [ext4]
ext4_get_block+0x3b/0x50 [ext4]
__block_write_begin_int+0x22e/0xae0
__block_write_begin+0x39/0x50
ext4_write_begin+0x388/0xb50 [ext4]
generic_perform_write+0x15d/0x290
ext4_buffered_write_iter+0x11f/0x210 [ext4]
ext4_file_write_iter+0xce/0x9e0 [ext4]
new_sync_write+0x29c/0x3b0
__vfs_write+0x92/0xa0
vfs_write+0x103/0x260
ksys_write+0x9d/0x130
__x64_sys_write+0x4c/0x60
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe
5 locks held by fsync04/25724:
#0:
ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
#1:
ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
#2:
ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
#3:
ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
#4:
ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
irq event stamp:
1407125
hardirqs last enabled at (
1407125): [<
ffffffff980da9b7>] __find_get_block+0x107/0x790
hardirqs last disabled at (
1407124): [<
ffffffff980da8f9>] __find_get_block+0x49/0x790
softirqs last enabled at (
1405528): [<
ffffffff98a0034c>] __do_softirq+0x34c/0x57c
softirqs last disabled at (
1405521): [<
ffffffff97cc67a2>] irq_exit+0xa2/0xc0
Reported by Kernel Concurrency Sanitizer on:
CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-
20200221+ #7
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 29 Feb 2020 15:58:47 +0000 (09:58 -0600)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes.
Three are in drivers for fairly obvious bugs. The fourth is a set of
regressions introduced by the compat_ioctl changes because some of the
compat updates wrongly replaced .ioctl instead of .compat_ioctl"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places
scsi: zfcp: fix wrong data and display format of SFP+ temperature
scsi: sd_sbc: Fix sd_zbc_report_zones()
scsi: libfc: free response frame from GPN_ID
Linus Torvalds [Fri, 28 Feb 2020 19:51:53 +0000 (11:51 -0800)]
Merge tag 'pci-v5.6-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix build issue on 32-bit ARM with old compilers (Marek Szyprowski)
- Update MAINTAINERS for recent Cadence driver file move (Lukas
Bulwahn)
* tag 'pci-v5.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Correct Cadence PCI driver path
PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers
Linus Torvalds [Fri, 28 Feb 2020 19:43:30 +0000 (11:43 -0800)]
Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Passthrough insertion fix (Ming)
- Kill off some unused arguments (John)
- blktrace RCU fix (Jan)
- Dead fields removal for null_blk (Dongli)
- NVMe polled IO fix (Bijan)
* tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block:
nvme-pci: Hold cq_poll_lock while completing CQEs
blk-mq: Remove some unused function arguments
null_blk: remove unused fields in 'nullb_cmd'
blktrace: Protect q->blk_trace with RCU
blk-mq: insert passthrough request into hctx->dispatch directly
Linus Torvalds [Fri, 28 Feb 2020 19:39:14 +0000 (11:39 -0800)]
Merge tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Fix for a race with IOPOLL used with SQPOLL (Xiaoguang)
- Only show ->fdinfo if procfs is enabled (Tobias)
- Fix for a chain with multiple personalities in the SQEs
- Fix for a missing free of personality idr on exit
- Removal of the spin-for-work optimization
- Fix for next work lookup on request completion
- Fix for non-vec read/write result progation in case of links
- Fix for a fileset references on switch
- Fix for a recvmsg/sendmsg 32-bit compatability mode
* tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block:
io_uring: fix 32-bit compatability with sendmsg/recvmsg
io_uring: define and set show_fdinfo only if procfs is enabled
io_uring: drop file set ref put/get on switch
io_uring: import_single_range() returns 0/-ERROR
io_uring: pick up link work on submit reference drop
io-wq: ensure work->task_pid is cleared on init
io-wq: remove spin-for-work optimization
io_uring: fix poll_list race for SETUP_IOPOLL|SETUP_SQPOLL
io_uring: fix personality idr leak
io_uring: handle multiple personalities in link chains
Jens Axboe [Fri, 28 Feb 2020 17:02:36 +0000 (10:02 -0700)]
Merge branch 'nvme-5.6-rc4' of git://git.infradead.org/nvme into block-5.6
Pull NVMe fix from Keith.
* 'nvme-5.6-rc4' of git://git.infradead.org/nvme:
nvme-pci: Hold cq_poll_lock while completing CQEs
Linus Torvalds [Fri, 28 Feb 2020 17:02:18 +0000 (09:02 -0800)]
Merge tag 'acpi-5.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Fix a couple of configuration issues in the ACPI watchdog (WDAT)
driver (Mika Westerberg) and make it possible to disable that driver
at boot time in case it still does not work as expected (Jean
Delvare)"
* tag 'acpi-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: watchdog: Set default timeout in probe
ACPI: watchdog: Fix gas->access_width usage
ACPICA: Introduce ACPI_ACCESS_BYTE_WIDTH() macro
ACPI: watchdog: Allow disabling WDAT at boot
Linus Torvalds [Fri, 28 Feb 2020 16:49:52 +0000 (08:49 -0800)]
Merge tag 'pm-5.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix a recent cpufreq initialization regression (Rafael Wysocki),
revert a devfreq commit that made incompatible changes and broke user
land on some systems (Orson Zhai), drop a stale reference to a
document that has gone away recently (Jonathan Neuschäfer), and fix a
typo in a hibernation code comment (Alexandre Belloni)"
* tag 'pm-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Fix policy initialization for internal governor drivers
Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
PM / hibernate: fix typo "reserverd_size" -> "reserved_size"
Documentation: power: Drop reference to interface.rst
Linus Torvalds [Fri, 28 Feb 2020 16:34:47 +0000 (08:34 -0800)]
Merge tag 'zonefs-5.6-rc4' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fixes from Damien Le Moal:
"Two fixes in here:
- Revert the initial decision to silently ignore IOCB_NOWAIT for
asynchronous direct IOs to sequential zone files. Instead, return
an error to the user to signal that the feature is not supported
(from Christoph)
- A fix to zonefs Kconfig to select FS_IOMAP to avoid build failures
if no other file system already selected this option (from
Johannes)"
* tag 'zonefs-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: select FS_IOMAP
zonefs: fix IOCB_NOWAIT handling
Paolo Bonzini [Fri, 28 Feb 2020 10:46:59 +0000 (11:46 +0100)]
Merge tag 'kvmarm-fixes-5.6-1' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm fixes for 5.6, take #1
- Fix compilation on 32bit
- Move VHE guest entry/exit into the VHE-specific entry code
- Make sure all functions called by the non-VHE HYP code is tagged as __always_inline
Erwan Velu [Thu, 27 Feb 2020 18:00:46 +0000 (19:00 +0100)]
kvm: x86: Limit the number of "kvm: disabled by bios" messages
In older version of systemd(219), at boot time, udevadm is called with :
/usr/bin/udevadm trigger --type=devices --action=add"
This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
leading to the "kvm: disabled by bios" message in case of your Bios disabled
the virtualization extensions.
On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.
This patch offers to ratelimit this message to avoid any userspace program triggering
this uevent printing this message too often.
This patch is only a workaround but greatly reduce the pollution without
breaking the current behavior of printing a message if some try to instantiate
KVM on a system that doesn't support it.
Note that recent versions of systemd (>239) do not have trigger this behavior.
This patch will be useful at least for some using older systemd with recent Kernels.
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rafael J. Wysocki [Fri, 28 Feb 2020 10:00:50 +0000 (11:00 +0100)]
Merge branches 'pm-sleep' and 'pm-devfreq'
* pm-sleep:
PM / hibernate: fix typo "reserverd_size" -> "reserved_size"
Documentation: power: Drop reference to interface.rst
* pm-devfreq:
Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
Paolo Bonzini [Fri, 28 Feb 2020 09:49:10 +0000 (10:49 +0100)]
KVM: x86: avoid useless copy of cpufreq policy
struct cpufreq_policy is quite big and it is not a good idea
to allocate one on the stack. Just use cpufreq_cpu_get and
cpufreq_cpu_put which is even simpler.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 28 Feb 2020 09:42:31 +0000 (10:42 +0100)]
KVM: allow disabling -Werror
Restrict -Werror to well-tested configurations and allow disabling it
via Kconfig.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Valdis Klētnieks [Fri, 28 Feb 2020 02:49:52 +0000 (21:49 -0500)]
KVM: x86: allow compiling as non-module with W=1
Compile error with CONFIG_KVM_INTEL=y and W=1:
CC arch/x86/kvm/vmx/vmx.o
arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=]
68 | static const struct x86_cpu_id vmx_cpu_id[] = {
| ^~~~~~~~~~
cc1: all warnings being treated as errors
When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a
reference to the structure (or any code at all). This makes W=1 compiles
unhappy.
Wrap both in a #ifdef to avoid the issue.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
[Do the same for CONFIG_KVM_AMD. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Tue, 18 Feb 2020 01:08:24 +0000 (09:08 +0800)]
KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
Nick Desaulniers Reported:
When building with:
$ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000
The following warning is observed:
arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in
function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=]
static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int
vector)
^
Debugging with:
https://github.com/ClangBuiltLinux/frame-larger-than
via:
$ python3 frame_larger_than.py arch/x86/kernel/kvm.o \
kvm_send_ipi_mask_allbutself
points to the stack allocated `struct cpumask newmask` in
`kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is
potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for
the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as
8192, making a single instance of a `struct cpumask` 1024 B.
This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for
both pv tlb and pv ipis..
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Tue, 18 Feb 2020 01:08:23 +0000 (09:08 +0800)]
KVM: Introduce pv check helpers
Introduce some pv check helpers for consistency.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Christian Borntraeger [Fri, 28 Feb 2020 08:49:41 +0000 (09:49 +0100)]
KVM: let declaration of kvm_get_running_vcpus match implementation
Sparse notices that declaration and implementation do not match:
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: warning: incorrect type in return expression (different address spaces)
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: expected struct kvm_vcpu [noderef] <asn:3> **
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: got struct kvm_vcpu *[noderef] <asn:3> *
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 25 Feb 2020 07:54:26 +0000 (08:54 +0100)]
KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
Even if APICv is disabled at startup, the backing page and ir_list need
to be initialized in case they are needed later. The only case in
which this can be skipped is for userspace irqchip, and that must be
done because avic_init_backing_page dereferences vcpu->arch.apic
(which is NULL for userspace irqchip).
Tested-by: rmuncrief@humanavance.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 28 Feb 2020 05:52:18 +0000 (21:52 -0800)]
Merge tag 'drm-fixes-2020-02-28' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Just some fixes for this week: amdgpu, radeon and i915.
The main i915 one is a regression Gen7 (Ivybridge/Haswell), this moves
them back from trying to use the full-ppgtt support to the aliasing
version it used to use due to gpu hangs. Otherwise it's pretty quiet.
amdgpu:
- Drop DRIVER_USE_AGP
- Fix memory leak in GPU reset
- Resume fix for raven
radeon:
- Drop DRIVER_USE_AGP
i915:
- downgrade gen7 back to aliasing-ppgtt to avoid GPU hangs
- shrinker fix
- pmu leak and double free fixes
- gvt user after free and virtual display reset fixes
- randconfig build fix"
* tag 'drm-fixes-2020-02-28' of git://anongit.freedesktop.org/drm/drm:
drm/radeon: Inline drm_get_pci_dev
drm/amdgpu: Drop DRIVER_USE_AGP
drm/i915: Avoid recursing onto active vma from the shrinker
drm/i915/pmu: Avoid using globals for PMU events
drm/i915/pmu: Avoid using globals for CPU hotplug state
drm/i915/gtt: Downgrade gen7 (ivb, byt, hsw) back to aliasing-ppgtt
drm/i915: fix header test with GCOV
amdgpu/gmc_v9: save/restore sdpif regs during S3
drm/amdgpu: fix memory leak during TDR test(v2)
drm/i915/gvt: Fix orphan vgpu dmabuf_objs' lifetime
drm/i915/gvt: Separate display reset from ALL_ENGINES reset
Dave Airlie [Fri, 28 Feb 2020 02:40:41 +0000 (12:40 +1000)]
Merge tag 'drm-intel-fixes-2020-02-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.6-rc4:
- downgrade gen7 back to aliasing-ppgtt to avoid GPU hangs
- shrinker fix
- pmu leak and double free fixes
- gvt user after free and virtual display reset fixes
- randconfig build fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/874kvcsh00.fsf@intel.com
Dave Airlie [Fri, 28 Feb 2020 02:30:18 +0000 (12:30 +1000)]
Merge tag 'amd-drm-fixes-5.6-2020-02-26' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.6-2020-02-26:
amdgpu:
- Drop DRIVER_USE_AGP
- Fix memory leak in GPU reset
- Resume fix for raven
radeon:
- Drop DRIVER_USE_AGP
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227034106.3912-1-alexander.deucher@amd.com
Linus Torvalds [Fri, 28 Feb 2020 00:34:41 +0000 (16:34 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix leak in nl80211 AP start where we leak the ACL memory, from
Johannes Berg.
2) Fix double mutex unlock in mac80211, from Andrei Otcheretianski.
3) Fix RCU stall in ipset, from Jozsef Kadlecsik.
4) Fix devlink locking in devlink_dpipe_table_register, from Madhuparna
Bhowmik.
5) Fix race causing TX hang in ll_temac, from Esben Haabendal.
6) Stale eth hdr pointer in br_dev_xmit(), from Nikolay Aleksandrov.
7) Fix TX hash calculation bounds checking wrt. tc rules, from Amritha
Nambiar.
8) Size netlink responses properly in schedule action code to take into
consideration TCA_ACT_FLAGS. From Jiri Pirko.
9) Fix firmware paths for mscc PHY driver, from Antoine Tenart.
10) Don't register stmmac notifier multiple times, from Aaro Koskinen.
11) Various rmnet bug fixes, from Taehee Yoo.
12) Fix vsock deadlock in vsock transport release, from Stefano
Garzarella.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: dsa: mv88e6xxx: Fix masking of egress port
mlxsw: pci: Wait longer before accessing the device after reset
sfc: fix timestamp reconstruction at 16-bit rollover points
vsock: fix potential deadlock in transport->release()
unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
net: rmnet: fix packet forwarding in rmnet bridge mode
net: rmnet: fix bridge mode bugs
net: rmnet: use upper/lower device infrastructure
net: rmnet: do not allow to change mux id if mux id is duplicated
net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
net: rmnet: fix suspicious RCU usage
net: rmnet: fix NULL pointer dereference in rmnet_changelink()
net: rmnet: fix NULL pointer dereference in rmnet_newlink()
net: phy: marvell: don't interpret PHY status unless resolved
mlx5: register lag notifier for init network namespace only
unix: define and set show_fdinfo only if procfs is enabled
hinic: fix a bug of rss configuration
hinic: fix a bug of setting hw_ioctxt
hinic: fix a irq affinity bug
net/smc: check for valid ib_client_data
...
Lukas Bulwahn [Fri, 21 Feb 2020 18:54:02 +0000 (19:54 +0100)]
MAINTAINERS: Correct Cadence PCI driver path
de80f95ccb9c ("PCI: cadence: Move all files to per-device cadence
directory") moved files of the PCI cadence drivers, but did not update the
MAINTAINERS entry.
Since then, ./scripts/get_maintainer.pl --self-test complains:
warning: no file matches F: drivers/pci/controller/pcie-cadence*
Repair the MAINTAINERS entry.
Link: https://lore.kernel.org/r/20200221185402.4703-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Jens Axboe [Thu, 27 Feb 2020 21:17:49 +0000 (14:17 -0700)]
io_uring: fix 32-bit compatability with sendmsg/recvmsg
We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise
the iovec import for these commands will not do the right thing and fail
the command with -EINVAL.
Found by running the test suite compiled as 32-bit.
Cc: stable@vger.kernel.org
Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()")
Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Andrew Lunn [Thu, 27 Feb 2020 20:20:49 +0000 (21:20 +0100)]
net: dsa: mv88e6xxx: Fix masking of egress port
Add missing ~ to the usage of the mask.
Reported-by: Kevin Benson <Kevin.Benson@zii.aero>
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Fixes: 5c74c54ce6ff ("net: dsa: mv88e6xxx: Split monitor port configuration")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 27 Feb 2020 20:07:53 +0000 (21:07 +0100)]
mlxsw: pci: Wait longer before accessing the device after reset
During initialization the driver issues a reset to the device and waits
for 100ms before checking if the firmware is ready. The waiting is
necessary because before that the device is irresponsive and the first
read can result in a completion timeout.
While 100ms is sufficient for Spectrum-1 and Spectrum-2, it is
insufficient for Spectrum-3.
Fix this by increasing the timeout to 200ms.
Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Maftei (amaftei) [Wed, 26 Feb 2020 17:33:19 +0000 (17:33 +0000)]
sfc: fix timestamp reconstruction at 16-bit rollover points
We can't just use the top bits of the last sync event as they could be
off-by-one every 65,536 seconds, giving an error in reconstruction of
65,536 seconds.
This patch uses the difference in the bottom 16 bits (mod 2^16) to
calculate an offset that needs to be applied to the last sync event to
get to the current time.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Wed, 26 Feb 2020 10:58:18 +0000 (11:58 +0100)]
vsock: fix potential deadlock in transport->release()
Some transports (hyperv, virtio) acquire the sock lock during the
.release() callback.
In the vsock_stream_connect() we call vsock_assign_transport(); if
the socket was previously assigned to another transport, the
vsk->transport->release() is called, but the sock lock is already
held in the vsock_stream_connect(), causing a deadlock reported by
syzbot:
INFO: task syz-executor280:9768 blocked for more than 143 seconds.
Not tainted 5.6.0-rc1-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor280 D27912 9768 9766 0x00000000
Call Trace:
context_switch kernel/sched/core.c:3386 [inline]
__schedule+0x934/0x1f90 kernel/sched/core.c:4082
schedule+0xdc/0x2b0 kernel/sched/core.c:4156
__lock_sock+0x165/0x290 net/core/sock.c:2413
lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
__sys_connect_file+0x161/0x1c0 net/socket.c:1857
__sys_connect+0x174/0x1b0 net/socket.c:1874
__do_sys_connect net/socket.c:1885 [inline]
__se_sys_connect net/socket.c:1882 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1882
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
To avoid this issue, this patch remove the lock acquiring in the
.release() callback of hyperv and virtio transports, and it holds
the lock when we call vsk->transport->release() in the vsock core.
Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:52:35 +0000 (11:52 -0800)]
unix: It's CONFIG_PROC_FS not CONFIG_PROCFS
Fixes: 3a12500ed5dd ("unix: define and set show_fdinfo only if procfs is enabled")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:45:07 +0000 (11:45 -0800)]
Merge branch 'net-rmnet-fix-several-bugs'
Taehee Yoo says:
====================
net: rmnet: fix several bugs
This patchset is to fix several bugs in RMNET module.
1. The first patch fixes NULL-ptr-deref in rmnet_newlink().
When rmnet interface is being created, it uses IFLA_LINK
without checking NULL.
So, if userspace doesn't set IFLA_LINK, panic will occur.
In this patch, checking NULL pointer code is added.
2. The second patch fixes NULL-ptr-deref in rmnet_changelink().
To get real device in rmnet_changelink(), it uses IFLA_LINK.
But, IFLA_LINK should not be used in rmnet_changelink().
3. The third patch fixes suspicious RCU usage in rmnet_get_port().
rmnet_get_port() uses rcu_dereference_rtnl().
But, rmnet_get_port() is used by datapath.
So, rcu_dereference_bh() should be used instead of rcu_dereference_rtnl().
4. The fourth patch fixes suspicious RCU usage in
rmnet_force_unassociate_device().
RCU critical section should not be scheduled.
But, unregister_netdevice_queue() in the rmnet_force_unassociate_device()
would be scheduled.
So, the RCU warning occurs.
In this patch, the rcu_read_lock() in the rmnet_force_unassociate_device()
is removed because it's unnecessary.
5. The fifth patch fixes duplicate MUX ID case.
RMNET MUX ID is unique.
So, rmnet interface isn't allowed to be created, which have
a duplicate MUX ID.
But, only rmnet_newlink() checks this condition, rmnet_changelink()
doesn't check this.
So, duplicate MUX ID case would happen.
6. The sixth patch fixes upper/lower interface relationship problems.
When IFLA_LINK is used, the upper/lower infrastructure should be used.
Because it checks the maximum depth of upper/lower interfaces and it also
checks circular interface relationship, etc.
In this patch, netdev_upper_dev_link() is used.
7. The seventh patch fixes bridge related problems.
a) ->ndo_del_slave() doesn't work.
b) It couldn't detect circular upper/lower interface relationship.
c) It couldn't prevent stack overflow because of too deep depth
of upper/lower interface
d) It doesn't check the number of lower interfaces.
e) Panics because of several reasons.
These problems are actually the same problem.
So, this patch fixes these problems.
8. The eighth patch fixes packet forwarding issue in bridge mode
Packet forwarding is not working in rmnet bridge mode.
Because when a packet is forwarded, skb_push() for an ethernet header
is needed. But it doesn't call skb_push().
So, the ethernet header will be lost.
Change log:
- update commit logs.
- drop two patches in this patchset because of wrong target branch.
- ("net: rmnet: add missing module alias")
- ("net: rmnet: print error message when command fails")
- remove unneessary rcu_read_lock() in the third patch.
- use rcu_dereference_bh() instead of rcu_dereference in third patch.
- do not allow to add a bridge device if rmnet interface is already
bridge mode in the seventh patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:26:15 +0000 (12:26 +0000)]
net: rmnet: fix packet forwarding in rmnet bridge mode
Packet forwarding is not working in rmnet bridge mode.
Because when a packet is forwarded, skb_push() for an ethernet header
is needed. But it doesn't call skb_push().
So, the ethernet header will be lost.
Test commands:
modprobe rmnet
ip netns add nst
ip netns add nst2
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns nst
ip link set veth3 netns nst2
ip link add rmnet0 link veth0 type rmnet mux_id 1
ip link set veth2 master rmnet0
ip link set veth0 up
ip link set veth2 up
ip link set rmnet0 up
ip a a 192.168.100.1/24 dev rmnet0
ip netns exec nst ip link set veth1 up
ip netns exec nst ip a a 192.168.100.2/24 dev veth1
ip netns exec nst2 ip link set veth3 up
ip netns exec nst2 ip a a 192.168.100.3/24 dev veth3
ip netns exec nst2 ping 192.168.100.2
Fixes: 60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:26:02 +0000 (12:26 +0000)]
net: rmnet: fix bridge mode bugs
In order to attach a bridge interface to the rmnet interface,
"master" operation is used.
(e.g. ip link set dummy1 master rmnet0)
But, in the rmnet_add_bridge(), which is a callback of ->ndo_add_slave()
doesn't register lower interface.
So, ->ndo_del_slave() doesn't work.
There are other problems too.
1. It couldn't detect circular upper/lower interface relationship.
2. It couldn't prevent stack overflow because of too deep depth
of upper/lower interface
3. It doesn't check the number of lower interfaces.
4. Panics because of several reasons.
The root problem of these issues is actually the same.
So, in this patch, these all problems will be fixed.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link add dummy1 master rmnet0 type dummy
ip link add dummy2 master rmnet0 type dummy
ip link del rmnet0
ip link del dummy2
ip link del dummy1
Splat looks like:
[ 41.867595][ T1164] general protection fault, probably for non-canonical address 0xdffffc0000000101I
[ 41.869993][ T1164] KASAN: null-ptr-deref in range [0x0000000000000808-0x000000000000080f]
[ 41.872950][ T1164] CPU: 0 PID: 1164 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 41.873915][ T1164] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 41.875161][ T1164] RIP: 0010:rmnet_unregister_bridge.isra.6+0x71/0xf0 [rmnet]
[ 41.876178][ T1164] Code: 48 89 ef 48 89 c6 5b 5d e9 fc fe ff ff e8 f7 f3 ff ff 48 8d b8 08 08 00 00 48 ba 00 7
[ 41.878925][ T1164] RSP: 0018:
ffff8880c4d0f188 EFLAGS:
00010202
[ 41.879774][ T1164] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000101
[ 41.887689][ T1164] RDX:
dffffc0000000000 RSI:
ffffffffb8cf64f0 RDI:
0000000000000808
[ 41.888727][ T1164] RBP:
ffff8880c40e4000 R08:
ffffed101b3c0e3c R09:
0000000000000001
[ 41.889749][ T1164] R10:
0000000000000001 R11:
ffffed101b3c0e3b R12:
1ffff110189a1e3c
[ 41.890783][ T1164] R13:
ffff8880c4d0f200 R14:
ffffffffb8d56160 R15:
ffff8880ccc2c000
[ 41.891794][ T1164] FS:
00007f4300edc0c0(0000) GS:
ffff8880d9c00000(0000) knlGS:
0000000000000000
[ 41.892953][ T1164] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 41.893800][ T1164] CR2:
00007f43003bc8c0 CR3:
00000000ca53e001 CR4:
00000000000606f0
[ 41.894824][ T1164] Call Trace:
[ 41.895274][ T1164] ? rcu_is_watching+0x2c/0x80
[ 41.895895][ T1164] rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[ 41.896687][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 41.897611][ T1164] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 41.898508][ T1164] ? __module_text_address+0x13/0x140
[ 41.899162][ T1164] notifier_call_chain+0x90/0x160
[ 41.899814][ T1164] rollback_registered_many+0x660/0xcf0
[ 41.900544][ T1164] ? netif_set_real_num_tx_queues+0x780/0x780
[ 41.901316][ T1164] ? __lock_acquire+0xdfe/0x3de0
[ 41.901958][ T1164] ? memset+0x1f/0x40
[ 41.902468][ T1164] ? __nla_validate_parse+0x98/0x1ab0
[ 41.903166][ T1164] unregister_netdevice_many.part.133+0x13/0x1b0
[ 41.903988][ T1164] rtnl_delete_link+0xbc/0x100
[ ... ]
Fixes: 60d58f971c10 ("net: qualcomm: rmnet: Implement bridge mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:43 +0000 (12:25 +0000)]
net: rmnet: use upper/lower device infrastructure
netdev_upper_dev_link() is useful to manage lower/upper interfaces.
And this function internally validates looping, maximum depth.
All or most virtual interfaces that could have a real interface
(e.g. macsec, macvlan, ipvlan etc.) use lower/upper infrastructure.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet1 link dummy0 type rmnet mux_id 1
for i in {2..100}
do
let A=$i-1
ip link add rmnet$i link rmnet$A type rmnet mux_id $i
done
ip link del dummy0
The purpose of the test commands is to make stack overflow.
Splat looks like:
[ 52.411438][ T1395] BUG: KASAN: slab-out-of-bounds in find_busiest_group+0x27e/0x2c00
[ 52.413218][ T1395] Write of size 64 at addr
ffff8880c774bde0 by task ip/1395
[ 52.414841][ T1395]
[ 52.430720][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 52.496511][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 52.513597][ T1395] Call Trace:
[ 52.546516][ T1395]
[ 52.558773][ T1395] Allocated by task
3171537984:
[ 52.588290][ T1395] BUG: unable to handle page fault for address:
ffffffffb999e260
[ 52.589311][ T1395] #PF: supervisor read access in kernel mode
[ 52.590529][ T1395] #PF: error_code(0x0000) - not-present page
[ 52.591374][ T1395] PGD
d6818067 P4D
d6818067 PUD
d6819063 PMD 0
[ 52.592288][ T1395] Thread overran stack, or stack corrupted
[ 52.604980][ T1395] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 52.605856][ T1395] CPU: 1 PID: 1395 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 52.611764][ T1395] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 52.621520][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 52.622296][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[ 52.627887][ T1395] RSP: 0018:
ffff8880c774bb60 EFLAGS:
00010006
[ 52.628735][ T1395] RAX:
00000000001f8880 RBX:
ffff8880c774d140 RCX:
0000000000000000
[ 52.631773][ T1395] RDX:
000000000000001d RSI:
ffff8880c774bb68 RDI:
0000000000003ff0
[ 52.649584][ T1395] RBP:
ffffea00031dd200 R08:
ffffed101b43e403 R09:
ffffed101b43e403
[ 52.674857][ T1395] R10:
0000000000000001 R11:
ffffed101b43e402 R12:
ffff8880d900e5c0
[ 52.678257][ T1395] R13:
ffff8880c774c000 R14:
0000000000000000 R15:
dffffc0000000000
[ 52.694541][ T1395] FS:
00007fe867f6e0c0(0000) GS:
ffff8880da000000(0000) knlGS:
0000000000000000
[ 52.764039][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 52.815008][ T1395] CR2:
ffffffffb999e260 CR3:
00000000c26aa005 CR4:
00000000000606e0
[ 52.862312][ T1395] Call Trace:
[ 52.887133][ T1395] Modules linked in: dummy rmnet veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_dex
[ 52.936749][ T1395] CR2:
ffffffffb999e260
[ 52.965695][ T1395] ---[ end trace
7e32ca99482dbb31 ]---
[ 52.966556][ T1395] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 52.971083][ T1395] Code: ff e9 f9 fe ff ff 48 89 df e8 9c 1d 91 ff e9 ca fe ff ff cc cc cc cc cc cc cc 89 f8 0
[ 53.003650][ T1395] RSP: 0018:
ffff8880c774bb60 EFLAGS:
00010006
[ 53.043183][ T1395] RAX:
00000000001f8880 RBX:
ffff8880c774d140 RCX:
0000000000000000
[ 53.076480][ T1395] RDX:
000000000000001d RSI:
ffff8880c774bb68 RDI:
0000000000003ff0
[ 53.093858][ T1395] RBP:
ffffea00031dd200 R08:
ffffed101b43e403 R09:
ffffed101b43e403
[ 53.112795][ T1395] R10:
0000000000000001 R11:
ffffed101b43e402 R12:
ffff8880d900e5c0
[ 53.139837][ T1395] R13:
ffff8880c774c000 R14:
0000000000000000 R15:
dffffc0000000000
[ 53.141500][ T1395] FS:
00007fe867f6e0c0(0000) GS:
ffff8880da000000(0000) knlGS:
0000000000000000
[ 53.143343][ T1395] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 53.152007][ T1395] CR2:
ffffffffb999e260 CR3:
00000000c26aa005 CR4:
00000000000606e0
[ 53.156459][ T1395] Kernel panic - not syncing: Fatal exception
[ 54.213570][ T1395] Shutting down cpus with NMI
[ 54.354112][ T1395] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[ 54.355687][ T1395] Rebooting in 5 seconds..
Fixes: b37f78f234bf ("net: qualcomm: rmnet: Fix crash on real dev unregistration")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:19 +0000 (12:25 +0000)]
net: rmnet: do not allow to change mux id if mux id is duplicated
Basically, duplicate mux id isn't be allowed.
So, the creation of rmnet will be failed if there is duplicate mux id
is existing.
But, changelink routine doesn't check duplicate mux id.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link add rmnet1 link dummy0 type rmnet mux_id 2
ip link set rmnet1 type rmnet mux_id 1
Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:25:05 +0000 (12:25 +0000)]
net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
The notifier_call() of the slave interface removes rmnet interface with
unregister_netdevice_queue().
But, before calling unregister_netdevice_queue(), it acquires
rcu readlock.
In the RCU critical section, sleeping isn't be allowed.
But, unregister_netdevice_queue() internally calls synchronize_net(),
which would sleep.
So, suspicious RCU usage warning occurs.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link set dummy1 master rmnet0
ip link del dummy0
Splat looks like:
[ 79.639245][ T1195] =============================
[ 79.640134][ T1195] WARNING: suspicious RCU usage
[ 79.640852][ T1195] 5.6.0-rc1+ #447 Not tainted
[ 79.641657][ T1195] -----------------------------
[ 79.642472][ T1195] ./include/linux/rcupdate.h:273 Illegal context switch in RCU read-side critical section!
[ 79.644043][ T1195]
[ 79.644043][ T1195] other info that might help us debug this:
[ 79.644043][ T1195]
[ 79.645682][ T1195]
[ 79.645682][ T1195] rcu_scheduler_active = 2, debug_locks = 1
[ 79.646980][ T1195] 2 locks held by ip/1195:
[ 79.647629][ T1195] #0:
ffffffffa3cf64f0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890
[ 79.649312][ T1195] #1:
ffffffffa39256c0 (rcu_read_lock){....}, at: rmnet_config_notify_cb+0xf0/0x590 [rmnet]
[ 79.651717][ T1195]
[ 79.651717][ T1195] stack backtrace:
[ 79.652650][ T1195] CPU: 3 PID: 1195 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 79.653702][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 79.655037][ T1195] Call Trace:
[ 79.655560][ T1195] dump_stack+0x96/0xdb
[ 79.656252][ T1195] ___might_sleep+0x345/0x440
[ 79.656994][ T1195] synchronize_net+0x18/0x30
[ 79.661132][ T1195] netdev_rx_handler_unregister+0x40/0xb0
[ 79.666266][ T1195] rmnet_unregister_real_device+0x42/0xb0 [rmnet]
[ 79.667211][ T1195] rmnet_config_notify_cb+0x1f7/0x590 [rmnet]
[ 79.668121][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 79.669166][ T1195] ? rmnet_unregister_bridge.isra.6+0xf0/0xf0 [rmnet]
[ 79.670286][ T1195] ? __module_text_address+0x13/0x140
[ 79.671139][ T1195] notifier_call_chain+0x90/0x160
[ 79.671973][ T1195] rollback_registered_many+0x660/0xcf0
[ 79.672893][ T1195] ? netif_set_real_num_tx_queues+0x780/0x780
[ 79.675091][ T1195] ? __lock_acquire+0xdfe/0x3de0
[ 79.675825][ T1195] ? memset+0x1f/0x40
[ 79.676367][ T1195] ? __nla_validate_parse+0x98/0x1ab0
[ 79.677290][ T1195] unregister_netdevice_many.part.133+0x13/0x1b0
[ 79.678163][ T1195] rtnl_delete_link+0xbc/0x100
[ ... ]
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:24:45 +0000 (12:24 +0000)]
net: rmnet: fix suspicious RCU usage
rmnet_get_port() internally calls rcu_dereference_rtnl(),
which checks RTNL.
But rmnet_get_port() could be called by packet path.
The packet path is not protected by RTNL.
So, the suspicious RCU usage problem occurs.
Test commands:
modprobe rmnet
ip netns add nst
ip link add veth0 type veth peer name veth1
ip link set veth1 netns nst
ip link add rmnet0 link veth0 type rmnet mux_id 1
ip netns exec nst ip link add rmnet1 link veth1 type rmnet mux_id 1
ip netns exec nst ip link set veth1 up
ip netns exec nst ip link set rmnet1 up
ip netns exec nst ip a a 192.168.100.2/24 dev rmnet1
ip link set veth0 up
ip link set rmnet0 up
ip a a 192.168.100.1/24 dev rmnet0
ping 192.168.100.2
Splat looks like:
[ 146.630958][ T1174] WARNING: suspicious RCU usage
[ 146.631735][ T1174] 5.6.0-rc1+ #447 Not tainted
[ 146.632387][ T1174] -----------------------------
[ 146.633151][ T1174] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:386 suspicious rcu_dereference_check() !
[ 146.634742][ T1174]
[ 146.634742][ T1174] other info that might help us debug this:
[ 146.634742][ T1174]
[ 146.645992][ T1174]
[ 146.645992][ T1174] rcu_scheduler_active = 2, debug_locks = 1
[ 146.646937][ T1174] 5 locks held by ping/1174:
[ 146.647609][ T1174] #0:
ffff8880c31dea70 (sk_lock-AF_INET){+.+.}, at: raw_sendmsg+0xab8/0x2980
[ 146.662463][ T1174] #1:
ffffffff93925660 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x243/0x2150
[ 146.671696][ T1174] #2:
ffffffff93925660 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x213/0x2940
[ 146.673064][ T1174] #3:
ffff8880c19ecd58 (&dev->qdisc_running_key#7){+...}, at: ip_finish_output2+0x714/0x2150
[ 146.690358][ T1174] #4:
ffff8880c5796898 (&dev->qdisc_xmit_lock_key#3){+.-.}, at: sch_direct_xmit+0x1e2/0x1020
[ 146.699875][ T1174]
[ 146.699875][ T1174] stack backtrace:
[ 146.701091][ T1174] CPU: 0 PID: 1174 Comm: ping Not tainted 5.6.0-rc1+ #447
[ 146.705215][ T1174] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 146.706565][ T1174] Call Trace:
[ 146.707102][ T1174] dump_stack+0x96/0xdb
[ 146.708007][ T1174] rmnet_get_port.part.9+0x76/0x80 [rmnet]
[ 146.709233][ T1174] rmnet_egress_handler+0x107/0x420 [rmnet]
[ 146.710492][ T1174] ? sch_direct_xmit+0x1e2/0x1020
[ 146.716193][ T1174] rmnet_vnd_start_xmit+0x3d/0xa0 [rmnet]
[ 146.717012][ T1174] dev_hard_start_xmit+0x160/0x740
[ 146.717854][ T1174] sch_direct_xmit+0x265/0x1020
[ 146.718577][ T1174] ? register_lock_class+0x14d0/0x14d0
[ 146.719429][ T1174] ? dev_watchdog+0xac0/0xac0
[ 146.723738][ T1174] ? __dev_queue_xmit+0x15fd/0x2940
[ 146.724469][ T1174] ? lock_acquire+0x164/0x3b0
[ 146.725172][ T1174] __dev_queue_xmit+0x20c7/0x2940
[ ... ]
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:24:26 +0000 (12:24 +0000)]
net: rmnet: fix NULL pointer dereference in rmnet_changelink()
In the rmnet_changelink(), it uses IFLA_LINK without checking
NULL pointer.
tb[IFLA_LINK] could be NULL pointer.
So, NULL-ptr-deref could occur.
rmnet already has a lower interface (real_dev).
So, after this patch, rmnet_changelink() does not use IFLA_LINK anymore.
Test commands:
modprobe rmnet
ip link add dummy0 type dummy
ip link add rmnet0 link dummy0 type rmnet mux_id 1
ip link set rmnet0 type rmnet mux_id 2
Splat looks like:
[ 90.578726][ T1131] general protection fault, probably for non-canonical address 0xdffffc0000000000I
[ 90.581121][ T1131] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 90.582380][ T1131] CPU: 2 PID: 1131 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 90.584285][ T1131] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 90.587506][ T1131] RIP: 0010:rmnet_changelink+0x5a/0x8a0 [rmnet]
[ 90.588546][ T1131] Code: 83 ec 20 48 c1 ea 03 80 3c 02 00 0f 85 6f 07 00 00 48 8b 5e 28 48 b8 00 00 00 00 00 0
[ 90.591447][ T1131] RSP: 0018:
ffff8880ce78f1b8 EFLAGS:
00010247
[ 90.592329][ T1131] RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
ffff8880ce78f8b0
[ 90.593253][ T1131] RDX:
0000000000000000 RSI:
ffff8880ce78f4a0 RDI:
0000000000000004
[ 90.594058][ T1131] RBP:
ffff8880cf543e00 R08:
0000000000000002 R09:
0000000000000002
[ 90.594859][ T1131] R10:
ffffffffc0586a40 R11:
0000000000000000 R12:
ffff8880ca47c000
[ 90.595690][ T1131] R13:
ffff8880ca47c000 R14:
ffff8880cf545000 R15:
0000000000000000
[ 90.596553][ T1131] FS:
00007f21f6c7e0c0(0000) GS:
ffff8880da400000(0000) knlGS:
0000000000000000
[ 90.597504][ T1131] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 90.599418][ T1131] CR2:
0000556e413db458 CR3:
00000000c917a002 CR4:
00000000000606e0
[ 90.600289][ T1131] Call Trace:
[ 90.600631][ T1131] __rtnl_newlink+0x922/0x1270
[ 90.601194][ T1131] ? lock_downgrade+0x6e0/0x6e0
[ 90.601724][ T1131] ? rtnl_link_unregister+0x220/0x220
[ 90.602309][ T1131] ? lock_acquire+0x164/0x3b0
[ 90.602784][ T1131] ? is_bpf_image_address+0xff/0x1d0
[ 90.603331][ T1131] ? rtnl_newlink+0x4c/0x90
[ 90.603810][ T1131] ? kernel_text_address+0x111/0x140
[ 90.604419][ T1131] ? __kernel_text_address+0xe/0x30
[ 90.604981][ T1131] ? unwind_get_return_address+0x5f/0xa0
[ 90.605616][ T1131] ? create_prof_cpu_mask+0x20/0x20
[ 90.606304][ T1131] ? arch_stack_walk+0x83/0xb0
[ 90.606985][ T1131] ? stack_trace_save+0x82/0xb0
[ 90.607656][ T1131] ? stack_trace_consume_entry+0x160/0x160
[ 90.608503][ T1131] ? deactivate_slab.isra.78+0x2c5/0x800
[ 90.609336][ T1131] ? kasan_unpoison_shadow+0x30/0x40
[ 90.610096][ T1131] ? kmem_cache_alloc_trace+0x135/0x350
[ 90.610889][ T1131] ? rtnl_newlink+0x4c/0x90
[ 90.611512][ T1131] rtnl_newlink+0x65/0x90
[ ... ]
Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Thu, 27 Feb 2020 12:23:52 +0000 (12:23 +0000)]
net: rmnet: fix NULL pointer dereference in rmnet_newlink()
rmnet registers IFLA_LINK interface as a lower interface.
But, IFLA_LINK could be NULL.
In the current code, rmnet doesn't check IFLA_LINK.
So, panic would occur.
Test commands:
modprobe rmnet
ip link add rmnet0 type rmnet mux_id 1
Splat looks like:
[ 36.826109][ T1115] general protection fault, probably for non-canonical address 0xdffffc0000000000I
[ 36.838817][ T1115] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 36.839908][ T1115] CPU: 1 PID: 1115 Comm: ip Not tainted 5.6.0-rc1+ #447
[ 36.840569][ T1115] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 36.841408][ T1115] RIP: 0010:rmnet_newlink+0x54/0x510 [rmnet]
[ 36.841986][ T1115] Code: 83 ec 18 48 c1 e9 03 80 3c 01 00 0f 85 d4 03 00 00 48 8b 6a 28 48 b8 00 00 00 00 00 c
[ 36.843923][ T1115] RSP: 0018:
ffff8880b7e0f1c0 EFLAGS:
00010247
[ 36.844756][ T1115] RAX:
dffffc0000000000 RBX:
ffff8880d14cca00 RCX:
1ffff11016fc1e99
[ 36.845859][ T1115] RDX:
0000000000000000 RSI:
ffff8880c3d04000 RDI:
0000000000000004
[ 36.846961][ T1115] RBP:
0000000000000000 R08:
ffff8880b7e0f8b0 R09:
ffff8880b6ac2d90
[ 36.848020][ T1115] R10:
ffffffffc0589a40 R11:
ffffed1016d585b7 R12:
ffffffff88ceaf80
[ 36.848788][ T1115] R13:
ffff8880c3d04000 R14:
ffff8880b7e0f8b0 R15:
ffff8880c3d04000
[ 36.849546][ T1115] FS:
00007f50ab3360c0(0000) GS:
ffff8880da000000(0000) knlGS:
0000000000000000
[ 36.851784][ T1115] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 36.852422][ T1115] CR2:
000055871afe5ab0 CR3:
00000000ae246001 CR4:
00000000000606e0
[ 36.853181][ T1115] Call Trace:
[ 36.853514][ T1115] __rtnl_newlink+0xbdb/0x1270
[ 36.853967][ T1115] ? lock_downgrade+0x6e0/0x6e0
[ 36.854420][ T1115] ? rtnl_link_unregister+0x220/0x220
[ 36.854936][ T1115] ? lock_acquire+0x164/0x3b0
[ 36.855376][ T1115] ? is_bpf_image_address+0xff/0x1d0
[ 36.855884][ T1115] ? rtnl_newlink+0x4c/0x90
[ 36.856304][ T1115] ? kernel_text_address+0x111/0x140
[ 36.856857][ T1115] ? __kernel_text_address+0xe/0x30
[ 36.857440][ T1115] ? unwind_get_return_address+0x5f/0xa0
[ 36.858063][ T1115] ? create_prof_cpu_mask+0x20/0x20
[ 36.858644][ T1115] ? arch_stack_walk+0x83/0xb0
[ 36.859171][ T1115] ? stack_trace_save+0x82/0xb0
[ 36.859710][ T1115] ? stack_trace_consume_entry+0x160/0x160
[ 36.860357][ T1115] ? deactivate_slab.isra.78+0x2c5/0x800
[ 36.860928][ T1115] ? kasan_unpoison_shadow+0x30/0x40
[ 36.861520][ T1115] ? kmem_cache_alloc_trace+0x135/0x350
[ 36.862125][ T1115] ? rtnl_newlink+0x4c/0x90
[ 36.864073][ T1115] rtnl_newlink+0x65/0x90
[ ... ]
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:26:33 +0000 (11:26 -0800)]
Merge tag 'kbuild-fixes-v5.6-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix missed rebuild of DT schema check
- add some phony targets to PHONY
- fix comments and documents
* tag 'kbuild-fixes-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: get rid of trailing slash from subdir- example
kbuild: add dt_binding_check to PHONY in a correct place
kbuild: add dtbs_check to PHONY
kbuild: remove unneeded semicolon at the end of cmd_dtb_check
kbuild: fix DT binding schema rule to detect command line changes
kbuild: remove wrong documentation about mandatory-y
kbuild: add comment for V=2 mode
Russell King [Thu, 27 Feb 2020 09:44:49 +0000 (09:44 +0000)]
net: phy: marvell: don't interpret PHY status unless resolved
Don't attempt to interpret the PHY specific status register unless
the PHY is indicating that the resolution is valid.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 27 Feb 2020 07:22:10 +0000 (08:22 +0100)]
mlx5: register lag notifier for init network namespace only
The current code causes problems when the unregistering netdevice could
be different then the registering one.
Since the check in mlx5_lag_netdev_event() does not allow any other
network namespace anyway, fix this by registerting the lag notifier
per init network namespace only.
Fixes: d48834f9d4b4 ("mlx5: Use dev_net netdevice notifier registrations")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Aya Levin <ayal@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:13:27 +0000 (11:13 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/hid/hid
Pull HID subsystem fixes from Jiri Kosina:
- syzkaller-reported error handling fixes in various drivers, from
various people
- increase of HID report buffer size to 8K, which is apparently needed
by certain modern devices
- a few new device-ID-specific fixes / quirks
- battery charging status reporting fix in logitech-hidpp, from Filipe
Laíns
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: hid-bigbenff: fix race condition for scheduled work during removal
HID: hid-bigbenff: call hid_hw_stop() in case of error
HID: hid-bigbenff: fix general protection fault caused by double kfree
HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
HID: alps: Fix an error handling path in 'alps_input_configured()'
HID: hiddev: Fix race in in hiddev_disconnect()
HID: core: increase HID report buffer size to 8KiB
HID: core: fix off-by-one memset in hid_report_raw_event()
HID: apple: Add support for recent firmware on Magic Keyboards
HID: ite: Only bind to keyboard USB interface on Acer SW5-012 keyboard dock
HID: logitech-hidpp: BatteryVoltage: only read chargeStatus if extPower is active
Tobias Klauser [Wed, 26 Feb 2020 17:29:53 +0000 (18:29 +0100)]
unix: define and set show_fdinfo only if procfs is enabled
Follow the pattern used with other *_show_fdinfo functions and only
define unix_show_fdinfo and set it in proto_ops if CONFIG_PROCFS
is set.
Fixes: 3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2020 19:08:01 +0000 (11:08 -0800)]
Merge branch 'hinic-BugFixes'
Luo bin says:
====================
hinic: BugFixes
the bug fixed in patch #2 has been present since the first commit.
the bugs fixed in patch #1 and patch #3 have been present since the
following commits:
patch #1:
352f58b0d9f2 ("net-next/hinic: Set Rxq irq to specific cpu for NUMA")
patch #3:
421e9526288b ("hinic: add rss support")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:44 +0000 (06:34 +0000)]
hinic: fix a bug of rss configuration
should use real receive queue number to configure hw rss
indirect table rather than maximal queue number
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:43 +0000 (06:34 +0000)]
hinic: fix a bug of setting hw_ioctxt
a reserved field is used to signify prime physical function index
in the latest firmware version, so we must assign a value to it
correctly
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Thu, 27 Feb 2020 06:34:42 +0000 (06:34 +0000)]
hinic: fix a irq affinity bug
can not use a local variable as an input parameter of
irq_set_affinity_hint
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 27 Feb 2020 19:07:13 +0000 (11:07 -0800)]
Merge tag 'docs-5.6-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A pair of docs-build fixes"
* tag 'docs-5.6-fixes' of git://git.lwn.net/linux:
docs: Fix empty parallelism argument
docs: remove MPX from the x86 toc
Linus Torvalds [Thu, 27 Feb 2020 19:01:22 +0000 (11:01 -0800)]
Merge tag 'audit-pr-
20200226' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fixes from Paul Moore:
"Two fixes for problems found by syzbot:
- Moving audit filter structure fields into a union caused some
problems in the code which populates that filter structure.
We keep the union (that idea is a good one), but we are fixing the
code so that it doesn't needlessly set fields in the union and mess
up the error handling.
- The audit_receive_msg() function wasn't validating user input as
well as it should in all cases, we add the necessary checks"
* tag 'audit-pr-
20200226' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: always check the netlink payload length in audit_receive_msg()
audit: fix error handling in audit_data_to_entry()
Bijan Mottahedeh [Thu, 27 Feb 2020 02:53:43 +0000 (18:53 -0800)]
nvme-pci: Hold cq_poll_lock while completing CQEs
Completions need to consumed in the same order the controller submitted
them, otherwise future completion entries may overwrite ones we haven't
handled yet. Hold the nvme queue's poll lock while completing new CQEs to
prevent another thread from freeing command tags for reuse out-of-order.
Fixes: dabcefab45d3 ("nvme: provide optimized poll function for separate poll queues")
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Marek Szyprowski [Thu, 27 Feb 2020 11:51:46 +0000 (12:51 +0100)]
PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers
Some older compilers have no implementation for the helper for 64-bit
unsigned division/modulo, so linking pcie-brcmstb driver causes the
"undefined reference to `__aeabi_uldivmod'" error.
*rc_bar2_size is always a power of two, because it is calculated as:
"1ULL << fls64(entry->res->end - entry->res->start)", so the modulo
operation in the subsequent check can be replaced by a simple logical
AND with a proper mask.
Link: https://lore.kernel.org/r/20200227115146.24515-1-m.szyprowski@samsung.com
Fixes: c0452137034b ("PCI: brcmstb: Add Broadcom STB PCIe host controller driver")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tobias Klauser [Wed, 26 Feb 2020 17:38:32 +0000 (18:38 +0100)]
io_uring: define and set show_fdinfo only if procfs is enabled
Follow the pattern used with other *_show_fdinfo functions and only
define and use io_uring_show_fdinfo and its helper functions if
CONFIG_PROC_FS is set.
Fixes: 87ce955b24c9 ("io_uring: add ->show_fdinfo() for the io_uring file descriptor")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jens Axboe <axboe@kernel.dk>