Piotr Maziarz [Thu, 7 Nov 2019 12:45:38 +0000 (13:45 +0100)]
tracing: Use seq_buf_hex_dump() to dump buffers
Without this, buffers can be printed with __print_array macro that has
no formatting options and can be hard to read. The other way is to
mimic formatting capability with multiple calls of trace event with one
call per row which gives performance impact and different timestamp in
each row.
Link: http://lkml.kernel.org/r/1573130738-29390-2-git-send-email-piotrx.maziarz@linux.intel.com
Signed-off-by: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Piotr Maziarz [Thu, 7 Nov 2019 12:45:37 +0000 (13:45 +0100)]
seq_buf: Add printing formatted hex dumps
Provided function is an analogue of print_hex_dump().
Implementing this function in seq_buf allows using for multiple
purposes (e.g. for tracing) and therefore prevents from code duplication
in every layer that uses seq_buf.
print_hex_dump() is an essential part of logging data to dmesg. Adding
similar capability for other purposes is beneficial to all users.
Example usage:
seq_buf_hex_dump(seq, "", DUMP_PREFIX_OFFSET, 16, 4, buf,
ARRAY_SIZE(buf), true);
Example output:
00000000:
00000000 ffffff10 ffffff32 ffff3210 ........2....2..
00000010:
ffff3210 83d00437 c0700000 00000000 .2..7.....p.....
00000020:
02010004 0000000f 0000000f 00004002 .............@..
00000030:
00000fff 00000000 ........
Link: http://lkml.kernel.org/r/1573130738-29390-1-git-send-email-piotrx.maziarz@linux.intel.com
Signed-off-by: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 29 Oct 2019 08:31:44 +0000 (17:31 +0900)]
tracing/kprobe: Check whether the non-suffixed symbol is notrace
Check whether the non-suffixed symbol is notrace, since suffixed
symbols are generated by the compilers for optimization. Based on
these suffixed symbols, notrace check might not work because
some of them are just a partial code of the original function.
(e.g. cold-cache (unlikely) code is separated from original
function as FUNCTION.cold.XX)
For example, without this fix,
# echo p device_add.cold.67 > /sys/kernel/debug/tracing/kprobe_events
sh: write error: Invalid argument
# cat /sys/kernel/debug/tracing/error_log
[ 135.491035] trace_kprobe: error: Failed to register probe event
Command: p device_add.cold.67
^
# dmesg | tail -n 1
[ 135.488599] trace_kprobe: Could not probe notrace function device_add.cold.67
With this,
# echo p device_add.cold.66 > /sys/kernel/debug/tracing/kprobe_events
# cat /sys/kernel/debug/kprobes/list
ffffffff81599de9 k device_add.cold.66+0x0 [DISABLED]
Actually, kprobe blacklist already did similar thing,
see within_kprobe_blacklist().
Link: http://lkml.kernel.org/r/157233790394.6706.18243942030937189679.stgit@devnote2
Fixes:
45408c4f9250 ("tracing: kprobes: Prohibit probing on notrace function")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Yuming Han [Thu, 24 Oct 2019 03:34:30 +0000 (11:34 +0800)]
tracing: use kvcalloc for tgid_map array allocation
Fail to allocate memory for tgid_map, because it requires order-6 page.
detail as:
c3 sh: page allocation failure: order:6,
mode:0x140c0c0(GFP_KERNEL), nodemask=(null)
c3 sh cpuset=/ mems_allowed=0
c3 CPU: 3 PID: 5632 Comm: sh Tainted: G W O 4.14.133+ #10
c3 Hardware name: Generic DT based system
c3 Backtrace:
c3 [<
c010bdbc>] (dump_backtrace) from [<
c010c08c>](show_stack+0x18/0x1c)
c3 [<
c010c074>] (show_stack) from [<
c0993c54>](dump_stack+0x84/0xa4)
c3 [<
c0993bd0>] (dump_stack) from [<
c0229858>](warn_alloc+0xc4/0x19c)
c3 [<
c0229798>] (warn_alloc) from [<
c022a6e4>](__alloc_pages_nodemask+0xd18/0xf28)
c3 [<
c02299cc>] (__alloc_pages_nodemask) from [<
c0248344>](kmalloc_order+0x20/0x38)
c3 [<
c0248324>] (kmalloc_order) from [<
c0248380>](kmalloc_order_trace+0x24/0x108)
c3 [<
c024835c>] (kmalloc_order_trace) from [<
c01e6078>](set_tracer_flag+0xb0/0x158)
c3 [<
c01e5fc8>] (set_tracer_flag) from [<
c01e6404>](trace_options_core_write+0x7c/0xcc)
c3 [<
c01e6388>] (trace_options_core_write) from [<
c0278b1c>](__vfs_write+0x40/0x14c)
c3 [<
c0278adc>] (__vfs_write) from [<
c0278e10>](vfs_write+0xc4/0x198)
c3 [<
c0278d4c>] (vfs_write) from [<
c027906c>](SyS_write+0x6c/0xd0)
c3 [<
c0279000>] (SyS_write) from [<
c01079a0>](ret_fast_syscall+0x0/0x54)
Switch to use kvcalloc to avoid unexpected allocation failures.
Link: http://lkml.kernel.org/r/1571888070-24425-1-git-send-email-chunyan.zhang@unisoc.com
Signed-off-by: Yuming Han <yuming.han@unisoc.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Srivatsa S. Bhat (VMware) [Thu, 10 Oct 2019 18:51:17 +0000 (11:51 -0700)]
tracing/hwlat: Fix a few trivial nits
Update the source file name in the comments, and fix a grammatical
error.
Link: http://lkml.kernel.org/r/157073346821.17189.8946944856026592247.stgit@srivatsa-ubuntu
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Andy Shevchenko [Mon, 7 Oct 2019 13:56:56 +0000 (16:56 +0300)]
tracing: Use generic type for comparator function
Comparator function type, cmp_func_t, is defined in the types.h,
use it in the code.
Link: http://lkml.kernel.org/r/20191007135656.37734-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Andy Shevchenko [Mon, 7 Oct 2019 13:56:55 +0000 (16:56 +0300)]
lib/bsearch: Use generic type for comparator function
Comparator function type, cmp_func_t, is defined in the types.h,
use it in bsearch() and, thus, add more sense to the corresponding
comment in the code.
Link: http://lkml.kernel.org/r/20191007135656.37734-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Andy Shevchenko [Mon, 7 Oct 2019 13:56:54 +0000 (16:56 +0300)]
lib/sort: Move swap, cmp and cmp_r function types for wider use
The function types for swap, cmp and cmp_r functions are already
being in use by modules.
Move them to types.h that everybody in kernel will be able to use
generic types instead of custom ones.
This adds more sense to the comment in bsearch() later on.
Link: http://lkml.kernel.org/r/20191007135656.37734-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Wed, 13 Nov 2019 16:48:39 +0000 (11:48 -0500)]
tracing/selftests: Turn off timeout setting
As the ftrace selftests can run for a long period of time, disable the
timeout that the general selftests have. If a selftest hangs, then it
probably means the machine will hang too.
Link: https://lore.kernel.org/r/alpine.LSU.2.21.1911131604170.18679@pobox.suse.cz
Suggested-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Tue, 15 Oct 2019 13:00:55 +0000 (09:00 -0400)]
fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub
The C compiler is allowing more checks to make sure that function pointers
are assigned to the correct prototype function. Unfortunately, the function
graph tracer uses a special name with its assigned ftrace_graph_return
function pointer that maps to a stub function used by the function tracer
(ftrace_stub). The ftrace_graph_return variable is compared to the
ftrace_stub in some archs to know if the function graph tracer is enabled or
not. This means we can not just simply create a new function stub that
compares it without modifying all the archs.
Instead, have the linker script create a function_graph_stub that maps to
ftrace_stub, and this way we can define the prototype for it to match the
prototype of ftrace_graph_return, and make the compiler checks all happy!
Link: http://lkml.kernel.org/r/20191015090055.789a0aed@gandalf.local.home
Cc: linux-sh@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Divya Indi [Wed, 14 Aug 2019 17:55:25 +0000 (10:55 -0700)]
tracing: Adding NULL checks for trace_array descriptor pointer
As part of commit
f45d1225adb0 ("tracing: Kernel access to Ftrace
instances") we exported certain functions. Here, we are adding some additional
NULL checks to ensure safe usage by users of these APIs.
Link: http://lkml.kernel.org/r/1565805327-579-4-git-send-email-divya.indi@oracle.com
Signed-off-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Divya Indi [Wed, 14 Aug 2019 17:55:24 +0000 (10:55 -0700)]
tracing: Verify if trace array exists before destroying it.
A trace array can be destroyed from userspace or kernel. Verify if the
trace array exists before proceeding to destroy/remove it.
Link: http://lkml.kernel.org/r/1565805327-579-3-git-send-email-divya.indi@oracle.com
Reviewed-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Divya Indi <divya.indi@oracle.com>
[ Removed unneeded braces ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Divya Indi [Wed, 14 Aug 2019 17:55:23 +0000 (10:55 -0700)]
tracing: Declare newly exported APIs in include/linux/trace.h
Declare the newly introduced and exported APIs in the header file -
include/linux/trace.h. Moving previous declarations from
kernel/trace/trace.h to include/linux/trace.h.
Link: http://lkml.kernel.org/r/1565805327-579-2-git-send-email-divya.indi@oracle.com
Signed-off-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Ben Dooks [Tue, 15 Oct 2019 12:10:12 +0000 (13:10 +0100)]
tracing: Make internal ftrace events static
The event_class_ftrace_##call and event_##call do not seem
to be used outside of trace_export.c so make them both static
to avoid a number of sparse warnings:
kernel/trace/trace_entries.h:59:1: warning: symbol 'event_class_ftrace_function' was not declared. Should it be static?
kernel/trace/trace_entries.h:59:1: warning: symbol '__event_function' was not declared. Should it be static?
kernel/trace/trace_entries.h:77:1: warning: symbol 'event_class_ftrace_funcgraph_entry' was not declared. Should it be static?
kernel/trace/trace_entries.h:77:1: warning: symbol '__event_funcgraph_entry' was not declared. Should it be static?
kernel/trace/trace_entries.h:93:1: warning: symbol 'event_class_ftrace_funcgraph_exit' was not declared. Should it be static?
kernel/trace/trace_entries.h:93:1: warning: symbol '__event_funcgraph_exit' was not declared. Should it be static?
kernel/trace/trace_entries.h:129:1: warning: symbol 'event_class_ftrace_context_switch' was not declared. Should it be static?
kernel/trace/trace_entries.h:129:1: warning: symbol '__event_context_switch' was not declared. Should it be static?
kernel/trace/trace_entries.h:149:1: warning: symbol 'event_class_ftrace_wakeup' was not declared. Should it be static?
kernel/trace/trace_entries.h:149:1: warning: symbol '__event_wakeup' was not declared. Should it be static?
kernel/trace/trace_entries.h:171:1: warning: symbol 'event_class_ftrace_kernel_stack' was not declared. Should it be static?
kernel/trace/trace_entries.h:171:1: warning: symbol '__event_kernel_stack' was not declared. Should it be static?
kernel/trace/trace_entries.h:191:1: warning: symbol 'event_class_ftrace_user_stack' was not declared. Should it be static?
kernel/trace/trace_entries.h:191:1: warning: symbol '__event_user_stack' was not declared. Should it be static?
kernel/trace/trace_entries.h:214:1: warning: symbol 'event_class_ftrace_bprint' was not declared. Should it be static?
kernel/trace/trace_entries.h:214:1: warning: symbol '__event_bprint' was not declared. Should it be static?
kernel/trace/trace_entries.h:230:1: warning: symbol 'event_class_ftrace_print' was not declared. Should it be static?
kernel/trace/trace_entries.h:230:1: warning: symbol '__event_print' was not declared. Should it be static?
kernel/trace/trace_entries.h:247:1: warning: symbol 'event_class_ftrace_raw_data' was not declared. Should it be static?
kernel/trace/trace_entries.h:247:1: warning: symbol '__event_raw_data' was not declared. Should it be static?
kernel/trace/trace_entries.h:262:1: warning: symbol 'event_class_ftrace_bputs' was not declared. Should it be static?
kernel/trace/trace_entries.h:262:1: warning: symbol '__event_bputs' was not declared. Should it be static?
kernel/trace/trace_entries.h:277:1: warning: symbol 'event_class_ftrace_mmiotrace_rw' was not declared. Should it be static?
kernel/trace/trace_entries.h:277:1: warning: symbol '__event_mmiotrace_rw' was not declared. Should it be static?
kernel/trace/trace_entries.h:298:1: warning: symbol 'event_class_ftrace_mmiotrace_map' was not declared. Should it be static?
kernel/trace/trace_entries.h:298:1: warning: symbol '__event_mmiotrace_map' was not declared. Should it be static?
kernel/trace/trace_entries.h:322:1: warning: symbol 'event_class_ftrace_branch' was not declared. Should it be static?
kernel/trace/trace_entries.h:322:1: warning: symbol '__event_branch' was not declared. Should it be static?
kernel/trace/trace_entries.h:343:1: warning: symbol 'event_class_ftrace_hwlat' was not declared. Should it be static?
kernel/trace/trace_entries.h:343:1: warning: symbol '__event_hwlat' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20191015121012.18824-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Sebastian Andrzej Siewior [Tue, 15 Oct 2019 19:18:20 +0000 (21:18 +0200)]
tracing: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.
Add additional header output for PREEMPT_RT.
Link: http://lkml.kernel.org/r/20191015191821.11479-34-bigeasy@linutronix.de
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Viktor Rosendahl (BMW) [Tue, 8 Oct 2019 22:08:22 +0000 (00:08 +0200)]
preemptirq_delay_test: Add the burst feature and a sysfs trigger
This burst feature enables the user to generate a burst of
preempt/irqsoff latencies. This makes it possible to test whether we
are able to detect latencies that systematically occur very close to
each other.
The maximum burst size is 10. We also create 10 identical test
functions, so that we get 10 different backtraces; this is useful
when we want to test whether we can detect all the latencies in a
burst. Otherwise, there would be no easy way of differentiating
between which latency in a burst was captured by the tracer.
In addition, there is a sysfs trigger, so that it's not necessary to
reload the module to repeat the test. The trigger will appear as
/sys/kernel/preemptirq_delay_test/trigger in sysfs.
Link: http://lkml.kernel.org/r/20191008220824.7911-3-viktor.rosendahl@gmail.com
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Viktor Rosendahl (BMW) <viktor.rosendahl@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Viktor Rosendahl (BMW) [Tue, 8 Oct 2019 22:08:21 +0000 (00:08 +0200)]
ftrace: Implement fs notification for tracing_max_latency
This patch implements the feature that the tracing_max_latency file,
e.g. /sys/kernel/debug/tracing/tracing_max_latency will receive
notifications through the fsnotify framework when a new latency is
available.
One particularly interesting use of this facility is when enabling
threshold tracing, through /sys/kernel/debug/tracing/tracing_thresh,
together with the preempt/irqsoff tracers. This makes it possible to
implement a user space program that can, with equal probability,
obtain traces of latencies that occur immediately after each other in
spite of the fact that the preempt/irqsoff tracers operate in overwrite
mode.
This facility works with the hwlat, preempt/irqsoff, and wakeup
tracers.
The tracers may call the latency_fsnotify() from places such as
__schedule() or do_idle(); this makes it impossible to call
queue_work() directly without risking a deadlock. The same would
happen with a softirq, kernel thread or tasklet. For this reason we
use the irq_work mechanism to call queue_work().
This patch creates a new workqueue. The reason for doing this is that
I wanted to use the WQ_UNBOUND and WQ_HIGHPRI flags. My thinking was
that WQ_UNBOUND might help with the latency in some important cases.
If we use:
queue_work(system_highpri_wq, &tr->fsnotify_work);
then the work will (almost) always execute on the same CPU but if we are
unlucky that CPU could be too busy while there could be another CPU in
the system that would be able to process the work soon enough.
queue_work_on() could be used to queue the work on another CPU but it
seems difficult to select the right CPU.
Link: http://lkml.kernel.org/r/20191008220824.7911-2-viktor.rosendahl@gmail.com
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Viktor Rosendahl (BMW) <viktor.rosendahl@gmail.com>
[ Added max() to have one compare for max latency ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Tue, 1 Oct 2019 18:38:07 +0000 (14:38 -0400)]
ftrace: Add information on number of page groups allocated
Looking for ways to shrink the size of the dyn_ftrace structure, knowing the
information about how many pages and the number of groups of those pages, is
useful in working out the best ways to save on memory.
This adds one info print on how many groups of pages were used to allocate
the ftrace dyn_ftrace structures, and also shows the number of pages and
groups in the dyn_ftrace_total_info (which is used for debugging).
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Josh Poimboeuf [Fri, 8 Nov 2019 22:51:00 +0000 (16:51 -0600)]
ftrace/x86: Tell objtool to ignore nondeterministic ftrace stack layout
Objtool complains about the new ftrace direct trampoline code:
arch/x86/kernel/ftrace_64.o: warning: objtool: ftrace_regs_caller()+0x190: stack state mismatch: cfa1=7+16 cfa2=7+24
Typically, code has a deterministic stack layout, such that at a given
instruction address, the stack frame size is always the same.
That's not the case for the new ftrace_regs_caller() code after it
adjusts the stack for the direct case. Just plead ignorance and assume
it's always the non-direct path. Note this creates a tiny window for
ORC to get confused.
Link: http://lkml.kernel.org/r/20191108225100.ea3bhsbdf6oerj6g@treble
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 18:12:57 +0000 (13:12 -0500)]
ftrace/x86: Add a counter to test function_graph with direct
As testing for direct calls from the function graph tracer adds a little
overhead (which is a lot when tracing every function), add a counter that
can be used to test if function_graph tracer needs to test for a direct
caller or not.
It would have been nicer if we could use a static branch, but the static
branch logic fails when used within the function graph tracer trampoline.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 18:11:39 +0000 (13:11 -0500)]
ftrace/x86: Add register_ftrace_direct() for custom trampolines
Enable x86 to allow for register_ftrace_direct(), where a custom trampoline
may be called directly from an ftrace mcount/fentry location.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 21:08:12 +0000 (16:08 -0500)]
ftrace/selftests: Update the direct call selftests to test two direct calls
The register_ftrace_direct() takes a different path if there's already a
direct call registered, but this was not tested in the self tests. Now that
there's a second direct caller test module, we can use this to test not only
one direct caller, but two.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 20:27:45 +0000 (15:27 -0500)]
ftrace: Add another example of register_ftrace_direct() use case
Add another module sample that registers a direct trampoline to a function
via register_ftrace_direct(). Having another module that does this allows to
test the use case of multiple direct callers registered, as more than one
direct caller goes into another path, and is needed to perform proper
testing of the register_ftrace_direct() call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 17:19:14 +0000 (12:19 -0500)]
ftrace/selftest: Add tests to test register_ftrace_direct()
Add two test cases that test the new ftrace direct functionality if the
ftrace-direct sample module is available. One test case tests against each
available tracer (function, function_graph, mmiotrace, etc), and the other
test tests against a kprobe at the same location as the direct caller. Both
tests follow the same pattern of testing combinations:
enable test (either the tracer or the kprobe)
load direct function module
unload direct function module
disable test
enable test
load direct function module
disable test
unload direct function module
load direct function module
enable test
disable test
unload direct function module
load direct function module
enable test
unload direct function module
disable test
As most the bugs in development happened with various ways of enabling or
disabling the direct calls with function tracer in one of these
combinations.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 18:12:33 +0000 (13:12 -0500)]
ftrace: Add sample module that uses register_ftrace_direct()
Add a sample module that shows a simple use case for
regsiter_ftrace_direct(), and how to use it.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 18:11:27 +0000 (13:11 -0500)]
ftrace: Add ftrace_find_direct_func()
As function_graph tracer modifies the return address to insert a trampoline
to trace the return of a function, it must be aware of a direct caller, as
when it gets called, the function's return address may not be at on the
stack where it expects. It may have to see if that return address points to
the a direct caller and adjust if it is.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 18:07:06 +0000 (13:07 -0500)]
ftrace: Add register_ftrace_direct()
Add the start of the functionality to allow other trampolines to use the
ftrace mcount/fentry/nop location. This adds two new functions:
register_ftrace_direct() and unregister_ftrace_direct()
Both take two parameters: the first is the instruction address of where the
mcount/fentry/nop exists, and the second is the trampoline to have that
location called.
This will handle cases where ftrace is already used on that same location,
and will make it still work, where the registered direct called trampoline
will get called after all the registered ftrace callers are handled.
Currently, it will not allow for IP_MODIFY functions to be called at the
same locations, which include some kprobes and live kernel patching.
At this point, no architecture supports this. This is only the start of
implementing the framework.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 17:26:46 +0000 (12:26 -0500)]
ftrace: Separate out functionality from ftrace_location_range()
Create a new function called lookup_rec() from the functionality of
ftrace_location_range(). The difference between lookup_rec() is that it
returns the record that it finds, where as ftrace_location_range() returns
only if it found a match or not.
The lookup_rec() is static, and can be used for new functionality where
ftrace needs to find a record of a specific address.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 8 Nov 2019 17:25:46 +0000 (12:25 -0500)]
ftrace: Separate out the copying of a ftrace_hash from __ftrace_hash_move()
Most of the functionality of __ftrace_hash_move() can be reused, but not all
of it. That is, __ftrace_hash_move() is used to simply make a new hash from
an existing one, using the same size as the original. Creating a dup_hash(),
where we can specify a new size will be useful when we want to create a hash
with a default size, or simply copy the old one.
Signed-off-by: Steven Rostedt (VMWare) <rostedt@goodmis.org>
Joe Lawrence [Wed, 16 Oct 2019 11:33:15 +0000 (13:33 +0200)]
selftests/livepatch: Test interaction with ftrace_enabled
Since livepatching depends upon ftrace handlers to implement "patched"
code functionality, verify that the ftrace_enabled sysctl value
interacts with livepatch registration as expected. At the same time,
ensure that ftrace_enabled is set and part of the test environment
configuration that is saved and restored when running the selftests.
Link: http://lkml.kernel.org/r/20191016113316.13415-4-mbenes@suse.cz
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Joe Lawrence [Wed, 16 Oct 2019 11:33:14 +0000 (13:33 +0200)]
selftests/livepatch: Make dynamic debug setup and restore generic
Livepatch selftests currently save the current dynamic debug config and
tweak it for the selftests. The config is restored at the end. Make the
infrastructure generic, so that more variables can be saved and
restored.
Link: http://lkml.kernel.org/r/20191016113316.13415-3-mbenes@suse.cz
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Miroslav Benes [Wed, 16 Oct 2019 11:33:13 +0000 (13:33 +0200)]
ftrace: Introduce PERMANENT ftrace_ops flag
Livepatch uses ftrace for redirection to new patched functions. It means
that if ftrace is disabled, all live patched functions are disabled as
well. Toggling global 'ftrace_enabled' sysctl thus affect it directly.
It is not a problem per se, because only administrator can set sysctl
values, but it still may be surprising.
Introduce PERMANENT ftrace_ops flag to amend this. If the
FTRACE_OPS_FL_PERMANENT is set on any ftrace ops, the tracing cannot be
disabled by disabling ftrace_enabled. Equally, a callback with the flag
set cannot be registered if ftrace_enabled is disabled.
Link: http://lkml.kernel.org/r/20191016113316.13415-2-mbenes@suse.cz
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Sun, 3 Nov 2019 22:07:26 +0000 (14:07 -0800)]
Linux 5.4-rc6
Linus Torvalds [Sun, 3 Nov 2019 16:25:25 +0000 (08:25 -0800)]
Merge tag 'usb-5.4-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"The USB sub-maintainers woke up this past week and sent a bunch of
tiny fixes. Here are a lot of small patches that that resolve a bunch
of reported issues in the USB core, drivers, serial drivers, gadget
drivers, and of course, xhci :)
All of these have been in linux-next with no reported issues"
* tag 'usb-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (31 commits)
usb: dwc3: gadget: fix race when disabling ep with cancelled xfers
usb: cdns3: gadget: Fix g_audio use case when connected to Super-Speed host
usb: cdns3: gadget: reset EP_CLAIMED flag while unloading
USB: serial: whiteheat: fix line-speed endianness
USB: serial: whiteheat: fix potential slab corruption
USB: gadget: Reject endpoints with 0 maxpacket value
UAS: Revert commit
3ae62a42090f ("UAS: fix alignment of scatter/gather segments")
usb-storage: Revert commit
747668dbc061 ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
usbip: Fix free of unallocated memory in vhci tx
usbip: tools: Fix read_usb_vudc_device() error path handling
usb: xhci: fix __le32/__le64 accessors in debugfs code
usb: xhci: fix Immediate Data Transfer endianness
xhci: Fix use-after-free regression in xhci clear hub TT implementation
USB: ldusb: fix control-message timeout
USB: ldusb: use unsigned size format specifiers
USB: ldusb: fix ring-buffer locking
USB: Skip endpoints with 0 maxpacket length
usb: cdns3: gadget: Don't manage pullups
usb: dwc3: remove the call trace of USBx_GFLADJ
usb: gadget: configfs: fix concurrent issue between composite APIs
...
Linus Torvalds [Sat, 2 Nov 2019 21:34:00 +0000 (14:34 -0700)]
Merge tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fix from Steve French:
"A small smb3 memleak fix"
* tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
fix memory leak in large read decrypt offload
Linus Torvalds [Sat, 2 Nov 2019 18:28:59 +0000 (11:28 -0700)]
Merge tag 'hwmon-for-v5.4-rc6' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix read timeout problem in ina3221 driver
- Fix wrong bitmask in nct7904 driver
* tag 'hwmon-for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ina3221) Fix read timeout issue
hwmon: (nct7904) Fix the incorrect value of vsen_mask & tcpu_mask & temp_mode in nct7904_data struct.
Linus Torvalds [Sat, 2 Nov 2019 18:23:09 +0000 (11:23 -0700)]
Merge tag 'pwm/for-5.4-rc6' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm fixes from Thierry Reding:
"It turned out that relying solely on drivers storing all the PWM state
in hardware was a little premature and causes a number of subtle (and
some not so subtle) regressions. Revert the offending patch for now"
* tag 'pwm/for-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
Revert "pwm: Let pwm_get_state() return the last implemented state"
Linus Torvalds [Sat, 2 Nov 2019 18:15:52 +0000 (11:15 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4]
and one core change in sd that fixes an I/O failure on DIF type 3
devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: stop timer in shutdown path
scsi: sd: define variable dif as unsigned int instead of bool
scsi: target: cxgbit: Fix cxgbit_fw4_ack()
scsi: qla2xxx: Fix partial flash write of MBI
scsi: qla2xxx: Initialized mailbox to prevent driver load failure
scsi: lpfc: Honor module parameter lpfc_use_adisc
scsi: ufs-bsg: Wake the device before sending raw upiu commands
scsi: lpfc: Check queue pointer before use
scsi: qla2xxx: fixup incorrect usage of host_byte
Linus Torvalds [Sat, 2 Nov 2019 18:08:19 +0000 (11:08 -0700)]
Merge tag 'powerpc-5.4-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Our recent cleanup of EEH led to an oops on bare metal machines when
the cxl (CAPI) driver creates virtual devices for an attached FPGA
accelerator.
The "secure virtual machine" support we added in v5.4 had a bug if the
kernel was relocated (moved during boot), in those cases the signature
of the kernel text wouldn't verify and the Ultravisor would refuse to
run the VM.
A recent change to disable interrupts before calling
arch_cpu_idle_dead() caused a WARN_ON() in our bare metal CPU offline
code to always trigger.
The KUAP (SMAP) support we added for 32-bit Book3S had a bug if the
address range crossed a segment (256MB) boundary which could lead to
spurious faults.
Thanks to: Christophe Leroy, Frederic Barrat, Michael Anderson,
Nicholas Piggin, Sam Bobroff, Thiago Jung Bauermann"
* tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Fix CPU idle to be called with IRQs disabled
powerpc/prom_init: Undo relocation before entering secure mode
powerpc/powernv/eeh: Fix oops when probing cxl devices
powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries.
Linus Torvalds [Sat, 2 Nov 2019 18:00:26 +0000 (11:00 -0700)]
Merge tag 's390-5.4-6' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix cpu idle time accounting
- Fix stack unwinder case when both pt_regs and sp are specified
- Fix information leak via cmm timeout proc handler
* tag 's390-5.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/idle: fix cpu idle time calculation
s390/unwind: fix mixing regs and sp
s390/cmm: fix information leak in cmm_timeout_handler()
Linus Torvalds [Sat, 2 Nov 2019 00:48:11 +0000 (17:48 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Fix free/alloc races in batmanadv, from Sven Eckelmann.
2) Several leaks and other fixes in kTLS support of mlx5 driver, from
Tariq Toukan.
3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
Høiland-Jørgensen.
4) Add an r8152 device ID, from Kazutoshi Noguchi.
5) Missing include in ipv6's addrconf.c, from Ben Dooks.
6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
easily infer the 32-bit secret otherwise etc.
7) Several netdevice nesting depth fixes from Taehee Yoo.
8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
when doing lockless skb_queue_empty() checks, and accessing
sk_napi_id/sk_incoming_cpu lockless as well.
9) Fix jumbo packet handling in RXRPC, from David Howells.
10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.
11) Fix DMA synchronization in gve driver, from Yangchun Fu.
12) Several bpf offload fixes, from Jakub Kicinski.
13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.
14) Fix ping latency during high traffic rates in hisilicon driver, from
Jiangfent Xiao.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
net: fix installing orphaned programs
net: cls_bpf: fix NULL deref on offload filter removal
selftests: bpf: Skip write only files in debugfs
selftests: net: reuseport_dualstack: fix uninitalized parameter
r8169: fix wrong PHY ID issue with RTL8168dp
net: dsa: bcm_sf2: Fix IMP setup for port different than 8
net: phylink: Fix phylink_dbg() macro
gve: Fixes DMA synchronization.
inet: stop leaking jiffies on the wire
ixgbe: Remove duplicate clear_bit() call
Documentation: networking: device drivers: Remove stray asterisks
e1000: fix memory leaks
i40e: Fix receive buffer starvation for AF_XDP
igb: Fix constant media auto sense switching when no cable is connected
net: ethernet: arc: add the missed clk_disable_unprepare
igb: Enable media autosense for the i350.
igb/igc: Don't warn on fatal read failures when the device is removed
tcp: increase tcp_max_syn_backlog max value
net: increase SOMAXCONN to 4096
netdevsim: Fix use-after-free during device dismantle
...
Linus Torvalds [Sat, 2 Nov 2019 00:37:44 +0000 (17:37 -0700)]
Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"This contains two delegation fixes (with the RCU lock leak fix marked
for stable), and three patches to fix destroying the the sunrpc back
channel.
Stable bugfixes:
- Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
Other fixes:
- The TCP back channel mustn't disappear while requests are
outstanding
- The RDMA back channel mustn't disappear while requests are
outstanding
- Destroy the back channel when we destroy the host transport
- Don't allow a cached open with a revoked delegation"
* tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
NFSv4: Don't allow a cached open with a revoked delegation
SUNRPC: Destroy the back channel when we destroy the host transport
SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
Linus Torvalds [Sat, 2 Nov 2019 00:33:12 +0000 (17:33 -0700)]
Merge tag 'for-linus-
20191101' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Two small nvme fixes, one is a fabrics connection fix, the other one
a cleanup made possible by that fix (Anton, via Keith)
- Fix requeue handling in umb ubd (Anton)
- Fix spin_lock_irq() nesting in blk-iocost (Dan)
- Three small io_uring fixes:
- Install io_uring fd after done with ctx (me)
- Clear ->result before every poll issue (me)
- Fix leak of shadow request on error (Pavel)
* tag 'for-linus-
20191101' of git://git.kernel.dk/linux-block:
iocost: don't nest spin_lock_irq in ioc_weight_write()
io_uring: ensure we clear io_kiocb->result before each issue
um-ubd: Entrust re-queue to the upper layers
nvme-multipath: remove unused groups_only mode in ana log
nvme-multipath: fix possible io hang after ctrl reconnect
io_uring: don't touch ctx in setup after ring fd install
io_uring: Fix leaked shadow_req
Linus Torvalds [Sat, 2 Nov 2019 00:20:53 +0000 (17:20 -0700)]
Merge tag 'riscv/for-v5.4-rc6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"One fix for PCIe users:
- Fix legacy PCI I/O port access emulation
One set of cleanups:
- Resolve most of the warnings generated by sparse across arch/riscv.
No functional changes
And one MAINTAINERS update:
- Update Palmer's E-mail address"
* tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Change to my personal email address
RISC-V: Add PCIe I/O BAR memory mapping
riscv: for C functions called only from assembly, mark with __visible
riscv: fp: add missing __user pointer annotations
riscv: add missing header file includes
riscv: mark some code and data as file-static
riscv: init: merge split string literals in preprocessor directive
riscv: add prototypes for assembly language functions from head.S
Linus Torvalds [Fri, 1 Nov 2019 22:16:25 +0000 (15:16 -0700)]
Merge branch 'parisc-5.4-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"Fix a parisc kernel crash with ftrace functions when compiled without
frame pointers"
* 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix frame pointer in ftrace_regs_caller()
David S. Miller [Fri, 1 Nov 2019 22:16:01 +0000 (15:16 -0700)]
Merge branch 'fix-BPF-offload-related-bugs'
Jakub Kicinski says:
====================
fix BPF offload related bugs
test_offload.py catches some recently added bugs.
First of a bug in test_offload.py itself after recent changes
to netdevsim is fixed.
Second patch fixes a bug in cls_bpf, and last one addresses
a problem with the recently added XDP installation optimization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 1 Nov 2019 03:07:00 +0000 (20:07 -0700)]
net: fix installing orphaned programs
When netdevice with offloaded BPF programs is destroyed
the programs are orphaned and removed from the program
IDA - their IDs get released (the programs may remain
accessible via existing open file descriptors and pinned
files). After IDs are released they are set to 0.
This confuses dev_change_xdp_fd() because it compares
the __dev_xdp_query() result where 0 means no program
with prog->aux->id where 0 means orphaned.
dev_change_xdp_fd() would have incorrectly returned success
even though it had not installed the program.
Since drivers already catch this case via bpf_offload_dev_match()
let them handle this case. The error message drivers produce in
this case ("program loaded for a different device") is in fact
correct as the orphaned program must had to be loaded for a
different device.
Fixes:
c14a9f633d9e ("net: Don't call XDP_SETUP_PROG when nothing is changed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 1 Nov 2019 03:06:59 +0000 (20:06 -0700)]
net: cls_bpf: fix NULL deref on offload filter removal
Commit
401192113730 ("net: sched: refactor block offloads counter
usage") missed the fact that either new prog or old prog may be
NULL.
Fixes:
401192113730 ("net: sched: refactor block offloads counter usage")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 1 Nov 2019 03:06:58 +0000 (20:06 -0700)]
selftests: bpf: Skip write only files in debugfs
DebugFS for netdevsim now contains some "action trigger" files
which are write only. Don't try to capture the contents of those.
Note that we can't use os.access() because the script requires
root.
Fixes:
4418f862d675 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Thu, 31 Oct 2019 23:24:36 +0000 (16:24 -0700)]
selftests: net: reuseport_dualstack: fix uninitalized parameter
This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN)
occasionally due to the uninitialized length parameter.
Initialize it to fix this, and also use int for "test_family" to comply
with the API standard.
Fixes:
d6a61f80b871 ("soreuseport: test mixed v4/v6 sockets")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Craig Gallek <cgallek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 31 Oct 2019 23:10:21 +0000 (00:10 +0100)]
r8169: fix wrong PHY ID issue with RTL8168dp
As reported in [0] at least one RTL8168dp version has problems
establishing a link. This chip version has an integrated RTL8211b PHY,
however the chip seems to report a wrong PHY ID, resulting in a wrong
PHY driver (for Generic Realtek PHY) being loaded.
Work around this issue by adding a hook to r8168dp_2_mdio_read()
for returning the correct PHY ID.
[0] https://bbs.archlinux.org/viewtopic.php?id=246508
Fixes:
242cd9b5866a ("r8169: use phy_resume/phy_suspend")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 31 Oct 2019 22:54:05 +0000 (15:54 -0700)]
net: dsa: bcm_sf2: Fix IMP setup for port different than 8
Since it became possible for the DSA core to use a CPU port different
than 8, our bcm_sf2_imp_setup() function was broken because it assumes
that registers are applicable to port 8. In particular, the port's MAC
is going to stay disabled, so make sure we clear the RX_DIS and TX_DIS
bits if we are not configured for port 8.
Fixes:
9f91484f6fcc ("net: dsa: make "label" property optional for dsa2")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 31 Oct 2019 22:42:26 +0000 (15:42 -0700)]
net: phylink: Fix phylink_dbg() macro
The phylink_dbg() macro does not follow dynamic debug or defined(DEBUG)
and as a result, it spams the kernel log since a PR_DEBUG level is
currently used. Fix it to be defined appropriately whether
CONFIG_DYNAMIC_DEBUG or defined(DEBUG) are set.
Fixes:
17091180b152 ("net: phylink: Add phylink_{printk, err, warn, info, dbg} macros")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yangchun Fu [Fri, 1 Nov 2019 17:09:56 +0000 (10:09 -0700)]
gve: Fixes DMA synchronization.
Synces the DMA buffer properly in order for CPU and device to see
the most up-to-data data.
Signed-off-by: Yangchun Fu <yangchun@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 1 Nov 2019 17:32:19 +0000 (10:32 -0700)]
inet: stop leaking jiffies on the wire
Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.
RFC 6864 made clear that no matter how hard we try,
we can not ensure unicity of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.
Linux uses a per socket inet generator (inet_id), initialized
at connection startup with a XOR of 'jiffies' and other
fields that appear clear on the wire.
Thiemo Nagel pointed that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.
Let's switch to a random starting point, this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thiemo Nagel <tnagel@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Nov 2019 21:50:27 +0000 (14:50 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-11-01
This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.
Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine. So change the read error into a
warn while the device is still present.
Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it. So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.
I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.
Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.
Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.
Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.
v2: Fixed alignment issue in patch 3 of the series based on community
feedback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Pylypiv [Fri, 4 Oct 2019 06:53:57 +0000 (23:53 -0700)]
ixgbe: Remove duplicate clear_bit() call
__IXGBE_RX_BUILD_SKB_ENABLED bit is already cleared.
Signed-off-by: Igor Pylypiv <igor.pylypiv@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jonathan Neuschäfer [Wed, 2 Oct 2019 15:09:55 +0000 (17:09 +0200)]
Documentation: networking: device drivers: Remove stray asterisks
These asterisks were once references to a line that said:
"* Other names and brands may be claimed as the property of others."
But now, they serve no purpose; they can only irritate the reader.
Fixes:
de3edab4276c ("e1000: update README for e1000")
Fixes:
a3fb65680f65 ("e100.txt: Cleanup license info in kernel doc")
Fixes:
da8c01c4502a ("e1000e.txt: Add e1000e documentation")
Fixes:
f12a84a9f650 ("Documentation: fm10k: Add kernel documentation")
Fixes:
b55c52b1938c ("igb.txt: Add igb documentation")
Fixes:
c4e9b56e2442 ("igbvf.txt: Add igbvf Documentation")
Fixes:
d7064f4c192c ("Documentation/networking/: Update Intel wired LAN driver documentation")
Fixes:
c4b8c01112a1 ("ixgbevf.txt: Update ixgbevf documentation")
Fixes:
1e06edcc2f22 ("Documentation: i40e: Prepare documentation for RST conversion")
Fixes:
105bf2fe6b32 ("i40evf: add driver to kernel build system")
Fixes:
1fae869bcf3d ("Documentation: ice: Prepare documentation for RST conversion")
Fixes:
df69ba43217d ("ionic: Add basic framework for IONIC Network device driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Wenwen Wang [Mon, 12 Aug 2019 05:59:21 +0000 (00:59 -0500)]
e1000: fix memory leaks
In e1000_set_ringparam(), 'tx_old' and 'rx_old' are not deallocated if
e1000_up() fails, leading to memory leaks. Refactor the code to fix this
issue.
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Mon, 7 Oct 2019 22:07:24 +0000 (15:07 -0700)]
i40e: Fix receive buffer starvation for AF_XDP
Magnus's fix to resolve a potential receive buffer starvation for AF_XDP
got applied to both the i40e_xsk_umem_enable/disable() functions, when it
should have only been applied to the "enable". So clean up the undesired
code in the disable function.
CC: Magnus Karlsson <magnus.karlsson@intel.com>
Fixes:
1f459bdc2007 ("i40e: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Manfred Rudigier [Thu, 15 Aug 2019 20:55:20 +0000 (13:55 -0700)]
igb: Fix constant media auto sense switching when no cable is connected
At least on the i350 there is an annoying behavior that is maybe also
present on 82580 devices, but was probably not noticed yet as MAS is not
widely used.
If no cable is connected on both fiber/copper ports the media auto sense
code will constantly swap between them as part of the watchdog task and
produce many unnecessary kernel log messages.
The swap code responsible for this behavior (switching to fiber) should
not be executed if the current media type is copper and there is no signal
detected on the fiber port. In this case we can safely wait until the
AUTOSENSE_EN bit is cleared.
Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Fri, 1 Nov 2019 18:49:54 +0000 (11:49 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix two scheduler topology bugs/oversights on Juno r0 2+4 big.LITTLE
systems"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/topology: Allow sched_asym_cpucapacity to be disabled
sched/topology: Don't try to build empty sched domains
Linus Torvalds [Fri, 1 Nov 2019 18:40:47 +0000 (11:40 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel
uncore PMU driver fix and a header typo fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/
perf/x86/uncore: Fix event group support
perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
perf/core: Start rejecting the syscall with attr.__reserved_2 set
Linus Torvalds [Fri, 1 Nov 2019 18:32:50 +0000 (11:32 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Various fixes all over the map: prevent boot crashes on HyperV,
classify UEFI randomness as bootloader randomness, fix EFI boot for
the Raspberry Pi2, fix efi_test permissions, etc"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
x86, efi: Never relocate kernel below lowest acceptable address
efi: libstub/arm: Account for firmware reserved memory at the base of RAM
efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness
efi/tpm: Return -EINVAL when determining tpm final events log size fails
efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only
David S. Miller [Fri, 1 Nov 2019 17:36:46 +0000 (10:36 -0700)]
Merge tag 'wireless-drivers-2019-11-01' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 5.4
Third set of fixes for 5.4. Most of them are for iwlwifi but important
fixes also for rtlwifi and mt76, the overflow fix for rtlwifi being
most important.
iwlwifi
* fix merge damage on earlier patch
* various fixes to device id handling
* fix scan config command handling which caused firmware asserts
rtlwifi
* fix overflow on P2P IE handling
* don't deliver too small frames to mac80211
mt76
* disable PCIE_ASPM
* fix buffer DMA unmap on certain cases
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Fri, 1 Nov 2019 12:17:25 +0000 (20:17 +0800)]
net: ethernet: arc: add the missed clk_disable_unprepare
The remove misses to disable and unprepare priv->macclk like what is done
when probe fails.
Add the missed call in remove.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 1 Nov 2019 17:03:46 +0000 (10:03 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"These are almost exclusively related to CPU errata in CPUs from
Broadcom and Qualcomm where the workarounds were either not being
enabled when they should have been or enabled when they shouldn't have
been.
The only "interesting" fix is ensuring that writeable, shared mappings
are initially mapped as clean since we inadvertently broke the logic
back in v4.14 and then noticed the problem via code inspection the
other day.
The only critical issue we have outstanding is a sporadic NULL
dereference in the scheduler, which doesn't appear to be
arm64-specific and PeterZ is tearing his hair out over it at the
moment.
Summary:
- Enable CPU errata workarounds for Broadcom Brahma-B53
- Enable CPU errata workarounds for Qualcomm Hydra/Kryo CPUs
- Fix initial dirty status of writeable, shared mappings"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core
arm64: Brahma-B53 is SSB and spectre v2 safe
arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core
arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo
arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003
arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default
Linus Torvalds [Fri, 1 Nov 2019 16:54:38 +0000 (09:54 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"generic:
- fix memory leak on failure to create VM
x86:
- fix MMU corner case with AMD nested paging disabled"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
kvm: call kvm_arch_destroy_vm if vm creation fails
kvm: Allocate memslots and buses before calling kvm_arch_init_vm
Linus Torvalds [Fri, 1 Nov 2019 16:41:08 +0000 (09:41 -0700)]
Merge tag 'drm-fixes-2019-11-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This is the regular drm fixes pull request for 5.4-rc6. It's a bit
larger than I'd like but then last week was quieter than usual.
The main fixes are amdgpu, and the two bigger area are navi fixes
which are the newest GPU range so still getting actively fixed up, but
also a bunch of clang stack alignment fixes (as amdgpu uses double in
some places).
Otherwise it's all fairly run of the mill fixes, i915, panfrost,
etnaviv, v3d and radeon, along with a core scheduler fix.
Summary:
amdgpu:
- clang alignment fixes
- Updated golden settings
- navi: gpuvm, sdma and display fixes
- Freesync fix
- Gamma fix for DCN
- DP dongle detection fix
- vega10: Fix for undervolting
radeon:
- reenable kexec fix for ppc
scheduler:
- set an error if hw job failed
i915:
- fix PCH reference clock for HSW/BDW
- TGL display PLL doc fix
panfrost:
- warning fix
- runtime pm fix
- bad pointer dereference fix
v3d:
- memleak fix
etnaviv:
- memory corruption fix
- deadlock fix
- reintroduce lost debug message"
* tag 'drm-fixes-2019-11-01' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/amdgpu: enable -msse2 for GCC 7.1+ users
drm/amdgpu: fix stack alignment ABI mismatch for GCC 7.1+
drm/amdgpu: fix stack alignment ABI mismatch for Clang
drm/radeon: Fix EEH during kexec
drm/amdgpu/gmc10: properly set BANK_SELECT and FRAGMENT_SIZE
drm/amdgpu/powerplay/vega10: allow undervolting in p7
dc.c:use kzalloc without test
drm/amd/display: setting the DIG_MODE to the correct value.
drm/amd/display: Passive DP->HDMI dongle detection fix
drm/amd/display: add 50us buffer as WA for pstate switch in active
drm/amd/display: Allow inverted gamma
drm/amd/display: do not synchronize "drr" displays
drm/amdgpu: If amdgpu_ib_schedule fails return back the error.
drm/sched: Set error to s_fence if HW job submission failed.
drm/amdgpu/gfx10: update gfx golden settings for navi12
drm/amdgpu/gfx10: update gfx golden settings for navi14
drm/amdgpu/gfx10: update gfx golden settings
drm/amd/display: Change Navi14's DWB flag to 1
drm/amdgpu/sdma5: do not execute 0-sized IBs (v2)
drm/amdgpu: Fix SDMA hang when performing VKexample test
...
Linus Torvalds [Fri, 1 Nov 2019 16:30:48 +0000 (09:30 -0700)]
Merge tag 'pm-5.4-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a recently introduced (mostly theoretical) issue that the requests
to confine the maximum CPU frequency coming from the platform firmware
may not be taken into account if multiple CPUs are covered by one
cpufreq policy on a system with ACPI"
* tag 'pm-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Add QoS requests for all CPUs
Linus Torvalds [Fri, 1 Nov 2019 16:21:48 +0000 (09:21 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A number of bug fixes and a regression fix:
- Various issues from static analysis in hfi1, uverbs, hns, and cxgb4
- Fix for deadlock in a case when the new auto RDMA module loading is
used
- Missing _irq notation in a prior -rc patch found by lockdep
- Fix a locking and lifetime issue in siw
- Minor functional bug fixes in cxgb4, mlx5, qedr
- Fix a regression where vlan interfaces no longer worked with RDMA
CM in some cases"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Prevent memory leaks of eq->buf_list
RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case
RDMA/mlx5: Use irq xarray locking for mkey_table
IB/core: Avoid deadlock during netlink message handling
RDMA/nldev: Skip counter if port doesn't match
RDMA/uverbs: Prevent potential underflow
IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields
RDMA/qedr: Fix reported firmware version
RDMA/siw: free siw_base_qp in kref release routine
RDMA/iwcm: move iw_rem_ref() calls out of spinlock
iw_cxgb4: fix ECN check on the passive accept
IB/hfi1: Use a common pad buffer for 9B and 16B packets
IB/hfi1: Avoid excessive retry for TID RDMA READ request
RDMA/mlx5: Clear old rate limit when closing QP
Linus Torvalds [Fri, 1 Nov 2019 16:18:00 +0000 (09:18 -0700)]
Merge tag 'sound-5.4-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A couple of regression fixes and a fix for mutex deadlock at
hog-unplug, as well as other device-specific fixes:
- A commit to avoid the spurious unsolicited interrupt on HD-audio
bus caused a stall at shutdown, so it's reverted now.
- The recent support of AMD/Nvidia audio component binding caused a
mutex deadlock; fixed by splitting to another mutex
- The device hot-unplug and the ALSA timer close combo may lead to
another mutex deadlock; fixed by moving put_device() calls
- Usual device-specific small quirks for HD- and USB-audio drivers
- An old error check fix in FireWire driver"
* tag 'sound-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: timer: Fix mutex deadlock at releasing card
ALSA: hda - Fix mutex deadlock in HDMI codec driver
Revert "ALSA: hda: Flush interrupts on disabling"
ALSA: bebob: Fix prototype of helper function to return negative value
ALSA: hda/realtek - Fix 2 front mics of codec 0x623
ALSA: hda/realtek - Add support for ALC623
ALSA: usb-audio: Add DSD support for Gustard U16/X26 USB Interface
Trond Myklebust [Thu, 31 Oct 2019 22:40:33 +0000 (18:40 -0400)]
NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
A typo in nfs4_refresh_delegation_stateid() means we're leaking an
RCU lock, and always returning a value of 'false'. As the function
description states, we were always supposed to return 'true' if a
matching delegation was found.
Fixes:
12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Thu, 31 Oct 2019 22:40:32 +0000 (18:40 -0400)]
NFSv4: Don't allow a cached open with a revoked delegation
If the delegation is marked as being revoked, we must not use it
for cached opens.
Fixes:
869f9dfa4d6d ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Florian Fainelli [Thu, 31 Oct 2019 21:47:25 +0000 (14:47 -0700)]
arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core
The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_843419 so this commit enables the workaround to be applied
when executing on that core.
Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_843419 into an erratum list and use
cpucap_multi_entry_cap_matches to match our entries.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Florian Fainelli [Thu, 31 Oct 2019 21:47:24 +0000 (14:47 -0700)]
arm64: Brahma-B53 is SSB and spectre v2 safe
Add the Brahma-B53 CPU (all versions) to the whitelists of CPUs for the
SSB and spectre v2 mitigations.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Doug Berger [Thu, 31 Oct 2019 21:47:23 +0000 (14:47 -0700)]
arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core
The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_845719 so this commit enables the workaround to be applied
when executing on that core.
Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_845719 into an erratum list.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Dave Airlie [Fri, 1 Nov 2019 01:27:39 +0000 (11:27 +1000)]
Merge tag 'drm-fixes-5.4-2019-10-30' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
drm-fixes-5.4-2019-10-30:
amdgpu:
- clang fixes
- Updated golden settings
- GPUVM fixes for navi
- Navi sdma fix
- Navi display fixes
- Freesync fix
- Gamma fix for DCN
- DP dongle detection fix
- Fix for undervolting on vega10
radeon:
- enable kexec fix for PPC
scheduler:
- set an error on fence if hw job failed
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191030162339.44366-1-alexander.deucher@amd.com
Dave Airlie [Fri, 1 Nov 2019 01:13:35 +0000 (11:13 +1000)]
Merge tag 'drm-intel-fixes-2019-10-31' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix PCH reference clock for FDI on HSW/BDW which was causing users blank screen
- Small documentation fix for TGL display PLLs
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191031171209.GA6586@intel.com
Dave Airlie [Fri, 1 Nov 2019 01:09:42 +0000 (11:09 +1000)]
Merge tag 'drm-misc-fixes-2019-10-30-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- three fixes for panfrost, one to silence a warning, one to fix
runtime_pm and one to prevent bogus pointer dereferences
- one fix for a memleak in v3d
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191030182207.evrscl7lnv42u5zu@hendrix
Dave Airlie [Fri, 1 Nov 2019 01:08:24 +0000 (11:08 +1000)]
Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes
One memory corruption fix in the MMUv2 GPU coredump code, a deadlock
fix also in the coredump code and reintroduction of a helpful message,
which got dropped by accident in this cycle.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/b0d640267662e3ce5e0089d0afedc1baba55058d.camel@pengutronix.de
Manfred Rudigier [Thu, 15 Aug 2019 20:55:19 +0000 (13:55 -0700)]
igb: Enable media autosense for the i350.
This patch enables the hardware feature "Media Auto Sense" also on the
i350. It works in the same way as on the 82850 devices. Hardware designs
using dual PHYs (fiber/copper) can enable this feature by setting the MAS
enable bits in the NVM_COMPAT register (0x03) in the EEPROM.
Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lyude Paul [Thu, 22 Aug 2019 18:33:18 +0000 (14:33 -0400)]
igb/igc: Don't warn on fatal read failures when the device is removed
Fatal read errors are worth warning about, unless of course the device
was just unplugged from the machine - something that's a rather normal
occurrence when the igb/igc adapter is located on a Thunderbolt dock. So,
let's only WARN() if there's a fatal read error while the device is
still present.
This fixes the following WARN splat that's been appearing whenever I
unplug my Caldigit TS3 Thunderbolt dock from my laptop:
igb 0000:09:00.0 enp9s0: PCIe link lost
------------[ cut here ]------------
igb: Failed to read reg 0x18!
WARNING: CPU: 7 PID: 516 at
drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32+0x57/0x6a [igb]
Modules linked in: igb dca thunderbolt fuse vfat fat elan_i2c mei_wdt
mei_hdcp i915 wmi_bmof intel_wmi_thunderbolt iTCO_wdt
iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp joydev
coretemp crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel
intel_cstate drm_kms_helper intel_uncore syscopyarea sysfillrect
sysimgblt fb_sys_fops intel_rapl_perf intel_xhci_usb_role_switch mei_me
drm roles idma64 i2c_i801 ucsi_acpi typec_ucsi mei intel_lpss_pci
processor_thermal_device typec intel_pch_thermal intel_soc_dts_iosf
intel_lpss int3403_thermal thinkpad_acpi wmi int340x_thermal_zone
ledtrig_audio int3400_thermal acpi_thermal_rel acpi_pad video
pcc_cpufreq ip_tables serio_raw nvme nvme_core crc32c_intel uas
usb_storage e1000e i2c_dev
CPU: 7 PID: 516 Comm: kworker/u16:3 Not tainted 5.2.0-rc1Lyude-Test+ #14
Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:igb_rd32+0x57/0x6a [igb]
Code: 87 b8 fc ff ff 48 c7 47 08 00 00 00 00 48 c7 c6 33 42 9b c0 4c 89
c7 e8 47 45 cd dc 89 ee 48 c7 c7 43 42 9b c0 e8 c1 94 71 dc <0f> 0b eb
08 8b 00 ff c0 75 b0 eb c8 44 89 e0 5d 41 5c c3 0f 1f 44
RSP: 0018:
ffffba5801cf7c48 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
ffff9e7956608840 RCX:
0000000000000007
RDX:
0000000000000000 RSI:
ffffba5801cf7b24 RDI:
ffff9e795e3d6a00
RBP:
0000000000000018 R08:
000000009dec4a01 R09:
ffffffff9e61018f
R10:
0000000000000000 R11:
ffffba5801cf7ae5 R12:
00000000ffffffff
R13:
ffff9e7956608840 R14:
ffff9e795a6f10b0 R15:
0000000000000000
FS:
0000000000000000(0000) GS:
ffff9e795e3c0000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000564317bc4088 CR3:
000000010e00a006 CR4:
00000000003606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
igb_release_hw_control+0x1a/0x30 [igb]
igb_remove+0xc5/0x14b [igb]
pci_device_remove+0x3b/0x93
device_release_driver_internal+0xd7/0x17e
pci_stop_bus_device+0x36/0x75
pci_stop_bus_device+0x66/0x75
pci_stop_bus_device+0x66/0x75
pci_stop_and_remove_bus_device+0xf/0x19
trim_stale_devices+0xc5/0x13a
? __pm_runtime_resume+0x6e/0x7b
trim_stale_devices+0x103/0x13a
? __pm_runtime_resume+0x6e/0x7b
trim_stale_devices+0x103/0x13a
acpiphp_check_bridge+0xd8/0xf5
acpiphp_hotplug_notify+0xf7/0x14b
? acpiphp_check_bridge+0xf5/0xf5
acpi_device_hotplug+0x357/0x3b5
acpi_hotplug_work_fn+0x1a/0x23
process_one_work+0x1a7/0x296
worker_thread+0x1a8/0x24c
? process_scheduled_works+0x2c/0x2c
kthread+0xe9/0xee
? kthread_destroy_worker+0x41/0x41
ret_from_fork+0x35/0x40
---[ end trace
252bf10352c63d22 ]---
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes:
47e16692b26b ("igb/igc: warn when fatal read failure happens")
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eric Dumazet [Wed, 30 Oct 2019 17:05:46 +0000 (10:05 -0700)]
tcp: increase tcp_max_syn_backlog max value
tcp_max_syn_backlog default value depends on memory size
and TCP ehash size. Before this patch, the max value
was 2048 [1], which is considered too small nowadays.
Increase it to 4096 to match the recent SOMAXCONN change.
[1] This is with TCP ehash size being capped to 524288 buckets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 30 Oct 2019 16:36:20 +0000 (09:36 -0700)]
net: increase SOMAXCONN to 4096
SOMAXCONN is /proc/sys/net/core/somaxconn default value.
It has been defined as 128 more than 20 years ago.
Since it caps the listen() backlog values, the very small value has
caused numerous problems over the years, and many people had
to raise it on their hosts after beeing hit by problems.
Google has been using 1024 for at least 15 years, and we increased
this to 4096 after TCP listener rework has been completed, more than
4 years ago. We got no complain of this change breaking any
legacy application.
Many applications indeed setup a TCP listener with listen(fd, -1);
meaning they let the system select the backlog.
Raising SOMAXCONN lowers chance of the port being unavailable under
even small SYNFLOOD attack, and reduces possibilities of side channel
vulnerabilities.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafael J. Wysocki [Thu, 31 Oct 2019 20:41:37 +0000 (21:41 +0100)]
Merge branch 'pm-cpufreq'
* pm-cpufreq:
ACPI: processor: Add QoS requests for all CPUs
Ido Schimmel [Thu, 31 Oct 2019 16:20:30 +0000 (18:20 +0200)]
netdevsim: Fix use-after-free during device dismantle
Commit
da58f90f11f5 ("netdevsim: Add devlink-trap support") added
delayed work to netdevsim that periodically iterates over the registered
netdevsim ports and reports various packet traps via devlink.
While the delayed work takes the 'port_list_lock' mutex to protect
against concurrent addition / deletion of ports, during device creation
/ dismantle ports are added / deleted without this lock, which can
result in a use-after-free [1].
Fix this by making sure that the ports list is always modified under the
lock.
[1]
[ 59.205543] ==================================================================
[ 59.207748] BUG: KASAN: use-after-free in nsim_dev_trap_report_work+0xa67/0xad0
[ 59.210247] Read of size 8 at addr
ffff8883cbdd3398 by task kworker/3:1/38
[ 59.212584]
[ 59.213148] CPU: 3 PID: 38 Comm: kworker/3:1 Not tainted 5.4.0-rc3-custom-16119-ge6abb5f0261e #2013
[ 59.215896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 59.218384] Workqueue: events nsim_dev_trap_report_work
[ 59.219428] Call Trace:
[ 59.219924] dump_stack+0xa9/0x10e
[ 59.220623] print_address_description.constprop.4+0x21/0x340
[ 59.221976] ? vprintk_func+0x66/0x240
[ 59.222752] __kasan_report.cold.8+0x78/0x91
[ 59.223602] ? nsim_dev_trap_report_work+0xa67/0xad0
[ 59.224603] kasan_report+0xe/0x20
[ 59.225296] nsim_dev_trap_report_work+0xa67/0xad0
[ 59.226435] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.227512] ? trace_event_raw_event_rcu_quiescent_state_report+0x360/0x360
[ 59.228851] process_one_work+0x98f/0x1760
[ 59.229684] ? pwq_dec_nr_in_flight+0x330/0x330
[ 59.230656] worker_thread+0x91/0xc40
[ 59.231587] ? process_one_work+0x1760/0x1760
[ 59.232451] kthread+0x34a/0x410
[ 59.233104] ? __kthread_queue_delayed_work+0x240/0x240
[ 59.234141] ret_from_fork+0x3a/0x50
[ 59.234982]
[ 59.235371] Allocated by task 187:
[ 59.236189] save_stack+0x19/0x80
[ 59.236853] __kasan_kmalloc.constprop.5+0xc1/0xd0
[ 59.237822] kmem_cache_alloc_trace+0x14c/0x380
[ 59.238769] __nsim_dev_port_add+0xaf/0x5c0
[ 59.239627] nsim_dev_probe+0x4fc/0x1140
[ 59.240550] really_probe+0x264/0xc00
[ 59.241418] driver_probe_device+0x208/0x2e0
[ 59.242255] __device_attach_driver+0x215/0x2d0
[ 59.243150] bus_for_each_drv+0x154/0x1d0
[ 59.243944] __device_attach+0x1ba/0x2b0
[ 59.244923] bus_probe_device+0x1dd/0x290
[ 59.245805] device_add+0xbac/0x1550
[ 59.246528] new_device_store+0x1f4/0x400
[ 59.247306] bus_attr_store+0x7b/0xa0
[ 59.248047] sysfs_kf_write+0x10f/0x170
[ 59.248941] kernfs_fop_write+0x283/0x430
[ 59.249843] __vfs_write+0x81/0x100
[ 59.250546] vfs_write+0x1ce/0x510
[ 59.251190] ksys_write+0x104/0x200
[ 59.251873] do_syscall_64+0xa4/0x4e0
[ 59.252642] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 59.253837]
[ 59.254203] Freed by task 187:
[ 59.254811] save_stack+0x19/0x80
[ 59.255463] __kasan_slab_free+0x125/0x170
[ 59.256265] kfree+0x100/0x440
[ 59.256870] nsim_dev_remove+0x98/0x100
[ 59.257651] nsim_bus_remove+0x16/0x20
[ 59.258382] device_release_driver_internal+0x20b/0x4d0
[ 59.259588] bus_remove_device+0x2e9/0x5a0
[ 59.260551] device_del+0x410/0xad0
[ 59.263777] device_unregister+0x26/0xc0
[ 59.264616] nsim_bus_dev_del+0x16/0x60
[ 59.265381] del_device_store+0x2d6/0x3c0
[ 59.266295] bus_attr_store+0x7b/0xa0
[ 59.267192] sysfs_kf_write+0x10f/0x170
[ 59.267960] kernfs_fop_write+0x283/0x430
[ 59.268800] __vfs_write+0x81/0x100
[ 59.269551] vfs_write+0x1ce/0x510
[ 59.270252] ksys_write+0x104/0x200
[ 59.270910] do_syscall_64+0xa4/0x4e0
[ 59.271680] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 59.272812]
[ 59.273211] The buggy address belongs to the object at
ffff8883cbdd3200
[ 59.273211] which belongs to the cache kmalloc-512 of size 512
[ 59.275838] The buggy address is located 408 bytes inside of
[ 59.275838] 512-byte region [
ffff8883cbdd3200,
ffff8883cbdd3400)
[ 59.278151] The buggy address belongs to the page:
[ 59.279215] page:
ffffea000f2f7400 refcount:1 mapcount:0 mapping:
ffff8883ecc0ce00 index:0x0 compound_mapcount: 0
[ 59.281449] flags: 0x200000000010200(slab|head)
[ 59.282356] raw:
0200000000010200 ffffea000f2f3a08 ffffea000f2fd608 ffff8883ecc0ce00
[ 59.283949] raw:
0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
[ 59.285608] page dumped because: kasan: bad access detected
[ 59.286981]
[ 59.287337] Memory state around the buggy address:
[ 59.288310]
ffff8883cbdd3280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.289763]
ffff8883cbdd3300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.291452] >
ffff8883cbdd3380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.292945] ^
[ 59.293815]
ffff8883cbdd3400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 59.295220]
ffff8883cbdd3480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 59.296872] ==================================================================
Fixes:
da58f90f11f5 ("netdevsim: Add devlink-trap support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: syzbot+9ed8f68ab30761f3678e@syzkaller.appspotmail.com
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 31 Oct 2019 12:13:46 +0000 (12:13 +0000)]
rxrpc: Fix handling of last subpacket of jumbo packet
When rxrpc_recvmsg_data() sets the return value to 1 because it's drained
all the data for the last packet, it checks the last-packet flag on the
whole packet - but this is wrong, since the last-packet flag is only set on
the final subpacket of the last jumbo packet. This means that a call that
receives its last packet in a jumbo packet won't complete properly.
Fix this by having rxrpc_locate_data() determine the last-packet state of
the subpacket it's looking at and passing that back to the caller rather
than having the caller look in the packet header. The caller then needs to
cache this in the rxrpc_call struct as rxrpc_locate_data() isn't then
called again for this packet.
Fixes:
248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Fixes:
e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 31 Oct 2019 18:43:36 +0000 (11:43 -0700)]
Merge tag 'mac80211-for-net-2019-10-31' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two fixes:
* HT operation is not allowed on channel 14 (Japan only)
* netlink policy for nexthop attribute was wrong
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Felipe Balbi [Thu, 31 Oct 2019 09:07:13 +0000 (11:07 +0200)]
usb: dwc3: gadget: fix race when disabling ep with cancelled xfers
When disabling an endpoint which has cancelled requests, we should
make sure to giveback requests that are currently pending in the
cancelled list, otherwise we may fall into a situation where command
completion interrupt fires after endpoint has been disabled, therefore
causing a splat.
Fixes:
fec9095bdef4 "usb: dwc3: gadget: remove wait_end_transfer"
Reported-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Link: https://lore.kernel.org/r/20191031090713.1452818-1-felipe.balbi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Thu, 31 Oct 2019 10:53:41 +0000 (13:53 +0300)]
iocost: don't nest spin_lock_irq in ioc_weight_write()
This code causes a static analysis warning:
block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq'
We disable IRQs in blkg_conf_prep() and re-enable them in
blkg_conf_finish(). IRQ disable/enable should not be nested because
that means the IRQs will be enabled at the first unlock instead of the
second one.
Fixes:
7caa47151ab2 ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Heiko Carstens [Mon, 28 Oct 2019 10:03:27 +0000 (11:03 +0100)]
s390/idle: fix cpu idle time calculation
The idle time reported in /proc/stat sometimes incorrectly contains
huge values on s390. This is caused by a bug in arch_cpu_idle_time().
The kernel tries to figure out when a different cpu entered idle by
accessing its per-cpu data structure. There is an ordering problem: if
the remote cpu has an idle_enter value which is not zero, and an
idle_exit value which is zero, it is assumed it is idle since
"now". The "now" timestamp however is taken before the idle_enter
value is read.
Which in turn means that "now" can be smaller than idle_enter of the
remote cpu. Unconditionally subtracting idle_enter from "now" can thus
lead to a negative value (aka large unsigned value).
Fix this by moving the get_tod_clock() invocation out of the
loop. While at it also make the code a bit more readable.
A similar bug also exists for show_idle_time(). Fix this is as well.
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Ilya Leoshkevich [Wed, 2 Oct 2019 11:29:57 +0000 (13:29 +0200)]
s390/unwind: fix mixing regs and sp
unwind_for_each_frame stops after the first frame if regs->gprs[15] <=
sp.
The reason is that in case regs are specified, the first frame should be
regs->psw.addr and the second frame should be sp->gprs[8]. However,
currently the second frame is regs->gprs[15], which confuses
outside_of_stack().
Fix by introducing a flag to distinguish this special case from
unwinding the interrupt handler, for which the current behavior is
appropriate.
Fixes:
78c98f907413 ("s390/unwind: introduce stack unwind API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Yihui ZENG [Fri, 25 Oct 2019 09:31:48 +0000 (12:31 +0300)]
s390/cmm: fix information leak in cmm_timeout_handler()
The problem is that we were putting the NUL terminator too far:
buf[sizeof(buf) - 1] = '\0';
If the user input isn't NUL terminated and they haven't initialized the
whole buffer then it leads to an info leak. The NUL terminator should
be:
buf[len - 1] = '\0';
Signed-off-by: Yihui Zeng <yzeng56@asu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[heiko.carstens@de.ibm.com: keep semantics of how *lenp and *ppos are handled]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Bjorn Andersson [Tue, 29 Oct 2019 23:27:38 +0000 (16:27 -0700)]
arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo
The Kryo cores share errata 1009 with Falkor, so add their model
definitions and enable it for them as well.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[will: Update entry in silicon-errata.rst]
Signed-off-by: Will Deacon <will@kernel.org>
Paolo Bonzini [Sun, 27 Oct 2019 15:23:23 +0000 (16:23 +0100)]
KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
VMX already does so if the host has SMEP, in order to support the combination of
CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in
fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
"load EFER" controls, because it may help avoiding a slow MSR write. Removing
all the conditionals simplifies the code.
SVM does not have similar code, but it should since recent AMD processors do
support SMEP. So this patch also makes the code for the two vendors more similar
while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.
Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Fri, 25 Oct 2019 11:34:58 +0000 (13:34 +0200)]
kvm: call kvm_arch_destroy_vm if vm creation fails
In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but
then fail later in the function, we need to call kvm_arch_destroy_vm()
so that it can do any necessary cleanup (like freeing memory).
Fixes:
44a95dae1d229a ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
[Remove dependency on "kvm: Don't clear reference count on
kvm_create_vm() error path" which was not committed. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Javier Martinez Canillas [Tue, 29 Oct 2019 17:37:55 +0000 (18:37 +0100)]
efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
The driver exposes EFI runtime services to user-space through an IOCTL
interface, calling the EFI services function pointers directly without
using the efivar API.
Disallow access to the /dev/efi_test character device when the kernel is
locked down to prevent arbitrary user-space to call EFI runtime services.
Also require CAP_SYS_ADMIN to open the chardev to prevent unprivileged
users to call the EFI runtime services, instead of just relying on the
chardev file mode bits for this.
The main user of this driver is the fwts [0] tool that already checks if
the effective user ID is 0 and fails otherwise. So this change shouldn't
cause any regression to this tool.
[0]: https://wiki.ubuntu.com/FirmwareTestSuite/Reference/uefivarinfo
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Matthew Garrett <mjg59@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-7-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kairui Song [Tue, 29 Oct 2019 17:37:54 +0000 (18:37 +0100)]
x86, efi: Never relocate kernel below lowest acceptable address
Currently, kernel fails to boot on some HyperV VMs when using EFI.
And it's a potential issue on all x86 platforms.
It's caused by broken kernel relocation on EFI systems, when below three
conditions are met:
1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
by the loader.
2. There isn't enough room to contain the kernel, starting from the
default load address (eg. something else occupied part the region).
3. In the memmap provided by EFI firmware, there is a memory region
starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
kernel.
EFI stub will perform a kernel relocation when condition 1 is met. But
due to condition 2, EFI stub can't relocate kernel to the preferred
address, so it fallback to ask EFI firmware to alloc lowest usable memory
region, got the low region mentioned in condition 3, and relocated
kernel there.
It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
is the lowest acceptable kernel relocation address.
The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
address if kernel is located below it. Then the relocation before
decompression, which move kernel to the end of the decompression buffer,
will overwrite other memory region, as there is no enough memory there.
To fix it, just don't let EFI stub relocate the kernel to any address
lower than lowest acceptable address.
[ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ard Biesheuvel [Tue, 29 Oct 2019 17:37:53 +0000 (18:37 +0100)]
efi: libstub/arm: Account for firmware reserved memory at the base of RAM
The EFI stubloader for ARM starts out by allocating a 32 MB window
at the base of RAM, in order to ensure that the decompressor (which
blindly copies the uncompressed kernel into that window) does not
overwrite other allocations that are made while running in the context
of the EFI firmware.
In some cases, (e.g., U-Boot running on the Raspberry Pi 2), this is
causing boot failures because this initial allocation conflicts with
a page of reserved memory at the base of RAM that contains the SMP spin
tables and other pieces of firmware data and which was put there by
the bootloader under the assumption that the TEXT_OFFSET window right
below the kernel is only used partially during early boot, and will be
left alone once the memory reservations are processed and taken into
account.
So let's permit reserved memory regions to exist in the region starting
at the base of RAM, and ending at TEXT_OFFSET - 5 * PAGE_SIZE, which is
the window below the kernel that is not touched by the early boot code.
Tested-by: Guillaume Gardet <Guillaume.Gardet@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Chester Lin <clin@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-5-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>