Thomas Gleixner [Sun, 25 Nov 2018 18:33:39 +0000 (19:33 +0100)]
x86/speculation: Rework SMT state change
commit
a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream
arch_smt_update() is only called when the sysfs SMT control knob is
changed. This means that when SMT is enabled in the sysfs control knob the
system is considered to have SMT active even if all siblings are offline.
To allow finegrained control of the speculation mitigations, the actual SMT
state is more interesting than the fact that siblings could be enabled.
Rework the code, so arch_smt_update() is invoked from each individual CPU
hotplug function, and simplify the update function while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Sun, 25 Nov 2018 18:33:38 +0000 (19:33 +0100)]
sched/smt: Expose sched_smt_present static key
commit
321a874a7ef85655e93b3206d0f36b4a6097f948 upstream
Make the scheduler's 'sched_smt_present' static key globaly available, so
it can be used in the x86 speculation control code.
Provide a query function and a stub for the CONFIG_SMP=n case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Sun, 25 Nov 2018 18:33:37 +0000 (19:33 +0100)]
x86/Kconfig: Select SCHED_SMT if SMP enabled
commit
dbe733642e01dd108f71436aaea7b328cb28fd87 upstream
CONFIG_SCHED_SMT is enabled by all distros, so there is not a real point to
have it configurable. The runtime overhead in the core scheduler code is
minimal because the actual SMT scheduling parts are conditional on a static
key.
This allows to expose the scheduler's SMT state static key to the
speculation control code. Alternatively the scheduler's static key could be
made always available when CONFIG_SMP is enabled, but that's just adding an
unused static key to every other architecture for nothing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.337452245@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Zijlstra (Intel) [Sun, 25 Nov 2018 18:33:36 +0000 (19:33 +0100)]
sched/smt: Make sched_smt_present track topology
commit
c5511d03ec090980732e929c318a7a6374b5550e upstream
Currently the 'sched_smt_present' static key is enabled when at CPU bringup
SMT topology is observed, but it is never disabled. However there is demand
to also disable the key when the topology changes such that there is no SMT
present anymore.
Implement this by making the key count the number of cores that have SMT
enabled.
In particular, the SMT topology bits are set before interrrupts are enabled
and similarly, are cleared after interrupts are disabled for the last time
and the CPU dies.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tim Chen [Sun, 25 Nov 2018 18:33:35 +0000 (19:33 +0100)]
x86/speculation: Reorganize speculation control MSRs update
commit
01daf56875ee0cd50ed496a09b20eb369b45dfa5 upstream
The logic to detect whether there's a change in the previous and next
task's flag relevant to update speculation control MSRs is spread out
across multiple functions.
Consolidate all checks needed for updating speculation control MSRs into
the new __speculation_ctrl_update() helper function.
This makes it easy to pick the right speculation control MSR and the bits
in MSR_IA32_SPEC_CTRL that need updating based on TIF flags changes.
Originally-by: Thomas Lendacky <Thomas.Lendacky@amd.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.151077005@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Sun, 25 Nov 2018 18:33:34 +0000 (19:33 +0100)]
x86/speculation: Rename SSBD update functions
commit
26c4d75b234040c11728a8acb796b3a85ba7507c upstream
During context switch, the SSBD bit in SPEC_CTRL MSR is updated according
to changes of the TIF_SSBD flag in the current and next running task.
Currently, only the bit controlling speculative store bypass disable in
SPEC_CTRL MSR is updated and the related update functions all have
"speculative_store" or "ssb" in their names.
For enhanced mitigation control other bits in SPEC_CTRL MSR need to be
updated as well, which makes the SSB names inadequate.
Rename the "speculative_store*" functions to a more generic name. No
functional change.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.058866968@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tim Chen [Sun, 25 Nov 2018 18:33:33 +0000 (19:33 +0100)]
x86/speculation: Disable STIBP when enhanced IBRS is in use
commit
34bce7c9690b1d897686aac89604ba7adc365556 upstream
If enhanced IBRS is active, STIBP is redundant for mitigating Spectre v2
user space exploits from hyperthread sibling.
Disable STIBP when enhanced IBRS is used.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185003.966801480@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tim Chen [Sun, 25 Nov 2018 18:33:32 +0000 (19:33 +0100)]
x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common()
commit
a8f76ae41cd633ac00be1b3019b1eb4741be3828 upstream
The Spectre V2 printout in cpu_show_common() handles conditionals for the
various mitigation methods directly in the sprintf() argument list. That's
hard to read and will become unreadable if more complex decisions need to
be made for a particular method.
Move the conditionals for STIBP and IBPB string selection into helper
functions, so they can be extended later on.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185003.874479208@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tim Chen [Sun, 25 Nov 2018 18:33:31 +0000 (19:33 +0100)]
x86/speculation: Remove unnecessary ret variable in cpu_show_common()
commit
b86bda0426853bfe8a3506c7d2a5b332760ae46b upstream
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185003.783903657@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tim Chen [Sun, 25 Nov 2018 18:33:30 +0000 (19:33 +0100)]
x86/speculation: Clean up spectre_v2_parse_cmdline()
commit
24848509aa55eac39d524b587b051f4e86df3c12 upstream
Remove the unnecessary 'else' statement in spectre_v2_parse_cmdline()
to save an indentation level.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185003.688010903@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tim Chen [Sun, 25 Nov 2018 18:33:29 +0000 (19:33 +0100)]
x86/speculation: Update the TIF_SSBD comment
commit
8eb729b77faf83ac4c1f363a9ad68d042415f24c upstream
"Reduced Data Speculation" is an obsolete term. The correct new name is
"Speculative store bypass disable" - which is abbreviated into SSBD.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185003.593893901@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zhenzhong Duan [Fri, 2 Nov 2018 08:45:41 +0000 (01:45 -0700)]
x86/retpoline: Remove minimal retpoline support
commit
ef014aae8f1cd2793e4e014bbb102bed53f852b7 upstream
Now that CONFIG_RETPOLINE hard depends on compiler support, there is no
reason to keep the minimal retpoline support around which only provided
basic protection in the assembly files.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/f06f0a89-5587-45db-8ed2-0a9d6638d5c0@default
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zhenzhong Duan [Fri, 2 Nov 2018 08:45:41 +0000 (01:45 -0700)]
x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
commit
4cd24de3a0980bf3100c9dcb08ef65ca7c31af48 upstream
Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on the compiler capability.
Break the build when CONFIG_RETPOLINE is enabled and the compiler does not
support it. Emit an error message in that case:
"arch/x86/Makefile:226: *** You are building kernel with non-retpoline
compiler, please update your compiler.. Stop."
[dwmw: Fail the build with non-retpoline compiler]
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/cca0cb20-f9e2-4094-840b-fb0f8810cd34@default
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zhenzhong Duan [Tue, 18 Sep 2018 14:45:00 +0000 (07:45 -0700)]
x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
commit
0cbb76d6285794f30953bfa3ab831714b59dd700 upstream
..so that they match their asm counterpart.
Add the missing ANNOTATE_NOSPEC_ALTERNATIVE in CALL_NOSPEC, while at it.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang YanQing <udknight@gmail.com>
Cc: dhaval.giani@oracle.com
Cc: srinivas.eeda@oracle.com
Link: http://lkml.kernel.org/r/c3975665-173e-4d70-8dee-06c926ac26ee@default
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jiri Kosina [Tue, 25 Sep 2018 12:39:28 +0000 (14:39 +0200)]
x86/speculation: Propagate information about RSB filling mitigation to sysfs
commit
bb4b3b7762735cdaba5a40fd94c9303d9ffa147a upstream
If spectrev2 mitigation has been enabled, RSB is filled on context switch
in order to protect from various classes of spectrev2 attacks.
If this mitigation is enabled, say so in sysfs for spectrev2.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "SchauflerCasey" <casey.schaufler@intel.com>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438580.15880@cbobk.fhfr.pm
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jiri Kosina [Tue, 25 Sep 2018 12:38:18 +0000 (14:38 +0200)]
x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
commit
dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream
Currently, IBPB is only issued in cases when switching into a non-dumpable
process, the rationale being to protect such 'important and security
sensitive' processess (such as GPG) from data leaking into a different
userspace process via spectre v2.
This is however completely insufficient to provide proper userspace-to-userpace
spectrev2 protection, as any process can poison branch buffers before being
scheduled out, and the newly scheduled process immediately becomes spectrev2
victim.
In order to minimize the performance impact (for usecases that do require
spectrev2 protection), issue the barrier only in cases when switching between
processess where the victim can't be ptraced by the potential attacker (as in
such cases, the attacker doesn't have to bother with branch buffers at all).
[ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
fine-grained ]
Fixes:
18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "SchauflerCasey" <casey.schaufler@intel.com>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jiri Kosina [Tue, 25 Sep 2018 12:38:55 +0000 (14:38 +0200)]
x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
commit
53c613fe6349994f023245519265999eed75957f upstream
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.
Enable this feature if
- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)
After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.
Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "SchauflerCasey" <casey.schaufler@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jon Maloy [Mon, 26 Nov 2018 17:26:14 +0000 (12:26 -0500)]
tipc: fix lockdep warning during node delete
[ Upstream commit
ec835f891232d7763dea9da0358f31e24ca6dfb7 ]
We see the following lockdep warning:
[ 2284.078521] ======================================================
[ 2284.078604] WARNING: possible circular locking dependency detected
[ 2284.078604] 4.19.0+ #42 Tainted: G E
[ 2284.078604] ------------------------------------------------------
[ 2284.078604] rmmod/254 is trying to acquire lock:
[ 2284.078604]
00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
[ 2284.078604]
[ 2284.078604] but task is already holding lock:
[ 2284.078604]
00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
[ 2284.078604]
[ 2284.078604] which lock already depends on the new lock.
[ 2284.078604]
[ 2284.078604]
[ 2284.078604] the existing dependency chain (in reverse order) is:
[ 2284.078604]
[ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
[ 2284.078604] tipc_node_timeout+0x20a/0x330 [tipc]
[ 2284.078604] call_timer_fn+0xa1/0x280
[ 2284.078604] run_timer_softirq+0x1f2/0x4d0
[ 2284.078604] __do_softirq+0xfc/0x413
[ 2284.078604] irq_exit+0xb5/0xc0
[ 2284.078604] smp_apic_timer_interrupt+0xac/0x210
[ 2284.078604] apic_timer_interrupt+0xf/0x20
[ 2284.078604] default_idle+0x1c/0x140
[ 2284.078604] do_idle+0x1bc/0x280
[ 2284.078604] cpu_startup_entry+0x19/0x20
[ 2284.078604] start_secondary+0x187/0x1c0
[ 2284.078604] secondary_startup_64+0xa4/0xb0
[ 2284.078604]
[ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
[ 2284.078604] del_timer_sync+0x34/0xa0
[ 2284.078604] tipc_node_delete+0x1a/0x40 [tipc]
[ 2284.078604] tipc_node_stop+0xcb/0x190 [tipc]
[ 2284.078604] tipc_net_stop+0x154/0x170 [tipc]
[ 2284.078604] tipc_exit_net+0x16/0x30 [tipc]
[ 2284.078604] ops_exit_list.isra.8+0x36/0x70
[ 2284.078604] unregister_pernet_operations+0x87/0xd0
[ 2284.078604] unregister_pernet_subsys+0x1d/0x30
[ 2284.078604] tipc_exit+0x11/0x6f2 [tipc]
[ 2284.078604] __x64_sys_delete_module+0x1df/0x240
[ 2284.078604] do_syscall_64+0x66/0x460
[ 2284.078604] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2284.078604]
[ 2284.078604] other info that might help us debug this:
[ 2284.078604]
[ 2284.078604] Possible unsafe locking scenario:
[ 2284.078604]
[ 2284.078604] CPU0 CPU1
[ 2284.078604] ---- ----
[ 2284.078604] lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604] lock((&n->timer)#2);
[ 2284.078604] lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604] lock((&n->timer)#2);
[ 2284.078604]
[ 2284.078604] *** DEADLOCK ***
[ 2284.078604]
[ 2284.078604] 3 locks held by rmmod/254:
[ 2284.078604] #0:
000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
[ 2284.078604] #1:
0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
[ 2284.078604] #2:
00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
[...}
The reason is that the node timer handler sometimes needs to delete a
node which has been disconnected for too long. To do this, it grabs
the lock 'node_list_lock', which may at the same time be held by the
generic node cleanup function, tipc_node_stop(), during module removal.
Since the latter is calling del_timer_sync() inside the same lock, we
have a potential deadlock.
We fix this letting the timer cleanup function use spin_trylock()
instead of just spin_lock(), and when it fails to grab the lock it
just returns so that the timer handler can terminate its execution.
This is safe to do, since tipc_node_stop() anyway is about to
delete both the timer and the node instance.
Fixes:
6a939f365bdb ("tipc: Auto removal of peer down node instance")
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiner Kallweit [Fri, 23 Nov 2018 18:41:29 +0000 (19:41 +0100)]
net: phy: add workaround for issue where PHY driver doesn't bind to the device
[ Upstream commit
c85ddecae6e5e82ca3ae6f20c63f1d865e2ff5ea ]
After switching the r8169 driver to use phylib some user reported that
their network is broken. This was caused by the genphy PHY driver being
used instead of the dedicated PHY driver for the RTL8211B. Users
reported that loading the Realtek PHY driver module upfront fixes the
issue. See also this mail thread:
https://marc.info/?t=
154279781800003&r=1&w=2
The issue is quite weird and the root cause seems to be somewhere in
the base driver core. The patch works around the issue and may be
removed once the actual issue is fixed.
The Fixes tag refers to the first reported occurrence of the issue.
The issue itself may have been existing much longer and it may affect
users of other network chips as well. Users typically will recognize
this issue only if their PHY stops working when being used with the
genphy driver.
Fixes:
f1e911d5d0df ("r8169: add basic phylib support")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric Dumazet [Tue, 20 Nov 2018 13:53:59 +0000 (05:53 -0800)]
tcp: defer SACK compression after DupThresh
[ Upstream commit
86de5921a3d5dd246df661e09bdd0a6131b39ae3 ]
Jean-Louis reported a TCP regression and bisected to recent SACK
compression.
After a loss episode (receiver not able to keep up and dropping
packets because its backlog is full), linux TCP stack is sending
a single SACK (DUPACK).
Sender waits a full RTO timer before recovering losses.
While RFC 6675 says in section 5, "Algorithm Details",
(2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
indicating at least three segments have arrived above the current
cumulative acknowledgment point, which is taken to indicate loss
-- go to step (4).
...
(4) Invoke fast retransmit and enter loss recovery as follows:
there are old TCP stacks not implementing this strategy, and
still counting the dupacks before starting fast retransmit.
While these stacks probably perform poorly when receivers implement
LRO/GRO, we should be a little more gentle to them.
This patch makes sure we do not enable SACK compression unless
3 dupacks have been sent since last rcv_nxt update.
Ideally we should even rearm the timer to send one or two
more DUPACK if no more packets are coming, but that will
be work aiming for linux-4.21.
Many thanks to Jean-Louis for bisecting the issue, providing
packet captures and testing this patch.
Fixes:
5d9f4262b7ea ("tcp: add SACK compression")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tal Gilboa [Wed, 21 Nov 2018 14:28:23 +0000 (16:28 +0200)]
net/dim: Update DIM start sample after each DIM iteration
[ Upstream commit
0211dda68a4f6531923a2f72d8e8959207f59fba ]
On every iteration of net_dim, the algorithm may choose to
check for the system state by comparing current data sample
with previous data sample. After each of these comparison,
regardless of the action taken, the sample used as baseline
is needed to be updated.
This patch fixes a bug that causes DIM to take wrong decisions,
due to never updating the baseline sample for comparison between
iterations. This way, DIM always compares current sample with
zeros.
Although this is a functional fix, it also improves and stabilizes
performance as the algorithm works properly now.
Performance:
Tested single UDP TX stream with pktgen:
samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
-m 24:8a:07:88:26:8b -f 3 -b 128
ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
Also, toggling between profiles is less frequent with the fix.
Fixes:
8115b750dbcb ("net/dim: use struct net_dim_sample as arg to net_dim")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason Wang [Thu, 22 Nov 2018 06:36:31 +0000 (14:36 +0800)]
virtio-net: fail XDP set if guest csum is negotiated
[ Upstream commit
18ba58e1c234ea1a2d9835ac8c1735d965ce4640 ]
We don't support partial csumed packet since its metadata will be lost
or incorrect during XDP processing. So fail the XDP set if guest_csum
feature is negotiated.
Fixes:
f600b6905015 ("virtio_net: Add XDP support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jason Wang [Thu, 22 Nov 2018 06:36:30 +0000 (14:36 +0800)]
virtio-net: disable guest csum during XDP set
[ Upstream commit
e59ff2c49ae16e1d179de679aca81405829aee6c ]
We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
can receive partial csumed packets with metadata kept in the
vnet_hdr. This may have several side effects:
- It could be overridden by header adjustment, thus is might be not
correct after XDP processing.
- There's no way to pass such metadata information through
XDP_REDIRECT to another driver.
- XDP does not support checksum offload right now.
So simply disable guest csum if possible in this the case of XDP.
Fixes:
3f93522ffab2d ("virtio-net: switch off offloads on demand if possible on XDP set")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Petr Machata [Tue, 20 Nov 2018 11:39:56 +0000 (11:39 +0000)]
net: skb_scrub_packet(): Scrub offload_fwd_mark
[ Upstream commit
b5dd186d10ba59e6b5ba60e42b3b083df56df6f3 ]
When a packet is trapped and the corresponding SKB marked as
already-forwarded, it retains this marking even after it is forwarded
across veth links into another bridge. There, since it ingresses the
bridge over veth, which doesn't have offload_fwd_mark, it triggers a
warning in nbp_switchdev_frame_mark().
Then nbp_switchdev_allowed_egress() decides not to allow egress from
this bridge through another veth, because the SKB is already marked, and
the mark (of 0) of course matches. Thus the packet is incorrectly
blocked.
Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
function is called from tunnels and also from veth, and thus catches the
cases where traffic is forwarded between bridges and transformed in a
way that invalidates the marking.
Fixes:
6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Fixes:
abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lorenzo Bianconi [Wed, 21 Nov 2018 15:32:10 +0000 (16:32 +0100)]
net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
[ Upstream commit
6d0f60b0f8588fd4380ea5df9601e12fddd55ce2 ]
Set xdp_prog pointer to NULL if bpf_prog_add fails since that routine
reports the error code instead of NULL in case of failure and xdp_prog
pointer value is used in the driver to verify if XDP is currently
enabled.
Moreover report the error code to userspace if nicvf_xdp_setup fails
Fixes:
05c773f52b96 ("net: thunderx: Add basic XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bernd Eckstein [Fri, 23 Nov 2018 12:51:26 +0000 (13:51 +0100)]
usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
[ Upstream commit
45611c61dd503454b2edae00aabe1e429ec49ebe ]
The bug is not easily reproducable, as it may occur very infrequently
(we had machines with 20minutes heavy downloading before it occurred)
However, on a virual machine (VMWare on Windows 10 host) it occurred
pretty frequently (1-2 seconds after a speedtest was started)
dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback
before it is set.
This causes the following problems:
- double free of the skb or potential memory leak
- in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually
general protection fault
Example dmesg output:
[ 134.841986] ------------[ cut here ]------------
[ 134.841987] recvmsg bug: copied
9C24A555 seq
9C24B557 rcvnxt
9C25A6B3 fl 0
[ 134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0
[ 134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0
[ 134.842048] RSP: 0018:
ffffa6630422bcc8 EFLAGS:
00010286
[ 134.842049] RAX:
0000000000000000 RBX:
ffff997616f4f200 RCX:
0000000000000006
[ 134.842049] RDX:
0000000000000007 RSI:
0000000000000082 RDI:
ffff9976257d6490
[ 134.842050] RBP:
ffffa6630422bd98 R08:
0000000000000001 R09:
000000000004bba4
[ 134.842050] R10:
0000000001e00c6f R11:
000000000004bba4 R12:
ffff99760dee3000
[ 134.842051] R13:
0000000000000000 R14:
ffff99760dee3514 R15:
0000000000000000
[ 134.842051] FS:
00007fe332347700(0000) GS:
ffff9976257c0000(0000) knlGS:
0000000000000000
[ 134.842052] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 134.842053] CR2:
0000000001e41000 CR3:
000000020e9b4006 CR4:
00000000003606e0
[ 134.842055] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 134.842055] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 134.842057] Call Trace:
[ 134.842060] ? aa_sk_perm+0x53/0x1a0
[ 134.842064] inet_recvmsg+0x51/0xc0
[ 134.842066] sock_recvmsg+0x43/0x50
[ 134.842070] SYSC_recvfrom+0xe4/0x160
[ 134.842072] ? __schedule+0x3de/0x8b0
[ 134.842075] ? ktime_get_ts64+0x4c/0xf0
[ 134.842079] SyS_recvfrom+0xe/0x10
[ 134.842082] do_syscall_64+0x73/0x130
[ 134.842086] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842086] RIP: 0033:0x7fe331f5a81d
[ 134.842088] RSP: 002b:
00007ffe8da98398 EFLAGS:
00000246 ORIG_RAX:
000000000000002d
[ 134.842090] RAX:
ffffffffffffffda RBX:
ffffffffffffffff RCX:
00007fe331f5a81d
[ 134.842094] RDX:
00000000000003fb RSI:
0000000001e00874 RDI:
0000000000000003
[ 134.842095] RBP:
00007fe32f642c70 R08:
0000000000000000 R09:
0000000000000000
[ 134.842097] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fe332347698
[ 134.842099] R13:
0000000001b7e0a0 R14:
0000000001e00874 R15:
0000000000000000
[ 134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00
[ 134.842126] ---[ end trace
b7138fc08c83147f ]---
[ 134.842144] general protection fault: 0000 [#1] SMP PTI
[ 134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842164] RIP: 0010:tcp_close+0x2c6/0x440
[ 134.842165] RSP: 0018:
ffffa6630422bde8 EFLAGS:
00010202
[ 134.842167] RAX:
0000000000000000 RBX:
ffff99760dee3000 RCX:
0000000180400034
[ 134.842168] RDX:
5c4afd407207a6c4 RSI:
ffffe868495bd300 RDI:
ffff997616f4f200
[ 134.842169] RBP:
ffffa6630422be08 R08:
0000000016f4d401 R09:
0000000180400034
[ 134.842169] R10:
ffffa6630422bd98 R11:
0000000000000000 R12:
000000000000600c
[ 134.842170] R13:
0000000000000000 R14:
ffff99760dee30c8 R15:
ffff9975bd44fe00
[ 134.842171] FS:
00007fe332347700(0000) GS:
ffff9976257c0000(0000) knlGS:
0000000000000000
[ 134.842173] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 134.842174] CR2:
0000000001e41000 CR3:
000000020e9b4006 CR4:
00000000003606e0
[ 134.842177] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 134.842178] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 134.842179] Call Trace:
[ 134.842181] inet_release+0x42/0x70
[ 134.842183] __sock_release+0x42/0xb0
[ 134.842184] sock_close+0x15/0x20
[ 134.842187] __fput+0xea/0x220
[ 134.842189] ____fput+0xe/0x10
[ 134.842191] task_work_run+0x8a/0xb0
[ 134.842193] exit_to_usermode_loop+0xc4/0xd0
[ 134.842195] do_syscall_64+0xf4/0x130
[ 134.842197] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842197] RIP: 0033:0x7fe331f5a560
[ 134.842198] RSP: 002b:
00007ffe8da982e8 EFLAGS:
00000246 ORIG_RAX:
0000000000000003
[ 134.842200] RAX:
0000000000000000 RBX:
00007fe32f642c70 RCX:
00007fe331f5a560
[ 134.842201] RDX:
00000000008f5320 RSI:
0000000001cd4b50 RDI:
0000000000000003
[ 134.842202] RBP:
00007fe32f6500f8 R08:
000000000000003c R09:
00000000009343c0
[ 134.842203] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fe32f6500d0
[ 134.842204] R13:
00000000008f5320 R14:
00000000008f5320 R15:
0000000001cd4770
[ 134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80
[ 134.842226] RIP: tcp_close+0x2c6/0x440 RSP:
ffffa6630422bde8
[ 134.842227] ---[ end trace
b7138fc08c831480 ]---
The proposed patch eliminates a potential racing condition.
Before, usb_submit_urb was called and _after_ that, the skb was attached
(dev->tx_skb). So, on a callback it was possible, however unlikely that the
skb was freed before it was set. That way (because dev->tx_skb was not set
to NULL after it was freed), it could happen that a skb from a earlier
transmission was freed a second time (and the skb we should have freed did
not get freed at all)
Now we free the skb directly in ipheth_tx(). It is not passed to the
callback anymore, eliminating the posibility of a double free of the same
skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any()
respectively dev_consume_skb_any() to free the skb.
Signed-off-by: Oliver Zweigle <Oliver.Zweigle@faro.com>
Signed-off-by: Bernd Eckstein <3ernd.Eckstein@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Julian Wiedmann [Wed, 28 Nov 2018 15:20:50 +0000 (16:20 +0100)]
s390/qeth: fix length check in SNMP processing
[ Upstream commit
9a764c1e59684c0358e16ccaafd870629f2cfe67 ]
The response for a SNMP request can consist of multiple parts, which
the cmd callback stages into a kernel buffer until all parts have been
received. If the callback detects that the staging buffer provides
insufficient space, it bails out with error.
This processing is buggy for the first part of the response - while it
initially checks for a length of 'data_len', it later copies an
additional amount of 'offsetof(struct qeth_snmp_cmd, data)' bytes.
Fix the calculation of 'data_len' for the first part of the response.
This also nicely cleans up the memcpy code.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pan Bian [Wed, 28 Nov 2018 06:53:19 +0000 (14:53 +0800)]
rapidio/rionet: do not free skb before reading its length
[ Upstream commit
cfc435198f53a6fa1f656d98466b24967ff457d0 ]
skb is freed via dev_kfree_skb_any, however, skb->len is read then. This
may result in a use-after-free bug.
Fixes:
e6161d64263 ("rapidio/rionet: rework driver initialization and removal")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Willem de Bruijn [Tue, 20 Nov 2018 18:00:18 +0000 (13:00 -0500)]
packet: copy user buffers before orphan or clone
[ Upstream commit
5cd8d46ea1562be80063f53c7c6a5f40224de623 ]
tpacket_snd sends packets with user pages linked into skb frags. It
notifies that pages can be reused when the skb is released by setting
skb->destructor to tpacket_destruct_skb.
This can cause data corruption if the skb is orphaned (e.g., on
transmit through veth) or cloned (e.g., on mirror to another psock).
Create a kernel-private copy of data in these cases, same as tun/tap
zerocopy transmission. Reuse that infrastructure: mark the skb as
SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
Unlike other zerocopy packets, do not set shinfo destructor_arg to
struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
when the original skb is released and a timestamp is recorded. Do not
change this timestamp behavior. The ubuf_info->callback is not needed
anyway, as no zerocopy notification is expected.
Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
resulting value is not a valid ubuf_info pointer, nor a valid
tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
The fix relies on features introduced in commit
52267790ef52 ("sock:
add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
Tested with from `./in_netns.sh ./txring_overwrite` from
http://github.com/wdebruij/kerneltools/tests
Fixes:
69e3c75f4d54 ("net: TX_RING and packet mmap")
Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lorenzo Bianconi [Fri, 23 Nov 2018 17:28:01 +0000 (18:28 +0100)]
net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
[ Upstream commit
ef2a7cf1d8831535b8991459567b385661eb4a36 ]
Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
since it is used to check if tso dma descriptor queue has been previously
allocated. The issue can be triggered with the following reproducer:
$ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
$ip link set dev enP2p1s0v0 xdpdrv off
[ 341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
[ 341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
[ 341.521874] pstate:
60400005 (nZCv daif +PAN -UAO)
[ 341.526654] pc : __vunmap+0x98/0xe0
[ 341.530132] lr : __vunmap+0x98/0xe0
[ 341.533609] sp :
ffff00001c5db860
[ 341.536913] x29:
ffff00001c5db860 x28:
0000000000020000
[ 341.542214] x27:
ffff810feb5090b0 x26:
ffff000017e57000
[ 341.547515] x25:
0000000000000000 x24:
00000000fbd00000
[ 341.552816] x23:
0000000000000000 x22:
ffff810feb5090b0
[ 341.558117] x21:
0000000000000000 x20:
0000000000000000
[ 341.563418] x19:
ffff000017e57000 x18:
0000000000000000
[ 341.568719] x17:
0000000000000000 x16:
0000000000000000
[ 341.574020] x15:
0000000000000010 x14:
ffffffffffffffff
[ 341.579321] x13:
ffff00008985eb27 x12:
ffff00000985eb2f
[ 341.584622] x11:
ffff0000096b3000 x10:
ffff00001c5db510
[ 341.589923] x9 :
00000000ffffffd0 x8 :
ffff0000086868e8
[ 341.595224] x7 :
3430303030303030 x6 :
00000000000006ef
[ 341.600525] x5 :
00000000003fffff x4 :
0000000000000000
[ 341.605825] x3 :
0000000000000000 x2 :
ffffffffffffffff
[ 341.611126] x1 :
ffff0000096b3728 x0 :
0000000000000038
[ 341.616428] Call trace:
[ 341.618866] __vunmap+0x98/0xe0
[ 341.621997] vunmap+0x3c/0x50
[ 341.624961] arch_dma_free+0x68/0xa0
[ 341.628534] dma_direct_free+0x50/0x80
[ 341.632285] nicvf_free_resources+0x160/0x2d8 [nicvf]
[ 341.637327] nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
[ 341.642890] nicvf_stop+0x298/0x340 [nicvf]
[ 341.647066] __dev_close_many+0x9c/0x108
[ 341.650977] dev_close_many+0xa4/0x158
[ 341.654720] rollback_registered_many+0x140/0x530
[ 341.659414] rollback_registered+0x54/0x80
[ 341.663499] unregister_netdevice_queue+0x9c/0xe8
[ 341.668192] unregister_netdev+0x28/0x38
[ 341.672106] nicvf_remove+0xa4/0xa8 [nicvf]
[ 341.676280] nicvf_shutdown+0x20/0x30 [nicvf]
[ 341.680630] pci_device_shutdown+0x44/0x88
[ 341.684720] device_shutdown+0x144/0x250
[ 341.688640] kernel_restart_prepare+0x44/0x50
[ 341.692986] kernel_restart+0x20/0x68
[ 341.696638] __se_sys_reboot+0x210/0x238
[ 341.700550] __arm64_sys_reboot+0x24/0x30
[ 341.704555] el0_svc_handler+0x94/0x110
[ 341.708382] el0_svc+0x8/0xc
[ 341.711252] ---[ end trace
3f4019c8439959c9 ]---
[ 341.715874] page:
ffff7e0003ef4000 count:0 mapcount:0 mapping:
0000000000000000 index:0x4
[ 341.723872] flags: 0x1fffe000000000()
[ 341.727527] raw:
001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
[ 341.735263] raw:
0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[ 341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
where xdp_dummy.c is a simple bpf program that forwards the incoming
frames to the network stack (available here:
https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)
Fixes:
05c773f52b96 ("net: thunderx: Add basic XDP support")
Fixes:
4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andreas Fiedler [Fri, 23 Nov 2018 23:16:34 +0000 (00:16 +0100)]
net: gemini: Fix copy/paste error
[ Upstream commit
07093b76476903f820d83d56c3040e656fb4d9e3 ]
The TX stats should be started with the tx_stats_syncp,
there seems to be a copy/paste error in the driver.
Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paolo Abeni [Wed, 21 Nov 2018 17:21:35 +0000 (18:21 +0100)]
net: don't keep lonely packets forever in the gro hash
[ Upstream commit
605108acfe6233b72e2f803aa1cb59a2af3001ca ]
Eric noted that with UDP GRO and NAPI timeout, we could keep a single
UDP packet inside the GRO hash forever, if the related NAPI instance
calls napi_gro_complete() at an higher frequency than the NAPI timeout.
Willem noted that even TCP packets could be trapped there, till the
next retransmission.
This patch tries to address the issue, flushing the old packets -
those with a NAPI_GRO_CB age before the current jiffy - before scheduling
the NAPI timeout. The rationale is that such a timeout should be
well below a jiffy and we are not flushing packets eligible for sane GRO.
v1 -> v2:
- clarified the commit message and comment
RFC -> v1:
- added 'Fixes tags', cleaned-up the wording.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes:
3b47d30396ba ("net: gro: add a per device gro flush timer")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bryan Whitehead [Mon, 26 Nov 2018 17:04:57 +0000 (12:04 -0500)]
lan743x: fix return value for lan743x_tx_napi_poll
[ Upstream commit
cc5922054131f9abefdc0622ae64fc55e6b2671d ]
The lan743x driver, when under heavy traffic load, has been noticed
to sometimes hang, or cause a kernel panic.
Debugging reveals that the TX napi poll routine was returning
the wrong value, 'weight'. Most other drivers return 0.
And call napi_complete, instead of napi_complete_done.
Additionally when creating the tx napi poll routine.
Changed netif_napi_add, to netif_tx_napi_add.
Updates for v3:
changed 'fixes' tag to match defined format
Updates for v2:
use napi_complete, instead of napi_complete_done in
lan743x_tx_napi_poll
use netif_tx_napi_add, instead of netif_napi_add for
registration of tx napi poll routine
fixes:
23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bryan Whitehead [Mon, 26 Nov 2018 17:27:10 +0000 (12:27 -0500)]
lan743x: Enable driver to work with LAN7431
[ Upstream commit
4df5ce9bc03e47d05f400e64aa32a82ec4cef419 ]
This driver was designed to work with both LAN7430 and LAN7431.
The only difference between the two is the LAN7431 has support
for external phy.
This change adds LAN7431 to the list of recognized devices
supported by this driver.
Updates for v2:
changed 'fixes' tag to match defined format
fixes:
23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:47 +0000 (14:10 -0800)]
mm/khugepaged: collapse_shmem() do not crash on Compound
commit
06a5e1268a5fb9c2b346a3da6b97e85f2eba0f07 upstream.
collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before
it holds page lock of the first page, racing truncation then extension
might conceivably have inserted a hugepage there already. Fail with the
SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
otherwise mishandling the unexpected hugepage - though later we might
code up a more constructive way of handling it, with SCAN_SUCCESS.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
Fixes:
f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:43 +0000 (14:10 -0800)]
mm/khugepaged: collapse_shmem() without freezing new_page
commit
87c460a0bded56195b5eb497d44709777ef7b415 upstream.
khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0). Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.
Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.
One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.
The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong. So simply
eliminate the freezing of the new_page.
Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are
all unfrozen on success or failure: just as before. They could be
unfrozen sooner, but cause no problem once no longer visible to
find_get_entry(), filemap_map_pages() and other speculative lookups.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes:
f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:39 +0000 (14:10 -0800)]
mm/khugepaged: minor reorderings in collapse_shmem()
commit
042a30824871fa3149b0127009074b75cc25863c upstream.
Several cleanups in collapse_shmem(): most of which probably do not
really matter, beyond doing things in a more familiar and reassuring
order. Simplify the failure gotos in the main loop, and on success
update stats while interrupts still disabled from the last iteration.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261526400.2275@eggly.anvils
Fixes:
f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:35 +0000 (14:10 -0800)]
mm/khugepaged: collapse_shmem() remember to clear holes
commit
2af8ff291848cc4b1cce24b6c943394eb2c761e8 upstream.
Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp
flags khugepaged uses to allocate a huge page - in all common cases it
would just be a waste of effort - so collapse_shmem() must remember to
clear out any holes that it instantiates.
The obvious place to do so, where they are put into the page cache tree,
is not a good choice: because interrupts are disabled there. Leave it
until further down, once success is assured, where the other pages are
copied (before setting PageUptodate).
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
Fixes:
f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:29 +0000 (14:10 -0800)]
mm/khugepaged: fix crashes due to misaccounted holes
commit
aaa52e340073b7f4593b3c4ddafcafa70cf838b5 upstream.
Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent
hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by
clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later
closed and unlinked.
khugepaged's collapse_shmem() was forgetting to update mapping->nrpages
on the rollback path, after it had added but then needs to undo some
holes.
There is indeed an irritating asymmetry between shmem_charge(), whose
callers want it to increment nrpages after successfully accounting
blocks, and shmem_uncharge(), when __delete_from_page_cache() already
decremented nrpages itself: oh well, just add a comment on that to them
both.
And shmem_recalc_inode() is supposed to be called when the accounting is
expected to be in balance (so it can deduce from imbalance that reclaim
discarded some pages): so change shmem_charge() to update nrpages
earlier (though it's rare for the difference to matter at all).
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
Fixes:
800d8c63b2e98 ("shmem: add huge pages support")
Fixes:
f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:25 +0000 (14:10 -0800)]
mm/khugepaged: collapse_shmem() stop if punched or truncated
commit
701270fa193aadf00bdcf607738f64997275d4c7 upstream.
Huge tmpfs testing showed that although collapse_shmem() recognizes a
concurrently truncated or hole-punched page correctly, its handling of
holes was liable to refill an emptied extent. Add check to stop that.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
Fixes:
f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:21 +0000 (14:10 -0800)]
mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
commit
006d3ff27e884f80bd7d306b041afc415f63598f upstream.
Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that
__split_huge_page() was using i_size_read() while holding the irq-safe
lru_lock and page tree lock, but the 32-bit i_size_read() uses an
irq-unsafe seqlock which should not be nested inside them.
Instead, read the i_size earlier in split_huge_page_to_list(), and pass
the end offset down to __split_huge_page(): all while holding head page
lock, which is enough to prevent truncation of that extent before the
page tree lock has been taken.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261520070.2275@eggly.anvils
Fixes:
baa355fd33142 ("thp: file pages support for split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:16 +0000 (14:10 -0800)]
mm/huge_memory: splitting set mapping+index before unfreeze
commit
173d9d9fd3ddae84c110fea8aedf1f26af6be9ec upstream.
Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s
VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).
Move the setting of mapping and index up before the page_ref_unfreeze()
in __split_huge_page_tail() to fix this: so that a page cache lookup
cannot get a reference while the tail's mapping and index are unstable.
In fact, might as well move them up before the smp_wmb(): I don't see an
actual need for that, but if I'm missing something, this way round is
safer than the other, and no less efficient.
You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is
misplaced, and should be left until after the trylock_page(); but left as
is has not crashed since, and gives more stringent assurance.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261516380.2275@eggly.anvils
Fixes:
e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Requires:
605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hugh Dickins [Fri, 30 Nov 2018 22:10:13 +0000 (14:10 -0800)]
mm/huge_memory: rename freeze_page() to unmap_page()
commit
906f9cdfc2a0800f13683f9e4ebdfd08c12ee81b upstream.
The term "freeze" is used in several ways in the kernel, and in mm it
has the particular meaning of forcing page refcount temporarily to 0.
freeze_page() is just too confusing a name for a function that unmaps a
page: rename it unmap_page(), and rename unfreeze_page() remap_page().
Went to change the mention of freeze_page() added later in mm/rmap.c,
but found it to be incorrect: ordinary page reclaim reaches there too;
but the substance of the comment still seems correct, so edit it down.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
Fixes:
e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Greg Kroah-Hartman [Sat, 1 Dec 2018 08:37:35 +0000 (09:37 +0100)]
Linux 4.19.6
Hugues Fruchet [Tue, 11 Sep 2018 13:48:20 +0000 (09:48 -0400)]
media: ov5640: fix auto controls values when switching to manual mode
commit
a8f438c684eaa4cbe6c98828eb996d5ec53e24fb upstream.
When switching from auto to manual mode, V4L2 core is calling
g_volatile_ctrl() in manual mode in order to get the manual initial value.
Remove the manual mode check/return to not break this behaviour.
Signed-off-by: Hugues Fruchet <hugues.fruchet@st.com>
Tested-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hugues Fruchet [Tue, 11 Sep 2018 13:48:19 +0000 (09:48 -0400)]
media: ov5640: fix wrong binning value in exposure calculation
commit
c2c3f42df4dd9bb231d756bacb0c897f662c6d3c upstream.
ov5640_set_mode_exposure_calc() is checking binning value but
binning value read is buggy, fix this.
Rename ov5640_binning_on() to ov5640_get_binning() as per other
similar functions.
Signed-off-by: Hugues Fruchet <hugues.fruchet@st.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hugues Fruchet [Tue, 11 Sep 2018 13:48:18 +0000 (09:48 -0400)]
media: ov5640: fix auto gain & exposure when changing mode
commit
3cca8ef5f774cbd61c8db05d9aa401de9bb59c66 upstream.
Ensure that auto gain and auto exposure are well restored
when changing mode.
Signed-off-by: Hugues Fruchet <hugues.fruchet@st.com>
Reviewed-by: Jacopo Mondi <jacopo@jmondi.org>
Tested-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hugues Fruchet [Tue, 11 Sep 2018 13:48:17 +0000 (09:48 -0400)]
media: ov5640: fix exposure regression
commit
dc29a1c187eedc1d498cb567c44bbbc832b009cb upstream.
Symptom was black image when capturing HD or 5Mp picture
due to manual exposure set to 1 while it was intended to
set autoexposure to "manual", fix this.
Fixes:
bf4a4b518c20 ("media: ov5640: Don't force the auto exposure state at start time").
Signed-off-by: Hugues Fruchet <hugues.fruchet@st.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jacopo Mondi [Wed, 18 Jul 2018 10:06:23 +0000 (06:06 -0400)]
media: ov5640: Fix timings setup code
commit
bad1774ed41e98a43074e50e7d5ac9e1e848d99a upstream.
As of: commit
476dec012f4c ("media: ov5640: Add horizontal and vertical
totals") the timings parameters gets programmed separately from the
static register values array.
When changing capture mode, the vertical and horizontal totals gets
inspected by the set_mode_exposure_calc() functions, and only later
programmed with the new values. This means exposure, light banding
filter and shutter gain are calculated using the previous timings, and
are thus not correct.
Fix this by programming timings right after the static register value
table has been sent to the sensor in the ov5640_load_regs() function.
Fixes:
476dec012f4c ("media: ov5640: Add horizontal and vertical totals")
Tested-by: Steve Longerbeam <slongerbeam@gmail.com> # i.MX6q SabreSD, CSI-2
Tested-by: Loic Poulain <loic.poulain@linaro.org> # Dragonboard-410c, CSI-2
Signed-off-by: Samuel Bobrowicz <sam@elite-embedded.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jacopo Mondi [Fri, 6 Jul 2018 09:51:52 +0000 (05:51 -0400)]
media: ov5640: Re-work MIPI startup sequence
commit
aa4bb8b8838ffcc776a79f49a4d7476b82405349 upstream.
Rework the MIPI interface startup sequence with the following changes:
- Remove MIPI bus initialization from the initial settings blob
- At set_power(1) time power up MIPI Tx/Rx and set data and clock lanes in
LP11 during 'sleep' and 'idle' with MIPI clock in non-continuous mode.
- At s_stream time enable/disable the MIPI interface output.
- Restore default settings at set_power(0) time.
Before this commit the sensor MIPI interface was initialized with settings
that require a start/stop sequence at power-up time in order to force lanes
into LP11 state, as they were initialized in LP00 when in 'sleep mode',
which is assumed to be the sensor manual definition for the D-PHY defined
stop mode.
The stream start/stop was performed by enabling disabling clock gating,
and had the side effect to change the lanes sleep mode configuration when
stream was stopped.
Clock gating/ungating:
- ret = ov5640_mod_reg(sensor, OV5640_REG_MIPI_CTRL00, BIT(5),
- on ? 0 : BIT(5));
- if (ret)
Set lanes in LP11 when in 'sleep mode':
- ret = ov5640_write_reg(sensor, OV5640_REG_PAD_OUTPUT00,
- on ? 0x00 : 0x70);
This commit fixes an issue reported by Jagan Teki on i.MX6 platforms that
prevents the host interface from powering up correctly:
https://lkml.org/lkml/2018/6/1/38
It also improves MIPI capture operations stability on my testing platform
where MIPI capture often failed and returned all-purple frames.
Fixes:
f22996db44e2 ("media: ov5640: add support of DVP parallel interface")
Tested-by: Steve Longerbeam <slongerbeam@gmail.com> (i.MX6q SabreSD, CSI-2)
Tested-by: Loic Poulain <loic.poulain@linaro.org> (Dragonboard-410c, CSI-2)
Reported-by: Jagan Teki <jagan@amarulasolutions.com>
Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul E. McKenney [Mon, 9 Jul 2018 20:47:30 +0000 (13:47 -0700)]
rcu: Make need_resched() respond to urgent RCU-QS needs
commit
92aa39e9dc77481b90cbef25e547d66cab901496 upstream.
The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
need for an RCU quiescent state from the force-quiescent-state processing
within the grace-period kthread to context switches and to cond_resched().
Unfortunately, such urgent needs are not communicated to need_resched(),
which is sometimes used to decide when to invoke cond_resched(), for
but one example, within the KVM vcpu_run() function. As of v4.15, this
can result in synchronize_sched() being delayed by up to ten seconds,
which can be problematic, to say nothing of annoying.
This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
rcu_check_callbacks(), which is invoked from the scheduling-clock
interrupt handler. If the current task is not an idle task and is
not executing in usermode, a context switch is forced, and either way,
the rcu_dynticks.rcu_urgent_qs variable is set to false. If the current
task is an idle task, then RCU's dyntick-idle code will detect the
quiescent state, so no further action is required. Similarly, if the
task is executing in usermode, other code in rcu_check_callbacks() and
its called functions will report the corresponding quiescent state.
Reported-by: Marius Hillenbrand <mhillenb@amazon.de>
Reported-by: David Woodhouse <dwmw2@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Backported to make patch apply cleanly on older versions. ]
Tested-by: Marius Hillenbrand <mhillenb@amazon.de>
Cc: <stable@vger.kernel.org> # 4.12.x - 4.19.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andreas Gruenbacher [Sun, 11 Nov 2018 11:15:21 +0000 (11:15 +0000)]
gfs2: Fix iomap buffer head reference counting bug
commit
c26b5aa8ef0d46035060fded475e6ab957b9f69f upstream.
GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to
gfs2_iomap_end in iomap->private. It sets that private pointer in
gfs2_iomap_get. Users of gfs2_iomap_get other than gfs2_iomap_begin
would have to release iomap->private, but this isn't done correctly,
leading to a leak of buffer head references.
To fix this, move the code for setting iomap->private from
gfs2_iomap_get to gfs2_iomap_begin.
Fixes:
64bc06bb32 ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Thu, 4 Oct 2018 18:06:14 +0000 (11:06 -0700)]
tty: wipe buffer if not echoing data
commit
b97b3d9fb57860a60592859e332de7759fd54c2e upstream.
If we are not echoing the data to userspace or the console is in icanon
mode, then perhaps it is a "secret" so we should wipe it once we are
done with it.
This mirrors the logic that the audit code has.
Reported-by: aszlig <aszlig@nix.build>
Tested-by: Milan Broz <gmazyland@gmail.com>
Tested-by: Daniel Zatovic <daniel.zatovic@gmail.com>
Tested-by: aszlig <aszlig@nix.build>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Thu, 4 Oct 2018 18:06:13 +0000 (11:06 -0700)]
tty: wipe buffer.
commit
c9a8e5fce009e3c601a43c49ea9dbcb25d1ffac5 upstream.
After we are done with the tty buffer, zero it out.
Reported-by: aszlig <aszlig@nix.build>
Tested-by: Milan Broz <gmazyland@gmail.com>
Tested-by: Daniel Zatovic <daniel.zatovic@gmail.com>
Tested-by: aszlig <aszlig@nix.build>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sebastien Boisvert [Fri, 26 Oct 2018 22:02:23 +0000 (15:02 -0700)]
include/linux/pfn_t.h: force '~' to be parsed as an unary operator
commit
4d54954a197175c0dcb3c82af0c0740d0c5f827a upstream.
Tracing the event "fs_dax:dax_pmd_insert_mapping" with perf produces this
warning:
[fs_dax:dax_pmd_insert_mapping] unknown op '~'
It is printed in process_op (tools/lib/traceevent/event-parse.c) because
'~' is parsed as a binary operator.
perf reads the format of fs_dax:dax_pmd_insert_mapping ("print fmt") from
/sys/kernel/debug/tracing/events/fs_dax/dax_pmd_insert_mapping/format .
The format contains:
~(((u64) ~(~(((1UL) << 12)-1)))
^
\ interpreted as a binary operator by process_op().
This part is generated in the declaration of the event class
dax_pmd_insert_mapping_class in include/trace/events/fs_dax.h :
__print_flags_u64(__entry->pfn_val & PFN_FLAGS_MASK, "|",
PFN_FLAGS_TRACE),
This patch adds a pair of parentheses in the declaration of PFN_FLAGS_MASK
to make sure that '~' is parsed as a unary operator by perf.
The part of the format that was problematic is now:
~(((u64) (~(~(((1UL) << 12)-1))))
Now, all the '~' are parsed as unary operators.
Link: http://lkml.kernel.org/r/20181021145939.8760-1-sebhtml@videotron.qc.ca
Signed-off-by: Sebastien Boisvert <sebhtml@videotron.qc.ca>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Elenie Godzaridis <arangradient@gmail.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matthew Wilcox [Fri, 16 Nov 2018 20:50:02 +0000 (15:50 -0500)]
dax: Avoid losing wakeup in dax_lock_mapping_entry
commit
25bbe21bf427a81b8e3ccd480ea0e1d940256156 upstream.
After calling get_unlocked_entry(), you have to call
put_unlocked_entry() to avoid subsequent waiters losing wakeups.
Fixes:
c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Hocko [Fri, 16 Nov 2018 23:08:53 +0000 (15:08 -0800)]
mm, page_alloc: check for max order in hot path
[ Upstream commit
c63ae43ba53bc432b414fd73dd5f4b01fcb1ab43 ]
Konstantin has noticed that kvmalloc might trigger the following
warning:
WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
[...]
Call Trace:
fragmentation_index+0x76/0x90
compaction_suitable+0x4f/0xf0
shrink_node+0x295/0x310
node_reclaim+0x205/0x250
get_page_from_freelist+0x649/0xad0
__alloc_pages_nodemask+0x12a/0x2a0
kmalloc_large_node+0x47/0x90
__kmalloc_node+0x22b/0x2e0
kvmalloc_node+0x3e/0x70
xt_alloc_table_info+0x3a/0x80 [x_tables]
do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
nf_setsockopt+0x44/0x60
SyS_setsockopt+0x6f/0xc0
do_syscall_64+0x67/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
the problem is that we only check for an out of bound order in the slow
path and the node reclaim might happen from the fast path already. This
is fixable by making sure that kvmalloc doesn't ever use kmalloc for
requests that are larger than KMALLOC_MAX_SIZE but this also shows that
the code is rather fragile. A recent UBSAN report just underlines that
by the following report
UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
shift exponent 51 is too large for 32-bit type 'int'
CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xd2/0x148 lib/dump_stack.c:113
ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
__ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
__zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
zone_watermark_fast mm/page_alloc.c:3216 [inline]
get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
__alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
alloc_pages include/linux/gfp.h:509 [inline]
__get_free_pages+0x12/0x60 mm/page_alloc.c:4414
dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
__blkdev_driver_ioctl block/ioctl.c:303 [inline]
blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
block_ioctl+0x105/0x150 fs/block_dev.c:1883
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
__do_sys_ioctl fs/ioctl.c:709 [inline]
__se_sys_ioctl fs/ioctl.c:707 [inline]
__x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Note that this is not a kvmalloc path. It is just that the fast path
really depends on having sanitzed order as well. Therefore move the
order check to the fast path.
Link: http://lkml.kernel.org/r/20181113094305.GM15120@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Byoungyoung Lee <lifeasageek@gmail.com>
Cc: "Dae R. Jeong" <threeearcat@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Yufen Yu [Fri, 16 Nov 2018 23:08:39 +0000 (15:08 -0800)]
tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
[ Upstream commit
1a413646931cb14442065cfc17561e50f5b5bb44 ]
Other filesystems such as ext4, f2fs and ubifs all return ENXIO when
lseek (SEEK_DATA or SEEK_HOLE) requests a negative offset.
man 2 lseek says
: EINVAL whence is not valid. Or: the resulting file offset would be
: negative, or beyond the end of a seekable device.
:
: ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is beyond
: the end of the file.
Make tmpfs return ENXIO under these circumstances as well. After this,
tmpfs also passes xfstests's generic/448.
[akpm@linux-foundation.org: rewrite changelog]
Link: http://lkml.kernel.org/r/1540434176-14349-1-git-send-email-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Michal Hocko [Fri, 16 Nov 2018 23:08:15 +0000 (15:08 -0800)]
mm, memory_hotplug: check zone_movable in has_unmovable_pages
[ Upstream commit
9d7899999c62c1a81129b76d2a6ecbc4655e1597 ]
Page state checks are racy. Under a heavy memory workload (e.g. stress
-m 200 -t 2h) it is quite easy to hit a race window when the page is
allocated but its state is not fully populated yet. A debugging patch to
dump the struct page state shows
has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0
page:
ffffea0437fb0000 count:1 mapcount:1 mapping:
ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1
flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
Note that the state has been checked for both PageLRU and PageSwapBacked
already. Closing this race completely would require some sort of retry
logic. This can be tricky and error prone (think of potential endless
or long taking loops).
Workaround this problem for movable zones at least. Such a zone should
only contain movable pages. Commit
15c30bc09085 ("mm, memory_hotplug:
make has_unmovable_pages more robust") has told us that this is not
strictly true though. Bootmem pages should be marked reserved though so
we can move the original check after the PageReserved check. Pages from
other zones are still prone to races but we even do not pretend that
memory hotremove works for those so pre-mature failure doesn't hurt that
much.
Link: http://lkml.kernel.org/r/20181106095524.14629-1-mhocko@kernel.org
Fixes:
15c30bc09085 ("mm, memory_hotplug: make has_unmovable_pages more robust")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Baoquan He <bhe@redhat.com>
Tested-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Vitaly Wool [Fri, 16 Nov 2018 23:07:56 +0000 (15:07 -0800)]
z3fold: fix possible reclaim races
[ Upstream commit
ca0246bb97c23da9d267c2107c07fb77e38205c9 ]
Reclaim and free can race on an object which is basically fine but in
order for reclaim to be able to map "freed" object we need to encode
object length in the handle. handle_to_chunks() is then introduced to
extract object length from a handle and use it during mapping.
Moreover, to avoid racing on a z3fold "headless" page release, we should
not try to free that page in z3fold_free() if the reclaim bit is set.
Also, in the unlikely case of trying to reclaim a page being freed, we
should not proceed with that page.
While at it, fix the page accounting in reclaim function.
This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".
Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
Reported-by-by: Jongseok Kim <ks77sj@gmail.com>
Reviewed-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ard Biesheuvel [Wed, 14 Nov 2018 17:55:41 +0000 (09:55 -0800)]
efi/arm: Revert deferred unmap of early memmap mapping
[ Upstream commit
33412b8673135b18ea42beb7f5117ed0091798b6 ]
Commit:
3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory map longer for BGRT")
deferred the unmap of the early mapping of the UEFI memory map to
accommodate the ACPI BGRT code, which looks up the memory type that
backs the BGRT table to validate it against the requirements of the UEFI spec.
Unfortunately, this causes problems on ARM, which does not permit
early mappings to persist after paging_init() is called, resulting
in a WARN() splat. Since we don't support the BGRT table on ARM anway,
let's revert ARM to the old behaviour, which is to take down the
early mapping at the end of efi_init().
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes:
3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory ...")
Link: http://lkml.kernel.org/r/20181114175544.12860-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Satheesh Rajendran [Thu, 8 Nov 2018 05:17:56 +0000 (10:47 +0530)]
powerpc/numa: Suppress "VPHN is not supported" messages
[ Upstream commit
437ccdc8ce629470babdda1a7086e2f477048cbd ]
When VPHN function is not supported and during cpu hotplug event,
kernel prints message 'VPHN function not supported. Disabling
polling...'. Currently it prints on every hotplug event, it floods
dmesg when a KVM guest tries to hotplug huge number of vcpus, let's
just print once and suppress further kernel prints.
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Trond Myklebust [Tue, 13 Nov 2018 21:37:54 +0000 (16:37 -0500)]
NFSv4: Fix an Oops during delegation callbacks
[ Upstream commit
e39d8a186ed002854196668cb7562ffdfbc6d379 ]
If the server sends a CB_GETATTR or a CB_RECALL while the filesystem is
being unmounted, then we can Oops when releasing the inode in
nfs4_callback_getattr() and nfs4_callback_recall().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Prarit Bhargava [Thu, 20 Sep 2018 12:59:14 +0000 (08:59 -0400)]
kdb: Use strscpy with destination buffer size
[ Upstream commit
c2b94c72d93d0929f48157eef128c4f9d2e603ce ]
gcc 8.1.0 warns with:
kernel/debug/kdb/kdb_support.c: In function ‘kallsyms_symbol_next’:
kernel/debug/kdb/kdb_support.c:239:4: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
strncpy(prefix_name, name, strlen(name)+1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/debug/kdb/kdb_support.c:239:31: note: length computed here
Use strscpy() with the destination buffer size, and use ellipses when
displaying truncated symbols.
v2: Use strscpy()
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Toppins <jtoppins@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: kgdb-bugreport@lists.sourceforge.net
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Philip Yang [Mon, 12 Nov 2018 19:00:45 +0000 (14:00 -0500)]
drm/amdgpu: fix bug with IH ring setup
[ Upstream commit
c837243ff4017f493c7d6f4ab57278d812a86859 ]
The bug limits the IH ring wptr address to 40bit. When the system memory
is bigger than 1TB, the bus address is more than 40bit, this causes the
interrupt cannot be handled and cleared correctly.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Olof Johansson [Wed, 31 Oct 2018 06:47:09 +0000 (23:47 -0700)]
RISC-V: Silence some module warnings on 32-bit
[ Upstream commit
ef3a61406618291c46da168ff91acaa28d85944c ]
Fixes:
arch/riscv/kernel/module.c: In function 'apply_r_riscv_32_rela':
./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'Elf32_Addr' {aka 'unsigned int'} [-Wformat=]
arch/riscv/kernel/module.c:23:27: note: format string is defined here
arch/riscv/kernel/module.c: In function 'apply_r_riscv_pcrel_hi20_rela':
./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'Elf32_Addr' {aka 'unsigned int'} [-Wformat=]
arch/riscv/kernel/module.c:104:23: note: format string is defined here
arch/riscv/kernel/module.c: In function 'apply_r_riscv_hi20_rela':
./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'Elf32_Addr' {aka 'unsigned int'} [-Wformat=]
arch/riscv/kernel/module.c:146:23: note: format string is defined here
arch/riscv/kernel/module.c: In function 'apply_r_riscv_got_hi20_rela':
./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'Elf32_Addr' {aka 'unsigned int'} [-Wformat=]
arch/riscv/kernel/module.c:190:60: note: format string is defined here
arch/riscv/kernel/module.c: In function 'apply_r_riscv_call_plt_rela':
./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'Elf32_Addr' {aka 'unsigned int'} [-Wformat=]
arch/riscv/kernel/module.c:214:24: note: format string is defined here
arch/riscv/kernel/module.c: In function 'apply_r_riscv_call_rela':
./include/linux/kern_levels.h:5:18: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'Elf32_Addr' {aka 'unsigned int'} [-Wformat=]
arch/riscv/kernel/module.c:236:23: note: format string is defined here
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
David Abdurachmanov [Mon, 5 Nov 2018 14:35:37 +0000 (15:35 +0100)]
riscv: add missing vdso_install target
[ Upstream commit
f157d411a9eb170d2ee6b766da7a381962017cc9 ]
Building kernel 4.20 for Fedora as RPM fails, because riscv is missing
vdso_install target in arch/riscv/Makefile.
Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Trond Myklebust [Mon, 12 Nov 2018 21:06:51 +0000 (16:06 -0500)]
SUNRPC: Fix a bogus get/put in generic_key_to_expire()
[ Upstream commit
e3d5e573a54dabdc0f9f3cb039d799323372b251 ]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hannes Reinecke [Mon, 12 Nov 2018 17:35:25 +0000 (10:35 -0700)]
block: copy ioprio in __bio_clone_fast() and bounce
[ Upstream commit
ca474b73896bf6e0c1eb8787eb217b0f80221610 ]
We need to copy the io priority, too; otherwise the clone will run
with a different priority than the original one.
Fixes:
43b62ce3ff0a ("block: move bio io prio to a new field")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixed up subject, and ordered stores.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kan Liang [Fri, 19 Oct 2018 17:04:18 +0000 (10:04 -0700)]
perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
[ Upstream commit
c10a8de0d32e95b0b8c7c17b6dc09baea5a5a899 ]
KabyLake and CoffeeLake CPUs have the same client uncore events as SkyLake.
Add the PCI IDs for the KabyLake Y, U, S processor lines and CoffeeLake U,
H, S processor lines.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20181019170419.378-1-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Patrick Bellasi [Mon, 5 Nov 2018 14:53:58 +0000 (14:53 +0000)]
sched/fair: Fix cpu_util_wake() for 'execl' type workloads
[ Upstream commit
c469933e772132aad040bd6a2adc8edf9ad6f825 ]
A ~10% regression has been reported for UnixBench's execl throughput
test by Aaron Lu and Ye Xiaolong:
https://lkml.org/lkml/2018/10/30/765
That test is pretty simple, it does a "recursive" execve() syscall on the
same binary. Starting from the syscall, this sequence is possible:
do_execve()
do_execveat_common()
__do_execve_file()
sched_exec()
select_task_rq_fair() <==| Task already enqueued
find_idlest_cpu()
find_idlest_group()
capacity_spare_wake() <==| Functions not called from
cpu_util_wake() | the wakeup path
which means we can end up calling cpu_util_wake() not only from the
"wakeup path", as its name would suggest. Indeed, the task doing an
execve() syscall is already enqueued on the CPU we want to get the
cpu_util_wake() for.
The estimated utilization for a CPU computed in cpu_util_wake() was
written under the assumption that function can be called only from the
wakeup path. If instead the task is already enqueued, we end up with a
utilization which does not remove the current task's contribution from
the estimated utilization of the CPU.
This will wrongly assume a reduced spare capacity on the current CPU and
increase the chances to migrate the task on execve.
The regression is tracked down to:
commit
d519329f72a6 ("sched/fair: Update util_est only on util_avg updates")
because in that patch we turn on by default the UTIL_EST sched feature.
However, the real issue is introduced by:
commit
f9be3e5961c5 ("sched/fair: Use util_est in LB and WU paths")
Let's fix this by ensuring to always discount the task estimated
utilization from the CPU's estimated utilization when the task is also
the current one. The same benchmark of the bug report, executed on a
dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine,
reports these "Execl Throughput" figures (higher the better):
mainline : 48136.5 lps
mainline+fix : 55376.5 lps
which correspond to a 15% speedup.
Moreover, since {cpu_util,capacity_spare}_wake() are not really only
used from the wakeup path, let's remove this ambiguity by using a better
matching name: {cpu_util,capacity_spare}_without().
Since we are at that, let's also improve the existing documentation.
Reported-by: Aaron Lu <aaron.lu@intel.com>
Reported-by: Ye Xiaolong <xiaolong.ye@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes:
f9be3e5961c5 (sched/fair: Use util_est in LB and WU paths)
Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Michael Ellerman [Tue, 6 Nov 2018 12:37:58 +0000 (23:37 +1100)]
powerpc/io: Fix the IO workarounds code to work with Radix
[ Upstream commit
43c6494fa1499912c8177e71450c0279041152a6 ]
Back in 2006 Ben added some workarounds for a misbehaviour in the
Spider IO bridge used on early Cell machines, see commit
014da7ff47b5 ("[POWERPC] Cell "Spider" MMIO workarounds"). Later these
were made to be generic, ie. not tied specifically to Spider.
The code stashes a token in the high bits (59-48) of virtual addresses
used for IO (eg. returned from ioremap()). This works fine when using
the Hash MMU, but when we're using the Radix MMU the bits used for the
token overlap with some of the bits of the virtual address.
This is because the maximum virtual address is larger with Radix, up
to
c00fffffffffffff, and in fact we use that high part of the address
range for ioremap(), see RADIX_KERN_IO_START.
As it happens the bits that are used overlap with the bits that
differentiate an IO address vs a linear map address. If the resulting
address lies outside the linear mapping we will crash (see below), if
not we just corrupt memory.
virtio-pci 0000:00:00.0: Using 64-bit direct DMA at offset
800000000000000
Unable to handle kernel paging request for data at address 0xc000000080000014
...
CFAR:
c000000000626b98 DAR:
c000000080000014 DSISR:
42000000 IRQMASK: 0
GPR00:
c0000000006c54fc c00000003e523378 c0000000016de600 0000000000000000
GPR04:
c00c000080000014 0000000000000007 0fffffff000affff 0000000000000030
^^^^
...
NIP [
c000000000626c5c] .iowrite8+0xec/0x100
LR [
c0000000006c992c] .vp_reset+0x2c/0x90
Call Trace:
.pci_bus_read_config_dword+0xc4/0x120 (unreliable)
.register_virtio_device+0x13c/0x1c0
.virtio_pci_probe+0x148/0x1f0
.local_pci_probe+0x68/0x140
.pci_device_probe+0x164/0x220
.really_probe+0x274/0x3b0
.driver_probe_device+0x80/0x170
.__driver_attach+0x14c/0x150
.bus_for_each_dev+0xb8/0x130
.driver_attach+0x34/0x50
.bus_add_driver+0x178/0x2f0
.driver_register+0x90/0x1a0
.__pci_register_driver+0x6c/0x90
.virtio_pci_driver_init+0x2c/0x40
.do_one_initcall+0x64/0x280
.kernel_init_freeable+0x36c/0x474
.kernel_init+0x24/0x160
.ret_from_kernel_thread+0x58/0x7c
This hasn't been a problem because CONFIG_PPC_IO_WORKAROUNDS which
enables this code is usually not enabled. It is only enabled when it's
selected by PPC_CELL_NATIVE which is only selected by
PPC_IBM_CELL_BLADE and that in turn depends on BIG_ENDIAN. So in order
to hit the bug you need to build a big endian kernel, with IBM Cell
Blade support enabled, as well as Radix MMU support, and then boot
that on Power9 using Radix MMU.
Still we can fix the bug, so let's do that. We simply use fewer bits
for the token, taking the union of the restrictions on the address
from both Hash and Radix, we end up with 8 bits we can use for the
token. The only user of the token is iowa_mem_find_bus() which only
supports 8 token values, so 8 bits is plenty for that.
Fixes:
566ca99af026 ("powerpc/mm/radix: Add dummy radix_enabled()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jens Axboe [Fri, 9 Nov 2018 22:58:40 +0000 (15:58 -0700)]
floppy: fix race condition in __floppy_read_block_0()
[ Upstream commit
de7b75d82f70c5469675b99ad632983c50b6f7e7 ]
LKP recently reported a hang at bootup in the floppy code:
[ 245.678853] INFO: task mount:580 blocked for more than 120 seconds.
[ 245.679906] Tainted: G T 4.19.0-rc6-00172-ga9f38e1 #1
[ 245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.682181] mount D 6372 580 1 0x00000004
[ 245.683023] Call Trace:
[ 245.683425] __schedule+0x2df/0x570
[ 245.683975] schedule+0x2d/0x80
[ 245.684476] schedule_timeout+0x19d/0x330
[ 245.685090] ? wait_for_common+0xa5/0x170
[ 245.685735] wait_for_common+0xac/0x170
[ 245.686339] ? do_sched_yield+0x90/0x90
[ 245.686935] wait_for_completion+0x12/0x20
[ 245.687571] __floppy_read_block_0+0xfb/0x150
[ 245.688244] ? floppy_resume+0x40/0x40
[ 245.688844] floppy_revalidate+0x20f/0x240
[ 245.689486] check_disk_change+0x43/0x60
[ 245.690087] floppy_open+0x1ea/0x360
[ 245.690653] __blkdev_get+0xb4/0x4d0
[ 245.691212] ? blkdev_get+0x1db/0x370
[ 245.691777] blkdev_get+0x1f3/0x370
[ 245.692351] ? path_put+0x15/0x20
[ 245.692871] ? lookup_bdev+0x4b/0x90
[ 245.693539] blkdev_get_by_path+0x3d/0x80
[ 245.694165] mount_bdev+0x2a/0x190
[ 245.694695] squashfs_mount+0x10/0x20
[ 245.695271] ? squashfs_alloc_inode+0x30/0x30
[ 245.695960] mount_fs+0xf/0x90
[ 245.696451] vfs_kern_mount+0x43/0x130
[ 245.697036] do_mount+0x187/0xc40
[ 245.697563] ? memdup_user+0x28/0x50
[ 245.698124] ksys_mount+0x60/0xc0
[ 245.698639] sys_mount+0x19/0x20
[ 245.699167] do_int80_syscall_32+0x61/0x130
[ 245.699813] entry_INT80_32+0xc7/0xc7
showing that we never complete that read request. The reason is that
the completion setup is racy - it initializes the completion event
AFTER submitting the IO, which means that the IO could complete
before/during the init. If it does, we are passing garbage to
complete() and we may sleep forever waiting for the event to
occur.
Fixes:
7b7b68bba5ef ("floppy: bail out in open() if drive is not responding to block0 read")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ard Biesheuvel [Thu, 8 Nov 2018 22:55:16 +0000 (23:55 +0100)]
crypto: simd - correctly take reqsize of wrapped skcipher into account
[ Upstream commit
508a1c4df085a547187eed346f1bfe5e381797f1 ]
The simd wrapper's skcipher request context structure consists
of a single subrequest whose size is taken from the subordinate
skcipher. However, in simd_skcipher_init(), the reqsize that is
retrieved is not from the subordinate skcipher but from the
cryptd request structure, whose size is completely unrelated to
the actual wrapped skcipher.
Reported-by: Qian Cai <cai@gmx.us>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Qian Cai <cai@gmx.us>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Xulin Sun [Tue, 6 Nov 2018 08:42:19 +0000 (16:42 +0800)]
rtc: pcf2127: fix a kmemleak caused in pcf2127_i2c_gather_write
[ Upstream commit
9bde0afb7a906f1dabdba37162551565740b862d ]
pcf2127_i2c_gather_write() allocates memory as local variable
for i2c_master_send(), after finishing the master transfer,
the allocated memory should be freed. The kmemleak is reported:
unreferenced object 0xffff80231e7dba80 (size 64):
comm "hwclock", pid 27762, jiffies
4296880075 (age 356.944s)
hex dump (first 32 bytes):
03 00 12 03 19 02 11 13 00 80 98 18 00 00 ff ff ................
00 50 00 00 00 00 00 00 02 00 00 00 00 00 00 00 .P..............
backtrace:
[<
ffff000008221398>] create_object+0xf8/0x278
[<
ffff000008a96264>] kmemleak_alloc+0x74/0xa0
[<
ffff00000821070c>] __kmalloc+0x1ac/0x348
[<
ffff0000087ed1dc>] pcf2127_i2c_gather_write+0x54/0xf8
[<
ffff0000085fd9d4>] _regmap_raw_write+0x464/0x850
[<
ffff0000085fe3f4>] regmap_bulk_write+0x1a4/0x348
[<
ffff0000087ed32c>] pcf2127_rtc_set_time+0xac/0xe8
[<
ffff0000087eaad8>] rtc_set_time+0x80/0x138
[<
ffff0000087ebfb0>] rtc_dev_ioctl+0x398/0x610
[<
ffff00000823f2c0>] do_vfs_ioctl+0xb0/0x848
[<
ffff00000823fae4>] SyS_ioctl+0x8c/0xa8
[<
ffff000008083ac0>] el0_svc_naked+0x34/0x38
[<
ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Xulin Sun <xulin.sun@windriver.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Hans de Goede [Tue, 4 Sep 2018 14:51:29 +0000 (16:51 +0200)]
rtc: cmos: Do not export alarm rtc_ops when we do not support alarms
[ Upstream commit
fbb974ba693bbfb4e24a62181ef16d4e45febc37 ]
When there is no IRQ configured for the RTC, the rtc-cmos code does not
support alarms, all alarm rtc_ops fail with -EIO / -EINVAL.
The rtc-core expects a rtc driver which does not support rtc alarms to
not have alarm ops at all. Otherwise the wakealarm sysfs attr will read
as empty rather then returning an error, making it impossible for
userspace to find out beforehand if alarms are supported.
A system without an IRQ for the RTC before this patch:
[root@localhost ~]# cat /sys/class/rtc/rtc0/wakealarm
[root@localhost ~]#
After this patch:
[root@localhost ~]# cat /sys/class/rtc/rtc0/wakealarm
cat: /sys/class/rtc/rtc0/wakealarm: No such file or directory
[root@localhost ~]#
This fixes gnome-session + systemd trying to use suspend-then-hibernate,
which causes systemd to abort the suspend when writing the RTC alarm fails.
BugLink: https://github.com/systemd/systemd/issues/9988
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Anson Huang [Mon, 5 Nov 2018 00:59:28 +0000 (00:59 +0000)]
cpufreq: imx6q: add return value check for voltage scale
[ Upstream commit
6ef28a04d1ccf718eee069b72132ce4aa1e52ab9 ]
Add return value check for voltage scale when ARM clock
rate change fail.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Scott Wood [Wed, 7 Nov 2018 01:49:34 +0000 (19:49 -0600)]
KVM: PPC: Move and undef TRACE_INCLUDE_PATH/FILE
[ Upstream commit
28c5bcf74fa07c25d5bd118d1271920f51ce2a98 ]
TRACE_INCLUDE_PATH and TRACE_INCLUDE_FILE are used by
<trace/define_trace.h>, so like that #include, they should
be outside #ifdef protection.
They also need to be #undefed before defining, in case multiple trace
headers are included by the same C file. This became the case on
book3e after commit
cf4a6085151a ("powerpc/mm: Add missing tracepoint for
tlbie"), leading to the following build error:
CC arch/powerpc/kvm/powerpc.o
In file included from arch/powerpc/kvm/powerpc.c:51:0:
arch/powerpc/kvm/trace.h:9:0: error: "TRACE_INCLUDE_PATH" redefined
[-Werror]
#define TRACE_INCLUDE_PATH .
^
In file included from arch/powerpc/kvm/../mm/mmu_decl.h:25:0,
from arch/powerpc/kvm/powerpc.c:48:
./arch/powerpc/include/asm/trace.h:224:0: note: this is the location of
the previous definition
#define TRACE_INCLUDE_PATH asm
^
cc1: all warnings being treated as errors
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
YueHaibing [Fri, 26 Oct 2018 07:07:31 +0000 (15:07 +0800)]
scsi: hisi_sas: Remove set but not used variable 'dq_list'
[ Upstream commit
e34ff8edcae89922d187425ab0b82e6a039aa371 ]
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c: In function 'start_delivery_v1_hw':
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:907:20: warning:
variable 'dq_list' set but not used [-Wunused-but-set-variable]
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c: In function 'start_delivery_v2_hw':
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c:1671:20: warning:
variable 'dq_list' set but not used [-Wunused-but-set-variable]
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function 'start_delivery_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:889:20: warning:
variable 'dq_list' set but not used [-Wunused-but-set-variable]
It never used since introduction in commit
fa222db0b036 ("scsi: hisi_sas: Don't lock DQ for complete task sending")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Arnd Bergmann [Fri, 2 Nov 2018 15:35:48 +0000 (16:35 +0100)]
scsi: lpfc: fix remoteport access
[ Upstream commit
f8d294324598ec85bea2779512e48c94cbe4d7c6 ]
The addition of a spinlock in lpfc_debugfs_nodelist_data() introduced
a bug that lets us not skip NULL pointers correctly, as noticed by
gcc-8:
drivers/scsi/lpfc/lpfc_debugfs.c: In function 'lpfc_debugfs_nodelist_data.constprop':
drivers/scsi/lpfc/lpfc_debugfs.c:728:13: error: 'nrport' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (nrport->port_role & FC_PORT_ROLE_NVME_INITIATOR)
This changes the logic back to what it was, while keeping the added
spinlock.
Fixes:
9e210178267b ("scsi: lpfc: Synchronize access to remoteport via rport")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Masayoshi Mizuma [Wed, 31 Oct 2018 01:50:25 +0000 (21:50 -0400)]
tools/testing/nvdimm: Fix the array size for dimm devices.
[ Upstream commit
af31b04b67f4fd7f639fd465a507c154c46fc9fb ]
KASAN reports following global out of bounds access while
nfit_test is being loaded. The out of bound access happens
the following reference to dimm_fail_cmd_flags[dimm]. 'dimm' is
over than the index value, NUM_DCR (==5).
static int override_return_code(int dimm, unsigned int func, int rc)
{
if ((1 << func) & dimm_fail_cmd_flags[dimm]) {
dimm_fail_cmd_flags[] definition:
static unsigned long dimm_fail_cmd_flags[NUM_DCR];
'dimm' is the return value of get_dimm(), and get_dimm() returns
the index of handle[] array. The handle[] has 7 index. Let's use
ARRAY_SIZE(handle) as the array size.
KASAN report:
==================================================================
BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
Read of size 8 at addr
ffffffffc10cbbe8 by task kworker/u41:0/8
...
Call Trace:
dump_stack+0xea/0x1b0
? dump_stack_print_info.cold.0+0x1b/0x1b
? kmsg_dump_rewind_nolock+0xd9/0xd9
print_address_description+0x65/0x22e
? nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
kasan_report.cold.6+0x92/0x1a6
nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
...
The buggy address belongs to the variable:
dimm_fail_cmd_flags+0x28/0xffffffffffffa440 [nfit_test]
==================================================================
Fixes:
39611e83a28c ("tools/testing/nvdimm: Make DSM failure code injection...")
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jerome Brunet [Mon, 29 Oct 2018 15:13:40 +0000 (16:13 +0100)]
pinctrl: meson: fix meson8b ao pull register bits
[ Upstream commit
a1705f02704cd8a24d434bfd0141ee8142ad277a ]
AO pull register definition is inverted between pull (up/down) and
pull enable. Fixing this allows to properly apply bias setting
through pinconf
Fixes:
0fefcb6876d0 ("pinctrl: Add support for Meson8b")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jerome Brunet [Mon, 29 Oct 2018 15:13:39 +0000 (16:13 +0100)]
pinctrl: meson: fix meson8 ao pull register bits
[ Upstream commit
e91b162d2868672d06010f34aa83d408db13d3c6 ]
AO pull register definition is inverted between pull (up/down) and
pull enable. Fixing this allows to properly apply bias setting
through pinconf
Fixes:
6ac730951104 ("pinctrl: add driver for Amlogic Meson SoCs")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jerome Brunet [Mon, 29 Oct 2018 15:13:38 +0000 (16:13 +0100)]
pinctrl: meson: fix gxl ao pull register bits
[ Upstream commit
ed3a2b74f3eb34c84c8377353f4730f05acdfd05 ]
AO pull register definition is inverted between pull (up/down) and
pull enable. Fixing this allows to properly apply bias setting
through pinconf
Fixes:
0f15f500ff2c ("pinctrl: meson: Add GXL pinctrl definitions")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jerome Brunet [Mon, 29 Oct 2018 15:13:37 +0000 (16:13 +0100)]
pinctrl: meson: fix gxbb ao pull register bits
[ Upstream commit
4bc51e1e350cd4707ce6e551a93eae26d40b9889 ]
AO pull register definition is inverted between pull (up/down) and
pull enable. Fixing this allows to properly apply bias setting
through pinconf
Fixes:
468c234f9ed7 ("pinctrl: amlogic: Add support for Amlogic Meson GXBB SoC")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Jerome Brunet [Tue, 23 Oct 2018 16:03:19 +0000 (18:03 +0200)]
pinctrl: meson: fix pinconf bias disable
[ Upstream commit
e39f9dd8206ad66992ac0e6218ef1ba746f2cce9 ]
If a bias is enabled on a pin of an Amlogic SoC, calling .pin_config_set()
with PIN_CONFIG_BIAS_DISABLE will not disable the bias. Instead it will
force a pull-down bias on the pin.
Instead of the pull type register bank, the driver should access the pull
enable register bank.
Fixes:
6ac730951104 ("pinctrl: add driver for Amlogic Meson SoCs")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Amir Goldstein [Tue, 30 Oct 2018 18:29:53 +0000 (20:29 +0200)]
fanotify: fix handling of events on child sub-directory
commit
b469e7e47c8a075cc08bcd1e85d4365134bdcdd5 upstream.
When an event is reported on a sub-directory and the parent inode has
a mark mask with FS_EVENT_ON_CHILD|FS_ISDIR, the event will be sent to
fsnotify() even if the event type is not in the parent mark mask
(e.g. FS_OPEN).
Further more, if that event happened on a mount or a filesystem with
a mount/sb mark that does have that event type in their mask, the "on
child" event will be reported on the mount/sb mark. That is not
desired, because user will get a duplicate event for the same action.
Note that the event reported on the victim inode is never merged with
the event reported on the parent inode, because of the check in
should_merge(): old_fsn->inode == new_fsn->inode.
Fix this by looking for a match of an actual event type (i.e. not just
FS_ISDIR) in parent's inode mark mask and by not reporting an "on child"
event to group if event type is only found on mount/sb marks.
[backport hint: The bug seems to have always been in fanotify, but this
patch will only apply cleanly to v4.19.y]
Cc: <stable@vger.kernel.org> # v4.19
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[amir: backport to v4.19]
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Amir Goldstein [Wed, 3 Oct 2018 21:25:33 +0000 (00:25 +0300)]
fsnotify: generalize handling of extra event flags
commit
007d1e8395eaa59b0e7ad9eb2b53a40859446a88 upstream.
FS_EVENT_ON_CHILD gets a special treatment in fsnotify() because it is
not a flag specifying an event type, but rather an extra flags that may
be reported along with another event and control the handling of the
event by the backend.
FS_ISDIR is also an "extra flag" and not an "event type" and therefore
desrves the same treatment. With inotify/dnotify backends it was never
possible to set FS_ISDIR in mark masks, so it did not matter.
With fanotify backend, mark adding code jumps through hoops to avoid
setting the FS_ISDIR in the commulative object mask.
Separate the constant ALL_FSNOTIFY_EVENTS to ALL_FSNOTIFY_FLAGS and
ALL_FSNOTIFY_EVENTS, so the latter can be used to test for specific
event types.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael J. Ruhl [Mon, 10 Sep 2018 16:39:03 +0000 (09:39 -0700)]
IB/hfi1: Eliminate races in the SDMA send error path
commit
a0e0cb82804a6a21d9067022c2dfdf80d11da429 upstream.
pq_update() can only be called in two places: from the completion
function when the complete (npkts) sequence of packets has been
submitted and processed, or from setup function if a subset of the
packets were submitted (i.e. the error path).
Currently both paths can call pq_update() if an error occurrs. This
race will cause the n_req value to go negative, hanging file_close(),
or cause a crash by freeing the txlist more than once.
Several variables are used to determine SDMA send state. Most of
these are unnecessary, and have code inspectible races between the
setup function and the completion function, in both the send path and
the error path.
The request 'status' value can be set by the setup or by the
completion function. This is code inspectibly racy. Since the status
is not needed in the completion code or by the caller it has been
removed.
The request 'done' value races between usage by the setup and the
completion function. The completion function does not need this.
When the number of processed packets matches npkts, it is done.
The 'has_error' value races between usage of the setup and the
completion function. This can cause incorrect error handling and leave
the n_req in an incorrect value (i.e. negative).
Simplify the code by removing all of the unneeded state checks and
variables.
Clean up iovs node when it is freed.
Eliminate race conditions in the error path:
If all packets are submitted, the completion handler will set the
completion status correctly (ok or aborted).
If all packets are not submitted, the caller must wait until the
submitted packets have completed, and then set the completion status.
These two change eliminate the race condition in the error path.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Erik Schmauss [Wed, 17 Oct 2018 21:09:35 +0000 (14:09 -0700)]
ACPICA: AML interpreter: add region addresses in global list during initialization
commit
4abb951b73ff0a8a979113ef185651aa3c8da19b upstream.
The table load process omitted adding the operation region address
range to the global list. This omission is problematic because the OS
queries the global list to check for address range conflicts before
deciding which drivers to load. This commit may result in warning
messages that look like the following:
[ 7.871761] ACPI Warning: system_IO range 0x00000428-0x0000042F conflicts with op_region 0x00000400-0x0000047F (\PMIO) (
20180531/utaddress-213)
[ 7.871769] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
However, these messages do not signify regressions. It is a result of
properly adding address ranges within the global address list.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200011
Tested-by: Jean-Marc Lenoir <archlinux@jihemel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Kleine-Budde [Fri, 9 Nov 2018 14:01:50 +0000 (15:01 +0100)]
can: flexcan: remove not needed struct flexcan_priv::tx_mb and struct flexcan_priv::tx_mb_idx
commit
e05237f9da42ee52e73acea0bb082d788e111229 upstream.
The previous patch changes the TX path to always use the last mailbox
regardless of the used offload scheme (rx-fifo or timestamp based). This
means members "tx_mb" and "tx_mb_idx" of the struct flexcan_priv don't
depend on the offload scheme, so replace them by compile time constants.
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Stein [Thu, 11 Oct 2018 15:01:25 +0000 (17:01 +0200)]
can: flexcan: Always use last mailbox for TX
commit
cbffaf7aa09edbaea2bc7dc440c945297095e2fd upstream.
Essentially this patch moves the TX mailbox to position 63, regardless
of timestamp based offloading or RX FIFO. So mainly the iflag register
usage regarding TX has changed. The rest is consolidating RX FIFO and
timestamp offloading as they now use both the same TX mailbox.
The reason is a very annoying behavior regarding sending RTR frames when
_not_ using RX FIFO:
If a TX mailbox sent a RTR frame it becomes a RX mailbox. For that
reason flexcan_irq disables the TX mailbox again. But if during the time
the RTR was sent and the TX mailbox is disabled a new CAN frames is
received, it is lost without notice. The reason is that so-called
"Move-in" process starts from the lowest mailbox which happen to be a TX
mailbox set to EMPTY.
Steps to reproduce (I used an imx7d):
1. generate regular bursts of messages
2. send a RTR from flexcan with higher priority than burst messages every
1ms, e.g. cangen -I 0x100 -L 0 -g 1 -R can0
3. notice a lost message without notification after some seconds
When running an iperf in parallel this problem is occurring even more
frequently. Using filters is not possible as at least one single CAN-ID
is allowed. Handling the TX MB during RX is also not possible as there
is no race-free disable of RX MB.
There is still a slight window when the described problem can occur. But
for that all RX MB must be in use which is essentially next to an
overrun. Still there will be no indication if it ever occurs.
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Wunner [Sat, 27 Oct 2018 08:36:54 +0000 (10:36 +0200)]
can: hi311x: Use level-triggered interrupt
commit
f164d0204b1156a7e0d8d1622c1a8d25752befec upstream.
If the hi3110 shares the SPI bus with another traffic-intensive device
and packets are received in high volume (by a separate machine sending
with "cangen -g 0 -i -x"), reception stops after a few minutes and the
counter in /proc/interrupts stops incrementing. Bus state is "active".
Bringing the interface down and back up reconvenes the reception. The
issue is not observed when the hi3110 is the sole device on the SPI bus.
Using a level-triggered interrupt makes the issue go away and lets the
hi3110 successfully receive 2 GByte over the course of 5 days while a
ks8851 Ethernet chip on the same SPI bus handles 6 GByte of traffic.
Unfortunately the hi3110 datasheet is mum on the trigger type. The pin
description on page 3 only specifies the polarity (active high):
http://www.holtic.com/documents/371-hi-3110_v-rev-kpdf.do
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Akshay Bhat <akshay.bhat@timesys.com>
Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Hartkopp [Wed, 24 Oct 2018 08:27:12 +0000 (10:27 +0200)]
can: raw: check for CAN FD capable netdev in raw_sendmsg()
commit
a43608fa77213ad5ac5f75994254b9f65d57cfa0 upstream.
When the socket is CAN FD enabled it can handle CAN FD frame
transmissions. Add an additional check in raw_sendmsg() as a CAN2.0 CAN
driver (non CAN FD) should never see a CAN FD frame. Due to the commonly
used can_dropped_invalid_skb() function the CAN 2.0 driver would drop
that CAN FD frame anyway - but with this patch the user gets a proper
-EINVAL return code.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oleksij Rempel [Tue, 18 Sep 2018 09:40:39 +0000 (11:40 +0200)]
can: flexcan: handle tx-complete CAN frames via rx-offload infrastructure
commit
ed72bc8bcb9277061e753faf300b20f97323761c upstream.
Current flexcan driver will put TX-ECHO in regular unsorted way, in
this case TX-ECHO can come after the response to the same TXed message.
In some cases, for example for J1939 stack, things will break.
This patch is using new rx-offload API to put the messages just in the
right place.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oleksij Rempel [Tue, 18 Sep 2018 09:40:41 +0000 (11:40 +0200)]
can: flexcan: use can_rx_offload_queue_sorted() for flexcan_irq_bus_*()
commit
d788905f68fd4714c82936f6f7f1d3644d7ae7ef upstream.
Currently, in case of bus error, driver will generate error message and put
in the tail of the message queue. To avoid confusions, this change should
place the bus related messages in proper order.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oleksij Rempel [Tue, 18 Sep 2018 09:40:40 +0000 (11:40 +0200)]
can: rx-offload: rename can_rx_offload_irq_queue_err_skb() to can_rx_offload_queue_tail()
commit
4530ec36bb1e0d24f41c33229694adacda3d5d89 upstream.
This function has nothing todo with error.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oleksij Rempel [Tue, 18 Sep 2018 09:40:38 +0000 (11:40 +0200)]
can: rx-offload: introduce can_rx_offload_get_echo_skb() and can_rx_offload_queue_sorted() functions
commit
55059f2b7f868cd43b3ad30e28e18347e1b46ace upstream.
Current CAN framework can't guarantee proper/chronological order
of RX and TX-ECHO messages. To make this possible, drivers should use
this functions instead of can_get_echo_skb().
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Kleine-Budde [Wed, 31 Oct 2018 13:15:13 +0000 (14:15 +0100)]
can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb
commit
7da11ba5c5066dadc2e96835a6233d56d7b7764a upstream.
Prior to echoing a successfully transmitted CAN frame (by calling
can_get_echo_skb()), CAN drivers have to put the CAN frame (by calling
can_put_echo_skb() in the transmit function). These put and get function
take an index as parameter, which is used to identify the CAN frame.
A driver calling can_get_echo_skb() with a index not pointing to a skb
is a BUG, so add an appropriate error message.
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marc Kleine-Budde [Wed, 31 Oct 2018 13:05:26 +0000 (14:05 +0100)]
can: dev: __can_get_echo_skb(): Don't crash the kernel if can_priv::echo_skb is accessed out of bounds
commit
e7a6994d043a1e31d5b17706a22ce33d2a3e4cdc upstream.
If the "struct can_priv::echo_skb" is accessed out of bounds would lead
to a kernel crash. Better print a sensible warning message instead and
try to recover.
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>