Eric W. Biederman [Sat, 14 Sep 2019 12:35:02 +0000 (07:35 -0500)]
tasks, sched/core: RCUify the assignment of rq->curr
The current task on the runqueue is currently read with rcu_dereference().
To obtain ordinary RCU semantics for an rcu_dereference() of rq->curr it needs
to be paired with rcu_assign_pointer() of rq->curr, which provides the
memory barrier necessary to order assignments to the task_struct
and the assignment to rq->curr.
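The pairing described above can be pictured with a small userspace sketch.
It is an analogy only: C11 release/acquire atomics stand in for
rcu_assign_pointer() and rcu_dereference(), and every type and function
name below is hypothetical rather than kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's task_struct or rq. */
struct task_sketch {
	int mm;		/* payload a reader inspects after the dereference */
};

struct rq_sketch {
	_Atomic(struct task_sketch *) curr;
};

/*
 * Writer: fill in the task first, then publish the pointer with release
 * ordering (the role rcu_assign_pointer() plays in the kernel).
 */
void publish_curr(struct rq_sketch *rq, struct task_sketch *next, int mm)
{
	assert(next != NULL);
	next->mm = mm;
	atomic_store_explicit(&rq->curr, next, memory_order_release);
}

/*
 * Reader: load the pointer with acquire ordering (a conservative stand-in
 * for rcu_dereference()). Returns -1 if no task has been published.
 */
int read_curr_mm(struct rq_sketch *rq)
{
	struct task_sketch *t =
		atomic_load_explicit(&rq->curr, memory_order_acquire);

	return t ? t->mm : -1;
}
```

A reader that observes the published pointer is then guaranteed to observe
the value of "mm" written before the publication.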
Unfortunately the assignment of rq->curr in __schedule is a hot path,
and it has already been shown that additional barriers in that code
will reduce the performance of the scheduler. So I will attempt to
describe below why you can effectively have ordinary RCU semantics
without any additional barriers.
The assignment of rq->curr in init_idle is a slow path called once
per cpu and that can use rcu_assign_pointer() without any concerns.
As I write this there are effectively two users of rcu_dereference() on
rq->curr. There is the membarrier code in kernel/sched/membarrier.c
that only looks at "->mm" after the rcu_dereference(). Then there is
task_numa_compare() in kernel/sched/fair.c. My best reading of the
code shows that task_numa_compare only accesses: "->flags",
"->cpus_ptr", "->numa_group", "->numa_faults[]",
"->total_numa_faults", and "->se.cfs_rq".
The code in __schedule() essentially does:
	rq_lock(...);
	smp_mb__after_spinlock();
	next = pick_next_task(...);
	rq->curr = next;
	context_switch(prev, next);
At the start of the function the rq_lock/smp_mb__after_spinlock
pair provides a full memory barrier. Further there is a full memory barrier
in context_switch().
This means that any task that has already run and modified itself (the
common case) has already seen two memory barriers before __schedule()
runs and begins executing. A task that modifies itself then sees a
third full memory barrier paired with the rq_lock().
For a brand new task that is enqueued with wake_up_new_task() there
are the memory barriers present from the taking and releasing of the
pi_lock and the rq_lock as the process is enqueued as well as the
full memory barrier at the start of __schedule() assuming __schedule()
happens on the same cpu.
This means that by the time we reach the assignment of rq->curr,
except for values on the task struct modified in pick_next_task,
the code has the same guarantees as if it used rcu_assign_pointer().
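The argument so far can be illustrated with a userspace sketch under
stated assumptions: a C11 sequentially consistent fence stands in for
rq_lock() plus smp_mb__after_spinlock(), a relaxed atomic store stands in
for the plain "rq->curr = next", and all names are hypothetical. In C11, a
fence followed by a relaxed store synchronizes with an acquire load that
reads the stored value, so writes made before the fence are visible to the
reader, while the write made after the fence gets no such guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's task_struct or rq. */
struct task_sketch {
	int flags;	/* written by the task long before it is picked */
	int se_state;	/* written by the pick_next_task() stand-in */
};

struct rq_sketch {
	_Atomic(struct task_sketch *) curr;
};

void schedule_sketch(struct rq_sketch *rq, struct task_sketch *next)
{
	assert(next != NULL);

	/*
	 * Stand-in for rq_lock() + smp_mb__after_spinlock(): a full fence.
	 * Everything written to *next before this point is ordered before
	 * the store to rq->curr below.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	/*
	 * Stand-in for pick_next_task(): this write comes after the fence,
	 * so the fence does not order it against the store to rq->curr.
	 */
	next->se_state = 1;

	/* Plain publication: the store itself carries no release ordering. */
	atomic_store_explicit(&rq->curr, next, memory_order_relaxed);
}

/* Reader: returns the flags of the current task, or -1 if there is none. */
int read_curr_flags(struct rq_sketch *rq)
{
	struct task_sketch *t =
		atomic_load_explicit(&rq->curr, memory_order_acquire);

	return t ? t->flags : -1;
}
```

In this sketch a concurrent reader may rely on "flags" but not on
"se_state", which mirrors the exception made for fields modified in
pick_next_task.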
Reading through all of the implementations of pick_next_task it
appears pick_next_task is limited to modifying the task_struct fields
"->se", "->rt", "->dl". These fields are the sched_entity structures
of the various schedulers.
Further "->se.cfs_rq" is only changed in cgroup attach/move operations
initiated by userspace.
Unless I have missed something this means that in practice the
users of "rcu_dereference(rq->curr)" get normal RCU semantics of
rcu_dereference() for the fields they care about, despite the
assignment of rq->curr in __schedule() not using rcu_assign_pointer().
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190903200603.GW2349@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric W. Biederman [Sat, 14 Sep 2019 12:34:30 +0000 (07:34 -0500)]
tasks, sched/core: With a grace period after finish_task_switch(), remove unnecessary code
Remove workarounds that were written before there was a grace period
after tasks left the runqueue in finish_task_switch().
In particular, now that tasks exiting the runqueue experience
an RCU grace period, none of the work performed by task_rcu_dereference()
except the rcu_dereference() is necessary, so replace task_rcu_dereference()
with rcu_dereference().
Remove the code in rcuwait_wait_event() that checks to ensure the current
task has not exited. It is no longer necessary as it is guaranteed
that any running task will experience an RCU grace period after it
leaves the runqueue.
Remove the comment in rcuwait_wake_up() as it is no longer relevant.
Ref:
8f95c90ceb54 ("sched/wait, RCU: Introduce rcuwait machinery")
Ref:
150593bf8693 ("sched/api: Introduce task_rcu_dereference() and try_get_task_struct()")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87lfurdpk9.fsf_-_@x220.int.ebiederm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric W. Biederman [Sat, 14 Sep 2019 12:33:58 +0000 (07:33 -0500)]
tasks, sched/core: Ensure tasks are available for a grace period after leaving the runqueue
In the ordinary case today the RCU grace period for a task_struct is
triggered when another process waits for its zombie and causes the
kernel to call release_task(). As the waiting task has to receive a
signal and then act upon it before this happens, typically this will
occur after the original task has been removed from the runqueue.
Unfortunately in some cases such as self-reaping tasks it can be shown
that release_task() will be called starting the grace period for
task_struct long before the task leaves the runqueue.
Therefore use put_task_struct_rcu_user() in finish_task_switch() to
guarantee that there is an RCU lifetime after the task
leaves the runqueue.
Besides the change in the start of the RCU grace period for the
task_struct this change may cause perf_event_delayed_put and
trace_sched_process_free to be called later. The function
perf_event_delayed_put boils down to just a WARN_ON for cases that I
assume never happen. So I don't see any problem with delaying it.
The function trace_sched_process_free is a trace point and thus
visible to user space. Occasionally userspace has the strangest
dependencies so this has a minuscule chance of causing a regression.
This change only changes the timing of when the tracepoint is called.
The change in timing arguably gives userspace a more accurate picture
of what is going on. So I don't expect there to be a regression.
In the case where a task self reaps we are pretty much guaranteed that
the RCU grace period is delayed. So we should get quite a bit of
coverage of this worst case for the change in a normal threaded
workload. So I expect any issues to turn up quickly or not at all.
I have lightly tested this change and everything appears to work
fine.
Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Inspired-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87r24jdpl5.fsf_-_@x220.int.ebiederm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric W. Biederman [Sat, 14 Sep 2019 12:33:34 +0000 (07:33 -0500)]
tasks: Add a count of task RCU users
Add a count of the number of RCU users (currently 1) of the task
struct so that we can later add the scheduler case and get rid of the
very subtle task_rcu_dereference(), and just use rcu_dereference().
As suggested by Oleg, have the count overlap rcu_head so that no
additional space in task_struct is required.
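A minimal userspace sketch of the overlap idea follows. The struct
layouts, the callback and the helper are hypothetical stand-ins, and the
decrement is deliberately non-atomic; the kernel would use its own
refcount and call_rcu() primitives instead. The sketch only shows that a
counter placed in a union with the rcu_head adds no space, and that the
storage can be reused for the rcu_head once the last user is gone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/* A task that carries only the rcu_head. */
struct task_plain {
	int pid;
	struct rcu_head_sketch rcu;
};

/* A task whose RCU user count shares storage with the rcu_head. */
struct task_counted {
	int pid;
	union {
		unsigned int rcu_users;		/* valid while users remain */
		struct rcu_head_sketch rcu;	/* valid once the free is queued */
	};
};

/* Overlapping the two members costs no extra space. */
static_assert(sizeof(struct task_counted) == sizeof(struct task_plain),
	      "the counter must not grow the structure");

/* Callback that would run after a grace period; a no-op in this sketch. */
void free_task_sketch(struct rcu_head_sketch *head)
{
	(void)head;
}

/*
 * Drop one RCU user. Returns 1 if this was the last user, in which case
 * the counter's storage is reused as the rcu_head that a deferred-free
 * mechanism would consume. Single-threaded sketch: not atomic.
 */
int put_rcu_user_sketch(struct task_counted *t)
{
	assert(t->rcu_users > 0);
	if (--t->rcu_users != 0)
		return 0;
	t->rcu.next = NULL;
	t->rcu.func = free_task_sketch;
	return 1;
}
```

The counter and the rcu_head are never needed at the same time, which is
what makes sharing the storage safe in this sketch.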
Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Inspired-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87woebdplt.fsf_-_@x220.int.ebiederm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Qian Cai [Tue, 17 Sep 2019 14:34:54 +0000 (10:34 -0400)]
sched/core: Convert vcpu_is_preempted() from macro to an inline function
Clang reports this warning:
kernel/locking/osq_lock.c:25:19: warning: unused function 'node_cpu' [-Wunused-function]
due to osq_lock() calling vcpu_is_preempted(node_cpu(node->prev)), but
vcpu_is_preempted() is compiled away. Fix it by converting the dummy
vcpu_is_preempted() from a macro to a proper static inline function.
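The difference between the two forms can be sketched as follows. The
helper and the "_macro"/"_inline" suffixed names are hypothetical and
exist only to show both variants side by side; this is not the kernel
source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper standing in for node_cpu() in osq_lock.c. */
static inline int node_cpu_sketch(int encoded_cpu)
{
	assert(encoded_cpu > 0);
	return encoded_cpu - 1;
}

/*
 * Macro form: the preprocessor discards the argument text, so the compiler
 * never sees a call to the helper and may report it as unused.
 */
#define vcpu_is_preempted_macro(cpu)	false

/*
 * Inline-function form: the argument expression is still evaluated, so the
 * helper counts as used, and the constant result remains easy to optimize.
 */
static inline bool vcpu_is_preempted_inline(int cpu)
{
	(void)cpu;
	return false;
}

/* Caller using the macro: expands to "return false;". */
bool prev_is_preempted_macro(int encoded_cpu)
{
	(void)encoded_cpu;
	return vcpu_is_preempted_macro(node_cpu_sketch(encoded_cpu));
}

/* Caller using the inline function: node_cpu_sketch() is really called. */
bool prev_is_preempted_inline(int encoded_cpu)
{
	return vcpu_is_preempted_inline(node_cpu_sketch(encoded_cpu));
}
```

Both callers return false; only the second one keeps a reference to the
helper alive for the compiler's unused-function analysis.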
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: rostedt@goodmis.org
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/1568730894-10483-1-git-send-email-cai@lca.pw
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Qian Cai [Mon, 16 Sep 2019 21:19:35 +0000 (17:19 -0400)]
sched/fair: Remove unused cfs_rq_clock_task() function
cfs_rq_clock_task() was first introduced and used in:
f1b17280efbd ("sched: Maintain runnable averages across throttled periods")
Over time its use has been gradually removed by the following commits:
d31b1a66cbe0 ("sched/fair: Factorize PELT update")
23127296889f ("sched/fair: Update scale invariance of PELT")
Today, there is no single user left, so it can be safely removed.
Found via the -Wunused-function build warning.
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1568668775-2127-1-git-send-email-cai@lca.pw
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Tue, 17 Sep 2019 02:59:10 +0000 (19:59 -0700)]
Merge tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform-drivers updates from Andy Shevchenko:
- ASUS WMI driver got a couple of updates, i.e. support of FAN is fixed
for recent products and the charge threshold support has been added
- Two unknown key events for Dell laptops are being ignored now to avoid
spamming users with harmless messages
- HP ZBook 17 G5 and ASUS Zenbook UX430UNR got accelerometer support.
- Intel CherryTrail platforms had a regression with wake up. Now it's
fixed
- Intel PMC driver got fixed in order to work nicely in Xen
environment
- Intel Speed Select driver provides bucket vs core count relationship.
Besides that the tools have been updated for better output
- The PrivacyGuard is enabled on Lenovo ThinkPad laptops
- Three tablets - Trekstor Primebook C11B 2-in-1, Irbis TW90 and Chuwi
Surbook Mini - got touchscreen support
* tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86: (53 commits)
MAINTAINERS: Switch PDx86 subsystem status to Odd Fixes
platform/x86: asus-wmi: Refactor charge threshold to use the battery hooking API
platform/x86: asus-wmi: Rename CHARGE_THRESHOLD to RSOC
platform/x86: asus-wmi: Reorder ASUS_WMI_CHARGE_THRESHOLD
tools/power/x86/intel-speed-select: Display core count for bucket
platform/x86: ISST: Allow additional TRL MSRs
tools/power/x86/intel-speed-select: Fix memory leak
tools/power/x86/intel-speed-select: Output success/failed for command output
tools/power/x86/intel-speed-select: Output human readable CPU list
tools/power/x86/intel-speed-select: Change turbo ratio output to maximum turbo frequency
tools/power/x86/intel-speed-select: Switch output to MHz
tools/power/x86/intel-speed-select: Simplify output for turbo-freq and base-freq
tools/power/x86/intel-speed-select: Fix cpu-count output
tools/power/x86/intel-speed-select: Fix help option typo
tools/power/x86/intel-speed-select: Fix package typo
tools/power/x86/intel-speed-select: Fix a read overflow in isst_set_tdp_level_msr()
platform/x86: intel_int0002_vgpio: Use device_init_wakeup
platform/x86: intel_int0002_vgpio: Fix wakeups not working on Cherry Trail
platform/x86: compal-laptop: Initialize "value" in ec_read_u8()
platform/x86: touchscreen_dmi: Add info for the Trekstor Primebook C11B 2-in-1
...
Linus Torvalds [Tue, 17 Sep 2019 02:40:24 +0000 (19:40 -0700)]
Merge branch 'x86-vmware-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 vmware updates from Ingo Molnar:
"This updates the VMWARE guest driver with support for VMCALL/VMMCALL
based hypercalls"
* 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
input/vmmouse: Update the backdoor call with support for new instructions
drm/vmwgfx: Update the backdoor call with support for new instructions
x86/vmware: Add a header file for hypercall definitions
x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls
Linus Torvalds [Tue, 17 Sep 2019 02:39:00 +0000 (19:39 -0700)]
Merge branch 'x86-hyperv-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 hyperv updates from Ingo Molnar:
"Misc updates related to page size abstractions within the HyperV code,
in preparation for future features"
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
drivers: hv: vmbus: Replace page definition with Hyper-V specific one
x86/hyperv: Add functions to allocate/deallocate page for Hyper-V
x86/hyperv: Create and use Hyper-V page definitions
Linus Torvalds [Tue, 17 Sep 2019 02:37:44 +0000 (19:37 -0700)]
Merge branch 'x86-platform-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 platform update from Ingo Molnar:
"The biggest change is the rework of the intel/iosf_mbi locking code
which used a few non-standard locking patterns, to make it work under
lockdep"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Fix kmalloc() NULL check routine
x86/platform/intel/iosf_mbi Rewrite locking
Linus Torvalds [Tue, 17 Sep 2019 02:21:34 +0000 (19:21 -0700)]
Merge branch 'x86-mm-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
- Make cpumask_of_node() more robust against invalid node IDs
- Simplify and speed up load_mm_cr4()
- Unexport and remove various unused set_memory_*() APIs
- Misc cleanups
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix cpumask_of_node() error condition
x86/mm: Remove the unused set_memory_wt() function
x86/mm: Remove set_pages_x() and set_pages_nx()
x86/mm: Remove the unused set_memory_array_*() functions
x86/mm: Unexport set_memory_x() and set_memory_nx()
x86/fixmap: Cleanup outdated comments
x86/kconfig: Remove X86_DIRECT_GBPAGES dependency on !DEBUG_PAGEALLOC
x86/mm: Avoid redundant interrupt disable in load_mm_cr4()
Linus Torvalds [Tue, 17 Sep 2019 02:06:29 +0000 (19:06 -0700)]
Merge branch 'x86-entry-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 entry updates from Ingo Molnar:
"This contains x32 and compat syscall improvements, the biggest one of
which splits x32 syscalls into their own table, which allows new
syscalls to share the x32 and x86-64 number - which turns the
512-547 special syscall numbers range into a legacy wart that won't be
extended going forward"
* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/syscalls: Split the x32 syscalls into their own table
x86/syscalls: Disallow compat entries for all types of 64-bit syscalls
x86/syscalls: Use the compat versions of rt_sigsuspend() and rt_sigprocmask()
x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long
Linus Torvalds [Tue, 17 Sep 2019 01:47:53 +0000 (18:47 -0700)]
Merge branch 'x86-cpu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 cpu-feature updates from Ingo Molnar:
- Rework the Intel model names symbols/macros, which were decades of
ad-hoc extensions and added random noise. It's now a coherent, easy
to follow nomenclature.
- Add new Intel CPU model IDs:
- "Tiger Lake" desktop and mobile models
- "Elkhart Lake" model ID
- and the "Lightning Mountain" variant of Airmont, plus support code
- Add the new AVX512_VP2INTERSECT instruction to cpufeatures
- Remove Intel MPX user-visible APIs and the self-tests, because the
toolchain (gcc) is not supporting it going forward. This is the
first, lowest-risk phase of MPX removal.
- Remove X86_FEATURE_MFENCE_RDTSC
- Various smaller cleanups and fixes
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/cpu: Update init data for new Airmont CPU model
x86/cpu: Add new Airmont variant to Intel family
x86/cpu: Add Elkhart Lake to Intel family
x86/cpu: Add Tiger Lake to Intel family
x86: Correct misc typos
x86/intel: Add common OPTDIFFs
x86/intel: Aggregate microserver naming
x86/intel: Aggregate big core graphics naming
x86/intel: Aggregate big core mobile naming
x86/intel: Aggregate big core client naming
x86/cpufeature: Explain the macro duplication
x86/ftrace: Remove mcount() declaration
x86/PCI: Remove superfluous returns from void functions
x86/msr-index: Move AMD MSRs where they belong
x86/cpu: Use constant definitions for CPU models
lib: Remove redundant ftrace flag removal
x86/crash: Remove unnecessary comparison
x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
x86: Remove X86_FEATURE_MFENCE_RDTSC
x86/mpx: Remove MPX APIs
...
Linus Torvalds [Tue, 17 Sep 2019 01:29:19 +0000 (18:29 -0700)]
Merge branch 'x86-build-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 build cleanup from Ingo Molnar:
"A single change that removes unnecessary asm-generic wrappers"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Remove unneeded uapi asm-generic wrappers
Linus Torvalds [Tue, 17 Sep 2019 01:27:37 +0000 (18:27 -0700)]
Merge branch 'x86-boot-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 boot code cleanup from Ingo Molnar:
"Clean up the BUILD_BUG_ON() definition which can cause build warnings"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Use common BUILD_BUG_ON
Linus Torvalds [Tue, 17 Sep 2019 01:07:08 +0000 (18:07 -0700)]
Merge branch 'x86-asm-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
- Add UMIP emulation/spoofing for 64-bit processes as well, because of
Wine based gaming.
- Clean up symbols/labels in low level asm code
- Add an assembly optimized mul_u64_u32_div() implementation on x86-64.
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/umip: Add emulation (spoofing) for UMIP covered instructions in 64-bit processes as well
x86/asm: Make some functions local labels
x86/asm/suspend: Get rid of bogus_64_magic
x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64
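For readers unfamiliar with mul_u64_u32_div(), a portable sketch of what
such a helper computes is shown below. The function name is hypothetical,
and it assumes a GCC or Clang 64-bit target because it relies on the
"unsigned __int128" extension; it is not the assembly implementation added
by this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (a * mul) / div without losing the high bits of the product.
 * The quotient is truncated to 64 bits, so callers are assumed to pick
 * operands whose result fits.
 */
uint64_t mul_u64_u32_div_sketch(uint64_t a, uint32_t mul, uint32_t div)
{
	assert(div != 0);

	/* 64 x 32 bits can need up to 96 bits, so widen before multiplying. */
	unsigned __int128 product = (unsigned __int128)a * mul;

	return (uint64_t)(product / div);
}
```

The widening step is the point of the helper: a plain 64-bit multiply
would overflow for large values of "a" before the division is applied.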
Linus Torvalds [Tue, 17 Sep 2019 00:25:49 +0000 (17:25 -0700)]
Merge branch 'sched-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.
As perf and the scheduler are getting bigger and more complex,
document the status quo of current responsibilities and interests,
and spread the review pain^H^H^H^H fun via an increase in the Cc:
linecount generated by scripts/get_maintainer.pl. :-)
- Add another series of patches that brings the -rt (PREEMPT_RT) tree
closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
into a new CONFIG_PREEMPTION category that will allow the eventual
introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
to go though.
- Extend the CPU cgroup controller with uclamp.min and uclamp.max to
allow the finer shaping of CPU bandwidth usage.
- Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).
- Improve the behavior of high CPU count, high thread count
applications running under cpu.cfs_quota_us constraints.
- Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.
- Improve CPU isolation housekeeping CPU allocation NUMA locality.
- Fix deadline scheduler bandwidth calculations and logic when cpusets
rebuilds the topology, or when it gets deadline-throttled while it's
being offlined.
- Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
setscheduler() system calls without creating global serialization.
Add new synchronization between cpuset topology-changing events and
the deadline acceptance tests in setscheduler(), which were broken
before.
- Rework the active_mm state machine to be less confusing and more
optimal.
- Rework (simplify) the pick_next_task() slowpath.
- Improve load-balancing on AMD EPYC systems.
- ... and misc cleanups, smaller fixes and improvements - please see
the Git log for more details.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
sched/psi: Correct overly pessimistic size calculation
sched/fair: Speed-up energy-aware wake-ups
sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
sched/uclamp: Update CPU's refcount on TG's clamp changes
sched/uclamp: Use TG's clamps to restrict TASK's clamps
sched/uclamp: Propagate system defaults to the root group
sched/uclamp: Propagate parent clamps
sched/uclamp: Extend CPU's cgroup controller
sched/topology: Improve load balancing on AMD EPYC systems
arch, ia64: Make NUMA select SMP
sched, perf: MAINTAINERS update, add submaintainers and reviewers
sched/fair: Use rq_lock/unlock in online_fair_sched_group
cpufreq: schedutil: fix equation in comment
sched: Rework pick_next_task() slow-path
sched: Allow put_prev_task() to drop rq->lock
sched/fair: Expose newidle_balance()
sched: Add task_struct pointer to sched_class::set_curr_task
sched: Rework CPU hotplug task selection
sched/{rt,deadline}: Fix set_next_task vs pick_next_task
sched: Fix kerneldoc comment for ia64_set_curr_task
...
Linus Torvalds [Tue, 17 Sep 2019 00:06:21 +0000 (17:06 -0700)]
Merge branch 'perf-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Improved kprobes robustness
- Intel PEBS support for PT hardware tracing
- Other Intel PT improvements: high order pages memory footprint
reduction and various related cleanups
- Misc cleanups
The perf tooling side has been very busy in this cycle, with over 300
commits. This is an incomplete high-level summary of the many
improvements done by over 30 developers:
- Lots of updates to the following tools:
'perf c2c'
'perf config'
'perf record'
'perf report'
'perf script'
'perf test'
'perf top'
'perf trace'
- Updates to libperf and libtraceevent, and a consolidation of the
proliferation of x86 instruction decoder libraries.
- Vendor event updates for Intel and PowerPC CPUs,
- Updates to hardware tracing tooling for ARM and Intel CPUs,
- ... and lots of other changes and cleanups - see the shortlog and
Git log for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (322 commits)
kprobes: Prohibit probing on BUG() and WARN() address
perf/x86: Make more stuff static
x86, perf: Fix the dependency of the x86 insn decoder selftest
objtool: Ignore intentional differences for the x86 insn decoder
objtool: Update sync-check.sh from perf's check-headers.sh
perf build: Ignore intentional differences for the x86 insn decoder
perf intel-pt: Use shared x86 insn decoder
perf intel-pt: Remove inat.c from build dependency list
perf: Update .gitignore file
objtool: Move x86 insn decoder to a common location
perf metricgroup: Support multiple events for metricgroup
perf metricgroup: Scale the metric result
perf pmu: Change convert_scale from static to global
perf symbols: Move mem_info and branch_info out of symbol.h
perf auxtrace: Uninline functions that touch perf_session
perf tools: Remove needless evlist.h include directives
perf tools: Remove needless evlist.h include directives
perf tools: Remove needless thread_map.h include directives
perf tools: Remove needless thread.h include directives
perf tools: Remove needless map.h include directives
...
Linus Torvalds [Mon, 16 Sep 2019 23:49:55 +0000 (16:49 -0700)]
Merge branch 'locking-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- improve rwsem scalability
- add uninitialized rwsem debugging check
- reduce lockdep's stacktrace memory usage and add diagnostics
- misc cleanups, code consolidation and constification
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Fix up mutex_waiter usage
locking/mutex: Use mutex flags macro instead of hard code
locking/mutex: Make __mutex_owner static to mutex.c
locking/qspinlock,x86: Clarify virt_spin_lock_key
locking/rwsem: Check for operations on an uninitialized rwsem
locking/rwsem: Make handoff writer optimistically spin on owner
locking/lockdep: Report more stack trace statistics
locking/lockdep: Reduce space occupied by stack traces
stacktrace: Constify 'entries' arguments
locking/lockdep: Make it clear that what lock_class::key points at is not modified
Linus Torvalds [Mon, 16 Sep 2019 23:47:38 +0000 (16:47 -0700)]
Merge branch 'efi-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
- refactor the EFI config table handling across architectures
- add support for the Dell EMC OEM config table
- include AER diagnostic output to CPER handling of fatal PCIe errors
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: cper: print AER info of PCIe fatal error
efi: Export Runtime Configuration Interface table to sysfs
efi: ia64: move SAL systab handling out of generic EFI code
efi/x86: move UV_SYSTAB handling into arch/x86
efi: x86: move efi_is_table_address() into arch/x86
Linus Torvalds [Mon, 16 Sep 2019 23:44:55 +0000 (16:44 -0700)]
Merge branch 'core-stacktrace-for-linus' of git://git./linux/kernel/git/tip/tip
Pull stacktrace fixlet from Ingo Molnar:
"Two comment fixes"
* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/stackdepot: Fix outdated comments
Linus Torvalds [Mon, 16 Sep 2019 23:28:19 +0000 (16:28 -0700)]
Merge branch 'core-rcu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"This cycle's RCU changes were:
- A few more RCU flavor consolidation cleanups.
- Updates to RCU's list-traversal macros improving lockdep usability.
- Forward-progress improvements for no-CBs CPUs: Avoid ignoring
incoming callbacks during grace-period waits.
- Forward-progress improvements for no-CBs CPUs: Use ->cblist
structure to take advantage of others' grace periods.
- Also added a small commit that avoids needlessly inflicting
scheduler-clock ticks on callback-offloaded CPUs.
- Forward-progress improvements for no-CBs CPUs: Reduce contention on
->nocb_lock guarding ->cblist.
- Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass
list to further reduce contention on ->nocb_lock guarding ->cblist.
- Miscellaneous fixes.
- Torture-test updates.
- minor LKMM updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org
rcu: Don't include <linux/ktime.h> in rcutiny.h
rcu: Allow rcu_do_batch() to dynamically adjust batch sizes
rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload
rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention
rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention
rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()
rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake()
rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed
rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended
rcu/nocb: Add bypass callback queueing
rcu/nocb: Atomic ->len field in rcu_segcblist structure
rcu/nocb: Unconditionally advance and wake for excessive CBs
rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock
rcu/nocb: Reduce contention at no-CBs invocation-done time
rcu/nocb: Reduce contention at no-CBs registry-time CB advancement
rcu/nocb: Round down for number of no-CBs grace-period kthreads
rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU
rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread
rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks
...
Linus Torvalds [Mon, 16 Sep 2019 23:15:34 +0000 (16:15 -0700)]
Merge branch 'core-objtool-for-linus' of git://git./linux/kernel/git/tip/tip
Pull objtool build fix from Ingo Molnar:
"Fix objtool builds with more exotic, user-defined CFLAGS"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Clobber user CFLAGS variable
Linus Torvalds [Mon, 16 Sep 2019 23:11:41 +0000 (16:11 -0700)]
Merge branch 'core-headers-for-linus' of git://git./linux/kernel/git/tip/tip
Pull header documentation fix from Ingo Molnar:
"Fix the parameter description in <asm-generic/div64.h>"
* 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
asm-generic/div64: Fix documentation of do_div() parameter
Linus Torvalds [Mon, 16 Sep 2019 22:56:22 +0000 (15:56 -0700)]
Merge tag 'armsoc-dt' of git://git./linux/kernel/git/soc/soc
Pull ARM DT updates from Arnd Bergmann:
"This is another huge branch with close to 450 changesets related to
devicetree files, roughly half of this for 32-bit and 64-bit
respectively. There are lots of cleanups and additional hardware
support for platforms we already support based on SoCs from Renesas,
ST-Microelectronics, Intel/Altera, Rockchips, Allwinner, Broadcom and
other manufacturers.
A total of 6 new SoCs and 37 new boards get added this time, one more
SoC will come in a follow-up branch. Most of the new boards are for
64-bit ARM SoCs, the others are typically for the 32-bit Cortex-A7.
Going more into details for SoC platforms with new hardware support:
- The Snapdragon 855 (SM8150) is Qualcomm's current high-end phone
platform, usually paired with an external 5G modem. So far we only
support the Qualcomm SM8150 MTP reference platform, but no actual
products.
- For the slightly older Qualcomm platforms, support for several
interesting products is getting added: Three laptops based on
Snapdragon 835/MSM8998 (Asus NovaGo, HP Envy X2 and Lenovo Miix
630), one laptop based on Snapdragon 850/sdm850 (Lenovo Yoga C630)
and several phones based on the older Snapdragon 410/MSM8916
(Samsung A3 and A5, Longcheer L8150 aka Android One 2nd gen "seed"
aka Wileyfox Swift).
- Mediatek MT7629 is a new wireless network router chip, similar to
the older MT7623. It gets added together with the reference board
implementation.
- Allwinner V3 is a repackaged version of the existing low-end V3s
chip, and is used in the tiny Lichee Pi Zero plus, also added here.
There is also a new TV set-top box based on Allwinner H6, the Tanix
TX6, and the eMMC variant of the Olimex A64-Olinuxino development
board.
- NXP i.MX8M Nano is a new member of the ever-expanding i.MX SoC
family, similar to the i.MX8M Mini. As usual, there is a large
number of new boards for i.MX SoCs: Einfochips i.MX8QXP AI_ML,
SolidRun Hummingboard Pulse baseboard and System-on-Module,
Boundary Devices i.MX8MQ Nitrogen8M, and TechNexion
PICO-PI-IMX8M-DEV for the 64-bit i.MX8 line. For 32-bit, we get the
Kontron i.MX6UL N6310 SoM with two baseboards, the PHYTEC
phyBOARD-Segin SoM with three baseboards, and the Zodiac Inflight
Innovations i.MX7 RMU2 board.
- In a different NXP product line, the Layerscape LS1046A "Freeway"
reference board gets added.
- Amlogic SM1 (S905X3) and G12B (S922X, A311D) are updated chips from
their set-top-box line and smart speaker with newer CPU and GPU
cores compared to their predecessors. Both are now also supported
by the Khadas VIM3 development board series, and the dts files for
that get reorganized a bit to better deal with all variants.
Another board based on SM1 that gets added is the SEI Robotics
SEI610.
- There are a handful of new x86 and Power9 server boards using
Aspeed BMC chips that are gaining support for running Linux on the
BMC through the OpenBMC project: Facebook
Minipack/Wedge100/Wedge40, Lenovo Hr855xg2, and Mihawk. Notably
these are still new machines using SoCs based on the ARM9 and ARM11
CPU cores, as support for the new Cortex-A7 based AST2600 is still
ramping up.
- There are three new end-user products using 32-bit Rockchips SoCs:
Mecer Xtreme Mini S6 is an Android "mini PC" box based on the
low-end RK3229 chip, while the two AOpen products Chromebox Mini
(Fievel) and Chromebase Mini (Tiger) run ChromeOS and are meant for
commercial settings (digital signage, PoS, ...).
- One more single-board computer based on the popular 64-bit RK3399
is added: the Leez RK3399 P710"
* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (467 commits)
arm64: dts: qcom: Add Lenovo Yoga C630
ARM: dts: aspeed-g5: Fixe gpio-ranges upper limit
ARM; dts: aspeed: mihawk: File should not be executable
ARM: dts: aspeed: swift: Change power supplies to version 2
ARM: dts: aspeed: vesnin: Add secondary SPI flash chip
ARM: dts: aspeed: vesnin: Add wdt2 with alt-boot option
ARM: dts: aspeed-g4: Add all flash chips
ARM: dts: exynos: Enable GPU/Mali T604 on Arndale board
ARM: dts: exynos: Enable GPU/Mali T604 on Chromebook Snow
ARM: dts: exynos: Add GPU/Mali T604 node to Exynos5250
ARM: dts: exynos: Fix min/max buck4 for GPU on Arndale board
ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
ARM: dts: exynos: Remove not accurate secondary ADC compatible
arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
arm64: dts: meson-sm1-sei610: add stdout-path property back
arm64: dts: meson-sm1-sei610: enable DVFS
arm64: dts: khadas-vim3: add support for the SM1 based VIM3L
dt-bindings: arm: amlogic: add Amlogic SM1 based Khadas VIM3L bindings
arm64: dts: khadas-vim3: move common nodes into meson-khadas-vim3.dtsi
arm64: dts: meson: g12a: add reset to tdm formatters
...
Linus Torvalds [Mon, 16 Sep 2019 22:55:06 +0000 (15:55 -0700)]
Merge tag 'armsoc-defconfig' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC defconfig updates from Arnd Bergmann:
"As usual, a bunch of commits, mostly adding drivers and other options
to defconfigs after the code was merged through another tree"
* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
arm64: defconfig: Enable Qualcomm QUSB2 PHY
arm64: defconfig: Enable the EFI Framebuffer
arm64: defconfig: Enable Qualcomm GENI based I2C controller
ARM: multi_v7_defconfig: Make MAX77802 regulator driver built-in
arm64: defconfig: Enable CPU clock drivers for Qualcomm msm8916
arm64: defconfig: Add DRM_MSM to defconfigs with ARCH_QCOM
ARM: multi_v7_defconfig: Add DRM_MSM to defconfigs with ARCH_QCOM
ARM: qcom_defconfig: Add DRM_MSM to defconfigs with ARCH_QCOM
ARM: configs: aspeed_g5: Enable AST2600
ARM: configs: multi_v7: Add ASPEED G6
arm64: defconfig: Enable SM8150 GCC and pinctrl driver
arm64: defconfig: Enable CONFIG_ACPI_APEI_PCIEAER
arm64: defconfig: Enable the DesignWare watchdog
ARM: multi_v7_defconfig: Enable SPI_STM32_QSPI support
ARM: imx_v6_v7_defconfig: Enable the PSCI CPUidle driver
arm64: defconfig: Enable the PSCI CPUidle driver
arm64: defconfig: Enable Sun4i SPDIF module
ARM: exynos_defconfig: Enable AHCI-platform SATA driver
arm64: defconfig: CONFIG_DRM_ETNAVIV=m
ARM: imx_v6_v7_defconfig: Select the OV5645 camera driver
...
Linus Torvalds [Mon, 16 Sep 2019 22:52:38 +0000 (15:52 -0700)]
Merge tag 'armsoc-drivers' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"This contains driver changes that are tightly connected to SoC
specific code. Aside from smaller cleanups and bug fixes, here is a
list of the notable changes.
New device drivers:
- The Turris Mox router has a new "moxtet" bus driver for its
on-board pluggable extension bus. The same platform also gains a
firmware driver.
- The Samsung Exynos family gains a new Chipid driver exporting
information using the soc device sysfs interface
- A similar socinfo driver for Qualcomm Snapdragon chips.
- A firmware driver for the NXP i.MX DSP IPC protocol using shared
memory and a mailbox
Other changes:
- The i.MX reset controller driver now supports the NXP i.MX8MM chip
- Amlogic SoC specific drivers gain support for the S905X3 and A311D
chips
- A rework of the TI Davinci framebuffer driver to allow important
cleanups in the platform code
- A couple of device drivers for removed ARM SoC platforms are
removed. Most of the removals were picked up by other maintainers,
this contains whatever was left"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (123 commits)
bus: uniphier-system-bus: use devm_platform_ioremap_resource()
soc: ti: ti_sci_pm_domains: Add support for exclusive and shared access
dt-bindings: ti_sci_pm_domains: Add support for exclusive and shared access
firmware: ti_sci: Allow for device shared and exclusive requests
bus: imx-weim: remove incorrect __init annotations
fbdev: remove w90x900/nuc900 platform drivers
spi: remove w90x900 driver
net: remove w90p910-ether driver
net: remove ks8695 driver
firmware: turris-mox-rwtm: Add sysfs documentation
firmware: Add Turris Mox rWTM firmware driver
dt-bindings: firmware: Document cznic,turris-mox-rwtm binding
bus: moxtet: fix unsigned comparison to less than zero
bus: moxtet: remove set but not used variable 'dummy'
ARM: scoop: Use the right include
dt-bindings: power: add Amlogic Everything-Else power domains bindings
soc: amlogic: Add support for Everything-Else power domains controller
fbdev: da8xx: use resource management for dma
fbdev: da8xx-fb: drop a redundant if
fbdev: da8xx-fb: use devm_platform_ioremap_resource()
...
Linus Torvalds [Mon, 16 Sep 2019 22:48:14 +0000 (15:48 -0700)]
Merge tag 'armsoc-soc' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC platform updates from Arnd Bergmann:
"The main change this time around is a cleanup of some of the oldest
platforms based on the XScale and ARM9 CPU cores, which are between 10
and 20 years old.
The Kendin/Micrel/Microchip KS8695, Winbond/Nuvoton W90x900 and Intel
IOP33x/IOP13xx platforms are removed after we determined that nobody
is using them any more.
The TI Davinci and NXP LPC32xx platforms on the other hand are still
in active use and are converted to the ARCH_MULTIPLATFORM build,
meaning that we can compile a kernel that works on these along with
most other ARMv5 platforms. Changes toward that goal are also merged
for IOP32x, but additional work is needed to complete this. Patches
for the remaining ARMv5 platforms have started but need more work and
some testing.
Support for the new ASpeed AST2600 gets added, this is based on the
Cortex-A7 ARMv7 core, and is a newer version of the existing ARMv5 and
ARMv6 chips in the same family.
Other changes include a cleanup of the ST-Ericsson ux500 platform and
the move of the TI Davinci platform to a new clocksource driver"
[ The changes had marked INTEL_IOP_ADMA and USB_LPC32XX as being
buildable on other platforms through COMPILE_TEST, but that causes new
warnings that I most definitely do not want to see during the merge
window as that could hide other issues.
So the COMPILE_TEST option got disabled for them again - Linus ]
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (61 commits)
ARM: multi_v5_defconfig: make DaVinci part of the ARM v5 multiplatform build
ARM: davinci: support multiplatform build for ARM v5
arm64: exynos: Enable exynos-chipid driver
ARM: OMAP2+: Delete an unnecessary kfree() call in omap_hsmmc_pdata_init()
ARM: OMAP2+: move platform-specific asm-offset.h to arch/arm/mach-omap2
ARM: davinci: dm646x: Fix a typo in the comment
ARM: davinci: dm646x: switch to using the clocksource driver
ARM: davinci: dm644x: switch to using the clocksource driver
ARM: aspeed: Enable SMP boot
ARM: aspeed: Add ASPEED AST2600 architecture
ARM: aspeed: Select timer in each SoC
dt-bindings: arm: cpus: Add ASPEED SMP
ARM: imx: stop adjusting ar8031 phy tx delay
mailmap: map old company name to new one @microchip.com
MAINTAINERS: at91: remove the TC entry
MAINTAINERS: at91: Collect all pinctrl/gpio drivers in same entry
ARM: at91: move platform-specific asm-offset.h to arch/arm/mach-at91
MAINTAINERS: Extend patterns for Samsung SoC, Security Subsystem and clock drivers
ARM: s3c64xx: squash samsung_usb_phy.h into setup-usb-phy.c
ARM: debug-ll: Add support for r7s9210
...
Linus Torvalds [Mon, 16 Sep 2019 22:38:31 +0000 (15:38 -0700)]
Merge branch 'parisc-5.4-1' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- Make the powerpc implementation for reading ELF files available as a
public kexec interface so it can be re-used on other architectures
(Sven)
- Implement kexec on parisc (Sven)
- Add kprobes on ftrace on parisc (Sven)
- Fix kernel crash with HSC-PCI cards based on card-mode Dino
- Add assembly implementations for memset, strlen, strcpy, strncpy and
strcat
- Some cleanups, documentation updates, warning fixes, ...
* 'parisc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (25 commits)
parisc: Have git ignore generated real2.S and firmware.c
parisc: Disable HP HSC-PCI Cards to prevent kernel crash
parisc: add support for kexec_file_load() syscall
parisc: wire up kexec_file_load syscall
parisc: add kexec syscall support
parisc: add __pdc_cpu_rendezvous()
kprobes/parisc: remove arch_kprobe_on_func_entry()
kexec_elf: support 32 bit ELF files
kexec_elf: remove unused variable in kexec_elf_load()
kexec_elf: remove Elf_Rel macro
kexec_elf: remove PURGATORY_STACK_SIZE
kexec_elf: remove parsing of section headers
kexec_elf: change order of elf_*_to_cpu() functions
kexec: add KEXEC_ELF
parisc: Save some bytes in dino driver
parisc: Drop comments which are already in pci.h
parisc: Convert eisa_enumerator to use pr_cont()
parisc: Avoid warning when loading hppb driver
parisc: speed up flush_tlb_all_local with qemu
parisc: Add ALTERNATIVE_CODE() and ALT_COND_RUN_ON_QEMU
...
Linus Torvalds [Mon, 16 Sep 2019 22:32:01 +0000 (15:32 -0700)]
Merge tag 'please-pull-ia64_for_5.4' of git://git./linux/kernel/git/aegl/linux
Pull ia64 updates from Tony Luck:
"The big change here is removal of support for SGI Altix"
* tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits)
genirq: remove the is_affinity_mask_valid hook
ia64: remove CONFIG_SWIOTLB ifdefs
ia64: remove support for machvecs
ia64: move the screen_info setup to common code
ia64: move the ROOT_DEV setup to common code
ia64: rework iommu probing
ia64: remove the unused sn_coherency_id symbol
ia64: remove the SGI UV simulator support
ia64: remove the zx1 swiotlb machvec
ia64: remove CONFIG_ACPI ifdefs
ia64: remove CONFIG_PCI ifdefs
ia64: remove the hpsim platform
ia64: remove now unused machvec indirections
ia64: remove support for the SGI SN2 platform
drivers: remove the SGI SN2 IOC4 base support
drivers: remove the SGI SN2 IOC3 base support
qla2xxx: remove SGI SN2 support
qla1280: remove SGI SN2 support
misc/sgi-xp: remove SGI SN2 support
char/mspec: remove SGI SN2 support
...
Linus Torvalds [Mon, 16 Sep 2019 22:29:34 +0000 (15:29 -0700)]
Merge tag 'riscv/for-v5.4-rc1' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
"Add the following new features:
- Generic CPU topology description support for DT-based platforms,
including ARM64, ARM and RISC-V.
- Sparsemem support
- Perf callchain support
- SiFive PLIC irqchip modifications, in preparation for M-mode Linux
and clean up the code base:
- Clean up chip-specific register (CSR) manipulation code, IPIs, TLB
flushing, and the RISC-V CPU-local timer code
- Kbuild cleanup from one of the Kbuild maintainers"
[ The CPU topology parts came in through the arm64 tree with a shared
branch - Linus ]
* tag 'riscv/for-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
irqchip/sifive-plic: set max threshold for ignored handlers
riscv: move the TLB flush logic out of line
riscv: don't use the rdtime(h) pseudo-instructions
riscv: cleanup riscv_cpuid_to_hartid_mask
riscv: optimize send_ipi_single
riscv: cleanup send_ipi_mask
riscv: refactor the IPI code
riscv: Add support for libdw
riscv: Add support for perf registers sampling
riscv: Add perf callchain support
riscv: add arch/riscv/Kbuild
RISC-V: Implement sparsemem
riscv: Using CSR numbers to access CSRs
Linus Torvalds [Mon, 16 Sep 2019 22:28:12 +0000 (15:28 -0700)]
Merge tag 'm68k-for-v5.4-tag1' of git://git./linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- ioremap() cleanups
- defconfig updates
- small fixes and cleanups
* tag 'm68k-for-v5.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Remove ioremap_fullcache()
m68k: Simplify ioremap_nocache()
m68k: defconfig: Update defconfigs for v5.3-rc2
m68k: atari: Rename shifter to shifter_st to avoid conflict
m68k: Prevent some compiler warnings in Coldfire builds
m68k: mac: Revisit floppy disc controller base addresses
Linus Torvalds [Mon, 16 Sep 2019 21:31:40 +0000 (14:31 -0700)]
Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Although there isn't tonnes of code in terms of line count, there are
a fair few headline features which I've noted both in the tag and also
in the merge commits when I pulled everything together.
The part I'm most pleased with is that we had 35 contributors this
time around, which feels like a big jump from the usual small group of
core arm64 arch developers. Hopefully they all enjoyed it so much that
they'll continue to contribute, but we'll see.
It's probably worth highlighting that we've pulled in a branch from
the risc-v folks which moves our CPU topology code out to where it can
be shared with others.
Summary:
- 52-bit virtual addressing in the kernel
- New ABI to allow tagged user pointers to be dereferenced by
syscalls
- Early RNG seeding by the bootloader
- Improve robustness of SMP boot
- Fix TLB invalidation in light of recent architectural
clarifications
- Support for i.MX8 DDR PMU
- Remove direct LSE instruction patching in favour of static keys
- Function error injection using kprobes
- Support for the PPTT "thread" flag introduced by ACPI 6.3
- Move PSCI idle code into proper cpuidle driver
- Relaxation of implicit I/O memory barriers
- Build with RELR relocations when toolchain supports them
- Numerous cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits)
arm64: remove __iounmap
arm64: atomics: Use K constraint when toolchain appears to support it
arm64: atomics: Undefine internal macros after use
arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
arm64: asm: Kill 'asm/atomic_arch.h'
arm64: lse: Remove unused 'alt_lse' assembly macro
arm64: atomics: Remove atomic_ll_sc compilation unit
arm64: avoid using hard-coded registers for LSE atomics
arm64: atomics: avoid out-of-line ll/sc atomics
arm64: Use correct ll/sc atomic constraints
jump_label: Don't warn on __exit jump entries
docs/perf: Add documentation for the i.MX8 DDR PMU
perf/imx_ddr: Add support for AXI ID filtering
arm64: kpti: ensure patched kernel text is fetched from PoU
arm64: fix fixmap copy for 16K pages and 48-bit VA
perf/smmuv3: Validate groups for global filtering
perf/smmuv3: Validate group size
arm64: Relax Documentation/arm64/tagged-pointers.rst
arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F
arm64: mm: Ignore spurious translation faults taken from the kernel
...
Linus Torvalds [Mon, 16 Sep 2019 21:14:40 +0000 (14:14 -0700)]
Merge tag 'iommu-updates-v5.4' of git://git./linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- batched unmap support for the IOMMU-API
- support for unlocked command queueing in the ARM-SMMU driver
- rework the ATS support in the ARM-SMMU driver
- more refactoring in the ARM-SMMU driver to support hardware
implementation-specific quirks and errata
- bounce buffering DMA-API implementation in the Intel VT-d driver
for untrusted devices (like Thunderbolt devices)
- fixes for runtime PM support in the OMAP iommu driver
- MT8183 IOMMU support in the Mediatek IOMMU driver
- rework of the way the IOMMU core sets the default domain type for
groups. Changing the default domain type on x86 does not require two
kernel parameters anymore.
- more smaller fixes and cleanups
* tag 'iommu-updates-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (113 commits)
iommu/vt-d: Declare Broadwell igfx dmar support snafu
iommu/vt-d: Add Scalable Mode fault information
iommu/vt-d: Use bounce buffer for untrusted devices
iommu/vt-d: Add trace events for device dma map/unmap
iommu/vt-d: Don't switch off swiotlb if bounce page is used
iommu/vt-d: Check whether device requires bounce buffer
swiotlb: Split size parameter to map/unmap APIs
iommu/omap: Mark pm functions __maybe_unused
iommu/ipmmu-vmsa: Disable cache snoop transactions on R-Car Gen3
iommu/ipmmu-vmsa: Move IMTTBCR_SL0_TWOBIT_* to restore sort order
iommu: Don't use sme_active() in generic code
iommu/arm-smmu-v3: Fix build error without CONFIG_PCI_ATS
iommu/qcom: Use struct_size() helper
iommu: Remove wrong default domain comments
iommu/dma: Fix for dereferencing before null checking
iommu/mediatek: Clean up struct mtk_smi_iommu
memory: mtk-smi: Get rid of need_larbid
iommu/mediatek: Fix VLD_PA_RNG register backup when suspend
memory: mtk-smi: Add bus_sel for mt8183
memory: mtk-smi: Invoke pm runtime_callback to enable clocks
...
Linus Torvalds [Mon, 16 Sep 2019 21:06:50 +0000 (14:06 -0700)]
Merge tag 'gpio-v5.4-1' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of changes in the GPIO subsystem for the v5.4 kernel
cycle.
Core changes:
- Support hierarchical GPIO irqchips.
We now have three consumers that can use this: Intel IXP4xx,
ThunderX and Qualcomm SPMI GPIO (in the pinctrl subsystem).
The support code has been long in the making and hashed out so it
should be easily adaptable for all hierarchical irqchip parents.
The code only gets compiled in if hierarchical irqchip is used at
the topmost irq controller at least, as the hierarchical irqchip
requires strict hierarchy all the way up in the system.
- Determine the need for a "valid_mask" for GPIO lines on the
gpio_chip and conversely for the "valid_mask" for the GPIO
interrupt chip interrupt lines by looking for a .init_valid_mask()
callback in the main chip or GPIO interrupt chip respectively.
Allocate it with bitmap_alloc().
- Isolate the device tree/open firmware GPIO description code out in
its own file properly.
- Isolate the ACPI GPIO description code out in its own file
properly.
- Drop a whole lot of #ifdef:s in the main includes: it does not hurt
to keep the include items around, and we get quicker and clearer
compile failures if the appropriate kernel symbols are not selected
for drivers.
New/deleted drivers:
- New driver for Aspeed SGPIO.
- The KS8695 driver is deleted as the platform gets deleted from
arch/arm in this kernel cycle.
- The Cirrus Logic Madera driver now supports CS47L92 and CS47L15.
- The Freescale MPC8xxx now supports LS1028A and LS1088A.
Driver improvements:
- We pass the GPIO irqchip initialization by directly filling in the
struct instead of using set-up functions (the new way) for Intel
MID, Lynxpoint, Merrifield, XLP, HLWD, Aspeed, ZX, VF610, TQMX86,
MT7621, Zynq and EP93xx.
Out-of-band changes:
- Fix a GPIO header inclusion in Unicore - no response from
maintainer.
- Drop FMC subsystem from MAINTAINERS - was deleted in the GPIO tree
last cycle so let's mop up the shards"
* tag 'gpio-v5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (82 commits)
gpiolib: of: add a fallback for wlf,reset GPIO name
gpio: htc-egpio: Remove unused exported htc_egpio_get_wakeup_irq()
gpio: remove explicit comparison with 0
gpio: creg-snps: use devm_platform_ioremap_resource() to simplify code
gpio: devres: Switch to EXPORT_SYMBOL_GPL()
gpio: of: Switch to EXPORT_SYMBOL_GPL()
gpio: of: Make of_gpio_simple_xlate() private
gpio: of: Make of_get_named_gpiod_flags() private
gpio: aspeed: Add in ast2600 details to Aspeed driver
gpio: aspeed: Use ngpio property from device tree if available
gpio: aspeed: Setup irqchip dynamically
gpio/aspeed: Fix incorrect number of banks
gpio: aspeed: Update documentation with ast2600 controllers
gpio: Initialize the irqchip valid_mask with a callback
gpiolib: acpi: make acpi_can_fallback_to_crs() static
gpio: Fix further merge errors
gpio: Fix up merge collision in include file
gpio: of: Normalize return code variable name
gpio: gpiolib: Normalize return code variable name
gpio: ep93xx: Pass irqchip when adding gpiochip
...
Linus Torvalds [Mon, 16 Sep 2019 21:04:46 +0000 (14:04 -0700)]
Merge tag 'i3c/for-5.4' of git://git./linux/kernel/git/i3c/linux
Pull i3c updates from Boris Brezillon:
"Core changes:
- Export i3c_device_match_id() so driver can get per-device data
- Add addr and lvr fields to i2c_dev_desc so we can attach I2C
devices that are not described in the DT
- Add a missing of_node_put()
- Fix a memory leak
- Use dev_to_i3cmaster() instead of open-coding it
Driver changes:
- Use for_each_set_bit() in the Cadence driver"
* tag 'i3c/for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: Use dev_to_i3cmaster()
i3c: master: fix a memory leak bug
i3c: add addr and lvr to i2c_dev_desc structure
i3c: master: cdns: Use for_each_set_bit()
i3c: master: Add of_node_put() before return
i3c: move i3c_device_match_id to device.c and export it
Linus Torvalds [Mon, 16 Sep 2019 21:02:43 +0000 (14:02 -0700)]
Merge tag 'spi-v5.4' of git://git./linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"The big theme for this release has been performance, we've had a
series of unrelated overhauls of a few drivers all with a big
performance component.
Otherwise it's been relatively quiet, highlights include:
- A big overhaul of the spi-fsl-dspi driver improving the code
quality, performance and stability from Vladimir Oltean.
- A big performance enhancement for the bcm2835 (Raspberry Pi) driver
for unidirectional transfers from Lukas Wunner.
- Improved performance on small transfers for the uniphier driver
from Keiji Hayashibara.
- Lots of coccinelle generated cleanups from Yue Haibing.
- New device support for Freescale ls2080a and Nuvoton NPCM FIU"
* tag 'spi-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (102 commits)
spi: mediatek: support large PA
spi: mediatek: add spi support for mt6765 IC
dt-bindings: spi: update bindings for MT6765 SoC
spi: bcm2835: Speed up RX-only DMA transfers by zero-filling TX FIFO
spi: bcm2835: Speed up TX-only DMA transfers by clearing RX FIFO
dmaengine: bcm2835: Avoid accessing memory when copying zeroes
spi: bcm2835: Cache CS register value for ->prepare_message()
dmaengine: bcm2835: Document struct bcm2835_dmadev
spi: Guarantee cacheline alignment of driver-private data
dmaengine: bcm2835: Allow reusable descriptors
dmaengine: bcm2835: Allow cyclic transactions without interrupt
spi: bcm2835: Drop dma_pending flag
spi: bcm2835: Work around DONE bit erratum
spi-gpio: Use PTR_ERR_OR_ZERO() in spi_gpio_request()
spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages
spi: npcm-fiu: remove set but not used variable 'retlen'
spi: fsl-spi: use devm_platform_ioremap_resource() to simplify code
spi: zynq-qspi: use devm_platform_ioremap_resource() to simplify code
spi: zynqmp: use devm_platform_ioremap_resource() to simplify code
spi: xlp: use devm_platform_ioremap_resource() to simplify code
...
Linus Torvalds [Mon, 16 Sep 2019 20:58:43 +0000 (13:58 -0700)]
Merge tag 'regulator-v5.4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"A small update for the regulator API for this cycle, some small fixes
and a bunch of new devices but none of them very big.
The most stand out thing is the regulator-fixed-clock driver which is
for regulators where the enable control is done by using a clock
instead of a GPIO or register write, a novel hardware design that had
not previously come up.
Summary:
- Added a keyword pattern for regulator_get_optional() since usage of
that API generally needs extra review.
- Operating mode and suspend state support for act8865.
- New device support for Active Semiconductor ACT8600 chargers,
Mediatek MT6358, Qualcomm SM8150, regulator-fixed-clock, and
Synoptics SY20276, SY20278 and SY8824E"
* tag 'regulator-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (52 commits)
regulator: core: Fix error return for /sys access
regulator: da9211: fix obtaining "enable" GPIO
regulator: max77686: fix obtaining "maxim,ena" GPIO
regulator: uniphier: Add Pro5 USB3 VBUS support
dt-bindings: regulator: add regulator-fixed-clock binding
regulator: fixed: add possibility to enable by clock
regulator: s2mps11: Consistently use local variable
regulator: lp87565: Simplify lp87565_buck_set_ramp_delay
regulator: slg51000: use devm_gpiod_get_optional() in probe
regulator: lp8788-ldo: make array en_mask static const, makes object smaller
regulator: tps65132: Stop parsing DT when gpio is not found
regulator: Defer init completion for a while after late_initcall
regulator: add missing 'static inline' to a helper's stub
regulator: provide regulator_bulk_set_supply_names()
MAINTAINERS: Add keyword pattern on regulator_get_optional()
regulator: sy8824x: add prefixes to BUCK_EN and MODE macros
regulator: sy8824x: use c++style for the comment block near SPDX
regulator: mt6358: Add BROKEN dependency while waiting for MFD to merge
regulator: mt6358: Add support for MT6358 regulator
regulator: Add document for MT6358 regulator
...
Linus Torvalds [Mon, 16 Sep 2019 20:57:02 +0000 (13:57 -0700)]
Merge tag 'regmap-v5.4' of git://git./linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"Only two changes for this release, one fix for error handling with
runtime PM and a change from Greg removing error handling from debugfs
API calls now that they implement user visible error reporting"
* tag 'regmap-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap-irq: Correct error paths in regmap_irq_thread for pm_runtime
regmap: no need to check return value of debugfs_create functions
Linus Torvalds [Mon, 16 Sep 2019 20:44:16 +0000 (13:44 -0700)]
Merge tag 'hwmon-for-v5.4' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New drivers:
- Inspur Power System power supply driver
- Synaptics AS370 PVT sensor driver
Chip support:
- support SHTC3 in shtc1 driver
- support NCT6116 in nct6775 driver
- support AMD family 17h, model 70h CPUs in k10temp driver
- support PCT2075 in lm75 driver
Removed drivers:
- ads1015 driver (now supported in iio)
Other changes:
- Convert drivers to use devm_i2c_new_dummy_device
- Substantial structural improvements in lm75 driver adding support
for writing sample interval for supported chips
- Add support for PSU version 2 to ibm-cffps driver
- Add support for power attribute to iio_hwmon bridge
- Add support for additional fan, voltage and temperature attributes
to nct7904 driver
- Convert adt7475 driver to use hwmon_device_register_with_groups()
- Convert k8temp driver to use hwmon_device_register_with_info()
- Various other improvements and minor fixes"
* tag 'hwmon-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (48 commits)
hwmon: submitting-patches: Add note on comment style
hwmon: submitting-patches: Point to with_info API
hwmon: (nct7904) Fix incorrect SMI status register setting of LTD temperature and fan.
hwmon: (shtc1) add support for the SHTC3 sensor
hwmon: (shtc1) fix shtc1 and shtw1 id mask
hwmon: (lm75) Aproximate sample times to data-sheet values
hwmon: (w83793d) convert to use devm_i2c_new_dummy_device
hwmon: (w83792d) convert to use devm_i2c_new_dummy_device
hwmon: (w83791d) convert to use devm_i2c_new_dummy_device
hwmon: (as370-hwmon) fix devm_platform_ioremap_resource.cocci warnings
hwmon: (lm75) Add support for writing sampling period on PCT2075
hwmon: (lm75) Add support for writing conversion time for TMP112
hwmon: (lm75) Move updating the sample interval to its own function
hwmon: (lm75) Support configuring the sample time for various chips
hwmon: (nct7904) Fix incorrect temperature limitation register setting of LTD.
hwmon: (as370-hwmon) Add DT bindings for Synaptics AS370 PVT
hwmon: Add Synaptics AS370 PVT sensor driver
pmbus: (ibm-cffps) Add support for version 2 of the PSU
dt-bindings: hwmon: Document ibm,cffps2 compatible string
hwmon: (iio_hwmon) Enable power exporting from IIO
...
Linus Torvalds [Mon, 16 Sep 2019 20:42:25 +0000 (13:42 -0700)]
Merge branch 'ras-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
"The latest meager RAS updates:
- Enable processing of action-optional MCEs which have the Overflow
bit set (Tony Luck)
- -Wmissing-prototypes warning fix and a build fix (Valdis
Klētnieks)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
RAS: Build debugfs.o only when enabled in Kconfig
RAS: Fix prototype warnings
x86/mce: Don't check for the overflow bit on action optional machine checks
Linus Torvalds [Mon, 16 Sep 2019 20:38:45 +0000 (13:38 -0700)]
Merge tag 'edac_for_5.4' of git://git./linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
"The new thing this time around is that we have three maintainers now
and a new, old repo. New because it is new for the EDAC tree which is
hosted there from now on and old because it is Tony's and my old
RAS repo which we still use occasionally when the stuff isn't in tip.
Summary:
- EDAC tree has three maintainers and one new designated reviewer
now, so that the work can scale better.
- New driver for Mellanox' BlueField SoC DDR controller (Shravan
Kumar Ramani)
- AMD Rome support in amd64_edac (Yazen Ghannam and Isaac Vaughn)
- Misc fixes, cleanups and code improvements"
* tag 'edac_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Add PCI device IDs for family 17h, model 70h
MAINTAINERS: Add Robert as a EDAC reviewer
EDAC/mc_sysfs: Make debug messages consistent
EDAC/mc_sysfs: Remove pointless gotos
EDAC: Prefer 'unsigned int' to bare use of 'unsigned'
EDAC/amd64: Support asymmetric dual-rank DIMMs
EDAC/amd64: Cache secondary Chip Select registers
EDAC/amd64: Decode syndrome before translating address
EDAC/amd64: Find Chip Select memory size using Address Mask
EDAC/amd64: Initialize DIMM info for systems with more than two channels
EDAC/amd64: Recognize DRAM device type ECC capability
EDAC/amd64: Support more than two controllers for chip selects handling
EDAC/mc: Cleanup _edac_mc_free() code
EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
EDAC, mellanox: Add ECC support for BlueField DDR4
EDAC/altera: Use the proper type for the IRQ status bits
EDAC/mc: Fix grain_bits calculation
edac: altera: Move Stratix10 SDRAM ECC to peripheral
MAINTAINERS: update EDAC entry to reflect current tree and maintainers
Linus Torvalds [Mon, 16 Sep 2019 20:34:04 +0000 (13:34 -0700)]
Merge tag 'tpmdd-next-20190902' of git://git.infradead.org/users/jjs/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"A new driver for fTPM living inside ARM TEE was added this round.
In addition to that, there are three bug fixes and one clean up"
* tag 'tpmdd-next-20190902' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm/tpm_ftpm_tee: Document fTPM TEE driver
tpm/tpm_ftpm_tee: A driver for firmware TPM running inside TEE
tpm: Remove a deprecated comments about implicit sysfs locking
tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts
tpm_tis_core: Turn on the TPM before probing IRQ's
MAINTAINERS: fix style in KEYS-TRUSTED entry
Linus Torvalds [Mon, 16 Sep 2019 16:28:19 +0000 (09:28 -0700)]
Merge tag 'core-process-v5.4' of git://git./linux/kernel/git/brauner/linux
Pull pidfd/waitid updates from Christian Brauner:
"This contains two features and various tests.
First, it adds support for waiting on process through pidfds by adding
the P_PIDFD type to the waitid() syscall. This completes the basic
functionality of the pidfd api (cf. [1]). In the meantime we also have
a new addition to the userspace projects that make use of the pidfd
api. The qt project was nice enough to send a mail pointing out that
they have a pr up to switch to the pidfd api (cf. [2]).
Second, this tag contains an extension to the waitid() syscall to make
it possible to wait on the current process group in a race free manner
(even though the actual problem is very unlikely) by specifying 0
together with the P_PGID type. This extension traces back to a
discussion on the glibc development mailing list.
There are also a range of tests for the features above. Additionally,
the test-suite which detected the pidfd-polling race we fixed in [3]
is included in this tag"
[1] https://lwn.net/Articles/794707/
[2] https://codereview.qt-project.org/c/qt/qtbase/+/108456
[3] commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state")
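The P_PGID extension described above can be illustrated with a small user-space sketch. It assumes a kernel that accepts an id of 0 with P_PGID (the behaviour this tag adds); the helper name wait_in_own_pgrp() is hypothetical, and the fallback path only exists so the child is still reaped on kernels that reject the call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `code`, then reap it by waiting on the
 * caller's current process group: waitid(P_PGID, 0, ...).
 * Returns the child's exit status, or -errno if the wait failed
 * (e.g. -EINVAL on kernels without this extension).
 */
int wait_in_own_pgrp(int code)
{
	siginfo_t info;
	pid_t pid = fork();

	if (pid < 0)
		return -errno;
	if (pid == 0)
		_exit(code);	/* child: inherits the parent's process group */

	memset(&info, 0, sizeof(info));
	if (waitid(P_PGID, 0, &info, WEXITED) < 0) {
		int err = errno;

		/* Fallback so the child does not linger as a zombie. */
		waitpid(pid, NULL, 0);
		return -err;
	}

	assert(info.si_code == CLD_EXITED);
	return info.si_status;
}
```

Passing 0 avoids the getpgrp()-then-waitid() sequence, in which the process group could change between the two calls.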
* tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
waitid: Add support for waiting for the current process group
tests: add pidfd poll tests
tests: move common definitions and functions into pidfd.h
pidfd: add pidfd_wait tests
pidfd: add P_PIDFD to waitid()
Ingo Molnar [Mon, 16 Sep 2019 12:04:28 +0000 (14:04 +0200)]
Merge branch 'sched/rt' into sched/core, to pick up -rt changes
Pick up the first couple of patches working towards PREEMPT_RT.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 15 Sep 2019 21:19:32 +0000 (14:19 -0700)]
Linux 5.3
Linus Torvalds [Sun, 15 Sep 2019 19:32:03 +0000 (12:32 -0700)]
Revert "ext4: make __ext4_get_inode_loc plug"
This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7.
This is sad, and done for all the wrong reasons. Because that commit is
good, and does exactly what it says: avoids a lot of small disk requests
for the inode table read-ahead.
However, it turns out that it causes an entirely unrelated problem: the
getrandom() system call was introduced back in 2014 by commit
c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people
use it as a convenient source of good random numbers.
But part of the current semantics for getrandom() is that it waits for
the entropy pool to fill at least partially (unlike /dev/urandom). And
at least ArchLinux apparently has a systemd that uses getrandom() at
boot time, and the improvements in IO patterns mean that existing
installations suddenly start hanging, waiting for entropy that will
never happen.
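As a caller-side illustration of the blocking behaviour described above (not the kernel fix under discussion), the sketch below asks getrandom() for bytes with GRND_NONBLOCK, so that an uninitialised entropy pool yields EAGAIN rather than a hang. The helper name try_random_bytes() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Try to fetch `len` random bytes without ever blocking.
 * Returns 1 if the buffer was filled, 0 if the entropy pool is not yet
 * initialised (the case in which a plain getrandom() call would wait),
 * and -1 on any other error.
 */
int try_random_bytes(void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	assert(len > 0 && len <= 256);	/* keep requests small so they are not cut short */

	n = getrandom(buf, len, GRND_NONBLOCK);
	if (n == (ssize_t)len)
		return 1;
	if (n < 0 && errno == EAGAIN)
		return 0;
	return -1;
}
```

A boot-time service could use the 0 result to defer seeding instead of stalling the rest of the boot.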
It seems to be an unlucky combination of not _quite_ enough entropy,
together with a particular systemd version and configuration. Lennart
says that the systemd-random-seed process (which is what does this early
access) is supposed to not block any other boot activity, but sadly that
doesn't actually seem to be the case (possibly due to bogus dependencies on
cryptsetup for encrypted swapspace).
The correct fix is to fix getrandom() to not block when it's not
appropriate, but that fix is going to take a lot more discussion. Do we
just make it act like /dev/urandom by default, and add a new flag for
"wait for entropy"? Do we add a boot-time option? Or do we just limit
the amount of time it will wait for entropy?
So in the meantime, we do the revert to give us time to discuss the
eventual fix for the fundamental problem, at which point we can re-apply
the ext4 inode table access optimization.
Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Sun, 15 Sep 2019 09:32:06 +0000 (10:32 +0100)]
Merge branch 'spi-5.4' into spi-next
Mark Brown [Sun, 15 Sep 2019 09:32:04 +0000 (10:32 +0100)]
Merge branch 'spi-5.3' into spi-linus
Linus Torvalds [Sat, 14 Sep 2019 23:07:40 +0000 (16:07 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main change here is a revert of reverts. We recently simplified
some code that was thought unnecessary; however, since then KVM has
grown quite a few cond_resched()s and for that reason the simplified
code is prone to livelocks: one CPU tries to empty a list of guest
page tables while the others keep adding to them. This adds back the
generation-based zapping of guest page tables, which was not
unnecessary after all.
On top of this, there is a fix for a kernel memory leak and a couple
of s390 fixlets as well"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
KVM: x86: work around leak of uninitialized stack contents
KVM: nVMX: handle page fault in vmread
KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
Linus Torvalds [Sat, 14 Sep 2019 23:02:49 +0000 (16:02 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute revert
The 32-bit build got broken by the latest defence in depth patch.
Revert and we'll try again in the next cycle"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "vhost: block speculation of translated descriptors"
Linus Torvalds [Sat, 14 Sep 2019 22:58:02 +0000 (15:58 -0700)]
Merge tag 'riscv/for-v5.3' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Paul Walmsley:
"Last week, Palmer and I learned that there was an error in the RISC-V
kernel image header format that could make it less compatible with the
ARM64 kernel image header format. I had missed this error during my
original reviews of the patch.
The kernel image header format is an interface that impacts
bootloaders, QEMU, and other user tools. Those packages must be
updated to align with whatever is merged in the kernel. We would like
to avoid proliferating these image formats by keeping the RISC-V
header as close as possible to the existing ARM64 header. Since the
arch/riscv patch that adds support for the image header was merged
with our v5.3-rc1 pull request as commit 0f327f2aaad6a ("RISC-V: Add
an Image header that boot loader can parse."), we think it wise to try
to fix this error before v5.3 is released.
The fix itself should be backwards-compatible with any project that
has already merged support for premature versions of this interface.
It primarily involves ensuring that the RISC-V image header has
something useful in the same field as the ARM64 image header"
* tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: modify the Image header to improve compatibility with the ARM64 header
Michael S. Tsirkin [Sat, 14 Sep 2019 19:21:51 +0000 (15:21 -0400)]
Revert "vhost: block speculation of translated descriptors"
This reverts commit a89db445fbd7f1f8457b03759aa7343fa530ef6b.
I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Sat, 14 Sep 2019 19:20:38 +0000 (12:20 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Don't corrupt xfrm_interface parms before validation, from Nicolas
Dichtel.
2) Revert use of usb-wakeup in btusb, from Mario Limonciello.
3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from
Leonardo Bras.
4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.
5) Missing ULP check in sock_map, from John Fastabend.
6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.
7) Fix length of SKB allocated in 6pack driver, from Christophe
JAILLET.
8) ip6_route_info_create() returns an error pointer, not NULL. From
Maciej Żenczykowski.
9) Only add RDS sock to the hashes after rs_transport is set, from
Ka-Cheong Poon.
10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.
11) Presence of transmit IPSEC offload in an SKB is not tested for
correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff
Kirsher.
12) Need rcu_barrier() when register_netdevice() takes one of the
notifier based failure paths, from Subash Abhinov Kasiviswanathan.
13) Fix leak in sctp_do_bind(), from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
cdc_ether: fix rndis support for Mediatek based smartphones
sctp: destroy bucket if failed to bind addr
sctp: remove redundant assignment when call sctp_get_port_local
sctp: change return type of sctp_get_port_local
ixgbevf: Fix secpath usage for IPsec Tx offload
sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
ixgbe: Fix secpath usage for IPsec TX offload.
net: qrtr: fix memort leak in qrtr_tun_write_iter
net: Fix null de-reference of device refcount
ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
tun: fix use-after-free when register netdev failed
tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
ixgbe: fix double clean of Tx descriptors with xdp
ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
mlx4: fix spelling mistake "veify" -> "verify"
net: hns3: fix spelling mistake "undeflow" -> "underflow"
net: lmc: fix spelling mistake "runnin" -> "running"
NFC: st95hf: fix spelling mistake "receieve" -> "receive"
net/rds: An rds_sock is added too early to the hash table
mac80211: Do not send Layer 2 Update frame before authorization
...
Linus Torvalds [Sat, 14 Sep 2019 19:08:19 +0000 (12:08 -0700)]
Merge tag 'mmc-v5.3-rc8' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- tmio: Fixup runtime PM management during probe and remove
- sdhci-pci-o2micro: Fix eMMC initialization for an AMD SoC
- bcm2835: Prevent lockups when terminating work
* tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: tmio: Fixup runtime PM management during remove
mmc: tmio: Fixup runtime PM management during probe
Revert "mmc: tmio: move runtime PM enablement to the driver implementations"
Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller"
Revert "mmc: bcm2835: Terminate timeout work synchronously"
Linus Torvalds [Sat, 14 Sep 2019 18:54:57 +0000 (11:54 -0700)]
Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"From the maintainer summit, just some last minute fixes for final:
lima:
- fix gem_wait ioctl
core:
- constify modes list
i915:
- DP MST high color depth regression
- GPU hangs on vulkan compute workloads"
* tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm:
drm/lima: fix lima_gem_wait() return value
drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+
drm/i915: Limit MST to <= 8bpc once again
drm/modes: Make the whitelist more const
Paolo Bonzini [Sat, 14 Sep 2019 07:25:30 +0000 (09:25 +0200)]
Merge tag 'kvm-s390-master-5.3-1' of git://git./linux/kernel/git/kvms390/linux into kvm-master
KVM: s390: Fixes for 5.3
- prevent a user triggerable oops in the migration code
- do not leak kernel stack content
Sean Christopherson [Fri, 13 Sep 2019 02:46:02 +0000 (19:46 -0700)]
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
James Harvey reported a livelock that was introduced by commit
d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").
The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule. The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (4e103134b8623, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.
There are three ways to fix the livelock:
- Reverting the revert (commit d012a06ab1d23) is not a viable option as
the revert is needed to fix a regression that occurs when the guest has
one or more assigned devices. It's unlikely we'll root cause the device
assignment regression soon enough to fix the regression in a timely manner.
- Remove the conditional reschedule from kvm_mmu_zap_all(). However, although
removing the reschedule would be a smaller code change, it's less safe
in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
the wild for flushing memslots since the fast invalidate mechanism was
introduced by commit 6ca18b6950f8d ("KVM: x86: use the fast way to
invalidate all pages"), back in 2013.
- Reintroduce the fast invalidate mechanism and use it when zapping shadow
pages in response to a memslot being deleted/moved, which is what this
patch does.
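To show why a generation-based scheme avoids the livelock, here is a deliberately simplified user-space model. It is not KVM's code: struct mmu, struct shadow_page and all function names are hypothetical. Bumping a generation counter marks every existing page obsolete in one step, and pages added afterwards carry the new generation, so the zapping pass can yield between batches and still terminate.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 8

struct shadow_page {
	int in_use;
	unsigned long gen;	/* generation the page was created in */
};

struct mmu {
	unsigned long valid_gen;	/* current generation */
	struct shadow_page pages[MAX_PAGES];
};

/* New pages are tagged with the current generation. Returns slot or -1. */
int mmu_add_page(struct mmu *m)
{
	assert(m != NULL);
	for (size_t i = 0; i < MAX_PAGES; i++) {
		if (!m->pages[i].in_use) {
			m->pages[i].in_use = 1;
			m->pages[i].gen = m->valid_gen;
			return (int)i;
		}
	}
	return -1;
}

/* Step 1: invalidate everything at once by bumping the generation. */
void mmu_fast_invalidate(struct mmu *m)
{
	m->valid_gen++;
}

/* Count pages that belong to an older generation. */
int mmu_count_obsolete(const struct mmu *m)
{
	int n = 0;

	for (size_t i = 0; i < MAX_PAGES; i++)
		if (m->pages[i].in_use && m->pages[i].gen != m->valid_gen)
			n++;
	return n;
}

/*
 * Step 2: free at most `budget` obsolete pages, then return so the caller
 * can drop the lock or reschedule. Pages added in the meantime are not
 * obsolete, so repeated calls always make progress towards zero.
 */
int mmu_zap_obsolete(struct mmu *m, int budget)
{
	int zapped = 0;

	for (size_t i = 0; i < MAX_PAGES && zapped < budget; i++) {
		struct shadow_page *p = &m->pages[i];

		if (p->in_use && p->gen != m->valid_gen) {
			p->in_use = 0;
			zapped++;
		}
	}
	return zapped;
}
```

In this model the amount of zapping work is fixed at invalidation time, whereas a "zap all pages" loop that yields sees its work grow whenever other vCPUs add pages.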
For all intents and purposes, this is a revert of commit ea145aacf4ae8
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit 7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit 5304b8d37c2a5 ("KVM:
MMU: fast invalidate all pages") and commit 6ca18b6950f8d ("KVM: x86:
use the fast way to invalidate all pages") respectively.
Fixes: d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fuqian Huang [Thu, 12 Sep 2019 04:18:17 +0000 (12:18 +0800)]
KVM: x86: work around leak of uninitialized stack contents
Emulation of VMPTRST can incorrectly inject a page fault
when passed an operand that points to an MMIO address.
The page fault will use uninitialized kernel stack memory
as the CR2 and error code.
The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just ensure
that the error code and CR2 are zero.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Cc: stable@vger.kernel.org
[add comment]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 13 Sep 2019 22:26:27 +0000 (00:26 +0200)]
KVM: nVMX: handle page fault in vmread
The implementation of vmread to memory is still incomplete, as it
lacks the ability to do vmread to I/O memory just like vmptrst.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paul Walmsley [Sat, 14 Sep 2019 01:35:50 +0000 (18:35 -0700)]
riscv: modify the Image header to improve compatibility with the ARM64 header
Part of the intention during the definition of the RISC-V kernel image
header was to lay the groundwork for a future merge with the ARM64
image header. One error during my original review was not noticing
that the RISC-V header's "magic" field was at a different size and
position than the ARM64's "magic" field. If the existing ARM64 Image
header parsing code were to attempt to parse an existing RISC-V kernel
image header format, it would see a magic number 0. This is
undesirable, since it's our intention to align as closely as possible
with the ARM64 header format. Another problem was that the original
"res3" field was not being initialized correctly to zero.
Address these issues by creating a 32-bit "magic2" field in the RISC-V
header which matches the ARM64 "magic" field. RISC-V binaries will
store "RSC\x05" in this field. The intention is that the use of the
existing 64-bit "magic" field in the RISC-V header will be deprecated
over time. Increment the minor version number of the file format to
indicate this change, and update the documentation accordingly. Fix
the assembler directives in head.S to ensure that reserved fields are
properly zero-initialized.
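A loader-side check for the new field might look like the sketch below. It assumes the 32-bit "magic2" field sits at byte offset 56 of the header, which is where the ARM64 boot documentation places its "magic" field; the constant and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed offset of the 32-bit magic, shared with the ARM64 Image header. */
#define IMAGE_MAGIC2_OFFSET 56

/* Value the commit message says RISC-V binaries store in "magic2". */
static const unsigned char riscv_magic2[4] = { 'R', 'S', 'C', 0x05 };

/* Return 1 if `hdr` (at least `len` bytes) carries the RISC-V magic2. */
int has_riscv_magic2(const unsigned char *hdr, size_t len)
{
	assert(hdr != NULL);

	if (len < IMAGE_MAGIC2_OFFSET + sizeof(riscv_magic2))
		return 0;
	return memcmp(hdr + IMAGE_MAGIC2_OFFSET, riscv_magic2,
		      sizeof(riscv_magic2)) == 0;
}
```

With the field in a shared position, a parser written for one header can tell the two image types apart instead of reading a magic of 0.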
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reported-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Karsten Merker <merker@debian.org>
Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
Bjørn Mork [Thu, 12 Sep 2019 08:42:00 +0000 (10:42 +0200)]
cdc_ether: fix rndis support for Mediatek based smartphones
A Mediatek based smartphone owner reports problems with USB
tethering in Linux. The verbose USB listing shows a rndis_host
interface pair (e0/01/03 + 10/00/00), but the driver fails to
bind with
[ 355.960428] usb 1-4: bad CDC descriptors
The problem is a failsafe test intended to filter out ACM serial
functions using the same 02/02/ff class/subclass/protocol as RNDIS.
The serial functions are recognized by their non-zero bmCapabilities.
No RNDIS function with non-zero bmCapabilities was known at the time
this failsafe was added. But it turns out that some Wireless class
RNDIS functions are using the bmCapabilities field. These functions
are uniquely identified as RNDIS by their class/subclass/protocol, so
the failing test can safely be disabled. The same applies to the two
types of Misc class RNDIS functions.
Applying the failsafe to Communication class functions only retains
the original functionality, and fixes the problem for the Mediatek based
smartphone.
Two examples of CDC functional descriptors with non-zero bmCapabilities
from Wireless class RNDIS functions are:
0e8d:000a Mediatek Crosscall Spider X5 3G Phone
  CDC Header:
    bcdCDC 1.10
  CDC ACM:
    bmCapabilities 0x0f
      connection notifications
      sends break
      line coding and serial state
      get/set/clear comm features
  CDC Union:
    bMasterInterface 0
    bSlaveInterface 1
  CDC Call Management:
    bmCapabilities 0x03
      call management
      use DataInterface
    bDataInterface 1
and
19d2:1023 ZTE K4201-z
  CDC Header:
    bcdCDC 1.10
  CDC ACM:
    bmCapabilities 0x02
      line coding and serial state
  CDC Call Management:
    bmCapabilities 0x03
      call management
      use DataInterface
    bDataInterface 1
  CDC Union:
    bMasterInterface 0
    bSlaveInterface 1
The Mediatek example is believed to apply to most smartphones with
Mediatek firmware. The ZTE example is most likely also part of a larger
family of devices/firmwares.
Suggested-by: Lars Melin <larsm17@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 13 Sep 2019 20:06:20 +0000 (22:06 +0200)]
Merge branch 'sctp_do_bind-leak'
Mao Wenan says:
====================
fix memory leak for sctp_do_bind
First two patches are to do cleanup, remove redundant assignment,
and change return type of sctp_get_port_local.
Third patch is to fix memory leak for sctp_do_bind if failed
to bind address.
v2: add one patch to change return type of sctp_get_port_local.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Thu, 12 Sep 2019 04:02:19 +0000 (12:02 +0800)]
sctp: destroy bucket if failed to bind addr
There is one memory leak bug report:
BUG: memory leak
unreferenced object 0xffff8881dc4c5ec0 (size 40):
comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s)
hex dump (first 32 bytes):
02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................
f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=.............
backtrace:
[<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp]
[<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp]
[<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp]
[<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6]
[<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647
[<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline]
[<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline]
[<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656
[<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
[<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
This happens in sctp_do_bind() when sctp_get_port_local() successfully
creates a hash bucket but sctp_add_bind_addr() then fails to bind the
address, e.g. by returning -ENOMEM. The allocated bucket is leaked, so
it needs to be destroyed on that error path.
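The error-path pattern can be shown with a small stand-alone model. None of these names are the SCTP functions: bucket_create(), add_bind_addr() and do_bind() are hypothetical stand-ins for the allocation, the step that can fail, and the caller that must clean up.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct bucket {
	int port;
};

int live_buckets;	/* number of buckets currently allocated */

struct bucket *bucket_create(int port)
{
	struct bucket *b = malloc(sizeof(*b));

	if (b) {
		b->port = port;
		live_buckets++;
	}
	return b;
}

void bucket_destroy(struct bucket *b)
{
	assert(b != NULL);
	free(b);
	live_buckets--;
}

/* Stand-in for the step that may fail after the bucket exists. */
static int add_bind_addr(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Bind to `port`; on success the bucket is handed back through `out`. */
int do_bind(int port, int fail_add, struct bucket **out)
{
	struct bucket *b = bucket_create(port);
	int ret;

	if (!b)
		return -ENOMEM;

	ret = add_bind_addr(fail_add);
	if (ret) {
		bucket_destroy(b);	/* without this the bucket leaks */
		return ret;
	}

	*out = b;
	return 0;
}
```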
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Thu, 12 Sep 2019 04:02:18 +0000 (12:02 +0800)]
sctp: remove redundant assignment when call sctp_get_port_local
There are superfluous parentheses in the if clause that calls
sctp_get_port_local() in sctp_do_bind(), and a redundant assignment to
'ret'. This patch cleans them up.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Thu, 12 Sep 2019 04:02:17 +0000 (12:02 +0800)]
sctp: change return type of sctp_get_port_local
Currently sctp_get_port_local() returns a long
which is either 0, 1 or a pointer cast to long.
Neither of the callers has used the return value since
commit 62208f12451f ("net: sctp: simplify sctp_get_port").
The two callers are now sctp_get_port() and sctp_do_bind().
They actually assumed that a cast to an int was the same as
a pointer cast to a long, and they don't save the return
value; they just check whether it is zero or non-zero, so
it is better to change the return type of
sctp_get_port_local() from long to int.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Thu, 12 Sep 2019 19:07:34 +0000 (12:07 -0700)]
ixgbevf: Fix secpath usage for IPsec Tx offload
Port the same fix for ixgbe to ixgbevf.
The ixgbevf driver currently does IPsec Tx offloading
based on an existing secpath. However, the secpath
can also come from the Rx side, in this case it is
misinterpreted for Tx offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for Tx offload.
CC: Shannon Nelson <snelson@pensando.io>
Fixes: 7f68d4306701 ("ixgbevf: enable VF IPsec offload operations")
Reported-by: Jonathan Tooker <jonathan@reliablehosting.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Fri, 13 Sep 2019 12:43:06 +0000 (05:43 -0700)]
hwmon: submitting-patches: Add note on comment style
Ask for standard multi-line comments, and ask for consistent
comment style.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Guenter Roeck [Fri, 13 Sep 2019 12:06:45 +0000 (05:06 -0700)]
hwmon: submitting-patches: Point to with_info API
New drivers should use devm_hwmon_device_register_with_info() or
hwmon_device_register_with_info() to register with the hwmon subsystem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Ulf Hansson [Fri, 13 Sep 2019 09:20:22 +0000 (11:20 +0200)]
mmc: tmio: Fixup runtime PM management during remove
Accessing the device when it may be runtime suspended is a bug, which is
the case in tmio_mmc_host_remove(). Let's fix the behaviour.
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Ulf Hansson [Fri, 13 Sep 2019 09:19:26 +0000 (11:19 +0200)]
mmc: tmio: Fixup runtime PM management during probe
tmio_mmc_host_probe() calls pm_runtime_set_active() to update the
runtime PM status of the device, so as to make it reflect the current status
of the HW. This works fine for most cases, but unfortunately not for all.
In particular, there is a generic problem when the device has a genpd attached
and that genpd has the ->start|stop() callbacks assigned.
More precisely, if the driver calls pm_runtime_set_active() during
->probe(), genpd does not get to invoke the ->start() callback for it,
which means the HW isn't really fully powered on. Furthermore, in the next
phase, when the device becomes runtime suspended, genpd will invoke the
->stop() callback for it, potentially leading to usage count imbalance
problems, depending on what's implemented behind the callbacks of course.
To fix this problem, convert to calling pm_runtime_get_sync() from
tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to
avoid bumping usage counters and unnecessarily re-initializing the HW the
first time the tmio driver's ->runtime_resume() callback is called,
introduce a state flag to keep track of this.
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Ulf Hansson [Fri, 13 Sep 2019 08:03:15 +0000 (10:03 +0200)]
Revert "mmc: tmio: move runtime PM enablement to the driver implementations"
This reverts commit 7ff213193310ef8d0ee5f04f79d791210787ac2c.
It turns out that the above commit introduces other problems. For example,
calling pm_runtime_set_active() must not be done prior to calling
pm_runtime_enable() as that makes it fail. This leads to additional
problems, such as clock enables being wrongly balanced.
Rather than fixing the problem on top, let's start over by doing a revert.
Fixes: 7ff213193310 ("mmc: tmio: move runtime PM enablement to the driver implementations")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
luhua.xu [Wed, 11 Sep 2019 09:55:31 +0000 (05:55 -0400)]
spi: mediatek: support large PA
Add spi large PA(max=64G) support for DMA transfer.
Signed-off-by: luhua.xu <luhua.xu@mediatek.com>
Link: https://lore.kernel.org/r/1568195731-3239-4-git-send-email-luhua.xu@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
luhua.xu [Wed, 11 Sep 2019 09:55:30 +0000 (05:55 -0400)]
spi: mediatek: add spi support for mt6765 IC
This patch adds spi support for mt6765 IC.
Signed-off-by: luhua.xu <luhua.xu@mediatek.com>
Link: https://lore.kernel.org/r/1568195731-3239-3-git-send-email-luhua.xu@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
luhua.xu [Wed, 11 Sep 2019 09:55:29 +0000 (05:55 -0400)]
dt-bindings: spi: update bindings for MT6765 SoC
Add DT binding documentation for the MT6765 SoC.
Signed-off-by: luhua.xu <luhua.xu@mediatek.com>
Link: https://lore.kernel.org/r/1568195731-3239-2-git-send-email-luhua.xu@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Linus Torvalds [Fri, 13 Sep 2019 08:52:01 +0000 (09:52 +0100)]
Merge branch 'for-5.3-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"Roman found and fixed a bug in the cgroup2 freezer which allows new
child cgroup to escape frozen state"
* 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: freezer: fix frozen state inheritance
kselftests: cgroup: add freezer mkdir test
Linus Torvalds [Fri, 13 Sep 2019 08:48:47 +0000 (09:48 +0100)]
Merge tag 'for-5.3-rc8-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Here are two fixes, one of them urgent fixing a bug introduced in 5.2
and reported by many users. It took time to identify the root cause,
catching the 5.3 release is highly desired also to push the fix to 5.2
stable tree.
The bug is a mess up of return values after adding proper error
handling and honestly the kind of bug that can cause sleeping
disorders until it's caught. My apologies to everybody who was
affected.
Summary of what could happen:
1) either a hang when committing a transaction, if this happens
there's no risk of corruption, still the hang is very inconvenient
and can't be resolved without a reboot
2) writeback for some btree nodes may never be started and we end up
committing a transaction without noticing that, this is really
serious and that will lead to the "parent transid verify failed"
messages"
* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
Btrfs: fix assertion failure during fsync and use of stale transaction
Miles Chen [Thu, 12 Sep 2019 10:34:52 +0000 (18:34 +0800)]
sched/psi: Correct overly pessimistic size calculation
When passing a string of 32 bytes or more to psi_write(),
psi_write() copies 31 bytes to its buf and overwrites buf[30]
with '\0', which makes the input string 1 byte shorter than
it should be.
Fix it by copying sizeof(buf) bytes when nbytes >= sizeof(buf).
This does not cause problems in normal use cases like:
"some 500000 10000000" or "full 500000 10000000" because they
are less than 32 bytes in length.
/* assuming nbytes == 35 */
char buf[32];
buf_size = min(nbytes, (sizeof(buf) - 1)); /* buf_size = 31 */
if (copy_from_user(buf, user_buf, buf_size))
return -EFAULT;
buf[buf_size - 1] = '\0'; /* buf[30] = '\0' */
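The off-by-one can be reproduced in user space with the sketch below. It mirrors the size calculation before and after the fix, with memcpy() standing in for copy_from_user(); the function names psi_copy_old() and psi_copy_new() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PSI_BUF_SIZE 32

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Old calculation: caps the copy at sizeof(buf) - 1, then drops one more. */
size_t psi_copy_old(char buf[PSI_BUF_SIZE], const char *user, size_t nbytes)
{
	size_t buf_size;

	assert(nbytes > 0);
	buf_size = min_size(nbytes, PSI_BUF_SIZE - 1);
	memcpy(buf, user, buf_size);
	buf[buf_size - 1] = '\0';
	return strlen(buf);	/* characters that survive for parsing */
}

/* New calculation: caps the copy at sizeof(buf). */
size_t psi_copy_new(char buf[PSI_BUF_SIZE], const char *user, size_t nbytes)
{
	size_t buf_size;

	assert(nbytes > 0);
	buf_size = min_size(nbytes, PSI_BUF_SIZE);
	memcpy(buf, user, buf_size);
	buf[buf_size - 1] = '\0';
	return strlen(buf);
}
```

For a 35-byte write the old version keeps 30 characters and the new one keeps 31, while a short write such as "some 500000 10000000\n" is handled identically by both.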
Before:
%cd /proc/pressure/
%echo "123456789|123456789|123456789|1234" > memory
[ 22.473497] nbytes=35,buf_size=31
[ 22.473775] 123456789|123456789|123456789| (print 30 chars)
%sh: write error: Invalid argument
%echo "123456789|123456789|123456789|1" > memory
[ 64.916162] nbytes=32,buf_size=31
[ 64.916331] 123456789|123456789|123456789| (print 30 chars)
%sh: write error: Invalid argument
After:
%cd /proc/pressure/
%echo "123456789|123456789|123456789|1234" > memory
[ 254.837863] nbytes=35,buf_size=32
[ 254.838541] 123456789|123456789|123456789|1 (print 31 chars)
%sh: write error: Invalid argument
%echo "123456789|123456789|123456789|1" > memory
[ 9965.714935] nbytes=32,buf_size=32
[ 9965.715096] 123456789|123456789|123456789|1 (print 31 chars)
%sh: write error: Invalid argument
Also remove the superfluous parentheses.
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Cc: <linux-mediatek@lists.infradead.org>
Cc: <wsd_upstream@mediatek.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190912103452.13281-1-miles.chen@mediatek.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Quentin Perret [Thu, 12 Sep 2019 09:44:04 +0000 (11:44 +0200)]
sched/fair: Speed-up energy-aware wake-ups
EAS computes the energy impact of migrating a waking task when deciding
on which CPU it should run. However, the current approach is known to
have a high algorithmic complexity, which can result in prohibitively
high wake-up latencies on systems with complex energy models, such as
systems with per-CPU DVFS. On such systems, the algorithm complexity is
in O(n^2) (ignoring the cost of searching for performance states in the
EM) with 'n' the number of CPUs.
To address this, re-factor the EAS wake-up path to compute the energy
'delta' (with and without the task) on a per-performance domain basis,
rather than system-wide, which brings the complexity down to O(n).
No functional changes intended.
Test results
~~~~~~~~~~~~
* Setup: Tested on a Google Pixel 3, with a Snapdragon 845 (4+4 CPUs,
A55/A75). Base kernel is 5.3-rc5 + Pixel3 specific patches. Android
userspace, no graphics.
* Test case: Run a periodic rt-app task, with 16ms period, ramping down
from 70% to 10%, in 5% steps of 500 ms each (json avail. at [1]).
Frequencies of all CPUs are pinned to max (using scaling_min_freq
CPUFreq sysfs entries) to reduce variability. The time to run
select_task_rq_fair() is measured using the function profiler
(/sys/kernel/debug/tracing/trace_stat/function*). See the test script
for more details [2].
Test 1:
I hacked the DT to 'fake' per-CPU DVFS. That is, we end up with one
CPUFreq policy per CPU (8 policies in total). Since all frequencies are
pinned to max for the test, this should have no impact on the actual
frequency selection, but it does in the EAS calculation.
      +---------------------------+----------------------------------+
      |       Without patch       |            With patch            |
+-----+-----+----------+----------+-----+-----------------+----------+
| CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us)        | s^2 (us) |
|-----+-----+----------+----------+-----+-----------------+----------+
|   0 | 274 |   38.303 | 1750.239 | 401 | 14.126 (-63.1%) |  146.625 |
|   1 | 197 |   49.529 | 1695.852 | 314 | 16.135 (-67.4%) |  167.525 |
|   2 | 142 |   34.296 | 1758.665 | 302 | 14.133 (-58.8%) |  130.071 |
|   3 | 172 |   31.734 | 1490.975 | 641 | 14.637 (-53.9%) |  139.189 |
|   4 | 316 |    7.834 |  178.217 | 425 |  5.413 (-30.9%) |   20.803 |
|   5 | 447 |    8.424 |  144.638 | 556 |  5.929 (-29.6%) |   27.301 |
|   6 | 581 |   14.886 |  346.793 | 456 |  5.711 (-61.6%) |   23.124 |
|   7 | 456 |   10.005 |  211.187 | 997 |  4.708 (-52.9%) |   21.144 |
+-----+-----+----------+----------+-----+-----------------+----------+
* Hit, Avg and s^2 are as reported by the function profiler
Test 2:
I also ran the same test with a normal DT, with 2 CPUFreq policies, to
see if this causes regressions in the most common case.
      +---------------------------+----------------------------------+
      |       Without patch       |            With patch            |
+-----+-----+----------+----------+-----+-----------------+----------+
| CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us)        | s^2 (us) |
|-----+-----+----------+----------+-----+-----------------+----------+
|   0 | 345 |   22.184 |  215.321 | 580 | 18.635 (-16.0%) |  146.892 |
|   1 | 358 |   18.597 |  200.596 | 438 | 12.934 (-30.5%) |  104.604 |
|   2 | 359 |   25.566 |  200.217 | 397 | 10.826 (-57.7%) |   74.021 |
|   3 | 362 |   16.881 |  200.291 | 718 | 11.455 (-32.1%) |  102.280 |
|   4 | 457 |    3.822 |    9.895 | 757 |  4.616 (+20.8%) |   13.369 |
|   5 | 344 |    4.301 |    7.121 | 594 |  5.320 (+23.7%) |   18.798 |
|   6 | 472 |    4.326 |    7.849 | 464 |  5.648 (+30.6%) |   22.022 |
|   7 | 331 |    4.630 |   13.937 | 408 |  5.299 (+14.4%) |   18.273 |
+-----+-----+----------+----------+-----+-----------------+----------+
* Hit, Avg and s^2 are as reported by the function profiler
In addition to these two tests, I also ran 50 iterations of the Lisa
EAS functional test suite [3] with this patch applied on Arm Juno r0,
Arm Juno r2, Arm TC2 and Hikey960, and could not see any regressions
(all EAS functional tests are passing).
[1] https://paste.debian.net/1100055/
[2] https://paste.debian.net/1100057/
[3] https://github.com/ARM-software/lisa/blob/master/lisa/tests/scheduler/eas_behaviour.py
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: qais.yousef@arm.com
Cc: qperret@qperret.net
Cc: rjw@rjwysocki.net
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190912094404.13802-1-qperret@qperret.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Roman Gushchin [Thu, 12 Sep 2019 17:56:45 +0000 (10:56 -0700)]
cgroup: freezer: fix frozen state inheritance
If a new child cgroup is created in a frozen cgroup hierarchy
(one or more of its ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise, if a process is attached to the
child cgroup, it won't become frozen.
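The inheritance rule above can be illustrated with a toy model. This is only a sketch: struct toy_cgroup and its fields are hypothetical and do not mirror the kernel's struct cgroup, and freezing does not propagate to already existing children here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; names are illustrative, not the kernel's data structures. */
struct toy_cgroup {
    struct toy_cgroup *parent;
    bool self_frozen;       /* this cgroup was frozen explicitly */
    bool effective_frozen;  /* this cgroup or one of its ancestors is frozen */
};

static void toy_cgroup_init(struct toy_cgroup *cg, struct toy_cgroup *parent)
{
    assert(cg != NULL);
    cg->parent = parent;
    cg->self_frozen = false;
    /* A child created inside a frozen hierarchy starts out frozen too. */
    cg->effective_frozen = parent != NULL && parent->effective_frozen;
}

static void toy_cgroup_freeze(struct toy_cgroup *cg)
{
    cg->self_frozen = true;
    cg->effective_frozen = true;
}
```

A child initialized under a frozen parent reports effective_frozen right away, so a task attached to it later would be treated as frozen.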
The problem can be reproduced with the test_cgfreezer_mkdir test.
This is the output before this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
not ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
And with this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
Reported-by: Mark Crossen <mcrossen@fb.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
Roman Gushchin [Thu, 12 Sep 2019 17:56:44 +0000 (10:56 -0700)]
kselftests: cgroup: add freezer mkdir test
Add a new cgroup freezer selftest, which checks that if a cgroup is
frozen, its new child cgroups properly inherit the frozen
state.
It creates a parent cgroup, freezes it, creates a child cgroup
and populates it with a dummy process. Then it checks that both
parent and child cgroup are frozen.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
amy.shih [Thu, 12 Sep 2019 11:33:00 +0000 (11:33 +0000)]
hwmon: (nct7904) Fix incorrect SMI status register setting of LTD temperature and fan.
According to datasheet, the SMI status register setting of LTD
temperature is SMI_STS3, and the SMI status register setting
of fan is SMI_STS5 and SMI_STS6.
Signed-off-by: amy.shih <amy.shih@advantech.com.tw>
Link: https://lore.kernel.org/r/20190912113300.4714-1-Amy.Shih@advantech.com.tw
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Andy Shevchenko [Thu, 12 Sep 2019 13:10:49 +0000 (16:10 +0300)]
MAINTAINERS: Switch PDx86 subsystem status to Odd Fixes
Due to a shift of priorities, the actual status of the subsystem is Odd Fixes.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Chris Wilson [Thu, 12 Sep 2019 12:56:34 +0000 (13:56 +0100)]
Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()"
The userptr put_pages can be called from inside try_to_unmap, and so
enters with the page lock held on one of the object's backing pages. We
cannot take the page lock ourselves for fear of recursion.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Martin Wilck <Martin.Wilck@suse.com>
Reported-by: Leo Kraav <leho@kraav.com>
Fixes: aa56a292ce62 ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
References: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 12 Sep 2019 13:50:14 +0000 (14:50 +0100)]
Merge tag 'for-linus-20190912' of gitolite.pub/scm/linux/kernel/git/brauner/linux
Pull clone3 fix from Christian Brauner:
"This is a last-minute bugfix for clone3() that should go in before we
release 5.3 with clone3().
clone3() did not verify that the exit_signal argument was set to a
valid signal. This can be used to cause a crash by specifying a signal
greater than NSIG, e.g. -1.
The commit from Eugene adds a check to copy_clone_args_from_user() to
verify that the exit signal is limited by CSIGNAL as with legacy
clone() and that the signal is valid. With this we don't get the
legacy clone behavior where an invalid signal could be handed down and
would only be detected and then ignored in do_notify_parent(). Users
of clone3() will now get a proper error right when they pass an
invalid exit signal. Note that this is not a change in user-visible
behavior since no kernel with clone3() has been released yet"
* tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
fork: block invalid exit signals with clone3()
Jeroen Roovers [Tue, 10 Sep 2019 09:45:14 +0000 (11:45 +0200)]
parisc: Have git ignore generated real2.S and firmware.c
These files are not covered in globs from any other .gitignore files.
Signed-off-by: Jeroen Roovers <jer@gentoo.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Thu, 12 Sep 2019 13:47:35 +0000 (14:47 +0100)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"A KVM guest fix, and a kdump kernel relocation errors fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
Dave Airlie [Thu, 12 Sep 2019 13:14:29 +0000 (23:14 +1000)]
Merge tag 'drm-misc-fixes-2019-09-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.3 final:
- Constify modes whitelist harder.
- Fix lima driver gem_wait ioctl.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/99e52e7a-d4ce-6a2c-0501-bc559a710955@linux.intel.com
Dave Airlie [Thu, 12 Sep 2019 13:11:36 +0000 (23:11 +1000)]
Merge tag 'drm-intel-fixes-2019-09-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Final drm/i915 fixes for v5.3:
- Fix DP MST high color depth regression
- Fix GPU hangs on Vulkan compute workloads
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/877e6e27qm.fsf@intel.com
Eugene Syromiatnikov [Wed, 11 Sep 2019 17:45:40 +0000 (18:45 +0100)]
fork: block invalid exit signals with clone3()
Previously, the upper 32 bits of the exit_signal field were lost when copied
to the kernel args structure (which uses int as the type of the respective
field). Moreover, as Oleg has noted, exit_signal is used unchecked, so
it has to be checked for sanity before use. For the legacy syscalls,
applying the CSIGNAL mask guarantees that it is at least non-negative;
however, nothing of the sort is done in the clone3() code path, and that
can break at least thread_group_leader.
This commit adds a check to copy_clone_args_from_user() to verify that
the exit signal is limited by CSIGNAL as with legacy clone() and that
the signal is valid. With this we don't get the legacy clone behavior
where an invalid signal could be handed down and would only be detected
and ignored in do_notify_parent(). Users of clone3() will now get a
proper error when they pass an invalid exit signal. Note that this is
not a change in user-visible behavior since no kernel with clone3() has
been released yet.
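The two conditions described above (no bits outside CSIGNAL, and a valid signal number) can be sketched in userspace like this. The names toy_check_exit_signal(), TOY_CSIGNAL and TOY_NSIG are hypothetical; the signal limit of 64 is an assumption for illustration, since the kernel's value is architecture-dependent.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_CSIGNAL 0x000000ffULL /* low byte carries the exit signal */
#define TOY_NSIG    64            /* assumed limit; arch-dependent in the kernel */

/* Returns 0 if exit_signal is acceptable, -EINVAL otherwise. */
static int toy_check_exit_signal(uint64_t exit_signal)
{
    /* Reject anything that would be silently truncated or masked away. */
    if (exit_signal & ~TOY_CSIGNAL)
        return -EINVAL;
    /* Reject signal numbers beyond the valid range. */
    if (exit_signal > TOY_NSIG)
        return -EINVAL;
    return 0;
}
```

Under these assumptions an exit_signal of -1, as used by the reproducer in this commit message, is rejected because it has bits set outside the low byte.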
The following program will cause a splat on a non-fixed clone3() version
and will fail correctly on a fixed version:
#define _GNU_SOURCE
#include <linux/sched.h>
#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    pid_t pid = -1;
    struct clone_args args = {0};
    args.exit_signal = -1;
    pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
    if (pid < 0)
        exit(EXIT_FAILURE);
    if (pid == 0)
        exit(EXIT_SUCCESS);
    wait(NULL);
    exit(EXIT_SUCCESS);
}
Fixes: 7f192e3cd316 ("fork: add clone3")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com
[christian.brauner@ubuntu.com: simplify check and rework commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Thomas Huth [Thu, 12 Sep 2019 11:54:38 +0000 (13:54 +0200)]
KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject
an interrupt, we convert it from the legacy struct kvm_s390_interrupt
to the new struct kvm_s390_irq via the s390int_to_s390irq() function.
However, this function does not take care of all types of interrupts
that we can inject into the guest later (see do_inject_vcpu()). Since we
do not clear out the s390irq values before calling s390int_to_s390irq(),
there is a chance that we copy random data from the kernel stack which
could be leaked to the userspace later.
Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT
interrupt: s390int_to_s390irq() does not handle it, and the function
__inject_pfault_init() later copies irq->u.ext which contains the
random kernel stack data. This data can then be leaked either to
the guest memory in __deliver_pfault_init(), or the userspace might
retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl.
Fix it by handling that interrupt type in s390int_to_s390irq(), too,
and by making sure that the s390irq struct is properly pre-initialized.
And while we're at it, make sure that s390int_to_s390irq() now
directly returns -EINVAL for unknown interrupt types, so that we
immediately get a proper error code in case we add more interrupt
types to do_inject_vcpu() without updating s390int_to_s390irq()
sometime in the future.
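The pattern behind this fix (zero the destination first, handle every supported type explicitly, fail on anything else) can be sketched as below. All type and function names are hypothetical stand-ins, not the KVM structures, and the toy folds the pre-initialization into the converter whereas the commit message describes it as a separate step.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the legacy and new interrupt structures. */
enum toy_int_type { TOY_INT_EMERGENCY = 1, TOY_INT_PFAULT_INIT = 2 };

struct toy_legacy_int {
    uint32_t type;
    uint32_t parm;
    uint64_t parm64;
};

struct toy_irq {
    uint32_t type;
    union {
        struct { uint16_t code; } emerg;
        struct { uint32_t ext_params; uint64_t ext_params2; } ext;
    } u;
};

static int toy_convert(const struct toy_legacy_int *in, struct toy_irq *out)
{
    memset(out, 0, sizeof(*out));  /* no stale stack bytes can survive */
    out->type = in->type;

    switch (in->type) {
    case TOY_INT_EMERGENCY:
        out->u.emerg.code = (uint16_t)in->parm;
        return 0;
    case TOY_INT_PFAULT_INIT:
        out->u.ext.ext_params = in->parm;
        out->u.ext.ext_params2 = in->parm64;
        return 0;
    default:
        return -EINVAL;  /* unknown types fail immediately */
    }
}
```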
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/kvm/20190912115438.25761-1-thuth@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Christophe JAILLET [Wed, 11 Sep 2019 16:02:39 +0000 (18:02 +0200)]
sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 12 Sep 2019 11:46:20 +0000 (13:46 +0200)]
Merge tag 'qcom-drivers-for-5.4' of git://git./linux/kernel/git/qcom/linux into arm/drivers
Qualcomm ARM Based Driver Updates for v5.4
* Add AOSS QMP support
* Various fixups for Qualcomm SCM
* Add socinfo driver
* Add SoC serial number attribute and associated APIs
* Add SM8150 and SC7180 support in Qualcomm SCM
* Fixup max processor count in SMEM
* tag 'qcom-drivers-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
soc: qcom: aoss: Add AOSS QMP support
dt-bindings: soc: qcom: aoss: Add SM8150 and SC7180 support
dt-bindings: firmware: scm: Add SM8150 and SC7180 support
dt-bindings: firmware: scm: re-order compatible list
soc: qcom: smem: Update max processor count
soc: qcom: socinfo: Annotate switch cases with fall through
soc: qcom: Extend AOSS QMP driver to support resources that are used to wake up the SoC.
soc: qcom: socinfo: Expose image information
soc: qcom: socinfo: Expose custom attributes
soc: qcom: Add socinfo driver
base: soc: Export soc_device_register/unregister APIs
base: soc: Add serial_number attribute to soc
firmware: qcom_scm: Cleanup code in qcom_scm_assign_mem()
firmware: qcom_scm: Fix some typos in docs and printks
firmware: qcom_scm: Use proper types for dma mappings
Steffen Klassert [Thu, 12 Sep 2019 11:01:44 +0000 (13:01 +0200)]
ixgbe: Fix secpath usage for IPsec TX offload.
The ixgbe driver currently does IPsec TX offloading
based on an existing secpath. However, the secpath
can also come from the RX side; in this case it is
misinterpreted as a TX offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for TX offload.
Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
Reported-by: Michael Marley <michael@michaelmarley.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Filipe Manana [Wed, 11 Sep 2019 16:42:00 +0000 (17:42 +0100)]
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
lock_extent_buffer_for_io() returns 1 to tell the caller that everything
went fine and the caller needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.
When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example. A failed try lock therefore
does not mean that writeback of the extent buffer was already started by
some other task, so returning 0 wrongly tells the caller
(btree_write_cache_pages()) to not start writeback for the extent buffer.
Note that epd might currently have either no bio, so flush_write_bio()
returns 0 (success), or it might have a bio for another extent buffer with
a lower index (logical address).
Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such a hang is reported with a trace like the following:
[49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
[49887.347059] Not tainted 5.2.13-gentoo #2
[49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[49887.347062] btrfs-transacti D 0 1752 2 0x80004000
[49887.347064] Call Trace:
[49887.347069] ? __schedule+0x265/0x830
[49887.347071] ? bit_wait+0x50/0x50
[49887.347072] ? bit_wait+0x50/0x50
[49887.347074] schedule+0x24/0x90
[49887.347075] io_schedule+0x3c/0x60
[49887.347077] bit_wait_io+0x8/0x50
[49887.347079] __wait_on_bit+0x6c/0x80
[49887.347081] ? __lock_release.isra.29+0x155/0x2d0
[49887.347083] out_of_line_wait_on_bit+0x7b/0x80
[49887.347084] ? var_wake_function+0x20/0x20
[49887.347087] lock_extent_buffer_for_io+0x28c/0x390
[49887.347089] btree_write_cache_pages+0x18e/0x340
[49887.347091] do_writepages+0x29/0xb0
[49887.347093] ? kmem_cache_free+0x132/0x160
[49887.347095] ? convert_extent_bit+0x544/0x680
[49887.347097] filemap_fdatawrite_range+0x70/0x90
[49887.347099] btrfs_write_marked_extents+0x53/0x120
[49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
[49887.347102] btrfs_commit_transaction+0x6bb/0x990
[49887.347103] ? start_transaction+0x33e/0x500
[49887.347105] transaction_kthread+0x139/0x15c
So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, and undo all work done
before (set back EXTENT_BUFFER_DIRTY, etc), otherwise any future attempts
to writeback the extent buffer will hang.
This is a regression introduced in the 5.2 kernel.
Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 10 Sep 2019 14:26:49 +0000 (15:26 +0100)]
Btrfs: fix assertion failure during fsync and use of stale transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fall back to a
transaction commit, the btrfs fsync handler, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction is unpredictable: at the very least
there is a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value. We can also end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, and so on.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode; that step is offloaded to a safe context
(usually the cleaner kthread).
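The idea of deferring the final put to a safe context can be illustrated with a toy queue. This is a sketch only: every name below is hypothetical, the toy always queues the put, and it ignores locking and whatever fast paths the real btrfs_add_delayed_iput() has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
    int refs;
    bool evicted;
    struct toy_inode *next_delayed;
};

static struct toy_inode *toy_delayed_iputs;

/* Dropping the last reference runs eviction in the caller's context. */
static void toy_iput(struct toy_inode *inode)
{
    assert(inode->refs > 0);
    if (--inode->refs == 0)
        inode->evicted = true;
}

/* Queue the put instead of running it in the logging context. */
static void toy_add_delayed_iput(struct toy_inode *inode)
{
    inode->next_delayed = toy_delayed_iputs;
    toy_delayed_iputs = inode;
}

/* Called later from a safe context, e.g. a cleaner thread. */
static void toy_run_delayed_iputs(void)
{
    while (toy_delayed_iputs) {
        struct toy_inode *inode = toy_delayed_iputs;

        toy_delayed_iputs = inode->next_delayed;
        inode->next_delayed = NULL;
        toy_iput(inode);
    }
}
```

In this model the logging path only ever calls toy_add_delayed_iput(), so eviction cannot run while the fsync transaction handle is still in use.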
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running. This leads to fsync failing with -EIO errors while logging
inodes and then later trying to commit the transaction, which triggers the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Igor Mammedov [Wed, 11 Sep 2019 07:52:18 +0000 (03:52 -0400)]
KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on a memslot before calling
kvm_s390_vm_start_migration(), the kernel will oops with:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000000002a2000b R2:00000001bff8c00b R3:00000001bff88007 S:00000001bff91000 P:000000000000003d
Oops: 0004 ilc:2 [#1] SMP
...
Call Trace:
([<001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm])
[<001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm]
[<001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm]
[<00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58
[<00000000008bb284>] ksys_ioctl+0x8c/0xb8
[<00000000008bb2e2>] sys_ioctl+0x32/0x40
[<000000000175552c>] system_call+0x2b8/0x2d8
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<0000000000dbaf60>] __memset+0xc/0xa0
due to ms->dirty_bitmap being NULL, which might crash the host.
Make sure that ms->dirty_bitmap is set before using it or
return -EINVAL otherwise.
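The guard described above can be sketched in userspace as follows. struct toy_memslot and toy_start_migration() are hypothetical stand-ins, and a plain byte array replaces the kernel's bitmap helpers.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical memslot; dirty_bitmap is NULL when dirty logging is off. */
struct toy_memslot {
    unsigned char *dirty_bitmap;
    size_t bitmap_bytes;
};

static int toy_start_migration(struct toy_memslot *slots, size_t nslots)
{
    for (size_t i = 0; i < nslots; i++) {
        struct toy_memslot *ms = &slots[i];

        /* Refuse to proceed instead of passing NULL to memset(). */
        if (!ms->dirty_bitmap)
            return -EINVAL;
        /* Mark all pages dirty at the start of migration. */
        memset(ms->dirty_bitmap, 0xff, ms->bitmap_bytes);
    }
    return 0;
}
```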
Cc: <stable@vger.kernel.org>
Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Link: https://lore.kernel.org/kvm/20190911075218.29153-1-imammedo@redhat.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Navid Emamdoost [Wed, 11 Sep 2019 15:09:02 +0000 (10:09 -0500)]
net: qrtr: fix memory leak in qrtr_tun_write_iter
In qrtr_tun_write_iter the allocated kbuf should be released on both the
error and the success return paths.
v2 Update: Thanks to David Miller for pointing out the release on the success
path as well.
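The "free on every return path" shape can be sketched in userspace as below. This is a toy under stated assumptions: toy_write(), toy_post() and the toy_live_buffers counter are hypothetical, with malloc()/free() standing in for the kernel allocator and memcpy() for the copy from the iterator.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int toy_live_buffers;  /* buffers allocated but not yet freed */

/* Hypothetical consumer that copies what it needs; fails on empty input. */
static int toy_post(const void *data, size_t len)
{
    (void)data;
    return len == 0 ? -EINVAL : 0;
}

static long toy_write(const void *src, size_t len)
{
    void *kbuf = malloc(len ? len : 1);
    int ret;

    if (!kbuf)
        return -ENOMEM;
    toy_live_buffers++;

    memcpy(kbuf, src, len);
    ret = toy_post(kbuf, len);

    /* Release the temporary buffer whether or not the post succeeded. */
    free(kbuf);
    toy_live_buffers--;

    return ret < 0 ? ret : (long)len;
}
```

Whatever toy_post() returns, the counter is back at zero after each call, which is the property the fix restores.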
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Tue, 10 Sep 2019 20:02:57 +0000 (14:02 -0600)]
net: Fix null de-reference of device refcount
In the event of a failure during register_netdevice, free_netdev is
invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called and as a
result frees and clears out the refcount pointer.
However, this is not necessarily true as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called, which
leads to a null dereference:
44837.761523: <6> Unable to handle kernel paging request at virtual address 0000004a88287000
44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393: <2> Call trace:
44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8
44837.762404: <2> in_dev_rcu_put+0x24/0x30
44837.762412: <2> rcu_nocb_kthread+0x43c/0x468
44837.762418: <2> kthread+0x118/0x128
44837.762424: <2> ret_from_fork+0x10/0x1c
Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.
Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Tue, 10 Sep 2019 11:29:59 +0000 (13:29 +0200)]
ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: d862e5461423 ("net: ipv6: Implement /proc/net/icmp6.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>