review.tizen.org Git - platform/kernel/linux-exynos.git/log

sched/fair: Change "has_capacity" to "has_free_capacity"

The capacity of a CPU/group should be some intrinsic value that doesn't
change with task placement. It is like a container which capacity is
stable regardless of the amount of liquid in it (its "utilization")...
unless the container itself is crushed that is, but that's another story.

Therefore let's rename "has_capacity" to "has_free_capacity" in order to
better convey the wanted meaning.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linaro-kernel@lists.linaro.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-djzkk027jm0e8x8jxy70opzh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/fair: Remove "power" from 'struct numa_stats'

It is better not to think about compute capacity as being equivalent
to "CPU power". The upcoming "power aware" scheduler work may create
confusion with the notion of energy consumption if "power" is used too
liberally.

To make things explicit and not create more confusion with the existing
"capacity" member, let's rename things as follows:

power -> compute_capacity
capacity -> task_capacity

Note: none of those fields are actually used outside update_numa_stats().

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linaro-kernel@lists.linaro.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-2e2ndymj5gyshyjq8am79f20@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Fix signedness bug in yield_to()

yield_to() is supposed to return -ESRCH if there is no task to
yield to, but because the type is bool that is the same as returning
true.

The only place I see which cares is kvm_vcpu_on_spin().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Raghavendra <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20140523102042.GA7267@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/fair: Use time_after() in record_wakee()

To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of plain, error-prone math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1400780723-24626-1-git-send-email-manuel.schoelling@gmx.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/balancing: Reduce the rate of needless idle load balancing

The current no_hz idle load balancer do load balancing for *all* idle cpus,
even though the time due to load balance for a particular
idle cpu could be still a while in the future.  This introduces a much
higher load balancing rate than what is necessary.  The patch
changes the behavior by only doing idle load balancing on
behalf of an idle cpu only when it is due for load balancing.

On SGI's systems with over 3000 cores, the cpu responsible for idle balancing
got overwhelmed with idle balancing, and introduces a lot of OS noise
to workloads.  This patch fixes the issue.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1400621967.2970.280.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/fair: Fix unlocked reads of some cfs_b->quota/period

sched_cfs_period_timer() reads cfs_b->period without locks before calling
do_sched_cfs_period_timer(), and similarly unthrottle_offline_cfs_rqs()
would read cfs_b->period without the right lock. Thus a simultaneous
change of bandwidth could cause corruption on any platform where ktime_t
or u64 writes/reads are not atomic.

Extend cfs_b->lock from do_sched_cfs_period_timer() to include the read of
cfs_b->period to solve that issue; unthrottle_offline_cfs_rqs() can just
use 1 rather than the exact quota, much like distribute_cfs_runtime()
does.

There is also an unlocked read of cfs_b->runtime_expires, but a race
there would only delay runtime expiry by a tick. Still, the comparison
should just be != anyway, which clarifies even that problem.

Signed-off-by: Ben Segall <bsegall@google.com>
Tested-by: Roman Gushchin <klamm@yandex-team.ru>
[peterz: Fix compile warn]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140519224945.20303.93530.stgit@sword-of-the-dawn.mtv.corp.google.com
Cc: pjt@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/numa: Decay ->wakee_flips instead of zeroing

Affine wakeups have the potential to interfere with NUMA placement.
If a task wakes up too many other tasks, affine wakeups will get
disabled.

However, regardless of how many other tasks it wakes up, it gets
re-enabled once a second, potentially interfering with NUMA
placement of other tasks.

By decaying wakee_wakes in half instead of zeroing it, we can avoid
that problem for some workloads.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: chegu_vinod@hp.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20140516001332.67f91af2@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/numa: Update migrate_improves/degrades_locality()

Update the migrate_improves/degrades_locality() functions with
knowledge of pseudo-interleaving.

Do not consider moving tasks around within the set of group's active
nodes as improving or degrading locality. Instead, leave the load
balancer free to balance the load between a numa_group's active nodes.

Also, switch from the group/task_weight functions to the group/task_fault
functions. The "weight" functions involve a division, but both calls use
the same divisor, so there's no point in doing that from these functions.

On a 4 node (x10 core) system, performance of SPECjbb2005 seems
unaffected, though the number of migrations with 2 8-warehouse wide
instances seems to have almost halved, due to the scheduler running
each instance on a single node.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Link: http://lkml.kernel.org/r/20140515130306.61aae7db@cuia.bos.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/numa: Allow task switch if load imbalance improves

Currently the NUMA balancing code only allows moving tasks between NUMA
nodes when the load on both nodes is in balance. This breaks down when
the load was imbalanced to begin with.

Allow tasks to be moved between NUMA nodes if the imbalance is small,
or if the new imbalance is be smaller than the original one.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20140514132221.274b3463@annuminas.surriel.com

sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code

Signed-off-by: xiaofeng.yan <xiaofeng.yan@huawei.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399605687-18094-1-git-send-email-xiaofeng.yan@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a568a1e3cc8e78648f41b5035fa5e381d36274da.1399532322.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Initialize rq->age_stamp on processor start

If the sched_clock time starts at a large value, the kernel will spin
in sched_avg_update for a long time while rq->age_stamp catches up
with rq->clock.

The comment in kernel/sched/clock.c says that there is no strict promise
that it starts at zero. So initialize rq->age_stamp when a cpu starts up
to avoid this.

I was seeing long delays on a simulator that didn't start the clock at
zero. This might also be an issue on reboots on processors that don't
re-initialize the timer to zero on reset, and when using kexec.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399574859-11714-1-git-send-email-minyard@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched, nohz: Change rq->nr_running to always use wrappers

Sometimes ->nr_running may cross 2 but interrupt is not being
sent to rq's cpu. In this case we don't reenable the timer.
Looks like this may be the reason for rare unexpected effects,
if nohz is enabled.

Patch replaces all places of direct changing of nr_running
and makes add_nr_running() caring about crossing border.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140508225830.2469.97461.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()

Currently, in idle_balance(), we update rq->next_balance when we pull_tasks.
However, it is also important to update this in the !pulled_tasks case too.

When the CPU is "busy" (the CPU isn't idle), rq->next_balance gets computed
using sd->busy_factor (so we increase the balance interval when the CPU is
busy). However, when the CPU goes idle, rq->next_balance could still be set
to a large value that was computed with the sd->busy_factor.

Thus, we need to also update rq->next_balance in idle_balance() in the cases
where !pulled_tasks too, so that rq->next_balance gets updated without taking
the busy_factor into account when the CPU is about to go idle.

This patch makes rq->next_balance get updated independently of whether or
not we pulled_task. Also, we add logic to ensure that we always traverse
at least 1 of the sched domains to get a proper next_balance value for
updating rq->next_balance.

Additionally, since load_balance() modifies the sd->balance_interval, we
need to re-obtain the sched domain's interval after the call to
load_balance() in rebalance_domains() before we update rq->next_balance.

This patch adds and uses 2 new helper functions, update_next_balance() and
get_sd_balance_interval() to update next_balance and obtain the sched
domain's balance_interval.

Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: daniel.lezcano@linaro.org
Cc: alex.shi@linaro.org
Cc: efault@gmx.de
Cc: vincent.guittot@linaro.org
Cc: morten.rasmussen@arm.com
Cc: aswin@hp.com
Link: http://lkml.kernel.org/r/1399596562.2200.7.camel@j-VirtualBox
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Use clamp() and clamp_val() to make sys_nice() more readable

Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399541715-19568-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()

There is no need to zero struct sched_group member cpumask and struct
sched_group_power member power since both structures are already allocated
as zeroed memory in __sdt_alloc().

This patch has been tested with
BUG_ON(!cpumask_empty(sched_group_cpus(sg))); and BUG_ON(sg->sgp->power);
in build_sched_groups() on ARM TC2 and INTEL i5 M520 platform including
CPU hotplug scenarios.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1398865178-12577-1-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/numa: Fix initialization of sched_domain_topology for NUMA

Jet Chen has reported a kernel panics when booting qemu-system-x86_64 with
kvm64 cpu. A panic occured while building the sched_domain.

In sched_init_numa, we create a new topology table in which both default
levels and numa levels are copied. The last row of the table must have a null
pointer in the mask field.

The current implementation doesn't add this last row in the computation of the
table size. So we add 1 row in the allocation size that will be used as the
last row of the table. The kzalloc will ensure that the mask field is NULL.

Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fengguang.wu@intel.com
Link: http://lkml.kernel.org/r/1399972261-25693-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Call select_idle_sibling() when not affine_sd

On smaller systems, the top level sched domain will be an affine
domain, and select_idle_sibling is invoked for every SD_WAKE_AFFINE
wakeup. This seems to be working well.

On larger systems, with the node distance between far away NUMA nodes
being > RECLAIM_DISTANCE, select_idle_sibling is only called if the
waker and the wakee are on nodes less than RECLAIM_DISTANCE apart.

This patch leaves in place the policy of not pulling the task across
nodes on such systems, while fixing the issue that select_idle_sibling
is not called at all in certain circumstances.

The code will look for an idle CPU in the same CPU package as the
CPU where the task ran previously.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: morten.rasmussen@arm.com
Cc: george.mccollister@gmail.com
Cc: ktkhai@parallels.com
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20140514114037.2d93266f@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Simplify return logic in sched_read_attr()

Gotos are chained pointlessly here, and the 'out' label
can be dispensed with.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEC29.9090503@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Simplify return logic in sched_copy_attr()

The logic in this function is a little contorted, clean it up:

  * Rather than having chained gotos for the -EFBIG case, just
    return -EFBIG directly.

  * Now, the label 'out' is no longer needed, and 'ret' must be zero
    zero by the time we fall through to this point, so just return 0.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEC24.9080201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Fix exec_start/task_hot on migrated tasks

task_hot checks exec_start on any runnable task, but if it has been
migrated since the it last ran, then exec_start is a clock_task from
another cpu. If the old cpu's clock_task was sufficiently far ahead of
this cpu's then the task will not be considered for another migration
until it has run. Instead reset exec_start whenever a task is migrated,
since it is presumably no longer hot anyway.

Signed-off-by: Ben Segall <bsegall@google.com>
[ Made it compile. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140515225920.7179.13924.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'sched/urgent' into sched/core to avoid conflicts with upcoming changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'pm-cpuidle' of git://git./linux/kernel/git/rafael/linux-pm into sched/core

Pull scheduling related CPU idle updates from Rafael J. Wysocki.

Conflicts:
kernel/sched/idle.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>

arm64: Remove TIF_POLLING_NRFLAG

The only idle method for arm64 is WFI and it therefore
unconditionally requires the reschedule interrupt when idle.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/20140509170649.GG13658@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

metag: Remove TIF_POLLING_NRFLAG

The Meta idle function jumps into the interrupt handler which
efficiently blocks waiting for the next interrupt when it reads the
interrupt status register (TXSTATI). No other (polling) idle functions
can be used, therefore TIF_POLLING_NRFLAG is unnecessary, so lets remove
it.

Peter Zijlstra said:
> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
> that is your only possible idle function, you'll require the interrupt
> to wake up and there's really no point to having the POLLING bit.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEB7E.9080007@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Fix hotplug vs. set_cpus_allowed_ptr()

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/cpupri: Replace NR_CPUS arrays

Tejun reported that his resume was failing due to order-3 allocations
from sched_domain building.

Replace the NR_CPUS arrays in there with a dynamically allocated
array.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-7cysnkw1gik45r864t1nkudh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Replace NR_CPUS arrays

Tejun reported that his resume was failing due to order-3 allocations
from sched_domain building.

Replace the NR_CPUS arrays in there with a dynamically allocated
array.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-kat4gl1m5a6dwy6nzuqox45e@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Restrict user params max value to 2^63 ns

Michael Kerrisk noticed that creating SCHED_DEADLINE reservations
with certain parameters (e.g, a runtime of something near 2^64 ns)
can cause a system freeze for some amount of time.

The problem is that in the interface we have

u64 sched_runtime;

while internally we need to have a signed runtime (to cope with
budget overruns)

s64 runtime;

At the time we setup a new dl_entity we copy the first value in
the second. The cast turns out with negative values when
sched_runtime is too big, and this causes the scheduler to go crazy
right from the start.

Moreover, considering how we deal with deadlines wraparound

(s64)(a - b) < 0

we also have to restrict acceptable values for sched_{deadline,period}.

This patch fixes the thing checking that user parameters are always
below 2^63 ns (still large enough for everyone).

It also rewrites other conditions that we check, since in
__checkparam_dl we don't have to deal with deadline wraparounds
and what we have now erroneously fails when the difference between
values is too big.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Dario Faggioli<raistlin@linux.it>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140513141131.20d944f81633ee937f256385@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/deadline: Change sched_getparam() behaviour vs SCHED_DEADLINE

The way we read POSIX one should only call sched_getparam() when
sched_getscheduler() returns either SCHED_FIFO or SCHED_RR.

Given that we currently return sched_param::sched_priority=0 for all
others, extend the same behaviour to SCHED_DEADLINE.

Requested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: linux-man <linux-man@vger.kernel.org>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140512205034.GH13467@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Disallow sched_attr::sched_policy < 0

The scheduler uses policy=-1 to preserve the current policy state to
implement sys_sched_setparam(), this got exposed to userspace by
accident through sys_sched_setattr(), cure this.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140509085311.GJ30445@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched: Make sched_setattr() correctly return -EFBIG

The documented[1] behavior of sched_attr() in the proposed man page text is:

    sched_attr::size must be set to the size of the structure, as in
    sizeof(struct sched_attr), if the provided structure is smaller
    than the kernel structure, any additional fields are assumed
    '0'. If the provided structure is larger than the kernel structure,
    the kernel verifies all additional fields are '0' if not the
    syscall will fail with -E2BIG.

As currently implemented, sched_copy_attr() returns -EFBIG for
for this case, but the logic in sys_sched_setattr() converts that
error to -EFAULT. This patch fixes the behavior.

[1] http://thread.gmane.org/gmane.linux.kernel/1615615/focus=1697760

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/536CEC17.9070903@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Linux 3.15-rc6

Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull two powerpc fixes from Ben Herrenschmidt:
"Here are a couple of fixes for 3.15.  One from Anton fixes a nasty
  regression I introduced when trying to fix a loss of irq_work whose
  consequences is that we can completely lose timer interrupts on a
  CPU... not pretty.

  The other one is a change to our PCIe reset hook to use a firmware
  call instead of direct config space accesses to trigger a fundamental
  reset on the root port.  This is necessary so that the FW gets a
  chance to disable the link down error monitoring, which would
  otherwise trip and cause subsequent fatal EEH error"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: irq work racing with timer interrupt can result in timer interrupt hang
  powerpc/powernv: Reset root port in firmware

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull two btrfs fixes from Chris Mason:
"This has two fixes that we've been testing for 3.16, but since both
  are safe and fix real bugs, it makes sense to send for 3.15 instead"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: send, fix incorrect ref access when using extrefs
  Btrfs: fix EIO on reading file after ioctl clone works on it

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull two ceph fixes from Sage Weil:
"The first patch fixes a problem when we have a page count of 0 for
  sendpage which is triggered by zfs.  The second fixes a bug in CRUSH
  that was resolved in the userland code a while back but fell through
  the cracks on the kernel side"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  crush: decode and initialize chooseleaf_vary_r
  libceph: fix corruption when using page_count 0 page in rbd

Merge tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Dave Chinner:
"Code inspection of the XFS error number sign translations found a
  bunch of issues, including returning incorrectly signed errors for
  some data integrity operations.

  These leak to userspace and result in applications not getting the
  errors correctly reported.  Hence they need fixing sooner rather than
  later.

  A couple of the bugs are in data integrity operations, a couple more
  are in the new COLLAPSE_RANGE code.  One of these came in through a
  recent ext4 merge and so I had to update the base tree to 3.15-rc5
  before fixing the issues"

* tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: list_lru_init returns a negative error
  xfs: negate xfs_icsb_init_counters error value
  xfs: negate mount workqueue init error value
  xfs: fix wrong err sign on xfs_set_acl()
  xfs: fix wrong errno from xfs_initxattrs
  xfs: correct error sign on COLLAPSE_RANGE errors
  xfs: xfs_commit_metadata returns wrong errno
  xfs: fix incorrect error sign in xfs_file_aio_read
  xfs: xfs_dir_fsync() returns positive errno

Merge branch 'renameat2' of git://git./linux/kernel/git/mszeredi/vfs

Pull renameat2 arch support from Miklos Szeredi:
"I've collected architecture patches for the renameat2 syscall that
  maintainers acked and/or asked me to queue.

  This adds architecture support for the renameat2 syscall to m68k,
  parisc, ia64 and through asm-generic to arc, arm64, c6x, hexagon,
  metag, openrisc, score, tile, unicore32"

* 'renameat2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  scripts/checksyscalls.sh: Make renameat optional
  asm-generic: Add renameat2 syscall
  ia64: add renameat2 syscall
  parisc: add renameat2 syscall
  m68k: add renameat2 syscall

Merge tag 'iommu-fixes-v3.15-rc5' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
"Three fixes for the AMD IOMMU driver:
   - fix a locking issue around get_user_pages()
   - fix two issues with device aliasing and exclusion range handling"

* tag 'iommu-fixes-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: fix enabling exclusion range for an exact device
  iommu/amd: Take mmap_sem when calling get_user_pages
  iommu/amd: Fix interrupt remapping for aliased devices

Merge tag 'stable/for-linus-3.15-rc5-tag' of git://git./linux/kernel/git/konrad/ibft

Pull iscsi_ibft fix from Konrad Rzeszutek Wilk:
"Fix iBFT regression on Broadcom NICs introduced in 3.2"

* tag 'stable/for-linus-3.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
iscsi_ibft: Fix finding Broadcom specific ibft sign

Merge tag 'renesas-sh-drivers-for-v3.15' of git://git./linux/kernel/git/horms/renesas

Pull SH driver fix from Simon Horman:
"Compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI

  This resolves a regression introduced in v3.14 by commit bf98c1eac1d4
  ("ARM: Rename ARCH_SHMOBILE to ARCH_SHMOBILE_LEGACY")"

* tag 'renesas-sh-drivers-for-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  drivers: sh: compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI

Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"Most of the changes are drivers fixes (rtl28xuu, fc2580, ov7670,
  davinci, gspca, s5p-fimc and s5c73m3).

  There is also a compat32 fix and one infoleak fixup at the media
  controller"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility mode
  [media] V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from user-space
  [media] media-device: fix infoleak in ioctl media_enum_entities()
  [media] fc2580: fix tuning failure on 32-bit arch
  [media] Prefer gspca_sonixb over sn9c102 for all devices
  [media] media: davinci: vpfe: make sure all the buffers unmapped and released
  [media] staging: media: davinci: vpfe: make sure all the buffers are released
  [media] media: davinci: vpbe_display: fix releasing of active buffers
  [media] media: davinci: vpif_display: fix releasing of active buffers
  [media] media: davinci: vpif_capture: fix releasing of active buffers
  [media] s5p-fimc: Fix YUV422P depth
  [media] s5c73m3: Add missing rename of v4l2_of_get_next_endpoint() function
  [media] rtl28xxu: silence error log about disabled rtl2832_sdr module
  [media] rtl28xxu: do not hard depend on staging SDR module

Merge tag 'staging-3.15-rc6' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are five staging driver fixes for 3.15-rc6 that resolve some
  reported issues.  They are for the imx and rtl8723au drivers"

* tag 'staging-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8723au: Do not reset wdev->iftype in netdev_close()
  staging: rtl8723au: Use correct pipe type for USB interrupts
  imx-drm: imx-tve: correct DDC property name to 'ddc-i2c-bus'
  imx-drm: imx-drm-core: skip components whose parent device is disabled
  imx-drm: imx-drm-core: fix imx_drm_encoder_get_mux_id

Merge tag 'driver-core-3.15-rc6' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
  some reported issues and a regression from 3.13"

* tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: make sure read buffer is zeroed
  kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs

Merge tag 'pci-v3.15-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"These are fixes for an SHPCHP hotplug regression, a "wait for pending
  transaction" problem (used in device reset paths), and an email
  address update.

  PCI device hotplug:
    - Fix SHPCHP bus speed mismatch issue (Marcel Apfelbaum)

  Miscellaneous:
    - Fix pci_wait_for_pending_transaction() (Gavin Shan)
    - Update email address (Ben Hutchings)"

* tag 'pci-v3.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Wrong register used to check pending traffic
  PCI: shpchp: Check bridge's secondary (not primary) bus speed
  PCI: Update my email address

Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random

Pull /dev/random fix from Ted Ts'o:
"This fixes a BUG_ON-causing regression that was introduced during the
last merge window"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: fix BUG_ON caused by accounting simplification

Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux

Pull clock framework fixes from Mike Turquette:
"Clock framework and driver fixes, all of which fix user-visible
  regressions.

  As usual most fixes are for platform-specific clock drivers, but there
  are also two fixes to the clk core after recent changes to the way
  that clock unregistration is handled"

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
  clk: tegra: Fix wrong value written to PLLE_AUX
  clk: shmobile: clk-mstp: change to using clock-indices
  clk: Fix slab corruption in clk_unregister()
  clk: Fix double free due to devm_clk_register()
  clk: socfpga: fix clock driver for 3.15
  clk: divider: Fix best div calculation for power-of-two and table dividers
  clk: bcm281xx: don't use unnamed structs or unions

Merge tag 'spi-v3.15-rc5' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few core fixes around outlying cases here, nothing that should
  affect most users but useful fixes.  The diffstat is rather larger
  than one might hope due some simple code motion in the fix for
  !CONFIG_DMA, the actual meaningful change is much smaller.

   - Fix handling of unsupported dual and quad mode support on slave
     registration so that drivers that can degrade gracefully do so,
     preventing regressions for drivers this is added.
   - Fix build in !CONFIG_DMA cases following addition of generic DMA
     mapping support.
   - Fix error handling for queue creation which due to wider kernel
     changes can be triggered more easily.
   - A couple of driver specific fixes"

* tag 'spi-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi/pxa2xx: Prevent DMA from transferring too many bytes
  spi: core: Don't destroy master queue if we fail to create it
  spi: qup: Fix return value checking for pm_runtime_get_sync()
  spi: core: Protect DMA code by #ifdef CONFIG_HAS_DMA
  spi: core: Ignore unsupported Dual/Quad Transfer Mode bits

Merge tag 'gpio-v3.15-3' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
- fix a null pointer bug in the ICH6 chipset driver
- fix device tree registration for the mcp23s08 driver

* tag 'gpio-v3.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mcp23s08: Bug fix of SPI device tree registration.
gpio: ich: set regs and reglen for i3100 and ich6 chipset

Merge branch 'for-3.15-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull more cgroup fixes from Tejun Heo:
"Three more patches to fix cgroup_freezer breakage due to the recent
  cgroup internal locking changes - an operation cgroup_freezer was
  using now requires sleepable context and cgroup_freezer was invoking
  that while holding a spin lock.  cgroup_freezer was using an overly
  elaborate hierarchical locking scheme.

  While it's possible to convert the hierarchical spinlocks directly to
  mutexes, this patch simplifies the overall locking so that it uses a
  global mutex.  This has the added benefit of avoiding iterating
  potentially huge number of tasks under a spinlock.  While the patch is
  on the larger side in the devel cycle, the changes made are mostly
  straight-forward and the locking logic is a lot simpler afterwards"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix rcu_read_lock() leak in update_if_frozen()
  cgroup_freezer: replace freezer->lock with freezer_mutex
  cgroup: introduce task_css_is_root()

Merge branch 'for-3.15-fixes' of git://git./linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
"Mostly device-specific fixes.  The only thing which isn't is the fix
  for zpodd oops-on-detach bug"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ahci: imx: PLL clock needs 100us to settle down
  ata: pata_at91 only works on sam9
  libata: clean up ZPODD when a port is detached
  ahci: imx: software workaround for phy reset issue in resume
  ahci: imx: add namespace for register enums
  ahci: disable DEVSLP for Intel Valleyview

Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"This fixes a NULL pointer dereference on allocation failure in caam,
  as well as a regression in the ctr mode on s390 that was added with
  the recent concurrency fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: s390 - fix aes,des ctr mode concurrency finding.
  crypto: caam - add allocation failure handling in SPRINTFCAT macro

Merge git://git./linux/kernel/git/nab/target-pending

Pull scsi target fixes from Nicholas Bellinger:
"This series include:

   - Close race between iser-target network portal shutdown + accepting
     new connection logins (sagi)
   - Fix free-after-use regression in tcm_fc post conversion to
     percpu-ida pre-allocation (nab)
   - Explicitly disable Immediate + Unsolicited Data for iser-target
     connections when T10-PI is enabled (sagi + nab)
   - Allow pi_prot_type + emulate_write_cache attributes to be set to
     zero regardless of backend support (andy)
   - memory leak fix (mikulas)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: fix memory leak on XCOPY
  target: Don't allow setting WC emulation if device doesn't support
  iscsi-target: Disable Immediate + Unsolicited Data with ISER Protection
  tcm_fc: Fix free-after-use regression in ft_free_cmd
  iscsi-target: Change BUG_ON to REJECT in iscsit_process_nop_out
  Target/iscsi,iser: Avoid accepting transport connections during stop stage
  Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
  Target/iser: Fix wrong connection requests list addition
  target: Allow non-supporting backends to set pi_prot_type to 0

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Some I2C bugfixes for 3.15.  Typical stuff, I'd say"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rcar: bail out on zero length transfers
  i2c: qup: Fix pm_runtime_get_sync usage
  i2c: s3c2410: resume race fix
  i2c: nomadik: Don't use IS_ERR for devm_ioremap
  i2c: designware: Mask all interrupts during i2c controller enable

Merge tag 'pm+acpi-3.15-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"Still fixing regressions (partly by reverting commits that broke
  things for people), fixing other stable-candidate bugs and adding some
  blacklist entries for ACPI video and _OSI.

  Two ACPICA regression fixes (one recent and one for a 3.14 commit), a
  fix for an ACPI-related regression in TPM (introduced in 3.14), a
  revert of the ACPI AC driver conversion in 3.13 that went wrong for an
  unknown reason, two reverts of commits that attempted to remove an old
  user space interface in /proc and broke some utilities, in 3.13 too, a
  fix for a CPU hotplug bug in the ACPI processor driver (stable
  material), two (stable candidate) fixes for intel_pstate and a few new
  blacklist entries, mostly for systems that shipped with Windows 8.

  Specifics:

   - ACPICA fix for a stale pointer access introduced by a recent commit
     in the XSDT validation code from Lv Zheng.

   - ACPICA fix for the default value of the command line switch to
     favor 32-bit FADT addresses (in case there's a conflict between a
     64-bit and a 32-bit address).  The previous default was that the
     32-bit version would take precedence and we tried to change it to
     the other way around and it didn't work.  From Lv Zheng.

   - A TPM commit related to ACPI _DSM in 3.14 caused the driver to
     refuse to load if a specific _DSM was missing and that broke resume
     from system suspend on Chromebooks that require the TPM hardware to
     be restored to a working state during resume by the OS.  Restore
     the old behavior to load the driver if the _DSM in question is not
     present, but prevent it from using the feature the _DSM is for.

   - ACPI AC driver conversion in 3.13 broke thermal management on at
     least one machine and has to be reverted.  From Guenter Roeck.

   - Two reverts of 3.13 commits that attempted to remove the old ACPI
     battery interface in /proc, but turned out to break some utilities
     still using that interface.  From Lan Tianyu.

   - ACPI processor driver fix to prevent acpi_processor_add() from
     modifying the CPU device's .offline field which leads to breakage
     if the initial online of the CPU fails.  From Igor Mammedov.

   - Two intel_pstate fixes, one to take a BayTrail documentation update
     into account and one to avoid forcing the maximum P-state on init
     which causes CPU PM trouble on systems with P-states coordination
     when one of the CPU cores is initialized after an offline/online
     cycle triggered by user space.  Both stable candidates, from Dirk
     Brandewie.

   - Fix for the ACPI video DMI blacklist entry for Dell Inspiron 7520
     from Aaron Lu.

   - Two new ACPI video blacklist entries for machines shipping with
     Win8 that need to use native backlight so that it can be controlled
     in a usual way (which doesn't work otherwise due bugs in the ACPI
     tables) from Hans de Goede.

   - Two ACPI _OSI quirks for systems that need them to work correctly
     with Linux from Edward Lin and Hans de Goede"

* tag 'pm+acpi-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: Revert native brightness quirk for ThinkPad T530
  intel_pstate: remove setting P state to MAX on init
  ACPICA: Tables: Restore old behavor to favor 32-bit FADT addresses.
  ACPI / video: correct DMI tag for Dell Inspiron 7520
  intel_pstate: Set turbo VID for BayTrail
  ACPI / TPM: Fix resume regression on Chromebooks
  ACPI / proc: Do not say when /proc interfaces will be deleted in Kconfig
  ACPI / processor: do not mark present at boot but not onlined CPU as onlined
  ACPI: Revert "ACPI / AC: convert ACPI ac driver to platform bus"
  ACPI / blacklist: Add dmi_enable_osi_linux quirk for Asus EEE PC 1015PX
  ACPI: blacklist win8 OSI for Dell Inspiron 7737
  ACPI / video: Add use_native_backlight quirks for more systems
  ACPI: Revert "ACPI / Battery: Remove battery's proc directory"
  ACPI: Revert "ACPI: Remove CONFIG_ACPI_PROCFS_POWER and cm_sbsc.c"
  ACPICA: Tables: Fix invalid pointer accesses in acpi_tb_parse_root_table().

Merge tag 'dm-3.15-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
"A dm-crypt fix for a cpu hotplug crash that switches from using
  per-cpu data to a mempool allocation (which offers allocation with cpu
  locality, and there is no inter-cpu communication on slab allocation).

  A couple dm-thinp stable fixes to address "out-of-data-space" issues.

  A dm-multipath fix for a LOCKDEP warning introduced in 3.15-rc1"

* tag 'dm-3.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm mpath: fix lock order inconsistency in multipath_ioctl
  dm thin: add timeout to stop out-of-data-space mode holding IO forever
  dm thin: allow metadata commit if pool is in PM_OUT_OF_DATA_SPACE mode
  dm crypt: fix cpu hotplug crash by removing per-cpu structure

Merge tag 'dt-for-linus' of git://git.secretlab.ca/git/linux

Pull device tree fixes from Grant Likely:
"Drivercore bugfixes for v3.15

  This branch contains bug fixes important to get into v3.15.  There is
  a fix for modifying properties seen during early boot, a fix for an
  incorrect prototype when CONFIG_OF=n, and a couple of corrections to
  device tree memory nodes on a few platforms"

* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux:
  mips: dts: Fix missing device_type="memory" property in memory nodes
  arm: dts: Fix missing device_type="memory" for ste-ccu8540
  of: fix CONFIG_OF=n prototype of of_node_full_name()
  of: make of_update_property() usable earlier in the boot process

Btrfs: send, fix incorrect ref access when using extrefs

When running send, if an inode only has extended reference items
associated to it and no regular references, send.c:get_first_ref()
was incorrectly assuming the reference it found was of type
BTRFS_INODE_REF_KEY due to use of the wrong key variable.
This caused weird behaviour when using the found item has a regular
reference, such as weird path string, and occasionally (when lucky)
a crash:

[  190.600652] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[  190.600994] Modules linked in: btrfs xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc psmouse serio_raw evbug pcspkr i2c_piix4 e1000 floppy
[  190.602565] CPU: 2 PID: 14520 Comm: btrfs Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[  190.602728] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  190.602868] task: ffff8800d447c920 ti: ffff8801fa79e000 task.ti: ffff8801fa79e000
[  190.603030] RIP: 0010:[<ffffffff813266b4>]  [<ffffffff813266b4>] memcpy+0x54/0x110
[  190.603262] RSP: 0018:ffff8801fa79f880  EFLAGS: 00010202
[  190.603395] RAX: ffff8800d4326e3f RBX: 000000000000036a RCX: ffff880000000000
[  190.603553] RDX: 000000000000032a RSI: ffe708844042936a RDI: ffff8800d43271a9
[  190.603710] RBP: ffff8801fa79f8c8 R08: 00000000003a4ef0 R09: 0000000000000000
[  190.603867] R10: 793a4ef09f000000 R11: 9f0000000053726f R12: ffff8800d43271a9
[  190.604020] R13: 0000160000000000 R14: ffff8802110134f0 R15: 000000000000036a
[  190.604020] FS:  00007fb423d09b80(0000) GS:ffff880216200000(0000) knlGS:0000000000000000
[  190.604020] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  190.604020] CR2: 00007fb4229d4b78 CR3: 00000001f5d76000 CR4: 00000000000006e0
[  190.604020] Stack:
[  190.604020]  ffffffffa01f4d49 ffff8801fa79f8f0 00000000000009f9 ffff8801fa79f8c8
[  190.604020]  00000000000009f9 ffff880211013260 000000000000f971 ffff88021147dba8
[  190.604020]  00000000000009f9 ffff8801fa79f918 ffffffffa02367f5 ffff8801fa79f928
[  190.604020] Call Trace:
[  190.604020]  [<ffffffffa01f4d49>] ? read_extent_buffer+0xb9/0x120 [btrfs]
[  190.604020]  [<ffffffffa02367f5>] fs_path_add_from_extent_buffer+0x45/0x60 [btrfs]
[  190.604020]  [<ffffffffa0238806>] get_first_ref+0x1f6/0x210 [btrfs]
[  190.604020]  [<ffffffffa0238994>] __get_cur_name_and_parent+0x174/0x3a0 [btrfs]
[  190.604020]  [<ffffffff8118df3d>] ? kmem_cache_alloc_trace+0x11d/0x1e0
[  190.604020]  [<ffffffffa0236674>] ? fs_path_alloc+0x24/0x60 [btrfs]
[  190.604020]  [<ffffffffa0238c91>] get_cur_path+0xd1/0x240 [btrfs]
(...)

Steps to reproduce (either crash or some weirdness like an odd path string):

    mkfs.btrfs -f -O extref /dev/sdd
    mount /dev/sdd /mnt

    mkdir /mnt/testdir
    touch /mnt/testdir/foobar

    for i in `seq 1 2550`; do
        ln /mnt/testdir/foobar /mnt/testdir/foobar_link_`printf "%04d" $i`
    done

    ln /mnt/testdir/foobar /mnt/testdir/final_foobar_name

    rm -f /mnt/testdir/foobar
    for i in `seq 1 2550`; do
        rm -f /mnt/testdir/foobar_link_`printf "%04d" $i`
    done

    btrfs subvolume snapshot -r /mnt /mnt/mysnap
    btrfs send /mnt/mysnap -f /tmp/mysnap.send

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

Btrfs: fix EIO on reading file after ioctl clone works on it

For inline data extent, we need to make its length aligned, otherwise,
we can get a phantom extent map which confuses readpages() to return -EIO.

This can be detected by xfstests/btrfs/035.

Reported-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

scripts/checksyscalls.sh: Make renameat optional

The new renameat2 syscall provides all the functionality of renameat
with an additional flags argument, so make renameat optional so that
future architectures can omit it without getting a warning.

This patch doesn't affect existing architectures.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: linux-arch@vger.kernel.org

asm-generic: Add renameat2 syscall

Add the renameat2 syscall to the generic syscall list, which is used by the
following architectures: arc, arm64, c6x, hexagon, metag, openrisc, score,
tile, unicore32.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>

ia64: add renameat2 syscall

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Tony Luck <tony.luck@intel.com>

parisc: add renameat2 syscall

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Helge Deller <deller@gmx.de>

m68k: add renameat2 syscall

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Merge tag 'sound-3.15-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Unfortunately this update became bigger than previous pull requests,
  which is almost a pattern in rc5-6.  But, the only obvious big changes
  are for the new Intel DSP ASoC drivers, so the impact must be fairly
  limited.

  Other than that, usual small fixes in various fields: HD-audio, ASoC
  core and ASoC fsl and codec drivers"

* tag 'sound-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
  ALSA: sb_mixer: missing return statement
  ASoC: wm8962: Update register CLASS_D_CONTROL_1 to be non-volatile
  ASoC: Intel: Fix Baytrail SST DSP firmware loading
  ALSA: hda - mask buggy stream DMA0 for Broadwell display controller
  ALSA: hda - Add new GPU codec ID to snd-hda
  ASoC: fsl_esai: Set PCRC and PRRC registers at the end of hw_params()
  ASoC: fsl_esai: Only bypass sck_div for EXTAL source
  ASoC: fsl_esai: Fix incorrect condition within ratio range check for FP
  ASoC: dapm: Fix SUSPEND -> OFF bias sequence
  ASoC: dapm: Skip CODEC<->CODEC links in connect_dai_link_widgets()
  ASoC: pcm: Fix incorrect condition check for case SNDRV_PCM_TRIGGER_SUSPEND
  ALSA: hda - add headset mic detect quirks for three Dell laptops
  ASoC: Update Cirrus Logic CODEC maintainers.
  ASoC: Intel: Fix block offset calculations.
  ASoC: Intel: Fix check for pdata usage before dereference.
  ASoC: Intel: Fix stream position pointer.
  ASoC: Intel: Fix allow hw_params to be called more than once.
  ASoC: Intel: Fix Audio DSP usage when IOMMU is enabled.
  ASoC: Intel: Fix Haswell/Broadwell DSP page table creation.
  ASoC: Intel: Fix allocated block list usage when adding blocks.
  ...

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for various loose ends:

   - Fix workarounds for R4000 erratum.
   - Patch up DEC, Siemens-Nixdorf and Loongson hardware support.
   - Wire up renameat2 syscall.
   - Delete unused file - it was causing false warnings from maintenance
     scripts.
   - Revert a patch because it's functionality is now implemented twice
     which causes superfluous /proc/cpuinfo output.
   - Fix a microMIPS regression"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: mm: Fix broken microMIPS kernel regression.
  MIPS: Add new AUDIT_ARCH token for the N32 ABI on MIPS64
  MIPS: Wire up renameat2 syscall.
  MIPS: inst.h: Rename BITFIELD_FIELD to __BITFIELD_FIELD.
  MIPS: Remove file missed when removing rm9k support a while ago.
  MIPS/loongson2_cpufreq: Fix CPU clock rate setting
  MIPS: Loongson: No need to select GENERIC_HARDIRQS_NO__DO_IRQ
  MIPS: csum_partial.S CPU_DADDI_WORKAROUNDS bug fix
  MIPS: __strncpy_from_user_asm CPU_DADDI_WORKAROUNDS bug fix
  MIPS: __delay CPU_DADDI_WORKAROUNDS bug fix
  MIPS: DEC/SNI: O32 wrapper stack switching fixes
  MIPS: DEC: Bus error handler <asm/cpu-type.h> fixes
  MAINTAINERS: TURBOchannel: Update entry
  Revert "MIPS: MT: proc: Add support for printing VPE and TC ids"

Merge branch 'parisc-3.15-4' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
"There are two patches in here:

  The first patch greatly improves latency and corrects the memory
  ordering in our light-weight atomic locking syscall.

  The second patch ratelimits printing of userspace segfaults in the
  same way as it's done on other platforms.  This fixes a possible DOS
  on parisc since it prevents the syslog to grow too fast.  For example,
  when the debian acl2 package was built on our debian buildd servers,
  this package produced lots of gigabytes in syslog in very short time
  and thus filled our harddisks, which then turned the server nearly
  completely unaccessible and unresponsive"

* 'parisc-3.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Improve LWS-CAS performance
  parisc: ratelimit userspace segfault printing

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull two arm64 fixes from Catalin Marinas:
- arm64 migrate_irqs() fix following commit ffde1de64012 (irqchip: Gic:
   Support forced affinity setting)
- fix arm64 pud_huge() to return 0 when only 2 levels page tables are
   used (__PAGETABLE_PMD_FOLDED defined and pmd_huge already covers
   block entries at the first level), otherwise KVM gets confused

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix pud_huge() for 2-level pagetables
  arm64: use cpu_online_mask when using forced irq_set_affinity

Merge tag 'metag-for-v3.15-2' of git://git./linux/kernel/git/jhogan/metag

Pull Metag architecture and related fixes from James Hogan:
"Mostly fixes for metag and parisc relating to upgrowing stacks.

   - Fix missing compiler barriers in metag memory barriers.
   - Fix BUG_ON on metag when RLIMIT_STACK hard limit is increased
     beyond safe value.
   - Make maximum stack size configurable.  This reduces the default
     user stack size back to 80MB (especially on parisc after their
     removal of _STK_LIM_MAX override).  This only affects metag and
     parisc.
   - Remove metag _STK_LIM_MAX override to match other arches and follow
     parisc, now that it is safe to do so (due to the BUG_ON fix
     mentioned above).
   - Finally now that both metag and parisc _STK_LIM_MAX overrides have
     been removed, it makes sense to remove _STK_LIM_MAX altogether"

* tag 'metag-for-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  asm-generic: remove _STK_LIM_MAX
  metag: Remove _STK_LIM_MAX override
  parisc,metag: Do not hardcode maximum userspace stack size
  metag: Reduce maximum stack size to 256MB
  metag: fix memory barriers

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm/intel fixes from Dave Airlie:
"Just some intel fixes.

  I have some radeon ones but I need to get some patches dropped from
  the pull req"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Increase WM memory latency values on SNB
  drm/i915: restore backlight precision when converting from ACPI
  drm/i915: Use the first mode if there is no preferred mode in the EDID
  drm/i915/dp: force eDP lane count to max available lanes on BDW
  drm/i915/vlv: reset VLV media force wake request register
  drm/i915/SDVO: For sysfs link put directory and target in correct order
  drm/i915: use lane count and link rate from VBT as minimums for eDP
  drm/i915: clean up VBT eDP link param decoding
  drm/i915: consider the source max DP lane count too

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64, modify_ldt: Make support for 16-bit segments a runtime option
  x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()
  x86, rdrand: When nordrand is specified, disable RDSEED as well

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"A single bug fix for a long standing issue:

   - Updating the expiry value of a relative timer _after_ letting the
     idle logic select a target cpu for the timer based on its stale
     expiry value is outright stupid.  Thanks to Viresh for spotting the
     brainfart"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Set expiry time before switch_hrtimer_base()

Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"Two small updates from the irq departement:

   - Provide missing inline stub for a SMP only function

   - Add sub-maintainer for the drivers/irqchip/ part of the irq
     subsystem.  YAY!"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add co-maintainer for drivers/irqchip
  genirq: Provide irq_force_affinity fallback for non-SMP

sysfs: make sure read buffer is zeroed

13c589d5b0ac ("sysfs: use seq_file when reading regular files")
switched sysfs from custom read implementation to seq_file to enable
later transition to kernfs.  After the change, the buffer passed to
->show() is acquired through seq_get_buf(); unfortunately, this
introduces a subtle behavior change.  Before the commit, the buffer
passed to ->show() was always zero as it was allocated using
get_zeroed_page().  Because seq_file doesn't clear buffers on
allocation and neither does seq_get_buf(), after the commit, depending
on the behavior of ->show(), we may end up exposing uninitialized data
to userland thus possibly altering userland visible behavior and
leaking information.

Fix it by explicitly clearing the buffer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ron <ron@debian.org>
Fixes: 13c589d5b0ac ("sysfs: use seq_file when reading regular files")
Cc: stable <stable@vger.kernel.org> # 3.13+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'drm-intel-fixes-2014-05-16' of git://anongit.freedesktop.org/drm-intel into drm-fixes

Intel fixes for regressions, black screens and hangs, for 3.15.

* tag 'drm-intel-fixes-2014-05-16' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Increase WM memory latency values on SNB
  drm/i915: restore backlight precision when converting from ACPI
  drm/i915: Use the first mode if there is no preferred mode in the EDID
  drm/i915/dp: force eDP lane count to max available lanes on BDW
  drm/i915/vlv: reset VLV media force wake request register
  drm/i915/SDVO: For sysfs link put directory and target in correct order
  drm/i915: use lane count and link rate from VBT as minimums for eDP
  drm/i915: clean up VBT eDP link param decoding
  drm/i915: consider the source max DP lane count too

ahci: imx: PLL clock needs 100us to settle down

The commit e783c51 (ahci: imx: software workaround for phy reset issue
in resume) calls imx_sata_phy_reset() to reset phy immediately after
SATA MPLL is enabled. It seems working fine mostly, but fails in some
case as below.

...
ahci-imx 2200000.sata: failed to reset phy: -110
ahci-imx: probe of 2200000.sata failed with error -110

After talking to the designer, we learnt that when enabling i.MX6Q SATA
MPLL, we need to wait 100us for it to settle down for safety. Add this
required delay to fix above failure.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

PCI: Wrong register used to check pending traffic

The incorrect register offset is passed to pci_wait_for_pending(), which is
caused by commit 157e876ffe ("PCI: Add pci_wait_for_pending() (refactor
pci_wait_for_pending_transaction())").

Fixes: 157e876ffe ("PCI: Add pci_wait_for_pending() (refactor pci_wait_for_pending_transaction())
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@gmail.com>
CC: stable@vger.kernel.org # v3.14+

target: fix memory leak on XCOPY

On each processed XCOPY command, two "kmalloc-512" memory objects are
leaked. These represent two allocations of struct xcopy_pt_cmd in
target_core_xcopy.c.

The reason for the memory leak is that the cmd_kref field is not
initialized (thus, it is zero because the allocations were done with
kzalloc). When we decrement zero kref in target_put_sess_cmd, the result
is not zero, thus target_release_cmd_kref is not called.

This patch fixes the bug by moving kref initialization from
target_get_sess_cmd to transport_init_se_cmd (this function is called from
target_core_xcopy.c, so it will correctly initialize cmd_kref). It can be
easily verified that all code that calls target_get_sess_cmd also calls
transport_init_se_cmd earlier, thus moving kref_init shouldn't introduce
any new problems.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

random: fix BUG_ON caused by accounting simplification

Commit ee1de406ba6eb1 ("random: simplify accounting logic") simplified
things too much, in that it allows the following to trigger an
overflow that results in a BUG_ON crash:

dd if=/dev/urandom of=/dev/zero bs=67108707 count=1

Thanks to Peter Zihlstra for discovering the crash, and Hannes
Frederic for analyizing the root cause.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Greg Price <price@mit.edu>

clk: tegra: Fix wrong value written to PLLE_AUX

The value written to PLLE_AUX was incorrect due to a wrong variable
being used. Without this fix SATA does not work.

Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Tested-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Tested-by: Thierry Reding <treding@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: improved changelog]

staging: rtl8723au: Do not reset wdev->iftype in netdev_close()

wdev->ifdev should be set by .change_virtual_intf(). This solves the
problem of WARN() messages on module unload.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge branch 'acpi-video'

* acpi-video:
ACPI / video: Revert native brightness quirk for ThinkPad T530

ACPI / video: Revert native brightness quirk for ThinkPad T530

Seems it helps some users, but causes issues for other users:
https://bugzilla.redhat.com/show_bug.cgi?id=1089545

So lets drop it for now until we've figured out a better fix.

Fixes: 43d949024425 (ACPI / video: Add use_native_backlight quirks for more systems)
References: https://bugzilla.redhat.com/show_bug.cgi?id=1089545
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

staging: rtl8723au: Use correct pipe type for USB interrupts

Use a correct pipe type when filling un interrupt urbs. This should
finally take care of the WARN() messages on the console when USB urbs
are submitted.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crush: decode and initialize chooseleaf_vary_r

Commit e2b149cc4ba0 ("crush: add chooseleaf_vary_r tunable") added the
crush_map::chooseleaf_vary_r field but missed the decode part. This
lead to misdirected requests caused by incorrect raw crush mapping
sets.

Fixes: http://tracker.ceph.com/issues/8226

Reported-and-Tested-by: Dmitry Smirnov <onlyjob@member.fsf.org>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: fix corruption when using page_count 0 page in rbd

It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:

https://github.com/zfsonlinux/spl/issues/241
http://tracker.ceph.com/issues/7790

The reason is that ZFS will send pages with page_count 0 into rbd, which in
turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.

This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix address this issue by fall back
to slower sendmsg when page_count 0 detected.

Cc: Sage Weil <sage@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>

arm64: fix pud_huge() for 2-level pagetables

The following happens when trying to run a kvm guest on a kernel
configured for 64k pages. This doesn't happen with 4k pages:

  BUG: failure at include/linux/mm.h:297/put_page_testzero()!
  Kernel panic - not syncing: BUG!
  CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF            3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
  Call trace:
  [<fffffe0000096034>] dump_backtrace+0x0/0x16c
  [<fffffe00000961b4>] show_stack+0x14/0x1c
  [<fffffe000066e648>] dump_stack+0x84/0xb0
  [<fffffe0000668678>] panic+0xf4/0x220
  [<fffffe000018ec78>] free_reserved_area+0x0/0x110
  [<fffffe000018edd8>] free_pages+0x50/0x88
  [<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
  [<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
  [<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
  [<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
  [<fffffe00001edc1c>] __fput+0xb0/0x288
  [<fffffe00001ede4c>] ____fput+0xc/0x14
  [<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
  [<fffffe0000095c14>] do_notify_resume+0x54/0x58

In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
on the stage2 pgd which leads to the BUG in put_page_testzero(). This
happens because a pud_huge() test in unmap_range() returns true when it
should always be false with 2-level pages tables used by 64k pages.
This patch removes support for huge puds if 2-level pagetables are
being used.

Signed-off-by: Mark Salter <msalter@redhat.com>
[catalin.marinas@arm.com: removed #ifndef around PUD_SIZE check]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org> # v3.11+

mips: dts: Fix missing device_type="memory" property in memory nodes

A few platforms lack a 'device_type = "memory"' for their memory
nodes, relying on an old ppc quirk in order to discover its memory.
Add the missing data so that all parsing code can find memory nodes
correctly.

Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Acked-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Grant Likely <grant.likely@linaro.org>

arm: dts: Fix missing device_type="memory" for ste-ccu8540

The current .dts for ste-ccu8540 lacks a 'device_type = "memory"' for
its memory node, relying on an old ppc quirk in order to discover its
memory. Fix the data so that all parsing code can handle it correctly.

Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: devicetree@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Grant Likely <grant.likely@linaro.org>

target: Don't allow setting WC emulation if device doesn't support

Just like for pSCSI, if the transport sets get_write_cache, then it is
not valid to enable write cache emulation for it. Return an error.

see https://bugzilla.redhat.com/show_bug.cgi?id=1082675

Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Andy Grover <agrover@redhat.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: Disable Immediate + Unsolicited Data with ISER Protection

This patch explicitly disables Immediate + Unsolicited Data for ISER
connections during login in iscsi_login_zero_tsih_s2() when protection
has been enabled for the session by the underlying hardware.

This is currently required because protection / signature memory regions
(MRs) expect T10 PI to occur on RDMA READs + RDMA WRITEs transfers, and
not on a immediate data payload associated with ISCSI_OP_SCSI_CMD, or
unsolicited data-out associated with a ISCSI_OP_SCSI_DATA_OUT.

v2 changes:
  - Add TARGET_PROT_DOUT_INSERT check (Sagi)
  - Add pr_debug noisemaker (Sagi)
  - Add goto to avoid early return from MRDSL check (nab)

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_fc: Fix free-after-use regression in ft_free_cmd

This patch fixes a free-after-use regression in ft_free_cmd(), where
ft_sess_put() is called with cmd->sess after percpu_ida_free() has
already released the tag.

Fix this bug by saving the ft_sess pointer ahead of percpu_ida_free(),
and pass it directly to ft_sess_put().

The regression was originally introduced in v3.13-rc1 commit:

  commit 5f544cfac956971099e906f94568bc3fd1a7108a
  Author: Nicholas Bellinger <nab@daterainc.com>
  Date:   Mon Sep 23 12:12:42 2013 -0700

      tcm_fc: Convert to per-cpu command map pre-allocation of ft_cmd

Reported-by: Jun Wu <jwu@stormojo.com>
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: <stable@vger.kernel.org> #3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: Change BUG_ON to REJECT in iscsit_process_nop_out

This patch changes an incorrect use of BUG_ON to instead generate a
REJECT + PROTOCOL_ERROR in iscsit_process_nop_out() code.  This case
can occur with traditional TCP where a flood of zeros in the data
stream can reach this block for what is presumed to be a NOP-OUT with
a solicited reply, but without a valid iscsi_cmd pointer.

This incorrect BUG_ON was introduced during the v3.11-rc timeframe
with the following commit:

commit 778de368964c5b7e8100cde9f549992d521e9c89
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date:   Fri Jun 14 16:07:47 2013 -0700

    iscsi/isert-target: Refactor ISCSI_OP_NOOP RX handling

Reported-by: Arshad Hussain <arshad.hussain@calsoftinc.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Target/iscsi,iser: Avoid accepting transport connections during stop stage

When the target is in stop stage, iSER transport initiates RDMA disconnects.
The iSER initiator may wish to establish a new connection over the
still existing network portal. In this case iSER transport should not
accept and resume new RDMA connections. In order to learn that, iscsi_np
is added with enabled flag so the iSER transport can check when deciding
weather to accept and resume a new connection request.

The iscsi_np is enabled after successful transport setup, and disabled
before iscsi_np login threads are cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Target/iser: Fix iscsit_accept_np and rdma_cm racy flow

RDMA CM and iSCSI target flows are asynchronous and completely
uncorrelated. Relying on the fact that iscsi_accept_np will be called
after CM connection request event and will wait for it is a mistake.

When attempting to login to a few targets this flow is racy and
unpredictable, but for parallel login to dozens of targets will
race and hang every time.

The correct synchronizing mechanism in this case is pending on
a semaphore rather than a wait_for_event. We keep the pending
interruptible for iscsi_np cleanup stage.

(Squash patch to remove dead code into parent - nab)

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Target/iser: Fix wrong connection requests list addition

Should be adding list_add_tail($new, $head) and not
the other way around.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Allow non-supporting backends to set pi_prot_type to 0

Userspace tools assume if a value is read from configfs, it is valid
and will not cause an error if the same value is written back. The only
valid value for pi_prot_type for backends not supporting DIF is 0, so allow
this particular value to be set without returning an error.

Reported-by: Krzysztof Chojnowski <frirajder@gmail.com>
Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

parisc: Improve LWS-CAS performance

The attached change significantly improves the performance of the LWS-CAS code
in syscall.S.
This allows a number of packages to build (e.g., zeromq3, gtest and libxs)
that previously failed because slow LWS-CAS performance under contention. In
particular, interrupts taken while the lock was taken degraded performance
significantly.

The change does the following:

1) Disables interrupts around the CAS operation, and
2) Changes the loads and stores to use the ordered completer, "o", on
PA 2.0. "o" and "ma" with a zero offset are equivalent. The latter is
accepted on both PA 1.X and 2.0.

The use of ordered loads and stores probably makes no difference on all
existing hardware, but it seemed pedantically correct. In particular, the CAS
operation must complete before LDCW lock is released. As written before, a
processor could reorder the operations.

I don't believe the period interrupts are disabled is long enough to
significantly increase interrupt latency. For example, the TLB insert code is
longer. Worst case is a memory fault in the CAS operation.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: ratelimit userspace segfault printing

Ratelimit printing of userspace segfaults and make it runtime
configurable via the /proc/sys/debug/exception-trace variable. This
should resolve syslog from growing way too fast and thus prevents
possible system service attacks.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 3.13+