Mike Galbraith [Sat, 4 Aug 2012 03:44:14 +0000 (05:44 +0200)]
sched: Fix migration thread runtime bogosity
Make stop scheduler class do the same accounting as other classes,
Migration threads can be caught in the act while doing exec balancing,
leading to the below due to use of unmaintained ->se.exec_start. The
load that triggered this particular instance was an apparently out of
control heavily threaded application that does system monitoring in
what equated to an exec bomb, with one of the VERY frequently migrated
tasks being ps.
%CPU PID USER CMD
99.3 45 root [migration/10]
97.7 53 root [migration/12]
97.0 57 root [migration/13]
90.1 49 root [migration/11]
89.6 65 root [migration/15]
88.7 17 root [migration/3]
80.4 37 root [migration/8]
78.1 41 root [migration/9]
44.2 13 root [migration/2]
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Galbraith [Tue, 7 Aug 2012 08:02:38 +0000 (10:02 +0200)]
sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled
Root task group bandwidth replenishment must service all CPUs, regardless of
where the timer was last started, and regardless of the isolation mechanism,
lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Galbraith [Tue, 7 Aug 2012 03:00:13 +0000 (05:00 +0200)]
sched,cgroup: Fix up task_groups list
With multiple instances of task_groups, for_each_rt_rq() is a noop,
no task groups having been added to the rt.c list instance. This
renders __enable/disable_runtime() and print_rt_stats() noop, the
user (non) visible effect being that rt task groups are missing in
/proc/sched_debug.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: stable@kernel.org # v3.3+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Stanislaw Gruszka [Wed, 8 Aug 2012 09:27:15 +0000 (11:27 +0200)]
sched: fix divide by zero at {thread_group,task}_times
On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.
This problem can be triggered in practice on very long lived processes:
PID: 2331 TASK:
ffff880472814b00 CPU: 2 COMMAND: "oraagent.bin"
#0 [
ffff880472a51b70] machine_kexec at
ffffffff8103214b
#1 [
ffff880472a51bd0] crash_kexec at
ffffffff810b91c2
#2 [
ffff880472a51ca0] oops_end at
ffffffff814f0b00
#3 [
ffff880472a51cd0] die at
ffffffff8100f26b
#4 [
ffff880472a51d00] do_trap at
ffffffff814f03f4
#5 [
ffff880472a51d60] do_divide_error at
ffffffff8100cfff
#6 [
ffff880472a51e00] divide_error at
ffffffff8100be7b
[exception RIP: thread_group_times+0x56]
RIP:
ffffffff81056a16 RSP:
ffff880472a51eb8 RFLAGS:
00010046
RAX:
bc3572c9fe12d194 RBX:
ffff880874150800 RCX:
0000000110266fad
RDX:
0000000000000000 RSI:
ffff880472a51eb8 RDI:
001038ae7d9633dc
RBP:
ffff880472a51ef8 R8:
00000000b10a3a64 R9:
ffff880874150800
R10:
00007fcba27ab680 R11:
0000000000000202 R12:
ffff880472a51f08
R13:
ffff880472a51f10 R14:
0000000000000000 R15:
0000000000000007
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
#7 [
ffff880472a51f00] do_sys_times at
ffffffff8108845d
#8 [
ffff880472a51f40] sys_times at
ffffffff81088524
#9 [
ffff880472a51f80] system_call_fastpath at
ffffffff8100b0f2
RIP:
0000003808caac3a RSP:
00007fcba27ab6d8 RFLAGS:
00000202
RAX:
0000000000000064 RBX:
ffffffff8100b0f2 RCX:
0000000000000000
RDX:
00007fcba27ab6e0 RSI:
000000000076d58e RDI:
00007fcba27ab6e0
RBP:
00007fcba27ab700 R8:
0000000000000020 R9:
000000000000091b
R10:
00007fcba27ab680 R11:
0000000000000202 R12:
00007fff9ca41940
R13:
0000000000000000 R14:
00007fcba27ac9c0 R15:
00007fff9ca41940
ORIG_RAX:
0000000000000064 CS: 0033 SS: 002b
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Wed, 8 Aug 2012 19:46:40 +0000 (21:46 +0200)]
sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies
Peter Portante reported that for large cgroup hierarchies (and or on
large CPU counts) we get immense lock contention on rq->lock and stuff
stops working properly.
His workload was a ton of processes, each in their own cgroup,
everybody idling except for a sporadic wakeup once every so often.
It was found that:
schedule()
idle_balance()
load_balance()
local_irq_save()
double_rq_lock()
update_h_load()
walk_tg_tree(tg_load_down)
tg_load_down()
Results in an entire cgroup hierarchy walk under rq->lock for every
new-idle balance and since new-idle balance isn't throttled this
results in a lot of work while holding the rq->lock.
This patch does two things, it removes the work from under rq->lock
based on the good principle of race and pray which is widely employed
in the load-balancer as a whole. And secondly it throttles the
update_h_load() calculation to max once per jiffy.
I considered excluding update_h_load() for new-idle balance
all-together, but purely relying on regular balance passes to update
this data might not work out under some rare circumstances where the
new-idle busiest isn't the regular busiest for a while (unlikely, but
a nightmare to debug if someone hits it and suffers).
Cc: pjt@google.com
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Reported-by: Peter Portante <pportant@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Michael Wang [Thu, 12 Jul 2012 08:10:13 +0000 (16:10 +0800)]
sched/cleanups: Add load balance cpumask pointer to 'struct lb_env'
With this patch struct ld_env will have a pointer of the load balancing
cpumask and we don't need to pass a cpumask around anymore.
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FFE8665.3080705@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Srivatsa S. Bhat [Fri, 20 Jul 2012 19:24:59 +0000 (00:54 +0530)]
sched: Fix comment about PREEMPT_ACTIVE bit location
PREEMPT_ACTIVE flag is bit 27, not 28. Fix the comment.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: paulmck@linux.vnet.ibm.com
Cc: josh@joshtriplett.org
Cc: rostedt@goodmis.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120720192459.6149.14821.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ying Xue [Thu, 12 Jul 2012 07:03:42 +0000 (15:03 +0800)]
sched: Fix minor code style issues
Delete redudant spaces between type name and data name or operators.
Signed-off-by: Ying Xue <ying.xue0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342076622-6606-1-git-send-email-ying.xue0@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Namhyung Kim [Sat, 7 Jul 2012 07:49:02 +0000 (16:49 +0900)]
sched: Use task_rq_unlock() in __sched_setscheduler()
It seems there's no specific reason to open-code it. I guess
commit
0122ec5b02f76 ("sched: Add p->pi_lock to task_rq_lock()")
simply missed it. Let's be consistent with others.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341647342-6742-1-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alex Shi [Fri, 20 Jul 2012 06:19:50 +0000 (14:19 +0800)]
sched/numa: Add SD_PERFER_SIBLING to CPU domain
Commit
8e7fbcbc22c ("sched: Remove stale power aware scheduling remnants
and dysfunctional knobs") removed SD_PERFER_SIBLING from the CPU domain.
On NUMA machines this causes that load_balance() doesn't perfer LCPU in
same physical CPU package.
It causes some actual performance regressions on our NUMA machines from
Core2 to NHM and SNB.
Adding this domain flag again recovers the performance drop.
This change doesn't have any bad impact on any of my benchmarks:
specjbb, kbuild, fio, hackbench .. etc, on all my machines.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342765190-21540-1-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Jun 2012 11:36:05 +0000 (13:36 +0200)]
sched: Fix race in task_group()
Stefan reported a crash on a kernel before
a3e5d1091c1 ("sched:
Don't call task_group() too many times in set_task_rq()"), he
found the reason to be that the multiple task_group()
invocations in set_task_rq() returned different values.
Looking at all that I found a lack of serialization and plain
wrong comments.
The below tries to fix it using an extra pointer which is
updated under the appropriate scheduler locks. Its not pretty,
but I can't really see another way given how all the cgroup
stuff works.
Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Srivatsa Vaddagiri [Tue, 19 Jun 2012 12:13:15 +0000 (17:43 +0530)]
sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task
Current load balance scheme requires only one cpu in a
sched_group (balance_cpu) to look at other peer sched_groups for
imbalance and pull tasks towards itself from a busy cpu. Tasks
thus pulled by balance_cpu could later get picked up by cpus
that are in the same sched_group as that of balance_cpu.
This scheme however fails to pull tasks that are not allowed to
run on balance_cpu (but are allowed to run on other cpus in its
sched_group). That can affect fairness and in some worst case
scenarios cause starvation.
Consider a two core (2 threads/core) system running tasks as
below:
Core0 Core1
/ \ / \
C0 C1 C2 C3
| | | |
v v v v
F0 T1 F1 [idle]
T2
F0 = SCHED_FIFO task (pinned to C0)
F1 = SCHED_FIFO task (pinned to C2)
T1 = SCHED_OTHER task (pinned to C1)
T2 = SCHED_OTHER task (pinned to C1 and C2)
F1 could become a cpu hog, which will starve T2 unless C1 pulls
it. Between C0 and C1 however, C0 is required to look for
imbalance between cores, which will fail to pull T2 towards
Core0. T2 will starve eternally in this case. The same scenario
can arise in presence of non-rt tasks as well (say we replace F1
with high irq load).
We tackle this problem by having balance_cpu move pinned tasks
to one of its sibling cpus (where they can run). We first check
if load balance goal can be met by ignoring pinned tasks,
failing which we retry move_tasks() with a new env->dst_cpu.
This patch modifies load balance semantics on who can move load
towards a given cpu in a given sched_domain.
Before this patch, a given_cpu or a ilb_cpu acting on behalf of
an idle given_cpu is responsible for moving load to given_cpu.
With this patch applied, balance_cpu can in addition decide on
moving some load to a given_cpu.
There is a remote possibility that excess load could get moved
as a result of this (balance_cpu and given_cpu/ilb_cpu deciding
*independently* and at *same* time to move some load to a
given_cpu). However we should see less of such conflicting
decisions in practice and moreover subsequent load balance
cycles should correct the excess load moved to given_cpu.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06CDB.2060605@linux.vnet.ibm.com
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Prashanth Nageshappa [Tue, 19 Jun 2012 12:22:07 +0000 (17:52 +0530)]
sched: Reset loop counters if all tasks are pinned and we need to redo load balance
While load balancing, if all tasks on the source runqueue are pinned,
we retry after excluding the corresponding source cpu. However, loop counters
env.loop and env.loop_break are not reset before retrying, which can lead
to failure in moving the tasks. In this patch we reset env.loop and
env.loop_break to their inital values before we retry.
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06EEF.2090709@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Prashanth Nageshappa [Tue, 19 Jun 2012 12:17:34 +0000 (17:47 +0530)]
sched: Reorder 'struct lb_env' members to reduce its size
Members of 'struct lb_env' are not in appropriate order to reuse compiler
added padding on 64bit architectures. In this patch we reorder those struct
members and help reduce the size of the structure from 96 bytes to 80
bytes on 64 bit architectures.
Suggested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06DDE.7000403@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mike Galbraith [Tue, 12 Jun 2012 03:18:32 +0000 (05:18 +0200)]
sched: Improve scalability via 'CPU buddies', which withstand random perturbations
Traversing an entire package is not only expensive, it also leads to tasks
bouncing all over a partially idle and possible quite large package. Fix
that up by assigning a 'buddy' CPU to try to motivate. Each buddy may try
to motivate that one other CPU, if it's busy, tough, it may then try its
SMT sibling, but that's all this optimization is allowed to cost.
Sibling cache buddies are cross-wired to prevent bouncing.
4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better:
clients 1 2 4 8 16 32 64 128
..........................................................................
pre 30 41 118 645 3769 6214 12233 14312
post 299 603 1211 2418 4697 6847 11606 14557
A nice increase in performance.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Srivatsa S. Bhat [Thu, 24 May 2012 14:17:03 +0000 (19:47 +0530)]
cpusets: Remove/update outdated comments
cpuset_track_online_cpus() is no longer present. So remove the
outdated comment and replace it with reference to cpuset_update_active_cpus()
which is its equivalent.
Also, we don't lack memory hot-unplug anymore. And David Rientjes pointed
out how it is dealt with. So update that comment as well.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141700.3692.98192.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Srivatsa S. Bhat [Thu, 24 May 2012 14:16:55 +0000 (19:46 +0530)]
cpusets, hotplug: Restructure functions that are invoked during hotplug
Separate out the cpuset related handling for CPU/Memory online/offline.
This also helps us exploit the most obvious and basic level of optimization
that any notification mechanism (CPU/Mem online/offline) has to offer us:
"We *know* why we have been invoked. So stop pretending that we are lost,
and do only the necessary amount of processing!".
And while at it, rename scan_for_empty_cpusets() to
scan_cpusets_upon_hotplug(), which is more appropriate considering how
it is restructured.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Srivatsa S. Bhat [Thu, 24 May 2012 14:16:41 +0000 (19:46 +0530)]
cpusets, hotplug: Implement cpuset tree traversal in a helper function
At present, the functions that deal with cpusets during CPU/Mem hotplug
are quite messy, since a lot of the functionality is mixed up without clear
separation. And this takes a toll on optimization as well. For example,
the function cpuset_update_active_cpus() is called on both CPU offline and CPU
online events; and it invokes scan_for_empty_cpusets(), which makes sense
only for CPU offline events. And hence, the current code ends up unnecessarily
traversing the cpuset tree during CPU online also.
As a first step towards cleaning up those functions, encapsulate the cpuset
tree traversal in a helper function, so as to facilitate upcoming changes.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141635.3692.893.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Srivatsa S. Bhat [Thu, 24 May 2012 14:16:26 +0000 (19:46 +0530)]
CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume
In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
masks as and when necessary to ensure that the tasks belonging to the cpusets
have some place (online CPUs) to run on. And regular CPU hotplug is
destructive in the sense that the kernel doesn't remember the original cpuset
configurations set by the user, across hotplug operations.
However, suspend/resume (which uses CPU hotplug) is a special case in which
the kernel has the responsibility to restore the system (during resume), to
exactly the same state it was in before suspend.
In order to achieve that, do the following:
1. Don't modify cpusets during suspend/resume. At all.
In particular, don't move the tasks from one cpuset to another, and
don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
during the CPU hotplug operations that are carried out in the
suspend/resume path.
2. However, cpusets and sched domains are related. We just want to avoid
altering cpusets alone. So, to keep the sched domains updated, build
a single sched domain (containing all active cpus) during each of the
CPU hotplug operations carried out in s/r path, effectively ignoring
the cpusets' cpus_allowed masks.
(Since userspace is frozen while doing all this, it will go unnoticed.)
3. During the last CPU online operation during resume, build the sched
domains by looking up the (unaltered) cpusets' cpus_allowed masks.
That will bring back the system to the same original state as it was in
before suspend.
Ultimately, this will not only solve the cpuset problem related to suspend
resume (ie., restores the cpusets to exactly what it was before suspend, by
not touching it at all) but also speeds up suspend/resume because we avoid
running cpuset update code for every CPU being offlined/onlined.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 13 Jun 2012 13:24:45 +0000 (15:24 +0200)]
sched/x86: Remove broken power estimation
The x86 sched power implementation has been broken forever and gets in
the way of other stuff, remove it.
[ For archaeological interest, fixing this code would require dealing
with the cross-cpu calling of these functions and more importantly, we
need to filter idle time out of the a/m-perf stuff because the ratio
will go down to 0 when idle, giving a 0 capacity which is not what
we'd want. ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/1339594110.8980.38.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 21 Jul 2012 20:58:29 +0000 (13:58 -0700)]
Linux 3.5
Rafael J. Wysocki [Sat, 21 Jul 2012 18:24:52 +0000 (20:24 +0200)]
Remove SYSTEM_SUSPEND_DISK system state
The SYSTEM_SUSPEND_DISK system state is never used, so drop it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 21 Jul 2012 17:34:13 +0000 (10:34 -0700)]
Merge branch 'anton-kgdb' (kgdb dmesg fixups)
Merge emailed kgdb dmesg fixups patches from Anton Vorontsov:
"The dmesg command appears to be broken after the printk rework. The
old logic in the kdb code makes no sense in terms of current
printk/logging storage format, and KDB simply hangs forever upon
entering 'dmesg' command.
The first patch revives the command by switching to kmsg_dumper
iterator. As a side-effect, the code is now much more simpler.
A few changes were needed in the printk.c: we needed unlocked variant
of the kmsg_dumper iterator, but these can surely wait for 3.6.
It's probably too late even for the first patch to go to 3.5, but I'll
try to convince otherwise. :-) Here we go:
- The current code is broken for sure, and has no hope to work at
all. It is a regression
- The new code works for me, and probably works for everyone else;
- If it compiles (and I urge everyone to compile-test it on your
setup), it hardly can make things worse."
* Merge emailed patches from Anton Vorontsov: (4 commits)
kdb: Switch to nolock variants of kmsg_dump functions
printk: Implement some unlocked kmsg_dump functions
printk: Remove kdb_syslog_data
kdb: Revive dmesg command
Anton Vorontsov [Sat, 21 Jul 2012 00:28:25 +0000 (17:28 -0700)]
kdb: Switch to nolock variants of kmsg_dump functions
The locked variants are prone to deadlocks (suppose we got to the
debugger w/ the logbuf lock held), so let's switch to nolock variants.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Sat, 21 Jul 2012 00:28:07 +0000 (17:28 -0700)]
printk: Implement some unlocked kmsg_dump functions
If used from KDB, the locked variants are prone to deadlocks (suppose we
got to the debugger w/ the logbuf lock held).
So, we have to implement a few routines that grab no logbuf lock.
Yet we don't need these functions in modules, so we don't export them.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Sat, 21 Jul 2012 00:27:54 +0000 (17:27 -0700)]
printk: Remove kdb_syslog_data
The function is no longer needed, so remove it.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Sat, 21 Jul 2012 00:27:37 +0000 (17:27 -0700)]
kdb: Revive dmesg command
The kgdb dmesg command is broken after the printk rework. The old logic
in kdb code makes no sense in terms of current printk/logging storage
format, and KDB simply hangs forever.
This patch revives the command by switching to kmsg_dumper iterator.
The code is now much more simpler and shorter.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Jul 2012 19:02:02 +0000 (12:02 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull late MIPS fixes from Ralf Baechle:
"This fixes a number of lose ends in the MIPS code and various bug
fixes.
Aside of dropping some patch that should not be in this pull request
everything has sat in -next for quite a while and there are no known
issues.
The biggest patch in this patch set moves the allocation of an array
that is aliased to a function (for runtime generated code) to
assembler code. This avoids an issue with certain toolchains when
building for microMIPS."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
MIPS: PCI: Move fixups from __init to __devinit.
MIPS: Fix bug.h MIPS build regression
MIPS: sync-r4k: remove redundant irq operation
MIPS: smp: Warn on too early irq enable
MIPS: call set_cpu_online() on cpu being brought up with irq disabled
MIPS: call ->smp_finish() a little late
MIPS: Yosemite: delay irq enable to ->smp_finish()
MIPS: SMTC: delay irq enable to ->smp_finish()
MIPS: BMIPS: delay irq enable to ->smp_finish()
MIPS: Octeon: delay enable irq to ->smp_finish()
MIPS: Oprofile: Fix build as a module.
MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
MIPS: Fix Magic SysRq L kernel crash.
MIPS: BMIPS: Fix duplicate header inclusion.
mips: mark const init data with __initconst instead of __initdata
MIPS: cmpxchg.h: Add missing include
MIPS: Malta may also be equipped with MIPS64 R2 processors.
MIPS: Fix typo multipy -> multiply
MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
...
Linus Torvalds [Fri, 20 Jul 2012 18:51:22 +0000 (11:51 -0700)]
Merge tag 'dm-3.5-fixes-2' of git://git./linux/kernel/git/agk/linux-dm
Pull device-mapper discard fixes from Alasdair G Kergon:
- avoid a crash in dm-raid1 when discards coincide with mirror
recovery;
- avoid discarding shared data that's still needed in dm-thin;
- don't guarantee that discarded blocks will be wiped in dm-raid1.
* tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm raid1: set discard_zeroes_data_unsupported
dm thin: do not send discards to shared blocks
dm raid1: fix crash with mirror recovery and discard
Linus Torvalds [Fri, 20 Jul 2012 18:43:53 +0000 (11:43 -0700)]
Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull pnfs/ore fixes from Boaz Harrosh:
"These are catastrophic fixes to the pnfs objects-layout that were just
discovered. They are also destined for @stable.
I have found these and worked on them at around RC1 time but
unfortunately went to the hospital for kidney stones and had a very
slow recovery. I refrained from sending them as is, before proper
testing, and surly I have found a bug just yesterday.
So now they are all well tested, and have my sign-off. Other then
fixing the problem at hand, and assuming there are no bugs at the new
code, there is low risk to any surrounding code. And in anyway they
affect only these paths that are now broken. That is RAID5 in pnfs
objects-layout code. It does also affect exofs (which was not broken)
but I have tested exofs and it is lower priority then objects-layout
because no one is using exofs, but objects-layout has lots of users."
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
pnfs-obj: don't leak objio_state if ore_write/read fails
ore: Unlock r4w pages in exact reverse order of locking
ore: Remove support of partial IO request (NFS crash)
ore: Fix NFS crash by supporting any unaligned RAID IO
Linus Torvalds [Fri, 20 Jul 2012 18:42:30 +0000 (11:42 -0700)]
Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs
Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
"It's been reported already twice recently:
http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html
and we finally have the fix. I am quite confident the fix is correct
because I could reproduce the problem with nandsim and verify the fix.
It was also verified by Iwo (the reporter).
I am also confident that this is OK to merge the fix so late because
this patch affects only the fixup functionality, which is not used by
most users."
* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
UBIFS: fix a bug in empty space fix-up
Mikulas Patocka [Fri, 20 Jul 2012 13:25:07 +0000 (14:25 +0100)]
dm raid1: set discard_zeroes_data_unsupported
We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
the underlying disks support zero on discard. So this patch sets
ti->discard_zeroes_data_unsupported.
For example, if the mirror is in the process of resynchronizing, it may
happen that kcopyd reads a piece of data, then discard is sent on the
same area and then kcopyd writes the piece of data to another leg.
Consequently, the data is not zeroed.
The flag was made available by commit
983c7db347db8ce2d8453fd1d89b7a4bb6920d56
(dm crypt: always disable discard_zeroes_data).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 20 Jul 2012 13:25:05 +0000 (14:25 +0100)]
dm thin: do not send discards to shared blocks
When process_discard receives a partial discard that doesn't cover a
full block, it sends this discard down to that block. Unfortunately, the
block can be shared and the discard would corrupt the other snapshots
sharing this block.
This patch detects block sharing and ends the discard with success when
sending it to the shared block.
The above change means that if the device supports discard it can't be
guaranteed that a discard request zeroes data. Therefore, we set
ti->discard_zeroes_data_unsupported.
Thin target discard support with this bug arrived in commit
104655fd4dcebd50068ef30253a001da72e3a081 (dm thin: support discards).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 20 Jul 2012 13:25:03 +0000 (14:25 +0100)]
dm raid1: fix crash with mirror recovery and discard
This patch fixes a crash when a discard request is sent during mirror
recovery.
Firstly, some background. Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
simultaneously recovered regions (by default the semaphore value is 1,
so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
__rh_recovery_prepare asks the log driver for the next region to
recover. Then, it sets the region state to DM_RH_RECOVERING. If there
are no pending I/Os on this region, the region is added to
quiesced_regions list. If there are pending I/Os, the region is not
added to any list. It is added to the quiesced_regions list later (by
dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
flight on this region. The region is popped from the list in
dm_rh_recovery_start function. Then, a kcopyd job is started in the
recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
dm_rh_recovery_end. dm_rh_recovery_end adds the region to
recovered_regions or failed_recovered_regions list (depending on
whether the copy operation was successful or not).
The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.
Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.
There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.
These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard).
In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING). However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts. So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.
This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.
Reference: https://bugzilla.redhat.com/837607
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Boaz Harrosh [Thu, 7 Jun 2012 23:02:30 +0000 (02:02 +0300)]
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.
Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.
Fix both birds, by returning a global zero_page, if offset is beyond
i_size.
TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.
[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Fri, 8 Jun 2012 02:29:40 +0000 (05:29 +0300)]
pnfs-obj: don't leak objio_state if ore_write/read fails
[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Wed, 11 Jul 2012 12:27:13 +0000 (15:27 +0300)]
ore: Unlock r4w pages in exact reverse order of locking
The read-4-write pages are locked in address ascending order.
But where unlocked in a way easiest for coding. Fix that,
locks should be released in opposite order of locking, .i.e
descending address order.
I have not hit this dead-lock. It was found by inspecting the
dbug print-outs. I suspect there is an higher lock at caller that
protects us, but fix it regardless.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Fri, 8 Jun 2012 01:30:40 +0000 (04:30 +0300)]
ore: Remove support of partial IO request (NFS crash)
Do to OOM situations the ore might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.
Since this crashes NFS-core and exofs is just fine without it just
remove this contraption, and fail.
TODO:
Support real forward progress with some reserved allocations
of resources, such as mem pools and/or bio_sets
[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
CC: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh [Thu, 7 Jun 2012 22:19:07 +0000 (01:19 +0300)]
ore: Fix NFS crash by supporting any unaligned RAID IO
In RAID_5/6 We used to not permit an IO that it's end
byte is not stripe_size aligned and spans more than one stripe.
.i.e the caller must check if after submission the actual
transferred bytes is shorter, and would need to resubmit
a new IO with the remainder.
Exofs supports this, and NFS was supposed to support this
as well with it's short write mechanism. But late testing has
exposed a CRASH when this is used with none-RPC layout-drivers.
The change at NFS is deep and risky, in it's place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.
The principal here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests
one like old code, before the calculation of the first stripe,
and also a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe. we do a single read like before.
The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1._read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute
And calling 1+3 at the same place as before. 2+3 before last
stripe, and in the case of all in a single stripe then 1+2+3
is preformed additively.
Why did I not think of it before. Well I had a strike of
genius because I have stared at this code for 2 years, and did
not find this simple solution, til today. Not that I did not try.
This solution is much better for NFS than the previous supposedly
solution because the short write was dealt with out-of-band after
IO_done, which would cause for a seeky IO pattern where as in here
we execute in order. At both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission)
NFS/exofs code need not change since the ORE API communicates the new
shorter length on return, what will happen is that this case would not
occur anymore.
hurray!!
[Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Artem Bityutskiy [Sat, 14 Jul 2012 11:33:09 +0000 (14:33 +0300)]
UBIFS: fix a bug in empty space fix-up
UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).
The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.
There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:
UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.
The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org [v3.0+]
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>
Linus Torvalds [Thu, 19 Jul 2012 23:11:28 +0000 (16:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull last minute Ceph fixes from Sage Weil:
"The important one fixes a bug in the socket failure handling behavior
that was turned up in some recent failure injection testing. The
other two are minor bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: endian bug in rbd_req_cb()
rbd: Fix ceph_snap_context size calculation
libceph: fix messenger retry
Linus Torvalds [Thu, 19 Jul 2012 15:27:13 +0000 (08:27 -0700)]
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md
Pull three md bugfixes from NeilBrown:
"One of the bugs was introduced in 3.5-rc1. Others have been there for
longer."
* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md/raid1: close some possible races on write errors during resync
md: avoid crash when stopping md array races with closing other open fds.
md: fix bug in handling of new_data_offset
Linus Torvalds [Thu, 19 Jul 2012 15:21:13 +0000 (08:21 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking changes from David Miller:
"Ok, we should be good to go now"
1) We have to statically initialize the init_net device list head rather
than do so in an initcall, otherwise netprio_cgroup crashes if it's
built statically rather than modular (Mark D. Rustad)
2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)
3) Qlogic maintainers update (Anirban Chakraborty)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Statically initialize init_net.dev_base_head
MAINTAINERS: Changes in qlcnic and qlge maintainers list
cipso: don't follow a NULL pointer when setsockopt() is called
Linus Torvalds [Thu, 19 Jul 2012 15:15:55 +0000 (08:15 -0700)]
Merge branch 'upstream-fixes' of git://git./linux/kernel/git/jikos/hid
Pull HID update from Jiri Kosina:
"A final round of changes for HID for 3.5: just device ID additions."
* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hid-multitouch: add support for Zytronic panels
HID: add Sennheiser BTD500USB device support
HID: add battery quirk for Apple Wireless ANSI
Ezequiel Garcia [Wed, 18 Jul 2012 13:05:26 +0000 (10:05 -0300)]
cx25821: Remove bad strcpy to read-only char*
The strcpy was being used to set the name of the board. Since the
destination char* was read-only and the name is set statically at
compile time; this was both wrong and redundant.
The type of char* is changed to const char* to prevent future errors.
Reported-by: Radek Masin <radek@masin.eu>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
[ Taking directly due to vacations - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Tissoires [Tue, 19 Jun 2012 12:39:54 +0000 (14:39 +0200)]
HID: hid-multitouch: add support for Zytronic panels
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Sebastian Andrzej Siewior [Thu, 19 Jul 2012 05:13:54 +0000 (07:13 +0200)]
MIPS: PCI: Move fixups from __init to __devinit.
Fixups are executed once the pci-device is found which is during boot
process so __init seems fine as long as the platform does not support
hotplug.
However it is possible to remove the PCI bus at run time and have it
rediscovered again via "echo 1 > /sys/bus/pci/rescan" and this will call
the fixups again.
[ralf@linux-mips.org: Made piixirqmap[] in malta_piix_func0_fixup()
__initdata.]
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: linux-mips@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yoichi Yuasa [Thu, 19 Jul 2012 05:13:54 +0000 (07:13 +0200)]
MIPS: Fix bug.h MIPS build regression
Commit:
3777808873b0c49c5cf27e44c948dfb02675d578 [bug.h: need linux/kernel.h
for TAINT_WARN.] breaks all MIPS builds.
CC arch/mips/kernel/machine_kexec.o
In file included from include/linux/kernel.h:20:0,
from include/asm-generic/bug.h:35,
from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bug.h:41,
from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:20,
from include/linux/bitops.h:22,
from include/linux/signal.h:38,
from include/linux/elfcore.h:5,
from include/linux/kexec.h:60,
from arch/mips/kernel/machine_kexec.c:9:
include/linux/log2.h: In function '__ilog2_u32':
include/linux/log2.h:34:2: error: implicit declaration of function 'fls' [-Werror=implicit-function-declaration]
include/linux/log2.h: In function '__ilog2_u64':
include/linux/log2.h:42:2: error: implicit declaration of function 'fls64' [-Werror=implicit-function-declaration]
include/linux/log2.h: In function '__roundup_pow_of_two':
include/linux/log2.h:63:2: error: implicit declaration of function 'fls_long' [-Werror=implicit-function-declaration]
In file included from include/linux/bitops.h:22:0,
from include/linux/signal.h:38,
from include/linux/elfcore.h:5,
from include/linux/kexec.h:60,
from arch/mips/kernel/machine_kexec.c:9:
/home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h: At top level:
/home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:615:19: error: static declaration of 'fls' follows non-static declaration
include/linux/log2.h:34:9: note: previous implicit declaration of 'fls' was here
In file included from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:651:0,
from include/linux/bitops.h:22,
from include/linux/signal.h:38,
from include/linux/elfcore.h:5,
from include/linux/kexec.h:60,
from arch/mips/kernel/machine_kexec.c:9:
include/asm-generic/bitops/fls64.h:18:28: error: static declaration of 'fls64' follows non-static declaration
include/linux/log2.h:42:9: note: previous implicit declaration of 'fls64' was here
In file included from include/linux/signal.h:38:0,
from include/linux/elfcore.h:5,
from include/linux/kexec.h:60,
from arch/mips/kernel/machine_kexec.c:9:
include/linux/bitops.h:160:24: error: conflicting types for 'fls_long'
include/linux/log2.h:63:16: note: previous implicit declaration of 'fls_long' was here
cc1: all warnings being treated as errors
make[2]: *** [arch/mips/kernel/machine_kexec.o] Error 1
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: yuasa@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Linuxppc-dev <linuxppc-dev@ozlabs.org>
Cc: Linux MIPS Mailing List <linux-mips@linux-mips.org>
Cc: Linux-sh list <linux-sh@vger.kernel.org>
Cc: Chris Zankel <chris@zankel.net>
Patchwork: https://patchwork.linux-mips.org/patch/4000/
Tested-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:54 +0000 (09:13 +0200)]
MIPS: sync-r4k: remove redundant irq operation
Since we have delayed irq enabling to ->smp_finish()
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: smp: Warn on too early irq enable
Just to catch a potential issue.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3852/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: call set_cpu_online() on cpu being brought up with irq disabled
To prevent a problem as commit
5fbd036b [sched: Cleanup cpu_active madness]
and commit
2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
try to resolve, move set_cpu_online() to the brought up CPU and with irq
disabled.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3851/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: call ->smp_finish() a little late
We have move irq enable to ->smp_finish. Place ->smp_finish() a little
late to prepare for move set_cpu_online() into start_secondary.
And it's not necessary to call cpu_set(cpu, cpu_callin_map) and
synchronise_count_slave() with irq enabled.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3850/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: Yosemite: delay irq enable to ->smp_finish()
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3848/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: SMTC: delay irq enable to ->smp_finish()
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3847/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: BMIPS: delay irq enable to ->smp_finish()
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3846/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: Octeon: delay enable irq to ->smp_finish()
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3845/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 19 Jul 2012 07:13:52 +0000 (09:13 +0200)]
MIPS: Oprofile: Fix build as a module.
When building oprofile as a module for R10000 or R7000 class processors,
E9000 or MIPSxx class cores since
3572a2c37f667ee49333f8863722b8f43eac506b
[MIPS: make oprofile use cp0_perfcount_irq if it is set] an
ERROR: "cp0_compare_irq" [arch/mips/oprofile/oprofile.ko] undefined!
error will happen. Fixed by exporting cp0_compare_irq.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Florian Fainelli [Thu, 19 Jul 2012 07:13:52 +0000 (09:13 +0200)]
MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
The IPsec clock bit is 18 and not 17.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: mpm@selenic.com
Cc: herbert@gondor.apana.org.au
Patchwork: https://patchwork.linux-mips.org/patch/3323/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Florian Fainelli [Thu, 19 Jul 2012 07:13:52 +0000 (09:13 +0200)]
MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
cc1: warnings being treated as errors
arch/mips/kernel/perf_event_mipsxx.c:166: error: 'counters_per_cpu_to_total' defined but not used
make[2]: *** [arch/mips/kernel/perf_event_mipsxx.o] Error 1
make[2]: *** Waiting for unfinished jobs....
It was first introduced by
82091564cfd7ab8def42777a9c662dbf655c5d25 [MIPS:
perf: Add support for 64-bit perf counters.] in 3.2.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: david.daney@cavium.com
Patchwork: https://patchwork.linux-mips.org/patch/3357/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Vincent Wen [Thu, 19 Jul 2012 07:11:16 +0000 (09:11 +0200)]
MIPS: Fix Magic SysRq L kernel crash.
show_backtrace() was passed a NULL pointer which caused paging
request fail. Set to current task as other architectures (ARM,
etc) do when passed a NULL task pointer.
Signed-off-by: Vincent Wen <vincentwenlinux@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: cernekee@gmail.com
Patchwork: https://patchwork.linux-mips.org/patch/3524/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Danny Kukawka [Thu, 19 Jul 2012 07:11:16 +0000 (09:11 +0200)]
MIPS: BMIPS: Fix duplicate header inclusion.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Cc: Danny Kukawka <dkukawka@suse.de>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3369/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Uwe Kleine-König [Thu, 19 Jul 2012 07:11:16 +0000 (09:11 +0200)]
mips: mark const init data with __initconst instead of __initdata
As long as there is no other non-const variable marked __initdata in the
same compilation unit it doesn't hurt. If there were one however
compilation would fail with
error: $variablename causes a section type conflict
because a section containing const variables is marked read only and so
cannot contain non-const variables.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: linux-mips@linux-mips.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel@pengutronix.de
Patchwork: https://patchwork.linux-mips.org/patch/3565/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Aaro Koskinen [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: cmpxchg.h: Add missing include
Fix the following build breakage in v3.4-rc1:
CC kernel/irq_work.o
In file included from include/linux/irq_work.h:4:0,
from kernel/irq_work.c:10:
include/linux/llist.h: In function 'llist_del_all':
include/linux/llist.h:178:2: error: implicit declaration of function 'BUILD_BUG_ON' [-Werror=implicit-function-declaration]
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3568/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Leonid Yegoshin [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: Malta may also be equipped with MIPS64 R2 processors.
Signed-off-by: Leonid Yegoshin <yegoshin@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3792/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: Fix typo multipy -> multiply
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yoichi Yuasa [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3883/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yoichi Yuasa [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: BCM47xx: Fix BCMA_DRIVER_PCI_HOSTMODE config dependencies
warning: (BCM47XX_BCMA) selects BCMA_DRIVER_PCI_HOSTMODE which has unmet direct dependencies (BCMA_POSSIBLE && BCMA && MIPS && BCMA_HOST_PCI)
warning: (BCM47XX_BCMA) selects BCMA_DRIVER_PCI_HOSTMODE which has unmet direct dependencies (BCMA_POSSIBLE && BCMA && MIPS && BCMA_HOST_PCI)
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3882/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: SMTC: Spelling and grammar corrections.
Extractd from Steven J. Hill's https://patchwork.linux-mips.org/patch/3603/.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: Properly align the .data..init_task section.
Improper alignment can lead to unbootable systems and/or random
crashes.
[ralf@linux-mips.org: This is a lond standing bug since
6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp.
c422a10917f75fd19fa7fe070aaaa23e384dae6f (lmo) [MIPS: Clean up linker script
using new linker script macros.] so dates back to 2.6.32.]
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/3881/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steven J. Hill [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: Malta: Change start address to avoid conflicts.
There are ACPI and SMB devices in the 0x1000..0x1fff address range.
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3581/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Leonid Yegoshin [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: Fix race condition with FPU thread task flag during context switch.
[ralf@linux-mips.org: Cosmetic changes; also fixed up r2300_switch.S and
octeon_switch.S which needed similar modifications.]
Signed-off-by: Leonid Yegoshin <yegoshin@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3784/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Douglas Leung [Thu, 19 Jul 2012 07:11:13 +0000 (09:11 +0200)]
MIPS: Fix decoding of c0_config1 for MIPSxx caches with 32 ways per set.
This affects certain 4Kc cores.
Signed-off-by: Douglas Leung <douglas@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3855/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steven J. Hill [Fri, 6 Jul 2012 19:56:01 +0000 (21:56 +0200)]
MIPS: Refactor 'clear_page' and 'copy_page' functions.
Remove usage of the '__attribute__((alias("...")))' hack that aliased
to integer arrays containing micro-assembled instructions. This hack
breaks when building a microMIPS kernel. It also makes the code much
easier to understand.
[ralf@linux-mips.org: Added back export of the clear_page and copy_page
symbols so certain modules will work again. Also fixed build with
CONFIG_SIBYTE_DMA_PAGEOPS enabled.]
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3866/
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Leonid Yegoshin [Fri, 6 Jul 2012 19:56:01 +0000 (21:56 +0200)]
MIPS: Don't panic on 5KEc.
It's a bloody bog standard MIPS64R2 core with just a new PrId ID. Iow
that essentially means Linux just panics because it doesn't know how to
name the core.
[ralf@linux-mips.org: Split original patch into several smaller patches.]
Signed-off-by: Leonid Yegoshin <yegoshin@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3792/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
NeilBrown [Thu, 19 Jul 2012 05:59:18 +0000 (15:59 +1000)]
md/raid1: close some possible races on write errors during resync
commit
4367af556133723d0f443e14ca8170d9447317cb
md/raid1: clear bad-block record when write succeeds.
Added a 'reschedule_retry' call possibility at the end of
end_sync_write, but didn't add matching code at the end of
sync_request_write. So if the writes complete very quickly, or
scheduling makes it seem that way, then we can miss rescheduling
the request and the resync could hang.
Also commit
73d5c38a9536142e062c35997b044e89166e063b
md: avoid races when stopping resync.
Fix a race condition in this same code in end_sync_write but didn't
make the change in sync_request_write.
This patch updates sync_request_write to fix both of those.
Patch is suitable for 3.1 and later kernels.
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Original-version-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 19 Jul 2012 05:59:18 +0000 (15:59 +1000)]
md: avoid crash when stopping md array races with closing other open fds.
md will refuse to stop an array if any other fd (or mounted fs) is
using it.
When any fs is unmounted of when the last open fd is closed all
pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
so there will be no pending IO to worry about when the array is
stopped.
However in order to send the STOP_ARRAY ioctl to stop the array one
must first get and open fd on the block device.
If some fd is being used to write to the block device and it is closed
after mdadm open the block device, but before mdadm issues the
STOP_ARRAY ioctl, then there will be no last-close on the md device so
__blkdev_put will not call sync_blockdev.
If this happens, then IO can still be in-flight while md tears down
the array and bad things can happen (use-after-free and subsequent
havoc).
So in the case where do_md_stop is being called from an open file
descriptor, call sync_block after taking the mutex to ensure there
will be no new openers.
This is needed when setting a read-write device to read-only too.
Cc: stable@vger.kernel.org
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 19 Jul 2012 05:59:18 +0000 (15:59 +1000)]
md: fix bug in handling of new_data_offset
commit
c6563a8c38fde3c1c7fc925a10bde3ca20799301
md: add possibility to change data-offset for devices.
introduced a 'new_data_offset' attribute which should normally
be the same as 'data_offset', but can be explicitly set to a different
value to allow a reshape operation to move the data.
Unfortunately when the 'data_offset' is explicitly set through
sysfs, the new_data_offset is not also set, so the two would become
out-of-sync incorrectly.
One result of this is that trying to set the 'size' after the
'data_offset' would fail because it is not permitted to set the size
when the 'data_offset' and 'new_data_offset' are different - as that
can be confusing.
Consequently when mdadm tried to do this while assembling an IMSM
array it would fail.
This bug was introduced in 3.5-rc1.
Reported-by: Brian Downing <bdowning@lavos.net>
Bisected-by: Brian Downing <bdowning@lavos.net>
Tested-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Thu, 19 Jul 2012 01:40:38 +0000 (18:40 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull target fixes from Nicholas Bellinger:
"This includes a bugfix from MDR to address a NULL pointer OOPs with
FCoE aborts, along with a WRITE_SAME emulation bugfix for NOLB=0
cases, and persistent reservation return cleanups from Roland.
All three patches are CC'ed to stable."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix range calculation in WRITE SAME emulation when num blocks == 0
target: Clean up returning errors in PR handling code
tcm_fc: Fix crash seen with aborts and large reads
Olaf Hering [Wed, 18 Jul 2012 21:12:04 +0000 (14:12 -0700)]
kexec: update URL of kexec homepage
The referenced html file does not exist anymore. Replace the URL with
the current project homepage.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yoichi Yuasa [Wed, 18 Jul 2012 21:12:01 +0000 (14:12 -0700)]
mips: fix bug.h build regression
Commit
377780887 ("bug.h: need linux/kernel.h for TAINT_WARN.") broke
all MIPS builds:
CC arch/mips/kernel/machine_kexec.o
include/linux/log2.h: In function '__ilog2_u32':
include/linux/log2.h:34:2: error: implicit declaration of function 'fls' [-Werror=implicit-function-declaration]
include/linux/log2.h: In function '__ilog2_u64':
include/linux/log2.h:42:2: error: implicit declaration of function 'fls64' [-Werror=implicit-function-declaration]
...
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Tested-by: John Crispin <blogic@openwrt.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 19 Jul 2012 01:15:46 +0000 (18:15 -0700)]
Make wait_for_device_probe() also do scsi_complete_async_scans()
Commit
a7a20d103994 ("sd: limit the scope of the async probe domain")
make the SCSI device probing run device discovery in it's own async
domain.
However, as a result, the partition detection was no longer synchronized
by async_synchronize_full() (which, despite the name, only synchronizes
the global async space, not all of them). Which in turn meant that
"wait_for_device_probe()" would not wait for the SCSI partitions to be
parsed.
And "wait_for_device_probe()" was what the boot time init code relied on
for mounting the root filesystem.
Now, most people never noticed this, because not only is it
timing-dependent, but modern distributions all use initrd. So the root
filesystem isn't actually on a disk at all. And then before they
actually mount the final disk filesystem, they will have loaded the
scsi-wait-scan module, which not only does the expected
wait_for_device_probe(), but also does scsi_complete_async_scans().
[ Side note: scsi_complete_async_scans() had also been partially broken,
but that was fixed in commit
43a8d39d0137 ("fix async probe
regression"), so that same commit
a7a20d103994 had actually broken
setups even if you used scsi-wait-scan explicitly ]
Solve this problem by just moving the scsi_complete_async_scans() call
into wait_for_device_probe(). Everybody who wants to wait for device
probing to finish really wants the SCSI probing to complete, so there's
no reason not to do this.
So now "wait_for_device_probe()" really does what the name implies, and
properly waits for device probing to finish. This also removes the now
unnecessary extra calls to scsi_complete_async_scans().
Reported-and-tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 18 Jul 2012 20:42:44 +0000 (13:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull SELinux regression fixes from James Morris.
Andrew Morton has a box that hit that open perms problem.
I also renamed the "epollwakeup" selinux name for the new capability to
be "block_suspend", to match the rename done by commit
d9914cf66181
("PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND").
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
SELinux: do not check open perms if they are not known to policy
SELinux: include definition of new capabilities
Rustad, Mark D [Wed, 18 Jul 2012 09:06:07 +0000 (09:06 +0000)]
net: Statically initialize init_net.dev_base_head
This change eliminates an initialization-order hazard most
recently seen when netprio_cgroup is built into the kernel.
With thanks to Eric Dumazet for catching a bug.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 18 Jul 2012 17:36:02 +0000 (10:36 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
One more time/ntp fix pulled from Ingo Molnar.
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ntp: Fix STA_INS/DEL clearing bug
Hans Verkuil [Wed, 11 Jul 2012 12:12:45 +0000 (14:12 +0200)]
v4l2-dev: forgot to add VIDIOC_DV_TIMINGS_CAP.
The VIDIOC_DV_TIMINGS_CAP ioctl check wasn't added to determine_valid_ioctls().
This caused this ioctl to always return -ENOTTY.
The cause for this was that for 3.5 two patch series were merged, one
changing V4L2 core ioctl handling and one adding new functionality, and
some of the new functionality wasn't handled by the new V4L2 core code.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
[ Taking it directly due to vacations - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 18 Jul 2012 17:27:08 +0000 (10:27 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes for SPEAr from Olof Johansson:
"These are arriving very late in the release cycle, but there has been
a change of maintainers on the SPEAr platform and they have needed a
while to get going.
The patch count is higher than I would like at this point, but they're
all relevant fixes and well-contained in their own platform code. I
still think it's suitable 3.5 material and I don't think it should
increase the need for a -rc8 since they are so contained."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: SPEAr600: Fix timer interrupt definition in spear600.dtsi
ARM: dts: SPEAr320: Boot the board in EXTENDED_MODE
ARM: dts: SPEAr320: Fix compatible string
Clk: SPEAr1340: Update sys clock parent array
clk: SPEAr1340: Fix clk enable register for uart1 and i2c1.
ARM: SPEAr13xx: Fix Interrupt bindings
Clk:spear6xx:Fix: Rename clk ids within predefined limit
Clk:spear3xx:Fix: Rename clk ids within predefined limit
clk:spear1310:Fix: Rename clk ids within predefined limit
clk:spear1340:Fix: Rename clk ids within predefined limit
Anirban Chakraborty [Tue, 17 Jul 2012 09:22:09 +0000 (09:22 +0000)]
MAINTAINERS: Changes in qlcnic and qlge maintainers list
Please apply.
Thanks.
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 18 Jul 2012 16:28:11 +0000 (09:28 -0700)]
Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.
* git://git.samba.org/sfrench/cifs-2.6:
cifs: always update the inode cache with the results from a FIND_*
cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space
Initialise mid_q_entry before putting it on the pending queue
Paul Moore [Tue, 17 Jul 2012 11:07:47 +0000 (11:07 +0000)]
cipso: don't follow a NULL pointer when setsockopt() is called
As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).
This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface. In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.
A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>
struct local_tag {
char type;
char length;
char info[4];
};
struct cipso {
char type;
char length;
char doi[4];
struct local_tag local;
};
int main(int argc, char **argv)
{
int sockfd;
struct cipso cipso = {
.type = IPOPT_CIPSO,
.length = sizeof(struct cipso),
.local = {
.type = 128,
.length = sizeof(struct local_tag),
},
};
memset(cipso.doi, 0, 4);
cipso.doi[3] = 3;
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
#define SOL_IP 0
setsockopt(sockfd, SOL_IP, IP_OPTIONS,
&cipso, sizeof(struct cipso));
return 0;
}
CC: Lin Ming <mlin@ss.pku.edu.cn>
Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Wed, 18 Jul 2012 08:31:36 +0000 (09:31 +0100)]
ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT
Caused, AFAICS, by mismerge in commit
ff9cb1c4eead ("Merge branch
'for_linus' into for_linus_merged")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Olof Johansson [Wed, 18 Jul 2012 05:43:53 +0000 (22:43 -0700)]
Merge branch 'for-3.5-spear-fixes' of git.stlinux.com/spear/linux-2.6 into fixes
* 'for-3.5-spear-fixes' of http://git.stlinux.com/spear/linux-2.6:
ARM: SPEAr600: Fix timer interrupt definition in spear600.dtsi
ARM: dts: SPEAr320: Boot the board in EXTENDED_MODE
ARM: dts: SPEAr320: Fix compatible string
Clk: SPEAr1340: Update sys clock parent array
clk: SPEAr1340: Fix clk enable register for uart1 and i2c1.
ARM: SPEAr13xx: Fix Interrupt bindings
Clk:spear6xx:Fix: Rename clk ids within predefined limit
Clk:spear3xx:Fix: Rename clk ids within predefined limit
clk:spear1310:Fix: Rename clk ids within predefined limit
clk:spear1340:Fix: Rename clk ids within predefined limit
Stefan Roese [Fri, 11 May 2012 08:41:01 +0000 (10:41 +0200)]
ARM: SPEAr600: Fix timer interrupt definition in spear600.dtsi
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Vipul Kumar Samar [Fri, 13 Jul 2012 11:52:11 +0000 (17:22 +0530)]
ARM: dts: SPEAr320: Boot the board in EXTENDED_MODE
On spear320 device supported mode are:
* AUTO_NET_SMII_MODE
* AUTO_NET_MII_MODE
* AUTO_EXP_MODE
* SMALL_PRINTERS_MODE
* EXTENDED_MODE
spear320-evb board is designed for EXTENDED_MODE only, hence it does not
boot correctly in current form where pinctrl part for some devices fail.
Configure and boot the SPEAr320 evaluation board in EXTENDED_MODE.
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Vipul Kumar Samar [Fri, 13 Jul 2012 11:50:46 +0000 (17:20 +0530)]
ARM: dts: SPEAr320: Fix compatible string
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Vipul Kumar Samar [Fri, 6 Jul 2012 10:22:36 +0000 (15:52 +0530)]
Clk: SPEAr1340: Update sys clock parent array
sys_clk has multiple parents and selection of parent depends on sys_clk_ctrl
register bit no. 23:25, with following possibilities
0XX: pll1_clk
10X: sys_synth_clk
110: pll2_clk
111: pll3_clk
Out of several possibilities (h/w wise) to select same clock parent for
sys_clk, current clock implementation was considering just one value.
When bootloader programmed different (valid) value to select a clock
parent then Linux breaks.
Here, we try to include all possibilities which can lead to same
clock selection thus making Linux independent of bootloader selection
values.
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Vipul Kumar Samar [Wed, 4 Jul 2012 10:52:19 +0000 (18:52 +0800)]
clk: SPEAr1340: Fix clk enable register for uart1 and i2c1.
This patch is to fix typing mistake of clk enable register of i2c1 and
uart1.
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Vipul Kumar Samar [Wed, 4 Jul 2012 10:52:17 +0000 (18:52 +0800)]
ARM: SPEAr13xx: Fix Interrupt bindings
- Correct interrupt bindings for uart, ethernet and pmu.
- Added interrupt binding for keyboard.
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Vipul Kumar Samar [Tue, 10 Jul 2012 11:42:46 +0000 (17:12 +0530)]
Clk:spear6xx:Fix: Rename clk ids within predefined limit
The max limit of con_id is 16 and dev_id is 20. As of now for spear6xx, many clk
ids are exceeding this predefined limit.
This patch is intended to rename clk ids like:
mux_clk -> _mclk
gate_clk -> _gclk
synth_clk -> syn_clk
ras_gen1_synth_gate_clk -> ras_syn1_gclk
pll3_48m -> pll3_
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Vipul Kumar Samar [Tue, 10 Jul 2012 11:42:45 +0000 (17:12 +0530)]
Clk:spear3xx:Fix: Rename clk ids within predefined limit
The max limit of con_id is 16 and dev_id is 20. As of now for spear3xx, many clk
ids are exceeding this predefined limit.
This patch is intended to rename clk ids like:
mux_clk -> _mclk
gate_clk -> _gclk
synth_clk -> syn_clk
ras_gen1_synth_gate_clk -> ras_syn1_gclk
ras_pll3_48m -> ras_pll3_
pll3_48m -> pll3_
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Vipul Kumar Samar [Tue, 10 Jul 2012 11:42:44 +0000 (17:12 +0530)]
clk:spear1310:Fix: Rename clk ids within predefined limit
The max limit of con_id is 16 and dev_id is 20. As of now for spear1310, many
clk ids are exceeding this predefined limit.
This patch is intended to rename clk ids like:
mux_clk -> _mclk
gate_clk -> _gclk
synth_clk -> syn_clk
gmac_phy -> phy_
gmii_125m_pad -> gmii_pad
Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>