platform/kernel/linux-exynos.git
12 years agosched: Fix migration thread runtime bogosity
Mike Galbraith [Sat, 4 Aug 2012 03:44:14 +0000 (05:44 +0200)]
sched: Fix migration thread runtime bogosity

Make stop scheduler class do the same accounting as other classes,

Migration threads can be caught in the act while doing exec balancing,
leading to the below due to use of unmaintained ->se.exec_start.  The
load that triggered this particular instance was an apparently out of
control heavily threaded application that does system monitoring in
what equated to an exec bomb, with one of the VERY frequently migrated
tasks being ps.

%CPU   PID USER     CMD
99.3    45 root     [migration/10]
97.7    53 root     [migration/12]
97.0    57 root     [migration/13]
90.1    49 root     [migration/11]
89.6    65 root     [migration/15]
88.7    17 root     [migration/3]
80.4    37 root     [migration/8]
78.1    41 root     [migration/9]
44.2    13 root     [migration/2]

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled
Mike Galbraith [Tue, 7 Aug 2012 08:02:38 +0000 (10:02 +0200)]
sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled

Root task group bandwidth replenishment must service all CPUs, regardless of
where the timer was last started, and regardless of the isolation mechanism,
lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched,cgroup: Fix up task_groups list
Mike Galbraith [Tue, 7 Aug 2012 03:00:13 +0000 (05:00 +0200)]
sched,cgroup: Fix up task_groups list

With multiple instances of task_groups, for_each_rt_rq() is a noop,
no task groups having been added to the rt.c list instance.  This
renders __enable/disable_runtime() and print_rt_stats() noop, the
user (non) visible effect being that rt task groups are missing in
/proc/sched_debug.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: stable@kernel.org # v3.3+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched: fix divide by zero at {thread_group,task}_times
Stanislaw Gruszka [Wed, 8 Aug 2012 09:27:15 +0000 (11:27 +0200)]
sched: fix divide by zero at {thread_group,task}_times

On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   #3 [ffff880472a51cd0] die at ffffffff8100f26b
   #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   #8 [ffff880472a51f40] sys_times at ffffffff81088524
   #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies
Peter Zijlstra [Wed, 8 Aug 2012 19:46:40 +0000 (21:46 +0200)]
sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies

Peter Portante reported that for large cgroup hierarchies (and or on
large CPU counts) we get immense lock contention on rq->lock and stuff
stops working properly.

His workload was a ton of processes, each in their own cgroup,
everybody idling except for a sporadic wakeup once every so often.

It was found that:

  schedule()
    idle_balance()
      load_balance()
        local_irq_save()
        double_rq_lock()
        update_h_load()
          walk_tg_tree(tg_load_down)
            tg_load_down()

Results in an entire cgroup hierarchy walk under rq->lock for every
new-idle balance and since new-idle balance isn't throttled this
results in a lot of work while holding the rq->lock.

This patch does two things, it removes the work from under rq->lock
based on the good principle of race and pray which is widely employed
in the load-balancer as a whole. And secondly it throttles the
update_h_load() calculation to max once per jiffy.

I considered excluding update_h_load() for new-idle balance
all-together, but purely relying on regular balance passes to update
this data might not work out under some rare circumstances where the
new-idle busiest isn't the regular busiest for a while (unlikely, but
a nightmare to debug if someone hits it and suffers).

Cc: pjt@google.com
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Reported-by: Peter Portante <pportant@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched/cleanups: Add load balance cpumask pointer to 'struct lb_env'
Michael Wang [Thu, 12 Jul 2012 08:10:13 +0000 (16:10 +0800)]
sched/cleanups: Add load balance cpumask pointer to 'struct lb_env'

With this patch struct ld_env will have a pointer of the load balancing
cpumask and we don't need to pass a cpumask around anymore.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FFE8665.3080705@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Fix comment about PREEMPT_ACTIVE bit location
Srivatsa S. Bhat [Fri, 20 Jul 2012 19:24:59 +0000 (00:54 +0530)]
sched: Fix comment about PREEMPT_ACTIVE bit location

PREEMPT_ACTIVE flag is bit 27, not 28. Fix the comment.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: paulmck@linux.vnet.ibm.com
Cc: josh@joshtriplett.org
Cc: rostedt@goodmis.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120720192459.6149.14821.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Fix minor code style issues
Ying Xue [Thu, 12 Jul 2012 07:03:42 +0000 (15:03 +0800)]
sched: Fix minor code style issues

Delete redudant spaces between type name and data name or operators.

Signed-off-by: Ying Xue <ying.xue0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342076622-6606-1-git-send-email-ying.xue0@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Use task_rq_unlock() in __sched_setscheduler()
Namhyung Kim [Sat, 7 Jul 2012 07:49:02 +0000 (16:49 +0900)]
sched: Use task_rq_unlock() in __sched_setscheduler()

It seems there's no specific reason to open-code it.  I guess
commit 0122ec5b02f76 ("sched: Add p->pi_lock to task_rq_lock()")
simply missed it.  Let's be consistent with others.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341647342-6742-1-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched/numa: Add SD_PERFER_SIBLING to CPU domain
Alex Shi [Fri, 20 Jul 2012 06:19:50 +0000 (14:19 +0800)]
sched/numa: Add SD_PERFER_SIBLING to CPU domain

Commit 8e7fbcbc22c ("sched: Remove stale power aware scheduling remnants
and dysfunctional knobs") removed SD_PERFER_SIBLING from the CPU domain.

On NUMA machines this causes that load_balance() doesn't perfer LCPU in
 same physical CPU package.

It causes some actual performance regressions on our NUMA machines from
Core2 to NHM and SNB.

Adding this domain flag again recovers the performance drop.

This change doesn't have any bad impact on any of my benchmarks:
 specjbb, kbuild, fio, hackbench .. etc, on all my machines.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342765190-21540-1-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Fix race in task_group()
Peter Zijlstra [Fri, 22 Jun 2012 11:36:05 +0000 (13:36 +0200)]
sched: Fix race in task_group()

Stefan reported a crash on a kernel before a3e5d1091c1 ("sched:
Don't call task_group() too many times in set_task_rq()"), he
found the reason to be that the multiple task_group()
invocations in set_task_rq() returned different values.

Looking at all that I found a lack of serialization and plain
wrong comments.

The below tries to fix it using an extra pointer which is
updated under the appropriate scheduler locks. Its not pretty,
but I can't really see another way given how all the cgroup
stuff works.

Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Improve balance_cpu() to consider other cpus in its group as target of (pinned...
Srivatsa Vaddagiri [Tue, 19 Jun 2012 12:13:15 +0000 (17:43 +0530)]
sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task

Current load balance scheme requires only one cpu in a
sched_group (balance_cpu) to look at other peer sched_groups for
imbalance and pull tasks towards itself from a busy cpu. Tasks
thus pulled by balance_cpu could later get picked up by cpus
that are in the same sched_group as that of balance_cpu.

This scheme however fails to pull tasks that are not allowed to
run on balance_cpu (but are allowed to run on other cpus in its
sched_group). That can affect fairness and in some worst case
scenarios cause starvation.

Consider a two core (2 threads/core) system running tasks as
below:

          Core0            Core1
         /     \          /     \
C0     C1  C2     C3
        |      |         |      |
        v      v         v      v
F0     T1        F1     [idle]
 T2

 F0 = SCHED_FIFO task (pinned to C0)
 F1 = SCHED_FIFO task (pinned to C2)
 T1 = SCHED_OTHER task (pinned to C1)
 T2 = SCHED_OTHER task (pinned to C1 and C2)

F1 could become a cpu hog, which will starve T2 unless C1 pulls
it. Between C0 and C1 however, C0 is required to look for
imbalance between cores, which will fail to pull T2 towards
Core0. T2 will starve eternally in this case. The same scenario
can arise in presence of non-rt tasks as well (say we replace F1
with high irq load).

We tackle this problem by having balance_cpu move pinned tasks
to one of its sibling cpus (where they can run). We first check
if load balance goal can be met by ignoring pinned tasks,
failing which we retry move_tasks() with a new env->dst_cpu.

This patch modifies load balance semantics on who can move load
towards a given cpu in a given sched_domain.

Before this patch, a given_cpu or a ilb_cpu acting on behalf of
an idle given_cpu is responsible for moving load to given_cpu.

With this patch applied, balance_cpu can in addition decide on
moving some load to a given_cpu.

There is a remote possibility that excess load could get moved
as a result of this (balance_cpu and given_cpu/ilb_cpu deciding
*independently* and at *same* time to move some load to a
given_cpu). However we should see less of such conflicting
decisions in practice and moreover subsequent load balance
cycles should correct the excess load moved to given_cpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06CDB.2060605@linux.vnet.ibm.com
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Reset loop counters if all tasks are pinned and we need to redo load balance
Prashanth Nageshappa [Tue, 19 Jun 2012 12:22:07 +0000 (17:52 +0530)]
sched: Reset loop counters if all tasks are pinned and we need to redo load balance

While load balancing, if all tasks on the source runqueue are pinned,
we retry after excluding the corresponding source cpu. However, loop counters
env.loop and env.loop_break are not reset before retrying, which can lead
to failure in moving the tasks. In this patch we reset env.loop and
env.loop_break to their inital values before we retry.

Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06EEF.2090709@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Reorder 'struct lb_env' members to reduce its size
Prashanth Nageshappa [Tue, 19 Jun 2012 12:17:34 +0000 (17:47 +0530)]
sched: Reorder 'struct lb_env' members to reduce its size

Members of 'struct lb_env' are not in appropriate order to reuse compiler
added padding on 64bit architectures. In this patch we reorder those struct
members and help reduce the size of the structure from 96 bytes to 80
bytes on 64 bit architectures.

Suggested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06DDE.7000403@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched: Improve scalability via 'CPU buddies', which withstand random perturbations
Mike Galbraith [Tue, 12 Jun 2012 03:18:32 +0000 (05:18 +0200)]
sched: Improve scalability via 'CPU buddies', which withstand random perturbations

Traversing an entire package is not only expensive, it also leads to tasks
bouncing all over a partially idle and possible quite large package.  Fix
that up by assigning a 'buddy' CPU to try to motivate.  Each buddy may try
to motivate that one other CPU, if it's busy, tough, it may then try its
SMT sibling, but that's all this optimization is allowed to cost.

Sibling cache buddies are cross-wired to prevent bouncing.

4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better:

 clients     1       2       4        8       16       32       64      128
 ..........................................................................
 pre        30      41     118      645     3769     6214    12233    14312
 post      299     603    1211     2418     4697     6847    11606    14557

A nice increase in performance.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agocpusets: Remove/update outdated comments
Srivatsa S. Bhat [Thu, 24 May 2012 14:17:03 +0000 (19:47 +0530)]
cpusets: Remove/update outdated comments

cpuset_track_online_cpus() is no longer present. So remove the
outdated comment and replace it with reference to cpuset_update_active_cpus()
which is its equivalent.

Also, we don't lack memory hot-unplug anymore. And David Rientjes pointed
out how it is dealt with. So update that comment as well.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141700.3692.98192.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agocpusets, hotplug: Restructure functions that are invoked during hotplug
Srivatsa S. Bhat [Thu, 24 May 2012 14:16:55 +0000 (19:46 +0530)]
cpusets, hotplug: Restructure functions that are invoked during hotplug

Separate out the cpuset related handling for CPU/Memory online/offline.
This also helps us exploit the most obvious and basic level of optimization
that any notification mechanism (CPU/Mem online/offline) has to offer us:
"We *know* why we have been invoked. So stop pretending that we are lost,
and do only the necessary amount of processing!".

And while at it, rename scan_for_empty_cpusets() to
scan_cpusets_upon_hotplug(), which is more appropriate considering how
it is restructured.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agocpusets, hotplug: Implement cpuset tree traversal in a helper function
Srivatsa S. Bhat [Thu, 24 May 2012 14:16:41 +0000 (19:46 +0530)]
cpusets, hotplug: Implement cpuset tree traversal in a helper function

At present, the functions that deal with cpusets during CPU/Mem hotplug
are quite messy, since a lot of the functionality is mixed up without clear
separation. And this takes a toll on optimization as well. For example,
the function cpuset_update_active_cpus() is called on both CPU offline and CPU
online events; and it invokes scan_for_empty_cpusets(), which makes sense
only for CPU offline events. And hence, the current code ends up unnecessarily
traversing the cpuset tree during CPU online also.

As a first step towards cleaning up those functions, encapsulate the cpuset
tree traversal in a helper function, so as to facilitate upcoming changes.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141635.3692.893.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agoCPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume
Srivatsa S. Bhat [Thu, 24 May 2012 14:16:26 +0000 (19:46 +0530)]
CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume

In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
masks as and when necessary to ensure that the tasks belonging to the cpusets
have some place (online CPUs) to run on. And regular CPU hotplug is
destructive in the sense that the kernel doesn't remember the original cpuset
configurations set by the user, across hotplug operations.

However, suspend/resume (which uses CPU hotplug) is a special case in which
the kernel has the responsibility to restore the system (during resume), to
exactly the same state it was in before suspend.

In order to achieve that, do the following:

1. Don't modify cpusets during suspend/resume. At all.
   In particular, don't move the tasks from one cpuset to another, and
   don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
   during the CPU hotplug operations that are carried out in the
   suspend/resume path.

2. However, cpusets and sched domains are related. We just want to avoid
   altering cpusets alone. So, to keep the sched domains updated, build
   a single sched domain (containing all active cpus) during each of the
   CPU hotplug operations carried out in s/r path, effectively ignoring
   the cpusets' cpus_allowed masks.

   (Since userspace is frozen while doing all this, it will go unnoticed.)

3. During the last CPU online operation during resume, build the sched
   domains by looking up the (unaltered) cpusets' cpus_allowed masks.
   That will bring back the system to the same original state as it was in
   before suspend.

Ultimately, this will not only solve the cpuset problem related to suspend
resume (ie., restores the cpusets to exactly what it was before suspend, by
not touching it at all) but also speeds up suspend/resume because we avoid
running cpuset update code for every CPU being offlined/onlined.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agosched/x86: Remove broken power estimation
Peter Zijlstra [Wed, 13 Jun 2012 13:24:45 +0000 (15:24 +0200)]
sched/x86: Remove broken power estimation

The x86 sched power implementation has been broken forever and gets in
the way of other stuff, remove it.

[ For archaeological interest, fixing this code would require dealing
  with the cross-cpu calling of these functions and more importantly, we
  need to filter idle time out of the a/m-perf stuff because the ratio
  will go down to 0 when idle, giving a 0 capacity which is not what
  we'd want. ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/1339594110.8980.38.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agoLinux 3.5 v3.5
Linus Torvalds [Sat, 21 Jul 2012 20:58:29 +0000 (13:58 -0700)]
Linux 3.5

12 years agoRemove SYSTEM_SUSPEND_DISK system state
Rafael J. Wysocki [Sat, 21 Jul 2012 18:24:52 +0000 (20:24 +0200)]
Remove SYSTEM_SUSPEND_DISK system state

The SYSTEM_SUSPEND_DISK system state is never used, so drop it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'anton-kgdb' (kgdb dmesg fixups)
Linus Torvalds [Sat, 21 Jul 2012 17:34:13 +0000 (10:34 -0700)]
Merge branch 'anton-kgdb' (kgdb dmesg fixups)

Merge emailed kgdb dmesg fixups patches from Anton Vorontsov:
 "The dmesg command appears to be broken after the printk rework.  The
  old logic in the kdb code makes no sense in terms of current
  printk/logging storage format, and KDB simply hangs forever upon
  entering 'dmesg' command.

  The first patch revives the command by switching to kmsg_dumper
  iterator.  As a side-effect, the code is now much more simpler.

  A few changes were needed in the printk.c: we needed unlocked variant
  of the kmsg_dumper iterator, but these can surely wait for 3.6.

  It's probably too late even for the first patch to go to 3.5, but I'll
  try to convince otherwise.  :-) Here we go:

   - The current code is broken for sure, and has no hope to work at
     all.  It is a regression
   - The new code works for me, and probably works for everyone else;
   - If it compiles (and I urge everyone to compile-test it on your
     setup), it hardly can make things worse."

* Merge emailed patches from Anton Vorontsov: (4 commits)
  kdb: Switch to nolock variants of kmsg_dump functions
  printk: Implement some unlocked kmsg_dump functions
  printk: Remove kdb_syslog_data
  kdb: Revive dmesg command

12 years agokdb: Switch to nolock variants of kmsg_dump functions
Anton Vorontsov [Sat, 21 Jul 2012 00:28:25 +0000 (17:28 -0700)]
kdb: Switch to nolock variants of kmsg_dump functions

The locked variants are prone to deadlocks (suppose we got to the
debugger w/ the logbuf lock held), so let's switch to nolock variants.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoprintk: Implement some unlocked kmsg_dump functions
Anton Vorontsov [Sat, 21 Jul 2012 00:28:07 +0000 (17:28 -0700)]
printk: Implement some unlocked kmsg_dump functions

If used from KDB, the locked variants are prone to deadlocks (suppose we
got to the debugger w/ the logbuf lock held).

So, we have to implement a few routines that grab no logbuf lock.

Yet we don't need these functions in modules, so we don't export them.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoprintk: Remove kdb_syslog_data
Anton Vorontsov [Sat, 21 Jul 2012 00:27:54 +0000 (17:27 -0700)]
printk: Remove kdb_syslog_data

The function is no longer needed, so remove it.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agokdb: Revive dmesg command
Anton Vorontsov [Sat, 21 Jul 2012 00:27:37 +0000 (17:27 -0700)]
kdb: Revive dmesg command

The kgdb dmesg command is broken after the printk rework.  The old logic
in kdb code makes no sense in terms of current printk/logging storage
format, and KDB simply hangs forever.

This patch revives the command by switching to kmsg_dumper iterator.

The code is now much more simpler and shorter.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Fri, 20 Jul 2012 19:02:02 +0000 (12:02 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull late MIPS fixes from Ralf Baechle:
 "This fixes a number of lose ends in the MIPS code and various bug
  fixes.

  Aside of dropping some patch that should not be in this pull request
  everything has sat in -next for quite a while and there are no known
  issues.

  The biggest patch in this patch set moves the allocation of an array
  that is aliased to a function (for runtime generated code) to
  assembler code.  This avoids an issue with certain toolchains when
  building for microMIPS."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
  MIPS: PCI: Move fixups from __init to __devinit.
  MIPS: Fix bug.h MIPS build regression
  MIPS: sync-r4k: remove redundant irq operation
  MIPS: smp: Warn on too early irq enable
  MIPS: call set_cpu_online() on cpu being brought up with irq disabled
  MIPS: call ->smp_finish() a little late
  MIPS: Yosemite: delay irq enable to ->smp_finish()
  MIPS: SMTC: delay irq enable to ->smp_finish()
  MIPS: BMIPS: delay irq enable to ->smp_finish()
  MIPS: Octeon: delay enable irq to ->smp_finish()
  MIPS: Oprofile: Fix build as a module.
  MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
  MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
  MIPS: Fix Magic SysRq L kernel crash.
  MIPS: BMIPS: Fix duplicate header inclusion.
  mips: mark const init data with __initconst instead of __initdata
  MIPS: cmpxchg.h: Add missing include
  MIPS: Malta may also be equipped with MIPS64 R2 processors.
  MIPS: Fix typo multipy -> multiply
  MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
  ...

12 years agoMerge tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Linus Torvalds [Fri, 20 Jul 2012 18:51:22 +0000 (11:51 -0700)]
Merge tag 'dm-3.5-fixes-2' of git://git./linux/kernel/git/agk/linux-dm

Pull device-mapper discard fixes from Alasdair G Kergon:
  - avoid a crash in dm-raid1 when discards coincide with mirror
    recovery;
  - avoid discarding shared data that's still needed in dm-thin;
  - don't guarantee that discarded blocks will be wiped in dm-raid1.

* tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm raid1: set discard_zeroes_data_unsupported
  dm thin: do not send discards to shared blocks
  dm raid1: fix crash with mirror recovery and discard

12 years agoMerge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Linus Torvalds [Fri, 20 Jul 2012 18:43:53 +0000 (11:43 -0700)]
Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd

Pull pnfs/ore fixes from Boaz Harrosh:
 "These are catastrophic fixes to the pnfs objects-layout that were just
  discovered.  They are also destined for @stable.

  I have found these and worked on them at around RC1 time but
  unfortunately went to the hospital for kidney stones and had a very
  slow recovery.  I refrained from sending them as is, before proper
  testing, and surly I have found a bug just yesterday.

  So now they are all well tested, and have my sign-off.  Other then
  fixing the problem at hand, and assuming there are no bugs at the new
  code, there is low risk to any surrounding code.  And in anyway they
  affect only these paths that are now broken.  That is RAID5 in pnfs
  objects-layout code.  It does also affect exofs (which was not broken)
  but I have tested exofs and it is lower priority then objects-layout
  because no one is using exofs, but objects-layout has lots of users."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
  pnfs-obj: don't leak objio_state if ore_write/read fails
  ore: Unlock r4w pages in exact reverse order of locking
  ore: Remove support of partial IO request (NFS crash)
  ore: Fix NFS crash by supporting any unaligned RAID IO

12 years agoMerge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs
Linus Torvalds [Fri, 20 Jul 2012 18:42:30 +0000 (11:42 -0700)]
Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs

Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
 "It's been reported already twice recently:

    http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
    http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html

  and we finally have the fix.  I am quite confident the fix is correct
  because I could reproduce the problem with nandsim and verify the fix.
  It was also verified by Iwo (the reporter).

  I am also confident that this is OK to merge the fix so late because
  this patch affects only the fixup functionality, which is not used by
  most users."

* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
  UBIFS: fix a bug in empty space fix-up

12 years agodm raid1: set discard_zeroes_data_unsupported
Mikulas Patocka [Fri, 20 Jul 2012 13:25:07 +0000 (14:25 +0100)]
dm raid1: set discard_zeroes_data_unsupported

We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
the underlying disks support zero on discard.  So this patch sets
ti->discard_zeroes_data_unsupported.

For example, if the mirror is in the process of resynchronizing, it may
happen that kcopyd reads a piece of data, then discard is sent on the
same area and then kcopyd writes the piece of data to another leg.
Consequently, the data is not zeroed.

The flag was made available by commit 983c7db347db8ce2d8453fd1d89b7a4bb6920d56
(dm crypt: always disable discard_zeroes_data).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: do not send discards to shared blocks
Mikulas Patocka [Fri, 20 Jul 2012 13:25:05 +0000 (14:25 +0100)]
dm thin: do not send discards to shared blocks

When process_discard receives a partial discard that doesn't cover a
full block, it sends this discard down to that block. Unfortunately, the
block can be shared and the discard would corrupt the other snapshots
sharing this block.

This patch detects block sharing and ends the discard with success when
sending it to the shared block.

The above change means that if the device supports discard it can't be
guaranteed that a discard request zeroes data. Therefore, we set
ti->discard_zeroes_data_unsupported.

Thin target discard support with this bug arrived in commit
104655fd4dcebd50068ef30253a001da72e3a081 (dm thin: support discards).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm raid1: fix crash with mirror recovery and discard
Mikulas Patocka [Fri, 20 Jul 2012 13:25:03 +0000 (14:25 +0100)]
dm raid1: fix crash with mirror recovery and discard

This patch fixes a crash when a discard request is sent during mirror
recovery.

Firstly, some background.  Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
  simultaneously recovered regions (by default the semaphore value is 1,
  so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
  __rh_recovery_prepare asks the log driver for the next region to
  recover. Then, it sets the region state to DM_RH_RECOVERING. If there
  are no pending I/Os on this region, the region is added to
  quiesced_regions list. If there are pending I/Os, the region is not
  added to any list. It is added to the quiesced_regions list later (by
  dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
  flight on this region. The region is popped from the list in
  dm_rh_recovery_start function. Then, a kcopyd job is started in the
  recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
  dm_rh_recovery_end. dm_rh_recovery_end adds the region to
  recovered_regions or failed_recovered_regions list (depending on
  whether the copy operation was successful or not).

The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.

Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.

There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.

These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard).

In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING).  However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts.  So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.

This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.

Reference: https://bugzilla.redhat.com/837607

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agopnfs-obj: Fix __r4w_get_page when offset is beyond i_size
Boaz Harrosh [Thu, 7 Jun 2012 23:02:30 +0000 (02:02 +0300)]
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size

It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both birds, by returning a global zero_page, if offset is beyond
i_size.

TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
12 years agopnfs-obj: don't leak objio_state if ore_write/read fails
Boaz Harrosh [Fri, 8 Jun 2012 02:29:40 +0000 (05:29 +0300)]
pnfs-obj: don't leak objio_state if ore_write/read fails

[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
12 years agoore: Unlock r4w pages in exact reverse order of locking
Boaz Harrosh [Wed, 11 Jul 2012 12:27:13 +0000 (15:27 +0300)]
ore: Unlock r4w pages in exact reverse order of locking

The read-4-write pages are locked in address ascending order.
But where unlocked in a way easiest for coding. Fix that,
locks should be released in opposite order of locking, .i.e
descending address order.

I have not hit this dead-lock. It was found by inspecting the
dbug print-outs. I suspect there is an higher lock at caller that
protects us, but fix it regardless.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
12 years agoore: Remove support of partial IO request (NFS crash)
Boaz Harrosh [Fri, 8 Jun 2012 01:30:40 +0000 (04:30 +0300)]
ore: Remove support of partial IO request (NFS crash)

Do to OOM situations the ore might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.

Since this crashes NFS-core and exofs is just fine without it just
remove this contraption, and fail.

TODO:
Support real forward progress with some reserved allocations
of resources, such as mem pools and/or bio_sets

[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
CC: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
12 years agoore: Fix NFS crash by supporting any unaligned RAID IO
Boaz Harrosh [Thu, 7 Jun 2012 22:19:07 +0000 (01:19 +0300)]
ore: Fix NFS crash by supporting any unaligned RAID IO

In RAID_5/6 We used to not permit an IO that it's end
byte is not stripe_size aligned and spans more than one stripe.
.i.e the caller must check if after submission the actual
transferred bytes is shorter, and would need to resubmit
a new IO with the remainder.

Exofs supports this, and NFS was supposed to support this
as well with it's short write mechanism. But late testing has
exposed a CRASH when this is used with none-RPC layout-drivers.

The change at NFS is deep and risky, in it's place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.

The principal here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests
one like old code, before the calculation of the first stripe,
and also a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe. we do a single read like before.

The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1._read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute

And calling 1+3 at the same place as before. 2+3 before last
stripe, and in the case of all in a single stripe then 1+2+3
is preformed additively.

Why did I not think of it before. Well I had a strike of
genius because I have stared at this code for 2 years, and did
not find this simple solution, til today. Not that I did not try.

This solution is much better for NFS than the previous supposedly
solution because the short write was dealt  with out-of-band after
IO_done, which would cause for a seeky IO pattern where as in here
we execute in order. At both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission)

NFS/exofs code need not change since the ORE API communicates the new
shorter length on return, what will happen is that this case would not
occur anymore.

hurray!!

[Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
12 years agoUBIFS: fix a bug in empty space fix-up
Artem Bityutskiy [Sat, 14 Jul 2012 11:33:09 +0000 (14:33 +0300)]
UBIFS: fix a bug in empty space fix-up

UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).

The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.

There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:

UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0

The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.

The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org [v3.0+]
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph...
Linus Torvalds [Thu, 19 Jul 2012 23:11:28 +0000 (16:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull last minute Ceph fixes from Sage Weil:
 "The important one fixes a bug in the socket failure handling behavior
  that was turned up in some recent failure injection testing.  The
  other two are minor bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: endian bug in rbd_req_cb()
  rbd: Fix ceph_snap_context size calculation
  libceph: fix messenger retry

12 years agoMerge tag 'md-3.5-fixes' of git://neil.brown.name/md
Linus Torvalds [Thu, 19 Jul 2012 15:27:13 +0000 (08:27 -0700)]
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull three md bugfixes from NeilBrown:
 "One of the bugs was introduced in 3.5-rc1.  Others have been there for
  longer."

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md/raid1: close some possible races on write errors during resync
  md: avoid crash when stopping md array races with closing other open fds.
  md: fix bug in handling of new_data_offset

12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 19 Jul 2012 15:21:13 +0000 (08:21 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking changes from David Miller:
 "Ok, we should be good to go now"

1) We have to statically initialize the init_net device list head rather
   than do so in an initcall, otherwise netprio_cgroup crashes if it's
   built statically rather than modular (Mark D.  Rustad)

2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)

3) Qlogic maintainers update (Anirban Chakraborty)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: Statically initialize init_net.dev_base_head
  MAINTAINERS: Changes in qlcnic and qlge maintainers list
  cipso: don't follow a NULL pointer when setsockopt() is called

12 years agoMerge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Linus Torvalds [Thu, 19 Jul 2012 15:15:55 +0000 (08:15 -0700)]
Merge branch 'upstream-fixes' of git://git./linux/kernel/git/jikos/hid

Pull HID update from Jiri Kosina:
 "A final round of changes for HID for 3.5: just device ID additions."

* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-multitouch: add support for Zytronic panels
  HID: add Sennheiser BTD500USB device support
  HID: add battery quirk for Apple Wireless ANSI

12 years agocx25821: Remove bad strcpy to read-only char*
Ezequiel Garcia [Wed, 18 Jul 2012 13:05:26 +0000 (10:05 -0300)]
cx25821: Remove bad strcpy to read-only char*

The strcpy was being used to set the name of the board.  Since the
destination char* was read-only and the name is set statically at
compile time; this was both wrong and redundant.

The type of char* is changed to const char* to prevent future errors.

Reported-by: Radek Masin <radek@masin.eu>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
[ Taking directly due to vacations   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoHID: hid-multitouch: add support for Zytronic panels
Benjamin Tissoires [Tue, 19 Jun 2012 12:39:54 +0000 (14:39 +0200)]
HID: hid-multitouch: add support for Zytronic panels

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
12 years agoMIPS: PCI: Move fixups from __init to __devinit.
Sebastian Andrzej Siewior [Thu, 19 Jul 2012 05:13:54 +0000 (07:13 +0200)]
MIPS: PCI: Move fixups from __init to __devinit.

Fixups are executed once the pci-device is found which is during boot
process so __init seems fine as long as the platform does not support
hotplug.
However it is possible to remove the PCI bus at run time and have it
rediscovered again via "echo 1 > /sys/bus/pci/rescan" and this will call
the fixups again.

[ralf@linux-mips.org: Made piixirqmap[] in malta_piix_func0_fixup()
__initdata.]

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: linux-mips@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Fix bug.h MIPS build regression
Yoichi Yuasa [Thu, 19 Jul 2012 05:13:54 +0000 (07:13 +0200)]
MIPS: Fix bug.h MIPS build regression

Commit: 3777808873b0c49c5cf27e44c948dfb02675d578 [bug.h: need linux/kernel.h
for TAINT_WARN.] breaks all MIPS builds.

  CC      arch/mips/kernel/machine_kexec.o
In file included from include/linux/kernel.h:20:0,
                 from include/asm-generic/bug.h:35,
                 from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bug.h:41,
                 from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:20,
                 from include/linux/bitops.h:22,
                 from include/linux/signal.h:38,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
include/linux/log2.h: In function '__ilog2_u32':
include/linux/log2.h:34:2: error: implicit declaration of function 'fls' [-Werror=implicit-function-declaration]
include/linux/log2.h: In function '__ilog2_u64':
include/linux/log2.h:42:2: error: implicit declaration of function 'fls64' [-Werror=implicit-function-declaration]
include/linux/log2.h: In function '__roundup_pow_of_two':
include/linux/log2.h:63:2: error: implicit declaration of function 'fls_long' [-Werror=implicit-function-declaration]
In file included from include/linux/bitops.h:22:0,
                 from include/linux/signal.h:38,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
/home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h: At top level:
/home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:615:19: error: static declaration of 'fls' follows non-static declaration
include/linux/log2.h:34:9: note: previous implicit declaration of 'fls' was here
In file included from /home/yuasa/src/linux/kernel/git/linux-2.6/arch/mips/include/asm/bitops.h:651:0,
                 from include/linux/bitops.h:22,
                 from include/linux/signal.h:38,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
include/asm-generic/bitops/fls64.h:18:28: error: static declaration of 'fls64' follows non-static declaration
include/linux/log2.h:42:9: note: previous implicit declaration of 'fls64' was here
In file included from include/linux/signal.h:38:0,
                 from include/linux/elfcore.h:5,
                 from include/linux/kexec.h:60,
                 from arch/mips/kernel/machine_kexec.c:9:
include/linux/bitops.h:160:24: error: conflicting types for 'fls_long'
include/linux/log2.h:63:16: note: previous implicit declaration of 'fls_long' was here
cc1: all warnings being treated as errors

make[2]: *** [arch/mips/kernel/machine_kexec.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: yuasa@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Linuxppc-dev <linuxppc-dev@ozlabs.org>
Cc: Linux MIPS Mailing List <linux-mips@linux-mips.org>
Cc: Linux-sh list <linux-sh@vger.kernel.org>
Cc: Chris Zankel <chris@zankel.net>
Patchwork: https://patchwork.linux-mips.org/patch/4000/
Tested-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: sync-r4k: remove redundant irq operation
Yong Zhang [Thu, 19 Jul 2012 07:13:54 +0000 (09:13 +0200)]
MIPS: sync-r4k: remove redundant irq operation

Since we have delayed irq enabling to ->smp_finish()

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: smp: Warn on too early irq enable
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: smp: Warn on too early irq enable

Just to catch a potential issue.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3852/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: call set_cpu_online() on cpu being brought up with irq disabled
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: call set_cpu_online() on cpu being brought up with irq disabled

To prevent a problem as commit 5fbd036b [sched: Cleanup cpu_active madness]
and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
try to resolve, move set_cpu_online() to the brought up CPU and with irq
disabled.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3851/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: call ->smp_finish() a little late
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: call ->smp_finish() a little late

We have move irq enable to ->smp_finish. Place ->smp_finish() a little
late to prepare for move set_cpu_online() into start_secondary.
And it's not necessary to call cpu_set(cpu, cpu_callin_map) and
synchronise_count_slave() with irq enabled.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3850/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Yosemite: delay irq enable to ->smp_finish()
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: Yosemite: delay irq enable to ->smp_finish()

To prepare for smoothing set_cpu_[active|online]() mess up

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3848/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: SMTC: delay irq enable to ->smp_finish()
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: SMTC: delay irq enable to ->smp_finish()

To prepare for smoothing set_cpu_[active|online]() mess up

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3847/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: BMIPS: delay irq enable to ->smp_finish()
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: BMIPS: delay irq enable to ->smp_finish()

To prepare for smoothing set_cpu_[active|online]() mess up

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3846/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Octeon: delay enable irq to ->smp_finish()
Yong Zhang [Thu, 19 Jul 2012 07:13:53 +0000 (09:13 +0200)]
MIPS: Octeon: delay enable irq to ->smp_finish()

To prepare for smoothing set_cpu_[active|online]() mess up

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: David Daney <david.daney@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Patchwork: https://patchwork.linux-mips.org/patch/3845/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Oprofile: Fix build as a module.
Ralf Baechle [Thu, 19 Jul 2012 07:13:52 +0000 (09:13 +0200)]
MIPS: Oprofile: Fix build as a module.

When building oprofile as a module for R10000 or R7000 class processors,
E9000 or MIPSxx class cores since 3572a2c37f667ee49333f8863722b8f43eac506b
[MIPS: make oprofile use cp0_perfcount_irq if it is set] an

ERROR: "cp0_compare_irq" [arch/mips/oprofile/oprofile.ko] undefined!

error will happen.  Fixed by exporting cp0_compare_irq.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: BCM63XX: Fix BCM6368 IPSec clock bit
Florian Fainelli [Thu, 19 Jul 2012 07:13:52 +0000 (09:13 +0200)]
MIPS: BCM63XX: Fix BCM6368 IPSec clock bit

The IPsec clock bit is 18 and not 17.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: mpm@selenic.com
Cc: herbert@gondor.apana.org.au
Patchwork: https://patchwork.linux-mips.org/patch/3323/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
Florian Fainelli [Thu, 19 Jul 2012 07:13:52 +0000 (09:13 +0200)]
MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()

cc1: warnings being treated as errors
arch/mips/kernel/perf_event_mipsxx.c:166: error: 'counters_per_cpu_to_total' defined but not used
make[2]: *** [arch/mips/kernel/perf_event_mipsxx.o] Error 1
make[2]: *** Waiting for unfinished jobs....

It was first introduced by 82091564cfd7ab8def42777a9c662dbf655c5d25 [MIPS:
perf: Add support for 64-bit perf counters.] in 3.2.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: david.daney@cavium.com
Patchwork: https://patchwork.linux-mips.org/patch/3357/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Fix Magic SysRq L kernel crash.
Vincent Wen [Thu, 19 Jul 2012 07:11:16 +0000 (09:11 +0200)]
MIPS: Fix Magic SysRq L kernel crash.

show_backtrace() was passed a NULL pointer which caused paging
request fail. Set to current task as other architectures (ARM,
etc) do when passed a NULL task pointer.

Signed-off-by: Vincent Wen <vincentwenlinux@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: cernekee@gmail.com
Patchwork: https://patchwork.linux-mips.org/patch/3524/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: BMIPS: Fix duplicate header inclusion.
Danny Kukawka [Thu, 19 Jul 2012 07:11:16 +0000 (09:11 +0200)]
MIPS: BMIPS: Fix duplicate header inclusion.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Cc: Danny Kukawka <dkukawka@suse.de>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3369/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agomips: mark const init data with __initconst instead of __initdata
Uwe Kleine-König [Thu, 19 Jul 2012 07:11:16 +0000 (09:11 +0200)]
mips: mark const init data with __initconst instead of __initdata

As long as there is no other non-const variable marked __initdata in the
same compilation unit it doesn't hurt. If there were one however
compilation would fail with

error: $variablename causes a section type conflict

because a section containing const variables is marked read only and so
cannot contain non-const variables.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: linux-mips@linux-mips.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel@pengutronix.de
Patchwork: https://patchwork.linux-mips.org/patch/3565/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: cmpxchg.h: Add missing include
Aaro Koskinen [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: cmpxchg.h: Add missing include

Fix the following build breakage in v3.4-rc1:

  CC      kernel/irq_work.o
In file included from include/linux/irq_work.h:4:0,
                 from kernel/irq_work.c:10:
include/linux/llist.h: In function 'llist_del_all':
include/linux/llist.h:178:2: error: implicit declaration of function 'BUILD_BUG_ON' [-Werror=implicit-function-declaration]

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3568/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Malta may also be equipped with MIPS64 R2 processors.
Leonid Yegoshin [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: Malta may also be equipped with MIPS64 R2 processors.

Signed-off-by: Leonid Yegoshin <yegoshin@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3792/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Fix typo multipy -> multiply
Ralf Baechle [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: Fix typo multipy -> multiply

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
Yoichi Yuasa [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3883/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: BCM47xx: Fix BCMA_DRIVER_PCI_HOSTMODE config dependencies
Yoichi Yuasa [Thu, 19 Jul 2012 07:11:15 +0000 (09:11 +0200)]
MIPS: BCM47xx: Fix BCMA_DRIVER_PCI_HOSTMODE config dependencies

warning: (BCM47XX_BCMA) selects BCMA_DRIVER_PCI_HOSTMODE which has unmet direct dependencies (BCMA_POSSIBLE && BCMA && MIPS && BCMA_HOST_PCI)
warning: (BCM47XX_BCMA) selects BCMA_DRIVER_PCI_HOSTMODE which has unmet direct dependencies (BCMA_POSSIBLE && BCMA && MIPS && BCMA_HOST_PCI)

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3882/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: SMTC: Spelling and grammar corrections.
Ralf Baechle [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: SMTC: Spelling and grammar corrections.

Extractd from Steven J. Hill's https://patchwork.linux-mips.org/patch/3603/.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Properly align the .data..init_task section.
David Daney [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: Properly align the .data..init_task section.

Improper alignment can lead to unbootable systems and/or random
crashes.

[ralf@linux-mips.org: This is a lond standing bug since
6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp.
c422a10917f75fd19fa7fe070aaaa23e384dae6f (lmo) [MIPS: Clean up linker script
using new linker script macros.] so dates back to 2.6.32.]

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/3881/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Malta: Change start address to avoid conflicts.
Steven J. Hill [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: Malta: Change start address to avoid conflicts.

There are ACPI and SMB devices in the 0x1000..0x1fff address range.

Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3581/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Fix race condition with FPU thread task flag during context switch.
Leonid Yegoshin [Thu, 19 Jul 2012 07:11:14 +0000 (09:11 +0200)]
MIPS: Fix race condition with FPU thread task flag during context switch.

[ralf@linux-mips.org: Cosmetic changes; also fixed up r2300_switch.S and
octeon_switch.S which needed similar modifications.]

Signed-off-by: Leonid Yegoshin <yegoshin@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3784/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Fix decoding of c0_config1 for MIPSxx caches with 32 ways per set.
Douglas Leung [Thu, 19 Jul 2012 07:11:13 +0000 (09:11 +0200)]
MIPS: Fix decoding of c0_config1 for MIPSxx caches with 32 ways per set.

This affects certain 4Kc cores.

Signed-off-by: Douglas Leung <douglas@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3855/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Refactor 'clear_page' and 'copy_page' functions.
Steven J. Hill [Fri, 6 Jul 2012 19:56:01 +0000 (21:56 +0200)]
MIPS: Refactor 'clear_page' and 'copy_page' functions.

Remove usage of the '__attribute__((alias("...")))' hack that aliased
to integer arrays containing micro-assembled instructions. This hack
breaks when building a microMIPS kernel. It also makes the code much
easier to understand.

[ralf@linux-mips.org: Added back export of the clear_page and copy_page
symbols so certain modules will work again.  Also fixed build with
CONFIG_SIBYTE_DMA_PAGEOPS enabled.]

Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3866/
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agoMIPS: Don't panic on 5KEc.
Leonid Yegoshin [Fri, 6 Jul 2012 19:56:01 +0000 (21:56 +0200)]
MIPS: Don't panic on 5KEc.

It's a bloody bog standard MIPS64R2 core with just a new PrId ID.  Iow
that essentially means Linux just panics because it doesn't know how to
name the core.

[ralf@linux-mips.org: Split original patch into several smaller patches.]

Signed-off-by: Leonid Yegoshin <yegoshin@mips.com>
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3792/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
12 years agomd/raid1: close some possible races on write errors during resync
NeilBrown [Thu, 19 Jul 2012 05:59:18 +0000 (15:59 +1000)]
md/raid1: close some possible races on write errors during resync

commit 4367af556133723d0f443e14ca8170d9447317cb
   md/raid1: clear bad-block record when write succeeds.

Added a 'reschedule_retry' call possibility at the end of
end_sync_write, but didn't add matching code at the end of
sync_request_write.  So if the writes complete very quickly, or
scheduling makes it seem that way, then we can miss rescheduling
the request and the resync could hang.

Also commit 73d5c38a9536142e062c35997b044e89166e063b
    md: avoid races when stopping resync.

Fix a race condition in this same code in end_sync_write but didn't
make the change in sync_request_write.

This patch updates sync_request_write to fix both of those.
Patch is suitable for 3.1 and later kernels.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Original-version-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agomd: avoid crash when stopping md array races with closing other open fds.
NeilBrown [Thu, 19 Jul 2012 05:59:18 +0000 (15:59 +1000)]
md: avoid crash when stopping md array races with closing other open fds.

md will refuse to stop an array if any other fd (or mounted fs) is
using it.
When any fs is unmounted of when the last open fd is closed all
pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
so there will be no pending IO to worry about when the array is
stopped.

However in order to send the STOP_ARRAY ioctl to stop the array one
must first get and open fd on the block device.
If some fd is being used to write to the block device and it is closed
after mdadm open the block device, but before mdadm issues the
STOP_ARRAY ioctl, then there will be no last-close on the md device so
__blkdev_put will not call sync_blockdev.

If this happens, then IO can still be in-flight while md tears down
the array and bad things can happen (use-after-free and subsequent
havoc).

So in the case where do_md_stop is being called from an open file
descriptor, call sync_block after taking the mutex to ensure there
will be no new openers.

This is needed when setting a read-write device to read-only too.

Cc: stable@vger.kernel.org
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agomd: fix bug in handling of new_data_offset
NeilBrown [Thu, 19 Jul 2012 05:59:18 +0000 (15:59 +1000)]
md: fix bug in handling of new_data_offset

commit c6563a8c38fde3c1c7fc925a10bde3ca20799301
    md: add possibility to change data-offset for devices.

introduced a 'new_data_offset' attribute which should normally
be the same as 'data_offset', but can be explicitly set to a different
value to allow a reshape operation to move the data.

Unfortunately when the 'data_offset' is explicitly set through
sysfs, the new_data_offset is not also set, so the two would become
out-of-sync incorrectly.

One result of this is that trying to set the 'size' after the
'data_offset' would fail because it is not permitted to set the size
when the 'data_offset' and 'new_data_offset' are different - as that
can be confusing.
Consequently when mdadm tried to do this while assembling an IMSM
array it would fail.

This bug was introduced in 3.5-rc1.

Reported-by: Brian Downing <bdowning@lavos.net>
Bisected-by: Brian Downing <bdowning@lavos.net>
Tested-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Linus Torvalds [Thu, 19 Jul 2012 01:40:38 +0000 (18:40 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending

Pull target fixes from Nicholas Bellinger:
 "This includes a bugfix from MDR to address a NULL pointer OOPs with
  FCoE aborts, along with a WRITE_SAME emulation bugfix for NOLB=0
  cases, and persistent reservation return cleanups from Roland.

  All three patches are CC'ed to stable."

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix range calculation in WRITE SAME emulation when num blocks == 0
  target: Clean up returning errors in PR handling code
  tcm_fc: Fix crash seen with aborts and large reads

12 years agokexec: update URL of kexec homepage
Olaf Hering [Wed, 18 Jul 2012 21:12:04 +0000 (14:12 -0700)]
kexec: update URL of kexec homepage

The referenced html file does not exist anymore. Replace the URL with
the current project homepage.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomips: fix bug.h build regression
Yoichi Yuasa [Wed, 18 Jul 2012 21:12:01 +0000 (14:12 -0700)]
mips: fix bug.h build regression

Commit 377780887 ("bug.h: need linux/kernel.h for TAINT_WARN.") broke
all MIPS builds:

    CC      arch/mips/kernel/machine_kexec.o
  include/linux/log2.h: In function '__ilog2_u32':
  include/linux/log2.h:34:2: error: implicit declaration of function 'fls' [-Werror=implicit-function-declaration]
  include/linux/log2.h: In function '__ilog2_u64':
  include/linux/log2.h:42:2: error: implicit declaration of function 'fls64' [-Werror=implicit-function-declaration]
  ...

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Tested-by: John Crispin <blogic@openwrt.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMake wait_for_device_probe() also do scsi_complete_async_scans()
Linus Torvalds [Thu, 19 Jul 2012 01:15:46 +0000 (18:15 -0700)]
Make wait_for_device_probe() also do scsi_complete_async_scans()

Commit a7a20d103994 ("sd: limit the scope of the async probe domain")
make the SCSI device probing run device discovery in it's own async
domain.

However, as a result, the partition detection was no longer synchronized
by async_synchronize_full() (which, despite the name, only synchronizes
the global async space, not all of them).  Which in turn meant that
"wait_for_device_probe()" would not wait for the SCSI partitions to be
parsed.

And "wait_for_device_probe()" was what the boot time init code relied on
for mounting the root filesystem.

Now, most people never noticed this, because not only is it
timing-dependent, but modern distributions all use initrd.  So the root
filesystem isn't actually on a disk at all.  And then before they
actually mount the final disk filesystem, they will have loaded the
scsi-wait-scan module, which not only does the expected
wait_for_device_probe(), but also does scsi_complete_async_scans().

[ Side note: scsi_complete_async_scans() had also been partially broken,
  but that was fixed in commit 43a8d39d0137 ("fix async probe
  regression"), so that same commit a7a20d103994 had actually broken
  setups even if you used scsi-wait-scan explicitly ]

Solve this problem by just moving the scsi_complete_async_scans() call
into wait_for_device_probe().  Everybody who wants to wait for device
probing to finish really wants the SCSI probing to complete, so there's
no reason not to do this.

So now "wait_for_device_probe()" really does what the name implies, and
properly waits for device probing to finish.  This also removes the now
unnecessary extra calls to scsi_complete_async_scans().

Reported-and-tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Wed, 18 Jul 2012 20:42:44 +0000 (13:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull SELinux regression fixes from James Morris.

Andrew Morton has a box that hit that open perms problem.

I also renamed the "epollwakeup" selinux name for the new capability to
be "block_suspend", to match the rename done by commit d9914cf66181
("PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND").

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  SELinux: do not check open perms if they are not known to policy
  SELinux: include definition of new capabilities

12 years agonet: Statically initialize init_net.dev_base_head
Rustad, Mark D [Wed, 18 Jul 2012 09:06:07 +0000 (09:06 +0000)]
net: Statically initialize init_net.dev_base_head

This change eliminates an initialization-order hazard most
recently seen when netprio_cgroup is built into the kernel.

With thanks to Eric Dumazet for catching a bug.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 18 Jul 2012 17:36:02 +0000 (10:36 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

One more time/ntp fix pulled from Ingo Molnar.

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ntp: Fix STA_INS/DEL clearing bug

12 years agov4l2-dev: forgot to add VIDIOC_DV_TIMINGS_CAP.
Hans Verkuil [Wed, 11 Jul 2012 12:12:45 +0000 (14:12 +0200)]
v4l2-dev: forgot to add VIDIOC_DV_TIMINGS_CAP.

The VIDIOC_DV_TIMINGS_CAP ioctl check wasn't added to determine_valid_ioctls().
This caused this ioctl to always return -ENOTTY.

The cause for this was that for 3.5 two patch series were merged, one
changing V4L2 core ioctl handling and one adding new functionality, and
some of the new functionality wasn't handled by the new V4L2 core code.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
[ Taking it directly due to vacations  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Wed, 18 Jul 2012 17:27:08 +0000 (10:27 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes for SPEAr from Olof Johansson:
 "These are arriving very late in the release cycle, but there has been
  a change of maintainers on the SPEAr platform and they have needed a
  while to get going.

  The patch count is higher than I would like at this point, but they're
  all relevant fixes and well-contained in their own platform code.  I
  still think it's suitable 3.5 material and I don't think it should
  increase the need for a -rc8 since they are so contained."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: SPEAr600: Fix timer interrupt definition in spear600.dtsi
  ARM: dts: SPEAr320: Boot the board in EXTENDED_MODE
  ARM: dts: SPEAr320: Fix compatible string
  Clk: SPEAr1340: Update sys clock parent array
  clk: SPEAr1340: Fix clk enable register for uart1 and i2c1.
  ARM: SPEAr13xx: Fix Interrupt bindings
  Clk:spear6xx:Fix: Rename clk ids within predefined limit
  Clk:spear3xx:Fix: Rename clk ids within predefined limit
  clk:spear1310:Fix: Rename clk ids within predefined limit
  clk:spear1340:Fix: Rename clk ids within predefined limit

12 years agoMAINTAINERS: Changes in qlcnic and qlge maintainers list
Anirban Chakraborty [Tue, 17 Jul 2012 09:22:09 +0000 (09:22 +0000)]
MAINTAINERS: Changes in qlcnic and qlge maintainers list

Please apply.

Thanks.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoMerge git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Wed, 18 Jul 2012 16:28:11 +0000 (09:28 -0700)]
Merge git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: always update the inode cache with the results from a FIND_*
  cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
  cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space
  Initialise mid_q_entry before putting it on the pending queue

12 years agocipso: don't follow a NULL pointer when setsockopt() is called
Paul Moore [Tue, 17 Jul 2012 11:07:47 +0000 (11:07 +0000)]
cipso: don't follow a NULL pointer when setsockopt() is called

As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface.  In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.

A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():

#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>

struct local_tag {
char type;
char length;
char info[4];
};

struct cipso {
char type;
char length;
char doi[4];
struct local_tag local;
};

int main(int argc, char **argv)
{
int sockfd;
struct cipso cipso = {
.type = IPOPT_CIPSO,
.length = sizeof(struct cipso),
.local = {
.type = 128,
.length = sizeof(struct local_tag),
},
};

memset(cipso.doi, 0, 4);
cipso.doi[3] = 3;

sockfd = socket(AF_INET, SOCK_DGRAM, 0);
#define SOL_IP 0
setsockopt(sockfd, SOL_IP, IP_OPTIONS,
&cipso, sizeof(struct cipso));

return 0;
}

CC: Lin Ming <mlin@ss.pku.edu.cn>
Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12 years agoext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT
Al Viro [Wed, 18 Jul 2012 08:31:36 +0000 (09:31 +0100)]
ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT

Caused, AFAICS, by mismerge in commit ff9cb1c4eead ("Merge branch
'for_linus' into for_linus_merged")

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'for-3.5-spear-fixes' of http://git.stlinux.com/spear/linux-2.6 into...
Olof Johansson [Wed, 18 Jul 2012 05:43:53 +0000 (22:43 -0700)]
Merge branch 'for-3.5-spear-fixes' of git.stlinux.com/spear/linux-2.6 into fixes

* 'for-3.5-spear-fixes' of http://git.stlinux.com/spear/linux-2.6:
  ARM: SPEAr600: Fix timer interrupt definition in spear600.dtsi
  ARM: dts: SPEAr320: Boot the board in EXTENDED_MODE
  ARM: dts: SPEAr320: Fix compatible string
  Clk: SPEAr1340: Update sys clock parent array
  clk: SPEAr1340: Fix clk enable register for uart1 and i2c1.
  ARM: SPEAr13xx: Fix Interrupt bindings
  Clk:spear6xx:Fix: Rename clk ids within predefined limit
  Clk:spear3xx:Fix: Rename clk ids within predefined limit
  clk:spear1310:Fix: Rename clk ids within predefined limit
  clk:spear1340:Fix: Rename clk ids within predefined limit

12 years agoARM: SPEAr600: Fix timer interrupt definition in spear600.dtsi
Stefan Roese [Fri, 11 May 2012 08:41:01 +0000 (10:41 +0200)]
ARM: SPEAr600: Fix timer interrupt definition in spear600.dtsi

Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
12 years agoARM: dts: SPEAr320: Boot the board in EXTENDED_MODE
Vipul Kumar Samar [Fri, 13 Jul 2012 11:52:11 +0000 (17:22 +0530)]
ARM: dts: SPEAr320: Boot the board in EXTENDED_MODE

On spear320 device supported mode are:

   * AUTO_NET_SMII_MODE
   * AUTO_NET_MII_MODE
   * AUTO_EXP_MODE
   * SMALL_PRINTERS_MODE
   * EXTENDED_MODE

spear320-evb board is designed for EXTENDED_MODE only, hence it does not
boot correctly in current form where pinctrl part for some devices fail.

Configure and boot the SPEAr320 evaluation board in EXTENDED_MODE.

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
12 years agoARM: dts: SPEAr320: Fix compatible string
Vipul Kumar Samar [Fri, 13 Jul 2012 11:50:46 +0000 (17:20 +0530)]
ARM: dts: SPEAr320: Fix compatible string

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
12 years agoClk: SPEAr1340: Update sys clock parent array
Vipul Kumar Samar [Fri, 6 Jul 2012 10:22:36 +0000 (15:52 +0530)]
Clk: SPEAr1340: Update sys clock parent array

sys_clk has multiple parents and selection of parent depends on sys_clk_ctrl
register bit no. 23:25, with following possibilities

   0XX: pll1_clk
   10X: sys_synth_clk
   110: pll2_clk
   111: pll3_clk

Out of several possibilities (h/w wise) to select same clock parent for
sys_clk, current clock implementation was considering just one value.

When bootloader programmed different (valid) value to select a clock
parent then Linux breaks.

Here, we try to include all possibilities which can lead to same
clock selection thus making Linux independent of bootloader selection
values.

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
12 years agoclk: SPEAr1340: Fix clk enable register for uart1 and i2c1.
Vipul Kumar Samar [Wed, 4 Jul 2012 10:52:19 +0000 (18:52 +0800)]
clk: SPEAr1340: Fix clk enable register for uart1 and i2c1.

This patch is to fix typing mistake of clk enable register of i2c1 and
uart1.

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
12 years agoARM: SPEAr13xx: Fix Interrupt bindings
Vipul Kumar Samar [Wed, 4 Jul 2012 10:52:17 +0000 (18:52 +0800)]
ARM: SPEAr13xx: Fix Interrupt bindings

   - Correct interrupt bindings for uart, ethernet and pmu.
   - Added interrupt binding for keyboard.

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
12 years agoClk:spear6xx:Fix: Rename clk ids within predefined limit
Vipul Kumar Samar [Tue, 10 Jul 2012 11:42:46 +0000 (17:12 +0530)]
Clk:spear6xx:Fix: Rename clk ids within predefined limit

The max limit of con_id is 16 and dev_id is 20. As of now for spear6xx, many clk
ids are exceeding this predefined limit.

This patch is intended to rename clk ids like:
    mux_clk -> _mclk
    gate_clk -> _gclk
    synth_clk -> syn_clk
    ras_gen1_synth_gate_clk -> ras_syn1_gclk
    pll3_48m -> pll3_

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
12 years agoClk:spear3xx:Fix: Rename clk ids within predefined limit
Vipul Kumar Samar [Tue, 10 Jul 2012 11:42:45 +0000 (17:12 +0530)]
Clk:spear3xx:Fix: Rename clk ids within predefined limit

The max limit of con_id is 16 and dev_id is 20. As of now for spear3xx, many clk
ids are exceeding this predefined limit.

This patch is intended to rename clk ids like:
    mux_clk -> _mclk
    gate_clk -> _gclk
    synth_clk -> syn_clk
    ras_gen1_synth_gate_clk -> ras_syn1_gclk
    ras_pll3_48m -> ras_pll3_
    pll3_48m -> pll3_

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
12 years agoclk:spear1310:Fix: Rename clk ids within predefined limit
Vipul Kumar Samar [Tue, 10 Jul 2012 11:42:44 +0000 (17:12 +0530)]
clk:spear1310:Fix: Rename clk ids within predefined limit

The max limit of con_id is 16 and dev_id is 20. As of now for spear1310, many
clk ids are exceeding this predefined limit.

This patch is intended to rename clk ids like:
    mux_clk -> _mclk
    gate_clk -> _gclk
    synth_clk -> syn_clk
    gmac_phy -> phy_
    gmii_125m_pad -> gmii_pad

Signed-off-by: Vipul Kumar Samar <vipulkumar.samar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>