Ingo Molnar [Fri, 4 Sep 2009 09:32:54 +0000 (11:32 +0200)]
sched: Turn on SD_BALANCE_NEWIDLE
Start the re-tuning of the balancer by turning on newidle.
It improves hackbench performance and parallelism on a 4x4 box.
The "perf stat --repeat 10" measurements give us:
domain0 domain1
.......................................
-SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
2041.273208 task-clock-msecs # 9.354 CPUs ( +- 0.363% )
+SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
2086.326925 task-clock-msecs # 11.934 CPUs ( +- 0.301% )
+SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE:
2115.289791 task-clock-msecs # 12.158 CPUs ( +- 0.263% )
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 4 Sep 2009 09:21:24 +0000 (11:21 +0200)]
sched: Clean up topology.h
Re-organize the flag settings so that it's visible at a glance
which sched-domains flags are set and which not.
With the new balancer code we'll need to re-tune these details
anyway, so make it cleaner to make fewer mistakes down the
road ;-)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 4 Sep 2009 09:49:25 +0000 (11:49 +0200)]
sched: Fix dynamic power-balancing crash
This crash:
[ 1774.088275] divide error: 0000 [#1] SMP
[ 1774.100355] CPU 13
[ 1774.102498] Modules linked in:
[ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN
[ 1774.114807] RIP: 0010:[<
ffffffff81041c38>] [<
ffffffff81041c38>]
sched_balance_self+0x19b/0x2d4
Triggers because update_group_power() modifies the sd tree and does
temporary calculations there - not considering that other CPUs
could observe intermediate values, such as the zero initial value.
Calculate it in a temporary variable instead. (we need no memory
barrier as these are all statistical values anyway)
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <
20090904092742.GA11014@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:39 +0000 (10:34 +0200)]
sched: Remove reciprocal for cpu_power
Its a source of fail, also, now that cpu_power is dynamical,
its a waste of time.
before:
<idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1
after:
bash-1689 [001] 137.862151: find_busiest_group: avg_load:
10636288 group_load: 10387 power: 1
[ v2: build fix from From: Andreas Herrmann ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083826.
425896304@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Gautham R Shenoy [Wed, 2 Sep 2009 11:29:10 +0000 (16:59 +0530)]
sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
sgs.group_capacity can now be 0, if for some reason
group->__cpu_power happens to be less than SCHED_LOAD_SCALE/2.
In that case, we need the following fix to make it work for
update_sd_power_savings_stats(). That's because both
sum_nr_running and group_capacity are unsigned longs.
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:38 +0000 (10:34 +0200)]
sched: Try to deal with low capacity
When the capacity drops low, we want to migrate load away.
Allow the load-balancer to remove all tasks when we hit rock
bottom.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083826.
342231003@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:37 +0000 (10:34 +0200)]
sched: Scale down cpu_power due to RT tasks
Keep an average on the amount of time spend on RT tasks and use
that fraction to scale down the cpu_power for regular tasks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083826.
287778431@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:36 +0000 (10:34 +0200)]
sched: Implement dynamic cpu_power
Recompute the cpu_power for each cpu during load-balance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083826.
162033479@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:35 +0000 (10:34 +0200)]
sched: Add smt_gain
The idea is that multi-threading a core yields more work
capacity than a single thread, provide a way to express a
static gain for threads.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083826.
073345955@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:34 +0000 (10:34 +0200)]
sched: Update the cpu_power sum during load-balance
In order to prepare for a more dynamic cpu_power, update the
group sum while walking the sched domains during load-balance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083825.
985050292@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:33 +0000 (10:34 +0200)]
sched: Add SD_PREFER_SIBLING
Do the placement thing using SD flags.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083825.
897028974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 1 Sep 2009 08:34:32 +0000 (10:34 +0200)]
sched: Restore __cpu_power to a straight sum of power
cpu_power is supposed to be a representation of the process
capacity of the cpu, not a value to randomly tweak in order to
affect placement.
Remove the placement hacks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <
20090901083825.
810860576@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 4 Sep 2009 08:08:43 +0000 (10:08 +0200)]
Merge branches 'sched/domains' and 'sched/clock' into sched/core
Merge reason: both topics are ready now, and we want to merge dependent
changes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 23 Jul 2009 18:13:26 +0000 (20:13 +0200)]
sched: Add wait, sleep and iowait accounting tracepoints
Add 3 schedstat tracepoints to help account for wait-time,
sleep-time and iowait-time.
They can also be used as a perf-counter source to profile tasks
on these clocks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <new-submission>
[ build fix for the !CONFIG_SCHEDSTATS case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Mon, 20 Jul 2009 18:26:58 +0000 (11:26 -0700)]
sched: Provide iowait counters
For counting how long an application has been waiting for
(disk) IO, there currently is only the HZ sample driven
information available, while for all other counters in this
class, a high resolution version is available via
CONFIG_SCHEDSTATS.
In order to make an improved bootchart tool possible, we also
need a higher resolution version of the iowait time.
This patch below adds this scheduler statistic to the kernel.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
4A64B813.1080506@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 2 Sep 2009 06:20:32 +0000 (08:20 +0200)]
Merge commit 'v2.6.31-rc8' into sched/core
Merge reason: bump from rc5 to rc8, but also pick up TP_perf_assign()
API, a patch will be queued that depends on it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anirban Sinha [Sat, 29 Aug 2009 05:40:43 +0000 (22:40 -0700)]
sched: Rename init_cfs_rq => init_tg_cfs_rq
... so that it does not share a common name with a function
within the same scope.
Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
LKML-Reference: <
DDFD17CC94A9BD49A82147DDF7D545C501EA98A6@exchange.ZeugmaSystems.local>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 27 Aug 2009 11:08:56 +0000 (13:08 +0200)]
sched: Fix division by zero - really
When re-computing the shares for each task group's cpu
representation we need the ratio of weight on each cpu vs the
total weight of the sched domain.
Since load-balancing is loosely (read not) synchronized, the
weight of individual cpus can change between doing the sum and
calculating the ratio.
The previous patch dealt with only one of the race scenarios,
this patch side steps them all by saving a snapshot of all the
individual cpu weights, thereby always working on a consistent
set.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: torvalds@linux-foundation.org
Cc: jes@sgi.com
Cc: jens.axboe@oracle.com
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1251371336.18584.77.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Fri, 28 Aug 2009 00:59:04 +0000 (17:59 -0700)]
Linux 2.6.31-rc8
James Bottomley [Wed, 26 Aug 2009 12:34:12 +0000 (22:04 +0930)]
module: workaround duplicate section names
The root cause is a duplicate section name (.text); is this legal?
[ Amerigo Wang: "AFAIK, yes." ]
However, there's a problem with commit
6d76013381ed28979cd122eb4b249a88b5e384fa in that if you fail to allocate
a mod->sect_attrs (in this case it's null because of the duplication),
it still gets used without checking in add_notes_attrs()
This should fix it
[ This patch leaves other problems, particularly the sections directory,
but recent parisc toolchains seem to produce these modules and this
prevents a crash and is a minimal change -- RR ]
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Wed, 26 Aug 2009 12:32:54 +0000 (22:02 +0930)]
module: fix BUG_ON() for powerpc (and other function descriptor archs)
The rarely-used symbol_put_addr() needs to use dereference_function_descriptor
on powerpc.
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Fitzhardinge [Thu, 27 Aug 2009 19:22:43 +0000 (12:22 -0700)]
xenfb: connect to backend before registering fb
As soon as the framebuffer is registered, our methods may be called by the
kernel. This leads to a crash as xenfb_refresh() gets called before we have
the irq.
Connect to the backend before registering our framebuffer with the kernel.
[ Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=14059 ]
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 27 Aug 2009 19:26:02 +0000 (12:26 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: Ensure we alwasy write the terminating NULL.
inotify: fix locking around inotify watching in the idr
inotify: do not BUG on idr entries at inotify destruction
inotify: seperate new watch creation updating existing watches
Benjamin Herrenschmidt [Thu, 27 Aug 2009 07:20:30 +0000 (17:20 +1000)]
lmb: Remove __init from lmb_end_of_DRAM()
We call lmb_end_of_DRAM() to test whether a DMA mask is ok on a machine
without IOMMU, but this function is marked as __init.
I don't think there's a clean way to get the top of RAM max_pfn doesn't
appear to include highmem or I missed (or we have a bug :-) so for now,
let's just avoid having a broken 2.6.31 by making this function
non-__init and we can revisit later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 27 Aug 2009 19:24:08 +0000 (12:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: update documentation pointers
9p: remove unnecessary v9fses->options which duplicates the mount string
net/9p: insulate the client against an invalid error code sent by a 9p server
9p: Add missing cast for the error return value in v9fs_get_inode
9p: Remove redundant inode uid/gid assignment
9p: Fix possible regressions when ->get_sb fails.
9p: Fix v9fs show_options
9p: Fix possible memleak in v9fs_inode_from fid.
9p: minor comment fixes
9p: Fix possible inode leak in v9fs_get_inode.
9p: Check for error in return value of v9fs_fid_add
Julien TINNES [Thu, 27 Aug 2009 13:26:58 +0000 (15:26 +0200)]
ipv4: make ip_append_data() handle NULL routing table
Add a check in ip_append_data() for NULL *rtp to prevent future bugs in
callers from being exploitable.
Signed-off-by: Julien Tinnes <julien@cr0.org>
Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 27 Aug 2009 12:09:06 +0000 (13:09 +0100)]
AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr
kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.
Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 21 Aug 2009 20:01:12 +0000 (22:01 +0200)]
init: Move sched_clock_init after late_time_init
Some architectures initialize clocks and timers in late_time_init and
x86 wants to do the same to avoid FIXMAP hackery for calibrating the
TSC. That would result in undefined sched_clock readout and wreckaged
printk timestamps again. We probably have those already on archs which
do all their time/clock setup in late_time_init.
There is no harm to move that after late_time_init except that a few
more boot timestamps are stale. The scheduler is not active at that
point so no real wreckage is expected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: linux-arch@vger.kernel.org
Eric W. Biederman [Thu, 27 Aug 2009 10:20:04 +0000 (03:20 -0700)]
inotify: Ensure we alwasy write the terminating NULL.
Before the rewrite copy_event_to_user always wrote a terqminating '\0'
byte to user space after the filename. Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size. Ouch!
So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event. I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: fix locking around inotify watching in the idr
The are races around the idr storage of inotify watches. It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does it's removal. Move the
locking and the refcnt'ing so that these have to happen atomically.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: do not BUG on idr entries at inotify destruction
If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG. This is not a dangerous situation and really
indicates a programming bug and leak of memory. This patch changes it to
use a WARN and a printk rather than killing people's boxes.
Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: seperate new watch creation updating existing watches
There is nothing known wrong with the inotify watch addition/modification
but this patch seperates the two code paths to make them each easy to
verify as correct.
Signed-off-by: Eric Paris <eparis@redhat.com>
Linus Torvalds [Thu, 27 Aug 2009 03:54:48 +0000 (20:54 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
virtio: net refill on out-of-memory
smc91x: fix compilation on SMP
Linus Torvalds [Thu, 27 Aug 2009 03:39:31 +0000 (20:39 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/ps3: Update ps3_defconfig
powerpc/ps3: Add missing check for PS3 to rtc-ps3 platform device registration
Geoff Levand [Tue, 25 Aug 2009 07:53:35 +0000 (07:53 +0000)]
powerpc/ps3: Update ps3_defconfig
Update ps3_defconfig.
o Refresh for 2.6.31.
o Remove MTD support.
o Add more HID drivers.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Geert Uytterhoeven [Sun, 23 Aug 2009 22:54:32 +0000 (22:54 +0000)]
powerpc/ps3: Add missing check for PS3 to rtc-ps3 platform device registration
On non-PS3, we get:
| kernel BUG at drivers/rtc/rtc-ps3.c:36!
because the rtc-ps3 platform device is registered unconditionally in a kernel
with builtin support for PS3.
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Thu, 27 Aug 2009 03:17:07 +0000 (20:17 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
IMA: iint put in ima_counts_get and put
Linus Torvalds [Thu, 27 Aug 2009 03:16:38 +0000 (20:16 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k,m68knommu: Wire up rt_tgsigqueueinfo and perf_counter_open
m68k: Fix redefinition of pgprot_noncached
arch/m68k/include/asm/motorola_pgalloc.h: fix kunmap arg
m68k: cnt reaches -1, not 0
m68k: count can reach 51, not 50
Thadeu Lima de Souza Cascardo [Wed, 26 Aug 2009 21:29:32 +0000 (14:29 -0700)]
leds: after setting inverted attribute, we must update the LED
If we change the inverted attribute to another value, the LED will not be
inverted until we change the GPIO state.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Samuel R. C. Vale <srcvale@holoscopio.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thadeu Lima de Souza Cascardo [Wed, 26 Aug 2009 21:29:31 +0000 (14:29 -0700)]
leds: fix multiple requests and releases of IRQ for GPIO LED Trigger
When setting the same GPIO number, multiple IRQ shared requests will be
done without freing the previous request. It will also try to free a
failed request or an already freed IRQ if 0 was written to the gpio file.
All these oops and leaks were fixed with the following solution: keep the
previous allocated GPIO (if any) still allocated in case the new request
fails. The alternative solution would desallocate the previous allocated
GPIO and set gpio as 0.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Samuel R. C. Vale <srcvale@holoscopio.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frans Pop [Wed, 26 Aug 2009 21:29:30 +0000 (14:29 -0700)]
acpi processor: remove superfluous warning message
This failure is very common on many platforms. Handling it in the ACPI
processor driver is enough, and we don't need a warning message unless
CONFIG_ACPI_DEBUG is set.
Based on a patch from Zhang Rui.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13389
Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frans Pop [Wed, 26 Aug 2009 21:29:29 +0000 (14:29 -0700)]
ACPI processor: force throttling state when BIOS returns incorrect value
If the BIOS reports an invalid throttling state (which seems to be
fairly common after system boot), a reset is done to state T0.
Because of a check in acpi_processor_get_throttling_ptc(), the reset
never actually gets executed, which results in the error reoccurring
on every access of for example /proc/acpi/processor/CPU0/throttling.
Add a 'force' option to acpi_processor_set_throttling() to ensure
the reset really takes effect.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13389
This patch, together with the next one, fixes a regression introduced in
2.6.30, listed on the regression list. They have been available for 2.5
months now in bugzilla, but have not been picked up, despite various
reminders and without any reason given.
Google shows that numerous people are hitting this issue. The issue is in
itself relatively minor, but the bug in the code is clear.
The patches have been in all my kernels and today testing has shown that
throttling works correctly with the patches applied when the system
overheats (http://bugzilla.kernel.org/show_bug.cgi?id=13918#c14).
Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Costantino Leandro [Wed, 26 Aug 2009 21:29:28 +0000 (14:29 -0700)]
wmi: fix kernel panic when stack protection enabled.
Summary:
Kernel panic arise when stack protection is enabled, since strncat will
add a null terminating byte '\0'; So in functions
like this one (wmi_query_block):
char wc[4]="WC";
....
strncat(method, block->object_id, 2);
...
the length of wc should be n+1 (wc[5]) or stack protection
fault will arise. This is not noticeable when stack protection is
disabled,but , isn't good either.
Config used: [CONFIG_CC_STACKPROTECTOR_ALL=y,
CONFIG_CC_STACKPROTECTOR=y]
Panic Trace
------------
.... stack-protector: kernel stack corrupted in :
fa7b182c
2.6.30-rc8-obelisco-generic
call_trace:
[<
c04a6c40>] ? panic+0x45/0xd9
[<
c012925d>] ? __stack_chk_fail+0x1c/0x40
[<
fa7b182c>] ? wmi_query_block+0x15a/0x162 [wmi]
[<
fa7b182c>] ? wmi_query_block+0x15a/0x162 [wmi]
[<
fa7e7000>] ? acer_wmi_init+0x00/0x61a [acer_wmi]
[<
fa7e7135>] ? acer_wmi_init+0x135/0x61a [acer_wmi]
[<
c0101159>] ? do_one_initcall+0x50+0x126
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13514
Signed-off-by: Costantino Leandro <lcostantino@gmail.com>
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Len Brown <len.brown@intel.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Wed, 26 Aug 2009 21:29:26 +0000 (14:29 -0700)]
acpi: don't call acpi_processor_init if acpi is disabled
Jens reported early_ioremap messages with old ASUS board...
> [ 1.507461] pci 0000:00:09.0: Firmware left e100 interrupts enabled; disabling
> [ 1.532778] early_ioremap(
3fffd080,
0000005c) [0] => Pid: 1, comm: swapper Not tainted 2.6.31-rc4 #36
> [ 1.561007] Call Trace:
> [ 1.568638] [<
c136e48b>] ? printk+0x18/0x1d
> [ 1.581734] [<
c15513ff>] __early_ioremap+0x74/0x1e9
> [ 1.596898] [<
c15515aa>] early_ioremap+0x1a/0x1c
> [ 1.611270] [<
c154a187>] __acpi_map_table+0x18/0x1a
> [ 1.626451] [<
c135a7f8>] acpi_os_map_memory+0x1d/0x25
> [ 1.642129] [<
c119459c>] acpi_tb_verify_table+0x20/0x49
> [ 1.658321] [<
c1193e50>] acpi_get_table_with_size+0x53/0xa1
> [ 1.675553] [<
c1193eae>] acpi_get_table+0x10/0x15
> [ 1.690192] [<
c155cc19>] acpi_processor_init+0x23/0xab
> [ 1.706126] [<
c1001043>] do_one_initcall+0x33/0x180
> [ 1.721279] [<
c155cbf6>] ? acpi_processor_init+0x0/0xab
> [ 1.737479] [<
c106893a>] ? register_irq_proc+0xaa/0xc0
> [ 1.753411] [<
c10689b7>] ? init_irq_proc+0x67/0x80
> [ 1.768316] [<
c15405e7>] kernel_init+0x120/0x176
> [ 1.782678] [<
c15404c7>] ? kernel_init+0x0/0x176
> [ 1.797062] [<
c10038b7>] kernel_thread_helper+0x7/0x10
> [ 1.812984]
00000080 +
ffe00000
that is rather later.
acpi_gbl_permanent_mmap should be set in acpi_early_init()
if acpi is not disabled
and we have
> [ 0.000000] ASUS P2B-DS detected: force use of acpi=ht
just don't load acpi_processor_init...
Reported-and-tested-by: Jens Rosenboom <jens@leia.mcbone.net>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Brunner [Wed, 26 Aug 2009 21:29:25 +0000 (14:29 -0700)]
thermal_sys: check get_temp return value
The return value of the get_temp function is not checked when doing a
thermal zone update. This may lead to a critical shutdown if get_temp
fails and the content of the temp variable is incorrectly set higher than
the critical trip point.
This has been observed on a system with incorrect ACPI implementation
where the corresponding methods were not serialized and therefore
sometimes triggered ACPI errors (AE_ALREADY_EXISTS). The following
critical shutdowns indicated a temperature of 2097 C, which was obviously
wrong.
The patch adds a return value check that jumps over all trip point
evaluations printing a warning if get_temp fails. The trip points are
evaluated again on the next polling interval with successful get_temp
execution.
Signed-off-by: Michael Brunner <mibru@gmx.de>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 26 Aug 2009 21:29:24 +0000 (14:29 -0700)]
clone(): fix race between copy_process() and de_thread()
Spotted by Hiroshi Shimamoto who also provided the test-case below.
copy_process() uses signal->count as a reference counter, but it is not.
This test case
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
void *null_thread(void *p)
{
for (;;)
sleep(1);
return NULL;
}
void *exec_thread(void *p)
{
execl("/bin/true", "/bin/true", NULL);
return null_thread(p);
}
int main(int argc, char **argv)
{
for (;;) {
pid_t pid;
int ret, status;
pid = fork();
if (pid < 0)
break;
if (!pid) {
pthread_t tid;
pthread_create(&tid, NULL, exec_thread, NULL);
for (;;)
pthread_create(&tid, NULL, null_thread, NULL);
}
do {
ret = waitpid(pid, &status, 0);
} while (ret == -1 && errno == EINTR);
}
return 0;
}
quickly creates an unkillable task.
If copy_process(CLONE_THREAD) races with de_thread()
copy_signal()->atomic(signal->count) breaks the signal->notify_count
logic, and the execing thread can hang forever in kernel space.
Change copy_process() to increment count/live only when we know for sure
we can't fail. In this case the forked thread will take care of its
reference to signal correctly.
If copy_process() fails, check CLONE_THREAD flag. If it it set - do
nothing, the counters were not changed and current belongs to the same
thread group. If it is not set, ->signal must be released in any case
(and ->count must be == 1), the forked child is the only thread in the
thread group.
We need more cleanups here, in particular signal->count should not be used
by de_thread/__exit_signal at all. This patch only fixes the bug.
Reported-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Tested-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Wed, 26 Aug 2009 21:29:23 +0000 (14:29 -0700)]
mm: fix for infinite churning of mlocked pages
An mlocked page might lose the isolatation race. This causes the page to
clear PG_mlocked while it remains in a VM_LOCKED vma. This means it can
be put onto the [in]active list. We can rescue it by using try_to_unmap()
in shrink_page_list().
But now, As Wu Fengguang pointed out, vmscan has a bug. If the page has
PG_referenced, it can't reach try_to_unmap() in shrink_page_list() but is
put into the active list. If the page is referenced repeatedly, it can
remain on the [in]active list without being moving to the unevictable
list.
This patch fixes it.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <<kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 26 Aug 2009 21:29:22 +0000 (14:29 -0700)]
flex_array: convert element_nr formals to unsigned
It's problematic to allow signed element_nr's or total's to be passed as
part of the flex array API.
flex_array_alloc() allows total_nr_elements to be set to a negative
quantity, which is obviously erroneous.
flex_array_get() and flex_array_put() allows negative array indices in
dereferencing an array part, which could address memory mapped before
struct flex_array.
The fix is to convert all existing element_nr formals to be qualified as
unsigned. Existing checks to compare it to total_nr_elements or the max
array size based on element_size need not be changed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 26 Aug 2009 21:29:21 +0000 (14:29 -0700)]
flex_array: declare parts member to have incomplete type
The `parts' member of struct flex_array should evaluate to an incomplete
type so that sizeof() cannot be used and C99 does not require the
zero-length specification.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 26 Aug 2009 21:29:20 +0000 (14:29 -0700)]
flex_array: fix flex_array_free_parts comment
flex_array_free_parts() does not take `src' or `element_nr' formals, so
remove their respective comments.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 26 Aug 2009 21:29:20 +0000 (14:29 -0700)]
flex_array: fix get function for elements in base starting at non-zero
If all array elements fit into the base structure and data is copied using
flex_array_put() starting at a non-zero index, flex_array_get() will fail
to return the data.
This fixes the bug by only checking for NULL parts when all elements do
not fit in the base structure when flex_array_get() is used. Otherwise,
fa_element_to_part_nr() will always be 0 since there are no parts
structures needed and such element may never have been put. Thus, it will
remain NULL due to the kzalloc() of the base.
Additionally, flex_array_put() now only checks for a NULL part when all
elements do not fit in the base structure. This is otherwise unnecessary
since the base structure is guaranteed to exist (or we would have already
hit a NULL pointer).
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonwoo Park [Wed, 26 Aug 2009 21:29:18 +0000 (14:29 -0700)]
pps: fix incorrect verdict check
Fix incorrect verdict check and returns error if device_create failed,
otherwise driver triggers kernel oops.
Signed-off-by: Joonwoo Park<joonwpark81@gmail.com>
Cc: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Wed, 26 Aug 2009 18:56:48 +0000 (14:56 -0400)]
IMA: iint put in ima_counts_get and put
ima_counts_get() calls ima_iint_find_insert_get() which takes a reference
to the iint in question, but does not put that reference at the end of the
function. This can lead to a nasty memory leak. Easy enough to reproduce:
#include <sys/mman.h>
#include <stdio.h>
int main (void)
{
int i;
void *ptr;
for (i=0; i < 100000; i++) {
ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if (ptr == MAP_FAILED)
return 2;
munmap(ptr, 4096);
}
return 0;
}
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Geert Uytterhoeven [Fri, 21 Aug 2009 20:03:54 +0000 (22:03 +0200)]
m68k,m68knommu: Wire up rt_tgsigqueueinfo and perf_counter_open
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Alexey Dobriyan [Thu, 9 Jul 2009 13:08:38 +0000 (17:08 +0400)]
m68k: Fix redefinition of pgprot_noncached
arch/m68k/include/asm/pgtable_mm.h:148:1: warning: "pgprot_noncached" redefined
In file included from arch/m68k/include/asm/pgtable_mm.h:138,
from arch/m68k/include/asm/pgtable.h:4,
from include/linux/mm.h:40,
from include/linux/pagemap.h:7,
from include/linux/blkdev.h:12,
from arch/m68k/emu/nfblock.c:17:
include/asm-generic/pgtable.h:133:1: warning: this is the location of the previous definition
pgprot_noncached() should be defined _before_ including asm-generic/pgtable.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Andrew Morton [Wed, 17 Jun 2009 20:13:58 +0000 (13:13 -0700)]
arch/m68k/include/asm/motorola_pgalloc.h: fix kunmap arg
arch/m68k/include/asm/motorola_pgalloc.h: In function 'pte_alloc_one':
arch/m68k/include/asm/motorola_pgalloc.h:44: warning: passing argument 1 of 'kunmap' from incompatible pointer type
Also, remove unneeded test for kmap() failure.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Roel Kluin [Wed, 17 Jun 2009 20:13:57 +0000 (13:13 -0700)]
m68k: cnt reaches -1, not 0
With the postfix decrement cnt reaches -1 rather than 0.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Roel Kluin [Wed, 17 Jun 2009 20:13:56 +0000 (13:13 -0700)]
m68k: count can reach 51, not 50
With while (count++ < 50) { ... } count can reach 51, not 50, so we
shouldn't give an error message on a count of 50.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Rusty Russell [Wed, 26 Aug 2009 19:22:32 +0000 (12:22 -0700)]
virtio: net refill on out-of-memory
If we run out of memory, use keventd to fill the buffer. There's a
report of this happening: "Page allocation failures in guest",
Message-ID: <
20090713115158.
0a4892b0@mjolnir.ossman.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Wed, 26 Aug 2009 19:03:35 +0000 (12:03 -0700)]
smc91x: fix compilation on SMP
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 26 Aug 2009 04:24:49 +0000 (21:24 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
irda/sa1100_ir: fix broken netdev_ops conversion
irda/au1k_ir: fix broken netdev_ops conversion
pkt_sched: Fix bogon in tasklet_hrtimer changes.
Linus Torvalds [Wed, 26 Aug 2009 04:24:26 +0000 (21:24 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Validate linear D-TLB misses.
sparc64: Update defconfig.
sparc32: Update defconfig.
sparc32: Kill trap table freeing code.
sparc: sys32.S incorrect compat-layer splice() system call
sparc: Use page_fault_out_of_memory() for VM_FAULT_OOM.
sparc64: Sign extend length arg to truncate syscalls when compat.
sparc: Fix cleanup crash in bbc_envctrl_cleanup()
Alexander Beregalov [Wed, 26 Aug 2009 03:39:37 +0000 (20:39 -0700)]
irda/sa1100_ir: fix broken netdev_ops conversion
This patch is based on commit d2f3ad4 (pxaficp-ir: remove incorrect
net_device_ops). Do the same for sa1100_ir.
Untested.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Beregalov [Wed, 26 Aug 2009 03:39:18 +0000 (20:39 -0700)]
irda/au1k_ir: fix broken netdev_ops conversion
This patch is based on commit d2f3ad4 (pxaficp-ir: remove incorrect
net_device_ops). Do the same for au1k_ir.
Untested.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Aug 2009 23:47:46 +0000 (16:47 -0700)]
sparc64: Validate linear D-TLB misses.
When page alloc debugging is not enabled, we essentially accept any
virtual address for linear kernel TLB misses. But with kgdb, kernel
address probing, and other facilities we can try to access arbitrary
crap.
So, make sure the address we miss on will translate to physical memory
that actually exists.
In order to make this work we have to embed the valid address bitmap
into the kernel image. And in order to make that less expensive we
make an adjustment, in that the max physical memory address is
decreased to "1 << 41", even on the chips that support a 42-bit
physical address space. We can do this because bit 41 indicates
"I/O space" and thus covers non-memory ranges.
The result of this is that:
1) kpte_linear_bitmap shrinks from 2K to 1K in size
2) we need 64K more for the valid address bitmap
We can't let the valid address bitmap be dynamically allocated
once we start using it to validate TLB misses, otherwise we have
crazy issues to deal with wrt. recursive TLB misses and such.
If we're in a TLB miss it could be the deepest trap level that's legal
inside of the cpu. So if we TLB miss referencing the bitmap, the cpu
will be out of trap levels and enter RED state.
To guard against out-of-range accesses to the bitmap, we have to check
to make sure no bits in the physical address above bit 40 are set. We
could export and use last_valid_pfn for this check, but that's just an
unnecessary extra memory reference.
On the plus side of all this, since we load all of these translations
into the special 4MB mapping TSB, and we check the TSB first for TLB
misses, there should be absolutely no real cost for these new checks
in the TLB miss path.
Reported-by: heyongli@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 25 Aug 2009 18:24:37 +0000 (11:24 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix typo in read() output generation
perf tools: Check perf.data owner
Linus Torvalds [Tue, 25 Aug 2009 18:24:24 +0000 (11:24 -0700)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
dma-debug: Fix check_unmap null pointer dereference
Linus Torvalds [Tue, 25 Aug 2009 18:24:04 +0000 (11:24 -0700)]
Merge branch 'timers-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clockevent: Prevent dead lock on clockevents_lock
timers: Drop write permission on /proc/timer_list
Linus Torvalds [Tue, 25 Aug 2009 18:23:43 +0000 (11:23 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix too large stack usage in do_one_initcall()
tracing: handle broken names in ftrace filter
ftrace: Unify effect of writing to trace_options and option/*
Linus Torvalds [Tue, 25 Aug 2009 18:23:25 +0000 (11:23 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix build with older binutils and consolidate linker script
x86: Fix an incorrect argument of reserve_bootmem()
x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile
xen: rearrange things to fix stackprotector
x86: make sure load_percpu_segment has no stackprotector
i386: Fix section mismatches for init code with !HOTPLUG_CPU
x86, pat: Allow ISA memory range uncacheable mapping requests
Linus Torvalds [Tue, 25 Aug 2009 16:47:36 +0000 (09:47 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Improve error message that changing journaling mode on remount is not possible
ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED
Linus Torvalds [Tue, 25 Aug 2009 16:47:06 +0000 (09:47 -0700)]
Merge branch 'fix/misc' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'fix/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
sound: pcm_lib: fix unsorted list constraint handling
sound: vx222: fix input level control range check
ALSA: ali5451: fix timeout handling in snd_ali_{codecs,timer}_ready()
Linus Torvalds [Tue, 25 Aug 2009 16:30:58 +0000 (09:30 -0700)]
Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] ar7_wdt: fix path to ar7-specific headers
Linus Torvalds [Tue, 25 Aug 2009 16:12:43 +0000 (09:12 -0700)]
tty: make sure to flush any pending work when halting the ldisc
When I rewrote tty ldisc code to use proper reference counts (commits
65b770468e98 and
cbe9352fa08f) in order to avoid a race with hangup, the
test-program that Eric Biederman used to trigger the original problem
seems to have exposed another long-standing bug: the hangup code did the
'tty_ldisc_halt()' to stop any buffer flushing activity, but unlike the
other call sites it never actually flushed any pending work.
As a result, if you get just the right timing, the pending work may be
just about to execute (ie the timer has already triggered and thus
cancel_delayed_work() was a no-op), when we then re-initialize the ldisc
from under it.
That, in turn, results in various random problems, usually seen as a
NULL pointer dereference in run_timer_softirq() or a BUG() in
worker_thread (but it can be almost anything).
Fix it by adding the required 'flush_scheduled_work()' after doing the
tty_ldisc_halt() (this also requires us to move the ldisc halt to before
taking the ldisc mutex in order to avoid a deadlock with the workqueue
executing do_tty_hangup, which requires the mutex).
The locking should be cleaned up one day (the requirement to do this
outside the ldisc_mutex is very annoying, and weakens the lock), but
that's a larger and separate undertaking.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Xiaotian Feng <xtfeng@gmail.com>
Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Tue, 25 Aug 2009 13:50:53 +0000 (14:50 +0100)]
x86: Fix build with older binutils and consolidate linker script
binutils prior to 2.17 can't deal with the currently possible
situation of a new segment following the per-CPU segment, but
that new segment being empty - objcopy misplaces the .bss (and
perhaps also the .brk) sections outside of any segment.
However, the current ordering of sections really just appears
to be the effect of cumulative unrelated changes; re-ordering
things allows to easily guarantee that the segment following
the per-CPU one is non-empty, and at once eliminates the need
for the bogus data.init2 segment.
Once touching this code, also use the various data section
helper macros from include/asm-generic/vmlinux.lds.h.
-v2: fix !SMP builds.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: <sam@ravnborg.org>
LKML-Reference: <
4A94085D02000078000119A5@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clemens Ladisch [Tue, 25 Aug 2009 06:15:41 +0000 (08:15 +0200)]
sound: pcm_lib: fix unsorted list constraint handling
snd_interval_list() expected a sorted list but did not document this, so
there are drivers that give it an unsorted list. To fix this, change
the algorithm to work with any list.
This fixes the "Slave PCM not usable" error with USB devices that have
multiple alternate settings with sample rates in decreasing order, such
as the Philips Askey VC010 WebCam.
http://bugzilla.kernel.org/show_bug.cgi?id=14028
Reported-and-tested-by: Andrzej <adkadk@gmail.com>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
David S. Miller [Tue, 25 Aug 2009 02:37:05 +0000 (19:37 -0700)]
pkt_sched: Fix bogon in tasklet_hrtimer changes.
Reported by Stephen Rothwell, luckily it's harmless:
net/sched/sch_api.c: In function 'qdisc_watchdog':
net/sched/sch_api.c:460: warning: initialization from incompatible pointer type
net/sched/sch_cbq.c: In function 'cbq_undelay':
net/sched/sch_cbq.c:595: warning: initialization from incompatible pointer type
Signed-off-by: David S. Miller <davem@davemloft.net>
Trond Myklebust [Mon, 24 Aug 2009 23:21:29 +0000 (19:21 -0400)]
NFSv4: Fix an infinite looping problem with the nfs4_state_manager
Commit
76db6d9500caeaa774a3e32a997eba30bbdc176b (nfs41: add session setup
to the state manager) introduces an infinite loop possibility in the NFSv4
state manager. By first checking nfs4_has_session() before clearing the
NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets
that flag, but it never gets cleared, and so the state manager loops.
In fact commit
c3fad1b1aaf850bf692642642ace7cd0d64af0a3 (nfs41: add session
reset to state manager) causes this to happen every time we get a network
partition error.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 24 Aug 2009 21:41:28 +0000 (14:41 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2/dlm: Wait on lockres instead of erroring cancel requests
ocfs2: Add missing lock name
ocfs2: Don't oops in ocfs2_kill_sb on a failed mount
ocfs2: release the buffer head in ocfs2_do_truncate.
ocfs2: Handle quota file corruption more gracefully
Linus Torvalds [Mon, 24 Aug 2009 19:53:45 +0000 (12:53 -0700)]
Merge branch 'fixes' of git://git.marvell.com/orion
* 'fixes' of git://git.marvell.com/orion:
[ARM] Orion NAND: Make asm volatile avoid GCC pushing ldrd out of the loop
[ARM] Kirkwood: enable eSATA on QNAP TS-219P
[ARM] Kirkwood: __init requires linux/init.h
Hugh Dickins [Mon, 24 Aug 2009 15:30:28 +0000 (16:30 +0100)]
mm: fix hugetlb bug due to user_shm_unlock call
2.6.30's commit
8a0bdec194c21c8fdef840989d0d7b742bb5d4bc removed
user_shm_lock() calls in hugetlb_file_setup() but left the
user_shm_unlock call in shm_destroy().
In detail:
Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
is not called in hugetlb_file_setup(). However, user_shm_unlock() is
called in any case in shm_destroy() and in the following
atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
up->__count gets zero, also cleanup_user_struct() is scheduled.
Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
However, the ref counter up->__count gets unexpectedly non-positive and
the corresponding structs are freed even though there are live
references to them, resulting in a kernel oops after a lots of
shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.
Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
time of shm_destroy() may give a different answer from at the time
of hugetlb_file_setup(). And fixed newseg()'s no_id error path,
which has missed user_shm_unlock() ever since it came in 2.6.9.
Reported-by: Stefan Huber <shuber2@gmail.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Stefan Huber <shuber2@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 24 Aug 2009 19:48:41 +0000 (12:48 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: Fix radeon_gem_busy_ioctl harder.
Linus Torvalds [Mon, 24 Aug 2009 19:26:48 +0000 (12:26 -0700)]
Merge git://git./linux/kernel/git/hskinnemoen/avr32-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
favr32: improve touchscreen response
avr32/lib: fix unaligned memcpy where len < 4
avr32/lib: fix unaligned memcpy()
Linus Torvalds [Mon, 24 Aug 2009 19:25:27 +0000 (12:25 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ucb1400_ts - enable interrupt unconditionally
Input: ucb1400_ts - enable ADC Filter
Input: wacom - don't use on-stack memory for report buffers
Input: iforce - support new revision of ACT LABS Force RS
Input: joydev - decouple axis and button map ioctls from input constants
Linus Torvalds [Mon, 24 Aug 2009 19:25:03 +0000 (12:25 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
smc91x: let smc91x work well under netpoll
pxaficp-ir: remove incorrect net_device_ops
NET: llc, zero sockaddr_llc struct
drivers/net: fixed drivers that support netpoll use ndo_start_xmit()
netpoll: warning for ndo_start_xmit returns with interrupts enabled
net: Fix Micrel KSZ8842 Kconfig description
netfilter: xt_quota: fix wrong return value (error case)
ipv6: Fix commit
63d9950b08184e6531adceb65f64b429909cc101 (ipv6: Make v4-mapped bindings consistent with IPv4)
E100: fix interaction with swiotlb on X86.
pkt_sched: Convert CBQ to tasklet_hrtimer.
pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer
rtl8187: always set MSR_LINK_ENEDCA flag with RTL8187B
ibm_newemac: emac_close() needs to call netif_carrier_off()
net: fix ks8851 build errors
net: Rename MAC platform driver for w90p910 platform
yellowfin: Fix buffer underrun after dev_alloc_skb() failure
orinoco: correct key bounds check in orinoco_hw_get_tkip_iv
mac80211: fix todo lock
Linus Torvalds [Mon, 24 Aug 2009 19:24:01 +0000 (12:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
ima: hashing large files bug fix
kernel_read: redefine offset type
Amerigo Wang [Fri, 21 Aug 2009 08:34:45 +0000 (04:34 -0400)]
x86: Fix an incorrect argument of reserve_bootmem()
This line looks suspicious, because if this is true, then the
'flags' parameter of function reserve_bootmem_generic() will be
unused when !CONFIG_NUMA. I don't think this is what we want.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
LKML-Reference: <
20090821083709.5098.52505.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Simon Kagstrom [Thu, 20 Aug 2009 07:19:53 +0000 (09:19 +0200)]
[ARM] Orion NAND: Make asm volatile avoid GCC pushing ldrd out of the loop
GCC 4.3.3 and 4.4.1 happily moves the dword load instruction out of the
loop in orion_nand_read_buf. This patch makes the instruction volatile
to avoid the issue. I've discussed this at gcc-help, refer to the thread
at
http://gcc.gnu.org/ml/gcc-help/2009-08/msg00187.html
The early clobber is added to avoid the destination registers and the
source register overlapping.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
John Holland [Wed, 19 Aug 2009 23:24:03 +0000 (13:24 -1000)]
[ARM] Kirkwood: enable eSATA on QNAP TS-219P
Initialize PCI/PCIe on the QNAP TS-119, TS-219 and TS-219P hardware
allowing the use of the discrete eSATA controller connected to the PCIe
bus in the TS-219P.
Signed-off-by: John Holland <john.holland@cellent-fs.de>
Tested-by: Thomas Reitmayr <treitmayr@devbase.at>
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Martin Michlmayr [Tue, 18 Aug 2009 09:34:10 +0000 (23:34 -1000)]
[ARM] Kirkwood: __init requires linux/init.h
Include linux/init.h for __init to fix this error:
CC [M] drivers/net/wireless/wl12xx/boot.o
In file included from arch/arm/mach-kirkwood/include/mach/gpio.h:13,
from arch/arm/include/asm/gpio.h:5,
from include/linux/gpio.h:7,
from drivers/net/wireless/wl12xx/boot.c:24:
arch/arm/plat-orion/include/plat/gpio.h:32: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘orion_gpio_init’
make[6]: *** [drivers/net/wireless/wl12xx/boot.o] Error 1
make[5]: *** [drivers/net/wireless/wl12xx] Error 2
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Nicolas Pitre <nico@marvell.com>
Jan Kara [Mon, 24 Aug 2009 14:38:43 +0000 (16:38 +0200)]
ext3: Improve error message that changing journaling mode on remount is not possible
This patch makes the error message about changing journaling mode on remount
more descriptive. Some people are going to hit this error now due to commit
bbae8bcc49bc4d002221dab52c79a50a82e7cd1f if they configure a kernel to default
to data=writeback mode. The problem happens if they have data=ordered set for
the root filesystem in /etc/fstab but not in the kernel command line (and they
don't use initrd). Their filesystem then gets mounted as data=writeback by
kernel but then their boot fails because init scripts won't be able to remount
the filesystem rw. Better error message will hopefully make it easier for them
to find the error in their setup and bother us less with error reports :).
Signed-off-by: Jan Kara <jack@suse.cz>
Theodore Ts'o [Mon, 10 Aug 2009 20:03:43 +0000 (16:03 -0400)]
ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED
The old description for this configuration option was perhaps not
completely balanced in terms of describing the tradeoffs of using a
default of data=writeback vs. data=ordered. Despite the fact that old
description very strongly recomended disabling this feature, all of
the major distributions have elected to preserve the existing 'legacy'
default, which is a strong hint that it perhaps wasn't telling the
whole story.
This revised description has been vetted by a number of ext3
developers as being better at informing the user about the tradeoffs
of enabling or disabling this configuration feature.
Cc: linux-ext4@vger.kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Clemens Ladisch [Mon, 24 Aug 2009 07:11:58 +0000 (09:11 +0200)]
sound: vx222: fix input level control range check
Fix a logic error in the range check of the input level control that
would prevent setting any volume less than the maximum.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dongdong Deng [Mon, 24 Aug 2009 05:59:04 +0000 (22:59 -0700)]
smc91x: let smc91x work well under netpoll
The NETPOLL requires that interrupts remain disabled in its callbacks.
Using *_irq_save()/irq_restore() to replace *_irq_disable()/irq_enable()
functions in NETPOLL's callbacks of smc91x, so that it doesn't enable
interrupts when already disabled, and kgdboe/netconsole would work
properly over smc91x.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Mon, 24 Aug 2009 05:57:30 +0000 (22:57 -0700)]
pxaficp-ir: remove incorrect net_device_ops
This patch fixes broken pxaficp-ir. The problem was in incorrect
net_device_ops being specified which prevented the driver from
operating. The symptoms were:
- failing ifconfig for IrLAN, resulting in
SIOCSIFFLAGS: Cannot assign requested address
- irattach working for IrCOMM, but the port stayed disabled
Moreover this patch corrects missing sysfs device link.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby [Mon, 24 Aug 2009 05:55:51 +0000 (22:55 -0700)]
NET: llc, zero sockaddr_llc struct
sllc_arphrd member of sockaddr_llc might not be changed. Zero sllc
before copying to the above layer's structure.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mimi Zohar [Fri, 21 Aug 2009 18:32:49 +0000 (14:32 -0400)]
ima: hashing large files bug fix
Hashing files larger than INT_MAX causes process to loop.
Dependent on redefining kernel_read() offset type to loff_t.
(http://bugzilla.kernel.org/show_bug.cgi?id=13909)
Cc: stable@kernel.org
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Mimi Zohar [Fri, 21 Aug 2009 18:32:48 +0000 (14:32 -0400)]
kernel_read: redefine offset type
vfs_read() offset is defined as loff_t, but kernel_read()
offset is only defined as unsigned long. Redefine
kernel_read() offset as loff_t.
Cc: stable@kernel.org
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Dongdong Deng [Mon, 24 Aug 2009 02:49:07 +0000 (19:49 -0700)]
drivers/net: fixed drivers that support netpoll use ndo_start_xmit()
The NETPOLL API requires that interrupts remain disabled in
netpoll_send_skb(). The use of "A functions set" in the NETPOLL API
callbacks causes the interrupts to get enabled and can lead to kernel
instability.
The solution is to use "B functions set" to prevent the irqs from
getting enabled while in netpoll_send_skb().
A functions set:
local_irq_disable()/local_irq_enable()
spin_lock_irq()/spin_unlock_irq()
spin_trylock_irq()/spin_unlock_irq()
B functions set:
local_irq_save()/local_irq_restore()
spin_lock_irqsave()/spin_unlock_irqrestore()
spin_trylock_irqsave()/spin_unlock_irqrestore()
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dongdong Deng [Fri, 21 Aug 2009 03:33:36 +0000 (03:33 +0000)]
netpoll: warning for ndo_start_xmit returns with interrupts enabled
WARN_ONCE for ndo_start_xmit() enable interrupts in netpoll_send_skb(),
because the NETPOLL API requires that interrupts remain disabled in
netpoll_send_skb().
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>