Julia Lawall [Fri, 7 Aug 2009 07:00:34 +0000 (09:00 +0200)]
powerpc/fsl_rio: Add kmalloc NULL tests
Check that the result of kmalloc/kzalloc is not NULL before dereferencing it.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression *x;
identifier f;
constant char *C;
@@
x = \(kmalloc\|kcalloc\|kzalloc\)(...);
... when != x == NULL
    when != x != NULL
    when != (x || ...)
(
kfree(x)
|
f(...,C,...,x,...)
|
*f(...,x,...)
|
*x->f
)
// </smpl>
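As an illustration of the pattern the semantic match enforces, here is a minimal userspace sketch that uses calloc() as a stand-in for kzalloc(). The structure and function names are hypothetical and are not taken from the fsl_rio driver.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure standing in for a driver's private data. */
struct port_priv {
    int id;
    char name[16];
};

/*
 * Allocate and initialise a port_priv. Returns NULL on allocation
 * failure instead of dereferencing a NULL pointer.
 */
struct port_priv *port_priv_create(int id, const char *name)
{
    struct port_priv *priv = calloc(1, sizeof(*priv)); /* kzalloc() analogue */

    if (!priv)  /* the NULL test that must precede any dereference */
        return NULL;

    priv->id = id;
    /* calloc() zeroed the buffer, so the last byte stays a terminator */
    strncpy(priv->name, name, sizeof(priv->name) - 1);
    return priv;
}
```

The caller is expected to check the return value and release the object with free().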
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Roel Kluin [Thu, 6 Aug 2009 23:00:37 +0000 (16:00 -0700)]
powerpc/fsl-booke: read buffer overflow
cam[tlbcam_index] is checked before tlbcam_index < ARRAY_SIZE(cam)
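The ordering problem can be sketched as follows. The table, its contents and the function name are hypothetical; only the idea of testing the index before reading the element comes from the description above.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table; the real cam[] array and its contents differ. */
static const unsigned long cam_sizes[4] = { 256, 64, 16, 0 };

/* Count the leading non-zero entries without reading past the array. */
size_t count_used_entries(void)
{
    size_t i = 0;

    /*
     * The bounds test comes first, so && short-circuits before
     * cam_sizes[i] is evaluated with an out-of-range index.
     */
    while (i < ARRAY_SIZE(cam_sizes) && cam_sizes[i] != 0)
        i++;
    return i;
}
```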
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kumar Gala [Thu, 30 Jul 2009 22:56:54 +0000 (17:56 -0500)]
powerpc/85xx: Added 36-bit physical device tree for mpc8536ds board
Added a device tree that should be similar to mpc8536ds.dtb except
the physical addresses for all IO are above the 4G boundary.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kumar Gala [Thu, 30 Jul 2009 22:56:38 +0000 (17:56 -0500)]
powerpc/85xx: Move mpc8536ds.dts to address-cells/size-cells = <2>
Change the top-level #address-cells and #size-cells to <2> so that
mpc8536ds.dts can more easily handle either a true 32-bit physical
or a 36-bit physical address space.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stefan Roese [Wed, 29 Jul 2009 01:41:06 +0000 (01:41 +0000)]
powerpc/40x: Update kilauea defconfig to support NAND, RTC and HWMON
This patch adds support for the following devices to the Kilauea
defconfig file:
- PPC4xx NAND controller (NDFC)
- I2C RTC (Dallas DS1338)
- I2C HWMON (Dallas DS1775)
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stefan Roese [Wed, 29 Jul 2009 07:05:11 +0000 (07:05 +0000)]
powerpc/44x: Update Canyonlands defconfig to support NOR, NAND and RTC
This patch adds support for the following devices to the Canyonlands
defconfig file:
- NOR FLASH
- PPC4xx NAND controller (NDFC)
- I2C RTC (M41T80)
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stefan Roese [Wed, 29 Jul 2009 01:40:56 +0000 (01:40 +0000)]
powerpc/40x: Update Kilauea dts to support NAND, RTC and HWMON
This patch adds support for the following devices to the Kilauea dts:
- PPC4xx NAND controller (NDFC)
- I2C RTC (Dallas DS1338)
- I2C HWMON (Dallas DS1775)
Additionally, the partitioning of the NOR FLASH is changed. The dtb
partition was missing; this patch adds it.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stefan Roese [Wed, 29 Jul 2009 07:05:01 +0000 (07:05 +0000)]
powerpc/44x: Add NAND support to Canyonlands dts
Also some whitespace cleanup in the USB device nodes.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stefan Roese [Wed, 29 Jul 2009 07:04:46 +0000 (07:04 +0000)]
powerpc: Add AMCC 460EX/460GT Rev. B support to cputable.c
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kumar Gala [Wed, 5 Aug 2009 03:33:32 +0000 (22:33 -0500)]
powerpc/mm: Fix switch_mmu_context to iterate over the proper list of cpus
Introduced a temporary variable for iterating over the list of cpus
that are threads on the same core. For some reason Ben forgot how for
loops work.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 6 Aug 2009 03:50:58 +0000 (13:50 +1000)]
powerpc/mm: Fix encoding of page table cache numbers
The mask used to encode the page table cache number in the
batch when freeing page tables was too small for the new
possible values of MMU page sizes. This increases it along
with a comment explaining the constraints.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:59 +0000 (23:15 +0000)]
powerpc: Remaining 64-bit Book3E support
This contains all the bits that didn't fit in previous patches :-) This
includes the actual exception handlers assembly, the changes to the
kernel entry, other misc bits and wiring it all up in Kconfig.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:58 +0000 (23:15 +0000)]
powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3E
The base TLB support didn't include support for SPARSEMEM_VMEMMAP: although
we did carve out some virtual space for it, the necessary support code
wasn't there. This implements it by using 16M pages for now, though the
page size could easily be changed at runtime if necessary.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:47 +0000 (23:15 +0000)]
powerpc: Add TLB management code for 64-bit Book3E
This adds the TLB miss handler assembly, the low level TLB flush routines
along with the necessary hook for dealing with our virtual page tables
or indirect TLB entries that need to be flushed when PTE pages are freed.
There is currently no support for hugetlbfs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:45 +0000 (23:15 +0000)]
powerpc/mm: Move around mmu_gathers definition on 64-bit
The definition for the global structure mmu_gathers, used by generic code,
is currently defined in multiple places not including anything used by
64-bit Book3E. This changes it by moving to one place common to all
processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:42 +0000 (23:15 +0000)]
powerpc: Add PACA fields specific to 64-bit Book3E processors
This adds various fields in the PACA that are for use specifically
by Book3E processors, such as exception save areas, current pgd
pointer, special exceptions kernel stacks etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:39 +0000 (23:15 +0000)]
powerpc: Add definitions used by exception handling on 64-bit Book3E
This adds various definitions and macros used by the exception and TLB
miss handling on 64-bit BookE
It also adds the definitions of the SPRGs used for various exception types
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 28 Jul 2009 01:59:34 +0000 (11:59 +1000)]
powerpc: Add memory management headers for new 64-bit BookE
This adds the PTE and pgtable format definitions, along with changes
to the kernel memory map and other definitions related to implementing
support for 64-bit Book3E. This also shields some asm-offset bits that
are currently only relevant on 32-bit
We also move the definition of the "linux" page size constants to
the common mmu.h file and add a few sizes that are relevant to
embedded processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:34 +0000 (23:15 +0000)]
powerpc: Add SPR definitions for new 64-bit BookE
This adds various SPRs defined on 64-bit BookE, along with changes
to the definition of the base MSR values to add the values needed
for 64-bit Book3E.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:28 +0000 (23:15 +0000)]
powerpc/mm: Rework & cleanup page table freeing code path
That patch used to just add a hook to page table flushing but
pulling that string brought out a whole bunch of issues, so it
now does that and more:
- We now make the RCU batching of page freeing SMP only, as I
believe it was intended initially. We make a few more things compile
to nothing on !CONFIG_SMP
- Some macros are turned into functions, though that forced me to
move a few things out of line due to unsolvable include dependencies.
It's probably better that way anyway; it's not -that- critical
a code path.
- 32-bit didn't call pte_free_finish() on tlb_flush() which means
that it wouldn't push out the batch to RCU for delayed freeing when
a bunch of page tables have been freed, they would just stay in there
until the batch gets full.
64-bit BookE will use that hook to maintain the virtually linear
page tables or the indirect entries in the TLB when using the
HW loader.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:28 +0000 (23:15 +0000)]
powerpc: Move definitions of secondary CPU spinloop to header file
Those definitions are currently declared extern in the .c file where
they are used, move them to a header file instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:27 +0000 (23:15 +0000)]
powerpc: Clean ifdef usage in copy_thread()
Currently, a single ifdef covers SLB related bits and more generic ppc64
related bits. Split this into two separate ifdefs since 64-bit BookE will
need one but not the other.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:26 +0000 (23:15 +0000)]
powerpc/mm: Call mmu_context_init() from ppc64
Our 64-bit hash context handling has no init function, but 64-bit Book3E
will use the common mmu_context_nohash.c code which does, so define an
empty inline mmu_context_init() for 64-bit server and call it from
our 64-bit setup_arch()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:24 +0000 (23:15 +0000)]
powerpc/mm: Make low level TLB flush ops on BookE take additional args
We need to pass down whether the page is direct or indirect and we'll
need to pass the page size to _tlbil_va and _tlbivax_bcast
We also add a new low level _tlbil_pid_noind() which does a TLB flush
by PID but avoids flushing indirect entries if possible
This implements those new prototypes but defines them with inlines
or macros so that no additional arguments are actually passed on current
processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:20 +0000 (23:15 +0000)]
powerpc: Modify some ppc_asm.h macros to accommodate 64-bit Book3E
The way I intend to use tophys/tovirt on 64-bit BookE is different
from the "trick" that we currently play for 32-bit BookE so change
the condition of definition of these macros to make it so.
Also, make sure we only use rfid and mtmsrd instead of rfi and mtmsr
for 64-bit server processors, not all 64-bit processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:16 +0000 (23:15 +0000)]
powerpc/mm: Add support for early ioremap on non-hash 64-bit processors
This adds some code to do early ioremap's using page tables instead of
bolting entries in the hash table. This will be used by the upcoming
64-bit BookE port.
The patch also changes the test for early vs. late ioremap to use
slab_is_available() instead of our old hackish mem_init_done.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:12 +0000 (23:15 +0000)]
powerpc/mm: Add more bit definitions for Book3E MMU registers
This adds various additional bit definitions for various MMU related
SPRs used on Book3E.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:11 +0000 (23:15 +0000)]
powerpc/mm: Add opcode definitions for tlbivax and tlbsrx.
This adds the opcode definitions to ppc-opcode.h for the two instructions
tlbivax and tlbsrx. as defined by Book3E 2.06
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:10 +0000 (23:15 +0000)]
powerpc/mm: Add HW threads support to no_hash TLB management
The current "no hash" MMU context management code is written with
the assumption that one CPU == one TLB. This is not the case on
implementations that support HW multithreading, where several
linux CPUs can share the same TLB.
This adds some basic support for this to our context management
and our TLB flushing code.
It also cleans up the optional debugging output a bit
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:07 +0000 (23:15 +0000)]
powerpc/of: Remove useless register save/restore when calling OF back
enter_prom() used to save and restore registers such as CTR, XER etc..
which are volatile, or SRR0,1... which we don't care about. This
removes a bunch of useless code and while at it turns an mtmsrd into
an MTMSRD macro which will be useful to Book3E.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 23 Jul 2009 23:15:04 +0000 (23:15 +0000)]
powerpc/mm: Fix misplaced #endif in pgtable-ppc64-64k.h
A misplaced #endif causes more definitions than intended to be
protected by #ifndef __ASSEMBLY__. This breaks upcoming 64-bit
BookE support patch when using 64k pages.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 28 Jul 2009 01:54:32 +0000 (11:54 +1000)]
powerpc: Add compat_sys_truncate
The truncate syscall has a signed long parameter, so when using a 32-bit
userspace with a 64-bit kernel the argument is zero-extended instead of
sign-extended. Adding the compat_sys_truncate function fixes the issue.
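The difference between the two extensions can be sketched in plain C. The function names are hypothetical and only model how a 32-bit register value ends up in a 64-bit signed length. The narrowing cast relies on the usual two's complement behaviour of common compilers.

```c
#include <stdint.h>

/* What a 64-bit kernel sees if the 32-bit argument is zero-extended. */
int64_t length_zero_extended(uint32_t reg)
{
    return (int64_t)(uint64_t)reg;
}

/* What a compat wrapper produces by sign-extending the argument first. */
int64_t length_sign_extended(uint32_t reg)
{
    /* reinterpret the 32 bits as signed, then widen */
    return (int64_t)(int32_t)reg;
}
```

With a 32-bit length of -1 (0xFFFFFFFF), the zero-extended form is a large positive length, while the sign-extended form is -1 and can be rejected as invalid.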
This was noticed during an LSB truncate test failure. The test was
checking for the correct error number set when truncate is called with
a length of -1. The test can be found at:
http://bzr.linuxfoundation.org/lsb/devel/runtime-test?cmd=inventory;rev=stewb%40linux-foundation.org-20090626205411-sfb23cc0tjj7jzgm;path=modules/vsx-pcts/tset/POSIX.os/files/truncate/
BenH: Added compat_sys_ftruncate() as well, same issue.
Signed-off-by: Chase Douglas <cndougla@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Lucian Adrian Grijincu [Thu, 23 Jul 2009 00:13:37 +0000 (00:13 +0000)]
powerpc: Update boot wrapper script with the new location of dtc
dtc was moved in 9fffb55f66127b52c937ede5196ebfa0c0d50bce from
arch/powerpc/boot/ to scripts/dtc/
This patch updates the wrapper script to point to the new location of dtc.
Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Frans Pop [Thu, 23 Jul 2009 08:57:18 +0000 (08:57 +0000)]
powerpc: Makefile simplification through use of cc-ifversion
Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 14 Jul 2009 20:52:56 +0000 (20:52 +0000)]
powerpc: Change PACA from SPRG3 to SPRG1
This changes the SPRG used to store the PACA on ppc64 from
SPRG3 to SPRG1. SPRG3 is user readable on most processors
and we want to use it for other things. We change the scratch
SPRG used by exception vectors from SPRG1 to SPRG2.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 14 Jul 2009 20:56:58 +0000 (20:56 +0000)]
powerpc/pmac: Fix PowerSurge SMP IPI allocation
The code for setting up the IPIs for SMP PowerSurge machines has bitrotted;
it needs to properly map the HW interrupt number.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 21 Jul 2009 15:25:53 +0000 (15:25 +0000)]
powerpc/mm: Fix definitions of FORCE_MAX_ZONEORDER in Kconfig
The current definitions set ranges and defaults for 32 and 64-bit
only using "PPC_STD_MMU" which means hash based MMU. This needlessly
restricts their usefulness for the upcoming 64-bit BookE port, but more
than that, it's broken on 32-bit since the only 32-bit platform
supporting multiple page sizes currently is 44x which does -not-
have PPC_STD_MMU_32 set.
This fixes it by using PPC64 and PPC32 instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
roel kluin [Tue, 21 Jul 2009 00:17:17 +0000 (00:17 +0000)]
powerpc/cell: Replace strncpy by strlcpy
Replace strncpy() and explicit null-termination by strlcpy()
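For comparison, here is a userspace sketch of the two idioms. sketch_strlcpy() is a hypothetical stand-in with strlcpy() semantics, written out because not every C library provides strlcpy(); it is not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Old idiom: strncpy() leaves the buffer unterminated when src is too
 * long, so the terminator is written by hand. Requires size >= 1.
 */
void copy_old(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/*
 * Stand-in with strlcpy() semantics: copy at most size - 1 bytes,
 * always terminate when size > 0, and return strlen(src) so the
 * caller can detect truncation.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size) {
        size_t n = (len >= size) ? size - 1 : len;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

A return value greater than or equal to the buffer size means the copy was truncated.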
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 16 Jul 2009 19:36:57 +0000 (19:36 +0000)]
powerpc: Remove use of a second scratch SPRG in STAB code
The STAB code used on Power3 and RS/64 uses a second scratch SPRG to
save a GPR in order to decide whether to go to do_stab_bolted_* or
to handle a normal data access exception.
This prevents our scheme of freeing SPRG3 which is user visible for
user uses since we cannot use SPRG0 which, on RS/64, seems to be
read-only for supervisor mode (like POWER4).
This reworks the STAB exception entry to use the PACA as temporary
storage instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 14 Jul 2009 20:52:54 +0000 (20:52 +0000)]
powerpc: Use names rather than numbers for SPRGs (v2)
The kernel uses SPRG registers for various purposes, typically in
low level assembly code as scratch registers or to hold per-cpu
global infos such as the PACA or the current thread_info pointer.
We want to be able to easily shuffle the usage of those registers
as some implementations have specific constraints related to some
of them, for example, some have userspace readable aliases, etc..
and the current choice isn't always the best.
This patch should not change any code generation, and replaces the
usage of SPRN_SPRGn everywhere in the kernel with a named replacement
and adds documentation next to the definition of the names as to
what those are used for on each processor family.
The only parts that still use the original numbers are bits of KVM
or suspend/resume code that just blindly needs to save/restore all
the SPRGs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 14 Jul 2009 20:52:52 +0000 (20:52 +0000)]
powerpc: Rename exception.h to exception-64s.h
The file include/asm/exception.h contains definitions
that are specific to exception handling on 64-bit server
type processors.
This renames the file to exception-64s.h to reflect that
fact and avoid confusion.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 13 Jul 2009 20:53:53 +0000 (20:53 +0000)]
powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE
TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can
reuse this preload slot by loading in the segment at 0x10000000, where almost
all PowerPC binaries are linked at.
On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), both the 32bit and
64bit context switch rates improve (tested on a 4GHz POWER6):
32bit: 273k/sec -> 283k/sec
64bit: 277k/sec -> 284k/sec
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 13 Jul 2009 20:53:52 +0000 (20:53 +0000)]
powerpc: Rearrange SLB preload code
With the new top down layout it is likely that the pc and stack will be in the
same segment, because the pc is most likely in a library allocated via a top
down mmap. Right now we bail out early if these segments match.
Rearrange the SLB preload code to sanity check all SLB preload addresses
are not in the kernel, then check all addresses for conflicts.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 13 Jul 2009 20:53:51 +0000 (20:53 +0000)]
powerpc: Move 64bit VDSO to improve context switch performance
On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO
is position independent we can remove the hint and let get_unmapped_area pick
an area. This will mean the vdso will be near other mmaps and will share
an SLB entry:
10000000-10001000 r-xp 00000000 08:06 5778459 /root/context_switch_64
10010000-10011000 r--p 00000000 08:06 5778459 /root/context_switch_64
10011000-10012000 rw-p 00001000 08:06 5778459 /root/context_switch_64
fffa92ae000-fffa92b0000 rw-p 00000000 00:00 0
fffa92b0000-fffa9453000 r-xp 00000000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa9453000-fffa9462000 ---p 001a3000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa9462000-fffa9466000 r--p 001a2000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa9466000-fffa947c000 rw-p 001a6000 08:06 4334051 /lib64/power6/libc-2.9.so
fffa947c000-fffa9480000 rw-p 00000000 00:00 0
fffa9480000-fffa94a8000 r-xp 00000000 08:06 4333852 /lib64/ld-2.9.so
fffa94b3000-fffa94b4000 rw-p 00000000 00:00 0
fffa94b4000-fffa94b7000 r-xp 00000000 00:00 0 [vdso] <----- here I am
fffa94b7000-fffa94b8000 r--p 00027000 08:06 4333852 /lib64/ld-2.9.so
fffa94b8000-fffa94bb000 rw-p 00028000 08:06 4333852 /lib64/ld-2.9.so
fffa94bb000-fffa94bc000 rw-p 00000000 00:00 0
fffe4c10000-fffe4c25000 rw-p 00000000 00:00 0 [stack]
On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), our context switch
rate goes from 268k to 277k ctx switches/sec (tested on a 4GHz POWER6).
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Geoff Thorpe [Tue, 7 Jul 2009 15:23:56 +0000 (15:23 +0000)]
powerpc: expose the multi-bit ops that underlie single-bit ops.
The bitops.h functions that operate on a single bit in a bitfield are
implemented by operating on the corresponding word location. In all
cases the inner logic is valid if the mask being applied has more than
one bit set, so this patch exposes those inner operations. Indeed,
set_bits() was already available, but it duplicated code from
set_bit() (rather than making the latter a wrapper) - it was also
missing the PPC405_ERR77() workaround and the "volatile" address
qualifier present in other APIs. This corrects that, and exposes the
other multi-bit equivalents.
One advantage of these multi-bit forms is that they allow word-sized
variables to essentially be their own spinlocks, eg. very useful for
state machines where an atomic "flags" variable can obviate the need
for any additional locking.
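The idea of operating on several bits of one word atomically can be sketched with C11 atomics. This is only an analogue: the function names are hypothetical, and the sketch does not reproduce the powerpc bitops.h implementation or the PPC405_ERR77() workaround.

```c
#include <stdatomic.h>

/*
 * Atomically set every bit in mask and report which of those bits
 * were already set beforehand.
 */
unsigned long sketch_test_and_set_bits(atomic_ulong *word, unsigned long mask)
{
    return atomic_fetch_or(word, mask) & mask;
}

/*
 * Atomically clear every bit in mask and report which of those bits
 * were set beforehand.
 */
unsigned long sketch_test_and_clear_bits(atomic_ulong *word, unsigned long mask)
{
    return atomic_fetch_and(word, ~mask) & mask;
}
```

A state machine can use the returned bits to decide whether it won a transition, which is what lets a single "flags" word act as its own lock.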
Signed-off-by: Geoff Thorpe <geoff@geoffthorpe.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Sun, 5 Jul 2009 16:08:52 +0000 (16:08 +0000)]
powerpc/mpic: Fix MPIC_BROKEN_REGREAD on non broken MPICs
The workaround enabled by CONFIG_MPIC_BROKEN_REGREAD does not work
on non-broken MPICs. The symptom is no interrupts being received.
The fix is twofold. Firstly the code was broken for multiple isus,
we need to index into the shadow array with the src_no, not the idx.
Secondly, we always do the read, but only use the VECPRI_MASK and
VECPRI_ACTIVITY bits from the hardware, the rest of "val" comes
from the shadow.
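The second half of the fix amounts to merging two sources for one register value. A rough sketch follows; the SKETCH_* bit values and the function name are assumptions for illustration and should be checked against the MPIC headers.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SKETCH_VECPRI_MASK     0x80000000u
#define SKETCH_VECPRI_ACTIVITY 0x40000000u
#define SKETCH_HW_BITS (SKETCH_VECPRI_MASK | SKETCH_VECPRI_ACTIVITY)

/*
 * Take only the mask and activity bits from the hardware read and
 * everything else from the software shadow copy.
 */
uint32_t merge_vecpri(uint32_t hw, uint32_t shadow)
{
    return (hw & SKETCH_HW_BITS) | (shadow & ~SKETCH_HW_BITS);
}
```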
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gerhard Pircher [Fri, 19 Jun 2009 11:40:57 +0000 (11:40 +0000)]
powerpc/amigaone: Convert amigaone_init() to a machine_device_initcall()
This allows the ppc_md.init() hook in the setup code to be removed.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Wed, 19 Aug 2009 02:41:47 +0000 (19:41 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: Fix prompt for LSM_MMAP_MIN_ADDR
security: Make LSM_MMAP_MIN_ADDR default match its help text.
Linus Torvalds [Wed, 19 Aug 2009 02:41:05 +0000 (19:41 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: use the right flag for get_vm_area()
percpu, sparc64: fix sparse possible cpu map handling
init: set nr_cpu_ids before setup_per_cpu_areas()
Linus Torvalds [Tue, 18 Aug 2009 23:55:43 +0000 (16:55 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: Don't initialize MCEs on unknown CPUs
x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
x86: Annotate section mismatch warnings in kernel/apic/x2apic_uv_x.c
x86, mce: therm_throt: Don't log redundant normality
x86: Fix UV BAU destination subnode id
Bo Liu [Tue, 18 Aug 2009 21:11:19 +0000 (14:11 -0700)]
mm: build_zonelists(): move clear node_load[] to __build_all_zonelists()
If node_load[] is cleared every time build_zonelists() is called,
node_load[] is of no help in finding the next node that should
appear in the given node's fallback list.
Because of this bug, the zonelist's node_order is not calculated as expected.
The bug affects big machines that have asymmetric node distances.
[symmetric NUMA's node distance]
     0  1  2
  0 10 12 12
  1 12 10 12
  2 12 12 10
[asymmetric NUMA's node distance]
     0  1  2
  0 10 12 20
  1 12 10 14
  2 20 14 10
This (my bug) is very old but no one has reported it for a long time,
maybe because the number of asymmetric NUMA machines is very small and they use
cpuset for customizing node memory allocation fallback.
[akpm@linux-foundation.org: fix CONFIG_NUMA=n build]
Signed-off-by: Bo Liu <bo-liu@hotmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 18 Aug 2009 21:11:18 +0000 (14:11 -0700)]
REPORTING-BUGS: add get_maintainer.pl blurb
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Graff Yang [Tue, 18 Aug 2009 21:11:17 +0000 (14:11 -0700)]
nommu: check fd read permission in validate_mmap_request()
According to POSIX (1003.1-2008), the file descriptor shall have been
opened with read permission, regardless of the protection options specified to
mmap(). The ltp test cases mmap06/07 need this.
Signed-off-by: Graff Yang <graff.yang@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Tue, 18 Aug 2009 21:11:17 +0000 (14:11 -0700)]
spi_s3c24xx: fix transfer setup code
Since the changes to the bitbang driver, there is the possibility we will
be called with either the speed_hz or bpw values zero. We take these to
mean the default values (8 bits per word, or maximum bus speed).
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Tue, 18 Aug 2009 21:11:16 +0000 (14:11 -0700)]
spi_s3c24xx: fix clock rate calculation
Currently the clock rate calculation may round in either direction, which
means the divider can be rounded down and we end up with a faster clock
rate than intended.
Change the calculation to use DIV_ROUND_UP() to ensure that we end up with
a clock rate either the same as or lower than the user requested one.
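The effect of the rounding direction can be sketched with a simplified model in which the output clock is the input clock divided by an integer divider. The function names are hypothetical and the real s3c24xx prescaler formula is more involved.

```c
/* Same definition as the kernel's DIV_ROUND_UP() helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Truncating division: the resulting clock may exceed the request. */
unsigned int divider_round_down(unsigned int clk_in, unsigned int hz)
{
    return clk_in / hz;
}

/* Rounding the divider up keeps the resulting clock at or below hz. */
unsigned int divider_round_up(unsigned int clk_in, unsigned int hz)
{
    return DIV_ROUND_UP(clk_in, hz);
}
```

With a 50 MHz input and a 12 MHz request, truncation gives a divider of 4 (12.5 MHz, too fast), while rounding up gives 5 (10 MHz).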
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 18 Aug 2009 21:11:12 +0000 (14:11 -0700)]
mmc: add the new linux-mmc mailing list to MAINTAINERS
There are a number of individual MMC drivers listed in MAINTAINERS. I
didn't modify those records. Perhaps I should have.
Cc: <linux-mmc@vger.kernel.org>
Cc: Manuel Lauss <manuel.lauss@gmail.com>
Cc: Nicolas Pitre <nico@cam.org>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Pavel Pisa <ppisa@pikron.com>
Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Sascha Sommer <saschasommer@freenet.de>
Cc: Ian Molton <ian@mnementh.co.uk>
Cc: Joseph Chan <JosephChan@via.com.tw>
Cc: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 18 Aug 2009 21:11:10 +0000 (14:11 -0700)]
mm: revert "oom: move oom_adj value"
Commit 2ff05b2b (oom: move oom_adj value) moved the oom_adj value to
the mm_struct. It was a very good first step toward sanitizing OOM.
However, Paul Menage reported that the commit causes a regression in his job
scheduler: the current OOM logic can kill an OOM_DISABLED process.
Why? His program has code similar to the following.
	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* Invoked child can be killed */
		execve("foo-bar-cmd");
	}
	....
The vfork() parent and child share the same mm_struct, so the
set_oom_adj(0) above doesn't only change oom_adj for the vfork() child; it
also changes oom_adj for the vfork() parent. The vfork() parent (the job
scheduler) then lost its OOM immunity and was killed.
The fork-setting-exec idiom is very frequently used in userland programs.
We must not break this assumption.
Therefore, this patch reverts commit 2ff05b2b and the related commits.
Reverted commit list
---------------------
- commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b57 (mm: copy over oom_adj value at fork time)
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 18 Aug 2009 21:11:08 +0000 (14:11 -0700)]
vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed
get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
to a signed value. Fix it to use MAX_LFS_FILESIZE which casts properly
to a positive signed value.
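The sign problem can be shown in a few lines of C. SKETCH_MAX_LFS_FILESIZE is an assumed stand-in for the 64-bit value of MAX_LFS_FILESIZE (the largest positive signed 64-bit number), and as_signed() is a hypothetical helper that models the cast.

```c
#include <stdint.h>

/* Assumed 64-bit stand-in for MAX_LFS_FILESIZE. */
#define SKETCH_MAX_LFS_FILESIZE 0x7fffffffffffffffULL

/* Model of code that stores s_maxbytes in a signed 64-bit variable. */
int64_t as_signed(uint64_t maxbytes)
{
    /* all-ones becomes -1 on two's complement compilers */
    return (int64_t)maxbytes;
}
```

Any comparison of the form "offset > limit" then misbehaves when the limit has turned negative, which is why the all-ones value is a poor choice.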
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Steve French <smfrench@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
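Note on the s_maxbytes commit above: the sign problem can be reproduced
in plain C. This is a minimal sketch; file_off_t and EXAMPLE_MAX_FILESIZE
are stand-ins invented for the example (the latter assumes the 64-bit
case, where the largest positive signed 64-bit value is the limit), and
they are not the kernel's own definitions.

```c
#include <limits.h>

/* Stand-in for a signed 64-bit file offset type. */
typedef long long file_off_t;

/* Assumed stand-in for a maximum file size that stays positive when
 * converted to a signed 64-bit value. */
#define EXAMPLE_MAX_FILESIZE ((unsigned long long)LLONG_MAX)

/* Returns 1 if the limit turns negative once converted to the signed
 * offset type, 0 otherwise. The out-of-range conversion is
 * implementation-defined in C; common compilers wrap it, so ~0ULL
 * becomes -1. */
int maxbytes_is_negative_when_signed(unsigned long long s_maxbytes)
{
    return (file_off_t)s_maxbytes < 0;
}
```

Passing ~0ULL yields 1 on common compilers, while passing
EXAMPLE_MAX_FILESIZE yields 0, which is why a limit in the positive
signed range is the safer choice for code that compares offsets as
signed values.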
Joe Perches [Tue, 18 Aug 2009 21:11:06 +0000 (14:11 -0700)]
MAINTAINERS: OSD LIBRARY and FILESYSTEM pattern fix
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Benny Halevy <bhalevy@panasas.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andreas Schwab [Tue, 18 Aug 2009 20:14:29 +0000 (22:14 +0200)]
security: Fix prompt for LSM_MMAP_MIN_ADDR
Fix prompt for LSM_MMAP_MIN_ADDR.
(Verbs are cool!)
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Dave Jones [Tue, 18 Aug 2009 17:47:37 +0000 (13:47 -0400)]
security: Make LSM_MMAP_MIN_ADDR default match its help text.
Commit 788084aba2ab7348257597496befcbccabdc98a3 added the LSM_MMAP_MIN_ADDR
option, whose help text states "For most ia64, ppc64 and x86 users with lots
of address space a value of 65536 is reasonable and should cause no problems."
This implies that its default setting was a typo.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Tue, 18 Aug 2009 20:57:38 +0000 (13:57 -0700)]
Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Wake up irq thread after action has been installed
Linus Torvalds [Tue, 18 Aug 2009 20:55:01 +0000 (13:55 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (60 commits)
net: restore gnet_stats_basic to previous definition
NETROM: Fix use of static buffer
e1000e: fix use of pci_enable_pcie_error_reporting
e1000e: WoL does not work on 82577/82578 with manageability enabled
cnic: Fix locking in init/exit calls.
cnic: Fix locking in start/stop calls.
bnx2: Use mutex on slow path cnic calls.
cnic: Refine registration with bnx2.
cnic: Fix symbol_put_addr() panic on ia64.
gre: Fix MTU calculation for bound GRE tunnels
pegasus: Add new device ID.
drivers/net: fixed drivers that support netpoll use ndo_start_xmit()
via-velocity: Fix test of mii_status bit VELOCITY_DUPLEX_FULL
rt2x00: fix memory corruption in rf cache, add a sanity check
ixgbe: Fix receive on real device when VLANs are configured
ixgbe: Do not return 0 in ixgbe_fcoe_ddp() upon FCP_RSP in DDP completion
netxen: free napi resources during detach
netxen: remove netxen workqueue
ixgbe: fix issues setting rx-usecs with legacy interrupts
can: fix oops caused by wrong rtnl newlink usage
...
Linus Torvalds [Tue, 18 Aug 2009 20:54:26 +0000 (13:54 -0700)]
Merge branch 'sh/for-2.6.31' of git://git./linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: sh7724 ddr self-refresh changes
sh: use in-soc KEYSC on se7724
sh: CMT suspend/resume
sh: skip disabled LCDC channels
Linus Torvalds [Tue, 18 Aug 2009 20:54:08 +0000 (13:54 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
Fix new incorrect error return from do_md_stop.
Thomas Gleixner [Mon, 17 Aug 2009 12:07:16 +0000 (14:07 +0200)]
genirq: Wake up irq thread after action has been installed
The wake_up_process() of the new irq thread in __setup_irq() is too
early as the irqaction is not yet fully initialized especially
action->irq is not yet set. The interrupt thread might dereference the
wrong irq descriptor.
Move the wakeup after the action is installed and action->irq has been
set.
Reported-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Buesch <mb@bu3sch.de>
Eric Dumazet [Sun, 16 Aug 2009 09:36:49 +0000 (09:36 +0000)]
net: restore gnet_stats_basic to previous definition
In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.
Restoring the old behavior is not welcome, for performance reasons.
The fix is to use a private structure for the kernel, and to
teach gnet_stats_copy_basic() to convert from the kernel to the userland
format, using the legacy structure (struct gnet_stats_basic).
Based on a report and initial patch from Michael Spang.
Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
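Note on the gnet_stats_basic commit above: the idea of keeping a private
in-kernel layout and converting to the legacy layout at the userland
boundary can be sketched as below. This is a minimal sketch under stated
assumptions: the struct and function names are hypothetical, and the
private layout is assumed to be a packed variant of the same two fields,
which the commit message does not spell out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical legacy layout as seen by userland (natural alignment). */
struct example_stats_user {
    uint64_t bytes;
    uint32_t packets;
};

/* Hypothetical private layout used internally (assumed packed). */
struct example_stats_kernel {
    uint64_t bytes;
    uint32_t packets;
} __attribute__((packed));

/* Convert the private representation into the legacy one before it is
 * copied out. The destination is zeroed first so that any padding
 * bytes do not carry stale data. */
void example_stats_to_user(struct example_stats_user *dst,
                           const struct example_stats_kernel *src)
{
    memset(dst, 0, sizeof(*dst));
    dst->bytes = src->bytes;
    dst->packets = src->packets;
}
```

Converting field by field means the internal layout can change again
later without affecting what userland receives.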
Ralf Baechle [Tue, 18 Aug 2009 01:05:32 +0000 (18:05 -0700)]
NETROM: Fix use of static buffer
The static variable used by nr_call_to_digi might result in corruption if
multiple threads are trying to use a node or neighbour via ioctl. Fixed
by having the caller pass a structure in. This is safe because nr_add_node
or nr_add_neigh, respectively, will allocate a permanent structure, if needed.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
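Note on the NETROM commit above: the difference between returning a
pointer to a static buffer and filling a caller-supplied structure can
be sketched as follows. This is a minimal sketch; struct example_digi
and both functions are hypothetical and only mirror the shape of the
problem, not the NETROM code itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result structure. */
struct example_digi {
    char call[8];
    int ndigi;
};

/* Problematic pattern: every caller gets a pointer to the same static
 * storage, so a second call overwrites the first caller's result. */
struct example_digi *example_to_digi_static(const char *call)
{
    static struct example_digi digi;

    memset(&digi, 0, sizeof(digi));
    snprintf(digi.call, sizeof(digi.call), "%s", call);
    digi.ndigi = 1;
    return &digi;
}

/* Safer pattern: the caller passes in the structure to fill, so each
 * caller owns its own result. */
struct example_digi *example_to_digi(struct example_digi *digi,
                                     const char *call)
{
    memset(digi, 0, sizeof(*digi));
    snprintf(digi->call, sizeof(digi->call), "%s", call);
    digi->ndigi = 1;
    return digi;
}
```

With the second form, two concurrent callers that each use their own
on-stack structure no longer share any state.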
NeilBrown [Tue, 18 Aug 2009 00:35:26 +0000 (10:35 +1000)]
Fix new incorrect error return from do_md_stop.
Recent commit c8c00a6915a2e3d10416e8bdd3138429beb96210
changed the exit paths in do_md_stop and was not quite
careful enough. There is one path where 'err' now needs
to be cleared but it isn't.
So setting an array to readonly (with mdadm --readonly) will
work, but will incorrectly report an error: ENXIO.
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Mon, 17 Aug 2009 20:39:52 +0000 (13:39 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Fix HPAGE_SIZE redefinition
Linus Torvalds [Mon, 17 Aug 2009 20:39:30 +0000 (13:39 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix locking in xfs_iget_cache_hit
Linus Torvalds [Mon, 17 Aug 2009 20:38:58 +0000 (13:38 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
security: define round_hint_to_min in !CONFIG_SECURITY
Security/SELinux: seperate lsm specific mmap_min_addr
SELinux: call cap_file_mmap in selinux_file_mmap
Capabilities: move cap_file_mmap to commoncap.c
Eric Paris [Mon, 17 Aug 2009 01:51:55 +0000 (21:51 -0400)]
inotify: start watch descriptor count at 1
The inotify_add_watch man page specifies that inotify_add_watch() will
return a non-negative integer. However, historically the inotify
watches started at 1, not at 0.
Turns out that the inotifywait program provided by the inotify-tools
package doesn't properly handle a 0 watch descriptor. In 7e790dd5 we
changed from starting at 1 to starting at 0. This patch starts at 1,
just like in previous kernels, but also just like in previous kernels
it's possible for it to wrap back to 0. This preserves the kernel
functionality exactly like it was before the patch (neither method broke
the spec)
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Mon, 17 Aug 2009 01:51:49 +0000 (21:51 -0400)]
inotify: tail drop inotify q_overflow events
In f44aebcc the tail drop logic of events with no file backing
(q_overflow and in_ignored) was reversed so IN_IGNORED events would
never be tail dropped. This now means that Q_OVERFLOW events are NOT
tail dropped. The fix is to not tail drop IN_IGNORED, but to tail drop
Q_OVERFLOW.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Mon, 17 Aug 2009 01:51:44 +0000 (21:51 -0400)]
notify: unused event private race
inotify decides if private data it passed to get added to an event was
used by checking list_empty(). But it's possible that the event may
have been dequeued and the private event removed so it would look empty.
The fix is to use the return code from fsnotify_add_notify_event rather
than looking at the list.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 17 Aug 2009 20:36:39 +0000 (13:36 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm: (37 commits)
ARM: 5673/1: U300 fix initsection compile warning
ARM: Fix broken highmem support
mx31moboard: invert sdhc ro signal sense
ARM: S3C24XX: Fix clkout mpx error
ARM: S3C64XX: serial: Fix a typo in Kconfig
IXP4xx: Fix IO_SPACE_LIMIT for 2.6.31-rc core PCI changes
OMAP3: RX51: Updated rx51_defconfig
OMAP2/3: mmc-twl4030: Free up MMC regulators while cleaning up
OMAP3: RX51: Define TWL4030 USB transceiver in board file
OMAP3: Overo: Fix smsc911x platform device resource value
OMAP3: Fix omap3 sram virtual addres overlap vmalloc space after increasing vmalloc size
OMAP2/3: DMA errata correction
OMAP: Fix testing of cpu defines for mach-omap1
OMAP3: Overo: add missing pen-down GPIO definition
OMAP: GPIO: clear/restore level/edge detect settings on mask/unmask
OMAP3: PM: Fix wrong sequence in suspend.
OMAP: PM: CPUfreq: obey min/max settings of policy
OMAP2/3/4: UART: allow in-order port traversal
OMAP2/3/4: UART: Allow per-UART disabling wakeup for serial ports
OMAP3: Fixed crash bug with serial + suspend
...
Atsushi Nemoto [Tue, 14 Jul 2009 13:37:09 +0000 (22:37 +0900)]
MIPS: Fix HPAGE_SIZE redefinition
This patch fixes warnings like this:
CC fs/proc/meminfo.o
In file included from /work/linux/include/linux/mmzone.h:20,
from /work/linux/include/linux/gfp.h:4,
from /work/linux/include/linux/mm.h:8,
from /work/linux/fs/proc/meminfo.c:5:
/work/linux/arch/mips/include/asm/page.h:36:1: warning: "HPAGE_SIZE" redefined
In file included from /work/linux/fs/proc/meminfo.c:2:
/work/linux/include/linux/hugetlb.h:107:1: warning: this is the location of the previous definition
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ingo Molnar [Mon, 17 Aug 2009 08:19:00 +0000 (10:19 +0200)]
x86, mce: Don't initialize MCEs on unknown CPUs
An older test-box started hanging at the following point during
bootup:
[ 0.022996] Mount-cache hash table entries: 512
[ 0.024996] Initializing cgroup subsys debug
[ 0.025996] Initializing cgroup subsys cpuacct
[ 0.026995] Initializing cgroup subsys devices
[ 0.027995] Initializing cgroup subsys freezer
[ 0.028995] mce: CPU supports 5 MCE banks
I've bisected it down to commit 4efc0670 ("x86, mce: use 64bit
machine check code on 32bit"), which utilizes the MCE code on
32-bit systems too.
The problem is caused by this detail in my config:
# CONFIG_CPU_SUP_INTEL is not set
This disables the quirks in mce_cpu_quirks() but still enables
MCE support - which then hangs due to the missing quirk
workaround needed on this CPU:
if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0)
mce_banks[0].init = 0;
The safe solution is to not initialize MCEs if we don't know on
what CPU we are running (or if that CPU's support code got
disabled in the config).
Also be a bit more defensive on 32-bit systems: skip the
boot-time dump of pending MCEs not just on the specific system
that we found a problem with (Pentium-M), but on earlier ones as
well.
Now this problem is probably not common and disabling CPU
support is rare - but still being more defensive in something
we turned on for a wide range of CPUs is prudent.
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: Message-ID: <4A88E3E4.40506@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bartlomiej Zolnierkiewicz [Tue, 28 Jul 2009 21:52:54 +0000 (23:52 +0200)]
x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
On my legacy Pentium M laptop (Acer Extensa 2900) I get a bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0
[ The other STATUS values observed:
f2000000000001b5 (... UNKNOWN error)
and
f200000000000115 (... READ Error).
To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
content of STATUS MSR before it is cleared during initialization. ]
Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support), don't log boot MCEs on Pentium M (model == 13) CPUs
by default (the "mce=bootlog" boot parameter can be used to get the old
behavior).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Christoph Hellwig [Mon, 17 Aug 2009 00:36:34 +0000 (20:36 -0400)]
xfs: fix locking in xfs_iget_cache_hit
The locking in xfs_iget_cache_hit currently has numerous problems:
- we clear the reclaim tag without i_flags_lock which protects
modifications to it
- we call inode_init_always which can sleep with pag_ici_lock
held (this is oss.sgi.com BZ #819)
- we acquire and drop i_flags_lock a lot and thus provide no
consistency between the various flags we set/clear under it
This patch fixes all that with a major revamp of the locking in
the function. The new version acquires i_flags_lock early and
only drops it once we need to call into inode_init_always or before
calling xfs_ilock.
This patch fixes a bug seen in the wild where we race modifying the
reclaim tag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Eric Paris [Fri, 7 Aug 2009 18:53:57 +0000 (14:53 -0400)]
security: define round_hint_to_min in !CONFIG_SECURITY
Fix the header files to define round_hint_to_min() and to define
mmap_min_addr_handler() in the !CONFIG_SECURITY case.
Built and tested with !CONFIG_SECURITY
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Fri, 31 Jul 2009 16:54:11 +0000 (12:54 -0400)]
Security/SELinux: seperate lsm specific mmap_min_addr
Currently SELinux enforcement of controls on the ability to map low memory
is determined by the mmap_min_addr tunable. This patch causes SELinux to
ignore the tunable and instead use a separate Kconfig option specific to how
much space the LSM should protect.
The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
permissions will always protect the amount of low memory designated by
CONFIG_LSM_MMAP_MIN_ADDR.
This allows users who need to disable the mmap_min_addr controls (usual reason
being they run WINE as a non-root user) to do so and still have SELinux
controls preventing confined domains (like a web server) from being able to
map some area of low memory.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Fri, 31 Jul 2009 16:54:05 +0000 (12:54 -0400)]
SELinux: call cap_file_mmap in selinux_file_mmap
Currently SELinux does not check CAP_SYS_RAWIO in the file_mmap hook. This
means there is no DAC check on the ability to mmap low addresses in the
memory space. This function adds the DAC check for CAP_SYS_RAWIO while
maintaining the selinux check on mmap_zero. This means that processes
which need to mmap low memory will need CAP_SYS_RAWIO and mmap_zero but will
NOT need the SELinux sys_rawio capability.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Eric Paris [Fri, 31 Jul 2009 16:53:58 +0000 (12:53 -0400)]
Capabilities: move cap_file_mmap to commoncap.c
Currently we duplicate the mmap_min_addr test in cap_file_mmap and in
security_file_mmap if !CONFIG_SECURITY. This patch moves cap_file_mmap
into commoncap.c and then calls that function directly from
security_file_mmap ifndef CONFIG_SECURITY like all of the other capability
checks are done.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Leonardo Potenza [Sun, 16 Aug 2009 16:55:48 +0000 (18:55 +0200)]
x86: Annotate section mismatch warnings in kernel/apic/x2apic_uv_x.c
The function uv_acpi_madt_oem_check() has been marked __init,
the struct apic_x2apic_uv_x has been marked __refdata.
The aim is to address the following section mismatch messages:
WARNING: arch/x86/kernel/apic/built-in.o(.data+0x1368): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
WARNING: arch/x86/kernel/built-in.o(.data+0x68e8): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
WARNING: arch/x86/built-in.o(.text+0x7b36f): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_ioremap()
The function uv_acpi_madt_oem_check() references
the function __init early_ioremap().
This is often because uv_acpi_madt_oem_check lacks a __init
annotation or the annotation of early_ioremap is wrong.
WARNING: arch/x86/built-in.o(.text+0x7b38d): Section mismatch in reference from the function uv_acpi_madt_oem_check() to the function .init.text:early_iounmap()
The function uv_acpi_madt_oem_check() references
the function __init early_iounmap().
This is often because uv_acpi_madt_oem_check lacks a __init
annotation or the annotation of early_iounmap is wrong.
WARNING: arch/x86/built-in.o(.data+0x8668): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .cpuinit.text:uv_wakeup_secondary()
The variable apic_x2apic_uv_x references
the function __cpuinit uv_wakeup_secondary()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
LKML-Reference: <200908161855.48302.lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Randy Dunlap [Sun, 16 Aug 2009 14:33:30 +0000 (07:33 -0700)]
dm-log-userspace: fix printk format warning
drivers/md/dm-log-userspace-transfer.c:110: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'size_t'
Previously posted and acked, but apparently lost.
http://lkml.indiana.edu/hypermail/linux/kernel/0906.2/02074.html
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: dm-devel@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
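Note on the dm-log-userspace commit above: the warning arises because
'%lu' does not match size_t on every architecture. A common remedy is
the 'z' length modifier, which standard C printf and the kernel's printk
both accept for size_t. The sketch below shows the userspace form; the
commit message does not say which format the patch actually uses, so
this is an illustration of the general technique, and format_size() is
a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t value portably. "%zu" matches size_t on every
 * platform, whereas "%lu" only matches where size_t happens to be
 * unsigned long. Returns the number of characters that would have been
 * written (as snprintf does). */
int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%zu", value);
}
```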
Hugh Dickins [Sun, 16 Aug 2009 14:54:37 +0000 (15:54 +0100)]
x86, mce: therm_throt: Don't log redundant normality
0d01f31439c1e4d602bf9fdc924ab66f407f5e38 "x86, mce: therm_throt
- change when we print messages" removed redundant
announcements of "Temperature/speed normal".
They're not worth logging, so also remove their accompanying
"Machine check events logged" messages from the
console.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
LKML-Reference: <Pine.LNX.4.64.0908161544100.7929@sister.anvils>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xiaotian Feng [Fri, 14 Aug 2009 14:35:52 +0000 (14:35 +0000)]
e1000e: fix use of pci_enable_pcie_error_reporting
Commit 111b9dc5 ("e1000e: add aer support") introduced PCIe AER
support for e1000e, but it is not reasonable to disable it in
e1000_remove yet enable it in e1000_resume. This patch enables AER
support in e1000_probe.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Fri, 14 Aug 2009 14:35:33 +0000 (14:35 +0000)]
e1000e: WoL does not work on 82577/82578 with manageability enabled
With manageability (Intel AMT) enabled via BIOS, PHY wakeup does not get
configured on newer parts which use PHY wakeup vs. MAC wakeup which causes
WoL to not work. The driver should configure PHY wakeup whether or not
manageability is enabled.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:47 +0000 (15:49 +0000)]
cnic: Fix locking in init/exit calls.
The slow path ulp_init and ulp_exit calls to the bnx2i driver
are sleepable calls and therefore should not be protected using
rcu_read_lock. Fix it by using mutex and refcount during these
calls. cnic_unregister_driver() will now wait for the refcount
to go to zero before completing the call.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:46 +0000 (15:49 +0000)]
cnic: Fix locking in start/stop calls.
The slow path ulp_start and ulp_stop calls to the bnx2i driver
are sleepable calls and therefore should not be protected using
rcu_read_lock. Fix it by using mutex and setting a bit during
these calls. cnic_unregister_device() will now wait for the bit
to clear before completing the call.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:45 +0000 (15:49 +0000)]
bnx2: Use mutex on slow path cnic calls.
The slow path calls to the cnic driver are sleepable calls so we
cannot use rcu_read_lock(). Use mutex for these slow path calls
instead.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:44 +0000 (15:49 +0000)]
cnic: Refine registration with bnx2.
Register and unregister with bnx2 during NETDEV_UP and NETDEV_DOWN
events. This simplifies the sequence of events and allows locking
fixes in the next patch.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 14 Aug 2009 15:49:43 +0000 (15:49 +0000)]
cnic: Fix symbol_put_addr() panic on ia64.
When the cnic driver tries to grab a symbol from bnx2 when bnx2 is
running init code, symbol_get() will succeed but symbol_put_addr()
will hit BUG() a moment later. module_text_address() fails because
bnx2 is still in init code.
This is fixed by using symbol_put() instead which does the exact
opposite of symbol_get().
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Knispel [Sat, 15 Aug 2009 17:30:24 +0000 (19:30 +0200)]
poll/select: initialize triggered field of struct poll_wqueues
The triggered field of struct poll_wqueues was introduced in commit
5f820f648c92a5ecc771a96b3c29aa6e90013bba ("poll: allow f_op->poll to
sleep").
It was first set to 1 in pollwake() (now __pollwake() ), tested and
later set to 0 in poll_schedule_timeout(), but not initialized before.
As a result when the process needs to sleep, triggered was likely to be
non-zero even if pollwake() is not called before the first
poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be
called and an extra loop calling all ->poll() would be done.
This patch initializes triggered to 0 in poll_initwait() so the ->poll()
are not called twice before the process goes to sleep when it needs to.
Signed-off-by: Guillaume Knispel <gknispel@proformatique.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
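Note on the poll/select commit above: the effect of the missing
initialization can be modelled with a small state machine. This is a
minimal sketch; the struct and function names are hypothetical and only
mirror the roles described in the commit message (init, wake, and the
test-and-clear done before sleeping).

```c
/* Hypothetical model of the wait-queue bookkeeping. */
struct example_wqueues {
    int triggered;
};

/* Models the fixed init path: start with no wakeup pending. */
void example_initwait(struct example_wqueues *pwq)
{
    pwq->triggered = 0;
}

/* Models a wakeup arriving from a polled file. */
void example_wake(struct example_wqueues *pwq)
{
    pwq->triggered = 1;
}

/* Models the test-and-clear before sleeping. Returns 1 if the caller
 * would actually sleep, 0 if it would skip sleeping and rescan all
 * files because a wakeup appears to be pending. */
int example_schedule(struct example_wqueues *pwq)
{
    int would_sleep = !pwq->triggered;

    pwq->triggered = 0;
    return would_sleep;
}
```

Without the assignment in example_initwait(), the first call to
example_schedule() would read whatever value the field happened to
hold, so a non-zero leftover would cause one needless rescan before
sleeping.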
Linus Walleij [Thu, 13 Aug 2009 20:57:22 +0000 (21:57 +0100)]
ARM: 5673/1: U300 fix initsection compile warning
The u300_init_check_chip() function was not properly tagged with
the __init macro and produced an init section mismatch on
compilation.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Sat, 15 Aug 2009 11:43:13 +0000 (12:43 +0100)]
Merge branch 's3c-fixes' of git://aeryn.fluff.org.uk/bjdooks/linux
Russell King [Sat, 15 Aug 2009 11:42:46 +0000 (12:42 +0100)]
Merge branch 'for-rmk-rc' of git://git.pengutronix.de/git/imx/linux-2.6
Russell King [Sat, 15 Aug 2009 11:36:00 +0000 (12:36 +0100)]
ARM: Fix broken highmem support
Currently, highmem is selectable, and you can request an increased
vmalloc area. However, none of this has any effect on the memory
layout since a patch in the highmem series was accidentally dropped.
Moreover, even if you did want highmem, all memory would still be
registered as lowmem, possibly resulting in overflow of the available
virtual mapping space.
The highmem boundary is determined by the highest allowed beginning
of the vmalloc area, which depends on its configurable minimum size
(see commit 60296c71f6c5063e3c1f1d2619ca0b60940162e7 for details on
this).
We should create mappings and initialize bootmem only for low memory,
while the zone allocator must still be told about highmem.
Currently, memory nodes which are completely located in high memory
are not supported. This is not a huge limitation since systems
relying on highmem support are unlikely to have discontiguous memory
with large holes.
[ A similar patch was meant to be merged before commit 5f0fbf9ecaf3
and be available in Linux v2.6.30, however some git rebase screw-up
of mine dropped the first commit of the series, and that goofage
escaped testing somehow as well. -- Nico ]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Nicolas Pitre <nico@marvell.com>
Cliff Wickman [Fri, 14 Aug 2009 18:56:37 +0000 (13:56 -0500)]
x86: Fix UV BAU destination subnode id
The SGI UV Broadcast Assist Unit is used to send TLB shootdown
messages to remote nodes of the system. The header of the
message must contain the subnode id of the block in the
receiving hub that handles such messages. It should always be
0x10, the id of the "LB" block.
It had previously been documented as a "must be zero" field.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <E1Mc1x7-0005Ce-6t@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>