platform/kernel/linux-stable.git
12 years agoARM: dts: imx53-ard: add regulators for lan9220
Shawn Guo [Thu, 2 Aug 2012 14:48:39 +0000 (22:48 +0800)]
ARM: dts: imx53-ard: add regulators for lan9220

commit 1eec0c569523782392b5e6245effddb626213b8c upstream.

Since commit c7e963f (net/smsc911x: Add regulator support), the lan9220
device tree probe fails on imx53-ard board, because the commit makes
VDD33A and VDDVARIO supplies mandatory for the driver.

Add a fixed dummy 3V3 supplying lan9220 to fix the regression.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: mxs: Remove MMAP_MIN_ADDR setting from mxs_defconfig
Marek Vasut [Fri, 3 Aug 2012 18:54:48 +0000 (20:54 +0200)]
ARM: mxs: Remove MMAP_MIN_ADDR setting from mxs_defconfig

commit 3bed491c8d28329e34f8a31e3fe64d03f3a350f1 upstream.

The CONFIG_DEFAULT_MMAP_MIN_ADDR was set to 65536 in mxs_defconfig,
this caused severe breakage of userland applications since the upper
limit for ARM is 32768. By default CONFIG_DEFAULT_MMAP_MIN_ADDR is
set to 4096 and can also be changed via /proc/sys/vm/mmap_min_addr
if needed.

Quoting Russell King [1]:

"4096 is also fine for ARM too. There's not much point in having
defconfigs change it - that would just be pure noise in the config
files."

the CONFIG_DEFAULT_MMAP_MIN_ADDR can be removed from the defconfig
altogether.

This problem was introduced by commit cde7c41 (ARM: configs: add
defconfig for mach-mxs).

[1] http://marc.info/?l=linux-arm-kernel&m=134401593807820&w=2

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Wolfgang Denk <wd@denx.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotarget: Check number of unmap descriptors against our limit
Roland Dreier [Mon, 16 Jul 2012 22:34:25 +0000 (15:34 -0700)]
target: Check number of unmap descriptors against our limit

commit 7409a6657aebf8be74c21d0eded80709b27275cb upstream.

Fail UNMAP commands that have more than our reported limit on unmap
descriptors.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotarget: Fix possible integer underflow in UNMAP emulation
Roland Dreier [Mon, 16 Jul 2012 22:34:24 +0000 (15:34 -0700)]
target: Fix possible integer underflow in UNMAP emulation

commit b7fc7f3777582dea85156a821d78a522a0c083aa upstream.

It's possible for an initiator to send us an UNMAP command with a
descriptor that is less than 8 bytes; in that case it's really bad for
us to set an unsigned int to that value, subtract 8 from it, and then
use that as a limit for our loop (since the value will wrap around to
a huge positive value).

Fix this by making size be signed and only looping if size >= 16 (ie
if we have at least a full descriptor available).

Also remove offset as an obfuscated name for the constant 8.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotarget: Fix reading of data length fields for UNMAP commands
Roland Dreier [Mon, 16 Jul 2012 22:34:23 +0000 (15:34 -0700)]
target: Fix reading of data length fields for UNMAP commands

commit 1a5fa4576ec8a462313c7516b31d7453481ddbe8 upstream.

The UNMAP DATA LENGTH and UNMAP BLOCK DESCRIPTOR DATA LENGTH fields
are in the unmap descriptor (the payload transferred to our data out
buffer), not in the CDB itself.  Read them from the correct place in
target_emulated_unmap.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotarget: Add range checking to UNMAP emulation
Roland Dreier [Mon, 16 Jul 2012 22:34:22 +0000 (15:34 -0700)]
target: Add range checking to UNMAP emulation

commit 2594e29865c291db162313187612cd9f14538f33 upstream.

When processing an UNMAP command, we need to make sure that the number
of blocks we're asked to UNMAP does not exceed our reported maximum
number of blocks per UNMAP, and that the range of blocks we're
unmapping doesn't go past the end of the device.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
Mel Gorman [Tue, 31 Jul 2012 23:46:20 +0000 (16:46 -0700)]
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables

commit d833352a4338dc31295ed832a30c9ccff5c7a183 upstream.

If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log.  The final teardown will trigger a
BUG_ON in mm/filemap.c.

This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37.  It was probably was introduced
in 2.6.20 by [39dde65c: shared page table for hugetlb page].  The messages
look like this;

[  ..........] Lots of bad pmd messages followed by this
[  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[  127.186778] ------------[ cut here ]------------
[  127.186781] kernel BUG at mm/filemap.c:134!
[  127.186782] invalid opcode: 0000 [#1] SMP
[  127.186783] CPU 7
[  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[  127.186801]
[  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
[  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[  127.186821] Stack:
[  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[  127.186827] Call Trace:
[  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
[  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
[  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
[  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
[  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
[  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
[  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
[  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
[  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186870]  RSP <ffff8804144b5c08>
[  127.186871] ---[ end trace 7cbac5d1db69f426 ]---

The bug is a race and not always easy to reproduce.  To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.

$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done

On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted.  The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
hard reset or a sysrq-b.

The basic problem is a race between page table sharing and teardown.  For
the most part page table sharing depends on i_mmap_mutex.  In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.

Unfortunately it appears to be also insufficient. Consider the following
situation

Process A Process B
--------- ---------
hugetlb_fault shmdt
   LockWrite(mmap_sem)
       do_munmap
    unmap_region
      unmap_vmas
        unmap_single_vma
          unmap_hugepage_range
                   Lock(i_mmap_mutex)
    Lock(mm->page_table_lock)
    huge_pmd_unshare/unmap tables <--- (1)
    Unlock(mm->page_table_lock)
                   Unlock(i_mmap_mutex)
  huge_pte_alloc       ...
    Lock(i_mmap_mutex)       ...
    vma_prio_walk, find svma, spte       ...
    Lock(mm->page_table_lock)       ...
    share spte       ...
    Unlock(mm->page_table_lock)       ...
    Unlock(i_mmap_mutex)       ...
  hugetlb_no_page   <--- (2)
      free_pgtables
        unlink_file_vma
hugetlb_free_pgd_range
    remove_vma_list

In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down.  The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables.  At (1) above,
the page tables are not shared yet so it unmaps the PMDs.  Process A sets
up page table sharing and at (2) faults a new entry.  Process B then trips
up on it in free_pgtables.

This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed.  This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
ineligible for sharing and avoids the race.  Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).

This should be treated as a -stable candidate if it is merged.

Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.

==== CUT HERE ====

static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;

unsigned int get_random(unsigned int max)
{
struct timeval tv;

gettimeofday(&tv, NULL);
srandom(tv.tv_usec);
return random() % max;
}

static void play(void *addr, size_t size)
{
unsigned char *start = addr,
      *end = start + size,
      *a;
start += get_random(size/2);

/* we could itterate on huge pages but let's give it more time. */
for (a = start; a < end; a += 4096)
*a = 0;
}

int main(int argc, char **argv)
{
key_t key = IPC_PRIVATE;
size_t sizeA = nr_huge_page_A * huge_page_size;
size_t sizeB = nr_huge_page_B * huge_page_size;
int shmidA, shmidB;
void *addrA = NULL, *addrB = NULL;
int nr_children = 300, n = 0;

if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}

if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}

if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}

fork_child:
switch(fork()) {
case 0:
switch (n%3) {
case 0:
play(addrA, sizeA);
break;
case 1:
play(addrB, sizeB);
break;
case 2:
break;
}
break;
case -1:
perror("fork:");
break;
default:
if (++n < nr_children)
goto fork_child;
play(addrA, sizeA);
break;
}
shmdt(addrA);
shmdt(addrB);
do {
wait(NULL);
} while (--n > 0);
shmctl(shmidA, IPC_RMID, NULL);
shmctl(shmidB, IPC_RMID, NULL);
return 0;
}

[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agox86, microcode: Sanitize per-cpu microcode reloading interface
Borislav Petkov [Thu, 21 Jun 2012 12:07:16 +0000 (14:07 +0200)]
x86, microcode: Sanitize per-cpu microcode reloading interface

commit c9fc3f778a6a215ace14ee556067c73982b6d40f upstream.

Microcode reloading in a per-core manner is a very bad idea for both
major x86 vendors. And the thing is, we have such interface with which
we can end up with different microcode versions applied on different
cores of an otherwise homogeneous wrt (family,model,stepping) system.

So turn off the possibility of doing that per core and allow it only
system-wide.

This is a minimal fix which we'd like to see in stable too thus the
more-or-less arbitrary decision to allow system-wide reloading only on
the BSP:

$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
...

and disable the interface on the other cores:

$ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
-bash: echo: write error: Invalid argument

Also, allowing the reload only from one CPU (the BSP in
that case) doesn't allow the reload procedure to degenerate
into an O(n^2) deal when triggering reloads from all
/sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
simultaneously.

A more generic fix will follow.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agox86, microcode: microcode_core.c simple_strtoul cleanup
Shuah Khan [Sun, 6 May 2012 17:11:04 +0000 (11:11 -0600)]
x86, microcode: microcode_core.c simple_strtoul cleanup

commit e826abd523913f63eb03b59746ffb16153c53dc4 upstream.

Change reload_for_cpu() in kernel/microcode_core.c to call kstrtoul()
instead of calling obsoleted simple_strtoul().

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1336324264.2897.9.camel@lorien2
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoHID: add ASUS AIO keyboard model AK1D
Cyrus Lien [Mon, 23 Jul 2012 09:11:51 +0000 (17:11 +0800)]
HID: add ASUS AIO keyboard model AK1D

commit 2d8767bb421574dfcf48e4be0751ce7d8f73d5d7 upstream.

Add Asus All-In-One PC keyboard model AK1D.

BugLink: https://bugs.launchpad.net/bugs/1027789
Signed-off-by: Cyrus Lien <cyrus.lien@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoHID: add support for Cypress barcode scanner 04B4:ED81
Lionel Vaux [Sun, 22 Jul 2012 09:32:20 +0000 (11:32 +0200)]
HID: add support for Cypress barcode scanner 04B4:ED81

commit 76c9d8fe2c7fc34ffc387d8022c5828d6ff9df48 upstream.

Add yet another device to the list of Cypress barcode scanners
needing the CP_RDESC_SWAPPED_MIN_MAX quirk.

Signed-off-by: Lionel Vaux (iouri) <lionel.vaux@free.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoHID: multitouch: add support for Novatek touchscreen
Austin Hendrix [Mon, 4 Jun 2012 22:27:51 +0000 (15:27 -0700)]
HID: multitouch: add support for Novatek touchscreen

commit 4db703ead4535792ea54dba7275fdd1527848e74 upstream.

Add support for a Novatek touchscreen panel as a generic HID multitouch
panel.

Signed-off-by: Austin Hendrix <ahendrix@willowgarage.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: mix in architectural randomness in extract_buf()
H. Peter Anvin [Sat, 28 Jul 2012 02:26:08 +0000 (22:26 -0400)]
random: mix in architectural randomness in extract_buf()

commit d2e7c96af1e54b507ae2a6a7dd2baf588417a7e5 upstream.

Mix in any architectural randomness in extract_buf() instead of
xfer_secondary_buf().  This allows us to mix in more architectural
randomness, and it also makes xfer_secondary_buf() faster, moving a
tiny bit of additional CPU overhead to process which is extracting the
randomness.

[ Commit description modified by tytso to remove an extended
  advertisement for the RDRAND instruction. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: DJ Johnston <dj.johnston@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodmi: Feed DMI table to /dev/random driver
Tony Luck [Fri, 20 Jul 2012 20:15:20 +0000 (13:15 -0700)]
dmi: Feed DMI table to /dev/random driver

commit d114a33387472555188f142ed8e98acdb8181c6d upstream.

Send the entire DMI (SMBIOS) table to the /dev/random driver to
help seed its pools.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: Add comment to random_initialize()
Tony Luck [Mon, 23 Jul 2012 16:47:57 +0000 (09:47 -0700)]
random: Add comment to random_initialize()

commit cbc96b7594b5691d61eba2db8b2ea723645be9ca upstream.

Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.

However, this limits our options for internal structure of
the random driver since random_initialize() is not called
until long after setup_arch().

Add a big fat comment to rand_initialize() spelling out
this requirement.

Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: remove rand_initialize_irq()
Theodore Ts'o [Sun, 15 Jul 2012 00:27:52 +0000 (20:27 -0400)]
random: remove rand_initialize_irq()

commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream.

With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.

[ Merged in fixes from Sedat to find some last missing references to
  rand_initialize_irq() ]

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomfd: wm831x: Feed the device UUID into device_add_randomness()
Mark Brown [Thu, 5 Jul 2012 20:23:21 +0000 (20:23 +0000)]
mfd: wm831x: Feed the device UUID into device_add_randomness()

commit 27130f0cc3ab97560384da437e4621fc4e94f21c upstream.

wm831x devices contain a unique ID value. Feed this into the newly added
device_add_randomness() to add some per device seed data to the pool.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agortc: wm831x: Feed the write counter into device_add_randomness()
Mark Brown [Thu, 5 Jul 2012 20:19:17 +0000 (20:19 +0000)]
rtc: wm831x: Feed the write counter into device_add_randomness()

commit 9dccf55f4cb011a7552a8a2749a580662f5ed8ed upstream.

The tamper evident features of the RTC include the "write counter" which
is a pseudo-random number regenerated whenever we set the RTC. Since this
value is unpredictable it should provide some useful seeding to the random
number generator.

Only do this on boot since the goal is to seed the pool rather than add
useful entropy.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoMAINTAINERS: Theodore Ts'o is taking over the random driver
Theodore Ts'o [Wed, 4 Jul 2012 15:32:48 +0000 (11:32 -0400)]
MAINTAINERS: Theodore Ts'o is taking over the random driver

commit 330e0a01d54c2b8606c56816f99af6ebc58ec92c upstream.

Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.

Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: add tracepoints for easier debugging and verification
Theodore Ts'o [Wed, 4 Jul 2012 20:19:30 +0000 (16:19 -0400)]
random: add tracepoints for easier debugging and verification

commit 00ce1db1a634746040ace24c09a4e3a7949a3145 upstream.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: add new get_random_bytes_arch() function
Theodore Ts'o [Thu, 5 Jul 2012 14:35:23 +0000 (10:35 -0400)]
random: add new get_random_bytes_arch() function

commit c2557a303ab6712bb6e09447df828c557c710ac9 upstream.

Create a new function, get_random_bytes_arch() which will use the
architecture-specific hardware random number generator if it is
present.  Change get_random_bytes() to not use the HW RNG, even if it
is avaiable.

The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door.  (For example, an
increasing counter encrypted by an AES key known to the NSA.)

It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
  --- especially since Bull Mountain is documented to use AES as a
whitener.  Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel.  Short of using a tunnelling
electronic microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.

Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.

This way we get almost of the benefits of the HW RNG without any
potential liabilities.  The only benefits we forgo is the
speed/performance enhancements --- and generic kernel code can't
depend on depend on get_random_bytes() having the speed of a HW RNG
anyway.

For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: use the arch-specific rng in xfer_secondary_pool
Theodore Ts'o [Thu, 5 Jul 2012 14:21:01 +0000 (10:21 -0400)]
random: use the arch-specific rng in xfer_secondary_pool

commit e6d4947b12e8ad947add1032dd754803c6004824 upstream.

If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.

Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet: feed /dev/random with the MAC address when registering a device
Theodore Ts'o [Thu, 5 Jul 2012 01:23:25 +0000 (21:23 -0400)]
net: feed /dev/random with the MAC address when registering a device

commit 7bf2357524408b97fec58344caf7397f8140c3fd upstream.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agousb: feed USB device information to the /dev/random driver
Theodore Ts'o [Wed, 4 Jul 2012 15:22:20 +0000 (11:22 -0400)]
usb: feed USB device information to the /dev/random driver

commit b04b3156a20d395a7faa8eed98698d1e17a36000 upstream.

Send the USB device's serial, product, and manufacturer strings to the
/dev/random driver to help seed its pools.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: create add_device_randomness() interface
Linus Torvalds [Wed, 4 Jul 2012 15:16:01 +0000 (11:16 -0400)]
random: create add_device_randomness() interface

commit a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream.

Add a new interface, add_device_randomness() for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot).  This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).

[ Modified by tytso to mix in a timestamp, since there may be some
  variability caused by the time needed to detect/configure the hardware
  in question. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: use lockless techniques in the interrupt path
Theodore Ts'o [Wed, 4 Jul 2012 14:38:30 +0000 (10:38 -0400)]
random: use lockless techniques in the interrupt path

commit 902c098a3663de3fa18639efbb71b6080f0bcd3c upstream.

The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
This also allows us to reduce the overhead in the fast path, for the
random driver, which is the interrupt collection path.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: make 'add_interrupt_randomness()' do something sane
Theodore Ts'o [Mon, 2 Jul 2012 11:52:16 +0000 (07:52 -0400)]
random: make 'add_interrupt_randomness()' do something sane

commit 775f4b297b780601e61787b766f306ed3e1d23eb upstream.

We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.

This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whicever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool.  Also, we make sure the the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool.  This
assures that /dev/urandom is returning unpredictable data as soon as
possible.

(Based on an original patch by Linus, but significantly modified by
tytso.)

Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agox86, nops: Missing break resulting in incorrect selection on Intel
Alan Cox [Wed, 25 Jul 2012 15:28:19 +0000 (16:28 +0100)]
x86, nops: Missing break resulting in incorrect selection on Intel

commit d6250a3f12edb3a86db9598ffeca3de8b4a219e9 upstream.

The Intel case falls through into the generic case which then changes
the values.  For cases like the P6 it doesn't do the right thing so
this seems to be a screwup.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-lww2uirad4skzjlmrm0vru8o@git.kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agowireless: reg: restore previous behaviour of chan->max_power calculations
Stanislaw Gruszka [Tue, 24 Jul 2012 06:35:39 +0000 (08:35 +0200)]
wireless: reg: restore previous behaviour of chan->max_power calculations

commit 5e31fc0815a4e2c72b1b495fe7a0d8f9bfb9e4b4 upstream.

commit eccc068e8e84c8fe997115629925e0422a98e4de
Author: Hong Wu <Hong.Wu@dspg.com>
Date:   Wed Jan 11 20:33:39 2012 +0200

    wireless: Save original maximum regulatory transmission power for the calucation of the local maximum transmit pow

changed the way we calculate chan->max_power as min(chan->max_power,
chan->max_reg_power). That broke rt2x00 (and perhaps some other
drivers) that do not set chan->max_power. It is not so easy to fix this
problem correctly in rt2x00.

According to commit eccc068e8 changelog, change claim only to save
maximum regulatory power - changing setting of chan->max_power was side
effect. This patch restore previous calculations of chan->max_power and
do not touch chan->max_reg_power.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoath9k: Add PID/VID support for AR1111
Mohammed Shafi Shajakhan [Thu, 2 Aug 2012 06:28:50 +0000 (11:58 +0530)]
ath9k: Add PID/VID support for AR1111

commit d4e5979c0da95791aa717c18e162540c7a596360 upstream.

AR1111 is same as AR9485. The h/w
difference between them is quite insignificant,
Felix suggests only very few baseband features
may not be available in AR1111. The h/w code for
AR9485 is already present, so AR1111 should
work fine with the addition of its PID/VID.

Reported-by: Tim Bentley <Tim.Bentley@Gmail.com>
Cc: Felix Bitterli <felixb@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Tested-by: Tim Bentley <Tim.Bentley@Gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomac80211: cancel mesh path timer
Johannes Berg [Wed, 1 Aug 2012 19:03:21 +0000 (21:03 +0200)]
mac80211: cancel mesh path timer

commit dd4c9260e7f23f2e951cbfb2726e468c6d30306c upstream.

The mesh path timer needs to be canceled when
leaving the mesh as otherwise it could fire
after the interface has been removed already.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoACPI processor: Fix tick_broadcast_mask online/offline regression
Feng Tang [Tue, 31 Jul 2012 04:44:43 +0000 (12:44 +0800)]
ACPI processor: Fix tick_broadcast_mask online/offline regression

commit b7db60f45d74497c723dc7ae1370cf0b37dfb0d8 upstream.

In commit 99b725084 "ACPI processor hotplug: Delay acpi_processor_start()
call for hotplugged cores", acpi_processor_hotplug(pr) was wrongly replaced
by acpi_processor_cst_has_changed() inside the acpi_cpu_soft_notify(). This
patch will restore it back, fixing the tick_broadcast_mask regression:
https://lkml.org/lkml/2012/7/30/169

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoore: Fix out-of-bounds access in _ios_obj()
Boaz Harrosh [Wed, 1 Aug 2012 14:48:36 +0000 (17:48 +0300)]
ore: Fix out-of-bounds access in _ios_obj()

commit 9e62bb4458ad2cf28bd701aa5fab380b846db326 upstream.

_ios_obj() is accessed by group_index not device_table index.

The oc->comps array is only a group_full of devices at a time
it is not like ore_comp_dev() which is indexed by a global
device_table index.

This did not BUG until now because exofs only uses a single
COMP for all devices. But with other FSs like PanFS this is
not true.

This bug was only in the write_path, all other users were
using it correctly

[This is a bug since 3.2 Kernel]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agovideo/smscufx: fix line counting in fb_write
Alexander Holler [Fri, 20 Apr 2012 22:11:07 +0000 (00:11 +0200)]
video/smscufx: fix line counting in fb_write

commit 2fe2d9f47cfe1a3e66e7d087368b3d7155b04c15 upstream.

Line 0 and 1 were both written to line 0 (on the display) and all subsequent
lines had an offset of -1. The result was that the last line on the display
was never overwritten by writes to /dev/fbN.

The origin of this bug seems to have been udlfb.

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomd/raid1: don't abort a resync on the first badblock.
NeilBrown [Tue, 31 Jul 2012 00:05:34 +0000 (10:05 +1000)]
md/raid1: don't abort a resync on the first badblock.

commit b7219ccb33aa0df9949a60c68b5e9f712615e56f upstream.

If a resync of a RAID1 array with 2 devices finds a known bad block
one device it will neither read from, or write to, that device for
this block offset.
So there will be one read_target (The other device) and zero write
targets.
This condition causes md/raid1 to abort the resync assuming that it
has finished - without known bad blocks this would be true.

When there are no write targets because of the presence of bad blocks
we should only skip over the area covered by the bad block.
RAID10 already gets this right, raid1 doesn't.  Or didn't.

As this can cause a 'sync' to abort early and appear to have succeeded
it could lead to some data corruption, so it suitable for -stable.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomm: mmu_notifier: fix freed page still mapped in secondary MMU
Xiao Guangrong [Tue, 31 Jul 2012 23:45:52 +0000 (16:45 -0700)]
mm: mmu_notifier: fix freed page still mapped in secondary MMU

commit 3ad3d901bbcfb15a5e4690e55350db0899095a68 upstream.

mmu_notifier_release() is called when the process is exiting.  It will
delete all the mmu notifiers.  But at this time the page belonging to the
process is still present in page tables and is present on the LRU list, so
this race will happen:

      CPU 0                 CPU 1
mmu_notifier_release:    try_to_unmap:
   hlist_del_init_rcu(&mn->hlist);
                            ptep_clear_flush_notify:
                                  mmu nofifler not found
                            free page  !!!!!!
                            /*
                             * At the point, the page has been
                             * freed, but it is still mapped in
                             * the secondary MMU.
                             */

  mn->ops->release(mn, mm);

Then the box is not stable and sometimes we can get this bug:

[  738.075923] BUG: Bad page state in process migrate-perf  pfn:03bec
[  738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping:          (null) index:0x8076
[  738.075936] page flags: 0x20000000000014(referenced|dirty)

The same issue is present in mmu_notifier_unregister().

We can call ->release before deleting the notifier to ensure the page has
been unmapped from the secondary MMU before it is freed.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix double quirk for Quanta FL1 / Lenovo Ideapad
David Henningsson [Wed, 8 Aug 2012 06:43:37 +0000 (08:43 +0200)]
ALSA: hda - Fix double quirk for Quanta FL1 / Lenovo Ideapad

commit 012e7eb1e501d0120e0383b81477f63091f5e365 upstream.

The same ID is twice in the quirk table, so the second one is not used.

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - remove quirk for Dell Vostro 1015
David Henningsson [Tue, 7 Aug 2012 12:03:29 +0000 (14:03 +0200)]
ALSA: hda - remove quirk for Dell Vostro 1015

commit e9fc83cb2e5877801a255a37ddbc5be996ea8046 upstream.

This computer is confirmed working with model=auto on kernel 3.2.
Also, parsing fails with hda-emu with the current model.

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - add dock support for Thinkpad X230
Felix Kaechele [Mon, 6 Aug 2012 21:02:01 +0000 (23:02 +0200)]
ALSA: hda - add dock support for Thinkpad X230

commit c8415a48fcb7a29889f4405d38c57db351e4b50a upstream.

As with the ThinkPad Models X230 Tablet and T530 the X230 needs a qurik to
correctly set up the pins for the dock port.

Signed-off-by: Felix Kaechele <felix@fetzig.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - add dock support for Thinkpad T430s
Philipp A. Mohrenweiser [Mon, 6 Aug 2012 11:14:18 +0000 (13:14 +0200)]
ALSA: hda - add dock support for Thinkpad T430s

commit 4407be6ba217514b1bc01488f8b56467d309e416 upstream.

Add a model/fixup string "lenovo-dock", for Thinkpad T430s, to allow
sound in docking station.

Tested on Lenovo T430s with ThinkPad Mini Dock Plus Series 3

Signed-off-by: Philipp A. Mohrenweiser <phiamo@googlemail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: Fix undefined instruction exception handling
Russell King [Mon, 30 Jul 2012 18:42:10 +0000 (19:42 +0100)]
ARM: Fix undefined instruction exception handling

commit 15ac49b65024f55c4371a53214879a9c77c4fbf9 upstream.

While trying to get a v3.5 kernel booted on the cubox, I noticed that
VFP does not work correctly with VFP bounce handling.  This is because
of the confusion over 16-bit vs 32-bit instructions, and where PC is
supposed to point to.

The rule is that FP handlers are entered with regs->ARM_pc pointing at
the _next_ instruction to be executed.  However, if the exception is
not handled, regs->ARM_pc points at the faulting instruction.

This is easy for ARM mode, because we know that the next instruction and
previous instructions are separated by four bytes.  This is not true of
Thumb2 though.

Since all FP instructions are 32-bit in Thumb2, it makes things easy.
We just need to select the appropriate adjustment.  Do this by moving
the adjustment out of do_undefinstr() into the assembly code, as only
the assembly code knows whether it's dealing with a 32-bit or 16-bit
instruction.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7480/1: only call smp_send_stop() on SMP
Javier Martinez Canillas [Sat, 28 Jul 2012 14:19:55 +0000 (15:19 +0100)]
ARM: 7480/1: only call smp_send_stop() on SMP

commit c5dff4ffd327088d85035bec535b7d0c9ea03151 upstream.

On reboot or poweroff (machine_shutdown()) a call to smp_send_stop() is
made (to stop the others CPU's) when CONFIG_SMP=y.

arch/arm/kernel/process.c:

void machine_shutdown(void)
{
 #ifdef CONFIG_SMP
       smp_send_stop();
 #endif
}

smp_send_stop() calls the function pointer smp_cross_call(), which is set
on the smp_init_cpus() function for OMAP processors.

arch/arm/mach-omap2/omap-smp.c:

void __init smp_init_cpus(void)
{
...
set_smp_cross_call(gic_raise_softirq);
...
}

But the ARM setup_arch() function only calls smp_init_cpus()
if CONFIG_SMP=y && is_smp().

arm/kernel/setup.c:

void __init setup_arch(char **cmdline_p)
{
...
 #ifdef CONFIG_SMP
if (is_smp())
smp_init_cpus();
 #endif
...
}

Newer OMAP CPU's are SMP machines so omap2plus_defconfig sets
CONFIG_SMP=y. Unfortunately on an OMAP UP machine is_smp()
returns false and smp_init_cpus() is never called and the
smp_cross_call() function remains NULL.

If the machine is rebooted or powered off, smp_send_stop() will
be called (since CONFIG_SMP=y) leading to the following error:

[   42.815551] Restarting system.
[   42.819030] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   42.827667] pgd = d7a74000
[   42.830566] [00000000] *pgd=96ce7831, *pte=00000000, *ppte=00000000
[   42.837249] Internal error: Oops: 80000007 [#1] SMP ARM
[   42.842773] Modules linked in:
[   42.846008] CPU: 0    Not tainted  (3.5.0-rc3-next-20120622-00002-g62e87ba-dirty #44)
[   42.854278] PC is at 0x0
[   42.856994] LR is at smp_send_stop+0x4c/0xe4
[   42.861511] pc : [<00000000>]    lr : [<c00183a4>]    psr: 60000013
[   42.861511] sp : d6c85e70  ip : 00000000  fp : 00000000
[   42.873626] r10: 00000000  r9 : d6c84000  r8 : 00000002
[   42.879150] r7 : c07235a0  r6 : c06dd2d0  r5 : 000f4241  r4 : d6c85e74
[   42.886047] r3 : 00000000  r2 : 00000000  r1 : 00000006  r0 : d6c85e74
[   42.892944] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   42.900482] Control: 10c5387d  Table: 97a74019  DAC: 00000015
[   42.906555] Process reboot (pid: 1166, stack limit = 0xd6c842f8)
[   42.912902] Stack: (0xd6c85e70 to 0xd6c86000)
[   42.917510] 5e60:                                     c07235a0 00000000 00000000 d6c84000
[   42.926177] 5e80: 01234567 c00143d0 4321fedc c00511bc d6c85ebc 00000168 00000460 00000000
[   42.934814] 5ea0: c1017950 a0000013 c1017900 d8014390 d7ec3858 c0498e48 c1017950 00000000
[   42.943481] 5ec0: d6ddde10 d6c85f78 00000003 00000000 d6ddde10 d6c84000 00000000 00000000
[   42.952117] 5ee0: 00000002 00000000 00000000 c0088c88 00000002 00000000 00000000 c00f4b90
[   42.960784] 5f00: 00000000 d6c85ebc d8014390 d7e311c8 60000013 00000103 00000002 d6c84000
[   42.969421] 5f20: c00f3274 d6e00a00 00000001 60000013 d6c84000 00000000 00000000 c00895d4
[   42.978057] 5f40: 00000002 d8007c80 d781f000 c00f6150 d8010cc0 c00f3274 d781f000 d6c84000
[   42.986694] 5f60: c0013020 d6e00a00 00000001 20000010 0001257c ef000000 00000000 c00895d4
[   42.995361] 5f80: 00000002 00000001 00000003 00000000 00000001 00000003 00000000 00000058
[   43.003997] 5fa0: c00130c8 c0012f00 00000001 00000003 fee1dead 28121969 01234567 00000002
[   43.012634] 5fc0: 00000001 00000003 00000000 00000058 00012584 0001257c 00000001 00000000
[   43.021270] 5fe0: 000124bc bec5cc6c 00008f9c 4a2f7c40 20000010 fee1dead 00000000 00000000
[   43.029968] [<c00183a4>] (smp_send_stop+0x4c/0xe4) from [<c00143d0>] (machine_restart+0xc/0x4c)
[   43.039154] [<c00143d0>] (machine_restart+0xc/0x4c) from [<c00511bc>] (sys_reboot+0x144/0x1f0)
[   43.048278] [<c00511bc>] (sys_reboot+0x144/0x1f0) from [<c0012f00>] (ret_fast_syscall+0x0/0x3c)
[   43.057464] Code: bad PC value
[   43.060760] ---[ end trace c3988d1dd0b8f0fb ]---

Add a check so smp_cross_call() is only called when there is more than one CPU on-line.

Signed-off-by: Javier Martinez Canillas <javier at dowhile0.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches
Will Deacon [Mon, 23 Jul 2012 13:18:13 +0000 (14:18 +0100)]
ARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches

commit b74253f78400f9a4b42da84bb1de7540b88ce7c4 upstream.

The vivt_flush_cache_{range,page} functions check that the mm_struct
of the VMA being flushed has been active on the current CPU before
performing the cache maintenance.

The gate_vma has a NULL mm_struct pointer and, as such, will cause a
kernel fault if we try to flush it with the above operations. This
happens during ELF core dumps, which include the gate_vma as it may be
useful for debugging purposes.

This patch adds checks to the VIVT cache flushing functions so that VMAs
with a NULL mm_struct are flushed unconditionally (the vectors page may
be dirty if we use it to store the current TLS pointer).

Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Tested-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7478/1: errata: extend workaround for erratum #720789
Will Deacon [Fri, 20 Jul 2012 17:24:55 +0000 (18:24 +0100)]
ARM: 7478/1: errata: extend workaround for erratum #720789

commit 5a783cbc48367cfc7b65afc75430953dfe60098f upstream.

Commit cdf357f1 ("ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS
operations can broadcast a faulty ASID") replaced by-ASID TLB flushing
operations with all-ASID variants to workaround A9 erratum #720789.

This patch extends the workaround to include the tlb_range operations,
which were overlooked by the original patch.

Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP
Colin Cross [Fri, 20 Jul 2012 01:03:42 +0000 (02:03 +0100)]
ARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP

commit 24b35521b8ddf088531258f06f681bb7b227bf47 upstream.

vfp_pm_suspend should save the VFP state in suspend after
any lazy context switch.  If it only saves when the VFP is enabled,
the state can get lost when, on a UP system:
  Thread 1 uses the VFP
  Context switch occurs to thread 2, VFP is disabled but the
     VFP context is not saved
  Thread 2 initiates suspend
  vfp_pm_suspend is called with the VFP disabled, and the unsaved
     VFP context of Thread 1 in the registers

Modify vfp_pm_suspend to save the VFP context whenever
vfp_current_hw_state is not NULL.

Includes a fix from Ido Yariv <ido@wizery.com>, who pointed out that on
SMP systems, the state pointer can be pointing to a freed task struct if
a task exited on another cpu, fixed by using #ifndef CONFIG_SMP in the
new if clause.

Signed-off-by: Colin Cross <ccross@android.com>
Cc: Barry Song <bs14@csr.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ido Yariv <ido@wizery.com>
Cc: Daniel Drake <dsd@laptop.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend
Colin Cross [Fri, 20 Jul 2012 01:03:43 +0000 (02:03 +0100)]
ARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend

commit a84b895a2348f0dbff31b71ddf954f70a6cde368 upstream.

vfp_pm_suspend runs on each cpu, only clear the hardware state
pointer for the current cpu.  Prevents a possible crash if one
cpu clears the hw state pointer when another cpu has already
checked if it is valid.

Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7466/1: disable interrupt before spinning endlessly
Shawn Guo [Fri, 13 Jul 2012 07:19:34 +0000 (08:19 +0100)]
ARM: 7466/1: disable interrupt before spinning endlessly

commit 98bd8b96b26db3399a48202318dca4aaa2515355 upstream.

The CPU will endlessly spin at the end of machine_halt and
machine_restart calls.  However, this will lead to a soft lockup
warning after about 20 seconds, if CONFIG_LOCKUP_DETECTOR is enabled,
as system timer is still alive.

Disable interrupt before going to spin endlessly, so that the lockup
warning will never be seen.

Reported-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomm: fix wrong argument of migrate_huge_pages() in soft_offline_huge_page()
Joonsoo Kim [Mon, 30 Jul 2012 21:39:04 +0000 (14:39 -0700)]
mm: fix wrong argument of migrate_huge_pages() in soft_offline_huge_page()

commit dc32f63453f56d07a1073a697dcd843dd3098c09 upstream.

Commit a6bc32b89922 ("mm: compaction: introduce sync-light migration for
use by compaction") changed the declaration of migrate_pages() and
migrate_huge_pages().

But it missed changing the argument of migrate_huge_pages() in
soft_offline_huge_page().  In this case, we should call
migrate_huge_pages() with MIGRATE_SYNC.

Additionally, there is a mismatch between type the of argument and the
function declaration for migrate_pages().

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agopcdp: use early_ioremap/early_iounmap to access pcdp table
Greg Pearson [Mon, 30 Jul 2012 21:39:05 +0000 (14:39 -0700)]
pcdp: use early_ioremap/early_iounmap to access pcdp table

commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream.

efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
EFI system table and setup an early console for printk output.  The
routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
information.

The call to ioremap is happening early in the boot process which leads
to a panic on x86_64 systems:

    panic+0x01ca
    do_exit+0x043c
    oops_end+0x00a7
    no_context+0x0119
    __bad_area_nosemaphore+0x0138
    bad_area_nosemaphore+0x000e
    do_page_fault+0x0321
    page_fault+0x0020
    reserve_memtype+0x02a1
    __ioremap_caller+0x0123
    ioremap_nocache+0x0012
    efi_setup_pcdp_console+0x002b
    setup_arch+0x03a9
    start_kernel+0x00d4
    x86_64_start_reservations+0x012c
    x86_64_start_kernel+0x00fe

This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
with calls to early_ioremap/early_iounmap which can be called during
early boot.

This patch was tested on an x86_64 prototype system which uses the
HCDP/PCDP table for early console setup.

Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Khalid Aziz <khalid.aziz@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomedia: ene_ir: Fix driver initialisation
Luis Henriques [Tue, 19 Jun 2012 14:29:49 +0000 (11:29 -0300)]
media: ene_ir: Fix driver initialisation

commit b31b021988fed9e3741a46918f14ba9b063811db upstream.

commit 9ef449c6b31bb6a8e6dedc24de475a3b8c79be20 ("[media] rc: Postpone ISR
registration") fixed an early ISR registration on several drivers.  It did
however also introduced a bug by moving the invocation of pnp_port_start()
to the end of the probe function.

This patch fixes this issue by moving the invocation of pnp_port_start() to
an earlier stage in the probe function.

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonilfs2: fix deadlock issue between chcp and thaw ioctls
Ryusuke Konishi [Mon, 30 Jul 2012 21:42:07 +0000 (14:42 -0700)]
nilfs2: fix deadlock issue between chcp and thaw ioctls

commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream.

An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command:

 chcp            D ffff88013870f3d0     0  1325   1324 0x00000004
 ...
 Call Trace:
   nilfs_transaction_begin+0x11c/0x1a0 [nilfs2]
   wake_up_bit+0x20/0x20
   copy_from_user+0x18/0x30 [nilfs2]
   nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2]
   nilfs_ioctl+0x252/0x61a [nilfs2]
   do_page_fault+0x311/0x34c
   get_unmapped_area+0x132/0x14e
   do_vfs_ioctl+0x44b/0x490
   __set_task_blocked+0x5a/0x61
   vm_mmap_pgoff+0x76/0x87
   __set_current_blocked+0x30/0x4a
   sys_ioctl+0x4b/0x6f
   system_call_fastpath+0x16/0x1b
 thaw            D ffff88013870d890     0  1352   1351 0x00000004
 ...
 Call Trace:
   rwsem_down_failed_common+0xdb/0x10f
   call_rwsem_down_write_failed+0x13/0x20
   down_write+0x25/0x27
   thaw_super+0x13/0x9e
   do_vfs_ioctl+0x1f5/0x490
   vm_mmap_pgoff+0x76/0x87
   sys_ioctl+0x4b/0x6f
   filp_close+0x64/0x6c
   system_call_fastpath+0x16/0x1b

where the thaw ioctl deadlocked at thaw_super() when called while chcp was
waiting at nilfs_transaction_begin() called from
nilfs_ioctl_change_cpmode().  This deadlock is 100% reproducible.

This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in
read mode and then waits for unfreezing in nilfs_transaction_begin(),
whereas thaw_super() locks sb->s_umount in write mode.  The locking of
sb->s_umount here was intended to make snapshot mounts and the downgrade
of snapshots to checkpoints exclusive.

This fixes the deadlock issue by replacing the sb->s_umount usage in
nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot
mounts.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoSUNRPC: return negative value in case rpcbind client creation error
Stanislav Kinsbursky [Fri, 20 Jul 2012 11:57:48 +0000 (15:57 +0400)]
SUNRPC: return negative value in case rpcbind client creation error

commit caea33da898e4e14f0ba58173e3b7689981d2c0b upstream.

Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agosunrpc: clnt: Add missing braces
Joe Perches [Wed, 18 Jul 2012 18:17:11 +0000 (11:17 -0700)]
sunrpc: clnt: Add missing braces

commit cac5d07e3ca696dcacfb123553cf6c722111cfd3 upstream.

Add a missing set of braces that commit 4e0038b6b24
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoasus-wmi: use ASUS_WMI_METHODID_DSTS2 as default DSTS ID.
Alex Hung [Wed, 20 Jun 2012 03:47:35 +0000 (11:47 +0800)]
asus-wmi: use ASUS_WMI_METHODID_DSTS2 as default DSTS ID.

commit 63a78bb1051b240417daad3a3fa9c1bb10646dca upstream.

According to responses from the BIOS team, ASUS_WMI_METHODID_DSTS2
(0x53545344) will be used as future DSTS ID. In addition, calling
asus_wmi_evaluate_method(ASUS_WMI_METHODID_DSTS2, 0, 0, NULL) returns
ASUS_WMI_UNSUPPORTED_METHOD in new ASUS laptop PCs. This patch fixes
no DSTS ID will be assigned in this case.

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoRedefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts
Tony Luck [Thu, 26 Jul 2012 17:55:26 +0000 (10:55 -0700)]
Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts

commit a119365586b0130dfea06457f584953e0ff6481d upstream.

The following build error occured during a ia64 build with
swap-over-NFS patches applied.

net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant

This is identical to a parisc build error. Fengguang Wu, Mel Gorman
and James Bottomley did all the legwork to track the root cause of
the problem. This fix and entire commit log is shamelessly copied
from them with one extra detail to change a dubious runtime use of
ATOMIC_INIT() to atomic_set() in drivers/char/mspec.c

Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });

The above line contains two compound literals.  It also uses a designated
initializer to initialize the field enabled.  A compound literal is not a
constant expression.

The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoLinux 3.4.8 v3.4.8
Greg Kroah-Hartman [Thu, 9 Aug 2012 15:32:11 +0000 (08:32 -0700)]
Linux 3.4.8

12 years agofutex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
Darren Hart [Fri, 20 Jul 2012 18:53:31 +0000 (11:53 -0700)]
futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()

commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agofutex: Fix bug in WARN_ON for NULL q.pi_state
Darren Hart [Fri, 20 Jul 2012 18:53:30 +0000 (11:53 -0700)]
futex: Fix bug in WARN_ON for NULL q.pi_state

commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream.

The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing
the address (&q.pi_state) of the pointer instead of the value
(q.pi_state) of the pointer. Correct it accordingly.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agofutex: Test for pi_mutex on fault in futex_wait_requeue_pi()
Darren Hart [Fri, 20 Jul 2012 18:53:29 +0000 (11:53 -0700)]
futex: Test for pi_mutex on fault in futex_wait_requeue_pi()

commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream.

If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test
for pi_mutex != NULL before testing the owner against current
and possibly unlocking it.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agom68k: Correct the Atari ALLOWINT definition
Mikael Pettersson [Wed, 18 Apr 2012 22:53:36 +0000 (00:53 +0200)]
m68k: Correct the Atari ALLOWINT definition

commit c663600584a596b5e66258cc10716fb781a5c2c9 upstream.

Booting a 3.2, 3.3, or 3.4-rc4 kernel on an Atari using the
`nfeth' ethernet device triggers a WARN_ONCE() in generic irq
handling code on the first irq for that device:

WARNING: at kernel/irq/handle.c:146 handle_irq_event_percpu+0x134/0x142()
irq 3 handler nfeth_interrupt+0x0/0x194 enabled interrupts
Modules linked in:
Call Trace: [<000299b2>] warn_slowpath_common+0x48/0x6a
 [<000299c0>] warn_slowpath_common+0x56/0x6a
 [<00029a4c>] warn_slowpath_fmt+0x2a/0x32
 [<0005b34c>] handle_irq_event_percpu+0x134/0x142
 [<0005b34c>] handle_irq_event_percpu+0x134/0x142
 [<0000a584>] nfeth_interrupt+0x0/0x194
 [<001ba0a8>] schedule_preempt_disabled+0x0/0xc
 [<0005b37a>] handle_irq_event+0x20/0x2c
 [<0005add4>] generic_handle_irq+0x2c/0x3a
 [<00002ab6>] do_IRQ+0x20/0x32
 [<0000289e>] auto_irqhandler_fixup+0x4/0x6
 [<00003144>] cpu_idle+0x22/0x2e
 [<001b8a78>] printk+0x0/0x18
 [<0024d112>] start_kernel+0x37a/0x386
 [<0003021d>] __do_proc_dointvec+0xb1/0x366
 [<0003021d>] __do_proc_dointvec+0xb1/0x366
 [<0024c31e>] _sinittext+0x31e/0x9c0

After invoking the irq's handler the kernel sees !irqs_disabled()
and concludes that the handler erroneously enabled interrupts.

However, debugging shows that !irqs_disabled() is true even before
the handler is invoked, which indicates a problem in the platform
code rather than the specific driver.

The warning does not occur in 3.1 or older kernels.

It turns out that the ALLOWINT definition for Atari is incorrect.

The Atari definition of ALLOWINT is ~0x400, the stated purpose of
that is to avoid taking HSYNC interrupts.  irqs_disabled() returns
true if the 3-bit ipl & 4 is non-zero.  The nfeth interrupt runs at
ipl 3 (it's autovector 3), but 3 & 4 is zero so irqs_disabled() is
false, and the warning above is generated.

When interrupts are explicitly disabled, ipl is set to 7.  When they
are enabled, ipl is masked with ALLOWINT.  On Atari this will result
in ipl = 3, which blocks interrupts at ipl 3 and below.  So how come
nfeth interrupts at ipl 3 are received at all?  That's because ipl
is reset to 2 by Atari-specific code in default_idle(), again with
the stated purpose of blocking HSYNC interrupts.  This discrepancy
means that ipl 3 can remain blocked for longer than intended.

Both default_idle() and falcon_hblhandler() identify HSYNC with
ipl 2, and the "Atari ST/.../F030 Hardware Register Listing" agrees,
but ALLOWINT is defined as if HSYNC was ipl 3.

[As an experiment I modified default_idle() to reset ipl to 3, and
as expected that resulted in all nfeth interrupts being blocked.]

The fix is simple: define ALLOWINT as ~0x500 instead.  This makes
arch_local_irq_enable() consistent with default_idle(), and prevents
the !irqs_disabled() problems for ipl 3 interrupts.

Tested on Atari running in an Aranym VM.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Tested-by: Michael Schmitz <schmitzmic@googlemail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agom68k: Make sys_atomic_cmpxchg_32 work on classic m68k
Andreas Schwab [Fri, 27 Jul 2012 22:20:34 +0000 (00:20 +0200)]
m68k: Make sys_atomic_cmpxchg_32 work on classic m68k

commit 9e2760d18b3cf179534bbc27692c84879c61b97c upstream.

User space access must always go through uaccess accessors, since on
classic m68k user space and kernel space are completely separate.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoASoC: wm8994: Ensure there are enough BCLKs for four channels
Mark Brown [Fri, 22 Jun 2012 16:21:17 +0000 (17:21 +0100)]
ASoC: wm8994: Ensure there are enough BCLKs for four channels

commit b8edf3e5522735c8ce78b81845f7a1a2d4a08626 upstream.

Otherwise if someone tries to use all four channels on AIF1 with the
device in master mode we won't be able to clock out all the data.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoASoC: wm8962: Allow VMID time to fully ramp
Mark Brown [Mon, 30 Jul 2012 17:24:19 +0000 (18:24 +0100)]
ASoC: wm8962: Allow VMID time to fully ramp

commit 9d40e5582c9c4cfb6977ba2a0ca9c2ed82c56f21 upstream.

Required for reliable power up from cold.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Support dock on Lenovo Thinkpad T530 with ALC269VC
Takashi Iwai [Thu, 2 Aug 2012 07:04:39 +0000 (09:04 +0200)]
ALSA: hda - Support dock on Lenovo Thinkpad T530 with ALC269VC

commit 707fba3fa76a4c8855552f5d4c1a12430c09bce8 upstream.

Lenovo Thinkpad T530 with ALC269VC codec has a dock port but BIOS
doesn't set up the pins properly.  Enable the pins as well as on
Thinkpad X230 Tablet.

Reported-and-tested-by: Mario <anyc@hadiko.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix mute-LED GPIO initialization for IDT codecs
Takashi Iwai [Tue, 31 Jul 2012 08:40:05 +0000 (10:40 +0200)]
ALSA: hda - Fix mute-LED GPIO initialization for IDT codecs

commit 1f43f6c1bc8d740e75b4177eb29110858bb5fea2 upstream.

The IDT codecs initializes the GPIO setup for mute LEDs via
snd_hda_sync_vmaster_hook().  This works in most cases except for the
very first call, which is called before PCM and control creations.
Thus before Master switch is set manually via alsactl, the mute LED
may show the wrong state, depending on the polarity.

Now it's fixed by calling the LED-status update function manually when
no vmaster is set yet.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix polarity of mute LED on HP Mini 210
Takashi Iwai [Tue, 31 Jul 2012 08:16:59 +0000 (10:16 +0200)]
ALSA: hda - Fix polarity of mute LED on HP Mini 210

commit ff8a1e274cbc11da6b57849f925b895a212b56c9 upstream.

The commit a3e199732b made the LED working again on HP Mini 210 but
with a wrong polarity.  This patch fixes the polarity for this
machine, and also introduce a new model string "hp-inv-led".

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=772923

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix mute-LED GPIO setup for HP Mini 210
Takashi Iwai [Thu, 26 Jul 2012 06:17:20 +0000 (08:17 +0200)]
ALSA: hda - Fix mute-LED GPIO setup for HP Mini 210

commit a3e199732b8e2b272e82cc1ccc49c35239ed6c5a upstream.

BIOS on HP Mini 210 doesn't provide the proper "HP_Mute_LED" DMI
string, thus the driver doesn't initialize the GPIO, too.  In the
earlier kernel, the driver falls back to GPIO1, but since 3.3 we've
stopped this due to other wrongly advertised machines.

For fixing this particular case, add a new model type to specify the
default polarity explicitly so that the fallback to GPIO1 is handled.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=772923

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs
Takashi Iwai [Wed, 25 Jul 2012 11:54:55 +0000 (13:54 +0200)]
ALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs

commit 6162552b0de6ba80937c3dd53e084967851cd199 upstream.

We've got a bug report about the silent output from the headphone on a
mobo with VT2021, and spotted out that this was because of the wrong
D3 state on the DAC for the headphone output.  The bug is triggered by
the incomplete check for this DAC in set_widgets_power_state_vt1718S().
It checks only the connectivity of the primary output (0x27) but
doesn't consider the path from the headphone pin (0x28).

Now this patch fixes the problem by checking both pins for DAC 0x0b.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: mpu401: Fix missing initialization of irq field
Takashi Iwai [Mon, 23 Jul 2012 09:35:55 +0000 (11:35 +0200)]
ALSA: mpu401: Fix missing initialization of irq field

commit bc733d495267a23ef8660220d696c6e549ce30b3 upstream.

The irq field of struct snd_mpu401 is supposed to be initialized to -1.
Since it's set to zero as of now, a probing error before the irq
installation results in a kernel warning "Trying to free already-free
IRQ 0".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: snd-usb: fix clock source validity index
Daniel Mack [Wed, 1 Aug 2012 08:16:53 +0000 (10:16 +0200)]
ALSA: snd-usb: fix clock source validity index

commit aff252a848ce21b431ba822de3dab9c4c94571cb upstream.

uac_clock_source_is_valid() uses the control selector value to access
the bmControls bitmap of the clock source unit. This is wrong, as
control selector values start from 1, while the bitmap uses all
available bits.

In other words, "Clock Validity Control" is stored in D3..2, not D5..4
of the clock selector unit's bmControls.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Andreas Koch <andreas@akdesigninc.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoUSB: echi-dbgp: increase the controller wait time to come out of halt.
Colin Ian King [Mon, 30 Jul 2012 15:06:42 +0000 (16:06 +0100)]
USB: echi-dbgp: increase the controller wait time to come out of halt.

commit f96a4216e85050c0a9d41a41ecb0ae9d8e39b509 upstream.

The default 10 microsecond delay for the controller to come out of
halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond.

This is based on emperical testing on various USB debug ports on
modern machines such as a Lenovo X220i and an Ivybridge development
platform that needed to wait ~450-950 microseconds.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet/tun: fix ioctl() based info leaks
Mathias Krause [Sun, 29 Jul 2012 19:45:14 +0000 (19:45 +0000)]
net/tun: fix ioctl() based info leaks

[ Upstream commits a117dacde0288f3ec60b6e5bcedae8fa37ee0dfc
  and 8bbb181308bc348e02bfdbebdedd4e4ec9d452ce ]

The tun module leaks up to 36 bytes of memory by not fully initializing
a structure located on the stack that gets copied to user memory by the
TUNGETIFF and SIOCGIFHWADDR ioctl()s.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotcp: perform DMA to userspace only if there is a task waiting for it
Jiri Kosina [Fri, 27 Jul 2012 10:38:50 +0000 (10:38 +0000)]
tcp: perform DMA to userspace only if there is a task waiting for it

[ Upstream commit 59ea33a68a9083ac98515e4861c00e71efdc49a1 ]

Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem can be observed in real trivially -- for example 'tbench' is a
good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
have been already early-copied in tcp_rcv_established() using dma engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling
Jiri Benc [Fri, 27 Jul 2012 02:58:22 +0000 (02:58 +0000)]
net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling

[ Upstream commit b1beb681cba5358f62e6187340660ade226a5fcc ]

When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
IFF_ALLMULTI bits in dev->gflags according to the passed value but
do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
from dev->flags.

This can be easily trigerred by doing:

tcpdump -i eth0 &
ip l s up eth0

ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
IFF_PROMISC in gflags.

Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoUSB: kaweth.c: use GFP_ATOMIC under spin_lock
Dan Carpenter [Fri, 27 Jul 2012 01:46:51 +0000 (01:46 +0000)]
USB: kaweth.c: use GFP_ATOMIC under spin_lock

[ Upstream commit e4c7f259c5be99dcfc3d98f913590663b0305bf8 ]

The problem is that we call this with a spin lock held.  The call tree
is:
kaweth_start_xmit() holds kaweth->device_lock.
-> kaweth_async_set_rx_mode()
   -> kaweth_control()
      -> kaweth_internal_control_msg()

The kaweth_internal_control_msg() function is only called from
kaweth_control() which used GFP_ATOMIC for its allocations.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotcp: Add TCP_USER_TIMEOUT negative value check
Hangbin Liu [Thu, 26 Jul 2012 22:52:21 +0000 (22:52 +0000)]
tcp: Add TCP_USER_TIMEOUT negative value check

[ Upstream commit 42493570100b91ef663c4c6f0c0fdab238f9d3c2 ]

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
patch "tcp: Add TCP_USER_TIMEOUT socket option"(dca43c75) didn't check the negative
values. If a user assign -1 to it, the socket will set successfully and wait
for 4294967295 miliseconds. This patch add a negative value check to avoid
this issue.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agowanmain: comparing array with NULL
Alan Cox [Tue, 24 Jul 2012 08:16:25 +0000 (08:16 +0000)]
wanmain: comparing array with NULL

[ Upstream commit 8b72ff6484fe303e01498b58621810a114f3cf09 ]

gcc really should warn about these !

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agocaif: fix NULL pointer check
Alan Cox [Tue, 24 Jul 2012 02:42:14 +0000 (02:42 +0000)]
caif: fix NULL pointer check

[ Upstream commit c66b9b7d365444b433307ebb18734757cb668a02 ]

Reported-by: <rucsoftsec@gmail.com>
Resolves-bug: http://bugzilla.kernel.org/show_bug?44441
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agor8169: revert "add byte queue limit support".
Francois Romieu [Mon, 23 Jul 2012 20:55:55 +0000 (22:55 +0200)]
r8169: revert "add byte queue limit support".

[ Upstream commit 17bcb684f08649a2ab6a7dcd8288332e72d208f1 ]

This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.

First it appears in bisection, then reverting it solves the usual
netdev watchdog problem for different people. I don't have a proper
fix yet so get rid of it.

Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet: Fix references to out-of-scope variables in put_cmsg_compat()
Jesper Juhl [Sun, 22 Jul 2012 11:37:20 +0000 (11:37 +0000)]
net: Fix references to out-of-scope variables in put_cmsg_compat()

[ Upstream commit 818810472b129004c16fc51bf0a570b60776bfb7 ]

In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
either the 'ctv' or 'cts' local variables inside the 'if
(!COMPAT_USE_64BIT_TIME)' branch.

Those variables go out of scope at the end of the 'if' statement, so
when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
it may be refering to - not good.

Fix the problem by simply giving 'ctv' and 'cts' function scope.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agocipso: don't follow a NULL pointer when setsockopt() is called
Paul Moore [Tue, 17 Jul 2012 11:07:47 +0000 (11:07 +0000)]
cipso: don't follow a NULL pointer when setsockopt() is called

[ Upstream commit 89d7ae34cdda4195809a5a987f697a517a2a3177 ]

As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface.  In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.

A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():

#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>

struct local_tag {
char type;
char length;
char info[4];
};

struct cipso {
char type;
char length;
char doi[4];
struct local_tag local;
};

int main(int argc, char **argv)
{
int sockfd;
struct cipso cipso = {
.type = IPOPT_CIPSO,
.length = sizeof(struct cipso),
.local = {
.type = 128,
.length = sizeof(struct local_tag),
},
};

memset(cipso.doi, 0, 4);
cipso.doi[3] = 3;

sockfd = socket(AF_INET, SOCK_DGRAM, 0);
#define SOL_IP 0
setsockopt(sockfd, SOL_IP, IP_OPTIONS,
&cipso, sizeof(struct cipso));

return 0;
}

CC: Lin Ming <mlin@ss.pku.edu.cn>
Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agocaif: Fix access to freed pernet memory
Sjur Brændeland [Sun, 15 Jul 2012 10:10:14 +0000 (10:10 +0000)]
caif: Fix access to freed pernet memory

[ Upstream commit 96f80d123eff05c3cd4701463786b87952a6c3ac ]

unregister_netdevice_notifier() must be called before
unregister_pernet_subsys() to avoid accessing already freed
pernet memory. This fixes the following oops when doing rmmod:

Call Trace:
 [<ffffffffa0f802bd>] caif_device_notify+0x4d/0x5a0 [caif]
 [<ffffffff81552ba9>] unregister_netdevice_notifier+0xb9/0x100
 [<ffffffffa0f86dcc>] caif_device_exit+0x1c/0x250 [caif]
 [<ffffffff810e7734>] sys_delete_module+0x1a4/0x300
 [<ffffffff810da82d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
 [<ffffffff813517de>] ? trace_hardirqs_on_thunk+0x3a/0x3
 [<ffffffff81696bad>] system_call_fastpath+0x1a/0x1f

RIP
 [<ffffffffa0f7f561>] caif_get+0x51/0xb0 [caif]

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agosctp: Fix list corruption resulting from freeing an association on a list
Neil Horman [Mon, 16 Jul 2012 09:13:51 +0000 (09:13 +0000)]
sctp: Fix list corruption resulting from freeing an association on a list

[ Upstream commit 2eebc1e188e9e45886ee00662519849339884d6d ]

A few days ago Dave Jones reported this oops:

[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140]  ffff880147c03a10
[22766.387140]  ffffffffa169f2b6
[22766.387140]  ffff88013ed95728
[22766.387143]  0000000000000002
[22766.387143]  0000000000000000
[22766.387143]  ffff880003fad062
[22766.387144]  ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145]  <IRQ>
[22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283]  <EOI>
[22766.387284]
[22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311]  RSP <ffff880147c039b0>
[22766.387142]  ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt

It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable.  As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.

Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance.  What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table.  the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.

I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete.  That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.

I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.

I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop.  I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising).  Given the trace above
however, I think its likely that this is what we hit.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agosch_sfb: Fix missing NULL check
Alan Cox [Thu, 12 Jul 2012 03:39:11 +0000 (03:39 +0000)]
sch_sfb: Fix missing NULL check

[ Upstream commit 7ac2908e4b2edaec60e9090ddb4d9ceb76c05e7d ]

Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44461

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agobnx2: Fix bug in bnx2_free_tx_skbs().
Michael Chan [Tue, 10 Jul 2012 10:04:40 +0000 (10:04 +0000)]
bnx2: Fix bug in bnx2_free_tx_skbs().

[ Upstream commit c1f5163de417dab01fa9daaf09a74bbb19303f3c ]

In rare cases, bnx2x_free_tx_skbs() can unmap the wrong DMA address
when it gets to the last entry of the tx ring.  We were not using
the proper macro to skip the last entry when advancing the tx index.

Reported-by: Zongyun Lai <zlai@vmware.com>
Reviewed-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet: Fix memory leak - vlan_info struct
Amir Hanania [Mon, 9 Jul 2012 20:47:19 +0000 (20:47 +0000)]
net: Fix memory leak - vlan_info struct

[ Upstream commit efc73f4bbc238d4f579fb612c04c8e1dd8a82979 ]

In driver reload test there is a memory leak.
The structure vlan_info was not freed when the driver was removed.
It was not released since the nr_vids var is one after last vlan was removed.
The nr_vids is one, since vlan zero is added to the interface when the interface
is being set, but the vlan zero is not deleted at unregister.
Fix - delete vlan zero when we unregister the device.

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agogianfar: fix potential sk_wmem_alloc imbalance
Eric Dumazet [Thu, 5 Jul 2012 11:45:13 +0000 (11:45 +0000)]
gianfar: fix potential sk_wmem_alloc imbalance

[ Upstream commit 313b037cf054ec908de92fb4c085403ffd7420d4 ]

commit db83d136d7f753 (gianfar: Fix missing sock reference when
processing TX time stamps) added a potential sk_wmem_alloc imbalance

If the new skb has a different truesize than old one, we can get a
negative sk_wmem_alloc once new skb is orphaned at TX completion.

Now we no longer early orphan skbs in dev_hard_start_xmit(), this
probably can lead to fatal bugs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Manfred Rudigier <manfred.rudigier@omicron.at>
Cc: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Jiajun Wu <b06378@freescale.com>
Cc: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonetem: add limitation to reordered packets
Eric Dumazet [Tue, 3 Jul 2012 20:55:21 +0000 (20:55 +0000)]
netem: add limitation to reordered packets

[ Upstream commit 960fb66e520a405dde39ff883f17ff2669c13d85 ]

Fix two netem bugs :

1) When a frame was dropped by tfifo_enqueue(), drop counter
   was incremented twice.

2) When reordering is triggered, we enqueue a packet without
   checking queue limit. This can OOM pretty fast when this
   is repeated enough, since skbs are orphaned, no socket limit
   can help in this situation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mark Gordon <msg@google.com>
Cc: Andreas Terzis <aterzis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoatl1c: fix issue of transmit queue 0 timed out
Cloud Ren [Tue, 3 Jul 2012 16:51:48 +0000 (16:51 +0000)]
atl1c: fix issue of transmit queue 0 timed out

[ Upstream commit b94e52f62683dc0b00c6d1b58b80929a078c0fd5 ]

some people report atl1c could cause system hang with following
kernel trace info:
---------------------------------------
WARNING: at.../net/sched/sch_generic.c:258 dev_watchdog+0x1db/0x1d0()
...
NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
...
---------------------------------------
This is caused by netif_stop_queue calling when cable Link is down.
So remove netif_stop_queue, because link_watch will take it over.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: undo ext4_calc_metadata_amount if we fail to claim space
Theodore Ts'o [Mon, 23 Jul 2012 04:00:20 +0000 (00:00 -0400)]
ext4: undo ext4_calc_metadata_amount if we fail to claim space

commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream.

The function ext4_calc_metadata_amount() has side effects, although
it's not obvious from its function name.  So if we fail to claim
space, regardless of whether we retry to claim the space again, or
return an error, we need to undo these side effects.

Otherwise we can end up incorrectly calculating the number of metadata
blocks needed for the operation, which was responsible for an xfstests
failure for test #271 when using an ext2 file system with delalloc
enabled.

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: don't let i_reserved_meta_blocks go negative
Brian Foster [Mon, 23 Jul 2012 03:59:40 +0000 (23:59 -0400)]
ext4: don't let i_reserved_meta_blocks go negative

commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.

If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks.  In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
occurs and set i_allocated_meta_blocks to avoid this problem.

This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:

Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost

270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair.  The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: fix hole punch failure when depth is greater than 0
Ashish Sangwan [Mon, 23 Jul 2012 02:49:08 +0000 (22:49 -0400)]
ext4: fix hole punch failure when depth is greater than 0

commit 968dee77220768a5f52cf8b21d0bdb73486febef upstream.

Whether to continue removing extents or not is decided by the return
value of function ext4_ext_more_to_rm() which checks 2 conditions:
a) if there are no more indexes to process.
b) if the number of entries are decreased in the header of "depth -1".

In case of hole punch, if the last block to be removed is not part of
the last extent index than this index will not be deleted, hence the
number of valid entries in the extent header of "depth - 1" will
remain as it is and ext4_ext_more_to_rm will return 0 although the
required blocks are not yet removed.

This patch fixes the above mentioned problem as instead of removing
the extents from the end of file, it starts removing the blocks from
the particular extent from which removing blocks is actually required
and continue backward until done.

Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: fix overhead calculation used by ext4_statfs()
Theodore Ts'o [Mon, 9 Jul 2012 20:27:05 +0000 (16:27 -0400)]
ext4: fix overhead calculation used by ext4_statfs()

commit 952fc18ef9ec707ebdc16c0786ec360295e5ff15 upstream.

Commit f975d6bcc7a introduced bug which caused ext4_statfs() to
miscalculate the number of file system overhead blocks.  This causes
the f_blocks field in the statfs structure to be larger than it should
be.  This would in turn cause the "df" output to show the number of
data blocks in the file system and the number of data blocks used to
be larger than they should be.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: pass a char * to ext4_count_free() instead of a buffer_head ptr
Theodore Ts'o [Sat, 30 Jun 2012 23:14:57 +0000 (19:14 -0400)]
ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr

commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream.

Make it possible for ext4_count_free to operate on buffers and not
just data in buffer_heads.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonouveau: Fix alignment requirements on src and dst addresses
Maarten Lankhorst [Mon, 4 Jun 2012 10:00:31 +0000 (12:00 +0200)]
nouveau: Fix alignment requirements on src and dst addresses

commit ce806a30470bcd846d148bf39d46de3ad7748228 upstream.

Linear copy works by adding the offset to the buffer address,
which may end up not being 16-byte aligned.

Some tests I've written for prime_pcopy show that the engine
allows this correctly, so the restriction on lowest 4 bits of
address can be lifted safely.

The comments added were by envyas, I think because I used
a newer version.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoNFS: Fix a number of bugs in the idmapper
David Howells [Wed, 25 Jul 2012 15:53:36 +0000 (16:53 +0100)]
NFS: Fix a number of bugs in the idmapper

commit a427b9ec4eda8cd6e641ea24541d30b641fc3140 upstream.

Fix a number of bugs in the NFS idmapper code:

 (1) Only registered key types can be passed to the core keys code, so
     register the legacy idmapper key type.

     This is a requirement because the unregister function cleans up keys
     belonging to that key type so that there aren't dangling pointers to the
     module left behind - including the key->type pointer.

 (2) Rename the legacy key type.  You can't have two key types with the same
     name, and (1) would otherwise require that.

 (3) complete_request_key() must be called in the error path of
     nfs_idmap_legacy_upcall().

 (4) There is one idmap struct for each nfs_client struct.  This means that
     idmap->idmap_key_cons is shared without the use of a lock.  This is a
     problem because key_instantiate_and_link() - as called indirectly by
     idmap_pipe_downcall() - releases anyone waiting for the key to be
     instantiated.

     What happens is that idmap_pipe_downcall() running in the rpc.idmapd
     thread, releases the NFS filesystem in whatever thread that is running in
     to continue.  This may then make another idmapper call, overwriting
     idmap_key_cons before idmap_pipe_downcall() gets the chance to call
     complete_request_key().

     I *think* that reading idmap_key_cons only once, before
     key_instantiate_and_link() is called, and then caching the result in a
     variable is sufficient.

Bug (4) is the cause of:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] SMP
CPU 1
Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff8801a3645d40  EFLAGS: 00010246
RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
FS:  00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
Stack:
 ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
Call Trace:
 [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100
 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0
 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
 [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc]
 [<ffffffff8117f833>] vfs_write+0xb3/0x180
 [<ffffffff8117fb5a>] sys_write+0x4a/0x90
 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8801a3645d40>
CR2: 0000000000000000

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonfs: skip commit in releasepage if we're freeing memory for fs-related reasons
Jeff Layton [Mon, 23 Jul 2012 17:58:51 +0000 (13:58 -0400)]
nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonfsd4: our filesystems are normally case sensitive
J. Bruce Fields [Tue, 5 Jun 2012 20:52:06 +0000 (16:52 -0400)]
nfsd4: our filesystems are normally case sensitive

commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream.

Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agopnfs-obj: Fix __r4w_get_page when offset is beyond i_size
Boaz Harrosh [Thu, 7 Jun 2012 23:02:30 +0000 (02:02 +0300)]
pnfs-obj: Fix __r4w_get_page when offset is beyond i_size

commit c999ff68029ebd0f56ccae75444f640f6d5a27d2 upstream.

It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both birds, by returning a global zero_page, if offset is beyond
i_size.

TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[bwh: Backported to 3.2: adjust for lack of wdata->header]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodm thin: fix memory leak in process_prepared_mapping error paths
Joe Thornber [Fri, 27 Jul 2012 14:08:05 +0000 (15:08 +0100)]
dm thin: fix memory leak in process_prepared_mapping error paths

commit 905386f82d08f66726912f303f3e6605248c60a3 upstream.

Fix memory leak in process_prepared_mapping by always freeing
the dm_thin_new_mapping structs from the mapping_pool mempool on
the error paths.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>