review.tizen.org Git - platform/kernel/linux-rpi3.git/log

mm: debugfs: move rounddown_pow_of_two() out from do_fault path

do_fault_around() expects fault_around_bytes rounded down to nearest page
order. Instead of calling rounddown_pow_of_two every time in
fault_around_pages()/fault_around_mask() we could do round down when user
changes fault_around_bytes via debugfs interface.

This also fixes bug when user set fault_around_bytes to 0. Result of
rounddown_pow_of_two(0) is not defined, therefore fault_around_bytes == 0
doesn't work without this patch.

Let's set fault_around_bytes to PAGE_SIZE if user sets to something less
than PAGE_SIZE

[akpm@linux-foundation.org: tweak code layout]
Fixes: a9b0f861("mm: nominate faultaround area in bytes rather than page order")
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: oom_notify use-after-free fix

Paul Furtado has reported the following GPF:

  general protection fault: 0000 [#1] SMP
  Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront
  CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1
  task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000
  RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240
  RSP: e02b:ffff8801d2ec7d48  EFLAGS: 00010283
  RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e
  RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200
  RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800
  R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30
  FS:  00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000
  CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660
  Call Trace:
    pagefault_out_of_memory+0x18/0x90
    mm_fault_error+0xa9/0x1a0
    __do_page_fault+0x478/0x4c0
    do_page_fault+0x2c/0x40
    page_fault+0x28/0x30
  Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75
  RIP  mem_cgroup_oom_synchronize+0x140/0x240

Commit fb2a6fc56be6 ("mm: memcg: rework and document OOM waiting and
wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock
assuming it is protected by the hierarchical OOM-lock.

Although this is true for the notification part the protection doesn't
cover unregistration of event which can happen in parallel now so
mem_cgroup_oom_notify can see already unlinked and/or freed
mem_cgroup_eventfd_list.

Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=80881

Fixes: fb2a6fc56be6 (mm: memcg: rework and document OOM waiting and wakeup)
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Paul Furtado <paulfurtado91@gmail.com>
Tested-by: Paul Furtado <paulfurtado91@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hwpoison: call action_result() in failure path of hwpoison_user_mappings()

hwpoison_user_mappings() could fail for various reasons, so printk()s to
print out the reasons should be done in each failure check inside
hwpoison_user_mappings().

And currently we don't call action_result() when hwpoison_user_mappings()
fails, which is not consistent with other exit points of memory error
handler. So this patch fixes these messaging problems.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hwpoison: fix hugetlbfs/thp precheck in hwpoison_user_mappings()

A recent fix from Chen Yucong, commit 0bc1f8b0682c ("hwpoison: fix the
handling path of the victimized page frame that belong to non-LRU")
rejects going into unmapping operation for hugetlbfs/thp pages, which
results in failing error containing on such pages. This patch fixes it.

With this patch, hwpoison functional tests in mce-test testsuite pass.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/tsi721_dma: fix failure to obtain transaction descriptor

This is a bug fix for the situation when function tsi721_desc_get() fails
to obtain a free transaction descriptor.

The bug usually results in a memory access crash dump when data transfer
scatter-gather list has more entries than size of hardware buffer
descriptors ring. This fix ensures that error is properly returned to a
caller instead of an invalid entry.

This patch is applicable to kernel versions starting from v3.5.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, thp: do not allow thp faults to avoid cpuset restrictions

The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET
should be set in allocflags. ALLOC_CPUSET controls if a page allocation
should be restricted only to the set of allowed cpuset mems.

Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent
the fault path from using memory compaction or direct reclaim. Thus, it
is unfairly able to allocate outside of its cpuset mems restriction as a
side-effect.

This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is
truly GFP_ATOMIC by verifying it is also not a thp allocation.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()

Under memory pressure, it is possible for dirty_thresh, calculated by
global_dirty_limits() in balance_dirty_pages(), to equal zero. Then, if
strictlimit is true, bdi_dirty_limits() tries to resolve the proportion:

bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh

by dividing by zero.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux

Pull Exynos platform DT fix from Grant Likely:
"Device tree Exynos bug fix for v3.16-rc7

  This bug fix has been brewing for a while.  I hate sending it to you
  so late, but I only got confirmation that it solves the problem this
  past weekend.  The diff looks big for a bug fix, but the majority of
  it is only executed in the Exynos quirk case.  Unfortunately it
  required splitting early_init_dt_scan() in two and adding quirk
  handling in the middle of it on ARM.

  Exynos has buggy firmware that puts bad data into the memory node.
  Commit 1c2f87c22566 ("ARM: Get rid of meminfo") exposed the bug by
  dropping the artificial upper bound on the number of memory banks that
  can be added.  Exynos fails to boot after that commit.  This branch
  fixes it by splitting the early DT parse function and inserting a
  fixup hook.  Exynos uses the hook to correct the DT before parsing
  memory regions"

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  arm: Add devicetree fixup machine function
  of: Add memory limiting function for flattened devicetrees
  of: Split early_init_dt_scan into two parts

Merge tag 'stable/for-linus-3.16-rc7-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen fix from David Vrabel:
"Fix BUG when trying to expand the grant table.  This seems to occur
  often during boot with Ubuntu 14.04 PV guests"

* tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: safely map and unmap grant frames when in atomic context

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fix from Paolo Bonzini:
"Fix a bug which allows KVM guests to bring down the entire system on
some 64K enabled ARM64 hosts"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

Revert "cdc_subset: deal with a device that needs reset for timeout"

This reverts commit 20fbe3ae990fd54fc7d1f889d61958bc8b38f254.

As reported by Stephen Rothwell, it causes compile failures in certain
configurations:

  drivers/net/usb/cdc_subset.c:360:15: error: 'dummy_prereset' undeclared here (not in a function)
    .pre_reset = dummy_prereset,
                 ^
  drivers/net/usb/cdc_subset.c:361:16: error: 'dummy_postreset' undeclared here (not in a function)
    .post_reset = dummy_postreset,
                  ^

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David Miller <davem@davemloft.net>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Make fragmentation IDs less predictable, from Eric Dumazet.

2) TSO tunneling can crash in bnx2x driver, fix from Dmitry Kravkov.

3) Don't allow NULL msg->msg_name just because msg->msg_namelen is
    non-zero, from Andrey Ryabinin.

4) ndm->ndm_type set using wrong macros, from Jun Zhao.

5) cdc-ether devices can come up with entries in their address filter,
    so explicitly clear the filter after the device initializes.  From
    Oliver Neukum.

6) Forgotten refcount bump in xfrm_lookup(), from Steffen Klassert.

7) Short packets not padded properly, exposing random data, in bcmgenet
    driver.  Fix from Florian Fainelli.

8) xgbe_probe() doesn't return an error code, but rather zero, when
    netif_set_real_num_tx_queues() fails.  Fix from Wei Yongjun.

9) USB speed not probed properly in r8152 driver, from Hayes Wang.

10) Transmit logic choosing the outgoing port in the sunvnet driver
    needs to consider a) is the port actually up and b) whether it is a
    switch port.  Fix from David L Stevens.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  net: phy: re-apply PHY fixups during phy_register_device
  cdc-ether: clean packet filter upon probe
  cdc_subset: deal with a device that needs reset for timeout
  net: sendmsg: fix NULL pointer dereference
  isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
  ip: make IP identifiers less predictable
  neighbour : fix ndm_type type error issue
  sunvnet: only use connected ports when sending
  can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource()
  bnx2x: fix crash during TSO tunneling
  r8152: fix the checking of the usb speed
  net: phy: Ensure the MDIO bus module is held
  net: phy: Set the driver when registering an MDIO bus device
  bnx2x: fix set_setting for some PHYs
  hyperv: Fix error return code in netvsc_init_buf()
  amd-xgbe: Fix error return code in xgbe_probe()
  ath9k: fix aggregation session lockup
  net: bcmgenet: correctly pad short packets
  net: sctp: inherit auth_capable on INIT collisions
  mac80211: fix crash on getting sta info with uninitialized rate control
  ...

x86/xen: safely map and unmap grant frames when in atomic context

arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
atomic context but were calling alloc_vm_area() which might sleep.

Also, if a driver attempts to allocate a grant ref from an interrupt
and the table needs expanding, then the CPU may already by in lazy MMU
mode and apply_to_page_range() will BUG when it tries to re-enable
lazy MMU mode.

These two functions are only used in PV guests.

Introduce arch_gnttab_init() to allocates the virtual address space in
advance.

Avoid the use of apply_to_page_range() by using saving and using the
array of PTE addresses from the alloc_vm_area() call (which ensures
that the required page tables are pre-allocated).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.

As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.

SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c020000 in
the above scenario.

This patch fixes the problem by failing the vgic probe if the physical
base address or the size of GICV aren't page-aligned. Note that this
generated a warning in dmesg about freeing enabled IRQs, so I had to
move the IRQ enabling later in the probe.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joel Schopp <joel.schopp@amd.com>
Cc: Don Dutile <ddutile@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

arm: Add devicetree fixup machine function

Commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76
(ARM: 8025/1: Get rid of meminfo) dropped the upper bound on
the number of memory banks that can be added as there was no
technical need in the kernel. It turns out though, some bootloaders
(specifically the arndale-octa exynos boards) may pass invalid memory
information and rely on the kernel to not parse this data. This is a
bug in the bootloader but we still need to work around this.
Work around this by introducing a dt_fixup function. This function
gets called before the flattened devicetree is scanned for memory
and the like. In this fixup function for exynos, limit the maximum
number of memory regions in the devicetree.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: Added a comment and fixed up function name]
Signed-off-by: Grant Likely <grant.likely@linaro.org>

of: Add memory limiting function for flattened devicetrees

Buggy bootloaders may pass bogus memory entries in the devicetree.
Add of_fdt_limit_memory to add an upper bound on the number of
entries that can be present in the devicetree.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Grant Likely <grant.likely@linaro.org>

of: Split early_init_dt_scan into two parts

Currently, early_init_dt_scan validates the header, sets the
boot params, and scans for chosen/memory all in one function.
Split this up into two separate functions (validation/setting
boot params in one, scanning in another) to allow for
additional setup between boot params and scanning the memory.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: s/early_init_dt_scan_all/early_init_dt_scan_nodes/]
Signed-off-by: Grant Likely <grant.likely@linaro.org>

net: phy: re-apply PHY fixups during phy_register_device

Commit 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()")
moved the call to phy_scan_fixups() in phy_init_hw() after a software
reset is performed.

By the time phy_init_hw() is called in phy_device_register(), no driver
has been bound to this PHY yet, so all the checks in phy_init_hw()
against the PHY driver and the PHY driver's config_init function will
return 0. We will therefore never call phy_scan_fixups() as we should.

Fix this by calling phy_scan_fixups() and check for its return value to
restore the intended functionality.

This broke PHY drivers which do register an early PHY fixup callback to
intercept the PHY probing and do things like changing the 32-bits unique
PHY identifier when a pseudo-PHY address has been used, as well as
board-specific PHY fixups that need to be applied during driver probe
time.

Reported-by: Hauke Merthens <hauke-m@hauke-m.de>
Reported-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdc-ether: clean packet filter upon probe

There are devices that don't do reset all the way. So the packet filter should
be set to a sane initial value. Failure to do so leads to intermittent failures
of DHCP on some systems under some conditions.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdc_subset: deal with a device that needs reset for timeout

This device needs to be reset to recover from a timeout.
Unfortunately this can be handled only at the level of
the subdrivers.

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sendmsg: fix NULL pointer dereference

Sasha's report:
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel with the KASAN patchset, I've stumbled on the following spew:
>
> [ 4448.949424] ==================================================================
> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> [ 4448.952988] Read of size 2 by thread T19638:
> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> [ 4448.961266] Call Trace:
> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 4448.988929] ==================================================================

This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.

After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.

This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address."
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.

This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()

There is a lack of usb_put_dev(udev) on failure path in gigaset_probe().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
"A nice small set of bug fixes for arm-soc:

   - two incorrect register addresses in DT files on shmobile and hisilicon
   - one revert for a regression on omap
   - one bug fix for a newly introduced pin controller binding
   - one regression fix for the memory controller on omap
   - one patch to avoid a harmless WARN_ON"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: Revert enabling of twl configuration for n900
  ARM: dts: fix L2 address in Hi3620
  ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable()
  pinctrl: dra: dt-bindings: Fix pull enable/disable
  ARM: shmobile: r8a7791: Fix SD2CKCR register address
  ARM: OMAP2+: l2c: squelch warning dump on power control setting

AFS: Correctly assemble the client UUID

Correctly assemble the client UUID by OR'ing in the flags rather than
assigning them over the other components.

Reported-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix page_alloc.c kernel-doc warnings

Fix kernel-doc warnings and function name in mm/page_alloc.c:

  Warning(..//mm/page_alloc.c:6074): No description found for parameter 'pfn'
  Warning(..//mm/page_alloc.c:6074): No description found for parameter 'mask'
  Warning(..//mm/page_alloc.c:6074): Excess function parameter 'start_bitidx' description in 'get_pfnblock_flags_mask'
  Warning(..//mm/page_alloc.c:6102): No description found for parameter 'pfn'
  Warning(..//mm/page_alloc.c:6102): No description found for parameter 'mask'
  Warning(..//mm/page_alloc.c:6102): Excess function parameter 'start_bitidx' description in 'set_pfnblock_flags_mask'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'omap-for-v3.16/n900-regression' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

Merge "omap n900 regression fix for v3.16 rc series" from Tony Lindgren:

Minimal regression fix for n900 display that got broken with
enabling of twl4030 PM features. Turns out more work is needed
before we can enable twl4030 PM on n900.

I did not notice this earlier as I have my n900 in a rack
and the display did not get enabled for device tree based booting
until for v3.16.

* tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Revert enabling of twl configuration for n900

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ARM: dts: Revert enabling of twl configuration for n900

Commit 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration
for selected omaps) allowed n900 to cut off core voltages during
off-idle. This however caused a regression where twl regulator
vaux1 was not getting enabled for the LCD panel as we are not
requesting it for the panel.

Turns out quite a few devices on n900 are using vaux1, and we need
to either stop idling it, or add proper regulator_get calls for all
users. But until we have a proper solution implemented and tested,
let's just disable the twl off-idle configuration for now for n900.

Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration for selected omaps)
Signed-off-by: Tony Lindgren <tony@atomide.com>

ip: make IP identifiers less predictable

In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target > A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target > A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target > A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbour : fix ndm_type type error issue

ndm_type means L3 address type, in neighbour proxy and vxlan, it's RTN_UNICAST.
NDA_DST is for netlink TLV type, hence it's not right value in this context.

Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sunvnet: only use connected ports when sending

The sunvnet driver doesn't check whether or not a port is connected when
transmitting packets, which results in failures if a port fails to connect
(e.g., due to a version mismatch). The original code also assumes
unnecessarily that the first port is up and a switch, even though there is
a flag for switch ports.

This patch only matches a port if it is connected, and otherwise uses the
switch_port flag to send the packet to a switch port that is up.

Signed-off-by: David L Stevens <david.stevens@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-3.16-20140725' of git://gitorious.org/linux-can/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2014-07-25

this is a pull request of one patch for the net tree, hoping to get into the
3.16 release.

The patch by George Cherian fixes a regression in the c_can platform driver.
When using two interfaces the regression leads to a non function second
interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull ARM AES crypto fixes from Herbert Xu:
"This push fixes a regression on ARM where odd-sized blocks supplied to
  AES may cause crashes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arm-aes - fix encryption of unaligned data
  crypto: arm64-aes - fix encryption of unaligned data

Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 more small powerpc fixes that should still go into .16.

  One is a recent regression (MMCR2 business), the other is a trivial
  endian fix without which FW updates won't work on LE in IBM machines,
  and the 3rd one turns a BUG_ON into a WARN_ON which is definitely a
  LOT more friendly especially when the whole thing is about retrieving
  error logs ..."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix endianness of flash_block_list in rtas_flash
  powerpc/powernv: Change BUG_ON to WARN_ON in elog code
  powerpc/perf: Fix MMCR2 handling for EBB

crypto: arm-aes - fix encryption of unaligned data

Fix the same alignment bug as in arm64 - we need to pass residue
unprocessed bytes as the last argument to blkcipher_walk_done.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 3.13+
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm64-aes - fix encryption of unaligned data

cryptsetup fails on arm64 when using kernel encryption via AF_ALG socket.
See https://bugzilla.redhat.com/show_bug.cgi?id=1122937

The bug is caused by incorrect handling of unaligned data in
arch/arm64/crypto/aes-glue.c. Cryptsetup creates a buffer that is aligned
on 8 bytes, but not on 16 bytes. It opens AF_ALG socket and uses the
socket to encrypt data in the buffer. The arm64 crypto accelerator causes
data corruption or crashes in the scatterwalk_pagedone.

This patch fixes the bug by passing the residue bytes that were not
processed as the last parameter to blkcipher_walk_done.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

powerpc: Fix endianness of flash_block_list in rtas_flash

The function rtas_flash_firmware passes the address of a data structure,
flash_block_list, when making the update-flash-64-and-reboot rtas call.
While the endianness of the address is handled correctly, the endianness
of the data is not. This patch ensures that the data in flash_block_list
is big endian when passed to rtas on little endian hosts.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Change BUG_ON to WARN_ON in elog code

We can continue to read the error log (up to MAX size) even if
we get the elog size more than MAX size. Hence change BUG_ON to
WARN_ON.

Also updated error message.

Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Linux 3.16-rc7

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
"A bunch of fixes for perf and kprobes:
   - revert a commit that caused a perf group regression
   - silence dmesg spam
   - fix kprobe probing errors on ia64 and ppc64
   - filter kprobe faults from userspace
   - lockdep fix for perf exit path
   - prevent perf #GP in KVM guest
   - correct perf event and filters"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64
  kprobes/x86: Don't try to resolve kprobe faults from userspace
  perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
  perf/x86/intel: Protect LBR and extra_regs against KVM lying
  perf: Fix lockdep warning on process exit
  perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings
  perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  perf: Revert ("perf: Always destroy groups on exit")

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"A couple of crash fixes, plus a fix that on 32 bits would cause a
  missing -ENOSYS for nonexistent system calls"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Fix cache topology for early P4-SMT
  x86_32, entry: Store badsys error code in %eax
  x86, MCE: Robustify mcheck_init_device

Merge branch 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs

Pull vfs fixes from Christoph Hellwig:
"A vfsmount leak fix, and a compile warning fix"

* 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs:
fs: umount on symlink leaks mnt count
direct-io: fix uninitialized warning in do_direct_IO()

Merge tag 'firewire-fix-vt6315' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire regression fix from Stefan Richter:
"IEEE 1394 (FireWire) subsystem fix: MSI don't work on VIA PCIe
  controllers with some isochronous workloads (regression since
  v3.16-rc1)"

* tag 'firewire-fix-vt6315' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: ohci: disable MSI for VIA VT6315 again

Fix gcc-4.9.0 miscompilation of load_balance()  in scheduler

Michel Dänzer and a couple of other people reported inexplicable random
oopses in the scheduler, and the cause turns out to be gcc mis-compiling
the load_balance() function when debugging is enabled.  The gcc bug
apparently goes back to gcc-4.5, but slight optimization changes means
that it now showed up as a problem in 4.9.0 and 4.9.1.

The instruction scheduling problem causes gcc to schedule a spill
operation to before the stack frame has been created, which in turn can
corrupt the spilled value if an interrupt comes in.  There may be other
effects of this bug too, but that's the code generation problem seen in
Michel's case.

This is fixed in current gcc HEAD, but the workaround as suggested by
Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments
when compiling the kernel, which disables the gcc code that causes the
problem.  This can result in slightly worse debug information for
variable accesses, but that is infinitely preferable to actual code
generation problems.

Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows
non-debug builds to verify that the debug build would be identical: we
can do

    export GCC_COMPARE_DEBUG=1

to make gcc internally verify that the result of the build is
independent of the "-g" flag (it will make the compiler build everything
twice, toggling the debug flag, and compare the results).

Without the "-fno-var-tracking-assignments" option, the build would fail
(even with 4.8.3 that didn't show the actual stack frame bug) with a gcc
compare failure.

See also gcc bugzilla:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801

Reported-by: Michel Dänzer <michel@daenzer.net>
Suggested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix direct reclaim writeback regression

Shortly before 3.16-rc1, Dave Jones reported:

  WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
           xfs_vm_writepage+0x5ce/0x630 [xfs]()
  CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
  Call Trace:
    xfs_vm_writepage+0x5ce/0x630 [xfs]
    shrink_page_list+0x8f9/0xb90
    shrink_inactive_list+0x253/0x510
    shrink_lruvec+0x563/0x6c0
    shrink_zone+0x3b/0x100
    shrink_zones+0x1f1/0x3c0
    try_to_free_pages+0x164/0x380
    __alloc_pages_nodemask+0x822/0xc90
    alloc_pages_vma+0xaf/0x1c0
    handle_mm_fault+0xa31/0xc50
  etc.

970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
971                   PF_MEMALLOC))

I did not respond at the time, because a glance at the PageDirty block
in shrink_page_list() quickly shows that this is impossible: we don't do
writeback on file pages (other than tmpfs) from direct reclaim nowadays.
Dave was hallucinating, but it would have been disrespectful to say so.

However, my own /var/log/messages now shows similar complaints

  WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
  WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()

from stressing some mmotm trees during July.

Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
to mapping->a_ops->writepage()?

Yes, 3.16-rc1's commit 68711a746345 ("mm, migration: add destination
page freeing callback") has provided such a way to compaction: if
migrating a SwapBacked page fails, its newpage may be put back on the
list for later use with PageSwapBacked still set, and nothing will clear
it.

Whether that can do anything worse than issue WARN_ON_ONCEs, and get
some statistics wrong, is unclear: easier to fix than to think through
the consequences.

Fixing it here, before the put_new_page(), addresses the bug directly,
but is probably the worst place to fix it.  Page migration is doing too
many parts of the job on too many levels: fixing it in
move_to_new_page() to complement its SetPageSwapBacked would be
preferable, except why is it (and newpage->mapping and newpage->index)
done there, rather than down in migrate_page_move_mapping(), once we are
sure of success? Not a cleanup to get into right now, especially not
with memcg cleanups coming in 3.17.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ARM: dts: fix L2 address in Hi3620

Fix the address of L2 controler register in hi3620 SoC.
This has been wrong from the point that the file was merged
in v3.14.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Acked-by: Wei Xu <xuwei5@hisilicon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"This is radeon and intel fixes, and is a small bit larger than I'm
  guessing you'd like it to be.

   - i915: fixes 32-bit highmem i915 blank screen, semaphore hang and
     runtime pm fix

   - radeon: gpuvm stability fix for hangs since 3.15, and hang/reboot
     regression on TN/RL devices,

  The only slightly controversial one is the change to use GB for the
  vm_size, which I'm letting through as its a new interface we defined
  in this merge window, and I'd prefer to have the released kernel have
  the final interface rather than changing it later"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: fix cut and paste issue for hawaii.
  drm/radeon: fix irq ring buffer overflow handling
  drm/i915: Simplify i915_gem_release_all_mmaps()
  drm/radeon: fix error handling in radeon_vm_bo_set_addr
  drm/i915: fix freeze with blank screen booting highmem
  drm/i915: Reorder the semaphore deadlock check, again
  drm/radeon/TN: only enable bapm on MSI systems
  drm/radeon: fix VM IB handling
  drm/radeon: fix handling of radeon_vm_bo_rmv v3
  drm/radeon: let's use GB for vm_size (v2)

Merge tag 'sound-3.16-rc7' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Here contains only the fixes for the new FireWire bebob driver.  All
  fairly trivial and local fixes, so safe to apply"

* tag 'sound-3.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: bebob: Correction for return value of special_clk_ctl_put() in error
  ALSA: bebob: Correction for return value of .put callback
  ALSA: bebob: Use different labels for digital input/output
  ALSA: bebob: Fix a missing to unlock mutex in error handling case

Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Fixes to temperature limit and vrm write operations in smsc47m192
driver"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (smsc47m192) Fix temperature limit and vrm write operations

parport: fix menu breakage

Do not split the PARPORT-related symbols with the new kconfig
symbol ARCH_MIGHT_HAVE_PC_PARPORT. The split was causing incorrect
display of these symbols -- they were not being displayed together
as they should be.

Fixes: d90c3eb31535 "Kconfig cleanup (PARPORT_PC dependencies)"

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org # for 3.13, 3.14, 3.15
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'blackfin-3.16-fixes' of git://git./linux/kernel/git/realmz6/blackfin-linux

Pull blackfin fixes from Steven Miao:
"smc nor flash PM fix, pinctrl group fix, update defconfig, and build
  fixes"

* tag 'blackfin-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux:
  blackfin: vmlinux.lds.S: reserve 32 bytes space at the end of data section for XIP kernel
  defconfig: BF609: update spi config name
  irq: blackfin sec: drop duplicated sec priority set
  blackfin: bind different groups of one pinmux function to different state name
  blackfin: fix some bf5xx boards build for missing <linux/gpio.h>
  pm: bf609: cleanup smc nor flash

blackfin: vmlinux.lds.S: reserve 32 bytes space at the end of data section for XIP kernel

to collect some undefined section to the end of the data section and avoid section overlap

Signed-off-by: Steven Miao <realmz6@gmail.com>

defconfig: BF609: update spi config name

Signed-off-by: Steven Miao <realmz6@gmail.com>

irq: blackfin sec: drop duplicated sec priority set

Signed-off-by: Steven Miao <realmz6@gmail.com>

blackfin: bind different groups of one pinmux function to different state name

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Steven Miao <realmz6@gmail.com>

blackfin: fix some bf5xx boards build for missing <linux/gpio.h>

Signed-off-by: Steven Miao <realmz6@gmail.com>

pm: bf609: cleanup smc nor flash

drop smc pin state change code, pin state will be saved in pinctrl-adi2 driver
cleanup nor flash init/exit for pm suspend/resume

Signed-off-by: Steven Miao <realmz6@gmail.com>

Merge branch 'parisc-3.16-6' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
"We have two trivial patches in here.  One removes the SA_RESTORER
  #define since on parisc we don't have the sa_restorer field in struct
  sigaction, the other patch removes an unnecessary memset().

  The SA_RESTORER removal patch is scheduled for stable trees, since
  without it some userspace apps don't build"

* 'parisc-3.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Eliminate memset after alloc_bootmem_pages
  parisc: Remove SA_RESTORER define

Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
"These two pathes fix issues with the kernel-userspace protocol changes
  in v3.15"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT
  fuse: s_time_gran fix

can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource()

The raminit register is shared register for both can0 and can1. Since commit:

32766ff net: can: Convert to use devm_ioremap_resource

devm_ioremap_resource() is used to map raminit register. When using both
interfaces the mapping for the can1 interface fails, leading to a non
functional can interface.

Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.11
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

bnx2x: fix crash during TSO tunneling

When TSO packet is transmitted additional BD w/o mapping is used
to describe the packed. The BD needs special handling in tx
completion.

kernel: Call Trace:
kernel: <IRQ> [<ffffffff815e19ba>] dump_stack+0x19/0x1b
kernel: [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
kernel: [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
kernel: [<ffffffff814a8c0d>] ? find_iova+0x4d/0x90
kernel: [<ffffffff814ab0e2>] intel_unmap_page.part.36+0x142/0x160
kernel: [<ffffffff814ad0e6>] intel_unmap_page+0x26/0x30
kernel: [<ffffffffa01f55d7>] bnx2x_free_tx_pkt+0x157/0x2b0 [bnx2x]
kernel: [<ffffffffa01f8dac>] bnx2x_tx_int+0xac/0x220 [bnx2x]
kernel: [<ffffffff8101a0d9>] ? read_tsc+0x9/0x20
kernel: [<ffffffffa01f8fdb>] bnx2x_poll+0xbb/0x3c0 [bnx2x]
kernel: [<ffffffff814d041a>] net_rx_action+0x15a/0x250
kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
kernel: [<ffffffff815f3a5c>] call_softirq+0x1c/0x30
kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
kernel: [<ffffffff815f4358>] do_IRQ+0x58/0xf0
kernel: [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
kernel: <EOI> [<ffffffff810bbff7>] ? clockevents_notify+0x127/0x140
kernel: [<ffffffff814834df>] ? cpuidle_enter_state+0x4f/0xc0
kernel: [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
kernel: [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
kernel: [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
kernel: [<ffffffff815cfee1>] start_secondary+0x265/0x27b
kernel: ---[ end trace 11aa7726f18d7e80 ]---

Fixes: a848ade408b ("bnx2x: add CSUM and TSO support for encapsulation protocols")
Reported-by: Yulong Pei <ypei@redhat.com>
Cc: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: fix the checking of the usb speed

When the usb speed of the RTL8152 is not high speed, the USB_DEV_STAT[2:1]
should be equal to [0 1]. That is, the STAT_SPEED_FULL should be equal
to 2.

There is a easy way to check the usb speed by the speed field of the
struct usb_device. Use it to replace the original metheod.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Spotted-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'master-2014-07-23' of git://git./linux/kernel/git/linville/wireless

John W. Linville says:

====================
pull request: wireless 2014-07-24

Please pull this batch of fixes intended for the 3.16 stream...

For the mac80211 fixes, Johannes says:

"I have two fixes: one for tracing that fixes a long-standing NULL
pointer dereference, and one for a mac80211 issue that causes iwlmvm to
send invalid frames during authentication/association."

and,

"One more fix - for a bug in the newly introduced code that obtains rate
control information for stations."

For the iwlwifi fixes, Emmanuel says:

"It includes a merge damage fix. This region has been changed in -next
and -fixes quite a few times and apparently, I failed to handle it
properly, so here the fix.  Along with that I have a fix from Eliad
to properly handle overlapping BSS in AP mode."

On top of that, Felix provides and ath9k fix for Tx stalls that happen
after an aggregation session failure.

Please let me know if there are problems!  There are some changes
here that will cause merge conflicts in -next.  Once you merge this
I can pull it into wireless-next and resolve those issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mdio_module_unload'

Ezequiel Garcia says:

====================
net: phy: Prevent an MDIO bus from being unloaded while in use

Changes from v1:
  * Dropped the unneeded module_put() on phy_attach_direct().

The motivation of this small series is to fix the current lack of relationship
between an ethernet driver and the MDIO bus behind the PHY device. In such
cases, we would find no visible link between the drivers:

  $ lsmod
  Module                  Size  Used by    Not tainted
  mvmdio                  2941  0
  mvneta                 22069  0

Which means nothing prevents the MDIO driver from being removed:

  $ modprobe -r mvmdio

  # Unable to handle kernel NULL pointer dereference at virtual address 00000060
  pgd = c0004000
  [00000060] *pgd=00000000
  Internal error: Oops: 17 [#1] SMP ARM
  Modules linked in: mvneta [last unloaded: mvmdio]
  CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 3.16.0-rc5-01127-g62c0816-dirty #608
  Workqueue: events_power_efficient phy_state_machine
  task: df5ec840 ti: df67a000 task.ti: df67a000
  PC is at phy_state_machine+0x1c/0x468
  LR is at phy_state_machine+0x18/0x468
  [snip]

This patchset fixes this problem by adding a pair of module_{get,put},
so the module reference count is increased when the PHY device is attached
and is decreased when the PHY is detached.

Tested with mvneta and mv643xx_eth drivers, which depend on the mvmdio
driver to handle MDIO. This series applies on both net and net-next branches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Ensure the MDIO bus module is held

This commit adds proper module_{get,put} to prevent the MDIO bus module
from being unloaded while the phydev is connected. By doing so, we fix
a kernel panic produced when a MDIO driver is removed, but the phydev
that relies on it is attached and running.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Set the driver when registering an MDIO bus device

mdiobus_register() registers a device which is already bound to a driver.
Hence, the driver pointer should be set properly in order to track down
the driver associated to the MDIO bus.

This will be used to allow ethernet driver to pin down a MDIO bus driver,
preventing it from being unloaded while the PHY device is running.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix set_setting for some PHYs

Allow set_settings() to complete succesfully even if link is
not estabilished and port type isn't known yet.

Signed-off-by: Yaniv Rosner <Yaniv.Rosner@qlogic.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Via Simon Horman, I received the following one-liner for your net tree:

1) Fix crash when exiting from netns that uses IPVS and conntrack,
from Julian Anastasov via Simon Horman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

x86: Merge tag 'ras_urgent' into x86/urgent

Promote one fix for 3.16

This fix was necessary after

9c15a24b038f ("x86/mce: Improve mcheck_init_device() error handling")

went in. What this patch did was, among others, check the return value
of misc_register and exit early if it encountered an error. Original
code sloppily didn't do that.

However,

cef12ee52b05 ("xen/mce: Add mcelog support for Xen platform")

made it so that xen's init routine xen_late_init_mcelog runs first. This
was needed for the xen mcelog device which is supposed to be independent
from the baremetal one.

Initially it was reported that misc_register() fails often on xen and
that's why it needed fixing. However, it is *supposed* to fail by
design, when running in dom0 so that the xen mcelog device file gets
registered first.

And *then* you need the notifier *not* unregistered on the error path so
that the timer does get deleted properly in the CPU hotplug notifier.

Btw, this fix is needed also on baremetal in the unlikely event that
misc_register(&mce_chrdev_device) fails there too.

I was unsure whether to rush it in now and decided to delay it to 3.17.
However, xen people wanted it promoted as it breaks xen when doing cpu
hotplug there. So, after a bit of simmering in tip/master for initial
smoke testing, let's move it to 3.16. It fixes a semi-regression which
got introduced in 3.16 so no need for stable tagging.

tip/x86/ras contains that exact same commit but we can't remove it
there as it is not the last one. It won't cause any merge issues, as I
confirmed locally but I should state here the special situation of this
one fix explicitly anyway.

Thanks.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

drm/radeon: fix cut and paste issue for hawaii.

This is a halfway fix for hawaii acceleration. More fixes to come
but hopefully isolated to userspace.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-fixes-3.16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

two more radeon fixes.

* 'drm-fixes-3.16' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix irq ring buffer overflow handling
drm/radeon: fix error handling in radeon_vm_bo_set_addr

Merge tag 'drm-intel-fixes-2014-07-24' of git://anongit.freedesktop.org/drm-intel into drm-fixes

This time in time! Just 32bit-pae fix from Hugh, semaphores fun from Chris
and a fix for runtime pm cherry-picked from next.

Paulo is still working on a fix for runtime pm when X does cursor fun when
the display is off, but that one isn't ready yet.

* tag 'drm-intel-fixes-2014-07-24' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Simplify i915_gem_release_all_mmaps()
  drm/i915: fix freeze with blank screen booting highmem
  drm/i915: Reorder the semaphore deadlock check, again

parisc: Eliminate memset after alloc_bootmem_pages

alloc_bootmem and related function always return zeroed region of
memory. Thus a memset after calls to these functions is unnecessary.

The following Coccinelle semantic patch was used for making the change:

@@
expression E,E1;
@@

E = $alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages$(...)
... when != E
- memset(E,0,E1);

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Helge Deller <deller@gmx.de>

parisc: Remove SA_RESTORER define

The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation. However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>

hwmon: (smsc47m192) Fix temperature limit and vrm write operations

Temperature limit clamps are applied after converting the temperature
from milli-degrees C to degrees C, so either the clamp limit needs
to be specified in degrees C, not milli-degrees C, or clamping must
happen before converting to degrees C. Use the latter method to avoid
overflows.

vrm is an u8, so the written value needs to be limited to [0, 255].

Cc: Axel Lin <axel.lin@ingics.com>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>

Merge tag 'omap-for-v3.16/fixes-rc6' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

Merge "Two regression fixes for omaps and one fix for device
signaling" from Tony Lindgren:

- L2 cache regression fix for a warning about trying to access
  a read-only register

- GPMC ECC software fallback regression fix for omap3

- Fix for dra7 pinctrl pull-up direction that causes signal issues
  for anybody trying to use the internal pull up or down

* tag 'omap-for-v3.16/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable()
  pinctrl: dra: dt-bindings: Fix pull enable/disable
  ARM: OMAP2+: l2c: squelch warning dump on power control setting

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'renesas-fixes2-for-v3.16' of git://git./linux/kernel/git/horms/renesas into fixes

Merge "Second Round of Renesas ARM Based SoC Fixes for v3.16" from Simon Horman

* Fix SD2CKCR register address of r8a7791 (R-Car M2) SoC

  This corrects a bug introduced in v3.14 by
  59e79895b9589286 ("ARM: shmobile: r8a7791: Add clocks").

  However, it does not manifest in mainline code until
  SDHI devices were enabled on the Koelsch board in v3.15 by
  2c60a7df72711fb8 ("ARM: shmobile: Add SDHI devices for Koelsch DTS").

  It also manifests on the Henninger board when
  SDHI devices were enabled in v3.16-rc1 by
  1299df03d7191ab4 ("ARM: shmobile: henninger: add SDHI0/2 DT support")

* tag 'renesas-fixes2-for-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: r8a7791: Fix SD2CKCR register address

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

fs: umount on symlink leaks mnt count

Currently umount on symlink blocks following umount:

/vz is separate mount

# ls /vz/ -al | grep test
drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir
lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink -> /vz/testdir
# umount -l /vz/testlink
umount: /vz/testlink: not mounted (expected)

# lsof /vz
# umount /vz
umount: /vz: device is busy. (unexpected)

In this case mountpoint_last() gets an extra refcount on path->mnt

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>

direct-io: fix uninitialized warning in do_direct_IO()

The following warnings:

  fs/direct-io.c: In function ‘__blockdev_direct_IO’:
  fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  fs/direct-io.c:913:16: note: ‘to’ was declared here
  fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  fs/direct-io.c:913:10: note: ‘from’ was declared here

are false positive because dio_get_page() either fails, or sets both
'from' and 'to'.

Paul Bolle said ...
Maybe it's better to move initializing "to" and "from" out of
dio_get_page(). That _might_ make it easier for both the the reader and
the compiler to understand what's going on. Something like this:

Christoph Hellwig said ...
The fix of moving the code definitively looks nicer, while I think
uninitialized_var is horrible wart that won't get anywhere near my code.

Boaz Harrosh: I agree with Christoph and Paul

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2014-07-23

Just two fixes this time, both are stable candidates.

1) Fix the dst_entry refcount on socket policy usage.

2) Fix a wrong SPI check that prevents AH SAs from getting
installed, dependent on the SPI. From Tobias Brunner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfix from Bruce Fields:
"Another regression from the xdr encoding rewrite"

* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
NFSD: Fix crash encoding lock reply on 32-bit

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
"Fix arm64 regression introduced by limiting the CMA buffer to ZONE_DMA
on platforms where RAM starts above 4GB (and ZONE_DMA becoming 0)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Create non-empty ZONE_DMA when DRAM starts above 4GB

Merge tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux

Pull Xtensa fixes from Chris Zankel:
- resolve FIXMEs in double exception handler for window overflow. This
   fix makes native building of linux on xtensa host possible;
- fix sysmem region removal issue introduced in 3.15.

* tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux:
  xtensa: fix sysmem reservation at the end of existing block
  xtensa: add fixup for double exception raised in window overflow

Merge tag 'pinctrl-v3.16-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Here are three pin control fixes for the v3.16 series.  Sorry that
  some of these arrive late, the summer heat in Sweden makes me slow.

   - an IRQ handling fix for the STi driver, also for stable
   - another IRQ fix for the RCAR GPIO driver
   - a MAINTAINERS entry"

* tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  gpio: rcar: Add support for DT IRQ flags
  MAINTAINERS: Add entry for the Renesas pin controller driver
  pinctrl: st: Fix irqmux handler

Merge branch 'for-3.16-fixes' of git://git./linux/kernel/git/tj/libata

Pull libata regression fix from Tejun Heo:
"The last libata/for-3.16-fixes pull contained a regression introduced
  by 1871ee134b73 ("libata: support the ata host which implements a
  queue depth less than 32") which in turn was a fix for a regression
  introduced earlier while changing queue tag order to accomodate hard
  drives which perform poorly if tags are not allocated in circular
  order (ugh...).

  The regression happens only for SAS controllers making use of libata
  to serve ATA devices.  They don't fill an ata_host field which is used
  by the new tag allocation function leading to NULL dereference.

  This patch adds a new intermediate field ata_host->n_tags which is
  initialized for both SAS and !SAS cases to fix the issue"

* 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: introduce ata_host->n_tags to avoid oops on SAS controllers

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input layer fixes from Dmitry Torokhov:
"A few fixups for the input subsystem"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: document INPUT_PROP_TOPBUTTONPAD
  Input: fix defuzzing logic
  Input: sirfsoc-onkey - fix GPL v2 license string typo
  Input: st-keyscan - fix 'defined but not used' compiler warnings
  Input: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531)
  Input: i8042 - add Acer Aspire 5710 to nomux blacklist
  Input: ti_am335x_tsc - warn about incorrect spelling
  Input: wacom - cleanup multitouch code when touch_max is 2

Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull powerpc fixes from Ben Herrenschmidt:
"Here is a handful of powerpc fixes for 3.16.  They are all pretty
  simple and self contained and should still make this release"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: use _GLOBAL_TOC for memmove
  powerpc/pseries: dynamically added OF nodes need to call of_node_init
  powerpc: subpage_protect: Increase the array size to take care of 64TB
  powerpc: Fix bugs in emulate_step()
  powerpc: Disable doorbells on Power8 DD1.x

Merge tag 'urgent-slab-fix' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull slab fix from Mike Snitzer:
"This fixes the broken duplicate slab name check in
  kmem_cache_sanity_check() that has been repeatedly reported (as
  recently as today against Fedora rawhide).

  Pekka seemed to have it staged for a late 3.15-rc in his 'slab/urgent'
  branch but never sent a pull request, see:
      https://lkml.org/lkml/2014/5/23/648"

* tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  slab_common: fix the check for duplicate slab names

Merge branch 'akpm' (patches from Andrew Morton)

Merge fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: hugetlb: fix copy_hugetlb_page_range()
  simple_xattr: permit 0-size extended attributes
  mm/fs: fix pessimization in hole-punching pagecache
  shmem: fix splicing from a hole while it's punched
  shmem: fix faulting into a hole, not taking i_mutex
  mm: do not call do_fault_around for non-linear fault
  sh: also try passing -m4-nofpu for SH2A builds
  zram: avoid lockdep splat by revalidate_disk
  mm/rmap.c: fix pgoff calculation to handle hugepage correctly
  coredump: fix the setting of PF_DUMPCORE

mm: hugetlb: fix copy_hugetlb_page_range()

Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.

The test program for the problem is shown below:

  $ cat heap.c
  #include <unistd.h>
  #include <stdlib.h>
  #include <string.h>

  #define HPS 0x200000

  int main() {
   int i;
   char *p = malloc(HPS);
   memset(p, '1', HPS);
   for (i = 0; i < 5; i++) {
   if (!fork()) {
   memset(p, '2', HPS);
   p = malloc(HPS);
   memset(p, '3', HPS);
   free(p);
   return 0;
   }
   }
   sleep(1);
   free(p);
   return 0;
  }

  $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap

Fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so is applicable to -stable kernels which
include it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Guillaume Morin <guillaume@morinfr.org>
Suggested-by: Guillaume Morin <guillaume@morinfr.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

simple_xattr: permit 0-size extended attributes

If a filesystem uses simple_xattr to support user extended attributes,
LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate
memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes
values of zero size. Fix that off-by-one (but apparently no filesystem
needs them yet).

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/fs: fix pessimization in hole-punching pagecache

I wanted to revert my v3.1 commit d0823576bf4b ("mm: pincer in
truncate_inode_pages_range"), to keep truncate_inode_pages_range() in
synch with shmem_undo_range(); but have stepped back - a change to
hole-punching in truncate_inode_pages_range() is a change to
hole-punching in every filesystem (except tmpfs) that supports it.

If there's a logical proof why no filesystem can depend for its own
correctness on the pincer guarantee in truncate_inode_pages_range() - an
instant when the entire hole is removed from pagecache - then let's
revisit later.  But the evidence is that only tmpfs suffered from the
livelock, and we have no intention of extending hole-punch to ramfs.  So
for now just add a few comments (to match or differ from those in
shmem_undo_range()), and fix one silliness noticed in d0823576bf4b...

Its "index == start" addition to the hole-punch termination test was
incomplete: it opened a way for the end condition to be missed, and the
loop go on looking through the radix_tree, all the way to end of file.
Fix that pessimization by resetting index when detected in inner loop.

Note that it's actually hard to hit this case, without the obsessive
concurrent faulting that trinity does: normally all pages are removed in
the initial trylock_page() pass, and this loop finds nothing to do.  I
had to "#if 0" out the initial pass to reproduce bug and test fix.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shmem: fix splicing from a hole while it's punched

shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.

But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.

shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.

shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.

We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?

The origin of this part of the problem is my v3.1 commit d0823576bf4b
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c.  It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.

Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.

Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.

But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shmem: fix faulting into a hole, not taking i_mutex

Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).

We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.

So return to the original umbrella approach, but keep away from i_mutex
this time. We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.

This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.

This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7d7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: do not call do_fault_around for non-linear fault

Ingo Korb reported that "repeated mapping of the same file on tmpfs
using remap_file_pages sometimes triggers a BUG at mm/filemap.c:202 when
the process exits".

He bisected the bug to d7c1755179b8 ("mm: implement ->map_pages for
shmem/tmpfs"), although the bug was actually added by commit
8c6e50b0290c ("mm: introduce vm_ops->map_pages()").

The problem is caused by calling do_fault_around for a _non-linear_
fault. In this case pgoff is shifted and might become negative during
calculation.

Faulting around non-linear page-fault makes no sense and breaks the
logic in do_fault_around because pgoff is shifted.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Ingo Korb <ingo.korb@tu-dortmund.de>
Tested-by: Ingo Korb <ingo.korb@tu-dortmund.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Ning Qu <quning@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sh: also try passing -m4-nofpu for SH2A builds

When compiling a SH2A kernel (e.g.  se7206_defconfig or rsk7203_defconfig)
using sh4-linux-gcc, linking fails with:

  net/built-in.o: In function `__sk_run_filter':
  net/core/filter.c:566: undefined reference to `__fpscr_values'
  net/core/filter.c:269: undefined reference to `__fpscr_values'
  ...
  net/built-in.o:net/core/filter.c:580: more undefined references to `__fpscr_values' follow

This happens because sh4-linux-gcc doesn't support the "-m2a-nofpu",
which is thus filtered out by "$(call cc-option, ...)".

As compiling using sh4-linux-gcc is useful for compile coverage, also
try passing "-m4-nofpu" (which is presumably filtered out when using a
real sh2a-linux toolchain) to disable the generation of FPU instructions
and references to __fpscr_values[].

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

zram: avoid lockdep splat by revalidate_disk

Sasha reported lockdep warning [1] introduced by [2].

It could be fixed by doing disk revalidation out of the init_lock. It's
okay because disk capacity change is protected by init_lock so that
revalidate_disk always sees up-to-date value so there is no race.

[1] https://lkml.org/lkml/2014/7/3/735
[2] zram: revalidate disk after capacity change

Fixes 2e32baea46ce ("zram: revalidate disk after capacity change").

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Alexander E. Patrakov" <patrakov@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/rmap.c: fix pgoff calculation to handle hugepage correctly

I triggered VM_BUG_ON() in vma_address() when I tried to migrate an
anonymous hugepage with mbind() in the kernel v3.16-rc3. This is
because pgoff's calculation in rmap_walk_anon() fails to consider
compound_order() only to have an incorrect value.

This patch introduces page_to_pgoff(), which gets the page's offset in
PAGE_CACHE_SIZE.

Kirill pointed out that page cache tree should natively handle
hugepages, and in order to make hugetlbfs fit it, page->index of
hugetlbfs page should be in PAGE_CACHE_SIZE. This is beyond this patch,
but page_to_pgoff() contains the point to be fixed in a single function.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: fix the setting of PF_DUMPCORE

Commit 079148b919d0 ("coredump: factor out the setting of PF_DUMPCORE")
cleaned up the setting of PF_DUMPCORE by removing it from all the
linux_binfmt->core_dump() and moving it to zap_threads().But this ended
up clearing all the previously set flags.  This causes issues during
core generation when tsk->flags is checked again (eg.  for PF_USED_MATH
to dump floating point registers).  Fix this.

Signed-off-by: Silesh C V <svellattu@mvista.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hyperv: Fix error return code in netvsc_init_buf()

Fix to return -ENOMEM from the kalloc error handling
case instead of 0.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix error return code in xgbe_probe()

Fix to return a negative error code from the setting real tx queue
count error handling case instead of 0.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>