review.tizen.org Git - profile/ivi/kernel-x86-ivi.git/log

KVM: nVMX: Unconditionally uninit the MMU on nested vmexit

Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway
in case nested EPT wasn't in use. 2. this aligns VMX with SVM. But 3. is
most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if
one guest VCPU manipulates the page of another VCPU in L2, we may be
fooled to skip over the nested_ept_uninit_mmu_context, leaving mmu in
nested state. That can crash the host later on if nested_ept_get_cr3 is
invoked while L1 already left vmxon and nested.current_vmcs12 became
NULL therefore.

Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Fix APIC map calculation after re-enabling

Update arch.apic_base before triggering recalculate_apic_map. Otherwise
the recalculation will work against the previous state of the APIC and
will fail to build the correct map when an APIC is hardware-enabled
again.

This fixes a regression of 1e08ec4a13.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"A collection of bug fixes destined for stable and some printk cleanups
  and a patch so that instead of BUG'ing we use the ext4_error()
  framework to mark the file system is corrupted"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: add explicit casts when masking cluster sizes
  ext4: fix deadlock when writing in ENOSPC conditions
  jbd2: rename obsoleted msg JBD->JBD2
  jbd2: revise KERN_EMERG error messages
  jbd2: don't BUG but return ENOSPC if a handle runs out of space
  ext4: Do not reserve clusters when fs doesn't support extents
  ext4: fix del_timer() misuse for ->s_err_report
  ext4: check for overlapping extents in ext4_valid_extent_entries()
  ext4: fix use-after-free in ext4_mb_new_blocks
  ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
- fix for a memory leak on certain unplug events
- a collection of bcache fixes from Kent and Nicolas
- a few null_blk fixes and updates form Matias
- a marking of static of functions in the stec pci-e driver

* 'for-linus' of git://git.kernel.dk/linux-block:
  null_blk: support submit_queues on use_per_node_hctx
  null_blk: set use_per_node_hctx param to false
  null_blk: corrections to documentation
  null_blk: warning on ignored submit_queues param
  null_blk: refactor init and init errors code paths
  null_blk: documentation
  null_blk: mem garbage on NUMA systems during init
  drivers: block: Mark the functions as static in skd_main.c
  bcache: New writeback PD controller
  bcache: bugfix for race between moving_gc and bucket_invalidate
  bcache: fix for gc and writeback race
  bcache: bugfix - moving_gc now moves only correct buckets
  bcache: fix for gc crashing when no sectors are used
  bcache: Fix heap_peek() macro
  bcache: Fix for can_attach_cache()
  bcache: Fix dirty_data accounting
  bcache: Use uninterruptible sleep in writeback
  bcache: kthread don't set writeback task to INTERUPTIBLE
  block: fix memory leaks on unplugging block device
  bcache: fix sparse non static symbol warning

Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"Two fixes.  One fixes a bug in the error path of cgroup_create().  The
  other changes cgrp->id lifetime rule so that the id doesn't get
  recycled before all controller states are destroyed.  This premature
  id recycling made memcg malfunction"

* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: don't recycle cgroup id until all csses' have been destroyed
  cgroup: fix cgroup_create() error handling path

Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/percpu

Pull percpu fix from Tejun Heo:
"A single commit to fix a spurious sparse warning coming from
  DEFINE_PER_CPU()'s hack to support the use of weak symbols.  Shouldn't
  cause observable behavior change"

* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: fix spurious sparse warnings from DEFINE_PER_CPU()

Merge branch 'for-3.13-fixes' of git://git./linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
"There's one interseting commit - "libata, freezer: avoid block device
  removal while system is frozen".  It's an ugly hack working around a
  deadlock condition between driver core resume and block layer device
  removal paths through freezer which was made more reproducible by
  writeback being converted to workqueue some releases ago.  The bug has
  nothing to do with libata but it's just an workaround which is easy to
  backport.  After discussion, Rafael and I seem to agree that we don't
  really need kernel freezables - both kthread and workqueue.  There are
  few specific workqueues which constitute PM operations and require
  freezing, which will be converted to use workqueue_set_max_active()
  instead.  All other kernel freezer uses are planned to be removed,
  followed by the removal of kthread and workqueue freezer support,
  hopefully.

  Others are device-specific fixes.  The most notable is the addition of
  NO_NCQ_TRIM which is used to disable queued TRIM commands to Micro
  M500 SSDs which otherwise suffers data corruption"

* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata, freezer: avoid block device removal while system is frozen
  libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs
  libata: disable a disk via libata.force params
  ahci: bail out on ICH6 before using AHCI BAR
  ahci: imx: Explicitly clear IMX6Q_GPR13_SATA_MPLL_CLK_EN
  libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8

auxvec.h: account for AT_HWCAP2 in AT_VECTOR_SIZE_BASE

Commit 2171364d1a92 ("powerpc: Add HWCAP2 aux entry") introduced a new
AT_ auxv entry type AT_HWCAP2 but failed to update AT_VECTOR_SIZE_BASE
accordingly.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 2171364d1a92 (powerpc: Add HWCAP2 aux entry)
Cc: stable@vger.kernel.org
Acked-by: Michael Neuling <michael@neuling.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull SELinux fixes from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: selinux_setprocattr()->ptrace_parent() needs rcu_read_lock()
selinux: fix broken peer recv check

Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs

Pull ext2 fix from Jan Kara:
"One simple fix of oops in ext2 which was recently hit by Christoph"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix oops in ext2_get_block() called from ext2_quota_write()

Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband

Pull infiniband fixes from Roland Dreier:
"Last batch of InfiniBand/RDMA changes for 3.13 / 2014:
   - Additional checks for uverbs to ensure forward compatibility,
     handle malformed input better.
   - Fix potential use-after-free in iWARP connection manager.
   - Make a function static"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/uverbs: Check access to userspace response buffer in extended command
  IB/uverbs: Check input length in flow steering uverbs
  IB/uverbs: Set error code when fail to consume all flow_spec items
  IB/uverbs: Check reserved fields in create_flow
  IB/uverbs: Check comp_mask in destroy_flow
  IB/uverbs: Check reserved field in extended command header
  IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()
  IB/core: const'ify inbuf in struct ib_udata
  RDMA/iwcm: Don't touch cm_id after deref in rem_ref
  RDMA/cxgb4: Make _c4iw_write_mem_dma() static

selinux: selinux_setprocattr()->ptrace_parent() needs rcu_read_lock()

selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().

And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.

Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

selinux: fix broken peer recv check

Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.

Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Xmas fixes pull, all small nothing major, intel, radeon, one ttm
  regression, and one build fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/ttm: Fix swapin regression
  gpu: fix qxl missing crc32_le
  drm/radeon: fix asic gfx values for scrapper asics
  drm/i915: Use the correct GMCH_CTRL register for Sandybridge+
  drm/radeon: check for 0 count in speaker allocation and SAD code
  drm/radeon/dpm: disable ss on Cayman
  drm/radeon/dce6: set correct number of audio pins
  drm/i915: get a PC8 reference when enabling the power well
  drm/i915: change CRTC assertion on LCPLL disable
  drm/i915: Fix erroneous dereference of batch_obj inside reset_status
  drm/i915: Prevent double unref following alloc failure during execbuffer

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/rusty/linux

Pull virtio balloon driver fixes from Rusty Russell:
"Refactoring broke the balloon driver, and fixing kallsyms on ARM broke
  some (non-ARM) MMUless setups, so we're making that fix ARM-only for
  now.

  Unfortunately, the ARM refactoring which broke kallsyms/perf was
  CC:stable, so the fix (which broken non-ARM) was also CC:stable, so
  now the partial reversion is also CC:stable..."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  scripts/link-vmlinux.sh: only filter kernel symbols for arm
  virtio_balloon: update_balloon_size(): update correct field

Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus

Merge tag 'drm-intel-fixes-2013-12-18' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes

Besides the 2 fixes for tricky corner cases in gem from Chris I've
promised already two patche from Paulo to fix pc8 warnings (both ported
from -next, bug report from Dave Jones) and one patch from to fix vga
enable/disable on snb+. That one is a really old bug, but apparently it
can cause machine hangs if you try hard enough with vgacon/efifb handover.

* tag 'drm-intel-fixes-2013-12-18' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Use the correct GMCH_CTRL register for Sandybridge+
  drm/i915: get a PC8 reference when enabling the power well
  drm/i915: change CRTC assertion on LCPLL disable
  drm/i915: Fix erroneous dereference of batch_obj inside reset_status
  drm/i915: Prevent double unref following alloc failure during execbuffer

Merge branch 'drm-fixes-3.13' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

- fix for a long standing corruption bug on some Trinity/Richland parts.
- Stability fix for cayman dpm
- audio fixes for dce6+

* 'drm-fixes-3.13' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: fix asic gfx values for scrapper asics
  drm/radeon: check for 0 count in speaker allocation and SAD code
  drm/radeon/dpm: disable ss on Cayman
  drm/radeon/dce6: set correct number of audio pins

drm/ttm: Fix swapin regression

Commit "drm/ttm: Don't move non-existing data" didn't take the
swapped-out corner case into account. This patch corrects that.
Fixes blank screen after attempted suspend / hibernate on vmwgfx.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

gpu: fix qxl missing crc32_le

Fix build error: qxl uses crc32 functions so it needs to select
CRC32.

Also use angle quotes around a kernel header file name.

drivers/built-in.o: In function `qxl_display_read_client_monitors_config':
(.text+0x19d754): undefined reference to `crc32_le'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Linux 3.13-rc5

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Much smaller batch of fixes this week.

  Biggest one is a revert of an OMAP display change that removed some
  non-DT pinmux code that was still needed for 3.13 to get DSI displays
  to work.

  There's also a fix that resolves some misdescribed GPIO controller
  resources on shmobile.  The rest are mostly smaller fixes, a couple of
  MAINTAINERS updates, etc"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  Revert "ARM: OMAP2+: Remove legacy mux code for display.c"
  MAINTAINERS: Add keystone clock drivers
  MAINTAINERS: Add keystone git tree information
  ARM: s3c64xx: dt: Fix boot failure due to double clock initialization
  ARM: shmobile: r8a7790: Fix GPIO resources in DTS
  irqchip: renesas-intc-irqpin: Fix register bitfield shift calculation
  ARM: shmobile: lager: phy fixup needs CONFIG_PHYLIB

Merge tag 'firewire-fix' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire fixlet from Stefan Richter:
"A one-liner to reenable WRITE SAME over SBP-2 like in v3.8...v3.12.
  Buggy targets which could malfunction when being subjected to this
  command are already sufficiently protected by a scsi_level check in sd
  + SCSI core"

* tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: sbp2: bring back WRITE SAME support

Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"Mostly minor items this time around, the most notable being a FILEIO
  backend change to enforce hw_max_sectors based upon the current
  block_size to address a bug where large sized I/Os (> 1M) where being
  rejected"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  target: Remove extra percpu_ref_init
  target/file: Update hw_max_sectors based on current block_size
  iser-target: Move INIT_WORK setup into isert_create_device_ib_res
  iscsi-target: Fix incorrect np->np_thread NULL assignment
  qla2xxx: Fix schedule_delayed_work() for target timeout calculations
  iser-target: fix error return code in isert_create_device_ib_res()
  iscsi-target: Fix-up all zero data-length CDBs with R/W_BIT set
  target: Remove write-only stats fields and lock from struct se_node_acl
  iscsi-target: return -EINVAL on oversized configfs parameter

Merge git://git.kvack.org/~bcrl/aio-next

Pull AIO leak fixes from Ben LaHaise:
"I've put these two patches plus Linus's change through a round of
  tests, and it passes millions of iterations of the aio numa
  migratepage test, as well as a number of repetitions of a few simple
  read and write tests.

  The first patch fixes the memory leak Kent introduced, while the
  second patch makes aio_migratepage() much more paranoid and robust"

* git://git.kvack.org/~bcrl/aio-next:
  aio/migratepages: make aio migrate pages sane
  aio: fix kioctx leak introduced by "aio: Fix a trinity splat"

aio: clean up and fix aio_setup_ring page mapping

Since commit 36bc08cc01709 ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.

However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them. And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.

Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).

So clean it all up, making it much more straightforward. Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit d6c355c7dabc ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.

Tested-and-acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aio/migratepages: make aio migrate pages sane

The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>

aio: fix kioctx leak introduced by "aio: Fix a trinity splat"

e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference
counting to correct a bug trinity found. Unfortunately, the change lead
to kioctxes being leaked because there was no final reference count to
put. Add that reference count back in to fix things.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org

null_blk: support submit_queues on use_per_node_hctx

In the case of both the submit_queues param and use_per_node_hctx param
are used. We limit the number af submit_queues to the number of online
nodes.

If the submit_queues is a multiple of nr_online_nodes, its trivial. Simply map
them to the nodes. For example: 8 submit queues are mapped as node0[0,1],
node1[2,3], ...
If uneven, we are left with an uneven number of submit_queues that must be
mapped. These are mapped toward the first node and onward. E.g. 5
submit queues mapped onto 4 nodes are mapped as node0[0,1], node1[2], ...

Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: set use_per_node_hctx param to false

The defaults for the module is to instantiate itself with blk-mq and a
submit queue for each CPU node in the system.

To save resources, initialize instead with a single submit queue.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: corrections to documentation

Randy Dunlap reported a couple of grammar errors and unfortunate usages of
socket/node/core.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Don't set the INITRD_COMPRESS environment variable automatically

Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression
config option") started setting the INITRD_COMPRESS environment variable
depending on which decompression models the kernel had available.

That is completely broken.

For example, we by default have CONFIG_RD_LZ4 enabled, and are able to
decompress such an initrd, but the user tools to *create* such an initrd
may not be availble. So trying to tell dracut to generate an
lz4-compressed image just because we can decode such an image is
completely inappropriate.

Cc: J P <ppandit@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
"This contains fixes for some asserts
   related to project quotas, a memory leak, a hang when disabling group or
   project quotas before disabling user quotas, Dave's email address, several
   fixes for the alignment of file allocation to stripe unit/width geometry, a
   fix for an assertion with xfs_zero_remaining_bytes, and the behavior of
   metadata writeback in the face of IO errors.

   Details:
   - fix memory leak in xfs_dir2_node_removename
   - fix quota assertion in xfs_setattr_size
   - fix quota assertions in xfs_qm_vop_create_dqattach
   - fix for hang when disabling group and project quotas before
     disabling user quotas
   - fix Dave Chinner's email address in MAINTAINERS
   - fix for file allocation alignment
   - fix for assertion in xfs_buf_stale by removing xfsbdstrat
   - fix for alignment with swalloc mount option
   - fix for "retry forever" semantics on IO errors"

* tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs:
  xfs: abort metadata writeback on permanent errors
  xfs: swalloc doesn't align allocations properly
  xfs: remove xfsbdstrat error
  xfs: align initial file allocations correctly
  MAINTAINERS: fix incorrect mail address of XFS maintainer
  xfs: fix infinite loop by detaching the group/project hints from user dquot
  xfs: fix assertion failure at xfs_setattr_nonsize
  xfs: fix false assertion at xfs_qm_vop_create_dqattach
  xfs: fix memory leak in xfs_dir2_node_removename

mm: fix build of split ptlock code

Commit 597d795a2a78 ('mm: do not allocate page->ptl dynamically, if
spinlock_t fits to long') restructures some allocators that are compiled
even if USE_SPLIT_PTLOCKS arn't used.  It results in compilation
failure:

  mm/memory.c:4282:6: error: 'struct page' has no member named 'ptl'
  mm/memory.c:4288:12: error: 'struct page' has no member named 'ptl'

Add in the missing ifdef.

Fixes: 597d795a2a78 ('mm: do not allocate page->ptl dynamically, if spinlock_t fits to long')
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'arc-fixes-for-3.13-rc5' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fix from Vineet Gupta:
"Fix busted syscall table due to unistd header inclusion issue"

* tag 'arc-fixes-for-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Allow conditional multiple inclusion of uapi/asm/unistd.h

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 ptrace fix from Catalin Marinas.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ptrace: avoid using HW_BREAKPOINT_EMPTY for disabled events

pstore: Don't allow high traffic options on fragile devices

Some pstore backing devices use on board flash as persistent
storage. These have limited numbers of write cycles so it
is a poor idea to use them from high frequency operations.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'dmaengine-fixes-3.13-rc4' of git://git./linux/kernel/git/djbw/dmaengine

Pull dmaengine fixes from Dan Williams:

- deprecation of net_dma to be removed in 3.14

- crash regression fix in pl330 from the dmaengine_unmap rework

- crash regression fix for any channel running raid ops without
   CONFIG_ASYNC_TX_DMA from dmaengine_unmap

- memory leak regression in mv_xor from dmaengine_unmap

- build warning regressions in mv_xor, fsldma, ppc4xx, txx9, and
   at_hdmac from dmaengine_unmap

- sleep in atomic regression in dma_async_memcpy_pg_to_pg

- new fix in mv_xor for handling channel initialization failures

* tag 'dmaengine-fixes-3.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
  net_dma: mark broken
  dma: pl330: ensure DMA descriptors are zero-initialised
  dmaengine: fix sleep in atomic
  dmaengine: mv_xor: fix oops when channels fail to initialise
  dma: mv_xor: Use dmaengine_unmap_data for the self-tests
  dmaengine: fix enable for high order unmap pools
  dma: fix build warnings in txx9
  dmatest: fix build warning on mips
  dma: fix fsldma build warnings
  dma: fix build warnings in ppc4xx
  dmaengine: at_hdmac: remove unused function
  dma: mv_xor: remove mv_desc_get_dest_addr()

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"The PPC folks had a large amount of changes queued for 3.13, and now
  they are fixing the bugs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Don't drop low-order page address bits
  powerpc: book3s: kvm: Don't abuse host r2 in exit path
  powerpc/kvm/booke: Fix build break due to stack frame size warning
  KVM: PPC: Book3S: PR: Enable interrupts earlier
  KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
  KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
  KVM: PPC: Book3S: PR: Don't clobber our exit handler id
  powerpc: kvm: fix rare but potential deadlock scene
  KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
  KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
  KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
  KVM: PPC: Book3S HV: Fix physical address calculations

mm: do not allocate page->ptl dynamically, if spinlock_t fits to long

In struct page we have enough space to fit long-size page->ptl there,
but we use dynamically-allocated page->ptl if size(spinlock_t) is larger
than sizeof(int).

It hurts 64-bit architectures with CONFIG_GENERIC_LOCKBREAK, where
sizeof(spinlock_t) == 8, but it easily fits into struct page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page_alloc: revert NUMA aspect of fair allocation policy

Commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") meant
to bring aging fairness among zones in system, but it was overzealous
and badly regressed basic workloads on NUMA systems.

Due to the way kswapd and page allocator interacts, we still want to
make sure that all zones in any given node are used equally for all
allocations to maximize memory utilization and prevent thrashing on the
highest zone in the node.

While the same principle applies to NUMA nodes - memory utilization is
obviously improved by spreading allocations throughout all nodes -
remote references can be costly and so many workloads prefer locality
over memory utilization. The original change assumed that
zone_reclaim_mode would be a good enough predictor for that, but it
turned out to be as indicative as a coin flip.

Revert the NUMA aspect of the fairness until we can find a proper way to
make it configurable and agree on a sane default.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@kernel.org> # 3.12
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revert "mm: page_alloc: exclude unreclaimable allocations from zone fairness policy"

This reverts commit 73f038b863df. The NUMA behaviour of this patch is
less than ideal. An alternative approch is to interleave allocations
only within local zones which is implemented in the next patch.

Cc: stable@vger.kernel.org
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support

Sasha Levin found a NULL pointer dereference that is due to a missing
page table lock, which in turn is due to the pmd entry in question being
a transparent huge-table entry.

The code - introduced in commit 1998cc048901 ("mm: make
madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks
for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it
turns out that that function doesn't work correctly.

pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would
trigger if the transparent hugepage bit was set, but it doesn't do that
if pmd_numa() is also set. Note that the NUMA bit only gets set on real
NUMA machines, so people trying to reproduce this on most normal
development systems would never actually trigger this.

Fix it by removing the very subtle (and subtly incorrect) expectation,
and instead just checking pmd_trans_huge() explicitly.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
[ Additionally remove the now stale test for pmd_trans_huge() inside the
pmd_bad() case - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'renesas-fixes-for-v3.13' of git://git./linux/kernel/git/horms/renesas into fixes

From Simon Horman:
Renesas ARM based SoC fixes for v3.13

* r8a7790 (R-Car H1) SoC
  - Correct GPIO resources in DT.

    This problem has been present since GPIOs were added to the r8a7790 SoC
    by f98e10c88aa95bf7 ("ARM: shmobile: r8a7790: Add GPIO controller
    devices to device tree") in v3.12-rc1.

* irqchip renesas-intc-irqpin
  - Correct register bitfield shift calculation

    This bug has been present since the renesas-intc-irqpin driver was
    introduced by 443580486e3b9657 ("irqchip: Renesas INTC External IRQ pin
    driver") in v3.10-rc1

* Lager board
  - Do not build the phy fixup unless CONFIG_PHYLIB is enabled

    This problem was introduced by 48c8b96f21817aad

* tag 'renesas-fixes-for-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: r8a7790: Fix GPIO resources in DTS
  irqchip: renesas-intc-irqpin: Fix register bitfield shift calculation
  ARM: shmobile: lager: phy fixup needs CONFIG_PHYLIB

Signed-off-by: Kevin Hilman <khilman@linaro.org>

IB/uverbs: Check access to userspace response buffer in extended command

This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...)
to ensure the whole buffer is in userspace memory before using the
pointer in uverbs functions.  If the buffer or a subset of it is not
valid, returns -EFAULT to the caller.

This will also catch invalid buffer before the final call to
copy_to_user() which happen late in most uverb functions.

Just like the check in read(2) syscall, it's a sanity check to detect
invalid parameters provided by userspace. This particular check was added
in vfs_read() by Linus Torvalds for v2.6.12 with following commit message:

https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fd770e66c9a65b14ce114e171266cf6f393df502

  Make read/write always do the full "access_ok()" tests.

  The actual user copy will do them too, but only for the
  range that ends up being actually copied. That hides
  bugs when the range has been clamped by file size or other
  issues.

Note: there's no need to check input buffer since vfs_write() already does
access_ok(VERIFY_READ, ...) as part of write() syscall.

Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/uverbs: Check input length in flow steering uverbs

Since ib_copy_from_udata() doesn't check yet the available input data
length before accessing userspace memory, an explicit check of this
length is required to prevent:

- reading past the user provided buffer,
- underflow when subtracting the expected command size from the input
length.

This will ensure the newly added flow steering uverbs don't try to
process truncated commands.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/uverbs: Set error code when fail to consume all flow_spec items

If the flow_spec items parsed count does not match the number of items
declared in the flow_attr command, or if not all bytes are used for
flow_spec items (eg. trailing garbage), a log message is reported and
the function leave through the error path. Unfortunately the error
code is currently not set.

This patch set error code to -EINVAL in such cases, so that the error
is reported to userspace instead of silently fail.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/uverbs: Check reserved fields in create_flow

As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc9173 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/uverbs: Check comp_mask in destroy_flow

Just like the check added to create_flow in 22878dbc9173 ("IB/core:
Better checking of userspace values for receive flow steering"),
comp_mask must be checked in destroy_flow too.

Since only empty comp_mask is currently supported, any other value
must be rejected.

This check was silently added in a previous patch[1] to move comp_mask
in extended command header, part of previous patchset[2] against
create/destroy_flow uverbs. The idea of moving comp_mask to the header
was discarded for the final patchset[3].

Unfortunately the check added in destroy_flow uverb was not integrated
in the final patchset.

[1] http://marc.info/?i=40175eda10d670d098204da6aa4c327a0171ae5f.1381510045.git.ydroneaud@opteya.com
[2] http://marc.info/?i=cover.1381510045.git.ydroneaud@opteya.com
[3] http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

Cc: Matan Barak <matanb@mellanox.com>
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/uverbs: Check reserved field in extended command header

As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc9173 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()

Trying to have a ternary operator to choose between NULL (or 0) and the
real pointer value in invocations leads to an impossible choice between
a sparse error about a literal 0 used as a NULL pointer, and a gcc
warning about "pointer/integer type mismatch in conditional expression."

Rather than clutter the source with more casts, move the ternary
operator into a new INIT_UDATA_BUF_OR_NULL() macro, which makes it
easier to use and simplifies its callers.

Reported-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

Merge tag 'signed-for-3.13' of git://github.com/agraf/linux-2.6 into kvm-master

Patch queue for 3.13 - 2013-12-18

This fixes some grave issues we've only found after 3.13-rc1:

  - Make the modularized HV/PR book3s kvm work well as modules
  - Fix some race conditions
  - Fix compilation with certain compilers (booke)
  - Fix THP for book3s_hv
  - Fix preemption for book3s_pr

Alexander Graf (4):
      KVM: PPC: Book3S: PR: Don't clobber our exit handler id
      KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
      KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
      KVM: PPC: Book3S: PR: Enable interrupts earlier

Aneesh Kumar K.V (1):
      powerpc: book3s: kvm: Don't abuse host r2 in exit path

Paul Mackerras (5):
      KVM: PPC: Book3S HV: Fix physical address calculations
      KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
      KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
      KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
      KVM: PPC: Book3S HV: Don't drop low-order page address bits

Scott Wood (1):
      powerpc/kvm/booke: Fix build break due to stack frame size warning

pingfan liu (1):
      powerpc: kvm: fix rare but potential deadlock scene

Merge tag 'stable/for-linus-3.13-rc4-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen bugfixes from Konrad Rzeszutek Wilk:
- Fix balloon driver for auto-translate guests (PVHVM, ARM) to not use
   scratch pages.
- Fix block API header for ARM32 and ARM64 to have proper layout
- On ARM when mapping guests, stick on PTE_SPECIAL
- When using SWIOTLB under ARM, don't call swiotlb functions twice
- When unmapping guests memory and if we fail, don't return pages which
   failed to be unmapped.
- Grant driver was using the wrong address on ARM.

* tag 'stable/for-linus-3.13-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: Seperate the auto-translate logic properly (v2)
  xen/block: Correctly define structures in public headers on ARM32 and ARM64
  arm: xen: foreign mapping PTEs are special.
  xen/arm64: do not call the swiotlb functions twice
  xen: privcmd: do not return pages which we have failed to unmap
  XEN: Grant table address, xen_hvm_resume_frames, is a phys_addr not a pfn

Merge tag 'trace-fixes-v3.13-rc2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
"This fixes a long standing bug in the ftrace profiler.  The problem is
  that the profiler only initializes the online CPUs, and not possible
  CPUs.  This causes issues if the user takes CPUs online or offline
  while the profiler is running.

  If we online a CPU after starting the profiler, we lose all the trace
  information on the CPU going online.

  If we offline a CPU after running a test and start a new test, it will
  not clear the old data from that CPU.

  This bug causes incorrect data to be reported to the user if they
  online or offline CPUs during the profiling"

* tag 'trace-fixes-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Initialize the ftrace profiler for each possible cpu

Merge tag 'omap-for-v3.13/display-fix' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

I accidentally removed some mux code for omap4 that I thought was
dead code as omap4 has been booting with device tree only since
v3.10. Turns out I also removed some display related mux code,
so let's revert that except for the dead code parts.

* tag 'omap-for-v3.13/display-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (439 commits)
Revert "ARM: OMAP2+: Remove legacy mux code for display.c"
+Linux 3.13-rc4

ext4: add explicit casts when masking cluster sizes

The missing casts can cause the high 64-bits of the physical blocks to
be lost. Set up new macros which allows us to make sure the right
thing happen, even if at some point we end up supporting larger
logical block numbers.

Thanks to the Emese Revfy and the PaX security team for reporting this
issue.

Reported-by: PaX Team <pageexec@freemail.hu>
Reported-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

drm/radeon: fix asic gfx values for scrapper asics

Fixes gfx corruption on certain TN/RL parts.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=60389

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

Merge tag 'keystone/maintainer-file' of git://git./linux/kernel/git/ssantosh/linux-keystone into fixes

From Santosh Shilimkar:
Couple of updates to MAINTAINERS file for Keystone

- Add git tree information
- Add clock drivers entry

* tag 'keystone/maintainer-file' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone:
MAINTAINERS: Add keystone clock drivers
MAINTAINERS: Add keystone git tree information

Signed-off-by: Kevin Hilman <khilman@linaro.org>

qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure

This patch fixes a possible scsi_host reference leak in qlt_lport_register(),
when a non zero return from the passed (*callback) does not call drop the
local reference via scsi_host_put() before returning.

This currently does not effect existing tcm_qla2xxx code as the passed callback
will never fail, but fix this up regardless for future code.

Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Remove extra percpu_ref_init

lun->lun_ref is also initialized in core_tpg_post_addlun, so it doesn't
need to be done in core_tpg_setup_virtual_lun0.

(nab: Drop left-over percpu_ref_cancel_init in failure path)

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

libata, freezer: avoid block device removal while system is frozen

Freezable kthreads and workqueues are fundamentally problematic in
that they effectively introduce a big kernel lock widely used in the
kernel and have already been the culprit of several deadlock
scenarios.  This is the latest occurrence.

During resume, libata rescans all the ports and revalidates all
pre-existing devices.  If it determines that a device has gone
missing, the device is removed from the system which involves
invalidating block device and flushing bdi while holding driver core
layer locks.  Unfortunately, this can race with the rest of device
resume.  Because freezable kthreads and workqueues are thawed after
device resume is complete and block device removal depends on
freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make
progress, this can lead to deadlock - block device removal can't
proceed because kthreads are frozen and kthreads can't be thawed
because device resume is blocked behind block device removal.

839a8e8660b6 ("writeback: replace custom worker pool implementation
with unbound workqueue") made this particular deadlock scenario more
visible but the underlying problem has always been there - the
original forker task and jbd2 are freezable too.  In fact, this is
highly likely just one of many possible deadlock scenarios given that
freezer behaves as a big kernel lock and we don't have any debug
mechanism around it.

I believe the right thing to do is getting rid of freezable kthreads
and workqueues.  This is something fundamentally broken.  For now,
implement a funny workaround in libata - just avoid doing block device
hot[un]plug while the system is frozen.  Kernel engineering at its
finest.  :(

v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built
    as a module.

v3: Comment updated and polling interval changed to 10ms as suggested
    by Rafael.

v4: Add #ifdef CONFIG_FREEZER around the hack as pm_freezing is not
    defined when FREEZER is not configured thus breaking build.
    Reported by kbuild test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tomaž Šolc <tomaz.solc@tablix.org>
Reviewed-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801
Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Cc: kbuild test robot <fengguang.wu@intel.com>

arm64: ptrace: avoid using HW_BREAKPOINT_EMPTY for disabled events

Commit 8f34a1da35ae ("arm64: ptrace: use HW_BREAKPOINT_EMPTY type for
disabled breakpoints") fixed an issue with GDB trying to zero breakpoint
control registers. The problem there is that the arch hw_breakpoint code
will attempt to create a (disabled), execute breakpoint of length 0.

This will fail validation and report unexpected failure to GDB. To avoid
this, we treated disabled breakpoints as HW_BREAKPOINT_EMPTY, but that
seems to have broken with recent kernels, causing watchpoints to be
treated as TYPE_INST in the core code and returning ENOSPC for any
further breakpoints.

This patch fixes the problem by prioritising the `enable' field of the
breakpoint: if it is cleared, we simply update the perf_event_attr to
indicate that the thing is disabled and don't bother changing either the
type or the length. This reinforces the behaviour that the breakpoint
control register is essentially read-only apart from the enable bit
when disabling a breakpoint.

Cc: <stable@vger.kernel.org>
Reported-by: Aaron Liu <liucy214@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"An RT group-scheduling fix and the sched-domains topology setup fix
  from Mel"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities
  sched: Assign correct scheduling domain to 'sd_llc'

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"An ABI documentation fix, and a mixed-PMU perf-info-corruption fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Document the new transaction sample type
perf: Disable all pmus on unthrottling and rescheduling

Merge tag 'sound-3.13-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"We have a bit more changes than usual in ASoC here, as it was slipped
  from the previous update.  There are one minr ASoC PCM code fix and
  ASoC dmaengine fix, in addition of a collection of small ASoC driver
  fixes.  The rest are a couple of HD-audio stable fixups, and a
  long-standing fix for the paused stream handling.

  So, all commits look not scary (and hopefully won't give you
  disastrous holiday season)"

* tag 'sound-3.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add Dell headset detection quirk for one more laptop model
  ASoC: wm8904: fix DSP mode B configuration
  ASoC: wm_adsp: Add small delay while polling DSP RAM start
  ALSA: Add SNDRV_PCM_STATE_PAUSED case in wait_for_avail function
  ASoC: kirkwood: Fix the CPU DAI rates
  ASoC: wm5110: Correct HPOUT3 DAPM route typo
  ALSA: hda - Add Dell headset detection quirk for three laptop models
  ALSA: hda - Add enable_msi=0 workaround for four HP machines
  ASoC: don't leak on error in snd_dmaengine_pcm_register
  ASoC: fsl: imx-wm8962: Don't update bias_level in machine driver
  ASoC: tegra: fix uninitialized variables in set_fmt
  ASoC: wm8962: Enable SYSCLK provisonally before fetching generated DSPCLK_DIV
  ASoC: sam9x5_wm8731: change to work in DSP A mode
  ASoC: atmel_ssc_dai: add dai trigger ops
  ASoC: soc-pcm: Use valid condition for snd_soc_dai_digital_mute() in hw_free()

null_blk: warning on ignored submit_queues param

Let the user know when the number of submission queues are being
ignored.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: refactor init and init errors code paths

Simplify the initialization logic of the three block-layers.

- The queue initialization is split into two parts. This allows reuse of
code when initializing the sq-, bio- and mq-based layers.
- Set submit_queues default value to 0 and always set it at init time.
- Simplify the init error code paths.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: documentation

Add description of module and its parameters.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: mem garbage on NUMA systems during init

For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.

This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers: block: Mark the functions as static in skd_main.c

Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as
static in skd_main.c because they are not used outside this file.

This eliminates the following warnings in skd_main.c:
drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes]
drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ARC: Allow conditional multiple inclusion of uapi/asm/unistd.h

Commit 97bc386fc12d "ARC: Add guard macro to uapi/asm/unistd.h"
inhibited multiple inclusion of ARCH unistd.h. This however hosed the system
since Generic syscall table generator relies on it being included twice,
and in lack-of an empty table was emitted by C preprocessor.

Fix that by allowing one exception to rule for the special case (just
like Xtensa)

Suggested-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

Merge tag 'asoc-v3.13-rc4' of git://git./linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v3.13

The fixes here are all driver specific ones, none of which particularly
stand out but all of which are useful to users of those drivers.

Merge remote-tracking branches 'asoc/fix/adsp', 'asoc/fix/arizona', 'asoc/fix/atmel', 'asoc/fix/fsl', 'asoc/fix/kirkwood', 'asoc/fix/tegra', 'asoc/fix/wm8904' and 'asoc/fix/wm8962' into asoc-linus

Merge remote-tracking branch 'asoc/fix/dma' into asoc-linus

Merge remote-tracking branch 'asoc/fix/core' into asoc-linus

target/file: Update hw_max_sectors based on current block_size

This patch allows FILEIO to update hw_max_sectors based on the current
max_bytes_per_io. This is required because vfs_[writev,readv]() can accept
a maximum of 2048 iovecs per call, so the enforced hw_max_sectors really
needs to be calculated based on block_size.

This addresses a >= v3.5 bug where block_size=512 was rejecting > 1M
sized I/O requests, because FD_MAX_SECTORS was hardcoded to 2048 for
the block_size=4096 case.

(v2: Use max_bytes_per_io instead of ->update_hw_max_sectors)

Reported-by: Henrik Goldman <hg@x-formation.com>
Cc: <stable@vger.kernel.org> #3.5+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iser-target: Move INIT_WORK setup into isert_create_device_ib_res

This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into
isert_create_device_ib_res(), instead of being done each callback
invocation in isert_cq_[rx,tx]_callback().

This also fixes a 'INFO: trying to register non-static key' warning
when cancel_work_sync() is called before INIT_WORK has setup the
struct work_struct.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: Fix incorrect np->np_thread NULL assignment

When shutting down a target there is a race condition between
iscsit_del_np() and __iscsi_target_login_thread().
The latter sets the thread pointer to NULL, and the former
tries to issue kthread_stop() on that pointer without any
synchronization.

This patch moves the np->np_thread NULL assignment into
iscsit_del_np(), after kthread_stop() has completed. It also
removes the signal_pending() + np_state check, and only
exits when kthread_should_stop() is true.

Reported-by: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Merge branch 'akpm' (incoming from Andrew)

Merge patches from Andrew Morton:
"23 fixes and a MAINTAINERS update"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  mm/hugetlb: check for pte NULL pointer in __page_check_address()
  fix build with make 3.80
  mm/mempolicy: fix !vma in new_vma_page()
  MAINTAINERS: add Davidlohr as GPT maintainer
  mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
  mm/compaction: respect ignore_skip_hint in update_pageblock_skip
  mm/mempolicy: correct putback method for isolate pages if failed
  mm: add missing dependency in Kconfig
  sh: always link in helper functions extracted from libgcc
  mm: page_alloc: exclude unreclaimable allocations from zone fairness policy
  mm: numa: defer TLB flush for THP migration as long as possible
  mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
  mm: fix TLB flush race between migration, and change_protection_range
  mm: numa: avoid unnecessary disruption of NUMA hinting during migration
  mm: numa: clear numa hinting information on mprotect
  sched: numa: skip inaccessible VMAs
  mm: numa: avoid unnecessary work on the failure path
  mm: numa: ensure anon_vma is locked to prevent parallel THP splits
  mm: numa: do not clear PTE for pte_numa update
  mm: numa: do not clear PMD during PTE update scan
  ...

mm/hugetlb: check for pte NULL pointer in __page_check_address()

In __page_check_address(), if address's pud is not present,
huge_pte_offset() will return NULL, we should check the return value.

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: qiuxishi <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix build with make 3.80

According to Documentation/Changes, make 3.80 is still being supported
for building the kernel, hence make files must not make (unconditional)
use of features introduced only in newer versions.

Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression
config option") however introduced "else ifeq" constructs which make
3.80 doesn't understand. Replace the logic there with more conventional
(in the kernel build infrastructure) list constructs (except that the
list here is intentionally limited to exactly one element).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: P J P <ppandit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/mempolicy: fix !vma in new_vma_page()

BUG_ON(!vma) assumption is introduced by commit 0bf598d863e3 ("mbind:
add BUG_ON(!vma) in new_vma_page()"), however, even if

    address = __vma_address(page, vma);

and

    vma->start < address < vma->end

page_address_in_vma() may still return -EFAULT because of many other
conditions in it.  As a result the while loop in new_vma_page() may end
with vma=NULL.

This patch revert the commit and also fix the potential dereference NULL
pointer reported by Dan.

   http://marc.info/?l=linux-mm&m=137689530323257&w=2

  kernel BUG at mm/mempolicy.c:1204!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2
  task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000
  RIP: new_vma_page+0x70/0x90
  RSP: 0000:ffff88005ab21db0  EFLAGS: 00010246
  RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80
  RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2
  R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000

  FS:  00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50
   ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8
   0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00
  Call Trace:
    migrate_pages+0x12a/0x850
    SYSC_mbind+0x513/0x6a0
    SyS_mbind+0xe/0x10
    ia32_do_call+0x13/0x13
  Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48
  RIP  [<ffffffff8119f200>] new_vma_page+0x70/0x90
   RSP <ffff88005ab21db0>

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: add Davidlohr as GPT maintainer

Add a new entry for the GPT standard. Any future changes will now be
routed through linux-efi.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully

After a successful hugetlb page migration by soft offline, the source
page will either be freed into hugepage_freelists or buddy(over-commit
page).  If page is in buddy, page_hstate(page) will be NULL.  It will
hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
  PGD c23762067 PUD c24be2067 PMD 0
  Oops: 0000 [#1] SMP

So check PageHuge(page) after call migrate_pages() successfully.

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/compaction: respect ignore_skip_hint in update_pageblock_skip

update_pageblock_skip() only fits to compaction which tries to isolate
by pageblock unit. If isolate_migratepages_range() is called by CMA, it
try to isolate regardless of pageblock unit and it don't reference
get_pageblock_skip() by ignore_skip_hint. We should also respect it on
update_pageblock_skip() to prevent from setting the wrong information.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/mempolicy: correct putback method for isolate pages if failed

queue_pages_range() isolates hugetlbfs pages and putback_lru_pages()
can't handle these. We should change it to putback_movable_pages().

Naoya said that it is worth going into stable, because it can break
in-use hugepage list.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: add missing dependency in Kconfig

Eliminate the following (rand)config warning by adding missing PROC_FS
dependency:

warning: (HWPOISON_INJECT && MEM_SOFT_DIRTY) selects PROC_PAGE_MONITOR which has unmet direct dependencies (PROC_FS && MMU)

Signed-off-by: Sima Baymani <sima.baymani@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sh: always link in helper functions extracted from libgcc

E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:

  ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!

For "lib-y", if no symbols in a compilation unit are referenced by other
units, the compilation unit will not be included in vmlinux.  This
breaks modules that do reference those symbols.

Use "obj-y" instead to fix this.

http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/

This doesn't fix all cases. There are others, e.g. udivsi3.
This is also not limited to sh, many architectures handle this in the
same way.

A simple solution is to unconditionally include all helper functions.
A more complex solution is to make the choice of "lib-y" or "obj-y" depend
on CONFIG_MODULES:

  obj-$(CONFIG_MODULES) += ...
  lib-y($CONFIG_MODULES) += ...

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page_alloc: exclude unreclaimable allocations from zone fairness policy

Dave Hansen noted a regression in a microbenchmark that loops around
open() and close() on an 8-node NUMA machine and bisected it down to
commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy").
That change forces the slab allocations of the file descriptor to spread
out to all 8 nodes, causing remote references in the page allocator and
slab.

The round-robin policy is only there to provide fairness among memory
allocations that are reclaimed involuntarily based on pressure in each
zone. It does not make sense to apply it to unreclaimable kernel
allocations that are freed manually, in this case instantly after the
allocation, and incur the remote reference costs twice for no reason.

Only round-robin allocations that are usually freed through page reclaim
or slab shrinking.

Bisected by Dave Hansen.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: defer TLB flush for THP migration as long as possible

THP migration can fail for a variety of reasons. Avoid flushing the TLB
to deal with THP migration races until the copy is ready to start.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates

According to documentation on barriers, stores issued before a LOCK can
complete after the lock implying that it's possible tlb_flush_pending
can be visible after a page table update. As per revised documentation,
this patch adds a smp_mb__before_spinlock to guarantee the correct
ordering.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix TLB flush race between migration, and change_protection_range

There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A CPU B CPU C

load TLB entry
make entry PTE/PMD_NUMA
fault on entry
read/write old page
start migrating page
change PTE/PMD to new page
read/write old page [*]
flush TLB
reload TLB from new entry
read/write new page
lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: avoid unnecessary disruption of NUMA hinting during migration

do_huge_pmd_numa_page() handles the case where there is parallel THP
migration. However, by the time it is checked the NUMA hinting
information has already been disrupted. This patch adds an earlier
check with some helpers.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: clear numa hinting information on mprotect

On a protection change it is no longer clear if the page should be still
accessible. This patch clears the NUMA hinting fault bits on a
protection change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sched: numa: skip inaccessible VMAs

Inaccessible VMA should not be trapping NUMA hint faults. Skip them.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: avoid unnecessary work on the failure path

If a PMD changes during a THP migration then migration aborts but the
failure path is doing more work than is necessary.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: ensure anon_vma is locked to prevent parallel THP splits

The anon_vma lock prevents parallel THP splits and any associated
complexity that arises when handling splits during THP migration. This
patch checks if the lock was successfully acquired and bails from THP
migration if it failed for any reason.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: do not clear PTE for pte_numa update

The TLB must be flushed if the PTE is updated but change_pte_range is
clearing the PTE while marking PTEs pte_numa without necessarily
flushing the TLB if it reinserts the same entry. Without the flush,
it's conceivable that two processors have different TLBs for the same
virtual address and at the very least it would generate spurious faults.

This patch only unmaps the pages in change_pte_range for a full
protection change.

[riel@redhat.com: write pte_numa pte back to the page tables]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: numa: do not clear PMD during PTE update scan

If the PMD is flushed then a parallel fault in handle_mm_fault() will
enter the pmd_none and do_huge_pmd_anonymous_page() path where it'll
attempt to insert a huge zero page. This is wasteful so the patch
avoids clearing the PMD when setting pmd_numa.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: clear pmd_numa before invalidating

On x86, PMD entries are similar to _PAGE_PROTNONE protection and are
handled as NUMA hinting faults.  The following two page table protection
bits are what defines them

_PAGE_NUMA:set _PAGE_PRESENT:clear

A PMD is considered present if any of the _PAGE_PRESENT, _PAGE_PROTNONE,
_PAGE_PSE or _PAGE_NUMA bits are set.  If pmdp_invalidate encounters a
pmd_numa, it clears the present bit leaving _PAGE_NUMA which will be
considered not present by the CPU but present by pmd_present.  The
existing caller of pmdp_invalidate should handle it but it's an
inconsistent state for a PMD.  This patch keeps the state consistent
when calling pmdp_invalidate.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>