review.tizen.org Git - platform/adaptation/renesas_rcar/renesas

pid: Handle the exit of a multi-threaded init.

When a multi-threaded init exits and the initial thread is not the
last thread to exit the initial thread hangs around as a zombie
until the last thread exits. In that case zap_pid_ns_processes
needs to wait until there are only 2 hashed pids in the pid
namespace not one.

v2. Replace thread_pid_vnr(me) == 1 with the test thread_group_leader(me)
as suggested by Oleg.

Cc: stable@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Caj Larsson <caj@omnicloud.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

igb: fix PHC stopping on max freq

For 82576 MAC type, max_adj is reported as 1000000000 ppb. However, if
this value is passed to igb_ptp_adjfreq_82576, incvalue overflows out of
INCVALUE_82576_MASK, resulting in setting of zero TIMINCA.incvalue, stopping
the PHC (instead of going at twice the nominal speed).

Fix the advertised max_adj value to the largest value hardware can handle.
As there is no min_adj value available (-max_adj is used instead), this will
also prevent stopping the clock intentionally. It's probably not a big deal,
other igb MAC types don't support stopping the clock, either.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: make sensor info static

Trivial sparse warning.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: SR-IOV init reordering

igb is ineffective at setting a lower total VFs because:

int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
{
        ...
        /* Shouldn't change if VFs already enabled */
        if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
                return -EBUSY;

Swap init ordering.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Fix null pointer dereference

The max_vfs= option has always been self limiting to the number of VFs
supported by the device.  fa44f2f1 added SR-IOV configuration via
sysfs, but in the process broke this self correction factor.  The
failing path is:

igb_probe
  igb_sw_init
    if (max_vfs > 7) {
        adapter->vfs_allocated_count = 7;
    ...
    igb_probe_vfs
    igb_enable_sriov(, max_vfs)
      if (num_vfs > 7) {
        err = -EPERM;
        ...

This leaves vfs_allocated_count = 7 and vf_data = NULL, so we bomb out
when igb_probe finally calls igb_reset.  It seems like a really bad
idea, and somewhat pointless, to set vfs_allocated_count separate from
vf_data, but limiting max_vfs is enough to avoid the null pointer.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: fix i350 anti spoofing config

Fix a problem in i350 where anti spoofing configuration was written into a
wrong register.

Signed-off-by: Lior Levy <lior.levy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: don't release the soft entries

When the ixgbevf driver is opened the request to allocate MSIX irq
vectors may fail.  In that case the driver will call ixgbevf_down()
which will call ixgbevf_irq_disable() to clear the HW interrupt
registers and calls synchronize_irq() using the msix_entries pointer in
the adapter structure.  However, when the function to request the MSIX
irq vectors failed it had already freed the msix_entries which causes
an OOPs from using the NULL pointer in synchronize_irq().

The calls to pci_disable_msix() and to free the msix_entries memory
should not occur if device open fails.  Instead they should be called
during device driver removal to balance with the call to
pci_enable_msix() and the call to allocate msix_entries memory
during the device probe and driver load.

Signed-off-by: Li Xun <xunleer.li@huawei.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

block: removes dynamic allocation on stack

This patch removes dynamic allocation on the stack error.

Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"A single bugfix which prevents that a non functional timer device is
  selected to provide the fallback device, which is supposed to serve
  timer interrupts on behalf of non functional devices ..."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Don't allow dummy broadcast timers

workqueue: remove pwq_lock which is no longer used

To simplify locking, the previous patches expanded wq->mutex to
protect all fields of each workqueue instance including the pwqs list
leaving pwq_lock without any user. Remove the unused pwq_lock.

tj: Rebased on top of the current dev branch. Updated description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: protect wq->saved_max_active with wq->mutex

We're expanding wq->mutex to cover all fields specific to each
workqueue with the end goal of replacing pwq_lock which will make
locking simpler and easier to understand.

This patch makes wq->saved_max_active protected by wq->mutex instead
of pwq_lock. As pwq_lock locking around pwq_adjust_mac_active() is no
longer necessary, this patch also replaces pwq_lock lockings of
for_each_pwq() around pwq_adjust_max_active() to wq->mutex.

tj: Rebased on top of the current dev branch. Updated description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: protect wq->pwqs and iteration with wq->mutex

We're expanding wq->mutex to cover all fields specific to each
workqueue with the end goal of replacing pwq_lock which will make
locking simpler and easier to understand.

init_and_link_pwq() and pwq_unbound_release_workfn() already grab
wq->mutex when adding or removing a pwq from wq->pwqs list.  This
patch makes it official that the list is wq->mutex protected for
writes and updates readers accoridingly.  Explicit IRQ toggles for
sched-RCU read-locking in flush_workqueue_prep_pwqs() and
drain_workqueues() are removed as the surrounding wq->mutex can
provide sufficient synchronization.

Also, assert_rcu_or_pwq_lock() is renamed to assert_rcu_or_wq_mutex()
and checks for wq->mutex too.

pwq_lock locking and assertion are not removed by this patch and a
couple of for_each_pwq() iterations are still protected by it.
They'll be removed by future patches.

tj: Rebased on top of the current dev branch.  Updated description.
    Folded in assert_rcu_or_wq_mutex() renaming from a later patch
    along with associated comment updates.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: protect wq->nr_drainers and ->flags with wq->mutex

We're expanding wq->mutex to cover all fields specific to each
workqueue with the end goal of replacing pwq_lock which will make
locking simpler and easier to understand.

wq->nr_drainers and ->flags are specific to each workqueue. Protect
->nr_drainers and ->flags with wq->mutex instead of pool_mutex.

tj: Rebased on top of the current dev branch. Updated description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: rename wq->flush_mutex to wq->mutex

Currently pwq->flush_mutex protects many fields of a workqueue
including, especially, the pwqs list.  We're going to expand this
mutex to protect most of a workqueue and eventually replace pwq_lock,
which will make locking simpler and easier to understand.

Drop the "flush_" prefix in preparation.

This patch is pure rename.

tj: Rebased on top of the current dev branch.  Updated description.
    Use WQ: and WR: instead of Q: and QR: for synchronization labels.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: rename wq_mutex to wq_pool_mutex

wq->flush_mutex will be renamed to wq->mutex and cover all fields
specific to each workqueue and eventually replace pwq_lock, which will
make locking simpler and easier to understand.

Rename wq_mutex to wq_pool_mutex to avoid confusion with wq->mutex.
After the scheduled changes, wq_pool_mutex won't be protecting
anything specific to each workqueue instance anyway.

This patch is pure rename.

tj: s/wqs_mutex/wq_pool_mutex/. Rewrote description.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>

Xilinx: ARM: UART: clear pending irqs before enabling irqs

The Boot ROM has an issue which will cause the driver to
lock up as pending irqs are not being cleared. With them
cleared it prevents that issue.

This patch is needed for the current (3.9-rc3) mainline kernel. I guess
it went unnoticed, because it was only tested with u-boot up until now.
And u-boot maybe handles this.

[s.trumtrar@pengutronix.de: cherry-picked from linux-xlnx.git]
Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

TTY: 8250, deprecated 8250_core.* options

They were introduced by mistake in 3.7. Let's deprecate them now. For
the reasons, see the text in Kconfig below.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

TTY: 8250, revert module name change

In 3.7 the 8250 module name was changed unintentionally from 8250 to
8250_core by commit 835d844d1a28efba81d5aca7385e24c29d3a6db2
(8250_pnp: do pnp probe before legacy probe). We then had to
re-introduce the old module options to ensure the old good
8250.nr_uart & co. still work. This can be done only by a very dirty
hack and we did it in f2b8dfd9e480c3db3bad0c25c590a5d11b31f4ef
(serial: 8250: Keep 8250.<xxxx> module options functional after driver
rename).

That is so damn ugly so that I decided to revert to the old module
name and deprecate the new 8250_core options present in 3.7 and 3.8
only. The deprecation will happen in the following patch.

Note that this patch changes the hack above to support "8250_core.*",
because we now have "8250.*" natively.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc: define the conditions where the ePAPR idle hcall can be supported

For 32-bit, CONFIG_EPAPR_PARAVIRT pulls in both epapr_paravirt.c
and epapr_hcalls.c which contains the 32-bit paravirt idle loop.

For 64-bit, the paravirt idle loop is in idle_book3e.S and that
source file is included only if CONFIG_PPC_BOOK3E_64 defined.

This patch makes that dependency for 64-bit explicit.

Fixes these build errors:

arch/powerpc/kernel/built-in.o: In function `restore_pblist_ptr':
ftrace.c:(.toc+0xdc0): undefined reference to `epapr_ev_idle_start'
ftrace.c:(.toc+0xdd0): undefined reference to `epapr_ev_idle'

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

USB: EHCI: fix bug in iTD/siTD DMA pool allocation

[Description written by Alan Stern]

Soeren tracked down a very difficult bug in ehci-hcd's DMA pool
management of iTD and siTD structures.  Some background: ehci-hcd
gives each isochronous endpoint its own set of active and free itd's
(or sitd's for full-speed devices).  When a new itd is needed, it is
taken from the head of the free list, if possible.  However, itd's
must not be used twice in a single frame because the hardware
continues to access the data structure for the entire duration of a
frame.  Therefore if the itd at the head of the free list has its
"frame" member equal to the current value of ehci->now_frame, it
cannot be reused and instead a new itd is allocated from the DMA pool.
The entries on the free list are not released back to the pool until
the endpoint is no longer in use.

The bug arises from the fact that sometimes an itd can be moved back
onto the free list before itd->frame has been set properly.  In
Soeren's case, this happened because ehci-hcd can allocate one more
itd than it actually needs for an URB; the extra itd may or may not be
required depending on how the transfer aligns with a frame boundary.
For example, an URB with 8 isochronous packets will cause two itd's to
be allocated.  If the URB is scheduled to start in microframe 3 of
frame N then it will require both itds: one for microframes 3 - 7 of
frame N and one for microframes 0 - 2 of frame N+1.  But if the URB
had been scheduled to start in microframe 0 then it would require only
the first itd, which could cover microframes 0 - 7 of frame N.  The
second itd would be returned to the end of the free list.

The itd allocation routine initializes the entire structure to 0, so
the extra itd ends up on the free list with itd->frame set to 0
instead of a meaningful value.  After a while the itd reaches the head
of the list, and occasionally this happens when ehci->now_frame is
equal to 0.  Then, even though it would be okay to reuse this itd, the
driver thinks it must get another itd from the DMA pool.

For as long as the isochronous endpoint remains in use, this flaw in
the mechanism causes more and more itd's to be taken slowly from the
DMA pool.  Since none are released back, the pool eventually becomes
exhausted.

This reuslts in memory allocation failures, which typically show up
during a long-running audio stream.  Video might suffer the same
effect.

The fix is very simple.  To prevent allocations from the pool when
they aren't needed, make sure that itd's sent back to the free list
prematurely have itd->frame set to an invalid value which can never be
equal to ehci->now_frame.

This should be applied to -stable kernels going back to 3.6.

Signed-off-by: Soeren Moch <smoch@web.de>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

aoe: Fix unitialized var usage

Commit 4f2ac93c175c4922bdddbfec6cad94b32cea0070 (block: Remove bi_idx
references) accidently removed the bit that set bv - readd that.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: fengguang.wu@intel.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

staging: comedi: s626: fix continuous acquisition

For the s626 driver, there is a bug in the handling of asynchronous
commands on the AI subdevice when the stop source is `TRIG_NONE`.  The
command should run continuously until cancelled, but the interrupt
handler stops the command running after the first scan.

The command set-up function `s626_ai_cmd()` contains this code:

switch (cmd->stop_src) {
case TRIG_COUNT:
/*  data arrives as one packet */
devpriv->ai_sample_count = cmd->stop_arg;
devpriv->ai_continous = 0;
break;
case TRIG_NONE:
/*  continous acquisition */
devpriv->ai_continous = 1;
devpriv->ai_sample_count = 0;
break;
}

The interrupt handler `s626_irq_handler()` contains this code:

if (!(devpriv->ai_continous))
devpriv->ai_sample_count--;
if (devpriv->ai_sample_count <= 0) {
devpriv->ai_cmd_running = 0;
/* ... */
}

So `devpriv->ai_sample_count` is only decremented for the `TRIG_COUNT`
case, but `devpriv->ai_cmd_running` is set to 0 (and the command
stopped) regardless.

Fix this in `s626_ai_cmd()` by setting `devpriv->ai_sample_count = 1`
for the `TRIG_NONE` case.  The interrupt handler will not decrement it
so it will remain greater than 0 and the check for stopping the
acquisition will fail.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[media] fix compilation with both V4L2 and I2C as 'm'

When config options are:
CONFIG_VIDEO_DEV=y
CONFIG_VIDEO_V4L2=m
CONFIG_I2C=m

Compilation breaks, as reported by:
https://bugzilla.kernel.org/show_bug.cgi?id=55681

Before changeset 7b34be71db533f3e0cf93d53cf62d036cdb5418a,
no compilation errors occurred. However, the I2C code there at
v4l2-device was incorrectly disabled.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

ARM64: early_printk: Fix check for CONFIG_ARM64_64K_PAGES

The 'CONFIG_' prefix is not implicit in IS_ENABLED().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

ipv6: fix bad free of addrconf_init_net

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xhci: Don't warn on empty ring for suspended devices.

When a device attached to the roothub is suspended, the endpoint rings
are stopped. The host may generate a completion event with the
completion code set to 'Stopped' or 'Stopped Invalid' when the ring is
halted. The current xHCI code prints a warning in that case, which can
be really annoying if the USB device is coming into and out of suspend.

Remove the unnecessary warning.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Stephen Hemminger <stephen@networkplumber.org>

usb: xhci: Fix TRB transfer length macro used for Event TRB.

Use proper macro while extracting TRB transfer length from
Transfer event TRBs. Adding a macro EVENT_TRB_LEN (bits 0:23)
for the same, and use it instead of TRB_LEN (bits 0:16) in
case of event TRBs.

This patch should be backported to kernels as old as 2.6.31, that
contain the commit b10de142119a676552df3f0d2e3a9d647036c26a "USB: xhci:
Bulk transfer support". This patch will have issues applying to older
kernels.

Signed-off-by: Vivek gautam <gautam.vivek@samsung.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org

usb/acpi: binding xhci root hub usb port with ACPI

This patch is to bind xhci root hub usb port with its acpi node.
The port num in the acpi table matches with the sequence in the xhci
extended capabilities table. So call usb_hcd_find_raw_port_number() to
transfer hub port num into raw port number which associates with
the sequence in the xhci extended capabilities table before binding.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>

usb: add find_raw_port_number callback to struct hc_driver()

xhci driver divides the root hub into two logical hubs which work
respectively for usb 2.0 and usb 3.0 devices. They are independent
devices in the usb core. But in the ACPI table, it's one device node
and all usb2.0 and usb3.0 ports are under it. Binding usb port with
its acpi node needs the raw port number which is reflected in the xhci
extended capabilities table. This patch is to add find_raw_port_number
callback to struct hc_driver(), fill it with xhci_find_raw_port_number()
which will return raw port number and add a wrap usb_hcd_find_raw_port_number().

Otherwise, refactor xhci_find_real_port_number(). Using
xhci_find_raw_port_number() to get real index in the HW port status
registers instead of scanning through the xHCI roothub port array.
This can help to speed up.

All addresses in xhci->usb2_ports and xhci->usb3_ports array are
kown good ports and don't include following bad ports in the extended
capabilities talbe.
     (1) root port that doesn't have an entry
     (2) root port with unknown speed
     (3) root port that is listed twice and with different speeds.

So xhci_find_raw_port_number() will only return port num of good ones
and never touch bad ports above.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>

usb: xhci: fix build warning

/home/b29397/work/code/git/linus/linux-2.6/drivers/usb/host/xhci-ring.c: In function ‘handle_port_status’:
/home/b29397/work/code/git/linus/linux-2.6/drivers/usb/host/xhci-ring.c:1580: warning: ‘hcd’ may be used uninitialized in this function

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>

unix: fix a race condition in unix_release()

As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.

Dave, I think this one should go on the -stable pile.

Special thanks to Jan for coming up with a reproducer for this
problem.

Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
"Small batch of InfiniBand/RDMA fixes for 3.9:

   - Fix for TX lockup in IPoIB
   - QLogic -> Intel update for qib driver
   - Small static checker fix for qib
   - Fix error path return value in cxgb4"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/qib: change QLogic to Intel
  IB/ipath: Silence a static checker warning
  IPoIB: Fix send lockup due to missed TX completion
  RDMA/cxgb4: Fix error return code in create_qp()

Merge tag 'fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC bug fixes from Arnd Bergmann:
"Four patches for arm-soc this week:

   - Kevin Hilman is no longer reachable under his previous email
     address.  He submitted the patch earlier, but nobody felt
     responsible to pick it up.

   - One Tegra fix for an incorect register address in device tree.

   - IMX multiplatform support exposes a configuration option that leads
     to unbootable kernels on all other machines and that needs to
     depend on that platform.

   - A nontrivial bug fix for the setup of the mxs video output."

* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: update email address for Kevin Hilman
  ARM: tegra: fix register address of slink controller
  ARM: imx: add dependency check for DEBUG_IMX_UART_PORT
  ARM: video: mxs: Fix mxsfb misconfiguring VDCTRL0

Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from J Bruce Fields:
"Fixes for a couple mistakes in the new DRC code.  And thanks to Kent
  Overstreet for noticing we've been sync'ing the wrong range on stable
  writes since 3.8."

* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix bad offset use
  nfsd: fix startup order in nfsd_reply_cache_init
  nfsd: only unhash DRC entries that are in the hashtable

SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked

We need to be careful when testing task->tk_waitqueue in
rpc_wake_up_task_queue_locked, because it can be changed while we
are holding the queue->lock.
By adding appropriate memory barriers, we can ensure that it is safe to
test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()

With the addition of following patch:

fcf8058 cpufreq: Simplify cpufreq_add_dev()

cpufreq driver's .init() routine must initialize policy->cpus with
mask of all possible CPUs (Online + Offline) that share the clock.
Then the core would copy this mask onto policy->related_cpus and will
reset policy->cpus to carry only online cpus.

acpi-cpufreq driver wasn't updated with this assumption and so
sometimes when we try to hot[un]plug CPUs at run time, sysfs
directories get corrupted.

This patch fixes acpi-cpufreq driver against this corruption.

Reported-and-tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: stats: do cpufreq_cpu_put() corresponding to cpufreq_cpu_get()

In cpufreq_stats_free_sysfs() we aren't balancing calls to
cpufreq_cpu_get() with cpufreq_cpu_put(). This will never let us have
ref count to policy->kobj as zero.

We will get a hang if somehow cpufreq_driver_unregister() is called.
And that can happen when we compile our driver as module and
insmod/rmmod it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel-pstate: Use #defines instead of hard-coded values.

They are defined in coreboot (MSR_PLATFORM) and the other
one is already defined in msr-index.h.

Let's use those.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq / intel_pstate: Fix calculation of current frequency

Use the correct pstate value to calculate the effective frequency.

References: https://bugzilla.redhat.com/show_bug.cgi?id=923942
Reported-by: Satish Balay <balay@fastmail.fm>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq / intel_pstate: Add function to check that all MSRs are valid

Some VMs seem to try to implement some MSRs but not all the registers
the driver needs. Check to make sure all the MSR that we need are
available. If any of the required MSRs are not available refuse to
load.

References: https://bugzilla.redhat.com/show_bug.cgi?id=922923
Reported-by: Josh Stone <jistone@redhat.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Exynos and Intel fixes.

  The intel fixes are fairly straightforward, mostly reverts due to bugs
  found.  The exynos one is a big larger since they found some issues
  with the G2D engine and iommu interaction, and needed to verify the
  operations a lot better than they were previously, otherwise a user
  app can just crash the kernel with an iommu fault."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  Revert "drm/i915: write backlight harder"
  drm/i915: don't disable the power well yet
  Revert "drm/i915: set TRANSCODER_EDP even earlier"
  drm/exynos: Check g2d cmd list for g2d restrictions
  drm/exynos: Add a new function to get gem buffer size
  drm/exynos: Deal with g2d buffer info more efficiently
  drm/exynos: Clean up some G2D codes for readability
  drm/exynos: Fix G2D core malfunctioning issue
  drm/exynos: clear node object type at gem unmap
  drm/exynos: Fix error routine to getting dma addr.
  drm/exynos: Replaced kzalloc & memcpy with kmemdup
  drm/exynos: fimd: calculate the correct address offset
  drm/exynos: Make mixer_check_timing static
  drm/exynos: modify the compatible string for exynos fimd

powerpc: make additional room in exception vector area

The FWNMI region is fixed at 0x7000 and the vector are now overflowing
that with allmodconfig. Fix that by moving slb_miss_realmode code out
of that region as it doesn't need to be that close to the call sites
(it is a _GLOBAL function)

Fixes this build error:

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1304: Error: attempt to move .org backwards

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Merge branch 'for-jens' of evilpiepirate.org/git/linux-bcache into for-3.10/core

This contains Kents prep work for the immutable bio_vecs.

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into HEAD

Daniel writes:
"Just three revert/disable by default patches, one of them cc: stable
(since the offending commit was cc: stable, too)."

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  Revert "drm/i915: write backlight harder"
  drm/i915: don't disable the power well yet
  Revert "drm/i915: set TRANSCODER_EDP even earlier"

Merge branch 'exynos-drm-fixes' of git://git./linux/kernel/git/daeinki/drm-exynos into HEAD

Inki writes:
Includes bug fixes and code cleanups.
And it considers some restrictions to G2D hardware.
With this, the malfunction and page fault issues to g2d driver
would be fixed.

* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
  drm/exynos: Check g2d cmd list for g2d restrictions
  drm/exynos: Add a new function to get gem buffer size
  drm/exynos: Deal with g2d buffer info more efficiently
  drm/exynos: Clean up some G2D codes for readability
  drm/exynos: Fix G2D core malfunctioning issue
  drm/exynos: clear node object type at gem unmap
  drm/exynos: Fix error routine to getting dma addr.
  drm/exynos: Replaced kzalloc & memcpy with kmemdup
  drm/exynos: fimd: calculate the correct address offset
  drm/exynos: Make mixer_check_timing static
  drm/exynos: modify the compatible string for exynos fimd

tcp: undo spurious timeout after SACK reneging

On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix assignment of signed expression to unsigned variable

fix for incorrect assignment of signed expression to unsigned variable.

Signed-off-by: Kumar Amit Mehta <gmate.amit@gmail.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix crash when set mac address of br interface

When I tried to set mac address of a bridge interface to a mac
address which already learned on this bridge, I got system hang.

The cause is straight forward: function br_fdb_change_mac_address
calls fdb_insert with NULL source nbp. Then an fdb lookup is
performed. If an fdb entry is found and it's local, it's OK. But
if it's not local, source is dereferenced for printk without NULL
check.

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8021q: fix a potential use-after-free

vlan_vid_del() could possibly free ->vlan_info after a RCU grace
period, however, we may still refer to the freed memory area
by 'grp' pointer. Found by code inspection.

This patch moves vlan_vid_del() as behind as possible.

Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove a WARN_ON() in net_enable_timestamp()

The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false
positive, in socket clone path, run from softirq context :

[ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
[ 3641.668811] Call Trace:
[ 3641.671254]  <IRQ>  [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0
[ 3641.677871]  [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
[ 3641.683683]  [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
[ 3641.689668]  [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450
[ 3641.695222]  [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
[ 3641.701213]  [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
[ 3641.707663]  [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
[ 3641.713354]  [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
[ 3641.719425]  [<ffffffff807af63a>] tcp_check_req+0x3da/0x530
[ 3641.724979]  [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
[ 3641.730964]  [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
[ 3641.736430]  [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
[ 3641.741985]  [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0

Its safe at this point because the parent socket owns a reference
on the netstamp_needed, so we cant have a 0 -> 1 transition, which
requires to lock a mutex.

Instead of refining the check, lets remove it, as all known callers
are safe. If it ever changes in the future, static_key_slow_inc()
will complain anyway.

Reported-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'pinctrl-fixes-for-v3.9' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fixes from Linus Walleij:
"Here are a few pinctrl fixes for the v3.9 rc series:
   - Usecount bounds checking so we do not go below zero on mux
     usecounts.
   - Loop range checking in GPIO ranges in the DT range parser.
   - Proper print in debugfs for pinconf state.
   - Fix compilation bug in generic pinconf code.
   - Minor bugfixes to abx500 and mvebu drivers."

* tag 'pinctrl-fixes-for-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinmux: forbid mux_usecount to be set at UINT_MAX
  pinctrl: mvebu: fix checking for SoC specific controls
  pinctrl: generic: Fix compilation error
  pinctrl: Print the correct information in debugfs pinconf-state file
  pinctrl: abx500: Fix checking if pin use AlternateFunction register
  gpio: fix wrong checking condition for gpio range

Merge branch 'x86/urgent' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"A collection of minor fixes, more EFI variables paranoia
  (anti-bricking) plus the ability to disable the pstore either as a
  runtime default or completely, due to bricking concerns."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efivars: Fix check for CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE
  x86, microcode_intel_early: Mark apply_microcode_early() as cpuinit
  efivars: Handle duplicate names from get_next_variable()
  efivars: explicitly calculate length of VariableName
  efivars: Add module parameter to disable use as a pstore backend
  efivars: Allow disabling use as a pstore backend
  x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL
  x86-64: Fix the failure case in copy_user_handle_tail()

Revert "drm/i915: write backlight harder"

This reverts commit cf0a6584aa6d382f802f2c3cacac23ccbccde0cd.

Turns out that cargo-culting breaks systems. Note that we can't revert
further, since

commit 770c12312ad617172b1a65b911d3e6564fc5aca8
Author: Takashi Iwai <tiwai@suse.de>
Date: Sat Aug 11 08:56:42 2012 +0200

drm/i915: Fix blank panel at reopening lid

fixed a regression in 3.6-rc kernels for which we've never figured out
the exact root cause. But some further inspection of the backlight
code reveals that it's seriously lacking locking. And especially the
asle backlight update is know to get fired (through some smm magic)
when writing specific backlight control registers. So the possibility
of suffering from races is rather real.

Until those races are fixed I don't think it makes sense to try
further hacks. Which sucks a bit, but sometimes that's how it is :(

References: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18788.html
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47941
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: stable@vger.kernel.org (the reverted commit was cc: stable, too)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't disable the power well yet

We're still not 100% ready to disable the power well, so don't disable
it for now. When we disable it we break the audio driver (because some
of the audio registers are on the power well) and machines with eDP on
port D (because it doesn't use TRANSCODER_EDP).

Also, instead of just reverting the code, add a Kernel option to let
us disable it if we want. This will allow us to keep developing and
testing the feature while it's not enabled.

This fixes problems caused by the following commit:
  commit d6dd9eb1d96d2b7345fe4664066c2b7ed86da898
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Tue Jan 29 16:35:20 2013 -0200
       drm/i915: dynamic Haswell display power well support

References: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18788.html
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Mengdong Lin <mengdong.lin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Revert "drm/i915: set TRANSCODER_EDP even earlier"

This reverts commit cc464b2a17c59adedbdc02cc54341d630354edc3.

The reason is that Takashi Iwai reported a regression bisected to this
commit:

http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18788.html

His machine has eDP on port D (usual desktop all-in-on setup), which
intel_dp.c identifies as an eDP panel, but the hsw ddi code
mishandles.

Closer inspection of the code reveals that haswell_crtc_mode_set also
checks intel_encoder_is_pch_edp when setting is_cpu_edp. On haswell
that doesn't make much sense (since there's no edp on the pch), but
what this function _really_ checks is whether that edp connector is on
port A or port D. It's just that on ilk-ivb port D was on the pch ...

So that explains why this seemingly innocent change killed eDP on port
D. Furthermore it looks like everything else accidentally works, since
we've never enabled eDP on port D support for hsw intentionally (e.g.
we still register the HDMI output for port D in that case).

But in retrospective I also don't like that this leaks highly platform
specific details into common code, and the reason is that the drm
vblank layer sucks. So instead I think we should:
- move the cpu_transcoder into the dynamic pipe_config tracking (once
that's merged).
- fix up the drm vblank layer to finally deal with kms crtc objects
instead of int pipes.

v2: Pimp commit message with the better diagnosis as discussed with
Paulo on irc.

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Merge tag 'efi-for-3.9-rc4' into x86/urgent

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

Linux 3.9-rc4

Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"These are mostly minor fixes this time around.  The iscsi-target CHAP
  big-endian bugfix and bump FD_MAX_SECTORS=2048 default patch to allow
  1MB sized I/Os for FILEIO backends on >= v3.5 code are both CC'ed to
  stable.

  Also, there is a persistent reservations regression that has recently
  been reported for >= v3.8.x code, that is currently being tracked down
  for v3.9."

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target/pscsi: Reject cross page boundary case in pscsi_map_sg
  target/file: Bump FD_MAX_SECTORS to 2048 to handle 1M sized I/Os
  tcm_vhost: Flush vhost_work in vhost_scsi_flush()
  tcm_vhost: Add missed lock in vhost_scsi_clear_endpoint()
  target: fix possible memory leak in core_tpg_register()
  target/iscsi: Fix mutual CHAP auth on big-endian arches
  target_core_sbc: use noop for SYNCHRONIZE_CACHE

Merge tag 'md-3.9-fixes' of git://neil.brown.name/md

Pull md fixes from NeilBrown:
"A few bugfixes for md

   - recent regressions in raid5
   - recent regressions in dmraid
   - a few instances of CONFIG_MULTICORE_RAID456 linger

  Several tagged for -stable"

* tag 'md-3.9-fixes' of git://neil.brown.name/md:
  md: remove CONFIG_MULTICORE_RAID456 entirely
  md/raid5: ensure sync and DISCARD don't happen at the same time.
  MD: Prevent sysfs operations on uninitialized kobjects
  MD RAID5: Avoid accessing gendisk or queue structs when not available
  md/raid5: schedule_construction should abort if nothing to do.

bio-integrity: Add explicit field for owner of bip_buf

This was the only real user of BIO_CLONED, which didn't have very clear
semantics. Convert to its own flag so we can get rid of BIO_CLONED.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>

block: Add an explicit bio flag for bios that own their bvec

This is for the new bio splitting code. When we split a bio, if the
split occured on a bvec boundry we reuse the bvec for the new bio. But
that means bio_free() can't free it, hence the explicit flag.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
Acked-by: Tejun Heo <tj@kernel.org>

block: Add bio_alloc_pages()

More utility code to replace stuff that's getting open coded.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>

block: Convert some code to bio_for_each_segment_all()

More prep work for immutable bvecs:

A few places in the code were either open coding or using the wrong
version - fix.

After we introduce the bvec iter, it'll no longer be possible to modify
the biovec through bio_for_each_segment_all() - it doesn't increment a
pointer to the current bvec, you pass in a struct bio_vec (not a
pointer) which is updated with what the current biovec would be (taking
into account bi_bvec_done and bi_size).

So because of that it's more worthwhile to be consistent about
bio_for_each_segment()/bio_for_each_segment_all() usage.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Alexander Viro <viro@zeniv.linux.org.uk>

block: Add bio_for_each_segment_all()

__bio_for_each_segment() iterates bvecs from the specified index
instead of bio->bv_idx. Currently, the only usage is to walk all the
bvecs after the bio has been advanced by specifying 0 index.

For immutable bvecs, we need to split these apart;
bio_for_each_segment() is going to have a different implementation.
This will also help document the intent of code that's using it -
bio_for_each_segment_all() is only legal to use for code that owns the
bio.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Neil Brown <neilb@suse.de>
CC: Boaz Harrosh <bharrosh@panasas.com>

bounce: Refactor __blk_queue_bounce to not use bi_io_vec

A bunch of what __blk_queue_bounce() was doing was problematic for the
immutable bvec work; this cleans that up and the code is quite a bit
smaller, too.

The __bio_for_each_segment() in copy_to_high_bio_irq() was changed
because that one's looping over the original bio, not the bounce bio -
a later patch renames __bio_for_each_segment() ->
bio_for_each_segment_all(), and documents that
bio_for_each_segment_all() is only for code that owns the bio.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>

raid1: use bio_copy_data()

This doesn't really delete any code _yet_, but once immutable bvecs are
done we can just delete the rest of the code in that loop.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>

pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage

In the short term this'll help with code auditing, and if this code ever
gets used now it's converted :)

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jiri Kosina <jkosina@suse.cz>

pktcdvd: use bio_copy_data()

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Jiri Kosina <jkosina@suse.cz>

block: Add bio_copy_data()

This gets open coded quite a bit and it's tricky to get right, so make a
generic version and convert some existing users over to it instead.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>

raid1: Refactor narrow_write_error() to not use bi_idx

More bi_idx removal. This code was just open coding bio_clone(). This
could probably be further improved by using bio_advance() instead of
skipping over null pages, but that'd be a larger rework.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>

raid5: use bio_reset()

Had to shuffle the code around a bit (where bi_rw and bi_end_io were
set), but shouldn't really be anything tricky here

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>

raid1: use bio_reset()

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>

raid10: Use bio_reset()

More prep work for immutable bio vecs, mainly getting rid of references
to bi_idx.

bio_reset was being open coded in a few places. The one in sync_request
was a bit nontrivial to convert, so could use some extra eyeballs.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
Acked-by: NeilBrown <neilb@suse.de>

block: Add submit_bio_wait(), remove from md

Random cleanup - this code was duplicated and it's not really specific
to md.

Also added the ability to return the actual error code.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>

block: Remove some unnecessary bi_vcnt usage

More prep work for immutable bvecs/effecient bio splitting - usage of
bi_vcnt has to be auditing, so getting rid of all the unnecessary usage
makes that easier.

Plus, bio_segments() is really what this code wanted, as it respects the
current value of bi_idx.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Eric Moore <Eric.Moore@lsi.com>
CC: "James E.J. Bottomley" <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org

block: Remove bi_idx references

For immutable bvecs, all bi_idx usage needs to be audited - so here
we're removing all the unnecessary uses.

Most of these are places where it was being initialized on a bio that
was just allocated, a few others are conversions to standard macros.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>

block: Change bio_split() to respect the current value of bi_idx

In the current code bio_split() won't be seeing partially completed bios
so this doesn't change any behaviour, but this makes the code a bit
clearer as to what bio_split() actually requires.

The immediate purpose of the patch is removing unnecessary bi_idx
references, but the end goal is to allow partial completed bios to be
submitted, which along with immutable biovecs enables effecient bio
splitting.

Some of the callers were (double) checking that bios could be split, so
update their checks too.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Martin K. Petersen <martin.petersen@oracle.com>

block: Use bio_sectors() more consistently

Bunch of places in the code weren't using it where they could be -
this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: "Ed L. Cashin" <ecashin@coraid.com>
CC: Nick Piggin <npiggin@kernel.dk>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Jim Paris <jim@jtan.com>
CC: Geoff Levand <geoff@infradead.org>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Ed Cashin <ecashin@coraid.com>

block: Add bio_end_sector()

Just a little convenience macro - main reason to add it now is preparing
for immutable bio vecs, it'll reduce the size of the patch that puts
bi_sector/bi_size/bi_idx into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux-s390@vger.kernel.org
CC: Chris Mason <chris.mason@fusionio.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>

md: Convert md_trim_bio() to use bio_advance()

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
Acked-by: NeilBrown <neilb@suse.de>

block: Refactor blk_update_request()

Converts it to use bio_advance(), simplifying it quite a bit in the
process.

Note that req_bio_endio() now always calls bio_advance() - which means
it always loops over the biovec, not just on partial completions. Don't
expect it to affect performance, but worth noting.

Tested it by forcing partial updates, and dumping before and after on
various bio/bvec fields when doing a partial update.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>

block: Add bio_advance()

This is prep work for immutable bio vecs; we first want to centralize
where bvecs are modified.

Next two patches convert some existing code to use this function.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>

block: Convert integrity to bvec_alloc_bs()

This adds a pointer to the bvec array to struct bio_integrity_payload,
instead of the bvecs always being inline; then the bvecs are allocated
with bvec_alloc_bs().

Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
mempool instead of the bioset, so that bio integrity can use a different
mempool for its bvecs, and thus avoid a potential deadlock.

This is eventually for immutable bio vecs - immutable bvecs aren't
useful if we still have to copy them, hence the need for the pointer.
Less code is always nice too, though.

Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
specified. This was wrong - using the bio_set doesn't protect us from
memory allocation failures, because we just used kmalloc for the
bio_integrity_payload. But it does introduce the possibility of
deadlock, if for some reason we weren't supposed to be using fs_bio_set.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>

block: Fix a buffer overrun in bio_integrity_split()

bio_integrity_split() seemed to be confusing pointers and arrays -
bip_vec in bio_integrity_payload was an array appended to the end of the
payload, so the bio_vecs in struct bio_pair should have come after the
bio_integrity_payload they're for.

Fix it by making bip_vec a pointer to the inline vecs - a later patch is
going to make more use of this pointer.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>

block: Avoid deadlocks with bio allocation by stacking drivers

Previously, if we ever try to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risk deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until after we return -
this means if we allocate a second bio, we're blocking the first one
from ever being freed.

Thus if enough threads call into a stacking block driver at the same
time with bios that need multiple splits, and the bio_set's reserve gets
used up, we deadlock.

This can be worked around in the driver code - we could check if we're
running under generic_make_request(), then mask out __GFP_WAIT when we
go to allocate a bio, and if the allocation fails punt to workqueue and
retry the allocation.

But this is tricky and not a generic solution. This patch solves it for
all users by inverting the previously described technique. We allocate a
rescuer workqueue for each bio_set, and then in the allocation code if
there are bios on current->bio_list we would be blocking, we punt them
to the rescuer workqueue to be submitted.

This guarantees forward progress for bio allocations under
generic_make_request() provided each bio is submitted before allocating
the next, and provided the bios are freed after they complete.

Note that this doesn't do anything for allocation from other mempools.
Instead of allocating per bio data structures from a mempool, code
should use bio_set's front_pad.

Tested it by forcing the rescue codepath to be taken (by disabling the
first GFP_NOWAIT) attempt, and then ran it with bcache (which does a lot
of arbitrary bio splitting) and verified that the rescuer was being
invoked.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Muthukumar Ratty <muthur@gmail.com>

block: Reorder struct bio_set

This is prep work for the next patch, which embeds a struct bio_list in
struct bio_set.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>

Merge tag 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

Pull libata updates from Jeff Garzik:
"Simple stuff.  See one-line summaries."

* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  pata_samsung_cf: use module_platform_driver_probe()
  [libata] Avoid specialized TLA's in ZPODD's Kconfig
  libata-acpi.c: fix copy and paste mistake in ata_acpi_register_power_resource
  sata_fsl: Remove redundant NULL check before kfree
  ahci: Add Device IDs for Intel Wellsburg PCH
  ata_piix: Add MODULE_PARM_DESC to prefer_ms_hyperv

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"One bugfix for the tegra driver.  Two updates regarding email
  addresses and MAINTAINERS which I like to have up-to-date so people
  can be reached immediately.  While we are here, there is on PCI_ID
  addition."

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: add maintainer entry for atmel i2c driver
  i2c: Fix my e-mail address in drivers and documentation
  i2c: iSMT: add Intel Avoton DeviceIDs
  i2c: tegra: check the clk_prepare_enable() return value

Merge git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fixes from Wim Van Sebroeck:
"Fix a boot issues and correct the AcpiMmioSel bitmask in the
  sp5100_tco watchdog device driver"

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: sp5100_tco: Set the AcpiMmioSel bitmask value to 1 instead of 2
  watchdog: sp5100_tco: Remove code that may cause a boot failure

KMS: fix EDID detailed timing frame rate

When KMS has parsed an EDID "detailed timing", it leaves the frame rate
zeroed. Consecutive (debug-) output of that mode thus yields 0 for
vsync. This simple fix also speeds up future invocations of
drm_mode_vrefresh().

While it is debatable whether this qualifies as a -stable fix I'd apply
it for consistency's sake; drm_helper_probe_single_connector_modes()
does the same thing already for all probed modes.

Cc: stable@vger.kernel.org
Signed-off-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KMS: fix EDID detailed timing vsync parsing

EDID spreads some values across multiple bytes; bit-fiddling is needed
to retrieve these. The current code to parse "detailed timings" has a
cut&paste error that results in a vsync offset of at most 15 lines
instead of 63.

See

http://en.wikipedia.org/wiki/EDID

and in the "EDID Detailed Timing Descriptor" see bytes 10+11 show why
that needs to be a left shift.

Cc: stable@vger.kernel.org
Signed-off-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

block: implement runtime pm strategy

When a request is added:
    If device is suspended or is suspending and the request is not a
    PM request, resume the device.

When the last request finishes:
    Call pm_runtime_mark_last_busy().

When pick a request:
    If device is resuming/suspending, then only PM request is allowed
    to go.

The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsi&m=133727953625963&w=2

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: add runtime pm helpers

Add runtime pm helper functions:

void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
  - Initialization function for drivers to call.

int blk_pre_runtime_suspend(struct request_queue *q)
  - If any requests are in the queue, mark last busy and return -EBUSY.
    Otherwise set q->rpm_status to RPM_SUSPENDING and return 0.

void blk_post_runtime_suspend(struct request_queue *q, int err)
  - If the suspend succeeded then set q->rpm_status to RPM_SUSPENDED.
    Otherwise set it to RPM_ACTIVE and mark last busy.

void blk_pre_runtime_resume(struct request_queue *q)
  - Set q->rpm_status to RPM_RESUMING.

void blk_post_runtime_resume(struct request_queue *q, int err)
  - If the resume succeeded then set q->rpm_status to RPM_ACTIVE
    and call __blk_run_queue, then mark last busy and autosuspend.
    Otherwise set q->rpm_status to RPM_SUSPENDED.

The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsi&m=133727953625963&w=2

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: add a flag to identify PM request

Add a flag REQ_PM to identify the request is PM related, such requests
will not change the device request queue's runtime status. It is
intended to be used in driver's runtime PM callback, so that driver can
perform some IO to the device there with the queue's runtime status
unaffected. e.g. in SCSI disk's runtime suspend callback, the disk will
be put into stopped power state, and this require sending a command to
the device. Such command processing should not change the disk's runtime
status.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branches 'cxgb4', 'ipoib' and 'qib' into for-next

IB/qib: change QLogic to Intel

These changes modify the qib driver as part of acquiring
the InfiniBand assets of QLogic.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/ipath: Silence a static checker warning

I have a static checker which complains that 0x255 is too high for
the "dev->opstats[opcode]" array. It turns out that the hardware
has already validated the opcode at this point so it can't actually
overflow.

However, silencing the warning is good and this matches how the
opcode is treated in qib_ib_rcv() as well.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IPoIB: Fix send lockup due to missed TX completion

Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq(). If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

RDMA/cxgb4: Fix error return code in create_qp()

Fix to return a negative error code from the error handling case
instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

Merge git://git.infradead.org/users/willy/linux-nvme

Pull NVMe driver update from Matthew Wilcox:
"These patches have mostly been baking for a few months; sorry I didn't
  get them in during the merge window.  They're all bug fixes, except
  for the addition of the SMART log and the addition to MAINTAINERS."

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Add namespaces with no LBA range feature
  MAINTAINERS: Add entry for the NVMe driver
  NVMe: Initialize iod nents to 0
  NVMe: Define SMART log
  NVMe: Add result to nvme_get_features
  NVMe: Set result from user admin command
  NVMe: End queued bio requests when freeing queue
  NVMe: Free cmdid on nvme_submit_bio error