platform/kernel/linux-rpi.git
12 years agoMerge tag 'isci-for-3.5' into misc
James Bottomley [Mon, 21 May 2012 11:17:30 +0000 (12:17 +0100)]
Merge tag 'isci-for-3.5' into misc

isci update for 3.5

1/ Rework remote-node-context (RNC) handling for proper management of
   the silicon state machine in error handling and hot-plug conditions.
   Further details below; suffice it to say that if the RNC is mismanaged the
   silicon state machines may lock up.

2/ Refactor the initialization code to be reused for suspend/resume support

3/ Miscellaneous bug fixes to address discovery issues and hardware
   compatibility.

RNC rework details from Jeff Skirvin:

In the controller, devices as they appear on a SAS domain (or
direct-attached SATA devices) are represented by memory structures known
as "Remote Node Contexts" (RNCs).  These structures are transferred from
main memory to the controller using a set of register commands; these
commands include setting up the context ("posting"), removing the
context ("invalidating"), and commands to control the scheduling of
commands and connections to that remote device ("suspensions" and
"resumptions").  There is a similar path to control RNC scheduling from
the protocol engine, which interprets the results of command and data
transmission and reception.

In general, the controller chooses among non-suspended RNCs to find one
that has work requiring scheduling the transmission of command and data
frames to a target.  Likewise, when a target tries to return data back
to the initiator, the state of the RNC is used by the controller to
determine how to treat the incoming request. As an example, if the RNC
is in the state "TX/RX Suspended", incoming SSP connection requests from
the target will be rejected by the controller hardware.  When an RNC is
"TX Suspended", it will not be selected by the controller hardware to
start outgoing command or data operations (with certain priority-based
exceptions).

As mentioned above, there are two sources for management of the RNC
states: commands from driver software, and the result of transmission
and reception conditions of commands and data signaled by the controller
hardware.  As an example of the latter, if an outgoing SSP command ends
with an OPEN_REJECT(BAD_DESTINATION) status, the RNC state will
transition to the "TX Suspended" state, and this is signaled by the
controller hardware in the status to the completion of the pending
command as well as signaled in a controller hardware event.  Examples of
the former are included in the patch changelogs.

Driver software is required to suspend the RNC in a "TX/RX Suspended"
condition before any outstanding commands can be terminated.  Failure to
guarantee this can lead to a complete hardware hang condition.  Earlier
versions of the driver software did not guarantee that an RNC was
correctly managed before I/O termination, and so operated in an unsafe
way.

Further, the driver performed unnecessary contortions to preserve the
remote device command state and so was more complicated than it needed
to be.  A simplifying driver assumption is that once an I/O has entered
the error handler path without having completed in the target, the
requirement on the driver is that all use of the sas_task must end.
Beyond that, recovery of operation is dependent on libsas and other
components to reset, rediscover and reconfigure the device before normal
operation can restart.  In the driver, this simplifying assumption meant
that the RNC management could be reduced to entry into the suspended
state, terminating the targeted I/O request, and resuming the RNC as
needed for device-specific management such as an SSP Abort Task or LUN
Reset Management request.
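
The ordering described above (suspend the RNC, terminate the targeted I/O, then resume) can be sketched as a small userspace model. This is only an illustration under stated assumptions: the type and function names (rnc_model, rnc_suspend_tx_rx, rnc_terminate_io, rnc_resume, abort_path_cleanup) are hypothetical and are not the driver's actual interfaces, and the real hardware state machine has more states than the three modelled here.

```c
#include <stdbool.h>

/* Hypothetical, simplified RNC states; the real controller has more. */
enum rnc_state { RNC_READY, RNC_TX_SUSPENDED, RNC_TX_RX_SUSPENDED };

/* Hypothetical model of one remote node context. */
struct rnc_model {
    enum rnc_state state;
    int outstanding_ios;
};

/* Driver-requested suspension: stop scheduling in both directions. */
void rnc_suspend_tx_rx(struct rnc_model *rnc)
{
    rnc->state = RNC_TX_RX_SUSPENDED;
}

/*
 * Terminating I/O is only treated as safe while the RNC is TX/RX
 * suspended.  In any other state nothing is touched and false is returned.
 */
bool rnc_terminate_io(struct rnc_model *rnc)
{
    if (rnc->state != RNC_TX_RX_SUSPENDED)
        return false;
    rnc->outstanding_ios = 0;
    return true;
}

/* Resume normal scheduling for the remote node. */
void rnc_resume(struct rnc_model *rnc)
{
    rnc->state = RNC_READY;
}

/* The abort-path ordering from the text: suspend, terminate, resume. */
bool abort_path_cleanup(struct rnc_model *rnc)
{
    rnc_suspend_tx_rx(rnc);
    if (!rnc_terminate_io(rnc))
        return false;
    rnc_resume(rnc);
    return true;
}
```

In this model, calling rnc_terminate_io() on an RNC that is still in RNC_READY is refused, which mirrors the requirement that termination must never race with an unsuspended RNC.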

12 years agoisci: End the RNC resumption wait when the RNC is destroyed.
Jeff Skirvin [Wed, 14 Mar 2012 00:15:11 +0000 (17:15 -0700)]
isci: End the RNC resumption wait when the RNC is destroyed.

While the RNC is suspended for I/O cleanup, the remote device can be
stopped and the RNC set up for destruction.  These changes accommodate that
case in the abort path.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Fixed RNC bug that lost the suspension or resumption during destroy
Jeff Skirvin [Wed, 14 Mar 2012 00:03:00 +0000 (17:03 -0700)]
isci: Fixed RNC bug that lost the suspension or resumption during destroy

This fix corrects the saving of resume parameters when the destruction
of the RNC has already been directed, and makes sure not to overwrite
the RNC destruction callbacks.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Fix RNC AWAIT_SUSPENSION->INVALIDATING transition.
Jeff Skirvin [Tue, 13 Mar 2012 23:36:35 +0000 (16:36 -0700)]
isci: Fix RNC AWAIT_SUSPENSION->INVALIDATING transition.

The RNC state machine would incorrectly transition from
SCI_RNC_AWAIT_SUSPENSION directly to SCI_RNC_INVALIDATING when a destruct
request was made.  This would skip the increment of the suspension count
and the abort of pending TCs (although the invalidating state would at
least clean up outstanding TCs).

Instead, the RNC will transition to SCI_RNC_SUSPENDED and then start the
destruction process.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
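
The transition fix in the commit above can be illustrated with a minimal model, assuming (as the message implies) that entering the suspended state is what increments the suspension count and aborts pending TCs. The MODEL_ state names, the rnc_model structure and both functions are hypothetical stand-ins, not the driver's code.

```c
/* Hypothetical stand-ins for the three states named in the commit. */
enum rnc_state {
    MODEL_RNC_AWAIT_SUSPENSION,
    MODEL_RNC_SUSPENDED,
    MODEL_RNC_INVALIDATING
};

struct rnc_model {
    enum rnc_state state;
    unsigned int suspend_count;
    unsigned int pending_tcs;
};

/* Entry actions of the suspended state: count it and abort pending TCs. */
static void enter_suspended(struct rnc_model *rnc)
{
    rnc->state = MODEL_RNC_SUSPENDED;
    rnc->suspend_count++;
    rnc->pending_tcs = 0;
}

/*
 * Destruct request: pass through SUSPENDED first so its entry actions
 * run, and only then start invalidation.
 */
void rnc_destruct(struct rnc_model *rnc)
{
    if (rnc->state == MODEL_RNC_AWAIT_SUSPENSION)
        enter_suspended(rnc);
    if (rnc->state == MODEL_RNC_SUSPENDED)
        rnc->state = MODEL_RNC_INVALIDATING;
}
```

Jumping straight from the await-suspension state to invalidating would leave suspend_count unchanged in this model, which is the bookkeeping loss the commit describes.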
12 years agoisci: Manage the IREQ_NO_AUTO_FREE_TAG under scic_lock.
Jeff Skirvin [Tue, 13 Mar 2012 00:29:51 +0000 (17:29 -0700)]
isci: Manage the IREQ_NO_AUTO_FREE_TAG under scic_lock.

Since there is a possibility of a timeout waiting for the RNC suspension,
handle the exit case from the task termination under scic_lock, and leave
the tag allocated if the termination timed out.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
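
A rough sketch of the exit case described in the commit above: the tag is released only when the termination actually completed. The names (term_result, request_model, finish_termination) are hypothetical, the IREQ_NO_AUTO_FREE_TAG flag itself is not modelled, and the caller is assumed to hold the host lock.

```c
#include <stdbool.h>

/* Hypothetical outcome of waiting for a task termination. */
enum term_result { TERM_COMPLETED, TERM_TIMED_OUT };

/* Hypothetical request model that only tracks tag ownership. */
struct request_model {
    bool tag_allocated;
};

/*
 * Exit handling for a terminated request.  The caller is assumed to
 * hold the host lock.  Returns true if the tag was released.
 */
bool finish_termination(struct request_model *req, enum term_result result)
{
    if (result == TERM_TIMED_OUT)
        return false;           /* leave the tag allocated */
    req->tag_allocated = false; /* termination completed: release it */
    return true;
}
```

Keeping the tag allocated after a timeout avoids handing the same tag to a new request while the earlier one may still be outstanding.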
12 years agoisci: Remove obviated host callback list.
Jeff Skirvin [Sun, 4 Mar 2012 12:44:53 +0000 (12:44 +0000)]
isci: Remove obviated host callback list.

Since the callbacks to libsas now occur under scic_lock, there is no
longer any reason to save the completed requests in a separate list
for completion to libsas.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Check IDEV_GONE before performing abort path operations.
Jeff Skirvin [Sat, 10 Mar 2012 05:46:46 +0000 (05:46 +0000)]
isci: Check IDEV_GONE before performing abort path operations.

In the link fail path, set IDEV_GONE for every device on the domain
when the last link in the port fails.

In the abort path functions like isci_reset_device, make sure that
there has not already been a detected domain failure with the device
by checking IDEV_GONE, before performing any kind of hard reset, SMP
phy control, or TMF operation.

The check for IDEV_GONE makes sure that the device in the abort path
really has control of the port with which it is associated.  This
prevents starting hard resets at incorrect times and scheduling
unnecessary LUN resets for SATA devices.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
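
The guard described in the commit above might look roughly like the following model. MODEL_IDEV_GONE, device_model, last_link_failed and reset_device are hypothetical names chosen for illustration; the real driver works on its own device structures and flag bits.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's IDEV_GONE flag bit. */
#define MODEL_IDEV_GONE (1u << 0)

struct device_model {
    unsigned int flags;
    int resets_issued;
};

/* Link fail path: when the last link fails, mark every device as gone. */
void last_link_failed(struct device_model *devs, int count)
{
    for (int i = 0; i < count; i++)
        devs[i].flags |= MODEL_IDEV_GONE;
}

/*
 * Abort path: refuse to start a reset for a device whose domain has
 * already been detected as failed.
 */
bool reset_device(struct device_model *dev)
{
    if (dev->flags & MODEL_IDEV_GONE)
        return false;
    dev->resets_issued++;
    return true;
}
```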
12 years agoisci: Restore the ATAPI device RNC management code.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:09 +0000 (22:42 -0800)]
isci: Restore the ATAPI device RNC management code.

The ATAPI specific and STP general RNC suspension code had been
incorrectly removed from the remote device code.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Don't wait for an RNC suspend if it's being destroyed.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:08 +0000 (22:42 -0800)]
isci: Don't wait for an RNC suspend if it's being destroyed.

Make sure that the wait for suspend can handle the RNC destruction case.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Change the phy control and link reset interface for HW reasons.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:07 +0000 (22:42 -0800)]
isci: Change the phy control and link reset interface for HW reasons.

There is an apparent HW lockup caused when the PE is disabled while there
is an outstanding TC in progress.  This change puts the link into OOB to
force the TC to end before the PE is disabled.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Added timeouts to RNC suspensions in the abort path.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:06 +0000 (22:42 -0800)]
isci: Added timeouts to RNC suspensions in the abort path.

This change adds timeouts to the RNC suspension wait.  It makes the
suspend and resume timeouts the same.

The previous resume timeout of 5 ms was too short, and timeouts were
seen in resumptions of devices in the abort task/LUN reset path - which
would receive an RNC resumed message within a tenth of a second.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Add protocol indicator for TMF requests.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:06 +0000 (22:42 -0800)]
isci: Add protocol indicator for TMF requests.

Requests constructed as task management requests need to have the protocol
indicator set so the completion decode can observe any RNC suspension
conditions.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Directly control IREQ_ABORT_PATH_ACTIVE when completing TMFs.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:05 +0000 (22:42 -0800)]
isci: Directly control IREQ_ABORT_PATH_ACTIVE when completing TMFs.

TMF requests, unlike normal I/O requests, need to handle I/O management
conditions in the completion function because TMFs are not handled in the
completion tasklet.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Wait for RNC resumption before leaving the abort path.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:04 +0000 (22:42 -0800)]
isci: Wait for RNC resumption before leaving the abort path.

In the case of TMF execution, or device resets, wait for the RNC to fully
resume before returning to the caller.  This ensures that the remote
device will not fail I/O requests while waiting for the RNC resumption to
complete.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Fix RNC suspend call for SCI_RESUMING state.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:04 +0000 (22:42 -0800)]
isci: Fix RNC suspend call for SCI_RESUMING state.

Instead of immediately transitioning to the SCI_RNC_AWAIT_SUSPENSION
state, handle the SCI_RNC_RESUMING suspend transition from the
SCI_RNC_READY state like the SCI_RNC_INVALIDATING --> SCI_RNC_POSTING
transitions do now, by setting the destination state for the entry
into the READY state.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Manage tag releases differently when aborting tasks.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:03 +0000 (22:42 -0800)]
isci: Manage tag releases differently when aborting tasks.

When an individual request is being terminated, the request's tag
is managed in the terminate function.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Callbacks to libsas occur under scic_lock and are synchronized.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:03 +0000 (22:42 -0800)]
isci: Callbacks to libsas occur under scic_lock and are synchronized.

This patch changes the callback mechanism to libsas to only occur while
the scic_lock is held; the abort path cleanup of I/Os also checks to make
sure IREQ_ABORT_PATH_ACTIVE is clear before proceeding.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: When in the abort path, defeat other resume calls until done.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:02 +0000 (22:42 -0800)]
isci: When in the abort path, defeat other resume calls until done.

Completion of I/Os during one of the abort path interface calls
from libsas can drive remote device state changes and the resumption
of the device RNC.  This is a problem when the abort path is
attempting to clean up outstanding I/O at the same time - the resumption
can prevent the termination from occurring correctly.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Implement waiting for suspend in the abort path.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:01 +0000 (22:42 -0800)]
isci: Implement waiting for suspend in the abort path.

In order to prevent a device from receiving an I/O request while still
in an RNC suspending or resuming state (and therefore failing that
I/O back to libsas with a reset required status), wait for the RNC state
change before proceeding in the abort path.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Make sure all TCs are terminated and cleaned in LUN reset.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:00 +0000 (22:42 -0800)]
isci: Make sure all TCs are terminated and cleaned in LUN reset.

In the libsas error path, SATA disks require extra handling in
libata to recover operation.  However, libsas expects to be able
to immediately recover all outstanding I/O once the error handler
escalation stops.  This patch fixes the condition where the libata
error handler is scheduled for operation but libsas has already
deleted the outstanding sas_tasks.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Manage the LLHANG timer enable/disable per-device.
Jeff Skirvin [Fri, 9 Mar 2012 06:42:00 +0000 (22:42 -0800)]
isci: Manage the LLHANG timer enable/disable per-device.

The LLHANG timer should be enabled once per device.  This patch corrects
both the timer enable and the timer disable for the remote device.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Save the suspension hint for upcoming suspensions.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:59 +0000 (22:41 -0800)]
isci: Save the suspension hint for upcoming suspensions.

In the case of a suspend call while in SCI_RNC_POSTING or INVALIDATING
states, the LLHANG detect needed to be saved so the upcoming suspension
would enable it correctly.  The unused suspend callback parameters were
removed.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Fix the terminated I/O to not call sas_task_abort().
Jeff Skirvin [Fri, 9 Mar 2012 06:41:58 +0000 (22:41 -0800)]
isci: Fix the terminated I/O to not call sas_task_abort().

This addresses a regression from the commit "isci: Redesign
device suspension, abort, cleanup." in which the sas_task end
condition for terminated I/Os was made to call back on
sas_task_abort().
This commit will be rolled into the original.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Distinguish between remote device suspension cases
Jeff Skirvin [Fri, 9 Mar 2012 06:41:58 +0000 (22:41 -0800)]
isci: Distinguish between remote device suspension cases

For NCQ error conditions among others, there is no need to enable
the link layer hang detect timer.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Remove isci_device reqs_in_process and dev_node from isci_device.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:57 +0000 (22:41 -0800)]
isci: Remove isci_device reqs_in_process and dev_node from isci_device.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Only set IDEV_GONE in the device stop path.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:56 +0000 (22:41 -0800)]
isci: Only set IDEV_GONE in the device stop path.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: All pending TCs are terminated when the RNC is invalidated.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:56 +0000 (22:41 -0800)]
isci: All pending TCs are terminated when the RNC is invalidated.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Device access in the error path does not depend on IDEV_GONE.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:55 +0000 (22:41 -0800)]
isci: Device access in the error path does not depend on IDEV_GONE.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Add suspension cases for RNC INVALIDATING, POSTING states.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:54 +0000 (22:41 -0800)]
isci: Add suspension cases for RNC INVALIDATING, POSTING states.

The RNC can be in any of the states in the loop from suspended to
ready when the API "suspend" or "resume" are called.  This change
adds destination state parameters that control the suspension /
resumption action of the RNC state machine for those transition states.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
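
One way to picture the destination-state idea from the commit above: a suspend request that arrives while the RNC is still posting is recorded, then acted on once the transitional state finishes. The enums, the rnc_model structure and both functions below are hypothetical simplifications and cover only the posting case.

```c
/* Hypothetical subset of RNC states used for this illustration. */
enum rnc_state { MODEL_POSTING, MODEL_READY, MODEL_AWAIT_SUSPENSION };

/* Hypothetical "where to go next" marker recorded during a transition. */
enum rnc_dest { DEST_UNSPECIFIED, DEST_SUSPENDED };

struct rnc_model {
    enum rnc_state state;
    enum rnc_dest dest;
};

/* Suspend request: act immediately when ready, otherwise remember it. */
void rnc_suspend(struct rnc_model *rnc)
{
    if (rnc->state == MODEL_READY)
        rnc->state = MODEL_AWAIT_SUSPENSION;
    else if (rnc->state == MODEL_POSTING)
        rnc->dest = DEST_SUSPENDED;
}

/* Posting finished: enter READY, then honour any recorded destination. */
void rnc_posting_complete(struct rnc_model *rnc)
{
    rnc->state = MODEL_READY;
    if (rnc->dest == DEST_SUSPENDED) {
        rnc->dest = DEST_UNSPECIFIED;
        rnc_suspend(rnc);
    }
}
```

Without the recorded destination, a suspend issued during posting would simply be lost in this model.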
12 years agoisci: Redesign device suspension, abort, cleanup.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:54 +0000 (22:41 -0800)]
isci: Redesign device suspension, abort, cleanup.

This commit changes the means by which outstanding I/Os are handled
for cleanup.
The likelihood is that this commit will be broken into smaller pieces;
however, that will be in a later revision.  Among the changes:

- All completion structures have been removed from the tmf and
abort paths.
- Now using one completed I/O list, with the I/O completed in host bit being
used to select error or normal callback paths.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Escalate to I_T_Nexus_Reset when the device is gone.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:53 +0000 (22:41 -0800)]
isci: Escalate to I_T_Nexus_Reset when the device is gone.

If LUN reset sees that the device is gone, it returns TMF_RESP_FUNC_FAILED
to cause libsas to escalate to an I_T_Nexus_Reset.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Remote device stop also suspends the RNC and terminates I/O.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:52 +0000 (22:41 -0800)]
isci: Remote device stop also suspends the RNC and terminates I/O.

Fixing the remote device state machine to suspend and terminate
all outstanding I/O before the device stopped state is reached.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Remote device must be suspended for NCQ cleanup.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:52 +0000 (22:41 -0800)]
isci: Remote device must be suspended for NCQ cleanup.

When the remote device enters the NCQ error state, the device must
be suspended so that the I/O terminations can take place.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Manage device suspensions during TC terminations.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:51 +0000 (22:41 -0800)]
isci: Manage device suspensions during TC terminations.

TCs must be terminated only while the RNC is suspended.  This commit
adds remote device suspensions and resumptions in the abort, reset and
termination paths.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Terminate outstanding TCs on TX/RX RNC suspensions.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:50 +0000 (22:41 -0800)]
isci: Terminate outstanding TCs on TX/RX RNC suspensions.

TCs must only be terminated when RNCs are suspended.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Handle all suspending TC completions
Jeff Skirvin [Fri, 9 Mar 2012 06:41:50 +0000 (22:41 -0800)]
isci: Handle all suspending TC completions

Add comprehensive decode for all TC completions that generate RNC
suspensions.

Note that this commit also removes unconditional resumptions of ATAPI
devices when in the SCI_STP_DEV_ATAPI_ERROR state, and STP devices
when in the SCI_STP_DEV_IDLE state. This is because the SCI_STP_DEV_IDLE
and SCI_STP_DEV_ATAPI state entry functions manage the RNC resumption.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Fixed bug in resumption from RNC Tx/Rx suspend state.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:49 +0000 (22:41 -0800)]
isci: Fixed bug in resumption from RNC Tx/Rx suspend state.

The resumption from the Tx/Rx suspended state should work the same
as the Tx suspended state.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Manage the link layer hang detect timer for RNC suspensions.
Jeff Skirvin [Fri, 9 Mar 2012 06:41:48 +0000 (22:41 -0800)]
isci: Manage the link layer hang detect timer for RNC suspensions.

For STP devices under certain protocol conditions, an RNC will not
suspend until the current transfer state is broken with a SYNC/ESC
sequence from the SCU.  The SYNC/ESC is driven by expiration of the
SCU link layer hang detect timer, which has too small a dynamic
range to support slow SATA devices, so normally it is disabled.

This change enables the timer with the minimum period at the point
when the suspension is requested.

Note that there is potential collateral damage to other open
connections to slow SATA devices on the same port, since there
is no alternative but to enable the LLHANG timer on every phy in
the port for the current suspension request - there is no way to
tell on which phy the RNC in question is currently active.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
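
Because the phy carrying the RNC cannot be identified, the commit above enables the timer on every phy in the port. A minimal sketch of that loop follows; port_model, MODEL_MAX_PHYS and port_set_llhang are hypothetical names, and a plain boolean stands in for programming the timer registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of phys per controller for this model. */
#define MODEL_MAX_PHYS 4

/* Hypothetical port model: a bit mask of member phys plus timer state. */
struct port_model {
    uint8_t phy_mask;
    bool llhang_enabled[MODEL_MAX_PHYS];
};

/*
 * Enable or disable the link layer hang detect timer on every phy that
 * belongs to the port, since the active phy cannot be singled out.
 */
void port_set_llhang(struct port_model *port, bool enable)
{
    for (int i = 0; i < MODEL_MAX_PHYS; i++) {
        if (port->phy_mask & (1u << i))
            port->llhang_enabled[i] = enable;
    }
}
```

Phys outside the port's mask are left untouched, so the collateral effect is limited to connections on the same port.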
12 years agoisci: fix oem parameter validation on single controller skus
Dan Williams [Mon, 30 Apr 2012 18:57:44 +0000 (11:57 -0700)]
isci: fix oem parameter validation on single controller skus

OEM parameters [1] are parsed from the platform option-rom / efi
driver.  By default the driver was validating the parameters for the
dual-controller case, but in single-controller case only the first set
of parameters may be valid.

Limit the validation to the number of actual controllers detected
otherwise the driver may fail to parse the valid parameters leading to
driver-load or runtime failures.

[1] the platform specific set of phy address, configuration, and analog
    tuning values

[stable v3.0+]
Cc: <stable@vger.kernel.org>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
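
The fix in the commit above amounts to bounding the validation loop by the number of controllers actually detected. A hedged sketch: oem_params_model, MODEL_MAX_CONTROLLERS and oem_params_ok are hypothetical, and a single boolean stands in for the real per-controller parameter checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound: the dual-controller case. */
#define MODEL_MAX_CONTROLLERS 2

/* Hypothetical per-controller parameter block; 'valid' stands in for
 * the real range and consistency checks. */
struct oem_params_model {
    bool valid;
};

/*
 * Validate only the parameter sets that belong to detected controllers,
 * so an unused second set cannot fail a single-controller platform.
 */
bool oem_params_ok(const struct oem_params_model *params,
                   size_t num_controllers)
{
    if (num_controllers > MODEL_MAX_CONTROLLERS)
        return false;
    for (size_t i = 0; i < num_controllers; i++) {
        if (!params[i].valid)
            return false;
    }
    return true;
}
```

With one detected controller, an invalid second entry is simply never examined.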
12 years agoisci: enable BCN in sci_port_add_phy()
Maciej Trela [Mon, 12 Mar 2012 23:29:30 +0000 (23:29 +0000)]
isci: enable BCN in sci_port_add_phy()

Ensure we enable receiving BCNs from the
hardware when adding a phy to isci_port.
Otherwise, if we get a BCN before the port is
created, we won't see any BCN.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Reported-by: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Changes in COMSAS timings enabling ISCI to detect buggy disc drives.
Andrzej Jakowski [Thu, 8 Mar 2012 19:38:50 +0000 (19:38 +0000)]
isci: Changes in COMSAS timings enabling ISCI to detect buggy disc drives.

This patch extends timings in COMSAS signaling, so ISCI can detect disc
drives that fail to send COMSAS in the correct time frame.

Signed-off-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: kill isci_host.shost
Dan Williams [Sat, 25 Feb 2012 22:29:49 +0000 (14:29 -0800)]
isci: kill isci_host.shost

We can retrieve the shost from the sas_ha like the rest of libsas and
drop this out of our local data structure.

Acked-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: fix interrupt disable
Dan Williams [Fri, 2 Mar 2012 01:06:24 +0000 (17:06 -0800)]
isci: fix interrupt disable

There is a (dubious?) lost irq workaround in sci_controller_isr() that
effectively nullifies attempts to disable interrupts.  Until the
workaround can be re-evaluated add some infrastructure to prevent the
interrupt handler from inadvertently re-enabling interrupts.

The failure mode was interrupts continuing to run after the driver had
been removed and its iomappings torn down.

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
[richard: clear remaining interrupts at the end of reset]
Acked-by: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: fix 'link-up' events occur after 'start-complete'
Dan Williams [Wed, 29 Feb 2012 09:07:56 +0000 (01:07 -0800)]
isci: fix 'link-up' events occur after 'start-complete'

The call to wait_for_start() is meant to ensure that all links have been
given a chance to come up before letting the kernel proceed with
probing.  However, the implementation is not correctly syncing with the
port configuration agent.  In the MPC case the ports are hard-coded, in
the APC case we need to wait for the port-configuration to form ports
from the started phys.

Towards that end increase the timeout for the APC agent to form ports,
and delay start complete until all phys are out of link-training.

Cc: <stable@vger.kernel.org>
Cc: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: fix controller stop
Dan Williams [Thu, 23 Feb 2012 09:12:10 +0000 (01:12 -0800)]
isci: fix controller stop

1/ notify waiters when controller stop completes (fixes 10 second stall
   unloading the driver)
2/ make sure phy stop is after port and device stop

Cc: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: refactor initialization for S3/S4
Dan Williams [Wed, 15 Feb 2012 21:58:42 +0000 (13:58 -0800)]
isci: refactor initialization for S3/S4

Based on an original implementation by Ed Nadolski and Artur Wojcik

In preparation for S3/S4 support refactor initialization so that
driver-load and resume-from-suspend can share the common init path of
isci_host_init().  Organize the initialization into objects that are
self-contained to the driver (initialized by isci_host_init) versus
those that have some upward registration (initialized at allocation time
asd_sas_phy, asd_sas_port, dma allocations).  The largest change is
moving the validation of the oem and module parameters from
isci_host_init() to isci_host_alloc().

The S3/S4 approach being taken is that libsas will be tasked with
remembering the state of the domain and the lldd is free to be
forgetful.  In the case of isci we'll just re-init using a subset of the
normal driver load path.

[clean up some unused / mis-indented function definitions in host.h]

Signed-off-by: Ed Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: kill isci_port.domain_dev_list
Dan Williams [Sat, 18 Feb 2012 00:30:47 +0000 (16:30 -0800)]
isci: kill isci_port.domain_dev_list

Another unused field, and isci_port_init is overkill.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: kill ->status, and ->state_lock in isci_host
Dan Williams [Wed, 15 Feb 2012 21:20:31 +0000 (13:20 -0800)]
isci: kill ->status, and ->state_lock in isci_host

They serve no incremental purpose over the existing sas_ha state.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: Don't filter BROADCAST CHANGE primitives
Tom Jackson [Fri, 24 Feb 2012 09:38:49 +0000 (09:38 +0000)]
isci: Don't filter BROADCAST CHANGE primitives

Per the SAS spec, several types of BROADCAST CHANGE primitives
must cause re-discovery of the originating expander.
Only the standard BROADCAST CHANGE primitive was being
sent to the LIBSAS layer.  The other BC primitives have been
added to sci_phy_event_handler().

Signed-off-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: kill sci_phy_protocol and sci_request_protocol
Dan Williams [Wed, 1 Feb 2012 08:44:14 +0000 (00:44 -0800)]
isci: kill sci_phy_protocol and sci_request_protocol

Holdovers from the initial driver cleanup, replace with enum sas_protocol.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: kill ->is_direct_attached
Dan Williams [Wed, 1 Feb 2012 08:23:10 +0000 (00:23 -0800)]
isci: kill ->is_direct_attached

domain_device ->parent conveys the same information.

Occurrences of ->is_direct_attached appear next to incomplete open-coded
versions of dev_is_sata(), clean those up as well.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years agoisci: improve 'invalid state' warnings
Dan Williams [Fri, 10 Feb 2012 09:05:43 +0000 (01:05 -0800)]
isci: improve 'invalid state' warnings

Convert controller state machine warnings to emit the state number (it
missed the number to string conversion, but since these errors rarely
happen there is not much motivation to go further).

Fix up the rnc warnings to use the state name.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
12 years ago[SCSI] lpfc 8.3.31: Update lpfc to version 8.3.31
James Smart [Thu, 10 May 2012 01:19:53 +0000 (21:19 -0400)]
[SCSI] lpfc 8.3.31: Update lpfc to version 8.3.31

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fixed system crash due to not providing SCSI error-handling host...
James Smart [Thu, 10 May 2012 01:19:44 +0000 (21:19 -0400)]
[SCSI] lpfc 8.3.31: Fixed system crash due to not providing SCSI error-handling host reset handler

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix bug with driver using the wrong xritag when sending an els...
James Smart [Thu, 10 May 2012 01:19:34 +0000 (21:19 -0400)]
[SCSI] lpfc 8.3.31: Fix bug with driver using the wrong xritag when sending an els echo

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Debug helper utility routines for dumping various SLI4 queues
James Smart [Thu, 10 May 2012 01:19:25 +0000 (21:19 -0400)]
[SCSI] lpfc 8.3.31: Debug helper utility routines for dumping various SLI4 queues

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix unsol abts xri lookup
James Smart [Thu, 10 May 2012 01:19:14 +0000 (21:19 -0400)]
[SCSI] lpfc 8.3.31: Fix unsol abts xri lookup

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Correct point-to-point mode discovery errors on LPe16xxx
James Smart [Thu, 10 May 2012 01:19:03 +0000 (21:19 -0400)]
[SCSI] lpfc 8.3.31: Correct point-to-point mode discovery errors on LPe16xxx

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Reregister VPI for SLI3 after cable moved to new Saturn port
James Smart [Thu, 10 May 2012 01:18:49 +0000 (21:18 -0400)]
[SCSI] lpfc 8.3.31: Reregister VPI for SLI3 after cable moved to new Saturn port

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix driver crash during back-to-back ramp events
James Smart [Thu, 10 May 2012 01:18:40 +0000 (21:18 -0400)]
[SCSI] lpfc 8.3.31: Fix driver crash during back-to-back ramp events

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix log message for Mailbox command when no error is detected
James Smart [Thu, 10 May 2012 01:18:30 +0000 (21:18 -0400)]
[SCSI] lpfc 8.3.31: Fix log message for Mailbox command when no error is detected

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Revise FCP LOG for easier Finisar trace correlation
James Smart [Thu, 10 May 2012 01:18:20 +0000 (21:18 -0400)]
[SCSI] lpfc 8.3.31: Revise FCP LOG for easier Finisar trace correlation

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix build warnings when debugfs is not defined
James Smart [Thu, 10 May 2012 01:18:12 +0000 (21:18 -0400)]
[SCSI] lpfc 8.3.31: Fix build warnings when debugfs is not defined

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix kernel panic when going into sleep state
James Smart [Thu, 10 May 2012 01:17:43 +0000 (21:17 -0400)]
[SCSI] lpfc 8.3.31: Fix kernel panic when going into sleep state

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix error message displayed even when not an error
James Smart [Thu, 10 May 2012 01:17:37 +0000 (21:17 -0400)]
[SCSI] lpfc 8.3.31: Fix error message displayed even when not an error

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix Read Link status data
James Smart [Thu, 10 May 2012 01:17:16 +0000 (21:17 -0400)]
[SCSI] lpfc 8.3.31: Fix Read Link status data

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fixed system panic due to midlayer abort and driver complete...
James Smart [Thu, 10 May 2012 01:17:07 +0000 (21:17 -0400)]
[SCSI] lpfc 8.3.31: Fixed system panic due to midlayer abort and driver complete race on SCSI cmd

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix unable to create vports on FCoE SLI4 adapter
James Smart [Thu, 10 May 2012 01:16:50 +0000 (21:16 -0400)]
[SCSI] lpfc 8.3.31: Fix unable to create vports on FCoE SLI4 adapter

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix initiator sending flogi after acking flogi from target
James Smart [Thu, 10 May 2012 01:16:42 +0000 (21:16 -0400)]
[SCSI] lpfc 8.3.31: Fix initiator sending flogi after acking flogi from target

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix bug with driver not supporting the get controller attributes...
James Smart [Thu, 10 May 2012 01:16:24 +0000 (21:16 -0400)]
[SCSI] lpfc 8.3.31: Fix bug with driver not supporting the get controller attributes command

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Correct handling of SLI4-port XRI resource-provisioning profile...
James Smart [Thu, 10 May 2012 01:16:12 +0000 (21:16 -0400)]
[SCSI] lpfc 8.3.31: Correct handling of SLI4-port XRI resource-provisioning profile change

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] lpfc 8.3.31: Fix bug with driver unload leaving a scsi host for a vport around
James Smart [Thu, 10 May 2012 01:16:03 +0000 (21:16 -0400)]
[SCSI] lpfc 8.3.31: Fix bug with driver unload leaving a scsi host for a vport around

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: constify sg_proc_leaf_arr
Jörn Engel [Thu, 12 Apr 2012 21:35:25 +0000 (17:35 -0400)]
[SCSI] sg: constify sg_proc_leaf_arr

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: remove sg_mutex
Jörn Engel [Thu, 12 Apr 2012 21:35:05 +0000 (17:35 -0400)]
[SCSI] sg: remove sg_mutex

With the exception of the detached field, sg_mutex no longer adds any
locking.  detached handling has been broken before and is still broken
and this patch does not seem to make things worse than they were to
begin with.

However, I have observed cases of tasks being blocked for >200s waiting
for sg_mutex.  So the removal clearly adds value for very little cost.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: completely protect sfds
Jörn Engel [Wed, 25 Apr 2012 15:17:29 +0000 (11:17 -0400)]
[SCSI] sg: completely protect sfds

sfds is protected by sg_index_lock - except for sg_open(), where it
isn't.  Change that and add some documentation.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: protect sdp->exclude
Jörn Engel [Tue, 24 Apr 2012 20:13:11 +0000 (16:13 -0400)]
[SCSI] sg: protect sdp->exclude

Changes since v1: set_exclude now returns the new value, which gets
rid of the comma expression and the operator precedence bug.  Thanks
to Douglas for spotting it.

sdp->exclude was previously protected by the BKL.  The sg_mutex, which
replaced the BKL, only semi-protected it, as it was missing from
sg_release() and sg_proc_seq_show_debug().  Take an explicit spinlock
for it.
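
A minimal userspace sketch of the pattern described above, assuming a
pthread mutex in place of the kernel spinlock. The names struct fake_dev,
exclude_lock and the body of set_exclude() are illustrative assumptions,
not the driver's actual code. The setter stores the value under the lock
and returns it, so a caller can test the result directly instead of
relying on a comma expression:

```c
#include <pthread.h>

/* Hypothetical stand-in for the sg device structure. */
struct fake_dev {
	pthread_mutex_t exclude_lock;	/* stands in for the kernel spinlock */
	int exclude;			/* protected by exclude_lock */
};

static void fake_dev_init(struct fake_dev *d)
{
	pthread_mutex_init(&d->exclude_lock, NULL);
	d->exclude = 0;
}

/* Store the new value under the lock and return it to the caller. */
static int set_exclude(struct fake_dev *d, int val)
{
	pthread_mutex_lock(&d->exclude_lock);
	d->exclude = val;
	pthread_mutex_unlock(&d->exclude_lock);
	return val;
}

/* Read the flag under the same lock. */
static int get_exclude(struct fake_dev *d)
{
	int val;

	pthread_mutex_lock(&d->exclude_lock);
	val = d->exclude;
	pthread_mutex_unlock(&d->exclude_lock);
	return val;
}
```

Every reader and writer goes through the same lock, which is the
property the message says sg_mutex only partially provided.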

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: prevent unwoken sleep
Jörn Engel [Thu, 12 Apr 2012 21:33:58 +0000 (17:33 -0400)]
[SCSI] sg: prevent unwoken sleep

srp->done is protected by sfp->rq_list_lock everywhere, except for this
one case.  Result can be that the wake-up happens before the cacheline
with the changed srp->done has arrived, so the waiter can go back to
sleep and never be woken up again.

The wait_event_interruptible() means that anyone trying to debug this
unlikely race will likely notice everything working fine again, as the
next signal will unwedge things.  Evil.
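
The rule being restored here (update and check the completion flag under
one lock) can be sketched in userspace, assuming a pthread mutex and
condition variable in place of rq_list_lock and the kernel wait queue.
struct fake_req and all function names below are hypothetical:

```c
#include <pthread.h>

struct fake_req {
	pthread_mutex_t lock;	/* plays the role of sfp->rq_list_lock */
	pthread_cond_t wake;	/* plays the role of the wait queue */
	int done;		/* protected by lock */
};

/* Completion side: publish done under the lock, then wake the waiter. */
static void *complete_req(void *arg)
{
	struct fake_req *r = arg;

	pthread_mutex_lock(&r->lock);
	r->done = 1;
	pthread_mutex_unlock(&r->lock);
	pthread_cond_signal(&r->wake);
	return NULL;
}

/* Waiter side: re-check done under the same lock before every sleep. */
static int wait_for_req(struct fake_req *r)
{
	int done;

	pthread_mutex_lock(&r->lock);
	while (!r->done)
		pthread_cond_wait(&r->wake, &r->lock);
	done = r->done;
	pthread_mutex_unlock(&r->lock);
	return done;
}

/* Run one completion against one waiter; returns the observed flag. */
static int run_once(void)
{
	struct fake_req r = { .done = 0 };
	pthread_t t;
	int done;

	pthread_mutex_init(&r.lock, NULL);
	pthread_cond_init(&r.wake, NULL);
	if (pthread_create(&t, NULL, complete_req, &r) != 0)
		return -1;
	done = wait_for_req(&r);
	pthread_join(t, NULL);
	return done;
}
```

Because the flag is written and read under the same lock, the waiter
either sees done already set or is asleep when the wake-up arrives; it
cannot observe a stale value and go back to sleep.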

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: remove closed flag
Jörn Engel [Thu, 12 Apr 2012 21:33:39 +0000 (17:33 -0400)]
[SCSI] sg: remove closed flag

After sg_release() has been called, no one should be able to actually use
that file descriptor anymore.  So if closed ever made a difference in the
past five years or so, it would have meant a bug.  Remove it.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
[jejb: fix up checkpatch warnings]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: use wait_event_interruptible()
Jörn Engel [Thu, 12 Apr 2012 21:33:25 +0000 (17:33 -0400)]
[SCSI] sg: use wait_event_interruptible()

Afaics the use of __wait_event_interruptible() as opposed to
wait_event_interruptible() is purely historic.  So let's follow the rest
of the kernel and check the condition before prepare_to_wait() - and
also make the code a bit nicer.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: remove while (1) non-loop
Jörn Engel [Thu, 12 Apr 2012 21:32:48 +0000 (17:32 -0400)]
[SCSI] sg: remove while (1) non-loop

The while (1) construct isn't actually a loop at all.  So let's not
pretend and obfuscate the code.
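
As a generic illustration of the cleanup (the functions below are
hypothetical and are not taken from sg.c), a while (1) whose body always
leaves on the first pass is equivalent to plain straight-line code:

```c
/* Before: the body always returns on the first pass, so nothing loops. */
static int first_ready_looped(int ready, int fallback)
{
	while (1) {
		if (ready)
			return 1;
		return fallback;
	}
}

/* After: the same control flow written as straight-line code. */
static int first_ready_plain(int ready, int fallback)
{
	if (ready)
		return 1;
	return fallback;
}
```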

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sg: remove unnecessary indentation
Jörn Engel [Thu, 12 Apr 2012 21:32:17 +0000 (17:32 -0400)]
[SCSI] sg: remove unnecessary indentation

blocking is de-facto a constant and the now-removed comment wasn't all
that useful either.  Without them and the resulting indentation the code
is a bit nicer to read.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years ago[SCSI] sd: limit the scope of the async probe domain
Dan Williams [Fri, 23 Mar 2012 00:05:11 +0000 (17:05 -0700)]
[SCSI] sd: limit the scope of the async probe domain

sd injects and synchronizes probe work on the global kernel-wide domain.
This runs into conflict with PM that wants to perform resume actions in
async context:

[  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
[  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
[  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
[  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
[  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
[  494.611685] Call Trace:
[  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
[  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
[  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
[  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
[  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
[  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
[  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
[  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
[  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()

[  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
[  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
[  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
[  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
[  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
[  853.886633] Call Trace:
[  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
[  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
[  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
[  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
[  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context

Provide a 'scsi_sd_probe_domain' so that async probe actions can
be flushed without regard for the state of PM, and allow for the resume
path to handle devices that have transitioned from SDEV_QUIESCE to
SDEV_DEL prior to resume.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[jejb: remove unneeded config guards in include file]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
12 years agoLinux 3.4-rc7
Linus Torvalds [Sun, 13 May 2012 01:37:47 +0000 (18:37 -0700)]
Linux 3.4-rc7

.. and this should hopefully be the last -rc before final 3.4 release.

12 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sun, 13 May 2012 00:27:41 +0000 (17:27 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM: SoC fixes from Olof Johansson:
 "I was hoping to be done with fixes for 3.4 but we got two branches
  from subarch maintainers the last couple of days.  So here is one
  last(?) pull request for arm-soc containing 7 patches:

   - Five of them are for shmobile dealing with SMP setup and compile
     failures
   - The remaining two are for regressions on the Samsung platforms"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: EXYNOS: fix ctrlbit for exynos5_clk_pdma1
  ARM: EXYNOS: use s5p-timer for UniversalC210 board
  ARM / mach-shmobile: Invalidate caches when booting secondary cores
  ARM / mach-shmobile: sh73a0 SMP TWD boot regression fix
  ARM / mach-shmobile: r8a7779 SMP TWD boot regression fix
  ARM: mach-shmobile: convert ag5evm to use the generic MMC GPIO hotplug helper
  ARM: mach-shmobile: convert mackerel to use the generic MMC GPIO hotplug helper

12 years agoMerge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6
Linus Torvalds [Sun, 13 May 2012 00:24:29 +0000 (17:24 -0700)]
Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull a few more GPIO bug fixes from Grant Likely:
 "Oops, missed a couple.  Here's an updated pull req for GPIO"

A set of PCH bug fixes, and one patch to fix up compile warnings

* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  gpio/exynos: Fix compiler warnings when non-exynos machines are selected
  gpio: pch9: Use proper flow type handlers

12 years agoMerge branch 'v3.4-samsung-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel...
Olof Johansson [Sat, 12 May 2012 22:41:22 +0000 (15:41 -0700)]
Merge branch 'v3.4-samsung-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes

* 'v3.4-samsung-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
  ARM: EXYNOS: fix ctrlbit for exynos5_clk_pdma1
  ARM: EXYNOS: use s5p-timer for UniversalC210 board

12 years agoARM: EXYNOS: fix ctrlbit for exynos5_clk_pdma1
Kukjin Kim [Sat, 12 May 2012 07:45:47 +0000 (16:45 +0900)]
ARM: EXYNOS: fix ctrlbit for exynos5_clk_pdma1

It should be (1 << 2) for ctrlbit of exynos5_clk_pdma1.
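
For readers unfamiliar with the term, a ctrlbit is the mask of the bit
that gates a clock in a control register. A minimal sketch of how such a
mask is applied follows; the macro and helper names are hypothetical and
the register is modeled as a plain integer rather than real hardware:

```c
#include <stdint.h>

/* Mask named in this commit: bit 2 selects the pdma1 clock gate. */
#define PDMA1_CTRLBIT	(1u << 2)

/* Set the gate bit in a (modeled) clock gate register value. */
static uint32_t gate_enable(uint32_t reg, uint32_t ctrlbit)
{
	return reg | ctrlbit;
}

/* Clear the gate bit, leaving all other gates untouched. */
static uint32_t gate_disable(uint32_t reg, uint32_t ctrlbit)
{
	return reg & ~ctrlbit;
}
```

With a wrong mask, the enable and disable calls would toggle a different
clock's gate, which is why the exact bit position matters.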

Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
12 years agoARM: EXYNOS: use s5p-timer for UniversalC210 board
Marek Szyprowski [Fri, 11 May 2012 21:17:59 +0000 (06:17 +0900)]
ARM: EXYNOS: use s5p-timer for UniversalC210 board

Commit 069d4e743 ("ARM: EXYNOS4: Remove clock event timers using
ARM private timers") removed support for local timers and forced
the use of MCT as the event source. However, MCT does not operate
properly on early revisions of EXYNOS4 SoCs. All UniversalC210 boards
are based on them, so that commit broke support for these boards. This
patch provides a workaround that enables UniversalC210 boards to boot
again. s5p-timer is used as the event source; it works only for
non-SMP builds.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
12 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/renesas...
Olof Johansson [Sat, 12 May 2012 22:40:56 +0000 (15:40 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/renesas into fixes

By Guennadi Liakhovetski (2) and others via Rafael J. Wysocki:
"[...] urgent fixes for Renesas ARM-based platforms.  Four of these
commits are fixes of regressions new in 3.4-rc and the last one is
necessary for SMP to work on those systems in general."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/renesas:
  ARM / mach-shmobile: Invalidate caches when booting secondary cores
  ARM / mach-shmobile: sh73a0 SMP TWD boot regression fix
  ARM / mach-shmobile: r8a7779 SMP TWD boot regression fix
  ARM: mach-shmobile: convert ag5evm to use the generic MMC GPIO hotplug helper
  ARM: mach-shmobile: convert mackerel to use the generic MMC GPIO hotplug helper

12 years agoARM / mach-shmobile: Invalidate caches when booting secondary cores
Magnus Damm [Wed, 9 May 2012 07:24:59 +0000 (16:24 +0900)]
ARM / mach-shmobile: Invalidate caches when booting secondary cores

Make sure L1 caches are invalidated when booting secondary
cores. Needed to boot all mach-shmobile SMP systems that
are using Cortex-A9 including sh73a0, r8a7779 and EMEV2.

Thanks to imx and tegra guys for actual code.

Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
12 years agoARM / mach-shmobile: sh73a0 SMP TWD boot regression fix
Kuninori Morimoto [Thu, 10 May 2012 07:26:58 +0000 (00:26 -0700)]
ARM / mach-shmobile: sh73a0 SMP TWD boot regression fix

Fix SMP TWD boot regression on sh73a0 based platforms caused by:

4200b16 ARM: shmobile: convert to twd_local_timer_register() interface

After the merge of the above commit it has been impossible to boot
sh73a0 based SoCs with SMP enabled and CONFIG_HAVE_ARM_TWD=y. The
kernel crashes at smp_init_cpus() timing which is before the console
has been initialized, so to the user this looks like a kernel lock up
without any particular error message.

This patch fixes the regression on sh73a0 by moving the TWD
registration code from smp_init_cpus() to sys_timer->init() time.

This patch also removes shmobile_twd_init(), which is no longer needed.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
12 years agoARM / mach-shmobile: r8a7779 SMP TWD boot regression fix
Magnus Damm [Thu, 10 May 2012 05:57:22 +0000 (14:57 +0900)]
ARM / mach-shmobile: r8a7779 SMP TWD boot regression fix

Fix SMP TWD boot regression on r8a7779 based platforms caused by:

4200b16 ARM: shmobile: convert to twd_local_timer_register() interface

After the merge of the above commit it has been impossible to boot
r8a7779 based SoCs with SMP enabled and CONFIG_HAVE_ARM_TWD=y. The
kernel crashes at smp_init_cpus() timing which is before the console
has been initialized, so to the user this looks like a kernel lock up
without any particular error message.

This patch fixes the regression on r8a7779 by moving the TWD
registration code from smp_init_cpus() to sys_timer->init() time.

Signed-off-by: Magnus Damm <damm@opensource.se>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
12 years agoARM: mach-shmobile: convert ag5evm to use the generic MMC GPIO hotplug helper
Guennadi Liakhovetski [Mon, 16 Apr 2012 21:09:19 +0000 (23:09 +0200)]
ARM: mach-shmobile: convert ag5evm to use the generic MMC GPIO hotplug helper

This also fixes the following modular mmc build failure:

arch/arm/mach-shmobile/built-in.o: In function `mackerel_sdhi0_gpio_cd':
pfc-sh7372.c:(.text+0x1138): undefined reference to `mmc_detect_change'

on this platform by eliminating the use of an inline function, which
calls into the mmc core.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Reviewed-by: Simon Horman <horms@verge.net.au>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
12 years agoARM: mach-shmobile: convert mackerel to use the generic MMC GPIO hotplug helper
Guennadi Liakhovetski [Mon, 16 Apr 2012 21:09:13 +0000 (23:09 +0200)]
ARM: mach-shmobile: convert mackerel to use the generic MMC GPIO hotplug helper

This also fixes the following modular mmc build failure:

arch/arm/mach-shmobile/built-in.o: In function `ag5evm_sdhi0_gpio_cd':
pfc-sh73a0.c:(.text+0x7c0): undefined reference to `mmc_detect_change'

on this platform by eliminating the use of an inline function, which
calls into the mmc core.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Simon Horman <horms@verge.net.au>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
12 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 12 May 2012 20:02:31 +0000 (13:02 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of minor qla and virtio fixes plus one major regression
  fix (oops in all legacy host drivers)."

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] virtio_scsi: fix TMF use-after-free
  [SCSI] fix oops in all legacy host adapters caused by 6f381fa
  [SCSI] qla2xxx: Update version number to 8.04.00.03-k.
  [SCSI] qla2xxx: Properly check for current state after the fabric-login request.
  [SCSI] qla2xxx: Proper completion to scsi-ml for scsi status task_set_full and busy.
  [SCSI] qla2xxx: Block flash access from application when device is initialized for ISP82xx.
  [SCSI] qla2xxx: Fix reset time out as qla2xxx not ack to reset request.

12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 12 May 2012 19:57:01 +0000 (12:57 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David S. Miller:

 1) Since we do RCU lookups on ipv4 FIB entries, we have to test if the
    entry is dead before returning it to our caller.

 2) openvswitch locking and packet validation fixes from Ansis Atteka,
    Jesse Gross, and Pravin B Shelar.

 3) Fix PM resume locking in IGB driver, from Benjamin Poirier.

 4) Fix VLAN header handling in vhost-net and macvtap, from Basil Gor.

 5) Revert a bogus network namespace isolation change that was causing
    regressions on S390 networking devices.

 6) If bonding decides to process and handle a LACPDU frame, we
    shouldn't bump the rx_dropped counter.  From Jiri Bohac.

 7) Fix mis-calculation of available TX space in r8169 driver when doing
    TSO, which can lead to crashes and/or hung device.  From Julien
    Ducourthial.

 8) SCTP does not validate cached routes properly in all cases, from
    Nicolas Dichtel.

 9) Link status interrupt needs to be handled in ks8851 driver, from
    Stephen Boyd.

10) Use capable(), not cap_raised(), in connector/userns netlink code.
    From Eric W. Biederman via Andrew Morton.

11) Fix pktgen OOPS on module unload, from Eric Dumazet.

12) iwlwifi under-estimates SKB truesizes, also from Eric Dumazet.

13) Cure division by zero in SFC driver, from Ben Hutchings.
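
Item 7 above concerns unsigned wraparound in a free-space calculation.
The following is a generic illustration of that class of bug, not the
r8169 driver's code; the function names are assumptions. Subtracting a
larger unsigned value from a smaller one wraps to a huge number, so the
comparison has to be made before any subtraction:

```c
#include <stdint.h>

/* Buggy: wraps to a huge value when needed > free_slots. */
static uint32_t slots_left_buggy(uint32_t free_slots, uint32_t needed)
{
	return free_slots - needed;
}

/* Safe: compare first, so no subtraction can wrap. */
static int has_room(uint32_t free_slots, uint32_t needed)
{
	return free_slots >= needed;
}
```

A caller that trusts the buggy result would believe the ring has plenty
of space and overrun it, which matches the crash or hang symptom the
item describes.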

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  ks8851: Update link status during link change interrupt
  macvtap: restore vlan header on user read
  vhost-net: fix handle_rx buffer size
  bonding: don't increase rx_dropped after processing LACPDUs
  connector/userns: replace netlink uses of cap_raised() with capable()
  sctp: check cached dst before using it
  pktgen: fix crash at module unload
  Revert "net: maintain namespace isolation between vlan and real device"
  ehea: fix losing of NEQ events when one event occurred early
  igb: fix rtnl race in PM resume path
  ipv4: Do not use dead fib_info entries.
  r8169: fix unsigned int wraparound with TSO
  sfc: Fix division by zero when using one RX channel and no SR-IOV
  openvswitch: Validation of IPv6 set port action uses IPv4 header
  net: compare_ether_addr[_64bits]() has no ordering
  cdc_ether: Ignore bogus union descriptor for RNDIS devices
  bnx2x: bug fix when loading after SAN boot
  e1000: Silence sparse warnings by correcting type
  igb, ixgbe: netdev_tx_reset_queue incorrectly called from tx init path
  openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed.
  ...

12 years agoMerge tag 'dm-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Linus Torvalds [Sat, 12 May 2012 19:56:08 +0000 (12:56 -0700)]
Merge tag 'dm-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull device-mapper fixes from Alasdair G Kergon:
 "Fix a couple of serious memory leaks in device-mapper thin
  provisioning and tidy its MODULE_DESCRIPTION.

  Mitigate occasional reported hangs associated with multipath scsi_dh
  module loading."

* tag 'dm-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm mpath: check if scsi_dh module already loaded before trying to load
  dm thin: correct module description
  dm thin: fix unprotected use of prepared_discards list
  dm thin: reinstate missing mempool_free in cell_release_singleton

12 years agoMAINTAINERS: Add myself as the cpufreq maintainer
Rafael J. Wysocki [Fri, 11 May 2012 19:35:45 +0000 (21:35 +0200)]
MAINTAINERS: Add myself as the cpufreq maintainer

Since cpufreq has no official maintainer at the moment, I'm willing
to maintain it along with some other power management core code I've
been maintaining already.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agodm mpath: check if scsi_dh module already loaded before trying to load
Mike Snitzer [Sat, 12 May 2012 00:43:21 +0000 (01:43 +0100)]
dm mpath: check if scsi_dh module already loaded before trying to load

If the requested scsi_dh module is already loaded then skip
request_module().

Multipath table loads can hang in an unnecessary __request_module.
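
The check-before-load idea can be sketched in userspace as follows. The
registry, handler_loaded(), load_handler() and get_handler() are all
hypothetical stand-ins (load_handler() plays the role of
request_module()); this is not the dm-mpath code itself:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry of handler modules that are already loaded. */
static const char *loaded[8];
static size_t nr_loaded;
static int load_calls;	/* counts how often the slow load path ran */

static int handler_loaded(const char *name)
{
	for (size_t i = 0; i < nr_loaded; i++)
		if (strcmp(loaded[i], name) == 0)
			return 1;
	return 0;
}

/* Stand-in for request_module(): potentially slow, so avoid it. */
static int load_handler(const char *name)
{
	load_calls++;
	if (nr_loaded == sizeof(loaded) / sizeof(loaded[0]))
		return -1;
	loaded[nr_loaded++] = name;
	return 0;
}

/* Only fall back to loading when the handler is not present yet. */
static int get_handler(const char *name)
{
	if (handler_loaded(name))
		return 0;
	return load_handler(name);
}
```

Calling get_handler() twice with the same name takes the load path only
once, so a second table load never waits on the module loader.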

Reported-by: Ben Marzinski <bmarzins@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agodm thin: correct module description
Alasdair G Kergon [Sat, 12 May 2012 00:43:19 +0000 (01:43 +0100)]
dm thin: correct module description

Remove duplicate copy of string "device-mapper" (DM_NAME) from
MODULE_DESCRIPTION.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>