Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:46 +0000 (18:20 -0500)]
scsi: megaraid_sas: SAS3.5 Generic Megaraid Controllers Stream Detection and IO Coalescing
Detect sequential Write IOs and pass the hint that it is part of sequential
stream to help HBA Firmware do the Full Stripe Writes. For read IOs on
certain RAID volumes like Read Ahead volumes,this will help driver to
send it to Firmware even if the IOs can potentially be sent to
hardware directly (called fast path) bypassing firmware.
Design: 8 streams are maintained per RAID volume as per the combined
firmware/driver design. When there is no stream detected the LRU stream
is used for next potential stream and LRU/MRU map is updated to make this
as MRU stream. Every time a stream is detected the MRU map
is updated to make the current stream as MRU stream.
Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:45 +0000 (18:20 -0500)]
scsi: megaraid_sas: EEDP Escape Mode Support for SAS3.5 Generic Megaraid Controllers
An UNMAP command on a PI formatted device will leave the Logical Block Application
Tag and Logical Block Reference Tag as all F's (for those LBAs that are unmapped).
To avoid IO errors if those LBAs are subsequently read before they are written with
valid tag fields, the MPI SCSI IO requests need to set the EEDPFlags element EEDP
Escape Mode field, Bits [7:6] appropriately. A value of 2 should be set to disable
all PI checks if the Logical Block Application Tag is 0xFFFF for PI types 1 and 2.
A value of 3 should be set to disable all PI checks if the Logical Block Application
Tag is 0xFFFF and the Logical Block Reference Tag is 0xFFFFFFFF for PI type 3.
Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:44 +0000 (18:20 -0500)]
scsi: megaraid_sas: 128 MSIX Support
SAS3.5 Generic Megaraid based Controllers will have the support for 128 MSI-X vectors,
resulting in the need to support 128 reply queues
Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sasikumar Chandrasekaran [Tue, 10 Jan 2017 23:20:43 +0000 (18:20 -0500)]
scsi: megaraid_sas: Add new pci device Ids for SAS3.5 Generic Megaraid Controllers
This patch contains new pci device ids for SAS3.5 Generic Megaraid Controllers
Signed-off-by: Sasikumar Chandrasekaran <sasikumar.pc@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hanjun Guo [Tue, 10 Jan 2017 12:16:43 +0000 (20:16 +0800)]
scsi: remove useless acpi functions in the header file
commit
f1bc1e4c44b1 ("ata: acpi: rework the ata acpi bind support")
removed scsi_register_acpi_bus_type() and
scsi_unregister_acpi_bus_type(), but forgot to remove them in the header
file, do it now.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tomas Winkler [Thu, 5 Jan 2017 08:45:12 +0000 (10:45 +0200)]
scsi: ufs: refactor device descriptor reading
Pull device descriptor reading out of ufs quirk so it can be used also
for other purposes.
Revamp the fixup setup:
1. Rename ufs_device_info to ufs_dev_desc as very similar name
ufs_dev_info is already in use.
2. Make the handlers static as they are not used out of the ufshdc.c
file.
[mkp: applied by hand]
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tomas Winkler [Thu, 5 Jan 2017 08:45:11 +0000 (10:45 +0200)]
scsi: ufs: ufshcd_get_max_icc_level fix endianity handling
Reading big endian value from a buffer requires explicit cast.
Fix sparse warning:
drivers/scsi/ufs/ufshcd.c:4825:24: warning: cast to restricted __be16
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tomas Winkler [Thu, 5 Jan 2017 08:45:10 +0000 (10:45 +0200)]
scsi: ufs: unexport descritpor reading functions
Unexport ufshcd_read_device_desc and ufshcd_read_string_desc there is no
really possibility to calling them directly outside of UFS context.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tomas Winkler [Thu, 5 Jan 2017 08:45:09 +0000 (10:45 +0200)]
scsi: ufs: ufshcd_query_descriptor_retry should be static
Fix the following compilation warning:
drivers/scsi/ufs/ufshcd.c:2076:5: warning: no previous prototype for
ufshcd_query_descriptor_retry [-Wmissing-prototypes]
Also do not export the function, it should not be used out of ufs
context.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Emese Revfy [Wed, 4 Jan 2017 00:01:40 +0000 (16:01 -0800)]
scsi: esas2r: Fix format string type mistakes
This adds the missing __printf attribute which allows compile time
format string checking (and will be used by the coming initify gcc
plugin). Additionally, this fixes the warnings exposed by the attribute.
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: split scsi/acpi, merged attr and fix, new commit messages]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Fri, 18 Nov 2016 06:28:16 +0000 (07:28 +0100)]
scsi: pmcraid: switch to pci_alloc_irq_vectors
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Sun, 8 Jan 2017 09:41:15 +0000 (10:41 +0100)]
scsi: bfa: remove bfa_fcs_mod_s
Just call the functions directly instead of obsfucating the call chain.
This was in reply to a patch from Kees Cook to constify the function
pointer struct bfa_fcs_mod_s, but it turns out there is no reason to
have this indirection at all.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Thu, 29 Dec 2016 22:20:38 +0000 (22:20 +0000)]
scsi: qla2xxx: rename {vendor|hba}_indentifer to {vendor|hba}_identifer
Rename the vendor_indentifer and hba_indentifer fields to correct
spelling.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nicolas Iooss [Mon, 26 Dec 2016 13:23:10 +0000 (14:23 +0100)]
scsi: qla2xxx: make msix_entries const
msix_entries and qla82xx_msix_entries arrays are never modified in
drivers/scsi/qla2xxx/qla_isr.c. Move their contents to read-only data.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nicolas Iooss [Mon, 26 Dec 2016 13:23:09 +0000 (14:23 +0100)]
scsi: qla2xxx: silence -Wformat-security warning
qla24xx_enable_msix() calls scnprintf() with a non-literal format
string. This makes clang report -Wformat-security warnings when
compiling this function:
drivers/scsi/qla2xxx/qla_isr.c:3083:7: error: format string is not a
string literal (potentially insecure) [-Werror,-Wformat-security]
msix_entries[i].name);
^~~~~~~~~~~~~~~~~~~~
drivers/scsi/qla2xxx/qla_isr.c:3083:7: note: treat the string as an
argument to avoid this
msix_entries[i].name);
^
"%s",
drivers/scsi/qla2xxx/qla_isr.c:3119:7: error: format string is not a
string literal (potentially insecure) [-Werror,-Wformat-security]
msix_entries[QLA_ATIO_VECTOR].name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/qla2xxx/qla_isr.c:3119:7: note: treat the string as an
argument to avoid this
msix_entries[QLA_ATIO_VECTOR].name);
^
"%s",
Even though msix_entries[...].name are initialized as literal strings
with no % character and are never modified, introduce a "%s" format
parameter in order to silence this -Wformat-security warning and make
clang able to detect at compile time real bugs related to string
formatting.
[mkp: typo]
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Fri, 30 Dec 2016 14:57:47 +0000 (06:57 -0800)]
scsi: lpfc: Reinstate lpfc_soft_wwn parameter
The lpfc 11.2.0.4 patch set deprecated, by removing, the lpfc_soft_wwn
parameter support.
This patch reinstates support, but adds a warning in the enablement of
the feature that indicates Broadcom (Emulex) does not support the
feature.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry [Tue, 3 Jan 2017 12:24:50 +0000 (20:24 +0800)]
scsi: hisi_sas: lock sensitive region in hisi_sas_slot_abort()
When we call hisi_sas_slot_task_free() we should grab the hisi_hba.lock,
as hisi_sas_slot_task_free() accesses common hisi_hba elements.
Function hisi_sas_slot_abort() is missing this, so add it.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry [Tue, 3 Jan 2017 12:24:49 +0000 (20:24 +0800)]
scsi: hisi_sas: lock sensitive regions when servicing CQ interrupt
There is a bug in the current driver in that certain hisi_hba and port
structure elements which we access when servicing the CQ interrupt do
not use thread-safe accesses; these include hisi_sas_port linked-list of
active slots (hisi_sas_port.entry), bitmap of currently allocated IPTT
(in hisi_hba.slot_index_tags), and completion queue read pointer.
As a solution, lock these elements with the hisi_hba.lock.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry [Tue, 3 Jan 2017 12:24:48 +0000 (20:24 +0800)]
scsi: hisi_sas: service v2 hw CQ ISR with tasklet
Currently the all the slot processing for the completion queue is done
in ISR context. It is judged that the slot processing can take a long
time, especially when a SATA NCQ completes (upto 32 slots).
So, as a solution, defer the bulk of the ISR processing to tasklet
context. Each CQ will have its down tasklet.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Javier Martinez Canillas [Mon, 2 Jan 2017 14:04:58 +0000 (11:04 -0300)]
scsi: ufs-qcom: Fix module autoload
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/scsi/ufs/ufs-qcom.ko | grep alias
$
After this patch:
$ modinfo drivers/scsi/ufs/ufs-qcom.ko | grep alias
alias: of:N*T*Cqcom,ufshcC*
alias: of:N*T*Cqcom,ufshc
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dolev Raviv [Fri, 23 Dec 2016 02:42:18 +0000 (18:42 -0800)]
scsi: ufs: Improve fatal error logs
Errors such as UIC error, illegal OCS values, and others may require
more information for debugging. Such information could be hibern8 events,
events sequences, recoverable errors, error history, and more.
This patch improves tracking of important errors and events in debug level
to be enabled when debugging a such issues. It includes:
* UIC error history
* Successful hibern8 events
* Successful command after hibern8 exit
* Clk-freq info
* Failed device command
* Infrastructure for dumping host controller debug information
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Lee Susman [Fri, 23 Dec 2016 02:42:03 +0000 (18:42 -0800)]
scsi: ufs: add trace event for ufs commands
Use the ftrace infrastructure to conditionally trace ufs command events.
New trace event is created, which samples the following ufs command data:
- device name
- optional identification string
- task tag
- doorbell register
- number of transfer bytes
- interrupt status register
- request start LBA
- command opcode
Currently we only fully trace read(10) and write(10) commands.
All other commands which pass through ufshcd_send_command() will be
printed with "-1" in the lba and transfer_len fields.
Usage:
echo 1 > /sys/kernel/debug/tracing/events/ufs/enable
cat /sys/kernel/debug/tracing/trace_pipe
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
subhashj@codeaurora.org [Fri, 23 Dec 2016 02:41:48 +0000 (18:41 -0800)]
scsi: ufs: add time profiling support
This patch adds the profiling support for some of the time critical
operations like hibern8 enter/exit, clock gating & clock scaling.
Reviewed-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
subhashj@codeaurora.org [Fri, 23 Dec 2016 02:41:33 +0000 (18:41 -0800)]
scsi: ufs: fix setting init power mode
Immediately after successful UFS link startup, UFS link power mode would
be in PWM-G1, 1-lane, SLOW-AUTO mode. But currently we are doing few
of the DME local/peer attributes access before setting the "hba->pwr_info"
to default power mode. If we are doing link startup as part of error
recovery then old power mode might be set to FAST mode and doing DME peer
access (after link startup but before updating "hba->pwr_info" to default
power mode) unintentionally tries to switch from FAST to FAST_AUTO mode (if
UFSHCD_QUIRK_DME_PEER_ACCESS_AUTO_MODE quirk is enabled).
Above issue is fixed by setting the default power mode immediately after
successful link startup.
Reviewed-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
subhashj@codeaurora.org [Fri, 23 Dec 2016 02:41:22 +0000 (18:41 -0800)]
scsi: ufs: add capability to keep auto bkops always enabled
UFS device requires to perform bkops (back ground operations) periodically
but host can control (via auto-bkops parameter of device) when device can
perform bkops based on its performance requirements. In general, host
would like to enable the device's auto-bkops only when it's not doing any
regular data transfer but sometimes device may not behave properly if host
keeps the auto-bkops disabled. This change adds the capability to let the
device auto-bkops always enabled except suspend.
Reviewed-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
subhashj@codeaurora.org [Fri, 23 Dec 2016 02:41:11 +0000 (18:41 -0800)]
scsi: ufs: set default UFS power management level
UFS device and link can be put in multiple different low power modes hence
UFS driver supports multiple different low power modes.
This change sets the default UFS power management level which should put
the link hibernate state and device in sleep state. This default power
management level gives good power savings with relatively less enter/exit
latencies.
Reviewed-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
subhashj@codeaurora.org [Fri, 23 Dec 2016 02:41:00 +0000 (18:41 -0800)]
scsi: ufs: provide sysfs attribute to select the PM level
This patch provides the sysfs attribute to choose the power management
level for UFS runtime and system suspend.
Reviewed-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sahitya Tummala [Fri, 23 Dec 2016 02:40:50 +0000 (18:40 -0800)]
scsi: ufs: Add sysfs node to dynamically control clock scaling
Provide an option to enable/disable clock scaling during runtime.
Write 1/0 to "clkscale_enable" sysfs node to enable/disable clock
scaling.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sahitya Tummala [Fri, 23 Dec 2016 02:40:39 +0000 (18:40 -0800)]
scsi: ufs: Add sysfs node to dynamically control clock gating
Provide an option to enable/disable clock gating during runtime.
Write 1 or 0 to "clkgate_enable" sysfs node to enable/disable
clock gating.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dolev Raviv [Fri, 23 Dec 2016 02:40:07 +0000 (18:40 -0800)]
scsi: ufs: fix multiple ufs spec violation
When a command to a W-LU is timed out via scsi, error handling
will treat it as any other LU and send commands such as
START_STOP with wrong format or task abort. Those commands are
illegal for W-LU according to the UFS spec.
To solve it, when an error is recognized those steps are skipped
and the last step, reset and restore process, is initiated.
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
subhashj@codeaurora.org [Fri, 23 Dec 2016 02:39:51 +0000 (18:39 -0800)]
scsi: ufs: add tracing support
This change adds the ftrace support for following:
1. UFS initialization time
2. Clock gating states
3. Clock scaling states
4. Power management APIs latency
5. BKOPs enable/disable
Usage:
echo 1 > /sys/kernel/debug/tracing/events/ufs/enable
cat /sys/kernel/debug/tracing/trace_pipe
Reviewed-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dolev Raviv [Fri, 23 Dec 2016 02:39:42 +0000 (18:39 -0800)]
scsi: ufs: dump debug info during failures
Inserts driver dumps for UFS Host Controller registers, Transfer Requests
and Task Management Requests.
The dumps will occur on driver initialization failure, ufshcd_abort() and
on error handling path.
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cao jin [Mon, 19 Dec 2016 06:20:29 +0000 (14:20 +0800)]
scsi: qla4xxx: comments correction
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Acked-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Fri, 16 Dec 2016 14:10:43 +0000 (14:10 +0000)]
scsi: qedi: return via va_end to match corresponding va_start
Although on most systems va_end is a no-op, it is good practice to use
va_end on the function return path, especially since the va_start
documenation states:
"Each invocation of va_start() must be matched by a corresponding
invocation of va_end() in the same function."
Found with static analysis by CoverityScan, CIDs 1389477-1389479
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:26:06 +0000 (15:56 +0530)]
scsi: be2iscsi: Update driver version
Version 11.2.1.0
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ketan Mukadam [Tue, 13 Dec 2016 10:26:05 +0000 (15:56 +0530)]
scsi: be2iscsi: Add warning message for unsupported adapter
Add a warning message to indicate obsolete/unsupported
BE2 Adapter Family devices
Signed-off-by: Ketan Mukadam <ketan.mukadam@avagotech.com>
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:26:04 +0000 (15:56 +0530)]
scsi: be2iscsi: Reinit SGL handle, CID tables after TPE
After TPE recovery, CID table needs to be repopulated as per CIDs in
WRBQ creation responses.
SGL handles table needs to be recreated for posting and its indices need
to be resetted.
This is achieved by calling beiscsi_cleanup_port when disabling and
beiscsi_init_port in enabling port.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:26:03 +0000 (15:56 +0530)]
scsi: be2iscsi: Add checks to validate CID alloc/free
Set CID slot to 0xffff to indicate empty.
Check if connection already exists in conn_table before binding.
Check if endpoint already NULL before putting back CID.
Break ep->conn link in free_ep to ignore completions after freeing.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:26:02 +0000 (15:56 +0530)]
scsi: be2iscsi: Remove wq_name from beiscsi_hba
wq_name is used only to set WQ name when its being allocated.
Remove it from beiscsi_hba structure and define locally.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:26:01 +0000 (15:56 +0530)]
scsi: be2iscsi: Remove unused struct members
Fix errors reported in static analysis.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:26:00 +0000 (15:56 +0530)]
scsi: be2iscsi: Remove redundant receive buffers posting
This duplicate code got added during manual merging.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:25:59 +0000 (15:55 +0530)]
scsi: be2iscsi: Fix iSCSI cmd cleanup IOCTL
Prepare the IOCTL with appropriate sizes of buffers of V0 and V1.
Set missing chute number in V1 IOCTL.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:25:58 +0000 (15:55 +0530)]
scsi: be2iscsi: Add checks to validate completions
Added check in beiscsi_process_cq for pio_handle.
pio_handle is cleared in beiscsi_put_wrb_handle.
This catches any case where task gets cleaned up just before completion.
Use back_lock before accessing pio_handle.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:25:57 +0000 (15:55 +0530)]
scsi: be2iscsi: Set WRB invalid bit for SkyHawk
invalid bit in WRB indicates to FW that IO was invalidated before WRB
was fetched from host memory.
For SkyHawk, this invalid bit in WRB is at a different offset.
Use amap_iscsi_wrb_v2 to mark invalid bit for SkyHawk.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:25:56 +0000 (15:55 +0530)]
scsi: be2iscsi: Take iscsi_task ref in abort handler
Hold the reference of iscsi_task till invalidation completes.
This prevents use of ICD when invalidation of that ICD is being processed.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:25:55 +0000 (15:55 +0530)]
scsi: be2iscsi: Fix for crash in beiscsi_eh_device_reset
System crashes when sg_reset is executed in a loop.
CPU: 13 PID: 7073 Comm: sg_reset Tainted: G E 4.8.0-rc1+ #4
RIP: 0010:[<
ffffffffa0825370>] [<
ffffffffa0825370>]
beiscsi_eh_device_reset+0x160/0x520 [be2iscsi]
Call Trace:
[<
ffffffff814c7c77>] ? scsi_host_alloc_command+0x47/0xc0
[<
ffffffff814caafa>] scsi_try_bus_device_reset+0x2a/0x50
[<
ffffffff814cb46e>] scsi_ioctl_reset+0x13e/0x260
[<
ffffffff814ca477>] scsi_ioctl+0x137/0x3d0
[<
ffffffffa05e4ba2>] sg_ioctl+0x572/0xc20 [sg]
[<
ffffffff8123f627>] do_vfs_ioctl+0xa7/0x5d0
The accesses to beiscsi_io_task is being protected in device reset handler
with frwd_lock but the freeing of task can happen under back_lock.
Hold the reference of iscsi_task till invalidation completes.
This prevents use of ICD when invalidation of that ICD is being processed.
Use frwd_lock for iscsi_tasks looping and back_lock to access
beiscsi_io_task structures.
Rewrite mgmt_invalidation_icds to handle allocation and freeing of IOCTL
buffer in one place.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jitendra Bhivare [Tue, 13 Dec 2016 10:25:54 +0000 (15:55 +0530)]
scsi: be2iscsi: Fix use of invalidate command table req
Remove shared structure inv_tbl in phba for all sessions to post
invalidation IOCTL.
Always allocate and then free the table after use in reset handler.
Abort handler needs just one instance so define it on stack.
Add checks for BE_INVLDT_CMD_TBL_SZ to not exceed invalidation
command table size in IOCTL.
Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dan Carpenter [Fri, 16 Dec 2016 09:35:39 +0000 (12:35 +0300)]
scsi: dpt_i2o: double free if adpt_i2o_online_hba() fails
There are two places where adpt_i2o_online_hba() is called. Both
callers call adpt_i2o_delete_hba(pHba) if adpt_i2o_online_hba() fails
and since we also free it here that causes a double free bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Corentin Labbe [Thu, 15 Dec 2016 14:12:16 +0000 (15:12 +0100)]
scsi: mptlan: Remove linux/miscdevice.h from mptlan.h
This patch remove linux/miscdevice.h from mptlan.h since mptlan.h does
not contain any miscdevice. The only user of it is mptctl.c which
already include linux/miscdevice.h So no need to include it twice.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kees Cook [Sat, 17 Dec 2016 01:05:30 +0000 (17:05 -0800)]
scsi: cciss: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kees Cook [Sat, 17 Dec 2016 01:04:49 +0000 (17:04 -0800)]
scsi: hpsa: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:31 +0000 (15:07 -0800)]
scsi: lpfc: lpfc version change to 11.2.0.4
lpfc version change to 11.2.0.4
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:30 +0000 (15:07 -0800)]
scsi: lpfc: Add missing memory barrier
On loosely ordered memory systems (PPC for example), the WQE elements
were being updated in memory, but not necessarily flushed before the
separate doorbell was written to hw which would cause hw to dma the
WQE element. Thus, the hardware occasionally received partially
updated WQE data.
Add the memory barrier after updating the WQE memory.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:29 +0000 (15:07 -0800)]
scsi: lpfc: Correct oops on vport port resets
Correct oops on vport port resets. Incorrect WQE type, thus the clearing
code actually overstepped the WQE.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:27 +0000 (15:07 -0800)]
scsi: lpfc: Deprecate lpfc_prot_sg_seg_cnt parameter
Deprecate lpfc_prot_sg_seg_cnt parameter. Eliminates driver from
unnecessarily limiting DIF s/g list length.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:26 +0000 (15:07 -0800)]
scsi: lpfc: Fix Xlane dynamic LUN set for LUN priority.
Fix Xlane dynamic LUN set for LUN priority. Dynamic changing of the
priority was not getting reflected on the LUN.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:25 +0000 (15:07 -0800)]
scsi: lpfc: FCoE VPort enable-disable does not bring up the VPort
FCoE VPort enable-disable does not bring up the VPort.
VPI structure needed to be initialized before being re-registered.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:24 +0000 (15:07 -0800)]
scsi: lpfc: Correct host name in symbolic_name field
Correct host name in symbolic_name field of nameserver registrations
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:23 +0000 (15:07 -0800)]
scsi: lpfc: Correct issue leading to oops during link reset
Correct issue leading to oops during link reset. Missing vport pointer.
[mkp: fixed typo]
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:22 +0000 (15:07 -0800)]
scsi: lpfc: Deprecate lpfc_soft_wwn parameter
Deprecate lpfc_soft_wwn parameter.
No longer allow override of hw-assigned wwns
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:21 +0000 (15:07 -0800)]
scsi: lpfc: Correct error in setting OS Driver Version with FW
Correct error in setting OS Driver Version with FW. Prior length was
too short.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@Suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Mon, 19 Dec 2016 23:07:20 +0000 (15:07 -0800)]
scsi: lpfc: Clear the VendorVersion in the PLOGI/PLOGI ACC payload
Clear the VendorVersion in the PLOGI/PLOGI ACC payload
Vendor version info may have been set on fabric login. Before sending
PLOGI payloads, ensure that it's cleared.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Long Li [Thu, 15 Dec 2016 02:46:03 +0000 (18:46 -0800)]
scsi: storvsc: properly set residual data length on errors
On I/O errors, the Windows driver doesn't set data_transfer_length
on error conditions other than SRB_STATUS_DATA_OVERRUN.
In these cases we need to set data_transfer_length to 0,
indicating there is no data transferred. On SRB_STATUS_DATA_OVERRUN,
data_transfer_length is set by the Windows driver to the actual data transferred.
Reported-by: Shiva Krishna <Shiva.Krishna@nimblestorage.com>
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Long Li [Thu, 15 Dec 2016 02:46:02 +0000 (18:46 -0800)]
scsi: storvsc: properly handle SRB_ERROR when sense message is present
When sense message is present on error, we should pass along to the upper
layer to decide how to deal with the error.
This patch fixes connectivity issues with Fiber Channel devices.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Long Li [Thu, 15 Dec 2016 02:46:01 +0000 (18:46 -0800)]
scsi: storvsc: use tagged SRB requests if supported by the device
Properly set SRB flags when hosting device supports tagged queuing.
This patch improves the performance on Fiber Channel disks.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
K. Y. Srinivasan [Thu, 15 Dec 2016 02:46:00 +0000 (18:46 -0800)]
scsi: storvsc: Enable multi-queue support
Enable multi-q support. We will allocate the outgoing channel using
the following policy:
1. We will make every effort to pick a channel that is in the
same NUMA node that is initiating the I/O
2. The mapping between the guest CPU and the outgoing channel
is persistent.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
K. Y. Srinivasan [Thu, 15 Dec 2016 02:45:59 +0000 (18:45 -0800)]
scsi: storvsc: Remove the restriction on max segment size
Remove the artificially imposed restriction on max segment size.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
K. Y. Srinivasan [Thu, 15 Dec 2016 02:45:58 +0000 (18:45 -0800)]
scsi: storvsc: Enable tracking of queue depth
Enable tracking of queue depth.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Sun, 1 Jan 2017 22:31:53 +0000 (14:31 -0800)]
Linux 4.10-rc2
Linus Torvalds [Sun, 1 Jan 2017 20:27:05 +0000 (12:27 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull DAX updates from Dan Williams:
"The completion of Jan's DAX work for 4.10.
As I mentioned in the libnvdimm-for-4.10 pull request, these are some
final fixes for the DAX dirty-cacheline-tracking invalidation work
that was merged through the -mm, ext4, and xfs trees in -rc1. These
patches were prepared prior to the merge window, but we waited for
4.10-rc1 to have a stable merge base after all the prerequisites were
merged.
Quoting Jan on the overall changes in these patches:
"So I'd like all these 6 patches to go for rc2. The first three
patches fix invalidation of exceptional DAX entries (a bug which
is there for a long time) - without these patches data loss can
occur on power failure even though user called fsync(2). The other
three patches change locking of DAX faults so that ->iomap_begin()
is called in a more relaxed locking context and we are safe to
start a transaction there for ext4"
These have received a build success notification from the kbuild
robot, and pass the latest libnvdimm unit tests. There have not been
any -next releases since -rc1, so they have not appeared there"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ext4: Simplify DAX fault path
dax: Call ->iomap_begin without entry lock during dax fault
dax: Finish fault completely when loading holes
dax: Avoid page invalidation races and unnecessary radix tree traversals
mm: Invalidate DAX radix tree entries only if appropriate
ext2: Return BH_New buffers for zeroed blocks
Linus Torvalds [Fri, 30 Dec 2016 17:32:26 +0000 (09:32 -0800)]
Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Two small fixes:
- A merge error on my part broke the DocBook build. I've
requisitioned one of tglx's frozen sharks for appropriate
disciplinary action and resolved to be more careful about testing
the DocBook stuff as long as it's still around.
- Fix an error in unaligned-memory-access.txt"
* tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
docs: Fix build failure
Linus Torvalds [Fri, 30 Dec 2016 17:29:50 +0000 (09:29 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a boot failure on some platforms when crypto self test is
enabled along with the new acomp interface"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: testmgr - Use heap buffer for acomp test input
Olof Johansson [Thu, 29 Dec 2016 22:16:07 +0000 (14:16 -0800)]
mm/filemap: fix parameters to test_bit()
mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
return test_bit(PG_waiters);
^~~~~~~~
Fixes:
b91e1302ad9b ('mm: optimize PageWaiters bit use for unlock_page()')
Signed-off-by: Olof Johansson <olof@lixom.net>
Brown-paper-bag-by: Linus Torvalds <dummy@duh.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 27 Dec 2016 19:40:38 +0000 (11:40 -0800)]
mm: optimize PageWaiters bit use for unlock_page()
In commit
62906027091f ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.
However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.
On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd. The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.
On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result. However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.
So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too. And architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.
This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.
The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.
So this introduces the new architecture primitive
clear_bit_unlock_is_negative_byte();
and adds the trivial implementation for x86. We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better. According to Nick, Power has the same hickup x86 has, for
example, but some other architectures may not even care.
All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test". After this, it's down to
0.66%, so just a quarter of the cost it used to be.
(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 28 Dec 2016 01:51:36 +0000 (17:51 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a hash corruption bug in the marvell driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
Linus Torvalds [Wed, 28 Dec 2016 00:04:37 +0000 (16:04 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.
The most important is to not assume the packet is RX just because
the destination address matches that of the device. Such an
assumption causes problems when an interface is put into loopback
mode.
2) If we retry when creating a new tc entry (because we dropped the
RTNL mutex in order to load a module, for example) we end up with
-EAGAIN and then loop trying to replay the request. But we didn't
reset some state when looping back to the top like this, and if
another thread meanwhile inserted the same tc entry we were trying
to, we re-link it creating an enless loop in the tc chain. Fix from
Daniel Borkmann.
3) There are two different WRITE bits in the MDIO address register for
the stmmac chip, depending upon the chip variant. Due to a bug we
could set them both, fix from Hock Leong Kweh.
4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: stmmac: fix incorrect bit set in gmac4 mdio addr register
r8169: add support for RTL8168 series add-on card.
net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
openvswitch: upcall: Fix vlan handling.
ipv4: Namespaceify tcp_tw_reuse knob
net: korina: Fix NAPI versus resources freeing
net, sched: fix soft lockup in tc_classify
net/mlx4_en: Fix user prio field in XDP forward
tipc: don't send FIN message from connectionless socket
ipvlan: fix multicast processing
ipvlan: fix various issues in ipvlan_process_multicast()
Cihangir Akturk [Sat, 17 Dec 2016 17:42:17 +0000 (19:42 +0200)]
Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
In the actual implementation ether_addr_equal function tests for equality to 0
when returning. It seems in commit 0d74c4 it is somehow overlooked to change
this operator to reflect the actual function.
Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
John Brooks [Fri, 23 Dec 2016 00:53:10 +0000 (00:53 +0000)]
docs: Fix build failure
The 80211.tmpl DocBook file was removed in commit
819bf593767c ("docs-rst:
sphinxify 802.11 documentation"), but the 80211.xml target was re-added to
the Makefile by commit
7ddedebb03b7 ("ALSA: doc: ReSTize
writing-an-alsa-driver document"), leading to a failure when building the
documentation:
*** No rule to make target 'Documentation/DocBook/80211.xml', needed by
'Documentation/DocBook/80211.aux.xml'.
cc: stable@vger.kernel.org
Signed-off-by: John Brooks <john@fastquake.com>
Mea-culpa-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Jonathan Corbet [Tue, 27 Dec 2016 19:53:44 +0000 (12:53 -0700)]
Merge tag 'v4.10-rc1' into docs-next
Linux 4.10-rc1
Kweh, Hock Leong [Tue, 27 Dec 2016 20:07:41 +0000 (04:07 +0800)]
net: stmmac: fix incorrect bit set in gmac4 mdio addr register
Fixing the gmac4 mdio write access to use MII_GMAC4_WRITE only instead of
OR together with MII_WRITE.
Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chun-Hao Lin [Tue, 27 Dec 2016 08:29:43 +0000 (16:29 +0800)]
r8169: add support for RTL8168 series add-on card.
This chip is the same as RTL8168, but its device id is 0x8161.
Signed-off-by: Chun-Hao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Tue, 27 Dec 2016 02:49:54 +0000 (10:49 +0800)]
net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
After commit
73b62bd085f4737679ea9afc7867fa5f99ba7d1b ("virtio-net:
remove the warning before XDP linearizing"), there's no users for
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for
commit
f23bc46c30ca5ef58b8549434899fcbac41b2cfc.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 26 Dec 2016 16:31:27 +0000 (08:31 -0800)]
openvswitch: upcall: Fix vlan handling.
Networking stack accelerate vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.
Fixes:
5108bbaddc ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme <jarno@ovn.org>
CC: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Sun, 25 Dec 2016 06:33:16 +0000 (14:33 +0800)]
ipv4: Namespaceify tcp_tw_reuse knob
Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Laura Abbott [Wed, 21 Dec 2016 20:32:54 +0000 (12:32 -0800)]
crypto: testmgr - Use heap buffer for acomp test input
Christopher Covington reported a crash on aarch64 on recent Fedora
kernels:
kernel BUG at ./include/linux/scatterlist.h:140!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc #162
Hardware name: linux,dummy-virt (DT)
task:
ffff80007c650080 task.stack:
ffff800008910000
PC is at sg_init_one+0xa0/0xb8
LR is at sg_init_one+0x24/0xb8
...
[<
ffff000008398db8>] sg_init_one+0xa0/0xb8
[<
ffff000008350a44>] test_acomp+0x10c/0x438
[<
ffff000008350e20>] alg_test_comp+0xb0/0x118
[<
ffff00000834f28c>] alg_test+0x17c/0x2f0
[<
ffff00000834c6a4>] cryptomgr_test+0x44/0x50
[<
ffff0000080dac70>] kthread+0xf8/0x128
[<
ffff000008082ec0>] ret_from_fork+0x10/0x50
The test vectors used for input are part of the kernel image. These
inputs are passed as a buffer to sg_init_one which eventually blows up
with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns
false for the kernel image since virt_to_page will not return the
correct page. Fix this by copying the input vectors to heap buffer
before setting up the scatterlist.
Reported-by: Christopher Covington <cov@codeaurora.org>
Fixes:
d7db7a882deb ("crypto: acomp - update testmgr with support for acomp")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jan Kara [Fri, 21 Oct 2016 09:33:49 +0000 (11:33 +0200)]
ext4: Simplify DAX fault path
Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we
can use transaction starting in ext4_iomap_begin() and thus simplify
ext4_dax_fault(). It also provides us proper retries in case of ENOSPC.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jan Kara [Wed, 19 Oct 2016 12:34:31 +0000 (14:34 +0200)]
dax: Call ->iomap_begin without entry lock during dax fault
Currently ->iomap_begin() handler is called with entry lock held. If the
filesystem held any locks between ->iomap_begin() and ->iomap_end()
(such as ext4 which will want to hold transaction open), this would cause
lock inversion with the iomap_apply() from standard IO path which first
calls ->iomap_begin() and only then calls ->actor() callback which grabs
entry locks for DAX (if it faults when copying from/to user provided
buffers).
Fix the problem by nesting grabbing of entry lock inside ->iomap_begin()
- ->iomap_end() pair.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jan Kara [Wed, 19 Oct 2016 12:48:38 +0000 (14:48 +0200)]
dax: Finish fault completely when loading holes
The only case when we do not finish the page fault completely is when we
are loading hole pages into a radix tree. Avoid this special case and
finish the fault in that case as well inside the DAX fault handler. It
will allow us for easier iomap handling.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jan Kara [Wed, 10 Aug 2016 15:10:28 +0000 (17:10 +0200)]
dax: Avoid page invalidation races and unnecessary radix tree traversals
Currently dax_iomap_rw() takes care of invalidating page tables and
evicting hole pages from the radix tree when write(2) to the file
happens. This invalidation is only necessary when there is some block
allocation resulting from write(2). Furthermore in current place the
invalidation is racy wrt page fault instantiating a hole page just after
we have invalidated it.
So perform the page invalidation inside dax_iomap_actor() where we can
do it only when really necessary and after blocks have been allocated so
nobody will be instantiating new hole pages anymore.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jan Kara [Wed, 10 Aug 2016 15:22:44 +0000 (17:22 +0200)]
mm: Invalidate DAX radix tree entries only if appropriate
Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
just delete all exceptional radix tree entries they find. For DAX this
is not desirable as we track cache dirtiness in these entries and when
they are evicted, we may not flush caches although it is necessary. This
can for example manifest when we write to the same block both via mmap
and via write(2) (to different offsets) and fsync(2) then does not
properly flush CPU caches when modification via write(2) was the last
one.
Create appropriate DAX functions to handle invalidation of DAX entries
for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
wire them up into the corresponding mm functions.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jan Kara [Wed, 10 Aug 2016 14:42:53 +0000 (16:42 +0200)]
ext2: Return BH_New buffers for zeroed blocks
So far we did not return BH_New buffers from ext2_get_blocks() when we
allocated and zeroed-out a block for DAX inode to avoid racy zeroing in
DAX code. This zeroing is gone these days so we can remove the
workaround.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Thomas Gleixner [Mon, 26 Dec 2016 21:58:20 +0000 (22:58 +0100)]
x86/mce/AMD: Make the init code more robust
If mce_device_init() fails then the mce device pointer is NULL and the
AMD mce code happily dereferences it.
Add a sanity check.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Mon, 26 Dec 2016 21:58:19 +0000 (22:58 +0100)]
smp/hotplug: Undo tglxs brainfart
The attempt to prevent overwriting an active state resulted in a
disaster which effectively disables all dynamically allocated hotplug
states.
Cleanup the mess.
Fixes:
dc280d936239 ("cpu/hotplug: Prevent overwriting of callbacks")
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Mon, 26 Dec 2016 09:10:19 +0000 (04:10 -0500)]
arm64: don't pull uaccess.h into *.S
Split asm-only parts of arm64 uaccess.h into a new header and use that
from *.S.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Florian Fainelli [Sat, 24 Dec 2016 03:56:56 +0000 (19:56 -0800)]
net: korina: Fix NAPI versus resources freeing
Commit
beb0babfb77e ("korina: disable napi on close and restart")
introduced calls to napi_disable() that were missing before,
unfortunately this leaves a small window during which NAPI has a chance
to run, yet we just freed resources since korina_free_ring() has been
called:
Fix this by disabling NAPI first then freeing resource, and make sure
that we also cancel the restart task before doing the resource freeing.
Fixes:
beb0babfb77e ("korina: disable napi on close and restart")
Reported-by: Alexandros C. Couloumbis <alex@ozo.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 21 Dec 2016 17:04:11 +0000 (18:04 +0100)]
net, sched: fix soft lockup in tc_classify
Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.
What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.
This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.
This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.
tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.
Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").
Fixes:
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 26 Dec 2016 00:13:08 +0000 (16:13 -0800)]
Linux 4.10-rc1
Larry Finger [Fri, 23 Dec 2016 03:06:53 +0000 (21:06 -0600)]
powerpc: Fix build warning on 32-bit PPC
I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:
AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
This problem is evident after commit
989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created. That was with commit
9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").
The offending line does not make a lot of sense. This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.
Thanks to Nicholas Piggin for suggesting this solution.
Fixes:
9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 25 Dec 2016 22:56:58 +0000 (14:56 -0800)]
avoid spurious "may be used uninitialized" warning
The timer type simplifications caused a new gcc warning:
drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’:
drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));
despite the actual use of "time_start" not having changed in any way.
It appears that simply changing the type of ktime_t from a union to a
plain scalar type made gcc check the use.
The variable wasn't actually used uninitialized, but gcc apparently
failed to notice that the conditional around the use was exactly the
same as the conditional around the initialization of that variable.
Add an unnecessary initialization just to shut up the compiler.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 25 Dec 2016 22:30:04 +0000 (14:30 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer type cleanups from Thomas Gleixner:
"This series does a tree wide cleanup of types related to
timers/timekeeping.
- Get rid of cycles_t and use a plain u64. The type is not really
helpful and caused more confusion than clarity
- Get rid of the ktime union. The union has become useless as we use
the scalar nanoseconds storage unconditionally now. The 32bit
timespec alike storage got removed due to the Y2038 limitations
some time ago.
That leaves the odd union access around for no reason. Clean it up.
Both changes have been done with coccinelle and a small amount of
manual mopping up"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ktime: Get rid of ktime_equal()
ktime: Cleanup ktime_set() usage
ktime: Get rid of the union
clocksource: Use a plain u64 instead of cycle_t